ToC

Emmy Pattern Matching

The library is built out of a few stacked ideas: matcher combinators, matchers, consequences and rules.

Matcher Combinators

  • A "matcher combinator" is a function that takes three arguments:

[binding-frame data success-continuation]

and either returns nil or false, or calls its success-continuation with a potentially-updated binding map.

Matcher

A matcher is a function of a data input that can succeed or fail. If it succeeds, it will return the binding map generated by the matching process.

On failure, a matcher returns a special failure singleton object. You can check for this object using [[fail?]].

Consequence Functions

A consequence is a function from the binding map returned by a matcher to a final form. A consequence can fail by returning false, nil or the special failure singleton.

Rule

A rule is a function of data built from a pair of:

  • a matcher
  • a consequence function.

If the matcher fails on the rule's input, the whole rule returns failure. If the match succeeds, the rule calls the consequence function with the matcher's binding map. If THIS function succeeds, the rule returns that value.

If the consequence function returns nil or false (or failure), the whole rule fails with failure.

Combinators

emmy.pattern.match contains many matcher combinators. These are either functions with the contract described above, or functions that take one or more combinators and return a new combinator with the same contract. Examples are [[or]], [[not]], [[and]], [[match-when]] and more below.

emmy.pattern.rule contains many rule combinators, which are either primitive rules or functions from zero or more rules to a new rule.

Finally, emmy.pattern.syntax defines a small pattern language. Any matcher combinator that take another matcher can take a pattern instead.

Basic Matcher Combinators

(defn fail
"Matcher which will fail for any input."
[_ _ _])
#object[emmy.pattern.match$fail 0x4c149077 "
emmy.pattern.match$fail@4c149077"
]
(defn pass
"Matcher that succeeds (with no new bindings) for any input, passing along its
input frame."
[frame _ succeed]
(succeed frame))
#object[emmy.pattern.match$pass 0x846d357 "
emmy.pattern.match$pass@846d357"
]
(defn with-frame
"Takes a `new-frame` of bindings and returns a matcher that will ignore its
input and always succeed by replacing the current map of bindings with
`new-frame`."
[new-frame]
(fn [_ _ succeed]
(succeed new-frame)))
#object[emmy.pattern.match$with_frame 0x148ef3e1 "
emmy.pattern.match$with_frame@148ef3e1"
]
(defn update-frame
"Takes a function from `frame` to a new frame (or false) and any number of
arguments `args`, and returns a matcher that will ignore its input and
- succeed with `(apply f frame args)` if that value is truthy,
- fail otherwise."
[f & args]
(fn [frame _ succeed]
(when-let [new-frame (apply f frame args)]
(succeed new-frame))))
#object[emmy.pattern.match$update_frame 0x1f242a00 "
emmy.pattern.match$update_frame@1f242a00"
]
(defn predicate
"Takes a predicate function `pred` and returns a matcher that succeeds (with no
new bindings) if its data input passes the predicate, fails otherwise."
[pred]
(fn predicate-match [frame data succeed]
(core/and (pred data)
(succeed frame))))
#object[emmy.pattern.match$predicate 0x77db96bb "
emmy.pattern.match$predicate@77db96bb"
]
(defn frame-predicate
"Takes a predicate function `pred` and returns a matcher that succeeds (with no
new bindings) if its data input passes the predicate, fails otherwise."
[pred]
(fn frame-pred [frame _ succeed]
(core/and (pred frame)
(succeed frame))))
#object[emmy.pattern.match$frame_predicate 0x49392f7c "
emmy.pattern.match$frame_predicate@49392f7c"
]
(defn eq
"Takes some input `x` and returns a matcher which succeeds if its data input is
equal to `x` (via `=` or the optional `eq-fn` argument). Fails otherwise.
The frame is not modified."
([x] (eq x =))
([x eq-fn]
(predicate
(fn [other]
(eq-fn x other)))))
#object[emmy.pattern.match$eq 0x2acbcc41 "
emmy.pattern.match$eq@2acbcc41"
]
(defn bind
"Takes a binding variable `sym` and an optional predicate `pred`, and returns a
matcher that binds its input to `sym` in the returned `frame`.
The returned matcher only succeeds if `input` passes `pred`.
If `sym` is already present in `frame`, the matcher only succeeds if the
values are equal, fails otherwise.
NOTE: If `sym` is the wildcard `_`, the returned matcher will not introduce a
new binding, but _will_ still check the predicate."
([sym]
(bind sym (fn [_] true)))
([sym pred]
(if (s/wildcard? sym)
(predicate pred)
(fn bind-match [frame data succeed]
(when (pred data)
(if-let [[_ binding] (find frame sym)]
(core/and (= binding data)
(succeed frame))
(succeed (assoc frame sym data))))))))
#object[emmy.pattern.match$bind 0x14818732 "
emmy.pattern.match$bind@14818732"
]

Matcher Combinators

This section introduces functions that are able to build new matcher combinators out of the primitive matcher combinators defined above.

Each of the following functions can take EITHER a matcher combinator or a "pattern". The emmy.pattern.syntax is described in emmy.pattern.syntax.

As an example, you might provide the symbol '?x instead of an explicit (bind '?x):

(comment
(let [m (match-if odd? '?odd '?even)]
(= [{'?odd 11} {'?even 12}]
[(m {} 11 identity)
(m {} 12 identity)])))
nil
(declare pattern->combinators)
#object[emmy.pattern.match$pattern__GT_combinators 0x3f44b557 "
emmy.pattern.match$pattern__GT_combinators@3f44b557"
]
(defn match-when
"Returns a matcher that passes its `frame` on to `success-pattern` if `pred`
succeeds on its data input, fails otherwise."
[pred success-pattern]
(let [match (pattern->combinators success-pattern)]
(fn [frame xs success]
(when (pred xs)
(match frame xs success)))))
#object[emmy.pattern.match$match_when 0x3f863f0c "
emmy.pattern.match$match_when@3f863f0c"
]
(defn match-if
"Returns a matcher that passes its `frame` on to `success-pattern` if `pred`
succeeds on its data input, `fail-pattern` otherwise.
If no `fail-matcher` is supplied, the behavior is equivalent
to [[match-when]]."
([pred success-pattern]
(match-when pred success-pattern))
([pred success-pattern fail-pattern]
(let [s-match (pattern->combinators success-pattern)
f-match (pattern->combinators fail-pattern)]
(fn [frame xs success]
(if (pred xs)
(s-match frame xs success)
(f-match frame xs success))))))
#object[emmy.pattern.match$match_if 0x4bdb4579 "
emmy.pattern.match$match_if@4bdb4579"
]
(defn or
"Takes a sequence of patterns, and returns a matcher that will apply its
arguments to each matcher in turn. Returns the value of the first pattern that
succeeds."
([] fail)
([pattern] (pattern->combinators pattern))
([pattern & more]
(let [matchers (map pattern->combinators (cons pattern more))]
(fn call [frame xs succeed]
(some #(% frame xs succeed)
matchers)))))
#object[emmy.pattern.match$or 0x53fbe327 "
emmy.pattern.match$or@53fbe327"
]
(defn and
"Takes a sequence of patterns and returns a matcher that will apply its
arguments to the first pattern;
If that match succeeds, the next pattern will be called with the new, returned
frame (and the original data and success continuation).
The returned matcher succeeds only of all patterns succeed, and returns the
value of the final pattern."
([] pass)
([pattern] (pattern->combinators pattern))
([pattern & more]
(let [matchers (map pattern->combinators (cons pattern more))]
(fn [frame xs succeed]
(reduce (fn [acc matcher]
(if acc
(matcher acc xs succeed)
(reduced acc)))
frame
matchers)))))
#object[emmy.pattern.match$and 0x490f1a02 "
emmy.pattern.match$and@490f1a02"
]
(defn not
"Takes a `pattern` and returns a matcher that will apply its arguments to the
`pattern`. The returned pattern will succeed with the original frame if
`pattern` fails, and fail if `pattern` succeeds."
[pattern]
(let [match (pattern->combinators pattern)]
(fn [frame xs succeed]
(when-not (match frame xs succeed)
(succeed frame)))))
#object[emmy.pattern.match$not 0x47ff2270 "
emmy.pattern.match$not@47ff2270"
]

Lists and Segments

Segment variables introduce some additional trouble. Unlike other matchers, a segment variable is not tested against a fixed input, but against a sequence such that it may match any prefix. This means that in general, segment variables must search, trying one match and possibly backtracking.

There are two circumstances when the search can be avoided:

  • if the variable is already bound, the bound value needs to be checked against the input data, and will either fail or succeed.

  • If the segment variable is the last matcher in its enclosing list (which actually happens quite often!) then the segment matcher can match the entire remaining segment, no search required.

This requires a different interface for the continutation. Segment matchers pass TWO arguments into their success continuation - the binding frame, and the remaining unmatched segment.

The following two functions let us mark matcher combinators with this interface using their metadata.

(defn as-segment-matcher
"Takes a matcher and returns `f` with its metadata modified such
that [[segment-matcher?]] will return `true` when applied to `f`."
[f]
(vary-meta f assoc ::segment? true))
#object[emmy.pattern.match$as_segment_matcher 0x569e577b "
emmy.pattern.match$as_segment_matcher@569e577b"
]
(defn- segment-matcher?
"Returns true if the supplied matcher `f` is a segment matcher, false
otherwise."
[f]
(::segment? (meta f) false))
#object[emmy.pattern.match$segment_matcher_QMARK_ 0x5a707454 "
emmy.pattern.match$segment_matcher_QMARK_@5a707454"
]
(defn segment
"Takes a binding variable `sym` and returns a matcher that calls its success
continuation with successively longer prefixes of its (sequential) data input
bound to `sym` inside the frame.
If `sym` is already present in the frame, the returned matcher only succeeds
if the bound value is a prefix of the data argument `xs`.
If `sym` matches the wildcard symbol `_`, the behavior is the same, but no new
binding is introduced.
NOTE: the returned matcher will call its success continuation with TWO
arguments; the new frame and the remaining elements in `xs`. This is a
different contract than all other matchers, making `segment` appropriate for
use inside `sequence`."
([sym]
(segment sym (constantly true)))
([sym pred]
(as-segment-matcher
(fn segment-match [frame xs succeed]
(let [xs (core/or xs [])]
(when (sequential? xs)
(if-let [binding (core/and
(core/not (s/wildcard? sym))
(frame sym))]
(when (pred binding)
(let [binding-count (count binding)]
(when (= (take binding-count xs) binding)
(succeed frame (drop binding-count xs)))))
(loop [prefix []
suffix xs]
(core/or
(core/and (pred prefix)
(let [new-frame (if (s/wildcard? sym)
frame
(assoc frame sym prefix))]
(succeed new-frame suffix)))
(core/and (seq suffix)
(recur (conj prefix (first suffix))
(next suffix))))))))))))
#object[emmy.pattern.match$segment 0x6b1eba12 "
emmy.pattern.match$segment@6b1eba12"
]
(defn- entire-segment
"Similar to [[segment]], but matches the entire remaining sequential argument
`xs`. Fails if its input is not sequential, or `sym` is already bound to some
other variable or non-equal sequence.
If `sym` matches the wildcard symbol `_`, succeeds if `xs` is a sequence and
introduces NO new bindings.
Calls its continuation with the new frame and `nil`, always."
([sym]
(entire-segment sym (constantly true)))
([sym pred]
(as-segment-matcher
(fn entire-segment-match [frame xs succeed]
(let [xs (core/or xs [])]
(when (core/and (sequential? xs) (pred xs))
(if (s/wildcard? sym)
(succeed frame nil)
(if-let [binding (frame sym)]
(when (= xs binding)
(succeed frame nil))
(succeed (assoc frame sym xs) nil)))))))))
#object[emmy.pattern.match$entire_segment 0x554cf9a3 "
emmy.pattern.match$entire_segment@554cf9a3"
]
(defn reverse-segment
"Returns a matcher that takes a binding variable `sym`, and succeeds if it's
called with a sequential data argument with a prefix that is the REVERSE of
the sequence bound to `sym` in `frame`.
Fails if any of the following are true:
- `sym` is not bound in the frame
- `sym` is bound to something other than a vector prefix created by `segment`
- the data argument does not have a prefix matching the reverse of vector
bound to `sym`."
([sym]
(reverse-segment sym (constantly true)))
([sym pred]
(as-segment-matcher
(fn reverse-segment-match [frame xs succeed]
(let [xs (core/or xs [])]
(when (sequential? xs)
(when-let [binding (frame sym)]
(when (vector? binding)
(let [binding-count (count binding)
reversed (rseq binding)]
(when (core/and (= (take binding-count xs) reversed)
(pred xs))
(succeed frame (drop binding-count xs))))))))))))
#object[emmy.pattern.match$reverse_segment 0x714d49fe "
emmy.pattern.match$reverse_segment@714d49fe"
]
(defn sequence*
"Version of [[sequence]] that takes an explicit sequence of `patterns`, vs the
multi-arity version. See [[sequence]] for documentation."
[patterns]
(fn sequence-match [frame xs succeed]
(when (sequential? xs)
(letfn [(step [frame items matchers]
(letfn [(try-elem [matcher]
(matcher frame
(first items)
(fn [new-frame]
(step new-frame
(next items)
(next matchers)))))
(try-segment [matcher]
(matcher frame
items
(fn [new-frame new-xs]
(step new-frame
new-xs
(next matchers)))))]
(cond matchers (let [m (first matchers)]
(if (segment-matcher? m)
(try-segment m)
(core/and (seq items)
(try-elem m))))
(seq items) false
:else (succeed frame))))]
(let [matchers (map pattern->combinators patterns)]
(step frame xs matchers))))))
#object[emmy.pattern.match$sequence_STAR_ 0x28c65157 "
emmy.pattern.match$sequence_STAR_@28c65157"
]
(defn sequence
"Takes a sequence of patterns and returns a matcher that accepts a sequential
data input, and attempts to match successive items (or segments) in the
sequence with the supplied patterns.
The returned matcher succeeds if `patterns` can consume all elements, fails
otherwise (or of any of the supplied patterns fails on its argument).
On success, the returned matcher calls its success continuation with a frame
processed by each pattern in sequence."
[& patterns]
(sequence* patterns))
#object[emmy.pattern.match$sequence 0x3696a077 "
emmy.pattern.match$sequence@3696a077"
]

Emmy.Pattern.Matching Compiler

The next function transforms a pattern (as defined by emmy.pattern.syntax) into a matcher combinator. Any function you pass to [[pattern->combinators]] is returned, so it's appropriate to pass other matcher combinators as pattern elements.

(defn pattern->combinators
"Given a pattern (built using the syntax elements described in
`emmy.pattern.syntax`), returns a matcher combinator that will successfully
match data structures described by the input pattern, and fail otherwise."
[pattern]
(cond (fn? pattern) pattern
(s/binding? pattern)
(bind (s/variable-name pattern)
(s/restriction pattern))
(s/segment? pattern)
(segment
(s/variable-name pattern)
(s/restriction pattern))
(s/reverse-segment? pattern)
(reverse-segment
(s/reverse-segment-name pattern)
(s/restriction pattern))
(s/wildcard? pattern) pass
(core/or (seq? pattern)
(vector? pattern))
(if (empty? pattern)
(eq pattern)
(sequence*
(concat (map pattern->combinators (butlast pattern))
(let [p (last pattern)]
[(if (s/segment? p)
(entire-segment
(s/variable-name p)
(s/restriction p))
(pattern->combinators p))]))))
:else (eq pattern)))
#object[emmy.pattern.match$pattern__GT_combinators 0x3f44b557 "
emmy.pattern.match$pattern__GT_combinators@3f44b557"
]

This concludes the matcher combinator section of our program. On to the next act: the "matcher"!

Top Level Matchers

Once you've built up a combinator out of smaller matcher combinators, you can turn your combinator into a "matcher". This is a function from a data object to either:

  • the binding map, if successful
  • if failed, a special failure singleton object.

This interface will become important in emmy.rule, for building up groups of rules that can, say, search for the first successful matcher of many, or accumulate binding maps from matchers run in sequence until one fails.

The next few functions define this explicit failure singleton.

(defrecord Failure [])
emmy.pattern.match.Failure
(def failure
"Singleton object representing the failure of a matcher to match its input.
Check for failure with [[failed?]]"
(Failure.))
{}
(defn failed?
"Returns true if `x` is equivalent to the failure sentinel [[failure]], false
otherwise."
[x]
(instance? Failure x))
#object[emmy.pattern.match$failed_QMARK_ 0x4a5c6e7 "
emmy.pattern.match$failed_QMARK_@4a5c6e7"
]
(defn matcher
"Takes a `pattern` or matcher combinator, and returns a function from a data
object to either:
- A successful map of bindings extracted by matching the supplied `pattern` or
combinator to the input data
- An explicit `failure` object
Check for failure with [[failed?]].
Optionally, you can supply a predicate `pred`. `pred` takes the map of
bindings from a successful match and returns either:
- `nil`, `false` or the explicit `failure` object to force a match failure,
potentially causing a backtrack back into the data
- a map of NEW bindings to merge into the binding map (and signal success)
Any other truthy value signals success with no new bindings."
([pattern]
(let [match (pattern->combinators pattern)]
(fn [data]
(core/or (match {} data identity)
failure))))
([pattern pred]
(let [match (pattern->combinators pattern)
success (fn [frame]
(when-let [m (pred frame)]
(when (core/and m (not (failed? m)))
(if (map? m)
(merge frame m)
frame))))]
(fn [data]
(core/or (match {} data success)
failure)))))
#object[emmy.pattern.match$matcher 0x1f3c6d34 "
emmy.pattern.match$matcher@1f3c6d34"
]
(defn match
"Convenience function that creates a matcher from the supplied `pattern` (and
optional predicate `pred`) and immediately applies it to `data`.
Equivalent to:
```clojure
((matcher pattern pred) data)
```"
([pattern data]
((matcher pattern) data))
([pattern pred data]
((matcher pattern pred) data)))
#object[emmy.pattern.match$match 0x5eacb6d "
emmy.pattern.match$match@5eacb6d"
]
(defn foreach-matcher
"Takes a `pattern` and side-effecting callback function `f`, and returns a
matcher that calls `f` with a map of bindings for every possible match of
`pattern` to its input data.
For a convenience function that applies the matcher to data immediately,
see [[foreach]].
NOTE: If you pass a segment matcher, `f` must accept two arguments - the
binding map, and the sequence of all remaining items that the segment
matcher rejected."
[pattern f]
(let [match (pattern->combinators pattern)
cont (fn ([frame]
(f frame)
false)
([frame xs]
(f frame xs)
false))]
(fn [data]
(match {} data cont))))
#object[emmy.pattern.match$foreach_matcher 0x6fc56bf "
emmy.pattern.match$foreach_matcher@6fc56bf"
]
(defn foreach
"Convenience function that creates a [[foreach-matcher]] from the supplied
`pattern` and callback `f` and immediately applies it to `data`.
Equivalent to:
```clojure
((foreach-matcher pattern pred) data)
```"
[pattern f data]
((foreach-matcher pattern f) data))
#object[emmy.pattern.match$foreach 0x2cc43a79 "
emmy.pattern.match$foreach@2cc43a79"
]
(defn all-results-matcher
"Takes a `pattern` and callback function `f`, and returns a matcher that takes a
`data` argument and returns a sequence of every possible match of `pattern` to
the data.
For a convenience function that applies the matcher to data immediately,
see [[all-results]].
NOTE: If you pass a segment matcher, `f` must accept two arguments - the
binding map, and the sequence of all remaining items that the segment
matcher rejected."
[pattern]
(let [match (pattern->combinators pattern)]
(fn [data]
(let [results (atom [])
cont (fn
([frame]
(swap! results conj frame)
false)
([frame xs]
(swap! results conj [frame xs])
false))]
(match {} data cont)
@results))))
#object[emmy.pattern.match$all_results_matcher 0x506db718 "
emmy.pattern.match$all_results_matcher@506db718"
]
(defn all-results
"Convenience function that creates an [[all-results-matcher]] from the supplied
`pattern` and immediately applies it to `data`.
Equivalent to:
```clojure
((all-results-matcher pattern pred) data)
```"
[pattern data]
((all-results-matcher pattern) data))
#object[emmy.pattern.match$all_results 0x30fafdb "
emmy.pattern.match$all_results@30fafdb"
]