We implement the algorithm described in [F90]:
[Analytic Variations on the Common Subexpression Problem] (https://algo.inria.fr/flajolet/Publications/FlSiSt90.pdf) Philippe Flajolet, Paolo Sipala, Jean-Marc Steyaert (1990)
From the abstract:
Any tree can be represented in a maximally compact form as a directed acyclic graph where common subtrees are factored and shared, being represented only once. Such a compaction can be effected in linear time. It is used to save storage in implementations of functional programming languages, as well as in symbolic manipulation and computer algebra systems.
which is exactly what we want.
The heart of the algorithm is a function which computes a unique identifier (UID) for a subtree, such that isomorphic subtrees receive the same UID. If we regard each node of the tree as the root of such a subtree, the UID assignment will have the effect of partitioning the nodes of the tree into equivalence classes. We can then produce a DAG encoding of the original tree by using these class indices.
The paper assumes the input trees are binary, but Emmy expressions often have sums and products in which the +
and *
functions are applied to long strings of arguments. We have observed that such long chains of operands can frustrate this kind of optimization, by hiding common (permuted) subsequences of operand within two similar subexpressions of a commutative operation.
We begin by rewriting our input expression to avoid this. While this will likely generate a tree with more nodes, it exposes each primitive computation to the algorithm which enables a maximal factoring (at the cost, as we will see, of generating code in which operations are limited to two operands. This may produce less compact-looking compiled code, but should be perfectly transparent to the code generators of the host languages.)
This is the heart of the F90 algorithm. We do a post-order traversal of the tree, building an association between equivalence classes of subtree to small integers. We can then represent the maximally factored DAG as a simple sequence of expressions, where integers in argument position represent back-references to the indexed computation emitted previously (that is, in the nth expression, indices in the range [0..n-1] may appear). Thus the computation may be efficiently performed by evaluating the subexpressions in the order reported.
An example will make this process clear:
This produces the sequence
[sin x] [cos x] [* 0 1] [+ 2 2] [+ 3 2]
Which we may interpret as follows:
(sin x)
and (cos x)
are computed and considered values 0 and 1, respectively. To get value 2, we multiply values 0 and 1: (* (sin x) (cos x))
To get value 3, we add value 2 to itself, and to get value 4, we add values 3 and 2. This is the "unrolled form" of the sum of the three identical elements.
What about numbers that may appear in the original expression? We've tweaked the algorithm to stash such numbers into an indexed sub-computation so that there is no ambiguity
We get
(2 [expt a 0] 3 [expt a 2] [+ 1 3])
in which the constants 2 and 3 are stored in values 0 and 2, and referred to by those indices when needed. This is reminiscent of a technique called the "literal pool," used by compilers on architectures which lack immediate-mode instructions, so that values have to be loaded by address. (It does have the side effect of coalescing the use of the same constant more than once in a single variable, but the platform code generators probably do not need our help with that.)