A Lazy Evaluation Graph is a way to perform the lazy evaluation of
values that are highly interdependant in an efficient manor with minimal
effort. The only requirement is that are no cycles in the dependancies.
One does this by building a data structure, and then performing queries
against the data structure. Only the minimal amount of computation
required to answer the query is performed, and all values discovered
while answering the query are saved, and can be reused. Only the minimal
number of recomputations will need be performed if any values are
updated, deleted, or otherwise changed.
This type of structure is very similar to any generic mapping
data structure. It contains keys and values. The difference is that
values are functions to calculate output values. These functions are
allowed to query other values in the graph as part of their computation
process (thus creating the calculation dependancies that form our graph).
The calculation method has a signature that is predefined by the implementation. This method is given a mechanism to query values from our graph by key.
A typical implementation will wrap an existing generic mapping
structure. The keys to this structure wil be identical to the keys used
externally through our interface, but the right hand side values will be
a structure for internal use only. It will hold the calculation method
for this key, the cached result for this method (initially empty), and
a list of values which depend on this one (initially empty). Other
assorted values and flags may be attached as needed for the
implementation.
The costs of different operations depend in part on the costs of operations on our internal map (typically hash table, or red-black tree). These costs can be sumerized as one insert per value added to the graph, one delete per value deleted, a lookup per value invalidated, and a lookup per value queried (directly, or indirectly during a calculation).
Setting values has to deal with two cases. If the value does not
already exist, we create a right-hand side structure, and add it to our
internal map with the given key. If the value does already exist, we
invalidate it (defined below), and update it's right hand structure. In
either case, the value cache and dependancy list for this value will be
empty when we are done.
The cost of an update is driven by the cost of invalidation. If the worst case,
this is N invalidates for a graph containing N nodes. However, if one performs
additional updates (of the same or of different keys) without performing any
queries, then the cumulative cost of all invalidates is N.
The above system does not avoid all possible recalculation. All values which may be affected by a change are invalidated and recalculated. There are several cases that can cause things to be recalculated, but to end up with the same results. The most obvious is that a calculation method may return the same result for multiple inputs. We can't predict this without detailed analysis of each calculation method, which is unsuitable for a generic structure. However, values which are dependant on these 'same result' functions, but not directly on the original value do not need to be recalculated. They will be. If a value is changed, then changed back to it's original value, all dependants will be recalculated when an optimal implementation will simply restore the previously calculated value.
The optimal solution in terms of preventing value calculation is to cache each and every a value is calculated. However, this cache can easily use substantial memory in comparison to the rest of the graph, and can causes unbounded memory growth over the life of the graph.
Maintaining and using this cache is made more complicated by the fact that is must take into account the calculation methods used, when they change, the list of tags that a given calulation depends on (which may not be consistant, even for a given calculation), and the values of each of those tags we depend on. Achieving best performance from this cache is dependant of the memory restrictions, and exact usage patterns of the graph. It is also possible that working with the cache may become more expensive than the calculations that we consider to be our driving expense.
A simpler form of cache is to only cache the last value calculated, and it's inputs, and to store that on our RHS, when a value is re-calculated, it checks it's inputs versus their values for the previous calculation. If they haven't changed, then the value is not recalculated. This does not allow unbounded memory growth (since we only store one cache entry per graph entry) and it is generally easier to manage. However, it still allows the cache usage to be larger than the size of the rest of the graph, since for each cache entry, we must store both the value calculated, and the values that it was based on. Under this scheme, tags can be deleted and replaced (even with different calculation functions) and never cause more than a single recalculation as long as the value calculated is the same.
Another scheme requires no caching of source values (actually you must have a transient cache of a value during it's recalculation), only of source tags. Under this scheme, there are two types of invalidation, soft and hard. All values are invalidated with hard invalidations, but their dependants only receive soft invalidates. Whenever a node is invalidated, it's previous value is not deleted. If a node has been marked with a hard invalidate, it is recalculated. If it's result value is found to change, then the nodes that depend on it receive hard invalidates. When a node with a soft invalidate is recalculated, it first requests all of the values it depends on. If it does not receive a hard invalidate before it is complete, then it uses it's cached value and does not actually recalculate itself. The dependancy list for a node is only cleared after it is recalculated and found to have changed (after it passes along the hard invalidates). This scheme has the possible advantage of reducing the number of value comparisons versus other caching schemes. Since some of our value types may be very expensive to compare (such as video frames or video segments), this may be of substantial value.
A possibly useful extention of the LEG is to create one or more adapters that generate nodes that conform to a known name scheme when they are first referenced. A LEG could contain any number of these adapters as long as there is a scheme in place to avoid tag conflicts.
One specific use for this is to define nodes relative to external data structures. For example, one could define methods to calculate new attributes for certain types of XML nodes, then adapt relative to an XML DOM data structure. For a situation like this, there is no reason to create actual entries in the LEG until the nodes defined relative to the external structure are first referenced. For bonus points, the adapter should be able to track changes to the external data structure and perform the appropriate invalidates/deletes in the graph.