Lazy Evaluation Graph

A Lazy Evaluation Graph is a way to perform the lazy evaluation of values that are highly interdependant in an efficient manor with minimal effort. The only requirement is that are no cycles in the dependancies. One does this by building a data structure, and then performing queries against the data structure. Only the minimal amount of computation required to answer the query is performed, and all values discovered while answering the query are saved, and can be reused. Only the minimal number of recomputations will need be performed if any values are updated, deleted, or otherwise changed.

This type of  structure is very similar to any generic mapping data structure. It contains keys and values. The difference is that values are functions to calculate output values. These functions are allowed to query other values in the graph as part of their computation process (thus creating the calculation dependancies that form our graph).

Interface Operations

The calculation method has a signature that is predefined by the implementation. This method is given a mechanism to query values from our graph by key.

Internal Operations

A typical implementation will wrap an existing generic mapping structure. The keys to this structure wil be identical to the keys used externally through our interface, but the right hand side values will be a structure for internal use only. It will hold the calculation method for this key, the cached result for this method (initially empty), and a list of values which depend on this one (initially empty). Other assorted values and flags may be attached as needed for the implementation.

The costs of different operations depend in part on the costs of operations on our internal map (typically hash table, or red-black tree). These costs can be sumerized as one insert per value added to the graph, one delete per value deleted, a lookup per value invalidated, and a lookup per value queried (directly, or indirectly during a calculation).

Set

Setting values has to deal with two cases. If the value does not already exist, we create a right-hand side structure, and add it to our internal map with the given key. If the value does already exist, we invalidate it (defined below), and update it's right hand structure. In either case, the value cache and dependancy list for this value will be empty when we are done.

The cost of an update is driven by the cost of invalidation. If the worst case, this is N invalidates for a graph containing N nodes. However, if one performs additional updates (of the same or of different keys) without performing any queries, then the cumulative cost of all invalidates is N.

Query

Quering values also has to handle two cases. If there is a value in our value cache, we simply return it. Otherwise, we need to calculate it. To do so, we call the user supplied method. If it asks for the values for other keys, we add ourselves to the dependancy list of those other keys and then query them. When the user method returns, we store the result in our value cache.

The worst case for a query is that N calculations are performed. However, if multiple queries (of the same or different keys) are performed without any updates to the graph, then a cumulative total of N calculations will be performed. It should also be noted that if M additional keys are added to a structure which has been fully calculated, then at most M additional calculations are needed to again fully populate the structure.

Remove

To remove a value, we invalidate it, and them remove it from our internal map.

The cost is the cost of the invalidates (identical to the cost of a value update) plus the cost of a removal from our internal map.

Invalidate

To Invalidate a value (referred to several times above) we clear it's cache, invalidate every tag referenced in it's dependancy list, and then clear it's dependancy list.

Extentions

Cached Results

The above system does not avoid all possible recalculation. All values which may be affected by a change are invalidated and recalculated. There are several cases that can cause things to be recalculated, but to end up with the same results. The most obvious is that a calculation method may return the same result for multiple inputs. We can't predict this without detailed analysis of each calculation method, which is unsuitable for a generic structure. However, values which are dependant on these 'same result' functions, but not directly on the original value do not need to be recalculated. They will be. If a value is changed, then changed back to it's original value, all dependants will be recalculated when an optimal implementation will simply restore the previously calculated value.

The optimal solution in terms of preventing value calculation is to cache each and every a value is calculated. However, this cache can easily use substantial memory in comparison to the rest of the graph, and can causes unbounded memory growth over the life of the graph.

Maintaining and using this cache is made more complicated by the fact that is must take into account the calculation methods used, when they change, the list of tags that a given calulation depends on (which may not be consistant, even for a given calculation), and the values of each of those tags we depend on. Achieving best performance from this cache is dependant of the memory restrictions, and exact usage patterns of the graph. It is also possible that working with the cache may become more expensive than the calculations that we consider to be our driving expense.

A simpler form of cache is to only cache the last value calculated, and it's inputs, and to store that on our RHS, when a value is re-calculated, it checks it's inputs versus their values for the previous calculation. If they haven't changed, then the value is not recalculated. This does not allow unbounded memory growth (since we only store one cache entry per graph entry) and it is generally easier to manage. However, it still allows the cache usage to be larger than the size of the rest of the graph, since for each cache entry, we must store both the value calculated, and the values that it was based on. Under this scheme, tags can be deleted and replaced (even with different calculation functions) and never cause more than a single recalculation as long as the value calculated is the same.

Another scheme requires no caching of source values (actually you must have a transient cache of a value during it's recalculation), only of source tags. Under this scheme, there are two types of invalidation, soft and hard. All values are invalidated with hard invalidations, but their dependants only receive soft invalidates. Whenever a node is invalidated, it's previous value is not deleted. If a node has been marked with a hard invalidate, it is recalculated. If it's result value is found to change, then the nodes that depend on it receive hard invalidates. When a node with a soft invalidate is recalculated, it first requests all of the values it depends on. If it does not receive a hard invalidate before it is complete, then it uses it's cached value and does not actually recalculate itself. The dependancy list for a node is only cleared after it is recalculated and found to have changed (after it passes along the hard invalidates). This scheme has the possible advantage of reducing the number of value comparisons versus other caching schemes. Since some of our value types may be very expensive to compare (such as video frames or video segments), this may be of substantial value.

Virtual Nodes

A possibly useful extention of the LEG is to create one or more adapters that generate nodes that conform to a known name scheme when they are first referenced. A LEG could contain any number of these adapters as long as there is a scheme in place to avoid tag conflicts.

One specific use for this is to define nodes relative to external data structures. For example, one could define methods to calculate new attributes for certain types of XML nodes, then adapt relative to an XML DOM data structure. For a situation like this, there is no reason to create actual entries in the LEG until the nodes defined relative to the external structure are first referenced. For bonus points, the adapter should be able to track changes to the external data structure and perform the appropriate invalidates/deletes in the graph.