For version 0.7.0.
To top.  Up: User's guide index

Hierarchical graphs


A hierarchical graph is a graph defined using smaller subgraphs. The mechanism allows a kind of abstraction for constructing large graphs where there is an inherent hierarchical structure.
The hierarchical graph structure also provides a natural way to represent hierarchical structure that has been detected in a graph where the inherent hierarchy was not known a priori.
The design described here is still under development, but you are encouraged to read it nevertheless and provide feedback on how to finalize the design. The last section describes remaining design challanges.
Sections:    Leafs and complexes    Addresses    Interfaces and nodes of a complex    The flat graph    Traversing hierarchical graphs    Open design challenges

Leafs and complexes

A subgraph used as a component in a hierarchical graph is referred to as a complex, and a complex is really just a graph which knows that it is part of a hierarchy, what parent it has in the hierarchy, and what key the parent uses to refer to the complex. Where an ordinary node is introduced in a hierarchy, it is called a leaf node, or simply a leaf. The nodes of a hierarchical graph will be defined later, but each node will refer to some leaf node, and never to a complex.
If a complex is to be thought of as a node at the next level in the hierarchy, it is referred to as a meta node. The term also includes nodes that are not inside any complex.
Leaf nodes and complexes share the same name space for keys, so any complex or leaf node directly present at the top of a hierarchical graph (or at the top of any complex inside the hierarchy) can be uniquely identified by a graph key.
To distinguish between a hierarchical graph that is not a complex of a bigger hierarchy, and a graph that is part of a hierarchy, it is possible to test for g.complex? where g is the graph in question. If it turns out to be a complex, the hierarchical graph one level up in the hierarchy is obtained as g.graph.
In this guide, the keys of complexes will begin with a capital letter, while the keys of leafs will begin with a lowercase letter.

Addresses

It is possible to refer to nodes inside a complex, nodes inside a complex inside a complex, and so forth using the keys of complexes and leaf nodes. For example, let g be the top level of a hierarchical graph, and set
c: [g.find_complex 'A] /** c is a complex **/
n: [c.find_node 'b]    /** n is a leaf node belonging to a complex of the graph g **/
m: [g.find_node [list 'A 'b]] /** Shortcut for finding a leaf node in a hierarchical graph **/
Here, we say that [list 'A 'b] is the address to the node m. For brevity, we will write the address similar to a file system path: A/b. An address of a complex takes the same form as the address of a leaf, so it should be clear from the context what is being referred to. A leaf may also be addressed relative to some complex in the hierarchy, in which case we specify the address of the complex followed by the address of the leaf relative to that complex: A/B:C/n.
We shall use the convention that a node address that is written without separator is referring to a leaf, whereas a node address that is written with a separator is referring to a node in the complex addressed by the first part. That is, both nodes A/B:C/n and A:B/C/n correspond to the same leaf node A/B/C/n. Similarly, if A/B/C is the address of a complex, we use A/B:C to refer to the meta node C in the complex A/B.
A node n appearing at the level where the corresponding leaf is introduced (like A/B/C:n, but not A/B:C/n) can be recognized by testing for n.leaf?.
There are natural mappings between the different kinds of addresses and values in a Shapes program. Node addresses without separator, like A/B/C/n, are mapped to §FlatGraphNode values in the flat graph. Complex addresses without separator, like A/B/C, are mapped to §GraphComplex values, that is, graphs that know where they belong in a hierarchical graph. Node addresses with separator and just a node key after the separator, like A/B/C:n, are mapped to §LeafNode values. Node addresses with separator and at least one complex key after the separator, like A/B:C/n, are mapped to §PropagatedNode values. Finally, complex addresses with separator and just one key after the separator, like A/B:C, are mapped to §MetaNode values. The remaining case, complex addresses with separator and more than one key after the separator, like A:B/C do not correspond to any values in a Shapes program.

Interfaces and nodes of a complex

When a graph is used as a complex in a hierarchy, it must define which nodes that graphs higher up in the hierarchy may connect to. This set of nodes is called the interface of the complex, and the interface is really just a mapping from keys to a subset (which may include all) of the nodes of the complex. The keys of the interface have a separate name space from that of the complexes and leaf nodes.
When a hierarchical graph is constructed from some leaf nodes and some complexes one level down in the hierarchy, the nodes of the hierarchical graph is defined as all the leaf nodes and all the nodes exposed by the interfaces of the complexes. The hierarchical graph may only define edges between these nodes. Other nodes and edges will typically exist deeper down in the hierarchy, but these are not considered part of the top level of the hierarchical graph. When a hierarchical graph is used as a complex inside a bigger hierarchy, the nodes and edges of the complex is defined in the same way.
The interface of a hierarchical graph may expose both its own leaf nodes as well as the nodes deeper down in the hierarchy exposed via the interfaces of contained complexes. This allows a leaf node introduced at some level in the graph to be part of any complex higher up in the hierarchy, as long as it is propagated through the interface at every intermediate level. Conversely, every node of a complex corresponds to a leaf node in that part of the hierarchy, some of which may be introduced in the complex at hand, other introduced deeper down in the hierarchy.
With nodes of a complex corresponding to leafs of the hierarchical graph, an edge between nodes of a complex also corresponds to an edge between leafs.
By default, it is not allowed that nodes in the interface are adjacent. This restriction makes it much easier to reason about the edges in the hierarchical graph.
The interface of a non-hierarchical graph is by default defined as all nodes of degree at most 1, or all nodes if the restriction that nodes in the interface may not be adjacent has been removed.
Note that the node addresses have nothing to do with interfaces or the keys used in interfaces. The leaf A/B/C/n may be present in A/B via the key x in the interface of A/B:C. Given the node A/B:C/n, one must not conclude that there is also a node A:B/C/n; this will only be the case if A:B did expose the node in its interface. Going in the other direction is safe, however; given the node A:B/C/n, there must also be a corresponding node A/B:C/n one level down in the hierarchy. For a node n, the member n.exposed? tells whether it is exposed in the interface. In that case n.up gives the corresponding node one level up in the hierarchy. Similarly, n.down gives the node one level down. Note that it is not possible to use n.up unless n.graph.complex?, or to use n.down if n.leaf?.
It should be noted that although the nodes are nicely defined for each complex in the hierarchy, the edges present in the complex are generally not all the edges that connect the complex' nodes. The missing edges are the ones that may exist between interface nodes. When a graph is used as a complex inside a higher level graph, the nodes in the interface may be connected, or be further exposed in a new interface so that they can become adjacent even higher up in the hierarchy. However, if the higher levels in the hierarchy are disregarded, there are no missing edges — this is ensured by the restriction that nodes in an interface may not be adjacent. It should also be noted that the restriction on nodes in an interface not being adjacent does not impose any constraints on which node hierarchies than can be modeled as hierarchical graphs in Shapes. The restriction only means that edges between nodes in an interface must be placed higher up in the hierarchy — it just gets a bit messy if the restriction forces the edge to be more than one level up.

The flat graph

The flat graph of a hierarhical graph is basically an ordinary (not hierarchical) graph where the nodes correspond to the leaf nodes at levels in the hierarchy. Every edge in the hierarchical graph has a corresponding edge in the flat graph, obtained by considering the edge as an edge between leafs rather than an edge between nodes of a complex.
A leaf node may be present as a node at several levels of the hierarchy, and they all correspond to the same node in the flat graph. That is, we are dealing with a many-to-one mapping that does not have a natural inverse. For any node a in the hierarchical graph, a.flat_node gives the corresponding node in the flat graph. The reverse mapping goes to the node at the level where the leaf was introduced in the hierarchy, and for a any node b in the flat graph, the node in the hierarchy is accessed as b.hierarchy_node.
The nodes of the flat graph are keyed by index.
Graph domain restrictions such as not allowing parallel or undirected edges are interpreted via the flat graph.

Traversing hierarchical graphs

A hierarchical graph may be traversed in several ways. Three fundamentally different ways will be described below, each with its own advantages and limitations.

Meta traversal

A hierarchical graph or any complex in a hierarchical graph may be traversed without considering the details of complexes. A meta edge is the result of grouping together all edges that share the same source and target complex or leaf node. That is, the source and targets of a meta edge are meta nodes. It is similar to a multiedge, but the hierarchical graph is not considered to have parallel edges just because there is a meta edge containing more than one edge. For example, a meta edge in the complex A/B can contain the edges (A/B:C/n, A/B:D/m) and (A/B:C/k, A/B:D/E/l). In this case, the meta edge is considered incident to the meta nodes A/B:C and A/B:D. Using meta edges allows complexes to be treated similar to leaf nodes, and gives the big picture of any complex in the hierarchy.
There are no meta edges that lead deeper down or further up the hierarchy; the meta edges only connect the nodes and complexes directly present in the current complex. Going deeper down the hierarchy is a simple matter of going from A/B:D to the complex A/B/D, which can also be navigated using meta edges. Note, though, that there is no obvious place to start from when navigating A/B/D; the meta edge leading to A/B:D contained several edges that generally won't connect to the same node in A/B/D. In other words, if you need to know where to start the navigation after going deeper down in the hierarchy, using meta edges is not the right option for you.
Going further up in the hierarchy is a simple matter of going from A/B to A:B. It can never fail — even if A/B did not expose any nodes at all, the complex A:B is still present as such in A, it's just that there won't be any meta edges incident to it.

Flat graph traversal

Another important way of navigating a hierarchical graph is by using the flat graph. Consider standing at A/B:C/n, meaning that you are currently looking at the complex A/B and stand at the node C/n. In the flat graph, this is the node A/B/C/n, so once we switch to the flat graph we no longer keep track of levels in the hierarchy in the same way as with the other approaches to navigation.
Once in the flat graph, following normal graph edges is the only thing we can do. For example, following an edge may lead to A/X/k. For some applications, just staying in the flat graph may be sufficient, while other applications may require that the traversal in the flat graph is mapped back to the hierarchical graph. It is then up to us if we want to think of this as A/X:k or A:X/k. Since A:X/k could be a vacuous reference in the sense that A/X/k is not among the nodes of A, the only safe bet is to use A/X:k. Recall that this node is directly accessible from a node b in the flat graph, as b.hierarchy_node.

Hierarchy traversal

Navigating a hierarchical graph on the node level without the flat graph comes down to navigating individual complexes, and moving up and down the hierarchy to reach other complexes. If the current position in the graph is A/B:C/n, these are the moves to consider:
  1. Follow an edge in A/B, leading to another node in A/B, for example m, D/o, or D/E/p.
  2. Enter the complex C by going to A/B/C:n.
  3. Check whether the node is also present higher up in the hierarchy, and in that case go to A:B/C/n

Open design challenges

The design for hierarchical graphs described here is not yet final. There is still one open design challenge that needs to be worked out before going ahead and implementing. Suggestions for how to tackle this challenge would be most welcome.

Becoming part of a hierarchy requires duplication

Consider using a graph (hierarchical or not) as a complex in a hierarchy. Per the current design, the graph by itself and the corresponding graph in the hierarchy are — in a way — completely different:
This means that it will never be possible to reuse an existing hierarchy when it is included in a bigger hierarchy; every single node will have to be duplicated to reflect its new belonging, and therefore all the edges as well. In a hierarchy with many layers, this means that nodes and edges in the low layers will be duplicated several times before ending up in the final hierarchy. If the code is reasonably written, a garbage collector will be able to get rid of the intermediate copies, but there would still be a whole lot of costly copying going on. One would need something like linear types (which is not even planned for Shapes) to even have the chance to modify the nodes in place by means of tedious and error-prone optimizations in the implementation. (Only relying on reference counting will not work, as the graph being included in the hierarchy will typically be bound to a variable to make the code readable, so the environment will hold one reference that prevents us from doing modifications in place. To get around the problem with the reference, we would need some language feature that would enforce that the variable holding the intermediate graph cannot be referred to after being used in the construction of the hierarchical graph.)
Besides being a performance issue, it may also be surprising from a semantical point of view that the hierarchy doesn't really consist of the graphs that it was built from.
Are there any variations on the design that will allow a graph with all its nodes and edges to be used as is when included in a bigger hierarchy? What would be the drawbacks of such a design?
Get Shapes at SourceForge.net. Fast, secure and Free Open Source software downloads