A
control flow graph (CFG) is an abstract
data structure used in
compilers. It is an abstract representation of a
procedure or program, maintained internally by a compiler. Each
node in the
graph represents a
basic block, i.e. a straight-line piece of code without any jumps or jump targets; jump targets start a block, and jumps end a block. Directed
edges are used to represent jumps in the control flow. There are two specially designated blocks: the
entry block, through which control enters into the flow graph, and the
exit block, through which all control flow leaves.
The CFG is essential to several compiler optimizations based on global dataflow analysis (def-use chaining, use-def chaining).
A CFG is a static representation of the program, and represents all alternatives of control flow. So, for example, both arms of an IF
statement are represented in the CFG. A cycle in a CFG may imply that there is a loop in the code (specifically, a cycle caused by a back edge[?] to a dominator). This allows a compiler to detect non-syntactic loops, such as those created with the GOTO
statement. Even if a programmer avoids GOTO
s, passes like tail recursion optimization could introduce non-syntactic loops.
Reachability is another graph property useful in optimization. If a block/subgraph is not connected from the subgraph containing the entry block, that block is unreachable during any execution, and so is dead code; it can be safely removed. If the exit block is unreachable from the entry block, it indicates an infinite loop (not all infinite loops are detectable, of course. See Halting problem). Again, dead code and some infinite loops are possible even if the programmer didn't explicitly code that way: optimizations like constant propagation and constant folding followed by jump threading[?] could collapse multiple basic blocks into one, cause edges to be removed from a CFG, etc., thus possibly disconnecting parts of the graph.
- entry block
- block through which all control flow enters the graph
- exit block
- block through which all control flow leaves the graph
- back edge
- an edge that points to an ancestor in a depth-first (DFS[?]) traversal of the graph
- dominator
- block M dominates block N if every path from the entry that reaches block N has to pass through block M. So, the entry block dominates all blocks.
- postdominator
- block M postdominates block N if every path from N to the exit has to pass through block M. The exit block postdominates all blocks.
- immediate dominator
- block M immediately dominates block N if M dominates N, and there is no intervening block P such that M dominates P and P dominates N. In other words, M is the last dominator on any path from entry to N. Each block has an unique immediate dominator, if it has any at all.
- immediate postdominator
- Analogous to immediate dominator.
- dominator tree
- An anciliary data structure depicting the dominator relationships. There is an arc from Block M to Block N if M is an immediate dominator of N. This graph is a tree, since each block has a unique immediate dominator. This tree is rooted at the entry block. Can be calculated efficiently using Lengauer-Tarjan's algorithm.
- postdominator tree
- Analogous to dominator tree. This tree is rooted at the exit block.
- loop header
- A dominator that is the target of a loop-forming back edge.
- loop pre-header
- Suppose block M is a dominator with several incoming edges, some of them being back edges (so M is a loop header). It is advantageous to several optimization passes to break M up into two blocks Mpre and Mloop. The contents of M and back edges are moved to Mloop, the rest of the edges are moved to point into Mpre, and a new edge from Mpre to Mloop is inserted (so that Mpre is the immediate dominator of Mloop). In the beginning, Mpre would be empty, but passes like loop-invariant code motion[?] could populate it. Mpre is called the loop pre-header, and Mloop would be the loop header.
See also: CFG, compiler optimization
All Wikipedia text
is available under the
terms of the GNU Free Documentation License