Abstract Syntax Trees

The parsing method discussed here does not require that the parse tree be kept around. Note, though, that the parse tree can be reconstructed directly if we keep

a file of the rule numbers used in the parse
the token file

Many parsing strategies require that tree information be kept around for the duration of the compile. Use can often be made of these trees to produce more efficient target code. However, there are a lot of rules in most grammars that are necessary for ensuring that a string has the proper form (syntax) that can be discarded once that determination has been made. This can result in more compact "abstract syntax trees" that can be used as the intermediate output of the compiler. For example, the expression ε

3*a + 5*b

has the following tree according to grammar 3. Note some of the leaves of this tree are labeled "lambda." The software used to generate this tree did not have access to the Unicode character set. Wherever "lambda" occurs the empty string ε (Greek epsilon) should be substituted.

This is one big ugly tree for a relatively small expression. It was necessary to build this tree in order to ensure that the expression met the rules of the grammar, specifically the precedence of the arithmetic operators of the grammar. However, once built, the important information in the tree can be distilled to:

                             +
                            / \
                           /   \
                          /     \
                         *       *
                        / \     / \
                       /   \   /   \ 
                     3     a 5     b

This tree gives in concise form the order of the operations and the operands. In other words, it abstracts the essential syntactical information from the parse tree above, which is why it is called an abstract syntax tree.