Attempting an LR(1) Parse Using an LR(1) Machine
We continued our discussion of LR(1) parsing by looking at some of the issues behind constructing an LR(1) parser.
Just as with LL(1) parsing, an LR(1) parser is constructed by first analyzing the grammar and building data structure that helps us parse. For LL parsing we computed First, Follow, and Predict sets on our way to building the LL(1) table that guides the construction of an LL(1) parser.
For LR(1) parsing we build what is called the LR(1) machine from the grammar. This machine can be used to build two tables that are used in the construction of an LR(1) parser: the action table and the GOTO table.
We examined an LR(1) table first by way of the following example
The Example Grammar (taken from Crafting a Compiler by Fischer and LeBlanc)
S → E $
E → E + T
E → T
T → T * P
T → P
P → id
P → ( E )
Without going into the details of how the LR(1) machine is constructed for this grammar, we first just give it:
We note the following about this LR(1) machine
-
It is essentially a finite state automaton.
-
The states of the LR(1) machine remember which possible rules we are parsing at the moment.
-
The dots in the rules indicate how far along in the parse of a rule we are (i.e., how much of a right hand side of a rule we have possibly found so far): the symbols to the left of the dot are the symbols of the right hand side we have found already; the ones to thre right of the dot are the ones we still have to find in order to have the entire right hand side of the rule.
-
If a dot is all the way to the right end of the right hand side of a rule in some state, this implies that a reduction of the right hand side to the left hand side is possible. At this point, the tokens in the curly braces at the end of the rule come into play: if the lookahead token is one of these tokens in the curly braces, a reduction should be done of the right hand side to the left hand side. Otherwise no reduction (by that rule anyway) can be made.
-
The transitions are labeled with single terminals or single nonterminals
-
Transisitions exist leaving a state for each nonterminal and termminal that has a dot in front of it.
-
Following a transition is an indication that that nonterminal or token has been found during the parse, so we now know the next symbol on the right hand side of the rule in question. Thus, the next state contains this rule with the dot moved past the symbol on the transition we took.
How this LR(1) machine is constructed is not the issue yet, but we can learn how to use it to parse a string. Let's look at our original grammar again.
The Example Grammar
S → E $
E → E + T
E → T
T → T * P
T → P
P → id
P → ( E )
Again, let's parse the string
id + id * id $
where the $ represents the end of file marker.
In applying the LR(1) machine we use a stack. There are two actions to take at each step of an LR(1) parse:
-
SHIFT (place the lookahead symbol onto the stack and move the file pointer to the next token in the token file)
-
REDUCE (reduce a right hand side on the stack to its left hand side).
We use the lookahead symbol to determine which of these two actions to take.
We also keep track of which state we are in by way of the stack.
-
The stack initially starts with state S0 on the stack.
-
Each step of an LR parse has two substeps.
-
Given the state on top of the stack, look at that state in the LR(1) machine.
-
If a dot appears in front of a token in that state, and that token is also the lookahead token, SHIFT (push) the lookahead token onto the stack and move the file pointer to the next lookahead token.
-
If a dot appears at the end of a right hand side in that state and the lookahead token is one of the tokens represented in the cruly braces behind the rule in question, reduce the right hand side of that rule to its left hand side. This is done by popping all of the symbols on the right hand side of the rule in question off of the stack, along with all of the interleaved states on the stack (this will reveal a state on the stack) and pushing its left hand side.
-
After this substep, the top two elements of the stack will be a state (one down on the stack) and either a new token on top of the stack (the one we shifted onto the stack) or a new nonterminal on top of the stack (the one corresponding to the left hand side of the rule by which we just reduced).
-
The second substep pushes a new state on top of the stack that corresponds to following the transition labeled with the symbol on top of the stack from the state one down on the stack.
-
You will need to refer to your notes for the example we went through.
.