The LL(1) Table

The LL(1) table is at the heart of the recursive descent parsing algorithm. Thus we need to construct this table in order to complete implementation of the parser.

Constructing the LL(1) Table

The LL(1) table is constructed from the application of two major algorithms: First and Follow. Our discussion is focused on our two familiar grammars.

Grammar 2

1.  <start>      ⟶ <expression> eof
2.  <expression> ⟶ <expression> + <term> 
3.  <expression> ⟶ <expression> - <term> 
4.  <expression> ⟶ <term> 
5.  <term>       ⟶ <term> * <factor> 
6.  <term>       ⟶ <term> / <factor> 
7.  <term>       ⟶ <factor> 
8.  <factor>     ⟶ <factor> ^ <primary> 
9.  <factor>     ⟶ <primary> 
10. <primary>    ⟶ identifier 
11. <primary>    ⟶ integer_literal 
12. <primary>    ⟶ ( <expression> )

Grammar 3


   1. <expression>      ⟶  <term> <expression_tail>
   2. <expression_tail> ⟶  + <term> <expression_tail>
   3. <expression_tail> ⟶  - <term> <expression_ail>
   4. <expression_tail> ⟶  ε
   5. <term>            ⟶  <factor> <term_tail>
   6. <term_tail>       ⟶  * <factor> <term_tail>
   7. <term_tail>       ⟶  / <factor> <term_tail>
   8. <term_tail>       ⟶  ε
   9. <factor>          ⟶  <primary><factor_tail>
  10. <factor_tail>     ⟶  ^ <primary> <factor_tail>
  11. <factor_tail>     ⟶  ε
  12. <primary>         ⟶  identifier
  13. <primary>         ⟶  integer_literal
  14. <primary>         ⟶  ( <expression> )

We have examined the construction of the LL(1) table for a grammar intuitively, trying to figure out how we could determine which tokens predict which rules. Of course, in order for the process to be made precise, we need to find algorithms that help us with this process. We focus now on the algorithm First, which is to determine for each nonterminal A and each token a in the grammar, which tokens can appear as the first token in any string derived from A or a. Of course, if we start with a token a, there are no rules to apply to a so First(a) will be {a} for all tokens a.

Grammars with no rules that produce the empty string

We consider first a grammar G1 that has no rules that produce the empty string. That is, assume that there are no rules of the form

A ⟶ ε

in G1. Then every rule in the grammar of the form

A ⟶ α, where α = X1 X2 ... Xn, and each Xi is a single nonterminal or token

to determine what tokens can be arrived at first from A we need to find First(X1).

An algorithm for doing this is

loop for each nonterminal A
   set First(A) = {} -- set First(A) to the empty set
 end loop            -- note that the empty set does not contain anything,
                     -- not even ε
loop for each terminal a
   set First(a) = {a}
end loop

loop while this is the first time through the loop, or any additions have
           been made to the set First(A) for any A during the previous pass
           through the loop

  loop through each rule in the grammar
     -- the current rule being processed has the general form A ⟶ X1 X2...Xn
     -- where A is the nonterminal on the left, and each Xi represents a single
     -- symbol, either a token or nonterminal.

      First(A) = First(A) ∪ First(X1)

  end loop
end loop

Grammars that Contain Rules that Produce the Empty String

Things get a little more complex when empty string rules,

A ⟶ ε

exist in the grammar. Why? Consider a general rule

A ⟶ α, where α = X1 X2 ... Xn

To compute First(A) we must find out what can come first from α (the right hand side of this rule). That would normally be First(X1) if there were no empty string rules in the grammar. In this case it is still First(X1), but if X1 can lead to the empty string, then we must union First(X1) with First(X2), and so on. We quit this unioning process when we reach an (Xi) that cannot derive the empty string. If the entire right hand side can derive the empty string (or is the empty string) then First(A) must also contain the empty string; otherwise not.

The way to do this is to find First(A) for all nonterminals A (we know that First(a) for all terminals a is trivially just {a}.

loop for each nonterminal A
   set First(A) = {} -- set First(A) to the empty set

end loop

loop for each terminal a
   set First(a) = {a}
end loop

loop for each rule in the grammar
  if the rule is A ⟶ε
    then
      set First(A) = {ε}
  end if
end loop

loop while this is the first time through the loop, or any additions have
           been made to set First(A) for any A during the previous pass
           through the loop

  loop through each non-ε rule in the grammar
     -- the current rule being processed has the general form A ⟶ X1 X2...Xn
     -- where A is the nonterminal on the left, and each Xi represents a single
     -- symbol on the right, either a terminal or nonterminal.

    k := 1
      loop while k <= n
        First(A) = First(A) ∪ (First(Xk) - {ε})       
      if ε is not in First(Xk)
        then
          exit this loop
      end if
   k := k + 1
 end loop

     -- At this point we have finished examining the right hand side of the
     --   current rule.  If we have discovered that every symbol on the right hand
     --   side can derive ε, then that means that A can derive ε as well, so we 
     --   must add ε to First(A).
     -- If not every symbol on the right hand side of
     --   this rule can derive ε, then neither can A through application of this
     --   rule, so we should not include ε in First(A) (although it might already
     --   be there for some other reason, and will not be removed)
  if k > n and ε is in First(Xn)
    then
      set First(A) := First(A) ∪ {ε}
  end if

  end loop through

end loop while

-- at this point, first(A) has been computed for every nonterminal A, and
-- First(a) has been set to {a} for each terminal a.

Try this algorithm with just the rules:

A ⟶ Bc
B ⟶ b
B ⟶ e

How many times do you think the outermost while loop will run in the worst case?

If it turns out that First(A) contains ε, this means that A can derive ε, so then we also need to find what tokens could follow A, because the lookahead token returned from the scanner when we are about to expand A will have to come from some symbol that follows A.