Follow Sets, Predict Sets, and the LL(1) Table


Predict Which Rule to Apply

Recall what our eventual goal is:  the construction of the LL(1) table for a programming language grammar.  To construct that table we need to know for each production rule in the grammar the set of tokens that predict that this rule should be applied at this point in a leftmost parse.  For example, consider three rules that each have the nonterminal A on the left hand side.  When we are about to expand an A in a leftmost parse, which of these three rules should we use?


  5. A ⟶ α1
  6. A ⟶ α2
  7. A ⟶ α3

We decide which rule to apply by computing the First set for each of these right hand side.  For example, the tokens in First(α1) will predict that we should apply rule 5 if we are about to expand the nonterminal A in a leftmost, top-down derivation and the lookahead is one of the tokens in First(α1). Similarly, First(α2) will predict that we should use rule 6 and First(α3) will predict that we should use rule 7.  Thus we can say that


  Predict(5. A ⟶ α1) = First(α1)
  Predict(6. A ⟶ α2) = First(α2)
  Predict(7. A ⟶ α3) = First(α3)
  

That is, the tokens that predict that we should use the rule


  5. A ⟶ α1
  

Are precisely the tokens in

  First(α1)
  

and so on.  In the LL(1) table for this grammar, in row A and in every column that is labeled with one of the tokens in First(α1) we put a 5.  Similarly we put a 6 in every column of this row that is labeled with a token that is a member of First(α2).  We do the same for rule 7.

There is one glitch.  Taking the first rule as an example, if ε happens to be in First(α1) then, as we have seen, there will be tokens in Follow(A) that could predict that A ⟶ α1 should be the rule applied.  That is, the real set of predicting tokens for this rule is

if ε is in First(α1)
   then Predict(A ⟶ α1) = First(α1) union Follow(A)
   else Predict(A ⟶ α1) = First(α1)

The analogous thing holds true for each rule in the grammar.  So, this means that in addition to computing the First sets for every symbol in the grammar, we also need to compute the Follow sets for each nonterminal.  Once we have computed the First and Follow sets for the grammar we can easily determine which tokens predict which rules to apply.

 

Follow Sets

We look now at the follow set algorithm.

First, we note that to find what can follow a particular nonterminal, say A, we must find where A appears on the right hand side of rules in the grammar.  Consider grammar 3 that we have been using for much of our discussions.


   0. <sysgoal>         ⟶  <expression> eof
   1. <expression>      ⟶  <term> <expression_tail>
   2. <expression_tail> ⟶  + <term> <expression_tail>
   3. <expression_tail> ⟶  - <term> <expression_ail>
   4. <expression_tail> ⟶  ε
   5. <term>            ⟶  <factor> <term_tail>
   6. <term_tail>       ⟶  * <factor> <term_tail>
   7. <term_tail>       ⟶  / <factor> <term_tail>
   8. <term_tail>       ⟶  ε
   9. <factor>          ⟶  <primary><factor_tail>
  10. <factor_tail>     ⟶  ^ <primary> <factor_tail>
  11. <factor_tail>     ⟶  ε
  12. <primary>         ⟶  identifier
  13. <primary>         ⟶  integer_literal
  14. <primary>         ⟶  ( <expression> )  

If we want to find Follow of <expression> we have to find each occurrence of <expression> on the right hand side of the rules in the grammar.  We mark them here in red.


   0. <sysgoal>         ⟶  <expression> eof
   1. <expression>      ⟶  <term> <expression_tail>
   2. <expression_tail> ⟶  + <term> <expression_tail>
   3. <expression_tail> ⟶  - <term> <expression_tail>
   4. <expression_tail> ⟶  ε
   5. <term>            ⟶  <factor> <term_tail>
   6. <term_tail>       ⟶  * <factor> <term_tail>
   7. <term_tail>       ⟶  / <factor> <term_tail>
   8. <term_tail>       ⟶  ε
   9. <factor>          ⟶  <primary><factor_tail>
  10. <factor_tail>     ⟶  ^ <primary> <factor_tail>
  11. <factor_tail>     ⟶  ε
  12. <primary>         ⟶  identifier
  13. <primary>         ⟶  integer_literal
  14. <primary>         ⟶  ( <expression> )  

To find out what can follow <expression> then, we just take the First set of what follows <expression> in these rules.  Notice that this means First("eof") from rule 0 and First(")") from rule 14.  Thus, an eof or a right parenthesis can follow <expression>.

Computing Follow sets for a nonterminal is usually not quite that easy.  Consider computing the follow set for the nonterminal <term>.  First look for all instances of <term> on the right hand side of the rules in this grammar.

0.  <start>           ⟶ <expression> eof 
1. <expression> ⟶ <term> <expression_tail>
2. <expression_tail> ⟶ + <term> <expression_tail>
3. <expression_tail> ⟶ - <term> <expression_tail>
4. <expression_tail> ⟶ ε
5. <term> ⟶ <factor> <term_tail>
6. <term_tail> ⟶ * <factor> <term_tail>>
7. <term_tail> ⟶ / <factor> <term_tail>
8. <term_tail> ⟶ ε
9. <factor> ⟶ <primary> <factor_tail>
10. <factor_tail> ⟶ ^ <primary> <factor_tail>
11. <factor_tail> ⟶ ε
12. <primary> ⟶ identifier
13. <primary> ⟶ integer_literal 14. <primary> ⟶ ( <expression> )

To compute Follow(<term>) we see that we must compute First(<expression_tail>) in every instance where <term> appears on the right hand side of a rule.  Remember, though, that we have already figured out how to compute First sets. First(<expression_tail>) = {"+", "-", and ε}, where ε stands for the empty string.  So, what can follow the <term> nonterminal in a valid parse are the tokens "+" and "-" for sure.  What about the ε?  Only tokens can be in a follow set.  Since <expression_tail> can derive ε, this means that Follow(<term>) also contains symbols that can follow the left hand side of the rule in which <term> appears; in this case, this means that Follow(<term>) contains "+", "-", and Follow (<expression_tail>).  Thus, this looks like some kind of iterative process.

An algorithm suggests itself as follows.

for each nonterminal A in grammar G
     Follow(A) := {} -- the empty set
end for loop
Follow(S) := {eof}, where S is the start symbol of G
while there are changes to any Follow set loop
    for each rule A ⟶ X1 X2 ... Xn in grammar G loop
      for each nonterminal Xi in the current rule loop
         Follow(Xi) := Follow(Xi) union First(Xi+1 Xi+2 ... Xn) - {ε}
         -- if i = n, First(Xi+1 Xi+2 ... Xn) = {ε}(the set containing the empty string)
         if ε is in First(Xi+1 Xi+2 ... Xn) then
            Follow(Xi) := Follow(Xi) union Follow(A)
         end if
      end for
    end for
end while

LL(1) Table

Once the predict sets for each rule in the grammar have been calculated, building the LL(1) table is trivial.  For each noterminal A in the grammar, collect all of the rules with A on the left and their predict sets, such as


  Predict(5. A ⟶ α1)
  Predict(6. A ⟶ α2)
  Predict(7. A ⟶ α3) 
  

Then in row A fill in the columns as follows:

  1. for each token a in Predict(A ⟶ α1) put 5 in row A, column a in the LL(1) table.
  2. for each token a in Predict(A ⟶ α2) put 6 in row A, column a in the LL(1) table.
  3. for each token a in Predict(A ⟶ α3) put 7 in row A, column a in the LL(1) table.

To check that the grammar is LL(1), look at each cell in the table. 

  • either the rules involved may be able to be rewritten to resolve the conflict, making the grammar LL(1), or
  • these conflicts can usually be resolved by using some other non-LL(1) technique when the nonterminal with the conflict is about to be expanded in a parse. 

Completing the Parser Stubs

Once the LL(1) table has been finished you can easily complete your parser stubs by putting on each clause of the case statement for the rules in each method the tokens that predict that you expand by that rule.