LL(1) Table Conflicts

Most attempts to construct an LL(1) grammar for modern programming languages will fail in minor ways. Nearly all of our current object oriented and procedural programming languages admit only context free grammars that are not LL(1), but generally only in relatively minor ways. That is, after all attempts to create a grammar that is LL(1) by ensuring that the grammar is not left recursive and has no common prefix rules and by attempting to remove all ambiguity in the grammar, there may still be conflicts in the LL(1) table for that grammar. This will become clearer later.

Dealing with Conflicts in LL(1) Tables

Recall that an LL(1) table has a conflict if two or more rules show up in one or more entries of the table. Such conflicts imply that the grammar is not LL(1). Sometimes it is possible to fix the grammar to make it LL(1). Sometimes it is not possible, and workarounds must be designed to handle these cases.

Example of Fixing a grammar to be LL(1)

Suppose you have a grammar that has these two rules.that after constructing the LL(1) table you find that in row <statement> column if you find rules 15 and 16. When looking at rules 15 and 16 in the grammar you see:

15.  <statement> ⟶ if <expression> then <statements> endif 
16.  <statement> ⟶ if <expression> then <statements> else <statements> endif

In this case it is clear why the conflict arose. The keyword if predicts that you can use either rule. That is, if we look in row <statement> column if of the LL(1) table we will find

		if

<statement>		15, 16

The problem is that the right hand sides of both of these rules with <statement> on the left have a common prefix. The common prefix is the substring of nonterminals and tokes

 if <expression> then <statements>

The else or endif that would determine which rule to apply is arbitrarily far away, so this grammar is not LL(k) for any k. We change the grammar by eliminating the common prefix problem by removing the above rules and replacing them with

15.  <statement>     ⟶ if <expression> then <statements> <optional_else> 
16.  <optional_else> ⟶ else <statements> endif 
17.  <optional_else> ⟶ endif

In the new LL(1) table for this changed grammar the old problem has disappeared.

	if	endif	else

<statement>	15
<optional_else>		17	16

When <statement> is to be expanded with lookahead token if, only rule 15 applies. When <optional else> is to be expanded, token else implies that we apply rule 16, whereas token endif implies that we apply rule 17.

Handling Conflicts

The example above illustrates just one case that causes a grammar to be not LL(1). There are other possibilities. In this section we examine different cases and their causes.

If we want to use LL(1) techniques to parse, it is likely that we will not have an LL(1) grammar to start with. There will be conflicts in the table. We might try to clean up the grammar a bit and then regenerate the table, only still to find some conflicts. Some things we can do to clean up a grammar are to:

Remove ambiguity.
Remove left recursion. For example, the grammar may have rules of the form

<expression> ⟶ <expression> + <term>
<expression> ⟶ <term>

The left recursion in this grammar can be changed to right recursion (there is an algorithm for doing this with any grammar), yielding rules that look something like

<expression> ⟶ <term> <expression_tail>
<expression_tail> ⟶ + <term> <expression_tail>
<expression_tail> ⟶ ε
Remove common prefixes. This can be done as demonstrated above.
Use an English description to provide a solution to an inherent problem. The dangling else problem can be solved by stating that every else is associated with the most recent unmatched them.

Rewrite rules. Sometimes one can simply rewrite grammar rules for situations that are not ambiguous, left recursive, or common prefix problems, but which still lead to a conflict


            <statement>                      ⟶ <assignment-statement>
    <statement>                      ⟶ <procedure-statement>
    <assignment-statement>           ⟶ <variable-identifier> := <expression>
    <procedure-statement>            ⟶ <procedure-identifier> <optional-actual-parameter-list>
    <optional-actual-parameter-list> ⟶ ( <actual-parameter> <actual-parameter-tail> )                                     ⟶ ε 
    <variable-identifier>            ⟶ id
    <procedure-identifier            ⟶ id

Use more than one symbol of lookahead. This will work to resolve some conflicts. For example, think of assignment statements and procedure call statements in your project.

    <statement>                      ⟶ <assignment-statement>
    <statement>                      ⟶ <procedure-statement>
    <assignment-statement>           ⟶ <variable-identifier> := <expression>
    <procedure-statement>            ⟶ <procedure-identifier> <optional-actual-parameter-list>
    <optional-actual-parameter-list> ⟶ ( <actual-parameter> <actual-parameter-tail> )                                     ⟶ ε

Note: We can always remove left recursion and common prefixes from a grammar to get a new grammar that produces the same language. However, this does not imply that the new grammar is LL(1). It will be a lot closer to LL(1), though, and is the way to start if the grammar is to be used in LL(1) or recursive descent parsing strategies.

When No Modifications to the Grammar Make the Grammar LL(1)

Look at the above example again. In this case, suppose that your grammar had the following rules.

15.  <statement> ⟶ if <expression> then <statement> 
16.  <statement> ⟶ if <expression> then <statement> else <statement>

The only difference here is that there is no endif keyword. That is, there is no special marker that indicates when an if statement is finished. Pascal and C are languages like this. We can still get rid of the common prefix by using the tack taken above.

15.  <statement> ⟶     if <expression> then <statement> <optional_else> 
16.  <optional_else> ⟶ else <statement> 
17.  <optional_else> ⟶ ε

What does the LL(1) table look like for this?

	if	else

<statement>	15
<optional_else>		16, 17

Notice that the token else predicts the use of rule 16, and that it also predicts the use of rule 17. To see that it predicts the use of rule 17, notice that we have to compute Follow(<optional_else>) to determine what tokens predict the use of rule 17, which in turn means from rule 15 that we need to compute Follow(<statement>>, and that in rule 15,<optional_else> is right behind <statement> and First(<optional_else>) contains else.

There are many other tokens that will predict the use of rule 17. Tokens that can follow <statement> are tokens that begin other statements, such as id (start of an assignment statement), while (start of a loop), if (start of an if statement), end (no more statements), and so forth.

	if	else	end

<statement>	15
<optional_else>	17	16, 17	17

The problem is that this modified grammar, in which we have removed the common prefix of the original rules 15 and 16 is now ambiguous. Consider the following input stream from the scanner:

if id < intlit then if id = intlit then write ( id ) else write ( id )

Consider the following (abbreviated) parse:

                          <statement>
                                |
            ____________________|______________
           |      |        |     |             |
          if <expression> then <statement> <optional else>
                                  |
               ___________________|__________________
              |       |        |      |              |
             if <expression> then <statement> <optional else>

Suppose that everything has been expanded now except the two <optional else> nonterminals. Further, assume that the lookahead is else. Then we could choose to expand either <optional else> to ε and the other to else <statement> and get two different valid parse trees.

                           <statement>
                                |
            ___________________________________
           |      |        |     |             |
          if <expression> then <statement> <optional else>
                                  |             |
                                  |          _________
                                  |         |         |
                                  |        else <statement>
               ______________________________________
              |       |        |      |              |
             if <expression> then <statement> <optional else>
                                                     |
                                                           ε

and

                          <statement>
                                |
            ___________________________________
           |      |        |     |             |
          if <expression> then <statement> <optional else>
                                  |               |
                                  |               ε
               ______________________________________
              |       |        |      |              |
             if <expression> then <statement> <optional else>
                                                  |
                                              _________
                                             |         |
                                            else <statement>

In case 1, the else associates with the first then. In case 2, the else associates with the second then, So, in the LL(1) table for this grammar in row <optional else> column else, either rule 16 or 17 applies. There is no way to fix the grammar to make it LL(1). The language just doesn't allow a way to include else associations with then in an unambiguous fashion (the language is inherently non LL(1)---there is no LL(1) grammar for it).

We can find a non-ambiguous grammar for the language, but we cannot find an LL(1) grammar for it. The following grammar is non-ambiguous:

<statement>              ⟶ <if_then_statement>
<if_then_statement>      ⟶ if <expression> then <if_then_statement>
<if_then_statement>      ⟶ <if_then_else_statement>
<if_then_else_statement> ⟶ if <expression> then <if_then_else_statement> else <statement>
<if_then_else_statement> ⟶ <other_statement>

However, this grammar is not LL(1). It is, however, LR(1). It is possible to parse strings according to this grammar bottom up. The reason is that we find entire right hand sides before reducing them to the left hand side. So, if we were parsing an if statement and we had found if <expression> <then> <other statement> and the lookahead is else, we know that we still don't have the right hand side. Otherwise, we do.

But we don't need to go through all of these contortions for either LL(1) or LR(1) parsing. Go ahead and use the original grammar. If its only ambiguity is in the if statement, this means that the LL(1) table will have a conflict in row <optional_else> column else. Both rules 16 and 17 will show up there. In such a case, the language definition will include, in English, a paragraph that states that any else must be matched with the most recent then. This just means that we remove rule 17 from row <optional_else> column elseleaving

	if	else	end

<statement>	15
<optional_else>	17	16	17