Generating Code for Expressions
Last time we discussed how we could translate an assignment statement into code. This time we dig deeper: how is code for an entire assignment statement generated?
Expressions Translation
Example 1
At first glance, you might think that translating expressions into intermediate code that, when executed, would evaluate those expressions is going to be a very complex task. After all, an expression like the right hand side of this assignment statement
a := x * ((a + b) / 3 - (a - b) / 3) ^ n
would appear to be a bear to generate code for. It looks like we should generate something similar (for a stack based archtecture) to the code below.
push x push a push b adds push 3 divs push a push b subs push 3 divs subs push n exps muls pop a
Expressions on the right hand side of an assignment statement can be much more complex than this. Whatever are we to do?
Actually, the job is going to be pretty easy. The reason why is that when we parse the above expression, we get a subtree under expression that provides a proper parse for operator precedence. That is, again for a stack based architecture:
- each time we match an id or inlit in our parse, we will generate code immediately after that point to push the related value onto the stack.
- At any point where we have parsed the left operand, the operator, and the right operand, we will generate code to perform the operation (e.g., adds) on the top two elements of the stack, because that is where the left and right operands will be. (For a register based architecture, we can generate code to perform the operation on the left and right operands—which may be registers or a temporaries.)
- When it is time to generate code to pop the entire expression value into the variable on the left side of the :=, we will simply generate the proper pop.
So, at every point where we need to generate code we will only be generating very small portions of the code, often just one line, such as a push, an adds, or a pop). The cumulative effect will be that code for the entire expression (see the example above) will be generated correctly in its entirety one small line or two at a time.
An Example Using an Expression Grammar
To see better how this works, let's use the following example grammar segment for assignment statements (which closely mirrors the one for μPascal, with the features for relational expressions removed).
1. <statement> ⟶ id := <expression> 2. <expression> ⟶ <term> <term_tail> 3. <term_tail> ⟶ addop <term> <term_tail> 4. ⟶ ε 5. <term> ⟶ <factor> <factor_tail> 6. <factor_tail> ⟶ mulop <factor> <factor_tail> 7. ⟶ ε 8. <factor> ⟶ id 9. ⟶ int_lit 10. ⟶ ( <expression> )
Parse Tree Walkthrough
It helps when trying to determine how to generate code to keep the parse tree in mind that is implicitly generated by our recursive descent parser. Let's look at parsing the following string with the above grammar
a := a + b * 3
In class we decorated the tree with semantic records as they were passed down the tree and back up to show the role of the semantic records. We also indicated points at which code is generated.
Example of Code Generation Based on Grammar Rules
Rule 1
Consider rule 1. When we reach the end of parsing Rule 1, we have information about the id on the left of the assignment operator (:=) and the expression on the right of the assignment operator to generate the assignment. So, we modify this rule with the #genAssign action symbol.
<statement> ⟶ id := <expression> #genAssign(id_rec, exp_rec)
This implies the need for semantic records for id and for <expression>, but not for <statement> or :=.
Rule 3
Let's skip now to rule 3.
<term_tail> ⟶ addop <term> <term_tail> ^
This one is a little trickier. By the time we have parsed this rule to the point where we are finished with <term> (i.e., we have called procedure <term> and returned , we should know what the left operand, the operator, and the right operand are, so we should be able to generate IR code to carry out the operation on the two operands. However, it isn't clear from this single rule where the left operand was parsed. To see this, look at places where <term_tail> is on the right hand side of a rule (rules 2 and 3).
- By the time we are about to parse <term_tail> in each of these cases, we know what the left operand is. It was already parsed before this instance <term_tail> (on the left of the assignment above) is processed.
- The solution, then, is to construct a semantic record corresponding to the left operand when it is parsed, and pass this record as an actual parameter to <term_tail>, so that the left operand comes in to <term_tail> as a parameter.
- then the operator (addop) and right operand (<term>) are parsed in the current rule as shown above.
- then we need to generate code that causes the arithmetic of the operator to be applied to the left operand and the right operand.
- Finally, for further parsing the expression, we also need to build a record that gives information about the result that would result from this operation (to be used as the expression is further parsed and translated).
To summarize, for any of the rules like the one above, in which we expand <expression_tail>, <term_tail>, or <factor_tail> to an operator followed by the right operand, followed by a recursive call to the same tail nonterminal we need to
- get the left operand as input to the relevant procedure
- determine the operator and right operand (if any) from the right hand side of the rule
- generate code for the operator applied to the left operand and the right operand
- produce a result semantic record that contains such information as the type of the result of the operation just generated for processing further up the line. For example, the result of applying this operator to the left operand and the right operand may actually be the left operand of yet another operator further down and to the right in the tree.
Look at the following illustration of a parse of a + b * 3 to see this better.
The essence of this tree can be captured as
+ / \ a * / \ b c
Looking back at the rule that we were interested in, we can see better what we need to do.
#genArithmetic(term_tail_rec, addop_rec, term_rec, result_rec) <term_tail> ⟶ addop <term><term_tail> ^
Once again,
- term_tail_rec holds information about the left operand
- addop_rec contains information about whether the addop is a "+" or a "-",
- term_rec holds information about the right operand.
- result_rec holds information about the resulting expression (e.g., its type)
Notice here that the result of calling genArithmetic must be two things:
- The code to carry out the arithmetic must be produced and written to the IR file.
- The result_rec must be modified and now contain information about the intermediate value produced by the translated IR (e.g., its type) to be passed on to <term_tail> . This result_rec will now contain information about the new left operand when <term_tail> is parsed.
Finally, when we return from parsing this rule, we must return in term_tail_rec information about the intermediate value (e.g., its type) for proper compiling farther long in the tree.
Of course, what genArithmetic and genAssign do depend on the kind of IR and machine model we have.
Let's look at some possible procedure outlines for some of the rules above (the parts in blue are parts that need to be added to your parser):
Procedure Statement;id_rec, expression_rec : Semantic_Record;case lookahead is when id => -- parse rule 1.<statement> ⟶ id := <expression> -- First call Match, which must match and id and place the matched -- lexeme into semantic record id_rec (declared above). Match(id, id_rec);-- the following call to Analyzer.processId(id_rec) -- determines whether the lexeme in id_rec is declared and completes -- the record by including a pointer to its entry in the symbol table. -- This ensures that all information necessary about the id is -- available to the analyzer. Analyzer.processId(id_rec); Match(assign_op);-- In the following call to Expression, expression_rec is sent in -- empty and comes back with information about the expression (e.g., -- its type), so that this information can be used to determine -- whether the expression result is assignment compatible with the -- identifier on the left side of the assignment statement. Expression(expression_rec); -- the following call to Analyzer.genAssign(id_rec, expression_rec) -- determines whether the type in expression_rec is assignment compatible -- with the type in id_rec, then generates the IR to perform -- the assignment if so Analyzer.genAssign(id_rec, expression_rec);when read => . . .end case; end Statement;
Here is an outline for pocedure Term_Tail.
Procedure Term_Tail (term_tail_rec: in out Semantic_Record); -- term_tail_rec contains information about the left operand upon -- entry to this procedure and information about the result of the -- expression parsed in this call to term_tail upon exit. -- So, whenever Term_tail is called, necessary information about the
-- left operand MUST be sent in by a parameter from the part of the -- parse that uncovered the left operand, because Term_Tail will only -- be able to parse the operator and the right operand.addop_rec, term_rec, result_rec : Semantic_Record;case lookahead is when addop => -- parse <term_tail> ⟶ addop <term> <term_tail> -- First match the addop. The addop_rec should hold the lexeme of the
-- actual operator found. Match(addop, addop_rec);-- At this point we need to use the lexeme scanned to fill the addop_rec -- with the value of the actual operator found (e.g. "+" or "-"). The -- following call does this. Analyzer.processOp(addop_rec) -- Now call procedure term. Pass along an empty semantic record for term, -- term_rec, that will be filled with the necessary attributes determined -- for term in the parse. Term(term_rec);-- The next call to genArithmetic must do the following: -- generate the IR for the arithmetic expression whose left operand info -- is in term_tail_rec, right operand info is in term_rec, and operator info is -- in addop_rec. Put the info about the resulting intermediate value -- (e.g., the type of the intermediate value) into result_rec. Analyzer.genArithmetic(term_tail_rec, addop_rec, term_rec, result_rec);-- the value returned in result_rec from genArithmetic is the info about -- the intermediate value that will be computed by the run time code. Since -- this intermediate value is going to be the new left operand in the next -- recursive call to Term_Tail, we must pass it into Term_Tail Term_Tail(result_rec);-- After calling Term_Tail with result_rec, result_rec will come back -- filled with info (e.g., the type) about the sub_expression -- compiled by the call to Term_Tail. This result must now become -- the value passed back out from Term_Tail in the parameter -- term_tail_rec. This is accomplished next: term_tail_rec := result_rec;when ";" | etc => -- this option is the ε rule. In this case, term_tail_rec -- returns unchanged (there is no operator or right operand, and -- since term_tail_rec contains the result type information of the -- previous expression evaluation, it is still valid. null;when others => Syntax_Error; end case;end Term_Tail;
Now, look briefly at how one of these analyzer methods might be implemented for a stack based architecture.
Procedure genArithmetic(left_op, operator, right_op in : semantic_record; result out semantic_record);-- insert compiler code here to -- check the type of left_op and right_op in the symbol table to determine if -- they are type compatible with the operator in the semantic record operator. If they are not, -- and if type casting is allowed for these types, generate code to perform the -- type casting, otherwise produce an error.-- generate the IR code to perform the arithmetic, such asoutput(operator.value); -- e.g., adds or subs is printed to the IR_file-- set the result semantic record to be the type resulting from the operation -- just generated in IR codeend genArithmetic
The other procedures are handled in similar fashion. For example, in procedure factor,
- after the id is matched, an analyzer call to ProcessId should be done to ensure that the matched id's lexeme is in the symbol table, and to set up an id_record with a pointer to the symbol table entry for that lexeme.
- Once the id_rec has been built, a call to generate code to push the value of the id onto the stack should be made. This anaylzer routine, which we call genPush needs to have the id_rec in order to generate the proper push, and to construct a factor record that contains the type of the value that will be on the stack.
<factor> ⟶ id #ProcessId(id_rec) #genPushId(id_rec, factor_rec)
The analyzer method genPushId will need to look something like:
procedure genPushId(id_rec in semantic record; result_rec: out semantic record);-- check whether the lexeme in id_rec is in the symbol table and that it -- has a proper type to be pushed onto the arithmetic stack (generate an error -- if not)-- look up the display register and the offset in the symbol table for -- the lexeme in the symbol table output("push 4(D0)") -- assuming that D0 is the display register and -- 4 is the offset, as found in the symbol table-- set the result_rec to have the type of the lexeme in id_rec, -- as found in the symbol tableend genPushId
(Notice that you would have a similar, but different semantic action for integer literals.)
Rule 2
Now, let's go back to rule 2.
2. <expression> ⟶ <term> <term_tail>
In parsing this rule we need to remember that in this expression,
- what lies below term will result in code that, when executed, results in an operand on top of the stack (in a stack-based architecture).
- If term_tail derives
ε
, then the operand derived from term will be the only operand, and thus, the whole expression. - If term_tail does not derive
ε
, then the result from term is the left operand, and term_tail derives the operator and the right operand for the expression (see rule 3).
The point is, in order for code for the entire expression to be produced, we must pass the left operand to term_tail in order for term tail to have the left operand available when it derives the operator and the right operand.
Looking just at rule 1, for example, we should modify this rule to look like
<expression> ⟶ <term> #copy(term_rec, term_tail_rec) <term_tail>
In other words, the call to term will have term_tail as the actual parameter. This record will be filled up by the call to term and passed back with the information about the left operand. We then copy this record into term_tail_rec so that term_tail can pass term_tail_rec as the left operand.
You might be wondering why we don't just pass term_rec as the actual parameter to <term_tail> rather than doing the copy. There may be some shortcuts you can take like this when building a compiler by hand. An important point to note, however is this:
If we want to discover a method that is consistent so that we can have a program generate the compiler automatically we need to be systematic. One suggestion is this: if a procedure needs a parameter, always pass the parameter record for the nonterminal being called.
So, when calling term_tail, we want to pass term_tail_rec as the parameter, for example. This will become clearer later. When we return from <term_tail>, the value in term_tail_rec will be passed back, likely changed, to refer to the result of generating code for the arithmetic of the left operand, operator, and right operand. This value now needs to becomes the value of <expression>, held in expression_rec. So, the rule should actually look like:
<expression> ⟶ <term> #copy(term_rec, term_tail_rec) <term_tail> #copy(term_tail_rec, expression_rec)
Using this information, we can now modify the procedure:
procedure expression(expression_rec out semantic record);term_rec, term_tail_rec : semantic_record;begin --expression case Scanner.lookahead when id, int_lit, l_paren ==> -- Parse rule 2. <expression ⟶ <term> <term_tail> -- the following call to term will process the left operand and -- return information about the left operand (e.g., its type) -- in term_rec term(term_rec); -- the following call to Analyzer.copy will ensure that the values -- in term_tail are copied into term_tail_rec, so that the call -- to term_tail will have information about the left operand Analyzer.copy(term_rec, term_tail_rec); -- or term_tail_rec := term_rec -- the following call to term_tail has information about the left -- operand in term_tail_rec. The call to term_tail will generate -- any necessary code to carry out the parsed subexpression and -- return the type of the subexpression result in term_tail_rec -- (this means that term_tail_rec may be changed when the return -- is made). term_tail(term_tail_rec); -- after the return from term_tail, term_tail_rec contains the -- information about the resulting expression (e.g., its type), -- which should now be passed back in expression_rec. This is -- done by copying term_tail_rec into expression_rec before -- returning. Analyzer.copy(term_tail_rec, expression_rec); -- or expression_rec := term_tail_rec when others ==>. . .end case; end expression;