Generating code for If Statements:
Boolean Expressions and Control Structure Semantics
Relational and Boolean Expressions
In translating an if statement, we need to be able to generate code that will evaluate Boolean expressions and we need to be able to generate code that will effectively perform branches to the correct parts of the if statement based on the value computed at run time for the Boolean expression.
Expression Types
There are three types of expressions that can occur in most modern programming languages:
Arithmetic Expressions
These are expressions incorporating the standard arithmetic operations, such as
a * (b + c) / Fred
Operands for arithmetic operations must be values of compatible numeric types for the given operation. Such compatibilities are specified in the semantic description of the programming language.
Relational Expressions
These are expressions that use the relational operators, such as < , <=, and so forth. The operands of relational operators must be compatible as defined by the language, but may be numeric values, characters, strings, or even enumerated type values in most programming languages. For non-numeric operands, their ASCII values are used for comparison in the case of characters and strings, and underlying numeric values for enumerated types. The result of a relational expression is Boolean. An example is
a < Fred * Mary
Logical Expressions
These are expression whose operators are AND, OR, and NOT. The operands of a logical expression must be Boolean, and the result type is Boolean. An example is
0 <= score AND score <= 100
Notice that in choosing precedence for operators, it might be good to put the arithmetic operators at the highest level, the relational operators below that, and then the logical operators lower than the relational operators in the order NOT, AND, and then OR. There seems to be a wide variation in operator precedence in programming languages with respect to the relational and logical operators, though, so one needs to check the language carefully when writing a compiler for a new language. In general, programmers should be sure to use parentheses to indicate the order in which they want the operators to be performed.
Boolean Expression Evaluation
There are some special aspects to the translation of expressions that result in Boolean values (True or False) that arise because most computer architectures have no specific values designated as True and False. Thus, the compiler writer must represent these values in some way, say with the integers 0 (false) and 1 (true), and then treat these values consistently throughout the compiled program.
Real architectures do usually have such logical operations as AND, OR, NOT, and XOR (exclusive or). The ways these are implemented normally is that the operands must have the same size (number of bits) for binary operations, and the logical operations are done bitwise (one bit at a time, with 1 representing True and 0 false). So, for example,
0101 AND 1110 = 0100 0101 OR 1100 = 1101 0101 XOR 1100 = 100 NOT 0101 = 1010
These machine language instructions can be used to perform the desired NOT, AND, and OR logical operations of high level programming languages if the compiler writer represents True, say, as the byte 11111111 and False as 00000000 throughout your program (one bit would be enough, but it might not be feasible or efficient to deal with individual bits in a real computer).
For IR that is not machine specific, we can simply represent the fact that we are doing various relational and logical operations in the code and leave the messy details of implementation to the code generation phase (where IR is translated into actual target machine code).
For your projects, you have aspects of real machine code to deal with. Remember the following:
- Implement False as 0 and True as 1.
- you must use existing instructions in the virtual machine when generating code to perform relational and logical operations in a the hierarchical manner specifiec by Pascal semantics
- Pascal is not defined like C; you cannot perform arithmetic on Boolean values, nor can you assign a Boolean value to an integer variable.
Translating If Statements
A Hand-Translation Example
Suppose we are parsing using the rules
<statement> ⟶ if <expression> then <statements> <opt_else> endif <opt_else> ⟶ else <statements>
A sample statement being parsed might be
if a < b then read(a); c := a + 1 else write(b) endif
The most important thing to note about translating any control structure is that it is the statements for performing the control that are the result of this translation. That is, branch and label instructions are the primary instructons produced by the translation of a control structure, such as an if or a loop. The call to <expression> in the parse produces the code for evaluating the Boolean expression of the if statement. The two calls to <statements> produces the code in the then and else parts. The only code left to be produced by semantic calls in the if statement is the code for branching to to the proper parts of the if statement.
This is demonstrated below, where the black parts are generated by calls to <expression> and <statments>, and the blue parts must be generated by semantic calls in the if statement. Here we are supposing that a is located at (0)D0, b at (4)D0, and c at (8)D0.
--if -- generated by call to <expression> push 0(D0) -- push a push 4(D0) -- push b cmplts -- pop the top two stack values, compare for less than and -- push the result, True (1) or False (0), onto the stack branchf L1 -- branch to the optional else if the top of stack value is -- false -- then -- generated by call to <statements> in then clause read 0(D0) -- read the next value in the input into 0(D0) (a) push 0(D0) -- push a push '1' -- push 1 adds -- add the top two stack values, leaving the sum on the
-- stack top pop 8(D0) -- pop the top of stack value into c branch L2 -- branch around the optional else part -- else Label L1 -- optional else part label -- generated by call to <statements> in else part push 4(D0) -- push b writes -- write the top of stack -- endif label L2 -- endif label
As described in the first part of this lecture, you will need to do more work in most target machine languages in <expression> to generate the proper code for handling Boolean values.
Modifying the Compiler for Translation of if Statements
What are some of the issues involved in upgrading a working parser to generate code for an if statement? Remember that the only code we need to generate while compiling an if statement directly are the branch and label instructions, as illustrated in the above example. Indirectly, of course, the code for evaluating the Boolean expression, the statements in the then clause and the statements in the else clause, but this code is generated during calls related to <expression> and <statements>. So, let's see what we must do to generate labels and insert branch and label instructions into the translated code.
A New Label Object
One thing we need is some way to generate labels. The best way to do this is to have a new class in our semantic analyzer with a getNewLabel method that always generates a unique label each time it is called. This object will be private to the semantic analyzer, because only the semantic analyzer needs it. That is, when it is time to generate code, the parser calls an action method in the semantic analyzer; these actions may then need to call the label generator to obtain new labels for the code about to be generated. The new label class will keep a numeric count of the number of labels generated so far. For example, each time the new label object is called, it could add one to an internal variable count that has been initialized to zero, and create and return the string L || To_String(count), which is just L concatenated with the string version of the value in count.
Then, the value of the labels generated must be used in jump and label statements. Since we must insert the code to jump to a label before the label instruction is actually inserted into the code, we must keep labels in proper semantic records and pass them to actions symbols as necessary.
Recursive Descent Parser Example
After thinking about the if statement for a while, we might come up with the following augmentation of the grammar rules for the if statement:
<statement> ⟶ if #process_if(if_rec) <expression> #if_test(if_rec, exp_rec) then <statements> <opt_else> endif #finish_if(if_rec) <opt_else> ⟶ else #process_else(if_rec) <statements> ⟶ ε #process_no_else(if_rec)
Thus, the code in a recursive descent parser for this if statement should look something like
procedure Statement; if_rec : semantic record -- (has room for the else and endif labels) expression_rec : semantic record -- (contains the type of expression) begin -- Statement case Scanner.lookahead is when if ==> Match("if"); -- The next call constructs an if_rec that contains the two labels -- needed to construct an if statement. Analyzer.process_if(if_rec); Expression(expression_rec); -- The next call to test_if first checks to see if the type in -- expression_rec is boolean and then,if so, generates code to
-- branch on false to the second label in the if_rec. Analyzer.if_test(if_rec, expression_rec); Match(then); Statements; Opt_Else(if_rec); Match(endif) -- The call to finish_if inserts the second label from the if_rec -- into the code. This is the label used to branch around the -- statements in the else clause for execution paths that go through -- the then clause. Analyzer.finish_if(if_rec); when id ==> . . . end case; end Statement;
The call to Analyzer.process_if must generate the labels for the optional else part and for the endif and put them in the if_rec semantic record.
The call to Analyzer.process_if_exp gets the if_rec semantic record with its two labels, and the expression_rec. This routine must
- check the type listed in expression_rec. If it is not Boolean, there is something wrong with the expression, and translation cannot continue (if expressions are allowed to be only Boolean, except in some dufus languages, like C).
- If the expression type is incorrect, an error must be reported.
- If the expression is of the proper type, code must be generated to branch on false around the then part to the optional else part.
The call to Analyzer.finish_if drops the label for the endif (the second label in parameter if_rec).
The code that manages control for the optional else part is done in the call to <opt_else>:
procedure Opt_else(opt_else_rec: in semantic_record); -- opt_else_rec is a copy of the if_rec generated in the if statement begin -- Opt_else case lookahead is when else ==> Match(else); -- The next analyzer call generates the unconditional branch -- around the statements in the else part for execution code -- that comes through the then part, and then generates the -- else label for execution that branches to the else clause Analyzer.process_else(opt_else_rec); Statements; when endif ==> -- The next analyzer call inserts the first label in the if_rec -- into the code to be sure that this label exists, because -- code has already been generated to branch to it. However, -- since there is no else part, no branch statement is inserted -- jump around the else part if execution comes through the -- then part. Analyzer.process_no_else(opt_else_rec); null; when others ==> Syntax_error; end case; end Opt_else;
Procedure Analyzer.process_else must insert an unconditional branch to the endif label in opt_else_rec followed by a label statement for the else label in opt_else_rec.
Procedure Analyzer.process_no_else must generate a label statement for the else label in opt_else_rec, but it does not need to first generate an unconditional branch to the endif label, because there is no else code to branch around.
Note that in this example, the branch at the end of the then clause that would cause a jump around the else code is actually generated by the Opt_else routine rather than the if statement. This just saves one instruction in the case that there is no else clause.
If there is no else part, the code is not generated for jumping around the else part, but code must be generated for the else label, because a jump on false to this label would have been produced right after checking the expression type.
Notice, too, that even if the then or else clauses contain other if statements (i.e., there are nested ifs), this still all works correctly!
Finally, remember that there are variations on how the desired code can be generated. We have just given you one way.
The Analyzer Routines for if Statements
For the approach we have given above, there will be four methods needed in the semantic analyzer plus a label generating class. The label generating class will also be used for generating labels needed in other control structures, such as loops and case statements.
Let's look first at method process_if in the analyzer, which just needs to ensure that two labels are ready for use for generating code for the if control structure.
procedure process_if(if_rec: out semantic record); begin -- process_if if_rec.label_1 := labelGenerator.getNewLabel; if_rec.label_2 := labelGenerator.getNewLabel; end. -- process_if
We can look next at the call to if_test.
procedure if_test(if_rec, conditional_rec: in semantic record) begin if_test -- Code goes here to check whether the type of the value is -- proper for a conditional expression. If it isn't, an error -- must be generated here, akin to "Expecting type Boolean for -- the conditional expression, found type character." -- If the type of the conditional expression is compatible, the -- following line is executed. output(IR_File, "jumpf " || if_rec.label_1) end if_test
The other needed analyzer methods will be similar.