Action Symbols, Action Routines, and
Generating a Symbol Table

The Issues

Once we have completed work on the parser, it is time to begin including semantics in our compiler. Remember, "semantics" means "meaning." What we are supposed to do in the semantic analyzer is to determine the meaning of the source program based on the parse tree produced by the parser and then translate it into a program in a different form (e.g., IR) with the same meaning. Eventually, the translated executable program we generate must run exactly as the original source program was intended to run. There are two parts to determining the meaning of a program and then translating that into a different form: construction of the symbol table (which contains the "meanings" of all the identifiers in the program), and the generation of IR.

Construction of the Symbol Table

We have discussed some of the issues surrounding the construction of a symbol table. Action symbols are inserted into grammar rules at points where we believe we should know enough information to insert an id into the table. These action symbols then become calls to public methods in the symbol table module. These action routines check that an identifier to be inserted is not already in the table and then to insert it and its attributes if not, or issue an error message if so.

Generating Code

The second responsibility of the semantic analyzer is to generate code. This is also one in similar fashion. We find places in our grammar where we should know enough information to actually generate a section of code, and then we modify our parser with corresponding calls to the Semantic Analyzer. These public methods in the Semantic Analyzer must then be written and included for this to work.

Semantic Records and Semantic Actions

In order to construct a symbol table and to generate code we will need some help. We use semantic records to hold semantic information about the section of the program we are currently parsing (e.g., the lexeme and type of a variable) so that we can update the symbol table (e.g., insert the lexeme and type of a variable into the symbol table) or generate code (e.g., check that variables in an arithmetic expression are type compatible for the operation being performed on them and then generating code to perform that operation).

In our recursive descent parser, the way we will determine the semantics (meaning) of various program pieces and carry out appropriate actions (e.g., perform a translation into the target code or insert information into the symbol table) is as follows:

For every token and nonterminal in the grammar we will associate a semantic record. This record will be filled with the semantics (meaning) of the associated token or nonterminal as parsing proceeds.
At various points in the grammar we will insert semantic actions. Semantic actions will be public methods or procedures in either the symbol table class or the semantic analyzer class of the compiler. The semantic action calls will have semantic records as parameters. The semantic action procedures in the symbol table module will then use the semantic information in the semantic records passed as parameters to update the symbol table appropriately.

Symbol Tables and Scopes

There are many different issues regarding symbol tables. Some of them are:

Where should we create a new symbol table for a new scope?
Where should we insert identifiers into a symbol table?
Where do we look up identifiers in a symbol table?
Where should we remove a symbol table?
How do we include procedure and function identifiers in a symbol table?

Let's consider these issues in more detail.

The Symbol Table Abstract Data Type, or Class

First, we assume that we all have decided how we are going to implement a symbol table. Each group has chosen a data structure, such as a hash table, a sequential list, or a binary search tree for implementing the symbol table.

Second, we assume that we all recognize that for scoping issues, we will need one symbol table per scope and that each new scope is stacked on the most recent one as a new scope (procedure or function declaration) is encountered, and that the topmost symbol table is popped (discarded) from the stack when the end of a scope (procedure or function end) is encountered during parsing.

Finally, we assume that we will all be implementing a common set of operations on symbol tables:

create -- for creating a new symbol table when a new scope is encountered
destroy -- for deleting, or popping, a symbol table when a scope is exited
insert -- for inserting a new identifier and its attributes into a symbol table
find -- for looking up an identifier in the symbol table

There may be one more operation on your symbol table depending on how you implement it:

return_attributes -- for retrieving the attributes of a found identifier

Creating a Symbol Table

Question: When should a new symbol table be created while parsing?

Answer: Whenever a new scope is encountered. (e.g., beginning of the program, beginning of each procedure and function, and for some languages in other places, such as a for loop in which the loop index variable is considered to be in scope only for the duration of the loop, or, as in C, the curly braces)

Question: When will the parser be able to determine during parsing that a new scope is encountered?

Answer:: The first scope, of course, will be for the program as a whole. Therefore, when it is clear that a program has been found, it is time to create the first symbol table. Other symbol tables will need to be created as the structures that indicate that a new scope is necessary are encountered, for example when the procedure, function, for, and other indicating tokens are found.

Instance 1

For mPascal, a reasonable place to include an action to create the first symbol table would be as soon as the programreserved word is encountered.

So, if we have a rule similar to

program_heading ⟶ "program" program_identifier

then at the point in the parse when we have matched the reserved word program, we know that it is time to create a symbol table. We can therefore modify this rule to include an action symbol to reflect this fact, as in

program_heading ⟶ "program" #create_symbol_table program_identifier

If we want to name this newly created symbol table with the program name, we will need to wait to create the symbol table until the program identifier has been parsed. This would give the following modification instead

program_heading ⟶ "program" program_identifier #create_symbol_table(program_identifier_rec)

In this case we can create a symbol table and give it a name at the same time so that if we print the stack of symbol tables in existence at any point in a parse we can see (by also printing their names) which program, procedure, or function they belong to (this is the way you are to implement it).

Corresponding to this rule modification is a modification to procedure program_heading in the parser. It will look something like the following, where updates are shown in blue.

procedure ProgramHeading;

  variables     program_identifier_rec: semantic_record;

  begin --ProgramHeading
     case lookahead is
       when reserved_program ==>
         match(reserved_program);
         -- procedure ProgramIdentifier, called next, must return the name of the program
         -- in program_identifier_rec
         ProgramIdentifier(program_identifier_rec);
         -- the next call causes a new symbol table to be created with the name given in
         -- program_identifier_rec.  This is not an identifier to be inserted into the
         -- symbol table, but rather just used to give a name to the symbol table.
         SymbolTable.create(program_identifier_rec);
       when others ==>
         raise error exception
     end case;

end ProgramHeading

Instance 2

There are other places where new symbol tables should be created, namely when a procedure or function is declared. Thus, a rule similar to:

ProcedureHeading ⟶ "procedure" Identifier OptionalFormalParameterList

could be marked up as

ProcedureHeading ⟶ "procedure" Identifier #create_symbol_table(identifier_rec) OptionalFormalParameterList

and the procedure in the parser that handles nonterminal ProcedureHeading will have to be modified similar to the above. The rule given below will also need to be modified.

FunctionHeading ⟶ "function" Identifier OptionalFormalParameterList

There are other things to do when a procedure or function is declared. This case is complex enough that we will visit it later.

Inserting Variable Identifiers and their Attributes into a Symbol Table

Let's look at the following set of rules for microPascal.

<vardeclpart> ⟶ var <vardecl> ; <vardecltail>
<vardeclpart> ⟶ ε
<vardecltail> ⟶ <vardecl> ; <vardecltail>
<vardecltail> ⟶ ε
<vardecl> ⟶ <idlist> : <type>

<idlist> ⟶ id <idlistail>
<idlistail> ⟶ , id <idlistail>
<idlistail> ⟶ ε

<type> ⟶ integer
<type> ⟶ fixed
<type  ⟶ float

Recall again from the last lecture that the rules for <idlist> can actually be reached from places other than the <vardecl> rules. This means that without other information, the parser procedure for <idlist> doesn't know whether or not the id's being found are part of a variable declaration or not. We discussed three solutions to this problem last time. We could:

Change the rules of the grammar so that there are different id_list nonterminals for each different situation. For example, <var_id_list> could be a new nonterminal leading to rules that expand a list of identifiers that can only be encountered in a variable declaration section.

Pass a semantic record to id_list whenever that procedure is called indicating from where the call was made. The information in this semantic record is called an "inherited attribute" of id_list, since it is inherited from rules farther up in the tree (e.g., the variable declaration rules).

Just build a record that is a linked list of identifier lexemes, and pass this parameter back up to the procedure that called id_list. This record is the semantic information computed for nonterminal id_list. When the calling procedures get this information, they can then decide what to do with it (e.g., the variable declaration procedures would insert each lexeme in this linked list into the symbol table). The information in this semantic record is called a "synthesized attribute." This just means that the attribute was not inherited from above, but was constructed (i.e., synthesized) by this procedure or one below that was called by this procedure.

Let's explore solution 3.

We first remember that every nonterminal and token has an associated semantic record that we can use to pass information gathered in the parser and on to the symbol table or semantic analyzer. We will name these records the same as the nonterminal or token with a _rec suffix. So, for example, the name of the semantic record for token id isid_rec, and the semantic record for nonterminal <idlistail> is idlistail_rec.

An analysis of the grammar and leads to the insertion of the following semantic actions.

<vardeclpart> ⟶ var <vardecl> ; <vardecltail>
<vardeclpart> ⟶ ε
<vardecltail> ⟶ <vardecl> ; <vardecltail>
<vardecltail> ⟶ ε
<vardecl> ⟶ <idlist> : <type> #symbol_table.insert(idlist_rec, type_rec)

<idlist> ⟶ id #analyzer.process_id(id_rec, idlist_rec) <idlistail>
<idlistail> ⟶ , id #analyzer.process_id(id_rec, idlist_rec) <idlistail>
<idlistail> ⟶ ε

<type> ⟶ integer
<type> ⟶ fixed
<type  ⟶ float

To explain, look at the rule

<idlist> ⟶ id #process_id(id_rec, idlist_rec) <idlistail>

After matching the id, we know enough to attach the lexeme of that id to the list of id's being constructed at this point. So, we make a new method in the Semantic_Analyzer module named process_id whose task is to take incoming id's and attach them to the list of id's. To accomplish this task, method process_id must be given the semantic record of the current id, which contains the lexeme matched, and the current list, in the semantic record for id_list.

The procedure for idlist would thus look like:

procedure idlist(out idlist_rec);

  var id_rec     : semantic_record;

begin -- idlist
   case lookahead is
     when id ==>
       Match(id, id_rec);
       analyzer.process_id(id_rec, idlist_rec);
       idlistail(idlist_rec); -- must pass idlist_rec to idlistail for further processing
     when others => Syntax_Error;
   end case;
 end idlist;

The procedure for idlistail would be

procedure idlistail(in out idlist_rec);

  var id_rec : semantic_record;

begin -- idlist
   case lookahead is
     when comma ==>
       Match(comma);
       Match(id, id_rec);
       analyzer.process_id(id_rec, idlist_rec);
       idlistail(idlist_rec); -- must pass idlist_rec to idlistail for further processing
     when ... (follow set) ==>
         null;
     when others => Syntax_Error;
   end case;
 end idlist;

The procedure for

<vardecl> ⟶ <idlist> : <type> #symbol_table_insert(idlist_rec, type_rec)

would look like

procedure vardecl;
    var idlist_rec : semantic_record initialized to null;
        type_rec   : semantic_record initialized to "variable" for kind
                     and null for type;

begin -- vardecl;
   case lookahead is
     when id ==>
       idlist(idlist_rec);
       Match(colon);
       type(type_rec);
       analyzer.symbol_table_insert(idlist_rec, type_rec);
     when others => Syntax_Error;
   end case;
 end vardecl;

Notice that each time we find we need to gather information to pass to the semantic analyzer or to other procedures in the parser (for eventually submission to the semantic analyzer) we need to have semantic records associated with the appropriate tokens or nonterminals that can be filled up with the proper values. Every time we need to have an action carried out by the semantic analyzer (e.g., to build a list of identifiers or to insert values into the symbol table) we need to have a new public method in the analyzer to perform that task given the appropriately filled semantic records. Notice that the analyzer method for inserting values into the symbol table requires deconstructing the list of identifiers and calling the symbol table for each one to insert that identifier into the symbol table with its type. Of course it must also be determined whether this identifier already exists in this scope when the insert operation is performed.

Inserting Procedure Names and Parameters into a Symbol Table

Procedure Declaration Time: When a procedure declaration is encountered a number of things must happen with respect to the symbol table structure.

The procedure name and attributes (modes and types of the formal parameters) must be inserted into the currentsymbol table.
A new symbol table must be created for the new scope.
The formal parameter names, modes, and types must be inserted into the newly created symbol table.

These steps are not done necessarily in this order; in fact one would normally have to interleave some of these operations.

Inserting Function Identifiers, Parameters, and Return Type into a Symbol Table

Of course, the issues that apply to procedures also apply to functions. In addition, the return type of the function must be associated with the function name in the scope in which the symbol table was declared.

Destroying a Symbol Table

When should a symbol table be destroyed? When the end of the scope is reached. That is, when the end of a program, procedure, or function is encountered during parsing, it is time to destroy the current symbol table. This would occur in rules like

ProcedureDeclaration ⟶ ProcedureHeading ";" Block ";"

which would be modified as follows:

ProcedureDeclaration ⟶ ProcedureHeading ";" Block #pop_symbol_table ";"

The #pop_symbol_table semantic action will be a call to pop the current symbol table from the stack, leaving the one below it on the stack as the new current symbol table.

This same thing will need to be done with respect to functions. For programs (as far as your project is concerned), it is not technically necessary to destroy the symbol table, because for microPascal, the end of the program is the end of the compilation process. This may not be true for programs that consist, say, of many different classes that are not embedded within each other.

Finally, in programming languages that allow for different scoping boundaries (for loops, braces (blocks), etc.) the end of the respective scoping construct is where the symbol table should be destroyed.

Looking up Values in a Symbol Table

To determine whether an identifier found elsewhere in a program has indeed been declared, we simply need to find every place where an identifier is matched in a parse and include an action, say #lookup, that returns "true" if the identifier is found in the symbol table and "false" if it is not. Of course, in a real program much more will need to be done than this, but for now, we are considering just how to construct a symbol table and then to look up in it whether an identifier has been defined. We don't consider at this time whether identifiers are used properly for their type.

Summary

This process of looking at the grammar rules to see where actions go, in this case to create, insert values into the symbol table, look up values in the symbol table, and destroy a symbol table, is repeated for all semantic actions.

Action Symbols, Action Routines, and Generating a Symbol Table