Run Time Storage

Mixing and Matching

At this point of the course, we need to jump around a bit. In terms of your compiler project, we will need to discuss various aspects of

symbol table issues
run time memory management
code translation issues

pretty much all at the same time in order for you to begin creating translated code that executes properly. This means that you will need to be alert. Textbooks also have problems at this point, because the authors don't know where to place each of the topics in the book, because, as just mentioned, they are often interwoven. This means that you will need to be on your toes.

Indicating Run Time Addresses in the Symbol Table

One of the issues that comes up when building a symbol table has to do with run time storage. Run time storage is the RAM used to hold program variables and other values when the translated program is actually executing (a ctually, it isn't physical RAM, but rather virtual memory on today's modern archtectures and operating systems). Think of it this way. When the parser encounters a variable declaration, say

VAR   A : integer;

a call needs to be made to the symbol table module to insert the lexeme A and its attributes (in this case "variable" and "integer") into the symbol table for this scope. One other piece of information also needs to be included in the symbol table, and that is where A can eventually be found in memory when the translated program is actually executed. That is, we will encounter statements later when parsing similar to

A := 3;

When the parser encounters this statement, it will need to make a call to a method in the semantic analyzer that in turn makes a call to the symbol table to look up A. It must be determined by this call

Whether A has been declared (i.e., whether it is in the symbol table)
Whether A has the correct kind and type to be assigned an integer value (3)
Where A will be in memory when the translated program is actually executed.

Think about the assignment statement

A := 3;

again. Eventually this statement needs to be translated into the target code. We might think of the translation looking like

STORE "3", A

which we think of as meaning "store the value 3 into A." However, something is wrong with this translation. As we know from having studied assembly and machine language in CS 330, at the machine level, we need to have an address in (virtual) RAM for A, not the symbolic name A. Thus, we need to have a translation more like

STORE "3", 000008FA

where 000008FA is the virtual (hexadecimal) address in memory of where A is located. Then, when the computer actually executes this instruction, it can store the integer literal value 3 into memory location 000008FA, which is the RAM address we are using for A. (The operating system and the hardware combine to translate the virtual address into the real RAM address; this translation is not the responsibility of the compiler writer. Instead, the compiler writer must read and understand the requirements placed on compiler generated code by the operating system. This includes how to set up run time menory, how to make system calls for I/O, how to register exception handlers with the operating system, and so forth).

This means that the symbol table for A must have yet another attribute: the eventual virtual RAM address that will be used for A in the translated program. Thus, when the declaration for A is parsed

VAR   A : integer;

the symbol table (abbreviated for the purpose of this discussion) entry for identifier A should look like

variable

integer

000008FA

In other words, one of the attributes of A in the symbol table is where A is to be located in RAM so that translated code can refer to A by its address. Then, when the line

A := 3;

is parsed and translated, the attributes of A can be looked up in the symbol table to find A's type and planned RAM location to produce the translation

STORE "3", 000008FA

Important Note: It is easy at first to confuse the two concepts of the symbol table and the run time storage. Just remember these three things:

The symbol table is only used during compiling. It generally does not exist after the compiler has finished its job. This means that it does not exist at the time the translated program is executing.
After the program has been translated and the compiler has finished its job, all variables are referred to by their RAM addresses, not their names.
During compiling the symbol table must contain the names of all variables in scope as well as where these variables are to be located in RAM when the translated program is executed. The symbol table is NOT the place where variable values are stored. Only the eventual virtual RAM addresses of the variables are kept in the symbol table.

Determining the RAM Addresses of Variables at Compile Time

The foregoing discussion raises one interesting question. How can the compiler know at compile time where variables should be located in RAM? In other words, given the above examples, how could it be possible that the compiler can determine that A should be placed at location 000008FA. The answer is simple. It can't! From your study of operating systems you should know that each operating system has its own memory management scheme. All modern computers and operating systems use virtual memory and can wind up storing a program all over the place in physical RAM, wherever it maps virtual RAM pages to physical memory frames. So we must make the following observation:

At compile time it is not possible to assign actual physical RAM addresses to variables.

So what are we to do? The solution is to refer to variables by a relative RAM address rather than an actual RAM address. Relative to what? Well, when the program is being parsed the order in which the variables are encountered is known, and the sizes of the types of values to be stored in each variable is also known. We can use this information to build up a symbol table as illustrated in the next example.

Example of a Symbol Table with Relative Addresses

Suppose that we are translating the following program (this is not a microPascal program, but is similar to one).

program myprog;

  var     a, b : integer;
          letter : character;
          x : float;
          c : integer;

  begin -- myprog;
     . . .
     c := a + b;
     . . .
   end. -- myprog

By the time the begin statement is about to be parsed, the symbol table (again abbreviated) should look something like the following:

myprog

nest 0

link

null

lexeme	kind	type	offset
a	variable	integer	0
b	variable	integer	4
letter	variable	character	8
x	variable	float	9
c	variable	integer	17

The name of this symbol table is "myprog" as shown in the first field, the name field. The scope level (nesting level) of this symbol table is 0, and that is included in the symbol table as well and will be used for addressing. The data structure for holding the identifiers given in the next 5 rows. Then there is a link field that allows a symbol table to be stacked with other symbol tables for scoping purposes.

The point of interest is the fourth field in the data structure that holds the identifiers. This is the "offset" field. Since it is not known where these variables will actually wind up in physical RAM when the translated program is run, we instead show where they are relative to this scope. Since variable a is the first one encountered, it is offset 0 from the start of this scope. By the same token, variable b is offset 4 bytes from the start of this scope. In this case we are assuming that integers take up 4 bytes, so if variable a is at offset 0, variable b, which is next, must be offset 4 bytes from the start of this scope. Similarly, variable letter comes after the integer b, so will be offset 8 from the start of this scope. Since a character only takes up one byte, the variable x, which follows the variable letter, is offset 9 from the start of the scope. If we assume that a float type takes up 8 bytes, then c is located at 17 bytes offset from the start of the scope.

In all of this, we do assume that wherever the translated program is loaded into (virtual) memory the variables a, b, letter, x, and c will be located contiguously.

At this point we know where all of the variables reside in relation to each other. We don't know, however, where these variables will be in RAM when the operating system loads it. To see how we can get around this problem let's look at how the above program might be translated into a target language. We're going to make this easier, though, by only translating the variable references to addresses.

Here is the trick. We are going to use a processor register, which we will call D0 for now, to hold the RAM location where the variable section of the program (at nesting level 0) starts. Then we will refer to each variable as being so many bytes offset from D0.

program myprog;              program myprog

  var                        set D0 to contain the address of where variable a starts in RAM
     a, b : integer;          -- D0 now contains a's address
     letter : character;
     x : float;
     c : integer;

  begin -- myprog;           begin -- myprog;
     . . .                      . . .
     c := a + b;                17(D0) := 0(D0) + 4(D0);
     . . .                      . . .
   end. -- myprog              end. -- myprog

Notice the blue parts. If we can somehow set D0 to contain the starting address of the memory location the first variable in this scope (variable a in this case) in virtual RAM, we can get to any other variable in this same scope by referring to that variable by adding the offset of that variable (found in the symbol table) to D0. Most machine languages have an addressing mode similar to 4(D0), which means "refer to the value in the location in RAM computed by adding 4 to the contents of register D0."

Completing the Picture: Activation Records and the Run Time Stack

Of course, the picture above does not represent a true translation. Also, we still have the problem of determining how to set D0 so that it has the proper starting address of the first variable (a) regardless of where the variable section is placed in memory. Finally, variables come and go. When a procedure or a function is called, the local variables and parameters for the procedure and function exist during that time, but are removed when a return is made from the procedure or function.

Let's look at this picture first for the translation of a single program that contains no procedures or functions.

The standard approach for this is to use a run time stack. A typical layout of a program in (virtual) memory would look something like

                   Heap
                     |
                     |
              HP --> V        dynamic memory (e.g., linked lists) go here

                 Free Space
               SP --> ^        activation records for procedures and functions go here
                     |
                     |
               Run Time Stack

              Static Variables

                Program Code

             Control Information

We call this illustration a run time memory model. We use this model to determine where variable values will be in memory, so that we can create translated code that will access these variables properly when the translated program is running. The run time stack is where the activation records are pushed for each procedure and function as they are called (and popped each time they end). The heap is where dynamic data structures go, such as linked lists. Notice that the run time stack grows towards the heap, and the heap grows towards the run time stack. If the two meet during program execution, there is a "stack overflow error".

An Example

To see how semantic analysis works with all of the components above (except for error handling at the moment), we will look at an example. Consider compiling the following program.

program fuzzy;

  var       a, b, c:   integer;     ask, find: char;

  const     pi : real := 3.14159;

  begin
     read(a, b);
     c := a + b * a;
     write(a);
   end

At this point we are going to bypass IR in our translation and translate directly to a machine code to demonstrate how run time memory might be managed (run time memory is, after all, directly tied to a target platform).

        Source             Target Machine Code

program fuzzy;          --program fuzzy

We put comments in to help keep track of which source code goes with which section of translated target code code. There might actually be some code here at the start of the program, but we won't discuss it here in order to focus on the run time memory management.

During parsing of the variable declaration section, we do not produce any output code, but we do generate the symbol table, and we do need to know where in memory the variables in the symbol table will eventually be put, so we need to have the memory model in mind as we build the symbol table.

We now see that the symbol table should hold some more information to help us with the translation. It would be helpful to know the kind of an identifier, its type, its size, its offset, and its nesting level. The number of bytes that need to be reserved in run time memory for this activation record would also be valuable information. The symbol table after processing the variable declaration section would thus be

Table Name	Nesting Level	Activation Record Size	Link to Next Table
fuzzy	0	18	null

lexeme	kind	type	size (bytes)	offset
a	var	int	4	0
b	var	int	4	4
c	var	int	4	8
ask	var	char	1	12
find	var	char	1	13
pi	const	real	4	14

At this point, the compiler has compiled through the variables declaration section and has created the above symbol table and output the one line (startprog) of code. Then the line

begin

is reached. At first glance, it might appear that no code should be generated here. However, this is the point at which we can output code to manage the run time stack. In other words, at this point in the translated code, the body of the program would begin executing. Thus, the activation record for this program must be on the run time stack. What we envision is that run time memory will look like the following at this point

            HP -->  ?

            SP -->  ?

                    Static Variables

                    Program Code

                    Control Information

HP is a pointer to the first free element of the heap space, and SP is the run time stack pointer. We assume that SP points to the first free element on the of the top stack (i.e., the byte in RAM where SP points is undefined. Everything above it is also undefined, up to where the heap pointer (HP) points.). A push causes the operand to be stored at the RAM address where SP points and an automatic increment of SP by the size of the operand is performed so that SP now points to the next free byte. A pop is similar in that it first decrements the SP by one byte, then removes the number of bytes from the stack represented by the size of the operand, leaving the SP pointing to the first free byte on the top of the stack. This illustration represents the case that would exist when the translated program is about to begin running with the code just following the "begin" statement. The program code translation is in memory, but there are no activation records on the run time stack, and nothing on the heap.

What we need to do at this point in the compiling process is to generate code that will make sure that the activation record for program fuzzy gets pushed onto the run time stack. All this really means is that we need to reserve space on the run time stack for fuzzy's activation record.

Since program fuzzy is at nesting level 0, we will be using display register 0, namely D0, to access values in fuzzy's activation record. As a general principle, then, we will first generate code to save the value currently in D0 onto the run time stack (so that we can restore it later), and then to put the value of SP into D0. The machine code to do this might look like

push D0      -- put D0's value on the run time stack              -- automatically increments SP by the number of bytes in D0 load D0,SP   -- put the current SP value into D0.  D0 now points to the position              -- on the run time stack where the first variable (a) for this activation              -- record is located.

After these statements are eventually executed, we can envision run time memory looking like the following if we assume that the arbitrary value 0089080 was in D0.

            HP -->  ?

     D0 --> SP -->  ?                     0089080

                    Static Variables

                    Program Code

                    Control Information

Now it is time to generate the code that will cause room to be left on the stack for the variables of fuzzy's activation record. From the symbol table, we see that 18 bytes are needed for the variables and constants of fuzzy. So, we include the output line

add SP,18,SP

to move the value in SP up by a18., which, after execution, would make the stack look like

            HP -->  ?

                    Static Variables

                    Program Code

                    Control Information

Here, we think of six slots being left on the run time stack for the five variables and one constant in the scope of fuzzy. In fact, we might actually write the names in to help us see this (the names would not really be there).

            HP -->   ?

                    Static Variables

                    Program Code

                    Control Information

Now that the code to force run time memory to look like the above, we can translate the remaining lines of the program quite easily. First we need to know what kind of arithmetic model we are using in our target machine. For this example, assume that we are using a stack for arithmetic.

First, the following line is translated as shown.

read(a, b);           read 0(D0)
                      read 4(D0)

In a real machine language program, the read would be translated to an operating system call that would perform the desired read.

Notice that in our code, the first value is read and stored into 0(D0), or in the memory address whose value is in D0 with an offset of 0 bytes. This is where variable a is, (notice that we determine this from the symbol table). Our run time memory model matches this. Similarly, the second value will be read in and stored where we have left room for b, namely, 4(D0). We can tell from looking b up in the symbol table that we use D0 as the "display register" because b is at nesting level 0. We can also tell that the offset is 4, because the offset in the symbol table for b is 4.

Second, we translate the following line as shown:

c := a + b * a;       push 0(D0) -- push a
                      push 4(D0) -- push b
                      push 0(D0) -- push a
                      mulint     -- pop the top two stack values,
                                 -- multiply them, and push the product
                      addint     -- pop the top two stack values,
                                 -- add them, and push the sum
                      pop 8(D0)  -- pop the top stack value and store
                                 -- it in c

Third, we translate the following write statement in a manner similar to the translation of the read statement, as shown.

write(a);            write 0(D0)

Lastly, we reach the end keyword. At this point we should generate code that will pop fuzzy's activation record off the stack, leaving the stack pointer where it was. We also need to restore the old value into D0. Thus, the code generated when the end is encountered, might look like:

end.                 sub SP,18,SP -- set SP so that it effectively pops the activation record                                   -- record for this program from the stack                      pop D0       -- reset D0 to its saved value (this automatically                                   -- decrements the SP correctly)                      halt

The result of executing this translated code will leave the run time memory model looking like the following again:

            HP -->  ?

            SP -->  ?

                    Static Variables

                    Program Code

                    Control Information

The entire translated program now looks like:

 --program fuzzy
 --begin
 store D0, SP -- put D0's value on the run time stack
 add SP,4,SP  -- bump SP up by 4 (assuming D0 takes 4 bytes on the stack)
 load D0,SP   -- put the current SP value into D0
 add SP,18,SP
 --read(a,b);
 read 0(D0)
 read 4(D0)
 --c := a + b * a;
 push 0(D0) -- push a
 push 4(D0) -- push b
 push 0(D0) -- push a
 mulint     -- pop the top two stack values,
            -- multiply them, and push the product
 addint     -- pop the top two stack values,
            -- add them, and push the sum
 pop 8(D0)  -- pop the top stack value and store
            -- it in c
 -- write(a);
 write 0(D0)
 --end
 sub SP,18,SP -- set SP to its original value
 pop D0  -- reset D0 to its saved value
 halt

You should do a walkthrough of this program to see how it works.