Lab 2 - Using a Scanner Generator
Objectives
During one of our recent lectures, we discussed how the generation of the finite state automata for each regular expression (for each token) is so repetitive, mechanical, and, well, boring. The theory of finite state automata and regular expressions tells us that there must be a way to design an automatic scanner generator, because we saw the following theorems while studying the theory of regular expressions and finite state automata:
- There is an algorithm that turns any given regular expression into an equivalent nondeterministic finite state automaton.
- There is an algorithm that turns any given nondeterministic finite state automaton into an equivalent deterministic finite state automaton
- There is an algorithm that turns any deterministic finite state automaton into an equivalent deterministic finite state automaton with the fewest possible number of states
In addition, we have seen as we noted above that the process of turning a finite state automaton into a procedure (or program, or method, etc.)that does the work of that finite state automaton is mechanical, so we could give our own new theorem:
- There is an algorithm that turns any deterministic finite state automaton into a procedure in C (name your language here) that does the work of that finite state automaton.
What does this all mean? It means that it should be possible to write a program that takes a set of regular expressions as input and produces a program as output that will scan an input stream for the tokens represented by the regular expression:
A list of Scanner Scanner -----------------------> Generator ------------> regular expressions Program Program
One widely used scanner generator is the program lex (short for lexical analyzer generator, which is another term for scanner generator). It comes with most Unix distributions. We also have the gnu equivalent available, called flex. Flex accepts a list of regular expressions which it processes, producing a scanner in C. In order to use the scanner generated by flex, it, of course, must itself be compiled by a C compiler.
The objective today is to learn how to use a scanner generator.
The State of your Scanner
The TA will come around to ask about the state of your project. If you are far enough along with it, feel free to demonstrate it. Otherwise, just let him know how far you think you are with it in terms of percent complete.
Note
The official tokens file for microPascal on the resources page has the official regular expressions for identifier and float_literal now defined. The float_literal token might be a bit different than you did for the lab last week. Be sure to make any changes necessary for your final scanner based on these officially listed tokens.
To Do
On Linux, do a "man" on flex to learn about flex. As seniors you should now be able to read, understand, and apply information that you read about tools such as flex. Remember that this is a working laboratory. You should be actively collaborating with other team members.
Work with flex
- Use the flex-token.h file, the flex driver.c file , and the tokens.xlsx file (official token set for microPascal) to work on this lab exercise.
- Create input for flex for μPascal according to the man instructions. Notice that the form flex requires for regular expressions is not the same as the form used in theory textbooks or in our definitions.
- Test your flex-generated scanner with the flex-driver.c. Do this in stages. For example, just include enough to scan simple tokens to start with, such as identifier, and then test whether your flex-generated scanner can handle input files consisting just of identifiers. Then incrementally add other tokens and test the generated scanner with them. You should study the flex driver to see how a parser could be written that calls the scanner generated by flex.
To Turn In
With 10 or 15 minutes to go in lab, build a small text file that has representatives of all the tokens you are now able to scan with your flex generated scanner. Create a PDF listing of this text file along with a PDF of the corresponding list output from a run of your flex generated scanner. Turn these two lists in to the D2l lab 2 dropbox before leaving lab. Include a PDF cover sheet that has the following information on it.
- Your group number
- Lab 3
- Your team names