Lab 2 - Using a Scanner Generator

Objectives

During one of our recent lectures, we discussed how the generation of the finite state automata for each regular expression (for each token) is so repetitive, mechanical, and, well, boring.  The theory of finite state automata and regular expressions tells us that there must be a way to design an automatic scanner generator, because we saw the following theorems while studying the theory of regular expressions and finite state automata:  

In addition, we have seen as we noted above that the process of turning a finite state automaton into a procedure (or program, or method, etc.)that does the work of that finite state automaton is mechanical, so we could give our own new theorem:

What does this all mean?  It means that  it should be possible to write a program that takes a set of regular expressions as input and produces a program as output that will scan an input stream for the tokens represented by the regular expression:

      A list of             Scanner      Scanner
  ----------------------->  Generator  ------------>
   regular expressions      Program      Program

 

One widely used scanner generator is the program lex (short for lexical analyzer generator, which is another term for scanner generator).  It comes with most Unix distributions.  We also have the gnu equivalent available, called flex.  Flex accepts a list of regular expressions which it processes, producing a scanner in C.  In order to use the scanner generated by flex, it, of course, must itself be compiled by a C compiler.

The objective today is to learn how to use a scanner generator.

The State of your Scanner

The TA will come around to ask about the state of your project.  If you are far enough along with it, feel free to demonstrate it. Otherwise, just let him know how far you think you are with it in terms of percent complete.

Note

The official tokens file for microPascal on the resources page has the official regular expressions for identifier and float_literal now defined.  The float_literal token might be a bit different than you did for the lab last week.  Be sure to make any changes necessary for your final scanner based on these officially listed tokens.

To Do

On Linux, do a "man" on flex to learn about flex.  As seniors you should now be able to read, understand, and apply information that you read about tools such as flex. Remember that this is a working laboratory. You should be actively collaborating with other team members.  

Work with flex

To Turn In

With 10 or 15 minutes to go in lab, build a small text file that has representatives of all the tokens you are now able to scan with your flex generated scanner.  Create a PDF listing of this text file along with a PDF of the corresponding list output from a run of your flex generated scanner. Turn these two lists in to the D2l lab 2 dropbox before leaving lab. Include a PDF cover sheet that has the following information on it.

  1. Your group number
  2. Lab 3
  3. Your team names