241 lines
13 KiB
HTML
241 lines
13 KiB
HTML
<!--startcut ==========================================================-->
|
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
|
|
<HTML>
|
|
<HEAD>
|
|
<title>Compiler Construction Tools LG #39</title>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
|
<meta name="GENERATOR" content="Mozilla/4.5 [en] (X11; I; Linux 2.0.34 i586) [Netscape]">
|
|
</HEAD>
|
|
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#0000AF"
|
|
ALINK="#FF0000">
|
|
<!--endcut ============================================================-->
|
|
|
|
<H4>
|
|
"Linux Gazette...<I>making Linux just a little more fun!</I>"
|
|
</H4>
|
|
|
|
<P> <HR> <P>
|
|
<!--===================================================================-->
|
|
|
|
<center>
|
|
<H1><font color="maroon">Compiler Construction Tools</font></H1>
|
|
<H4>By <a href="mailto:rsevenic@penguin.sirti.org">Richard A. Sevenich</a></H4>
|
|
</center>
|
|
<P> <HR> <P>
|
|
|
|
<h1>
|
|
<b><font face="Times New Roman,Times">Part I: - JFlex and CUP</font></b></h1>
|
|
|
|
<font face="Times New Roman,Times">by Richard A. Sevenich, Department of
|
|
Computer Science, Eastern Washington University</font>
|
|
<br><font face="Times New Roman,Times">March 25, 1999</font><font face="Times New Roman,Times"></font>
|
|
<p><font face="Times New Roman,Times">Traditionally, in the UNIX world,
|
|
there are two complementary compiler construction tools which are available:</font>
|
|
<ul>
|
|
<li>
|
|
<font face="Times New Roman,Times">one to build lexical analyzers (often
|
|
called 'lexers' or 'scanners') e.g. <i>lex, JLex, JFlex</i></font></li>
|
|
|
|
<li>
|
|
<font face="Times New Roman,Times">one to build syntactic analyzers (often
|
|
called 'parsers') e.g. <i>byacc, bison, CUP</i></font></li>
|
|
</ul>
|
|
<font face="Times New Roman,Times">These tools are freely available in
|
|
the GNU/Linux world, usually free, in some cases licensed under the GPL.
|
|
It should be pointed out that lexical and syntactic analysis are two of
|
|
the primary jobs to be performed by a language translator. A lexer and
|
|
parser built with the above tools do not automatically accomplish a third
|
|
crucial task, target code generation. However, these tools provide the
|
|
programmer with convenient 'hooks' for incorporating target code generation.</font><font face="Times New Roman,Times"></font>
|
|
<p><font face="Times New Roman,Times">Later in this series of articles
|
|
I hope to introduce two of these tools, JFlex and CUP, in a tutorial setting.
|
|
I will enlist the help of several students in my compiler design course
|
|
as coauthors. Ultimately, I hope to persuade those unfamiliar with these
|
|
tools that they are very practical. I've chosen JFlex and CUP because they
|
|
produce java code and it is high time for me to learn some java. This unfamiliarity
|
|
with java will also provide me with a scapegoat when I do something stupid
|
|
- I can blame it on that unfamiliarity.</font><font face="Times New Roman,Times"></font>
|
|
<p><font face="Times New Roman,Times">This first article will provide background
|
|
for the series. The next article, which should follow fairly soon, will
|
|
show how/where to get these tools, give a very specific installation scenario,
|
|
and produce a simple application as a development example. A third article
|
|
will give a practical real world example (to be described below). If the
|
|
third article becomes unwieldy, it may be broken into parts.</font>
|
|
<h3>
|
|
<font face="Times New Roman,Times">The Lexical Analyzer (a.k.a. 'lexer'
|
|
or 'scanner')</font></h3>
|
|
Language translation converts source code from some language into target
|
|
code in some other language. The 'traditional' compiler may convert source
|
|
code into assembly language or even machine code - although the later articles
|
|
in this series will focus on other targets than these. The first task of
|
|
language translation is akin to examining an English essay to make sure
|
|
that the words are spelled correctly. The lexer performs this job on our
|
|
source code by recognizing a series of contiguous symbols as valid or not
|
|
e.g. the lexer might
|
|
<ul>
|
|
<li>
|
|
recognise 'while' as a keyword</li>
|
|
|
|
<li>
|
|
recognize '47452' as a decimal integer literal</li>
|
|
|
|
<li>
|
|
complain that 'fr%$glp' is not recognized</li>
|
|
</ul>
|
|
The lexer is analogous to a spelling checker for a source program.
|
|
<p>A utility such as JFlex builds a lexer from a specification file the
|
|
programmer writes to define the 'words' (lexical tokens) in the desired
|
|
language. Let's say the programmer defines a new langauge called <i>pronto</i>
|
|
and writes a file 'pronto.flex' which defines valid lexical tokens for
|
|
<i>pronto</i>. Then the command line operation 'JFlex pronto.flex' will
|
|
produce a java version of a lexical analyzer, say, "Lexer.java'.
|
|
<p>In its most primitive deployment, the lexer merely indicates that the
|
|
source file consists of all valid lexical tokens or not - a boolean yes
|
|
or no. The family of lexers under discussion are built to do more and,
|
|
in particular, can cooperate with the parser (to be discussed under the
|
|
next heading). In the typical application the lexer is, in fact, called
|
|
by the parser and the lexer can do these jobs:
|
|
<ul>
|
|
<li>
|
|
recognize a lexical token and return to the parser an identification code
|
|
indicating the token type</li>
|
|
|
|
<li>
|
|
pass to the parser related information e.g. the actual string recognized,
|
|
or the value of an integer literal</li>
|
|
|
|
<li>
|
|
perform other programmer coded actions upon recognition of a token</li>
|
|
</ul>
|
|
The first of the three items above allows the lexer to support the parser's
|
|
central task, syntactic analysis. The other two items are especially useful
|
|
in helping the parser in ultimately generating target code.
|
|
<h3>
|
|
The Syntactic Analyzer (a.k.a. the 'parser')</h3>
|
|
Continuing the analogy that began the previous section, just as the lexer
|
|
checks words for spelling, the parser examines the source to assure that
|
|
the 'words' are arranged in valid grammatical constructs. For example,
|
|
for some programmer's new language the lexer might pass these six valid
|
|
tokens to the parser:
|
|
<br>{ if + while ] } - the lexer only worries about token validity,
|
|
not the arrangement.
|
|
<br>However, the parser sees the arrangement '{ if + while ] }' as invalid.
|
|
Just as the lexer is a source code spelling checker, the parser is a grammar
|
|
checker.
|
|
<p>[ Note: The compilerati will realize that the lexer is actually checking
|
|
the source code validity against a very simple 'regular grammar' specification
|
|
and that the parser is checking the source code against a less simple 'context
|
|
free grammar' specification - as defined in the Chomsky hierarchy. Typical
|
|
compiler design books discuss the theory at length.]
|
|
<p>A utility such as CUP builds a parser from a specification file the
|
|
programmer writes to define the syntactic structure in the desired language.
|
|
For the fictitious new language called <i>pronto, </i>the programmer might
|
|
write the specification file as 'pronto.cup'. Then the command line
|
|
operation 'java java_cup.Main < pronto.cup' will produce several files
|
|
one of which is a java version of the desired syntactic analyzer, "parser.java'.
|
|
<p>In its most primitive deployment, the parser indicates that the source
|
|
file is either grammatically correct or not - again, a boolean yes or no.
|
|
The family of parsers under discussion can do an additional task - whenever
|
|
a valid grammatical construct is found, perform any code that the programmer
|
|
cares to encode. This 'hook' is typically used to support target code generation
|
|
in two execution styles:
|
|
<br>generated code written to a file to be executed later
|
|
<br>generated code to be executed while the parser operates
|
|
<h3>
|
|
Application Specific Languages</h3>
|
|
The compiler construction tools under discussion can be used to develop
|
|
a full-blown language translator e.g. for C, Pascal, FORTRAN, Perl, etc.
|
|
These would comprise major development projects. Here I'd like to discuss
|
|
translators for 'Application Specific Languages', typically a more modest
|
|
project, yet quite useful. I'll define an 'Application Specific Language'
|
|
(ASL) operationally, by means of two examples.
|
|
<p><b>Example 1 - A generalized industrial control language</b>
|
|
<p>Let's say Fred works for a company that produces industrial controllers,
|
|
which are driven by a computer. When Fred is hired, the company has already
|
|
developed and deployed a powerful, general pupose control language based
|
|
on generalized, parallel state machines. However, customers must become
|
|
programmers to use the controller; customers formally trained as chemical
|
|
engineers, mechanical engineers, technicians etc. with little desire or
|
|
time to learn a new general purpose programming language. The product is
|
|
very general pupose, useful in many niche industries - automotive, petroleum,
|
|
logging mills, satellite control, etc.
|
|
<p>Fred has been hired to put a front end on the language for every one
|
|
of the exploitable niche markets. In each case, the front end is to be
|
|
tailored to the terminology used by the niche market customer and to be
|
|
easy to use. The front end might be of the 'fill in the blanks' variety,
|
|
a GUI, whatever. The front ends are really new languages all with the same
|
|
target language (the general purpose control langauge). Each front end
|
|
exemplifies an ASL. By using the compiler construction tools (e.g.
|
|
JFlex and CUP), Fred capitalizes on several benefits including:
|
|
<ul>
|
|
<li>
|
|
the capability of checking the customer's source code for lexical and syntactic
|
|
errors and returning meaningful error messages</li>
|
|
|
|
<li>
|
|
the development of each ASL will have certain similarities</li>
|
|
|
|
<li>
|
|
the specification files for the lexer and parser are highly maintainable</li>
|
|
</ul>
|
|
<b>Example 2 - A generalized Fuzzy Logic based decision package</b>
|
|
<p>Fuzzy Logic has proved useful, not only in its traditional role in industrial
|
|
control, but also in a decision making role. It can be used to play the
|
|
stock market (and lose money more elegantly?), to choose from among a corporation's
|
|
research or marketing strategies, to aid in avalanche prediction, etc.
|
|
<p>Let's suppose that Fred's significant other, Sally, has programmed a
|
|
general pupose Fuzzy Logic decision maker. To apply it to different problems
|
|
it is initialized from appropriately different initialization data files.
|
|
Sally markets this product to various niches, but finds former customers
|
|
a constant burden on her time. The problem is really inherent in the way
|
|
Fuzzy Logic works. The customer is the expert in his/her particular problem
|
|
space. A Fuzzy Logic model is initialized by incorporating the expertise
|
|
of the user. The user gains more expertise as the model is used and will
|
|
constantly want to tweak the model. The crux of Sally's problem is that
|
|
the initialization file must be prepared with great programming care. The
|
|
customers are not programmers and easily make mistakes (most likely syntactic)
|
|
in preparing such a data file. They always run into problems, call on Sally
|
|
for help, and expect her to do it at a fairly low margin. She must respond
|
|
to maintain her reputation and, hence, her financila success.
|
|
<p>Her solution is to make an 'idiot-proof' front end that will automate
|
|
writing the initialization data file. The front end is tailored to the
|
|
niche customer's terminology and problem space. The front end is an ASL
|
|
with the initialization data file as the target language. The translator
|
|
can be written with the help of the compiler construction tools, just as
|
|
done by Fred for the industrial control scenario. The translator guarantees
|
|
that the lexical and syntactic content of the data file is correct.
|
|
<p>If the front end is cleverly defined the customer will find it useful.
|
|
Note that the customer is an expert on the problem semantics, where programmer
|
|
Sally would be weakest. The customer will solve the semantic problems,
|
|
and the ASL translator will avoid the lexical and syntactic problems related
|
|
to the initialization data file.
|
|
<p><b>Upshot</b>
|
|
<p>The two preceding examples have an obvious common thread. It should
|
|
be emphasized that in designing the front end ASL's, Fred and Sally must
|
|
interact strongly with the niche customers. After all, the ASL is to be
|
|
useful to a programming novice who, at the same time, has expertise in
|
|
the problem space. If this interaction fails, the ASL will not be well
|
|
received and may fail in its intended market.
|
|
<p>The Fuzzy Logic example is, in fact, the course project for this quarter's
|
|
compiler design class - assuming Sally doesn't beat us to it.
|
|
|
|
<!--===================================================================-->
|
|
<P> <hr> <P>
|
|
<center><H5>Copyright © 1999, Richard A. Sevenich<BR>
|
|
Published in Issue 39 of <i>Linux Gazette</i>, April 1999</H5></center>
|
|
|
|
<!--===================================================================-->
|
|
<P> <hr> <P>
|
|
<A HREF="./index.html"><IMG ALIGN=BOTTOM SRC="../gx/indexnew.gif"
|
|
ALT="[ TABLE OF CONTENTS ]"></A>
|
|
<A HREF="../index.html"><IMG ALIGN=BOTTOM SRC="../gx/homenew.gif"
|
|
ALT="[ FRONT PAGE ]"></A>
|
|
<A HREF="./bullington.html"><IMG SRC="../gx/back2.gif"
|
|
ALT=" Back "></A>
|
|
<A HREF="./marsden.html"><IMG SRC="../gx/fwd.gif" ALT=" Next "></A>
|
|
<P> <hr> <P>
|
|
<!--startcut ==========================================================-->
|
|
</BODY>
|
|
</HTML>
|
|
<!--endcut ============================================================-->
|