1375 lines
39 KiB
Plaintext
1375 lines
39 KiB
Plaintext
|
Lex and YACC primer/HOWTO
|
|||
|
PowerDNS BV (bert hubert <bert@powerdns.com>)
|
|||
|
v0.8 $Date: 2002/04/20 19:46:46 $
|
|||
|
|
|||
|
This document tries to help you get started using Lex and YACC
|
|||
|
______________________________________________________________________
|
|||
|
|
|||
|
Table of Contents
|
|||
|
|
|||
|
|
|||
|
1. Introduction
|
|||
|
|
|||
|
1.1 What this document is NOT
|
|||
|
1.2 Downloading stuff
|
|||
|
1.3 License
|
|||
|
|
|||
|
2. What Lex & YACC can do for you
|
|||
|
|
|||
|
2.1 What each program does on its own
|
|||
|
|
|||
|
3. Lex
|
|||
|
|
|||
|
3.1 Regular expressions in matches
|
|||
|
3.2 A more complicated example for a C like syntax
|
|||
|
3.3 What we've seen
|
|||
|
|
|||
|
4. YACC
|
|||
|
|
|||
|
4.1 A simple thermostat controller
|
|||
|
4.1.1 A complete YACC file
|
|||
|
4.1.2 Compiling & running the thermostat controller
|
|||
|
4.2 Expanding the thermostat to handle parameters
|
|||
|
4.3 Parsing a configuration file
|
|||
|
|
|||
|
5. Making a Parser in C++
|
|||
|
|
|||
|
6. How do Lex and YACC work internally
|
|||
|
|
|||
|
6.1 Token values
|
|||
|
6.2 Recursion: 'right is wrong'
|
|||
|
6.3 Advanced yylval: %union
|
|||
|
|
|||
|
7. Debugging
|
|||
|
|
|||
|
7.1 The state machine
|
|||
|
7.2 Conflicts: 'shift/reduce', 'reduce/reduce'
|
|||
|
|
|||
|
8. Further reading
|
|||
|
|
|||
|
9. Acknowledgements & Thanks
|
|||
|
|
|||
|
|
|||
|
|
|||
|
______________________________________________________________________
|
|||
|
|
|||
|
1. Introduction
|
|||
|
|
|||
|
Welcome, gentle reader.
|
|||
|
|
|||
|
If you have been programming for any length of time in a Unix
|
|||
|
environment, you will have encountered the mystical programs Lex &
|
|||
|
YACC, or as they are known to GNU/Linux users worldwide, Flex & Bison,
|
|||
|
where Flex is a Lex implementation by Vern Paxon and Bison the GNU
|
|||
|
version of YACC. We will call these programs Lex and YACC throughout -
|
|||
|
the newer versions are upwardly compatible, so you can use Flex and
|
|||
|
Bison when trying our examples.
|
|||
|
These programs are massively useful, but as with your C compiler,
|
|||
|
their manpage does not explain the language they understand, nor how
|
|||
|
to use them. YACC is really amazing when used in combination with
|
|||
|
Lex, however, the Bison manpage does not describe how to integrate Lex
|
|||
|
generated code with your Bison program.
|
|||
|
|
|||
|
|
|||
|
1.1. What this document is NOT
|
|||
|
|
|||
|
There are several great books which deal with Lex & YACC. By all means
|
|||
|
read these books if you need to know more. They provide far more
|
|||
|
information than we ever will. See the 'Further Reading' section at
|
|||
|
the end. This document is aimed at bootstrapping your use of Lex &
|
|||
|
YACC, to allow you to create your first programs.
|
|||
|
|
|||
|
The documentation that comes with Flex and BISON is also excellent,
|
|||
|
but no tutorial. They do complement my HOWTO very well though. They
|
|||
|
too are referenced at the end.
|
|||
|
|
|||
|
I am by no means a YACC/Lex expert. When I started writing this
|
|||
|
document, I had exactly two days of experience. All I want to
|
|||
|
accomplish is to make those two days easier for you.
|
|||
|
|
|||
|
In no way expect the HOWTO to show proper YACC and Lex style. Examples
|
|||
|
have been kept very simple and there may be better ways to write them.
|
|||
|
If you know how to, please let me know.
|
|||
|
|
|||
|
1.2. Downloading stuff
|
|||
|
|
|||
|
Please note that you can download all the examples shown, which are in
|
|||
|
machine readable form. See the homepage <http://ds9a.nl/lex-yacc> for
|
|||
|
details.
|
|||
|
|
|||
|
|
|||
|
1.3. License
|
|||
|
|
|||
|
Copyright (c) 2001 by bert hubert. This material may be distributed
|
|||
|
only subject to the terms and conditions set forth in the Open
|
|||
|
Publication License, vX.Y or later (the latest version is presently
|
|||
|
available at http://www.opencontent.org/openpub/).
|
|||
|
|
|||
|
2. What Lex & YACC can do for you
|
|||
|
|
|||
|
When properly used, these programs allow you to parse complex
|
|||
|
languages with ease. This is a great boon when you want to read a
|
|||
|
configuration file, or want to write a compiler for any language you
|
|||
|
(or anyone else) might have invented.
|
|||
|
|
|||
|
With a little help, which this document will hopefully provide, you
|
|||
|
will find that you will never write a parser again by hand - Lex &
|
|||
|
YACC are the tools to do this.
|
|||
|
|
|||
|
|
|||
|
2.1. What each program does on its own
|
|||
|
|
|||
|
Although these programs shine when used together, they each serve a
|
|||
|
different purpose. The next chapter will explain what each part does.
|
|||
|
|
|||
|
|
|||
|
3. Lex
|
|||
|
|
|||
|
The program Lex generates a so called `Lexer'. This is a function that
|
|||
|
takes a stream of characters as its input, and whenever it sees a
|
|||
|
group of characters that match a key, takes a certain action. A very
|
|||
|
simple example:
|
|||
|
|
|||
|
%{
|
|||
|
#include <stdio.h>
|
|||
|
%}
|
|||
|
|
|||
|
%%
|
|||
|
stop printf("Stop command received\n");
|
|||
|
start printf("Start command received\n");
|
|||
|
%%
|
|||
|
|
|||
|
|
|||
|
|
|||
|
The first section, in between the %{ and %} pair is included directly
|
|||
|
in the output program. We need this, because we use printf later on,
|
|||
|
which is defined in stdio.h.
|
|||
|
|
|||
|
Sections are separated using '%%', so the first line of the second
|
|||
|
section starts with the 'stop' key. Whenever the 'stop' key is
|
|||
|
encountered in the input, the rest of the line (a printf() call) is
|
|||
|
executed.
|
|||
|
|
|||
|
Besides 'stop', we've also defined 'start', which otherwise does
|
|||
|
mostly the same.
|
|||
|
|
|||
|
We terminate the code section with '%%' again.
|
|||
|
|
|||
|
To compile Example 1, do this:
|
|||
|
|
|||
|
|
|||
|
lex example1.l
|
|||
|
cc lex.yy.c -o example1 -ll
|
|||
|
|
|||
|
|
|||
|
|
|||
|
NOTE: If you are using flex, instead of lex, you may have to change
|
|||
|
'-ll' to '-lfl' in the compilation scripts. RedHat 6.x and SuSE need
|
|||
|
this, even when you invoke 'flex' as 'lex'!
|
|||
|
|
|||
|
|
|||
|
This will generate the file 'example1'. If you run it, it waits for
|
|||
|
you to type some input. Whenever you type something that is not
|
|||
|
matched by any of the defined keys (ie, 'stop' and 'start') it's
|
|||
|
output again. If you enter 'stop' it will output 'Stop command
|
|||
|
received';
|
|||
|
|
|||
|
Terminate with a EOF (^D).
|
|||
|
|
|||
|
You may wonder how the program runs, as we didn't define a main()
|
|||
|
function. This function is defined for you in libl (liblex) which we
|
|||
|
compiled in with the -ll command.
|
|||
|
|
|||
|
|
|||
|
3.1. Regular expressions in matches
|
|||
|
|
|||
|
This example wasn't very useful in itself, and our next one won't be
|
|||
|
either. It will however show how to use regular expressions in Lex,
|
|||
|
which are massively useful later on.
|
|||
|
|
|||
|
Example 2:
|
|||
|
|
|||
|
|
|||
|
|
|||
|
%{
|
|||
|
#include <stdio.h>
|
|||
|
%}
|
|||
|
|
|||
|
%%
|
|||
|
[0123456789]+ printf("NUMBER\n");
|
|||
|
[a-zA-Z][a-zA-Z0-9]* printf("WORD\n");
|
|||
|
%%
|
|||
|
|
|||
|
|
|||
|
|
|||
|
This Lex file describes two kinds of matches (tokens): WORDs and
|
|||
|
NUMBERs. Regular expressions can be pretty daunting but with only a
|
|||
|
little work it is easy to understand them. Let's examine the NUMBER
|
|||
|
match:
|
|||
|
|
|||
|
[0123456789]+
|
|||
|
|
|||
|
This says: a sequence of one or more characters from the group
|
|||
|
0123456789. We could also have written it shorter as:
|
|||
|
|
|||
|
[0-9]+
|
|||
|
|
|||
|
Now, the WORD match is somewhat more involved:
|
|||
|
|
|||
|
[a-zA-Z][a-zA-Z0-9]*
|
|||
|
|
|||
|
The first part matches 1 and only 1 character that is between 'a' and
|
|||
|
'z', or between 'A' and 'Z'. In other words, a letter. This initial
|
|||
|
letter then needs to be followed by zero or more characters which are
|
|||
|
either a letter or a digit. Why use an asterisk here? The '+'
|
|||
|
signifies 1 or more matches, but a WORD might very well consist of
|
|||
|
only one character, which we've already matched. So the second part
|
|||
|
may have zero matches, so we write a '*'.
|
|||
|
|
|||
|
This way, we've mimicked the behaviour of many programming languages
|
|||
|
which demand that a variable name *must* start with a letter, but can
|
|||
|
contain digits afterwards. In other words, 'temperature1' is a valid
|
|||
|
name, but '1temperature' is not.
|
|||
|
|
|||
|
Try compiling Example 2, lust like Example 1, and feed it some text.
|
|||
|
Here is a sample session:
|
|||
|
|
|||
|
|
|||
|
|
|||
|
$ ./example2
|
|||
|
foo
|
|||
|
WORD
|
|||
|
|
|||
|
bar
|
|||
|
WORD
|
|||
|
|
|||
|
123
|
|||
|
NUMBER
|
|||
|
|
|||
|
bar123
|
|||
|
WORD
|
|||
|
|
|||
|
123bar
|
|||
|
NUMBER
|
|||
|
WORD
|
|||
|
|
|||
|
|
|||
|
|
|||
|
You may also be wondering where all this whitespace is coming from in
|
|||
|
the output. The reason is simple: it was in the input, and we don't
|
|||
|
match on it anywhere, so it gets output again.
|
|||
|
|
|||
|
The Flex manpage documents its regular expressions in detail. Many
|
|||
|
people feel that the perl regular expression manpage (perlre) is also
|
|||
|
very useful, although Flex does not implement everything perl does.
|
|||
|
|
|||
|
Make sure that you do not create zero length matches like '[0-9]*' -
|
|||
|
your lexer might get confused and start matching empty strings
|
|||
|
repeatedly.
|
|||
|
|
|||
|
3.2. A more complicated example for a C like syntax
|
|||
|
|
|||
|
Let's say we want to parse a file that looks like this:
|
|||
|
|
|||
|
|
|||
|
logging {
|
|||
|
category lame-servers { null; };
|
|||
|
category cname { null; };
|
|||
|
};
|
|||
|
|
|||
|
zone "." {
|
|||
|
type hint;
|
|||
|
file "/etc/bind/db.root";
|
|||
|
};
|
|||
|
|
|||
|
|
|||
|
|
|||
|
We clearly see a number of categories (tokens) in this file:
|
|||
|
|
|||
|
<20> WORDs, like 'zone' and 'type'
|
|||
|
|
|||
|
<20> FILENAMEs, like '/etc/bind/db.root'
|
|||
|
|
|||
|
<20> QUOTEs, like those surrounding the filename
|
|||
|
|
|||
|
<20> OBRACEs, {
|
|||
|
|
|||
|
<20> EBRACEs, }
|
|||
|
|
|||
|
<20> SEMICOLONs, ;
|
|||
|
|
|||
|
The corresponding Lex file is Example 3:
|
|||
|
|
|||
|
|
|||
|
%{
|
|||
|
#include <stdio.h>
|
|||
|
%}
|
|||
|
|
|||
|
%%
|
|||
|
[a-zA-Z][a-zA-Z0-9]* printf("WORD ");
|
|||
|
[a-zA-Z0-9\/.-]+ printf("FILENAME ");
|
|||
|
\" printf("QUOTE ");
|
|||
|
\{ printf("OBRACE ");
|
|||
|
\} printf("EBRACE ");
|
|||
|
; printf("SEMICOLON ");
|
|||
|
\n printf("\n");
|
|||
|
[ \t]+ /* ignore whitespace */;
|
|||
|
%%
|
|||
|
|
|||
|
|
|||
|
|
|||
|
When we feed our file to the program this Lex file generates (using
|
|||
|
example3.compile), we get:
|
|||
|
|
|||
|
|
|||
|
|
|||
|
WORD OBRACE
|
|||
|
WORD FILENAME OBRACE WORD SEMICOLON EBRACE SEMICOLON
|
|||
|
WORD WORD OBRACE WORD SEMICOLON EBRACE SEMICOLON
|
|||
|
EBRACE SEMICOLON
|
|||
|
|
|||
|
WORD QUOTE FILENAME QUOTE OBRACE
|
|||
|
WORD WORD SEMICOLON
|
|||
|
WORD QUOTE FILENAME QUOTE SEMICOLON
|
|||
|
EBRACE SEMICOLON
|
|||
|
|
|||
|
|
|||
|
|
|||
|
When compared with the configuration file mentioned above, it is clear
|
|||
|
that we have neatly 'Tokenized' it. Each part of the configuration
|
|||
|
file has been matched, and converted into a token.
|
|||
|
|
|||
|
And this is exactly what we need to put YACC to good use.
|
|||
|
|
|||
|
|
|||
|
3.3. What we've seen
|
|||
|
|
|||
|
We've seen that Lex is able to read arbitrary input, and determine
|
|||
|
what each part of the input is. This is called 'Tokenizing'.
|
|||
|
|
|||
|
|
|||
|
4. YACC
|
|||
|
|
|||
|
YACC can parse input streams consisting of tokens with certain values.
|
|||
|
This clearly describes the relation YACC has with Lex, YACC has no
|
|||
|
idea what 'input streams' are, it needs preprocessed tokens. While you
|
|||
|
can write your own Tokenizer, we will leave that entirely up to Lex.
|
|||
|
|
|||
|
A note on grammars and parsers. When YACC saw the light of day, the
|
|||
|
tool was used to parse input files for compilers: programs. Programs
|
|||
|
written in a programming language for computers are typically *not*
|
|||
|
ambiguous - they have just one meaning. As such, YACC does not cope
|
|||
|
with ambiguity and will complain about shift/reduce or reduce/reduce
|
|||
|
conflicts. More about ambiguity and YACC "problems" can be found in
|
|||
|
'Conflicts' chapter.
|
|||
|
|
|||
|
|
|||
|
|
|||
|
4.1. A simple thermostat controller
|
|||
|
|
|||
|
Let's say we have a thermostat that we want to control using a simple
|
|||
|
language. A session with the thermostat may look like this:
|
|||
|
|
|||
|
|
|||
|
|
|||
|
heat on
|
|||
|
Heater on!
|
|||
|
heat off
|
|||
|
Heater off!
|
|||
|
target temperature 22
|
|||
|
New temperature set!
|
|||
|
|
|||
|
|
|||
|
|
|||
|
The tokens we need to recognize are: heat, on/off (STATE), target,
|
|||
|
temperature, NUMBER.
|
|||
|
|
|||
|
The Lex tokenizer (Example 4) is:
|
|||
|
|
|||
|
|
|||
|
%{
|
|||
|
#include <stdio.h>
|
|||
|
#include "y.tab.h"
|
|||
|
%}
|
|||
|
%%
|
|||
|
[0-9]+ return NUMBER;
|
|||
|
heat return TOKHEAT;
|
|||
|
on|off return STATE;
|
|||
|
target return TOKTARGET;
|
|||
|
temperature return TOKTEMPERATURE;
|
|||
|
\n /* ignore end of line */;
|
|||
|
[ \t]+ /* ignore whitespace */;
|
|||
|
%%
|
|||
|
|
|||
|
|
|||
|
|
|||
|
We note two important changes. First, we include the file 'y.tab.h',
|
|||
|
and secondly, we no longer print stuff, we return names of tokens.
|
|||
|
This change is because we are now feeding it all to YACC, which isn't
|
|||
|
interested in what we output to the screen. Y.tab.h has definitions
|
|||
|
for these tokens.
|
|||
|
|
|||
|
But where does y.tab.h come from? It is generated by YACC from the
|
|||
|
Grammar File we are about to create. As our language is very basic, so
|
|||
|
is the grammar:
|
|||
|
|
|||
|
|
|||
|
|
|||
|
commands: /* empty */
|
|||
|
| commands command
|
|||
|
;
|
|||
|
|
|||
|
command:
|
|||
|
heat_switch
|
|||
|
|
|
|||
|
target_set
|
|||
|
;
|
|||
|
|
|||
|
heat_switch:
|
|||
|
TOKHEAT STATE
|
|||
|
{
|
|||
|
printf("\tHeat turned on or off\n");
|
|||
|
}
|
|||
|
;
|
|||
|
|
|||
|
target_set:
|
|||
|
TOKTARGET TOKTEMPERATURE NUMBER
|
|||
|
{
|
|||
|
printf("\tTemperature set\n");
|
|||
|
}
|
|||
|
;
|
|||
|
|
|||
|
|
|||
|
|
|||
|
The first part is what I call the 'root'. It tells us that we have
|
|||
|
'commands', and that these commands consist of individual 'command'
|
|||
|
parts. As you can see this rule is very recursive, because it again
|
|||
|
contains the word 'commands'. What this means is that the program is
|
|||
|
now capable of reducing a series of commands one by one. Read the
|
|||
|
chapter 'How do Lex and YACC work internally' for important details on
|
|||
|
recursion.
|
|||
|
|
|||
|
The second rule defines what a command is. We support only two kinds
|
|||
|
of commands, the 'heat_switch' and the 'target_set'. This is what the
|
|||
|
|-symbol signifies - 'a command consists of either a heat_switch or a
|
|||
|
target_set'.
|
|||
|
|
|||
|
A heat_switch consists of the HEAT token, which is simply the word
|
|||
|
'heat', followed by a state (which we defined in the Lex file as 'on'
|
|||
|
or 'off').
|
|||
|
|
|||
|
Somewhat more complicated is the target_set, which consists of the
|
|||
|
TARGET token (the word 'target'), the TEMPERATURE token (the word
|
|||
|
'temperature') and a number.
|
|||
|
|
|||
|
|
|||
|
4.1.1. A complete YACC file
|
|||
|
|
|||
|
The previous section only showed the grammar part of the YACC file,
|
|||
|
but there is more. This is the header that we omitted:
|
|||
|
|
|||
|
|
|||
|
|
|||
|
%{
|
|||
|
#include <stdio.h>
|
|||
|
#include <string.h>
|
|||
|
|
|||
|
void yyerror(const char *str)
|
|||
|
{
|
|||
|
fprintf(stderr,"error: %s\n",str);
|
|||
|
}
|
|||
|
|
|||
|
int yywrap()
|
|||
|
{
|
|||
|
return 1;
|
|||
|
}
|
|||
|
|
|||
|
main()
|
|||
|
{
|
|||
|
yyparse();
|
|||
|
}
|
|||
|
|
|||
|
%}
|
|||
|
|
|||
|
%token NUMBER TOKHEAT STATE TOKTARGET TOKTEMPERATURE
|
|||
|
|
|||
|
|
|||
|
|
|||
|
The yyerror() function is called by YACC if it finds an error. We sim<69>
|
|||
|
ply output the message passed, but there are smarter things to do. See
|
|||
|
the 'Further reading' section at the end.
|
|||
|
|
|||
|
The function yywrap() can be used to continue reading from another
|
|||
|
file. It is called at EOF and you can than open another file, and
|
|||
|
return 0. Or you can return 1, indicating that this is truly the end.
|
|||
|
For more about this, see the 'How do Lex and YACC work internally'
|
|||
|
chapter.
|
|||
|
|
|||
|
Then there is the main() function, that does nothing but set
|
|||
|
everything in motion.
|
|||
|
|
|||
|
The last line simply defines the tokens we will be using. These are
|
|||
|
output using y.tab.h if YACC is invoked with the '-d' option.
|
|||
|
4.1.2. Compiling & running the thermostat controller
|
|||
|
|
|||
|
|
|||
|
|
|||
|
lex example4.l
|
|||
|
yacc -d example4.y
|
|||
|
cc lex.yy.c y.tab.c -o example4
|
|||
|
|
|||
|
|
|||
|
|
|||
|
A few things have changed. We now also invoke YACC to compile our
|
|||
|
grammar, which creates y.tab.c and y.tab.h. We then call Lex as usual.
|
|||
|
When compiling, we remove the -ll flag: we now have our own main()
|
|||
|
function and don't need the one provided by libl.
|
|||
|
|
|||
|
|
|||
|
NOTE: if you get an error about your compiler not being able to find
|
|||
|
'yylval', add this to example4.l, just beneath #include <y.tab.h>:
|
|||
|
|
|||
|
extern YYSTYPE yylval;
|
|||
|
|
|||
|
|
|||
|
This is explained in the 'How Lex and YACC work internally' section.
|
|||
|
|
|||
|
|
|||
|
A sample session:
|
|||
|
|
|||
|
|
|||
|
$ ./example4
|
|||
|
heat on
|
|||
|
Heat turned on or off
|
|||
|
heat off
|
|||
|
Heat turned on or off
|
|||
|
target temperature 10
|
|||
|
Temperature set
|
|||
|
target humidity 20
|
|||
|
error: parse error
|
|||
|
$
|
|||
|
|
|||
|
|
|||
|
|
|||
|
This is not quite what we set out to achieve, but in the interest of
|
|||
|
keeping the learning curve manageable, not all cool stuff can be
|
|||
|
presented at once.
|
|||
|
|
|||
|
|
|||
|
4.2. Expanding the thermostat to handle parameters
|
|||
|
|
|||
|
As we've seen, we now parse the thermostat commands correctly, and
|
|||
|
even flag mistakes properly. But as you might have guessed by the
|
|||
|
weasely wording, the program has no idea of what it should do, it does
|
|||
|
not get passed any of the values you enter.
|
|||
|
|
|||
|
Let's start by adding the ability to read the new target temperature.
|
|||
|
In order to do so, we need to learn the NUMBER match in the Lexer to
|
|||
|
convert itself into an integer value, which can then be read in YACC.
|
|||
|
|
|||
|
Whenever Lex matches a target, it puts the text of the match in the
|
|||
|
character string 'yytext'. YACC in turn expects to find a value in the
|
|||
|
variable 'yylval'. In Example 5, we see the obvious solution:
|
|||
|
|
|||
|
|
|||
|
|
|||
|
%{
|
|||
|
#include <stdio.h>
|
|||
|
#include "y.tab.h"
|
|||
|
%}
|
|||
|
%%
|
|||
|
[0-9]+ yylval=atoi(yytext); return NUMBER;
|
|||
|
heat return TOKHEAT;
|
|||
|
on|off yylval=!strcmp(yytext,"on"); return STATE;
|
|||
|
target return TOKTARGET;
|
|||
|
temperature return TOKTEMPERATURE;
|
|||
|
\n /* ignore end of line */;
|
|||
|
[ \t]+ /* ignore whitespace */;
|
|||
|
%%
|
|||
|
|
|||
|
|
|||
|
|
|||
|
As you can see, we run atoi() on yytext, and put the result in yylval,
|
|||
|
where YACC can see it. We do much the same for the STATE match, where
|
|||
|
we compare it to 'on', and set yylval to 1 if it is equal. Please note
|
|||
|
that having a separate 'on' and 'off' match in Lex would produce
|
|||
|
faster code, but I wanted to show a more complicated rule and action
|
|||
|
for a change.
|
|||
|
|
|||
|
Now we need to learn YACC how to deal with this. What is called
|
|||
|
'yylval' in Lex has a different name in YACC. Let's examine the rule
|
|||
|
setting the new temperature target:
|
|||
|
|
|||
|
|
|||
|
|
|||
|
target_set:
|
|||
|
TOKTARGET TOKTEMPERATURE NUMBER
|
|||
|
{
|
|||
|
printf("\tTemperature set to %d\n",$3);
|
|||
|
}
|
|||
|
;
|
|||
|
|
|||
|
|
|||
|
|
|||
|
To access the value of the third part of the rule (ie, NUMBER), we
|
|||
|
need to use $3. Whenever yylex() returns, the contents of yylval are
|
|||
|
attached to the terminal, the value of which can be accessed with the
|
|||
|
$-construct.
|
|||
|
|
|||
|
To expound on this further, let's observe the new 'heat_switch' rule:
|
|||
|
|
|||
|
|
|||
|
|
|||
|
heat_switch:
|
|||
|
TOKHEAT STATE
|
|||
|
{
|
|||
|
if($2)
|
|||
|
printf("\tHeat turned on\n");
|
|||
|
else
|
|||
|
printf("\tHeat turned off\n");
|
|||
|
}
|
|||
|
;
|
|||
|
|
|||
|
|
|||
|
|
|||
|
If you now run example5, it properly outputs what you entered.
|
|||
|
|
|||
|
|
|||
|
|
|||
|
4.3. Parsing a configuration file
|
|||
|
|
|||
|
Let's repeat part of the configuration file we mentioned earlier:
|
|||
|
|
|||
|
|
|||
|
zone "." {
|
|||
|
type hint;
|
|||
|
file "/etc/bind/db.root";
|
|||
|
};
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Remember that we already wrote a Lexer for this file. Now all we need
|
|||
|
to do is write the YACC grammar, and modify the Lexer so it returns
|
|||
|
values in a format YACC can understand.
|
|||
|
|
|||
|
In the lexer from Example 6 we see:
|
|||
|
|
|||
|
|
|||
|
%{
|
|||
|
#include <stdio.h>
|
|||
|
#include "y.tab.h"
|
|||
|
%}
|
|||
|
|
|||
|
%%
|
|||
|
|
|||
|
zone return ZONETOK;
|
|||
|
file return FILETOK;
|
|||
|
[a-zA-Z][a-zA-Z0-9]* yylval=strdup(yytext); return WORD;
|
|||
|
[a-zA-Z0-9\/.-]+ yylval=strdup(yytext); return FILENAME;
|
|||
|
\" return QUOTE;
|
|||
|
\{ return OBRACE;
|
|||
|
\} return EBRACE;
|
|||
|
; return SEMICOLON;
|
|||
|
\n /* ignore EOL */;
|
|||
|
[ \t]+ /* ignore whitespace */;
|
|||
|
%%
|
|||
|
|
|||
|
|
|||
|
|
|||
|
If you look carefully, you can see that yylval has changed! We no
|
|||
|
longer expect it to be an integer, but in fact assume that it is a
|
|||
|
char *. In the interest of keeping things simple, we invoke strdup and
|
|||
|
waste a lot of memory. Please note that this may not be a problem in
|
|||
|
many areas where you only need to parse a file once, and then exit.
|
|||
|
|
|||
|
We want to store character strings because we are now mostly dealing
|
|||
|
with names: file names and zone names. In a later chapter we will
|
|||
|
explain how to deal with multiple types of data.
|
|||
|
|
|||
|
In order to tell YACC about the new type of yylval, we add this line
|
|||
|
to the header of our YACC grammar:
|
|||
|
|
|||
|
|
|||
|
#define YYSTYPE char *
|
|||
|
|
|||
|
|
|||
|
|
|||
|
The grammar itself is again more complicated. We chop it in parts to
|
|||
|
make it easier to digest.
|
|||
|
|
|||
|
|
|||
|
|
|||
|
commands:
|
|||
|
|
|
|||
|
commands command SEMICOLON
|
|||
|
;
|
|||
|
|
|||
|
|
|||
|
command:
|
|||
|
zone_set
|
|||
|
;
|
|||
|
|
|||
|
zone_set:
|
|||
|
ZONETOK quotedname zonecontent
|
|||
|
{
|
|||
|
printf("Complete zone for '%s' found\n",$2);
|
|||
|
}
|
|||
|
;
|
|||
|
|
|||
|
|
|||
|
|
|||
|
This is the intro, including the aforementioned recursive 'root'.
|
|||
|
Please note that we specify that commands are terminated (and
|
|||
|
separated) by ;'s. We define one kind of command, the 'zone_set'. It
|
|||
|
consists of the ZONE token (the word 'zone'), followed by a quoted
|
|||
|
name and the 'zonecontent'. This zonecontent starts out simple enough:
|
|||
|
|
|||
|
|
|||
|
|
|||
|
zonecontent:
|
|||
|
OBRACE zonestatements EBRACE
|
|||
|
|
|||
|
|
|||
|
|
|||
|
It needs to start with an OBRACE, a {. Then follow the zonestatements,
|
|||
|
followed by an EBRACE, }.
|
|||
|
|
|||
|
|
|||
|
|
|||
|
quotedname:
|
|||
|
QUOTE FILENAME QUOTE
|
|||
|
{
|
|||
|
$$=$2;
|
|||
|
}
|
|||
|
|
|||
|
|
|||
|
|
|||
|
This section defines what a 'quotedname' is: a FILENAME between
|
|||
|
QUOTEs. Then it says something special: the value of a quotedname
|
|||
|
token is the value of the FILENAME. This means that the quotedname has
|
|||
|
as its value the filename without quotes.
|
|||
|
|
|||
|
This is what the magic '$$=$2;' command does. It says: my value is the
|
|||
|
value of my second part. When the quotedname is now referenced in
|
|||
|
other rules, and you access its value with the $-construct, you see
|
|||
|
the value that we set here with $$=$2.
|
|||
|
|
|||
|
|
|||
|
NOTE: this grammar chokes on filenames without either a '.' or a '/'
|
|||
|
in them.
|
|||
|
|
|||
|
|
|||
|
|
|||
|
zonestatements:
|
|||
|
|
|
|||
|
zonestatements zonestatement SEMICOLON
|
|||
|
;
|
|||
|
|
|||
|
zonestatement:
|
|||
|
statements
|
|||
|
|
|
|||
|
FILETOK quotedname
|
|||
|
{
|
|||
|
printf("A zonefile name '%s' was encountered\n", $2);
|
|||
|
}
|
|||
|
;
|
|||
|
|
|||
|
|
|||
|
|
|||
|
This is a generic statement that catches all kinds of statements
|
|||
|
within the 'zone' block. We again see the recursiveness.
|
|||
|
|
|||
|
|
|||
|
|
|||
|
block:
|
|||
|
OBRACE zonestatements EBRACE SEMICOLON
|
|||
|
;
|
|||
|
|
|||
|
statements:
|
|||
|
| statements statement
|
|||
|
;
|
|||
|
|
|||
|
statement: WORD | block | quotedname
|
|||
|
|
|||
|
|
|||
|
|
|||
|
This defines a block, and 'statements' which may be found within.
|
|||
|
|
|||
|
When executed, the output is like this:
|
|||
|
|
|||
|
|
|||
|
|
|||
|
$ ./example6
|
|||
|
zone "." {
|
|||
|
type hint;
|
|||
|
file "/etc/bind/db.root";
|
|||
|
type hint;
|
|||
|
};
|
|||
|
A zonefile name '/etc/bind/db.root' was encountered
|
|||
|
Complete zone for '.' found
|
|||
|
|
|||
|
|
|||
|
|
|||
|
5. Making a Parser in C++
|
|||
|
|
|||
|
Although Lex and YACC predate C++, it is possible to generate a C++
|
|||
|
parser. While Flex includes an option to generate a C++ lexer, we
|
|||
|
won't be using that, as YACC doesn't know how to deal with it
|
|||
|
directly.
|
|||
|
|
|||
|
My preferred way to make a C++ parser is to have Lex generate a plain
|
|||
|
C file, and to let YACC generate C++ code. When you then link your
|
|||
|
application, you may run into some problems because the C++ code by
|
|||
|
default won't be able to find C functions, unless you've told it that
|
|||
|
those functions are extern "C".
|
|||
|
To do so, make a C header in YACC like this:
|
|||
|
|
|||
|
|
|||
|
|
|||
|
extern "C"
|
|||
|
{
|
|||
|
int yyparse(void);
|
|||
|
int yylex(void);
|
|||
|
int yywrap()
|
|||
|
{
|
|||
|
return 1;
|
|||
|
}
|
|||
|
|
|||
|
}
|
|||
|
|
|||
|
|
|||
|
|
|||
|
If you want to declare or change yydebug, you must now do it like
|
|||
|
this:
|
|||
|
|
|||
|
|
|||
|
|
|||
|
extern int yydebug;
|
|||
|
|
|||
|
main()
|
|||
|
{
|
|||
|
yydebug=1;
|
|||
|
yyparse();
|
|||
|
}
|
|||
|
|
|||
|
|
|||
|
|
|||
|
This is because C++'s One Definition Rule, which disallows multiple
|
|||
|
definitions of yydebug.
|
|||
|
|
|||
|
You may also find that you need to repeat the #define of YYSTYPE in
|
|||
|
your Lex file, because of C++'s stricter type checking.
|
|||
|
|
|||
|
To compile, do something like this:
|
|||
|
|
|||
|
|
|||
|
lex bindconfig2.l
|
|||
|
yacc --verbose --debug -d bindconfig2.y -o bindconfig2.cc
|
|||
|
cc -c lex.yy.c -o lex.yy.o
|
|||
|
c++ lex.yy.o bindconfig2.cc -o bindconfig2
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Because of the -o statement, y.tab.h is now called bindconfig2.cc.h,
|
|||
|
so take that into account.
|
|||
|
|
|||
|
To summarize: don't bother to compile your Lexer in C++, keep it in C.
|
|||
|
Make your Parser in C++ and explain your compiler that some functions
|
|||
|
are C functions with extern "C" statements.
|
|||
|
|
|||
|
|
|||
|
6. How do Lex and YACC work internally
|
|||
|
|
|||
|
In the YACC file, you write your own main() function, which calls
|
|||
|
yyparse() at one point. The function yyparse() is created for you by
|
|||
|
YACC, and ends up in y.tab.c.
|
|||
|
|
|||
|
|
|||
|
yyparse() reads a stream of token/value pairs from yylex(), which
|
|||
|
needs to be supplied. You can code this function yourself, or have Lex
|
|||
|
do it for you. In our examples, we've chosen to leave this task to
|
|||
|
Lex.
|
|||
|
|
|||
|
The yylex() as written by Lex reads characters from a FILE * file
|
|||
|
pointer called yyin. If you do not set yyin, it defaults to standard
|
|||
|
input. It outputs to yyout, which if unset defaults to stdout. You can
|
|||
|
also modify yyin in the yywrap() function which is called at the end
|
|||
|
of a file. It allows you to open another file, and continue parsing.
|
|||
|
|
|||
|
If this is the case, have it return 0. If you want to end parsing at
|
|||
|
this file, let it return 1.
|
|||
|
|
|||
|
Each call to yylex() returns an integer value which represents a token
|
|||
|
type. This tells YACC what kind of token it has read. The token may
|
|||
|
optionally have a value, which should be placed in the variable
|
|||
|
yylval.
|
|||
|
|
|||
|
By default yylval is of type int, but you can override that from the
|
|||
|
YACC file by re#defining YYSTYPE.
|
|||
|
|
|||
|
The Lexer needs to be able to access yylval. In order to do so, it
|
|||
|
must be declared in the scope of the lexer as an extern variable. The
|
|||
|
original YACC neglects to do this for you, so you should add the
|
|||
|
following to your lexter, just beneath #include <y.tab.h>:
|
|||
|
|
|||
|
|
|||
|
extern YYSTYPE yylval;
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Bison, which most people are using these days, does this for you
|
|||
|
automatically.
|
|||
|
|
|||
|
|
|||
|
6.1. Token values
|
|||
|
|
|||
|
As mentioned before, yylex() needs to return what kind of token it
|
|||
|
encountered, and put its value in yylval. When these tokens are
|
|||
|
defined with the %token command, they are assigned numerical id's,
|
|||
|
starting from 256.
|
|||
|
|
|||
|
Because of that fact, it is possible to have all ascii characters as a
|
|||
|
token. Let's say you are writing a calculator, up till now we would
|
|||
|
have written the lexer like this:
|
|||
|
|
|||
|
|
|||
|
|
|||
|
[0-9]+ yylval=atoi(yytext); return NUMBER;
|
|||
|
[ \n]+ /* eat whitespace */;
|
|||
|
- return MINUS;
|
|||
|
\* return MULT;
|
|||
|
\+ return PLUS;
|
|||
|
...
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Our YACC grammer would then contain:
|
|||
|
|
|||
|
|
|||
|
|
|||
|
exp: NUMBER
|
|||
|
|
|
|||
|
exp PLUS exp
|
|||
|
|
|
|||
|
exp MINUS exp
|
|||
|
|
|
|||
|
exp MULT exp
|
|||
|
|
|||
|
|
|||
|
|
|||
|
This is needlessly complicated. By using characters as shorthands for
|
|||
|
numerical token id's, we can rewrite our lexer like this:
|
|||
|
|
|||
|
|
|||
|
[0-9]+ yylval=atoi(yytext); return NUMBER;
|
|||
|
[ \n]+ /* eat whitespace */;
|
|||
|
. return (int) yytext[0];
|
|||
|
|
|||
|
|
|||
|
|
|||
|
This last dot matches all single otherwise unmatched characters.
|
|||
|
|
|||
|
Our YACC grammer would then be:
|
|||
|
|
|||
|
|
|||
|
|
|||
|
exp: NUMBER
|
|||
|
|
|
|||
|
exp '+' exp
|
|||
|
|
|
|||
|
exp '-' exp
|
|||
|
|
|
|||
|
exp '*' exp
|
|||
|
|
|||
|
|
|||
|
|
|||
|
This is lots shorter and also more obvious. You do not need to declare
|
|||
|
these ascii tokens with %token in the header, they work out of the
|
|||
|
box.
|
|||
|
|
|||
|
One other very good thing about this construct is that Lex will now
|
|||
|
match everything we throw at it - avoiding the default behaviour of
|
|||
|
echoing unmatched input to standard output. If a user of this
|
|||
|
calculator uses a ^, for example, it will now generate a parsing
|
|||
|
error, instead of being echoed to standard output.
|
|||
|
|
|||
|
|
|||
|
6.2. Recursion: 'right is wrong'
|
|||
|
|
|||
|
Recursion is a vital aspect of YACC. Without it, you can't specify
|
|||
|
that a file consists of a sequence of independent commands or
|
|||
|
statements. Out of its own accord, YACC is only interested in the
|
|||
|
first rule, or the one you designate as the starting rule, with the
|
|||
|
'%start' symbol.
|
|||
|
|
|||
|
Recursion in YACC comes in two flavours: right and left. Left
|
|||
|
recursion, which is the one you should use most of the time, looks
|
|||
|
like this:
|
|||
|
|
|||
|
commands: /* empty */
|
|||
|
|
|
|||
|
commands command
|
|||
|
|
|||
|
|
|||
|
This says: a command is either empty, or it consists of more commands,
|
|||
|
followed by a command. They way YACC works means that it can now eas<61>
|
|||
|
ily chop off individual command groups (from the front) and reduce
|
|||
|
them.
|
|||
|
|
|||
|
Compare this to right recursion, which confusingly enough looks better
|
|||
|
to many eyes:
|
|||
|
|
|||
|
commands: /* empty */
|
|||
|
|
|
|||
|
command commands
|
|||
|
|
|||
|
|
|||
|
But this is expensive. If used as the %start rule, it requires YACC to
|
|||
|
keep all commands in your file on the stack, which may take a lot of
|
|||
|
memory. So by all means, use left recursion when parsing long state<74>
|
|||
|
ments, like entire files. Sometimes it is hard to avoid right recur<75>
|
|||
|
sion but if your statements are not too long, you do not need to go
|
|||
|
out of your way to use left recursion.
|
|||
|
|
|||
|
If you have something terminating (and therefore separating) your
|
|||
|
commands, right recursion looks very natural, but is still expensive:
|
|||
|
|
|||
|
commands: /* empty */
|
|||
|
|
|
|||
|
command SEMICOLON commands
|
|||
|
|
|||
|
|
|||
|
|
|||
|
The right way to code this is using left recursion (I didn't invent
|
|||
|
this either):
|
|||
|
|
|||
|
commands: /* empty */
|
|||
|
|
|
|||
|
commands command SEMICOLON
|
|||
|
|
|||
|
|
|||
|
|
|||
|
Earlier versions of this HOWTO mistakenly used right recursion. Markus
|
|||
|
Triska kindly informed us of this.
|
|||
|
|
|||
|
|
|||
|
6.3. Advanced yylval: %union
|
|||
|
|
|||
|
Currently, we need to define *the* type of yylval. This however is not
|
|||
|
always appropriate. There will be times when we need to be able to
|
|||
|
handle multiple data types. Returning to our hypothetical thermostat,
|
|||
|
perhaps we want to be able to choose a heater to control, like this:
|
|||
|
|
|||
|
|
|||
|
|
|||
|
heater mainbuiling
|
|||
|
Selected 'mainbuilding' heater
|
|||
|
target temperature 23
|
|||
|
'mainbuilding' heater target temperature now 23
|
|||
|
|
|||
|
|
|||
|
|
|||
|
What this calls for is for yylval to be a union, which can hold both
|
|||
|
strings and integers - but not simultaneously.
|
|||
|
|
|||
|
Remember that we told YACC previously what type yylval was supposed to
|
|||
|
by by defining YYSTYPE. We could conceivably define YYSTYPE to be a
|
|||
|
union this way, by YACC has an easier method for doing this: the
|
|||
|
%union statement.
|
|||
|
Based on Example 4, we now write the Example 7 YACC grammar. First the
|
|||
|
intro:
|
|||
|
|
|||
|
|
|||
|
%token TOKHEATER TOKHEAT TOKTARGET TOKTEMPERATURE
|
|||
|
|
|||
|
%union
|
|||
|
{
|
|||
|
int number;
|
|||
|
char *string;
|
|||
|
}
|
|||
|
|
|||
|
%token <number> STATE
|
|||
|
%token <number> NUMBER
|
|||
|
%token <string> WORD
|
|||
|
|
|||
|
|
|||
|
|
|||
|
We define our union, which contains only a number and a string. Then
|
|||
|
using an extended %token syntax, we explain to YACC which part of the
|
|||
|
union each token should access.
|
|||
|
|
|||
|
In this case, we let the STATE token use an integer, as before. Same
|
|||
|
goes for the NUMBER token, which we use for reading temperatures.
|
|||
|
|
|||
|
New however is the WORD token, which is declared to need a string.
|
|||
|
|
|||
|
The Lexer file changes a bit too:
|
|||
|
|
|||
|
|
|||
|
%{
|
|||
|
#include <stdio.h>
|
|||
|
#include <string.h>
|
|||
|
#include "y.tab.h"
|
|||
|
%}
|
|||
|
%%
|
|||
|
[0-9]+ yylval.number=atoi(yytext); return NUMBER;
|
|||
|
heater return TOKHEATER;
|
|||
|
heat return TOKHEAT;
|
|||
|
on|off yylval.number=!strcmp(yytext,"on"); return STATE;
|
|||
|
target return TOKTARGET;
|
|||
|
temperature return TOKTEMPERATURE;
|
|||
|
[a-z0-9]+ yylval.string=strdup(yytext);return WORD;
|
|||
|
\n /* ignore end of line */;
|
|||
|
[ \t]+ /* ignore whitespace */;
|
|||
|
%%
|
|||
|
|
|||
|
|
|||
|
|
|||
|
As you can see, we don't access the yylval directly anymore, we add a
|
|||
|
suffix indicating which part we want to access. We don't need to do
|
|||
|
that in the YACC grammar however, as YACC performs the magic for us:
|
|||
|
|
|||
|
|
|||
|
|
|||
|
heater_select:
|
|||
|
TOKHEATER WORD
|
|||
|
{
|
|||
|
printf("\tSelected heater '%s'\n",$2);
|
|||
|
heater=$2;
|
|||
|
}
|
|||
|
;
|
|||
|
|
|||
|
|
|||
|
Because of the %token declaration above, YACC automatically picks the
|
|||
|
'string' member from our union. Note also that we store a copy of $2,
|
|||
|
which is later used to tell the user which heater he is sending
|
|||
|
commands to:
|
|||
|
|
|||
|
|
|||
|
|
|||
|
target_set:
|
|||
|
TOKTARGET TOKTEMPERATURE NUMBER
|
|||
|
{
|
|||
|
printf("\tHeater '%s' temperature set to %d\n",heater,$3);
|
|||
|
}
|
|||
|
;
|
|||
|
|
|||
|
|
|||
|
|
|||
|
For more details, read example7.y.
|
|||
|
|
|||
|
|
|||
|
7. Debugging
|
|||
|
|
|||
|
Especially when learning, it is important to have debugging
|
|||
|
facilities. Luckily, YACC can give a lot of feedback. This feedback
|
|||
|
comes at the cost of some overhead, so you need to supply some
|
|||
|
switches to enable it.
|
|||
|
|
|||
|
When compiling your grammar, add --debug and --verbose to the YACC
|
|||
|
commandline. In your grammar C heading, add the following:
|
|||
|
|
|||
|
int yydebug=1;
|
|||
|
|
|||
|
This will generate the file 'y.output' which explains the state
|
|||
|
machine that was created.
|
|||
|
|
|||
|
When you now run the generated binary, it will output a *lot* of what
|
|||
|
is happening. This includes what state the state machine currently
|
|||
|
has, and what tokens are being read.
|
|||
|
|
|||
|
Peter Jinks wrote a page on debugging
|
|||
|
<http://www.cs.man.ac.uk/~pjj/cs2121/debug.html> which contains some
|
|||
|
common errors and how to solve them.
|
|||
|
|
|||
|
|
|||
|
7.1. The state machine
|
|||
|
|
|||
|
Internally, your YACC parser runs a so called 'state machine'. As the
|
|||
|
name implies, this is a machine that can be in several states. Then
|
|||
|
there are rules which govern transitions from one state to another.
|
|||
|
Everything starts with the so called 'root' rule I mentioned earlier.
|
|||
|
|
|||
|
To quote from the output from the Example 7 y.output:
|
|||
|
|
|||
|
|
|||
|
state 0
|
|||
|
|
|||
|
ZONETOK , and go to state 1
|
|||
|
|
|||
|
$default reduce using rule 1 (commands)
|
|||
|
|
|||
|
commands go to state 29
|
|||
|
command go to state 2
|
|||
|
zone_set go to state 3
|
|||
|
|
|||
|
|
|||
|
|
|||
|
By default, this state reduces using the 'commands' rule. This is the
|
|||
|
aforementioned recursive rule that defines 'commands' to be built up
|
|||
|
from individual command statements, followed by a semicolon, followed
|
|||
|
by possibly more commands.
|
|||
|
|
|||
|
This state reduces until it hits something it understands, in this
|
|||
|
case, a ZONETOK, ie, the word 'zone'. It then goes to state 1, which
|
|||
|
deals further with a zone command:
|
|||
|
|
|||
|
|
|||
|
|
|||
|
state 1
|
|||
|
|
|||
|
zone_set -> ZONETOK . quotedname zonecontent (rule 4)
|
|||
|
|
|||
|
QUOTE , and go to state 4
|
|||
|
|
|||
|
quotedname go to state 5
|
|||
|
|
|||
|
|
|||
|
|
|||
|
The first line has a '.' in it to indicate where we are: we've just
|
|||
|
seen a ZONETOK and are now looking for a 'quotedname'. Apparently, a
|
|||
|
quotedname starts with a QUOTE, which sends us to state 4.
|
|||
|
|
|||
|
To follow this further, compile Example 7 with the flags mentioned in
|
|||
|
the Debugging section.
|
|||
|
|
|||
|
|
|||
|
7.2. Conflicts: 'shift/reduce', 'reduce/reduce'
|
|||
|
|
|||
|
Whenever YACC warns you about conflicts, you may be in for trouble.
|
|||
|
Solving these conflicts appears to be somewhat of an art form that may
|
|||
|
teach you a lot about your language. More than you possibly would have
|
|||
|
wanted to know.
|
|||
|
|
|||
|
The problems revolve around how to interpret a sequence of tokens.
|
|||
|
Let's suppose we define a language that needs to accept both these
|
|||
|
commands:
|
|||
|
|
|||
|
|
|||
|
|
|||
|
delete heater all
|
|||
|
delete heater number1
|
|||
|
|
|||
|
|
|||
|
|
|||
|
To do this, we define this grammar:
|
|||
|
|
|||
|
|
|||
|
|
|||
|
delete_heaters:
|
|||
|
TOKDELETE TOKHEATER mode
|
|||
|
{
|
|||
|
deleteheaters($3);
|
|||
|
}
|
|||
|
|
|||
|
mode: WORD
|
|||
|
|
|||
|
delete_a_heater:
|
|||
|
TOKDELETE TOKHEATER WORD
|
|||
|
{
|
|||
|
delete($3);
|
|||
|
}
|
|||
|
|
|||
|
|
|||
|
|
|||
|
You may already be smelling trouble. The state machine starts by
|
|||
|
reading the word 'delete', and then needs to decide where to go based
|
|||
|
on the next token. This next token can either be a mode, specifying
|
|||
|
how to delete the heaters, or the name of a heater to delete.
|
|||
|
|
|||
|
The problem however is that for both commands, the next token is going
|
|||
|
to be a WORD. YACC has therefore no idea what to do. This leads to a
|
|||
|
'reduce/reduce' warning, and a further warning that the
|
|||
|
'delete_a_heater' node is never going to be reached.
|
|||
|
|
|||
|
In this case the conflict is resolved easily (ie, by renaming the
|
|||
|
first command to 'delete heaters all', or by making 'all' a separate
|
|||
|
token), but sometimes it is harder. The y.output file generated when
|
|||
|
you pass yacc the --verbose flag can be of tremendous help.
|
|||
|
|
|||
|
|
|||
|
8. Further reading
|
|||
|
|
|||
|
GNU YACC (Bison) comes with a very nice info-file (.info) which
|
|||
|
documents the YACC syntax very well. It mentions Lex only once, but
|
|||
|
otherwise it's very good. You can read .info files with Emacs or with
|
|||
|
the very nice tool 'pinfo'. It is also available on the GNU site:
|
|||
|
BISON Manual <http://www.gnu.org/manual/bison/>.
|
|||
|
|
|||
|
Flex comes with a good manpage which is very useful if you already
|
|||
|
have a rough understanding of what Flex does. The Flex Manual
|
|||
|
<http://www.gnu.org/manual/flex/> is also available online.
|
|||
|
|
|||
|
After this introduction to Lex and YACC, you may find that you need
|
|||
|
more information. I haven't read any of these books yet, but they
|
|||
|
sound good:
|
|||
|
|
|||
|
Bison-The Yacc-Compatible Parser Generator
|
|||
|
By Charles Donnelly and Richard Stallman. An Amazon
|
|||
|
<http://www.amazon.com/exec/obidos/ASIN/0595100325/qid=989165194/sr=1-2/ref=sc_b_3/002-7737249-1404015>
|
|||
|
user found it useful.
|
|||
|
|
|||
|
|
|||
|
Lex & Yacc
|
|||
|
By John R. Levine, Tony Mason and Doug Brown. Considered to be
|
|||
|
the standard work on this subject, although a bit dated. Reviews
|
|||
|
over at Amazon
|
|||
|
<http://www.amazon.com/exec/obidos/ASIN/1565920007/ref=sim_books/002-7737249-1404015>.
|
|||
|
|
|||
|
Compilers : Principles, Techniques, and Tools
|
|||
|
By Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman. The 'Dragon
|
|||
|
Book'. From 1985 and they just keep printing it. Considered the
|
|||
|
standard work on constructing compilers. Amazon
|
|||
|
<http://www.amazon.com/exec/obidos/ASIN/0201100886/ref=sim_books/002-7737249-1404015>
|
|||
|
Thomas Niemann wrote a document discussing how to write compilers and
|
|||
|
calculators with Lex & YACC. You can find it here
|
|||
|
<http://epaperpress.com/y_man.html>.
|
|||
|
|
|||
|
The moderated usenet newsgroup comp.compilers can also be very useful
|
|||
|
but please keep in mind that the people there are not a dedicated
|
|||
|
parser helpdesk! Before posting, read their interesting page
|
|||
|
<http://compilers.iecc.com/> and especially the FAQ
|
|||
|
<http://compilers.iecc.com/faq.txt>.
|
|||
|
|
|||
|
Lex - A Lexical Analyzer Generator by M. E. Lesk and E. Schmidt is one
|
|||
|
of the original reference papers. It can be found here
|
|||
|
<http://www.cs.utexas.edu/users/novak/lexpaper.htm>.
|
|||
|
|
|||
|
Yacc: Yet Another Compiler-Compiler by Stephen C. Johnson is one of
|
|||
|
the original reference papers for YACC. It can be found here
|
|||
|
<http://www.cs.utexas.edu/users/novak/yaccpaper.htm>. It contains
|
|||
|
useful hints on style.
|
|||
|
|
|||
|
|
|||
|
9. Acknowledgements & Thanks
|
|||
|
|
|||
|
|
|||
|
<20> Pete Jinks <pjj%cs.man.ac.uk>
|
|||
|
|
|||
|
<20> Chris Lattner <sabre%nondot.org>
|
|||
|
|
|||
|
<20> John W. Millaway <johnmillaway%yahoo.com>
|
|||
|
|
|||
|
<20> Martin Neitzel <neitzel%gaertner.de>
|
|||
|
|
|||
|
<20> Esmond Pitt <esmond.pitt%bigpond.com>
|
|||
|
|
|||
|
<20> Eric S. Raymond
|
|||
|
|
|||
|
<20> Bob Schmertz <schmertz%wam.umd.edu>
|
|||
|
|
|||
|
<20> Adam Sulmicki <adam%cfar.umd.edu>
|
|||
|
|
|||
|
<20> Markus Triska <triska%gmx.at>
|
|||
|
|
|||
|
<20> Erik Verbruggen <erik%road-warrior.cs.kun.nl>
|
|||
|
|
|||
|
<20> Gary V. Vaughan <gary%gnu.org> (read his awesome Autobook
|
|||
|
<http://sources.redhat.com/autobook>)
|
|||
|
|
|||
|
<20> Ivo van der Wijk <http://vanderwijk.info> ( Amaze Internet
|
|||
|
<http://www.amaze.nl>)
|
|||
|
|
|||
|
|
|||
|
|