mirror of https://github.com/mkerrisk/man-pages
1286 lines
48 KiB
Groff
1286 lines
48 KiB
Groff
|
.\" Copyright (c) 2001-2003 The Open Group, All Rights Reserved
|
||
|
.TH "YACC" P 2003 "IEEE/The Open Group" "POSIX Programmer's Manual"
|
||
|
.\" yacc
|
||
|
.SH NAME
|
||
|
yacc \- yet another compiler compiler (\fBDEVELOPMENT\fP)
|
||
|
.SH SYNOPSIS
|
||
|
.LP
|
||
|
\fByacc\fP \fB[\fP\fB-dltv\fP\fB][\fP\fB-b\fP \fIfile_prefix\fP\fB][\fP\fB-p\fP
|
||
|
\fIsym_prefix\fP\fB]\fP \fIgrammar\fP\fB\fP
|
||
|
.SH DESCRIPTION
|
||
|
.LP
|
||
|
The \fIyacc\fP utility shall read a description of a context-free
|
||
|
grammar in \fIgrammar\fP and write C source code, conforming
|
||
|
to the ISO\ C standard, to a code file, and optionally header information
|
||
|
into a header file, in the current directory. The C
|
||
|
code shall define a function and related routines and macros for an
|
||
|
automaton that executes a parsing algorithm meeting the
|
||
|
requirements in Algorithms .
|
||
|
.LP
|
||
|
The form and meaning of the grammar are described in the EXTENDED
|
||
|
DESCRIPTION section.
|
||
|
.LP
|
||
|
The C source code and header file shall be produced in a form suitable
|
||
|
as input for the C compiler (see \fIc99\fP ).
|
||
|
.SH OPTIONS
|
||
|
.LP
|
||
|
The \fIyacc\fP utility shall conform to the Base Definitions volume
|
||
|
of IEEE\ Std\ 1003.1-2001, Section 12.2, Utility Syntax Guidelines.
|
||
|
.LP
|
||
|
The following options shall be supported:
|
||
|
.TP 7
|
||
|
\fB-b\ \fP \fIfile_prefix\fP
|
||
|
Use \fIfile_prefix\fP instead of \fBy\fP as the prefix for all output
|
||
|
filenames. The code file \fBy.tab.c\fP, the header
|
||
|
file \fBy.tab.h\fP (created when \fB-d\fP is specified), and the description
|
||
|
file \fBy.output\fP (created when \fB-v\fP is
|
||
|
specified), shall be changed to \fIfile_prefix\fP \fB.tab.c\fP, \fIfile_prefix\fP
|
||
|
\fB\&.tab.h\fP, and \fIfile_prefix\fP
|
||
|
\fB\&.output\fP, respectively.
|
||
|
.TP 7
|
||
|
\fB-d\fP
|
||
|
Write the header file; by default only the code file is written. The
|
||
|
\fB#define\fP statements associate the token codes
|
||
|
assigned by \fIyacc\fP with the user-declared token names. This allows
|
||
|
source files other than \fBy.tab.c\fP to access the token
|
||
|
codes.
|
||
|
.TP 7
|
||
|
\fB-l\fP
|
||
|
Produce a code file that does not contain any \fB#line\fP constructs.
|
||
|
If this option is not present, it is unspecified whether
|
||
|
the code file or header file contains \fB#line\fP directives. This
|
||
|
should only be used after the grammar and the associated
|
||
|
actions are fully debugged.
|
||
|
.TP 7
|
||
|
\fB-p\ \fP \fIsym_prefix\fP
|
||
|
.sp
|
||
|
Use \fIsym_prefix\fP instead of \fByy\fP as the prefix for all external
|
||
|
names produced by \fIyacc\fP. The names affected shall
|
||
|
include the functions \fIyyparse\fP(), \fIyylex\fP(), and \fIyyerror\fP(),
|
||
|
and the variables \fIyylval\fP, \fIyychar\fP, and
|
||
|
\fIyydebug\fP. (In the remainder of this section, the six symbols
|
||
|
cited are referenced using their default names only as a
|
||
|
notational convenience.) Local names may also be affected by the \fB-p\fP
|
||
|
option; however, the \fB-p\fP option shall not affect
|
||
|
\fB#define\fP symbols generated by \fIyacc\fP.
|
||
|
.TP 7
|
||
|
\fB-t\fP
|
||
|
Modify conditional compilation directives to permit compilation of
|
||
|
debugging code in the code file. Runtime debugging
|
||
|
statements shall always be contained in the code file, but by default
|
||
|
conditional compilation directives prevent their
|
||
|
compilation.
|
||
|
.TP 7
|
||
|
\fB-v\fP
|
||
|
Write a file containing a description of the parser and a report of
|
||
|
conflicts generated by ambiguities in the grammar.
|
||
|
.sp
|
||
|
.SH OPERANDS
|
||
|
.LP
|
||
|
The following operand is required:
|
||
|
.TP 7
|
||
|
\fIgrammar\fP
|
||
|
A pathname of a file containing instructions, hereafter called \fIgrammar\fP,
|
||
|
for which a parser is to be created. The format
|
||
|
for the grammar is described in the EXTENDED DESCRIPTION section.
|
||
|
.sp
|
||
|
.SH STDIN
|
||
|
.LP
|
||
|
Not used.
|
||
|
.SH INPUT FILES
|
||
|
.LP
|
||
|
The file \fIgrammar\fP shall be a text file formatted as specified
|
||
|
in the EXTENDED DESCRIPTION section.
|
||
|
.SH ENVIRONMENT VARIABLES
|
||
|
.LP
|
||
|
The following environment variables shall affect the execution of
|
||
|
\fIyacc\fP:
|
||
|
.TP 7
|
||
|
\fILANG\fP
|
||
|
Provide a default value for the internationalization variables that
|
||
|
are unset or null. (See the Base Definitions volume of
|
||
|
IEEE\ Std\ 1003.1-2001, Section 8.2, Internationalization Variables
|
||
|
for
|
||
|
the precedence of internationalization variables used to determine
|
||
|
the values of locale categories.)
|
||
|
.TP 7
|
||
|
\fILC_ALL\fP
|
||
|
If set to a non-empty string value, override the values of all the
|
||
|
other internationalization variables.
|
||
|
.TP 7
|
||
|
\fILC_CTYPE\fP
|
||
|
Determine the locale for the interpretation of sequences of bytes
|
||
|
of text data as characters (for example, single-byte as
|
||
|
opposed to multi-byte characters in arguments and input files).
|
||
|
.TP 7
|
||
|
\fILC_MESSAGES\fP
|
||
|
Determine the locale that should be used to affect the format and
|
||
|
contents of diagnostic messages written to standard
|
||
|
error.
|
||
|
.TP 7
|
||
|
\fINLSPATH\fP
|
||
|
Determine the location of message catalogs for the processing of \fILC_MESSAGES
|
||
|
\&.\fP
|
||
|
.sp
|
||
|
.LP
|
||
|
The \fILANG\fP and \fILC_*\fP variables affect the execution of the
|
||
|
\fIyacc\fP utility as stated. The \fImain\fP() function
|
||
|
defined in Yacc Library shall call:
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fBsetlocale(LC_ALL, "")
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.LP
|
||
|
and thus the program generated by \fIyacc\fP shall also be affected
|
||
|
by the contents of these variables at runtime.
|
||
|
.SH ASYNCHRONOUS EVENTS
|
||
|
.LP
|
||
|
Default.
|
||
|
.SH STDOUT
|
||
|
.LP
|
||
|
Not used.
|
||
|
.SH STDERR
|
||
|
.LP
|
||
|
If shift/reduce or reduce/reduce conflicts are detected in \fIgrammar\fP,
|
||
|
\fIyacc\fP shall write a report of those conflicts
|
||
|
to the standard error in an unspecified format.
|
||
|
.LP
|
||
|
Standard error shall also be used for diagnostic messages.
|
||
|
.SH OUTPUT FILES
|
||
|
.LP
|
||
|
The code file, the header file, and the description file shall be
|
||
|
text files. All are described in the following sections.
|
||
|
.SS Code File
|
||
|
.LP
|
||
|
This file shall contain the C source code for the \fIyyparse\fP()
|
||
|
function. It shall contain code for the various semantic
|
||
|
actions with macro substitution performed on them as described in
|
||
|
the EXTENDED DESCRIPTION section. It also shall contain a copy of
|
||
|
the \fB#define\fP statements in the header file. If a \fB%union\fP
|
||
|
declaration is used, the declaration for YYSTYPE shall also be
|
||
|
included in this file.
|
||
|
.SS Header File
|
||
|
.LP
|
||
|
The header file shall contain \fB#define\fP statements that associate
|
||
|
the token numbers with the token names. This allows
|
||
|
source files other than the code file to access the token codes. If
|
||
|
a \fB%union\fP declaration is used, the declaration for
|
||
|
YYSTYPE and an \fIextern YYSTYPE yylval\fP declaration shall also
|
||
|
be included in this file.
|
||
|
.SS Description File
|
||
|
.LP
|
||
|
The description file shall be a text file containing a description
|
||
|
of the state machine corresponding to the parser, using an
|
||
|
unspecified format. Limits for internal tables (see Limits ) shall
|
||
|
also be reported, in an
|
||
|
implementation-defined manner. (Some implementations may use dynamic
|
||
|
allocation techniques and have no specific limit values to
|
||
|
report.)
|
||
|
.SH EXTENDED DESCRIPTION
|
||
|
.LP
|
||
|
The \fIyacc\fP command accepts a language that is used to define a
|
||
|
grammar for a target language to be parsed by the tables and
|
||
|
code generated by \fIyacc\fP. The language accepted by \fIyacc\fP
|
||
|
as a grammar for the target language is described below using
|
||
|
the \fIyacc\fP input language itself.
|
||
|
.LP
|
||
|
The input \fIgrammar\fP includes rules describing the input structure
|
||
|
of the target language and code to be invoked when these
|
||
|
rules are recognized to provide the associated semantic action. The
|
||
|
code to be executed shall appear as bodies of text that are
|
||
|
intended to be C-language code. The C-language inclusions are presumed
|
||
|
to form a correct function when processed by \fIyacc\fP
|
||
|
into its output files. The code included in this way shall be executed
|
||
|
during the recognition of the target language.
|
||
|
.LP
|
||
|
Given a grammar, the \fIyacc\fP utility generates the files described
|
||
|
in the OUTPUT FILES section. The code file can be
|
||
|
compiled and linked using \fIc99\fP. If the declaration and programs
|
||
|
sections of the grammar
|
||
|
file did not include definitions of \fImain\fP(), \fIyylex\fP(), and
|
||
|
\fIyyerror\fP(), the compiled output requires linking with
|
||
|
externally supplied versions of those functions. Default versions
|
||
|
of \fImain\fP() and \fIyyerror\fP() are supplied in the
|
||
|
\fIyacc\fP library and can be linked in by using the \fB-l\ y\fP operand
|
||
|
to \fIc99\fP.
|
||
|
The \fIyacc\fP library interfaces need not support interfaces with
|
||
|
other than the default \fByy\fP symbol prefix. The application
|
||
|
provides the lexical analyzer function, \fIyylex\fP(); the \fIlex\fP
|
||
|
utility is specifically
|
||
|
designed to generate such a routine.
|
||
|
.SS Input Language
|
||
|
.LP
|
||
|
The application shall ensure that every specification file consists
|
||
|
of three sections in order: \fIdeclarations\fP, \fIgrammar
|
||
|
rules\fP, and \fIprograms\fP, separated by double percent signs (
|
||
|
\fB"%%"\fP ). The declarations and programs sections can be
|
||
|
empty. If the latter is empty, the preceding \fB"%%"\fP mark separating
|
||
|
it from the rules section can be omitted.
|
||
|
.LP
|
||
|
The input is free form text following the structure of the grammar
|
||
|
defined below.
|
||
|
.SS Lexical Structure of the Grammar
|
||
|
.LP
|
||
|
The <blank>s, <newline>s, and <form-feed>s shall be ignored, except
|
||
|
that the application shall ensure that
|
||
|
they do not appear in names or multi-character reserved symbols. Comments
|
||
|
shall be enclosed in \fB"/*\ ...\ */"\fP , and
|
||
|
can appear wherever a name is valid.
|
||
|
.LP
|
||
|
Names are of arbitrary length, made up of letters, periods ( \fB'.'\fP
|
||
|
), underscores ( \fB'_'\fP ), and non-initial
|
||
|
digits. Uppercase and lowercase letters are distinct. Conforming applications
|
||
|
shall not use names beginning in \fByy\fP or
|
||
|
\fBYY\fP since the \fIyacc\fP parser uses such names. Many of the
|
||
|
names appear in the final output of \fIyacc\fP, and thus they
|
||
|
should be chosen to conform with any additional rules created by the
|
||
|
C compiler to be used. In particular they appear in
|
||
|
\fB#define\fP statements.
|
||
|
.LP
|
||
|
A literal shall consist of a single character enclosed in single-quotes
|
||
|
( \fB'"\fP ). All of the escape sequences supported
|
||
|
for character constants by the ISO\ C standard shall be supported
|
||
|
by \fIyacc\fP.
|
||
|
.LP
|
||
|
The relationship with the lexical analyzer is discussed in detail
|
||
|
below.
|
||
|
.LP
|
||
|
The application shall ensure that the NUL character is not used in
|
||
|
grammar rules or literals.
|
||
|
.SS Declarations Section
|
||
|
.LP
|
||
|
The declarations section is used to define the symbols used to define
|
||
|
the target language and their relationship with each
|
||
|
other. In particular, much of the additional information required
|
||
|
to resolve ambiguities in the context-free grammar for the target
|
||
|
language is provided here.
|
||
|
.LP
|
||
|
Usually \fIyacc\fP assigns the relationship between the symbolic names
|
||
|
it generates and their underlying numeric value. The
|
||
|
declarations section makes it possible to control the assignment of
|
||
|
these values.
|
||
|
.LP
|
||
|
It is also possible to keep semantic information associated with the
|
||
|
tokens currently on the parse stack in a user-defined
|
||
|
C-language \fBunion\fP, if the members of the union are associated
|
||
|
with the various names in the grammar. The declarations section
|
||
|
provides for this as well.
|
||
|
.LP
|
||
|
The first group of declarators below all take a list of names as arguments.
|
||
|
That list can optionally be preceded by the name of
|
||
|
a C union member (called a \fItag\fP below) appearing within \fB'<'\fP
|
||
|
and \fB'>'\fP . (As an exception to the
|
||
|
typographical conventions of the rest of this volume of IEEE\ Std\ 1003.1-2001,
|
||
|
in this case <\fItag\fP> does not
|
||
|
represent a metavariable, but the literal angle bracket characters
|
||
|
surrounding a symbol.) The use of \fItag\fP specifies that the
|
||
|
tokens named on this line shall be of the same C type as the union
|
||
|
member referenced by \fItag\fP. This is discussed in more
|
||
|
detail below.
|
||
|
.LP
|
||
|
For lists used to define tokens, the first appearance of a given token
|
||
|
can be followed by a positive integer (as a string of
|
||
|
decimal digits). If this is done, the underlying value assigned to
|
||
|
it for lexical purposes shall be taken to be that number.
|
||
|
.LP
|
||
|
The following declares \fIname\fP to be a token:
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fB%token\fP \fB[\fP\fB<\fP\fItag\fP\fB>\fP\fB]\fP \fIname\fP \fB[\fP\fInumber\fP\fB][\fP\fIname\fP \fB[\fP\fInumber\fP\fB]]\fP\fB...
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.LP
|
||
|
If \fItag\fP is present, the C type for all tokens on this line shall
|
||
|
be declared to be the type referenced by \fItag\fP. If a
|
||
|
positive integer, \fInumber\fP, follows a \fIname\fP, that value shall
|
||
|
be assigned to the token.
|
||
|
.LP
|
||
|
The following declares \fIname\fP to be a token, and assigns precedence
|
||
|
to it:
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fB%left\fP \fB[\fP\fB<\fP\fItag\fP\fB>\fP\fB]\fP \fIname\fP \fB[\fP\fInumber\fP\fB][\fP\fIname\fP \fB[\fP\fInumber\fP\fB]]\fP\fB...
|
||
|
%right\fP \fB[\fP\fB<\fP\fItag\fP\fB>\fP\fB]\fP \fIname\fP \fB[\fP\fInumber\fP\fB][\fP\fIname\fP \fB[\fP\fInumber\fP\fB]]\fP\fB...
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.LP
|
||
|
One or more lines, each beginning with one of these symbols, can appear
|
||
|
in this section. All tokens on the same line have the
|
||
|
same precedence level and associativity; the lines are in order of
|
||
|
increasing precedence or binding strength. \fB%left\fP denotes
|
||
|
that the operators on that line are left associative, and \fB%right\fP
|
||
|
similarly denotes right associative operators. If
|
||
|
\fItag\fP is present, it shall declare a C type for \fIname\fPs as
|
||
|
described for \fB%token\fP.
|
||
|
.LP
|
||
|
The following declares \fIname\fP to be a token, and indicates that
|
||
|
this cannot be used associatively:
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fB%nonassoc\fP \fB[\fP\fB<\fP\fItag\fP\fB>\fP\fB]\fP \fIname\fP \fB[\fP\fInumber\fP\fB][\fP\fIname\fP \fB[\fP\fInumber\fP\fB]]\fP\fB...
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.LP
|
||
|
If the parser encounters associative use of this token it reports
|
||
|
an error. If \fItag\fP is present, it shall declare a C type
|
||
|
for \fIname\fPs as described for \fB%token\fP.
|
||
|
.LP
|
||
|
The following declares that union member \fIname\fPs are non-terminals,
|
||
|
and thus it is required to have a \fItag\fP field at
|
||
|
its beginning:
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fB%type <\fP\fItag\fP\fB>\fP \fIname\fP\fB...
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.LP
|
||
|
Because it deals with non-terminals only, assigning a token number
|
||
|
or using a literal is also prohibited. If this construct is
|
||
|
present, \fIyacc\fP shall perform type checking; if this construct
|
||
|
is not present, the parse stack shall hold only the \fBint\fP
|
||
|
type.
|
||
|
.LP
|
||
|
Every name used in \fIgrammar\fP not defined by a \fB%token\fP, \fB%left\fP,
|
||
|
\fB%right\fP, or \fB%nonassoc\fP declaration
|
||
|
is assumed to represent a non-terminal symbol. The \fIyacc\fP utility
|
||
|
shall report an error for any non-terminal symbol that does
|
||
|
not appear on the left side of at least one grammar rule.
|
||
|
.LP
|
||
|
Once the type, precedence, or token number of a name is specified,
|
||
|
it shall not be changed. If the first declaration of a token
|
||
|
does not assign a token number, \fIyacc\fP shall assign a token number.
|
||
|
Once this assignment is made, the token number shall not
|
||
|
be changed by explicit assignment.
|
||
|
.LP
|
||
|
The following declarators do not follow the previous pattern.
|
||
|
.LP
|
||
|
The following declares the non-terminal \fIname\fP to be the \fIstart
|
||
|
symbol\fP, which represents the largest, most general
|
||
|
structure described by the grammar rules:
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fB%start\fP \fIname\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.LP
|
||
|
By default, it is the left-hand side of the first grammar rule; this
|
||
|
default can be overridden with this declaration.
|
||
|
.LP
|
||
|
The following declares the \fIyacc\fP value stack to be a union of
|
||
|
the various types of values desired:
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fB%union {\fP \fIbody of union\fP \fB(\fP\fIin C\fP\fB) }
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.LP
|
||
|
By default, the values returned by actions (see below) and the lexical
|
||
|
analyzer shall be of type \fBint\fP. The \fIyacc\fP
|
||
|
utility keeps track of types, and it shall insert corresponding union
|
||
|
member names in order to perform strict type checking of the
|
||
|
resulting parser.
|
||
|
.LP
|
||
|
Alternatively, given that at least one <\fItag\fP> construct is used,
|
||
|
the union can be declared in a header file (which
|
||
|
shall be included in the declarations section by using a \fB#include\fP
|
||
|
construct within \fB%{\fP and \fB%}\fP), and a
|
||
|
\fBtypedef\fP used to define the symbol YYSTYPE to represent this
|
||
|
union. The effect of \fB%union\fP is to provide the declaration
|
||
|
of YYSTYPE directly from the \fIyacc\fP input.
|
||
|
.LP
|
||
|
C-language declarations and definitions can appear in the declarations
|
||
|
section, enclosed by the following marks:
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fB%{ ... %}
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.LP
|
||
|
These statements shall be copied into the code file, and have global
|
||
|
scope within it so that they can be used in the rules and
|
||
|
program sections.
|
||
|
.LP
|
||
|
The application shall ensure that the declarations section is terminated
|
||
|
by the token \fB%%\fP.
|
||
|
.SS Grammar Rules in yacc
|
||
|
.LP
|
||
|
The rules section defines the context-free grammar to be accepted
|
||
|
by the function \fIyacc\fP generates, and associates with
|
||
|
those rules C-language actions and additional precedence information.
|
||
|
The grammar is described below, and a formal definition
|
||
|
follows.
|
||
|
.LP
|
||
|
The rules section is comprised of one or more grammar rules. A grammar
|
||
|
rule has the form:
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fBA : BODY ;
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.LP
|
||
|
The symbol \fBA\fP represents a non-terminal name, and \fBBODY\fP
|
||
|
represents a sequence of zero or more \fIname\fPs,
|
||
|
\fIliteral\fPs, and \fIsemantic action\fPs that can then be followed
|
||
|
by optional \fIprecedence rule\fPs. Only the names and
|
||
|
literals participate in the formation of the grammar; the semantic
|
||
|
actions and precedence rules are used in other ways. The colon
|
||
|
and the semicolon are \fIyacc\fP punctuation. If there are several
|
||
|
successive grammar rules with the same left-hand side, the
|
||
|
vertical bar \fB'|'\fP can be used to avoid rewriting the left-hand
|
||
|
side; in this case the semicolon appears only after the last
|
||
|
rule. The BODY part can be empty (or empty of names and literals)
|
||
|
to indicate that the non-terminal symbol matches the empty
|
||
|
string.
|
||
|
.LP
|
||
|
The \fIyacc\fP utility assigns a unique number to each rule. Rules
|
||
|
using the vertical bar notation are distinct rules. The
|
||
|
number assigned to the rule appears in the description file.
|
||
|
.LP
|
||
|
The elements comprising a BODY are:
|
||
|
.TP 7
|
||
|
\fIname\fP,\ \fIliteral\fP
|
||
|
These form the rules of the grammar: \fIname\fP is either a \fItoken\fP
|
||
|
or a \fInon-terminal\fP; \fIliteral\fP stands for
|
||
|
itself (less the lexically required quotation marks).
|
||
|
.TP 7
|
||
|
\fIsemantic action\fP
|
||
|
.sp
|
||
|
With each grammar rule, the user can associate actions to be performed
|
||
|
each time the rule is recognized in the input process. (Note
|
||
|
that the word "action" can also refer to the actions of the parser-shift,
|
||
|
reduce, and so on.)
|
||
|
.LP
|
||
|
These actions can return values and can obtain the values returned
|
||
|
by previous actions. These values are kept in objects of type
|
||
|
YYSTYPE (see \fB%union\fP). The result value of the action shall be
|
||
|
kept on the parse stack with the left-hand side of the rule,
|
||
|
to be accessed by other reductions as part of their right-hand side.
|
||
|
By using the <\fItag\fP> information provided in the
|
||
|
declarations section, the code generated by \fIyacc\fP can be strictly
|
||
|
type checked and contain arbitrary information. In
|
||
|
addition, the lexical analyzer can provide the same kinds of values
|
||
|
for tokens, if desired.
|
||
|
.LP
|
||
|
An action is an arbitrary C statement and as such can do input or
|
||
|
output, call subprograms, and alter external variables. An
|
||
|
action is one or more C statements enclosed in curly braces \fB'{'\fP
|
||
|
and \fB'}'\fP .
|
||
|
.LP
|
||
|
Certain pseudo-variables can be used in the action. These are macros
|
||
|
for access to data structures known internally to
|
||
|
\fIyacc\fP.
|
||
|
.TP 7
|
||
|
$$
|
||
|
.RS
|
||
|
The value of the action can be set by assigning it to $$. If type
|
||
|
checking is enabled and the type of the value to be assigned
|
||
|
cannot be determined, a diagnostic message may be generated.
|
||
|
.RE
|
||
|
.TP 7
|
||
|
$\fInumber\fP
|
||
|
.RS
|
||
|
This refers to the value returned by the component specified by the
|
||
|
token \fInumber\fP in the right side of a rule, reading
|
||
|
from left to right; \fInumber\fP can be zero or negative. If \fInumber\fP
|
||
|
is zero or negative, it refers to the data associated
|
||
|
with the name on the parser's stack preceding the leftmost symbol
|
||
|
of the current rule. (That is, \fB"$0"\fP refers to the name
|
||
|
immediately preceding the leftmost name in the current rule to be
|
||
|
found on the parser's stack and \fB"$-1"\fP refers to the
|
||
|
symbol to \fIits\fP left.) If \fInumber\fP refers to an element past
|
||
|
the current point in the rule, or beyond the bottom of the
|
||
|
stack, the result is undefined. If type checking is enabled and the
|
||
|
type of the value to be assigned cannot be determined, a
|
||
|
diagnostic message may be generated.
|
||
|
.RE
|
||
|
.TP 7
|
||
|
$<\fItag\fP>\fInumber\fP
|
||
|
.RS
|
||
|
.sp
|
||
|
These correspond exactly to the corresponding symbols without the
|
||
|
\fItag\fP inclusion, but allow for strict type checking (and
|
||
|
preclude unwanted type conversions). The effect is that the macro
|
||
|
is expanded to use \fItag\fP to select an element from the
|
||
|
YYSTYPE union (using \fIdataname.tag\fP). This is particularly useful
|
||
|
if \fInumber\fP is not positive.
|
||
|
.RE
|
||
|
.TP 7
|
||
|
$<\fItag\fP>$
|
||
|
.RS
|
||
|
This imposes on the reference the type of the union member referenced
|
||
|
by \fItag\fP. This construction is applicable when a
|
||
|
reference to a left context value occurs in the grammar, and provides
|
||
|
\fIyacc\fP with a means for selecting a type.
|
||
|
.RE
|
||
|
.sp
|
||
|
.LP
|
||
|
Actions can occur anywhere in a rule (not just at the end); an action
|
||
|
can access values returned by actions to its left, and in
|
||
|
turn the value it returns can be accessed by actions to its right.
|
||
|
An action appearing in the middle of a rule shall be equivalent
|
||
|
to replacing the action with a new non-terminal symbol and adding
|
||
|
an empty rule with that non-terminal symbol on the left-hand
|
||
|
side. The semantic action associated with the new rule shall be equivalent
|
||
|
to the original action. The use of actions within rules
|
||
|
might introduce conflicts that would not otherwise exist.
|
||
|
.LP
|
||
|
By default, the value of a rule shall be the value of the first element
|
||
|
in it. If the first element does not have a type
|
||
|
(particularly in the case of a literal) and type checking is turned
|
||
|
on by \fB%type\fP, an error message shall result.
|
||
|
.TP 7
|
||
|
\fIprecedence\fP
|
||
|
The keyword \fB%prec\fP can be used to change the precedence level
|
||
|
associated with a particular grammar rule. Examples of this
|
||
|
are in cases where a unary and binary operator have the same symbolic
|
||
|
representation, but need to be given different precedences,
|
||
|
or where the handling of an ambiguous if-else construction is necessary.
|
||
|
The reserved symbol \fB%prec\fP can appear immediately
|
||
|
after the body of the grammar rule and can be followed by a token
|
||
|
name or a literal. It shall cause the precedence of the grammar
|
||
|
rule to become that of the following token name or literal. The action
|
||
|
for the rule as a whole can follow \fB%prec\fP.
|
||
|
.sp
|
||
|
.LP
|
||
|
If a program section follows, the application shall ensure that the
|
||
|
grammar rules are terminated by \fB%%\fP.
|
||
|
.SS Programs Section
|
||
|
.LP
|
||
|
The \fIprograms\fP section can include the definition of the lexical
|
||
|
analyzer \fIyylex\fP(), and any other functions; for
|
||
|
example, those used in the actions specified in the grammar rules.
|
||
|
It is unspecified whether the programs section precedes or
|
||
|
follows the semantic actions in the output file; therefore, if the
|
||
|
application contains any macro definitions and declarations
|
||
|
intended to apply to the code in the semantic actions, it shall place
|
||
|
them within \fB"%{\ ...\ %}"\fP in the
|
||
|
declarations section.
|
||
|
.SS Input Grammar
|
||
|
.LP
|
||
|
The following input to \fIyacc\fP yields a parser for the input to
|
||
|
\fIyacc\fP. This formal syntax takes precedence over the
|
||
|
preceding text syntax description.
|
||
|
.LP
|
||
|
The lexical structure is defined less precisely; Lexical Structure
|
||
|
of the Grammar defines most
|
||
|
terms. The correspondence between the previous terms and the tokens
|
||
|
below is as follows.
|
||
|
.TP 7
|
||
|
\fBIDENTIFIER\fP
|
||
|
This corresponds to the concept of \fIname\fP, given previously. It
|
||
|
also includes literals as defined previously.
|
||
|
.TP 7
|
||
|
\fBC_IDENTIFIER\fP
|
||
|
This is a name, and additionally it is known to be followed by a colon.
|
||
|
A literal cannot yield this token.
|
||
|
.TP 7
|
||
|
\fBNUMBER\fP
|
||
|
A string of digits (a non-negative decimal integer).
|
||
|
.TP 7
|
||
|
\fBTYPE\fP,\ \fBLEFT\fP,\ \fBMARK\fP,\ \fBLCURL\fP,\ \fBRCURL\fP
|
||
|
.sp
|
||
|
These correspond directly to \fB%type\fP, \fB%left\fP, \fB%%\fP, \fB%{\fP,
|
||
|
and \fB%}\fP.
|
||
|
.TP 7
|
||
|
\fB{\ ...\ }\fP
|
||
|
This indicates C-language source code, with the possible inclusion
|
||
|
of \fB'$'\fP macros as discussed previously.
|
||
|
.sp
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fB/* Grammar for the input to yacc. */
|
||
|
/* Basic entries. */
|
||
|
/* The following are recognized by the lexical analyzer. */
|
||
|
.sp
|
||
|
|
||
|
%token IDENTIFIER /* Includes identifiers and literals */
|
||
|
%token C_IDENTIFIER /* identifier (but not literal)
|
||
|
followed by a :. */
|
||
|
%token NUMBER /* [0-9][0-9]* */
|
||
|
.sp
|
||
|
|
||
|
/* Reserved words : %type=>TYPE %left=>LEFT, and so on */
|
||
|
.sp
|
||
|
|
||
|
%token LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION
|
||
|
.sp
|
||
|
|
||
|
%token MARK /* The %% mark. */
|
||
|
%token LCURL /* The %{ mark. */
|
||
|
%token RCURL /* The %} mark. */
|
||
|
.sp
|
||
|
|
||
|
/* 8-bit character literals stand for themselves; */
|
||
|
/* tokens have to be defined for multi-byte characters. */
|
||
|
.sp
|
||
|
|
||
|
%start spec
|
||
|
.sp
|
||
|
|
||
|
%%
|
||
|
.sp
|
||
|
|
||
|
spec : defs MARK rules tail
|
||
|
;
|
||
|
tail : MARK
|
||
|
{
|
||
|
/* In this action, set up the rest of the file. */
|
||
|
}
|
||
|
| /* Empty; the second MARK is optional. */
|
||
|
;
|
||
|
defs : /* Empty. */
|
||
|
| defs def
|
||
|
;
|
||
|
def : START IDENTIFIER
|
||
|
| UNION
|
||
|
{
|
||
|
/* Copy union definition to output. */
|
||
|
}
|
||
|
| LCURL
|
||
|
{
|
||
|
/* Copy C code to output file. */
|
||
|
}
|
||
|
RCURL
|
||
|
| rword tag nlist
|
||
|
;
|
||
|
rword : TOKEN
|
||
|
| LEFT
|
||
|
| RIGHT
|
||
|
| NONASSOC
|
||
|
| TYPE
|
||
|
;
|
||
|
tag : /* Empty: union tag ID optional. */
|
||
|
| '<' IDENTIFIER '>'
|
||
|
;
|
||
|
nlist : nmno
|
||
|
| nlist nmno
|
||
|
;
|
||
|
nmno : IDENTIFIER /* Note: literal invalid with % type. */
|
||
|
| IDENTIFIER NUMBER /* Note: invalid with % type. */
|
||
|
;
|
||
|
.sp
|
||
|
|
||
|
/* Rule section */
|
||
|
.sp
|
||
|
|
||
|
rules : C_IDENTIFIER rbody prec
|
||
|
| rules rule
|
||
|
;
|
||
|
rule : C_IDENTIFIER rbody prec
|
||
|
| '|' rbody prec
|
||
|
;
|
||
|
rbody : /* empty */
|
||
|
| rbody IDENTIFIER
|
||
|
| rbody act
|
||
|
;
|
||
|
act : '{'
|
||
|
{
|
||
|
/* Copy action, translate $$, and so on. */
|
||
|
}
|
||
|
'}'
|
||
|
;
|
||
|
prec : /* Empty */
|
||
|
| PREC IDENTIFIER
|
||
|
| PREC IDENTIFIER act
|
||
|
| prec ';'
|
||
|
;
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.SS Conflicts
|
||
|
.LP
|
||
|
The parser produced for an input grammar may contain states in which
|
||
|
conflicts occur. The conflicts occur because the grammar is
|
||
|
not LALR(1). An ambiguous grammar always contains at least one LALR(1)
|
||
|
conflict. The \fIyacc\fP utility shall resolve all
|
||
|
conflicts, using either default rules or user-specified precedence
|
||
|
rules.
|
||
|
.LP
|
||
|
Conflicts are either shift/reduce conflicts or reduce/reduce conflicts.
|
||
|
A shift/reduce conflict is where, for a given state and
|
||
|
lookahead symbol, both a shift action and a reduce action are possible.
|
||
|
A reduce/reduce conflict is where, for a given state and
|
||
|
lookahead symbol, reductions by two different rules are possible.
|
||
|
.LP
|
||
|
The rules below describe how to specify what actions to take when
|
||
|
a conflict occurs. Not all shift/reduce conflicts can be
|
||
|
successfully resolved this way because the conflict may be due to
|
||
|
something other than ambiguity, so incautious use of these
|
||
|
facilities can cause the language accepted by the parser to be much
|
||
|
different from that which was intended. The description file
|
||
|
shall contain sufficient information to understand the cause of the
|
||
|
conflict. Where ambiguity is the reason either the default or
|
||
|
explicit rules should be adequate to produce a working parser.
|
||
|
.LP
|
||
|
The declared precedences and associativities (see Declarations Section
|
||
|
) are used to resolve
|
||
|
parsing conflicts as follows:
|
||
|
.IP " 1." 4
|
||
|
A precedence and associativity is associated with each grammar rule;
|
||
|
it is the precedence and associativity of the last token or
|
||
|
literal in the body of the rule. If the \fB%prec\fP keyword is used,
|
||
|
it overrides this default. Some grammar rules might not have
|
||
|
both precedence and associativity.
|
||
|
.LP
|
||
|
.IP " 2." 4
|
||
|
If there is a shift/reduce conflict, and both the grammar rule and
|
||
|
the input symbol have precedence and associativity associated
|
||
|
with them, then the conflict is resolved in favor of the action (shift
|
||
|
or reduce) associated with the higher precedence. If the
|
||
|
precedences are the same, then the associativity is used; left associative
|
||
|
implies reduce, right associative implies shift, and
|
||
|
non-associative implies an error in the string being parsed.
|
||
|
.LP
|
||
|
.IP " 3." 4
|
||
|
When there is a shift/reduce conflict that cannot be resolved by rule
|
||
|
2, the shift is done. Conflicts resolved this way are
|
||
|
counted in the diagnostic output described in Error Handling .
|
||
|
.LP
|
||
|
.IP " 4." 4
|
||
|
When there is a reduce/reduce conflict, a reduction is done by the
|
||
|
grammar rule that occurs earlier in the input sequence.
|
||
|
Conflicts resolved this way are counted in the diagnostic output described
|
||
|
in Error Handling .
|
||
|
.LP
|
||
|
.LP
|
||
|
Conflicts resolved by precedence or associativity shall not be counted
|
||
|
in the shift/reduce and reduce/reduce conflicts reported
|
||
|
by \fIyacc\fP on either standard error or in the description file.
|
||
|
.SS Error Handling
|
||
|
.LP
|
||
|
The token \fBerror\fP shall be reserved for error handling. The name
|
||
|
\fBerror\fP can be used in grammar rules. It indicates
|
||
|
places where the parser can recover from a syntax error. The default
|
||
|
value of \fBerror\fP shall be 256. Its value can be changed
|
||
|
using a \fB%token\fP declaration. The lexical analyzer should not
|
||
|
return the value of \fBerror\fP.
|
||
|
.LP
|
||
|
The parser shall detect a syntax error when it is in a state where
|
||
|
the action associated with the lookahead symbol is
|
||
|
\fBerror\fP. A semantic action can cause the parser to initiate error
|
||
|
handling by executing the macro YYERROR. When YYERROR is
|
||
|
executed, the semantic action passes control back to the parser. YYERROR
|
||
|
cannot be used outside of semantic actions.
|
||
|
.LP
|
||
|
When the parser detects a syntax error, it normally calls \fIyyerror\fP()
|
||
|
with the character string
|
||
|
\fB"syntax\ error"\fP as its argument. The call shall not be made
|
||
|
if the parser is still recovering from a previous error
|
||
|
when the error is detected. The parser is considered to be recovering
|
||
|
from a previous error until the parser has shifted over at
|
||
|
least three normal input symbols since the last error was detected
|
||
|
or a semantic action has executed the macro \fIyyerrok\fP. The
|
||
|
parser shall not call \fIyyerror\fP() when YYERROR is executed.
|
||
|
.LP
|
||
|
The macro function YYRECOVERING shall return 1 if a syntax error has
|
||
|
been detected and the parser has not yet fully recovered
|
||
|
from it. Otherwise, zero shall be returned.
|
||
|
.LP
|
||
|
When a syntax error is detected by the parser, the parser shall check
|
||
|
if a previous syntax error has been detected. If a
|
||
|
previous error was detected, and if no normal input symbols have been
|
||
|
shifted since the preceding error was detected, the parser
|
||
|
checks if the lookahead symbol is an endmarker (see Interface to the
|
||
|
Lexical Analyzer ). If it is,
|
||
|
the parser shall return with a non-zero value. Otherwise, the lookahead
|
||
|
symbol shall be discarded and normal parsing shall
|
||
|
resume.
|
||
|
.LP
|
||
|
When YYERROR is executed or when the parser detects a syntax error
|
||
|
and no previous error has been detected, or at least one
|
||
|
normal input symbol has been shifted since the previous error was
|
||
|
detected, the parser shall pop back one state at a time until the
|
||
|
parse stack is empty or the current state allows a shift over \fBerror\fP.
|
||
|
If the parser empties the parse stack, it shall return
|
||
|
with a non-zero value. Otherwise, it shall shift over \fBerror\fP
|
||
|
and then resume normal parsing. If the parser reads a lookahead
|
||
|
symbol before the error was detected, that symbol shall still be the
|
||
|
lookahead symbol when parsing is resumed.
|
||
|
.LP
|
||
|
The macro \fIyyerrok\fP in a semantic action shall cause the parser
|
||
|
to act as if it has fully recovered from any previous
|
||
|
errors. The macro \fIyyclearin\fP shall cause the parser to discard
|
||
|
the current lookahead token. If the current lookahead token
|
||
|
has not yet been read, \fIyyclearin\fP shall have no effect.
|
||
|
.LP
|
||
|
The macro YYACCEPT shall cause the parser to return with the value
|
||
|
zero. The macro YYABORT shall cause the parser to return with
|
||
|
a non-zero value.
|
||
|
.SS Interface to the Lexical Analyzer
|
||
|
.LP
|
||
|
The \fIyylex\fP() function is an integer-valued function that returns
|
||
|
a \fItoken number\fP representing the kind of token
|
||
|
read. If there is a value associated with the token returned by \fIyylex\fP()
|
||
|
(see the discussion of \fItag\fP above), it shall
|
||
|
be assigned to the external variable \fIyylval\fP.
|
||
|
.LP
|
||
|
If the parser and \fIyylex\fP() do not agree on these token numbers,
|
||
|
reliable communication between them cannot occur. For
|
||
|
(single-byte character) literals, the token is simply the numeric
|
||
|
value of the character in the current character set. The numbers
|
||
|
for other tokens can either be chosen by \fIyacc\fP, or chosen by
|
||
|
the user. In either case, the \fB#define\fP construct of C is
|
||
|
used to allow \fIyylex\fP() to return these numbers symbolically.
|
||
|
The \fB#define\fP statements are put into the code file, and
|
||
|
the header file if that file is requested. The set of characters permitted
|
||
|
by \fIyacc\fP in an identifier is larger than that
|
||
|
permitted by C. Token names found to contain such characters shall
|
||
|
not be included in the \fB#define\fP declarations.
|
||
|
.LP
|
||
|
If the token numbers are chosen by \fIyacc\fP, the tokens other than
|
||
|
literals shall be assigned numbers greater than 256,
|
||
|
although no order is implied. A token can be explicitly assigned a
|
||
|
number by following its first appearance in the declarations
|
||
|
section with a number. Names and literals not defined this way retain
|
||
|
their default definition. All token numbers assigned by
|
||
|
\fIyacc\fP shall be unique and distinct from the token numbers used
|
||
|
for literals and user-assigned tokens. If duplicate token
|
||
|
numbers cause conflicts in parser generation, \fIyacc\fP shall report
|
||
|
an error; otherwise, it is unspecified whether the token
|
||
|
assignment is accepted or an error is reported.
|
||
|
.LP
|
||
|
The end of the input is marked by a special token called the \fIendmarker\fP,
|
||
|
which has a token number that is zero or
|
||
|
negative. (These values are invalid for any other token.) All lexical
|
||
|
analyzers shall return zero or negative as a token number
|
||
|
upon reaching the end of their input. If the tokens up to, but excluding,
|
||
|
the endmarker form a structure that matches the start
|
||
|
symbol, the parser shall accept the input. If the endmarker is seen
|
||
|
in any other context, it shall be considered an error.
|
||
|
.SS Completing the Program
|
||
|
.LP
|
||
|
In addition to \fIyyparse\fP() and \fIyylex\fP(), the functions \fIyyerror\fP()
|
||
|
and \fImain\fP() are required to make a
|
||
|
complete program. The application can supply \fImain\fP() and \fIyyerror\fP(),
|
||
|
or those routines can be obtained from the
|
||
|
\fIyacc\fP library.
|
||
|
.SS Yacc Library
|
||
|
.LP
|
||
|
The following functions shall appear only in the \fIyacc\fP library
|
||
|
accessible through the \fB-l\ y\fP operand to \fIc99\fP; they can
|
||
|
therefore be redefined by a conforming application:
|
||
|
.TP 7
|
||
|
\fBint\ \fP \fImain\fP(\fBvoid\fP)
|
||
|
.sp
|
||
|
This function shall call \fIyyparse\fP() and exit with an unspecified
|
||
|
value. Other actions within this function are
|
||
|
unspecified.
|
||
|
.TP 7
|
||
|
\fBint\ \fP \fIyyerror\fP(\fBconst\ char\fP\ *\fIs\fP)
|
||
|
.sp
|
||
|
This function shall write the NUL-terminated argument to standard
|
||
|
error, followed by a <newline>.
|
||
|
.sp
|
||
|
.LP
|
||
|
The order of the \fB-l\ y\fP and \fB-l\ l\fP operands given to \fIc99\fP
|
||
|
is
|
||
|
significant; the application shall either provide its own \fImain\fP()
|
||
|
function or ensure that \fB-l\ y\fP precedes
|
||
|
\fB-l\ l\fP.
|
||
|
.SS Debugging the Parser
|
||
|
.LP
|
||
|
The parser generated by \fIyacc\fP shall have diagnostic facilities
|
||
|
in it that can be optionally enabled at either compile time
|
||
|
or at runtime (if enabled at compile time). The compilation of the
|
||
|
runtime debugging code is under the control of YYDEBUG, a
|
||
|
preprocessor symbol. If YYDEBUG has a non-zero value, the debugging
|
||
|
code shall be included. If its value is zero, the code shall
|
||
|
not be included.
|
||
|
.LP
|
||
|
In parsers where the debugging code has been included, the external
|
||
|
\fBint\fP \fIyydebug\fP can be used to turn debugging on
|
||
|
(with a non-zero value) and off (zero value) at runtime. The initial
|
||
|
value of \fIyydebug\fP shall be zero.
|
||
|
.LP
|
||
|
When \fB-t\fP is specified, the code file shall be built such that,
|
||
|
if YYDEBUG is not already defined at compilation time
|
||
|
(using the \fIc99\fP \fB-D\fP YYDEBUG option, for example), YYDEBUG
|
||
|
shall be set explicitly
|
||
|
to 1. When \fB-t\fP is not specified, the code file shall be built
|
||
|
such that, if YYDEBUG is not already defined, it shall be set
|
||
|
explicitly to zero.
|
||
|
.LP
|
||
|
The format of the debugging output is unspecified but includes at
|
||
|
least enough information to determine the shift and reduce
|
||
|
actions, and the input symbols. It also provides information about
|
||
|
error recovery.
|
||
|
.SS Algorithms
|
||
|
.LP
|
||
|
The parser constructed by \fIyacc\fP implements an LALR(1) parsing
|
||
|
algorithm as documented in the literature. It is unspecified
|
||
|
whether the parser is table-driven or direct-coded.
|
||
|
.LP
|
||
|
A parser generated by \fIyacc\fP shall never request an input symbol
|
||
|
from \fIyylex\fP() while in a state where the only
|
||
|
actions other than the error action are reductions by a single rule.
|
||
|
.LP
|
||
|
The literature of parsing theory defines these concepts.
|
||
|
.SS Limits
|
||
|
.LP
|
||
|
The \fIyacc\fP utility may have several internal tables. The minimum
|
||
|
maximums for these tables are shown in the following
|
||
|
table. The exact meaning of these values is implementation-defined.
|
||
|
The implementation shall define the relationship between these
|
||
|
values and between them and any error messages that the implementation
|
||
|
may generate should it run out of space for any internal
|
||
|
structure. An implementation may combine groups of these resources
|
||
|
into a single pool as long as the total available to the user
|
||
|
does not fall below the sum of the sizes specified by this section.
|
||
|
.br
|
||
|
.sp
|
||
|
.ce 1
|
||
|
\fBTable: Internal Limits in \fIyacc\fP\fP
|
||
|
.TS C
|
||
|
center; l l lw(40).
|
||
|
\fB\ \fP \fBMinimum\fP T{
|
||
|
.na
|
||
|
\fB\ \fP
|
||
|
.ad
|
||
|
T}
|
||
|
\fBLimit\fP \fBMaximum\fP T{
|
||
|
.na
|
||
|
\fBDescription\fP
|
||
|
.ad
|
||
|
T}
|
||
|
{NTERMS} 126 T{
|
||
|
.na
|
||
|
Number of tokens.
|
||
|
.ad
|
||
|
T}
|
||
|
{NNONTERM} 200 T{
|
||
|
.na
|
||
|
Number of non-terminals.
|
||
|
.ad
|
||
|
T}
|
||
|
{NPROD} 300 T{
|
||
|
.na
|
||
|
Number of rules.
|
||
|
.ad
|
||
|
T}
|
||
|
{NSTATES} 600 T{
|
||
|
.na
|
||
|
Number of states.
|
||
|
.ad
|
||
|
T}
|
||
|
{MEMSIZE} 5200 T{
|
||
|
.na
|
||
|
Length of rules. The total length, in names (tokens and non-terminals), of all the rules of the grammar. The left-hand side is counted for each rule, even if it is not explicitly repeated, as specified in Grammar Rules in yacc .
|
||
|
.ad
|
||
|
T}
|
||
|
{ACTSIZE} 4000 T{
|
||
|
.na
|
||
|
Number of actions. "Actions" here (and in the description file) refer to parser actions (shift, reduce, and so on) not to semantic actions defined in Grammar Rules in yacc .
|
||
|
.ad
|
||
|
T}
|
||
|
.TE
|
||
|
.SH EXIT STATUS
|
||
|
.LP
|
||
|
The following exit values shall be returned:
|
||
|
.TP 7
|
||
|
\ 0
|
||
|
Successful completion.
|
||
|
.TP 7
|
||
|
>0
|
||
|
An error occurred.
|
||
|
.sp
|
||
|
.SH CONSEQUENCES OF ERRORS
|
||
|
.LP
|
||
|
If any errors are encountered, the run is aborted and \fIyacc\fP exits
|
||
|
with a non-zero status. Partial code files and header
|
||
|
files may be produced. The summary information in the description
|
||
|
file shall always be produced if the \fB-v\fP flag is
|
||
|
present.
|
||
|
.LP
|
||
|
\fIThe following sections are informative.\fP
|
||
|
.SH APPLICATION USAGE
|
||
|
.LP
|
||
|
Historical implementations experience name conflicts on the names
|
||
|
\fByacc.tmp\fP, \fByacc.acts\fP, \fByacc.debug\fP,
|
||
|
\fBy.tab.c\fP, \fBy.tab.h\fP, and \fBy.output\fP if more than one
|
||
|
copy of \fIyacc\fP is running in a single directory at one
|
||
|
time. The \fB-b\fP option was added to overcome this problem. The
|
||
|
related problem of allowing multiple \fIyacc\fP parsers to be
|
||
|
placed in the same file was addressed by adding a \fB-p\fP option
|
||
|
to override the previously hard-coded \fByy\fP variable
|
||
|
prefix.
|
||
|
.LP
|
||
|
The description of the \fB-p\fP option specifies the minimal set of
|
||
|
function and variable names that cause conflict when
|
||
|
multiple parsers are linked together. YYSTYPE does not need to be
|
||
|
changed. Instead, the programmer can use \fB-b\fP to give the
|
||
|
header files for different parsers different names, and then the file
|
||
|
with the \fIyylex\fP() for a given parser can include the
|
||
|
header for that parser. Names such as \fIyyclearerr\fP do not need
|
||
|
to be changed because they are used only in the actions; they
|
||
|
do not have linkage. It is possible that an implementation has other
|
||
|
names, either internal ones for implementing things such as
|
||
|
\fIyyclearerr\fP, or providing non-standard features that it wants
|
||
|
to change with \fB-p\fP.
|
||
|
.LP
|
||
|
Unary operators that are the same token as a binary operator in general
|
||
|
need their precedence adjusted. This is handled by the
|
||
|
\fB%prec\fP advisory symbol associated with the particular grammar
|
||
|
rule defining that unary operator. (See Grammar Rules in yacc .) Applications
|
||
|
are not required to use this operator for unary operators, but the
|
||
|
grammars that do not require it are rare.
|
||
|
.SH EXAMPLES
|
||
|
.LP
|
||
|
Access to the \fIyacc\fP library is obtained with library search operands
|
||
|
to \fIc99\fP. To
|
||
|
use the \fIyacc\fP library \fImain\fP():
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fBc99 y.tab.c -l y
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.LP
|
||
|
Both the \fIlex\fP library and the \fIyacc\fP library contain \fImain\fP().
|
||
|
To access the
|
||
|
\fIyacc\fP \fImain\fP():
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fBc99 y.tab.c lex.yy.c -l y -l l
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.LP
|
||
|
This ensures that the \fIyacc\fP library is searched first, so that
|
||
|
its \fImain\fP() is used.
|
||
|
.LP
|
||
|
The historical \fIyacc\fP libraries have contained two simple functions
|
||
|
that are normally coded by the application programmer.
|
||
|
These functions are similar to the following code:
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fB#include <locale.h>
|
||
|
int main(void)
|
||
|
{
|
||
|
extern int yyparse();
|
||
|
.sp
|
||
|
|
||
|
setlocale(LC_ALL, "");
|
||
|
.sp
|
||
|
|
||
|
/* If the following parser is one created by lex, the
|
||
|
application must be careful to ensure that LC_CTYPE
|
||
|
and LC_COLLATE are set to the POSIX locale. */
|
||
|
(void) yyparse();
|
||
|
return (0);
|
||
|
}
|
||
|
.sp
|
||
|
|
||
|
#include <stdio.h>
|
||
|
.sp
|
||
|
|
||
|
int yyerror(const char *msg)
|
||
|
{
|
||
|
(void) fprintf(stderr, "%s\\n", msg);
|
||
|
return (0);
|
||
|
}
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.SH RATIONALE
|
||
|
.LP
|
||
|
The references in may be helpful in constructing the parser generator.
|
||
|
The referenced DeRemer and Pennello article (along with
|
||
|
the works it references) describes a technique to generate parsers
|
||
|
that conform to this volume of IEEE\ Std\ 1003.1-2001.
|
||
|
Work in this area continues to be done, so implementors should consult
|
||
|
current literature before doing any new implementations. The
|
||
|
original Knuth article is the theoretical basis for this kind of parser,
|
||
|
but the tables it generates are impractically large for
|
||
|
reasonable grammars and should not be used. The "equivalent to" wording
|
||
|
is intentional to assure that the best tables that are
|
||
|
LALR(1) can be generated.
|
||
|
.LP
|
||
|
There has been confusion between the class of grammars, the algorithms
|
||
|
needed to generate parsers, and the algorithms needed to
|
||
|
parse the languages. They are all reasonably orthogonal. In particular,
|
||
|
a parser generator that accepts the full range of LR(1)
|
||
|
grammars need not generate a table any more complex than one that
|
||
|
accepts SLR(1) (a relatively weak class of LR grammars) for a
|
||
|
grammar that happens to be SLR(1). Such an implementation need not
|
||
|
recognize the case, either; table compression can yield the
|
||
|
SLR(1) table (or one even smaller than that) without recognizing that
|
||
|
the grammar is SLR(1). The speed of an LR(1) parser for any
|
||
|
class is dependent more upon the table representation and compression
|
||
|
(or the code generation if a direct parser is generated) than
|
||
|
upon the class of grammar that the table generator handles.
|
||
|
.LP
|
||
|
The speed of the parser generator is somewhat dependent upon the class
|
||
|
of grammar it handles. However, the original Knuth
|
||
|
article algorithms for constructing LR parsers were judged by its
|
||
|
author to be impractically slow at that time. Although full LR is
|
||
|
more complex than LALR(1), as computer speeds and algorithms improve,
|
||
|
the difference (in terms of acceptable wall-clock execution
|
||
|
time) is becoming less significant.
|
||
|
.LP
|
||
|
Potential authors are cautioned that the referenced DeRemer and Pennello
|
||
|
article previously cited identifies a bug (an
|
||
|
over-simplification of the computation of LALR(1) lookahead sets)
|
||
|
in some of the LALR(1) algorithm statements that preceded it to
|
||
|
publication. They should take the time to seek out that paper, as
|
||
|
well as current relevant work, particularly Aho's.
|
||
|
.LP
|
||
|
The \fB-b\fP option was added to provide a portable method for permitting
|
||
|
\fIyacc\fP to work on multiple separate parsers in
|
||
|
the same directory. If a directory contains more than one \fIyacc\fP
|
||
|
grammar, and both grammars are constructed at the same time
|
||
|
(by, for example, a parallel \fImake\fP program), conflict results.
|
||
|
While the solution is not
|
||
|
historical practice, it corrects a known deficiency in historical
|
||
|
implementations. Corresponding changes were made to all sections
|
||
|
that referenced the filenames \fBy.tab.c\fP (now "the code file"),
|
||
|
\fBy.tab.h\fP (now "the header file"), and \fBy.output\fP
|
||
|
(now "the description file").
|
||
|
.LP
|
||
|
The grammar for \fIyacc\fP input is based on System V documentation.
|
||
|
The textual description shows there that the \fB';'\fP
|
||
|
is required at the end of the rule. The grammar and the implementation
|
||
|
do not require this. (The use of \fBC_IDENTIFIER\fP causes
|
||
|
a reduce to occur in the right place.)
|
||
|
.LP
|
||
|
Also, in that implementation, the constructs such as \fB%token\fP
|
||
|
can be terminated by a semicolon, but this is not permitted
|
||
|
by the grammar. The keywords such as \fB%token\fP can also appear
|
||
|
in uppercase, which is again not discussed. In most places where
|
||
|
\fB'%'\fP is used, \fB'\\'\fP can be substituted, and there are alternate
|
||
|
spellings for some of the symbols (for example,
|
||
|
\fB%LEFT\fP can be \fB"%<"\fP or even \fB"\\<"\fP ).
|
||
|
.LP
|
||
|
Historically, <\fItag\fP> can contain any characters except \fB'>'\fP
|
||
|
, including white space, in the
|
||
|
implementation. However, since the \fItag\fP must reference an ISO\ C
|
||
|
standard union member, in practice conforming
|
||
|
implementations need to support only the set of characters for ISO\ C
|
||
|
standard identifiers in this context.
|
||
|
.LP
|
||
|
Some historical implementations are known to accept actions that are
|
||
|
terminated by a period. Historical implementations often
|
||
|
allow \fB'$'\fP in names. A conforming implementation does not need
|
||
|
to support either of these behaviors.
|
||
|
.LP
|
||
|
Deciding when to use \fB%prec\fP illustrates the difficulty in specifying
|
||
|
the behavior of \fIyacc\fP. There may be situations
|
||
|
in which the \fIgrammar\fP is not, strictly speaking, in error, and
|
||
|
yet \fIyacc\fP cannot interpret it unambiguously. The
|
||
|
resolution of ambiguities in the grammar can in many instances be
|
||
|
resolved by providing additional information, such as using
|
||
|
\fB%type\fP or \fB%union\fP declarations. It is often easier and it
|
||
|
usually yields a smaller parser to take this alternative when
|
||
|
it is appropriate.
|
||
|
.LP
|
||
|
The size and execution time of a program produced without the runtime
|
||
|
debugging code is usually smaller and slightly faster in
|
||
|
historical implementations.
|
||
|
.LP
|
||
|
Statistics messages from several historical implementations include
|
||
|
the following types of information:
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fIn\fP\fB/512 terminals,\fP \fIn\fP\fB/300 non-terminals
|
||
|
\fP\fIn\fP\fB/600 grammar rules,\fP \fIn\fP\fB/1500 states
|
||
|
\fP\fIn\fP \fBshift/reduce,\fP \fIn\fP \fBreduce/reduce conflicts reported
|
||
|
\fP\fIn\fP\fB/350 working sets used
|
||
|
Memory: states, etc.\fP \fIn\fP\fB/15000, parser\fP \fIn\fP\fB/15000
|
||
|
\fP\fIn\fP\fB/600 distinct lookahead sets
|
||
|
\fP\fIn\fP \fBextra closures
|
||
|
\fP\fIn\fP \fBshift entries,\fP \fIn\fP \fBexceptions
|
||
|
\fP\fIn\fP \fBgoto entries
|
||
|
\fP\fIn\fP \fBentries saved by goto default
|
||
|
Optimizer space used: input\fP \fIn\fP\fB/15000, output\fP \fIn\fP\fB/15000
|
||
|
\fP\fIn\fP \fBtable entries,\fP \fIn\fP \fBzero
|
||
|
Maximum spread:\fP \fIn\fP\fB, Maximum offset:\fP \fIn\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.LP
|
||
|
The report of internal tables in the description file is left implementation-defined
|
||
|
because all aspects of these limits are
|
||
|
also implementation-defined. Some implementations may use dynamic
|
||
|
allocation techniques and have no specific limit values to
|
||
|
report.
|
||
|
.LP
|
||
|
The format of the \fBy.output\fP file is not given because specification
|
||
|
of the format was not seen to enhance applications
|
||
|
portability. The listing is primarily intended to help human users
|
||
|
understand and debug the parser; use of \fBy.output\fP by a
|
||
|
conforming application script would be unusual. Furthermore, implementations
|
||
|
have not produced consistent output and no popular
|
||
|
format was apparent. The format selected by the implementation should
|
||
|
be human-readable, in addition to the requirement that it be
|
||
|
a text file.
|
||
|
.LP
|
||
|
Standard error reports are not specifically described because they
|
||
|
are seldom of use to conforming applications and there was no
|
||
|
reason to restrict implementations.
|
||
|
.LP
|
||
|
Some implementations recognize \fB"={"\fP as equivalent to \fB'{'\fP
|
||
|
because it appears in historical documentation. This
|
||
|
construction was recognized and documented as obsolete as long ago
|
||
|
as 1978, in the referenced \fIYacc: Yet Another
|
||
|
Compiler-Compiler\fP. This volume of IEEE\ Std\ 1003.1-2001 chose
|
||
|
to leave it as obsolete and omit it.
|
||
|
.LP
|
||
|
Multi-byte characters should be recognized by the lexical analyzer
|
||
|
and returned as tokens. They should not be returned as
|
||
|
multi-byte character literals. The token \fBerror\fP that is used
|
||
|
for error recovery is normally assigned the value 256 in the
|
||
|
historical implementation. Thus, the token value 256, which is used
|
||
|
in many multi-byte character sets, is not available for use as
|
||
|
the value of a user-defined token.
|
||
|
.SH FUTURE DIRECTIONS
|
||
|
.LP
|
||
|
None.
|
||
|
.SH SEE ALSO
|
||
|
.LP
|
||
|
\fIc99\fP , \fIlex\fP
|
||
|
.SH COPYRIGHT
|
||
|
Portions of this text are reprinted and reproduced in electronic form
|
||
|
from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology
|
||
|
-- Portable Operating System Interface (POSIX), The Open Group Base
|
||
|
Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of
|
||
|
Electrical and Electronics Engineers, Inc and The Open Group. In the
|
||
|
event of any discrepancy between this version and the original IEEE and
|
||
|
The Open Group Standard, the original IEEE and The Open Group Standard
|
||
|
is the referee document. The original Standard can be obtained online at
|
||
|
http://www.opengroup.org/unix/online.html .
|