mirror of https://github.com/mkerrisk/man-pages
2985 lines
100 KiB
Groff
.\" Copyright (c) 2001-2003 The Open Group, All Rights Reserved
.TH "AWK" P 2003 "IEEE/The Open Group" "POSIX Programmer's Manual"
.\" awk
.SH NAME
awk \- pattern scanning and processing language
.SH SYNOPSIS
.LP
\fBawk\fP \fB[\fP\fB-F\fP \fIERE\fP\fB][\fP\fB-v\fP \fIassignment\fP\fB]\fP
\fB\&...\fP \fIprogram\fP
\fB[\fP\fIargument\fP \fB...\fP\fB]\fP\fB
.br
.sp
awk\fP \fB[\fP\fB-F\fP \fIERE\fP\fB]\fP \fB-f\fP \fIprogfile\fP \fB...\fP
\fB[\fP\fB-v\fP
\fIassignment\fP\fB]\fP \fB...\fP\fB[\fP\fIargument\fP \fB...\fP\fB]\fP\fB
.br
\fP
.SH DESCRIPTION
.LP
The \fIawk\fP utility shall execute programs written in the \fIawk\fP
programming language, which is specialized for textual
data manipulation. An \fIawk\fP program is a sequence of patterns
and corresponding actions. When input is read that matches a
pattern, the action associated with that pattern is carried out.
.LP
Input shall be interpreted as a sequence of records. By default, a
record is a line, less its terminating <newline>, but
this can be changed by using the \fBRS\fP built-in variable. Each
record of input shall be matched in turn against each pattern in
the program. For each pattern matched, the associated action shall
be executed.
.LP
The \fIawk\fP utility shall interpret each input record as a sequence
of fields where, by default, a field is a string of
non-<blank>s. This default white-space field delimiter can be changed
by using the \fBFS\fP built-in variable or \fB-F\fP
\fIERE\fP. The \fIawk\fP utility shall denote the first field in a
record $1, the second $2, and so on. The symbol $0 shall refer
to the entire record; setting any other field causes the re-evaluation
of $0. Assigning to $0 shall reset the values of all other
fields and the \fBNF\fP built-in variable.
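.LP
The field rules above can be illustrated with a minimal sketch. It is not
part of the standard text; the shell function name \fIshow_fields\fP and the
sample record are assumptions chosen for illustration.
.sp
.RS
.nf
```shell
# Hypothetical helper: feed one sample record to awk and inspect its fields.
show_fields() {
    echo "alpha beta  gamma" |
        awk '{ print NF; print $2; $2 = "X"; print $0 }'
}

show_fields
# 3
# beta
# alpha X gamma
```
.fi
.RE
.LP
Assigning to $2 causes $0 to be rebuilt, which is why the doubled space
of the input no longer appears in the last line of output.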
.SH OPTIONS
.LP
The \fIawk\fP utility shall conform to the Base Definitions volume
of IEEE\ Std\ 1003.1-2001, Section 12.2, Utility Syntax Guidelines.
.LP
The following options shall be supported:
.TP 7
\fB-F\ \fP \fIERE\fP
Define the input field separator to be the extended regular expression
\fIERE\fP, before any input is read; see Regular Expressions.
.TP 7
\fB-f\ \fP \fIprogfile\fP
Specify the pathname of the file \fIprogfile\fP containing an \fIawk\fP
program. If multiple instances of this option are
specified, the concatenation of the files specified as \fIprogfile\fP
in the order specified shall be the \fIawk\fP program. The
\fIawk\fP program can alternatively be specified in the command line
as a single argument.
.TP 7
\fB-v\ \fP \fIassignment\fP
The application shall ensure that the \fIassignment\fP argument is
in the same form as an \fIassignment\fP operand. The specified
variable assignment shall occur prior to executing the \fIawk\fP program,
including the actions associated with \fBBEGIN\fP
patterns (if any). Multiple occurrences of this option can be specified.
.sp
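.LP
As a hedged illustration of the \fB-F\fP and \fB-v\fP options described
above, the following sketch uses two hypothetical shell functions,
\fIfirst_fields\fP and \fIbegin_sees_v\fP; the sample records and the
variable names \fIprefix\fP and \fIn\fP are assumptions, not part of the
standard.
.sp
.RS
.nf
```shell
# Hypothetical helper: split records on ":" and prepend a -v variable to field 1.
first_fields() {
    { echo "root:x:0"; echo "daemon:x:1"; } |
        awk -F : -v prefix="user=" '{ print prefix $1 }'
}

# A -v assignment is already in effect when BEGIN actions run.
begin_sees_v() {
    awk -v n=3 'BEGIN { print n + 1 }'
}

first_fields
# user=root
# user=daemon
begin_sees_v
# 4
```
.fi
.RE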
.SH OPERANDS
.LP
The following operands shall be supported:
.TP 7
\fIprogram\fP
If no \fB-f\fP option is specified, the first operand to \fIawk\fP
shall be the text of the \fIawk\fP program. The
application shall supply the \fIprogram\fP operand as a single argument
to \fIawk\fP. If the text does not end in a
<newline>, \fIawk\fP shall interpret the text as if it did.
.TP 7
\fIargument\fP
Either of the following two types of \fIargument\fP can be intermixed:
.TP 7
\fIfile\fP
.RS
A pathname of a file that contains the input to be read, which is
matched against the set of patterns in the program. If no
\fIfile\fP operands are specified, or if a \fIfile\fP operand is \fB'-'\fP,
the standard input shall be used.
.RE
.TP 7
\fIassignment\fP
.RS
An operand that begins with an underscore or alphabetic character
from the portable character set (see the table in the Base
Definitions volume of IEEE\ Std\ 1003.1-2001, Section 6.1, Portable
Character Set), followed by a sequence of underscores, digits, and
alphabetics from the portable character set, followed by the
\fB'='\fP character, shall specify a variable assignment rather than
a pathname. The characters before the \fB'='\fP
represent the name of an \fIawk\fP variable; if that name is an \fIawk\fP
reserved word (see Grammar) the behavior is undefined. The characters
following the equal sign shall be interpreted as if they
appeared in the \fIawk\fP program preceded and followed by a double-quote
(\fB'"'\fP) character, as a \fBSTRING\fP token (see
Grammar), except that if the last character is an unescaped backslash,
it shall be interpreted as a
literal backslash rather than as the first character of the sequence
\fB"\\""\fP. The variable shall be assigned the value of
that \fBSTRING\fP token and, if appropriate, shall be considered a
\fInumeric string\fP (see Expressions in awk), the variable shall
also be assigned its numeric value. Each such variable assignment
shall occur just prior to the processing of the following \fIfile\fP,
if any. Thus, an assignment before the first \fIfile\fP
argument shall be executed after the \fBBEGIN\fP actions (if any),
while an assignment after the last \fIfile\fP argument shall
occur before the \fBEND\fP actions (if any). If there are no \fIfile\fP
arguments, assignments shall be executed before
processing the standard input.
.RE
.sp
.sp
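.LP
The timing of \fIassignment\fP operands described above can be sketched as
follows. The function name \fIassignment_timing\fP, the variable \fIx\fP,
and the single input line are illustrative assumptions; the operand
\fB'-'\fP stands for the standard input.
.sp
.RS
.nf
```shell
# Hypothetical helper: x is unset in BEGIN, 1 while standard input ("-")
# is read, and 2 by the time the END action runs.
assignment_timing() {
    echo "line" |
        awk 'BEGIN { print "BEGIN:" x }
                   { print "main:" x }
             END   { print "END:" x }' x=1 - x=2
}

assignment_timing
# BEGIN:
# main:1
# END:2
```
.fi
.RE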
.SH STDIN
.LP
The standard input shall be used only if no \fIfile\fP operands are
specified, or if a \fIfile\fP operand is \fB'-'\fP;
see the INPUT FILES section. If the \fIawk\fP program contains no
actions and no patterns, but is otherwise a valid \fIawk\fP
program, standard input and any \fIfile\fP operands shall not be read
and \fIawk\fP shall exit with a return status of zero.
.SH INPUT FILES
.LP
Input files to the \fIawk\fP program from any of the following sources
shall be text files:
.IP " *" 3
Any \fIfile\fP operands or their equivalents, achieved by modifying
the \fIawk\fP variables \fBARGV\fP and \fBARGC\fP
.LP
.IP " *" 3
Standard input in the absence of any \fIfile\fP operands
.LP
.IP " *" 3
Arguments to the \fBgetline\fP function
.LP
.LP
Whether the variable \fBRS\fP is set to a value other than a <newline>
or not, for these files, implementations shall
support records terminated with the specified separator up to {LINE_MAX}
bytes and may support longer records.
.LP
If \fB-f\fP \fIprogfile\fP is specified, the application shall ensure
that the files named by each of the \fIprogfile\fP
option-arguments are text files and their concatenation, in the same
order as they appear in the arguments, is an \fIawk\fP
program.
.SH ENVIRONMENT VARIABLES
.LP
The following environment variables shall affect the execution of
\fIawk\fP:
.TP 7
\fILANG\fP
Provide a default value for the internationalization variables that
are unset or null. (See the Base Definitions volume of
IEEE\ Std\ 1003.1-2001, Section 8.2, Internationalization Variables for
the precedence of internationalization variables used to determine
the values of locale categories.)
.TP 7
\fILC_ALL\fP
If set to a non-empty string value, override the values of all the
other internationalization variables.
.TP 7
\fILC_COLLATE\fP
Determine the locale for the behavior of ranges, equivalence classes,
and multi-character collating elements within regular
expressions and in comparisons of string values.
.TP 7
\fILC_CTYPE\fP
Determine the locale for the interpretation of sequences of bytes
of text data as characters (for example, single-byte as
opposed to multi-byte characters in arguments and input files), the
behavior of character classes within regular expressions, the
identification of characters as letters, and the mapping of uppercase
and lowercase characters for the \fBtoupper\fP and
\fBtolower\fP functions.
.TP 7
\fILC_MESSAGES\fP
Determine the locale that should be used to affect the format and
contents of diagnostic messages written to standard
error.
.TP 7
\fILC_NUMERIC\fP
Determine the radix character used when interpreting numeric input,
performing conversions between numeric and string values, and
formatting numeric output. Regardless of locale, the period character
(the decimal-point character of the POSIX locale) is the
decimal-point character recognized in processing \fIawk\fP programs
(including assignments in command line arguments).
.TP 7
\fINLSPATH\fP
Determine the location of message catalogs for the processing of \fILC_MESSAGES\fP.
.TP 7
\fIPATH\fP
Determine the search path when looking for commands executed by \fIsystem\fP(\fIexpr\fP),
or input and output pipes; see the
Base Definitions volume of IEEE\ Std\ 1003.1-2001, Chapter 8, Environment
Variables.
.sp
.LP
In addition, all environment variables shall be visible via the \fIawk\fP
variable \fBENVIRON\fP.
.SH ASYNCHRONOUS EVENTS
.LP
Default.
.SH STDOUT
.LP
The nature of the output files depends on the \fIawk\fP program.
.SH STDERR
.LP
The standard error shall be used only for diagnostic messages.
.SH OUTPUT FILES
.LP
The nature of the output files depends on the \fIawk\fP program.
.SH EXTENDED DESCRIPTION
.SS Overall Program Structure
.LP
An \fIawk\fP program is composed of pairs of the form:
.sp
.RS
.nf

\fIpattern\fP \fB{\fP \fIaction\fP \fB}
\fP
.fi
.RE
.LP
Either the pattern or the action (including the enclosing brace characters)
can be omitted.
.LP
A missing pattern shall match any record of input, and a missing action
shall be equivalent to:
.sp
.RS
.nf

\fB{ print }
\fP
.fi
.RE
.LP
Execution of the \fIawk\fP program shall start by first executing
the actions associated with all \fBBEGIN\fP patterns in the
order they occur in the program. Then each \fIfile\fP operand (or
standard input if no files were specified) shall be processed in
turn by reading data from the file until a record separator is seen
(<newline> by default). Before the first reference to a
field in the record is evaluated, the record shall be split into fields,
according to the rules in Regular Expressions, using the value of
\fBFS\fP that was current at the time the record was read. Each
pattern in the program then shall be evaluated in the order of occurrence,
and the action associated with each pattern that matches
the current record executed. The action for a matching pattern shall
be executed before evaluating subsequent patterns. Finally,
the actions associated with all \fBEND\fP patterns shall be executed
in the order they occur in the program.
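.LP
A minimal sketch of this execution order follows. It is an illustration
only; the function name \fIrun_order\fP, the counter \fIn\fP, and the two
input records are assumptions. The second rule has a pattern and no
action, and the third has an action and no pattern.
.sp
.RS
.nf
```shell
# Hypothetical helper: BEGIN runs first, then each record is matched against
# the rules in program order, then END runs.
run_order() {
    { echo "one"; echo "two"; } |
        awk 'BEGIN { print "start" }
             /two/
             { n++ }
             END { print n " records" }'
}

run_order
# start
# two
# 2 records
```
.fi
.RE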
.SS Expressions in awk
.LP
Expressions describe computations used in \fIpatterns\fP and \fIactions\fP.
In the following table, valid expression
operations are given in groups from highest precedence first to lowest
precedence last, with equal-precedence operators grouped
between horizontal lines. In expression evaluation, where the grammar
is formally ambiguous, higher precedence operators shall be
evaluated before lower precedence operators. In this table \fIexpr\fP,
\fIexpr1\fP, \fIexpr2\fP, and \fIexpr3\fP represent any
expression, while lvalue represents any entity that can be assigned
to (that is, on the left side of an assignment operator). The
precise syntax of expressions is given in Grammar.
.sp
.ce 1
\fBTable: Expressions in Decreasing Precedence in \fIawk\fP\fP
.TS C
center tab(@); l1 l1 l1 l.
\fBSyntax\fP@\fBName\fP@\fBType of Result\fP@\fBAssociativity\fP
( \fIexpr\fP )@Grouping@Type of \fIexpr\fP@N/A
$\fIexpr\fP@Field reference@String@N/A
++ lvalue@Pre-increment@Numeric@N/A
-- lvalue@Pre-decrement@Numeric@N/A
lvalue ++@Post-increment@Numeric@N/A
lvalue --@Post-decrement@Numeric@N/A
\fIexpr\fP ^ \fIexpr\fP@Exponentiation@Numeric@Right
! \fIexpr\fP@Logical not@Numeric@N/A
+ \fIexpr\fP@Unary plus@Numeric@N/A
- \fIexpr\fP@Unary minus@Numeric@N/A
\fIexpr\fP * \fIexpr\fP@Multiplication@Numeric@Left
\fIexpr\fP / \fIexpr\fP@Division@Numeric@Left
\fIexpr\fP % \fIexpr\fP@Modulus@Numeric@Left
\fIexpr\fP + \fIexpr\fP@Addition@Numeric@Left
\fIexpr\fP - \fIexpr\fP@Subtraction@Numeric@Left
\fIexpr\fP \fIexpr\fP@String concatenation@String@Left
\fIexpr\fP < \fIexpr\fP@Less than@Numeric@None
\fIexpr\fP <= \fIexpr\fP@Less than or equal to@Numeric@None
\fIexpr\fP != \fIexpr\fP@Not equal to@Numeric@None
\fIexpr\fP == \fIexpr\fP@Equal to@Numeric@None
\fIexpr\fP > \fIexpr\fP@Greater than@Numeric@None
\fIexpr\fP >= \fIexpr\fP@Greater than or equal to@Numeric@None
\fIexpr\fP ~ \fIexpr\fP@ERE match@Numeric@None
\fIexpr\fP !~ \fIexpr\fP@ERE non-match@Numeric@None
\fIexpr\fP in array@Array membership@Numeric@Left
( \fIindex\fP ) in \fIarray\fP@Multi-dimension array@Numeric@Left
@membership@@
\fIexpr\fP && \fIexpr\fP@Logical AND@Numeric@Left
\fIexpr\fP || \fIexpr\fP@Logical OR@Numeric@Left
\fIexpr1\fP ? \fIexpr2\fP : \fIexpr3\fP@Conditional expression@Type of selected@Right
@@\fIexpr2\fP or \fIexpr3\fP@
lvalue ^= \fIexpr\fP@Exponentiation assignment@Numeric@Right
lvalue %= \fIexpr\fP@Modulus assignment@Numeric@Right
lvalue *= \fIexpr\fP@Multiplication assignment@Numeric@Right
lvalue /= \fIexpr\fP@Division assignment@Numeric@Right
lvalue += \fIexpr\fP@Addition assignment@Numeric@Right
lvalue -= \fIexpr\fP@Subtraction assignment@Numeric@Right
lvalue = \fIexpr\fP@Assignment@Type of \fIexpr\fP@Right
.TE
.LP
Each expression shall have either a string value, a numeric value,
or both. Except as stated for specific contexts, the value of
an expression shall be implicitly converted to the type needed for
the context in which it is used. A string value shall be
converted to a numeric value by the equivalent of the following calls
to functions defined by the ISO\ C standard:
.sp
.RS
.nf

\fBsetlocale(LC_NUMERIC, "");
\fP\fInumeric_value\fP \fB= atof(\fP\fIstring_value\fP\fB);
\fP
.fi
.RE
.LP
A numeric value that is exactly equal to the value of an integer (see
\fIConcepts Derived from the ISO C Standard\fP) shall be converted to a string by the
equivalent of a call to the \fBsprintf\fP function (see String Functions)
with the string \fB"%d"\fP as the \fIfmt\fP argument and the numeric
value being
converted as the first and only \fIexpr\fP argument. Any other numeric
value shall be converted to a string by the equivalent of a
call to the \fBsprintf\fP function with the value of the variable
\fBCONVFMT\fP as the \fIfmt\fP argument and the numeric value
being converted as the first and only \fIexpr\fP argument. The result
of the conversion is unspecified if the value of
\fBCONVFMT\fP is not a floating-point format specification. This volume
of IEEE\ Std\ 1003.1-2001 specifies no explicit
conversions between numbers and strings. An application can force
an expression to be treated as a number by adding zero to it, or
can force it to be treated as a string by concatenating the null string
( \fB""\fP ) to it.
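.LP
The conversion rules above can be sketched as follows. This is an
illustration under stated assumptions: the function name
\fIconvert_demo\fP, the variable names, and the sample values are not part
of the standard, and the decimal point shown assumes the POSIX locale.
.sp
.RS
.nf
```shell
# Hypothetical helper: force string and numeric conversions explicitly.
convert_demo() {
    awk 'BEGIN {
        CONVFMT = "%.2f"
        x = 3.14159; i = 17
        s = x ""          # non-integer value: converted with CONVFMT
        t = i ""          # integer value: converted as if by "%d"
        n = "3.5abc" + 0  # string to number, as if by atof()
        print s; print t; print n
    }'
}

convert_demo
# 3.14
# 17
# 3.5
```
.fi
.RE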
.LP
A string value shall be considered a \fInumeric string\fP if it comes
from one of the following:
.IP " 1." 4
Field variables
.LP
.IP " 2." 4
Input from the \fIgetline\fP() function
.LP
.IP " 3." 4
\fBFILENAME\fP
.LP
.IP " 4." 4
\fBARGV\fP array elements
.LP
.IP " 5." 4
\fBENVIRON\fP array elements
.LP
.IP " 6." 4
Array elements created by the \fIsplit\fP() function
.LP
.IP " 7." 4
A command line variable assignment
.LP
.IP " 8." 4
Variable assignment from another numeric string variable
.LP
.LP
and after all the following conversions have been applied, the resulting
string would lexically be recognized as a \fBNUMBER\fP
token as described by the lexical conventions in Grammar:
.IP " *" 3
All leading and trailing <blank>s are discarded.
.LP
.IP " *" 3
If the first non-<blank> is \fB'+'\fP or \fB'-'\fP, it is discarded.
.LP
.IP " *" 3
Each occurrence of the decimal-point character from the current
locale is changed to a period.
.LP
.LP
If a \fB'-'\fP character is ignored in the preceding description,
the numeric value of the \fInumeric string\fP shall be the
negation of the numeric value of the recognized \fBNUMBER\fP token.
Otherwise, the numeric value of the \fInumeric string\fP
shall be the numeric value of the recognized \fBNUMBER\fP token. Whether
or not a string is a \fInumeric string\fP shall be
relevant only in contexts where that term is used in this section.
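.LP
As a hedged sketch of the numeric string rules, the following example
compares three fields against numeric constants. The function name
\fInumeric_strings\fP and the sample record are assumptions made for
illustration.
.sp
.RS
.nf
```shell
# Hypothetical helper: "+5" and "5.0" are numeric strings, "abc" is not,
# so only the first two comparisons are made numerically.
numeric_strings() {
    echo " +5 5.0 abc" |
        awk '{ print ($1 == 5), ($2 == 5), ($3 == 0) }'
}

numeric_strings
# 1 1 0
```
.fi
.RE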
.LP
When an expression is used in a Boolean context, if it has a numeric
value, a value of zero shall be treated as false and any
other value shall be treated as true. Otherwise, a string value of
the null string shall be treated as false and any other value
shall be treated as true. A Boolean context shall be one of the following:
.IP " *" 3
The first subexpression of a conditional expression
.LP
.IP " *" 3
An expression operated on by logical NOT, logical AND, or logical OR
.LP
.IP " *" 3
The second expression of a \fBfor\fP statement
.LP
.IP " *" 3
The expression of an \fBif\fP statement
.LP
.IP " *" 3
The expression of the \fBwhile\fP clause in either a \fBwhile\fP or
\fBdo\fP... \fBwhile\fP statement
.LP
.IP " *" 3
An expression used as a pattern (as in Overall Program Structure)
.LP
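.LP
The Boolean rules above can be sketched with the conditional operator,
whose first subexpression is a Boolean context. The function name
\fItruth_table\fP and the variable \fIu\fP, which is never assigned, are
illustrative assumptions.
.sp
.RS
.nf
```shell
# Hypothetical helper: evaluate four values in a Boolean context.
truth_table() {
    awk 'BEGIN {
        print (0   ? "true" : "false")   # numeric zero
        print (""  ? "true" : "false")   # null string
        print ("0" ? "true" : "false")   # non-null string constant
        print (u   ? "true" : "false")   # uninitialized variable
    }'
}

truth_table
# false
# false
# true
# false
```
.fi
.RE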
.LP
All arithmetic shall follow the semantics of floating-point arithmetic
as specified by the ISO\ C standard (see \fIConcepts Derived from
the ISO C Standard\fP).
.LP
The value of the expression:
.sp
.RS
.nf

\fIexpr1\fP \fB^\fP \fIexpr2\fP
.fi
.RE
.LP
shall be equivalent to the value returned by the ISO\ C standard function
call:
.sp
.RS
.nf

\fBpow(\fP\fIexpr1\fP\fB,\fP \fIexpr2\fP\fB)
\fP
.fi
.RE
.LP
The expression:
.sp
.RS
.nf

\fBlvalue ^=\fP \fIexpr\fP
.fi
.RE
.LP
shall be equivalent to the ISO\ C standard expression:
.sp
.RS
.nf

\fBlvalue = pow(lvalue,\fP \fIexpr\fP\fB)
\fP
.fi
.RE
.LP
except that lvalue shall be evaluated only once. The value of the
expression:
.sp
.RS
.nf

\fIexpr1\fP \fB%\fP \fIexpr2\fP
.fi
.RE
.LP
shall be equivalent to the value returned by the ISO\ C standard function
call:
.sp
.RS
.nf

\fBfmod(\fP\fIexpr1\fP\fB,\fP \fIexpr2\fP\fB)
\fP
.fi
.RE
.LP
The expression:
.sp
.RS
.nf

\fBlvalue %=\fP \fIexpr\fP
.fi
.RE
.LP
shall be equivalent to the ISO\ C standard expression:
.sp
.RS
.nf

\fBlvalue = fmod(lvalue,\fP \fIexpr\fP\fB)
\fP
.fi
.RE
.LP
except that lvalue shall be evaluated only once.
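.LP
A minimal sketch of the exponentiation and modulus operators follows; the
function name \fIpow_and_mod\fP and the operand values are assumptions
chosen so that the results are exactly representable.
.sp
.RS
.nf
```shell
# Hypothetical helper: exercise ^, %, and ^= with small operands.
pow_and_mod() {
    awk 'BEGIN {
        print 2 ^ 10       # pow(2, 10)
        print 2 ^ 3 ^ 2    # right associative: 2 ^ (3 ^ 2)
        print 7 % 3        # fmod(7, 3)
        print 5.5 % 2      # fmod(5.5, 2)
        x = 3; x ^= 2      # x = pow(x, 2)
        print x
    }'
}

pow_and_mod
# 1024
# 512
# 1
# 1.5
# 9
```
.fi
.RE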
.LP
Variables and fields shall be set by the assignment statement:
.sp
.RS
.nf

\fBlvalue =\fP \fIexpression\fP
.fi
.RE
.LP
and the type of \fIexpression\fP shall determine the resulting variable
type. The assignment includes the arithmetic
assignments (\fB"+="\fP, \fB"-="\fP, \fB"*="\fP, \fB"/="\fP,
\fB"%="\fP, \fB"^="\fP, \fB"++"\fP,
\fB"--"\fP) all of which shall produce a numeric result. The left-hand
side of an assignment and the target of increment and
decrement operators can be one of a variable, an array with index,
or a field selector.
.LP
The \fIawk\fP language supplies arrays that are used for storing numbers
or strings. Arrays need not be declared. They shall
initially be empty, and their sizes shall change dynamically. The
subscripts, or element identifiers, are strings, providing a type
of associative array capability. An array name followed by a subscript
within square brackets can be used as an lvalue and thus as
an expression, as described in the grammar; see Grammar. Unsubscripted
array names can be used in
only the following contexts:
.IP " *" 3
A parameter in a function definition or function call
.LP
.IP " *" 3
The \fBNAME\fP token following any use of the keyword \fBin\fP as
specified in the grammar (see Grammar); if the name used in this
context is not an array name, the behavior is undefined
.LP
.LP
A valid array \fIindex\fP shall consist of one or more comma-separated
expressions, similar to the way in which
multi-dimensional arrays are indexed in some programming languages.
Because \fIawk\fP arrays are really one-dimensional, such a
comma-separated list shall be converted to a single string by concatenating
the string values of the separate expressions, each
separated from the other by the value of the \fBSUBSEP\fP variable.
Thus, the following two index operations shall be
equivalent:
.sp
.RS
.nf

\fIvar\fP\fB[\fP\fIexpr1\fP\fB,\fP \fIexpr2\fP\fB, ...\fP \fIexprn\fP\fB]
.sp

\fP\fIvar\fP\fB[\fP\fIexpr1\fP \fBSUBSEP\fP \fIexpr2\fP \fBSUBSEP ... SUBSEP\fP \fIexprn\fP\fB]\fP
.fi
.RE
.LP
The application shall ensure that a multi-dimensioned \fIindex\fP
used with the \fBin\fP operator is parenthesized. The
\fBin\fP operator, which tests for the existence of a particular array
element, shall not cause that element to exist. Any other
reference to a nonexistent array element shall automatically create
it.
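.LP
The array rules above can be sketched as follows. The function name
\fIsubsep_demo\fP, the array \fIa\fP, and its subscripts are assumptions
made for illustration only.
.sp
.RS
.nf
```shell
# Hypothetical helper: show SUBSEP indexing and the side effects of references.
subsep_demo() {
    awk 'BEGIN {
        a["x", "y"] = 1
        key = "x" SUBSEP "y"
        print ((key in a) ? "same element" : "different element")
        print ((("x", "y") in a) ? "present" : "absent")
        print (("z" in a) ? "present" : "absent")   # the test creates nothing
        n = 0; for (k in a) n++; print n
        v = a["z"]                                  # a plain reference creates a["z"]
        n = 0; for (k in a) n++; print n
    }'
}

subsep_demo
# same element
# present
# absent
# 1
# 2
```
.fi
.RE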
.LP
Comparisons (with the \fB'<'\fP, \fB"<="\fP, \fB"!="\fP, \fB"=="\fP,
\fB'>'\fP, and
\fB">="\fP operators) shall be made numerically if both operands are
numeric, if one is numeric and the other has a string
value that is a numeric string, or if one is numeric and the other
has the uninitialized value. Otherwise, operands shall be
converted to strings as required and a string comparison shall be
made using the locale-specific collation sequence. The value of
the comparison expression shall be 1 if the relation is true, or 0
if the relation is false.
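.LP
A hedged sketch of numeric versus string comparison follows. The function
name \fIcompare_demo\fP and the sample record are assumptions, and the
string result assumes a collation sequence in which the character 1 sorts
before 9, as in the POSIX locale.
.sp
.RS
.nf
```shell
# Hypothetical helper: numeric constants, string constants, and two fields
# that are numeric strings.
compare_demo() {
    echo "10 9" |
        awk '{ print (10 < 9), ("10" < "9"), ($1 < $2) }'
}

compare_demo
# 0 1 0
```
.fi
.RE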
.SS Variables and Special Variables
.LP
Variables can be used in an \fIawk\fP program by referencing them.
With the exception of function parameters (see User-Defined Functions),
they are not explicitly declared. Function parameter names shall
be local to the
function; all other variable names shall be global. The same name
shall not be used as both a function parameter name and as the
name of a function or a special \fIawk\fP variable. The same name
shall not be used both as a variable name with global scope and
as the name of a function. The same name shall not be used within
the same scope both as a scalar variable and as an array.
Uninitialized variables, including scalar variables, array elements,
and field variables, shall have an uninitialized value. An
uninitialized value shall have both a numeric value of zero and a
string value of the empty string. Evaluation of variables with an
uninitialized value, to either string or numeric, shall be determined
by the context in which they are used.
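.LP
The dual nature of the uninitialized value can be sketched as follows; the
function name \fIuninit_demo\fP and the variable \fIu\fP, which is never
assigned, are illustrative assumptions.
.sp
.RS
.nf
```shell
# Hypothetical helper: use an unassigned variable in three contexts.
uninit_demo() {
    awk 'BEGIN {
        print u + 0            # numeric context
        print "[" u "]"        # string context
        print (u == 0), (u == "")
    }'
}

uninit_demo
# 0
# []
# 1 1
```
.fi
.RE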
.LP
Field variables shall be designated by a \fB'$'\fP followed by a number
or numerical expression. The effect of the field
number \fIexpression\fP evaluating to anything other than a non-negative
integer is unspecified; uninitialized variables or string
values need not be converted to numeric values in this context. New
field variables can be created by assigning a value to them.
References to nonexistent fields (that is, fields after $\fBNF\fP),
shall evaluate to the uninitialized value. Such references
shall not create new fields. However, assigning to a nonexistent field
(for example, $(\fBNF\fP+2)=5) shall increase the value of
\fBNF\fP; create any intervening fields with the uninitialized value;
and cause the value of $0 to be recomputed, with the fields
being separated by the value of \fBOFS\fP. Each field variable shall
have a string value or an uninitialized value when created.
Field variables shall have the uninitialized value when created from
$0 using \fBFS\fP and the variable does not contain any
characters. If appropriate, the field variable shall be considered
a numeric string (see Expressions in
awk ).
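.LP
As a minimal sketch of the field rules above, the following example first
references a nonexistent field and then assigns to one. The function name
\fIgrow_record\fP, the sample record, and the choice of \fB"-"\fP for
\fBOFS\fP (made so that the empty intervening field is visible) are
assumptions.
.sp
.RS
.nf
```shell
# Hypothetical helper: a reference to $7 leaves NF alone, while assigning
# to $(NF + 2) grows the record and rebuilds $0 with OFS.
grow_record() {
    echo "a b c" |
        awk 'BEGIN { OFS = "-" }
             { x = $7; print NF; $(NF + 2) = "e"; print NF; print $0 }'
}

grow_record
# 3
# 5
# a-b-c--e
```
.fi
.RE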
.LP
Implementations shall support the following other special variables
that are set by \fIawk\fP:
.TP 7
\fBARGC\fP
The number of elements in the \fBARGV\fP array.
.TP 7
\fBARGV\fP
An array of command line arguments, excluding options and the \fIprogram\fP
argument, numbered from zero to \fBARGC\fP-1.
.LP
The arguments in \fBARGV\fP can be modified or added to; \fBARGC\fP
can be altered. As each input file ends, \fIawk\fP shall
treat the next non-null element of \fBARGV\fP, up to the current value
of \fBARGC\fP-1, inclusive, as the name of the next input
file. Thus, setting an element of \fBARGV\fP to null means that it
shall not be treated as an input file. The name \fB'-'\fP
indicates the standard input. If an argument matches the format of
an \fIassignment\fP operand, this argument shall be treated as
an \fIassignment\fP rather than a \fIfile\fP argument.
.TP 7
\fBCONVFMT\fP
The \fBprintf\fP format for converting numbers to strings (except
for output statements, where \fBOFMT\fP is used);
\fB"%.6g"\fP by default.
.TP 7
\fBENVIRON\fP
An array representing the value of the environment, as described in
the \fIexec\fP functions defined in the System Interfaces
volume of IEEE\ Std\ 1003.1-2001. The indices of the array shall be
strings consisting of the names of the environment
variables, and the value of each array element shall be a string consisting
of the value of that variable. If appropriate, the
environment variable shall be considered a \fInumeric string\fP (see
Expressions in awk); the
array element shall also have its numeric value.
.LP
In all cases where the behavior of \fIawk\fP is affected by environment
variables (including the environment of any commands
that \fIawk\fP executes via the \fBsystem\fP function or via pipeline
redirections with the \fBprint\fP statement, the
\fBprintf\fP statement, or the \fBgetline\fP function), the environment
used shall be the environment at the time \fIawk\fP
began executing; it is implementation-defined whether any modification
of \fBENVIRON\fP affects this environment.
.TP 7
\fBFILENAME\fP
A pathname of the current input file. Inside a \fBBEGIN\fP action
the value is undefined. Inside an \fBEND\fP action the
value shall be the name of the last input file processed.
.TP 7
\fBFNR\fP
The ordinal number of the current record in the current file. Inside
a \fBBEGIN\fP action the value shall be zero. Inside an
\fBEND\fP action the value shall be the number of the last record
processed in the last file processed.
.TP 7
\fBFS\fP
Input field separator regular expression; a <space> by default.
.TP 7
\fBNF\fP
The number of fields in the current record. Inside a \fBBEGIN\fP action,
the use of \fBNF\fP is undefined unless a
\fBgetline\fP function without a \fIvar\fP argument is executed previously.
Inside an \fBEND\fP action, \fBNF\fP shall retain
the value it had for the last record read, unless a subsequent, redirected,
\fBgetline\fP function without a \fIvar\fP argument
is performed prior to entering the \fBEND\fP action.
.TP 7
\fBNR\fP
The ordinal number of the current record from the start of input.
Inside a \fBBEGIN\fP action the value shall be zero. Inside
an \fBEND\fP action the value shall be the number of the last record
processed.
.TP 7
\fBOFMT\fP
The \fBprintf\fP format for converting numbers to strings in output
statements (see Output Statements); \fB"%.6g"\fP by default. The result
of the conversion
is unspecified if the value of \fBOFMT\fP is not a
floating-point format specification.
.TP 7
\fBOFS\fP
The \fBprint\fP statement output field separation; <space> by default.
.TP 7
\fBORS\fP
The \fBprint\fP statement output record separator; a <newline> by
default.
.TP 7
\fBRLENGTH\fP
The length of the string matched by the \fBmatch\fP function.
.TP 7
\fBRS\fP
The first character of the string value of \fBRS\fP shall be the input
record separator; a <newline> by default. If
\fBRS\fP contains more than one character, the results are unspecified.
If \fBRS\fP is null, then records are separated by
sequences consisting of a <newline> plus one or more blank lines,
leading or trailing blank lines shall not result in empty
records at the beginning or end of the input, and a <newline> shall
always be a field separator, no matter what the value of
\fBFS\fP is.
.TP 7
\fBRSTART\fP
The starting position of the string matched by the \fBmatch\fP function,
numbering from 1. This shall always be equivalent to
the return value of the \fBmatch\fP function.
.TP 7
\fBSUBSEP\fP
The subscript separator string for multi-dimensional arrays; the default
value is implementation-defined.
.sp
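.LP
As an informal illustration (not part of the standard text), the following
shell sketch shows how a null \fBRS\fP selects blank-line-separated records
and how \fBNR\fP and \fBNF\fP then behave. The input data and the shell
variable name are invented for the example.

```shell
# Two "paragraphs" separated by blank lines; RS = "" makes each one a record,
# and <newline> then always acts as a field separator.
out=$(printf 'a b\nc\n\n\nd e\n' |
      awk 'BEGIN { RS = "" } { print NR ": " NF } END { print "records: " NR }')
printf '%s\n' "$out"
```

The first record spans two lines and yields three fields; the extra blank
line does not produce an empty record.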
.SS Regular Expressions
.LP
The \fIawk\fP utility shall make use of the extended regular expression
notation (see the Base Definitions volume of
IEEE\ Std\ 1003.1-2001, Section 9.4, Extended Regular Expressions)
except that it shall allow the use of C-language conventions for escaping
special characters within the EREs, as specified in the
table in the Base Definitions volume of IEEE\ Std\ 1003.1-2001, Chapter
5, File
Format Notation (\fB'\\\\'\fP, \fB'\\a'\fP, \fB'\\b'\fP, \fB'\\f'\fP,
\fB'\\n'\fP, \fB'\\r'\fP, \fB'\\t'\fP,
\fB'\\v'\fP) and the following table; these escape sequences shall
be recognized both inside and outside bracket expressions.
Note that records need not be separated by <newline>s and string constants
can contain <newline>s, so even the
\fB"\\n"\fP sequence is valid in \fIawk\fP EREs. Using a slash character
within an ERE requires the escaping shown in the
following table.
.br
.sp
.ce 1
\fBTable: Escape Sequences in \fIawk\fP\fP
.TS C
center; l1 lw(30)1 lw(30).
\fBEscape Sequence\fP	T{
.na
\fBDescription\fP
.ad
T}	T{
.na
\fBMeaning\fP
.ad
T}
\\"	T{
.na
Backslash quotation-mark
.ad
T}	T{
.na
Quotation-mark character
.ad
T}
\\/	T{
.na
Backslash slash
.ad
T}	T{
.na
Slash character
.ad
T}
\\ddd	T{
.na
A backslash character followed by the longest sequence of one, two, or three octal-digit characters (01234567). If all of the digits are 0 (that is, representation of the NUL character), the behavior is undefined.
.ad
T}	T{
.na
The character whose encoding is represented by the one, two, or three-digit octal integer. Multi-byte characters require multiple, concatenated escape sequences of this type, including the leading \fB'\\'\fP for each byte.
.ad
T}
\\c	T{
.na
A backslash character followed by any character not described in this table or in the table in the Base Definitions volume of IEEE\ Std\ 1003.1-2001, Chapter 5, File Format Notation (\fB'\\\\'\fP, \fB'\\a'\fP, \fB'\\b'\fP, \fB'\\f'\fP, \fB'\\n'\fP, \fB'\\r'\fP, \fB'\\t'\fP, \fB'\\v'\fP).
.ad
T}	T{
.na
Undefined
.ad
T}
.TE
.LP
A regular expression can be matched against a specific field or string
by using one of the two regular expression matching
operators, \fB'~'\fP and \fB"!~"\fP. These operators shall interpret
their right-hand operand as a regular
expression and their left-hand operand as a string. If the regular
expression matches the string, the \fB'~'\fP expression
shall evaluate to a value of 1, and the \fB"!~"\fP expression shall
evaluate to a value of 0. (The regular expression
matching operation is as defined by the term matched in the Base Definitions
volume of IEEE\ Std\ 1003.1-2001, Section 9.1, Regular Expression
Definitions, where a match occurs on any part of the
string unless the regular expression is limited with the circumflex
or dollar sign special characters.) If the regular expression
does not match the string, the \fB'~'\fP expression shall evaluate
to a value of 0, and the \fB"!~"\fP expression
shall evaluate to a value of 1. If the right-hand operand is any expression
other than the lexical token \fBERE\fP, the string
value of the expression shall be interpreted as an extended regular
expression, including the escape conventions described above.
Note that these same escape conventions shall also be applied in determining
the value of a string literal (the lexical token
\fBSTRING\fP), and thus shall be applied a second time when a string
literal is used in this context.
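.LP
The matching operators described above can be tried out with a short sketch
such as the following. It is an illustration only; the sample strings are
invented, and the third line shows the doubled backslash that a string
literal needs because its escapes are processed twice.

```shell
out=$(awk 'BEGIN {
    print ("a.b" ~ /a\.b/)     # ERE token: escaped dot matches a literal dot
    print ("axb" ~ /a\.b/)     # no literal dot in the string
    print ("axb" !~ "a\\.b")   # string literal: escapes are processed twice
    print ("xaby" ~ /ab/)      # unanchored: matches anywhere in the string
    print ("xaby" ~ /^ab/)     # anchored at the start of the string
}')
printf '%s\n' "$out"
```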
.LP
When an \fBERE\fP token appears as an expression in any context other
than as the right-hand side of the \fB'~'\fP or
\fB"!~"\fP operator or as one of the built-in function arguments described
below, the value of the resulting expression
shall be the equivalent of:
.sp
.RS
.nf

\fB$0 ~ /\fP\fIere\fP\fB/
\fP
.fi
.RE
.LP
The \fIere\fP argument to the \fBgsub\fP, \fBmatch\fP, and \fBsub\fP functions,
and the \fIfs\fP argument to the \fBsplit\fP
function (see String Functions) shall be interpreted as extended
regular expressions. These can be
either \fBERE\fP tokens or arbitrary expressions, and shall be interpreted
in the same manner as the right-hand side of the
\fB'~'\fP or \fB"!~"\fP operator.
.LP
An extended regular expression can be used to separate fields by using
the \fB-F\fP \fIERE\fP option or by assigning a string
containing the expression to the built-in variable \fBFS\fP. The default
value of the \fBFS\fP variable shall be a single
<space>. The following describes \fBFS\fP behavior:
.IP " 1." 4
If \fBFS\fP is a null string, the behavior is unspecified.
.LP
.IP " 2." 4
If \fBFS\fP is a single character:
.RS
.IP " a." 4
If \fBFS\fP is <space>, skip leading and trailing <blank>s; fields
shall be delimited by sets of one or more
<blank>s.
.LP
.IP " b." 4
Otherwise, if \fBFS\fP is any other character \fIc\fP, fields shall
be delimited by each single occurrence of \fIc\fP.
.LP
.RE
.LP
.IP " 3." 4
Otherwise, the string value of \fBFS\fP shall be considered to be
an extended regular expression. Each occurrence of a sequence
matching the extended regular expression shall delimit fields.
.LP
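.LP
The three defined \fBFS\fP cases listed above can be compared side by side
with a sketch like this one. The input lines and shell variable names are
made up for illustration; the null-string case is left out because its
behavior is unspecified.

```shell
# Default FS (a single <space>): leading/trailing blanks skipped, runs collapse.
default_nf=$(printf '  a   b  \n' | awk '{ print NF }')
# Any other single character: each occurrence delimits, so "a::b" has an empty field.
colon_nf=$(printf 'a::b\n' | awk -F: '{ print NF }')
# A longer string is treated as an ERE; each matching sequence delimits.
regex_nf=$(printf 'a1b22c\n' | awk -F'[0-9]+' '{ print NF }')
echo "$default_nf $colon_nf $regex_nf"
```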
.LP
Except for the \fB'~'\fP and \fB"!~"\fP operators, and in the \fBgsub\fP,
\fBmatch\fP, \fBsplit\fP, and
\fBsub\fP built-in functions, ERE matching shall be based on input
records; that is, record separator characters (the first
character of the value of the variable \fBRS\fP, <newline> by default)
cannot be embedded in the expression, and no
expression shall match the record separator character. If the record
separator is not <newline>, <newline>s embedded in
the expression can be matched. For the \fB'~'\fP and \fB"!~"\fP operators,
and in those four built-in functions,
ERE matching shall be based on text strings; that is, any character
(including <newline> and the record separator) can be
embedded in the pattern, and an appropriate pattern shall match any
character. However, in all \fIawk\fP ERE matching, the use of
one or more NUL characters in the pattern, input record, or text string
produces undefined results.
.SS Patterns
.LP
A \fIpattern\fP is any valid \fIexpression\fP, a range specified by
two expressions separated by a comma, or one of the two
special patterns \fBBEGIN\fP or \fBEND\fP.
.SS Special Patterns
.LP
The \fIawk\fP utility shall recognize two special patterns, \fBBEGIN\fP
and \fBEND\fP. Each \fBBEGIN\fP pattern shall be
matched once and its associated action executed before the first record
of input is read (except possibly by use of the
\fBgetline\fP function, see Input/Output and General Functions, in
a prior \fBBEGIN\fP action) and
before command line assignment is done. Each \fBEND\fP pattern shall
be matched once and its associated action executed after the
last record of input has been read. These two patterns shall have
associated actions.
.LP
\fBBEGIN\fP and \fBEND\fP shall not combine with other patterns. Multiple
\fBBEGIN\fP and \fBEND\fP patterns shall be
allowed. The actions associated with the \fBBEGIN\fP patterns shall
be executed in the order specified in the program, as are the
\fBEND\fP actions. An \fBEND\fP pattern can precede a \fBBEGIN\fP
pattern in a program.
.LP
If an \fIawk\fP program consists of only actions with the pattern
\fBBEGIN\fP, and the \fBBEGIN\fP action contains no
\fBgetline\fP function, \fIawk\fP shall exit without reading its input
when the last statement in the last \fBBEGIN\fP action is
executed. If an \fIawk\fP program consists of only actions with the
pattern \fBEND\fP or only actions with the patterns
\fBBEGIN\fP and \fBEND\fP, the input shall be read before the statements
in the \fBEND\fP actions are executed.
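.LP
A minimal sketch of the ordering rules above follows; it is only an
illustration, and the three-line input is invented. Note that the first
\fBEND\fP action appears before \fBBEGIN\fP in the program text yet still
runs after all input has been read.

```shell
out=$(printf 'x\ny\nz\n' | awk '
END   { print "lines:", n }                  # may precede BEGIN in the source
BEGIN { n = 0; print "NR in BEGIN:", NR }    # runs before any record is read
      { n++ }                                # runs once per input record
END   { print "NR in END:", NR }             # END actions run in program order
')
printf '%s\n' "$out"
```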
.SS Expression Patterns
.LP
An expression pattern shall be evaluated as if it were an expression
in a Boolean context. If the result is true, the pattern
shall be considered to match, and the associated action (if any) shall
be executed. If the result is false, the action shall not be
executed.
.SS Pattern Ranges
.LP
A pattern range consists of two expressions separated by a comma;
in this case, the action shall be performed for all records
between a match of the first expression and the following match of
the second expression, inclusive. At this point, the pattern
range can be repeated starting at input records subsequent to the
end of the matched range.
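.LP
For illustration, a pattern range applied to an invented eight-line input
might look like the sketch below. The range is inclusive at both ends and
starts again at the second occurrence of the opening expression.

```shell
out=$(printf 'one\nbegin\ntwo\nend\nthree\nbegin\nfour\nend\n' |
      awk '/begin/,/end/ { print NR ": " $0 }')
printf '%s\n' "$out"
```

Records 1 and 5 fall outside both ranges and are not printed.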
.SS Actions
.LP
An action is a sequence of statements as shown in the grammar in Grammar.
Any single statement
can be replaced by a statement list enclosed in braces. The application
shall ensure that statements in a statement list are
separated by <newline>s or semicolons. Statements in a statement list
shall be executed sequentially in the order that they
appear.
.LP
The \fIexpression\fP acting as the conditional in an \fBif\fP statement
shall be evaluated and if it is non-zero or non-null,
the following statement shall be executed; otherwise, if \fBelse\fP
is present, the statement following the \fBelse\fP shall be
executed.
.LP
The \fBif\fP, \fBwhile\fP, \fBdo\fP ... \fBwhile\fP, \fBfor\fP, \fBbreak\fP,
and \fBcontinue\fP statements are based on
the ISO\ C standard (see \fIConcepts Derived from the ISO C Standard\fP),
except
that the Boolean expressions shall be treated as described in Expressions
in awk, and except in the
case of:
.sp
.RS
.nf

\fBfor (\fP\fIvariable\fP \fBin\fP \fIarray\fP\fB)
\fP
.fi
.RE
.LP
which shall iterate, assigning each \fIindex\fP of \fIarray\fP to
\fIvariable\fP in an unspecified order. The results of
adding new elements to \fIarray\fP within such a \fBfor\fP loop are
undefined. If a \fBbreak\fP or \fBcontinue\fP statement
occurs outside of a loop, the behavior is undefined.
.LP
The \fBdelete\fP statement shall remove an individual array element.
Thus, the following code deletes an entire array:
.sp
.RS
.nf

\fBfor (i in array)
    delete array[i]
\fP
.fi
.RE
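.LP
The element-by-element deletion idiom can be exercised with a sketch such
as the following. The array name \fBarr\fP and the loop variable \fBk\fP
are arbitrary choices for the example; a name such as \fBindex\fP cannot
serve as the loop variable in a real program because it names a built-in
function.

```shell
out=$(awk 'BEGIN {
    n = split("a b c", arr)          # arr[1], arr[2], arr[3]
    delete arr[2]                    # remove a single element
    has2 = (2 in arr)                # 0: the element is gone
    for (k in arr) delete arr[k]     # remove the rest, one element at a time
    left = 0
    for (k in arr) left++            # count what remains
    print n, has2, left
}')
printf '%s\n' "$out"
```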
.LP
The \fBnext\fP statement shall cause all further processing of the
current input record to be abandoned. The behavior is
undefined if a \fBnext\fP statement appears or is invoked in a \fBBEGIN\fP
or \fBEND\fP action.
.LP
The \fBexit\fP statement shall invoke all \fBEND\fP actions in the
order in which they occur in the program source and then
terminate the program without reading further input. An \fBexit\fP
statement inside an \fBEND\fP action shall terminate the
program without further execution of \fBEND\fP actions. If an expression
is specified in an \fBexit\fP statement, its numeric
value shall be the exit status of \fIawk\fP, unless subsequent errors
are encountered or a subsequent \fBexit\fP statement with
an expression is executed.
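.LP
As a hedged illustration of \fBexit\fP, the sketch below stops at the
second record of an invented three-line input. The \fBEND\fP action still
runs, and the shell sees the value given to \fBexit\fP as the exit status.

```shell
out=$(printf 'a\nb\nc\n' | awk '
NR == 2 { exit 3 }                              # stop reading; END actions still run
        { print "saw " $0 }
END     { print "END after " NR " record(s)" }
')
status=$?    # exit status of awk, the last command in the pipeline
printf '%s\nstatus=%s\n' "$out" "$status"
```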
.SS Output Statements
.LP
Both \fBprint\fP and \fBprintf\fP statements shall write to standard
output by default. The output shall be written to the
location specified by \fIoutput_redirection\fP if one is supplied,
as follows:
.sp
.RS
.nf

\fB>\fP \fIexpression\fP
\fB>>\fP \fIexpression\fP
\fB|\fP \fIexpression\fP
.fi
.RE
.LP
In all cases, the \fIexpression\fP shall be evaluated to produce a
string that is used as a pathname into which to write (for
\fB'>'\fP or \fB">>"\fP) or as a command to be executed (for \fB'|'\fP).
Using the first two forms, if the file
of that name is not currently open, it shall be opened, creating it
if necessary and, when the first form is used, truncating the file. The
output then shall be appended to the file. As long as the file remains
open, subsequent calls in which \fIexpression\fP evaluates
to the same string value shall simply append output to the file. The
file remains open until the \fBclose\fP function (see Input/Output
and General Functions) is called with an expression that evaluates
to the same string
value.
.LP
The third form shall write output onto a stream piped to the input
of a command. The stream shall be created if no stream is
currently open with the value of \fIexpression\fP as its command name.
The stream created shall be equivalent to one created by a
call to the \fIpopen\fP() function defined in the System Interfaces
volume of
IEEE\ Std\ 1003.1-2001 with the value of \fIexpression\fP as the \fIcommand\fP
argument and a value of \fIw\fP as the
\fImode\fP argument. As long as the stream remains open, subsequent
calls in which \fIexpression\fP evaluates to the same string
value shall write output to the existing stream. The stream shall
remain open until the \fBclose\fP function (see Input/Output and General
Functions) is called with an expression that evaluates to the same
string value.
At that time, the stream shall be closed as if by a call to the \fIpclose\fP()
function
defined in the System Interfaces volume of IEEE\ Std\ 1003.1-2001.
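.LP
The three redirection forms can be sketched as follows. This is an
illustration under stated assumptions: the temporary file comes from
\fImktemp\fP (widely available, though not required by this standard), and
the variable \fBf\fP and the use of \fIsort\fP as the piped command are
choices made for the example.

```shell
tmp=$(mktemp)
sorted=$(awk -v f="$tmp" 'BEGIN {
    print "one" > f            # opens f, truncating it
    print "two" > f            # f is already open: appended, not truncated again
    close(f)
    print "three" >> f         # reopened in append mode
    close(f)
    print "b" | "sort"         # one stream per distinct command string
    print "a" | "sort"
    close("sort")              # sort sees end-of-input and writes its output
}')
file_contents=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$file_contents" "$sorted"
```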
.LP
As described in detail by the grammar in Grammar, these output statements
shall take a
comma-separated list of \fIexpression\fPs referred to in the grammar
by the non-terminal symbols \fBexpr_list\fP,
\fBprint_expr_list\fP, or \fBprint_expr_list_opt\fP. This list is
referred to here as the \fIexpression list\fP, and each member
is referred to as an \fIexpression argument\fP.
.LP
The \fBprint\fP statement shall write the value of each expression
argument onto the indicated output stream separated by the
current output field separator (see variable \fBOFS\fP above), and
terminated by the output record separator (see variable
\fBORS\fP above). All expression arguments shall be taken as strings,
being converted if necessary; this conversion shall be as
described in Expressions in awk, with the exception that the \fBprintf\fP
format in \fBOFMT\fP
shall be used instead of the value in \fBCONVFMT\fP. An empty expression
list shall stand for the whole input record ($0).
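.LP
A small sketch of how \fBOFS\fP, \fBORS\fP, and \fBOFMT\fP shape
\fBprint\fP output is given below for illustration; the separator and
format values are arbitrary.

```shell
out=$(awk 'BEGIN {
    OFS = "-"; ORS = ";\n"; OFMT = "%.2f"
    print 1, 3.14159, "x"      # integer values are not affected by OFMT
}')
printf '%s\n' "$out"
```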
.LP
The \fBprintf\fP statement shall produce output based on a notation
similar to the File Format Notation used to describe file
formats in this volume of IEEE\ Std\ 1003.1-2001 (see the Base Definitions
volume of IEEE\ Std\ 1003.1-2001, Chapter 5, File Format Notation).
Output shall be produced as specified with the first
\fIexpression\fP argument as the string \fIformat\fP and subsequent
\fIexpression\fP arguments as the strings \fIarg1\fP to
\fIargn\fP, inclusive, with the following exceptions:
.IP " 1." 4
The \fIformat\fP shall be an actual character string rather than a
graphical representation. Therefore, it cannot contain empty
character positions. The <space> in the \fIformat\fP string, in any
context other than a \fIflag\fP of a conversion
specification, shall be treated as an ordinary character that is copied
to the output.
.LP
.IP " 2." 4
If the character set contains a \fB'\(*D'\fP character and that character
appears in
the \fIformat\fP string, it shall be treated as an ordinary character
that is copied to the output.
.LP
.IP " 3." 4
The \fIescape sequences\fP beginning with a backslash character shall
be treated as sequences of ordinary characters that are
copied to the output. Note that these same sequences shall be interpreted
lexically by \fIawk\fP when they appear in literal
strings, but they shall not be treated specially by the \fBprintf\fP
statement.
.LP
.IP " 4." 4
A \fIfield width\fP or \fIprecision\fP can be specified as the \fB'*'\fP
character instead of a digit string. In this case
the next argument from the expression list shall be fetched and its
numeric value taken as the field width or precision.
.LP
.IP " 5." 4
The implementation shall not precede or follow output from the \fBd\fP
or \fBu\fP conversion specifier characters with
<blank>s not specified by the \fIformat\fP string.
.LP
.IP " 6." 4
The implementation shall not precede output from the \fBo\fP conversion
specifier character with leading zeros not specified
by the \fIformat\fP string.
.LP
.IP " 7." 4
For the \fBc\fP conversion specifier character: if the argument has
a numeric value, the character whose encoding is that
value shall be output. If the value is zero or is not the encoding
of any character in the character set, the behavior is
undefined. If the argument does not have a numeric value, the first
character of the string value shall be output; if the string
does not contain any characters, the behavior is undefined.
.LP
.IP " 8." 4
For each conversion specification that consumes an argument, the next
expression argument shall be evaluated. With the exception
of the \fBc\fP conversion specifier character, the value shall be
converted (according to the rules specified in Expressions in awk)
to the appropriate type for the conversion specification.
.LP
.IP " 9." 4
If there are insufficient expression arguments to satisfy all the
conversion specifications in the \fIformat\fP string, the
behavior is undefined.
.LP
.IP "10." 4
If any character sequence in the \fIformat\fP string begins with a
\fB'%'\fP character, but does not form a valid conversion
specification, the behavior is unspecified.
.LP
.LP
Both \fBprint\fP and \fBprintf\fP can output at least {LINE_MAX} bytes.
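.LP
Several of the \fBprintf\fP rules listed above (items 4, 7, and 8) can be
seen in a sketch like the following. It is illustrative only; the
arguments are invented, and the expected "A" for the value 65 assumes an
ASCII-compatible character encoding.

```shell
out=$(awk 'BEGIN {
    printf "%*d|%-5s|%.*f\n", 4, 42, "ab", 2, 3.14159   # "*" takes width/precision from the list
    printf "%c%c\n", 65, "xyz"                          # numeric: encoding; string: first character
    printf "%d %s\n", "12abc", 7                        # arguments are converted per specifier
}')
printf '%s\n' "$out"
```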
.SS Functions
.LP
The \fIawk\fP language has a variety of built-in functions: arithmetic,
string, input/output, and general.
.SS Arithmetic Functions
.LP
The arithmetic functions, except for \fBint\fP, shall be based on
the ISO\ C standard (see \fIConcepts Derived from the ISO C Standard\fP).
The behavior is undefined in cases where the
ISO\ C standard specifies that an error be returned or that the behavior
is undefined. Although the grammar (see Grammar) permits built-in
functions to appear with no arguments or parentheses, unless the argument
or
parentheses are indicated as optional in the following list (by displaying
them within the \fB"[]"\fP brackets), such use is
undefined.
.TP 7
\fBatan2\fP(\fIy\fP,\fIx\fP)
Return arctangent of \fIy\fP/\fIx\fP in radians in the range [-pi,pi].
.TP 7
\fBcos\fP(\fIx\fP)
Return cosine of \fIx\fP, where \fIx\fP is in radians.
.TP 7
\fBsin\fP(\fIx\fP)
Return sine of \fIx\fP, where \fIx\fP is in radians.
.TP 7
\fBexp\fP(\fIx\fP)
Return the exponential function of \fIx\fP.
.TP 7
\fBlog\fP(\fIx\fP)
Return the natural logarithm of \fIx\fP.
.TP 7
\fBsqrt\fP(\fIx\fP)
Return the square root of \fIx\fP.
.TP 7
\fBint\fP(\fIx\fP)
Return the argument truncated to an integer. Truncation shall be toward
0 when \fIx\fP>0.
.TP 7
\fBrand\fP()
Return a random number \fIn\fP, such that 0<=\fIn\fP<1.
.TP 7
\fBsrand\fP(\fB[\fP\fIexpr\fP\fB]\fP)
Set the seed value for \fIrand\fP to \fIexpr\fP or use the time of
day if \fIexpr\fP is omitted. The previous seed value
shall be returned.
.sp
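.LP
The following sketch, offered only as an illustration, calls a few of the
arithmetic functions with invented arguments. It avoids negative arguments
to \fBint\fP, since the text above only specifies the direction of
truncation for positive values.

```shell
out=$(awk 'BEGIN {
    srand(7)
    prev = srand(9)                 # srand returns the previous seed
    r = rand()
    print int(3.9), sqrt(16), exp(0), prev, (r >= 0 && r < 1)
    printf "%.4f\n", atan2(0, -1)   # pi, to four decimal places
}')
printf '%s\n' "$out"
```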
.SS String Functions
.LP
The string functions in the following list shall be supported. Although
the grammar (see Grammar)
permits built-in functions to appear with no arguments or parentheses,
unless the argument or parentheses are indicated as
optional in the following list (by displaying them within the \fB"[]"\fP
brackets), such use is undefined.
.TP 7
\fBgsub\fP(\fIere\fP,\ \fIrepl\fP\fB[\fP,\ \fIin\fP\fB]\fP)
Behave like \fBsub\fP (see below), except that it shall replace all
occurrences of the regular expression (like the \fIed\fP utility global
substitute) in $0 or in the \fIin\fP argument, when specified.
.TP 7
\fBindex\fP(\fIs\fP,\ \fIt\fP)
Return the position, in characters, numbering from 1, in string \fIs\fP
where string \fIt\fP first occurs, or zero if it does
not occur at all.
.TP 7
\fBlength[\fP(\fB[\fP\fIs\fP\fB]\fP)\fB]\fP
Return the length, in characters, of its argument taken as a string,
or of the whole record, $0, if there is no argument.
.TP 7
\fBmatch\fP(\fIs\fP,\ \fIere\fP)
Return the position, in characters, numbering from 1, in string \fIs\fP
where the extended regular expression \fIere\fP
occurs, or zero if it does not occur at all. RSTART shall be set to
the starting position (which is the same as the returned
value), zero if no match is found; RLENGTH shall be set to the length
of the matched string, -1 if no match is found.
.TP 7
\fBsplit\fP(\fIs\fP,\ \fIa\fP\fB[\fP,\ \fIfs\ \fP \fB]\fP)
Split the string \fIs\fP into array elements \fIa\fP[1], \fIa\fP[2],
\&..., \fIa\fP[\fIn\fP], and return \fIn\fP. All elements
of the array shall be deleted before the split is performed. The separation
shall be done with the ERE \fIfs\fP or with the field
separator \fBFS\fP if \fIfs\fP is not given. Each array element shall
have a string value when created and, if appropriate, the
array element shall be considered a numeric string (see Expressions
in awk). The effect of a null
string as the value of \fIfs\fP is unspecified.
.TP 7
\fBsprintf\fP(\fIfmt\fP,\ \fIexpr\fP,\ \fIexpr\fP,\ ...)
Format the expressions according to the \fBprintf\fP format given
by \fIfmt\fP and return the resulting string.
.TP 7
\fBsub\fP(\fIere\fP,\ \fIrepl\fP\fB[\fP,\ \fIin\ \fP \fB]\fP)
Substitute the string \fIrepl\fP in place of the first instance of
the extended regular expression \fIere\fP in string \fIin\fP
and return the number of substitutions. An ampersand (\fB'&'\fP)
appearing in the string \fIrepl\fP shall be replaced by
the string from \fIin\fP that matches the ERE. An ampersand preceded
with a backslash (\fB'\\'\fP) shall be interpreted as the
literal ampersand character. An occurrence of two consecutive backslashes
shall be interpreted as just a single literal backslash
character. Any other occurrence of a backslash (for example, preceding
any other character) shall be treated as a literal backslash
character. Note that if \fIrepl\fP is a string literal (the lexical
token \fBSTRING\fP; see Grammar), the handling of the ampersand character
occurs after any lexical processing, including any
lexical backslash escape sequence processing. If \fIin\fP is specified
and it is not an lvalue (see Expressions in awk), the behavior is
undefined. If \fIin\fP is omitted, \fIawk\fP shall use the current
record ($0) in its place.
.TP 7
\fBsubstr\fP(\fIs\fP,\ \fIm\fP\fB[\fP,\ \fIn\ \fP \fB]\fP)
Return the at most \fIn\fP-character substring of \fIs\fP that begins
at position \fIm\fP, numbering from 1. If \fIn\fP is
omitted, or if \fIn\fP specifies more characters than are left in
the string, the length of the substring shall be limited by the
length of the string \fIs\fP.
.TP 7
\fBtolower\fP(\fIs\fP)
Return a string based on the string \fIs\fP. Each character in \fIs\fP
that is an uppercase letter specified to have a
\fBtolower\fP mapping by the \fILC_CTYPE\fP category of the current
locale shall be replaced in the returned string by the
lowercase letter specified by the mapping. Other characters in \fIs\fP
shall be unchanged in the returned string.
.TP 7
\fBtoupper\fP(\fIs\fP)
Return a string based on the string \fIs\fP. Each character in \fIs\fP
that is a lowercase letter specified to have a
\fBtoupper\fP mapping by the \fILC_CTYPE\fP category of the current
locale is replaced in the returned string by the uppercase
letter specified by the mapping. Other characters in \fIs\fP are unchanged
in the returned string.
.sp
.LP
All of the preceding functions that take \fIERE\fP as a parameter
expect a pattern or a string-valued expression that is a
regular expression as defined in Regular Expressions.
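.LP
A brief sketch exercising several of the string functions is shown below
as an illustration; all strings and variable names in it are invented. The
call to \fBmatch\fP is made in its own statement so that \fBRSTART\fP and
\fBRLENGTH\fP are certain to be set before they are printed.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    n = gsub(/o/, "[&]", s)                 # "&" stands for the matched text
    print n, s
    pos = match("foobar", /o+/)             # sets RSTART and RLENGTH
    print pos, RSTART, RLENGTH
    print substr("foobar", 4, 2), index("foobar", "bar"), length("foobar")
    print split("2024-01-15", part, "-"), part[1], part[3]
    print toupper("abc") tolower("XYZ")
}')
printf '%s\n' "$out"
```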
.SS Input/Output and General Functions
.LP
The input/output and general functions are:
.TP 7
\fBclose\fP(\fIexpression\fP)
Close the file or pipe opened by a \fBprint\fP or \fBprintf\fP statement
or a call to \fBgetline\fP with the same string-valued
\fIexpression\fP. The limit on the number of open \fIexpression\fP
arguments is implementation-defined. If the close was
successful, the function shall return zero; otherwise, it shall return
non-zero.
.TP 7
\fIexpression\ |\ \fP \fBgetline\ [\fP\fIvar\fP\fB]\fP
Read a record of input from a stream piped from the output of a command.
The stream shall be created if no stream is currently open
with the value of \fIexpression\fP as its command name. The stream
created shall be equivalent to one created by a call to the \fIpopen\fP()
function with the value of \fIexpression\fP as the \fIcommand\fP argument
and a
value of \fIr\fP as the \fImode\fP argument. As long as the stream
remains open, subsequent calls in which \fIexpression\fP
evaluates to the same string value shall read subsequent records from
the stream. The stream shall remain open until the
\fBclose\fP function is called with an expression that evaluates to
the same string value. At that time, the stream shall be
closed as if by a call to the \fIpclose\fP() function. If \fIvar\fP
is omitted, $0 and
\fBNF\fP shall be set; otherwise, \fIvar\fP shall be set and, if appropriate,
it shall be considered a numeric string (see Expressions in awk).
.LP
The \fBgetline\fP operator can form ambiguous constructs when there
are unparenthesized operators (including concatenate) to
the left of the \fB'|'\fP (to the beginning of the expression containing
\fBgetline\fP). In the context of the \fB'$'\fP
operator, \fB'|'\fP shall behave as if it had a lower precedence than
\fB'$'\fP. The result of evaluating other operators is
unspecified, and conforming applications shall parenthesize properly
all such usages.
|
||
|
.TP 7
|
||
|
\fBgetline\fP
|
||
|
Set $0 to the next input record from the current input file. This
|
||
|
form of \fBgetline\fP shall set the \fBNF\fP, \fBNR\fP,
|
||
|
and \fBFNR\fP variables.
|
||
|
.TP 7
|
||
|
\fBgetline\ \fP \fIvar\fP
|
||
|
Set variable \fIvar\fP to the next input record from the current input
|
||
|
file and, if appropriate, \fIvar\fP shall be
|
||
|
considered a numeric string (see Expressions in awk ). This form of
|
||
|
\fBgetline\fP shall set the
|
||
|
\fBFNR\fP and \fBNR\fP variables.
|
||
|
.TP 7
|
||
|
\fBgetline\ [\fP\fIvar\fP\fB]\ \fP <\ \fIexpression\fP
|
||
|
Read the next record of input from a named file. The \fIexpression\fP
|
||
|
shall be evaluated to produce a string that is used as a
|
||
|
pathname. If the file of that name is not currently open, it shall
|
||
|
be opened. As long as the stream remains open, subsequent calls
|
||
|
in which \fIexpression\fP evaluates to the same string value shall
|
||
|
read subsequent records from the file. The file shall remain
|
||
|
open until the \fBclose\fP function is called with an expression that
|
||
|
evaluates to the same string value. If \fIvar\fP is
|
||
|
omitted, $0 and \fBNF\fP shall be set; otherwise, \fIvar\fP shall
|
||
|
be set and, if appropriate, it shall be considered a numeric
|
||
|
string (see Expressions in awk ).
|
||
|
.LP
|
||
|
The \fBgetline\fP operator can form ambiguous constructs when there
|
||
|
are unparenthesized binary operators (including
|
||
|
concatenate) to the right of the \fB'<'\fP (up to the end of the expression
|
||
|
containing the \fBgetline\fP). The result of
|
||
|
evaluating such a construct is unspecified, and conforming applications
|
||
|
shall parenthesize properly all such usages.
|
||
|
.TP 7
|
||
|
\fBsystem\fP(\fIexpression\fP)
|
||
|
Execute the command given by \fIexpression\fP in a manner equivalent
|
||
|
to the \fIsystem\fP()
|
||
|
function defined in the System Interfaces volume of IEEE\ Std\ 1003.1-2001
|
||
|
and return the exit status of the command.
|
||
|
.sp
|
||
|
.LP
|
||
|
All forms of \fBgetline\fP shall return 1 for successful input, zero
|
||
|
for end-of-file, and -1 for an error.
.LP
Where strings are used as the name of a file or pipeline, the application
shall ensure that the strings are textually identical.
The terminology "same string value" implies that "equivalent strings",
even those that differ only by <space>s, represent
different files.
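.LP
As a hedged sketch of the "same string value" rule, the following
hypothetical shell function (first_record_twice, with illustrative
variables file, a, and b) reads the first record of a file, closes the
stream using the identical string, and reads again. Because the stream
was closed, the second \fBgetline\fP reopens the file and sees the first
record once more:
.sp
.RS
.nf
```shell
# Hypothetical helper: show that close() with the same string resets the stream.
first_record_twice() {
    awk -v file="$1" 'BEGIN {
        getline a < file    # opens the file, reads record 1
        close(file)         # same string value: closes that stream
        getline b < file    # file is reopened, record 1 is read again
        print a "," b
    }'
}

tmpfile=$(mktemp)
{ echo one; echo two; } > "$tmpfile"
first_record_twice "$tmpfile"
rm -f "$tmpfile"
```
.fi
.RE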
.SS User-Defined Functions
.LP
The \fIawk\fP language also provides user-defined functions. Such
functions can be defined as:
.sp
.RS
.nf
\fBfunction\fP \fIname\fP\fB(\fP\fB[\fP\fIparameter\fP\fB, ...\fP\fB]\fP\fB) {\fP \fIstatements\fP \fB}
\fP
.fi
.RE
.LP
A function can be referred to anywhere in an \fIawk\fP program; in
particular, its use can precede its definition. The scope of
a function is global.
.LP
Function parameters, if present, can be either scalars or arrays;
the behavior is undefined if an array name is passed as a
parameter that the function uses as a scalar, or if a scalar expression
is passed as a parameter that the function uses as an
array. Function parameters shall be passed by value if scalar and
by reference if an array name.
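.LP
The difference between the two passing conventions can be seen in a
small sketch. The names param_demo, bump, x, and t are hypothetical and
used only for illustration:
.sp
.RS
.nf
```shell
# Hypothetical demonstration of scalar-by-value and array-by-reference.
param_demo() {
    awk 'function bump(n, arr) {
        n = n + 1           # modifies only the local copy of the scalar
        arr["k"] = "set"    # modifies the array owned by the caller
    }
    BEGIN {
        x = 1
        t["k"] = "unset"
        bump(x, t)
        print x, t["k"]
    }'
}

param_demo
```
.fi
.RE
.LP
After the call, x still holds 1 while the element of t has changed.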
.LP
The number of parameters in the function definition need not match
the number of parameters in the function call. Excess formal
parameters can be used as local variables. If fewer arguments are
supplied in a function call than are in the function definition,
the extra parameters that are used in the function body as scalars
shall evaluate to the uninitialized value until they are
otherwise initialized, and the extra parameters that are used in the
function body as arrays shall be treated as uninitialized
arrays where each element evaluates to the uninitialized value until
otherwise initialized.
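.LP
A common way to use excess formal parameters as local variables is
sketched below. The names sum_csv, total, line, i, n, parts, and s are
hypothetical; the extra spacing in the parameter list is only a
convention for marking the locals and has no meaning to \fIawk\fP:
.sp
.RS
.nf
```shell
# Hypothetical helper: sum the comma-separated numbers on each input line.
sum_csv() {
    awk 'function total(line,    i, n, parts, s) {
        # i, n, parts and s are not supplied by the caller, so they start
        # out uninitialized on every call and act as local variables.
        n = split(line, parts, ",")
        for (i = 1; i <= n; i++)
            s += parts[i]
        return s
    }
    { print total($0) }'
}

echo "1,2,3" | sum_csv
```
.fi
.RE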
.LP
When invoking a function, no white space can be placed between the
function name and the opening parenthesis. Function calls can
be nested and recursive calls can be made upon functions. Upon return
from any nested or recursive function call, the values of all
of the calling function's parameters shall be unchanged, except for
array parameters passed by reference. The \fBreturn\fP
statement can be used to return a value. If a \fBreturn\fP statement
appears outside of a function definition, the behavior is
undefined.
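.LP
Recursion and the \fBreturn\fP statement can be sketched as follows. The
names factorial, fact, k, and n are hypothetical. Note that each call is
written with the opening parenthesis directly after the function name:
.sp
.RS
.nf
```shell
# Hypothetical helper: compute n! with a recursive awk function.
factorial() {
    awk -v n="$1" 'function fact(k) {
        if (k <= 1)
            return 1
        return k * fact(k - 1)   # k is unchanged when the inner call returns
    }
    BEGIN { print fact(n) }'
}

factorial 5
```
.fi
.RE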
.LP
In the function definition, <newline>s shall be optional before the
opening brace and after the closing brace. Function
definitions can appear anywhere in the program where a \fIpattern-action\fP
pair is allowed.
.SS Grammar
.LP
The grammar in this section and the lexical conventions in the following
section shall together describe the syntax for
\fIawk\fP programs. The general conventions for this style of grammar
are described in \fIGrammar Conventions\fP. A valid program can be
represented as the non-terminal symbol
\fIprogram\fP in the grammar. This formal syntax shall take precedence
over the preceding text syntax description.
.sp
.RS
.nf
\fB%token NAME NUMBER STRING ERE
%token FUNC_NAME   /* Name followed by '(' without white space. */
.sp
/* Keywords */
%token Begin End
/* 'BEGIN' 'END' */
.sp
%token Break Continue Delete Do Else
/* 'break' 'continue' 'delete' 'do' 'else' */
.sp
%token Exit For Function If In
/* 'exit' 'for' 'function' 'if' 'in' */
.sp
%token Next Print Printf Return While
/* 'next' 'print' 'printf' 'return' 'while' */
.sp
/* Reserved function names */
%token BUILTIN_FUNC_NAME
            /* One token for the following:
             * atan2 cos sin exp log sqrt int rand srand
             * gsub index length match split sprintf sub
             * substr tolower toupper close system
             */
%token GETLINE
            /* Syntactically different from other built-ins. */
.sp
/* Two-character tokens. */
%token ADD_ASSIGN SUB_ASSIGN MUL_ASSIGN DIV_ASSIGN MOD_ASSIGN POW_ASSIGN
/* '+=' '-=' '*=' '/=' '%=' '^=' */
.sp
%token OR AND NO_MATCH EQ LE GE NE INCR DECR APPEND
/* '||' '&&' '!~' '==' '<=' '>=' '!=' '++' '--' '>>' */
.sp
/* One-character tokens. */
%token '{' '}' '(' ')' '[' ']' ',' ';' NEWLINE
%token '+' '-' '*' '%' '^' '!' '>' '<' '|' '?' ':' '~' '$' '='
.sp
%start program
%%
.sp
program          : item_list
                 | actionless_item_list
                 ;
.sp
item_list        : newline_opt
                 | actionless_item_list item terminator
                 | item_list item terminator
                 | item_list action terminator
                 ;
.sp
actionless_item_list : item_list pattern terminator
                 | actionless_item_list pattern terminator
                 ;
.sp
item             : pattern action
                 | Function NAME '(' param_list_opt ')'
                       newline_opt action
                 | Function FUNC_NAME '(' param_list_opt ')'
                       newline_opt action
                 ;
.sp
param_list_opt   : /* empty */
                 | param_list
                 ;
.sp
param_list       : NAME
                 | param_list ',' NAME
                 ;
.sp
pattern          : Begin
                 | End
                 | expr
                 | expr ',' newline_opt expr
                 ;
.sp
action           : '{' newline_opt '}'
                 | '{' newline_opt terminated_statement_list '}'
                 | '{' newline_opt unterminated_statement_list '}'
                 ;
.sp
terminator       : terminator ';'
                 | terminator NEWLINE
                 | ';'
                 | NEWLINE
                 ;
.sp
terminated_statement_list : terminated_statement
                 | terminated_statement_list terminated_statement
                 ;
.sp
unterminated_statement_list : unterminated_statement
                 | terminated_statement_list unterminated_statement
                 ;
.sp
terminated_statement : action newline_opt
                 | If '(' expr ')' newline_opt terminated_statement
                 | If '(' expr ')' newline_opt terminated_statement
                       Else newline_opt terminated_statement
                 | While '(' expr ')' newline_opt terminated_statement
                 | For '(' simple_statement_opt ';'
                       expr_opt ';' simple_statement_opt ')' newline_opt
                       terminated_statement
                 | For '(' NAME In NAME ')' newline_opt
                       terminated_statement
                 | ';' newline_opt
                 | terminatable_statement NEWLINE newline_opt
                 | terminatable_statement ';' newline_opt
                 ;
.sp
unterminated_statement : terminatable_statement
                 | If '(' expr ')' newline_opt unterminated_statement
                 | If '(' expr ')' newline_opt terminated_statement
                       Else newline_opt unterminated_statement
                 | While '(' expr ')' newline_opt unterminated_statement
                 | For '(' simple_statement_opt ';'
                       expr_opt ';' simple_statement_opt ')' newline_opt
                       unterminated_statement
                 | For '(' NAME In NAME ')' newline_opt
                       unterminated_statement
                 ;
.sp
terminatable_statement : simple_statement
                 | Break
                 | Continue
                 | Next
                 | Exit expr_opt
                 | Return expr_opt
                 | Do newline_opt terminated_statement While '(' expr ')'
                 ;
.sp
simple_statement_opt : /* empty */
                 | simple_statement
                 ;
.sp
simple_statement : Delete NAME '[' expr_list ']'
                 | expr
                 | print_statement
                 ;
.sp
print_statement  : simple_print_statement
                 | simple_print_statement output_redirection
                 ;
.sp
simple_print_statement : Print print_expr_list_opt
                 | Print '(' multiple_expr_list ')'
                 | Printf print_expr_list
                 | Printf '(' multiple_expr_list ')'
                 ;
.sp
output_redirection : '>' expr
                 | APPEND expr
                 | '|' expr
                 ;
.sp
expr_list_opt    : /* empty */
                 | expr_list
                 ;
.sp
expr_list        : expr
                 | multiple_expr_list
                 ;
.sp
multiple_expr_list : expr ',' newline_opt expr
                 | multiple_expr_list ',' newline_opt expr
                 ;
.sp
expr_opt         : /* empty */
                 | expr
                 ;
.sp
expr             : unary_expr
                 | non_unary_expr
                 ;
.sp
unary_expr       : '+' expr
                 | '-' expr
                 | unary_expr '^' expr
                 | unary_expr '*' expr
                 | unary_expr '/' expr
                 | unary_expr '%' expr
                 | unary_expr '+' expr
                 | unary_expr '-' expr
                 | unary_expr non_unary_expr
                 | unary_expr '<' expr
                 | unary_expr LE expr
                 | unary_expr NE expr
                 | unary_expr EQ expr
                 | unary_expr '>' expr
                 | unary_expr GE expr
                 | unary_expr '~' expr
                 | unary_expr NO_MATCH expr
                 | unary_expr In NAME
                 | unary_expr AND newline_opt expr
                 | unary_expr OR newline_opt expr
                 | unary_expr '?' expr ':' expr
                 | unary_input_function
                 ;
.sp
non_unary_expr   : '(' expr ')'
                 | '!' expr
                 | non_unary_expr '^' expr
                 | non_unary_expr '*' expr
                 | non_unary_expr '/' expr
                 | non_unary_expr '%' expr
                 | non_unary_expr '+' expr
                 | non_unary_expr '-' expr
                 | non_unary_expr non_unary_expr
                 | non_unary_expr '<' expr
                 | non_unary_expr LE expr
                 | non_unary_expr NE expr
                 | non_unary_expr EQ expr
                 | non_unary_expr '>' expr
                 | non_unary_expr GE expr
                 | non_unary_expr '~' expr
                 | non_unary_expr NO_MATCH expr
                 | non_unary_expr In NAME
                 | '(' multiple_expr_list ')' In NAME
                 | non_unary_expr AND newline_opt expr
                 | non_unary_expr OR newline_opt expr
                 | non_unary_expr '?' expr ':' expr
                 | NUMBER
                 | STRING
                 | lvalue
                 | ERE
                 | lvalue INCR
                 | lvalue DECR
                 | INCR lvalue
                 | DECR lvalue
                 | lvalue POW_ASSIGN expr
                 | lvalue MOD_ASSIGN expr
                 | lvalue MUL_ASSIGN expr
                 | lvalue DIV_ASSIGN expr
                 | lvalue ADD_ASSIGN expr
                 | lvalue SUB_ASSIGN expr
                 | lvalue '=' expr
                 | FUNC_NAME '(' expr_list_opt ')'
                       /* no white space allowed before '(' */
                 | BUILTIN_FUNC_NAME '(' expr_list_opt ')'
                 | BUILTIN_FUNC_NAME
                 | non_unary_input_function
                 ;
.sp
print_expr_list_opt : /* empty */
                 | print_expr_list
                 ;
.sp
print_expr_list  : print_expr
                 | print_expr_list ',' newline_opt print_expr
                 ;
.sp
print_expr       : unary_print_expr
                 | non_unary_print_expr
                 ;
.sp
unary_print_expr : '+' print_expr
                 | '-' print_expr
                 | unary_print_expr '^' print_expr
                 | unary_print_expr '*' print_expr
                 | unary_print_expr '/' print_expr
                 | unary_print_expr '%' print_expr
                 | unary_print_expr '+' print_expr
                 | unary_print_expr '-' print_expr
                 | unary_print_expr non_unary_print_expr
                 | unary_print_expr '~' print_expr
                 | unary_print_expr NO_MATCH print_expr
                 | unary_print_expr In NAME
                 | unary_print_expr AND newline_opt print_expr
                 | unary_print_expr OR newline_opt print_expr
                 | unary_print_expr '?' print_expr ':' print_expr
                 ;
.sp
non_unary_print_expr : '(' expr ')'
                 | '!' print_expr
                 | non_unary_print_expr '^' print_expr
                 | non_unary_print_expr '*' print_expr
                 | non_unary_print_expr '/' print_expr
                 | non_unary_print_expr '%' print_expr
                 | non_unary_print_expr '+' print_expr
                 | non_unary_print_expr '-' print_expr
                 | non_unary_print_expr non_unary_print_expr
                 | non_unary_print_expr '~' print_expr
                 | non_unary_print_expr NO_MATCH print_expr
                 | non_unary_print_expr In NAME
                 | '(' multiple_expr_list ')' In NAME
                 | non_unary_print_expr AND newline_opt print_expr
                 | non_unary_print_expr OR newline_opt print_expr
                 | non_unary_print_expr '?' print_expr ':' print_expr
                 | NUMBER
                 | STRING
                 | lvalue
                 | ERE
                 | lvalue INCR
                 | lvalue DECR
                 | INCR lvalue
                 | DECR lvalue
                 | lvalue POW_ASSIGN print_expr
                 | lvalue MOD_ASSIGN print_expr
                 | lvalue MUL_ASSIGN print_expr
                 | lvalue DIV_ASSIGN print_expr
                 | lvalue ADD_ASSIGN print_expr
                 | lvalue SUB_ASSIGN print_expr
                 | lvalue '=' print_expr
                 | FUNC_NAME '(' expr_list_opt ')'
                       /* no white space allowed before '(' */
                 | BUILTIN_FUNC_NAME '(' expr_list_opt ')'
                 | BUILTIN_FUNC_NAME
                 ;
.sp
lvalue           : NAME
                 | NAME '[' expr_list ']'
                 | '$' expr
                 ;
.sp
non_unary_input_function : simple_get
                 | simple_get '<' expr
                 | non_unary_expr '|' simple_get
                 ;
.sp
unary_input_function : unary_expr '|' simple_get
                 ;
.sp
simple_get       : GETLINE
                 | GETLINE lvalue
                 ;
.sp
newline_opt      : /* empty */
                 | newline_opt NEWLINE
                 ;
\fP
.fi
.RE
.LP
This grammar has several ambiguities that shall be resolved as follows:
.IP " *" 3
Operator precedence and associativity shall be as described in Expressions
in Decreasing Precedence in \fIawk\fP.
.LP
.IP " *" 3
In case of ambiguity, an \fBelse\fP shall be associated with the most
immediately preceding \fBif\fP that would satisfy the
grammar.
.LP
.IP " *" 3
In some contexts, a slash (\fB'/'\fP) that is used to surround an
ERE could also be the division operator. This shall be
resolved in such a way that wherever the division operator could appear,
a slash is assumed to be the division operator. (There is
no unary division operator.)
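.LP
The \fBelse\fP rule in the list above can be sketched with a nested
\fBif\fP. The names classify, x, y, and r are hypothetical and used only
for illustration:
.sp
.RS
.nf
```shell
# Hypothetical helper: show which "if" an "else" belongs to.
classify() {
    awk -v x="$1" -v y="$2" 'BEGIN {
        r = "none"
        # The else pairs with "if (y)", the nearest preceding if.
        if (x) if (y) r = "inner-true"; else r = "inner-false"
        print r
    }'
}

classify 1 0    # inner test fails, so the else branch runs
classify 0 0    # outer test fails, so neither branch runs
```
.fi
.RE
.LP
If the \fBelse\fP were attached to the outer \fBif\fP instead, the second
call would report the else branch rather than leaving r untouched.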
.LP
One convention that might not be obvious from the formal grammar is
where <newline>s are acceptable. There are several
obvious placements such as terminating a statement, and a backslash
can be used to escape <newline>s between any lexical
tokens. In addition, <newline>s without backslashes can follow a comma,
an open brace, a logical AND operator
(\fB"&&"\fP), a logical OR operator (\fB"||"\fP), the \fBdo\fP keyword,
the \fBelse\fP keyword, and the closing
parenthesis of an \fBif\fP, \fBfor\fP, or \fBwhile\fP statement. For
example:
.sp
.RS
.nf
\fB{ print $1,
        $2 }
\fP
.fi
.RE
.SS Lexical Conventions
.LP
The lexical conventions for \fIawk\fP programs, with respect to the
preceding grammar, shall be as follows:
.IP " 1." 4
Except as noted, \fIawk\fP shall recognize the longest possible token
or delimiter beginning at a given point.
.LP
.IP " 2." 4
A comment shall consist of any characters beginning with the number
sign character and terminated by, but excluding the next
occurrence of, a <newline>. Comments shall have no effect, except
to delimit lexical tokens.
.LP
.IP " 3." 4
The <newline> shall be recognized as the token \fBNEWLINE\fP.
.LP
.IP " 4." 4
A backslash character immediately followed by a <newline> shall have
no effect.
.LP
.IP " 5." 4
The token \fBSTRING\fP shall represent a string constant. A string
constant shall begin with the character \fB'"'\fP. Within
a string constant, a backslash character shall be considered to begin
an escape sequence as specified in the table in the Base
Definitions volume of IEEE\ Std\ 1003.1-2001, Chapter 5, File Format
Notation
(\fB'\\\\'\fP, \fB'\\a'\fP, \fB'\\b'\fP, \fB'\\f'\fP, \fB'\\n'\fP,
\fB'\\r'\fP, \fB'\\t'\fP, \fB'\\v'\fP). In
addition, the escape sequences in Escape Sequences
in \fIawk\fP shall be recognized. A <newline> shall not
occur within a string constant. A string constant shall be terminated
by the first unescaped occurrence of the character
\fB'"'\fP after the one that begins the string constant. The value
of the string shall be the sequence of all unescaped
characters and values of escape sequences between, but not including,
the two delimiting \fB'"'\fP characters.
.LP
.IP " 6." 4
The token \fBERE\fP represents an extended regular expression constant.
An ERE constant shall begin with the slash character.
Within an ERE constant, a backslash character shall be considered
to begin an escape sequence as specified in the table in the Base
Definitions volume of IEEE\ Std\ 1003.1-2001, Chapter 5, File Format
Notation. In
addition, the escape sequences in Escape Sequences
in \fIawk\fP shall be recognized. The application shall
ensure that a <newline> does not occur within an ERE constant. An
ERE constant shall be terminated by the first unescaped
occurrence of the slash character after the one that begins the ERE
constant. The extended regular expression represented by the
ERE constant shall be the sequence of all unescaped characters and
values of escape sequences between, but not including, the two
delimiting slash characters.
.LP
.IP " 7." 4
A <blank> shall have no effect, except to delimit lexical tokens or
within \fBSTRING\fP or \fBERE\fP tokens.
.LP
.IP " 8." 4
The token \fBNUMBER\fP shall represent a numeric constant. Its form
and numeric value shall be equivalent to either of the
tokens \fBfloating-constant\fP or \fBinteger-constant\fP as specified
by the ISO\ C standard, with the following
exceptions:
.RS
.IP " a." 4
An integer constant cannot begin with 0x or include the hexadecimal
digits \fB'a'\fP, \fB'b'\fP, \fB'c'\fP,
\fB'd'\fP, \fB'e'\fP, \fB'f'\fP, \fB'A'\fP, \fB'B'\fP, \fB'C'\fP,
\fB'D'\fP, \fB'E'\fP, or
\fB'F'\fP.
.LP
.IP " b." 4
The value of an integer constant beginning with 0 shall be taken in
decimal rather than octal.
.LP
.IP " c." 4
An integer constant cannot include a suffix (\fB'u'\fP, \fB'U'\fP,
\fB'l'\fP, or \fB'L'\fP).
.LP
.IP " d." 4
A floating constant cannot include a suffix (\fB'f'\fP, \fB'F'\fP,
\fB'l'\fP, or \fB'L'\fP).
.LP
.RE
.LP
If the value is too large or too small to be representable (see \fIConcepts
Derived from
the ISO C Standard\fP), the behavior is undefined.
.LP
.IP " 9." 4
A sequence of underscores, digits, and alphabetics from the portable
character set (see the Base Definitions volume of
IEEE\ Std\ 1003.1-2001, Section 6.1, Portable Character Set), beginning
with an underscore or alphabetic, shall be considered a word.
.LP
.IP "10." 4
The following words are keywords that shall be recognized as individual
tokens; the name of the token is the same as the
keyword:
.TS
center tab(@);
l l l l l l.
\fBBEGIN\fP@\fBdelete\fP@\fBEND\fP@\fBfunction\fP@\fBin\fP@\fBprintf\fP
\fBbreak\fP@\fBdo\fP@\fBexit\fP@\fBgetline\fP@\fBnext\fP@\fBreturn\fP
\fBcontinue\fP@\fBelse\fP@\fBfor\fP@\fBif\fP@\fBprint\fP@\fBwhile\fP
.TE
.LP
.IP "11." 4
The following words are names of built-in functions and shall be recognized
as the token \fBBUILTIN_FUNC_NAME\fP:
.TS
center tab(@);
l l l l l l.
\fBatan2\fP@\fBgsub\fP@\fBlog\fP@\fBsplit\fP@\fBsub\fP@\fBtoupper\fP
\fBclose\fP@\fBindex\fP@\fBmatch\fP@\fBsprintf\fP@\fBsubstr\fP@
\fBcos\fP@\fBint\fP@\fBrand\fP@\fBsqrt\fP@\fBsystem\fP@
\fBexp\fP@\fBlength\fP@\fBsin\fP@\fBsrand\fP@\fBtolower\fP@
.TE
.LP
The above-listed keywords and names of built-in functions are considered
reserved words.
.LP
.IP "12." 4
The token \fBNAME\fP shall consist of a word that is not a keyword
or a name of a built-in function and is not followed
immediately (without any delimiters) by the \fB'('\fP character.
.LP
.IP "13." 4
The token \fBFUNC_NAME\fP shall consist of a word that is not a keyword
or a name of a built-in function, followed immediately
(without any delimiters) by the \fB'('\fP character. The \fB'('\fP
character shall not be included as part of the token.
.LP
.IP "14." 4
The following two-character sequences shall be recognized as the named
tokens:
.TS
center tab(@);
l l l l.
\fBToken Name\fP@\fBSequence\fP@\fBToken Name\fP@\fBSequence\fP
\fBADD_ASSIGN\fP@+=@\fBNO_MATCH\fP@!~
\fBSUB_ASSIGN\fP@-=@\fBEQ\fP@==
\fBMUL_ASSIGN\fP@*=@\fBLE\fP@<=
\fBDIV_ASSIGN\fP@/=@\fBGE\fP@>=
\fBMOD_ASSIGN\fP@%=@\fBNE\fP@!=
\fBPOW_ASSIGN\fP@^=@\fBINCR\fP@++
\fBOR\fP@||@\fBDECR\fP@--
\fBAND\fP@&&@\fBAPPEND\fP@>>
.TE
.LP
.IP "15." 4
The following single characters shall be recognized as tokens whose
names are the character:
.sp
.RS
.nf
\fB<newline> { } ( ) [ ] , ; + - * % ^ ! > < | ? : ~ $ =
\fP
.fi
.RE
.LP
There is a lexical ambiguity between the token \fBERE\fP and the tokens
\fB'/'\fP and \fBDIV_ASSIGN\fP. When an input
sequence begins with a slash character in any syntactic context where
the token \fB'/'\fP or \fBDIV_ASSIGN\fP could appear as
the next token in a valid program, the longer of those two tokens
that can be recognized shall be recognized. In any other
syntactic context where the token \fBERE\fP could appear as the next
token in a valid program, the token \fBERE\fP shall be
recognized.
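.LP
The effect of this rule can be sketched with an expression in which the
slashes look like ERE delimiters but follow an operand. The names
slash_demo, x, and y are hypothetical:
.sp
.RS
.nf
```shell
# Hypothetical demonstration of slash as division versus ERE delimiter.
slash_demo() {
    awk 'BEGIN {
        x = 6; y = 2
        # After the operand x a division operator could appear, so both
        # slashes are division: (6 / 2) / 1.  After "~" an operand is
        # expected, so there the slash begins an ERE.
        print x /y/ 1, ("y" ~ /y/)
    }'
}

slash_demo
```
.fi
.RE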
.SH EXIT STATUS
.LP
The following exit values shall be returned:
.TP 7
\ 0
All input files were processed successfully.
.TP 7
>0
An error occurred.
.sp
.LP
The exit status can be altered within the program by using an \fBexit\fP
expression.
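.LP
As a brief sketch of altering the exit status from within a program, the
hypothetical shell function exit_with below passes a value to \fBexit\fP
and the calling shell then inspects the status. The names exit_with,
code, and rc are illustrative only:
.sp
.RS
.nf
```shell
# Hypothetical helper: make awk terminate with the given exit status.
exit_with() {
    awk -v code="$1" 'BEGIN { exit code }'
}

rc=0
exit_with 3 || rc=$?
echo "exit status: $rc"
```
.fi
.RE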
.SH CONSEQUENCES OF ERRORS
.LP
If any \fIfile\fP operand is specified and the named file cannot be
accessed, \fIawk\fP shall write a diagnostic message to
standard error and terminate without any further action.
.LP
If the program specified by either the \fIprogram\fP operand or a
\fIprogfile\fP operand is not a valid \fIawk\fP program (as
specified in the EXTENDED DESCRIPTION section), the behavior is undefined.
.LP
\fIThe following sections are informative.\fP
.SH APPLICATION USAGE
.LP
The \fBindex\fP, \fBlength\fP, \fBmatch\fP, and \fBsubstr\fP functions
should not be confused with similar functions in the
ISO\ C standard; the \fIawk\fP versions deal with characters, while
the ISO\ C standard deals with bytes.
.LP
Because the concatenation operation is represented by adjacent expressions
rather than an explicit operator, it is often
necessary to use parentheses to enforce the proper evaluation precedence.
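.LP
A small sketch of why the parentheses matter: binary minus binds more
tightly than concatenation, so without parentheses the string holding a
single space is used as an operand of a subtraction and the space is lost.
The shell variable names plain and grouped are hypothetical:
.sp
.RS
.nf
```shell
# Without parentheses this parses as  -12 (" " - 24)
plain=$(awk 'BEGIN { print -12 " " -24 }')

# With parentheses the space survives as a separate operand
grouped=$(awk 'BEGIN { print -12 " " (-24) }')

echo "$plain"
echo "$grouped"
```
.fi
.RE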
.SH EXAMPLES
.LP
The \fIawk\fP program specified in the command line is most easily
specified within single-quotes (for example,
\&'\fIprogram\fP') for applications using \fIsh\fP, because \fIawk\fP
programs commonly contain
characters that are special to the shell, including double-quotes.
In the cases where an \fIawk\fP program contains single-quote
characters, it is usually easiest to specify most of the program as
strings within single-quotes concatenated by the shell with
quoted single-quote characters. For example:
.sp
.RS
.nf
\fBawk '/'\\''/ { print "quote:", $0 }'
\fP
.fi
.RE
.LP
prints all lines from the standard input containing a single-quote
character, prefixed with \fBquote:\fP.
.LP
The following are examples of simple \fIawk\fP programs:
.IP " 1." 4
Write to the standard output all input lines for which field 3 is
greater than 5:
.sp
.RS
.nf
\fB$3 > 5
\fP
.fi
.RE
.LP
.IP " 2." 4
Write every tenth line:
.sp
.RS
.nf
\fB(NR % 10) == 0
\fP
.fi
.RE
.LP
.IP " 3." 4
Write any line with a substring matching the regular expression:
.sp
.RS
.nf
\fB/(G|D)(2[0-9][[:alpha:]]*)/
\fP
.fi
.RE
.LP
.IP " 4." 4
Print any line with a substring containing a \fB'G'\fP or \fB'D'\fP,
followed by a sequence of digits and characters.
This example uses character classes \fBdigit\fP and \fBalpha\fP to
match language-independent digit and alphabetic characters
respectively:
.sp
.RS
.nf
\fB/(G|D)([[:digit:][:alpha:]]*)/
\fP
.fi
.RE
.LP
.IP " 5." 4
Write any line in which the second field matches the regular expression
and the fourth field does not:
.sp
.RS
.nf
\fB$2 ~ /xyz/ && $4 !~ /xyz/
\fP
.fi
.RE
.LP
.IP " 6." 4
Write any line in which the second field contains a backslash:
.sp
.RS
.nf
\fB$2 ~ /\\\\/
\fP
.fi
.RE
.LP
.IP " 7." 4
Write any line in which the second field contains a backslash. Note
that backslash escapes are interpreted twice: once in
lexical processing of the string and once in processing the regular
expression:
.sp
.RS
.nf
\fB$2 ~ "\\\\\\\\"
\fP
.fi
.RE
.LP
.IP " 8." 4
Write the second to the last and the last field in each line. Separate
the fields by a colon:
.sp
.RS
.nf
\fB{OFS=":";print $(NF-1), $NF}
\fP
.fi
.RE
.LP
.IP " 9." 4
Write the line number and number of fields in each line. The three
strings representing the line number, the colon, and the
number of fields are concatenated and that string is written to standard
output:
.sp
.RS
.nf
\fB{print NR ":" NF}
\fP
.fi
.RE
.LP
.IP "10." 4
Write lines longer than 72 characters:
.sp
.RS
.nf
\fBlength($0) > 72
\fP
.fi
.RE
.LP
.IP "11." 4
Write the first two fields in opposite order separated by \fBOFS\fP:
.sp
.RS
.nf
\fB{ print $2, $1 }
\fP
.fi
.RE
.LP
.IP "12." 4
Same, with input fields separated by a comma or <space>s and <tab>s,
or both:
.sp
.RS
.nf
\fBBEGIN { FS = ",[ \\t]*|[ \\t]+" }
      { print $2, $1 }
\fP
.fi
.RE
.LP
.IP "13." 4
Add up the first column, print sum, and average:
.sp
.RS
.nf
\fB    {s += $1 }
END {print "sum is ", s, " average is", s/NR}
\fP
.fi
.RE
.LP
.IP "14." 4
Write fields in reverse order, one per line (many lines out for each
line in):
.sp
.RS
.nf
\fB{ for (i = NF; i > 0; --i) print $i }
\fP
.fi
.RE
.LP
.IP "15." 4
Write all lines between occurrences of the strings \fBstart\fP and
\fBstop\fP:
.sp
.RS
.nf
\fB/start/, /stop/
\fP
.fi
.RE
.LP
.IP "16." 4
Write all lines whose first field is different from the previous one:
.sp
.RS
.nf
\fB$1 != prev { print; prev = $1 }
\fP
.fi
.RE
.LP
.IP "17." 4
Simulate \fIecho\fP:
.sp
.RS
.nf
\fBBEGIN {
    for (i = 1; i < ARGC; ++i)
        printf("%s%s", ARGV[i], i==ARGC-1?"\\n":" ")
}
\fP
.fi
.RE
.LP
.IP "18." 4
Write the path prefixes contained in the \fIPATH\fP environment variable,
one per line:
.sp
.RS
.nf
\fBBEGIN {
    n = split (ENVIRON["PATH"], path, ":")
    for (i = 1; i <= n; ++i)
        print path[i]
}
\fP
.fi
.RE
.LP
.IP "19." 4
If there is a file named \fBinput\fP containing page headers of the
form:
.sp
.RS
.nf
Page #
.fi
.RE
.LP
and a file named \fBprogram\fP that contains:
.sp
.RS
.nf
\fB/Page/ { $2 = n++; }
       { print }
\fP
.fi
.RE
.LP
then the command line:
.sp
.RS
.nf
\fBawk -f program n=5 input
\fP
.fi
.RE
.LP
prints the file \fBinput\fP, filling in page numbers starting at 5.
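.LP
The page-numbering example can be tried end to end with a short sketch.
The shell function number_pages, the temporary directory, and the sample
body line are hypothetical and exist only for this illustration:
.sp
.RS
.nf
```shell
# Hypothetical wrapper: run an awk program file with n=5 as an operand.
number_pages() {
    awk -f "$1" n=5 "$2"
}

work=$(mktemp -d)
echo '/Page/ { $2 = n++; }' >  "$work/program"
echo '{ print }'            >> "$work/program"
{ echo 'Page #'; echo 'some text'; echo 'Page #'; } > "$work/input"
number_pages "$work/program" "$work/input"
rm -r "$work"
```
.fi
.RE
.LP
Because the assignment appears as an operand before the file name, it
takes effect just before that file is read, so the first header is
numbered 5 and the next 6.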
.SH RATIONALE
.LP
This description is based on the new \fIawk\fP, "nawk" (see the referenced
\fIThe AWK Programming Language\fP), which
introduced a number of new features to the historical \fIawk\fP:
.IP " 1." 4
New keywords: \fBdelete\fP, \fBdo\fP, \fBfunction\fP, \fBreturn\fP
.LP
.IP " 2." 4
New built-in functions: \fBatan2\fP, \fBclose\fP, \fBcos\fP, \fBgsub\fP,
\fBmatch\fP, \fBrand\fP, \fBsin\fP,
\fBsrand\fP, \fBsub\fP, \fBsystem\fP
.LP
.IP " 3." 4
New predefined variables: \fBFNR\fP, \fBARGC\fP, \fBARGV\fP, \fBRSTART\fP,
\fBRLENGTH\fP, \fBSUBSEP\fP
.LP
.IP " 4." 4
New expression operators: \fB?\fP, \fB:\fP, \fB,\fP, \fB^\fP
.LP
.IP " 5." 4
The \fBFS\fP variable and the third argument to \fBsplit\fP, now treated
as extended regular expressions.
.LP
.IP " 6." 4
The operator precedence, changed to more closely match the C language.
Two examples of code that operate differently are:
.sp
.RS
.nf
\fBwhile ( n /= 10 > 1) ...
if (!"wk" ~ /bwk/) ...
\fP
.fi
.RE
|
||
|
.LP
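The first of the two precedence examples can be explored with a small
sketch. The function and variable names are hypothetical, and the
result shown assumes an implementation that follows the precedence
described in this volume, where a comparison binds more tightly than an
assignment.

```shell
precedence_demo() {
    awk 'BEGIN {
        n = 100
        n /= 10 > 1              # parsed as n /= (10 > 1), that is, n /= 1
        print n
        m = 100
        big = ((m /= 10) > 1)    # parentheses force the division first
        print m, big
    }'
}
precedence_demo
```

Under that assumption the first line of output is \fB100\fP and the
second is \fB10 1\fP, which is why a loop written in the unparenthesized
form may never terminate.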
|
||
|
.LP
|
||
|
Several features have been added based on newer implementations of
|
||
|
\fIawk\fP:
|
||
|
.IP " *" 3
|
||
|
Multiple instances of \fB-f\fP \fIprogfile\fP are permitted.
|
||
|
.LP
|
||
|
.IP " *" 3
|
||
|
The new option \fB-v\fP \fIassignment\fP.
|
||
|
.LP
|
||
|
.IP " *" 3
|
||
|
The new predefined variable \fBENVIRON\fP.
|
||
|
.LP
|
||
|
.IP " *" 3
|
||
|
New built-in functions \fBtoupper\fP and \fBtolower\fP.
|
||
|
.LP
|
||
|
.IP " *" 3
|
||
|
More formatting capabilities are added to \fBprintf\fP to match the
|
||
|
ISO\ C standard.
|
||
|
.LP
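A brief sketch of three of these additions used together follows. The
environment variable \fIGREETING\fP, the variable \fBname\fP, and the
function name are hypothetical and exist only for this illustration.

```shell
show_newer_features() {
    # GREETING is set in the environment of this one command only.
    GREETING=hello awk -v name=World 'BEGIN {
        # ENVIRON exposes the environment; -v assigns name before BEGIN runs.
        print toupper(ENVIRON["GREETING"]), tolower(name)
    }'
}
show_newer_features
```

This should write \fBHELLO world\fP. Because \fB-v\fP assignments take
effect before the \fBBEGIN\fP actions, \fBname\fP is already set there.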
|
||
|
.LP
|
||
|
The overall \fIawk\fP syntax has always been based on the C language,
|
||
|
with a few features from the shell command language and
|
||
|
other sources. Because of this, it is not completely compatible with
|
||
|
any other language, which has caused confusion for some users.
|
||
|
It is not the intent of the standard developers to address such issues.
|
||
|
A few relatively minor changes toward making the language
|
||
|
more compatible with the ISO\ C standard were made; most of these
|
||
|
changes are based on similar changes in recent
|
||
|
implementations, as described above. There remain several C-language
|
||
|
conventions that are not in \fIawk\fP. One of the notable
|
||
|
ones is the comma operator, which is commonly used to specify multiple
|
||
|
expressions in the C language \fBfor\fP statement. Also,
|
||
|
there are various places where \fIawk\fP is more restrictive than
|
||
|
the C language regarding the type of expression that can be used
|
||
|
in a given context. These limitations are due to the different features
|
||
|
that the \fIawk\fP language does provide.
|
||
|
.LP
|
||
|
Regular expressions in \fIawk\fP have been extended somewhat from
|
||
|
historical implementations to make them a pure superset of
|
||
|
extended regular expressions, as defined by IEEE\ Std\ 1003.1-2001
|
||
|
(see the Base Definitions volume of
|
||
|
IEEE\ Std\ 1003.1-2001, Section 9.4, Extended Regular Expressions).
|
||
|
The
|
||
|
main extensions are internationalization features and interval expressions.
|
||
|
Historical implementations of \fIawk\fP have long
|
||
|
supported backslash escape sequences as an extension to extended regular
|
||
|
expressions, and this extension has been retained despite
|
||
|
inconsistency with other utilities. The number of escape sequences
|
||
|
recognized in both extended regular expressions and strings has
|
||
|
varied (generally increasing with time) among implementations. The
|
||
|
set specified by IEEE\ Std\ 1003.1-2001 includes most
|
||
|
sequences known to be supported by popular implementations and by
|
||
|
the ISO\ C standard. One sequence that is not supported is
|
||
|
hexadecimal value escapes beginning with \fB'\\x'\fP . This would
|
||
|
allow values expressed in more than 9 bits to be used within
|
||
|
\fIawk\fP as in the ISO\ C standard. However, because this syntax
|
||
|
has a non-deterministic length, it does not permit the
|
||
|
subsequent character to be a hexadecimal digit. This limitation can
|
||
|
be dealt with in the C language by the use of lexical string
|
||
|
concatenation. In the \fIawk\fP language, concatenation could also
|
||
|
be a solution for strings, but not for extended regular
|
||
|
expressions (either lexical ERE tokens or strings used dynamically
|
||
|
as regular expressions). Because of this limitation, the feature
|
||
|
has not been added to IEEE\ Std\ 1003.1-2001.
|
||
|
.LP
|
||
|
When a string variable is used in a context where an extended regular
|
||
|
expression normally appears (where the lexical token ERE
|
||
|
is used in the grammar) the string does not contain the literal slashes.
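.LP
A minimal sketch of this rule, using a hypothetical variable \fBre\fP
and function name: the string holds only the expression itself, while
the lexical ERE token is written between slashes.

```shell
match_dynamic_ere() {
    awk 'BEGIN {
        re = "^a+b$"                 # dynamic ERE in a string: no slashes
        if ("aaab" ~ re)
            print "string ERE matches"
        if ("aaab" ~ /^a+b$/)        # the same ERE as a lexical token
            print "literal ERE matches"
    }'
}
match_dynamic_ere
```

Both tests are expected to succeed, so both messages are written.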
|
||
|
.LP
|
||
|
Some versions of \fIawk\fP allow the form:
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fBfunc name(args, ... ) { statements }
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.LP
|
||
|
This has been deprecated by the authors of the language, who asked
|
||
|
that it not be specified.
|
||
|
.LP
|
||
|
Historical implementations of \fIawk\fP produce an error if a \fBnext\fP
|
||
|
statement is executed in a \fBBEGIN\fP action, and
|
||
|
cause \fIawk\fP to terminate if a \fBnext\fP statement is executed
|
||
|
in an \fBEND\fP action. This behavior has not been
|
||
|
documented, and it was not believed that it was necessary to standardize
|
||
|
it.
|
||
|
.LP
|
||
|
The specification of conversions between string and numeric values
|
||
|
is much more detailed than in the documentation of historical
|
||
|
implementations or in the referenced \fIThe AWK Programming Language\fP.
|
||
|
Although most of the behavior is designed to be
|
||
|
intuitive, the details are necessary to ensure compatible behavior
|
||
|
from different implementations. This is especially important in
|
||
|
relational expressions since the types of the operands determine whether
|
||
|
a string or numeric comparison is performed. From the
|
||
|
perspective of an application writer, it is usually sufficient to
|
||
|
expect intuitive behavior and to force conversions (by adding
|
||
|
zero or concatenating a null string) when the type of an expression
|
||
|
does not obviously match what is needed. The intent has been to
|
||
|
specify historical practice in almost all cases. The one exception
|
||
|
is that, in historical implementations, variables and constants
|
||
|
maintain both string and numeric values after their original value
|
||
|
is converted by any use. This means that referencing a variable
|
||
|
or constant can have unexpected side effects. For example, with historical
|
||
|
implementations the following program:
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fB{
|
||
|
a = "+2"
|
||
|
b = 2
|
||
|
if (NR % 2)
|
||
|
c = a + b
|
||
|
if (a == b)
|
||
|
print "numeric comparison"
|
||
|
else
|
||
|
print "string comparison"
|
||
|
}
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.LP
|
||
|
would perform a numeric comparison (and output numeric comparison)
|
||
|
for each odd-numbered line, but perform a string comparison
|
||
|
(and output string comparison) for each even-numbered line. IEEE\ Std\ 1003.1-2001
|
||
|
ensures that comparisons will be numeric
|
||
|
if necessary. With historical implementations, the following program:
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fBBEGIN {
|
||
|
OFMT = "%e"
|
||
|
print 3.14
|
||
|
OFMT = "%f"
|
||
|
print 3.14
|
||
|
}
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.LP
|
||
|
would output \fB"3.140000e+00"\fP twice, because in the second \fBprint\fP
|
||
|
statement the constant \fB"3.14"\fP would have
|
||
|
a string value from the previous conversion. IEEE\ Std\ 1003.1-2001
|
||
|
requires that the output of the second \fBprint\fP
|
||
|
statement be \fB"3.140000"\fP . The behavior of historical implementations
|
||
|
was seen as too unintuitive and unpredictable.
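.LP
The advice about forcing conversions can be sketched as follows. The
variable and function names are hypothetical, and the results described
assume an implementation that follows the comparison rules of this
volume.

```shell
force_conversions() {
    awk 'BEGIN {
        a = "+2"; b = 2
        # a holds a string constant, so this is a string comparison.
        print (a == b     ? "equal" : "different")
        # Adding zero forces a numeric comparison.
        print (a + 0 == b ? "equal" : "different")
        x = 10; y = 9
        # Two numbers: numeric comparison.
        print (x < y ? "10 sorts before 9" : "10 sorts after 9")
        # Concatenating a null string forces a string comparison.
        print ((x "") < (y "") ? "10 sorts before 9" : "10 sorts after 9")
    }'
}
force_conversions
```

The expected output is \fBdifferent\fP, \fBequal\fP,
\fB10 sorts after 9\fP, and \fB10 sorts before 9\fP, one per line.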
|
||
|
.LP
|
||
|
It was pointed out that with the rules contained in early drafts,
|
||
|
the following script would print nothing:
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fBBEGIN {
|
||
|
y[1.5] = 1
|
||
|
OFMT = "%e"
|
||
|
print y[1.5]
|
||
|
}
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.LP
|
||
|
Therefore, a new variable, \fBCONVFMT\fP, was introduced. The \fBOFMT\fP
|
||
|
variable is now restricted to affecting output
|
||
|
conversions of numbers to strings and \fBCONVFMT\fP is used for internal
|
||
|
conversions, such as comparisons or array indexing. The
|
||
|
default value is the same as that for \fBOFMT\fP, so unless a program
|
||
|
changes \fBCONVFMT\fP (which no historical program would
|
||
|
do), it will receive the historical behavior associated with internal
|
||
|
string conversions.
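.LP
The division of labor between the two variables can be sketched as
below. The format values and all names are chosen only for
illustration.

```shell
convfmt_vs_ofmt() {
    awk 'BEGIN {
        CONVFMT = "%.2g"     # internal number-to-string conversions
        OFMT    = "%.4f"     # numbers written by print
        x = 3.14159
        s = x ""             # concatenation converts using CONVFMT
        print s              # s is already a string
        print x              # print converts using OFMT
        y[x] = 1             # the subscript is converted using CONVFMT
        for (k in y)
            print k
    }'
}
convfmt_vs_ofmt
```

This should write \fB3.1\fP, \fB3.1416\fP, and \fB3.1\fP. Changing
\fBOFMT\fP alone would leave the array subscript unaffected, which is
the problem the separate variable was introduced to solve.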
|
||
|
.LP
|
||
|
The POSIX \fIawk\fP lexical and syntactic conventions are specified
|
||
|
more formally than in other sources. Again the intent has
|
||
|
been to specify historical practice. One convention that may not be
|
||
|
obvious from the formal grammar, as in other verbal descriptions,
|
||
|
is where <newline>s are acceptable. There are several obvious placements
|
||
|
such as terminating a statement, and a backslash can
|
||
|
be used to escape <newline>s between any lexical tokens. In addition,
|
||
|
<newline>s without backslashes can follow a
|
||
|
comma, an open brace, a logical AND operator ( \fB"&&"\fP ), a logical
|
||
|
OR operator ( \fB"||"\fP ), the \fBdo\fP
|
||
|
keyword, the \fBelse\fP keyword, and the closing parenthesis of an
|
||
|
\fBif\fP, \fBfor\fP, or \fBwhile\fP statement. For
|
||
|
example:
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fB{ print $1,
|
||
|
$2 }
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
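.LP
A slightly longer sketch shows several of the permitted placements
together: after a logical AND operator, after the closing parenthesis
of an \fBif\fP statement, after a comma, and after the \fBelse\fP
keyword. The input lines and the function name are made up for the
demonstration.

```shell
newline_placement() {
    { echo 'alpha 1'; echo 'beta 20'; } |
    awk '{
        if ($2 > 10 &&
            $1 != "")
            print $1,
                  "large"
        else
            print $1, "small"
    }'
}
newline_placement
```

The expected output is \fBalpha small\fP followed by \fBbeta large\fP.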
|
||
|
.LP
|
||
|
The requirement that \fIawk\fP add a trailing <newline> to the program
|
||
|
argument text is to simplify the grammar, making
|
||
|
it match a text file in form. There is no way for an application or
|
||
|
test suite to determine whether a literal <newline> is
|
||
|
added or whether \fIawk\fP simply acts as if it did.
|
||
|
.LP
|
||
|
IEEE\ Std\ 1003.1-2001 requires several changes from historical implementations
|
||
|
in order to support
|
||
|
internationalization. Probably the most subtle of these is the use
|
||
|
of the decimal-point character, defined by the \fILC_NUMERIC\fP
|
||
|
category of the locale, in representations of floating-point numbers.
|
||
|
This locale-specific character is used in recognizing numeric
|
||
|
input, in converting between strings and numeric values, and in formatting
|
||
|
output. However, regardless of locale, the period
|
||
|
character (the decimal-point character of the POSIX locale) is the
|
||
|
decimal-point character recognized in processing \fIawk\fP
|
||
|
programs (including assignments in command line arguments). This is
|
||
|
essentially the same convention as the one used in the
|
||
|
ISO\ C standard. The difference is that the C language includes the
|
||
|
\fIsetlocale\fP() function, which permits an application to modify
|
||
|
its locale. Because of this
|
||
|
capability, a C application begins executing with its locale set to
|
||
|
the C locale, and only executes in the environment-specified
|
||
|
locale after an explicit call to \fIsetlocale\fP(). However, adding
|
||
|
such an elaborate
|
||
|
new feature to the \fIawk\fP language was seen as inappropriate for
|
||
|
IEEE\ Std\ 1003.1-2001. It is possible to execute an
|
||
|
\fIawk\fP program explicitly in any desired locale by setting the
|
||
|
environment in the shell.
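.LP
For instance, the following sketch (with a hypothetical function name)
selects the POSIX locale for a single invocation, so that the period is
the decimal-point character for input as well as for the program text:

```shell
run_in_posix_locale() {
    # LC_ALL is set for this one command only.
    echo '1.5' | LC_ALL=POSIX awk '{ print $1 + 1 }'
}
run_in_posix_locale
```

This should write \fB2.5\fP regardless of the locale of the calling
shell.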
|
||
|
.LP
|
||
|
The undefined behavior resulting from NULs in extended regular expressions
|
||
|
allows future extensions for the GNU \fIgawk\fP
|
||
|
program to process binary data.
|
||
|
.LP
|
||
|
The behavior in the case of invalid \fIawk\fP programs (including
|
||
|
lexical, syntactic, and semantic errors) is undefined because
|
||
|
it was considered overly limiting on implementations to specify. In
|
||
|
most cases such errors can be expected to produce a diagnostic
|
||
|
and a non-zero exit status. However, some implementations may choose
|
||
|
to extend the language in ways that make use of certain
|
||
|
invalid constructs. Other invalid constructs might be deemed worthy
|
||
|
of a warning, but otherwise cause some reasonable behavior.
|
||
|
Still other constructs may be very difficult to detect in some implementations.
|
||
|
Also, different implementations might detect a
|
||
|
given error during an initial parsing of the program (before reading
|
||
|
any input files) while others might detect it when executing
|
||
|
the program after reading some input. Implementors should be aware
|
||
|
that diagnosing errors as early as possible and producing useful
|
||
|
diagnostics can ease debugging of applications, and thus make an implementation
|
||
|
more usable.
|
||
|
.LP
|
||
|
The unspecified behavior from using multi-character \fBRS\fP values
|
||
|
is to allow possible future extensions based on extended
|
||
|
regular expressions used for record separators. Historical implementations
|
||
|
take the first character of the string and ignore the
|
||
|
others.
|
||
|
.LP
|
||
|
Unspecified behavior when \fIsplit\fP( \fIstring\fP, \fIarray\fP,
|
||
|
<null>) is used
|
||
|
is to allow a proposed future extension that would split up a string
|
||
|
into an array of individual characters.
|
||
|
.LP
|
||
|
In the context of the \fBgetline\fP function, equally good arguments
|
||
|
for different precedences of the \fB|\fP and \fB<\fP
|
||
|
operators can be made. Historical practice has been that:
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fBgetline < "a" "b"
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.LP
|
||
|
is parsed as:
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fB( getline < "a" ) "b"
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.LP
|
||
|
although many would argue that the intent was that the file \fBab\fP
|
||
|
should be read. However:
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fBgetline < "x" + 1
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.LP
|
||
|
parses as:
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fBgetline < ( "x" + 1 )
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.LP
|
||
|
Similar problems occur with the \fB|\fP version of \fBgetline\fP,
|
||
|
particularly in combination with \fB$\fP. For example:
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fB$"echo hi" | getline
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.LP
|
||
|
(This situation is particularly problematic when used in a \fBprint\fP
|
||
|
statement, where the \fB|getline\fP part might be a
|
||
|
redirection of the \fBprint\fP.)
|
||
|
.LP
|
||
|
Since in most cases such constructs are not used (or at least should
not be used) because they have a natural ambiguity for which
|
||
|
there is no conventional parsing), the meaning of these constructs
|
||
|
has been made explicitly unspecified. (The effect is that a
|
||
|
conforming application that runs into the problem must parenthesize
|
||
|
to resolve the ambiguity.) There appeared to be few if any
|
||
|
actual uses of such constructs.
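.LP
A sketch of the parenthesized form, using the pipe version of
\fBgetline\fP with a harmless command. The variable \fBline\fP and the
function name are hypothetical.

```shell
parenthesized_getline() {
    awk 'BEGIN {
        # Parentheses make the whole concatenation the command to run.
        ("echo " "hi") | getline line
        print line
        close("echo hi")
    }'
}
parenthesized_getline
```

With the parentheses in place the command executed is \fBecho hi\fP, and
the program should write \fBhi\fP.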
|
||
|
.LP
|
||
|
Grammars can be written that would cause an error under these circumstances.
|
||
|
Where backwards-compatibility is not a large
|
||
|
consideration, implementors may wish to use such grammars.
|
||
|
.LP
|
||
|
Some historical implementations have allowed some built-in functions
|
||
|
to be called without an argument list, the result being a
|
||
|
default argument list chosen in some "reasonable" way. Use of \fBlength\fP
|
||
|
as a synonym for \fBlength($0)\fP is the only one of
|
||
|
these forms that is thought to be widely known or widely used; this
|
||
|
particular form is documented in various places (for example,
|
||
|
most historical \fIawk\fP reference pages, although not in the referenced
|
||
|
\fIThe AWK Programming Language\fP) as legitimate
|
||
|
practice. With this exception, default argument lists have always
|
||
|
been undocumented and vaguely defined, and it is not at all clear
|
||
|
how (or if) they should be generalized to user-defined functions.
|
||
|
They add no useful functionality and preclude possible future
|
||
|
extensions that might need to name functions without calling them.
|
||
|
Not standardizing them seems the simplest course. The standard
|
||
|
developers considered that \fBlength\fP merited special treatment,
|
||
|
however, since it has been documented in the past and sees
|
||
|
possibly substantial use in historical programs. Accordingly, this
|
||
|
usage has been made legitimate, but Issue\ 5 removed the
|
||
|
obsolescent marking for XSI-conforming implementations and many otherwise
|
||
|
conforming applications depend on this feature.
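.LP
The equivalence can be checked with a one-line sketch (the input text
and the function name are arbitrary):

```shell
length_forms() {
    # length with no argument list is a synonym for length($0).
    echo 'hello world' | awk '{ print length, length($0), length($1) }'
}
length_forms
```

This should write \fB11 11 5\fP.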
|
||
|
.LP
|
||
|
In \fBsub\fP and \fBgsub\fP, if \fIrepl\fP is a string literal (the
|
||
|
lexical token \fBSTRING\fP), then two consecutive
|
||
|
backslash characters should be used in the string to ensure a single
|
||
|
backslash will precede the ampersand when the resultant string
|
||
|
is passed to the function. (For example, to specify one literal ampersand
|
||
|
in the replacement string, use \fBgsub\fP( \fBERE\fP,
|
||
|
\fB"\\\\&"\fP ).)
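.LP
The following sketch contrasts an unescaped ampersand with the
two-backslash form in a string literal. The input text, the variable
\fBcopy\fP, and the function name are assumptions made for the
illustration; the code is shown as it would be typed, not as
\fItroff\fP input.

```shell
literal_ampersand() {
    echo 'salt and pepper' | awk '{
        copy = $0
        gsub(/and/, "[&]")          # unescaped & stands for the matched text
        print
        gsub(/and/, "\\&", copy)    # the lexer leaves one backslash before &
        print copy
    }'
}
literal_ampersand
```

The first line written should be \fBsalt [and] pepper\fP and the second
\fBsalt & pepper\fP.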
|
||
|
.LP
|
||
|
Historically the only special character in the \fIrepl\fP argument
|
||
|
of \fBsub\fP and \fBgsub\fP string functions was the
|
||
|
ampersand ( \fB'&'\fP ) character and preceding it with the backslash
|
||
|
character was used to turn off its special
|
||
|
meaning.
|
||
|
.LP
|
||
|
The description in the ISO\ POSIX-2:1993 standard introduced behavior
|
||
|
such that the backslash character was another special
|
||
|
character and it was unspecified whether there were any other special
|
||
|
characters. This description introduced several portability
|
||
|
problems, some of which are described below, and so it has been replaced
|
||
|
with the more historical description. Some of the problems
|
||
|
include:
|
||
|
.IP " *" 3
|
||
|
Historically, to create the replacement string, a script could use
|
||
|
\fBgsub\fP( \fBERE\fP, \fB"\\\\&"\fP ), but with the
|
||
|
ISO\ POSIX-2:1993 standard wording, it was necessary to use \fBgsub\fP(
|
||
|
\fBERE\fP, \fB"\\\\\\\\&"\fP ). Backslash
|
||
|
characters are doubled here because all string literals are subject
|
||
|
to lexical analysis, which would reduce each pair of backslash
|
||
|
characters to a single backslash before being passed to \fBgsub\fP.
|
||
|
.LP
|
||
|
.IP " *" 3
|
||
|
Since it was unspecified what the special characters were, for portable
|
||
|
scripts to guarantee that characters are printed
|
||
|
literally, each character had to be preceded with a backslash. (For
|
||
|
example, a portable script had to use \fBgsub\fP( \fBERE\fP,
|
||
|
\fB"\\\\h\\\\i"\fP ) to produce a replacement string of \fB"hi"\fP
|
||
|
\&.)
|
||
|
.LP
|
||
|
.LP
|
||
|
The description for comparisons in the ISO\ POSIX-2:1993 standard
|
||
|
did not properly describe historical practice because of
|
||
|
the way numeric strings are compared as numbers. The rules in that standard
|
||
|
cause the following code:
|
||
|
.sp
|
||
|
.RS
|
||
|
.nf
|
||
|
|
||
|
\fBif (0 == "000")
|
||
|
print "strange, but true"
|
||
|
else
|
||
|
print "not true"
|
||
|
\fP
|
||
|
.fi
|
||
|
.RE
|
||
|
.LP
|
||
|
to do a numeric comparison, causing the \fBif\fP to succeed. It should
|
||
|
be intuitively obvious that this is incorrect behavior,
|
||
|
and indeed, no historical implementation of \fIawk\fP actually behaves
|
||
|
this way.
|
||
|
.LP
|
||
|
To fix this problem, the definition of \fInumeric string\fP was enhanced
|
||
|
to include only those values obtained from specific
|
||
|
circumstances (mostly external sources) where it is not possible to
|
||
|
determine unambiguously whether the value is intended to be a
|
||
|
string or a numeric.
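.LP
The effect of the narrowed definition can be sketched by comparing a
string constant and an input field that hold the same characters. The
function name and the messages are hypothetical.

```shell
numeric_string_rules() {
    echo '000' | awk '{
        # A string constant is never a numeric string: string comparison.
        print (0 == "000" ? "constant: equal" : "constant: different")
        # A field that looks numeric is a numeric string: numeric comparison.
        print (0 == $1    ? "field: equal"    : "field: different")
    }'
}
numeric_string_rules
```

The expected output is \fBconstant: different\fP followed by
\fBfield: equal\fP.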
|
||
|
.LP
|
||
|
Variables that are assigned to a numeric string shall also be treated
|
||
|
as a numeric string. (That is, the notion of a numeric
|
||
|
string can be propagated across assignments.) In comparisons, all
|
||
|
variables having the uninitialized value are to be treated as a
|
||
|
numeric operand evaluating to the numeric value zero.
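.LP
Both points can be sketched briefly. The names \fBv\fP and
\fBunset_var\fP, like the function name, are hypothetical;
\fBunset_var\fP is deliberately never assigned.

```shell
propagation_and_uninitialized() {
    echo '010' | awk '{
        v = $1      # v inherits the numeric-string status of $1
        print (v == 10 ? "v compares numerically" : "v compares as a string")
        # unset_var has the uninitialized value.
        print (unset_var == 0  ? "equals 0" : "differs from 0")
        print (unset_var == "" ? "equals the null string" : "differs from the null string")
    }'
}
propagation_and_uninitialized
```

This should write \fBv compares numerically\fP, \fBequals 0\fP, and
\fBequals the null string\fP, since the uninitialized value behaves as
zero against a number and as the null string against a string.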
|
||
|
.LP
|
||
|
Uninitialized variables include all types of variables including scalars,
|
||
|
array elements, and fields. The definition of an
|
||
|
uninitialized value in Variables and Special Variables is necessary
|
||
|
to describe the value placed on
|
||
|
uninitialized variables and on fields that are valid (for example,
|
||
|
\fB<\fP \fB$NF\fP) but have no characters in them and to
|
||
|
describe how these variables are to be used in comparisons. A valid
|
||
|
field, such as \fB$1\fP, that has no characters in it can be
|
||
|
obtained from an input line of \fB"\\t\\t"\fP when \fBFS=\fP \fB'\\t'\fP
|
||
|
\&. Historically, the comparison (\fB$1<10\fP) was
|
||
|
done numerically after evaluating \fB$1\fP to the value zero.
|
||
|
.LP
|
||
|
The phrase "... also shall have the numeric value of the numeric string"
|
||
|
was removed from several sections of the
|
||
|
ISO\ POSIX-2:1993 standard because it specifies an unnecessary implementation
|
||
|
detail. It is not necessary for
|
||
|
IEEE\ Std\ 1003.1-2001 to specify that these objects be assigned two
|
||
|
different values. It is only necessary to specify that
|
||
|
these objects may evaluate to two different values depending on context.
|
||
|
.LP
|
||
|
The description of numeric string processing is based on the behavior
|
||
|
of the \fIatof\fP()
|
||
|
function in the ISO\ C standard. While it is not a requirement for
|
||
|
an implementation to use this function, many historical
|
||
|
implementations of \fIawk\fP do. In the ISO\ C standard, floating-point
|
||
|
constants use a period as a decimal point character
|
||
|
for the language itself, independent of the current locale, but the
|
||
|
\fIatof\fP() function and
|
||
|
the associated \fIstrtod\fP() function use the decimal point character
|
||
|
of the current
|
||
|
locale when converting strings to numeric values. Similarly in \fIawk\fP,
|
||
|
floating-point constants in an \fIawk\fP script use a
|
||
|
period independent of the locale, but input strings use the decimal
|
||
|
point character of the locale.
|
||
|
.SH FUTURE DIRECTIONS
|
||
|
.LP
|
||
|
None.
|
||
|
.SH SEE ALSO
|
||
|
.LP
|
||
|
\fIGrammar Conventions\fP , \fIgrep\fP , \fIlex\fP , \fIsed\fP , the
|
||
|
System Interfaces volume of IEEE\ Std\ 1003.1-2001, \fIatof\fP(),
|
||
|
\fIexec\fP, \fIpopen\fP(), \fIsetlocale\fP(), \fIstrtod\fP()
|
||
|
.SH COPYRIGHT
|
||
|
Portions of this text are reprinted and reproduced in electronic form
|
||
|
from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology
|
||
|
-- Portable Operating System Interface (POSIX), The Open Group Base
|
||
|
Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of
|
||
|
Electrical and Electronics Engineers, Inc and The Open Group. In the
|
||
|
event of any discrepancy between this version and the original IEEE and
|
||
|
The Open Group Standard, the original IEEE and The Open Group Standard
|
||
|
is the referee document. The original Standard can be obtained online at
|
||
|
http://www.opengroup.org/unix/online.html .
|