Try and bring some consistency to quotes.

Michael Kerrisk 2008-06-09 21:03:52 +00:00
parent ce17c5d391
commit 333a424b0e
4 changed files with 153 additions and 148 deletions


@@ -178,11 +178,11 @@ and an optional
The arguments must correspond properly (after type promotion) with the The arguments must correspond properly (after type promotion) with the
conversion specifier. conversion specifier.
By default, the arguments are used in the order By default, the arguments are used in the order
given, where each `*' and each conversion specifier asks for the next given, where each \(aq*\(aq and each conversion specifier asks for the next
argument (and it is an error if insufficiently many arguments are given). argument (and it is an error if insufficiently many arguments are given).
One can also specify explicitly which argument is taken, One can also specify explicitly which argument is taken,
at each place where an argument is required, by writing `%m$' instead at each place where an argument is required, by writing "%m$" instead
of `%' and `*m$' instead of `*', where the decimal integer m denotes of \(aq%\(aq and "*m$" instead of \(aq*\(aq, where the decimal integer m denotes
the position in the argument list of the desired argument, indexed starting the position in the argument list of the desired argument, indexed starting
from 1. from 1.
Thus, Thus,
@@ -204,35 +204,35 @@ printf("%2$*1$d", width, num);
are equivalent. are equivalent.
The second style allows repeated references to the The second style allows repeated references to the
same argument. same argument.
The C99 standard does not include the style using `$', The C99 standard does not include the style using \(aq$\(aq,
which comes from the Single Unix Specification. which comes from the Single Unix Specification.
If the style using If the style using
`$' is used, it must be used throughout for all conversions taking an \(aq$\(aq is used, it must be used throughout for all conversions taking an
argument and all width and precision arguments, but it may be mixed argument and all width and precision arguments, but it may be mixed
with `%%' formats which do not consume an argument. with "%%" formats which do not consume an argument.
There may be no There may be no
gaps in the numbers of arguments specified using `$'; for example, if gaps in the numbers of arguments specified using \(aq$\(aq; for example, if
arguments 1 and 3 are specified, argument 2 must also be specified arguments 1 and 3 are specified, argument 2 must also be specified
somewhere in the format string. somewhere in the format string.
For some numeric conversions a radix character (`decimal point') or For some numeric conversions a radix character ("decimal point") or
thousands' grouping character is used. thousands' grouping character is used.
The actual character used The actual character used
depends on the depends on the
.B LC_NUMERIC .B LC_NUMERIC
part of the locale. part of the locale.
The POSIX locale The POSIX locale
uses `.' as radix character, and does not have a grouping character. uses \(aq.\(aq as radix character, and does not have a grouping character.
Thus, Thus,
.in +4n .in +4n
.nf .nf
printf("%'.2f", 1234567.89); printf("%\(aq.2f", 1234567.89);
.fi .fi
.in .in
results in `1234567.89' in the POSIX locale, in `1234567,89' in the results in "1234567.89" in the POSIX locale, in "1234567,89" in the
nl_NL locale, and in `1.234.567,89' in the da_DK locale. nl_NL locale, and in "1.234.567,89" in the da_DK locale.
.SS "The flag characters" .SS "The flag characters"
The character % is followed by zero or more of the following flags: The character % is followed by zero or more of the following flags:
.TP .TP
@@ -246,7 +246,7 @@ For
.B x .B x
and and
.B X .B X
conversions, a non-zero result has the string `0x' (or `0X' for conversions, a non-zero result has the string "0x" (or "0X" for
.B X .B X
conversions) prepended to it. conversions) prepended to it.
For For
@@ -323,7 +323,7 @@ overrides a
.B \&0 .B \&0
if both are given. if both are given.
.TP .TP
.B ' ' .B \(aq \(aq
(a space) A blank should be left before a positive number (a space) A blank should be left before a positive number
(or empty string) produced by a signed conversion. (or empty string) produced by a signed conversion.
.TP .TP
@@ -338,7 +338,7 @@ overrides a space if both are used.
The five flag characters above are defined in the C standard. The five flag characters above are defined in the C standard.
The SUSv2 specifies one further flag character. The SUSv2 specifies one further flag character.
.TP .TP
.B ' .B \(aq
For decimal conversion For decimal conversion
.RB ( i , .RB ( i ,
.BR d , .BR d ,
@@ -353,7 +353,7 @@ Note that many versions of
.BR gcc (1) .BR gcc (1)
cannot parse this option and will issue a warning. cannot parse this option and will issue a warning.
SUSv2 does not SUSv2 does not
include %'F. include \fI%\(aqF\fP.
.PP .PP
glibc 2.2 adds one further flag character. glibc 2.2 adds one further flag character.
.TP .TP
@@ -364,7 +364,7 @@ For decimal integer conversion
.BR u ) .BR u )
the output uses the locale's alternative output digits, if any. the output uses the locale's alternative output digits, if any.
For example, since glibc 2.2.3 this will give Arabic-Indic digits For example, since glibc 2.2.3 this will give Arabic-Indic digits
in the Persian (`fa_IR') locale. in the Persian ("fa_IR") locale.
.\" outdigits keyword in locale file .\" outdigits keyword in locale file
.SS "The field width" .SS "The field width"
An optional decimal digit string (with non-zero first digit) specifying An optional decimal digit string (with non-zero first digit) specifying
@@ -372,25 +372,25 @@ a minimum field width.
If the converted value has fewer characters If the converted value has fewer characters
than the field width, it will be padded with spaces on the left than the field width, it will be padded with spaces on the left
(or right, if the left-adjustment flag has been given). (or right, if the left-adjustment flag has been given).
Instead of a decimal digit string one may write `*' or `*m$' Instead of a decimal digit string one may write "*" or "*m$"
(for some decimal integer m) to specify that the field width (for some decimal integer \fIm\fP) to specify that the field width
is given in the next argument, or in the m-th argument, respectively, is given in the next argument, or in the \fIm\fP-th argument, respectively,
which must be of type which must be of type
.IR int . .IR int .
A negative field width is taken as a `\-' flag followed by a A negative field width is taken as a \(aq\-\(aq flag followed by a
positive field width. positive field width.
In no case does a nonexistent or small field width cause truncation of a In no case does a nonexistent or small field width cause truncation of a
field; if the result of a conversion is wider than the field width, the field; if the result of a conversion is wider than the field width, the
field is expanded to contain the conversion result. field is expanded to contain the conversion result.
.SS "The precision" .SS "The precision"
An optional precision, in the form of a period (`\&.') followed by an An optional precision, in the form of a period (\(aq.\(aq) followed by an
optional decimal digit string. optional decimal digit string.
Instead of a decimal digit string one may write `*' or `*m$' Instead of a decimal digit string one may write "*" or "*m$"
(for some decimal integer m) to specify that the precision (for some decimal integer m) to specify that the precision
is given in the next argument, or in the m-th argument, respectively, is given in the next argument, or in the m-th argument, respectively,
which must be of type which must be of type
.IR int . .IR int .
If the precision is given as just `.', or the precision is negative, If the precision is given as just \(aq.\(aq, or the precision is negative,
the precision is taken to be zero. the precision is taken to be zero.
This gives the minimum number of digits to appear for This gives the minimum number of digits to appear for
.BR d , .BR d ,
@@ -419,7 +419,7 @@ and
.B S .B S
conversions. conversions.
.SS "The length modifier" .SS "The length modifier"
Here, `integer conversion' stands for Here, "integer conversion" stands for
.BR d , .BR d ,
.BR i , .BR i ,
.BR o , .BR o ,
@@ -499,7 +499,7 @@ argument.
(C99 allows %LF, but SUSv2 does not.) (C99 allows %LF, but SUSv2 does not.)
.TP .TP
.B q .B q
(`quad'. 4.4BSD and Linux libc5 only. ("quad". 4.4BSD and Linux libc5 only.
Don't use.) Don't use.)
This is a synonym for This is a synonym for
.BR ll . .BR ll .
@@ -631,10 +631,10 @@ If a decimal point appears, at least one digit appears before it.
.B F .B F
and says that character string representations for infinity and NaN and says that character string representations for infinity and NaN
may be made available. may be made available.
The C99 standard specifies `[\-]inf' or `[\-]infinity' The C99 standard specifies "[\-]inf" or "[\-]infinity"
for infinity, and a string starting with `nan' for NaN, in the case of for infinity, and a string starting with "nan" for NaN, in the case of
.B f .B f
conversion, and `[\-]INF' or `[\-]INFINITY' or `NAN*' in the case of conversion, and "[\-]INF" or "[\-]INFINITY" or "NAN*" in the case of
.B F .B F
conversion.) conversion.)
.TP .TP
@@ -713,7 +713,7 @@ modifier is present: The
argument is expected to be a pointer to an array of character type (pointer argument is expected to be a pointer to an array of character type (pointer
to a string). to a string).
Characters from the array are written up to (but not Characters from the array are written up to (but not
including) a terminating null byte ('\\0'); including) a terminating null byte (\(aq\\0\(aq);
if a precision is specified, no more than the number specified if a precision is specified, no more than the number specified
are written. are written.
If a precision is given, no null byte need be present; If a precision is given, no null byte need be present;
@@ -781,10 +781,10 @@ Print output of
No argument is required. No argument is required.
.TP .TP
.B % .B %
A `%' is written. A \(aq%\(aq is written.
No argument is converted. No argument is converted.
The complete conversion The complete conversion
specification is `%%'. specification is \(aq%%\(aq.
.SH "CONFORMING TO" .SH "CONFORMING TO"
The The
.BR fprintf (), .BR fprintf (),
@@ -823,7 +823,7 @@ support for %D disappeared.)
No locale-dependent radix character, No locale-dependent radix character,
no thousands' separator, no NaN or infinity, no %m$ and *m$. no thousands' separator, no NaN or infinity, no %m$ and *m$.
.PP .PP
Linux libc5 knows about the five C standard flags and the ' flag, Linux libc5 knows about the five C standard flags and the \(aq flag,
locale, %m$ and *m$. locale, %m$ and *m$.
It knows about the length modifiers h,l,L,Z,q, but accepts L and q It knows about the length modifiers h,l,L,Z,q, but accepts L and q
both for \fIlong double\fP and for \fIlong long int\fP (this is a bug). both for \fIlong double\fP and for \fIlong long int\fP (this is a bug).
@@ -936,7 +936,7 @@ fprintf(stdout, "pi = %.5f\en", 4 * atan(1.0));
.fi .fi
.in .in
.PP .PP
To print a date and time in the form `Sunday, July 3, 10:02', To print a date and time in the form "Sunday, July 3, 10:02",
where where
.I weekday .I weekday
and and
@@ -974,7 +974,7 @@ With the value:
.fi .fi
.in .in
one might obtain `Sonntag, 3. Juli, 10:02'. one might obtain "Sonntag, 3. Juli, 10:02".
.PP .PP
To allocate a sufficiently large string and print into it To allocate a sufficiently large string and print into it
(code correct for both glibc 2.0 and glibc 2.1): (code correct for both glibc 2.0 and glibc 2.1):
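
Annotation on the positional-argument text in the hunks above: the claim that the sequential and the `$` styles are equivalent, and that the `$` style allows repeated references to one argument, can be checked with a small C sketch. The helper names `same_output` and `repeat_twice` are illustrative only and are not part of the man page; the sketch assumes a libc that implements the Single Unix Specification `$` style (glibc does).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if "%*d" and "%2$*1$d" format
 * (width, num) to the same text, 0 otherwise. */
int same_output(int width, int num)
{
    char seq[64], pos[64];

    /* Sequential style: '*' takes the next argument as the field width. */
    snprintf(seq, sizeof seq, "%*d", width, num);
    /* Positional style: the width is argument 1, the value is argument 2. */
    snprintf(pos, sizeof pos, "%2$*1$d", width, num);
    return strcmp(seq, pos) == 0;
}

/* Hypothetical helper: prints v twice while passing it only once,
 * which is possible only with the positional style. */
int repeat_twice(char *buf, size_t size, int v)
{
    return snprintf(buf, size, "%1$d+%1$d", v);
}
```

Some compilers warn about `$` in format strings when asked for strict ISO C conformance, since C99 does not include this style.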


@@ -159,11 +159,12 @@ This directive matches any amount of white space,
including none, in the input. including none, in the input.
.TP .TP
\(bu \(bu
An ordinary character (i.e., one other than white space or '%'). An ordinary character (i.e., one other than white space or \(aq%\(aq).
This character must exactly match the next character of input. This character must exactly match the next character of input.
.TP .TP
\(bu \(bu
A conversion specification, which commences with a '%' (percent) character. A conversion specification,
which commences with a \(aq%\(aq (percent) character.
A sequence of characters from the input is converted according to A sequence of characters from the input is converted according to
this specification, and the result is placed in the corresponding this specification, and the result is placed in the corresponding
.I pointer .I pointer
@@ -176,12 +177,12 @@ Each
.I conversion specification .I conversion specification
in in
.I format .I format
begins with either the character '%' or the character sequence begins with either the character \(aq%\(aq or the character sequence
"\fB%\fP\fIn\fP\fB$\fP" "\fB%\fP\fIn\fP\fB$\fP"
(see below for the distinction) followed by: (see below for the distinction) followed by:
.TP .TP
\(bu \(bu
An optional '*' assignment-suppression character: An optional \(aq*\(aq assignment-suppression character:
.BR scanf () .BR scanf ()
reads input as directed by the conversion specification, reads input as directed by the conversion specification,
but discards the input. but discards the input.
@@ -192,7 +193,7 @@ included in the count of successful assignments returned by
.BR scanf (). .BR scanf ().
.TP .TP
\(bu \(bu
An optional 'a' character. An optional \(aqa\(aq character.
This is used with string conversions, and relieves the caller of the This is used with string conversions, and relieves the caller of the
need to allocate a corresponding buffer to hold the input: instead, need to allocate a corresponding buffer to hold the input: instead,
.BR scanf () .BR scanf ()
@@ -206,7 +207,7 @@ The caller should subsequently
.BR free (3) .BR free (3)
this buffer when it is no longer required. this buffer when it is no longer required.
This is a GNU extension; This is a GNU extension;
C99 employs the 'a' character as a conversion specifier (and C99 employs the \(aqa\(aq character as a conversion specifier (and
it can also be used as such in the GNU implementation). it can also be used as such in the GNU implementation).
.TP .TP
\(bu \(bu
@@ -217,7 +218,7 @@ when a non-matching character is found, whichever happens first.
Most conversions discard initial whitespace characters (the exceptions Most conversions discard initial whitespace characters (the exceptions
are noted below), are noted below),
and these discarded characters don't count towards the maximum field width. and these discarded characters don't count towards the maximum field width.
String input conversions store a null terminator ('\\0') String input conversions store a null terminator (\(aq\\0\(aq)
to mark the end of the input; to mark the end of the input;
the maximum field width does not include this terminator. the maximum field width does not include this terminator.
.TP .TP
@@ -242,7 +243,7 @@ that specifies the type of input conversion to be performed.
.PP .PP
The conversion specifications in The conversion specifications in
.I format .I format
are of two forms, either beginning with '%' or beginning with are of two forms, either beginning with \(aq%\(aq or beginning with
"\fB%\fP\fIn\fP\fB$\fP". "\fB%\fP\fIn\fP\fB$\fP".
The two forms should not be mixed in the same The two forms should not be mixed in the same
.I format .I format
@@ -254,7 +255,7 @@ and
.BR %* . .BR %* .
If If
.I format .I format
contains '%' contains \(aq%\(aq
specifications then these correspond in order with successive specifications then these correspond in order with successive
.I pointer .I pointer
arguments. arguments.
@@ -371,11 +372,11 @@ The following
are available: are available:
.TP .TP
.B % .B %
Matches a literal '%'. Matches a literal \(aq%\(aq.
That is, That is,
.B %\&% .B %\&%
in the format string matches a in the format string matches a
single input '%' character. single input \(aq%\(aq character.
No conversion is done, and assignment does not No conversion is done, and assignment does not
occur. occur.
.TP .TP
@@ -448,7 +449,7 @@ Equivalent to
Matches a sequence of non-white-space characters; Matches a sequence of non-white-space characters;
the next pointer must be a pointer to character array that is the next pointer must be a pointer to character array that is
long enough to hold the input sequence and the terminating null long enough to hold the input sequence and the terminating null
character ('\\0'), which is added automatically. character (\(aq\\0\(aq), which is added automatically.
The input string stops at white space or at the maximum field The input string stops at white space or at the maximum field
width, whichever occurs first. width, whichever occurs first.
.TP .TP
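
Annotation on the scanf text in the hunks above: three of the points it makes (a literal '%' is matched by `%%`, a suppressed conversion `%*d` is not counted in the return value, and a maximum field width excludes the terminating '\0') can be seen together in one call. This is a minimal sketch; the function name `parse_progress` and the input layout are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: parses input such as "75% 3 completed-task".
 * Returns the number of successful assignments reported by sscanf(). */
int parse_progress(const char *input, int *value, char word[10])
{
    /* %d   stores an integer in *value
     * %%   matches a literal '%'; nothing is assigned
     * %*d  reads an integer but discards it (assignment suppressed)
     * %9s  stores at most 9 characters, then appends '\0',
     *      so the buffer needs room for 10 bytes */
    return sscanf(input, "%d%% %*d %9s", value, word);
}
```

The width in `%9s` is one less than the buffer size because, as the page states, the maximum field width does not include the terminator.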


@@ -38,64 +38,68 @@ that will perform this function for a user program.
The rules are as follows (POSIX.2, 3.13). The rules are as follows (POSIX.2, 3.13).
.SS "Wildcard Matching" .SS "Wildcard Matching"
A string is a wildcard pattern if it contains one of the A string is a wildcard pattern if it contains one of the
characters `?', `*' or `['. characters \(aq?\(aq, \(aq*\(aq or \(aq[\(aq.
Globbing is the operation Globbing is the operation
that expands a wildcard pattern into the list of pathnames that expands a wildcard pattern into the list of pathnames
matching the pattern. matching the pattern.
Matching is defined by: Matching is defined by:
A `?' (not between brackets) matches any single character. A \(aq?\(aq (not between brackets) matches any single character.
A `*' (not between brackets) matches any string, A \(aq*\(aq (not between brackets) matches any string,
including the empty string. including the empty string.
.PP .PP
.B "Character classes" .B "Character classes"
.sp .sp
An expression `[...]' where the first character after the An expression "\fI[...]\fP" where the first character after the
leading `[' is not an `!' matches a single character, leading \(aq[\(aq is not an \(aq!\(aq matches a single character,
namely any of the characters enclosed by the brackets. namely any of the characters enclosed by the brackets.
The string enclosed by the brackets cannot be empty; The string enclosed by the brackets cannot be empty;
therefore `]' can be allowed between the brackets, provided therefore \(aq]\(aq can be allowed between the brackets, provided
that it is the first character. that it is the first character.
(Thus, `[][!]' matches the three characters `[', `]' and `!'.) (Thus, "\fI[][!]\fP" matches the
three characters \(aq[\(aq, \(aq]\(aq and \(aq!\(aq.)
.PP .PP
.B Ranges .B Ranges
.sp .sp
There is one special convention: There is one special convention:
two characters separated by `\-' denote a range. two characters separated by \(aq\-\(aq denote a range.
(Thus, `[A\-Fa\-f0\-9]' is equivalent to `[ABCDEFabcdef0123456789]'.) (Thus, "\fI[A\-Fa\-f0\-9]\fP" is equivalent to "\fI[ABCDEFabcdef0123456789]\fP".)
One may include `\-' in its literal meaning by making it the One may include \(aq\-\(aq in its literal meaning by making it the
first or last character between the brackets. first or last character between the brackets.
(Thus, `[]\-]' matches just the two characters `]' and `\-', (Thus, "\fI[]\-]\fP" matches just the two characters \(aq]\(aq and \(aq\-\(aq,
and `[\-\-0]' matches the three characters `\-', `.', `0', since `/' and "\fI[\-\-0]\fP" matches the
three characters \(aq\-\(aq, \(aq.\(aq, \(aq0\(aq, since \(aq/\(aq
cannot be matched.) cannot be matched.)
.PP .PP
.B Complementation .B Complementation
.sp .sp
An expression `[!...]' matches a single character, namely An expression "\fI[!...]\fP" matches a single character, namely
any character that is not matched by the expression obtained any character that is not matched by the expression obtained
by removing the first `!' from it. by removing the first \(aq!\(aq from it.
(Thus, `[!]a\-]' matches any single character except `]', `a' and `\-'.) (Thus, "\fI[!]a\-]\fP" matches any
single character except \(aq]\(aq, \(aqa\(aq and \(aq\-\(aq.)
One can remove the special meaning of `?', `*' and `[' by One can remove the special meaning of \(aq?\(aq, \(aq*\(aq and \(aq[\(aq by
preceding them by a backslash, or, in case this is part of preceding them by a backslash, or, in case this is part of
a shell command line, enclosing them in quotes. a shell command line, enclosing them in quotes.
Between brackets these characters stand for themselves. Between brackets these characters stand for themselves.
Thus, `[[?*\e]' matches the four characters `[', `?', `*' and `\e'. Thus, "\fI[[?*\e]\fP" matches the
four characters \(aq[\(aq, \(aq?\(aq, \(aq*\(aq and \(aq\e\(aq.
.SS Pathnames .SS Pathnames
Globbing is applied on each of the components of a pathname Globbing is applied on each of the components of a pathname
separately. separately.
A `/' in a pathname cannot be matched by a `?' or `*' A \(aq/\(aq in a pathname cannot be matched by a \(aq?\(aq or \(aq*\(aq
wildcard, or by a range like `[.\-0]'. wildcard, or by a range like "\fI[.\-0]\fP".
A range cannot contain an A range cannot contain an
explicit `/' character; this would lead to a syntax error. explicit \(aq/\(aq character; this would lead to a syntax error.
If a filename starts with a `.', this character must be matched explicitly. If a filename starts with a \(aq.\(aq, this character must be matched explicitly.
(Thus, `rm *' will not remove .profile, and `tar c *' will not (Thus, \fIrm\ *\fP will not remove .profile, and \fItar\ c\ *\fP will not
archive all your files; `tar c .' is better.) archive all your files; \fItar\ c\ .\fP is better.)
.SS "Empty Lists" .SS "Empty Lists"
The nice and simple rule given above: `expand a wildcard pattern The nice and simple rule given above: "expand a wildcard pattern
into the list of matching pathnames' was the original Unix into the list of matching pathnames" was the original Unix
definition. definition.
It allowed one to have patterns that expand into It allowed one to have patterns that expand into
an empty list, as in an empty list, as in
@@ -133,15 +137,15 @@ Note that wildcard patterns are not regular expressions,
although they are a bit similar. although they are a bit similar.
First of all, they match First of all, they match
filenames, rather than text, and secondly, the conventions filenames, rather than text, and secondly, the conventions
are not the same: for example, in a regular expression `*' means zero or are not the same: for example, in a regular expression \(aq*\(aq means zero or
more copies of the preceding thing. more copies of the preceding thing.
Now that regular expressions have bracket expressions where Now that regular expressions have bracket expressions where
the negation is indicated by a `^', POSIX has declared the the negation is indicated by a \(aq^\(aq, POSIX has declared the
effect of a wildcard pattern `[^...]' to be undefined. effect of a wildcard pattern "\fI[^...]\fP" to be undefined.
.SS Character classes and Internationalization .SS Character classes and Internationalization
Of course ranges were originally meant to be ASCII ranges, Of course ranges were originally meant to be ASCII ranges,
so that `[\ \-%]' stands for `[\ !"#$%]' and `[a\-z]' stands so that "\fI[\ \-%]\fP" stands for "\fI[\ !"#$%]\fP" and "\fI[a\-z]\fP" stands
for "any lowercase letter". for "any lowercase letter".
Some Unix implementations generalized this so that a range X\-Y Some Unix implementations generalized this so that a range X\-Y
stands for the set of characters with code between the codes for stands for the set of characters with code between the codes for
@@ -172,29 +176,29 @@ category in the current locale.
[:punct:] [:space:] [:upper:] [:xdigit:] [:punct:] [:space:] [:upper:] [:xdigit:]
.fi .fi
so that one can say `[[:lower:]]' instead of `[a\-z]', and have so that one can say "\fI[[:lower:]]\fP" instead of "\fI[a\-z]\fP", and have
things work in Denmark, too, where there are three letters past `z' things work in Denmark, too, where there are three letters past \(aqz\(aq
in the alphabet. in the alphabet.
These character classes are defined by the These character classes are defined by the
.B LC_CTYPE .B LC_CTYPE
category category
in the current locale. in the current locale.
(v) Collating symbols, like `[.ch.]' or `[.a-acute.]', (v) Collating symbols, like "\fI[.ch.]\fP" or "\fI[.a-acute.]\fP",
where the string between `[.' and `.]' is a collating where the string between "\fI[.\fP" and "\fI.]\fP" is a collating
element defined for the current locale. element defined for the current locale.
Note that this may Note that this may
be a multi-character element. be a multi-character element.
(vi) Equivalence class expressions, like `[=a=]', (vi) Equivalence class expressions, like "\fI[=a=]\fP",
where the string between `[=' and `=]' is any collating where the string between "\fI[=\fP" and "\fI=]\fP" is any collating
element from its equivalence class, as defined for the element from its equivalence class, as defined for the
current locale. current locale.
For example, `[[=a=]]' might be equivalent For example, "\fI[[=a=]]\fP" might be equivalent
.\" FIXME . the accented 'a' characters are not rendering properly .\" FIXME . the accented 'a' characters are not rendering properly
.\" mtk May 2007 .\" mtk May 2007
to `[aáàäâ]' (warning: Latin-1 here), that is, to "\fI[aáàäâ]\fP" (warning: Latin-1 here), that is,
to `[a[.a-acute.][.a-grave.][.a-umlaut.][.a-circumflex.]]'. to "\fI[a[.a-acute.][.a-grave.][.a-umlaut.][.a-circumflex.]]\fP".
.SH "SEE ALSO" .SH "SEE ALSO"
.BR sh (1), .BR sh (1),
.BR fnmatch (3), .BR fnmatch (3),
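
Annotation on the wildcard rules in the hunks above: several of them can be tried out with fnmatch(3), which the page lists under SEE ALSO. The wrapper name `glob_matches` is hypothetical. The sketch assumes that `FNM_PATHNAME` and `FNM_PERIOD` together approximate the pathname rules described in the text: '/' is never matched by a wildcard, and a leading '.' must be matched explicitly.

```c
#include <fnmatch.h>

/* Hypothetical wrapper: returns 1 if name matches the wildcard
 * pattern under pathname rules, 0 otherwise. */
int glob_matches(const char *pattern, const char *name)
{
    /* FNM_PATHNAME: '/' is matched only by a literal '/' in the pattern.
     * FNM_PERIOD:   a leading '.' in name needs a literal '.' in the pattern. */
    return fnmatch(pattern, name, FNM_PATHNAME | FNM_PERIOD) == 0;
}
```

With these flags, `glob_matches("*", ".profile")` is expected to return 0, which corresponds to the remark that `rm *` will not remove .profile.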


@@ -47,26 +47,26 @@ POSIX.2 "basic" REs).
Obsolete REs mostly exist for backward compatibility in some old programs; Obsolete REs mostly exist for backward compatibility in some old programs;
they will be discussed at the end. they will be discussed at the end.
POSIX.2 leaves some aspects of RE syntax and semantics open; POSIX.2 leaves some aspects of RE syntax and semantics open;
`\*(dg' marks decisions on these aspects that "\*(dg" marks decisions on these aspects that
may not be fully portable to other POSIX.2 implementations. may not be fully portable to other POSIX.2 implementations.
.PP .PP
A (modern) RE is one\*(dg or more non-empty\*(dg \fIbranches\fR, A (modern) RE is one\*(dg or more non-empty\*(dg \fIbranches\fR,
separated by `|'. separated by \(aq|\(aq.
It matches anything that matches one of the branches. It matches anything that matches one of the branches.
.PP .PP
A branch is one\*(dg or more \fIpieces\fR, concatenated. A branch is one\*(dg or more \fIpieces\fR, concatenated.
It matches a match for the first, followed by a match for the second, etc. It matches a match for the first, followed by a match for the second, etc.
.PP .PP
A piece is an \fIatom\fR possibly followed A piece is an \fIatom\fR possibly followed
by a single\*(dg `*', `+', `?', or \fIbound\fR. by a single\*(dg \(aq*\(aq, \(aq+\(aq, \(aq?\(aq, or \fIbound\fR.
An atom followed by `*' matches a sequence of 0 or more matches of the atom. An atom followed by \(aq*\(aq matches a sequence of 0 or more matches of the atom.
An atom followed by `+' matches a sequence of 1 or more matches of the atom. An atom followed by \(aq+\(aq matches a sequence of 1 or more matches of the atom.
An atom followed by `?' matches a sequence of 0 or 1 matches of the atom. An atom followed by \(aq?\(aq matches a sequence of 0 or 1 matches of the atom.
.PP .PP
A \fIbound\fR is `{' followed by an unsigned decimal integer, A \fIbound\fR is \(aq{\(aq followed by an unsigned decimal integer,
possibly followed by `,' possibly followed by \(aq,\(aq
possibly followed by another unsigned decimal integer, possibly followed by another unsigned decimal integer,
always followed by `}'. always followed by \(aq}\(aq.
The integers must lie between 0 and The integers must lie between 0 and
.B RE_DUP_MAX .B RE_DUP_MAX
(255\*(dg) inclusive, (255\*(dg) inclusive,
@@ -81,71 +81,71 @@ An atom followed by a bound
containing two integers \fIi\fR and \fIj\fR matches containing two integers \fIi\fR and \fIj\fR matches
a sequence of \fIi\fR through \fIj\fR (inclusive) matches of the atom. a sequence of \fIi\fR through \fIj\fR (inclusive) matches of the atom.
.PP .PP
An atom is a regular expression enclosed in `()' (matching a match for the An atom is a regular expression enclosed in "\fI()\fP" (matching a match for the
regular expression), regular expression),
an empty set of `()' (matching the null string)\*(dg, an empty set of "\fI()\fP" (matching the null string)\*(dg,
a \fIbracket expression\fR (see below), `.' a \fIbracket expression\fR (see below), \(aq.\(aq
(matching any single character), `^' (matching the null string at the (matching any single character), \(aq^\(aq (matching the null string at the
beginning of a line), `$' (matching the null string at the beginning of a line), \(aq$\(aq (matching the null string at the
end of a line), a `\e' followed by one of the characters end of a line), a \(aq\e\(aq followed by one of the characters
`^.[$()|*+?{\e' "\fI^.[$()|*+?{\e\fP"
(matching that character taken as an ordinary character), (matching that character taken as an ordinary character),
a `\e' followed by any other character\*(dg a \(aq\e\(aq followed by any other character\*(dg
(matching that character taken as an ordinary character, (matching that character taken as an ordinary character,
as if the `\e' had not been present\*(dg), as if the \(aq\e\(aq had not been present\*(dg),
or a single character with no other significance (matching that character). or a single character with no other significance (matching that character).
A `{' followed by a character other than a digit is an ordinary A \(aq{\(aq followed by a character other than a digit is an ordinary
character, not the beginning of a bound\*(dg. character, not the beginning of a bound\*(dg.
It is illegal to end an RE with `\e'. It is illegal to end an RE with \(aq\e\(aq.
.PP .PP
A \fIbracket expression\fR is a list of characters enclosed in `[]'. A \fIbracket expression\fR is a list of characters enclosed in "\fI[]\fP".
It normally matches any single character from the list (but see below). It normally matches any single character from the list (but see below).
If the list begins with `^', If the list begins with \(aq^\(aq,
it matches any single character it matches any single character
(but see below) \fInot\fR from the rest of the list. (but see below) \fInot\fR from the rest of the list.
If two characters in the list are separated by `\-', this is shorthand If two characters in the list are separated by \(aq\-\(aq, this is shorthand
for the full \fIrange\fR of characters between those two (inclusive) in the for the full \fIrange\fR of characters between those two (inclusive) in the
collating sequence, collating sequence,
for example, `[0\-9]' in ASCII matches any decimal digit. for example, "\fI[0\-9]\fP" in ASCII matches any decimal digit.
It is illegal\*(dg for two ranges to share an It is illegal\*(dg for two ranges to share an
endpoint, for example, `a-c-e'. endpoint, for example, "\fIa-c-e\fP".
Ranges are very collating-sequence-dependent, Ranges are very collating-sequence-dependent,
and portable programs should avoid relying on them. and portable programs should avoid relying on them.
.PP .PP
To include a literal `]' in the list, make it the first character To include a literal \(aq]\(aq in the list, make it the first character
(following a possible `^'). (following a possible \(aq^\(aq).
To include a literal `\-', make it the first or last character, To include a literal \(aq\-\(aq, make it the first or last character,
or the second endpoint of a range. or the second endpoint of a range.
To use a literal `\-' as the first endpoint of a range, To use a literal \(aq\-\(aq as the first endpoint of a range,
enclose it in `[.' and `.]' to make it a collating element (see below). enclose it in "\fI[.\fP" and "\fI.]\fP" to make it a collating element (see below).
With the exception of these and some combinations using `[' (see next With the exception of these and some combinations using \(aq[\(aq (see next
paragraphs), all other special characters, including `\e', lose their paragraphs), all other special characters, including \(aq\e\(aq, lose their
special significance within a bracket expression. special significance within a bracket expression.
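The bracket-expression rules in this hunk can be checked against the POSIX regcomp(3) interface. The following is a minimal sketch, not part of the commit: the helper name ere_matches is hypothetical, and it assumes the REs are compiled as extended REs in the C locale.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not from the man page): returns 1 if the extended
 * RE `pattern` matches somewhere in `text`, 0 if it does not, and -1 if
 * the pattern fails to compile. */
static int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);   /* 0 means a match was found */
    regfree(&re);
    return rc == 0;
}
```

Under these assumptions, ere_matches("^[]a]$", "]") should report a match because the leading ']' is taken literally, while ere_matches("^[^]a]$", "]") should not. A trailing '-' as in "[a-]" is likewise literal.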
.PP .PP
Within a bracket expression, a collating element (a character, Within a bracket expression, a collating element (a character,
a multi-character sequence that collates as if it were a single character, a multi-character sequence that collates as if it were a single character,
or a collating-sequence name for either) or a collating-sequence name for either)
enclosed in `[.' and `.]' stands for the enclosed in "\fI[.\fP" and "\fI.]\fP" stands for the
sequence of characters of that collating element. sequence of characters of that collating element.
The sequence is a single element of the bracket expression's list. The sequence is a single element of the bracket expression's list.
A bracket expression containing a multi-character collating element A bracket expression containing a multi-character collating element
can thus match more than one character, can thus match more than one character,
for example, if the collating sequence includes a `ch' collating element, for example, if the collating sequence includes a "ch" collating element,
then the RE `[[.ch.]]*c' matches the first five characters then the RE "\fI[[.ch.]]*c\fP" matches the first five characters
of `chchcc'. of "chchcc".
.PP .PP
Within a bracket expression, a collating element enclosed in `[=' and Within a bracket expression, a collating element enclosed in "\fI[=\fP" and
`=]' is an equivalence class, standing for the sequences of characters "\fI=]\fP" is an equivalence class, standing for the sequences of characters
of all collating elements equivalent to that one, including itself. of all collating elements equivalent to that one, including itself.
(If there are no other equivalent collating elements, (If there are no other equivalent collating elements,
the treatment is as if the enclosing delimiters were `[.' and `.]'.) the treatment is as if the enclosing delimiters were "\fI[.\fP" and "\fI.]\fP".)
For example, if o and \o'o^' are the members of an equivalence class, For example, if o and \o'o^' are the members of an equivalence class,
then `[[=o=]]', `[[=\o'o^'=]]', and `[o\o'o^']' are all synonymous. then "\fI[[=o=]]\fP", "\fI[[=\o'o^'=]]\fP", and "\fI[o\o'o^']\fP" are all synonymous.
An equivalence class may not\*(dg be an endpoint An equivalence class may not\*(dg be an endpoint
of a range. of a range.
.PP .PP
Within a bracket expression, the name of a \fIcharacter class\fR enclosed Within a bracket expression, the name of a \fIcharacter class\fR enclosed
in `[:' and `:]' stands for the list of all characters belonging to that in "\fI[:\fP" and "\fI:]\fP" stands for the list of all characters belonging to that
class. class.
Standard character class names are: Standard character class names are:
.PP .PP
@ -167,7 +167,7 @@ A character class may not be used as an endpoint of a range.
.\" The following does not seem to apply in the glibc implementation .\" The following does not seem to apply in the glibc implementation
.\" .PP .\" .PP
.\" There are two special cases\*(dg of bracket expressions: .\" There are two special cases\*(dg of bracket expressions:
.\" the bracket expressions `[[:<:]]' and `[[:>:]]' match the null string at .\" the bracket expressions "\fI[[:<:]]\fP" and "\fI[[:>:]]\fP" match the null string at
.\" the beginning and end of a word respectively. .\" the beginning and end of a word respectively.
.\" A word is defined as a sequence of .\" A word is defined as a sequence of
.\" word characters .\" word characters
@ -198,11 +198,11 @@ their lower-level component subexpressions.
Match lengths are measured in characters, not collating elements. Match lengths are measured in characters, not collating elements.
A null string is considered longer than no match at all. A null string is considered longer than no match at all.
For example, For example,
`bb*' matches the three middle characters of `abbbc', "\fIbb*\fP" matches the three middle characters of "abbbc",
`(wee|week)(knights|nights)' matches all ten characters of `weeknights', "\fI(wee|week)(knights|nights)\fP" matches all ten characters of "weeknights",
when `(.*).*' is matched against `abc' the parenthesized subexpression when "\fI(.*).*\fP" is matched against "abc" the parenthesized subexpression
matches all three characters, and matches all three characters, and
when `(a*)*' is matched against `bc' both the whole RE and the parenthesized when "\fI(a*)*\fP" is matched against "bc" both the whole RE and the parenthesized
subexpression match the null string. subexpression match the null string.
.PP .PP
If case-independent matching is specified, If case-independent matching is specified,
@ -211,10 +211,10 @@ alphabet.
When an alphabetic that exists in multiple cases appears as an When an alphabetic that exists in multiple cases appears as an
ordinary character outside a bracket expression, it is effectively ordinary character outside a bracket expression, it is effectively
transformed into a bracket expression containing both cases, transformed into a bracket expression containing both cases,
for example, `x' becomes `[xX]'. for example, \(aqx\(aq becomes "\fI[xX]\fP".
When it appears inside a bracket expression, all case counterparts When it appears inside a bracket expression, all case counterparts
of it are added to the bracket expression, so that, for example, `[x]' of it are added to the bracket expression, so that, for example, "\fI[x]\fP"
becomes `[xX]' and `[^x]' becomes `[^xX]'. becomes "\fI[xX]\fP" and "\fI[^x]\fP" becomes "\fI[^xX]\fP".
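The case-folding behavior described here corresponds to the REG_ICASE flag of regcomp(3). A minimal sketch follows, assuming extended REs and the C locale; the helper name icase_matches is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if extended RE `pattern`, compiled with
 * case-independent matching, matches somewhere in `text`; 0 if it does
 * not; -1 if the pattern fails to compile. */
static int icase_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With this helper, "^[^x]$" should reject both "x" and "X", which is the effect of treating the expression as [^xX].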
.PP .PP
No particular limit is imposed on the length of REs\*(dg. No particular limit is imposed on the length of REs\*(dg.
Programs intended to be portable should not employ REs longer Programs intended to be portable should not employ REs longer
@ -223,32 +223,32 @@ as an implementation can refuse to accept such REs and remain
POSIX-compliant. POSIX-compliant.
.PP .PP
Obsolete ("basic") regular expressions differ in several respects. Obsolete ("basic") regular expressions differ in several respects.
`|', `+', and `?' are ordinary characters and there is no equivalent \(aq|\(aq, \(aq+\(aq, and \(aq?\(aq are ordinary characters and there is no equivalent
for their functionality. for their functionality.
The delimiters for bounds are `\e{' and `\e}', The delimiters for bounds are "\fI\e{\fP" and "\fI\e}\fP",
with `{' and `}' by themselves ordinary characters. with \(aq{\(aq and \(aq}\(aq by themselves ordinary characters.
The parentheses for nested subexpressions are `\e(' and `\e)', The parentheses for nested subexpressions are "\fI\e(\fP" and "\fI\e)\fP",
with `(' and `)' by themselves ordinary characters. with \(aq(\(aq and \(aq)\(aq by themselves ordinary characters.
`^' is an ordinary character except at the beginning of the \(aq^\(aq is an ordinary character except at the beginning of the
RE or\*(dg the beginning of a parenthesized subexpression, RE or\*(dg the beginning of a parenthesized subexpression,
`$' is an ordinary character except at the end of the \(aq$\(aq is an ordinary character except at the end of the
RE or\*(dg the end of a parenthesized subexpression, RE or\*(dg the end of a parenthesized subexpression,
and `*' is an ordinary character if it appears at the beginning of the and \(aq*\(aq is an ordinary character if it appears at the beginning of the
RE or the beginning of a parenthesized subexpression RE or the beginning of a parenthesized subexpression
(after a possible leading `^'). (after a possible leading \(aq^\(aq).
.PP .PP
Finally, there is one new type of atom, a \fIback reference\fR: Finally, there is one new type of atom, a \fIback reference\fR:
`\e' followed by a non-zero decimal digit \fId\fR \(aq\e\(aq followed by a non-zero decimal digit \fId\fR
matches the same sequence of characters matches the same sequence of characters
matched by the \fId\fRth parenthesized subexpression matched by the \fId\fRth parenthesized subexpression
(numbering subexpressions by the positions of their opening parentheses, (numbering subexpressions by the positions of their opening parentheses,
left to right), left to right),
so that, for example, `\e([bc]\e)\e1' matches `bb' or `cc' but not `bc'. so that, for example, "\fI\e([bc]\e)\e1\fP" matches "bb" or "cc" but not "bc".
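The differences for basic REs, including the back-reference example, can be tried by compiling without REG_EXTENDED. This is a sketch only; the helper name bre_matches is hypothetical, and each backslash is doubled because the patterns are written as C string literals.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if basic RE `pattern` matches somewhere
 * in `text`, 0 if it does not, and -1 if the pattern fails to compile.
 * Passing no REG_EXTENDED flag selects basic ("obsolete") syntax. */
static int bre_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, 0) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Here bre_matches("\\([bc]\\)\\1", "bc") should report no match, and "^a+$" should match the literal string "a+" but not "aa", since '+' is an ordinary character in a basic RE.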
.SH BUGS .SH BUGS
Having two kinds of REs is a botch. Having two kinds of REs is a botch.
.PP .PP
The current POSIX.2 spec says that `)' is an ordinary character in The current POSIX.2 spec says that \(aq)\(aq is an ordinary character in
the absence of an unmatched `('; the absence of an unmatched \(aq(\(aq;
this was an unintentional result of a wording error, this was an unintentional result of a wording error,
and change is likely. and change is likely.
Avoid relying on it. Avoid relying on it.
@ -257,7 +257,7 @@ Back references are a dreadful botch,
posing major problems for efficient implementations. posing major problems for efficient implementations.
They are also somewhat vaguely defined They are also somewhat vaguely defined
(does (does
`a\e(\e(b\e)*\e2\e)*d' match `abbbd'?). "\fIa\e(\e(b\e)*\e2\e)*d\fP" match "abbbd"?).
Avoid using them. Avoid using them.
.PP .PP
POSIX.2's specification of case-independent matching is vague. POSIX.2's specification of case-independent matching is vague.