Try and bring some consistency to quotes.

Michael Kerrisk 2008-06-09 21:03:52 +00:00
parent ce17c5d391
commit 333a424b0e
4 changed files with 153 additions and 148 deletions


@ -178,11 +178,11 @@ and an optional
The arguments must correspond properly (after type promotion) with the
conversion specifier.
By default, the arguments are used in the order
given, where each `*' and each conversion specifier asks for the next
given, where each \(aq*\(aq and each conversion specifier asks for the next
argument (and it is an error if insufficiently many arguments are given).
One can also specify explicitly which argument is taken,
at each place where an argument is required, by writing `%m$' instead
of `%' and `*m$' instead of `*', where the decimal integer m denotes
at each place where an argument is required, by writing "%m$" instead
of \(aq%\(aq and "*m$" instead of \(aq*\(aq, where the decimal integer m denotes
the position in the argument list of the desired argument, indexed starting
from 1.
Thus,
@ -204,35 +204,35 @@ printf("%2$*1$d", width, num);
are equivalent.
The second style allows repeated references to the
same argument.
The C99 standard does not include the style using `$',
The C99 standard does not include the style using \(aq$\(aq,
which comes from the Single Unix Specification.
If the style using
`$' is used, it must be used throughout for all conversions taking an
\(aq$\(aq is used, it must be used throughout for all conversions taking an
argument and all width and precision arguments, but it may be mixed
with `%%' formats which do not consume an argument.
with "%%" formats which do not consume an argument.
There may be no
gaps in the numbers of arguments specified using `$'; for example, if
gaps in the numbers of arguments specified using \(aq$\(aq; for example, if
arguments 1 and 3 are specified, argument 2 must also be specified
somewhere in the format string.
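
As a rough illustration of the two argument-selection styles described above, the following sketch formats the same value both ways. The helper names format_sequential and format_positional are hypothetical, and the positional form assumes a C library that implements the Single Unix Specification "$" extension (glibc does).

```c
#include <stdio.h>
#include <string.h>

/* Sequential style: '*' consumes the next argument (the width),
 * then 'd' consumes the one after it (the number). */
int format_sequential(char *buf, size_t size, int width, int num)
{
    return snprintf(buf, size, "%*d", width, num);
}

/* Positional style: "%2$" selects argument 2 (the number) and
 * "*1$" selects argument 1 (the width). This is an SUS extension,
 * not part of C99. */
int format_positional(char *buf, size_t size, int width, int num)
{
    return snprintf(buf, size, "%2$*1$d", width, num);
}

/* The positional style may refer to the same argument repeatedly. */
int format_twice(char *buf, size_t size, int num)
{
    return snprintf(buf, size, "%1$d %1$d", num);
}
```

With a width of 5 and a value of 42, both of the first two helpers should produce the same right-justified field; compilers run in strict ISO C mode may warn about the "$" forms.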
For some numeric conversions a radix character (`decimal point') or
For some numeric conversions a radix character ("decimal point") or
thousands' grouping character is used.
The actual character used
depends on the
.B LC_NUMERIC
part of the locale.
The POSIX locale
uses `.' as radix character, and does not have a grouping character.
uses \(aq.\(aq as radix character, and does not have a grouping character.
Thus,
.in +4n
.nf
printf("%'.2f", 1234567.89);
printf("%\(aq.2f", 1234567.89);
.fi
.in
results in `1234567.89' in the POSIX locale, in `1234567,89' in the
nl_NL locale, and in `1.234.567,89' in the da_DK locale.
results in "1234567.89" in the POSIX locale, in "1234567,89" in the
nl_NL locale, and in "1.234.567,89" in the da_DK locale.
.SS "The flag characters"
The character % is followed by zero or more of the following flags:
.TP
@ -246,7 +246,7 @@ For
.B x
and
.B X
conversions, a non-zero result has the string `0x' (or `0X' for
conversions, a non-zero result has the string "0x" (or "0X" for
.B X
conversions) prepended to it.
For
@ -323,7 +323,7 @@ overrides a
.B \&0
if both are given.
.TP
.B ' '
.B \(aq \(aq
(a space) A blank should be left before a positive number
(or empty string) produced by a signed conversion.
.TP
@ -338,7 +338,7 @@ overrides a space if both are used.
The five flag characters above are defined in the C standard.
The SUSv2 specifies one further flag character.
.TP
.B '
.B \(aq
For decimal conversion
.RB ( i ,
.BR d ,
@ -353,7 +353,7 @@ Note that many versions of
.BR gcc (1)
cannot parse this option and will issue a warning.
SUSv2 does not
include %'F.
include \fI%\(aqF\fP.
.PP
glibc 2.2 adds one further flag character.
.TP
@ -364,7 +364,7 @@ For decimal integer conversion
.BR u )
the output uses the locale's alternative output digits, if any.
For example, since glibc 2.2.3 this will give Arabic-Indic digits
in the Persian (`fa_IR') locale.
in the Persian ("fa_IR") locale.
.\" outdigits keyword in locale file
.SS "The field width"
An optional decimal digit string (with non-zero first digit) specifying
@ -372,25 +372,25 @@ a minimum field width.
If the converted value has fewer characters
than the field width, it will be padded with spaces on the left
(or right, if the left-adjustment flag has been given).
Instead of a decimal digit string one may write `*' or `*m$'
(for some decimal integer m) to specify that the field width
is given in the next argument, or in the m-th argument, respectively,
Instead of a decimal digit string one may write "*" or "*m$"
(for some decimal integer \fIm\fP) to specify that the field width
is given in the next argument, or in the \fIm\fP-th argument, respectively,
which must be of type
.IR int .
A negative field width is taken as a `\-' flag followed by a
A negative field width is taken as a \(aq\-\(aq flag followed by a
positive field width.
In no case does a nonexistent or small field width cause truncation of a
field; if the result of a conversion is wider than the field width, the
field is expanded to contain the conversion result.
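
The field-width rules above can be checked with a small sketch like the one below. The helper name format_width is hypothetical, and the square brackets are only there to make the padding visible.

```c
#include <stdio.h>
#include <string.h>

/* Format value in a field whose width comes from an int argument.
 * A negative width is treated as the '-' flag (left adjustment)
 * followed by the corresponding positive width. */
int format_width(char *buf, size_t size, int width, int value)
{
    return snprintf(buf, size, "[%*d]", width, value);
}
```

Called with a width of 5 the value is padded on the left, with a width of -5 it is padded on the right, and with a width smaller than the converted value the field simply grows, so no digits are lost.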
.SS "The precision"
An optional precision, in the form of a period (`\&.') followed by an
An optional precision, in the form of a period (\(aq.\(aq) followed by an
optional decimal digit string.
Instead of a decimal digit string one may write `*' or `*m$'
Instead of a decimal digit string one may write "*" or "*m$"
(for some decimal integer m) to specify that the precision
is given in the next argument, or in the m-th argument, respectively,
which must be of type
.IR int .
If the precision is given as just `.', or the precision is negative,
If the precision is given as just \(aq.\(aq, or the precision is negative,
the precision is taken to be zero.
This gives the minimum number of digits to appear for
.BR d ,
@ -419,7 +419,7 @@ and
.B S
conversions.
.SS "The length modifier"
Here, `integer conversion' stands for
Here, "integer conversion" stands for
.BR d ,
.BR i ,
.BR o ,
@ -499,7 +499,7 @@ argument.
(C99 allows %LF, but SUSv2 does not.)
.TP
.B q
(`quad'. 4.4BSD and Linux libc5 only.
("quad". 4.4BSD and Linux libc5 only.
Don't use.)
This is a synonym for
.BR ll .
@ -631,10 +631,10 @@ If a decimal point appears, at least one digit appears before it.
.B F
and says that character string representations for infinity and NaN
may be made available.
The C99 standard specifies `[\-]inf' or `[\-]infinity'
for infinity, and a string starting with `nan' for NaN, in the case of
The C99 standard specifies "[\-]inf" or "[\-]infinity"
for infinity, and a string starting with "nan" for NaN, in the case of
.B f
conversion, and `[\-]INF' or `[\-]INFINITY' or `NAN*' in the case of
conversion, and "[\-]INF" or "[\-]INFINITY" or "NAN*" in the case of
.B F
conversion.)
.TP
@ -713,7 +713,7 @@ modifier is present: The
argument is expected to be a pointer to an array of character type (pointer
to a string).
Characters from the array are written up to (but not
including) a terminating null byte ('\\0');
including) a terminating null byte (\(aq\\0\(aq);
if a precision is specified, no more than the number specified
are written.
If a precision is given, no null byte need be present;
@ -781,10 +781,10 @@ Print output of
No argument is required.
.TP
.B %
A `%' is written.
A \(aq%\(aq is written.
No argument is converted.
The complete conversion
specification is `%%'.
specification is \(aq%%\(aq.
.SH "CONFORMING TO"
The
.BR fprintf (),
@ -823,7 +823,7 @@ support for %D disappeared.)
No locale-dependent radix character,
no thousands' separator, no NaN or infinity, no %m$ and *m$.
.PP
Linux libc5 knows about the five C standard flags and the ' flag,
Linux libc5 knows about the five C standard flags and the \(aq flag,
locale, %m$ and *m$.
It knows about the length modifiers h,l,L,Z,q, but accepts L and q
both for \fIlong double\fP and for \fIlong long int\fP (this is a bug).
@ -936,7 +936,7 @@ fprintf(stdout, "pi = %.5f\en", 4 * atan(1.0));
.fi
.in
.PP
To print a date and time in the form `Sunday, July 3, 10:02',
To print a date and time in the form "Sunday, July 3, 10:02",
where
.I weekday
and
@ -974,7 +974,7 @@ With the value:
.fi
.in
one might obtain `Sonntag, 3. Juli, 10:02'.
one might obtain "Sonntag, 3. Juli, 10:02".
.PP
To allocate a sufficiently large string and print into it
(code correct for both glibc 2.0 and glibc 2.1):


@ -159,11 +159,12 @@ This directive matches any amount of white space,
including none, in the input.
.TP
\(bu
An ordinary character (i.e., one other than white space or '%').
An ordinary character (i.e., one other than white space or \(aq%\(aq).
This character must exactly match the next character of input.
.TP
\(bu
A conversion specification, which commences with a '%' (percent) character.
A conversion specification,
which commences with a \(aq%\(aq (percent) character.
A sequence of characters from the input is converted according to
this specification, and the result is placed in the corresponding
.I pointer
@ -176,12 +177,12 @@ Each
.I conversion specification
in
.I format
begins with either the character '%' or the character sequence
begins with either the character \(aq%\(aq or the character sequence
"\fB%\fP\fIn\fP\fB$\fP"
(see below for the distinction) followed by:
.TP
\(bu
An optional '*' assignment-suppression character:
An optional \(aq*\(aq assignment-suppression character:
.BR scanf ()
reads input as directed by the conversion specification,
but discards the input.
@ -192,7 +193,7 @@ included in the count of successful assignments returned by
.BR scanf ().
.TP
\(bu
An optional 'a' character.
An optional \(aqa\(aq character.
This is used with string conversions, and relieves the caller of the
need to allocate a corresponding buffer to hold the input: instead,
.BR scanf ()
@ -206,7 +207,7 @@ The caller should subsequently
.BR free (3)
this buffer when it is no longer required.
This is a GNU extension;
C99 employs the 'a' character as a conversion specifier (and
C99 employs the \(aqa\(aq character as a conversion specifier (and
it can also be used as such in the GNU implementation).
.TP
\(bu
@ -217,7 +218,7 @@ when a non-matching character is found, whichever happens first.
Most conversions discard initial whitespace characters (the exceptions
are noted below),
and these discarded characters don't count towards the maximum field width.
String input conversions store a null terminator ('\\0')
String input conversions store a null terminator (\(aq\\0\(aq)
to mark the end of the input;
the maximum field width does not include this terminator.
.TP
@ -242,7 +243,7 @@ that specifies the type of input conversion to be performed.
.PP
The conversion specifications in
.I format
are of two forms, either beginning with '%' or beginning with
are of two forms, either beginning with \(aq%\(aq or beginning with
"\fB%\fP\fIn\fP\fB$\fP".
The two forms should not be mixed in the same
.I format
@ -254,7 +255,7 @@ and
.BR %* .
If
.I format
contains '%'
contains \(aq%\(aq
specifications then these correspond in order with successive
.I pointer
arguments.
@ -371,11 +372,11 @@ The following
are available:
.TP
.B %
Matches a literal '%'.
Matches a literal \(aq%\(aq.
That is,
.B %\&%
in the format string matches a
single input '%' character.
single input \(aq%\(aq character.
No conversion is done, and assignment does not
occur.
.TP
@ -448,7 +449,7 @@ Equivalent to
Matches a sequence of non-white-space characters;
the next pointer must be a pointer to character array that is
long enough to hold the input sequence and the terminating null
character ('\\0'), which is added automatically.
character (\(aq\\0\(aq), which is added automatically.
The input string stops at white space or at the maximum field
width, whichever occurs first.
.TP


@ -38,64 +38,68 @@ that will perform this function for a user program.
The rules are as follows (POSIX.2, 3.13).
.SS "Wildcard Matching"
A string is a wildcard pattern if it contains one of the
characters `?', `*' or `['.
characters \(aq?\(aq, \(aq*\(aq or \(aq[\(aq.
Globbing is the operation
that expands a wildcard pattern into the list of pathnames
matching the pattern.
Matching is defined by:
A `?' (not between brackets) matches any single character.
A \(aq?\(aq (not between brackets) matches any single character.
A `*' (not between brackets) matches any string,
A \(aq*\(aq (not between brackets) matches any string,
including the empty string.
.PP
.B "Character classes"
.sp
An expression `[...]' where the first character after the
leading `[' is not an `!' matches a single character,
An expression "\fI[...]\fP" where the first character after the
leading \(aq[\(aq is not an \(aq!\(aq matches a single character,
namely any of the characters enclosed by the brackets.
The string enclosed by the brackets cannot be empty;
therefore `]' can be allowed between the brackets, provided
therefore \(aq]\(aq can be allowed between the brackets, provided
that it is the first character.
(Thus, `[][!]' matches the three characters `[', `]' and `!'.)
(Thus, "\fI[][!]\fP" matches the
three characters \(aq[\(aq, \(aq]\(aq and \(aq!\(aq.)
.PP
.B Ranges
.sp
There is one special convention:
two characters separated by `\-' denote a range.
(Thus, `[A\-Fa\-f0\-9]' is equivalent to `[ABCDEFabcdef0123456789]'.)
One may include `\-' in its literal meaning by making it the
two characters separated by \(aq\-\(aq denote a range.
(Thus, "\fI[A\-Fa\-f0\-9]\fP" is equivalent to "\fI[ABCDEFabcdef0123456789]\fP".)
One may include \(aq\-\(aq in its literal meaning by making it the
first or last character between the brackets.
(Thus, `[]\-]' matches just the two characters `]' and `\-',
and `[\-\-0]' matches the three characters `\-', `.', `0', since `/'
(Thus, "\fI[]\-]\fP" matches just the two characters \(aq]\(aq and \(aq\-\(aq,
and "\fI[\-\-0]\fP" matches the
three characters \(aq\-\(aq, \(aq.\(aq, \(aq0\(aq, since \(aq/\(aq
cannot be matched.)
.PP
.B Complementation
.sp
An expression `[!...]' matches a single character, namely
An expression "\fI[!...]\fP" matches a single character, namely
any character that is not matched by the expression obtained
by removing the first `!' from it.
(Thus, `[!]a\-]' matches any single character except `]', `a' and `\-'.)
by removing the first \(aq!\(aq from it.
(Thus, "\fI[!]a\-]\fP" matches any
single character except \(aq]\(aq, \(aqa\(aq and \(aq\-\(aq.)
One can remove the special meaning of `?', `*' and `[' by
One can remove the special meaning of \(aq?\(aq, \(aq*\(aq and \(aq[\(aq by
preceding them by a backslash, or, in case this is part of
a shell command line, enclosing them in quotes.
Between brackets these characters stand for themselves.
Thus, `[[?*\e]' matches the four characters `[', `?', `*' and `\e'.
Thus, "\fI[[?*\e]\fP" matches the
four characters \(aq[\(aq, \(aq?\(aq, \(aq*\(aq and \(aq\e\(aq.
.SS Pathnames
Globbing is applied on each of the components of a pathname
separately.
A `/' in a pathname cannot be matched by a `?' or `*'
wildcard, or by a range like `[.\-0]'.
A \(aq/\(aq in a pathname cannot be matched by a \(aq?\(aq or \(aq*\(aq
wildcard, or by a range like "\fI[.\-0]\fP".
A range cannot contain an
explicit `/' character; this would lead to a syntax error.
explicit \(aq/\(aq character; this would lead to a syntax error.
If a filename starts with a `.', this character must be matched explicitly.
(Thus, `rm *' will not remove .profile, and `tar c *' will not
archive all your files; `tar c .' is better.)
If a filename starts with a \(aq.\(aq, this character must be matched explicitly.
(Thus, \fIrm\ *\fP will not remove .profile, and \fItar\ c\ *\fP will not
archive all your files; \fItar\ c\ .\fP is better.)
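
The matching rules above can be tried out with fnmatch(3), which applies the same wildcard syntax to a single string. The wrapper wildcard_match below is a hypothetical convenience, and the sketch assumes the default "C" locale, where ranges are plain ASCII ranges.

```c
#include <fnmatch.h>
#include <stdio.h>

/* Return 1 if name matches the wildcard pattern, 0 otherwise.
 * flags is passed straight through to fnmatch(3). */
int wildcard_match(const char *pattern, const char *name, int flags)
{
    return fnmatch(pattern, name, flags) == 0;
}

/* Print the outcome of a few of the patterns discussed in the text. */
void show_examples(void)
{
    /* ']' placed first in the brackets is an ordinary member. */
    printf("[][!] vs \"]\": %d\n", wildcard_match("[][!]", "]", 0));
    /* A leading '!' complements the set. */
    printf("[!]a-] vs \"b\": %d\n", wildcard_match("[!]a-]", "b", 0));
    /* With FNM_PATHNAME, '*' does not match across a '/'. */
    printf("* vs \"a/b\": %d\n", wildcard_match("*", "a/b", FNM_PATHNAME));
    /* With FNM_PERIOD, a leading '.' must be matched explicitly. */
    printf("* vs \".profile\": %d\n", wildcard_match("*", ".profile", FNM_PERIOD));
}
```

The flags FNM_PATHNAME and FNM_PERIOD correspond to the pathname rules given under "Pathnames"; without them, fnmatch(3) treats '/' and a leading '.' like any other character.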
.SS "Empty Lists"
The nice and simple rule given above: `expand a wildcard pattern
into the list of matching pathnames' was the original Unix
The nice and simple rule given above: "expand a wildcard pattern
into the list of matching pathnames" was the original Unix
definition.
It allowed one to have patterns that expand into
an empty list, as in
@ -133,15 +137,15 @@ Note that wildcard patterns are not regular expressions,
although they are a bit similar.
First of all, they match
filenames, rather than text, and secondly, the conventions
are not the same: for example, in a regular expression `*' means zero or
are not the same: for example, in a regular expression \(aq*\(aq means zero or
more copies of the preceding thing.
Now that regular expressions have bracket expressions where
the negation is indicated by a `^', POSIX has declared the
effect of a wildcard pattern `[^...]' to be undefined.
the negation is indicated by a \(aq^\(aq, POSIX has declared the
effect of a wildcard pattern "\fI[^...]\fP" to be undefined.
.SS Character classes and Internationalization
Of course ranges were originally meant to be ASCII ranges,
so that `[\ \-%]' stands for `[\ !"#$%]' and `[a\-z]' stands
so that "\fI[\ \-%]\fP" stands for "\fI[\ !"#$%]\fP" and "\fI[a\-z]\fP" stands
for "any lowercase letter".
Some Unix implementations generalized this so that a range X\-Y
stands for the set of characters with code between the codes for
@ -172,29 +176,29 @@ category in the current locale.
[:punct:] [:space:] [:upper:] [:xdigit:]
.fi
so that one can say `[[:lower:]]' instead of `[a\-z]', and have
things work in Denmark, too, where there are three letters past `z'
so that one can say "\fI[[:lower:]]\fP" instead of "\fI[a\-z]\fP", and have
things work in Denmark, too, where there are three letters past \(aqz\(aq
in the alphabet.
These character classes are defined by the
.B LC_CTYPE
category
in the current locale.
(v) Collating symbols, like `[.ch.]' or `[.a-acute.]',
where the string between `[.' and `.]' is a collating
(v) Collating symbols, like "\fI[.ch.]\fP" or "\fI[.a-acute.]\fP",
where the string between "\fI[.\fP" and "\fI.]\fP" is a collating
element defined for the current locale.
Note that this may
be a multi-character element.
(vi) Equivalence class expressions, like `[=a=]',
where the string between `[=' and `=]' is any collating
(vi) Equivalence class expressions, like "\fI[=a=]\fP",
where the string between "\fI[=\fP" and "\fI=]\fP" is any collating
element from its equivalence class, as defined for the
current locale.
For example, `[[=a=]]' might be equivalent
For example, "\fI[[=a=]]\fP" might be equivalent
.\" FIXME . the accented 'a' characters are not rendering properly
.\" mtk May 2007
to `[aáàäâ]' (warning: Latin-1 here), that is,
to `[a[.a-acute.][.a-grave.][.a-umlaut.][.a-circumflex.]]'.
to "\fI[aáàäâ]\fP" (warning: Latin-1 here), that is,
to "\fI[a[.a-acute.][.a-grave.][.a-umlaut.][.a-circumflex.]]\fP".
.SH "SEE ALSO"
.BR sh (1),
.BR fnmatch (3),


@ -47,26 +47,26 @@ POSIX.2 "basic" REs).
Obsolete REs mostly exist for backward compatibility in some old programs;
they will be discussed at the end.
POSIX.2 leaves some aspects of RE syntax and semantics open;
`\*(dg' marks decisions on these aspects that
"\*(dg" marks decisions on these aspects that
may not be fully portable to other POSIX.2 implementations.
.PP
A (modern) RE is one\*(dg or more non-empty\*(dg \fIbranches\fR,
separated by `|'.
separated by \(aq|\(aq.
It matches anything that matches one of the branches.
.PP
A branch is one\*(dg or more \fIpieces\fR, concatenated.
It matches a match for the first, followed by a match for the second, etc.
.PP
A piece is an \fIatom\fR possibly followed
by a single\*(dg `*', `+', `?', or \fIbound\fR.
An atom followed by `*' matches a sequence of 0 or more matches of the atom.
An atom followed by `+' matches a sequence of 1 or more matches of the atom.
An atom followed by `?' matches a sequence of 0 or 1 matches of the atom.
by a single\*(dg \(aq*\(aq, \(aq+\(aq, \(aq?\(aq, or \fIbound\fR.
An atom followed by \(aq*\(aq matches a sequence of 0 or more matches of the atom.
An atom followed by \(aq+\(aq matches a sequence of 1 or more matches of the atom.
An atom followed by \(aq?\(aq matches a sequence of 0 or 1 matches of the atom.
.PP
A \fIbound\fR is `{' followed by an unsigned decimal integer,
possibly followed by `,'
A \fIbound\fR is \(aq{\(aq followed by an unsigned decimal integer,
possibly followed by \(aq,\(aq
possibly followed by another unsigned decimal integer,
always followed by `}'.
always followed by \(aq}\(aq.
The integers must lie between 0 and
.B RE_DUP_MAX
(255\*(dg) inclusive,
@ -81,71 +81,71 @@ An atom followed by a bound
containing two integers \fIi\fR and \fIj\fR matches
a sequence of \fIi\fR through \fIj\fR (inclusive) matches of the atom.
.PP
An atom is a regular expression enclosed in `()' (matching a match for the
An atom is a regular expression enclosed in "\fI()\fP" (matching a match for the
regular expression),
an empty set of `()' (matching the null string)\*(dg,
a \fIbracket expression\fR (see below), `.'
(matching any single character), `^' (matching the null string at the
beginning of a line), `$' (matching the null string at the
end of a line), a `\e' followed by one of the characters
`^.[$()|*+?{\e'
an empty set of "\fI()\fP" (matching the null string)\*(dg,
a \fIbracket expression\fR (see below), \(aq.\(aq
(matching any single character), \(aq^\(aq (matching the null string at the
beginning of a line), \(aq$\(aq (matching the null string at the
end of a line), a \(aq\e\(aq followed by one of the characters
"\fI^.[$()|*+?{\e\fP"
(matching that character taken as an ordinary character),
a `\e' followed by any other character\*(dg
a \(aq\e\(aq followed by any other character\*(dg
(matching that character taken as an ordinary character,
as if the `\e' had not been present\*(dg),
as if the \(aq\e\(aq had not been present\*(dg),
or a single character with no other significance (matching that character).
A `{' followed by a character other than a digit is an ordinary
A \(aq{\(aq followed by a character other than a digit is an ordinary
character, not the beginning of a bound\*(dg.
It is illegal to end an RE with `\e'.
It is illegal to end an RE with \(aq\e\(aq.
.PP
A \fIbracket expression\fR is a list of characters enclosed in `[]'.
A \fIbracket expression\fR is a list of characters enclosed in "\fI[]\fP".
It normally matches any single character from the list (but see below).
If the list begins with `^',
If the list begins with \(aq^\(aq,
it matches any single character
(but see below) \fInot\fR from the rest of the list.
If two characters in the list are separated by `\-', this is shorthand
If two characters in the list are separated by \(aq\-\(aq, this is shorthand
for the full \fIrange\fR of characters between those two (inclusive) in the
collating sequence,
for example, `[0\-9]' in ASCII matches any decimal digit.
for example, "\fI[0\-9]\fP" in ASCII matches any decimal digit.
It is illegal\*(dg for two ranges to share an
endpoint, for example, `a-c-e'.
endpoint, for example, "\fIa-c-e\fP".
Ranges are very collating-sequence-dependent,
and portable programs should avoid relying on them.
.PP
To include a literal `]' in the list, make it the first character
(following a possible `^').
To include a literal `\-', make it the first or last character,
To include a literal \(aq]\(aq in the list, make it the first character
(following a possible \(aq^\(aq).
To include a literal \(aq\-\(aq, make it the first or last character,
or the second endpoint of a range.
To use a literal `\-' as the first endpoint of a range,
enclose it in `[.' and `.]' to make it a collating element (see below).
With the exception of these and some combinations using `[' (see next
paragraphs), all other special characters, including `\e', lose their
To use a literal \(aq\-\(aq as the first endpoint of a range,
enclose it in "\fI[.\fP" and "\fI.]\fP" to make it a collating element (see below).
With the exception of these and some combinations using \(aq[\(aq (see next
paragraphs), all other special characters, including \(aq\e\(aq, lose their
special significance within a bracket expression.
.PP
Within a bracket expression, a collating element (a character,
a multi-character sequence that collates as if it were a single character,
or a collating-sequence name for either)
enclosed in `[.' and `.]' stands for the
enclosed in "\fI[.\fP" and "\fI.]\fP" stands for the
sequence of characters of that collating element.
The sequence is a single element of the bracket expression's list.
A bracket expression containing a multi-character collating element
can thus match more than one character,
for example, if the collating sequence includes a `ch' collating element,
then the RE `[[.ch.]]*c' matches the first five characters
of `chchcc'.
for example, if the collating sequence includes a "ch" collating element,
then the RE "\fI[[.ch.]]*c\fP" matches the first five characters
of "chchcc".
.PP
Within a bracket expression, a collating element enclosed in `[=' and
`=]' is an equivalence class, standing for the sequences of characters
Within a bracket expression, a collating element enclosed in "\fI[=\fP" and
"\fI=]\fP" is an equivalence class, standing for the sequences of characters
of all collating elements equivalent to that one, including itself.
(If there are no other equivalent collating elements,
the treatment is as if the enclosing delimiters were `[.' and `.]'.)
the treatment is as if the enclosing delimiters were "\fI[.\fP" and "\fI.]\fP".)
For example, if o and \o'o^' are the members of an equivalence class,
then `[[=o=]]', `[[=\o'o^'=]]', and `[o\o'o^']' are all synonymous.
then "\fI[[=o=]]\fP", "\fI[[=\o'o^'=]]\fP", and "\fI[o\o'o^']\fP" are all synonymous.
An equivalence class may not\*(dg be an endpoint
of a range.
.PP
Within a bracket expression, the name of a \fIcharacter class\fR enclosed
in `[:' and `:]' stands for the list of all characters belonging to that
in "\fI[:\fP" and "\fI:]\fP" stands for the list of all characters belonging to that
class.
Standard character class names are:
.PP
@ -167,7 +167,7 @@ A character class may not be used as an endpoint of a range.
.\" The following does not seem to apply in the glibc implementation
.\" .PP
.\" There are two special cases\*(dg of bracket expressions:
.\" the bracket expressions `[[:<:]]' and `[[:>:]]' match the null string at
.\" the bracket expressions "\fI[[:<:]]\fP" and "\fI[[:>:]]\fP" match the null string at
.\" the beginning and end of a word respectively.
.\" A word is defined as a sequence of
.\" word characters
@ -198,11 +198,11 @@ their lower-level component subexpressions.
Match lengths are measured in characters, not collating elements.
A null string is considered longer than no match at all.
For example,
`bb*' matches the three middle characters of `abbbc',
`(wee|week)(knights|nights)' matches all ten characters of `weeknights',
when `(.*).*' is matched against `abc' the parenthesized subexpression
"\fIbb*\fP" matches the three middle characters of "abbbc",
"\fI(wee|week)(knights|nights)\fP" matches all ten characters of "weeknights",
when "\fI(.*).*\fP" is matched against "abc" the parenthesized subexpression
matches all three characters, and
when `(a*)*' is matched against `bc' both the whole RE and the parenthesized
when "\fI(a*)*\fP" is matched against "bc" both the whole RE and the parenthesized
subexpression match the null string.
.PP
If case-independent matching is specified,
@ -211,10 +211,10 @@ alphabet.
When an alphabetic that exists in multiple cases appears as an
ordinary character outside a bracket expression, it is effectively
transformed into a bracket expression containing both cases,
for example, `x' becomes `[xX]'.
for example, \(aqx\(aq becomes "\fI[xX]\fP".
When it appears inside a bracket expression, all case counterparts
of it are added to the bracket expression, so that, for example, `[x]'
becomes `[xX]' and `[^x]' becomes `[^xX]'.
of it are added to the bracket expression, so that, for example, "\fI[x]\fP"
becomes "\fI[xX]\fP" and "\fI[^x]\fP" becomes "\fI[^xX]\fP".
.PP
No particular limit is imposed on the length of REs\*(dg.
Programs intended to be portable should not employ REs longer
@ -223,32 +223,32 @@ as an implementation can refuse to accept such REs and remain
POSIX-compliant.
.PP
Obsolete ("basic") regular expressions differ in several respects.
`|', `+', and `?' are ordinary characters and there is no equivalent
\(aq|\(aq, \(aq+\(aq, and \(aq?\(aq are ordinary characters and there is no equivalent
for their functionality.
The delimiters for bounds are `\e{' and `\e}',
with `{' and `}' by themselves ordinary characters.
The parentheses for nested subexpressions are `\e(' and `\e)',
with `(' and `)' by themselves ordinary characters.
`^' is an ordinary character except at the beginning of the
The delimiters for bounds are "\fI\e{\fP" and "\fI\e}\fP",
with \(aq{\(aq and \(aq}\(aq by themselves ordinary characters.
The parentheses for nested subexpressions are "\fI\e(\fP" and "\fI\e)\fP",
with \(aq(\(aq and \(aq)\(aq by themselves ordinary characters.
\(aq^\(aq is an ordinary character except at the beginning of the
RE or\*(dg the beginning of a parenthesized subexpression,
`$' is an ordinary character except at the end of the
\(aq$\(aq is an ordinary character except at the end of the
RE or\*(dg the end of a parenthesized subexpression,
and `*' is an ordinary character if it appears at the beginning of the
and \(aq*\(aq is an ordinary character if it appears at the beginning of the
RE or the beginning of a parenthesized subexpression
(after a possible leading `^').
(after a possible leading \(aq^\(aq).
.PP
Finally, there is one new type of atom, a \fIback reference\fR:
`\e' followed by a non-zero decimal digit \fId\fR
\(aq\e\(aq followed by a non-zero decimal digit \fId\fR
matches the same sequence of characters
matched by the \fId\fRth parenthesized subexpression
(numbering subexpressions by the positions of their opening parentheses,
left to right),
so that, for example, `\e([bc]\e)\e1' matches `bb' or `cc' but not `bc'.
so that, for example, "\fI\e([bc]\e)\e1\fP" matches "bb" or "cc" but not "bc".
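
Several of the examples above can be reproduced with regcomp(3) and regexec(3). The helper re_span is a hypothetical wrapper, not part of any library; passing REG_EXTENDED selects modern REs, while a cflags value of 0 selects obsolete ("basic") REs.

```c
#include <sys/types.h>
#include <regex.h>

/* Compile pattern with cflags and search text for it.
 * Returns 1 and stores the byte offsets of the overall match in
 * *start and *end on success, 0 if there is no match, and -1 if
 * the pattern does not compile. */
int re_span(const char *pattern, int cflags, const char *text,
            int *start, int *end)
{
    regex_t re;
    regmatch_t m[1];
    int rc;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    rc = regexec(&re, text, 1, m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;
    *start = (int) m[0].rm_so;
    *end = (int) m[0].rm_eo;
    return 1;
}
```

For instance, re_span("bb*", 0, "abbbc", ...) should report the span from offset 1 to offset 4, the three middle characters, and the basic RE written in C as "\\([bc]\\)\\1" should match "bb" but not "bc".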
.SH BUGS
Having two kinds of REs is a botch.
.PP
The current POSIX.2 spec says that `)' is an ordinary character in
the absence of an unmatched `(';
The current POSIX.2 spec says that \(aq)\(aq is an ordinary character in
the absence of an unmatched \(aq(\(aq;
this was an unintentional result of a wording error,
and change is likely.
Avoid relying on it.
@ -257,7 +257,7 @@ Back references are a dreadful botch,
posing major problems for efficient implementations.
They are also somewhat vaguely defined
(does
`a\e(\e(b\e)*\e2\e)*d' match `abbbd'?).
"\fIa\e(\e(b\e)*\e2\e)*d\fP" match "abbbd"?).
Avoid using them.
.PP
POSIX.2's specification of case-independent matching is vague.