mirror of https://github.com/mkerrisk/man-pages
Try and bring some consistency to quotes.
parent
ce17c5d391
commit
333a424b0e
|
@@ -178,11 +178,11 @@ and an optional
|
|||
The arguments must correspond properly (after type promotion) with the
|
||||
conversion specifier.
|
||||
By default, the arguments are used in the order
|
||||
given, where each `*' and each conversion specifier asks for the next
|
||||
given, where each \(aq*\(aq and each conversion specifier asks for the next
|
||||
argument (and it is an error if insufficiently many arguments are given).
|
||||
One can also specify explicitly which argument is taken,
|
||||
at each place where an argument is required, by writing `%m$' instead
|
||||
of `%' and `*m$' instead of `*', where the decimal integer m denotes
|
||||
at each place where an argument is required, by writing "%m$" instead
|
||||
of \(aq%\(aq and "*m$" instead of \(aq*\(aq, where the decimal integer m denotes
|
||||
the position in the argument list of the desired argument, indexed starting
|
||||
from 1.
|
||||
Thus,
|
||||
|
@@ -204,35 +204,35 @@ printf("%2$*1$d", width, num);
|
|||
are equivalent.
|
||||
The second style allows repeated references to the
|
||||
same argument.
|
||||
The C99 standard does not include the style using `$',
|
||||
The C99 standard does not include the style using \(aq$\(aq,
|
||||
which comes from the Single Unix Specification.
|
||||
If the style using
|
||||
`$' is used, it must be used throughout for all conversions taking an
|
||||
\(aq$\(aq is used, it must be used throughout for all conversions taking an
|
||||
argument and all width and precision arguments, but it may be mixed
|
||||
with `%%' formats which do not consume an argument.
|
||||
with "%%" formats which do not consume an argument.
|
||||
There may be no
|
||||
gaps in the numbers of arguments specified using `$'; for example, if
|
||||
gaps in the numbers of arguments specified using \(aq$\(aq; for example, if
|
||||
arguments 1 and 3 are specified, argument 2 must also be specified
|
||||
somewhere in the format string.
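The repeated-reference property of the \(aq$\(aq style described above can be sketched as follows. This is only an illustration, assuming a C library that implements the Single Unix Specification extension (C99 alone does not guarantee it); the function name format_repeated is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: builds "word-n-word".
 * Argument 1 is referenced twice through "%1$s" and argument 2 once
 * through "%2$d", so every conversion uses the '$' style and there
 * are no gaps in the argument numbers. */
int format_repeated(char *buf, size_t size, const char *word, int n)
{
    return snprintf(buf, size, "%1$s-%2$d-%1$s", word, n);
}
```

Calling format_repeated(buf, sizeof(buf), "ab", 7) would be expected to leave "ab-7-ab" in buf on such a library.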
|
||||
|
||||
For some numeric conversions a radix character (`decimal point') or
|
||||
For some numeric conversions a radix character ("decimal point") or
|
||||
thousands' grouping character is used.
|
||||
The actual character used
|
||||
depends on the
|
||||
.B LC_NUMERIC
|
||||
part of the locale.
|
||||
The POSIX locale
|
||||
uses `.' as radix character, and does not have a grouping character.
|
||||
uses \(aq.\(aq as radix character, and does not have a grouping character.
|
||||
Thus,
|
||||
.in +4n
|
||||
.nf
|
||||
|
||||
printf("%'.2f", 1234567.89);
|
||||
printf("%\(aq.2f", 1234567.89);
|
||||
|
||||
.fi
|
||||
.in
|
||||
results in `1234567.89' in the POSIX locale, in `1234567,89' in the
|
||||
nl_NL locale, and in `1.234.567,89' in the da_DK locale.
|
||||
results in "1234567.89" in the POSIX locale, in "1234567,89" in the
|
||||
nl_NL locale, and in "1.234.567,89" in the da_DK locale.
|
||||
.SS "The flag characters"
|
||||
The character % is followed by zero or more of the following flags:
|
||||
.TP
|
||||
|
@@ -246,7 +246,7 @@ For
|
|||
.B x
|
||||
and
|
||||
.B X
|
||||
conversions, a non-zero result has the string `0x' (or `0X' for
|
||||
conversions, a non-zero result has the string "0x" (or "0X" for
|
||||
.B X
|
||||
conversions) prepended to it.
|
||||
For
|
||||
|
@@ -323,7 +323,7 @@ overrides a
|
|||
.B \&0
|
||||
if both are given.
|
||||
.TP
|
||||
.B ' '
|
||||
.B \(aq \(aq
|
||||
(a space) A blank should be left before a positive number
|
||||
(or empty string) produced by a signed conversion.
|
||||
.TP
|
||||
|
@@ -338,7 +338,7 @@ overrides a space if both are used.
|
|||
The five flag characters above are defined in the C standard.
|
||||
The SUSv2 specifies one further flag character.
|
||||
.TP
|
||||
.B '
|
||||
.B \(aq
|
||||
For decimal conversion
|
||||
.RB ( i ,
|
||||
.BR d ,
|
||||
|
@@ -353,7 +353,7 @@ Note that many versions of
|
|||
.BR gcc (1)
|
||||
cannot parse this option and will issue a warning.
|
||||
SUSv2 does not
|
||||
include %'F.
|
||||
include \fI%\(aqF\fP.
|
||||
.PP
|
||||
glibc 2.2 adds one further flag character.
|
||||
.TP
|
||||
|
@@ -364,7 +364,7 @@ For decimal integer conversion
|
|||
.BR u )
|
||||
the output uses the locale's alternative output digits, if any.
|
||||
For example, since glibc 2.2.3 this will give Arabic-Indic digits
|
||||
in the Persian (`fa_IR') locale.
|
||||
in the Persian ("fa_IR") locale.
|
||||
.\" outdigits keyword in locale file
|
||||
.SS "The field width"
|
||||
An optional decimal digit string (with non-zero first digit) specifying
|
||||
|
@@ -372,25 +372,25 @@ a minimum field width.
|
|||
If the converted value has fewer characters
|
||||
than the field width, it will be padded with spaces on the left
|
||||
(or right, if the left-adjustment flag has been given).
|
||||
Instead of a decimal digit string one may write `*' or `*m$'
|
||||
(for some decimal integer m) to specify that the field width
|
||||
is given in the next argument, or in the m-th argument, respectively,
|
||||
Instead of a decimal digit string one may write "*" or "*m$"
|
||||
(for some decimal integer \fIm\fP) to specify that the field width
|
||||
is given in the next argument, or in the \fIm\fP-th argument, respectively,
|
||||
which must be of type
|
||||
.IR int .
|
||||
A negative field width is taken as a `\-' flag followed by a
|
||||
A negative field width is taken as a \(aq\-\(aq flag followed by a
|
||||
positive field width.
|
||||
In no case does a nonexistent or small field width cause truncation of a
|
||||
field; if the result of a conversion is wider than the field width, the
|
||||
field is expanded to contain the conversion result.
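The field-width rules above (width taken from an argument, a negative width acting as the left-adjustment flag, and no truncation) can be sketched with a small helper. The name format_in_field and the surrounding brackets are illustrative assumptions, not part of the manual page.

```c
#include <stdio.h>

/* Hypothetical helper: formats value inside brackets, taking the
 * field width from an int argument via '*'.
 * A negative width is treated as the '-' flag followed by a positive
 * width; a width that is too small never truncates the value. */
int format_in_field(char *buf, size_t size, int width, int value)
{
    return snprintf(buf, size, "[%*d]", width, value);
}
```

With width 6 and value 42 the result should be "[    42]", with width \-6 it should be "[42    ]", and with width 2 and value 12345 the field simply grows to "[12345]".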
|
||||
.SS "The precision"
|
||||
An optional precision, in the form of a period (`\&.') followed by an
|
||||
An optional precision, in the form of a period (\(aq.\(aq) followed by an
|
||||
optional decimal digit string.
|
||||
Instead of a decimal digit string one may write `*' or `*m$'
|
||||
Instead of a decimal digit string one may write "*" or "*m$"
|
||||
(for some decimal integer m) to specify that the precision
|
||||
is given in the next argument, or in the m-th argument, respectively,
|
||||
which must be of type
|
||||
.IR int .
|
||||
If the precision is given as just `.', or the precision is negative,
|
||||
If the precision is given as just \(aq.\(aq, or the precision is negative,
|
||||
the precision is taken to be zero.
|
||||
This gives the minimum number of digits to appear for
|
||||
.BR d ,
|
||||
|
@@ -419,7 +419,7 @@ and
|
|||
.B S
|
||||
conversions.
|
||||
.SS "The length modifier"
|
||||
Here, `integer conversion' stands for
|
||||
Here, "integer conversion" stands for
|
||||
.BR d ,
|
||||
.BR i ,
|
||||
.BR o ,
|
||||
|
@@ -499,7 +499,7 @@ argument.
|
|||
(C99 allows %LF, but SUSv2 does not.)
|
||||
.TP
|
||||
.B q
|
||||
(`quad'. 4.4BSD and Linux libc5 only.
|
||||
("quad". 4.4BSD and Linux libc5 only.
|
||||
Don't use.)
|
||||
This is a synonym for
|
||||
.BR ll .
|
||||
|
@@ -631,10 +631,10 @@ If a decimal point appears, at least one digit appears before it.
|
|||
.B F
|
||||
and says that character string representations for infinity and NaN
|
||||
may be made available.
|
||||
The C99 standard specifies `[\-]inf' or `[\-]infinity'
|
||||
for infinity, and a string starting with `nan' for NaN, in the case of
|
||||
The C99 standard specifies "[\-]inf" or "[\-]infinity"
|
||||
for infinity, and a string starting with "nan" for NaN, in the case of
|
||||
.B f
|
||||
conversion, and `[\-]INF' or `[\-]INFINITY' or `NAN*' in the case of
|
||||
conversion, and "[\-]INF" or "[\-]INFINITY" or "NAN*" in the case of
|
||||
.B F
|
||||
conversion.)
|
||||
.TP
|
||||
|
@@ -713,7 +713,7 @@ modifier is present: The
|
|||
argument is expected to be a pointer to an array of character type (pointer
|
||||
to a string).
|
||||
Characters from the array are written up to (but not
|
||||
including) a terminating null byte ('\\0');
|
||||
including) a terminating null byte (\(aq\\0\(aq);
|
||||
if a precision is specified, no more than the number specified
|
||||
are written.
|
||||
If a precision is given, no null byte need be present;
|
||||
|
@@ -781,10 +781,10 @@ Print output of
|
|||
No argument is required.
|
||||
.TP
|
||||
.B %
|
||||
A `%' is written.
|
||||
A \(aq%\(aq is written.
|
||||
No argument is converted.
|
||||
The complete conversion
|
||||
specification is `%%'.
|
||||
specification is \(aq%%\(aq.
|
||||
.SH "CONFORMING TO"
|
||||
The
|
||||
.BR fprintf (),
|
||||
|
@@ -823,7 +823,7 @@ support for %D disappeared.)
|
|||
No locale-dependent radix character,
|
||||
no thousands' separator, no NaN or infinity, no %m$ and *m$.
|
||||
.PP
|
||||
Linux libc5 knows about the five C standard flags and the ' flag,
|
||||
Linux libc5 knows about the five C standard flags and the \(aq flag,
|
||||
locale, %m$ and *m$.
|
||||
It knows about the length modifiers h,l,L,Z,q, but accepts L and q
|
||||
both for \fIlong double\fP and for \fIlong long int\fP (this is a bug).
|
||||
|
@@ -936,7 +936,7 @@ fprintf(stdout, "pi = %.5f\en", 4 * atan(1.0));
|
|||
.fi
|
||||
.in
|
||||
.PP
|
||||
To print a date and time in the form `Sunday, July 3, 10:02',
|
||||
To print a date and time in the form "Sunday, July 3, 10:02",
|
||||
where
|
||||
.I weekday
|
||||
and
|
||||
|
@@ -974,7 +974,7 @@ With the value:
|
|||
|
||||
.fi
|
||||
.in
|
||||
one might obtain `Sonntag, 3. Juli, 10:02'.
|
||||
one might obtain "Sonntag, 3. Juli, 10:02".
|
||||
.PP
|
||||
To allocate a sufficiently large string and print into it
|
||||
(code correct for both glibc 2.0 and glibc 2.1):
|
||||
|
|
25
man3/scanf.3
|
@@ -159,11 +159,12 @@ This directive matches any amount of white space,
|
|||
including none, in the input.
|
||||
.TP
|
||||
\(bu
|
||||
An ordinary character (i.e., one other than white space or '%').
|
||||
An ordinary character (i.e., one other than white space or \(aq%\(aq).
|
||||
This character must exactly match the next character of input.
|
||||
.TP
|
||||
\(bu
|
||||
A conversion specification, which commences with a '%' (percent) character.
|
||||
A conversion specification,
|
||||
which commences with a \(aq%\(aq (percent) character.
|
||||
A sequence of characters from the input is converted according to
|
||||
this specification, and the result is placed in the corresponding
|
||||
.I pointer
|
||||
|
@@ -176,12 +177,12 @@ Each
|
|||
.I conversion specification
|
||||
in
|
||||
.I format
|
||||
begins with either the character '%' or the character sequence
|
||||
begins with either the character \(aq%\(aq or the character sequence
|
||||
"\fB%\fP\fIn\fP\fB$\fP"
|
||||
(see below for the distinction) followed by:
|
||||
.TP
|
||||
\(bu
|
||||
An optional '*' assignment-suppression character:
|
||||
An optional \(aq*\(aq assignment-suppression character:
|
||||
.BR scanf ()
|
||||
reads input as directed by the conversion specification,
|
||||
but discards the input.
|
||||
|
@@ -192,7 +193,7 @@ included in the count of successful assignments returned by
|
|||
.BR scanf ().
|
||||
.TP
|
||||
\(bu
|
||||
An optional 'a' character.
|
||||
An optional \(aqa\(aq character.
|
||||
This is used with string conversions, and relieves the caller of the
|
||||
need to allocate a corresponding buffer to hold the input: instead,
|
||||
.BR scanf ()
|
||||
|
@@ -206,7 +207,7 @@ The caller should subsequently
|
|||
.BR free (3)
|
||||
this buffer when it is no longer required.
|
||||
This is a GNU extension;
|
||||
C99 employs the 'a' character as a conversion specifier (and
|
||||
C99 employs the \(aqa\(aq character as a conversion specifier (and
|
||||
it can also be used as such in the GNU implementation).
|
||||
.TP
|
||||
\(bu
|
||||
|
@@ -217,7 +218,7 @@ when a non-matching character is found, whichever happens first.
|
|||
Most conversions discard initial whitespace characters (the exceptions
|
||||
are noted below),
|
||||
and these discarded characters don't count towards the maximum field width.
|
||||
String input conversions store a null terminator ('\\0')
|
||||
String input conversions store a null terminator (\(aq\\0\(aq)
|
||||
to mark the end of the input;
|
||||
the maximum field width does not include this terminator.
|
||||
.TP
|
||||
|
@@ -242,7 +243,7 @@ that specifies the type of input conversion to be performed.
|
|||
.PP
|
||||
The conversion specifications in
|
||||
.I format
|
||||
are of two forms, either beginning with '%' or beginning with
|
||||
are of two forms, either beginning with \(aq%\(aq or beginning with
|
||||
"\fB%\fP\fIn\fP\fB$\fP".
|
||||
The two forms should not be mixed in the same
|
||||
.I format
|
||||
|
@@ -254,7 +255,7 @@ and
|
|||
.BR %* .
|
||||
If
|
||||
.I format
|
||||
contains '%'
|
||||
contains \(aq%\(aq
|
||||
specifications then these correspond in order with successive
|
||||
.I pointer
|
||||
arguments.
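As a rough sketch of how \(aq%\(aq specifications pair up in order with successive pointer arguments, the helpers below use sscanf(3). They also illustrate the assignment-suppression character and the maximum field width covered earlier in this page. The function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: reads the first and third integers from s and
 * skips the second. The suppressed "%*d" conversion consumes input
 * but takes no pointer argument and is not counted in the return
 * value, so 2 is returned on success. */
int first_and_third(const char *s, int *first, int *third)
{
    return sscanf(s, "%d %*d %d", first, third);
}

/* Hypothetical helper: reads one white-space-delimited word of at
 * most 4 characters. The field width does not include the
 * terminating null byte, so the destination needs 5 bytes. */
int read_short_word(const char *s, char word[5])
{
    return sscanf(s, "%4s", word);
}
```

For the input "10 20 30", first_and_third() should return 2 and store 10 and 30. For "  abcdefg", read_short_word() should skip the leading white space and store "abcd".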
|
||||
|
@@ -371,11 +372,11 @@ The following
|
|||
are available:
|
||||
.TP
|
||||
.B %
|
||||
Matches a literal '%'.
|
||||
Matches a literal \(aq%\(aq.
|
||||
That is,
|
||||
.B %\&%
|
||||
in the format string matches a
|
||||
single input '%' character.
|
||||
single input \(aq%\(aq character.
|
||||
No conversion is done, and assignment does not
|
||||
occur.
|
||||
.TP
|
||||
|
@@ -448,7 +449,7 @@ Equivalent to
|
|||
Matches a sequence of non-white-space characters;
|
||||
the next pointer must be a pointer to character array that is
|
||||
long enough to hold the input sequence and the terminating null
|
||||
character ('\\0'), which is added automatically.
|
||||
character (\(aq\\0\(aq), which is added automatically.
|
||||
The input string stops at white space or at the maximum field
|
||||
width, whichever occurs first.
|
||||
.TP
|
||||
|
|
80
man7/glob.7
|
@@ -38,64 +38,68 @@ that will perform this function for a user program.
|
|||
The rules are as follows (POSIX.2, 3.13).
|
||||
.SS "Wildcard Matching"
|
||||
A string is a wildcard pattern if it contains one of the
|
||||
characters `?', `*' or `['.
|
||||
characters \(aq?\(aq, \(aq*\(aq or \(aq[\(aq.
|
||||
Globbing is the operation
|
||||
that expands a wildcard pattern into the list of pathnames
|
||||
matching the pattern.
|
||||
Matching is defined by:
|
||||
|
||||
A `?' (not between brackets) matches any single character.
|
||||
A \(aq?\(aq (not between brackets) matches any single character.
|
||||
|
||||
A `*' (not between brackets) matches any string,
|
||||
A \(aq*\(aq (not between brackets) matches any string,
|
||||
including the empty string.
|
||||
.PP
|
||||
.B "Character classes"
|
||||
.sp
|
||||
An expression `[...]' where the first character after the
|
||||
leading `[' is not an `!' matches a single character,
|
||||
An expression "\fI[...]\fP" where the first character after the
|
||||
leading \(aq[\(aq is not an \(aq!\(aq matches a single character,
|
||||
namely any of the characters enclosed by the brackets.
|
||||
The string enclosed by the brackets cannot be empty;
|
||||
therefore `]' can be allowed between the brackets, provided
|
||||
therefore \(aq]\(aq can be allowed between the brackets, provided
|
||||
that it is the first character.
|
||||
(Thus, `[][!]' matches the three characters `[', `]' and `!'.)
|
||||
(Thus, "\fI[][!]\fP" matches the
|
||||
three characters \(aq[\(aq, \(aq]\(aq and \(aq!\(aq.)
|
||||
.PP
|
||||
.B Ranges
|
||||
.sp
|
||||
There is one special convention:
|
||||
two characters separated by `\-' denote a range.
|
||||
(Thus, `[A\-Fa\-f0\-9]' is equivalent to `[ABCDEFabcdef0123456789]'.)
|
||||
One may include `\-' in its literal meaning by making it the
|
||||
two characters separated by \(aq\-\(aq denote a range.
|
||||
(Thus, "\fI[A\-Fa\-f0\-9]\fP" is equivalent to "\fI[ABCDEFabcdef0123456789]\fP".)
|
||||
One may include \(aq\-\(aq in its literal meaning by making it the
|
||||
first or last character between the brackets.
|
||||
(Thus, `[]\-]' matches just the two characters `]' and `\-',
|
||||
and `[\-\-0]' matches the three characters `\-', `.', `0', since `/'
|
||||
(Thus, "\fI[]\-]\fP" matches just the two characters \(aq]\(aq and \(aq\-\(aq,
|
||||
and "\fI[\-\-0]\fP" matches the
|
||||
three characters \(aq\-\(aq, \(aq.\(aq, \(aq0\(aq, since \(aq/\(aq
|
||||
cannot be matched.)
|
||||
.PP
|
||||
.B Complementation
|
||||
.sp
|
||||
An expression `[!...]' matches a single character, namely
|
||||
An expression "\fI[!...]\fP" matches a single character, namely
|
||||
any character that is not matched by the expression obtained
|
||||
by removing the first `!' from it.
|
||||
(Thus, `[!]a\-]' matches any single character except `]', `a' and `\-'.)
|
||||
by removing the first \(aq!\(aq from it.
|
||||
(Thus, "\fI[!]a\-]\fP" matches any
|
||||
single character except \(aq]\(aq, \(aqa\(aq and \(aq\-\(aq.)
|
||||
|
||||
One can remove the special meaning of `?', `*' and `[' by
|
||||
One can remove the special meaning of \(aq?\(aq, \(aq*\(aq and \(aq[\(aq by
|
||||
preceding them by a backslash, or, in case this is part of
|
||||
a shell command line, enclosing them in quotes.
|
||||
Between brackets these characters stand for themselves.
|
||||
Thus, `[[?*\e]' matches the four characters `[', `?', `*' and `\e'.
|
||||
Thus, "\fI[[?*\e]\fP" matches the
|
||||
four characters \(aq[\(aq, \(aq?\(aq, \(aq*\(aq and \(aq\e\(aq.
|
||||
.SS Pathnames
|
||||
Globbing is applied on each of the components of a pathname
|
||||
separately.
|
||||
A `/' in a pathname cannot be matched by a `?' or `*'
|
||||
wildcard, or by a range like `[.\-0]'.
|
||||
A \(aq/\(aq in a pathname cannot be matched by a \(aq?\(aq or \(aq*\(aq
|
||||
wildcard, or by a range like "\fI[.\-0]\fP".
|
||||
A range cannot contain an
|
||||
explicit `/' character; this would lead to a syntax error.
|
||||
explicit \(aq/\(aq character; this would lead to a syntax error.
|
||||
|
||||
If a filename starts with a `.', this character must be matched explicitly.
|
||||
(Thus, `rm *' will not remove .profile, and `tar c *' will not
|
||||
archive all your files; `tar c .' is better.)
|
||||
If a filename starts with a \(aq.\(aq, this character must be matched explicitly.
|
||||
(Thus, \fIrm\ *\fP will not remove .profile, and \fItar\ c\ *\fP will not
|
||||
archive all your files; \fItar\ c\ .\fP is better.)
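The matching rules above can be tried out with fnmatch(3), which this page lists under SEE ALSO. The following is a minimal sketch, assuming the default "C" locale so that ranges behave as ASCII ranges; the wrapper name wildcard_match is hypothetical.

```c
#include <fnmatch.h>

/* Hypothetical wrapper: returns 1 if name matches the wildcard
 * pattern and 0 otherwise.
 * FNM_PATHNAME keeps '/' from being matched by '?', '*' or a bracket
 * expression; FNM_PERIOD requires a leading '.' in name to be
 * matched explicitly. */
int wildcard_match(const char *pattern, const char *name)
{
    return fnmatch(pattern, name, FNM_PATHNAME | FNM_PERIOD) == 0;
}
```

Under these flags, wildcard_match("[][!]", "]") should succeed, while wildcard_match("*", ".profile") and wildcard_match("a?b", "a/b") should both fail.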
|
||||
.SS "Empty Lists"
|
||||
The nice and simple rule given above: `expand a wildcard pattern
|
||||
into the list of matching pathnames' was the original Unix
|
||||
The nice and simple rule given above: "expand a wildcard pattern
|
||||
into the list of matching pathnames" was the original Unix
|
||||
definition.
|
||||
It allowed one to have patterns that expand into
|
||||
an empty list, as in
|
||||
|
@@ -133,15 +137,15 @@ Note that wildcard patterns are not regular expressions,
|
|||
although they are a bit similar.
|
||||
First of all, they match
|
||||
filenames, rather than text, and secondly, the conventions
|
||||
are not the same: for example, in a regular expression `*' means zero or
|
||||
are not the same: for example, in a regular expression \(aq*\(aq means zero or
|
||||
more copies of the preceding thing.
|
||||
|
||||
Now that regular expressions have bracket expressions where
|
||||
the negation is indicated by a `^', POSIX has declared the
|
||||
effect of a wildcard pattern `[^...]' to be undefined.
|
||||
the negation is indicated by a \(aq^\(aq, POSIX has declared the
|
||||
effect of a wildcard pattern "\fI[^...]\fP" to be undefined.
|
||||
.SS Character classes and Internationalization
|
||||
Of course ranges were originally meant to be ASCII ranges,
|
||||
so that `[\ \-%]' stands for `[\ !"#$%]' and `[a\-z]' stands
|
||||
so that "\fI[\ \-%]\fP" stands for "\fI[\ !"#$%]\fP" and "\fI[a\-z]\fP" stands
|
||||
for "any lowercase letter".
|
||||
Some Unix implementations generalized this so that a range X\-Y
|
||||
stands for the set of characters with code between the codes for
|
||||
|
@@ -172,29 +176,29 @@ category in the current locale.
|
|||
[:punct:] [:space:] [:upper:] [:xdigit:]
|
||||
|
||||
.fi
|
||||
so that one can say `[[:lower:]]' instead of `[a\-z]', and have
|
||||
things work in Denmark, too, where there are three letters past `z'
|
||||
so that one can say "\fI[[:lower:]]\fP" instead of "\fI[a\-z]\fP", and have
|
||||
things work in Denmark, too, where there are three letters past \(aqz\(aq
|
||||
in the alphabet.
|
||||
These character classes are defined by the
|
||||
.B LC_CTYPE
|
||||
category
|
||||
in the current locale.
|
||||
|
||||
(v) Collating symbols, like `[.ch.]' or `[.a-acute.]',
|
||||
where the string between `[.' and `.]' is a collating
|
||||
(v) Collating symbols, like "\fI[.ch.]\fP" or "\fI[.a-acute.]\fP",
|
||||
where the string between "\fI[.\fP" and "\fI.]\fP" is a collating
|
||||
element defined for the current locale.
|
||||
Note that this may
|
||||
be a multi-character element.
|
||||
|
||||
(vi) Equivalence class expressions, like `[=a=]',
|
||||
where the string between `[=' and `=]' is any collating
|
||||
(vi) Equivalence class expressions, like "\fI[=a=]\fP",
|
||||
where the string between "\fI[=\fP" and "\fI=]\fP" is any collating
|
||||
element from its equivalence class, as defined for the
|
||||
current locale.
|
||||
For example, `[[=a=]]' might be equivalent
|
||||
For example, "\fI[[=a=]]\fP" might be equivalent
|
||||
.\" FIXME . the accented 'a' characters are not rendering properly
|
||||
.\" mtk May 2007
|
||||
to `[aáàäâ]' (warning: Latin-1 here), that is,
|
||||
to `[a[.a-acute.][.a-grave.][.a-umlaut.][.a-circumflex.]]'.
|
||||
to "\fI[aáàäâ]\fP" (warning: Latin-1 here), that is,
|
||||
to "\fI[a[.a-acute.][.a-grave.][.a-umlaut.][.a-circumflex.]]\fP".
|
||||
.SH "SEE ALSO"
|
||||
.BR sh (1),
|
||||
.BR fnmatch (3),
|
||||
|
|
126
man7/regex.7
|
@@ -47,26 +47,26 @@ POSIX.2 "basic" REs).
|
|||
Obsolete REs mostly exist for backward compatibility in some old programs;
|
||||
they will be discussed at the end.
|
||||
POSIX.2 leaves some aspects of RE syntax and semantics open;
|
||||
`\*(dg' marks decisions on these aspects that
|
||||
"\*(dg" marks decisions on these aspects that
|
||||
may not be fully portable to other POSIX.2 implementations.
|
||||
.PP
|
||||
A (modern) RE is one\*(dg or more non-empty\*(dg \fIbranches\fR,
|
||||
separated by `|'.
|
||||
separated by \(aq|\(aq.
|
||||
It matches anything that matches one of the branches.
|
||||
.PP
|
||||
A branch is one\*(dg or more \fIpieces\fR, concatenated.
|
||||
It matches a match for the first, followed by a match for the second, etc.
|
||||
.PP
|
||||
A piece is an \fIatom\fR possibly followed
|
||||
by a single\*(dg `*', `+', `?', or \fIbound\fR.
|
||||
An atom followed by `*' matches a sequence of 0 or more matches of the atom.
|
||||
An atom followed by `+' matches a sequence of 1 or more matches of the atom.
|
||||
An atom followed by `?' matches a sequence of 0 or 1 matches of the atom.
|
||||
by a single\*(dg \(aq*\(aq, \(aq+\(aq, \(aq?\(aq, or \fIbound\fR.
|
||||
An atom followed by \(aq*\(aq matches a sequence of 0 or more matches of the atom.
|
||||
An atom followed by \(aq+\(aq matches a sequence of 1 or more matches of the atom.
|
||||
An atom followed by \(aq?\(aq matches a sequence of 0 or 1 matches of the atom.
|
||||
.PP
|
||||
A \fIbound\fR is `{' followed by an unsigned decimal integer,
|
||||
possibly followed by `,'
|
||||
A \fIbound\fR is \(aq{\(aq followed by an unsigned decimal integer,
|
||||
possibly followed by \(aq,\(aq
|
||||
possibly followed by another unsigned decimal integer,
|
||||
always followed by `}'.
|
||||
always followed by \(aq}\(aq.
|
||||
The integers must lie between 0 and
|
||||
.B RE_DUP_MAX
|
||||
(255\*(dg) inclusive,
|
||||
|
@@ -81,71 +81,71 @@ An atom followed by a bound
|
|||
containing two integers \fIi\fR and \fIj\fR matches
|
||||
a sequence of \fIi\fR through \fIj\fR (inclusive) matches of the atom.
|
||||
.PP
|
||||
An atom is a regular expression enclosed in `()' (matching a match for the
|
||||
An atom is a regular expression enclosed in "\fI()\fP" (matching a match for the
|
||||
regular expression),
|
||||
an empty set of `()' (matching the null string)\*(dg,
|
||||
a \fIbracket expression\fR (see below), `.'
|
||||
(matching any single character), `^' (matching the null string at the
|
||||
beginning of a line), `$' (matching the null string at the
|
||||
end of a line), a `\e' followed by one of the characters
|
||||
`^.[$()|*+?{\e'
|
||||
an empty set of "\fI()\fP" (matching the null string)\*(dg,
|
||||
a \fIbracket expression\fR (see below), \(aq.\(aq
|
||||
(matching any single character), \(aq^\(aq (matching the null string at the
|
||||
beginning of a line), \(aq$\(aq (matching the null string at the
|
||||
end of a line), a \(aq\e\(aq followed by one of the characters
|
||||
"\fI^.[$()|*+?{\e\fP"
|
||||
(matching that character taken as an ordinary character),
|
||||
a `\e' followed by any other character\*(dg
|
||||
a \(aq\e\(aq followed by any other character\*(dg
|
||||
(matching that character taken as an ordinary character,
|
||||
as if the `\e' had not been present\*(dg),
|
||||
as if the \(aq\e\(aq had not been present\*(dg),
|
||||
or a single character with no other significance (matching that character).
|
||||
A `{' followed by a character other than a digit is an ordinary
|
||||
A \(aq{\(aq followed by a character other than a digit is an ordinary
|
||||
character, not the beginning of a bound\*(dg.
|
||||
It is illegal to end an RE with `\e'.
|
||||
It is illegal to end an RE with \(aq\e\(aq.
|
||||
.PP
|
||||
A \fIbracket expression\fR is a list of characters enclosed in `[]'.
|
||||
A \fIbracket expression\fR is a list of characters enclosed in "\fI[]\fP".
|
||||
It normally matches any single character from the list (but see below).
|
||||
If the list begins with `^',
|
||||
If the list begins with \(aq^\(aq,
|
||||
it matches any single character
|
||||
(but see below) \fInot\fR from the rest of the list.
|
||||
If two characters in the list are separated by `\-', this is shorthand
|
||||
If two characters in the list are separated by \(aq\-\(aq, this is shorthand
|
||||
for the full \fIrange\fR of characters between those two (inclusive) in the
|
||||
collating sequence,
|
||||
for example, `[0\-9]' in ASCII matches any decimal digit.
|
||||
for example, "\fI[0\-9]\fP" in ASCII matches any decimal digit.
|
||||
It is illegal\*(dg for two ranges to share an
|
||||
endpoint, for example, `a-c-e'.
|
||||
endpoint, for example, "\fIa-c-e\fP".
|
||||
Ranges are very collating-sequence-dependent,
|
||||
and portable programs should avoid relying on them.
|
||||
.PP
|
||||
To include a literal `]' in the list, make it the first character
|
||||
(following a possible `^').
|
||||
To include a literal `\-', make it the first or last character,
|
||||
To include a literal \(aq]\(aq in the list, make it the first character
|
||||
(following a possible \(aq^\(aq).
|
||||
To include a literal \(aq\-\(aq, make it the first or last character,
|
||||
or the second endpoint of a range.
|
||||
To use a literal `\-' as the first endpoint of a range,
|
||||
enclose it in `[.' and `.]' to make it a collating element (see below).
|
||||
With the exception of these and some combinations using `[' (see next
|
||||
paragraphs), all other special characters, including `\e', lose their
|
||||
To use a literal \(aq\-\(aq as the first endpoint of a range,
|
||||
enclose it in "\fI[.\fP" and "\fI.]\fP" to make it a collating element (see below).
|
||||
With the exception of these and some combinations using \(aq[\(aq (see next
|
||||
paragraphs), all other special characters, including \(aq\e\(aq, lose their
|
||||
special significance within a bracket expression.
|
||||
.PP
|
||||
Within a bracket expression, a collating element (a character,
|
||||
a multi-character sequence that collates as if it were a single character,
|
||||
or a collating-sequence name for either)
|
||||
enclosed in `[.' and `.]' stands for the
|
||||
enclosed in "\fI[.\fP" and "\fI.]\fP" stands for the
|
||||
sequence of characters of that collating element.
|
||||
The sequence is a single element of the bracket expression's list.
|
||||
A bracket expression containing a multi-character collating element
|
||||
can thus match more than one character,
|
||||
for example, if the collating sequence includes a `ch' collating element,
|
||||
then the RE `[[.ch.]]*c' matches the first five characters
|
||||
of `chchcc'.
|
||||
for example, if the collating sequence includes a "ch" collating element,
|
||||
then the RE "\fI[[.ch.]]*c\fP" matches the first five characters
|
||||
of "chchcc".
|
||||
.PP
|
||||
Within a bracket expression, a collating element enclosed in `[=' and
|
||||
`=]' is an equivalence class, standing for the sequences of characters
|
||||
Within a bracket expression, a collating element enclosed in "\fI[=\fP" and
|
||||
"\fI=]\fP" is an equivalence class, standing for the sequences of characters
|
||||
of all collating elements equivalent to that one, including itself.
|
||||
(If there are no other equivalent collating elements,
|
||||
the treatment is as if the enclosing delimiters were `[.' and `.]'.)
|
||||
the treatment is as if the enclosing delimiters were "\fI[.\fP" and "\fI.]\fP".)
|
||||
For example, if o and \o'o^' are the members of an equivalence class,
|
||||
then `[[=o=]]', `[[=\o'o^'=]]', and `[o\o'o^']' are all synonymous.
|
||||
then "\fI[[=o=]]\fP", "\fI[[=\o'o^'=]]\fP", and "\fI[o\o'o^']\fP" are all synonymous.
|
||||
An equivalence class may not\*(dg be an endpoint
|
||||
of a range.
|
||||
.PP
|
||||
Within a bracket expression, the name of a \fIcharacter class\fR enclosed
|
||||
in `[:' and `:]' stands for the list of all characters belonging to that
|
||||
in "\fI[:\fP" and "\fI:]\fP" stands for the list of all characters belonging to that
|
||||
class.
|
||||
Standard character class names are:
|
||||
.PP
|
||||
|
@@ -167,7 +167,7 @@ A character class may not be used as an endpoint of a range.
|
|||
.\" The following does not seem to apply in the glibc implementation
|
||||
.\" .PP
|
||||
.\" There are two special cases\*(dg of bracket expressions:
|
||||
.\" the bracket expressions `[[:<:]]' and `[[:>:]]' match the null string at
|
||||
.\" the bracket expressions "\fI[[:<:]]\fP" and "\fI[[:>:]]\fP" match the null string at
|
||||
.\" the beginning and end of a word respectively.
|
||||
.\" A word is defined as a sequence of
|
||||
.\" word characters
|
||||
|
@@ -198,11 +198,11 @@ their lower-level component subexpressions.
|
|||
Match lengths are measured in characters, not collating elements.
|
||||
A null string is considered longer than no match at all.
|
||||
For example,
|
||||
`bb*' matches the three middle characters of `abbbc',
|
||||
`(wee|week)(knights|nights)' matches all ten characters of `weeknights',
|
||||
when `(.*).*' is matched against `abc' the parenthesized subexpression
|
||||
"\fIbb*\fP" matches the three middle characters of "abbbc",
|
||||
"\fI(wee|week)(knights|nights)\fP" matches all ten characters of "weeknights",
|
||||
when "\fI(.*).*\fP" is matched against "abc" the parenthesized subexpression
|
||||
matches all three characters, and
|
||||
when `(a*)*' is matched against `bc' both the whole RE and the parenthesized
|
||||
when "\fI(a*)*\fP" is matched against "bc" both the whole RE and the parenthesized
|
||||
subexpression match the null string.
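The match examples above can be checked with the POSIX regcomp(3) and regexec(3) interface. What follows is a sketch only, assuming modern (extended) REs are selected with REG_EXTENDED; the helper name match_span is hypothetical.

```c
#include <regex.h>

/* Hypothetical helper: searches text for the extended RE pattern.
 * On a match, stores the start and end byte offsets of the whole
 * match and returns 1; returns 0 if there is no match or the
 * pattern does not compile. */
int match_span(const char *pattern, const char *text, int *start, int *end)
{
    regex_t re;
    regmatch_t m[1];
    int found;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    found = regexec(&re, text, 1, m, 0) == 0;
    if (found) {
        *start = (int)m[0].rm_so;
        *end = (int)m[0].rm_eo;
    }
    regfree(&re);
    return found;
}
```

For the pattern "bb*" and the text "abbbc" the offsets should be 1 and 4 (the three middle characters), and for "(wee|week)(knights|nights)" against "weeknights" they should be 0 and 10.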
|
||||
.PP
|
||||
If case-independent matching is specified,
|
||||
|
@@ -211,10 +211,10 @@ alphabet.
|
|||
When an alphabetic that exists in multiple cases appears as an
|
||||
ordinary character outside a bracket expression, it is effectively
|
||||
transformed into a bracket expression containing both cases,
|
||||
for example, `x' becomes `[xX]'.
|
||||
for example, \(aqx\(aq becomes "\fI[xX]\fP".
|
||||
When it appears inside a bracket expression, all case counterparts
|
||||
of it are added to the bracket expression, so that, for example, `[x]'
|
||||
becomes `[xX]' and `[^x]' becomes `[^xX]'.
|
||||
of it are added to the bracket expression, so that, for example, "\fI[x]\fP"
|
||||
becomes "\fI[xX]\fP" and "\fI[^x]\fP" becomes "\fI[^xX]\fP".
|
||||
.PP
|
||||
No particular limit is imposed on the length of REs\*(dg.
|
||||
Programs intended to be portable should not employ REs longer
|
||||
|
@@ -223,32 +223,32 @@ as an implementation can refuse to accept such REs and remain
|
|||
POSIX-compliant.
|
||||
.PP
|
||||
Obsolete ("basic") regular expressions differ in several respects.
|
||||
`|', `+', and `?' are ordinary characters and there is no equivalent
|
||||
\(aq|\(aq, \(aq+\(aq, and \(aq?\(aq are ordinary characters and there is no equivalent
|
||||
for their functionality.
|
||||
The delimiters for bounds are `\e{' and `\e}',
|
||||
with `{' and `}' by themselves ordinary characters.
|
||||
The parentheses for nested subexpressions are `\e(' and `\e)',
|
||||
with `(' and `)' by themselves ordinary characters.
|
||||
`^' is an ordinary character except at the beginning of the
|
||||
The delimiters for bounds are "\fI\e{\fP" and "\fI\e}\fP",
|
||||
with \(aq{\(aq and \(aq}\(aq by themselves ordinary characters.
|
||||
The parentheses for nested subexpressions are "\fI\e(\fP" and "\fI\e)\fP",
|
||||
with \(aq(\(aq and \(aq)\(aq by themselves ordinary characters.
|
||||
\(aq^\(aq is an ordinary character except at the beginning of the
|
||||
RE or\*(dg the beginning of a parenthesized subexpression,
|
||||
`$' is an ordinary character except at the end of the
|
||||
\(aq$\(aq is an ordinary character except at the end of the
|
||||
RE or\*(dg the end of a parenthesized subexpression,
|
||||
and `*' is an ordinary character if it appears at the beginning of the
|
||||
and \(aq*\(aq is an ordinary character if it appears at the beginning of the
|
||||
RE or the beginning of a parenthesized subexpression
|
||||
(after a possible leading `^').
|
||||
(after a possible leading \(aq^\(aq).
|
||||
.PP
|
||||
Finally, there is one new type of atom, a \fIback reference\fR:
|
||||
`\e' followed by a non-zero decimal digit \fId\fR
|
||||
\(aq\e\(aq followed by a non-zero decimal digit \fId\fR
|
||||
matches the same sequence of characters
|
||||
matched by the \fId\fRth parenthesized subexpression
|
||||
(numbering subexpressions by the positions of their opening parentheses,
|
||||
left to right),
|
||||
so that, for example, `\e([bc]\e)\e1' matches `bb' or `cc' but not `bc'.
|
||||
so that, for example, "\fI\e([bc]\e)\e1\fP" matches "bb" or "cc" but not "bc".
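The back-reference example above can be sketched with regcomp(3) called without REG_EXTENDED, which selects obsolete ("basic") REs. The helper name bre_matches is hypothetical, and the behavior shown assumes an implementation that supports back references in basic REs, as glibc does.

```c
#include <regex.h>

/* Hypothetical helper: returns 1 if the basic (obsolete) RE pattern
 * matches somewhere in text, 0 otherwise (including when the
 * pattern does not compile). */
int bre_matches(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m[2];    /* whole match plus the first subexpression */
    int found;

    if (regcomp(&re, pattern, 0) != 0)
        return 0;
    found = regexec(&re, text, 2, m, 0) == 0;
    regfree(&re);
    return found;
}
```

Written as a C string literal, the pattern from the text is "\\([bc]\\)\\1"; it should match "bb" and "cc" but not "bc".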
|
||||
.SH BUGS
|
||||
Having two kinds of REs is a botch.
|
||||
.PP
|
||||
The current POSIX.2 spec says that `)' is an ordinary character in
|
||||
the absence of an unmatched `(';
|
||||
The current POSIX.2 spec says that \(aq)\(aq is an ordinary character in
|
||||
the absence of an unmatched \(aq(\(aq;
|
||||
this was an unintentional result of a wording error,
|
||||
and change is likely.
|
||||
Avoid relying on it.
|
||||
|
@@ -257,7 +257,7 @@ Back references are a dreadful botch,
|
|||
posing major problems for efficient implementations.
|
||||
They are also somewhat vaguely defined
|
||||
(does
|
||||
`a\e(\e(b\e)*\e2\e)*d' match `abbbd'?).
|
||||
"\fIa\e(\e(b\e)*\e2\e)*d\fP" match "abbbd"?).
|
||||
Avoid using them.
|
||||
.PP
|
||||
POSIX.2's specification of case-independent matching is vague.
|
||||
|
|