From 333a424b0ed691be51cc82a781822f8ae8b6fe16 Mon Sep 17 00:00:00 2001 From: Michael Kerrisk Date: Mon, 9 Jun 2008 21:03:52 +0000 Subject: [PATCH] Try and bring some consistency to quotes. --- man3/printf.3 | 70 ++++++++++++++-------------- man3/scanf.3 | 25 +++++----- man7/glob.7 | 80 +++++++++++++++++--------------- man7/regex.7 | 126 +++++++++++++++++++++++++------------------------- 4 files changed, 153 insertions(+), 148 deletions(-) diff --git a/man3/printf.3 b/man3/printf.3 index 3f5d1bc6a..94da81e82 100644 --- a/man3/printf.3 +++ b/man3/printf.3 @@ -178,11 +178,11 @@ and an optional The arguments must correspond properly (after type promotion) with the conversion specifier. By default, the arguments are used in the order -given, where each `*' and each conversion specifier asks for the next +given, where each \(aq*\(aq and each conversion specifier asks for the next argument (and it is an error if insufficiently many arguments are given). One can also specify explicitly which argument is taken, -at each place where an argument is required, by writing `%m$' instead -of `%' and `*m$' instead of `*', where the decimal integer m denotes +at each place where an argument is required, by writing "%m$" instead +of \(aq%\(aq and "*m$" instead of \(aq*\(aq, where the decimal integer m denotes the position in the argument list of the desired argument, indexed starting from 1. Thus, @@ -204,35 +204,35 @@ printf("%2$*1$d", width, num); are equivalent. The second style allows repeated references to the same argument. -The C99 standard does not include the style using `$', +The C99 standard does not include the style using \(aq$\(aq, which comes from the Single Unix Specification. If the style using -`$' is used, it must be used throughout for all conversions taking an +\(aq$\(aq is used, it must be used throughout for all conversions taking an argument and all width and precision arguments, but it may be mixed -with `%%' formats which do not consume an argument. +with "%%" formats which do not consume an argument. There may be no -gaps in the numbers of arguments specified using `$'; for example, if +gaps in the numbers of arguments specified using \(aq$\(aq; for example, if arguments 1 and 3 are specified, argument 2 must also be specified somewhere in the format string. -For some numeric conversions a radix character (`decimal point') or +For some numeric conversions a radix character ("decimal point") or thousands' grouping character is used. The actual character used depends on the .B LC_NUMERIC part of the locale. The POSIX locale -uses `.' as radix character, and does not have a grouping character. +uses \(aq.\(aq as radix character, and does not have a grouping character. Thus, .in +4n .nf - printf("%'.2f", 1234567.89); + printf("%\(aq.2f", 1234567.89); .fi .in -results in `1234567.89' in the POSIX locale, in `1234567,89' in the -nl_NL locale, and in `1.234.567,89' in the da_DK locale. +results in "1234567.89" in the POSIX locale, in "1234567,89" in the +nl_NL locale, and in "1.234.567,89" in the da_DK locale. .SS "The flag characters" The character % is followed by zero or more of the following flags: .TP @@ -246,7 +246,7 @@ For .B x and .B X -conversions, a non-zero result has the string `0x' (or `0X' for +conversions, a non-zero result has the string "0x" (or "0X" for .B X conversions) prepended to it. For @@ -323,7 +323,7 @@ overrides a .B \&0 if both are given. .TP -.B ' ' +.B \(aq \(aq (a space) A blank should be left before a positive number (or empty string) produced by a signed conversion. .TP @@ -338,7 +338,7 @@ overrides a space if both are used. The five flag characters above are defined in the C standard. The SUSv2 specifies one further flag character. .TP -.B ' +.B \(aq For decimal conversion .RB ( i , .BR d , @@ -353,7 +353,7 @@ Note that many versions of .BR gcc (1) cannot parse this option and will issue a warning. SUSv2 does not -include %'F. +include \fI%\(aqF\fP. .PP glibc 2.2 adds one further flag character. .TP @@ -364,7 +364,7 @@ For decimal integer conversion .BR u ) the output uses the locale's alternative output digits, if any. For example, since glibc 2.2.3 this will give Arabic-Indic digits -in the Persian (`fa_IR') locale. +in the Persian ("fa_IR") locale. .\" outdigits keyword in locale file .SS "The field width" An optional decimal digit string (with non-zero first digit) specifying @@ -372,25 +372,25 @@ a minimum field width. If the converted value has fewer characters than the field width, it will be padded with spaces on the left (or right, if the left-adjustment flag has been given). -Instead of a decimal digit string one may write `*' or `*m$' -(for some decimal integer m) to specify that the field width -is given in the next argument, or in the m-th argument, respectively, +Instead of a decimal digit string one may write "*" or "*m$" +(for some decimal integer \fIm\fP) to specify that the field width +is given in the next argument, or in the \fIm\fP-th argument, respectively, which must be of type .IR int . -A negative field width is taken as a `\-' flag followed by a +A negative field width is taken as a \(aq\-\(aq flag followed by a positive field width. In no case does a nonexistent or small field width cause truncation of a field; if the result of a conversion is wider than the field width, the field is expanded to contain the conversion result. .SS "The precision" -An optional precision, in the form of a period (`\&.') followed by an +An optional precision, in the form of a period (\(aq.\(aq) followed by an optional decimal digit string. -Instead of a decimal digit string one may write `*' or `*m$' +Instead of a decimal digit string one may write "*" or "*m$" (for some decimal integer m) to specify that the precision is given in the next argument, or in the m-th argument, respectively, which must be of type .IR int . -If the precision is given as just `.', or the precision is negative, +If the precision is given as just \(aq.\(aq, or the precision is negative, the precision is taken to be zero. This gives the minimum number of digits to appear for .BR d , @@ -419,7 +419,7 @@ and .B S conversions. .SS "The length modifier" -Here, `integer conversion' stands for +Here, "integer conversion" stands for .BR d , .BR i , .BR o , @@ -499,7 +499,7 @@ argument. (C99 allows %LF, but SUSv2 does not.) .TP .B q -(`quad'. 4.4BSD and Linux libc5 only. +("quad". 4.4BSD and Linux libc5 only. Don't use.) This is a synonym for .BR ll . @@ -631,10 +631,10 @@ If a decimal point appears, at least one digit appears before it. .B F and says that character string representations for infinity and NaN may be made available. -The C99 standard specifies `[\-]inf' or `[\-]infinity' -for infinity, and a string starting with `nan' for NaN, in the case of +The C99 standard specifies "[\-]inf" or "[\-]infinity" +for infinity, and a string starting with "nan" for NaN, in the case of .B f -conversion, and `[\-]INF' or `[\-]INFINITY' or `NAN*' in the case of +conversion, and "[\-]INF" or "[\-]INFINITY" or "NAN*" in the case of .B F conversion.) .TP @@ -713,7 +713,7 @@ modifier is present: The argument is expected to be a pointer to an array of character type (pointer to a string). Characters from the array are written up to (but not -including) a terminating null byte ('\\0'); +including) a terminating null byte (\(aq\\0\(aq); if a precision is specified, no more than the number specified are written. If a precision is given, no null byte need be present; @@ -781,10 +781,10 @@ Print output of No argument is required. .TP .B % -A `%' is written. +A \(aq%\(aq is written. No argument is converted. The complete conversion -specification is `%%'. +specification is \(aq%%\(aq. .SH "CONFORMING TO" The .BR fprintf (), @@ -823,7 +823,7 @@ support for %D disappeared.) No locale-dependent radix character, no thousands' separator, no NaN or infinity, no %m$ and *m$. .PP -Linux libc5 knows about the five C standard flags and the ' flag, +Linux libc5 knows about the five C standard flags and the \(aq flag, locale, %m$ and *m$. It knows about the length modifiers h,l,L,Z,q, but accepts L and q both for \fIlong double\fP and for \fIlong long int\fP (this is a bug). @@ -936,7 +936,7 @@ fprintf(stdout, "pi = %.5f\en", 4 * atan(1.0)); .fi .in .PP -To print a date and time in the form `Sunday, July 3, 10:02', +To print a date and time in the form "Sunday, July 3, 10:02", where .I weekday and @@ -974,7 +974,7 @@ With the value: .fi .in -one might obtain `Sonntag, 3. Juli, 10:02'. +one might obtain "Sonntag, 3. Juli, 10:02". .PP To allocate a sufficiently large string and print into it (code correct for both glibc 2.0 and glibc 2.1): diff --git a/man3/scanf.3 b/man3/scanf.3 index 80b99cd8d..41305b6aa 100644 --- a/man3/scanf.3 +++ b/man3/scanf.3 @@ -159,11 +159,12 @@ This directive matches any amount of white space, including none, in the input. .TP \(bu -An ordinary character (i.e., one other than white space or '%'). +An ordinary character (i.e., one other than white space or \(aq%\(aq). This character must exactly match the next character of input. .TP \(bu -A conversion specification, which commences with a '%' (percent) character. +A conversion specification, +which commences with a \(aq%\(aq (percent) character. A sequence of characters from the input is converted according to this specification, and the result is placed in the corresponding .I pointer @@ -176,12 +177,12 @@ Each .I conversion specification in .I format -begins with either the character '%' or the character sequence +begins with either the character \(aq%\(aq or the character sequence "\fB%\fP\fIn\fP\fB$\fP" (see below for the distinction) followed by: .TP \(bu -An optional '*' assignment-suppression character: +An optional \(aq*\(aq assignment-suppression character: .BR scanf () reads input as directed by the conversion specification, but discards the input. @@ -192,7 +193,7 @@ included in the count of successful assignments returned by .BR scanf (). .TP \(bu -An optional 'a' character. +An optional \(aqa\(aq character. This is used with string conversions, and relieves the caller of the need to allocate a corresponding buffer to hold the input: instead, .BR scanf () @@ -206,7 +207,7 @@ The caller should subsequently .BR free (3) this buffer when it is no longer required. This is a GNU extension; -C99 employs the 'a' character as a conversion specifier (and +C99 employs the \(aqa\(aq character as a conversion specifier (and it can also be used as such in the GNU implementation). .TP \(bu @@ -217,7 +218,7 @@ when a non-matching character is found, whichever happens first. Most conversions discard initial whitespace characters (the exceptions are noted below), and these discarded characters don't count towards the maximum field width. -String input conversions store a null terminator ('\\0') +String input conversions store a null terminator (\(aq\\0\(aq) to mark the end of the input; the maximum field width does not include this terminator. .TP @@ -242,7 +243,7 @@ that specifies the type of input conversion to be performed. .PP The conversion specifications in .I format -are of two forms, either beginning with '%' or beginning with +are of two forms, either beginning with \(aq%\(aq or beginning with "\fB%\fP\fIn\fP\fB$\fP". The two forms should not be mixed in the same .I format @@ -254,7 +255,7 @@ and .BR %* . If .I format -contains '%' +contains \(aq%\(aq specifications then these correspond in order with successive .I pointer arguments. @@ -371,11 +372,11 @@ The following are available: .TP .B % -Matches a literal '%'. +Matches a literal \(aq%\(aq. That is, .B %\&% in the format string matches a -single input '%' character. +single input \(aq%\(aq character. No conversion is done, and assignment does not occur. .TP @@ -448,7 +449,7 @@ Equivalent to Matches a sequence of non-white-space characters; the next pointer must be a pointer to character array that is long enough to hold the input sequence and the terminating null -character ('\\0'), which is added automatically. +character (\(aq\\0\(aq), which is added automatically. The input string stops at white space or at the maximum field width, whichever occurs first. .TP diff --git a/man7/glob.7 b/man7/glob.7 index 268e48379..75ad8c006 100644 --- a/man7/glob.7 +++ b/man7/glob.7 @@ -38,64 +38,68 @@ that will perform this function for a user program. The rules are as follows (POSIX.2, 3.13). .SS "Wildcard Matching" A string is a wildcard pattern if it contains one of the -characters `?', `*' or `['. +characters \(aq?\(aq, \(aq*\(aq or \(aq[\(aq. Globbing is the operation that expands a wildcard pattern into the list of pathnames matching the pattern. Matching is defined by: -A `?' (not between brackets) matches any single character. +A \(aq?\(aq (not between brackets) matches any single character. -A `*' (not between brackets) matches any string, +A \(aq*\(aq (not between brackets) matches any string, including the empty string. .PP .B "Character classes" .sp -An expression `[...]' where the first character after the -leading `[' is not an `!' matches a single character, +An expression "\fI[...]\fP" where the first character after the +leading \(aq[\(aq is not an \(aq!\(aq matches a single character, namely any of the characters enclosed by the brackets. The string enclosed by the brackets cannot be empty; -therefore `]' can be allowed between the brackets, provided +therefore \(aq]\(aq can be allowed between the brackets, provided that it is the first character. -(Thus, `[][!]' matches the three characters `[', `]' and `!'.) +(Thus, "\fI[][!]\fP" matches the +three characters \(aq[\(aq, \(aq]\(aq and \(aq!\(aq.) .PP .B Ranges .sp There is one special convention: -two characters separated by `\-' denote a range. -(Thus, `[A\-Fa\-f0\-9]' is equivalent to `[ABCDEFabcdef0123456789]'.) -One may include `\-' in its literal meaning by making it the +two characters separated by \(aq\-\(aq denote a range. +(Thus, "\fI[A\-Fa\-f0\-9]\fP" is equivalent to "\fI[ABCDEFabcdef0123456789]\fP".) +One may include \(aq\-\(aq in its literal meaning by making it the first or last character between the brackets. -(Thus, `[]\-]' matches just the two characters `]' and `\-', -and `[\-\-0]' matches the three characters `\-', `.', `0', since `/' +(Thus, "\fI[]\-]\fP" matches just the two characters \(aq]\(aq and \(aq\-\(aq, +and "\fI[\-\-0]\fP" matches the +three characters \(aq\-\(aq, \(aq.\(aq, \(aq0\(aq, since \(aq/\(aq cannot be matched.) .PP .B Complementation .sp -An expression `[!...]' matches a single character, namely +An expression "\fI[!...]\fP" matches a single character, namely any character that is not matched by the expression obtained -by removing the first `!' from it. -(Thus, `[!]a\-]' matches any single character except `]', `a' and `\-'.) +by removing the first \(aq!\(aq from it. +(Thus, "\fI[!]a\-]\fP" matches any +single character except \(aq]\(aq, \(aqa\(aq and \(aq\-\(aq.) -One can remove the special meaning of `?', `*' and `[' by +One can remove the special meaning of \(aq?\(aq, \(aq*\(aq and \(aq[\(aq by preceding them by a backslash, or, in case this is part of a shell command line, enclosing them in quotes. Between brackets these characters stand for themselves. -Thus, `[[?*\e]' matches the four characters `[', `?', `*' and `\e'. +Thus, "\fI[[?*\e]\fP" matches the +four characters \(aq[\(aq, \(aq?\(aq, \(aq*\(aq and \(aq\e\(aq. .SS Pathnames Globbing is applied on each of the components of a pathname separately. -A `/' in a pathname cannot be matched by a `?' or `*' -wildcard, or by a range like `[.\-0]'. +A \(aq/\(aq in a pathname cannot be matched by a \(aq?\(aq or \(aq*\(aq +wildcard, or by a range like "\fI[.\-0]\fP". A range cannot contain an -explicit `/' character; this would lead to a syntax error. +explicit \(aq/\(aq character; this would lead to a syntax error. -If a filename starts with a `.', this character must be matched explicitly. -(Thus, `rm *' will not remove .profile, and `tar c *' will not -archive all your files; `tar c .' is better.) +If a filename starts with a \(aq.\(aq, this character must be matched explicitly. +(Thus, \fIrm\ *\fP will not remove .profile, and \fItar\ c\ *\fP will not +archive all your files; \fItar\ c\ .\fP is better.) .SS "Empty Lists" -The nice and simple rule given above: `expand a wildcard pattern -into the list of matching pathnames' was the original Unix +The nice and simple rule given above: "expand a wildcard pattern +into the list of matching pathnames" was the original Unix definition. It allowed one to have patterns that expand into an empty list, as in @@ -133,15 +137,15 @@ Note that wildcard patterns are not regular expressions, although they are a bit similar. First of all, they match filenames, rather than text, and secondly, the conventions -are not the same: for example, in a regular expression `*' means zero or +are not the same: for example, in a regular expression \(aq*\(aq means zero or more copies of the preceding thing. Now that regular expressions have bracket expressions where -the negation is indicated by a `^', POSIX has declared the -effect of a wildcard pattern `[^...]' to be undefined. +the negation is indicated by a \(aq^\(aq, POSIX has declared the +effect of a wildcard pattern "\fI[^...]\fP" to be undefined. .SS Character classes and Internationalization Of course ranges were originally meant to be ASCII ranges, -so that `[\ \-%]' stands for `[\ !"#$%]' and `[a\-z]' stands +so that "\fI[\ \-%]\fP" stands for "\fI[\ !"#$%]\fP" and "\fI[a\-z]\fP" stands for "any lowercase letter". Some Unix implementations generalized this so that a range X\-Y stands for the set of characters with code between the codes for @@ -172,29 +176,29 @@ category in the current locale. [:punct:] [:space:] [:upper:] [:xdigit:] .fi -so that one can say `[[:lower:]]' instead of `[a\-z]', and have -things work in Denmark, too, where there are three letters past `z' +so that one can say "\fI[[:lower:]]\fP" instead of "\fI[a\-z]\fP", and have +things work in Denmark, too, where there are three letters past \(aqz\(aq in the alphabet. These character classes are defined by the .B LC_CTYPE category in the current locale. -(v) Collating symbols, like `[.ch.]' or `[.a-acute.]', -where the string between `[.' and `.]' is a collating +(v) Collating symbols, like "\fI[.ch.]\fP" or "\fI[.a-acute.]\fP", +where the string between "\fI[.\fP" and "\fI.]\fP" is a collating element defined for the current locale. Note that this may be a multi-character element. -(vi) Equivalence class expressions, like `[=a=]', -where the string between `[=' and `=]' is any collating +(vi) Equivalence class expressions, like "\fI[=a=]\fP", +where the string between "\fI[=\fP" and "\fI=]\fP" is any collating element from its equivalence class, as defined for the current locale. -For example, `[[=a=]]' might be equivalent +For example, "\fI[[=a=]]\fP" might be equivalent .\" FIXME . the accented 'a' characters are not rendering properly .\" mtk May 2007 -to `[aáàäâ]' (warning: Latin-1 here), that is, -to `[a[.a-acute.][.a-grave.][.a-umlaut.][.a-circumflex.]]'. +to "\fI[aáàäâ]\fP" (warning: Latin-1 here), that is, +to "\fI[a[.a-acute.][.a-grave.][.a-umlaut.][.a-circumflex.]]\fP". .SH "SEE ALSO" .BR sh (1), .BR fnmatch (3), diff --git a/man7/regex.7 b/man7/regex.7 index 644a0c8a9..440cbde92 100644 --- a/man7/regex.7 +++ b/man7/regex.7 @@ -47,26 +47,26 @@ POSIX.2 "basic" REs). Obsolete REs mostly exist for backward compatibility in some old programs; they will be discussed at the end. POSIX.2 leaves some aspects of RE syntax and semantics open; -`\*(dg' marks decisions on these aspects that +"\*(dg" marks decisions on these aspects that may not be fully portable to other POSIX.2 implementations. .PP A (modern) RE is one\*(dg or more non-empty\*(dg \fIbranches\fR, -separated by `|'. +separated by \(aq|\(aq. It matches anything that matches one of the branches. .PP A branch is one\*(dg or more \fIpieces\fR, concatenated. It matches a match for the first, followed by a match for the second, etc. .PP A piece is an \fIatom\fR possibly followed -by a single\*(dg `*', `+', `?', or \fIbound\fR. -An atom followed by `*' matches a sequence of 0 or more matches of the atom. -An atom followed by `+' matches a sequence of 1 or more matches of the atom. -An atom followed by `?' matches a sequence of 0 or 1 matches of the atom. +by a single\*(dg \(aq*\(aq, \(aq+\(aq, \(aq?\(aq, or \fIbound\fR. +An atom followed by \(aq*\(aq matches a sequence of 0 or more matches of the atom. +An atom followed by \(aq+\(aq matches a sequence of 1 or more matches of the atom. +An atom followed by \(aq?\(aq matches a sequence of 0 or 1 matches of the atom. .PP -A \fIbound\fR is `{' followed by an unsigned decimal integer, -possibly followed by `,' +A \fIbound\fR is \(aq{\(aq followed by an unsigned decimal integer, +possibly followed by \(aq,\(aq possibly followed by another unsigned decimal integer, -always followed by `}'. +always followed by \(aq}\(aq. The integers must lie between 0 and .B RE_DUP_MAX (255\*(dg) inclusive, @@ -81,71 +81,71 @@ An atom followed by a bound containing two integers \fIi\fR and \fIj\fR matches a sequence of \fIi\fR through \fIj\fR (inclusive) matches of the atom. .PP -An atom is a regular expression enclosed in `()' (matching a match for the +An atom is a regular expression enclosed in "\fI()\fP" (matching a match for the regular expression), -an empty set of `()' (matching the null string)\*(dg, -a \fIbracket expression\fR (see below), `.' -(matching any single character), `^' (matching the null string at the -beginning of a line), `$' (matching the null string at the -end of a line), a `\e' followed by one of the characters -`^.[$()|*+?{\e' +an empty set of "\fI()\fP" (matching the null string)\*(dg, +a \fIbracket expression\fR (see below), \(aq.\(aq +(matching any single character), \(aq^\(aq (matching the null string at the +beginning of a line), \(aq$\(aq (matching the null string at the +end of a line), a \(aq\e\(aq followed by one of the characters +"\fI^.[$()|*+?{\e\fP" (matching that character taken as an ordinary character), -a `\e' followed by any other character\*(dg +a \(aq\e\(aq followed by any other character\*(dg (matching that character taken as an ordinary character, -as if the `\e' had not been present\*(dg), +as if the \(aq\e\(aq had not been present\*(dg), or a single character with no other significance (matching that character). -A `{' followed by a character other than a digit is an ordinary +A \(aq{\(aq followed by a character other than a digit is an ordinary character, not the beginning of a bound\*(dg. -It is illegal to end an RE with `\e'. +It is illegal to end an RE with \(aq\e\(aq. .PP -A \fIbracket expression\fR is a list of characters enclosed in `[]'. +A \fIbracket expression\fR is a list of characters enclosed in "\fI[]\fP". It normally matches any single character from the list (but see below). -If the list begins with `^', +If the list begins with \(aq^\(aq, it matches any single character (but see below) \fInot\fR from the rest of the list. -If two characters in the list are separated by `\-', this is shorthand +If two characters in the list are separated by \(aq\-\(aq, this is shorthand for the full \fIrange\fR of characters between those two (inclusive) in the collating sequence, -for example, `[0\-9]' in ASCII matches any decimal digit. +for example, "\fI[0\-9]\fP" in ASCII matches any decimal digit. It is illegal\*(dg for two ranges to share an -endpoint, for example, `a-c-e'. +endpoint, for example, "\fIa-c-e\fP". Ranges are very collating-sequence-dependent, and portable programs should avoid relying on them. .PP -To include a literal `]' in the list, make it the first character -(following a possible `^'). -To include a literal `\-', make it the first or last character, +To include a literal \(aq]\(aq in the list, make it the first character +(following a possible \(aq^\(aq). +To include a literal \(aq\-\(aq, make it the first or last character, or the second endpoint of a range. -To use a literal `\-' as the first endpoint of a range, -enclose it in `[.' and `.]' to make it a collating element (see below). -With the exception of these and some combinations using `[' (see next -paragraphs), all other special characters, including `\e', lose their +To use a literal \(aq\-\(aq as the first endpoint of a range, +enclose it in "\fI[.\fP" and "\fI.]\fP" to make it a collating element (see below). +With the exception of these and some combinations using \(aq[\(aq (see next +paragraphs), all other special characters, including \(aq\e\(aq, lose their special significance within a bracket expression. .PP Within a bracket expression, a collating element (a character, a multi-character sequence that collates as if it were a single character, or a collating-sequence name for either) -enclosed in `[.' and `.]' stands for the +enclosed in "\fI[.\fP" and "\fI.]\fP" stands for the sequence of characters of that collating element. The sequence is a single element of the bracket expression's list. A bracket expression containing a multi-character collating element can thus match more than one character, -for example, if the collating sequence includes a `ch' collating element, -then the RE `[[.ch.]]*c' matches the first five characters -of `chchcc'. +for example, if the collating sequence includes a "ch" collating element, +then the RE "\fI[[.ch.]]*c\fP" matches the first five characters +of "chchcc". .PP -Within a bracket expression, a collating element enclosed in `[=' and -`=]' is an equivalence class, standing for the sequences of characters +Within a bracket expression, a collating element enclosed in "\fI[=\fP" and +"\fI=]\fP" is an equivalence class, standing for the sequences of characters of all collating elements equivalent to that one, including itself. (If there are no other equivalent collating elements, -the treatment is as if the enclosing delimiters were `[.' and `.]'.) +the treatment is as if the enclosing delimiters were "\fI[.\fP" and "\fI.]\fP".) For example, if o and \o'o^' are the members of an equivalence class, -then `[[=o=]]', `[[=\o'o^'=]]', and `[o\o'o^']' are all synonymous. +then "\fI[[=o=]]\fP", "\fI[[=\o'o^'=]]\fP", and "\fI[o\o'o^']\fP" are all synonymous. An equivalence class may not\*(dg be an endpoint of a range. .PP Within a bracket expression, the name of a \fIcharacter class\fR enclosed -in `[:' and `:]' stands for the list of all characters belonging to that +in "\fI[:\fP" and "\fI:]\fP" stands for the list of all characters belonging to that class. Standard character class names are: .PP @@ -167,7 +167,7 @@ A character class may not be used as an endpoint of a range. .\" The following does not seem to apply in the glibc implementation .\" .PP .\" There are two special cases\*(dg of bracket expressions: -.\" the bracket expressions `[[:<:]]' and `[[:>:]]' match the null string at +.\" the bracket expressions "\fI[[:<:]]\fP" and "\fI[[:>:]]\fP" match the null string at .\" the beginning and end of a word respectively. .\" A word is defined as a sequence of .\" word characters @@ -198,11 +198,11 @@ their lower-level component subexpressions. Match lengths are measured in characters, not collating elements. A null string is considered longer than no match at all. For example, -`bb*' matches the three middle characters of `abbbc', -`(wee|week)(knights|nights)' matches all ten characters of `weeknights', -when `(.*).*' is matched against `abc' the parenthesized subexpression +"\fIbb*\fP" matches the three middle characters of "abbbc", +"\fI(wee|week)(knights|nights)\fP" matches all ten characters of "weeknights", +when "\fI(.*).*\fP" is matched against "abc" the parenthesized subexpression matches all three characters, and -when `(a*)*' is matched against `bc' both the whole RE and the parenthesized +when "\fI(a*)*\fP" is matched against "bc" both the whole RE and the parenthesized subexpression match the null string. .PP If case-independent matching is specified, @@ -211,10 +211,10 @@ alphabet. When an alphabetic that exists in multiple cases appears as an ordinary character outside a bracket expression, it is effectively transformed into a bracket expression containing both cases, -for example, `x' becomes `[xX]'. +for example, \(aqx\(aq becomes "\fI[xX]\fP". When it appears inside a bracket expression, all case counterparts -of it are added to the bracket expression, so that, for example, `[x]' -becomes `[xX]' and `[^x]' becomes `[^xX]'. +of it are added to the bracket expression, so that, for example, "\fI[x]\fP" +becomes "\fI[xX]\fP" and "\fI[^x]\fP" becomes "\fI[^xX]\fP". .PP No particular limit is imposed on the length of REs\*(dg. Programs intended to be portable should not employ REs longer @@ -223,32 +223,32 @@ as an implementation can refuse to accept such REs and remain POSIX-compliant. .PP Obsolete ("basic") regular expressions differ in several respects. -`|', `+', and `?' are ordinary characters and there is no equivalent +\(aq|\(aq, \(aq+\(aq, and \(aq?\(aq are ordinary characters and there is no equivalent for their functionality. -The delimiters for bounds are `\e{' and `\e}', -with `{' and `}' by themselves ordinary characters. -The parentheses for nested subexpressions are `\e(' and `\e)', -with `(' and `)' by themselves ordinary characters. -`^' is an ordinary character except at the beginning of the +The delimiters for bounds are "\fI\e{\fP" and "\fI\e}\fP", +with \(aq{\(aq and \(aq}\(aq by themselves ordinary characters. +The parentheses for nested subexpressions are "\fI\e(\fP" and "\fI\e)\fP", +with \(aq(\(aq and \(aq)\(aq by themselves ordinary characters. +\(aq^\(aq is an ordinary character except at the beginning of the RE or\*(dg the beginning of a parenthesized subexpression, -`$' is an ordinary character except at the end of the +\(aq$\(aq is an ordinary character except at the end of the RE or\*(dg the end of a parenthesized subexpression, -and `*' is an ordinary character if it appears at the beginning of the +and \(aq*\(aq is an ordinary character if it appears at the beginning of the RE or the beginning of a parenthesized subexpression -(after a possible leading `^'). +(after a possible leading \(aq^\(aq). .PP Finally, there is one new type of atom, a \fIback reference\fR: -`\e' followed by a non-zero decimal digit \fId\fR +\(aq\e\(aq followed by a non-zero decimal digit \fId\fR matches the same sequence of characters matched by the \fId\fRth parenthesized subexpression (numbering subexpressions by the positions of their opening parentheses, left to right), -so that, for example, `\e([bc]\e)\e1' matches `bb' or `cc' but not `bc'. +so that, for example, "\fI\e([bc]\e)\e1\fP" matches "bb" or "cc" but not "bc". .SH BUGS Having two kinds of REs is a botch. .PP -The current POSIX.2 spec says that `)' is an ordinary character in -the absence of an unmatched `('; +The current POSIX.2 spec says that \(aq)\(aq is an ordinary character in +the absence of an unmatched \(aq(\(aq; this was an unintentional result of a wording error, and change is likely. Avoid relying on it. @@ -257,7 +257,7 @@ Back references are a dreadful botch, posing major problems for efficient implementations. They are also somewhat vaguely defined (does -`a\e(\e(b\e)*\e2\e)*d' match `abbbd'?). +"\fIa\e(\e(b\e)*\e2\e)*d\fP" match "abbbd"?). Avoid using them. .PP POSIX.2's specification of case-independent matching is vague.