mirror of https://github.com/mkerrisk/man-pages
539 lines
18 KiB
Groff
539 lines
18 KiB
Groff
.\" Copyright (c) 2001-2003 The Open Group, All Rights Reserved
|
|
.TH "TR" P 2003 "IEEE/The Open Group" "POSIX Programmer's Manual"
|
|
.\" tr
|
|
.SH NAME
|
|
tr \- translate characters
|
|
.SH SYNOPSIS
|
|
.LP
|
|
\fBtr\fP \fB[\fP\fB-c | -C\fP\fB][\fP\fB-s]\fP \fIstring1 string2\fP\fB
|
|
.br
|
|
.sp
|
|
tr -s\fP \fB[\fP\fB-c | -C\fP\fB]\fP \fIstring1\fP\fB
|
|
.br
|
|
.sp
|
|
tr -d\fP \fB[\fP\fB-c | -C\fP\fB]\fP \fIstring1\fP\fB
|
|
.br
|
|
.sp
|
|
tr -ds\fP \fB[\fP\fB-c | -C\fP\fB]\fP \fIstring1 string2\fP\fB
|
|
.br
|
|
\fP
|
|
.SH DESCRIPTION
|
|
.LP
|
|
The \fItr\fP utility shall copy the standard input to the standard
|
|
output with substitution or deletion of selected characters.
|
|
The options specified and the \fIstring1\fP and \fIstring2\fP operands
|
|
shall control translations that occur while copying
|
|
characters and single-character collating elements.
|
|
.SH OPTIONS
|
|
.LP
|
|
The \fItr\fP utility shall conform to the Base Definitions volume
|
|
of IEEE\ Std\ 1003.1-2001, Section 12.2, Utility Syntax Guidelines.
|
|
.LP
|
|
The following options shall be supported:
|
|
.TP 7
|
|
\fB-c\fP
|
|
Complement the set of values specified by \fIstring1\fP. See the EXTENDED
|
|
DESCRIPTION section.
|
|
.TP 7
|
|
\fB-C\fP
|
|
Complement the set of characters specified by \fIstring1\fP. See the
|
|
EXTENDED DESCRIPTION section.
|
|
.TP 7
|
|
\fB-d\fP
|
|
Delete all occurrences of input characters that are specified by \fIstring1\fP.
|
|
.TP 7
|
|
\fB-s\fP
|
|
Replace instances of repeated characters with a single character,
|
|
as described in the EXTENDED DESCRIPTION section.
|
|
.sp
|
|
.SH OPERANDS
|
|
.LP
|
|
The following operands shall be supported:
|
|
.TP 7
|
|
\fIstring1\fP,\ \fIstring2\fP
|
|
.sp
|
|
Translation control strings. Each string shall represent a set of
|
|
characters to be converted into an array of characters used for
|
|
the translation. For a detailed description of how the strings are
|
|
interpreted, see the EXTENDED DESCRIPTION section.
|
|
.sp
|
|
.SH STDIN
|
|
.LP
|
|
The standard input can be any type of file.
|
|
.SH INPUT FILES
|
|
.LP
|
|
None.
|
|
.SH ENVIRONMENT VARIABLES
|
|
.LP
|
|
The following environment variables shall affect the execution of
|
|
\fItr\fP:
|
|
.TP 7
|
|
\fILANG\fP
|
|
Provide a default value for the internationalization variables that
|
|
are unset or null. (See the Base Definitions volume of
|
|
IEEE\ Std\ 1003.1-2001, Section 8.2, Internationalization Variables
|
|
for
|
|
the precedence of internationalization variables used to determine
|
|
the values of locale categories.)
|
|
.TP 7
|
|
\fILC_ALL\fP
|
|
If set to a non-empty string value, override the values of all the
|
|
other internationalization variables.
|
|
.TP 7
|
|
\fILC_COLLATE\fP
|
|
.sp
|
|
Determine the locale for the behavior of range expressions and equivalence
|
|
classes.
|
|
.TP 7
|
|
\fILC_CTYPE\fP
|
|
Determine the locale for the interpretation of sequences of bytes
|
|
of text data as characters (for example, single-byte as
|
|
opposed to multi-byte characters in arguments) and the behavior of
|
|
character classes.
|
|
.TP 7
|
|
\fILC_MESSAGES\fP
|
|
Determine the locale that should be used to affect the format and
|
|
contents of diagnostic messages written to standard
|
|
error.
|
|
.TP 7
|
|
\fINLSPATH\fP
|
|
Determine the location of message catalogs for the processing of \fILC_MESSAGES
|
|
\&.\fP
|
|
.sp
|
|
.SH ASYNCHRONOUS EVENTS
|
|
.LP
|
|
Default.
|
|
.SH STDOUT
|
|
.LP
|
|
The \fItr\fP output shall be identical to the input, with the exception
|
|
of the specified transformations.
|
|
.SH STDERR
|
|
.LP
|
|
The standard error shall be used only for diagnostic messages.
|
|
.SH OUTPUT FILES
|
|
.LP
|
|
None.
|
|
.SH EXTENDED DESCRIPTION
|
|
.LP
|
|
The operands \fIstring1\fP and \fIstring2\fP (if specified) define
|
|
two arrays of characters. The constructs in the following
|
|
list can be used to specify characters or single-character collating
|
|
elements. If any of the constructs result in multi-character
|
|
collating elements, \fItr\fP shall exclude, without a diagnostic,
|
|
those multi-character elements from the resulting array.
|
|
.TP 7
|
|
\fIcharacter\fP
|
|
Any character not described by one of the conventions below shall
|
|
represent itself.
|
|
.TP 7
|
|
\\\fIoctal\fP
|
|
Octal sequences can be used to represent characters with specific
|
|
coded values. An octal sequence shall consist of a backslash
|
|
followed by the longest sequence of one, two, or three-octal-digit
|
|
characters (01234567). The sequence shall cause the value whose
|
|
encoding is represented by the one, two, or three-digit octal integer
|
|
to be placed into the array. If the size of a byte on the
|
|
system is greater than nine bits, the valid escape sequence used to
|
|
represent a byte is implementation-defined. Multi-byte
|
|
characters require multiple, concatenated escape sequences of this
|
|
type, including the leading \fB'\\'\fP for each byte.
|
|
.TP 7
|
|
\\\fIcharacter\fP
|
|
The backslash-escape sequences in the Base Definitions volume of IEEE\ Std\ 1003.1-2001,
|
|
Table 5-1, Escape Sequences
|
|
and Associated Actions ( \fB'\\\\'\fP , \fB'\\a'\fP , \fB'\\b'\fP
|
|
, \fB'\\f'\fP , \fB'\\n'\fP , \fB'\\r'\fP ,
|
|
\fB'\\t'\fP , \fB'\\v'\fP ) shall be supported. The results of using
|
|
any other character, other than an octal digit, following
|
|
the backslash are unspecified.
|
|
.TP 7
|
|
\fIc\fP-\fIc\fP
|
|
In the POSIX locale, this construct shall represent the range of collating
|
|
elements between the range endpoints (as long as
|
|
neither endpoint is an octal sequence of the form \\\fIoctal\fP),
|
|
inclusive, as defined by the collation sequence. The characters
|
|
or collating elements in the range shall be placed in the array in
|
|
ascending collation sequence. If the second endpoint precedes
|
|
the starting endpoint in the collation sequence, it is unspecified
|
|
whether the range of collating elements is empty, or this
|
|
construct is treated as invalid. In locales other than the POSIX locale,
|
|
this construct has unspecified behavior.
|
|
.LP
|
|
If either or both of the range endpoints are octal sequences of the
|
|
form \\\fIoctal\fP, this shall represent the range of
|
|
specific coded values between the two range endpoints, inclusive.
|
|
.TP 7
|
|
.B :\fIclass\fP:
|
|
Represents all characters belonging to the defined character class,
|
|
as defined by the current setting of the \fILC_CTYPE\fP
|
|
locale category. The following character class names shall be accepted
|
|
when specified in \fIstring1\fP:
|
|
.TS C
|
|
center; l l l l l l.
|
|
\fBalnum\fP \fBblank\fP \fBdigit\fP \fBlower\fP \fBpunct\fP \fBupper\fP
|
|
\fBalpha\fP \fBcntrl\fP \fBgraph\fP \fBprint\fP \fBspace\fP \fBxdigit\fP
|
|
.TE
|
|
.LP
|
|
In addition, character class expressions of the form [: \fIname\fP:]
|
|
shall be recognized in those locales where the \fIname\fP
|
|
keyword has been given a \fBcharclass\fP definition in the \fILC_CTYPE\fP
|
|
category.
|
|
.LP
|
|
When both the \fB-d\fP and \fB-s\fP options are specified, any of
|
|
the character class names shall be accepted in
|
|
\fIstring2\fP. Otherwise, only character class names \fBlower\fP or
|
|
\fBupper\fP are valid in \fIstring2\fP and then only if the
|
|
corresponding character class ( \fBupper\fP and \fBlower\fP, respectively)
|
|
is specified in the same relative position in
|
|
\fIstring1\fP. Such a specification shall be interpreted as a request
|
|
for case conversion. When [: \fIlower\fP:] appears in
|
|
\fIstring1\fP and [: \fIupper\fP:] appears in \fIstring2\fP, the arrays
|
|
shall contain the characters from the \fBtoupper\fP
|
|
mapping in the \fILC_CTYPE\fP category of the current locale. When
|
|
[: \fIupper\fP:] appears in \fIstring1\fP and [:
|
|
\fIlower\fP:] appears in \fIstring2\fP, the arrays shall contain the
|
|
characters from the \fBtolower\fP mapping in the
|
|
\fILC_CTYPE\fP category of the current locale. The first character
|
|
from each mapping pair shall be in the array for \fIstring1\fP
|
|
and the second character from each mapping pair shall be in the array
|
|
for \fIstring2\fP in the same relative position.
|
|
.LP
|
|
Except for case conversion, the characters specified by a character
|
|
class expression shall be placed in the array in an
|
|
unspecified order.
|
|
.LP
|
|
If the name specified for \fIclass\fP does not define a valid character
|
|
class in the current locale, the behavior is
|
|
undefined.
|
|
.TP 7
|
|
.B =\fIequiv\fP=
|
|
Represents all characters or collating elements belonging to the same
|
|
equivalence class as \fIequiv\fP, as defined by the
|
|
current setting of the \fILC_COLLATE\fP locale category. An equivalence
|
|
class expression shall be allowed only in \fIstring1\fP,
|
|
or in \fIstring2\fP when it is being used by the combined \fB-d\fP
|
|
and \fB-s\fP options. The characters belonging to the
|
|
equivalence class shall be placed in the array in an unspecified order.
|
|
.TP 7
|
|
.B \fIx\fP*\fIn\fP
|
|
Represents \fIn\fP repeated occurrences of the character \fIx\fP.
|
|
Because this expression is used to map multiple characters
|
|
to one, it is only valid when it occurs in \fIstring2\fP. If \fIn\fP
|
|
is omitted or is zero, it shall be interpreted as large
|
|
enough to extend the \fIstring2\fP-based sequence to the length of
|
|
the \fIstring1\fP-based sequence. If \fIn\fP has a leading
|
|
zero, it shall be interpreted as an octal value. Otherwise, it shall
|
|
be interpreted as a decimal value.
|
|
.sp
|
|
.LP
|
|
When the \fB-d\fP option is not specified:
|
|
.IP " *" 3
|
|
Each input character found in the array specified by \fIstring1\fP
|
|
shall be replaced by the character in the same relative
|
|
position in the array specified by \fIstring2\fP. When the array specified
|
|
by \fIstring2\fP is shorter that the one specified by
|
|
\fIstring1\fP, the results are unspecified.
|
|
.LP
|
|
.IP " *" 3
|
|
If the \fB-C\fP option is specified, the complements of the characters
|
|
specified by \fIstring1\fP (the set of all characters
|
|
in the current character set, as defined by the current setting of
|
|
\fILC_CTYPE ,\fP except for those actually specified in the
|
|
\fIstring1\fP operand) shall be placed in the array in ascending collation
|
|
sequence, as defined by the current setting of
|
|
\fILC_COLLATE .\fP
|
|
.LP
|
|
.IP " *" 3
|
|
If the \fB-c\fP option is specified, the complement of the values
|
|
specified by \fIstring1\fP shall be placed in the array in
|
|
ascending order by binary value.
|
|
.LP
|
|
.IP " *" 3
|
|
Because the order in which characters specified by character class
|
|
expressions or equivalence class expressions is undefined,
|
|
such expressions should only be used if the intent is to map several
|
|
characters into one. An exception is case conversion, as
|
|
described previously.
|
|
.LP
|
|
.LP
|
|
When the \fB-d\fP option is specified:
|
|
.IP " *" 3
|
|
Input characters found in the array specified by \fIstring1\fP shall
|
|
be deleted.
|
|
.LP
|
|
.IP " *" 3
|
|
When the \fB-C\fP option is specified with \fB-d\fP, all characters
|
|
except those specified by \fIstring1\fP shall be deleted.
|
|
The contents of \fIstring2\fP are ignored, unless the \fB-s\fP option
|
|
is also specified.
|
|
.LP
|
|
.IP " *" 3
|
|
When the \fB-c\fP option is specified with \fB-d\fP, all values except
|
|
those specified by \fIstring1\fP shall be deleted. The
|
|
contents of \fIstring2\fP shall be ignored, unless the \fB-s\fP option
|
|
is also specified.
|
|
.LP
|
|
.IP " *" 3
|
|
The same string cannot be used for both the \fB-d\fP and the \fB-s\fP
|
|
option; when both options are specified, both
|
|
\fIstring1\fP (used for deletion) and \fIstring2\fP (used for squeezing)
|
|
shall be required.
|
|
.LP
|
|
.LP
|
|
When the \fB-s\fP option is specified, after any deletions or translations
|
|
have taken place, repeated sequences of the same
|
|
character shall be replaced by one occurrence of the same character,
|
|
if the character is found in the array specified by the last
|
|
operand. If the last operand contains a character class, such as the
|
|
following example:
|
|
.sp
|
|
.RS
|
|
.nf
|
|
|
|
\fBtr -s '[:space:]'
|
|
\fP
|
|
.fi
|
|
.RE
|
|
.LP
|
|
the last operand's array shall contain all of the characters in that
|
|
character class. However, in a case conversion, as
|
|
described previously, such as:
|
|
.sp
|
|
.RS
|
|
.nf
|
|
|
|
\fBtr -s '[:upper:]' '[:lower:]'
|
|
\fP
|
|
.fi
|
|
.RE
|
|
.LP
|
|
the last operand's array shall contain only those characters defined
|
|
as the second characters in each of the \fBtoupper\fP or
|
|
\fBtolower\fP character pairs, as appropriate.
|
|
.LP
|
|
An empty string used for \fIstring1\fP or \fIstring2\fP produces undefined
|
|
results.
|
|
.SH EXIT STATUS
|
|
.LP
|
|
The following exit values shall be returned:
|
|
.TP 7
|
|
\ 0
|
|
All input was processed successfully.
|
|
.TP 7
|
|
>0
|
|
An error occurred.
|
|
.sp
|
|
.SH CONSEQUENCES OF ERRORS
|
|
.LP
|
|
Default.
|
|
.LP
|
|
\fIThe following sections are informative.\fP
|
|
.SH APPLICATION USAGE
|
|
.LP
|
|
If necessary, \fIstring1\fP and \fIstring2\fP can be quoted to avoid
|
|
pattern matching by the shell.
|
|
.LP
|
|
If an ordinary digit (representing itself) is to follow an octal sequence,
|
|
the octal sequence must use the full three digits to
|
|
avoid ambiguity.
|
|
.LP
|
|
When \fIstring2\fP is shorter than \fIstring1\fP, a difference results
|
|
between historical System\ V and BSD systems. A BSD
|
|
system pads \fIstring2\fP with the last character found in \fIstring2\fP.
|
|
Thus, it is possible to do the following:
|
|
.sp
|
|
.RS
|
|
.nf
|
|
|
|
\fBtr 0123456789 d
|
|
\fP
|
|
.fi
|
|
.RE
|
|
.LP
|
|
which would translate all digits to the letter \fB'd'\fP . Since this
|
|
area is specifically unspecified in this volume of
|
|
IEEE\ Std\ 1003.1-2001, both the BSD and System\ V behaviors are allowed,
|
|
but a conforming application cannot rely on
|
|
the BSD behavior. It would have to code the example in the following
|
|
way:
|
|
.sp
|
|
.RS
|
|
.nf
|
|
|
|
\fBtr 0123456789 '[d*]'
|
|
\fP
|
|
.fi
|
|
.RE
|
|
.LP
|
|
It should be noted that, despite similarities in appearance, the string
|
|
operands used by \fItr\fP are not regular
|
|
expressions.
|
|
.LP
|
|
Unlike some historical implementations, this definition of the \fItr\fP
|
|
utility correctly processes NUL characters in its input
|
|
stream. NUL characters can be stripped by using:
|
|
.sp
|
|
.RS
|
|
.nf
|
|
|
|
\fBtr -d '\\000'
|
|
\fP
|
|
.fi
|
|
.RE
|
|
.SH EXAMPLES
|
|
.IP " 1." 4
|
|
The following example creates a list of all words in \fBfile1\fP one
|
|
per line in \fBfile2\fP, where a word is taken to be a
|
|
maximal string of letters.
|
|
.sp
|
|
.RS
|
|
.nf
|
|
|
|
\fBtr -cs "[:alpha:]" "[\\n*]" <file1 >file2
|
|
\fP
|
|
.fi
|
|
.RE
|
|
.LP
|
|
.IP " 2." 4
|
|
The next example translates all lowercase characters in \fBfile1\fP
|
|
to uppercase and writes the results to standard output.
|
|
.sp
|
|
.RS
|
|
.nf
|
|
|
|
\fBtr "[:lower:]" "[:upper:]" <file1
|
|
\fP
|
|
.fi
|
|
.RE
|
|
.LP
|
|
.IP " 3." 4
|
|
This example uses an equivalence class to identify accented variants
|
|
of the base character \fB'e'\fP in \fBfile1\fP, which
|
|
are stripped of diacritical marks and written to \fBfile2\fP.
|
|
.sp
|
|
.RS
|
|
.nf
|
|
|
|
\fBtr "[=e=]" e <file1 >file2
|
|
\fP
|
|
.fi
|
|
.RE
|
|
.LP
|
|
.SH RATIONALE
|
|
.LP
|
|
In some early proposals, an explicit option \fB-n\fP was added to
|
|
disable the historical behavior of stripping NUL characters
|
|
from the input. It was considered that automatically stripping NUL
|
|
characters from the input was not correct functionality.
|
|
However, the removal of \fB-n\fP in a later proposal does not remove
|
|
the requirement that \fItr\fP correctly process NUL
|
|
characters in its input stream. NUL characters can be stripped by
|
|
using \fItr\fP \fB-d\fP '\\000'.
|
|
.LP
|
|
Historical implementations of \fItr\fP differ widely in syntax and
|
|
behavior. For example, the BSD version has not needed the
|
|
bracket characters for the repetition sequence. The \fItr\fP utility
|
|
syntax is based more closely on the System V and XPG3 model
|
|
while attempting to accommodate historical BSD implementations. In
|
|
the case of the short \fIstring2\fP padding, the decision was
|
|
to unspecify the behavior and preserve System V and XPG3 scripts,
|
|
which might find difficulty with the BSD method. The assumption
|
|
was made that BSD users of \fItr\fP have to make accommodations to
|
|
meet the syntax defined here. Since it is possible to use the
|
|
repetition sequence to duplicate the desired behavior, whereas there
|
|
is no simple way to achieve the System V method, this was the
|
|
correct, if not desirable, approach.
|
|
.LP
|
|
The use of octal values to specify control characters, while having
|
|
historical precedents, is not portable. The introduction of
|
|
escape sequences for control characters should provide the necessary
|
|
portability. It is recognized that this may cause some
|
|
historical scripts to break.
|
|
.LP
|
|
An early proposal included support for multi-character collating elements.
|
|
It was pointed out that, while \fItr\fP does employ
|
|
some syntactical elements from REs, the aim of \fItr\fP is quite different;
|
|
ranges, for example, do not have a similar meaning
|
|
(``any of the chars in the range matches", \fIversus\fP "translate
|
|
each character in the range to the output counterpart"). As
|
|
a result, the previously included support for multi-character collating
|
|
elements has been removed. What remains are ranges in
|
|
current collation order (to support, for example, accented characters),
|
|
character classes, and equivalence classes.
|
|
.LP
|
|
In XPG3 the [: \fIclass\fP:] and [= \fIequiv\fP=] conventions are
|
|
shown with double brackets, as in RE syntax. However,
|
|
\fItr\fP does not implement RE principles; it just borrows part of
|
|
the syntax. Consequently, [: \fIclass\fP:] and [=
|
|
\fIequiv\fP=] should be regarded as syntactical elements on a par
|
|
with [ \fIx\fP* \fIn\fP], which is not an RE bracket
|
|
expression.
|
|
.LP
|
|
The standard developers will consider changes to \fItr\fP that allow
|
|
it to translate characters between different character
|
|
encodings, or they will consider providing a new utility to accomplish
|
|
this.
|
|
.LP
|
|
On historical System V systems, a range expression requires enclosing
|
|
square-brackets, such as:
|
|
.sp
|
|
.RS
|
|
.nf
|
|
|
|
\fBtr '[a-z]' '[A-Z]'
|
|
\fP
|
|
.fi
|
|
.RE
|
|
.LP
|
|
However, BSD-based systems did not require the brackets, and this
|
|
convention is used here to avoid breaking large numbers of BSD
|
|
scripts:
|
|
.sp
|
|
.RS
|
|
.nf
|
|
|
|
\fBtr a-z A-Z
|
|
\fP
|
|
.fi
|
|
.RE
|
|
.LP
|
|
The preceding System V script will continue to work because the brackets,
|
|
treated as regular characters, are translated to
|
|
themselves. However, any System V script that relied on \fB"a-z"\fP
|
|
representing the three characters \fB'a'\fP ,
|
|
\fB'-'\fP , and \fB'z'\fP have to be rewritten as \fB"az-"\fP .
|
|
.LP
|
|
The ISO\ POSIX-2:1993 standard had a \fB-c\fP option that behaved
|
|
similarly to the \fB-C\fP option, but did not supply
|
|
functionality equivalent to the \fB-c\fP option specified in IEEE\ Std\ 1003.1-2001.
|
|
This meant that historical practice
|
|
of being able to specify \fItr\fP \fB-d\fP\\200-\\377 (which would
|
|
delete all bytes with the top bit set) would have no effect
|
|
because, in the C locale, bytes with the values octal 200 to octal
|
|
377 are not characters.
|
|
.LP
|
|
The earlier version also said that octal sequences referred to collating
|
|
elements and could be placed adjacent to each other to
|
|
specify multi-byte characters. However, it was noted that this caused
|
|
ambiguities because \fItr\fP would not be able to tell
|
|
whether adjacent octal sequences were intending to specify multi-byte
|
|
characters or multiple single byte characters.
|
|
IEEE\ Std\ 1003.1-2001 specifies that octal sequences always refer
|
|
to single byte binary values.
|
|
.SH FUTURE DIRECTIONS
|
|
.LP
|
|
None.
|
|
.SH SEE ALSO
|
|
.LP
|
|
\fIsed\fP
|
|
.SH COPYRIGHT
|
|
Portions of this text are reprinted and reproduced in electronic form
|
|
from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology
|
|
-- Portable Operating System Interface (POSIX), The Open Group Base
|
|
Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of
|
|
Electrical and Electronics Engineers, Inc and The Open Group. In the
|
|
event of any discrepancy between this version and the original IEEE and
|
|
The Open Group Standard, the original IEEE and The Open Group Standard
|
|
is the referee document. The original Standard can be obtained online at
|
|
http://www.opengroup.org/unix/online.html .
|