mirror of https://github.com/mkerrisk/man-pages
513 lines
16 KiB
Groff
513 lines
16 KiB
Groff
.\" Copyright (c) 2001-2003 The Open Group, All Rights Reserved
|
|
.TH "SORT" P 2003 "IEEE/The Open Group" "POSIX Programmer's Manual"
|
|
.\" sort
|
|
.SH NAME
|
|
sort \- sort, merge, or sequence check text files
|
|
.SH SYNOPSIS
|
|
.LP
|
|
\fBsort\fP \fB[\fP\fB-m\fP\fB][\fP\fB-o\fP \fIoutput\fP\fB][\fP\fB-bdfinru\fP\fB][\fP\fB-t\fP
|
|
\fIchar\fP\fB][\fP\fB-k\fP \fIkeydef\fP\fB]\fP\fB...\fP \fB[\fP\fIfile\fP\fB...\fP\fB]\fP\fB
|
|
.br
|
|
.sp
|
|
sort -c\fP \fB[\fP\fB-bdfinru\fP\fB][\fP\fB-t\fP \fIchar\fP\fB][\fP\fB-k\fP
|
|
\fIkeydef\fP\fB][\fP\fIfile\fP\fB]\fP\fB
|
|
.br
|
|
\fP
|
|
.SH DESCRIPTION
|
|
.LP
|
|
The \fIsort\fP utility shall perform one of the following functions:
|
|
.IP " 1." 4
|
|
Sort lines of all the named files together and write the result to
|
|
the specified output.
|
|
.LP
|
|
.IP " 2." 4
|
|
Merge lines of all the named (presorted) files together and write
|
|
the result to the specified output.
|
|
.LP
|
|
.IP " 3." 4
|
|
Check that a single input file is correctly presorted.
|
|
.LP
|
|
.LP
|
|
Comparisons shall be based on one or more sort keys extracted from
|
|
each line of input (or, if no sort keys are specified, the
|
|
entire line up to, but not including, the terminating <newline>),
|
|
and shall be performed using the collating sequence of the
|
|
current locale.
|
|
.SH OPTIONS
|
|
.LP
|
|
The \fIsort\fP utility shall conform to the Base Definitions volume
|
|
of IEEE\ Std\ 1003.1-2001, Section 12.2, Utility Syntax Guidelines,
|
|
and the \fB-k\fP \fIkeydef\fP option should
|
|
follow the \fB-b\fP, \fB-d\fP, \fB-f\fP, \fB-i\fP, \fB-n\fP, and \fB-r\fP
|
|
options.
|
|
.LP
|
|
The following options shall be supported:
|
|
.TP 7
|
|
\fB-c\fP
|
|
Check that the single input file is ordered as specified by the arguments
|
|
and the collating sequence of the current locale. No
|
|
output shall be produced; only the exit code shall be affected.
|
|
.TP 7
|
|
\fB-m\fP
|
|
Merge only; the input file shall be assumed to be already sorted.
|
|
.TP 7
|
|
\fB-o\ \fP \fIoutput\fP
|
|
Specify the name of an output file to be used instead of the standard
|
|
output. This file can be the same as one of the input
|
|
\fIfile\fPs.
|
|
.TP 7
|
|
\fB-u\fP
|
|
Unique: suppress all but one in each set of lines having equal keys.
|
|
If used with the \fB-c\fP option, check that there are no
|
|
lines with duplicate keys, in addition to checking that the input
|
|
file is sorted.
|
|
.sp
|
|
.LP
|
|
The following options shall override the default ordering rules. When
|
|
ordering options appear independent of any key field
|
|
specifications, the requested field ordering rules shall be applied
|
|
globally to all sort keys. When attached to a specific key (see
|
|
\fB-k\fP), the specified ordering options shall override all global
|
|
ordering options for that key.
|
|
.TP 7
|
|
\fB-d\fP
|
|
Specify that only <blank>s and alphanumeric characters, according
|
|
to the current setting of \fILC_CTYPE ,\fP shall be
|
|
significant in comparisons. The behavior is undefined for a sort key
|
|
to which \fB-i\fP or \fB-n\fP also applies.
|
|
.TP 7
|
|
\fB-f\fP
|
|
Consider all lowercase characters that have uppercase equivalents,
|
|
according to the current setting of \fILC_CTYPE ,\fP to be
|
|
the uppercase equivalent for the purposes of comparison.
|
|
.TP 7
|
|
\fB-i\fP
|
|
Ignore all characters that are non-printable, according to the current
|
|
setting of \fILC_CTYPE .\fP
|
|
.TP 7
|
|
\fB-n\fP
|
|
Restrict the sort key to an initial numeric string, consisting of
|
|
optional <blank>s, optional minus sign, and zero or
|
|
more digits with an optional radix character and thousands separators
|
|
(as defined in the current locale), which shall be sorted by
|
|
arithmetic value. An empty digit string shall be treated as zero.
|
|
Leading zeros and signs on zeros shall not affect ordering.
|
|
.TP 7
|
|
\fB-r\fP
|
|
Reverse the sense of comparisons.
|
|
.sp
|
|
.LP
|
|
The treatment of field separators can be altered using the options:
|
|
.TP 7
|
|
\fB-b\fP
|
|
Ignore leading <blank>s when determining the starting and ending positions
|
|
of a restricted sort key. If the \fB-b\fP
|
|
option is specified before the first \fB-k\fP option, it shall be
|
|
applied to all \fB-k\fP options. Otherwise, the \fB-b\fP
|
|
option can be attached independently to each \fB-k\fP \fIfield_start\fP
|
|
or \fIfield_end\fP option-argument (see below).
|
|
.TP 7
|
|
\fB-t\ \fP \fIchar\fP
|
|
Use \fIchar\fP as the field separator character; \fIchar\fP shall
|
|
not be considered to be part of a field (although it can be
|
|
included in a sort key). Each occurrence of \fIchar\fP shall be significant
|
|
(for example, <\fIchar\fP><\fIchar\fP>
|
|
delimits an empty field). If \fB-t\fP is not specified, <blank>s shall
|
|
be used as default field separators; each maximal
|
|
non-empty sequence of <blank>s that follows a non- <blank> shall be
|
|
a field separator.
|
|
.sp
|
|
.LP
|
|
Sort keys can be specified using the options:
|
|
.TP 7
|
|
\fB-k\ \fP \fIkeydef\fP
|
|
The \fIkeydef\fP argument is a restricted sort key field definition.
|
|
The format of this definition is:
|
|
.sp
|
|
.RS
|
|
.nf
|
|
|
|
\fIfield_start\fP\fB[\fP\fItype\fP\fB][\fP\fB,\fP\fIfield_end\fP\fB[\fP\fItype\fP\fB]]\fP
|
|
.fi
|
|
.RE
|
|
.LP
|
|
where \fIfield_start\fP and \fIfield_end\fP define a key field restricted
|
|
to a portion of the line (see the EXTENDED
|
|
DESCRIPTION section), and \fItype\fP is a modifier from the list of
|
|
characters \fB'b'\fP , \fB'd'\fP , \fB'f'\fP ,
|
|
\fB'i'\fP , \fB'n'\fP , \fB'r'\fP . The \fB'b'\fP modifier shall behave
|
|
like the \fB-b\fP option, but shall apply only
|
|
to the \fIfield_start\fP or \fIfield_end\fP to which it is attached.
|
|
The other modifiers shall behave like the corresponding
|
|
options, but shall apply only to the key field to which they are attached;
|
|
they shall have this effect if specified with
|
|
\fIfield_start\fP, \fIfield_end\fP, or both. If any modifier is attached
|
|
to a \fIfield_start\fP or to a \fIfield_end\fP, no
|
|
option shall apply to either. Implementations shall support at least
|
|
nine occurrences of the \fB-k\fP option, which shall be
|
|
significant in command line order. If no \fB-k\fP option is specified,
|
|
a default sort key of the entire line shall be used.
|
|
.LP
|
|
When there are multiple key fields, later keys shall be compared only
|
|
after all earlier keys compare equal. Except when the
|
|
\fB-u\fP option is specified, lines that otherwise compare equal shall
|
|
be ordered as if none of the options \fB-d\fP, \fB-f\fP,
|
|
\fB-i\fP, \fB-n\fP, or \fB-k\fP were present (but with \fB-r\fP still
|
|
in effect, if it was specified) and with all bytes in the
|
|
lines significant to the comparison. The order in which lines that
|
|
still compare equal are written is unspecified.
|
|
.sp
|
|
.SH OPERANDS
|
|
.LP
|
|
The following operand shall be supported:
|
|
.TP 7
|
|
\fIfile\fP
|
|
A pathname of a file to be sorted, merged, or checked. If no \fIfile\fP
|
|
operands are specified, or if a \fIfile\fP operand is
|
|
\fB'-'\fP , the standard input shall be used.
|
|
.sp
|
|
.SH STDIN
|
|
.LP
|
|
The standard input shall be used only if no \fIfile\fP operands are
|
|
specified, or if a \fIfile\fP operand is \fB'-'\fP .
|
|
See the INPUT FILES section.
|
|
.SH INPUT FILES
|
|
.LP
|
|
The input files shall be text files, except that the \fIsort\fP utility
|
|
shall add a <newline> to the end of a file ending
|
|
with an incomplete last line.
|
|
.SH ENVIRONMENT VARIABLES
|
|
.LP
|
|
The following environment variables shall affect the execution of
|
|
\fIsort\fP:
|
|
.TP 7
|
|
\fILANG\fP
|
|
Provide a default value for the internationalization variables that
|
|
are unset or null. (See the Base Definitions volume of
|
|
IEEE\ Std\ 1003.1-2001, Section 8.2, Internationalization Variables
|
|
for
|
|
the precedence of internationalization variables used to determine
|
|
the values of locale categories.)
|
|
.TP 7
|
|
\fILC_ALL\fP
|
|
If set to a non-empty string value, override the values of all the
|
|
other internationalization variables.
|
|
.TP 7
|
|
\fILC_COLLATE\fP
|
|
.sp
|
|
Determine the locale for ordering rules.
|
|
.TP 7
|
|
\fILC_CTYPE\fP
|
|
Determine the locale for the interpretation of sequences of bytes
|
|
of text data as characters (for example, single-byte as
|
|
opposed to multi-byte characters in arguments and input files) and
|
|
the behavior of character classification for the \fB-b\fP,
|
|
\fB-d\fP, \fB-f\fP, \fB-i\fP, and \fB-n\fP options.
|
|
.TP 7
|
|
\fILC_MESSAGES\fP
|
|
Determine the locale that should be used to affect the format and
|
|
contents of diagnostic messages written to standard
|
|
error.
|
|
.TP 7
|
|
\fILC_NUMERIC\fP
|
|
.sp
|
|
Determine the locale for the definition of the radix character and
|
|
thousands separator for the \fB-n\fP option.
|
|
.TP 7
|
|
\fINLSPATH\fP
|
|
Determine the location of message catalogs for the processing of \fILC_MESSAGES
|
|
\&.\fP
|
|
.sp
|
|
.SH ASYNCHRONOUS EVENTS
|
|
.LP
|
|
Default.
|
|
.SH STDOUT
|
|
.LP
|
|
Unless the \fB-o\fP or \fB-c\fP options are in effect, the standard
|
|
output shall contain the sorted input.
|
|
.SH STDERR
|
|
.LP
|
|
The standard error shall be used for diagnostic messages. A warning
|
|
message about correcting an incomplete last line of an input
|
|
file may be generated, but need not affect the final exit status.
|
|
.SH OUTPUT FILES
|
|
.LP
|
|
If the \fB-o\fP option is in effect, the sorted input shall be written
|
|
to the file \fIoutput\fP.
|
|
.SH EXTENDED DESCRIPTION
|
|
.LP
|
|
The notation:
|
|
.sp
|
|
.RS
|
|
.nf
|
|
|
|
\fB-k\fP \fIfield_start\fP\fB[\fP\fItype\fP\fB][\fP\fB,\fP\fIfield_end\fP\fB[\fP\fItype\fP\fB]]\fP
|
|
.fi
|
|
.RE
|
|
.LP
|
|
shall define a key field that begins at \fIfield_start\fP and ends
|
|
at \fIfield_end\fP inclusive, unless \fIfield_start\fP
|
|
falls beyond the end of the line or after \fIfield_end\fP, in which
|
|
case the key field is empty. A missing \fIfield_end\fP shall
|
|
mean the last character of the line.
|
|
.LP
|
|
A field comprises a maximal sequence of non-separating characters
|
|
and, in the absence of option \fB-t\fP, any preceding field
|
|
separator.
|
|
.LP
|
|
The \fIfield_start\fP portion of the \fIkeydef\fP option-argument
|
|
shall have the form:
|
|
.sp
|
|
.RS
|
|
.nf
|
|
|
|
\fIfield_number\fP\fB[\fP\fB.\fP\fIfirst_character\fP\fB]\fP
|
|
.fi
|
|
.RE
|
|
.LP
|
|
Fields and characters within fields shall be numbered starting with
|
|
1. The \fIfield_number\fP and \fIfirst_character\fP
|
|
pieces, interpreted as positive decimal integers, shall specify the
|
|
first character to be used as part of a sort key. If
|
|
\fI\&.first_character\fP is omitted, it shall refer to the first character
|
|
of the field.
|
|
.LP
|
|
The \fIfield_end\fP portion of the \fIkeydef\fP option-argument shall
|
|
have the form:
|
|
.sp
|
|
.RS
|
|
.nf
|
|
|
|
\fIfield_number\fP\fB[\fP\fB.\fP\fIlast_character\fP\fB]\fP
|
|
.fi
|
|
.RE
|
|
.LP
|
|
The \fIfield_number\fP shall be as described above for \fIfield_start.\fP
|
|
The \fIlast_character\fP piece, interpreted as a
|
|
non-negative decimal integer, shall specify the last character to
|
|
be used as part of the sort key. If \fIlast_character\fP
|
|
evaluates to zero or \fI.last_character\fP is omitted, it shall refer
|
|
to the last character of the field specified by
|
|
\fIfield_number\fP.
|
|
.LP
|
|
If the \fB-b\fP option or \fBb\fP type modifier is in effect, characters
|
|
within a field shall be counted from the first non-
|
|
<blank> in the field. (This shall apply separately to \fIfirst_character\fP
|
|
and \fIlast_character\fP.)
|
|
.SH EXIT STATUS
|
|
.LP
|
|
The following exit values shall be returned:
|
|
.TP 7
|
|
\ 0
|
|
All input files were output successfully, or \fB-c\fP was specified
|
|
and the input file was correctly sorted.
|
|
.TP 7
|
|
\ 1
|
|
Under the \fB-c\fP option, the file was not ordered as specified,
|
|
or if the \fB-c\fP and \fB-u\fP options were both
|
|
specified, two input lines were found with equal keys.
|
|
.TP 7
|
|
>1
|
|
An error occurred.
|
|
.sp
|
|
.SH CONSEQUENCES OF ERRORS
|
|
.LP
|
|
Default.
|
|
.LP
|
|
\fIThe following sections are informative.\fP
|
|
.SH APPLICATION USAGE
|
|
.LP
|
|
The default value for \fB-t\fP, <blank>, has different properties
|
|
from, for example, \fB-t\fP "<space>". If a line
|
|
contains:
|
|
.sp
|
|
.RS
|
|
.nf
|
|
|
|
\fB<space><space>foo
|
|
\fP
|
|
.fi
|
|
.RE
|
|
.LP
|
|
the following treatment would occur with default separation as opposed
|
|
to specifically selecting a <space>:
|
|
.TS C
|
|
center; l l l.
|
|
\fBField\fP \fBDefault\fP \fB-t "<space>"\fP
|
|
1 <space><space>foo \fIempty\fP
|
|
2 \fIempty\fP \fIempty\fP
|
|
3 \fIempty\fP foo
|
|
.TE
|
|
.LP
|
|
The leading field separator itself is included in a field when \fB-t\fP
|
|
is not used. For example, this command returns an exit
|
|
status of zero, meaning the input was already sorted:
|
|
.sp
|
|
.RS
|
|
.nf
|
|
|
|
\fBsort -c -k 2 <<eof
|
|
y<tab>b
|
|
x<space>a
|
|
eof
|
|
\fP
|
|
.fi
|
|
.RE
|
|
.LP
|
|
(assuming that a <tab> precedes the <space> in the current collating
|
|
sequence). The field separator is not included
|
|
in a field when it is explicitly set via \fB-t\fP. This is historical
|
|
practice and allows usage such as:
|
|
.sp
|
|
.RS
|
|
.nf
|
|
|
|
\fBsort -t "|" -k 2n <<eof
|
|
Atlanta|425022|Georgia
|
|
Birmingham|284413|Alabama
|
|
Columbia|100385|South Carolina
|
|
eof
|
|
\fP
|
|
.fi
|
|
.RE
|
|
.LP
|
|
where the second field can be correctly sorted numerically without
|
|
regard to the non-numeric field separator.
|
|
.LP
|
|
The wording in the OPTIONS section clarifies that the \fB-b\fP, \fB-d\fP,
|
|
\fB-f\fP, \fB-i\fP, \fB-n\fP, and \fB-r\fP
|
|
options have to come before the first sort key specified if they are
|
|
intended to apply to all specified keys. The way it is
|
|
described in this volume of IEEE\ Std\ 1003.1-2001 matches historical
|
|
practice, not historical documentation. The results
|
|
are unspecified if these options are specified after a \fB-k\fP option.
|
|
.LP
|
|
The \fB-f\fP option might not work as expected in locales where there
|
|
is not a one-to-one mapping between an uppercase and a
|
|
lowercase letter.
|
|
.SH EXAMPLES
|
|
.IP " 1." 4
|
|
The following command sorts the contents of \fBinfile\fP with the
|
|
second field as the sort key:
|
|
.sp
|
|
.RS
|
|
.nf
|
|
|
|
\fBsort -k 2,2 infile
|
|
\fP
|
|
.fi
|
|
.RE
|
|
.LP
|
|
.IP " 2." 4
|
|
The following command sorts, in reverse order, the contents of \fBinfile1\fP
|
|
and \fBinfile2\fP, placing the output in
|
|
\fBoutfile\fP and using the second character of the second field as
|
|
the sort key (assuming that the first character of the second
|
|
field is the field separator):
|
|
.sp
|
|
.RS
|
|
.nf
|
|
|
|
\fBsort -r -o outfile -k 2.2,2.2 infile1 infile2
|
|
\fP
|
|
.fi
|
|
.RE
|
|
.LP
|
|
.IP " 3." 4
|
|
The following command sorts the contents of \fBinfile1\fP and \fBinfile2\fP
|
|
using the second non- <blank> of the second
|
|
field as the sort key:
|
|
.sp
|
|
.RS
|
|
.nf
|
|
|
|
\fBsort -k 2.2b,2.2b infile1 infile2
|
|
\fP
|
|
.fi
|
|
.RE
|
|
.LP
|
|
.IP " 4." 4
|
|
The following command prints the System\ V password file (user database)
|
|
sorted by the numeric user ID (the third
|
|
colon-separated field):
|
|
.sp
|
|
.RS
|
|
.nf
|
|
|
|
\fBsort -t : -k 3,3n /etc/passwd
|
|
\fP
|
|
.fi
|
|
.RE
|
|
.LP
|
|
.IP " 5." 4
|
|
The following command prints the lines of the already sorted file
|
|
\fBinfile\fP, suppressing all but one occurrence of lines
|
|
having the same third field:
|
|
.sp
|
|
.RS
|
|
.nf
|
|
|
|
\fBsort -um -k 3.1,3.0 infile
|
|
\fP
|
|
.fi
|
|
.RE
|
|
.LP
|
|
.SH RATIONALE
|
|
.LP
|
|
Examples in some historical documentation state that options \fB-um\fP
|
|
with one input file keep the first in each set of lines
|
|
with equal keys. This behavior was deemed to be an implementation
|
|
artifact and was not standardized.
|
|
.LP
|
|
The \fB-z\fP option was omitted; it is not standard practice on most
|
|
systems and is inconsistent with using \fIsort\fP to sort
|
|
several files individually and then merge them together. The text
|
|
concerning \fB-z\fP in historical documentation appeared to
|
|
require implementations to determine the proper buffer length during
|
|
the sort phase of operation, but not during the merge.
|
|
.LP
|
|
The \fB-y\fP option was omitted because of non-portability. The \fB-M\fP
|
|
option, present in System V, was omitted because of
|
|
non-portability in international usage.
|
|
.LP
|
|
An undocumented \fB-T\fP option exists in some implementations. It
|
|
is used to specify a directory for intermediate files.
|
|
Implementations are encouraged to support the use of the \fITMPDIR\fP
|
|
environment variable instead of adding an option to support
|
|
this functionality.
|
|
.LP
|
|
The \fB-k\fP option was added to satisfy two objections. First, the
|
|
zero-based counting used by \fIsort\fP is not consistent
|
|
with other utility conventions. Second, it did not meet syntax guideline
|
|
requirements.
|
|
.LP
|
|
Historical documentation indicates that "setting \fB-n\fP implies
|
|
\fB-b\fP". The description of \fB-n\fP already states
|
|
that optional leading <blank>s are tolerated in doing the comparison.
|
|
If \fB-b\fP is enabled, rather than implied, by
|
|
\fB-n\fP, this has unusual side effects. When a character offset is
|
|
used in a column of numbers (for example, to sort modulo 100),
|
|
that offset is measured relative to the most significant digit, not
|
|
to the column. Based upon a recommendation from the author of
|
|
the original \fIsort\fP utility, the \fB-b\fP implication has been
|
|
omitted from this volume of IEEE\ Std\ 1003.1-2001,
|
|
and an application wishing to achieve the previously mentioned side
|
|
effects has to code the \fB-b\fP flag explicitly.
|
|
.SH FUTURE DIRECTIONS
|
|
.LP
|
|
None.
|
|
.SH SEE ALSO
|
|
.LP
|
|
\fIcomm\fP , \fIjoin\fP , \fIuniq\fP , the System
|
|
Interfaces volume of IEEE\ Std\ 1003.1-2001, \fItoupper\fP()
|
|
.SH COPYRIGHT
|
|
Portions of this text are reprinted and reproduced in electronic form
|
|
from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology
|
|
-- Portable Operating System Interface (POSIX), The Open Group Base
|
|
Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of
|
|
Electrical and Electronics Engineers, Inc and The Open Group. In the
|
|
event of any discrepancy between this version and the original IEEE and
|
|
The Open Group Standard, the original IEEE and The Open Group Standard
|
|
is the referee document. The original Standard can be obtained online at
|
|
http://www.opengroup.org/unix/online.html .
|