.\" Copyright (c) 2001-2003 The Open Group, All Rights Reserved
.TH "JOIN" 1P 2003 "IEEE/The Open Group" "POSIX Programmer's Manual"
.\" join
.SH PROLOG
This manual page is part of the POSIX Programmer's Manual.
The Linux implementation of this interface may differ (consult
the corresponding Linux manual page for details of Linux behavior),
or the interface may not be implemented on Linux.
.SH NAME
join \- relational database operator
.SH SYNOPSIS
.LP
\fBjoin\fP \fB[\fP\fB-a\fP \fIfile_number\fP \fB| -v\fP \fIfile_number\fP\fB][\fP\fB-e\fP
\fIstring\fP\fB][\fP\fB-o\fP \fIlist\fP\fB][\fP\fB-t\fP \fIchar\fP\fB]
.br
\fP \fB\ \ \ \ \ \ \fP \fB[\fP\fB-1\fP \fIfield\fP\fB][\fP\fB-2\fP
\fIfield\fP\fB]\fP
\fIfile1 file2\fP
.SH DESCRIPTION
.LP
The \fIjoin\fP utility shall perform an equality join on the files
\fIfile1\fP and \fIfile2\fP. The joined files shall be
written to the standard output.
.LP
The join field is a field in each file on which the files are compared.
The \fIjoin\fP utility shall write one line in the
output for each pair of lines in \fIfile1\fP and \fIfile2\fP that
have identical join fields. The output line by default shall
consist of the join field, then the remaining fields from \fIfile1\fP,
then the remaining fields from \fIfile2\fP. This format
can be changed by using the \fB-o\fP option (see below). The \fB-a\fP
option can be used to add unmatched lines to the output.
The \fB-v\fP option can be used to output only unmatched lines.
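.LP
As an illustration only, the default behavior described above can be
sketched as follows. The file names left and right and their contents are
hypothetical, and \fImktemp\fP is assumed to be available (it is not
specified by IEEE\ Std\ 1003.1-2001). \fILC_ALL\fP is set to C so that the
collating sequence is predictable.
.sp
.RS
.nf
```shell
# Work in a scratch directory so that no existing files are touched.
cd "$(mktemp -d)"

# Both files are already ordered on their first field.
cat > left <<EOF
apple red
banana yellow
cherry dark
EOF
cat > right <<EOF
apple 3
cherry 7
date 9
EOF

# Write one line for each pair of lines with identical first fields.
LC_ALL=C join left right
```
.fi
.RE
.LP
Under these assumptions the command writes "apple red 3" and
"cherry dark 7"; the banana and date lines have no partner and are omitted.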
.LP
The files \fIfile1\fP and \fIfile2\fP shall be ordered in the collating
sequence of \fIsort\fP \fB-b\fP on the fields on which they shall
be joined, by default the first in each line.
All selected output shall be written in the same collating sequence.
.LP
The default input field separators shall be <blank>s. In this case,
multiple separators shall count as one field
separator, and leading separators shall be ignored. The default output
field separator shall be a <space>.
.LP
The field separator and collating sequence can be changed by using
the \fB-t\fP option (see below).
.LP
If the same key appears more than once in either file, all combinations
of the set of remaining fields in \fIfile1\fP and the
set of remaining fields in \fIfile2\fP are output in the order of
the lines encountered.
.LP
If the input files are not in the appropriate collating sequence,
the results are unspecified.
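.LP
Because unsorted input gives unspecified results, a common precaution is to
sort both files on the join field before joining them. The following is a
sketch under stated assumptions: the file names stock and price are
hypothetical, \fImktemp\fP is assumed to be available, and the same locale
(here C) is used for \fIsort\fP and \fIjoin\fP so that both agree on the
collating sequence.
.sp
.RS
.nf
```shell
# Work in a scratch directory so that no existing files are touched.
cd "$(mktemp -d)"

# Neither file is in order on its first field.
cat > stock <<EOF
pear 4
apple 9
fig 2
EOF
cat > price <<EOF
fig 30
pear 12
apple 5
EOF

# Sort each file on field 1, ignoring leading blanks in the key.
LC_ALL=C sort -k 1b,1 stock > stock.sorted
LC_ALL=C sort -k 1b,1 price > price.sorted

# Join the sorted copies on their first field.
LC_ALL=C join stock.sorted price.sorted
```
.fi
.RE
.LP
With these inputs the result is "apple 9 5", "fig 2 30" and "pear 4 12",
in that order.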
.SH OPTIONS
.LP
The \fIjoin\fP utility shall conform to the Base Definitions volume
of IEEE\ Std\ 1003.1-2001, Section 12.2, Utility Syntax Guidelines.
.LP
The following options shall be supported:
.TP 7
\fB-a\ \fP \fIfile_number\fP
.sp
Produce a line for each unpairable line in file \fIfile_number\fP,
where \fIfile_number\fP is 1 or 2, in addition to the default
output. If both \fB-a\fP1 and \fB-a\fP2 are specified, all unpairable
lines shall be output.
.TP 7
\fB-e\ \fP \fIstring\fP
Replace empty output fields in the list selected by \fB-o\fP with
the string \fIstring\fP.
.TP 7
\fB-o\ \fP \fIlist\fP
Construct the output line to comprise the fields specified in \fIlist\fP,
each element of which shall have one of the
following two forms:
.RS
.IP " 1." 4
\fIfile_number.field\fP, where \fIfile_number\fP is a file number
and \fIfield\fP is a decimal integer field number
.LP
.IP " 2." 4
0 (zero), representing the join field
.LP
.RE
.LP
The elements of \fIlist\fP shall be either comma-separated or <blank>-separated,
as specified in Guideline 8 of the Base
Definitions volume of IEEE\ Std\ 1003.1-2001, Section 12.2, Utility
Syntax Guidelines. The fields specified by \fIlist\fP shall be written for
all selected output lines. Fields selected by \fIlist\fP
that do not appear in the input shall be treated as empty output fields.
(See the \fB-e\fP option.) Only specifically requested
fields shall be written. The application shall ensure that \fIlist\fP
is a single command line argument.
.TP 7
\fB-t\ \fP \fIchar\fP
Use character \fIchar\fP as a separator, for both input and output.
Every appearance of \fIchar\fP in a line shall be
significant. When this option is specified, the collating sequence
shall be the same as \fIsort\fP without the \fB-b\fP option.
.TP 7
\fB-v\ \fP \fIfile_number\fP
.sp
Instead of the default output, produce a line only for each unpairable
line in \fIfile_number\fP, where \fIfile_number\fP is 1 or
2. If both \fB-v\fP1 and \fB-v\fP2 are specified, all unpairable lines
shall be output.
.TP 7
\fB-1\ \fP \fIfield\fP
Join on the \fIfield\fPth field of file 1. Fields are decimal integers
starting with 1.
.TP 7
\fB-2\ \fP \fIfield\fP
Join on the \fIfield\fPth field of file 2. Fields are decimal integers
starting with 1.
.sp
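.LP
The options above can be combined. The following sketch is an illustration
under stated assumptions, not a normative example: the file names staff and
desks, their contents, and the replacement string NONE are hypothetical, and
\fImktemp\fP is assumed to be available. The file staff is ordered on its
second field and desks on its first.
.sp
.RS
.nf
```shell
# Work in a scratch directory so that no existing files are touched.
cd "$(mktemp -d)"

cat > staff <<EOF
101 ann
102 bob
104 dee
EOF
cat > desks <<EOF
ann north
cat south
dee east
EOF

# Join field 2 of staff with field 1 of desks.
LC_ALL=C join -1 2 -2 1 staff desks

# Write only the lines of staff that have no partner in desks.
LC_ALL=C join -1 2 -2 1 -v 1 staff desks

# Keep unpairable lines from both files, select the output fields
# explicitly, and fill empty output fields with a placeholder.
LC_ALL=C join -1 2 -2 1 -a 1 -a 2 -e NONE -o 0,1.1,2.2 staff desks
```
.fi
.RE
.LP
The first command writes "ann 101 north" and "dee 104 east", the second
writes "bob 102", and the third writes one line for each of ann, bob, cat
and dee, with NONE in place of the missing field.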
.SH OPERANDS
.LP
The following operands shall be supported:
.TP 7
\fIfile1\fP,\ \fIfile2\fP
A pathname of a file to be joined. If either of the \fIfile1\fP or
\fIfile2\fP operands is \fB'-'\fP, the standard input
shall be used in its place.
.sp
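.LP
As a sketch of the '-' operand, one input can be sorted in a pipeline and
read by \fIjoin\fP from the standard input. The file names colors and counts
are hypothetical, and \fImktemp\fP is assumed to be available.
.sp
.RS
.nf
```shell
# Work in a scratch directory so that no existing files are touched.
cd "$(mktemp -d)"

cat > colors <<EOF
apple red
banana yellow
EOF
cat > counts <<EOF
banana 6
apple 3
EOF

# Sort counts on the fly; the "-" operand makes join read it from stdin.
LC_ALL=C sort -k 1b,1 counts | LC_ALL=C join colors -
```
.fi
.RE
.LP
With these inputs the pipeline writes "apple red 3" and "banana yellow 6".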
.SH STDIN
.LP
The standard input shall be used only if the \fIfile1\fP or \fIfile2\fP
operand is \fB'-'\fP. See the INPUT FILES
section.
.SH INPUT FILES
.LP
The input files shall be text files.
.SH ENVIRONMENT VARIABLES
.LP
The following environment variables shall affect the execution of
\fIjoin\fP:
.TP 7
\fILANG\fP
Provide a default value for the internationalization variables that
are unset or null. (See the Base Definitions volume of
IEEE\ Std\ 1003.1-2001, Section 8.2, Internationalization Variables
for the precedence of internationalization variables used to determine
the values of locale categories.)
.TP 7
\fILC_ALL\fP
If set to a non-empty string value, override the values of all the
other internationalization variables.
.TP 7
\fILC_COLLATE\fP
.sp
Determine the locale of the collating sequence \fIjoin\fP expects
to have been used when the input files were sorted.
.TP 7
\fILC_CTYPE\fP
Determine the locale for the interpretation of sequences of bytes
of text data as characters (for example, single-byte as
opposed to multi-byte characters in arguments and input files).
.TP 7
\fILC_MESSAGES\fP
Determine the locale that should be used to affect the format and
contents of diagnostic messages written to standard
error.
.TP 7
\fINLSPATH\fP
Determine the location of message catalogs for the processing of \fILC_MESSAGES\fP.
.sp
.SH ASYNCHRONOUS EVENTS
.LP
Default.
.SH STDOUT
.LP
The \fIjoin\fP utility output shall be a concatenation of selected
character fields. When the \fB-o\fP option is not
specified, the output shall be:
.sp
.RS
.nf

\fB"%s%s%s\\n", <\fP\fIjoin field\fP\fB>, <\fP\fIother file1 fields\fP\fB>,
<\fP\fIother file2 fields\fP\fB>
\fP
.fi
.RE
.LP
If the join field is not the first field in a file, the <\fIother\ file\ fields\fP>
for that file shall be:
.sp
.RS
.nf

\fB<\fP\fIfields preceding join field\fP\fB>, <\fP\fIfields following join field\fP\fB>
\fP
.fi
.RE
.LP
When the \fB-o\fP option is specified, the output format shall be:
.sp
.RS
.nf

\fB"%s\\n", <\fP\fIconcatenation of fields\fP\fB>
\fP
.fi
.RE
.LP
where the concatenation of fields is described by the \fB-o\fP option,
above.
.LP
For either format, each field (except the last) shall be written with
its trailing separator character. If the separator is the
default (<blank>s), a single <space> shall be written after each
field (except the last).
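.LP
The field order described above can be seen when the join field is not the
first field of a file. The following is an illustrative sketch only: the file
names people and rooms, their contents, and the choice of a colon as the
separator are hypothetical, and \fImktemp\fP is assumed to be available.
.sp
.RS
.nf
```shell
# Work in a scratch directory so that no existing files are touched.
cd "$(mktemp -d)"

# people is ordered on its second field, rooms on its first.
cat > people <<EOF
ann:101:sales
bob:102:ops
EOF
cat > rooms <<EOF
101:north
102:south
EOF

# The join field comes first, then the fields that preceded and followed
# it in people, then the remaining field of rooms.
LC_ALL=C join -t : -1 2 -2 1 people rooms
```
.fi
.RE
.LP
With these inputs the command writes "101:ann:sales:north" and
"102:bob:ops:south", using the colon given to \fB-t\fP as the output
separator.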
.SH STDERR
.LP
The standard error shall be used only for diagnostic messages.
.SH OUTPUT FILES
.LP
None.
.SH EXTENDED DESCRIPTION
.LP
None.
.SH EXIT STATUS
.LP
The following exit values shall be returned:
.TP 7
\ 0
All input files were output successfully.
.TP 7
>0
An error occurred.
.sp
.SH CONSEQUENCES OF ERRORS
.LP
Default.
.LP
\fIThe following sections are informative.\fP
.SH APPLICATION USAGE
.LP
Pathnames consisting of numeric digits or of the form \fIstring.string\fP
should not be specified directly following the
\fB-o\fP list.
.SH EXAMPLES
.LP
The \fB-o\fP 0 field essentially selects the union of the join fields.
For example, given file \fBphone\fP:
.sp
.RS
.nf

\fB!Name           Phone Number
Don             +1 123-456-7890
Hal             +1 234-567-8901
Yasushi         +2 345-678-9012
\fP
.fi
.RE
.LP
and file \fBfax\fP:
.sp
.RS
.nf

\fB!Name           Fax Number
Don             +1 123-456-7899
Keith           +1 456-789-0122
Yasushi         +2 345-678-9011
\fP
.fi
.RE
.LP
(where the large expanses of white space are meant to each represent
a single <tab>), the command:
.sp
.RS
.nf

\fBjoin -t "<tab>" -a 1 -a 2 -e '(unknown)' -o 0,1.2,2.2 phone fax
\fP
.fi
.RE
.LP
would produce:
.sp
.RS
.nf

\fB!Name           Phone Number        Fax Number
Don             +1 123-456-7890     +1 123-456-7899
Hal             +1 234-567-8901     (unknown)
Keith           (unknown)           +1 456-789-0122
Yasushi         +2 345-678-9012     +2 345-678-9011
\fP
.fi
.RE
.LP
Multiple instances of the same key will produce combinatorial results.
The following:
.sp
.RS
.nf

\fBfa:
    a x
    a y
    a z
fb:
    a p
\fP
.fi
.RE
.LP
will produce:
.sp
.RS
.nf

\fBa x p
a y p
a z p
\fP
.fi
.RE
.LP
And the following:
.sp
.RS
.nf

\fBfa:
    a b c
    a d e
fb:
    a w x
    a y z
    a o p
\fP
.fi
.RE
.LP
will produce:
.sp
.RS
.nf

\fBa b c w x
a b c y z
a b c o p
a d e w x
a d e y z
a d e o p
\fP
.fi
.RE
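.LP
The second combinatorial case above can be reproduced with a short script.
This is a sketch that assumes \fImktemp\fP is available and that the files
are joined with no options; the file names fa and fb are taken from the
example above.
.sp
.RS
.nf
```shell
# Work in a scratch directory so that no existing files are touched.
cd "$(mktemp -d)"

cat > fa <<EOF
a b c
a d e
EOF
cat > fb <<EOF
a w x
a y z
a o p
EOF

# Every line of fa pairs with every line of fb that has the same key,
# giving 2 x 3 = 6 output lines in the order the lines are encountered.
LC_ALL=C join fa fb
```
.fi
.RE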
.SH RATIONALE
.LP
The \fB-e\fP option is only effective when used with \fB-o\fP because,
unless specific fields are identified using \fB-o\fP,
\fIjoin\fP is not aware of what fields might be empty. The exception
to this is the join field, but identifying an empty join
field with the \fB-e\fP string is not historical practice and some
scripts might break if this were changed.
.LP
The 0 field in the \fB-o\fP list was adopted from the Tenth Edition
version of \fIjoin\fP to satisfy international objections
that the \fIjoin\fP in the base documents does not support the "full
join" or "outer join" described in relational database
literature. Although it has been possible to include a join field
in the output (by default, or by field number using \fB-o\fP),
the join field could not be included for an unpaired line selected
by \fB-a\fP. The \fB-o\fP 0 field essentially selects the
union of the join fields.
.LP
This sort of outer join was not possible with the \fIjoin\fP commands
in the base documents. The \fB-o\fP 0 field was chosen
because it is an upwards-compatible change for applications. An alternative
was considered: have the join field represent the union
of the fields in the files (where they are identical for matched lines,
and one or both are null for unmatched lines). This was not
adopted because it would break some historical applications.
.LP
The ability to specify \fIfile2\fP as \fB-\fP is not historical practice;
it was added for completeness.
.LP
The \fB-v\fP option is not historical practice, but was considered
necessary because it permitted the writing of \fIonly\fP
those lines that do not match on the join field, as opposed to the
\fB-a\fP option, which prints both lines that do and do not
match. This additional facility is parallel with the \fB-v\fP option
of \fIgrep\fP.
.LP
Some historical implementations have been encountered where a blank
line in one of the input files was considered to be the end
of the file; the description in this volume of IEEE\ Std\ 1003.1-2001
does not cite this as an allowable case.
.SH FUTURE DIRECTIONS
.LP
None.
.SH SEE ALSO
.LP
\fIawk\fP, \fIcomm\fP, \fIsort\fP, \fIuniq\fP
.SH COPYRIGHT
Portions of this text are reprinted and reproduced in electronic form
from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology
-- Portable Operating System Interface (POSIX), The Open Group Base
Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of
Electrical and Electronics Engineers, Inc and The Open Group. In the
event of any discrepancy between this version and the original IEEE and
The Open Group Standard, the original IEEE and The Open Group Standard
is the referee document. The original Standard can be obtained online at
http://www.opengroup.org/unix/online.html .