mirror of https://github.com/mkerrisk/man-pages
select_tut.2: Many parts tidied and rewritten
Remove some redundant text, clarify various pieces, tidy example code, etc.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
parent b0cac4c7aa
commit 7ce9ffda47
|
@ -25,8 +25,9 @@
|
|||
.\" Modified 5 June 2002, Michael Kerrisk <mtk.manpages@gmail.com>
|
||||
.\" 2006-05-13, mtk, removed much material that is redundant with select.2
|
||||
.\" various other changes
|
||||
.\" 2008-01-26, mtk, substantial changes and rewrites
|
||||
.\"
|
||||
.TH SELECT_TUT 2 2008-12-05 "Linux" "Linux Programmer's Manual"
|
||||
.TH SELECT_TUT 2 2009-01-26 "Linux" "Linux Programmer's Manual"
|
||||
.SH NAME
|
||||
select, pselect, FD_CLR, FD_ISSET, FD_SET, FD_ZERO \-
|
||||
synchronous I/O multiplexing
|
||||
|
@ -73,56 +74,35 @@ _POSIX_C_SOURCE\ >=\ 200112L || _XOPEN_SOURCE\ >=\ 600
|
|||
.BR select ()
|
||||
(or
|
||||
.BR pselect ())
|
||||
is the pivot function of
|
||||
most C programs that
|
||||
handle more than one simultaneous file descriptor (or socket handle)
|
||||
in an efficient
|
||||
manner.
|
||||
Its principal arguments are three arrays of file descriptors:
|
||||
is used to efficiently monitor multiple file descriptors,
|
||||
to see if any of them is, or becomes, "ready";
|
||||
that is, to see whether I/O becomes possible,
|
||||
or an "exceptional condition" has occurred on any of the descriptors.
|
||||
|
||||
Its principal arguments are three "sets" of file descriptors:
|
||||
\fIreadfds\fP, \fIwritefds\fP, and \fIexceptfds\fP.
|
||||
The way that
|
||||
.BR select ()
|
||||
is usually used is to block while waiting for a "change of
|
||||
status" on one or more of the file descriptors.
|
||||
A "change of status" is
|
||||
when more characters become available from the file descriptor, \fIor\fP
|
||||
when space becomes available within the kernel's internal buffers for
|
||||
more to be written to the file descriptor, \fIor\fP when a file
|
||||
descriptor goes into error (in the case of a socket or pipe this is
|
||||
when the other end of the connection is closed).
|
||||
|
||||
In summary,
|
||||
.BR select ()
|
||||
just watches multiple file descriptors,
|
||||
and is the standard Unix call to do so.
|
||||
|
||||
The arrays of file descriptors are called \fIfile descriptor sets\fP.
|
||||
Each set is declared as type \fBfd_set\fP, and its contents can be
|
||||
altered with the macros
|
||||
Each set is declared as type
|
||||
.IR fd_set ,
|
||||
and its contents can be manipulated with the macros
|
||||
.BR FD_CLR (),
|
||||
.BR FD_ISSET (),
|
||||
.BR FD_SET (),
|
||||
and
|
||||
.BR FD_ZERO ().
|
||||
.BR FD_ZERO ()
|
||||
is usually the first function to be used on
|
||||
a newly declared set.
|
||||
Thereafter, the individual file descriptors that
|
||||
you are interested in can be added one by one with
|
||||
.BR FD_SET ().
|
||||
A newly declared set should first be cleared using
|
||||
.BR FD_ZERO ().
|
||||
.BR select ()
|
||||
modifies the contents of the sets according to the rules
|
||||
described below; after calling
|
||||
.BR select ()
|
||||
you can test if your file
|
||||
descriptor is still present in the set with the
|
||||
you can test if a file descriptor is still present in a set with the
|
||||
.BR FD_ISSET ()
|
||||
macro.
|
||||
.BR FD_ISSET ()
|
||||
returns non-zero if the descriptor is present and zero if
|
||||
it is not.
|
||||
returns non-zero if a specified file descriptor is present in a set
|
||||
and zero if it is not.
|
||||
.BR FD_CLR ()
|
||||
removes a file descriptor from the set.
|
||||
removes a file descriptor from a set.
|
||||
.SS Arguments
|
||||
.TP
|
||||
\fIreadfds\fP
|
||||
|
@ -132,11 +112,7 @@ After
|
|||
.BR select ()
|
||||
has returned, \fIreadfds\fP will be
|
||||
cleared of all file descriptors except for those that
|
||||
are immediately available for reading with a
|
||||
.BR recv (2)
|
||||
(for sockets) or
|
||||
.BR read (2)
|
||||
(for pipes, files, and sockets) call.
|
||||
are immediately available for reading.
|
||||
.TP
|
||||
\fIwritefds\fP
|
||||
This set is watched to see if there is space to write data to any of
|
||||
|
@ -145,63 +121,48 @@ After
|
|||
.BR select ()
|
||||
has returned, \fIwritefds\fP will be
|
||||
cleared of all file descriptors except for those that
|
||||
are immediately available for writing with a
|
||||
.BR send (2)
|
||||
(for sockets) or
|
||||
.BR write (2)
|
||||
(for pipes, files, and sockets) call.
|
||||
are immediately available for writing.
|
||||
.TP
|
||||
\fIexceptfds\fP
|
||||
This set is watched for exceptions or errors on any of the file
|
||||
descriptors.
|
||||
However, that is actually just a rumor.
|
||||
How you use
|
||||
\fIexceptfds\fP is to watch for \fIout-of-band\fP (OOB) data.
|
||||
OOB data
|
||||
is data sent on a socket using the \fBMSG_OOB\fP flag, and hence
|
||||
\fIexceptfds\fP only really applies to sockets.
|
||||
This set is watched for "exceptional conditions".
|
||||
In practice, only one such exceptional condition is common:
|
||||
the availability of \fIout-of-band\fP (OOB) data for reading
|
||||
from a TCP socket.
|
||||
See
|
||||
.BR recv (2)
|
||||
.BR recv (2),
|
||||
.BR send (2),
|
||||
and
|
||||
.BR send (2)
|
||||
about this.
|
||||
.BR tcp (7)
|
||||
for more details about OOB data.
|
||||
(One other less common case where
|
||||
.BR select (2)
|
||||
indicates an exceptional condition occurs with pseudo-terminals
|
||||
in packet mode; see
|
||||
.BR tty_ioctl (4).)
|
||||
After
|
||||
.BR select ()
|
||||
has returned,
|
||||
\fIexceptfds\fP will be cleared of all file descriptors except for those
|
||||
that are available for reading OOB data.
|
||||
You can only ever
|
||||
read one byte of OOB data though (which is done with
|
||||
.BR recv (2)),
|
||||
and
|
||||
writing OOB data (done with
|
||||
.BR send (2))
|
||||
can be done at any time and will
|
||||
not block.
|
||||
Hence there is no need for a fourth set to check if a socket
|
||||
is available for writing OOB data.
|
||||
for which an exceptional condition has occurred.
|
||||
.TP
|
||||
\fInfds\fP
|
||||
This is an integer one more than the maximum of any file descriptor in
|
||||
any of the sets.
|
||||
In other words, while you are busy adding file descriptors
|
||||
to your sets, you must calculate the maximum integer value of all of
|
||||
them, then increment this value by one, and then pass this as \fInfds\fP to
|
||||
.BR select ().
|
||||
In other words, while adding file descriptors to each of the sets,
|
||||
you must calculate the maximum integer value of all of them,
|
||||
then increment this value by one, and then pass this as \fInfds\fP.
|
||||
.TP
|
||||
\fIutimeout\fP
|
||||
This is the longest time
|
||||
.BR select ()
|
||||
may wait before returning, even
|
||||
if nothing interesting happened.
|
||||
If this value is passed as NULL,
|
||||
then
|
||||
may wait before returning, even if nothing interesting happened.
|
||||
If this value is passed as NULL, then
|
||||
.BR select ()
|
||||
blocks indefinitely waiting for an event.
|
||||
blocks indefinitely waiting for a file descriptor to become ready.
|
||||
\fIutimeout\fP can be set to zero seconds, which causes
|
||||
.BR select ()
|
||||
to
|
||||
return immediately.
|
||||
to return immediately, with information about the readiness
|
||||
of file descriptors at the time of the call.
|
||||
The structure \fIstruct timeval\fP is defined as:
|
||||
.IP
|
||||
.in +4n
|
||||
|
@ -214,7 +175,12 @@ struct timeval {
|
|||
.in
|
||||
.TP
|
||||
\fIntimeout\fP
|
||||
This argument has the same meaning as \fIutimeout\fP but \fIstruct timespec\fP
|
||||
This argument for
|
||||
.BR pselect ()
|
||||
has the same meaning as
|
||||
.IR utimeout ,
|
||||
but
|
||||
.I "struct timespec"
|
||||
has nanosecond precision as follows:
|
||||
.IP
|
||||
.in +4n
|
||||
|
@ -227,52 +193,52 @@ struct timespec {
|
|||
.in
|
||||
.TP
|
||||
\fIsigmask\fP
|
||||
This argument holds a set of signals to allow while performing a
|
||||
This argument holds a set of signals that the kernel should unblock
|
||||
(i.e., remove from the signal mask of the calling thread),
|
||||
while the caller is blocked inside the
|
||||
.BR pselect ()
|
||||
call (see
|
||||
.BR sigaddset (3)
|
||||
and
|
||||
.BR sigprocmask (2)).
|
||||
It can be passed
|
||||
as NULL, in which case it does not modify the set of allowed signals on
|
||||
It may be NULL,
|
||||
in which case the call does not modify the signal mask on
|
||||
entry and exit to the function.
|
||||
It will then behave just like
|
||||
In this case,
|
||||
.BR pselect ()
|
||||
will then behave just like
|
||||
.BR select ().
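The arguments described above can be sketched as follows. This is only a minimal illustration, not code from the manual page: readable_now() is a hypothetical helper name, and it assumes a single descriptor watched for input. It clears the set with FD_ZERO(), adds the descriptor with FD_SET(), passes the highest descriptor plus one as nfds, and uses a zero timeout so that select() returns immediately.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper (not part of the manual page): check whether a
   single descriptor is readable right now, without blocking.
   Returns 1 if readable, 0 if not, -1 on error. */
static int
readable_now(int fd)
{
    fd_set rd;
    struct timeval tv;
    int nfds, r;

    FD_ZERO(&rd);           /* always clear a newly declared set */
    FD_SET(fd, &rd);        /* watch fd for input */
    nfds = fd + 1;          /* highest descriptor in any set, plus one */

    tv.tv_sec = 0;          /* zero timeout: return immediately */
    tv.tv_usec = 0;

    r = select(nfds, &rd, NULL, NULL, &tv);
    if (r == -1)
        return -1;

    /* after the call, only ready descriptors remain in the set */
    return FD_ISSET(fd, &rd) ? 1 : 0;
}
```

Passing NULL instead of &tv would make the same call block until the descriptor becomes ready.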
|
||||
.SS Combining Signal and Data Events
|
||||
.BR pselect ()
|
||||
must be used if you are waiting for a signal as well as
|
||||
data from a file descriptor.
|
||||
Programs that receive signals as events
|
||||
is useful if you are waiting for a signal as well as
|
||||
for file descriptor(s) to become ready for I/O.
|
||||
Programs that receive signals
|
||||
normally use the signal handler only to raise a global flag.
|
||||
The global
|
||||
flag will indicate that the event must be processed in the main loop of
|
||||
the program.
|
||||
The global flag will indicate that the event must be processed
|
||||
in the main loop of the program.
|
||||
A signal will cause the
|
||||
.BR select ()
|
||||
(or
|
||||
.BR pselect ())
|
||||
call to return with \fIerrno\fP set to \fBEINTR\fP.
|
||||
This behavior is
|
||||
essential so that signals can be processed in the main loop of the
|
||||
program, otherwise
|
||||
This behavior is essential so that signals can be processed
|
||||
in the main loop of the program, otherwise
|
||||
.BR select ()
|
||||
would block indefinitely.
|
||||
Now, somewhere
|
||||
in the main loop will be a conditional to check the global flag.
|
||||
So we
|
||||
must ask: what if a signal arrives after the conditional, but before the
|
||||
So we must ask:
|
||||
what if a signal arrives after the conditional, but before the
|
||||
.BR select ()
|
||||
call?
|
||||
The answer is that
|
||||
.BR select ()
|
||||
would block
|
||||
indefinitely, even though an event is actually pending.
|
||||
This race
|
||||
condition is solved by the
|
||||
would block indefinitely, even though an event is actually pending.
|
||||
This race condition is solved by the
|
||||
.BR pselect ()
|
||||
call.
|
||||
This call can be used to
|
||||
mask out signals that are not to be received except within the
|
||||
This call can be used to set the signal mask to a set of signals
|
||||
that are only to be received within the
|
||||
.BR pselect ()
|
||||
call.
|
||||
For instance, let us say that the event in question
|
||||
|
@ -282,9 +248,10 @@ would block \fBSIGCHLD\fP using
|
|||
.BR sigprocmask (2).
|
||||
Our
|
||||
.BR pselect ()
|
||||
call would enable \fBSIGCHLD\fP by using the virgin signal mask.
|
||||
Our
|
||||
program would look like:
|
||||
call would enable
|
||||
.B SIGCHLD
|
||||
by using an empty signal mask.
|
||||
Our program would look like:
|
||||
.PP
|
||||
.nf
|
||||
static volatile sig_atomic_t got_SIGCHLD = 0;
|
||||
|
@ -344,51 +311,37 @@ main(int argc, char *argv[])
|
|||
.SS Practical
|
||||
So what is the point of
|
||||
.BR select ()?
|
||||
Can't I just read and write to my
|
||||
descriptors whenever I want?
|
||||
Can't I just read and write to my descriptors whenever I want?
|
||||
The point of
|
||||
.BR select ()
|
||||
is that it watches
|
||||
multiple descriptors at the same time and properly puts the process to
|
||||
sleep if there is no activity.
|
||||
It does this while enabling you to handle
|
||||
multiple simultaneous pipes and sockets.
|
||||
Unix programmers often find
|
||||
themselves in a position where they have to handle I/O from more than one
|
||||
file descriptor where the data flow may be intermittent.
|
||||
If you were to
|
||||
merely create a sequence of
|
||||
If you were to merely create a sequence of
|
||||
.BR read (2)
|
||||
and
|
||||
.BR write (2)
|
||||
calls, you would
|
||||
find that one of your calls may block waiting for data from/to a file
|
||||
descriptor, while another file descriptor is unused though available
|
||||
for data.
|
||||
descriptor, while another file descriptor is unused though ready for I/O.
|
||||
.BR select ()
|
||||
efficiently copes with this situation.
|
||||
|
||||
A simple example of the use of
|
||||
.BR select ()
|
||||
can be found in the
|
||||
.BR select (2)
|
||||
manual page.
|
||||
.SS Select Law
|
||||
Many people who try to use
|
||||
.BR select ()
|
||||
come across behavior that is
|
||||
difficult to understand and produces non-portable or borderline
|
||||
results.
|
||||
difficult to understand and produces non-portable or borderline results.
|
||||
For instance, the above program is carefully written not to
|
||||
block at any point, even though it does not set its file descriptors to
|
||||
non-blocking mode at all (see
|
||||
.BR ioctl (2)).
|
||||
non-blocking mode.
|
||||
It is easy to introduce
|
||||
subtle errors that will remove the advantage of using
|
||||
.BR select (),
|
||||
hence I will present a list of essentials to watch for when using the
|
||||
.BR select ()
|
||||
call.
|
||||
so here is a list of essentials to watch for when using
|
||||
.BR select ().
|
||||
.TP 4
|
||||
1.
|
||||
You should always try to use
|
||||
|
@ -407,8 +360,7 @@ explained above.
|
|||
No file descriptor must be added to any set if you do not intend
|
||||
to check its result after the
|
||||
.BR select ()
|
||||
call, and respond
|
||||
appropriately.
|
||||
call, and respond appropriately.
|
||||
See next rule.
|
||||
.TP
|
||||
4.
|
||||
|
@ -416,10 +368,6 @@ After
|
|||
.BR select ()
|
||||
returns, all file descriptors in all sets
|
||||
should be checked to see if they are ready.
|
||||
.\" mtk, May 2006: the following isn't really true.
|
||||
.\" Any file descriptor that is available
|
||||
.\" for writing \fImust\fP be written to, and any file descriptor
|
||||
.\" available for reading \fImust\fP be read, etc.
|
||||
.TP
|
||||
5.
|
||||
The functions
|
||||
|
@ -432,8 +380,7 @@ do \fInot\fP necessarily read/write the full amount of data
|
|||
that you have requested.
|
||||
If they do read/write the full amount, it's
|
||||
because you have a low traffic load and a fast stream.
|
||||
This is not
|
||||
always going to be the case.
|
||||
This is not always going to be the case.
|
||||
You should cope with the case of your
|
||||
functions only managing to send or receive a single byte.
|
||||
.TP
|
||||
|
@ -442,7 +389,7 @@ Never read/write only in single bytes at a time unless you are really
|
|||
sure that you have a small amount of data to process.
|
||||
It is extremely
|
||||
inefficient not to read/write as much data as you can buffer each time.
|
||||
The buffers in the example above are 1024 bytes although they could
|
||||
The buffers in the example below are 1024 bytes although they could
|
||||
easily be made larger.
|
||||
.TP
|
||||
7.
|
||||
|
@ -460,14 +407,12 @@ set to \fBEINTR\fP,
|
|||
or with
|
||||
.I errno
|
||||
set to \fBEAGAIN\fP (\fBEWOULDBLOCK\fP).
|
||||
These results must be properly managed (not done properly
|
||||
above).
|
||||
These results must be properly managed (not done properly above).
|
||||
If your program is not going to receive any signals, then
|
||||
it is unlikely you will get \fBEINTR\fP.
|
||||
If your program does not
|
||||
set non-blocking I/O, you will not get \fBEAGAIN\fP.
|
||||
Nonetheless
|
||||
you should still cope with these errors for completeness.
|
||||
If your program does not set non-blocking I/O,
|
||||
you will not get \fBEAGAIN\fP.
|
||||
.\" Nonetheless, you should still cope with these errors for completeness.
|
||||
.TP
|
||||
8.
|
||||
Never call
|
||||
|
@ -485,13 +430,12 @@ If the functions
|
|||
.BR write (2),
|
||||
and
|
||||
.BR send (2)
|
||||
fail
|
||||
with errors other than those listed in \fB7.\fP,
|
||||
fail with errors other than those listed in \fB7.\fP,
|
||||
or one of the input functions returns 0, indicating end of file,
|
||||
then you should \fInot\fP pass that descriptor to
|
||||
.BR select ()
|
||||
again.
|
||||
In the above example,
|
||||
In the example below,
|
||||
I close the descriptor immediately, and then set it to \-1
|
||||
to prevent it being included in a set.
|
||||
.TP
|
||||
|
@ -503,15 +447,23 @@ since some operating systems modify the structure.
|
|||
however does not modify its timeout structure.
|
||||
.TP
|
||||
11.
|
||||
I have heard that the Windows socket layer does not cope with OOB data
|
||||
properly.
|
||||
It also does not cope with
|
||||
Since
|
||||
.BR select ()
|
||||
calls when no file
|
||||
descriptors are set at all.
|
||||
Having no file descriptors set is a useful
|
||||
way to sleep the process with sub-second precision by using the timeout.
|
||||
(See further on.)
|
||||
modifies its file descriptor sets,
|
||||
if the call is being used in a loop,
|
||||
then the sets must be re-initialized before each call.
|
||||
.\" "I have heard" does not fill me with confidence, and doesn't
|
||||
.\" belong in a man page, so I've commented this point out.
|
||||
.\" .TP
|
||||
.\" 11.
|
||||
.\" I have heard that the Windows socket layer does not cope with OOB data
|
||||
.\" properly.
|
||||
.\" It also does not cope with
|
||||
.\" .BR select ()
|
||||
.\" calls when no file descriptors are set at all.
|
||||
.\" Having no file descriptors set is a useful
|
||||
.\" way to sleep the process with sub-second precision by using the timeout.
|
||||
.\" (See further on.)
|
||||
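Several of the rules above (rebuilding the sets before each call, coping with EINTR and EAGAIN, and not assuming that a write transfers everything requested) can be sketched together. This is a minimal illustration under stated assumptions, not the manual page's own example: write_all() is a hypothetical helper name, and it handles one descriptor only.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of the manual page): write len bytes
   from buf to fd, waiting with select() until fd is writable.
   Returns 0 on success, -1 on error. */
static int
write_all(int fd, const char *buf, size_t len)
{
    size_t written = 0;

    while (written < len) {
        fd_set wr;
        ssize_t r;

        /* select() modifies the set, so rebuild it on every pass */
        FD_ZERO(&wr);
        FD_SET(fd, &wr);

        if (select(fd + 1, NULL, &wr, NULL, NULL) == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }

        if (!FD_ISSET(fd, &wr))
            continue;           /* not ready after all: wait again */

        /* write() may accept fewer bytes than requested */
        r = write(fd, buf + written, len - written);
        if (r == -1) {
            if (errno == EINTR || errno == EAGAIN)
                continue;
            return -1;
        }
        written += (size_t) r;
    }
    return 0;
}
```

A real program would normally watch several descriptors in the same loop; the single-descriptor form is used here only to keep the sketch short.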
.SS Usleep Emulation
|
||||
On systems that do not have a
|
||||
.BR usleep (3)
|
||||
|
@ -536,8 +488,7 @@ still present in the file descriptor sets.
|
|||
|
||||
If
|
||||
.BR select ()
|
||||
timed out, then
|
||||
the return value will be zero.
|
||||
timed out, then the return value will be zero.
|
||||
The file descriptor sets should all be
|
||||
empty (but may not be on some systems).
|
||||
|
||||
|
@ -548,11 +499,8 @@ the \fIstruct timeout\fP contents are undefined and should not be used.
|
|||
.BR pselect ()
|
||||
however never modifies \fIntimeout\fP.
|
||||
.SH NOTES
|
||||
Generally speaking, all operating systems that support sockets, also
|
||||
support
|
||||
.BR select ().
|
||||
Many types of programs become
|
||||
extremely complicated without the use of
|
||||
Generally speaking,
|
||||
all operating systems that support sockets also support
|
||||
.BR select ().
|
||||
.BR select ()
|
||||
can be used to solve
|
||||
|
@ -566,8 +514,7 @@ system call has the same functionality as
|
|||
.BR select (),
|
||||
and is somewhat more efficient when monitoring sparse
|
||||
file descriptor sets.
|
||||
It is nowadays widely available,
|
||||
but historically was less portable than
|
||||
It is nowadays widely available, but historically was less portable than
|
||||
.BR select ().
|
||||
.PP
|
||||
The Linux-specific
|
||||
|
@ -682,7 +629,7 @@ connect_socket(int connect_port, char *address)
|
|||
#define BUF_SIZE 1024
|
||||
|
||||
int
|
||||
main(int argc, char **argv)
|
||||
main(int argc, char *argv[])
|
||||
{
|
||||
int h;
|
||||
int fd1 = \-1, fd2 = \-1;
|
||||
|
@ -707,6 +654,7 @@ main(int argc, char **argv)
|
|||
for (;;) {
|
||||
int r, nfds = 0;
|
||||
fd_set rd, wr, er;
|
||||
|
||||
FD_ZERO(&rd);
|
||||
FD_ZERO(&wr);
|
||||
FD_ZERO(&er);
|
||||
|
@ -720,13 +668,11 @@ main(int argc, char **argv)
|
|||
FD_SET(fd2, &rd);
|
||||
nfds = max(nfds, fd2);
|
||||
}
|
||||
if (fd1 > 0
|
||||
&& buf2_avail \- buf2_written > 0) {
|
||||
if (fd1 > 0 && buf2_avail \- buf2_written > 0) {
|
||||
FD_SET(fd1, &wr);
|
||||
nfds = max(nfds, fd1);
|
||||
}
|
||||
if (fd2 > 0
|
||||
&& buf1_avail \- buf1_written > 0) {
|
||||
if (fd2 > 0 && buf1_avail \- buf1_written > 0) {
|
||||
FD_SET(fd2, &wr);
|
||||
nfds = max(nfds, fd2);
|
||||
}
|
||||
|
@ -743,10 +689,12 @@ main(int argc, char **argv)
|
|||
|
||||
if (r == \-1 && errno == EINTR)
|
||||
continue;
|
||||
|
||||
if (r == \-1) {
|
||||
perror("select()");
|
||||
exit(EXIT_FAILURE);
|
||||
}
|
||||
|
||||
if (FD_ISSET(h, &rd)) {
|
||||
unsigned int l;
|
||||
struct sockaddr_in client_address;
|
||||
|
@ -770,12 +718,12 @@ main(int argc, char **argv)
|
|||
}
|
||||
}
|
||||
|
||||
/* NB: read oob data before normal reads */
|
||||
/* NB: read oob data before normal reads */
|
||||
|
||||
if (fd1 > 0)
|
||||
if (FD_ISSET(fd1, &er)) {
|
||||
char c;
|
||||
errno = 0;
|
||||
|
||||
r = recv(fd1, &c, 1, MSG_OOB);
|
||||
if (r < 1)
|
||||
SHUT_FD1;
|
||||
|
@ -785,7 +733,7 @@ main(int argc, char **argv)
|
|||
if (fd2 > 0)
|
||||
if (FD_ISSET(fd2, &er)) {
|
||||
char c;
|
||||
errno = 0;
|
||||
|
||||
r = recv(fd2, &c, 1, MSG_OOB);
|
||||
if (r < 1)
|
||||
SHUT_FD2;
|
||||
|
@ -829,7 +777,7 @@ main(int argc, char **argv)
|
|||
buf1_written += r;
|
||||
}
|
||||
|
||||
/* check if write data has caught read data */
|
||||
/* check if write data has caught read data */
|
||||
|
||||
if (buf1_written == buf1_avail)
|
||||
buf1_written = buf1_avail = 0;
|
||||
|
@ -850,17 +798,14 @@ main(int argc, char **argv)
|
|||
.PP
|
||||
The above program properly forwards most kinds of TCP connections
|
||||
including OOB signal data transmitted by \fBtelnet\fP servers.
|
||||
It
|
||||
handles the tricky problem of having data flow in both directions
|
||||
It handles the tricky problem of having data flow in both directions
|
||||
simultaneously.
|
||||
You might think it more efficient to use a
|
||||
.BR fork (2)
|
||||
call and devote a process to each stream.
|
||||
This becomes more tricky than
|
||||
you might suspect.
|
||||
Another idea is to set non-blocking I/O using an
|
||||
.BR ioctl (2)
|
||||
call.
|
||||
This becomes more tricky than you might suspect.
|
||||
Another idea is to set non-blocking I/O using
|
||||
.BR fcntl (2).
|
||||
This also has its problems because you end up using
|
||||
inefficient timeouts.
|
||||
|
||||
|
|