mirror of https://github.com/mkerrisk/man-pages
Reverting blunder in commit 4699
parent
10874173db
commit
77117f4fc5
23
Changes
|
@ -38,6 +38,29 @@ initrd.4
|
|||
mtk
|
||||
Fix mis-ordered (.SH) sections.
|
||||
|
||||
connect.2
|
||||
socket.2
|
||||
rtnetlink.3
|
||||
arp.7
|
||||
ddp.7
|
||||
ip.7
|
||||
ipv6.7
|
||||
netlink.7
|
||||
packet.7
|
||||
raw.7
|
||||
rtnetlink.7
|
||||
socket.7
|
||||
tcp.7
|
||||
udp.7
|
||||
unix.7
|
||||
x25.7
|
||||
mtk
|
||||
s/PF_/AF_/ for socket family constants. Reasons: the AF_ and
|
||||
PF_ constants have always had the same values; there never has
|
||||
been a protocol family that had more than one address family,
|
||||
and POSIX.1-2001 only specifies the AF_* constants.
|
||||
|
||||
|
||||
Typographical or grammatical errors have been corrected in several
|
||||
other places.
|
||||
|
||||
|
|
276
man2/connect.2
|
@ -1,8 +1,268 @@
|
|||
.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.\" Hey Emacs! This file is -*- nroff -*- source.
|
||||
.\"
|
||||
.\" Copyright 1993 Rickard E. Faith (faith@cs.unc.edu)
|
||||
.\" Portions extracted from /usr/include/sys/socket.h, which does not have
|
||||
.\" any authorship information in it. It is probably available under the GPL.
|
||||
.\"
|
||||
.\" Permission is granted to make and distribute verbatim copies of this
|
||||
.\" manual provided the copyright notice and this permission notice are
|
||||
.\" preserved on all copies.
|
||||
.\"
|
||||
.\" Permission is granted to copy and distribute modified versions of this
|
||||
.\" manual under the conditions for verbatim copying, provided that the
|
||||
.\" entire resulting derived work is distributed under the terms of a
|
||||
.\" permission notice identical to this one.
|
||||
.\"
|
||||
.\" Since the Linux kernel and libraries are constantly changing, this
|
||||
.\" manual page may be incorrect or out-of-date. The author(s) assume no
|
||||
.\" responsibility for errors or omissions, or for damages resulting from
|
||||
.\" the use of the information contained herein. The author(s) may not
|
||||
.\" have taken the same level of care in the production of this manual,
|
||||
.\" which is licensed free of charge, as they might when working
|
||||
.\" professionally.
|
||||
.\"
|
||||
.\" Formatted or processed versions of this manual, if unaccompanied by
|
||||
.\" the source, must acknowledge the copyright and authors of this work.
|
||||
.\"
|
||||
.\"
|
||||
.\" Other portions are from the 6.9 (Berkeley) 3/10/91 man page:
|
||||
.\"
|
||||
.\" Copyright (c) 1983 The Regents of the University of California.
|
||||
.\" All rights reserved.
|
||||
.\"
|
||||
.\" Redistribution and use in source and binary forms, with or without
|
||||
.\" modification, are permitted provided that the following conditions
|
||||
.\" are met:
|
||||
.\" 1. Redistributions of source code must retain the above copyright
|
||||
.\" notice, this list of conditions and the following disclaimer.
|
||||
.\" 2. Redistributions in binary form must reproduce the above copyright
|
||||
.\" notice, this list of conditions and the following disclaimer in the
|
||||
.\" documentation and/or other materials provided with the distribution.
|
||||
.\" 3. All advertising materials mentioning features or use of this software
|
||||
.\" must display the following acknowledgement:
|
||||
.\" This product includes software developed by the University of
|
||||
.\" California, Berkeley and its contributors.
|
||||
.\" 4. Neither the name of the University nor the names of its contributors
|
||||
.\" may be used to endorse or promote products derived from this software
|
||||
.\" without specific prior written permission.
|
||||
.\"
|
||||
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
|
||||
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
||||
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
|
||||
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
||||
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
||||
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
||||
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
||||
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
||||
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
.\" SUCH DAMAGE.
|
||||
.\"
|
||||
.\" Modified 1997-01-31 by Eric S. Raymond <esr@thyrsus.com>
|
||||
.\" Modified 1998, 1999 by Andi Kleen
|
||||
.\" Modified 2004-06-23 by Michael Kerrisk <mtk.manpages@gmail.com>
|
||||
.\"
|
||||
.TH CONNECT 2 2007-12-28 "Linux" "Linux Programmer's Manual"
|
||||
.SH NAME
|
||||
connect \- initiate a connection on a socket
|
||||
.SH SYNOPSIS
|
||||
.nf
|
||||
.BR "#include <sys/types.h>" " /* See NOTES */"
|
||||
.br
|
||||
.B #include <sys/socket.h>
|
||||
.sp
|
||||
.BI "int connect(int " sockfd ", const struct sockaddr *" serv_addr ,
|
||||
.BI " socklen_t " addrlen );
|
||||
.fi
|
||||
.SH DESCRIPTION
|
||||
The
|
||||
.BR connect ()
|
||||
system call connects the socket referred to by the file descriptor
|
||||
.I sockfd
|
||||
to the address specified by
|
||||
.IR serv_addr .
|
||||
The
|
||||
.I addrlen
|
||||
argument specifies the size of
|
||||
.IR serv_addr .
|
||||
The format of the address in
|
||||
.I serv_addr
|
||||
is determined by the address space of the socket
|
||||
.IR sockfd ;
|
||||
see
|
||||
.BR socket (2)
|
||||
for further details.
|
||||
|
||||
If the socket
|
||||
.I sockfd
|
||||
is of type
|
||||
.B SOCK_DGRAM
|
||||
then
|
||||
.I serv_addr
|
||||
is the address to which datagrams are sent by default, and the only
|
||||
address from which datagrams are received.
|
||||
If the socket is of type
|
||||
.B SOCK_STREAM
|
||||
or
|
||||
.BR SOCK_SEQPACKET ,
|
||||
this call attempts to make a connection to the socket that is bound
|
||||
to the address specified by
|
||||
.IR serv_addr .
|
||||
.PP
|
||||
Generally, connection-based protocol sockets may successfully
|
||||
.BR connect ()
|
||||
only once; connectionless protocol sockets may use
|
||||
.BR connect ()
|
||||
multiple times to change their association.
|
||||
Connectionless sockets may
|
||||
dissolve the association by connecting to an address with the
|
||||
.I sa_family
|
||||
member of
|
||||
.I sockaddr
|
||||
set to
|
||||
.BR AF_UNSPEC
|
||||
(supported on Linux since kernel 2.2).
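
The association and its dissolution described above can be sketched as follows.
This is a minimal illustration, not part of the original page: the helper names
dgram_associate(), dgram_dissolve() and dgram_has_peer() are hypothetical, and
the sketch assumes an IPv4 datagram socket on Linux.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: make ip:port the default peer of a datagram socket.
   Returns 0 on success, -1 if the address is malformed or connect() fails. */
static int dgram_associate(int fd, const char *ip, unsigned short port)
{
    struct sockaddr_in peer;

    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &peer.sin_addr) != 1)
        return -1;
    return connect(fd, (struct sockaddr *) &peer, sizeof(peer));
}

/* Hypothetical helper: dissolve the association by connecting to an
   address whose sa_family is AF_UNSPEC. */
static int dgram_dissolve(int fd)
{
    struct sockaddr sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_family = AF_UNSPEC;
    return connect(fd, &sa, sizeof(sa));
}

/* Hypothetical helper: 1 if the socket currently has a peer, else 0. */
static int dgram_has_peer(int fd)
{
    struct sockaddr_storage ss;
    socklen_t len = sizeof(ss);

    return getpeername(fd, (struct sockaddr *) &ss, &len) == 0;
}
```

A datagram socket may call dgram_associate() repeatedly to change its peer;
after dgram_dissolve() the socket is expected to have no peer again.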
|
||||
.SH "RETURN VALUE"
|
||||
If the connection or binding succeeds, zero is returned.
|
||||
On error, \-1 is returned, and
|
||||
.I errno
|
||||
is set appropriately.
|
||||
.SH ERRORS
|
||||
The following are general socket errors only.
|
||||
There may be other domain-specific error codes.
|
||||
.TP
|
||||
.B EACCES
|
||||
For Unix domain sockets, which are identified by pathname:
|
||||
Write permission is denied on the socket file,
|
||||
or search permission is denied for one of the directories
|
||||
in the path prefix.
|
||||
(See also
|
||||
.BR path_resolution (7).)
|
||||
.TP
|
||||
.BR EACCES ", " EPERM
|
||||
The user tried to connect to a broadcast address without having the socket
|
||||
broadcast flag enabled or the connection request failed because of a local
|
||||
firewall rule.
|
||||
.TP
|
||||
.B EADDRINUSE
|
||||
Local address is already in use.
|
||||
.TP
|
||||
.B EAFNOSUPPORT
|
||||
The passed address didn't have the correct address family in its
|
||||
.I sa_family
|
||||
field.
|
||||
.TP
|
||||
.B EAGAIN
|
||||
No more free local ports or insufficient entries in the routing cache.
|
||||
For
|
||||
.B PF_INET
|
||||
see the
|
||||
.I net.ipv4.ip_local_port_range
|
||||
sysctl in
|
||||
.BR ip (7)
|
||||
on how to increase the number of local ports.
|
||||
.TP
|
||||
.B EALREADY
|
||||
The socket is non-blocking and a previous connection attempt has not yet
|
||||
been completed.
|
||||
.TP
|
||||
.B EBADF
|
||||
The file descriptor is not a valid index in the descriptor table.
|
||||
.TP
|
||||
.B ECONNREFUSED
|
||||
No-one listening on the remote address.
|
||||
.TP
|
||||
.B EFAULT
|
||||
The socket structure address is outside the user's address space.
|
||||
.TP
|
||||
.B EINPROGRESS
|
||||
The socket is non-blocking and the connection cannot be completed
|
||||
immediately.
|
||||
It is possible to
|
||||
.BR select (2)
|
||||
or
|
||||
.BR poll (2)
|
||||
for completion by selecting the socket for writing.
|
||||
After
|
||||
.BR select (2)
|
||||
indicates writability, use
|
||||
.BR getsockopt (2)
|
||||
to read the
|
||||
.B SO_ERROR
|
||||
option at level
|
||||
.B SOL_SOCKET
|
||||
to determine whether
|
||||
.BR connect ()
|
||||
completed successfully
|
||||
.RB ( SO_ERROR
|
||||
is zero) or unsuccessfully
|
||||
.RB ( SO_ERROR
|
||||
is one of the usual error codes listed here,
|
||||
explaining the reason for the failure).
|
||||
.TP
|
||||
.B EINTR
|
||||
The system call was interrupted by a signal that was caught; see
|
||||
.BR signal (7).
|
||||
.\" For TCP, the connection will complete asynchronously.
|
||||
.\" See http://lkml.org/lkml/2005/7/12/254
|
||||
.TP
|
||||
.B EISCONN
|
||||
The socket is already connected.
|
||||
.TP
|
||||
.B ENETUNREACH
|
||||
Network is unreachable.
|
||||
.TP
|
||||
.B ENOTSOCK
|
||||
The file descriptor is not associated with a socket.
|
||||
.TP
|
||||
.B ETIMEDOUT
|
||||
Timeout while attempting connection.
|
||||
The server may be too
|
||||
busy to accept new connections.
|
||||
Note that for IP sockets the timeout may
|
||||
be very long when syncookies are enabled on the server.
|
||||
.SH "CONFORMING TO"
|
||||
SVr4, 4.4BSD, (the
|
||||
.BR connect ()
|
||||
function first appeared in 4.2BSD), POSIX.1-2001.
|
||||
.\" SVr4 documents the additional
|
||||
.\" general error codes
|
||||
.\" .BR EADDRNOTAVAIL ,
|
||||
.\" .BR EINVAL ,
|
||||
.\" .BR EAFNOSUPPORT ,
|
||||
.\" .BR EALREADY ,
|
||||
.\" .BR EINTR ,
|
||||
.\" .BR EPROTOTYPE ,
|
||||
.\" and
|
||||
.\" .BR ENOSR .
|
||||
.\" It also
|
||||
.\" documents many additional error conditions not described here.
|
||||
.SH NOTES
|
||||
POSIX.1-2001 does not require the inclusion of
|
||||
.IR <sys/types.h> ,
|
||||
and this header file is not required on Linux.
|
||||
However, some historical (BSD) implementations required this header
|
||||
file, and portable applications are probably wise to include it.
|
||||
|
||||
The third argument of
|
||||
.BR connect ()
|
||||
is in reality an
|
||||
.I int
|
||||
(and this is what 4.x BSD and libc4 and libc5 have).
|
||||
Some POSIX confusion resulted in the present
|
||||
.IR socklen_t ,
|
||||
also used by glibc.
|
||||
See also
|
||||
.BR accept (2).
|
||||
.SH EXAMPLE
|
||||
An example of the use of
|
||||
.BR connect ()
|
||||
is shown in
|
||||
.BR getaddrinfo (3).
|
||||
.SH "SEE ALSO"
|
||||
.BR accept (2),
|
||||
.BR bind (2),
|
||||
.BR getsockname (2),
|
||||
.BR listen (2),
|
||||
.BR socket (2),
|
||||
.BR path_resolution (7)
|
||||
|
|
390
man2/socket.2
|
@ -1,8 +1,382 @@
|
|||
.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
'\" t
|
||||
.\" Copyright (c) 1983, 1991 The Regents of the University of California.
|
||||
.\" All rights reserved.
|
||||
.\"
|
||||
.\" Redistribution and use in source and binary forms, with or without
|
||||
.\" modification, are permitted provided that the following conditions
|
||||
.\" are met:
|
||||
.\" 1. Redistributions of source code must retain the above copyright
|
||||
.\" notice, this list of conditions and the following disclaimer.
|
||||
.\" 2. Redistributions in binary form must reproduce the above copyright
|
||||
.\" notice, this list of conditions and the following disclaimer in the
|
||||
.\" documentation and/or other materials provided with the distribution.
|
||||
.\" 3. All advertising materials mentioning features or use of this software
|
||||
.\" must display the following acknowledgement:
|
||||
.\" This product includes software developed by the University of
|
||||
.\" California, Berkeley and its contributors.
|
||||
.\" 4. Neither the name of the University nor the names of its contributors
|
||||
.\" may be used to endorse or promote products derived from this software
|
||||
.\" without specific prior written permission.
|
||||
.\"
|
||||
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
|
||||
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
||||
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
|
||||
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
||||
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
||||
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
||||
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
||||
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
||||
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
.\" SUCH DAMAGE.
|
||||
.\"
|
||||
.\" $Id: socket.2,v 1.4 1999/05/13 11:33:42 freitag Exp $
|
||||
.\"
|
||||
.\" Modified 1993-07-24 by Rik Faith <faith@cs.unc.edu>
|
||||
.\" Modified 1996-10-22 by Eric S. Raymond <esr@thyrsus.com>
|
||||
.\" Modified 1998, 1999 by Andi Kleen <ak@muc.de>
|
||||
.\" Modified 2002-07-17 by Michael Kerrisk <mtk.manpages@gmail.com>
|
||||
.\" Modified 2004-06-17 by Michael Kerrisk <mtk.manpages@gmail.com>
|
||||
.\"
|
||||
.TH SOCKET 2 2004-06-17 "Linux" "Linux Programmer's Manual"
|
||||
.SH NAME
|
||||
socket \- create an endpoint for communication
|
||||
.SH SYNOPSIS
|
||||
.BR "#include <sys/types.h>" " /* See NOTES */"
|
||||
.br
|
||||
.B #include <sys/socket.h>
|
||||
.sp
|
||||
.BI "int socket(int " domain ", int " type ", int " protocol );
|
||||
.SH DESCRIPTION
|
||||
.BR socket ()
|
||||
creates an endpoint for communication and returns a descriptor.
|
||||
.PP
|
||||
The
|
||||
.I domain
|
||||
argument specifies a communication domain; this selects the protocol
|
||||
family which will be used for communication.
|
||||
These families are defined in
|
||||
.IR <sys/socket.h> .
|
||||
The currently understood formats include:
|
||||
.TS
|
||||
tab(:);
|
||||
l l l.
|
||||
Name:Purpose:Man page
|
||||
T{
|
||||
.BR PF_UNIX ", " PF_LOCAL
|
||||
T}:T{
|
||||
Local communication
|
||||
T}:T{
|
||||
.BR unix (7)
|
||||
T}
|
||||
T{
|
||||
.B PF_INET
|
||||
T}:IPv4 Internet protocols:T{
|
||||
.BR ip (7)
|
||||
T}
|
||||
T{
|
||||
.B PF_INET6
|
||||
T}:IPv6 Internet protocols:T{
|
||||
.BR ipv6 (7)
|
||||
T}
|
||||
T{
|
||||
.B PF_IPX
|
||||
T}:IPX \- Novell protocols:
|
||||
T{
|
||||
.B PF_NETLINK
|
||||
T}:T{
|
||||
Kernel user interface device
|
||||
T}:T{
|
||||
.BR netlink (7)
|
||||
T}
|
||||
T{
|
||||
.B PF_X25
|
||||
T}:ITU-T X.25 / ISO-8208 protocol:T{
|
||||
.BR x25 (7)
|
||||
T}
|
||||
T{
|
||||
.B PF_AX25
|
||||
T}:T{
|
||||
Amateur radio AX.25 protocol
|
||||
T}:
|
||||
T{
|
||||
.B PF_ATMPVC
|
||||
T}:Access to raw ATM PVCs:
|
||||
T{
|
||||
.B PF_APPLETALK
|
||||
T}:Appletalk:T{
|
||||
.BR ddp (7)
|
||||
T}
|
||||
T{
|
||||
.B PF_PACKET
|
||||
T}:T{
|
||||
Low level packet interface
|
||||
T}:T{
|
||||
.BR packet (7)
|
||||
T}
|
||||
.TE
|
||||
.PP
|
||||
The socket has the indicated
|
||||
.IR type ,
|
||||
which specifies the communication semantics.
|
||||
Currently defined types
|
||||
are:
|
||||
.TP
|
||||
.B SOCK_STREAM
|
||||
Provides sequenced, reliable, two-way, connection-based byte streams.
|
||||
An out-of-band data transmission mechanism may be supported.
|
||||
.TP
|
||||
.B SOCK_DGRAM
|
||||
Supports datagrams (connectionless, unreliable messages of a fixed
|
||||
maximum length).
|
||||
.TP
|
||||
.B SOCK_SEQPACKET
|
||||
Provides a sequenced, reliable, two-way connection-based data
|
||||
transmission path for datagrams of fixed maximum length; a consumer is
|
||||
required to read an entire packet with each input system call.
|
||||
.TP
|
||||
.B SOCK_RAW
|
||||
Provides raw network protocol access.
|
||||
.TP
|
||||
.B SOCK_RDM
|
||||
Provides a reliable datagram layer that does not guarantee ordering.
|
||||
.TP
|
||||
.B SOCK_PACKET
|
||||
Obsolete and should not be used in new programs;
|
||||
see
|
||||
.BR packet (7).
|
||||
.PP
|
||||
Some socket types may not be implemented by all protocol families;
|
||||
for example,
|
||||
.B SOCK_SEQPACKET
|
||||
is not implemented for
|
||||
.BR AF_INET .
|
||||
.PP
|
||||
The
|
||||
.I protocol
|
||||
specifies a particular protocol to be used with the socket.
|
||||
Normally only a single protocol exists to support a particular
|
||||
socket type within a given protocol family, in which case
|
||||
.I protocol
|
||||
can be specified as 0.
|
||||
However, it is possible that many protocols may exist, in
|
||||
which case a particular protocol must be specified in this manner.
|
||||
The protocol number to use is specific to the \*(lqcommunication domain\*(rq
|
||||
in which communication is to take place; see
|
||||
.BR protocols (5).
|
||||
See
|
||||
.BR getprotoent (3)
|
||||
on how to map protocol name strings to protocol numbers.
|
||||
.PP
|
||||
Sockets of type
|
||||
.B SOCK_STREAM
|
||||
are full-duplex byte streams, similar to pipes.
|
||||
They do not preserve
|
||||
record boundaries.
|
||||
A stream socket must be in
|
||||
a
|
||||
.I connected
|
||||
state before any data may be sent or received on it.
|
||||
A connection to
|
||||
another socket is created with a
|
||||
.BR connect (2)
|
||||
call.
|
||||
Once connected, data may be transferred using
|
||||
.BR read (2)
|
||||
and
|
||||
.BR write (2)
|
||||
calls or some variant of the
|
||||
.BR send (2)
|
||||
and
|
||||
.BR recv (2)
|
||||
calls.
|
||||
When a session has been completed a
|
||||
.BR close (2)
|
||||
may be performed.
|
||||
Out-of-band data may also be transmitted as described in
|
||||
.BR send (2)
|
||||
and received as described in
|
||||
.BR recv (2).
|
||||
.PP
|
||||
The communications protocols which implement a
|
||||
.B SOCK_STREAM
|
||||
ensure that data is not lost or duplicated.
|
||||
If a piece of data for which
|
||||
the peer protocol has buffer space cannot be successfully transmitted
|
||||
within a reasonable length of time, then the connection is considered
|
||||
to be dead.
|
||||
When
|
||||
.B SO_KEEPALIVE
|
||||
is enabled on the socket the protocol checks in a protocol-specific
|
||||
manner if the other end is still alive.
|
||||
A
|
||||
.B SIGPIPE
|
||||
signal is raised if a process sends or receives
|
||||
on a broken stream; this causes naive processes,
|
||||
which do not handle the signal, to exit.
|
||||
.B SOCK_SEQPACKET
|
||||
sockets employ the same system calls as
|
||||
.B SOCK_STREAM
|
||||
sockets.
|
||||
The only difference is that
|
||||
.BR read (2)
|
||||
calls will return only the amount of data requested,
|
||||
and any data remaining in the arriving packet will be discarded.
|
||||
Also all message boundaries in incoming datagrams are preserved.
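
The record-oriented behavior just described can be sketched with a connected
pair of sockets.
The helper name seqpacket_reads() is hypothetical, and the use of
socketpair(2) with AF_UNIX is an assumption made only to keep the example
self-contained.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send the records "hello world" (11 bytes) and
   "bye" (3 bytes) over a SOCK_SEQPACKET pair, then read each one into
   a buffer of buflen bytes.  The byte counts are stored in n[0], n[1].
   Returns 0 on success, -1 on error. */
static int seqpacket_reads(size_t buflen, char *out1, char *out2,
                           ssize_t n[2])
{
    int sv[2];
    int ret = -1;

    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) == -1)
        return -1;

    if (send(sv[0], "hello world", 11, 0) == 11 &&
        send(sv[0], "bye", 3, 0) == 3) {
        /* Each recv() consumes exactly one record; bytes that do
           not fit into the buffer are discarded. */
        n[0] = recv(sv[1], out1, buflen, 0);
        n[1] = recv(sv[1], out2, buflen, 0);
        if (n[0] != -1 && n[1] != -1)
            ret = 0;
    }

    close(sv[0]);
    close(sv[1]);
    return ret;
}
```

With a 5-byte buffer the first read is expected to yield only "hello", and
the second read starts at the next record rather than at the discarded
remainder.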
|
||||
.PP
|
||||
.B SOCK_DGRAM
|
||||
and
|
||||
.B SOCK_RAW
|
||||
sockets allow sending of datagrams to correspondents named in
|
||||
.BR sendto (2)
|
||||
calls.
|
||||
Datagrams are generally received with
|
||||
.BR recvfrom (2),
|
||||
which returns the next datagram along with the address of its sender.
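
A minimal sketch of this exchange over the loopback interface follows.
The helper names udp_bind_loopback() and udp_roundtrip() are hypothetical,
and IPv4 UDP is assumed purely for illustration.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket bound to an ephemeral port
   on 127.0.0.1 and store the resulting address in *addr.
   Returns the descriptor, or -1 on error. */
static int udp_bind_loopback(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd == -1)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0;             /* let the kernel pick a port */
    if (bind(fd, (struct sockaddr *) addr, sizeof(*addr)) == -1 ||
        getsockname(fd, (struct sockaddr *) addr, &len) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: send one datagram from tx to dst, then receive
   it on rx.  Returns the number of bytes received (or -1) and stores
   the sender's address in *src. */
static ssize_t udp_roundtrip(int tx, int rx, const struct sockaddr_in *dst,
                             const void *msg, size_t msglen,
                             void *buf, size_t buflen,
                             struct sockaddr_in *src)
{
    socklen_t srclen = sizeof(*src);

    if (sendto(tx, msg, msglen, 0,
               (const struct sockaddr *) dst, sizeof(*dst)) == -1)
        return -1;
    return recvfrom(rx, buf, buflen, 0, (struct sockaddr *) src, &srclen);
}
```

The address returned in src identifies the sending socket, which is what
allows a server to reply without any prior connection.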
|
||||
.PP
|
||||
.B SOCK_PACKET
|
||||
is an obsolete socket type to receive raw packets directly from the
|
||||
device driver.
|
||||
Use
|
||||
.BR packet (7)
|
||||
instead.
|
||||
.PP
|
||||
An
|
||||
.BR fcntl (2)
|
||||
.B F_SETOWN
|
||||
operation can be used to specify a process or process group to receive a
|
||||
.B SIGURG
|
||||
signal when the out-of-band data arrives or
|
||||
.B SIGPIPE
|
||||
signal when a
|
||||
.B SOCK_STREAM
|
||||
connection breaks unexpectedly.
|
||||
This operation may also be used to set the process or process group
|
||||
that receives the I/O and asynchronous notification of I/O events via
|
||||
.BR SIGIO .
|
||||
Using
|
||||
.B F_SETOWN
|
||||
is equivalent to an
|
||||
.BR ioctl (2)
|
||||
call with the
|
||||
.B FIOSETOWN
|
||||
or
|
||||
.B SIOCSPGRP
|
||||
argument.
|
||||
.PP
|
||||
When the network signals an error condition to the protocol module (e.g.,
|
||||
using an ICMP message for IP) the pending error flag is set for the socket.
|
||||
The next operation on this socket will return the error code of the pending
|
||||
error.
|
||||
For some protocols it is possible to enable a per-socket error queue
|
||||
to retrieve detailed information about the error; see
|
||||
.B IP_RECVERR
|
||||
in
|
||||
.BR ip (7).
|
||||
.PP
|
||||
The operation of sockets is controlled by socket level
|
||||
.IR options .
|
||||
These options are defined in
|
||||
.IR <sys/socket.h> .
|
||||
The functions
|
||||
.BR setsockopt (2)
|
||||
and
|
||||
.BR getsockopt (2)
|
||||
are used to set and get options, respectively.
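
Setting and reading back a boolean socket-level option can be sketched as
follows.
The helper names set_bool_sockopt() and get_bool_sockopt() are hypothetical;
SO_KEEPALIVE, mentioned earlier on this page, is one option they could be
used with.

```c
#include <sys/socket.h>

/* Hypothetical helper: enable (on != 0) or disable (on == 0) a boolean
   option at level SOL_SOCKET.  Returns 0 on success, -1 on error. */
static int set_bool_sockopt(int fd, int optname, int on)
{
    return setsockopt(fd, SOL_SOCKET, optname, &on, sizeof(on));
}

/* Hypothetical helper: read a boolean SOL_SOCKET option.
   Returns 1 if enabled, 0 if disabled, -1 on error. */
static int get_bool_sockopt(int fd, int optname)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, optname, &val, &len) == -1)
        return -1;
    return val != 0;
}
```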
|
||||
.SH "RETURN VALUE"
|
||||
On success, a file descriptor for the new socket is returned.
|
||||
On error, \-1 is returned, and
|
||||
.I errno
|
||||
is set appropriately.
|
||||
.SH ERRORS
|
||||
.TP
|
||||
.B EACCES
|
||||
Permission to create a socket of the specified type and/or protocol
|
||||
is denied.
|
||||
.TP
|
||||
.B EAFNOSUPPORT
|
||||
The implementation does not support the specified address family.
|
||||
.TP
|
||||
.B EINVAL
|
||||
Unknown protocol, or protocol family not available.
|
||||
.TP
|
||||
.B EMFILE
|
||||
Process file table overflow.
|
||||
.TP
|
||||
.B ENFILE
|
||||
The system limit on the total number of open files has been reached.
|
||||
.TP
|
||||
.BR ENOBUFS " or " ENOMEM
|
||||
Insufficient memory is available.
|
||||
The socket cannot be
|
||||
created until sufficient resources are freed.
|
||||
.TP
|
||||
.B EPROTONOSUPPORT
|
||||
The protocol type or the specified protocol is not
|
||||
supported within this domain.
|
||||
.PP
|
||||
Other errors may be generated by the underlying protocol modules.
|
||||
.SH "CONFORMING TO"
|
||||
4.4BSD, POSIX.1-2001.
|
||||
.BR socket ()
|
||||
appeared in 4.2BSD.
|
||||
It is generally portable to/from
|
||||
non-BSD systems supporting clones of the BSD socket layer (including
|
||||
System V variants).
|
||||
.SH NOTES
|
||||
POSIX.1-2001 does not require the inclusion of
|
||||
.IR <sys/types.h> ,
|
||||
and this header file is not required on Linux.
|
||||
However, some historical (BSD) implementations required this header
|
||||
file, and portable applications are probably wise to include it.
|
||||
|
||||
The manifest constants used under 4.x BSD for protocol families
|
||||
are
|
||||
.BR PF_UNIX ,
|
||||
.BR PF_INET ,
|
||||
etc., while
|
||||
.B AF_UNIX
|
||||
etc. are used for address
|
||||
families.
|
||||
However, already the BSD man page promises: "The protocol
|
||||
family generally is the same as the address family", and subsequent
|
||||
standards use AF_* everywhere.
|
||||
.SH BUGS
|
||||
.B SOCK_UUCP
|
||||
is not implemented yet.
|
||||
.SH EXAMPLE
|
||||
An example of the use of
|
||||
.BR socket ()
|
||||
is shown in
|
||||
.BR getaddrinfo (3).
|
||||
.SH "SEE ALSO"
|
||||
.BR accept (2),
|
||||
.BR bind (2),
|
||||
.BR connect (2),
|
||||
.BR fcntl (2),
|
||||
.BR getpeername (2),
|
||||
.BR getsockname (2),
|
||||
.BR getsockopt (2),
|
||||
.BR ioctl (2),
|
||||
.BR listen (2),
|
||||
.BR read (2),
|
||||
.BR recv (2),
|
||||
.BR select (2),
|
||||
.BR send (2),
|
||||
.BR shutdown (2),
|
||||
.BR socketpair (2),
|
||||
.BR write (2),
|
||||
.BR getprotoent (3),
|
||||
.BR ip (7),
|
||||
.BR socket (7),
|
||||
.BR tcp (7),
|
||||
.BR udp (7),
|
||||
.BR unix (7)
|
||||
.PP
|
||||
\(lqAn Introductory 4.3BSD Interprocess Communication Tutorial\(rq
|
||||
is reprinted in
|
||||
.I UNIX Programmer's Supplementary Documents Volume 1.
|
||||
.PP
|
||||
\(lqBSD Interprocess Communication Tutorial\(rq
|
||||
is reprinted in
|
||||
.I UNIX Programmer's Supplementary Documents Volume 1.
|
||||
|
|
126
man3/rtnetlink.3
|
@ -1,8 +1,118 @@
|
|||
.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual"
|
||||
.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual"
|
||||
.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual"
|
||||
.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual"
|
||||
.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual"
|
||||
.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual"
|
||||
.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual"
|
||||
.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual"
|
||||
.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
|
||||
.\" Permission is granted to distribute possibly modified copies
|
||||
.\" of this page provided the header is included verbatim,
|
||||
.\" and in case of nontrivial modification author and date
|
||||
.\" of the modification is added to the header.
|
||||
.\" $Id: rtnetlink.3,v 1.2 1999/05/18 10:35:10 freitag Exp $
|
||||
.TH RTNETLINK 3 1999-05-14 "GNU" "Linux Programmer's Manual"
|
||||
.SH NAME
|
||||
rtnetlink \- macros to manipulate rtnetlink messages
|
||||
.SH SYNOPSIS
|
||||
.B #include <asm/types.h>
|
||||
.br
|
||||
.B #include <linux/netlink.h>
|
||||
.br
|
||||
.B #include <linux/rtnetlink.h>
|
||||
.br
|
||||
.B #include <sys/socket.h>
|
||||
|
||||
.BI "rtnetlink_socket = socket(PF_NETLINK, int " socket_type \
|
||||
", NETLINK_ROUTE);"
|
||||
.sp
|
||||
.BI "int RTA_OK(struct rtattr *" rta ", int " rtabuflen );
|
||||
.sp
|
||||
.BI "void *RTA_DATA(struct rtattr *" rta );
|
||||
.sp
|
||||
.BI "unsigned int RTA_PAYLOAD(struct rtattr *" rta );
|
||||
.sp
|
||||
.BI "struct rtattr *RTA_NEXT(struct rtattr *" rta \
|
||||
", unsigned int " rtabuflen );
|
||||
.sp
|
||||
.BI "unsigned int RTA_LENGTH(unsigned int " length );
|
||||
.sp
|
||||
.BI "unsigned int RTA_SPACE(unsigned int " length );
|
||||
.SH DESCRIPTION
|
||||
All
|
||||
.BR rtnetlink (7)
|
||||
messages consist of a
|
||||
.BR netlink (7)
|
||||
message header and appended attributes.
|
||||
The attributes should be only
|
||||
manipulated using the macros provided here.
|
||||
.PP
|
||||
.BI RTA_OK( rta ", " attrlen )
|
||||
returns true if
|
||||
.I rta
|
||||
points to a valid routing attribute;
|
||||
.I attrlen
|
||||
is the running length of the attribute buffer.
|
||||
When not true then you must assume there are no more attributes in the
|
||||
message, even if
|
||||
.I attrlen
|
||||
is non-zero.
|
||||
.PP
|
||||
.BI RTA_DATA( rta )
|
||||
returns a pointer to the start of this attribute's data.
|
||||
.PP
|
||||
.BI RTA_PAYLOAD( rta )
|
||||
returns the length of this attribute's data.
|
||||
.PP
|
||||
.BI RTA_NEXT( rta ", " attrlen )
|
||||
gets the next attribute after
|
||||
.IR rta .
|
||||
Calling this macro will update
|
||||
.IR attrlen .
|
||||
You should use
|
||||
.B RTA_OK
|
||||
to check the validity of the returned pointer.
|
||||
.PP
|
||||
.BI RTA_LENGTH( len )
|
||||
returns the length which is required for
|
||||
.I len
|
||||
bytes of data plus the header.
|
||||
.PP
|
||||
.BI RTA_SPACE( len )
|
||||
returns the amount of space which will be needed in a message with
|
||||
.I len
|
||||
bytes of data.
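
How these macros fit together can be sketched by building an attribute
buffer and walking it again.
The helper names rta_append() and rta_find() are hypothetical, and the
sketch assumes that the caller supplies a zero-filled buffer that is
suitably aligned for struct rtattr.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: append one attribute at offset *used of buf.
   RTA_LENGTH() gives the value for rta_len; RTA_SPACE() gives the
   padded size by which the offset advances.
   Returns 0 on success, -1 if the attribute does not fit. */
static int rta_append(char *buf, size_t bufsize, size_t *used,
                      unsigned short type, const void *data,
                      unsigned short len)
{
    struct rtattr *rta;

    if (*used + RTA_SPACE(len) > bufsize)
        return -1;
    rta = (struct rtattr *) (buf + *used);
    rta->rta_type = type;
    rta->rta_len = RTA_LENGTH(len);
    memcpy(RTA_DATA(rta), data, len);
    *used += RTA_SPACE(len);
    return 0;
}

/* Hypothetical helper: walk the buffer with RTA_OK()/RTA_NEXT() and
   return the first attribute of the given type, or NULL.
   RTA_NEXT() decrements attrlen as it advances. */
static struct rtattr *rta_find(void *buf, int attrlen, unsigned short type)
{
    struct rtattr *rta;

    for (rta = buf; RTA_OK(rta, attrlen); rta = RTA_NEXT(rta, attrlen))
        if (rta->rta_type == type)
            return rta;
    return NULL;
}
```

Once an attribute has been found, RTA_DATA() and RTA_PAYLOAD() give access
to its contents and their length.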
|
||||
.SH "CONFORMING TO"
|
||||
These macros are non-standard Linux extensions.
|
||||
.SH BUGS
|
||||
This manual page is incomplete.
|
||||
.SH EXAMPLE
|
||||
.\" FIXME ? would be better to use libnetlink in the EXAMPLE code here
|
||||
|
||||
Creating a rtnetlink message to set the MTU of a device:
|
||||
.nf
|
||||
|
||||
struct {
    struct nlmsghdr  nh;
    struct ifinfomsg ifi;   /* "if" is a C keyword and cannot name a member */
    char             attrbuf[512];
} req;

struct rtattr *rta;
unsigned int mtu = 1000;

int rtnetlink_sk = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE);

memset(&req, 0, sizeof(req));
req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
req.nh.nlmsg_flags = NLM_F_REQUEST;
req.nh.nlmsg_type = RTM_NEWLINK;
req.ifi.ifi_family = AF_UNSPEC;
req.ifi.ifi_index = INTERFACE_INDEX;
req.ifi.ifi_change = 0xffffffff; /* ???*/
/* The attribute starts after the aligned header and ifinfomsg */
rta = (struct rtattr *)(((char *) &req) +
                         NLMSG_ALIGN(req.nh.nlmsg_len));
rta\->rta_type = IFLA_MTU;
rta\->rta_len = RTA_LENGTH(sizeof(mtu));
req.nh.nlmsg_len = NLMSG_ALIGN(req.nh.nlmsg_len) +
                              RTA_LENGTH(sizeof(mtu));
memcpy(RTA_DATA(rta), &mtu, sizeof(mtu));
send(rtnetlink_sk, &req, req.nh.nlmsg_len, 0);
|
||||
.fi
|
||||
.SH "SEE ALSO"
|
||||
.BR netlink (3),
|
||||
.BR netlink (7),
|
||||
.BR rtnetlink (7)
|
||||
|
|
283
man7/arp.7
|
@ -1,8 +1,275 @@
|
|||
.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
'\" t
|
||||
.\" This man page is Copyright (C) 1999 Matthew Wilcox <willy@bofh.ai>.
|
||||
.\" Permission is granted to distribute possibly modified copies
|
||||
.\" of this page provided the header is included verbatim,
|
||||
.\" and in case of nontrivial modification author and date
|
||||
.\" of the modification is added to the header.
|
||||
.\" Modified June 1999 Andi Kleen
|
||||
.\" $Id: arp.7,v 1.10 2000/04/27 19:31:38 ak Exp $
|
||||
.TH ARP 7 2007-07-27 "Linux" "Linux Programmer's Manual"
|
||||
.SH NAME
|
||||
arp \- Linux ARP kernel module.
|
||||
.SH DESCRIPTION
|
||||
This kernel protocol module implements the Address Resolution
|
||||
Protocol defined in RFC\ 826.
|
||||
It is used to convert between Layer2 hardware addresses
|
||||
and IPv4 protocol addresses on directly connected networks.
|
||||
The user normally doesn't interact directly with this module except to
|
||||
configure it;
|
||||
instead it provides a service for other protocols in the kernel.
|
||||
|
||||
A user process can receive ARP packets by using
|
||||
.BR packet (7)
|
||||
sockets.
|
||||
There is also a mechanism for managing the ARP cache
|
||||
in user-space by using
|
||||
.BR netlink (7)
|
||||
sockets.
|
||||
The ARP table can also be controlled via
|
||||
.BR ioctl (2)
|
||||
on any
|
||||
.B PF_INET
|
||||
socket.
|
||||
|
||||
The ARP module maintains a cache of mappings between hardware addresses
|
||||
and protocol addresses.
|
||||
The cache has a limited size so old and less
|
||||
frequently used entries are garbage-collected.
|
||||
Entries which are marked
|
||||
as permanent are never deleted by the garbage-collector.
|
||||
The cache can
|
||||
be directly manipulated by the use of ioctls and its behavior can be
|
||||
tuned by the sysctls defined below.
|
||||
|
||||
When there is no positive feedback for an existing mapping after some
|
||||
time (see the sysctls below), a neighbor cache entry is considered stale.
|
||||
Positive feedback can be gotten from a higher layer; for example from
|
||||
a successful TCP ACK.
|
||||
Other protocols can signal forward progress
|
||||
using the
|
||||
.B MSG_CONFIRM
|
||||
flag to
|
||||
.BR sendmsg (2).
|
||||
When there is no forward progress, ARP tries to reprobe.
|
||||
It first tries to ask a local ARP daemon
|
||||
.B app_solicit
|
||||
times for an updated MAC address.
|
||||
If that fails and an old MAC address is known, a unicast probe is sent
|
||||
.B ucast_solicit
|
||||
times.
|
||||
If that fails too it will broadcast a new ARP
|
||||
request to the network.
|
||||
Requests are only sent when there is data queued
|
||||
for sending.
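A datagram protocol that has its own acknowledgements can report forward progress as described above.
The following is a minimal sketch, assuming a connected datagram socket; the helper name send_with_confirm and its peer_acked parameter are hypothetical and only serve to illustrate where the flag goes.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send one datagram on a connected socket.  When the caller has just
   received a valid reply from the peer (peer_acked != 0), pass
   MSG_CONFIRM so that the neighbor entry is not reprobed needlessly.
   Returns the result of sendmsg(2). */
ssize_t
send_with_confirm(int fd, const void *buf, size_t len, int peer_acked)
{
    struct iovec iov;
    struct msghdr msg = { 0 };

    iov.iov_base = (void *) buf;
    iov.iov_len = len;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    return sendmsg(fd, &msg, peer_acked ? MSG_CONFIRM : 0);
}
```

The flag is only a hint; it should be set only when a reply from the peer has actually been seen.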
|
||||
|
||||
Linux will automatically add a non-permanent proxy arp entry when it
|
||||
receives a request for an address it forwards to and proxy arp is
|
||||
enabled on the receiving interface.
|
||||
When there is a reject route for the target no proxy arp entry is added.
|
||||
.SS Ioctls
|
||||
Three ioctls are available on all
|
||||
.B PF_INET
|
||||
sockets.
|
||||
They take a pointer to a
|
||||
.I struct arpreq
|
||||
as their argument.
|
||||
|
||||
.in +4n
|
||||
.nf
|
||||
struct arpreq {
|
||||
struct sockaddr arp_pa; /* protocol address */
|
||||
struct sockaddr arp_ha; /* hardware address */
|
||||
int arp_flags; /* flags */
|
||||
struct sockaddr arp_netmask; /* netmask of protocol address */
|
||||
char arp_dev[16];
|
||||
};
|
||||
.fi
|
||||
.in
|
||||
|
||||
.BR SIOCSARP ", " SIOCDARP " and " SIOCGARP
|
||||
respectively set, delete and get an ARP mapping.
|
||||
Setting and deleting ARP mappings are privileged operations and may
|
||||
only be performed by a process with the
|
||||
.B CAP_NET_ADMIN
|
||||
capability or an effective UID of 0.
|
||||
|
||||
.I arp_pa
|
||||
must be an
|
||||
.B AF_INET
|
||||
address and
|
||||
.I arp_ha
|
||||
must have the same type as the device which is specified in
|
||||
.IR arp_dev .
|
||||
.I arp_dev
|
||||
is a zero-terminated string which names a device.
|
||||
.RS
|
||||
.TS
|
||||
tab(:) allbox;
|
||||
c s
|
||||
l l.
|
||||
\fIarp_flags\fR
|
||||
flag:meaning
|
||||
ATF_COM:Lookup complete
|
||||
ATF_PERM:Permanent entry
|
||||
ATF_PUBL:Publish entry
|
||||
ATF_USETRAILERS:Trailers requested
|
||||
ATF_NETMASK:Use a netmask
|
||||
ATF_DONTPUB:Don't answer
|
||||
.TE
|
||||
.RE
|
||||
|
||||
.PP
|
||||
If the
|
||||
.B ATF_NETMASK
|
||||
flag is set, then
|
||||
.I arp_netmask
|
||||
should be valid.
|
||||
Linux 2.2 does not support proxy network ARP entries, so this
|
||||
should be set to 0xffffffff, or 0 to remove an existing proxy arp entry.
|
||||
.B ATF_USETRAILERS
|
||||
is obsolete and should not be used.
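The ioctl interface described in this section can be sketched as follows.
This is an illustration only: the helper names fill_arpreq and get_arp_entry are hypothetical, and the choice of headers assumes glibc, where struct arpreq is declared in <net/if_arp.h>.

```c
#include <arpa/inet.h>
#include <net/if_arp.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Prepare *req for the IPv4 address ip (dotted-quad text) on the
   device named dev.  Returns 0 on success, or -1 if ip does not parse
   or dev does not fit into arp_dev. */
int
fill_arpreq(struct arpreq *req, const char *ip, const char *dev)
{
    struct sockaddr_in *sin = (struct sockaddr_in *) &req->arp_pa;

    memset(req, 0, sizeof(*req));
    sin->sin_family = AF_INET;          /* arp_pa must be AF_INET */
    if (inet_pton(AF_INET, ip, &sin->sin_addr) != 1)
        return -1;
    if (strlen(dev) >= sizeof(req->arp_dev))
        return -1;
    strcpy(req->arp_dev, dev);          /* zero-terminated device name */
    return 0;
}

/* Look up the mapping through any PF_INET socket fd.
   Returns the result of ioctl(2). */
int
get_arp_entry(int fd, struct arpreq *req)
{
    return ioctl(fd, SIOCGARP, req);
}
```

On success, arp_ha would hold the hardware address and arp_flags the flags listed in the table above.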
|
||||
.SS Sysctls
|
||||
ARP supports a sysctl interface to configure parameters on a global
|
||||
or per-interface basis.
|
||||
The sysctls can be accessed by reading or writing the
|
||||
.I /proc/sys/net/ipv4/neigh/*/*
|
||||
files or with the
|
||||
.BR sysctl (2)
|
||||
interface.
|
||||
Each interface in the system has its own directory in
|
||||
.IR /proc/sys/net/ipv4/neigh/ .
|
||||
The setting in the "default" directory is used for all newly created
|
||||
devices.
|
||||
Unless otherwise specified, time-related sysctls are specified
|
||||
in seconds.
|
||||
.TP
|
||||
.B anycast_delay
|
||||
The maximum number of jiffies to delay before replying to an
|
||||
IPv6 neighbor solicitation message.
|
||||
Anycast support is not yet implemented.
|
||||
Defaults to 1 second.
|
||||
.TP
|
||||
.B app_solicit
|
||||
The maximum number of probes to send to the user space ARP daemon via
|
||||
netlink before dropping back to multicast probes (see
|
||||
.IR mcast_solicit ).
|
||||
Defaults to 0.
|
||||
.TP
|
||||
.B base_reachable_time
|
||||
Once a neighbor has been found, the entry is considered to be valid
|
||||
for at least a random value between
|
||||
.IR base_reachable_time "/2 and 3*" base_reachable_time /2.
|
||||
An entry's validity will be extended if it receives positive feedback
|
||||
from higher level protocols.
|
||||
Defaults to 30 seconds.
|
||||
.TP
|
||||
.B delay_first_probe_time
|
||||
Delay before first probe after it has been decided that a neighbor
|
||||
is stale.
|
||||
Defaults to 5 seconds.
|
||||
.TP
|
||||
.B gc_interval
|
||||
How frequently the garbage collector for neighbor entries
|
||||
should attempt to run.
|
||||
Defaults to 30 seconds.
|
||||
.TP
|
||||
.B gc_stale_time
|
||||
Determines how often to check for stale neighbor entries.
|
||||
When a neighbor entry is considered stale it is resolved again before
|
||||
sending data to it.
|
||||
Defaults to 60 seconds.
|
||||
.TP
|
||||
.B gc_thresh1
|
||||
The minimum number of entries to keep in the ARP cache.
|
||||
The garbage collector will not run if there are fewer than
|
||||
this number of entries in the cache.
|
||||
Defaults to 128.
|
||||
.TP
|
||||
.B gc_thresh2
|
||||
The soft maximum number of entries to keep in the ARP cache.
|
||||
The garbage collector will allow the number of entries to exceed
|
||||
this for 5 seconds before collection will be performed.
|
||||
Defaults to 512.
|
||||
.TP
|
||||
.B gc_thresh3
|
||||
The hard maximum number of entries to keep in the ARP cache.
|
||||
The garbage collector will always run if there are more than
|
||||
this number of entries in the cache.
|
||||
Defaults to 1024.
|
||||
.TP
|
||||
.B locktime
|
||||
The minimum number of jiffies to keep an ARP entry in the cache.
|
||||
This prevents ARP cache thrashing if there is more than one potential
|
||||
mapping (generally due to network misconfiguration).
|
||||
Defaults to 1 second.
|
||||
.TP
|
||||
.B mcast_solicit
|
||||
The maximum number of attempts to resolve an address by
|
||||
multicast/broadcast before marking the entry as unreachable.
|
||||
Defaults to 3.
|
||||
.TP
|
||||
.B proxy_delay
|
||||
When an ARP request for a known proxy-ARP address is received, delay up to
|
||||
.I proxy_delay
|
||||
jiffies before replying.
|
||||
This is used to prevent network flooding in some cases.
|
||||
Defaults to 0.8 seconds.
|
||||
.TP
|
||||
.B proxy_qlen
|
||||
The maximum number of packets which may be queued to proxy-ARP addresses.
|
||||
Defaults to 64.
|
||||
.TP
|
||||
.B retrans_time
|
||||
The number of jiffies to delay before retransmitting a request.
|
||||
Defaults to 1 second.
|
||||
.TP
|
||||
.B ucast_solicit
|
||||
The maximum number of attempts to send unicast probes before asking
|
||||
the ARP daemon (see
|
||||
.IR app_solicit ).
|
||||
Defaults to 3.
|
||||
.TP
|
||||
.B unres_qlen
|
||||
The maximum number of packets which may be queued for each unresolved
|
||||
address by other network layers.
|
||||
Defaults to 3.
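Reading one of the sysctls listed above through the /proc files can be sketched as follows.
The helper names neigh_sysctl_path and read_neigh_sysctl are hypothetical; the sketch assumes that the file holds a single decimal integer.

```c
#include <stdio.h>

/* Build the /proc path of the sysctl called name for the interface
   directory ifname (for example "default" or "eth0").
   Returns 0 on success, or -1 if the path does not fit into buf. */
int
neigh_sysctl_path(char *buf, size_t size, const char *ifname,
                  const char *name)
{
    int n = snprintf(buf, size, "/proc/sys/net/ipv4/neigh/%s/%s",
                     ifname, name);

    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/* Read an integer-valued sysctl into *value.
   Returns 0 on success, or -1 if the file cannot be opened or parsed. */
int
read_neigh_sysctl(const char *ifname, const char *name, long *value)
{
    char path[256];
    FILE *fp;
    int ok;

    if (neigh_sysctl_path(path, sizeof(path), ifname, name) == -1)
        return -1;
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    ok = (fscanf(fp, "%ld", value) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}
```

For example, read_neigh_sysctl("default", "gc_thresh1", &v) would fetch the value applied to newly created devices.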
|
||||
.SH VERSIONS
|
||||
The
|
||||
.I struct arpreq
|
||||
changed in Linux 2.0 to include the
|
||||
.I arp_dev
|
||||
member and the ioctl numbers changed at the same time.
|
||||
Support for the old ioctls was dropped in Linux 2.2.
|
||||
|
||||
Support for proxy arp entries for networks (netmask not equal to 0xffffffff)
|
||||
was dropped in Linux 2.2.
|
||||
It is replaced by automatic proxy arp setup by
|
||||
the kernel for all reachable hosts on other interfaces (when
|
||||
forwarding and proxy arp are enabled for the interface).
|
||||
|
||||
The
|
||||
.I neigh/*
|
||||
sysctls did not exist before Linux 2.2.
|
||||
.SH BUGS
|
||||
Some timer settings are specified in jiffies, which is architecture-
|
||||
and kernel version-dependent; see
|
||||
.BR time (7).
|
||||
|
||||
There is no way to signal positive feedback from user space.
|
||||
This means connection oriented protocols implemented in user space
|
||||
will generate excessive ARP traffic, because ndisc will regularly
|
||||
reprobe the MAC address.
|
||||
The same problem applies for some kernel protocols (e.g., NFS over UDP).
|
||||
|
||||
This man page mashes together IPv4-specific functionality and
|
||||
functionality shared between IPv4 and IPv6.
|
||||
.SH "SEE ALSO"
|
||||
.BR capabilities (7),
|
||||
.BR ip (7)
|
||||
.PP
|
||||
RFC\ 826 for a description of ARP.
|
||||
.br
|
||||
RFC\ 2461 for a description of IPv6 neighbor discovery and the base
|
||||
algorithms used.
|
||||
.LP
|
||||
Linux 2.2+ IPv4 ARP uses the IPv6 algorithms when applicable.
|
||||
|
|
259
man7/ddp.7
|
@ -1,8 +1,251 @@
|
|||
.TH DDP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.\" This man page is Copyright (C) 1998 Alan Cox.
|
||||
.\" Permission is granted to distribute possibly modified copies
|
||||
.\" of this page provided the header is included verbatim,
|
||||
.\" and in case of nontrivial modification author and date
|
||||
.\" of the modification is added to the header.
|
||||
.\" $Id: ddp.7,v 1.3 1999/05/13 11:33:22 freitag Exp $
|
||||
.TH DDP 7 1999-05-01 "Linux" "Linux Programmer's Manual"
|
||||
.SH NAME
|
||||
ddp \- Linux AppleTalk protocol implementation
|
||||
.SH SYNOPSIS
|
||||
.B #include <sys/socket.h>
|
||||
.br
|
||||
.B #include <netatalk/at.h>
|
||||
.sp
|
||||
.IB ddp_socket " = socket(PF_APPLETALK, SOCK_DGRAM, 0);"
|
||||
.br
|
||||
.IB raw_socket " = socket(PF_APPLETALK, SOCK_RAW, " protocol ");"
|
||||
.SH DESCRIPTION
|
||||
Linux implements the AppleTalk protocols described in
|
||||
.IR "Inside AppleTalk" .
|
||||
Only the DDP layer and AARP are present in
|
||||
the kernel.
|
||||
They are designed to be used via the
|
||||
.B netatalk
|
||||
protocol
|
||||
libraries.
|
||||
This page documents the interface for those who wish or need to
|
||||
use the DDP layer directly.
|
||||
.PP
|
||||
The communication between AppleTalk and the user program works using a
|
||||
BSD-compatible socket interface.
|
||||
For more information on sockets, see
|
||||
.BR socket (7).
|
||||
.PP
|
||||
An AppleTalk socket is created by calling the
|
||||
.BR socket (2)
|
||||
function with a
|
||||
.B PF_APPLETALK
|
||||
socket family argument.
|
||||
Valid socket types are
|
||||
.B SOCK_DGRAM
|
||||
to open a
|
||||
.B ddp
|
||||
socket or
|
||||
.B SOCK_RAW
|
||||
to open a
|
||||
.B raw
|
||||
socket.
|
||||
.I protocol
|
||||
is the AppleTalk protocol to be received or sent.
|
||||
For
|
||||
.B SOCK_RAW
|
||||
you must specify
|
||||
.BR ATPROTO_DDP .
|
||||
.PP
|
||||
Raw sockets may be opened only by a process with effective user ID 0
|
||||
or when the process has the
|
||||
.B CAP_NET_RAW
|
||||
capability.
|
||||
.SS "Address Format"
|
||||
An AppleTalk socket address is defined as a combination of a network number,
|
||||
a node number, and a port number.
|
||||
.PP
|
||||
.in +4n
|
||||
.nf
|
||||
struct at_addr {
|
||||
unsigned short s_net;
|
||||
unsigned char s_node;
|
||||
};
|
||||
|
||||
struct sockaddr_atalk {
|
||||
sa_family_t sat_family; /* address family */
|
||||
unsigned char sat_port; /* port */
|
||||
struct at_addr sat_addr; /* net/node */
|
||||
};
|
||||
.fi
|
||||
.in
|
||||
.PP
|
||||
.I sat_family
|
||||
is always set to
|
||||
.BR AF_APPLETALK .
|
||||
.I sat_port
|
||||
contains the port.
|
||||
The port numbers below 129 are known as
|
||||
.IR "reserved ports" .
|
||||
Only processes with the effective user ID 0 or the
|
||||
.B CAP_NET_BIND_SERVICE
|
||||
capability may
|
||||
.BR bind (2)
|
||||
to these sockets.
|
||||
.I sat_addr
|
||||
is the host address.
|
||||
The
|
||||
.I s_net
|
||||
member of
|
||||
.I struct at_addr
|
||||
contains the host network in network byte order.
|
||||
The value of
|
||||
.B AT_ANYNET
|
||||
is a
|
||||
wildcard and also implies \(lqthis network.\(rq
|
||||
The
|
||||
.I s_node
|
||||
member of
|
||||
.I struct at_addr
|
||||
contains the host node number.
|
||||
The value of
|
||||
.B AT_ANYNODE
|
||||
is a
|
||||
wildcard and also implies \(lqthis node.\(rq The value of
|
||||
.B ATADDR_BCAST
|
||||
is a link
|
||||
local broadcast address.
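Filling in an address of the form described in this section can be sketched as follows.
The two structures are re-declared locally, copied from the listing above, so that the sketch does not depend on <netatalk/at.h>; real programs would take the declarations from that header instead, where names may differ.
The helper names fill_ddp_addr and ddp_port_is_reserved are hypothetical.

```c
#include <arpa/inet.h>      /* htons() */
#include <string.h>
#include <sys/socket.h>     /* AF_APPLETALK, sa_family_t */

/* Local copies of the structures shown above (assumption: the
   system header is not included in the same file). */
struct at_addr {
    unsigned short s_net;
    unsigned char  s_node;
};

struct sockaddr_atalk {
    sa_family_t    sat_family;  /* address family */
    unsigned char  sat_port;    /* port */
    struct at_addr sat_addr;    /* net/node */
};

/* Fill *sat from host-order values; the network number is stored
   in network byte order. */
void
fill_ddp_addr(struct sockaddr_atalk *sat, unsigned short net,
              unsigned char node, unsigned char port)
{
    memset(sat, 0, sizeof(*sat));
    sat->sat_family = AF_APPLETALK;
    sat->sat_port = port;
    sat->sat_addr.s_net = htons(net);
    sat->sat_addr.s_node = node;
}

/* Ports below 129 are reserved; binding to them needs privilege. */
int
ddp_port_is_reserved(unsigned char port)
{
    return port < 129;
}
```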
|
||||
.\" FIXME this doesn't make sense [johnl]
|
||||
.SS "Socket Options"
|
||||
No protocol-specific socket options are supported.
|
||||
.SS Sysctls
|
||||
DDP supports a sysctl interface to configure some global AppleTalk
|
||||
parameters.
|
||||
The sysctls can be accessed by reading or writing the
|
||||
.I /proc/sys/net/atalk/*
|
||||
files or with the
|
||||
.BR sysctl (2)
|
||||
interface.
|
||||
.TP
|
||||
.B aarp-expiry-time
|
||||
The time interval (in seconds) before an AARP cache entry expires.
|
||||
.TP
|
||||
.B aarp-resolve-time
|
||||
The time interval (in seconds) before an AARP cache entry is resolved.
|
||||
.TP
|
||||
.B aarp-retransmit-limit
|
||||
The number of retransmissions of an AARP query before the node is declared
|
||||
dead.
|
||||
.TP
|
||||
.B aarp-tick-time
|
||||
The timer rate (in seconds) for the timer driving AARP.
|
||||
.PP
|
||||
The default values match the specification and should never need to be
|
||||
changed.
|
||||
.SS Ioctls
|
||||
All ioctls described in
|
||||
.BR socket (7)
|
||||
apply to ddp.
|
||||
.\" FIXME Add a section about multicasting
|
||||
.SH ERRORS
|
||||
.\" FIXME document all errors. We should really fix the kernels to
|
||||
.\" give more uniform error returns (ENOMEM vs ENOBUFS, EPERM vs
|
||||
.\" EACCES etc.)
|
||||
.TP
|
||||
.B EACCES
|
||||
The user tried to execute an operation without the necessary permissions.
|
||||
These include sending to a broadcast address without
|
||||
having the broadcast flag set,
|
||||
and trying to bind to a reserved port without effective user ID 0 or
|
||||
.BR CAP_NET_BIND_SERVICE .
|
||||
.TP
|
||||
.B EADDRINUSE
|
||||
Tried to bind to an address already in use.
|
||||
.TP
|
||||
.B EADDRNOTAVAIL
|
||||
A nonexistent interface was requested or the requested source address was
|
||||
not local.
|
||||
.TP
|
||||
.B EAGAIN
|
||||
Operation on a non-blocking socket would block.
|
||||
.TP
|
||||
.B EALREADY
|
||||
A connection operation on a non-blocking socket is already in progress.
|
||||
.TP
|
||||
.B ECONNABORTED
|
||||
A connection was closed during an
|
||||
.BR accept (2).
|
||||
.TP
|
||||
.B EHOSTUNREACH
|
||||
No routing table entry matches the destination address.
|
||||
.TP
|
||||
.B EINVAL
|
||||
Invalid argument passed.
|
||||
.TP
|
||||
.B EISCONN
|
||||
.BR connect (2)
|
||||
was called on an already connected socket.
|
||||
.TP
|
||||
.B EMSGSIZE
|
||||
Datagram is bigger than the DDP MTU.
|
||||
.TP
|
||||
.B ENODEV
|
||||
Network device not available or not capable of sending DDP.
|
||||
.TP
|
||||
.B ENOENT
|
||||
.B SIOCGSTAMP
|
||||
was called on a socket where no packet arrived.
|
||||
.TP
|
||||
.BR ENOMEM " and " ENOBUFS
|
||||
Not enough memory available.
|
||||
.TP
|
||||
.B ENOPKG
|
||||
A kernel subsystem was not configured.
|
||||
.TP
|
||||
.BR ENOPROTOOPT " and " EOPNOTSUPP
|
||||
Invalid socket option passed.
|
||||
.TP
|
||||
.B ENOTCONN
|
||||
The operation is only defined on a connected socket, but the socket wasn't
|
||||
connected.
|
||||
.TP
|
||||
.B EPERM
|
||||
User doesn't have permission to set high priority,
|
||||
make a configuration change,
|
||||
or send signals to the requested process or group.
|
||||
.TP
|
||||
.B EPIPE
|
||||
The connection was unexpectedly closed or shut down by the other end.
|
||||
.TP
|
||||
.B ESOCKTNOSUPPORT
|
||||
The socket was unconfigured, or an unknown socket type was requested.
|
||||
.SH VERSIONS
|
||||
AppleTalk is supported by Linux 2.0 or higher.
|
||||
The
|
||||
.B sysctl
|
||||
interface is
|
||||
new in Linux 2.2.
|
||||
.SH NOTES
|
||||
Be very careful with the
|
||||
.B SO_BROADCAST
|
||||
option \- it is not privileged in Linux.
|
||||
It is easy to overload the network
|
||||
with careless sending to broadcast addresses.
|
||||
.SS Compatibility
|
||||
The basic AppleTalk socket interface is compatible with
|
||||
.B netatalk
|
||||
on BSD-derived systems.
|
||||
Many BSD systems fail to check
|
||||
.B SO_BROADCAST
|
||||
when sending broadcast frames; this can lead to compatibility problems.
|
||||
.PP
|
||||
The
|
||||
raw
|
||||
socket mode is unique to Linux and exists to support the alternative CAP
|
||||
package and AppleTalk monitoring tools more easily.
|
||||
.SH BUGS
|
||||
There are too many inconsistent error values.
|
||||
.PP
|
||||
The ioctls used to configure routing tables, devices,
|
||||
AARP tables and other devices are not yet described.
|
||||
.SH "SEE ALSO"
|
||||
.BR recvmsg (2),
|
||||
.BR sendmsg (2),
|
||||
.BR capabilities (7),
|
||||
.BR socket (7)
|
||||
|
|
335
man7/ipv6.7
|
@ -1,8 +1,327 @@
|
|||
.TH IPV6 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.\" This man page is Copyright (C) 2000 Andi Kleen <ak@muc.de>.
|
||||
.\" Permission is granted to distribute possibly modified copies
|
||||
.\" of this page provided the header is included verbatim,
|
||||
.\" and in case of nontrivial modification author and date
|
||||
.\" of the modification is added to the header.
|
||||
.\" $Id: ipv6.7,v 1.3 2000/12/20 18:10:31 ak Exp $
|
||||
.TH IPV6 7 2008-07-17 "Linux" "Linux Programmer's Manual"
|
||||
.SH NAME
|
||||
ipv6, PF_INET6 \- Linux IPv6 protocol implementation
|
||||
.SH SYNOPSIS
|
||||
.B #include <sys/socket.h>
|
||||
.br
|
||||
.B #include <netinet/in.h>
|
||||
.sp
|
||||
.IB tcp6_socket " = socket(PF_INET6, SOCK_STREAM, 0);"
|
||||
.br
|
||||
.IB raw6_socket " = socket(PF_INET6, SOCK_RAW, " protocol ");"
|
||||
.br
|
||||
.IB udp6_socket " = socket(PF_INET6, SOCK_DGRAM, " protocol ");"
|
||||
.SH DESCRIPTION
|
||||
Linux 2.2 optionally implements the Internet Protocol, version 6.
|
||||
This man page contains a description of the IPv6 basic API as
|
||||
implemented by the Linux kernel and glibc 2.1.
|
||||
The interface
|
||||
is based on the BSD sockets interface; see
|
||||
.BR socket (7).
|
||||
.PP
|
||||
The IPv6 API aims to be mostly compatible with the
|
||||
.BR ip (7)
|
||||
v4 API.
|
||||
Only differences are described in this man page.
|
||||
.PP
|
||||
To bind an
|
||||
.B AF_INET6
|
||||
socket to any process, the local address should be copied from the
|
||||
.I in6addr_any
|
||||
variable which has
|
||||
.I in6_addr
|
||||
type.
|
||||
In static initializations
|
||||
.B IN6ADDR_ANY_INIT
|
||||
may also be used, which expands to a constant expression.
|
||||
Both of them are in network byte order.
|
||||
.PP
|
||||
The IPv6 loopback address (::1) is available in the global
|
||||
.I in6addr_loopback
|
||||
variable.
|
||||
For initializations
|
||||
.B IN6ADDR_LOOPBACK_INIT
|
||||
should be used.
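The two ways of obtaining these special addresses can be sketched as follows.
The names fill_any_addr6 and example_loopback are hypothetical and are used only for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Static initialization: the macro expands to a constant expression. */
const struct in6_addr example_loopback = IN6ADDR_LOOPBACK_INIT;

/* Run-time initialization: copy the wildcard address from the
   in6addr_any variable.  port is given in host byte order. */
void
fill_any_addr6(struct sockaddr_in6 *sa, unsigned short port)
{
    memset(sa, 0, sizeof(*sa));
    sa->sin6_family = AF_INET6;
    sa->sin6_port = htons(port);
    sa->sin6_addr = in6addr_any;
}
```

The filled structure would then be passed to bind(2), cast to struct sockaddr *, with sizeof(struct sockaddr_in6) as the length.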
|
||||
.PP
|
||||
IPv4 connections can be handled with the v6 API by using the
|
||||
v4-mapped-on-v6 address type;
|
||||
thus a program needs to support only this API type to
|
||||
support both protocols.
|
||||
This is handled transparently by the address
|
||||
handling functions in libc.
|
||||
.PP
|
||||
IPv4 and IPv6 share the local port space.
|
||||
When you get an IPv4 connection
|
||||
or packet to an IPv6 socket, its source address will be mapped
|
||||
to v6.
|
||||
.SS "Address Format"
|
||||
.in +4n
|
||||
.nf
|
||||
struct sockaddr_in6 {
|
||||
uint16_t sin6_family; /* AF_INET6 */
|
||||
uint16_t sin6_port; /* port number */
|
||||
uint32_t sin6_flowinfo; /* IPv6 flow information */
|
||||
struct in6_addr sin6_addr; /* IPv6 address */
|
||||
uint32_t sin6_scope_id; /* Scope ID (new in 2.4) */
|
||||
};
|
||||
|
||||
struct in6_addr {
|
||||
unsigned char s6_addr[16]; /* IPv6 address */
|
||||
};
|
||||
.fi
|
||||
.in
|
||||
.sp
|
||||
.I sin6_family
|
||||
is always set to
|
||||
.BR AF_INET6 ;
|
||||
.I sin6_port
|
||||
is the protocol port (see
|
||||
.I sin_port
|
||||
in
|
||||
.BR ip (7));
|
||||
.I sin6_flowinfo
|
||||
is the IPv6 flow identifier;
|
||||
.I sin6_addr
|
||||
is the 128-bit IPv6 address.
|
||||
.I sin6_scope_id
|
||||
is an ID depending on the scope of the address.
|
||||
It is new in Linux 2.4.
|
||||
Linux only supports it for link scope addresses; in that case
|
||||
.I sin6_scope_id
|
||||
contains the interface index (see
|
||||
.BR netdevice (7)).
|
||||
.PP
|
||||
IPv6 supports several address types: unicast to address a single
|
||||
host, multicast to address a group of hosts,
|
||||
anycast to address the nearest member of a group of hosts
|
||||
(not implemented in Linux), IPv4-on-IPv6 to
|
||||
address an IPv4 host, and other reserved address types.
|
||||
.PP
|
||||
The address notation for IPv6 is a group of 8 4-digit hexadecimal
|
||||
numbers, separated with a \(aq:\(aq.
|
||||
\&"::" stands for a string of 0 bits.
|
||||
Special addresses are ::1 for loopback and ::FFFF:<IPv4 address>
|
||||
for IPv4-mapped-on-IPv6.
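The layout of an IPv4-mapped address can be sketched as follows: 80 zero bits, 16 one bits, then the 32-bit IPv4 address.
The helper name make_v4mapped is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Build ::FFFF:<IPv4 address> from dotted-quad text.
   Returns 0 on success, or -1 if ipv4_text does not parse. */
int
make_v4mapped(const char *ipv4_text, struct in6_addr *out)
{
    struct in_addr v4;

    if (inet_pton(AF_INET, ipv4_text, &v4) != 1)
        return -1;

    memset(out, 0, sizeof(*out));        /* bytes 0..9 stay zero */
    out->s6_addr[10] = 0xff;
    out->s6_addr[11] = 0xff;
    /* v4 is already in network byte order */
    memcpy(&out->s6_addr[12], &v4, sizeof(v4));
    return 0;
}
```

The result is the same address that inet_pton(3) would produce for the text form ::ffff:192.0.2.1, and IN6_IS_ADDR_V4MAPPED() can be used to recognize it.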
|
||||
.PP
|
||||
The port space of IPv6 is shared with IPv4.
|
||||
.SS "Socket Options"
|
||||
IPv6 supports some protocol-specific socket options that can be set with
|
||||
.BR setsockopt (2)
|
||||
and read with
|
||||
.BR getsockopt (2).
|
||||
The socket option level for IPv6 is
|
||||
.BR IPPROTO_IPV6 .
|
||||
A boolean integer flag is zero when it is false, otherwise true.
|
||||
.TP
|
||||
.B IPV6_ADDRFORM
|
||||
Turn an
|
||||
.B AF_INET6
|
||||
socket into a socket of a different address family.
|
||||
Only
|
||||
.B AF_INET
|
||||
is currently supported for that.
|
||||
It is only allowed for IPv6 sockets
|
||||
that are connected and bound to a v4-mapped-on-v6 address.
|
||||
The argument is a pointer to an integer containing
|
||||
.BR AF_INET .
|
||||
This is useful to pass v4-mapped sockets as file descriptors to
|
||||
programs that don't know how to deal with the IPv6 API.
|
||||
.TP
|
||||
.B IPV6_ADD_MEMBERSHIP, IPV6_DROP_MEMBERSHIP
|
||||
Control membership in multicast groups.
|
||||
Argument is a pointer to a
|
||||
.I struct ipv6_mreq
|
||||
structure.
|
||||
.\" FIXME IPV6_CHECKSUM is not documented, and probably should be
|
||||
.\" FIXME IPV6_JOIN_ANYCAST is not documented, and probably should be
|
||||
.\" FIXME IPV6_LEAVE_ANYCAST is not documented, and probably should be
|
||||
.\" FIXME IPV6_RECVPKTINFO is not documented, and probably should be
|
||||
.\" FIXME IPV6_2292PKTINFO is not documented, and probably should be
|
||||
.\" FIXME there are probably many other IPV6_* socket options that
|
||||
.\" should be documented
|
||||
.TP
|
||||
.B IPV6_MTU
|
||||
Set the MTU to be used for the socket.
|
||||
The MTU is limited by the device
|
||||
MTU or the path MTU when path MTU discovery is enabled.
|
||||
Argument is a pointer to an integer.
|
||||
.TP
|
||||
.B IPV6_MTU_DISCOVER
|
||||
Control path MTU discovery on the socket.
|
||||
See
|
||||
.B IP_MTU_DISCOVER
|
||||
in
|
||||
.BR ip (7)
|
||||
for details.
|
||||
.TP
|
||||
.B IPV6_MULTICAST_HOPS
|
||||
Set the multicast hop limit for the socket.
|
||||
Argument is a pointer to an
|
||||
integer.
|
||||
\-1 in the value means use the route default, otherwise it should be
|
||||
between 0 and 255.
|
||||
.TP
|
||||
.B IPV6_MULTICAST_IF
|
||||
Set the device for outgoing multicast packets on the socket.
|
||||
This is only allowed
|
||||
for
|
||||
.B SOCK_DGRAM
|
||||
and
|
||||
.B SOCK_RAW
|
||||
sockets.
|
||||
The argument is a pointer to an interface index (see
|
||||
.BR netdevice (7))
|
||||
in an integer.
|
||||
.TP
|
||||
.B IPV6_MULTICAST_LOOP
|
||||
Control whether the socket sees multicast packets that it has sent itself.
|
||||
Argument is a pointer to boolean.
|
||||
.TP
|
||||
.B IPV6_PKTINFO
|
||||
Set delivery of the
|
||||
.B IPV6_PKTINFO
|
||||
control message on incoming datagrams.
|
||||
Only allowed for
|
||||
.B SOCK_DGRAM
|
||||
or
|
||||
.B SOCK_RAW
|
||||
sockets.
|
||||
Argument is a pointer to a boolean value in an integer.
|
||||
.TP
|
||||
.nh
|
||||
.B IPV6_RTHDR, IPV6_AUTHHDR, IPV6_DSTOPTS, IPV6_HOPOPTS, IPV6_FLOWINFO, IPV6_HOPLIMIT
|
||||
.hy
|
||||
Set delivery of control messages for incoming datagrams containing
|
||||
extension headers from the received packet.
|
||||
.B IPV6_RTHDR
|
||||
delivers the routing header,
|
||||
.B IPV6_AUTHHDR
|
||||
delivers the authentication header,
|
||||
.B IPV6_DSTOPTS
|
||||
delivers the destination options,
|
||||
.B IPV6_HOPOPTS
|
||||
delivers the hop options,
|
||||
.B IPV6_FLOWINFO
|
||||
delivers an integer containing the flow ID,
|
||||
.B IPV6_HOPLIMIT
|
||||
delivers an integer containing the hop count of the packet.
|
||||
The control messages have the same type as the socket option.
|
||||
All these header options can also be set for outgoing packets
|
||||
by putting the appropriate control message into the control buffer of
|
||||
.BR sendmsg (2).
|
||||
Only allowed for
|
||||
.B SOCK_DGRAM
|
||||
or
|
||||
.B SOCK_RAW
|
||||
sockets.
|
||||
Argument is a pointer to a boolean value.
|
||||
.TP
|
||||
.B IPV6_RECVERR
|
||||
Control receiving of asynchronous error options.
|
||||
See
|
||||
.B IP_RECVERR
|
||||
in
|
||||
.BR ip (7)
|
||||
for details.
|
||||
Argument is a pointer to boolean.
|
||||
.TP
|
||||
.B IPV6_ROUTER_ALERT
|
||||
Pass forwarded packets containing a router alert hop-by-hop option to
|
||||
this socket.
|
||||
Only allowed for SOCK_RAW sockets.
|
||||
The tapped packets are not forwarded by the kernel; it is the
|
||||
user's responsibility to send them out again.
|
||||
Argument is a pointer to an integer.
|
||||
A positive integer indicates a router alert option value to intercept.
|
||||
Packets carrying a router alert option with a value field containing
|
||||
this integer will be delivered to the socket.
|
||||
A negative integer disables delivery of packets with router alert options
|
||||
to this socket.
|
||||
.TP
|
||||
.B IPV6_UNICAST_HOPS
|
||||
Set the unicast hop limit for the socket.
|
||||
Argument is a pointer to an integer.
|
||||
\-1 in the value means use the route default,
|
||||
otherwise it should be between 0 and 255.
|
||||
.TP
|
||||
.BR IPV6_V6ONLY " (since Linux 2.4.21 and 2.6)"
|
||||
.\" See RFC 3493
|
||||
If this flag is set to true (non-zero), then the socket is restricted
|
||||
to sending and receiving IPv6 packets only.
|
||||
In this case, an IPv4 and an IPv6 application can bind
|
||||
to a single port at the same time.
|
||||
|
||||
If this flag is set to false (zero),
|
||||
then the socket can be used to send and receive packets
|
||||
to and from an IPv6 address or an IPv4-mapped IPv6 address.
|
||||
|
||||
The argument is a pointer to a boolean value in an integer.
|
||||
|
||||
The default value for this flag is defined by the contents of the file
|
||||
.IR /proc/sys/net/ipv6/bindv6only .
|
||||
The default value for that file is 0 (false).
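Setting and reading one of the boolean options above can be sketched as follows, using IPV6_V6ONLY as the example.
The helper names set_v6only and get_v6only are hypothetical.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Turn IPV6_V6ONLY on (on != 0) or off (on == 0) for the socket fd.
   The argument is a boolean value in an integer.
   Returns the result of setsockopt(2). */
int
set_v6only(int fd, int on)
{
    return setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on));
}

/* Return 1 or 0 according to the current setting, or -1 on error. */
int
get_v6only(int fd)
{
    int on = 0;
    socklen_t len = sizeof(on);

    if (getsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, &len) == -1)
        return -1;
    return on != 0;
}
```

A server that wants separate IPv4 and IPv6 listeners on the same port would typically set the flag before calling bind(2).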
|
||||
.\" FLOWLABEL_MGR, FLOWINFO_SEND
|
||||
.SH VERSIONS
|
||||
The older
|
||||
.I libinet6
|
||||
libc5 based IPv6 API implementation for Linux is not described here
|
||||
and may vary in details.
|
||||
.PP
|
||||
Linux 2.4 will break binary compatibility for the
|
||||
.I sockaddr_in6
|
||||
for 64-bit
|
||||
hosts by changing the alignment of
|
||||
.I in6_addr
|
||||
and adding an additional
|
||||
.I sin6_scope_id
|
||||
field.
|
||||
The kernel interfaces stay compatible, but a program including
|
||||
.I sockaddr_in6
|
||||
or
|
||||
.I in6_addr
|
||||
into other structures may not be.
|
||||
This is not
|
||||
a problem for 32-bit hosts like i386.
|
||||
.PP
|
||||
The
|
||||
.I sin6_flowinfo
|
||||
field is new in Linux 2.4.
|
||||
It is transparently passed/read by the kernel
|
||||
when the passed address length contains it.
|
||||
Some programs that pass a longer address buffer and then
|
||||
check the outgoing address length may break.
|
||||
.SH "NOTES"
|
||||
The
|
||||
.I sockaddr_in6
|
||||
structure is bigger than the generic
|
||||
.IR sockaddr .
|
||||
Programs that assume that all address types can be stored safely in a
|
||||
.I struct sockaddr
|
||||
need to be changed to use
|
||||
.I struct sockaddr_storage
|
||||
for that instead.
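The size relationship described above can be sketched as follows.
The helper name sockaddr_len is hypothetical; it shows one way to recover the real address length from a struct sockaddr_storage that was filled by, for example, accept(2).

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Return the size of the address stored in *ss according to its
   family, or 0 if the family is not handled by this sketch. */
socklen_t
sockaddr_len(const struct sockaddr_storage *ss)
{
    switch (ss->ss_family) {
    case AF_INET:
        return sizeof(struct sockaddr_in);
    case AF_INET6:
        return sizeof(struct sockaddr_in6);     /* larger than sockaddr */
    default:
        return 0;
    }
}
```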
|
||||
.SH BUGS
|
||||
The IPv6 extended API as in RFC\ 2292 is currently only partly
|
||||
implemented;
|
||||
although the 2.2 kernel has near complete support for receiving options,
|
||||
the macros for generating IPv6 options are missing in glibc 2.1.
|
||||
.PP
|
||||
IPSec support for EH and AH headers is missing.
|
||||
.PP
|
||||
Flow label management is not complete and not documented here.
|
||||
.PP
|
||||
This man page is not complete.
|
||||
.SH "SEE ALSO"
|
||||
.BR cmsg (3),
|
||||
.BR ip (7)
|
||||
.PP
|
||||
RFC\ 2553: IPv6 BASIC API.
|
||||
Linux tries to be compliant with this.
|
||||
.PP
|
||||
RFC\ 2460: IPv6 specification.
|
||||
|
|
468
man7/netlink.7
|
@ -1,8 +1,460 @@
|
|||
.TH NETLINK 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
'\" t
|
||||
.\" Don't change the first line, it tells man that tbl is needed.
|
||||
.\" This man page is Copyright (c) 1998 by Andi Kleen. Subject to the GPL.
|
||||
.\" Based on the original comments from Alexey Kuznetsov
|
||||
.\" Modified 2005-12-27 by Hasso Tepper <hasso@estpak.ee>
|
||||
.\" $Id: netlink.7,v 1.8 2000/06/22 13:23:00 ak Exp $
|
||||
.TH NETLINK 7 2005-12-27 "Linux" "Linux Programmer's Manual"
|
||||
.SH NAME
|
||||
netlink \- Communication between kernel and userspace (PF_NETLINK)
|
||||
.SH SYNOPSIS
|
||||
.nf
|
||||
.B #include <asm/types.h>
|
||||
.B #include <sys/socket.h>
|
||||
.B #include <linux/netlink.h>
|
||||
|
||||
.BI "netlink_socket = socket(PF_NETLINK, " socket_type ", " netlink_family );
|
||||
.fi
|
||||
.SH DESCRIPTION
|
||||
Netlink is used to transfer information between kernel and
|
||||
userspace processes.
|
||||
It consists of a standard sockets-based interface for userspace
|
||||
processes and an internal kernel API for kernel modules.
|
||||
The internal kernel interface is not documented in this manual page.
|
||||
There is also an obsolete netlink interface
|
||||
via netlink character devices; this interface is not documented here
|
||||
and is only provided for backwards compatibility.
|
||||
|
||||
Netlink is a datagram-oriented service.
|
||||
Both
|
||||
.B SOCK_RAW
|
||||
and
|
||||
.B SOCK_DGRAM
|
||||
are valid values for
|
||||
.IR socket_type .
|
||||
However, the netlink protocol does not distinguish between datagram
|
||||
and raw sockets.
|
||||
|
||||
.I netlink_family
|
||||
selects the kernel module or netlink group to communicate with.
|
||||
The currently assigned netlink families are:
|
||||
.TP
|
||||
.B NETLINK_ROUTE
|
||||
Receives routing and link updates and may be used to modify the routing
|
||||
tables (both IPv4 and IPv6), IP addresses, link parameters,
|
||||
neighbor setups, queueing disciplines, traffic classes and
|
||||
packet classifiers (see
|
||||
.BR rtnetlink (7)).
|
||||
.TP
|
||||
.B NETLINK_W1
|
||||
Messages from the 1-wire subsystem.
|
||||
.TP
|
||||
.B NETLINK_USERSOCK
|
||||
Reserved for user-mode socket protocols.
|
||||
.TP
|
||||
.B NETLINK_FIREWALL
|
||||
Transport IPv4 packets from netfilter to userspace.
|
||||
Used by
|
||||
.I ip_queue
|
||||
kernel module.
|
||||
.TP
|
||||
.B NETLINK_INET_DIAG
|
||||
.\" FIXME More details on NETLINK_INET_DIAG needed.
|
||||
INET socket monitoring.
|
||||
.TP
|
||||
.B NETLINK_NFLOG
|
||||
Netfilter/iptables ULOG.
|
||||
.TP
|
||||
.B NETLINK_XFRM
|
||||
.\" FIXME More details on NETLINK_XFRM needed.
|
||||
IPsec.
|
||||
.TP
|
||||
.B NETLINK_SELINUX
|
||||
SELinux event notifications.
|
||||
.TP
|
||||
.B NETLINK_ISCSI
|
||||
.\" FIXME More details on NETLINK_ISCSI needed.
|
||||
Open-iSCSI.
|
||||
.TP
|
||||
.B NETLINK_AUDIT
|
||||
.\" FIXME More details on NETLINK_AUDIT needed.
|
||||
Auditing.
|
||||
.TP
|
||||
.B NETLINK_FIB_LOOKUP
|
||||
.\" FIXME More details on NETLINK_FIB_LOOKUP needed.
|
||||
Access to FIB lookup from userspace.
|
||||
.TP
|
||||
.B NETLINK_CONNECTOR
|
||||
Kernel connector.
|
||||
See
|
||||
.I Documentation/connector/*
|
||||
in the kernel source for further information.
|
||||
.TP
|
||||
.B NETLINK_NETFILTER
|
||||
.\" FIXME More details on NETLINK_NETFILTER needed.
|
||||
Netfilter subsystem.
|
||||
.TP
|
||||
.B NETLINK_IP6_FW
|
||||
Transport IPv6 packets from netfilter to userspace.
|
||||
Used by
|
||||
.I ip6_queue
|
||||
kernel module.
|
||||
.TP
|
||||
.B NETLINK_DNRTMSG
|
||||
DECnet routing messages.
|
||||
.TP
|
||||
.B NETLINK_KOBJECT_UEVENT
|
||||
.\" FIXME More details on NETLINK_KOBJECT_UEVENT needed.
|
||||
Kernel messages to userspace.
|
||||
.TP
|
||||
.B NETLINK_GENERIC
|
||||
Generic netlink family for simplified netlink usage.
|
||||
.PP
|
||||
Netlink messages consist of a byte stream with one or multiple
|
||||
.I nlmsghdr
|
||||
headers and associated payload.
|
||||
The byte stream should only be accessed with the standard
|
||||
.B NLMSG_*
|
||||
macros.
|
||||
See
|
||||
.BR netlink (3)
|
||||
for further information.
|
||||
|
||||
In multipart messages (multiple
|
||||
.I nlmsghdr
|
||||
headers with associated payload in one byte stream) the first and all
|
||||
following headers have the
|
||||
.B NLM_F_MULTI
|
||||
flag set, except for the last header which has the type
|
||||
.BR NLMSG_DONE .
|
||||
|
||||
After each
|
||||
.I nlmsghdr
|
||||
the payload follows.
|
||||
|
||||
.in +4n
|
||||
.nf
|
||||
struct nlmsghdr {
|
||||
__u32 nlmsg_len; /* Length of message including header. */
|
||||
__u16 nlmsg_type; /* Type of message content. */
|
||||
__u16 nlmsg_flags; /* Additional flags. */
|
||||
__u32 nlmsg_seq; /* Sequence number. */
|
||||
__u32 nlmsg_pid; /* PID of the sending process. */
|
||||
};
|
||||
.fi
|
||||
.in
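The header layout and the multipart convention described above can be sketched in C.
This is only an illustration that builds and walks messages in a local buffer; the helper names nl_put and nl_count_parts are hypothetical and are not part of any library.
It assumes the NLMSG_* macros from <linux/netlink.h>.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Append one message to buf at offset *off (hypothetical helper).
 * Returns 0 on success, -1 if the buffer is too small. */
int nl_put(char *buf, size_t cap, size_t *off, unsigned short type,
           unsigned short flags, const void *payload, size_t plen)
{
    size_t space = NLMSG_SPACE(plen);   /* header + payload, padded */
    struct nlmsghdr *nh;

    if (*off + space > cap)
        return -1;
    nh = (struct nlmsghdr *)(buf + *off);
    memset(nh, 0, space);
    nh->nlmsg_len = NLMSG_LENGTH(plen); /* header + payload, unpadded */
    nh->nlmsg_type = type;
    nh->nlmsg_flags = flags;
    if (plen > 0)
        memcpy(NLMSG_DATA(nh), payload, plen);
    *off += space;
    return 0;
}

/* Count the messages that precede NLMSG_DONE in a byte stream
 * (hypothetical helper). */
int nl_count_parts(void *buf, size_t len)
{
    struct nlmsghdr *nh;
    int remaining = (int)len;
    int n = 0;

    for (nh = buf; NLMSG_OK(nh, remaining);
         nh = NLMSG_NEXT(nh, remaining)) {
        if (nh->nlmsg_type == NLMSG_DONE)
            break;
        n++;
    }
    return n;
}
```

A caller would append each part with NLM_F_MULTI set and finish with a message of type NLMSG_DONE; real applications normally receive such a stream from the kernel rather than building it themselves.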
|
||||
|
||||
.I nlmsg_type
|
||||
can be one of the standard message types:
|
||||
.B NLMSG_NOOP
|
||||
message is to be ignored,
|
||||
.B NLMSG_ERROR
|
||||
message signals an error and the payload contains an
|
||||
.I nlmsgerr
|
||||
structure,
|
||||
.B NLMSG_DONE
|
||||
message terminates a multipart message.
|
||||
|
||||
.in +4n
|
||||
.nf
|
||||
struct nlmsgerr {
|
||||
int error; /* Negative errno or 0 for acknowledgements */
|
||||
struct nlmsghdr msg; /* Message header that caused the error */
|
||||
};
|
||||
.fi
|
||||
.in
|
||||
|
||||
A netlink family usually specifies more message types; see the
|
||||
appropriate manual pages for that, for example,
|
||||
.BR rtnetlink (7)
|
||||
for
|
||||
.BR NETLINK_ROUTE .
|
||||
|
||||
Standard flag bits in
|
||||
.I nlmsg_flags
|
||||
.br
|
||||
---------------------------------
|
||||
.TS
|
||||
tab(:);
|
||||
lB l.
|
||||
NLM_F_REQUEST:Must be set on all request messages.
|
||||
NLM_F_MULTI:T{
|
||||
The message is part of a multipart message terminated by
|
||||
.BR NLMSG_DONE .
|
||||
T}
|
||||
NLM_F_ACK:Request for an acknowledgment on success.
|
||||
NLM_F_ECHO:Echo this request.
|
||||
.TE
|
||||
|
||||
Additional flag bits for GET requests
|
||||
.br
|
||||
-------------------------------------
|
||||
.TS
|
||||
tab(:);
|
||||
lB l.
|
||||
NLM_F_ROOT:Return the complete table instead of a single entry.
|
||||
NLM_F_MATCH:T{
|
||||
Return all entries matching criteria passed in message content.
|
||||
Not implemented yet.
|
||||
T}
|
||||
.\" FIXME NLM_F_ATOMIC is not used any more?
|
||||
NLM_F_ATOMIC:Return an atomic snapshot of the table.
|
||||
NLM_F_DUMP:Convenience macro; equivalent to (NLM_F_ROOT|NLM_F_MATCH).
|
||||
.TE
|
||||
|
||||
Note that
|
||||
.B NLM_F_ATOMIC
|
||||
requires the
|
||||
.B CAP_NET_ADMIN
|
||||
capability or an effective UID of 0.
|
||||
|
||||
Additional flag bits for NEW requests
|
||||
.br
|
||||
-------------------------------------
|
||||
.TS
|
||||
tab(:);
|
||||
lB l.
|
||||
NLM_F_REPLACE:Replace existing matching object.
|
||||
NLM_F_EXCL:Don't replace if the object already exists.
|
||||
NLM_F_CREATE:Create object if it doesn't already exist.
|
||||
NLM_F_APPEND:Add to the end of the object list.
|
||||
.TE
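The flag bits in the tables above are combined with a bitwise OR.
The two functions below are hypothetical helpers that show typical combinations for a GET and a NEW request.
Note that in <linux/netlink.h> the GET and NEW flag bits share numeric values, so they are only meaningful together with the request type.

```c
#include <sys/socket.h>
#include <linux/netlink.h>

/* Flags for a GET request that dumps a whole table
 * (hypothetical helper). */
unsigned short nl_dump_flags(void)
{
    return NLM_F_REQUEST | NLM_F_DUMP;
}

/* Flags for a NEW request that creates an object, fails if it
 * already exists, and asks for an acknowledgement
 * (hypothetical helper). */
unsigned short nl_create_flags(void)
{
    return NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE | NLM_F_EXCL;
}
```

The returned value would be stored in the nlmsg_flags field of the request header.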
|
||||
|
||||
.I nlmsg_seq
|
||||
and
|
||||
.I nlmsg_pid
|
||||
are used to track messages.
|
||||
.I nlmsg_pid
|
||||
shows the origin of the message.
|
||||
Note that there isn't a 1:1 relationship between
|
||||
.I nlmsg_pid
|
||||
and the PID of the process if the message originated from a netlink
|
||||
socket.
|
||||
See the
|
||||
.B ADDRESS FORMATS
|
||||
section for further information.
|
||||
|
||||
Both
|
||||
.I nlmsg_seq
|
||||
and
|
||||
.I nlmsg_pid
|
||||
.\" FIXME Explain more about nlmsg_seq and nlmsg_pid.
|
||||
are opaque to the netlink core.
|
||||
|
||||
Netlink is not a reliable protocol.
|
||||
It tries its best to deliver a message to its destination(s),
|
||||
but may drop messages when an out-of-memory condition or
|
||||
other error occurs.
|
||||
For reliable transfer the sender can request an
|
||||
acknowledgement from the receiver by setting the
|
||||
.B NLM_F_ACK
|
||||
flag.
|
||||
An acknowledgment is an
|
||||
.B NLMSG_ERROR
|
||||
packet with the error field set to 0.
|
||||
The application must generate acknowledgements for
|
||||
received messages itself.
|
||||
The kernel tries to send an
|
||||
.B NLMSG_ERROR
|
||||
message for every failed packet.
|
||||
A user process should follow this convention too.
|
||||
|
||||
However, reliable transmissions from kernel to user are impossible
|
||||
in any case.
|
||||
The kernel can't send a netlink message if the socket buffer is full:
|
||||
the message will be dropped and the kernel and the userspace process will
|
||||
no longer have the same view of kernel state.
|
||||
It is up to the application to detect when this happens (via the
|
||||
.B ENOBUFS
|
||||
error returned by
|
||||
.BR recvmsg (2))
|
||||
and resynchronize.
|
||||
.SS Address Formats
|
||||
The
|
||||
.I sockaddr_nl
|
||||
structure describes a netlink client in user space or in the kernel.
|
||||
A
|
||||
.I sockaddr_nl
|
||||
can be either unicast (only sent to one peer) or sent to
|
||||
netlink multicast groups
|
||||
.RI ( nl_groups
|
||||
not equal to 0).
|
||||
|
||||
.in +4n
|
||||
.nf
|
||||
struct sockaddr_nl {
|
||||
sa_family_t nl_family; /* AF_NETLINK */
|
||||
unsigned short nl_pad; /* Zero. */
|
||||
pid_t nl_pid; /* Process ID. */
|
||||
__u32 nl_groups; /* Multicast groups mask. */
|
||||
};
|
||||
.fi
|
||||
.in
|
||||
|
||||
.I nl_pid
|
||||
is the unicast address of the netlink socket.
|
||||
It's always 0 if the destination is in the kernel.
|
||||
For a userspace process,
|
||||
.I nl_pid
|
||||
is usually the PID of the process owning the destination socket.
|
||||
However,
|
||||
.I nl_pid
|
||||
identifies a netlink socket, not a process.
|
||||
If a process owns several netlink
|
||||
sockets, then
|
||||
.I nl_pid
|
||||
can only be equal to the process ID for at most one socket.
|
||||
There are two ways to assign
|
||||
.I nl_pid
|
||||
to a netlink socket.
|
||||
If the application sets
|
||||
.I nl_pid
|
||||
before calling
|
||||
.BR bind (2),
|
||||
then it is up to the application to make sure that
|
||||
.I nl_pid
|
||||
is unique.
|
||||
If the application sets it to 0, the kernel takes care of assigning it.
|
||||
The kernel assigns the process ID to the first netlink socket the process
|
||||
opens and assigns a unique
|
||||
.I nl_pid
|
||||
to every netlink socket that the process subsequently creates.
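The second way of assigning nl_pid (leaving it 0 and letting the kernel choose) might look like the sketch below.
The function name nl_open_autobind is hypothetical; the sketch assumes that getsockname(2) reports the address the kernel assigned, and it uses NETLINK_ROUTE only as a convenient family.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Open a NETLINK_ROUTE socket, bind it with nl_pid = 0 and report
 * the nl_pid chosen by the kernel (hypothetical helper).
 * Returns the socket descriptor, or -1 on error. */
int nl_open_autobind(unsigned int *assigned_pid)
{
    struct sockaddr_nl sa;
    socklen_t len = sizeof(sa);
    int fd;

    fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd == -1)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.nl_family = AF_NETLINK;  /* nl_pid stays 0: kernel assigns it */
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) == -1 ||
        getsockname(fd, (struct sockaddr *)&sa, &len) == -1) {
        close(fd);
        return -1;
    }
    *assigned_pid = sa.nl_pid;
    return fd;
}
```

For the first netlink socket of a process the reported value is usually the process ID; later sockets receive some other unique value.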
|
||||
|
||||
.I nl_groups
|
||||
is a bit mask with every bit representing a netlink group number.
|
||||
Each netlink family has a set of 32 multicast groups.
|
||||
When
|
||||
.BR bind (2)
|
||||
is called on the socket, the
|
||||
.I nl_groups
|
||||
field in the
|
||||
.I sockaddr_nl
|
||||
should be set to a bit mask of the groups which it wishes to listen to.
|
||||
The default value for this field is zero which means that no multicasts
|
||||
will be received.
|
||||
A socket may multicast messages to any of the multicast groups by setting
|
||||
.I nl_groups
|
||||
to a bit mask of the groups it wishes to send to when it calls
|
||||
.BR sendmsg (2)
|
||||
or does a
|
||||
.BR connect (2).
|
||||
Only processes with an effective UID of 0 or the
|
||||
.B CAP_NET_ADMIN
|
||||
capability may send or listen to a netlink multicast group.
|
||||
Any replies to a message received for a multicast group should be
|
||||
sent back to the sending PID and the multicast group.
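The relationship between a group number and its bit in nl_groups can be sketched as follows.
The helper name nl_group_mask is hypothetical; the sketch assumes the kernel convention that group number n corresponds to bit n-1, and it uses the RTNLGRP_* and RTMGRP_* constants from <linux/rtnetlink.h> only to check that convention.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Convert a group number in the range 1..32 to its bit in
 * nl_groups (hypothetical helper).  Returns 0 for numbers
 * outside that range. */
uint32_t nl_group_mask(unsigned int group)
{
    if (group < 1 || group > 32)
        return 0;
    return (uint32_t)1 << (group - 1);
}
```

For example, nl_group_mask(RTNLGRP_LINK) | nl_group_mask(RTNLGRP_IPV4_IFADDR) yields the same mask as RTMGRP_LINK | RTMGRP_IPV4_IFADDR.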
|
||||
.SH VERSIONS
|
||||
The socket interface to netlink is a new feature of Linux 2.2.
|
||||
|
||||
Linux 2.0 supported a more primitive device-based netlink interface
|
||||
(which is still available as a compatibility option).
|
||||
This obsolete interface is not described here.
|
||||
|
||||
NETLINK_SELINUX appeared in Linux 2.6.4.
|
||||
|
||||
NETLINK_AUDIT appeared in Linux 2.6.6.
|
||||
|
||||
NETLINK_KOBJECT_UEVENT appeared in Linux 2.6.10.
|
||||
|
||||
NETLINK_W1 and NETLINK_FIB_LOOKUP appeared in Linux 2.6.13.
|
||||
|
||||
NETLINK_INET_DIAG, NETLINK_CONNECTOR and NETLINK_NETFILTER appeared in
|
||||
Linux 2.6.14.
|
||||
|
||||
NETLINK_GENERIC and NETLINK_ISCSI appeared in Linux 2.6.15.
|
||||
.SH NOTES
|
||||
It is often better to use netlink via
|
||||
.I libnetlink
|
||||
or
|
||||
.I libnl
|
||||
than via the low-level kernel interface.
|
||||
.SH BUGS
|
||||
This manual page is not complete.
|
||||
.SH EXAMPLE
|
||||
The following example creates a
|
||||
.B NETLINK_ROUTE
|
||||
netlink socket which will listen to the
|
||||
.B RTMGRP_LINK
|
||||
(network interface create/delete/up/down events) and
|
||||
.B RTMGRP_IPV4_IFADDR
|
||||
(IPv4 addresses add/delete events) multicast groups.
|
||||
|
||||
.in +4n
|
||||
.nf
|
||||
struct sockaddr_nl sa;

memset(&sa, 0, sizeof(sa));
sa.nl_family = AF_NETLINK;
sa.nl_groups = RTMGRP_LINK | RTMGRP_IPV4_IFADDR;

fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
bind(fd, (struct sockaddr *) &sa, sizeof(sa));
|
||||
.fi
|
||||
.in
|
||||
|
||||
The next example demonstrates how to send a netlink message to the
|
||||
kernel (pid 0).
|
||||
Note that the application must take care of message sequence numbers
|
||||
in order to reliably track acknowledgements.
|
||||
|
||||
.in +4n
|
||||
.nf
|
||||
struct nlmsghdr *nh;    /* The nlmsghdr with payload to send. */
struct sockaddr_nl sa;
struct iovec iov = { (void *) nh, nh\->nlmsg_len };
struct msghdr msg = { (void *)&sa, sizeof(sa), &iov, 1, NULL, 0, 0 };

memset(&sa, 0, sizeof(sa));
sa.nl_family = AF_NETLINK;
nh\->nlmsg_pid = 0;
nh\->nlmsg_seq = ++sequence_number;
/* Request an ack from kernel by setting NLM_F_ACK. */
nh\->nlmsg_flags |= NLM_F_ACK;

sendmsg(fd, &msg, 0);
|
||||
.fi
|
||||
.in
|
||||
|
||||
The last example shows how to read netlink messages.
|
||||
|
||||
.in +4n
|
||||
.nf
|
||||
int len;
char buf[4096];
struct iovec iov = { buf, sizeof(buf) };
struct sockaddr_nl sa;
struct msghdr msg = { (void *)&sa, sizeof(sa), &iov, 1, NULL, 0, 0 };
struct nlmsghdr *nh;

len = recvmsg(fd, &msg, 0);

for (nh = (struct nlmsghdr *) buf; NLMSG_OK (nh, len);
     nh = NLMSG_NEXT (nh, len)) {
    /* The end of multipart message. */
    if (nh\->nlmsg_type == NLMSG_DONE)
        return;

    if (nh\->nlmsg_type == NLMSG_ERROR)
        /* Do some error handling. */
    ...

    /* Continue with parsing payload. */
    ...
}
|
||||
.fi
|
||||
.in
|
||||
.SH "SEE ALSO"
|
||||
.BR cmsg (3),
|
||||
.BR netlink (3),
|
||||
.BR capabilities (7),
|
||||
.BR rtnetlink (7)
|
||||
.PP
|
||||
ftp://ftp.inr.ac.ru/ip-routing/iproute2*
|
||||
for information about libnetlink.
|
||||
|
||||
http://people.suug.ch/~tgr/libnl/
|
||||
for information about libnl.
|
||||
|
||||
RFC 3549 "Linux Netlink as an IP Services Protocol"
|
||||
|
|
410
man7/packet.7
|
@ -1,8 +1,402 @@
|
|||
.TH PACKET 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
|
||||
.\" Permission is granted to distribute possibly modified copies
|
||||
.\" of this page provided the header is included verbatim,
|
||||
.\" and in case of nontrivial modification author and date
|
||||
.\" of the modification is added to the header.
|
||||
.\" $Id: packet.7,v 1.13 2000/08/14 08:03:45 ak Exp $
|
||||
.TH PACKET 7 1999-04-29 "Linux" "Linux Programmer's Manual"
|
||||
.SH NAME
|
||||
packet, PF_PACKET \- packet interface on device level.
|
||||
.SH SYNOPSIS
|
||||
.nf
|
||||
.B #include <sys/socket.h>
|
||||
.br
|
||||
.B #include <netpacket/packet.h>
|
||||
.br
|
||||
.B #include <net/ethernet.h> /* the L2 protocols */
|
||||
.sp
|
||||
.BI "packet_socket = socket(PF_PACKET, int " socket_type ", int "protocol );
|
||||
.fi
|
||||
.SH DESCRIPTION
|
||||
Packet sockets are used to receive or send raw packets at the device driver
|
||||
(OSI Layer 2) level.
|
||||
They allow the user to implement protocol modules in user space
|
||||
on top of the physical layer.
|
||||
|
||||
The
|
||||
.I socket_type
|
||||
is either
|
||||
.B SOCK_RAW
|
||||
for raw packets including the link level header or
|
||||
.B SOCK_DGRAM
|
||||
for cooked packets with the link level header removed.
|
||||
The link level
|
||||
header information is available in a common format in a
|
||||
.IR sockaddr_ll .
|
||||
.I protocol
|
||||
is the IEEE 802.3 protocol number in network order.
|
||||
See the
|
||||
.I <linux/if_ether.h>
|
||||
include file for a list of allowed protocols.
|
||||
When protocol
|
||||
is set to
|
||||
.B htons(ETH_P_ALL)
|
||||
then all protocols are received.
|
||||
All incoming packets of that protocol type will be passed to the packet
|
||||
socket before they are passed to the protocols implemented in the kernel.
|
||||
|
||||
Only processes with effective UID 0 or the
|
||||
.B CAP_NET_RAW
|
||||
capability may open packet sockets.
|
||||
|
||||
.B SOCK_RAW
|
||||
packets are passed to and from the device driver without any changes in
|
||||
the packet data.
|
||||
When receiving a packet, the address is still parsed and
|
||||
passed in a standard
|
||||
.I sockaddr_ll
|
||||
address structure.
|
||||
When transmitting a packet, the user supplied buffer
|
||||
should contain the physical layer header.
|
||||
That packet is then
|
||||
queued unmodified to the network driver of the interface defined by the
|
||||
destination address.
|
||||
Some device drivers always add other headers.
|
||||
.B SOCK_RAW
|
||||
is similar to but not compatible with the obsolete
|
||||
.B PF_INET/SOCK_PACKET
|
||||
of Linux 2.0.
|
||||
|
||||
.B SOCK_DGRAM
|
||||
operates on a slightly higher level.
|
||||
The physical header is removed before the packet is passed to the user.
|
||||
Packets sent through a
|
||||
.B SOCK_DGRAM
|
||||
packet socket get a suitable physical layer header based on the
|
||||
information in the
|
||||
.I sockaddr_ll
|
||||
destination address before they are queued.
|
||||
|
||||
By default all packets of the specified protocol type
|
||||
are passed to a packet socket.
|
||||
To only get packets from a specific interface use
|
||||
.BR bind (2)
|
||||
specifying an address in a
|
||||
.I struct sockaddr_ll
|
||||
to bind the packet socket to an interface.
|
||||
Only the
|
||||
.I sll_protocol
|
||||
and the
|
||||
.I sll_ifindex
|
||||
address fields are used for purposes of binding.
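A minimal sketch of preparing the address for such a bind(2) call is shown below.
The helper name packet_bind_addr is hypothetical; the interface index is assumed to have been obtained elsewhere (for example with if_nametoindex(3)).

```c
#include <arpa/inet.h>
#include <net/ethernet.h>
#include <netpacket/packet.h>
#include <string.h>
#include <sys/socket.h>

/* Fill in the address used to bind a packet socket to one
 * interface and protocol (hypothetical helper).  Only
 * sll_protocol and sll_ifindex matter for binding. */
void packet_bind_addr(struct sockaddr_ll *sll, int ifindex,
                      unsigned short ether_type)
{
    memset(sll, 0, sizeof(*sll));
    sll->sll_family = AF_PACKET;
    sll->sll_protocol = htons(ether_type);  /* network byte order */
    sll->sll_ifindex = ifindex;             /* 0 matches any interface */
}
```

The filled structure would then be passed as bind(fd, (struct sockaddr *) &sll, sizeof(sll)) on a socket that the (privileged) caller has already opened.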
|
||||
|
||||
The
|
||||
.BR connect (2)
|
||||
operation is not supported on packet sockets.
|
||||
|
||||
When the
|
||||
.B MSG_TRUNC
|
||||
flag is passed to
|
||||
.BR recvmsg (2),
|
||||
.BR recv (2),
|
||||
or
.BR recvfrom (2),
|
||||
the real length of the packet on the wire is always returned,
|
||||
even when it is longer than the buffer.
|
||||
.SS Address Types
|
||||
The sockaddr_ll structure is a device-independent physical layer address.
|
||||
|
||||
.in +4n
|
||||
.nf
|
||||
struct sockaddr_ll {
|
||||
unsigned short sll_family; /* Always AF_PACKET */
|
||||
unsigned short sll_protocol; /* Physical layer protocol */
|
||||
int sll_ifindex; /* Interface number */
|
||||
unsigned short sll_hatype; /* Header type */
|
||||
unsigned char sll_pkttype; /* Packet type */
|
||||
unsigned char sll_halen; /* Length of address */
|
||||
unsigned char sll_addr[8]; /* Physical layer address */
|
||||
};
|
||||
.fi
|
||||
.in
|
||||
|
||||
.I sll_protocol
|
||||
is the standard ethernet protocol type in network order as defined
|
||||
in the
|
||||
.I <linux/if_ether.h>
|
||||
include file.
|
||||
It defaults to the socket's protocol.
|
||||
.I sll_ifindex
|
||||
is the interface index of the interface
|
||||
(see
|
||||
.BR netdevice (7));
|
||||
0 matches any interface (only permitted for binding).
|
||||
.I sll_hatype
|
||||
is an ARP type as defined in the
|
||||
.I <linux/if_arp.h>
|
||||
include file.
|
||||
.I sll_pkttype
|
||||
contains the packet type.
|
||||
Valid types are
|
||||
.B PACKET_HOST
|
||||
for a packet addressed to the local host,
|
||||
.B PACKET_BROADCAST
|
||||
for a physical layer broadcast packet,
|
||||
.B PACKET_MULTICAST
|
||||
for a packet sent to a physical layer multicast address,
|
||||
.B PACKET_OTHERHOST
|
||||
for a packet to some other host that has been caught by a device driver
|
||||
in promiscuous mode, and
|
||||
.B PACKET_OUTGOING
|
||||
for a packet originated from the local host that is looped back to a packet
|
||||
socket.
|
||||
These types make sense only for receiving.
|
||||
.I sll_addr
|
||||
and
|
||||
.I sll_halen
|
||||
contain the physical layer (e.g., IEEE 802.3) address and its length.
|
||||
The exact interpretation depends on the device.
|
||||
|
||||
When you send packets it is enough to specify
|
||||
.IR sll_family ,
|
||||
.IR sll_addr ,
|
||||
.IR sll_halen ,
|
||||
.IR sll_ifindex .
|
||||
The other fields should be 0.
|
||||
.I sll_hatype
|
||||
and
|
||||
.I sll_pkttype
|
||||
are set on received packets for your information.
|
||||
For bind only
|
||||
.I sll_protocol
|
||||
and
|
||||
.I sll_ifindex
|
||||
are used.
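The fields needed for sending can be filled in as in the following sketch.
The helper name packet_send_addr is hypothetical, and the hardware address and interface index are supplied by the caller.

```c
#include <netpacket/packet.h>
#include <string.h>
#include <sys/socket.h>

/* Prepare a destination address for sendto(2) on a packet socket
 * (hypothetical helper).  Sets sll_family, sll_addr, sll_halen and
 * sll_ifindex; all other fields are left 0.
 * Returns 0 on success, -1 if the address is too long. */
int packet_send_addr(struct sockaddr_ll *sll, int ifindex,
                     const unsigned char *hwaddr, unsigned char halen)
{
    if (halen > sizeof(sll->sll_addr))
        return -1;
    memset(sll, 0, sizeof(*sll));
    sll->sll_family = AF_PACKET;
    sll->sll_ifindex = ifindex;
    sll->sll_halen = halen;
    memcpy(sll->sll_addr, hwaddr, halen);
    return 0;
}
```

For an Ethernet interface, halen would be 6 and hwaddr the destination MAC address.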
|
||||
.SS Socket Options
|
||||
Packet sockets can be used to configure physical layer multicasting
|
||||
and promiscuous mode.
|
||||
It works by calling
|
||||
.BR setsockopt (2)
|
||||
on a packet socket for
|
||||
.B SOL_PACKET
|
||||
and one of the options
|
||||
.B PACKET_ADD_MEMBERSHIP
|
||||
to add a binding or
|
||||
.B PACKET_DROP_MEMBERSHIP
|
||||
to drop it.
|
||||
They both expect a
|
||||
.B packet_mreq
|
||||
structure as argument:
|
||||
|
||||
.in +4n
|
||||
.nf
|
||||
struct packet_mreq {
|
||||
int mr_ifindex; /* interface index */
|
||||
unsigned short mr_type; /* action */
|
||||
unsigned short mr_alen; /* address length */
|
||||
unsigned char mr_address[8]; /* physical layer address */
|
||||
};
|
||||
.fi
|
||||
.in
|
||||
|
||||
.B mr_ifindex
|
||||
contains the interface index for the interface whose status
|
||||
should be changed.
|
||||
The
|
||||
.B mr_type
|
||||
parameter specifies which action to perform.
|
||||
.B PACKET_MR_PROMISC
|
||||
enables receiving all packets on a shared medium (often known as
|
||||
"promiscuous mode"),
|
||||
.B PACKET_MR_MULTICAST
|
||||
binds the socket to the physical layer multicast group specified in
|
||||
.B mr_address
|
||||
and
|
||||
.BR mr_alen ,
|
||||
and
|
||||
.B PACKET_MR_ALLMULTI
|
||||
sets the socket up to receive all multicast packets arriving at
|
||||
the interface.
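As an illustration of the membership options, the sketch below builds a packet_mreq for promiscuous mode and passes it to setsockopt(2).
The helper names packet_promisc_req and packet_set_promisc are hypothetical; the fallback definition of SOL_PACKET follows the workaround given in the BUGS section of this page.

```c
#include <netpacket/packet.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_PACKET
#define SOL_PACKET 263
#endif

/* Build a request that refers to promiscuous mode on one interface
 * (hypothetical helper). */
struct packet_mreq packet_promisc_req(int ifindex)
{
    struct packet_mreq mr;

    memset(&mr, 0, sizeof(mr));
    mr.mr_ifindex = ifindex;
    mr.mr_type = PACKET_MR_PROMISC;   /* mr_address is not used here */
    return mr;
}

/* Add (enable != 0) or drop (enable == 0) the promiscuous-mode
 * binding on a packet socket (hypothetical helper).
 * Returns the result of setsockopt(2). */
int packet_set_promisc(int fd, int ifindex, int enable)
{
    struct packet_mreq mr = packet_promisc_req(ifindex);

    return setsockopt(fd, SOL_PACKET,
                      enable ? PACKET_ADD_MEMBERSHIP
                             : PACKET_DROP_MEMBERSHIP,
                      &mr, sizeof(mr));
}
```

For PACKET_MR_MULTICAST the caller would additionally fill in mr_address and mr_alen.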
|
||||
|
||||
In addition the traditional ioctls
|
||||
.BR SIOCSIFFLAGS ,
|
||||
.BR SIOCADDMULTI ,
|
||||
.B SIOCDELMULTI
|
||||
can be used for the same purpose.
|
||||
.SS Ioctls
|
||||
.B SIOCGSTAMP
|
||||
can be used to receive the timestamp of the last received packet.
|
||||
Argument is a
|
||||
.IR "struct timeval" .
|
||||
|
||||
In addition all standard ioctls defined in
|
||||
.BR netdevice (7)
|
||||
and
|
||||
.BR socket (7)
|
||||
are valid on packet sockets.
|
||||
.SS Error Handling
|
||||
Packet sockets do no error handling other than for errors that occur
|
||||
while passing the packet to the device driver.
|
||||
They don't have the concept of a pending error.
|
||||
.SH ERRORS
|
||||
.TP
|
||||
.B EADDRNOTAVAIL
|
||||
Unknown multicast group address passed.
|
||||
.TP
|
||||
.B EFAULT
|
||||
User passed invalid memory address.
|
||||
.TP
|
||||
.B EINVAL
|
||||
Invalid argument.
|
||||
.TP
|
||||
.B EMSGSIZE
|
||||
Packet is bigger than interface MTU.
|
||||
.TP
|
||||
.B ENETDOWN
|
||||
Interface is not up.
|
||||
.TP
|
||||
.B ENOBUFS
|
||||
Not enough memory to allocate the packet.
|
||||
.TP
|
||||
.B ENODEV
|
||||
Unknown device name or interface index specified in interface address.
|
||||
.TP
|
||||
.B ENOENT
|
||||
No packet received.
|
||||
.TP
|
||||
.B ENOTCONN
|
||||
No interface address passed.
|
||||
.TP
|
||||
.B ENXIO
|
||||
Interface address contained an invalid interface index.
|
||||
.TP
|
||||
.B EPERM
|
||||
User has insufficient privileges to carry out this operation.
|
||||
|
||||
In addition other errors may be generated by the low-level driver.
|
||||
.SH VERSIONS
|
||||
.B PF_PACKET
|
||||
is a new feature in Linux 2.2.
|
||||
Earlier Linux versions supported only
|
||||
.BR SOCK_PACKET .
|
||||
.PP
|
||||
The include file
|
||||
.I <netpacket/packet.h>
|
||||
is present since glibc 2.1.
|
||||
Older systems need:
|
||||
.sp
|
||||
.in +4n
|
||||
.nf
|
||||
#include <asm/types.h>
|
||||
#include <linux/if_packet.h>
|
||||
#include <linux/if_ether.h> /* The L2 protocols */
|
||||
.fi
|
||||
.in
|
||||
.SH NOTES
|
||||
For portable programs it is suggested to use
|
||||
.B PF_PACKET
|
||||
via
|
||||
.BR pcap (3);
|
||||
although this only covers a subset of the
|
||||
.B PF_PACKET
|
||||
features.
|
||||
|
||||
The
|
||||
.B SOCK_DGRAM
|
||||
packet sockets make no attempt to create or parse the IEEE 802.2 LLC
|
||||
header for an IEEE 802.3 frame.
|
||||
When
|
||||
.B ETH_P_802_3
|
||||
is specified as the protocol for sending, the kernel creates the
|
||||
802.3 frame and fills out the length field; the user has to supply the LLC
|
||||
header to get a fully conforming packet.
|
||||
Incoming 802.3 packets are not multiplexed on the DSAP/SSAP protocol
|
||||
fields; instead they are supplied to the user as protocol
|
||||
.B ETH_P_802_2
|
||||
with the LLC header prepended.
|
||||
It is thus not possible to bind to
|
||||
.BR ETH_P_802_3 ;
|
||||
bind to
|
||||
.B ETH_P_802_2
|
||||
instead and do the protocol multiplex yourself.
|
||||
The default for sending is the standard Ethernet DIX
|
||||
encapsulation with the protocol filled in.
|
||||
|
||||
Packet sockets are not subject to the input or output firewall chains.
|
||||
.SS Compatibility
|
||||
In Linux 2.0, the only way to get a packet socket was by calling
|
||||
.BI "socket(PF_INET, SOCK_PACKET, " protocol )\fR.
|
||||
This is still supported but strongly deprecated.
|
||||
The main difference between the two methods is that
|
||||
.B SOCK_PACKET
|
||||
uses the old
|
||||
.I struct sockaddr_pkt
|
||||
to specify an interface, which doesn't provide physical layer
|
||||
independence.
|
||||
|
||||
.in +4n
|
||||
.nf
|
||||
struct sockaddr_pkt {
|
||||
unsigned short spkt_family;
|
||||
unsigned char spkt_device[14];
|
||||
unsigned short spkt_protocol;
|
||||
};
|
||||
.fi
|
||||
.in
|
||||
|
||||
.I spkt_family
|
||||
contains
|
||||
the device type,
|
||||
.I spkt_protocol
|
||||
is the IEEE 802.3 protocol type as defined in
|
||||
.I <sys/if_ether.h>
|
||||
and
|
||||
.I spkt_device
|
||||
is the device name as a null-terminated string, for example, eth0.
|
||||
|
||||
This structure is obsolete and should not be used in new code.
|
||||
.SH BUGS
|
||||
glibc 2.1 does not have a define for
|
||||
.BR SOL_PACKET .
|
||||
The suggested workaround is to use:
|
||||
.in +4n
|
||||
.nf
|
||||
|
||||
#ifndef SOL_PACKET
|
||||
#define SOL_PACKET 263
|
||||
#endif
|
||||
|
||||
.fi
|
||||
.in
|
||||
This is fixed in later glibc versions and also does not occur on
|
||||
libc5 systems.
|
||||
|
||||
The IEEE 802.2/802.3 LLC handling could be considered a bug.
|
||||
|
||||
Socket filters are not documented.
|
||||
|
||||
The
|
||||
.B MSG_TRUNC
|
||||
.BR recvmsg (2)
|
||||
extension is an ugly hack and should be replaced by a control message.
|
||||
There is currently no way to get the original destination address of
|
||||
packets via
|
||||
.BR SOCK_DGRAM .
|
||||
.\" .SH CREDITS
|
||||
.\" This man page was written by Andi Kleen with help from Matthew Wilcox.
|
||||
.\" PF_PACKET in Linux 2.2 was implemented
|
||||
.\" by Alexey Kuznetsov, based on code by Alan Cox and others.
|
||||
.SH "SEE ALSO"
|
||||
.BR socket (2),
|
||||
.BR pcap (3),
|
||||
.BR capabilities (7),
|
||||
.BR ip (7),
|
||||
.BR raw (7),
|
||||
.BR socket (7)
|
||||
|
||||
RFC\ 894 for the standard IP Ethernet encapsulation.
|
||||
|
||||
RFC\ 1700 for the IEEE 802.3 IP encapsulation.
|
||||
|
||||
The
|
||||
.I <linux/if_ether.h>
|
||||
include file for physical layer protocols.
|
||||
|
|
286
man7/raw.7
|
@ -1,8 +1,278 @@
|
|||
.TH RAW 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
'\" t
|
||||
.\" Don't change the first line, it tells man that we need tbl.
|
||||
.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
|
||||
.\" Permission is granted to distribute possibly modified copies
|
||||
.\" of this page provided the header is included verbatim,
|
||||
.\" and in case of nontrivial modification author and date
|
||||
.\" of the modification is added to the header.
|
||||
.\" $Id: raw.7,v 1.6 1999/06/05 10:32:08 freitag Exp $
|
||||
.TH RAW 7 1998-10-02 "Linux" "Linux Programmer's Manual"
|
||||
.SH NAME
|
||||
raw, SOCK_RAW \- Linux IPv4 raw sockets
|
||||
.SH SYNOPSIS
|
||||
.B #include <sys/socket.h>
|
||||
.br
|
||||
.B #include <netinet/in.h>
|
||||
.br
|
||||
.BI "raw_socket = socket(PF_INET, SOCK_RAW, int " protocol );
|
||||
.SH DESCRIPTION
|
||||
Raw sockets allow new IPv4 protocols to be implemented in user space.
|
||||
A raw socket receives or sends the raw datagram not
|
||||
including link level headers.
|
||||
|
||||
The IPv4 layer generates an IP header when sending a packet unless the
|
||||
.B IP_HDRINCL
|
||||
socket option is enabled on the socket.
|
||||
When it is enabled, the packet must contain an IP header.
|
||||
For receiving, the IP header is always included in the packet.
|
||||
|
||||
Only processes with an effective user ID of 0 or the
|
||||
.B CAP_NET_RAW
|
||||
capability are allowed to open raw sockets.
|
||||
|
||||
All packets or errors matching the
|
||||
.I protocol
|
||||
number specified
|
||||
for the raw socket are passed to this socket.
|
||||
For a list of the allowed protocols see RFC\ 1700 assigned numbers and
|
||||
.BR getprotobyname (3).
|
||||
|
||||
A protocol of
|
||||
.B IPPROTO_RAW
|
||||
implies enabled
|
||||
.B IP_HDRINCL
|
||||
and is able to send any IP protocol that is specified in the passed
|
||||
header.
|
||||
Receiving of all IP protocols via
|
||||
.B IPPROTO_RAW
|
||||
is not possible using raw sockets.
|
||||
.RS
|
||||
.TS
|
||||
tab(:) allbox;
|
||||
c s
|
||||
l l.
|
||||
IP Header fields modified on sending by \fBIP_HDRINCL\fP
|
||||
IP Checksum:Always filled in.
|
||||
Source Address:Filled in when zero.
|
||||
Packet Id:Filled in when zero.
|
||||
Total Length:Always filled in.
|
||||
.TE
|
||||
.RE
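The table can be read as a list of fields an application may leave zero when it supplies its own header.
The sketch below fills a header accordingly; the helper name raw_fill_iphdr is hypothetical, the TTL of 64 is an arbitrary choice, and struct iphdr from <netinet/ip.h> is assumed.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <stdint.h>
#include <string.h>

/* Fill a minimal IPv4 header for use with IP_HDRINCL
 * (hypothetical helper).  daddr_net is in network byte order. */
void raw_fill_iphdr(struct iphdr *ip, uint32_t daddr_net,
                    uint8_t protocol, uint16_t payload_len)
{
    memset(ip, 0, sizeof(*ip));
    ip->version = 4;
    ip->ihl = sizeof(*ip) / 4;      /* header length in 32-bit words */
    ip->ttl = 64;                   /* arbitrary value for this sketch */
    ip->protocol = protocol;
    ip->daddr = daddr_net;
    /* Total length: set here for clarity; always filled in on send. */
    ip->tot_len = htons((uint16_t)(sizeof(*ip) + payload_len));
    /* check, saddr and id stay 0 and are filled in on send. */
}
```

The header would be placed at the start of the buffer passed to sendto(2), followed by the payload.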
|
||||
.sp
|
||||
.PP
|
||||
If
|
||||
.B IP_HDRINCL
|
||||
is specified and the IP header has a non-zero destination address then
|
||||
the destination address of the socket is used to route the packet.
|
||||
When
|
||||
.B MSG_DONTROUTE
|
||||
is specified the destination address should refer to a local interface,
|
||||
otherwise a routing table lookup is done anyway but gatewayed routes
|
||||
are ignored.
|
||||
|
||||
If
|
||||
.B IP_HDRINCL
|
||||
isn't set then IP header options can be set on raw sockets with
|
||||
.BR setsockopt (2);
|
||||
see
|
||||
.BR ip (7)
|
||||
for more information.
|
||||
|
||||
In Linux 2.2 all IP header fields and options can be set using
|
||||
IP socket options.
|
||||
This means raw sockets are usually only needed for new
|
||||
protocols or protocols with no user interface (like ICMP).
|
||||
|
||||
When a packet is received, it is passed to any raw sockets which have
|
||||
been bound to its protocol before it is passed to other protocol handlers
|
||||
(e.g., kernel protocol modules).
|
||||
.SS Address Format
|
||||
Raw sockets use the standard
|
||||
.I sockaddr_in
|
||||
address structure defined in
|
||||
.BR ip (7).
|
||||
The
|
||||
.I sin_port
|
||||
field could be used to specify the IP protocol number,
|
||||
but it is ignored for sending in Linux 2.2 and should be always
|
||||
set to 0 (see BUGS).
|
||||
For incoming packets
|
||||
.I sin_port
|
||||
is set to the protocol of the packet.
|
||||
See the
|
||||
.I <netinet/in.h>
|
||||
include file for valid IP protocols.
|
||||
.SS Socket Options
|
||||
Raw socket options can be set with
|
||||
.BR setsockopt (2)
|
||||
and read with
|
||||
.BR getsockopt (2)
|
||||
by passing the
|
||||
.B IPPROTO_RAW
|
||||
.\" Or SOL_RAW on Linux
|
||||
family flag.
|
||||
.TP
|
||||
.B ICMP_FILTER
|
||||
Enable a special filter for raw sockets bound to the
|
||||
.B IPPROTO_ICMP
|
||||
protocol.
|
||||
The value has a bit set for each ICMP message type which
|
||||
should be filtered out.
|
||||
The default is to filter no ICMP messages.
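The filter value can be computed as in the sketch below.
It assumes that the option value is a single 32-bit mask (the data field of struct icmp_filter in <linux/icmp.h>), so that only ICMP types 0 to 31 can be addressed; the helper name icmp_filter_pass_only is hypothetical.

```c
#include <stdint.h>

/* Build a mask that filters out every ICMP type in the range 0..31
 * except the given one (hypothetical helper).  A set bit means
 * "filter this type out".  For a type outside 0..31 every bit is
 * set. */
uint32_t icmp_filter_pass_only(unsigned int type)
{
    if (type > 31)
        return UINT32_MAX;
    return ~((uint32_t)1 << type);
}
```

For example, icmp_filter_pass_only(0) leaves only echo replies (ICMP type 0) unfiltered; the resulting mask would be passed to setsockopt(2) with the ICMP_FILTER option.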
|
||||
.PP
|
||||
In addition all
|
||||
.BR ip (7)
|
||||
.B IPPROTO_IP
|
||||
socket options valid for datagram sockets are supported.
|
||||
.SS Error Handling
|
||||
Errors originating from the network are only passed to the user when the
|
||||
socket is connected or the
|
||||
.B IP_RECVERR
|
||||
flag is enabled.
|
||||
For connected sockets only
|
||||
.B EMSGSIZE
|
||||
and
|
||||
.B EPROTO
|
||||
are passed for compatibility.
|
||||
With
|
||||
.B IP_RECVERR
|
||||
all network errors are saved in the error queue.
|
||||
.SH ERRORS
|
||||
.TP
|
||||
.B EACCES
|
||||
User tried to send to a broadcast address without having the
|
||||
broadcast flag set on the socket.
|
||||
.TP
|
||||
.B EFAULT
|
||||
An invalid memory address was supplied.
|
||||
.TP
|
||||
.B EINVAL
|
||||
Invalid argument.
|
||||
.TP
|
||||
.B EMSGSIZE
|
||||
Packet too big.
|
||||
Either Path MTU Discovery is enabled (the
|
||||
.B IP_MTU_DISCOVER
|
||||
socket flag) or the packet size exceeds the maximum allowed IPv4
|
||||
packet size of 64KB.
|
||||
.TP
|
||||
.B EOPNOTSUPP
|
||||
Invalid flag has been passed to a socket call (like
|
||||
.BR MSG_OOB ).
|
||||
.TP
|
||||
.B EPERM
|
||||
The user doesn't have permission to open raw sockets.
|
||||
Only processes with an effective user ID of 0 or the
|
||||
.B CAP_NET_RAW
|
||||
capability may do that.
|
||||
.TP
|
||||
.B EPROTO
|
||||
An ICMP error has arrived reporting a parameter problem.
|
||||
.SH VERSIONS
|
||||
.B IP_RECVERR
|
||||
and
|
||||
.B ICMP_FILTER
|
||||
are new in Linux 2.2.
|
||||
They are Linux extensions and should not be used in portable programs.
|
||||
|
||||
Linux 2.0 enabled some bug-to-bug compatibility with BSD in the
|
||||
raw socket code when the
|
||||
.B SO_BSDCOMPAT
|
||||
socket option was set \(em since Linux 2.2,
|
||||
this option no longer has that effect.
|
||||
.SH NOTES
|
||||
By default raw sockets do path MTU (Maximum Transmission Unit) discovery.
|
||||
This means the kernel
|
||||
will keep track of the MTU to a specific target IP address and return
|
||||
.B EMSGSIZE
|
||||
when a raw packet write exceeds it.
|
||||
When this happens the application should decrease the packet size.
|
||||
Path MTU discovery can also be turned off using the
|
||||
.B IP_MTU_DISCOVER
|
||||
socket option or the
|
||||
.I ip_no_pmtu_disc
|
||||
sysctl; see
|
||||
.BR ip (7)
|
||||
for details.
|
||||
When it is turned off, raw sockets will fragment outgoing packets
|
||||
that exceed the interface MTU.
|
||||
However, disabling it is not recommended
|
||||
for performance and reliability reasons.
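.PP
A minimal sketch of switching path MTU discovery per socket follows.
The helper name
.I set_pmtu_discovery
is invented for this example;
.B IP_PMTUDISC_DO
and
.B IP_PMTUDISC_DONT
are the values described in
.BR ip (7).
.in +4n
.nf
```c
#include <sys/socket.h>
#include <netinet/in.h>

/* Turn path MTU discovery on (enable != 0) or off (enable == 0)
   for one IPv4 socket.  Returns 0 on success, -1 on error. */
int
set_pmtu_discovery(int fd, int enable)
{
    int val = enable ? IP_PMTUDISC_DO : IP_PMTUDISC_DONT;

    return setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER,
                      &val, sizeof(val));
}
```
.fi
.in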
|
||||
|
||||
A raw socket can be bound to a specific local address using the
|
||||
.BR bind (2)
|
||||
call.
|
||||
If it isn't bound, all packets with the specified IP protocol are received.
|
||||
In addition, a raw socket can be bound to a specific network device using
|
||||
.BR SO_BINDTODEVICE ;
|
||||
see
|
||||
.BR socket (7).
|
||||
|
||||
An
|
||||
.B IPPROTO_RAW
|
||||
socket is send only.
|
||||
If you really want to receive all IP packets, use a
|
||||
.BR packet (7)
|
||||
socket with the
|
||||
.B ETH_P_IP
|
||||
protocol.
|
||||
Note that packet sockets don't reassemble IP fragments,
|
||||
unlike raw sockets.
|
||||
|
||||
If you want to receive all ICMP packets for a datagram socket
|
||||
it is often better to use
|
||||
.B IP_RECVERR
|
||||
on that particular socket; see
|
||||
.BR ip (7).
|
||||
|
||||
Raw sockets may tap all IP protocols in Linux, even
|
||||
protocols like ICMP or TCP which have a protocol module in the kernel.
|
||||
In this case the packets are passed to both the kernel module and the raw
|
||||
socket(s).
|
||||
This should not be relied upon in portable programs; many other BSD
|
||||
socket implementations have limitations here.
|
||||
|
||||
Linux never changes headers passed from the user (except for filling
|
||||
in some zeroed fields as described for
|
||||
.BR IP_HDRINCL ).
|
||||
This differs from many other implementations of raw sockets.
|
||||
|
||||
Raw sockets are generally rather unportable and should be avoided in
|
||||
programs intended to be portable.
|
||||
|
||||
Sending on raw sockets should take the IP protocol from
|
||||
.IR sin_port ;
|
||||
this ability was lost in Linux 2.2.
|
||||
The workaround is to use
|
||||
.BR IP_HDRINCL .
|
||||
.SH BUGS
|
||||
Transparent proxy extensions are not described.
|
||||
|
||||
When the
|
||||
.B IP_HDRINCL
|
||||
option is set, datagrams will not be fragmented and are limited to
|
||||
the interface MTU.
|
||||
|
||||
Setting the IP protocol for sending in
|
||||
.I sin_port
|
||||
was lost in Linux 2.2.
|
||||
The protocol that the socket was bound to or that
|
||||
was specified in the initial
|
||||
.BR socket (2)
|
||||
call is always used.
|
||||
.\" .SH AUTHORS
|
||||
.\" This man page was written by Andi Kleen.
|
||||
.SH "SEE ALSO"
|
||||
.BR recvmsg (2),
|
||||
.BR sendmsg (2),
|
||||
.BR capabilities (7),
|
||||
.BR ip (7),
|
||||
.BR socket (7)
|
||||
|
||||
.B RFC\ 1191
|
||||
for path MTU discovery.
|
||||
|
||||
.B RFC\ 791
|
||||
and the
|
||||
.I <linux/ip.h>
|
||||
include file for the IP protocol.
|
||||
|
|
455
man7/rtnetlink.7
|
@ -1,8 +1,449 @@
|
|||
'\" t
|
||||
.\" Don't remove the line above, it tells man that tbl is needed.
|
||||
.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
|
||||
.\" Permission is granted to distribute possibly modified copies
|
||||
.\" of this page provided the header is included verbatim,
|
||||
.\" and in case of nontrivial modification author and date
|
||||
.\" of the modification is added to the header.
|
||||
.\" Based on the original comments from Alexey Kuznetsov, written with
|
||||
.\" help from Matthew Wilcox.
|
||||
.\" $Id: rtnetlink.7,v 1.8 2000/01/22 01:55:04 freitag Exp $
|
||||
.TH RTNETLINK 7 1999-04-30 "Linux" "Linux Programmer's Manual"
|
||||
.SH NAME
|
||||
rtnetlink, NETLINK_ROUTE \- Linux IPv4 routing socket
|
||||
.SH SYNOPSIS
|
||||
.B #include <asm/types.h>
|
||||
.br
|
||||
.B #include <linux/netlink.h>
|
||||
.br
|
||||
.B #include <linux/rtnetlink.h>
|
||||
.br
|
||||
.B #include <sys/socket.h>
|
||||
.sp
|
||||
.BI "rtnetlink_socket = socket(PF_NETLINK, int " socket_type ", NETLINK_ROUTE);"
|
||||
.SH DESCRIPTION
|
||||
Rtnetlink allows the kernel's routing tables to be read and altered.
|
||||
It is used within the kernel to communicate between
|
||||
various subsystems, though this usage is not documented here, and for
|
||||
communication with user-space programs.
|
||||
Network routes, IP addresses, link parameters, neighbor setups, queueing
|
||||
disciplines, traffic classes and packet classifiers may all be controlled
|
||||
through
|
||||
.B NETLINK_ROUTE
|
||||
sockets.
|
||||
It is based on netlink messages; see
|
||||
.BR netlink (7)
|
||||
for more information.
|
||||
.\" FIXME ? all these macros could be moved to rtnetlink(3)
|
||||
.SS "Routing Attributes"
|
||||
Some rtnetlink messages have optional attributes after the initial header:
|
||||
|
||||
.in +4n
|
||||
.nf
|
||||
struct rtattr {
|
||||
unsigned short rta_len; /* Length of option */
|
||||
unsigned short rta_type; /* Type of option */
|
||||
/* Data follows */
|
||||
};
|
||||
.fi
|
||||
.in
|
||||
|
||||
These attributes should only be manipulated using the RTA_* macros
|
||||
or libnetlink; see
|
||||
.BR rtnetlink (3).
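.PP
As an illustration of the RTA_* macros,
the sketch below walks a block of attributes to find one of a given type
and appends attributes to a buffer.
The helper names
.I find_rtattr
and
.I put_rtattr
are invented for this example and are not part of libnetlink.
.in +4n
.nf
```c
#include <string.h>
#include <sys/socket.h>
#include <linux/rtnetlink.h>

/* Return the first attribute of type 'type' in a block of
   attributes that is 'len' bytes long, or NULL if none exists. */
struct rtattr *
find_rtattr(struct rtattr *rta, int len, unsigned short type)
{
    for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len))
        if (rta->rta_type == type)
            return rta;
    return NULL;
}

/* Append one attribute at byte offset 'off' of 'buf', which must
   be 4-byte aligned and large enough.  Returns the offset at
   which the next attribute would start. */
int
put_rtattr(char *buf, int off, unsigned short type,
           const void *data, int dlen)
{
    struct rtattr *rta = (struct rtattr *) (buf + off);

    rta->rta_type = type;
    rta->rta_len = RTA_LENGTH(dlen);    /* header plus payload */
    memcpy(RTA_DATA(rta), data, dlen);
    return off + RTA_ALIGN(rta->rta_len);
}
```
.fi
.in
.PP
Note that
.B RTA_NEXT
decrements its length argument,
so the loop in
.I find_rtattr
ends when the remaining length no longer holds a complete attribute.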
|
||||
.SS Messages
|
||||
Rtnetlink consists of these message types
|
||||
(in addition to standard netlink messages):
|
||||
.TP
|
||||
.BR RTM_NEWLINK ", " RTM_DELLINK ", " RTM_GETLINK
|
||||
Create, remove or get information about a specific network interface.
|
||||
These messages contain an
|
||||
.I ifinfomsg
|
||||
structure followed by a series of
|
||||
.I rtattr
|
||||
structures.
|
||||
|
||||
.nf
|
||||
struct ifinfomsg {
|
||||
unsigned char ifi_family; /* AF_UNSPEC */
|
||||
unsigned short ifi_type; /* Device type */
|
||||
int ifi_index; /* Interface index */
|
||||
unsigned int ifi_flags; /* Device flags */
|
||||
unsigned int ifi_change; /* change mask */
|
||||
};
|
||||
.fi
|
||||
|
||||
.\" FIXME ifi_type
|
||||
.I ifi_flags
|
||||
contains the device flags, see
|
||||
.BR netdevice (7);
|
||||
.I ifi_index
|
||||
is the unique interface index,
|
||||
.I ifi_change
|
||||
is reserved for future use and should always be set to 0xFFFFFFFF.
|
||||
.TS
|
||||
tab(:);
|
||||
c
|
||||
l l l.
|
||||
Routing attributes
|
||||
rta_type:value type:description
|
||||
_
|
||||
IFLA_UNSPEC:-:unspecified.
|
||||
IFLA_ADDRESS:hardware address:interface L2 address
|
||||
IFLA_BROADCAST:hardware address:L2 broadcast address.
|
||||
IFLA_IFNAME:asciiz string:Device name.
|
||||
IFLA_MTU:unsigned int:MTU of the device.
|
||||
IFLA_LINK:int:Link type.
|
||||
IFLA_QDISC:asciiz string:Queueing discipline.
|
||||
IFLA_STATS:T{
|
||||
see below
|
||||
T}:Interface Statistics.
|
||||
.TE
|
||||
.sp
|
||||
The value type for IFLA_STATS is \fIstruct net_device_stats\fP.
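.IP
A request for information about all interfaces might be assembled
as in the following sketch.
The structure
.I link_request
and the function
.I build_getlink_dump
are names invented for this example;
.B NLM_F_REQUEST
and
.B NLM_F_DUMP
are the standard flags described in
.BR netlink (7).
Sending the request on a
.B NETLINK_ROUTE
socket and parsing the replies are omitted.
.in +4n
.nf
```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout: netlink header followed
   directly by the ifinfomsg payload. */
struct link_request {
    struct nlmsghdr  nh;
    struct ifinfomsg ifi;
};

/* Fill in *req as an RTM_GETLINK dump request. */
void
build_getlink_dump(struct link_request *req, unsigned int seq)
{
    memset(req, 0, sizeof(*req));
    req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req->nh.nlmsg_type = RTM_GETLINK;
    req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req->nh.nlmsg_seq = seq;
    req->ifi.ifi_family = AF_UNSPEC;
}
```
.fi
.in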
|
||||
.TP
|
||||
.BR RTM_NEWADDR ", " RTM_DELADDR ", " RTM_GETADDR
|
||||
Add, remove or receive information about an IP address associated with
|
||||
an interface.
|
||||
In Linux 2.2 an interface can carry multiple IP addresses;
|
||||
this replaces the alias device concept in 2.0.
|
||||
In Linux 2.2 these messages
|
||||
support IPv4 and IPv6 addresses.
|
||||
They contain an
|
||||
.I ifaddrmsg
|
||||
structure, optionally followed by
|
||||
.I rtattr
|
||||
routing attributes.
|
||||
|
||||
.nf
|
||||
struct ifaddrmsg {
|
||||
unsigned char ifa_family; /* Address type */
|
||||
unsigned char ifa_prefixlen; /* Prefix length of address */
|
||||
unsigned char ifa_flags; /* Address flags */
|
||||
unsigned char ifa_scope; /* Address scope */
|
||||
int ifa_index; /* Interface index */
|
||||
};
|
||||
.fi
|
||||
|
||||
.I ifa_family
|
||||
is the address family type (currently
|
||||
.B AF_INET
|
||||
or
|
||||
.BR AF_INET6 ),
|
||||
.I ifa_prefixlen
|
||||
is the length of the address mask of the address if defined for the
|
||||
family (like for IPv4),
|
||||
.I ifa_scope
|
||||
is the address scope,
|
||||
.I ifa_index
|
||||
is the interface index of the interface the address is associated with.
|
||||
.I ifa_flags
|
||||
is a flag word of
|
||||
.B IFA_F_SECONDARY
|
||||
for secondary address (old alias interface),
|
||||
.B IFA_F_PERMANENT
|
||||
for a permanent address set by the user and other undocumented flags.
|
||||
.TS
|
||||
tab(:);
|
||||
c
|
||||
l l l.
|
||||
Attributes
|
||||
rta_type:value type:description
|
||||
_
|
||||
IFA_UNSPEC:-:unspecified.
|
||||
IFA_ADDRESS:raw protocol address:interface address
|
||||
IFA_LOCAL:raw protocol address:local address
|
||||
IFA_LABEL:asciiz string:name of the interface
|
||||
IFA_BROADCAST:raw protocol address:broadcast address.
|
||||
IFA_ANYCAST:raw protocol address:anycast address
|
||||
IFA_CACHEINFO:struct ifa_cacheinfo:Address information.
|
||||
.TE
|
||||
.\" FIXME struct ifa_cacheinfo
|
||||
.TP
|
||||
.BR RTM_NEWROUTE ", " RTM_DELROUTE ", " RTM_GETROUTE
|
||||
Create, remove or receive information about a network route.
|
||||
These messages contain an
|
||||
.I rtmsg
|
||||
structure with an optional sequence of
|
||||
.I rtattr
|
||||
structures following.
|
||||
For
|
||||
.B RTM_GETROUTE
|
||||
setting
|
||||
.I rtm_dst_len
|
||||
and
|
||||
.I rtm_src_len
|
||||
to 0 means you get all entries for the specified routing table.
|
||||
For the other fields except
|
||||
.I rtm_table
|
||||
and
|
||||
.I rtm_protocol
|
||||
0 is the wildcard.
|
||||
|
||||
.nf
|
||||
struct rtmsg {
|
||||
unsigned char rtm_family; /* Address family of route */
|
||||
unsigned char rtm_dst_len; /* Length of destination */
|
||||
unsigned char rtm_src_len; /* Length of source */
|
||||
unsigned char rtm_tos; /* TOS filter */
|
||||
|
||||
unsigned char rtm_table; /* Routing table ID */
|
||||
unsigned char rtm_protocol; /* Routing protocol; see below */
|
||||
unsigned char rtm_scope; /* See below */
|
||||
unsigned char rtm_type; /* See below */
|
||||
|
||||
unsigned int rtm_flags;
|
||||
};
|
||||
.fi
|
||||
.TS
|
||||
tab(:);
|
||||
l l.
|
||||
rtm_type:Route type
|
||||
_
|
||||
RTN_UNSPEC:unknown route
|
||||
RTN_UNICAST:a gateway or direct route
|
||||
RTN_LOCAL:a local interface route
|
||||
RTN_BROADCAST:T{
|
||||
a local broadcast route (sent as a broadcast)
|
||||
T}
|
||||
RTN_ANYCAST:T{
|
||||
a local broadcast route (sent as a unicast)
|
||||
T}
|
||||
RTN_MULTICAST:a multicast route
|
||||
RTN_BLACKHOLE:a packet dropping route
|
||||
RTN_UNREACHABLE:an unreachable destination
|
||||
RTN_PROHIBIT:a packet rejection route
|
||||
RTN_THROW:continue routing lookup in another table
|
||||
RTN_NAT:a network address translation rule
|
||||
RTN_XRESOLVE:T{
|
||||
refer to an external resolver (not implemented)
|
||||
T}
|
||||
.TE
|
||||
.TS
|
||||
tab(:);
|
||||
l l.
|
||||
rtm_protocol:Route origin.
|
||||
_
|
||||
RTPROT_UNSPEC:unknown
|
||||
RTPROT_REDIRECT:T{
|
||||
by an ICMP redirect (currently unused)
|
||||
T}
|
||||
RTPROT_KERNEL:by the kernel
|
||||
RTPROT_BOOT:during boot
|
||||
RTPROT_STATIC:by the administrator
|
||||
.TE
|
||||
|
||||
Values larger than
|
||||
.B RTPROT_STATIC
|
||||
are not interpreted by the kernel; they are just for user information.
|
||||
They may be used to tag the source of routing information or to
|
||||
distinguish between multiple routing daemons.
|
||||
See
|
||||
.I <linux/rtnetlink.h>
|
||||
for the routing daemon identifiers which are already assigned.
|
||||
|
||||
.I rtm_scope
|
||||
is the distance to the destination:
|
||||
.TS
|
||||
tab(:);
|
||||
l l.
|
||||
RT_SCOPE_UNIVERSE:global route
|
||||
RT_SCOPE_SITE:T{
|
||||
interior route in the local autonomous system
|
||||
T}
|
||||
RT_SCOPE_LINK:route on this link
|
||||
RT_SCOPE_HOST:route on the local host
|
||||
RT_SCOPE_NOWHERE:destination doesn't exist
|
||||
.TE
|
||||
|
||||
The values between
|
||||
.B RT_SCOPE_UNIVERSE
|
||||
and
|
||||
.B RT_SCOPE_SITE
|
||||
are available to the user.
|
||||
|
||||
The
|
||||
.I rtm_flags
|
||||
have the following meanings:
|
||||
.TS
|
||||
tab(:);
|
||||
l l.
|
||||
RTM_F_NOTIFY:T{
|
||||
if the route changes, notify the user via rtnetlink
|
||||
T}
|
||||
RTM_F_CLONED:route is cloned from another route
|
||||
RTM_F_EQUALIZE:a multipath equalizer (not yet implemented)
|
||||
.TE
|
||||
|
||||
.I rtm_table
|
||||
specifies the routing table
|
||||
.TS
|
||||
tab(:);
|
||||
l l.
|
||||
RT_TABLE_UNSPEC:an unspecified routing table
|
||||
RT_TABLE_DEFAULT:the default table
|
||||
RT_TABLE_MAIN:the main table
|
||||
RT_TABLE_LOCAL:the local table
|
||||
.TE
|
||||
|
||||
The user may assign arbitrary values between
|
||||
.B RT_TABLE_UNSPEC
|
||||
and
|
||||
.BR RT_TABLE_DEFAULT .
|
||||
.TS
|
||||
tab(:);
|
||||
c
|
||||
l l l.
|
||||
Attributes
|
||||
rta_type:value type:description
|
||||
_
|
||||
RTA_UNSPEC:-:ignored.
|
||||
RTA_DST:protocol address:Route destination address.
|
||||
RTA_SRC:protocol address:Route source address.
|
||||
RTA_IIF:int:Input interface index.
|
||||
RTA_OIF:int:Output interface index.
|
||||
RTA_GATEWAY:protocol address:The gateway of the route
|
||||
RTA_PRIORITY:int:Priority of route.
|
||||
RTA_PREFSRC::
|
||||
RTA_METRICS:int:Route metric
|
||||
RTA_MULTIPATH::
|
||||
RTA_PROTOINFO::
|
||||
RTA_FLOW::
|
||||
RTA_CACHEINFO::
|
||||
.TE
|
||||
|
||||
.B Fill these values in!
|
||||
.TP
|
||||
.BR RTM_NEWNEIGH ", " RTM_DELNEIGH ", " RTM_GETNEIGH
|
||||
Add, remove or receive information about a neighbor table
|
||||
entry (e.g., an ARP entry).
|
||||
The message contains an
|
||||
.I ndmsg
|
||||
structure.
|
||||
|
||||
.nf
|
||||
struct ndmsg {
|
||||
unsigned char ndm_family;
|
||||
int ndm_ifindex; /* Interface index */
|
||||
__u16 ndm_state; /* State */
|
||||
__u8 ndm_flags; /* Flags */
|
||||
__u8 ndm_type;
|
||||
};
|
||||
|
||||
struct nda_cacheinfo {
|
||||
__u32 ndm_confirmed;
|
||||
__u32 ndm_used;
|
||||
__u32 ndm_updated;
|
||||
__u32 ndm_refcnt;
|
||||
};
|
||||
.fi
|
||||
|
||||
.I ndm_state
|
||||
is a bit mask of the following states:
|
||||
.TS
|
||||
tab(:);
|
||||
l l.
|
||||
NUD_INCOMPLETE:a currently resolving cache entry
|
||||
NUD_REACHABLE:a confirmed working cache entry
|
||||
NUD_STALE:an expired cache entry
|
||||
NUD_DELAY:an entry waiting for a timer
|
||||
NUD_PROBE:a cache entry that is currently reprobed
|
||||
NUD_FAILED:an invalid cache entry
|
||||
NUD_NOARP:a device with no destination cache
|
||||
NUD_PERMANENT:a static entry
|
||||
.TE
|
||||
|
||||
Valid
|
||||
.I ndm_flags
|
||||
are:
|
||||
.TS
|
||||
tab(:);
|
||||
l l.
|
||||
NTF_PROXY:a proxy arp entry
|
||||
NTF_ROUTER:an IPv6 router
|
||||
.TE
|
||||
|
||||
.\" FIXME
|
||||
.\" document the members of the struct better
|
||||
The
|
||||
.I rtattr
|
||||
struct has the following meanings for the
|
||||
.I rta_type
|
||||
field:
|
||||
.TS
|
||||
tab(:);
|
||||
l l.
|
||||
NDA_UNSPEC:unknown type
|
||||
NDA_DST:a neighbor cache n/w layer destination address
|
||||
NDA_LLADDR:a neighbor cache link layer address
|
||||
NDA_CACHEINFO:cache statistics.
|
||||
.TE
|
||||
|
||||
If the
|
||||
.I rta_type
|
||||
field is
|
||||
.B NDA_CACHEINFO
|
||||
then a
|
||||
.I struct nda_cacheinfo
|
||||
header follows.
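.IP
Because
.I ndm_state
is a bit mask, more than one state name can apply to an entry.
The sketch below turns the mask into readable text;
the function
.I nud_state_str
and its name table are invented for this example.
.in +4n
.nf
```c
#include <stdio.h>
#include <sys/socket.h>
#include <linux/neighbour.h>

static const struct {
    unsigned short bit;
    const char *name;
} nud_names[] = {
    { NUD_INCOMPLETE, "INCOMPLETE" },
    { NUD_REACHABLE,  "REACHABLE"  },
    { NUD_STALE,      "STALE"      },
    { NUD_DELAY,      "DELAY"      },
    { NUD_PROBE,      "PROBE"      },
    { NUD_FAILED,     "FAILED"     },
    { NUD_NOARP,      "NOARP"      },
    { NUD_PERMANENT,  "PERMANENT"  },
};

/* Write the names of the states set in 'state' into buf,
   separated by '|', stopping early if buf is too small.
   Returns buf. */
char *
nud_state_str(unsigned short state, char *buf, size_t size)
{
    size_t i, used = 0;

    if (size == 0)
        return buf;
    buf[0] = 0;
    for (i = 0; i < sizeof(nud_names) / sizeof(nud_names[0]); i++) {
        int n;

        if (!(state & nud_names[i].bit))
            continue;
        n = snprintf(buf + used, size - used, "%s%s",
                     used ? "|" : "", nud_names[i].name);
        if (n < 0 || (size_t) n >= size - used)
            break;              /* no room left for this name */
        used += (size_t) n;
    }
    return buf;
}
```
.fi
.in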
|
||||
.TP
|
||||
.BR RTM_NEWRULE ", " RTM_DELRULE ", " RTM_GETRULE
|
||||
Add, delete or retrieve a routing rule.
|
||||
Carries a
|
||||
.IR "struct rtmsg" .
|
||||
.TP
|
||||
.BR RTM_NEWQDISC ", " RTM_DELQDISC ", " RTM_GETQDISC
|
||||
Add, remove or get a queueing discipline.
|
||||
The message contains a
|
||||
.I struct tcmsg
|
||||
and may be followed by a series of
|
||||
attributes.
|
||||
|
||||
.nf
|
||||
struct tcmsg {
|
||||
unsigned char tcm_family;
|
||||
int tcm_ifindex; /* interface index */
|
||||
__u32 tcm_handle; /* Qdisc handle */
|
||||
__u32 tcm_parent; /* Parent qdisc */
|
||||
__u32 tcm_info;
|
||||
};
|
||||
.fi
|
||||
.TS
|
||||
tab(:);
|
||||
c
|
||||
l l l.
|
||||
Attributes
|
||||
rta_type:value type:Description
|
||||
_
|
||||
TCA_UNSPEC:-:unspecified
|
||||
TCA_KIND:asciiz string:Name of queueing discipline
|
||||
TCA_OPTIONS:byte sequence:Qdisc-specific options follow
|
||||
TCA_STATS:struct tc_stats:Qdisc statistics.
|
||||
TCA_XSTATS:qdisc specific:Module-specific statistics.
|
||||
TCA_RATE:struct tc_estimator:Rate limit.
|
||||
.TE
|
||||
|
||||
In addition, various other qdisc-module-specific attributes are allowed.
|
||||
For more information see the appropriate include files.
|
||||
.TP
|
||||
.BR RTM_NEWTCLASS ", " RTM_DELTCLASS ", " RTM_GETTCLASS
|
||||
Add, remove or get a traffic class.
|
||||
These messages contain a
|
||||
.I struct tcmsg
|
||||
as described above.
|
||||
.TP
|
||||
.BR RTM_NEWTFILTER ", " RTM_DELTFILTER ", " RTM_GETTFILTER
|
||||
Add, remove or receive information about a traffic filter.
|
||||
These messages contain a
|
||||
.I struct tcmsg
|
||||
as described above.
|
||||
.SH VERSIONS
|
||||
.B rtnetlink
|
||||
is a new feature of Linux 2.2.
|
||||
.SH BUGS
|
||||
This manual page is incomplete.
|
||||
.SH "SEE ALSO"
|
||||
.BR cmsg (3),
|
||||
.BR rtnetlink (3),
|
||||
.BR ip (7),
|
||||
.BR netlink (7)
|
||||
|
|
736
man7/socket.7
|
@ -1,8 +1,728 @@
|
|||
.TH SOCKET 7 2008-08-07 Linux "Linux Programmer's Manual"
|
||||
'\" t
|
||||
.\" Don't change the first line, it tells man that we need tbl.
|
||||
.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
|
||||
.\" and copyright (c) 1999 Matthew Wilcox.
|
||||
.\" Permission is granted to distribute possibly modified copies
|
||||
.\" of this page provided the header is included verbatim,
|
||||
.\" and in case of nontrivial modification author and date
|
||||
.\" of the modification is added to the header.
|
||||
.\"
|
||||
.\" 2002-10-30, Michael Kerrisk, <mtk.manpages@gmail.com>
|
||||
.\" Added description of SO_ACCEPTCONN
|
||||
.\" 2004-05-20, aeb, added SO_RCVTIMEO/SO_SNDTIMEO text.
|
||||
.\" Modified, 27 May 2004, Michael Kerrisk <mtk.manpages@gmail.com>
|
||||
.\" Added notes on capability requirements
|
||||
.\" A few small grammar fixes
|
||||
.\"
|
||||
.\" FIXME probably all PF_* should be AF_* in this page, since
|
||||
.\" POSIX only specifies the latter values.
|
||||
.\"
|
||||
.TH SOCKET 7 2007-12-28 Linux "Linux Programmer's Manual"
|
||||
.SH NAME
|
||||
socket \- Linux socket interface
|
||||
.SH SYNOPSIS
|
||||
.B #include <sys/socket.h>
|
||||
.sp
|
||||
.IB mysocket " = socket(int " socket_family ", int " socket_type ", int " protocol );
|
||||
.SH DESCRIPTION
|
||||
This manual page describes the Linux networking socket layer user
|
||||
interface.
|
||||
The BSD compatible sockets
|
||||
are the uniform interface
|
||||
between the user process and the network protocol stacks in the kernel.
|
||||
The protocol modules are grouped into
|
||||
.I protocol families
|
||||
like
|
||||
.BR PF_INET ", " PF_IPX ", " PF_PACKET
|
||||
and
|
||||
.I socket types
|
||||
like
|
||||
.B SOCK_STREAM
|
||||
or
|
||||
.BR SOCK_DGRAM .
|
||||
See
|
||||
.BR socket (2)
|
||||
for more information on families and types.
|
||||
.SS Socket Layer Functions
|
||||
These functions are used by the user process to send or receive packets
|
||||
and to do other socket operations.
|
||||
For more information see their respective manual pages.
|
||||
|
||||
.BR socket (2)
|
||||
creates a socket,
|
||||
.BR connect (2)
|
||||
connects a socket to a remote socket address,
|
||||
the
|
||||
.BR bind (2)
|
||||
function binds a socket to a local socket address,
|
||||
.BR listen (2)
|
||||
tells the socket that new connections shall be accepted, and
|
||||
.BR accept (2)
|
||||
is used to get a new socket with a new incoming connection.
|
||||
.BR socketpair (2)
|
||||
returns two connected anonymous sockets (only implemented for a few
|
||||
local families like
|
||||
.BR PF_UNIX ).
|
||||
.PP
|
||||
.BR send (2),
|
||||
.BR sendto (2),
|
||||
and
|
||||
.BR sendmsg (2)
|
||||
send data over a socket, and
|
||||
.BR recv (2),
|
||||
.BR recvfrom (2),
and
|
||||
.BR recvmsg (2)
|
||||
receive data from a socket.
|
||||
.BR poll (2)
|
||||
and
|
||||
.BR select (2)
|
||||
wait for arriving data or a readiness to send data.
|
||||
In addition, the standard I/O operations like
|
||||
.BR write (2),
|
||||
.BR writev (2),
|
||||
.BR sendfile (2),
|
||||
.BR read (2),
|
||||
and
|
||||
.BR readv (2)
|
||||
can be used to read and write data.
|
||||
.PP
|
||||
.BR getsockname (2)
|
||||
returns the local socket address and
|
||||
.BR getpeername (2)
|
||||
returns the remote socket address.
|
||||
.BR getsockopt (2)
|
||||
and
|
||||
.BR setsockopt (2)
|
||||
are used to set or get socket layer or protocol options.
|
||||
.BR ioctl (2)
|
||||
can be used to set or read some other options.
|
||||
.PP
|
||||
.BR close (2)
|
||||
is used to close a socket.
|
||||
.BR shutdown (2)
|
||||
closes parts of a full-duplex socket connection.
|
||||
.PP
|
||||
Seeking, or calling
|
||||
.BR pread (2)
|
||||
or
|
||||
.BR pwrite (2)
|
||||
with a non-zero position is not supported on sockets.
|
||||
.PP
|
||||
It is possible to do non-blocking I/O on sockets by setting the
|
||||
.B O_NONBLOCK
|
||||
flag on a socket file descriptor using
|
||||
.BR fcntl (2).
|
||||
Then all operations that would block will (usually)
|
||||
return with
|
||||
.B EAGAIN
|
||||
(operation should be retried later);
|
||||
.BR connect (2)
|
||||
will return
|
||||
.B EINPROGRESS
|
||||
error.
|
||||
The user can then wait for various events via
|
||||
.BR poll (2)
|
||||
or
|
||||
.BR select (2).
|
||||
.TS
|
||||
tab(:) allbox;
|
||||
c s s
|
||||
l l l.
|
||||
I/O events
|
||||
Event:Poll flag:Occurrence
|
||||
Read:POLLIN:T{
|
||||
New data arrived.
|
||||
T}
|
||||
Read:POLLIN:T{
|
||||
A connection setup has been completed
|
||||
(for connection-oriented sockets)
|
||||
T}
|
||||
Read:POLLHUP:T{
|
||||
A disconnection request has been initiated by the other end.
|
||||
T}
|
||||
Read:POLLHUP:T{
|
||||
A connection is broken (only for connection-oriented protocols).
|
||||
When the socket is written to,
|
||||
.B SIGPIPE
|
||||
is also sent.
|
||||
T}
|
||||
Write:POLLOUT:T{
|
||||
Socket has enough send buffer space for writing new data.
|
||||
T}
|
||||
Read/Write:T{
|
||||
POLLIN|
|
||||
.br
|
||||
POLLOUT
|
||||
T}:T{
|
||||
An outgoing
|
||||
.BR connect (2)
|
||||
finished.
|
||||
T}
|
||||
Read/Write:POLLERR:An asynchronous error occurred.
|
||||
Read/Write:POLLHUP:The other end has shut down one direction.
|
||||
Exception:POLLPRI:T{
|
||||
Urgent data arrived.
|
||||
.B SIGURG
|
||||
is sent then.
|
||||
T}
|
||||
.\" FIXME . The following is not true currently:
|
||||
.\" It is no I/O event when the connection
|
||||
.\" is broken from the local end using
|
||||
.\" .BR shutdown (2)
|
||||
.\" or
|
||||
.\" .BR close (2).
|
||||
.TE
|
||||
|
||||
.PP
|
||||
An alternative to
|
||||
.BR poll (2)
|
||||
and
|
||||
.BR select (2)
|
||||
is to let the kernel inform the application about events
|
||||
via a
|
||||
.B SIGIO
|
||||
signal.
|
||||
For that, the
|
||||
.B O_ASYNC
|
||||
flag must be set on a socket file descriptor via
|
||||
.BR fcntl (2)
|
||||
and a valid signal handler for
|
||||
.B SIGIO
|
||||
must be installed via
|
||||
.BR sigaction (2).
|
||||
See the
|
||||
.I Signals
|
||||
discussion below.
|
||||
.SS Socket Options
|
||||
These socket options can be set by using
|
||||
.BR setsockopt (2)
|
||||
and read with
|
||||
.BR getsockopt (2)
|
||||
with the socket level set to
|
||||
.B SOL_SOCKET
|
||||
for all sockets:
|
||||
.\" SO_ACCEPTCONN is in POSIX.1-2001, and its origin is explained in
|
||||
.\" W R Stevens, UNPv1
|
||||
.TP
|
||||
.B SO_ACCEPTCONN
|
||||
Returns a value indicating whether or not this socket has been marked
|
||||
to accept connections with
|
||||
.BR listen (2).
|
||||
The value 0 indicates that this is not a listening socket,
|
||||
the value 1 indicates that this is a listening socket.
|
||||
Can only be read
|
||||
with
|
||||
.BR getsockopt (2).
|
||||
.TP
|
||||
.B SO_BINDTODEVICE
|
||||
Bind this socket to a particular device like \(lqeth0\(rq,
|
||||
as specified in the passed interface name.
|
||||
If the
|
||||
name is an empty string or the option length is zero, the socket device
|
||||
binding is removed.
|
||||
The passed option is a variable-length null-terminated
|
||||
interface name string with the maximum size of
|
||||
.BR IFNAMSIZ .
|
||||
If a socket is bound to an interface,
|
||||
only packets received from that particular interface are processed by the
|
||||
socket.
|
||||
Note that this only works for some socket types, particularly
|
||||
.B AF_INET
|
||||
sockets.
|
||||
It is not supported for packet sockets (use normal
|
||||
.BR bind (2)
|
||||
there).
|
||||
.TP
|
||||
.B SO_BROADCAST
|
||||
Set or get the broadcast flag.
|
||||
When enabled, datagram sockets
|
||||
receive packets sent to a broadcast address and they are allowed to send
|
||||
packets to a broadcast address.
|
||||
This option has no effect on stream-oriented sockets.
|
||||
.TP
|
||||
.B SO_BSDCOMPAT
|
||||
Enable BSD bug-to-bug compatibility.
|
||||
This is used by the UDP protocol module in Linux 2.0 and 2.2.
|
||||
If enabled, ICMP errors received for a UDP socket will not be passed
|
||||
to the user program.
|
||||
In later kernel versions, support for this option has been phased out:
|
||||
Linux 2.4 silently ignores it, and Linux 2.6 generates a kernel warning
|
||||
(printk()) if a program uses this option.
|
||||
Linux 2.0 also enabled BSD bug-to-bug compatibility
|
||||
options (random header changing, skipping of the broadcast flag) for raw
|
||||
sockets with this option, but that was removed in Linux 2.2.
|
||||
.TP
|
||||
.B SO_DEBUG
|
||||
Enable socket debugging.
|
||||
Only allowed for processes with the
|
||||
.B CAP_NET_ADMIN
|
||||
capability or an effective user ID of 0.
|
||||
.TP
|
||||
.B SO_ERROR
|
||||
Get and clear the pending socket error.
|
||||
Only valid as a
|
||||
.BR getsockopt (2).
|
||||
Expects an integer.
|
||||
.TP
|
||||
.B SO_DONTROUTE
|
||||
Don't send via a gateway, only send to directly connected hosts.
|
||||
The same effect can be achieved by setting the
|
||||
.B MSG_DONTROUTE
|
||||
flag on a socket
|
||||
.BR send (2)
|
||||
operation.
|
||||
Expects an integer boolean flag.
|
||||
.TP
|
||||
.B SO_KEEPALIVE
|
||||
Enable sending of keep-alive messages on connection-oriented sockets.
|
||||
Expects an integer boolean flag.
|
||||
.TP
|
||||
.B SO_LINGER
|
||||
Sets or gets the
|
||||
.B SO_LINGER
|
||||
option.
|
||||
The argument is a
|
||||
.I linger
|
||||
structure.
|
||||
.sp
|
||||
.in +4n
|
||||
.nf
|
||||
struct linger {
|
||||
int l_onoff; /* linger active */
|
||||
int l_linger; /* how many seconds to linger for */
|
||||
};
|
||||
.fi
|
||||
.in
|
||||
.IP
|
||||
When enabled, a
|
||||
.BR close (2)
|
||||
or
|
||||
.BR shutdown (2)
|
||||
will not return until all queued messages for the socket have been
|
||||
successfully sent or the linger timeout has been reached.
|
||||
Otherwise,
|
||||
the call returns immediately and the closing is done in the background.
|
||||
When the socket is closed as part of
|
||||
.BR exit (2),
|
||||
it always lingers in the background.
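.IP
A minimal sketch of enabling lingering follows;
the helper name
.I set_linger
is invented for this example.
.in +4n
.nf
```c
#include <sys/socket.h>

/* Make close(2) or shutdown(2) on fd wait up to 'seconds' for
   queued data to be sent.  Returns 0 on success, -1 on error. */
int
set_linger(int fd, int seconds)
{
    struct linger lg;

    lg.l_onoff = 1;             /* linger active */
    lg.l_linger = seconds;      /* timeout in seconds */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```
.fi
.in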
|
||||
.TP
|
||||
.B SO_OOBINLINE
|
||||
If this option is enabled,
|
||||
out-of-band data is directly placed into the receive data stream.
|
||||
Otherwise, out-of-band data is only passed when the
|
||||
.B MSG_OOB
|
||||
flag is set during receiving.
|
||||
.\" don't document it because it can do too much harm.
|
||||
.\".B SO_NO_CHECK
|
||||
.TP
|
||||
.B SO_PASSCRED
|
||||
Enable or disable the receiving of the
|
||||
.B SCM_CREDENTIALS
|
||||
control message.
|
||||
For more information see
|
||||
.BR unix (7).
|
||||
.\" FIXME Document SO_PASSSEC, added in 2.6.18; there is some info
|
||||
.\" in the 2.6.18 ChangeLog
|
||||
.TP
|
||||
.B SO_PEERCRED
|
||||
Return the credentials of the foreign process connected to this socket.
|
||||
This is only possible for connected
|
||||
.B PF_UNIX
|
||||
stream sockets and
|
||||
.B PF_UNIX
|
||||
stream and datagram socket pairs created using
|
||||
.BR socketpair (2);
|
||||
see
|
||||
.BR unix (7).
|
||||
The returned credentials are those that were in effect at the time
|
||||
of the call to
|
||||
.BR connect (2)
|
||||
or
|
||||
.BR socketpair (2).
|
||||
The argument is a
|
||||
.I ucred
|
||||
structure.
|
||||
Only valid as a
|
||||
.BR getsockopt (2).
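.IP
The following sketch reads the peer credentials of a connected
.B PF_UNIX
socket.
The helper name
.I get_peer_cred
is invented for this example;
it assumes glibc, where
.I struct ucred
is only declared when
.B _GNU_SOURCE
is defined.
.in +4n
.nf
```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for struct ucred (glibc) */
#endif
#include <sys/types.h>
#include <sys/socket.h>

/* Store the credentials of the peer of the connected PF_UNIX
   socket fd in *cred.  Returns 0 on success, -1 on error. */
int
get_peer_cred(int fd, struct ucred *cred)
{
    socklen_t len = sizeof(*cred);

    return getsockopt(fd, SOL_SOCKET, SO_PEERCRED, cred, &len);
}
```
.fi
.in
.IP
For a socket pair created by the calling process itself,
the returned
.I pid
is the caller's own process ID.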
|
||||
.TP
|
||||
.B SO_PRIORITY
|
||||
Set the protocol-defined priority for all packets to be sent on
|
||||
this socket.
|
||||
Linux uses this value to order the networking queues:
|
||||
packets with a higher priority may be processed first depending
|
||||
on the selected device queueing discipline.
|
||||
For
|
||||
.BR ip (7),
|
||||
this also sets the IP type-of-service (TOS) field for outgoing packets.
|
||||
Setting a priority outside the range 0 to 6 requires the
|
||||
.B CAP_NET_ADMIN
|
||||
capability.
|
||||
.TP
|
||||
.B SO_RCVBUF
|
||||
Sets or gets the maximum socket receive buffer in bytes.
|
||||
The kernel doubles this value (to allow space for bookkeeping overhead)
|
||||
when it is set using
|
||||
.\" Most (all?) other implementations do not do this -- MTK, Dec 05
|
||||
.BR setsockopt (2),
|
||||
and this doubled value is returned by
|
||||
.BR getsockopt (2).
|
||||
The default value is set by the
|
||||
.I rmem_default
|
||||
sysctl and the maximum allowed value is set by the
|
||||
.I rmem_max
|
||||
sysctl.
|
||||
The minimum (doubled) value for this option is 256.
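.IP
The doubling described above can be observed with a minimal sketch such as the
following, which assumes a Linux system.
The helper name rcvbuf_after_request is illustrative only and is not part of any API.
It requests a receive buffer size on a throwaway socket and returns the
value that the kernel reports afterwards:

```c
#include <sys/socket.h>
#include <unistd.h>

/* Request a receive buffer of `requested` bytes on a throwaway socket and
 * return the size reported by getsockopt(2), or -1 on error.
 * (Hypothetical helper, for illustration only.) */
int rcvbuf_after_request(int requested)
{
    /* Any socket type would do; a UNIX datagram socket needs no network. */
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;

    int reported = -1;
    socklen_t len = sizeof(reported);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &requested, sizeof(requested)) == -1 ||
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &reported, &len) == -1)
        reported = -1;

    close(fd);
    return reported;
}
```

With a request of 4096 bytes and a sufficiently large rmem_max,
the reported value would normally be 8192 on Linux.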
|
||||
.TP
|
||||
.BR SO_RCVBUFFORCE " (since Linux 2.6.14)"
|
||||
Using this socket option, a privileged
|
||||
.RB ( CAP_NET_ADMIN )
|
||||
process can perform the same task as
|
||||
.BR SO_RCVBUF ,
|
||||
but the
|
||||
.I rmem_max
|
||||
limit can be overridden.
|
||||
.TP
|
||||
.BR SO_RCVLOWAT " and " SO_SNDLOWAT
|
||||
Specify the minimum number of bytes in the buffer until the socket layer
|
||||
will pass the data to the protocol
|
||||
.RB ( SO_SNDLOWAT )
|
||||
or the user on receiving
|
||||
.RB ( SO_RCVLOWAT ).
|
||||
These two values are initialized to 1.
|
||||
.B SO_SNDLOWAT
|
||||
is not changeable on Linux
|
||||
.RB ( setsockopt (2)
|
||||
fails with the error
|
||||
.BR ENOPROTOOPT ).
|
||||
.B SO_RCVLOWAT
|
||||
is changeable
|
||||
only since Linux 2.4.
|
||||
The
|
||||
.BR select (2)
|
||||
and
|
||||
.BR poll (2)
|
||||
system calls currently do not respect the
|
||||
.B SO_RCVLOWAT
|
||||
setting on Linux,
|
||||
and mark a socket readable when even a single byte of data is available.
|
||||
A subsequent read from the socket will block until
|
||||
.B SO_RCVLOWAT
|
||||
bytes are available.
|
||||
.\" See http://marc.theaimsgroup.com/?l=linux-kernel&m=111049368106984&w=2
|
||||
.\" Tested on kernel 2.6.14 -- mtk, 30 Nov 05
|
||||
.TP
|
||||
.BR SO_RCVTIMEO " and " SO_SNDTIMEO
|
||||
.\" Not implemented in 2.0.
|
||||
.\" Implemented in 2.1.11 for getsockopt: always return a zero struct.
|
||||
.\" Implemented in 2.3.41 for setsockopt, and actually used.
|
||||
Specify the receiving or sending timeouts until reporting an error.
|
||||
The argument is a
|
||||
.IR "struct timeval" .
|
||||
If an input or output function blocks for this period of time, and
|
||||
data has been sent or received, the return value of that function
|
||||
will be the amount of data transferred; if no data has been transferred
|
||||
and the timeout has been reached then \-1 is returned with
|
||||
.I errno
|
||||
set to
|
||||
.B EAGAIN
|
||||
or
|
||||
.B EWOULDBLOCK
|
||||
.\" in fact to EAGAIN
|
||||
just as if the socket was specified to be non-blocking.
|
||||
If the timeout is set to zero (the default)
|
||||
then the operation will never time out.
|
||||
Timeouts only have effect for system calls that perform socket I/O (e.g.,
|
||||
.BR read (2),
|
||||
.BR recvmsg (2),
|
||||
.BR send (2),
|
||||
.BR sendmsg (2));
|
||||
timeouts have no effect for
|
||||
.BR select (2),
|
||||
.BR poll (2),
|
||||
.BR epoll_wait (2),
|
||||
etc.
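.IP
As a hedged illustration of the timeout behavior, the sketch below sets
a receive timeout on one end of a socket pair whose peer never writes,
and returns the errno value left by the timed-out call.
The function name recv_timeout_errno is an assumption made for this example.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Set SO_RCVTIMEO to timeout_ms milliseconds, then try to receive from a
 * peer that never sends.  Returns the errno left by recv(2), 0 if recv(2)
 * unexpectedly succeeded, or -1 if the sockets could not be set up.
 * (Hypothetical helper, for illustration only.) */
int recv_timeout_errno(long timeout_ms)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int result = -1;
    if (setsockopt(sv[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0) {
        char byte;
        ssize_t n = recv(sv[0], &byte, 1, 0);   /* blocks until the timeout */
        result = (n == -1) ? errno : 0;
    }

    close(sv[0]);
    close(sv[1]);
    return result;
}
```

Note that a timeout of zero must not be passed to this sketch, because,
as stated above, a zero timeout means that the call never times out.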
|
||||
.TP
|
||||
.B SO_REUSEADDR
|
||||
Indicates that the rules used in validating addresses supplied in a
|
||||
.BR bind (2)
|
||||
call should allow reuse of local addresses.
|
||||
For
|
||||
.B PF_INET
|
||||
sockets this
|
||||
means that a socket may bind, except when there
|
||||
is an active listening socket bound to the address.
|
||||
When the listening socket is bound to
|
||||
.B INADDR_ANY
|
||||
with a specific port then it is not possible
|
||||
to bind to this port for any local address.
|
||||
Argument is an integer boolean flag.
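.IP
A minimal sketch of typical use follows; the helper name bind_with_reuseaddr
is hypothetical.
It sets the flag before calling
bind(2), since the option influences how the address is validated at bind time:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket, enable SO_REUSEADDR, and bind it to `port` on all
 * local addresses.  Returns the bound socket, or -1 on error.
 * (Hypothetical helper, for illustration only.) */
int bind_with_reuseaddr(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    int on = 1;                          /* integer boolean flag */
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    /* Set the option first, then bind. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Passing a port of 0 lets the kernel choose a free port, which is convenient for testing.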
|
||||
.TP
|
||||
.B SO_SNDBUF
|
||||
Sets or gets the maximum socket send buffer in bytes.
|
||||
The kernel doubles this value (to allow space for bookkeeping overhead)
|
||||
when it is set using
|
||||
.\" Most (all?) other implementations do not do this -- MTK, Dec 05
|
||||
.BR setsockopt (2),
|
||||
and this doubled value is returned by
|
||||
.BR getsockopt (2).
|
||||
The default value is set by the
|
||||
.I wmem_default
|
||||
sysctl and the maximum allowed value is set by the
|
||||
.I wmem_max
|
||||
sysctl.
|
||||
The minimum (doubled) value for this option is 2048.
|
||||
.TP
|
||||
.BR SO_SNDBUFFORCE " (since Linux 2.6.14)"
|
||||
Using this socket option, a privileged
|
||||
.RB ( CAP_NET_ADMIN )
|
||||
process can perform the same task as
|
||||
.BR SO_SNDBUF ,
|
||||
but the
|
||||
.I wmem_max
|
||||
limit can be overridden.
|
||||
.TP
|
||||
.B SO_TIMESTAMP
|
||||
Enable or disable the receiving of the
|
||||
.B SO_TIMESTAMP
|
||||
control message.
|
||||
The timestamp control message is sent with level
|
||||
.B SOL_SOCKET
|
||||
and the
|
||||
.I cmsg_data
|
||||
field is a
|
||||
.I "struct timeval"
|
||||
indicating the
|
||||
reception time of the last packet passed to the user in this call.
|
||||
See
|
||||
.BR cmsg (3)
|
||||
for details on control messages.
|
||||
.TP
|
||||
.B SO_TYPE
|
||||
Gets the socket type as an integer (like
|
||||
.BR SOCK_STREAM ).
|
||||
Can only be read
|
||||
with
|
||||
.BR getsockopt (2).
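.IP
For instance, a program that has inherited a descriptor can query its type
with a small wrapper like the one sketched here
(the name socket_type is an assumption made for this example):

```c
#include <sys/socket.h>

/* Return the type of the socket referred to by fd (for example
 * SOCK_STREAM or SOCK_DGRAM), or -1 if getsockopt(2) fails.
 * (Hypothetical helper, for illustration only.) */
int socket_type(int fd)
{
    int type;
    socklen_t len = sizeof(type);

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) == -1)
        return -1;
    return type;
}
```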
|
||||
.SS Signals
|
||||
When writing onto a connection-oriented socket that has been shut down
|
||||
(by the local or the remote end)
|
||||
.B SIGPIPE
|
||||
is sent to the writing process and
|
||||
.B EPIPE
|
||||
is returned.
|
||||
The signal is not sent when the write call
|
||||
specified the
|
||||
.B MSG_NOSIGNAL
|
||||
flag.
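.PP
The effect of the flag can be sketched as follows, assuming a Linux system.
The function name send_after_peer_close is illustrative only.
It closes one end of a socket pair and then writes to the other end with
MSG_NOSIGNAL, so that the error is reported through errno and no signal is raised:

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Write to a stream socket whose peer has already been closed, suppressing
 * SIGPIPE with MSG_NOSIGNAL.  Returns the errno left by send(2), 0 if the
 * send unexpectedly succeeded, or -1 if the socket pair could not be made.
 * (Hypothetical helper, for illustration only.) */
int send_after_peer_close(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    close(sv[1]);                        /* the remote end goes away */

    ssize_t n = send(sv[0], "x", 1, MSG_NOSIGNAL);
    int err = (n == -1) ? errno : 0;

    close(sv[0]);
    return err;
}
```

Without the flag, the same write would also deliver SIGPIPE,
whose default action terminates the process.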
|
||||
.PP
|
||||
When requested with the
|
||||
.B FIOSETOWN
|
||||
.BR fcntl (2)
|
||||
or
|
||||
.B SIOCSPGRP
|
||||
.BR ioctl (2),
|
||||
.B SIGIO
|
||||
is sent when an I/O event occurs.
|
||||
It is possible to use
|
||||
.BR poll (2)
|
||||
or
|
||||
.BR select (2)
|
||||
in the signal handler to find out which socket the event occurred on.
|
||||
An alternative (in Linux 2.2) is to set a real-time signal using the
|
||||
.B F_SETSIG
|
||||
.BR fcntl (2);
|
||||
the handler of the real-time signal will be called with
|
||||
the file descriptor in the
|
||||
.I si_fd
|
||||
field of its
|
||||
.IR siginfo_t .
|
||||
See
|
||||
.BR fcntl (2)
|
||||
for more information.
|
||||
.PP
|
||||
Under some circumstances (e.g., multiple processes accessing a
|
||||
single socket), the condition that caused the
|
||||
.B SIGIO
|
||||
may have already disappeared when the process reacts to the signal.
|
||||
If this happens, the process should wait again because Linux
|
||||
will resend the signal later.
|
||||
.\" .SS Ancillary Messages
|
||||
.SS Sysctls
|
||||
The core socket networking sysctls can be accessed using the
|
||||
.I /proc/sys/net/core/*
|
||||
files or with the
|
||||
.BR sysctl (2)
|
||||
interface.
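.PP
Reading one of these files takes only a few lines of C.
The sketch below is one possible approach;
the helper name read_sysctl_long is hypothetical, and it assumes that the
file holds a single integer, which is the case for the buffer-size entries described below.

```c
#include <stdio.h>

/* Read a single integer from a /proc/sys file.
 * Returns the value, or -1 if the file cannot be opened or parsed.
 * (Hypothetical helper, for illustration only.) */
long read_sysctl_long(const char *path)
{
    long value = -1;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%ld", &value) != 1)
        value = -1;
    fclose(fp);
    return value;
}
```

A call such as read_sysctl_long("/proc/sys/net/core/rmem_max") would then
return the current receive buffer limit in bytes.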
|
||||
.TP
|
||||
.I rmem_default
|
||||
contains the default setting in bytes of the socket receive buffer.
|
||||
.TP
|
||||
.I rmem_max
|
||||
contains the maximum socket receive buffer size in bytes which a user may
|
||||
set by using the
|
||||
.B SO_RCVBUF
|
||||
socket option.
|
||||
.TP
|
||||
.I wmem_default
|
||||
contains the default setting in bytes of the socket send buffer.
|
||||
.TP
|
||||
.I wmem_max
|
||||
contains the maximum socket send buffer size in bytes which a user may
|
||||
set by using the
|
||||
.B SO_SNDBUF
|
||||
socket option.
|
||||
.TP
|
||||
.BR message_cost " and " message_burst
|
||||
configure the token bucket filter used to load limit warning messages
|
||||
caused by external network events.
|
||||
.TP
|
||||
.I netdev_max_backlog
|
||||
Maximum number of packets in the global input queue.
|
||||
.TP
|
||||
.I optmem_max
|
||||
Maximum length of ancillary data and user control data like the iovecs
|
||||
per socket.
|
||||
.\" netdev_fastroute is not documented because it is experimental
|
||||
.SS Ioctls
|
||||
These operations can be accessed using
|
||||
.BR ioctl (2):
|
||||
|
||||
.in +4n
|
||||
.nf
|
||||
.IB error " = ioctl(" ip_socket ", " ioctl_type ", " &value_result ");"
|
||||
.fi
|
||||
.in
|
||||
.TP
|
||||
.B SIOCGSTAMP
|
||||
Return a
|
||||
.I struct timeval
|
||||
with the receive timestamp of the last packet passed to the user.
|
||||
This is useful for accurate round trip time measurements.
|
||||
See
|
||||
.BR setitimer (2)
|
||||
for a description of
|
||||
.IR "struct timeval" .
|
||||
.\"
|
||||
This ioctl should only be used if the socket option
|
||||
.B SO_TIMESTAMP
|
||||
is not set on the socket.
|
||||
Otherwise, it returns the timestamp of the
|
||||
last packet that was received while
|
||||
.B SO_TIMESTAMP
|
||||
was not set, or it fails if no such packet has been received
|
||||
(i.e.,
|
||||
.BR ioctl (2)
|
||||
returns \-1 with
|
||||
.I errno
|
||||
set to
|
||||
.BR ENOENT ).
|
||||
.TP
|
||||
.B SIOCSPGRP
|
||||
Set the process or process group to send
|
||||
.B SIGIO
|
||||
or
|
||||
.B SIGURG
|
||||
signals
|
||||
to when an
|
||||
asynchronous I/O operation has finished or urgent data is available.
|
||||
The argument is a pointer to a
|
||||
.IR pid_t .
|
||||
If the argument is positive, send the signals to that process.
|
||||
If the
|
||||
argument is negative, send the signals to the process group with the ID
|
||||
of the absolute value of the argument.
|
||||
The process may only choose itself or its own process group to receive
|
||||
signals unless it has the
|
||||
.B CAP_KILL
|
||||
capability or an effective UID of 0.
|
||||
.TP
|
||||
.B FIOASYNC
|
||||
Change the
|
||||
.B O_ASYNC
|
||||
flag to enable or disable asynchronous I/O mode of the socket.
|
||||
Asynchronous I/O mode means that the
|
||||
.B SIGIO
|
||||
signal or the signal set with
|
||||
.B F_SETSIG
|
||||
is raised when a new I/O event occurs.
|
||||
.IP
|
||||
Argument is an integer boolean flag.
|
||||
(This operation is synonymous with the use of
|
||||
.BR fcntl (2)
|
||||
to set the
|
||||
.B O_ASYNC
|
||||
flag.)
|
||||
.\"
|
||||
.TP
|
||||
.B SIOCGPGRP
|
||||
Get the current process or process group that receives
|
||||
.B SIGIO
|
||||
or
|
||||
.B SIGURG
|
||||
signals,
|
||||
or 0
|
||||
when none is set.
|
||||
.PP
|
||||
Valid
|
||||
.BR fcntl (2)
|
||||
operations:
|
||||
.TP
|
||||
.B FIOGETOWN
|
||||
The same as the
|
||||
.B SIOCGPGRP
|
||||
.BR ioctl (2).
|
||||
.TP
|
||||
.B FIOSETOWN
|
||||
The same as the
|
||||
.B SIOCSPGRP
|
||||
.BR ioctl (2).
|
||||
.SH VERSIONS
|
||||
.B SO_BINDTODEVICE
|
||||
was introduced in Linux 2.0.30.
|
||||
.B SO_PASSCRED
|
||||
is new in Linux 2.2.
|
||||
The sysctls are new in Linux 2.2.
|
||||
.B SO_RCVTIMEO
|
||||
and
|
||||
.B SO_SNDTIMEO
|
||||
are supported since Linux 2.3.41.
|
||||
Earlier, timeouts were fixed to
|
||||
a protocol-specific setting, and could not be read or written.
|
||||
.SH NOTES
|
||||
Linux assumes that half of the send/receive buffer is used for internal
|
||||
kernel structures; thus the sysctls are twice what can be observed
|
||||
on the wire.
|
||||
|
||||
Linux will only allow port re-use with the
|
||||
.B SO_REUSEADDR
|
||||
option
|
||||
when this option was set both in the previous program that performed a
|
||||
.BR bind (2)
|
||||
to the port and in the program that wants to re-use the port.
|
||||
This differs from some implementations (e.g., FreeBSD)
|
||||
where only the later program needs to set the
|
||||
.B SO_REUSEADDR
|
||||
option.
|
||||
Typically this difference is invisible, since, for example, a server
|
||||
program is designed to always set this option.
|
||||
.SH BUGS
|
||||
The
|
||||
.B CONFIG_FILTER
|
||||
socket options
|
||||
.B SO_ATTACH_FILTER
|
||||
and
|
||||
.B SO_DETACH_FILTER
|
||||
are
|
||||
not documented.
|
||||
The suggested interface to use them is via the libpcap
|
||||
library.
|
||||
.\" .SH AUTHORS
|
||||
.\" This man page was written by Andi Kleen.
|
||||
.SH "SEE ALSO"
|
||||
.BR getsockopt (2),
|
||||
.BR setsockopt (2),
|
||||
.BR socket (2),
|
||||
.BR capabilities (7),
|
||||
.BR ddp (7),
|
||||
.BR ip (7),
|
||||
.BR packet (7)
|
||||
|
|
955
man7/tcp.7
|
@ -1,8 +1,947 @@
|
|||
.TH TCP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH TCP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH TCP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH TCP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH TCP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH TCP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH TCP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH TCP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
|
||||
.\" Permission is granted to distribute possibly modified copies
|
||||
.\" of this page provided the header is included verbatim,
|
||||
.\" and in case of nontrivial modification author and date
|
||||
.\" of the modification is added to the header.
|
||||
.\"
|
||||
.\" 2.4 Updates by Nivedita Singhvi 4/20/02 <nivedita@us.ibm.com>.
|
||||
.\" Modified, 2004-11-11, Michael Kerrisk and Andries Brouwer
|
||||
.\" Updated details of interaction of TCP_CORK and TCP_NODELAY.
|
||||
.\"
|
||||
.\" FIXME 2.6.17-rc1 adds the following /proc files, which need to be
|
||||
.\" documented: tcp_mtu_probing, tcp_base_mss, and
|
||||
.\" tcp_workaround_signed_windows
|
||||
.\"
|
||||
.TH TCP 7 2007-11-25 "Linux" "Linux Programmer's Manual"
|
||||
.SH NAME
|
||||
tcp \- TCP protocol
|
||||
.SH SYNOPSIS
|
||||
.B #include <sys/socket.h>
|
||||
.br
|
||||
.B #include <netinet/in.h>
|
||||
.br
|
||||
.B #include <netinet/tcp.h>
|
||||
.sp
|
||||
.B tcp_socket = socket(PF_INET, SOCK_STREAM, 0);
|
||||
.SH DESCRIPTION
|
||||
This is an implementation of the TCP protocol defined in
|
||||
RFC\ 793, RFC\ 1122 and RFC\ 2001 with the NewReno and SACK
|
||||
extensions.
|
||||
It provides a reliable, stream-oriented,
|
||||
full-duplex connection between two sockets on top of
|
||||
.BR ip (7),
|
||||
for both v4 and v6 versions.
|
||||
TCP guarantees that the data arrives in order and
|
||||
retransmits lost packets.
|
||||
It generates and checks a per-packet checksum to catch
|
||||
transmission errors.
|
||||
TCP does not preserve record boundaries.
|
||||
|
||||
A newly created TCP socket has no remote or local address and is not
|
||||
fully specified.
|
||||
To create an outgoing TCP connection use
|
||||
.BR connect (2)
|
||||
to establish a connection to another TCP socket.
|
||||
To receive new incoming connections, first
|
||||
.BR bind (2)
|
||||
the socket to a local address and port and then call
|
||||
.BR listen (2)
|
||||
to put the socket into the listening state.
|
||||
After that a new
|
||||
socket for each incoming connection can be accepted
|
||||
using
|
||||
.BR accept (2).
|
||||
A socket which has had
|
||||
.BR accept (2)
|
||||
or
|
||||
.BR connect (2)
|
||||
successfully called on it is fully specified and may
|
||||
transmit data.
|
||||
Data cannot be transmitted on listening or
|
||||
not yet connected sockets.
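.PP
The sequence of calls described above can be sketched in a single process
over the loopback interface.
This is a minimal illustration and not a recommended server design;
the function name loopback_roundtrip is an assumption, and error handling is reduced to returning \-1.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send msg (non-empty) over a loopback TCP connection and copy what the
 * accepting side receives into out, NUL-terminated.
 * Returns the number of bytes received, or -1 on error.
 * (Hypothetical helper, for illustration only.) */
int loopback_roundtrip(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    ssize_t n = -1;
    int afd = -1;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);   /* will listen */
    int cfd = socket(AF_INET, SOCK_STREAM, 0);   /* will connect */

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                    /* kernel picks a free port */

    if (lfd != -1 && cfd != -1 && outlen > 0 &&
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        listen(lfd, 1) == 0 &&
        getsockname(lfd, (struct sockaddr *)&addr, &alen) == 0 &&
        connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        (afd = accept(lfd, NULL, NULL)) != -1 &&
        send(cfd, msg, strlen(msg), 0) == (ssize_t)strlen(msg)) {
        /* Both ends are now fully specified, so data can flow. */
        n = recv(afd, out, outlen - 1, 0);
        if (n >= 0)
            out[n] = '\0';
    }

    if (afd != -1)
        close(afd);
    if (cfd != -1)
        close(cfd);
    if (lfd != -1)
        close(lfd);
    return (int)n;
}
```

The listening socket itself never carries data; only the socket returned by
accept(2) and the connected client socket do.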
|
||||
|
||||
Linux supports RFC\ 1323 TCP high performance
|
||||
extensions.
|
||||
These include Protection Against Wrapped
|
||||
Sequence Numbers (PAWS), Window Scaling and
|
||||
Timestamps.
|
||||
Window scaling allows the use
|
||||
of large (> 64K) TCP windows in order to support links with high
|
||||
latency or bandwidth.
|
||||
To make use of them, the send and
|
||||
receive buffer sizes must be increased.
|
||||
They can be set globally with the
|
||||
.I net.ipv4.tcp_wmem
|
||||
and
|
||||
.I net.ipv4.tcp_rmem
|
||||
sysctl variables, or on individual sockets by using the
|
||||
.B SO_SNDBUF
|
||||
and
|
||||
.B SO_RCVBUF
|
||||
socket options with the
|
||||
.BR setsockopt (2)
|
||||
call.
|
||||
|
||||
The maximum sizes for socket buffers declared via the
|
||||
.B SO_SNDBUF
|
||||
and
|
||||
.B SO_RCVBUF
|
||||
mechanisms are limited by the global
|
||||
.I net.core.rmem_max
|
||||
and
|
||||
.I net.core.wmem_max
|
||||
sysctls.
|
||||
Note that TCP actually allocates twice the size of
|
||||
the buffer requested in the
|
||||
.BR setsockopt (2)
|
||||
call, and so a succeeding
|
||||
.BR getsockopt (2)
|
||||
call will not return the same size of buffer as requested
|
||||
in the
|
||||
.BR setsockopt (2)
|
||||
call.
|
||||
TCP uses the extra space for administrative purposes and internal
|
||||
kernel structures, and the sysctl variables reflect the
|
||||
larger sizes compared to the actual TCP windows.
|
||||
On individual connections, the socket buffer size must be
|
||||
set prior to the
|
||||
.BR listen (2)
|
||||
or
|
||||
.BR connect (2)
|
||||
calls in order to have it take effect.
|
||||
See
|
||||
.BR socket (7)
|
||||
for more information.
|
||||
.PP
|
||||
TCP supports urgent data.
|
||||
Urgent data is used to signal the
|
||||
receiver that some important message is part of the data
|
||||
stream and that it should be processed as soon as possible.
|
||||
To send urgent data specify the
|
||||
.B MSG_OOB
|
||||
option to
|
||||
.BR send (2).
|
||||
When urgent data is received, the kernel sends a
|
||||
.B SIGURG
|
||||
signal to the process or process group that has been set as the
|
||||
socket "owner" using the
|
||||
.B SIOCSPGRP
|
||||
or
|
||||
.B FIOSETOWN
|
||||
ioctls (or the POSIX.1-2001-specified
|
||||
.BR fcntl (2)
|
||||
.B F_SETOWN
|
||||
operation).
|
||||
When the
|
||||
.B SO_OOBINLINE
|
||||
socket option is enabled, urgent data is put into the normal
|
||||
data stream (a program can test for its location using the
|
||||
.B SIOCATMARK
|
||||
ioctl described below),
|
||||
otherwise it can only be received when the
|
||||
.B MSG_OOB
|
||||
flag is set for
|
||||
.BR recv (2)
|
||||
or
|
||||
.BR recvmsg (2).
|
||||
|
||||
Linux 2.4 introduced a number of changes for improved
|
||||
throughput and scaling, as well as enhanced functionality.
|
||||
Some of these features include support for zero-copy
|
||||
.BR sendfile (2),
|
||||
Explicit Congestion Notification, new
|
||||
management of TIME_WAIT sockets, keep-alive socket options
|
||||
and support for Duplicate SACK extensions.
|
||||
.SS Address Formats
|
||||
TCP is built on top of IP (see
|
||||
.BR ip (7)).
|
||||
The address formats defined by
|
||||
.BR ip (7)
|
||||
apply to TCP.
|
||||
TCP only supports point-to-point
|
||||
communication; broadcasting and multicasting are not
|
||||
supported.
|
||||
.SS Sysctls
|
||||
These variables can be accessed by the
|
||||
.I /proc/sys/net/ipv4/*
|
||||
files or with the
|
||||
.BR sysctl (2)
|
||||
interface.
|
||||
In addition, most IP sysctls also apply to TCP; see
|
||||
.BR ip (7).
|
||||
Variables described as
|
||||
.I Boolean
|
||||
take an integer value, with a non-zero value ("true") meaning that
|
||||
the corresponding option is enabled, and a zero value ("false")
|
||||
meaning that the option is disabled.
|
||||
.\" FIXME As at Sept 2006, kernel 2.6.18-rc5, the following are
|
||||
.\" not yet documented (shown with default values):
|
||||
.\"
|
||||
.\" /proc/sys/net/ipv4/tcp_congestion_control (since 2.6.13)
|
||||
.\" bic
|
||||
.\" /proc/sys/net/ipv4/tcp_moderate_rcvbuf
|
||||
.\" 1
|
||||
.\" /proc/sys/net/ipv4/tcp_no_metrics_save
|
||||
.\" 0
|
||||
.TP
|
||||
.BR tcp_abort_on_overflow " (Boolean; default: disabled)"
|
||||
Enable resetting connections if the listening service is too
|
||||
slow and unable to keep up and accept them.
|
||||
With the default setting (disabled), connections that overflow the
|
||||
accept queue during a burst are not reset, so they can recover.
|
||||
Enable this option
|
||||
.I only
|
||||
if you are really sure that the listening daemon
|
||||
cannot be tuned to accept connections faster.
|
||||
Enabling this
|
||||
option can harm the clients of your server.
|
||||
.TP
|
||||
.BR tcp_adv_win_scale " (integer; default: 2)"
|
||||
Count buffering overhead as
|
||||
.IR "bytes/2^tcp_adv_win_scale" ,
|
||||
if
|
||||
.I tcp_adv_win_scale
|
||||
is greater than 0; or
|
||||
.IR "bytes-bytes/2^(\-tcp_adv_win_scale)" ,
|
||||
if
|
||||
.I tcp_adv_win_scale
|
||||
is less than or equal to zero.
|
||||
|
||||
The socket receive buffer space is shared between the
|
||||
application and kernel.
|
||||
TCP maintains part of the buffer as
|
||||
the TCP window; this is the size of the receive window
|
||||
advertised to the other end.
|
||||
The rest of the space is used
|
||||
as the "application" buffer, used to isolate the network
|
||||
from scheduling and application latencies.
|
||||
The
|
||||
.I tcp_adv_win_scale
|
||||
default value of 2 implies that the space
|
||||
used for the application buffer is one fourth that of the
|
||||
total.
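.IP
The two formulas can be written out as a short worked sketch.
The function names adv_win_overhead and advertised_window are hypothetical,
and the code only mirrors the arithmetic given above; it is not the kernel's implementation.

```c
/* Buffering overhead, in bytes, for a receive buffer of `bytes` bytes,
 * following the two formulas in the text.  Assumes that the magnitude
 * of tcp_adv_win_scale is smaller than the width of long.
 * (Hypothetical helper, for illustration only.) */
long adv_win_overhead(long bytes, int tcp_adv_win_scale)
{
    if (tcp_adv_win_scale > 0)
        return bytes >> tcp_adv_win_scale;            /* bytes/2^scale */
    return bytes - (bytes >> -tcp_adv_win_scale);     /* bytes-bytes/2^(-scale) */
}

/* The part of the buffer left over for the advertised TCP window. */
long advertised_window(long bytes, int tcp_adv_win_scale)
{
    return bytes - adv_win_overhead(bytes, tcp_adv_win_scale);
}
```

For a buffer of 87380 bytes and the default scale of 2, the overhead is
21845 bytes (one fourth), which leaves a window of 65535 bytes.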
|
||||
.TP
|
||||
.BR tcp_app_win " (integer; default: 31)"
|
||||
This variable defines how many
|
||||
bytes of the TCP window are reserved for buffering
|
||||
overhead.
|
||||
|
||||
A maximum of (\fIwindow/2^tcp_app_win\fP, mss) bytes in the window
|
||||
are reserved for the application buffer.
|
||||
A value of 0
|
||||
implies that no amount is reserved.
|
||||
.\"
|
||||
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
||||
.TP
|
||||
.BR tcp_bic " (Boolean; default: disabled)"
|
||||
Enable BIC TCP congestion control algorithm.
|
||||
BIC-TCP is a sender-side only change that ensures a linear RTT
|
||||
fairness under large windows while offering both scalability and
|
||||
bounded TCP-friendliness.
|
||||
The protocol combines two schemes
|
||||
called additive increase and binary search increase.
|
||||
When the
|
||||
congestion window is large, additive increase with a large
|
||||
increment ensures linear RTT fairness as well as good
|
||||
scalability.
|
||||
Under small congestion windows, binary search
|
||||
increase provides TCP friendliness.
|
||||
.\"
|
||||
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
||||
.TP
|
||||
.BR tcp_bic_low_window " (integer; default: 14)"
|
||||
Sets the threshold window (in packets) where BIC TCP starts to
|
||||
adjust the congestion window.
|
||||
Below this threshold BIC TCP behaves
|
||||
the same as the default TCP Reno.
|
||||
.\"
|
||||
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
||||
.TP
|
||||
.BR tcp_bic_fast_convergence " (Boolean; default: enabled)"
|
||||
Forces BIC TCP to more quickly respond to changes in congestion
|
||||
window.
|
||||
Allows two flows sharing the same connection to converge
|
||||
more rapidly.
|
||||
.TP
|
||||
.BR tcp_dsack " (Boolean; default: enabled)"
|
||||
Enable RFC\ 2883 TCP Duplicate SACK support.
|
||||
.TP
|
||||
.BR tcp_ecn " (Boolean; default: disabled)"
|
||||
Enable RFC\ 3168 Explicit Congestion Notification.
|
||||
When enabled, connectivity to some
|
||||
destinations could be affected due to older, misbehaving
|
||||
routers along the path causing connections to be dropped.
|
||||
.TP
|
||||
.BR tcp_fack " (Boolean; default: enabled)"
|
||||
Enable TCP Forward Acknowledgement support.
|
||||
.TP
|
||||
.BR tcp_fin_timeout " (integer; default: 60)"
|
||||
This specifies how many seconds to wait for a final FIN packet before the
|
||||
socket is forcibly closed.
|
||||
This is strictly a violation of
|
||||
the TCP specification, but required to prevent
|
||||
denial-of-service attacks.
|
||||
In Linux 2.2, the default value was 180.
|
||||
.\"
|
||||
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
||||
.TP
|
||||
.BR tcp_frto " (Boolean; default: disabled)"
|
||||
Enables F-RTO, an enhanced recovery algorithm for TCP retransmission
|
||||
timeouts.
|
||||
It is particularly beneficial in wireless environments
|
||||
where packet loss is typically due to random radio interference
|
||||
rather than intermediate router congestion.
|
||||
.TP
|
||||
.BR tcp_keepalive_intvl " (integer; default: 75)"
|
||||
The number of seconds between TCP keep-alive probes.
|
||||
.TP
|
||||
.BR tcp_keepalive_probes " (integer; default: 9)"
|
||||
The maximum number of TCP keep-alive probes to send
|
||||
before giving up and killing the connection if
|
||||
no response is obtained from the other end.
|
||||
.TP
|
||||
.BR tcp_keepalive_time " (integer; default: 7200)"
|
||||
The number of seconds a connection needs to be idle
|
||||
before TCP begins sending out keep-alive probes.
|
||||
Keep-alives are only sent when the
|
||||
.B SO_KEEPALIVE
|
||||
socket option is enabled.
|
||||
The default value is 7200 seconds (2 hours).
|
||||
An idle connection is terminated after
|
||||
approximately an additional 11 minutes (9 probes sent at intervals
|
||||
of 75 seconds) when keep-alive is enabled.
|
||||
|
||||
Note that underlying connection tracking mechanisms and
|
||||
application timeouts may be much shorter.
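.IP
The arithmetic behind these figures can be sketched as below.
The function name keepalive_drop_seconds is an assumption, and the result is
an approximation that ignores timer granularity.

```c
/* Approximate number of seconds between the last activity on a connection
 * and the moment an unresponsive peer is given up on: the idle time,
 * followed by one probe interval for each unanswered probe.
 * (Hypothetical helper, for illustration only.) */
long keepalive_drop_seconds(long keepalive_time, long keepalive_intvl,
                            long keepalive_probes)
{
    return keepalive_time + keepalive_intvl * keepalive_probes;
}
```

With the defaults (7200, 75 and 9) this gives 7875 seconds: two hours of
idle time plus 675 seconds, or a little over 11 minutes, of probing.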
|
||||
.\"
|
||||
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
||||
.TP
|
||||
.BR tcp_low_latency " (Boolean; default: disabled)"
|
||||
If enabled, the TCP stack makes decisions that prefer lower
|
||||
latency as opposed to higher throughput.
|
||||
If this option is disabled, then higher throughput is preferred.
|
||||
An example of an application where this default should be
|
||||
changed would be a Beowulf compute cluster.
|
||||
.TP
|
||||
.BR tcp_max_orphans " (integer; default: see below)"
|
||||
The maximum number of orphaned (not attached to any user file
|
||||
handle) TCP sockets allowed in the system.
|
||||
When this number
|
||||
is exceeded, the orphaned connection is reset and a warning
|
||||
is printed.
|
||||
This limit exists only to prevent simple denial-of-service attacks.
|
||||
Lowering this limit is not recommended.
|
||||
Network conditions might require you to increase the number of
|
||||
orphans allowed, but note that each orphan can eat up to ~64K
|
||||
of unswappable memory.
|
||||
The default initial value is set
|
||||
equal to the kernel parameter NR_FILE.
|
||||
This initial default is adjusted depending on the memory in the system.
|
||||
.TP
|
||||
.BR tcp_max_syn_backlog " (integer; default: see below)"
|
||||
The maximum number of queued connection requests which have
|
||||
still not received an acknowledgement from the connecting client.
|
||||
If this number is exceeded, the kernel will begin
|
||||
dropping requests.
|
||||
The default value of 256 is increased to
|
||||
1024 when the memory present in the system is adequate or
|
||||
greater (>= 128MB), and reduced to 128 for those systems with
|
||||
very low memory (<= 32MB).
|
||||
It is recommended that if this
|
||||
needs to be increased above 1024, TCP_SYNQ_HSIZE in
|
||||
.I include/net/tcp.h
|
||||
be modified to keep
|
||||
TCP_SYNQ_HSIZE*16<=tcp_max_syn_backlog, and the kernel be
|
||||
recompiled.
|
||||
.TP
|
||||
.BR tcp_max_tw_buckets " (integer; default: see below)"
|
||||
The maximum number of sockets in TIME_WAIT state allowed in
|
||||
the system.
|
||||
This limit exists only to prevent simple denial-of-service
|
||||
attacks.
|
||||
The default value of NR_FILE*2 is adjusted
|
||||
depending on the memory in the system.
|
||||
If this number is
|
||||
exceeded, the socket is closed and a warning is printed.
|
||||
.TP
|
||||
.I tcp_mem
|
||||
This is a vector of 3 integers: [low, pressure, high].
|
||||
These bounds are used by TCP to track its memory usage.
|
||||
The
|
||||
defaults are calculated at boot time from the amount of
|
||||
available memory.
|
||||
(TCP can only use
|
||||
.I "low memory"
|
||||
for this, which is limited to around 900 megabytes on 32-bit systems.
|
||||
64-bit systems do not suffer this limitation.)
|
||||
|
||||
.I low
|
||||
- TCP doesn't regulate its memory allocation when the number
|
||||
of pages it has allocated globally is below this number.
|
||||
|
||||
.I pressure
|
||||
- when the amount of memory allocated by TCP
|
||||
exceeds this number of pages, TCP moderates its memory consumption.
|
||||
This memory pressure state is exited
|
||||
once the number of pages allocated falls below
|
||||
the
|
||||
.I low
|
||||
mark.
|
||||
|
||||
.I high
|
||||
- the maximum number of pages, globally, that TCP
|
||||
will allocate.
|
||||
This value overrides any other limits
|
||||
imposed by the kernel.
|
||||
.TP
|
||||
.BR tcp_orphan_retries " (integer; default: 8)"
|
||||
The maximum number of attempts made to probe the other
|
||||
end of a connection which has been closed by our end.
|
||||
.TP
|
||||
.BR tcp_reordering " (integer; default: 3)"
|
||||
The maximum a packet can be reordered in a TCP packet stream
|
||||
without TCP assuming packet loss and going into slow start.
|
||||
It is not advisable to change this number.
|
||||
This is a packet reordering detection metric designed to
|
||||
minimize unnecessary back off and retransmits provoked by
|
||||
reordering of packets on a connection.
|
||||
.TP
|
||||
.BR tcp_retrans_collapse " (Boolean; default: enabled)"
|
||||
Try to send full-sized packets during retransmit.
|
||||
.TP
|
||||
.BR tcp_retries1 " (integer; default: 3)"
|
||||
The number of times TCP will attempt to retransmit a
|
||||
packet on an established connection normally,
|
||||
without the extra effort of getting the network
|
||||
layers involved.
|
||||
Once we exceed this number of
|
||||
retransmits, we first have the network layer
|
||||
update the route if possible before each new retransmit.
|
||||
The default is the RFC-specified minimum of 3.
|
||||
.TP
|
||||
.BR tcp_retries2 " (integer; default: 15)"
|
||||
The maximum number of times a TCP packet is retransmitted
|
||||
in established state before giving up.
|
||||
The default
|
||||
value is 15, which corresponds to a duration of
|
||||
approximately 13 to 30 minutes, depending
|
||||
on the retransmission timeout.
|
||||
The RFC\ 1122 specified
|
||||
minimum limit of 100 seconds is typically deemed too
|
||||
short.
|
||||
.TP
|
||||
.BR tcp_rfc1337 " (Boolean; default: disabled)"
|
||||
Enable TCP behavior conformant with RFC\ 1337.
|
||||
When disabled,
|
||||
if a RST is received in TIME_WAIT state, we close
|
||||
the socket immediately without waiting for the end
|
||||
of the TIME_WAIT period.
|
||||
.TP
|
||||
.I tcp_rmem
|
||||
This is a vector of 3 integers: [min, default,
|
||||
max].
|
||||
These parameters are used by TCP to regulate receive
|
||||
buffer sizes.
|
||||
TCP dynamically adjusts the size of the
|
||||
receive buffer from the defaults listed below, in the range
|
||||
of these sysctl variables, depending on memory available
|
||||
in the system.
|
||||
.RS
|
||||
.TP 9
|
||||
.I min
|
||||
minimum size of the receive buffer used by each TCP socket.
|
||||
The default value is 4K, and is lowered to
|
||||
.B PAGE_SIZE
|
||||
bytes in low-memory systems.
|
||||
This value
|
||||
is used to ensure that in memory pressure mode,
|
||||
allocations below this size will still succeed.
|
||||
This is not
|
||||
used to bound the size of the receive buffer declared
|
||||
using
|
||||
.B SO_RCVBUF
|
||||
on a socket.
|
||||
.TP
|
||||
.I default
|
||||
the default size of the receive buffer for a TCP socket.
|
||||
This value overrides the initial default buffer size from
|
||||
the generic global
|
||||
.I net.core.rmem_default
|
||||
defined for all protocols.
|
||||
The default value is 87380
|
||||
bytes, and is lowered to 43689 in low-memory systems.
|
||||
If larger receive buffer sizes are desired, this value should
|
||||
be increased (to affect all sockets).
|
||||
To employ large TCP
|
||||
windows, the
|
||||
.I net.ipv4.tcp_window_scaling
|
||||
sysctl must be enabled (the default).
|
||||
.TP
|
||||
.I max
|
||||
the maximum size of the receive buffer used by
|
||||
each TCP socket.
|
||||
This value does not override the global
|
||||
.IR net.core.rmem_max .
|
||||
This is not used to limit the size of the receive buffer
|
||||
declared using
|
||||
.B SO_RCVBUF
|
||||
on a socket.
|
||||
The default value of 87380*2 bytes is lowered to 87380
|
||||
in low-memory systems.
|
||||
.RE
|
||||
.TP
|
||||
.BR tcp_sack " (Boolean; default: enabled)"
|
||||
Enable RFC\ 2018 TCP Selective Acknowledgements.
|
||||
.TP
|
||||
.BR tcp_stdurg " (Boolean; default: disabled)"
|
||||
If this option is enabled, then use the RFC\ 1122 interpretation
|
||||
of the TCP urgent-pointer field.
|
||||
.\" RFC 793 was ambiguous in its specification of the meaning of the
|
||||
.\" urgent pointer. RFC 1122 (and RFC 961) fixed on a particular
|
||||
.\" resolution of this ambiguity (unfortunately the "wrong" one).
|
||||
According to this interpretation, the urgent pointer points
|
||||
to the last byte of urgent data.
|
||||
If this option is disabled, then use the BSD-compatible interpretation of
|
||||
the urgent pointer:
|
||||
the urgent pointer points to the first byte after the urgent data.
|
||||
Enabling this option may lead to interoperability problems.
|
||||
.TP
|
||||
.BR tcp_synack_retries " (integer; default: 5)"
|
||||
The maximum number of times a SYN/ACK segment
|
||||
for a passive TCP connection will be retransmitted.
|
||||
This number should not be higher than 255.
|
||||
.TP
|
||||
.BR tcp_syncookies " (Boolean)"
|
||||
Enable TCP syncookies.
|
||||
The kernel must be compiled with
|
||||
.BR CONFIG_SYN_COOKIES .
|
||||
Send out syncookies when the SYN backlog queue of a socket
|
||||
overflows.
|
||||
The syncookies feature attempts to protect a
|
||||
socket from a SYN flood attack.
|
||||
This should be used as a
|
||||
last resort, if at all.
|
||||
This is a violation of the TCP
|
||||
protocol, and conflicts with other areas of TCP such as TCP
|
||||
extensions.
|
||||
It can cause problems for clients and relays.
|
||||
It is not recommended as a tuning mechanism for heavily
|
||||
loaded servers to help with overloaded or misconfigured
|
||||
conditions.
|
||||
For recommended alternatives see
|
||||
.IR tcp_max_syn_backlog ,
|
||||
.IR tcp_synack_retries ,
|
||||
and
|
||||
.IR tcp_abort_on_overflow .
|
||||
.TP
|
||||
.BR tcp_syn_retries " (integer; default: 5)"
|
||||
The maximum number of times initial SYNs for an active TCP
|
||||
connection attempt will be retransmitted.
|
||||
This value should
|
||||
not be higher than 255.
|
||||
The default value is 5, which
|
||||
corresponds to approximately 180 seconds.
|
||||
.TP
|
||||
.BR tcp_timestamps " (Boolean; default: enabled)"
|
||||
Enable RFC\ 1323 TCP timestamps.
|
||||
.TP
|
||||
.BR tcp_tw_recycle " (Boolean; default: disabled)"
|
||||
Enable fast recycling of TIME_WAIT sockets.
|
||||
Enabling this option is not
|
||||
recommended since this causes problems when working
|
||||
with NAT (Network Address Translation).
|
||||
.\"
|
||||
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
||||
.TP
|
||||
.BR tcp_tw_reuse " (Boolean; default: disabled)"
|
||||
Allow TIME_WAIT sockets to be reused for new connections when it is
|
||||
safe from the protocol viewpoint.
|
||||
It should not be changed without the advice or request of technical
|
||||
experts.
|
||||
.TP
|
||||
.BR tcp_window_scaling " (Boolean; default: enabled)"
|
||||
Enable RFC\ 1323 TCP window scaling.
|
||||
This feature allows the use of a large window
|
||||
(> 64K) on a TCP connection, should the other end support it.
|
||||
Normally, the 16-bit window length field in the TCP header
|
||||
limits the window size to less than 64K bytes.
|
||||
If larger
|
||||
windows are desired, applications can increase the size of
|
||||
their socket buffers and the window scaling option will be
|
||||
employed.
|
||||
If
|
||||
.I tcp_window_scaling
|
||||
is disabled, TCP will not negotiate the use of window
|
||||
scaling with the other end during connection setup.
|
||||
.\"
|
||||
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
||||
.TP
|
||||
.BR tcp_vegas_cong_avoid " (Boolean; default: disabled)"
|
||||
Enable TCP Vegas congestion avoidance algorithm.
|
||||
TCP Vegas is a sender-side only change to TCP that anticipates
|
||||
the onset of congestion by estimating the bandwidth.
|
||||
TCP Vegas
|
||||
adjusts the sending rate by modifying the congestion
|
||||
window.
|
||||
TCP Vegas should provide less packet loss, but it is
|
||||
not as aggressive as TCP Reno.
|
||||
.\"
|
||||
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
||||
.TP
|
||||
.BR tcp_westwood " (Boolean; default: disabled)"
|
||||
Enable TCP Westwood+ congestion control algorithm.
|
||||
TCP Westwood+ is a sender-side only modification of the TCP Reno
|
||||
protocol stack that optimizes the performance of TCP congestion
|
||||
control.
|
||||
It is based on end-to-end bandwidth estimation to set
|
||||
congestion window and slow start threshold after a congestion
|
||||
episode.
|
||||
Using this estimation, TCP Westwood+ adaptively sets a
|
||||
slow start threshold and a congestion window which takes into
|
||||
account the bandwidth used at the time congestion is experienced.
|
||||
TCP Westwood+ significantly increases fairness with respect to
|
||||
TCP Reno in wired networks and throughput over wireless links.
|
||||
.TP
|
||||
.I tcp_wmem
|
||||
This is a vector of 3 integers: [min, default, max].
|
||||
These parameters are used by TCP to regulate send buffer sizes.
|
||||
TCP dynamically adjusts the size of the send buffer from the
|
||||
default values listed below, in the range of these sysctl
|
||||
variables, depending on memory available.
|
||||
|
||||
.I min
|
||||
- minimum size of the send buffer used by each TCP socket.
|
||||
The default value is 4K bytes.
|
||||
This value is used to ensure that in memory pressure mode,
|
||||
allocations below this size will still succeed.
|
||||
This is not
|
||||
used to bound the size of the send buffer declared
|
||||
using
|
||||
.B SO_SNDBUF
|
||||
on a socket.
|
||||
|
||||
.I default
|
||||
- the default size of the send buffer for a TCP socket.
|
||||
This value overwrites the initial default buffer size from
|
||||
the generic global
|
||||
.I net.core.wmem_default
|
||||
defined for all protocols.
|
||||
The default value is 16K bytes.
|
||||
If larger send buffer sizes are desired, this value
|
||||
should be increased (to affect all sockets).
|
||||
To employ large TCP windows, the sysctl variable
|
||||
.I net.ipv4.tcp_window_scaling
|
||||
must be enabled (default).
|
||||
|
||||
.I max
|
||||
- the maximum size of the send buffer used by
|
||||
each TCP socket.
|
||||
This value does not override the global
|
||||
.IR net.core.wmem_max .
|
||||
This is not used to limit the size of the send buffer
|
||||
declared using
|
||||
.B SO_SNDBUF
|
||||
on a socket.
|
||||
The default value is 128K bytes.
|
||||
It is lowered to 64K
|
||||
depending on the memory available in the system.
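As a rough illustration of the [min, default, max] vector described above, the following C sketch parses such a triple.
It assumes the values are exposed as three whitespace-separated integers in /proc/sys/net/ipv4/tcp_wmem, the usual procfs location for this sysctl.
The structure and function names are hypothetical and are not part of any kernel or C library interface.

```c
#include <stdio.h>

/* Hypothetical container for a [min, default, max] sysctl vector. */
struct tcp_mem_triple {
    long min;
    long def;
    long max;
};

/* Parse text such as "4096\t16384\t131072\n".
 * Returns 0 on success, -1 if three integers could not be read. */
int parse_tcp_mem_triple(const char *text, struct tcp_mem_triple *out)
{
    if (sscanf(text, "%ld %ld %ld", &out->min, &out->def, &out->max) != 3)
        return -1;
    return 0;
}

/* Read the live send-buffer limits.
 * Returns -1 if the procfs file is missing or unreadable. */
int read_tcp_wmem(struct tcp_mem_triple *out)
{
    char line[128];
    FILE *fp = fopen("/proc/sys/net/ipv4/tcp_wmem", "r");

    if (fp == NULL)
        return -1;
    if (fgets(line, sizeof(line), fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return parse_tcp_mem_triple(line, out);
}
```

The same parser should also work for the tcp_rmem vector, since it uses the same three-integer layout.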
|
||||
.SS Socket Options
|
||||
To set or get a TCP socket option, call
|
||||
.BR getsockopt (2)
|
||||
to read or
|
||||
.BR setsockopt (2)
|
||||
to write the option with the option level argument set to
|
||||
.BR IPPROTO_TCP .
|
||||
.\" or SOL_TCP on Linux
|
||||
In addition,
|
||||
most
|
||||
.B IPPROTO_IP
|
||||
socket options are valid on TCP sockets.
|
||||
For more information see
|
||||
.BR ip (7).
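A minimal sketch of the calling convention just described, assuming an integer-valued option such as TCP_NODELAY.
The two wrapper names are hypothetical; only setsockopt(2), getsockopt(2) and the IPPROTO_TCP level come from the text above.

```c
#include <netinet/in.h>     /* IPPROTO_TCP */
#include <netinet/tcp.h>    /* TCP_NODELAY and the other TCP_* options */
#include <sys/socket.h>

/* Set an integer-valued TCP option. Returns 0, or -1 with errno set. */
int tcp_set_int_option(int fd, int optname, int value)
{
    return setsockopt(fd, IPPROTO_TCP, optname, &value, sizeof(value));
}

/* Read an integer-valued TCP option into *value.
 * Returns 0, or -1 with errno set. */
int tcp_get_int_option(int fd, int optname, int *value)
{
    socklen_t len = sizeof(*value);

    return getsockopt(fd, IPPROTO_TCP, optname, value, &len);
}
```

For example, tcp_set_int_option(fd, TCP_NODELAY, 1) would disable the Nagle algorithm on fd, and tcp_get_int_option(fd, TCP_NODELAY, &v) would then report a non-zero value in v.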
|
||||
.\" FIXME Document TCP_CONGESTION (new in 2.6.13)
|
||||
.TP
|
||||
.B TCP_CORK
|
||||
If set, don't send out partial frames.
|
||||
All queued
|
||||
partial frames are sent when the option is cleared again.
|
||||
This is useful for prepending headers before calling
|
||||
.BR sendfile (2),
|
||||
or for throughput optimization.
|
||||
As currently implemented, there is a 200 millisecond ceiling on the time
|
||||
for which output is corked by
|
||||
.BR TCP_CORK .
|
||||
If this ceiling is reached, then queued data is automatically transmitted.
|
||||
This option can be
|
||||
combined with
|
||||
.B TCP_NODELAY
|
||||
only since Linux 2.5.71.
|
||||
This option should not be used in code intended to be
|
||||
portable.
|
||||
.TP
|
||||
.B TCP_DEFER_ACCEPT
|
||||
Allows a listener to be awakened only when data arrives on
|
||||
the socket.
|
||||
Takes an integer value (seconds); this can
|
||||
bound the maximum number of attempts TCP will make to
|
||||
complete the connection.
|
||||
This option should not be used in
|
||||
code intended to be portable.
|
||||
.TP
|
||||
.B TCP_INFO
|
||||
Used to collect information about this socket.
|
||||
The kernel returns a \fIstruct tcp_info\fP as defined in the file
|
||||
.IR /usr/include/linux/tcp.h .
|
||||
This option should not be used in code intended to be portable.
|
||||
.TP
|
||||
.B TCP_KEEPCNT
|
||||
The maximum number of keepalive probes TCP should send
|
||||
before dropping the connection.
|
||||
This option should not be
|
||||
used in code intended to be portable.
|
||||
.TP
|
||||
.B TCP_KEEPIDLE
|
||||
The time (in seconds) the connection needs to remain idle
|
||||
before TCP starts sending keepalive probes, if the socket
|
||||
option
|
||||
.B SO_KEEPALIVE
|
||||
has been set on this socket.
|
||||
This option should not be used in code intended to be portable.
|
||||
.TP
|
||||
.B TCP_KEEPINTVL
|
||||
The time (in seconds) between individual keepalive probes.
|
||||
This option should not be used in code intended to be
|
||||
portable.
|
||||
.TP
|
||||
.B TCP_LINGER2
|
||||
The lifetime of orphaned FIN_WAIT2 state sockets.
|
||||
This option can be used to override the system wide sysctl
|
||||
.I tcp_fin_timeout
|
||||
on this socket.
|
||||
This is not to be confused with the
|
||||
.BR socket (7)
|
||||
level option
|
||||
.BR SO_LINGER .
|
||||
This option should not be used in code intended to be
|
||||
portable.
|
||||
.TP
|
||||
.B TCP_MAXSEG
|
||||
The maximum segment size for outgoing TCP packets.
|
||||
If this option is set before connection establishment, it also
|
||||
changes the MSS value announced to the other end in the
|
||||
initial packet.
|
||||
Values greater than the (eventual) interface MTU have no effect.
|
||||
TCP will also impose
|
||||
its minimum and maximum bounds over the value provided.
|
||||
.TP
|
||||
.B TCP_NODELAY
|
||||
If set, disable the Nagle algorithm.
|
||||
This means that segments
|
||||
are always sent as soon as possible, even if there is only a
|
||||
small amount of data.
|
||||
When not set, data is buffered until there
|
||||
is a sufficient amount to send out, thereby avoiding the
|
||||
frequent sending of small packets, which results in poor
|
||||
utilization of the network.
|
||||
This option is overridden by
|
||||
.BR TCP_CORK ;
|
||||
however, setting this option forces an explicit flush of
|
||||
pending output, even if
|
||||
.B TCP_CORK
|
||||
is currently set.
|
||||
.TP
|
||||
.B TCP_QUICKACK
|
||||
Enable quickack mode if set or disable quickack
|
||||
mode if cleared.
|
||||
In quickack mode, acks are sent
|
||||
immediately, rather than delayed if needed in accordance
|
||||
with normal TCP operation.
|
||||
This flag is not permanent;
|
||||
it only enables a switch to or from quickack mode.
|
||||
Subsequent operation of the TCP protocol will
|
||||
once again enter/leave quickack mode depending on
|
||||
internal protocol processing and factors such as
|
||||
delayed ack timeouts occurring and data transfer.
|
||||
This option should not be used in code intended to be
|
||||
portable.
|
||||
.TP
|
||||
.B TCP_SYNCNT
|
||||
Set the number of SYN retransmits that TCP should send before
|
||||
aborting the attempt to connect.
|
||||
It cannot exceed 255.
|
||||
This option should not be used in code intended to be
|
||||
portable.
|
||||
.TP
|
||||
.B TCP_WINDOW_CLAMP
|
||||
Bound the size of the advertised window to this value.
|
||||
The kernel imposes a minimum size of SOCK_MIN_RCVBUF/2.
|
||||
This option should not be used in code intended to be
|
||||
portable.
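The keepalive options in the list above are typically set together.
The following sketch shows one way to do that; the function name and the idea of bundling the three settings are assumptions made for illustration, and the values passed in are the caller's choice rather than recommended defaults.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on SO_KEEPALIVE and override the
 * per-socket keepalive timing.
 *   idle_secs     - idle time before the first probe (TCP_KEEPIDLE)
 *   interval_secs - time between probes (TCP_KEEPINTVL)
 *   probes        - probes sent before the connection is dropped
 *                   (TCP_KEEPCNT)
 * Returns 0 on success, -1 with errno set on the first failure. */
int enable_tcp_keepalive(int fd, int idle_secs, int interval_secs, int probes)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof(idle_secs)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &interval_secs, sizeof(interval_secs)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &probes, sizeof(probes)) == -1)
        return -1;
    return 0;
}
```

With enable_tcp_keepalive(fd, 60, 10, 5), an idle peer that stops responding would be detected after roughly 60 + 5 * 10 seconds, assuming every probe goes unanswered.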
|
||||
.SS Ioctls
|
||||
The following
|
||||
.BR ioctl (2)
|
||||
calls return information in
|
||||
.IR value .
|
||||
The correct syntax is:
|
||||
.PP
|
||||
.RS
|
||||
.nf
|
||||
.BI int " value";
|
||||
.IB error " = ioctl(" tcp_socket ", " ioctl_type ", &" value ");"
|
||||
.fi
|
||||
.RE
|
||||
.PP
|
||||
.I ioctl_type
|
||||
is one of the following:
|
||||
.TP
|
||||
.B SIOCINQ
|
||||
Returns the amount of queued unread data in the receive buffer.
|
||||
The socket must not be in LISTEN state, otherwise an error
|
||||
.RB ( EINVAL )
|
||||
is returned.
|
||||
.TP
|
||||
.B SIOCATMARK
|
||||
Returns true (i.e.,
|
||||
.I value
|
||||
is non-zero) if the inbound data stream is at the urgent mark.
|
||||
|
||||
If the
|
||||
.B SO_OOBINLINE
|
||||
socket option is set, and
|
||||
.B SIOCATMARK
|
||||
returns true, then the
|
||||
next read from the socket will return the urgent data.
|
||||
If the
|
||||
.B SO_OOBINLINE
|
||||
socket option is not set, and
|
||||
.B SIOCATMARK
|
||||
returns true, then the
|
||||
next read from the socket will return the bytes following
|
||||
the urgent data (to actually read the urgent data requires the
|
||||
.B recv(MSG_OOB)
|
||||
flag).
|
||||
|
||||
Note that a read never reads across the urgent mark.
|
||||
If an application is informed of the presence of urgent data via
|
||||
.BR select (2)
|
||||
(using the
|
||||
.I exceptfds
|
||||
argument) or through delivery of a
|
||||
.B SIGURG
|
||||
signal,
|
||||
then it can advance up to the mark using a loop which repeatedly tests
|
||||
.B SIOCATMARK
|
||||
and performs a read (requesting any number of bytes) as long as
|
||||
.B SIOCATMARK
|
||||
returns false.
|
||||
.TP
|
||||
.B SIOCOUTQ
|
||||
Returns the amount of unsent data in the socket send queue.
|
||||
The socket must not be in LISTEN state, otherwise an error
|
||||
.RB ( EINVAL )
|
||||
is returned.
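The queue-size ioctls above can be wrapped as follows.
This is a sketch only: the function names are hypothetical, and it assumes that the SIOCINQ and SIOCOUTQ constants are obtained from <linux/sockios.h>.

```c
#include <linux/sockios.h>  /* SIOCINQ, SIOCOUTQ */
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Number of unread bytes in the receive buffer of a TCP socket,
 * or -1 with errno set (EINVAL if the socket is in LISTEN state). */
int tcp_unread_bytes(int fd)
{
    int value;

    if (ioctl(fd, SIOCINQ, &value) == -1)
        return -1;
    return value;
}

/* Number of unsent bytes in the send queue of a TCP socket,
 * or -1 with errno set (EINVAL if the socket is in LISTEN state). */
int tcp_unsent_bytes(int fd)
{
    int value;

    if (ioctl(fd, SIOCOUTQ, &value) == -1)
        return -1;
    return value;
}
```

A caller might poll tcp_unsent_bytes() before closing a connection to see whether queued data is still waiting to be transmitted.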
|
||||
.SS Error Handling
|
||||
When a network error occurs, TCP tries to resend the packet.
|
||||
If it doesn't succeed after some time, either
|
||||
.B ETIMEDOUT
|
||||
or the last received error on this connection is reported.
|
||||
.PP
|
||||
Some applications require a quicker error notification.
|
||||
This can be enabled with the
|
||||
.B IPPROTO_IP
|
||||
level
|
||||
.B IP_RECVERR
|
||||
socket option.
|
||||
When this option is enabled, all incoming
|
||||
errors are immediately passed to the user program.
|
||||
Use this
|
||||
option with care \(em it makes TCP less tolerant to routing
|
||||
changes and other normal network conditions.
|
||||
.SH ERRORS
|
||||
.TP
|
||||
.B EAFNOTSUPPORT
|
||||
Passed socket address type in
|
||||
.I sin_family
|
||||
was not
|
||||
.BR AF_INET .
|
||||
.TP
|
||||
.B EPIPE
|
||||
The other end closed the socket unexpectedly or a read is
|
||||
executed on a shut down socket.
|
||||
.TP
|
||||
.B ETIMEDOUT
|
||||
The other end didn't acknowledge retransmitted data after
|
||||
some time.
|
||||
.PP
|
||||
Any errors defined for
|
||||
.BR ip (7)
|
||||
or the generic socket layer may also be returned for TCP.
|
||||
.SH VERSIONS
|
||||
Support for Explicit Congestion Notification, zero-copy
|
||||
.BR sendfile (2),
|
||||
reordering support and some SACK extensions
|
||||
(DSACK) were introduced in 2.4.
|
||||
Support for forward acknowledgement (FACK), TIME_WAIT recycling,
|
||||
per connection keepalive socket options and sysctls
|
||||
were introduced in 2.3.
|
||||
|
||||
The default values and descriptions for the sysctl variables
|
||||
given above are applicable for the 2.4 kernel.
|
||||
.SH NOTES
|
||||
TCP has no real out-of-band data; it has urgent data.
|
||||
In Linux this means if the other end sends newer out-of-band
|
||||
data the older urgent data is inserted as normal data into
|
||||
the stream (even when
|
||||
.B SO_OOBINLINE
|
||||
is not set).
|
||||
This differs from BSD-based stacks.
|
||||
.PP
|
||||
Linux uses the BSD-compatible interpretation of the urgent
|
||||
pointer field by default.
|
||||
This violates RFC\ 1122, but is
|
||||
required for interoperability with other stacks.
|
||||
It can be changed by the
|
||||
.I tcp_stdurg
|
||||
sysctl.
|
||||
.SH BUGS
|
||||
Not all errors are documented.
|
||||
.br
|
||||
IPv6 is not described.
|
||||
.\" Only a single Linux kernel version is described
|
||||
.\" Info for 2.2 was lost. Should be added again,
|
||||
.\" or put into a separate page.
|
||||
.\" .SH AUTHORS
|
||||
.\" This man page was originally written by Andi Kleen.
|
||||
.\" It was updated for 2.4 by Nivedita Singhvi with input from
|
||||
.\" Alexey Kuznetsov's Documentation/networking/ip-sysctl.txt
|
||||
.\" document.
|
||||
.SH "SEE ALSO"
|
||||
.BR accept (2),
|
||||
.BR bind (2),
|
||||
.BR connect (2),
|
||||
.BR getsockopt (2),
|
||||
.BR listen (2),
|
||||
.BR recvmsg (2),
|
||||
.BR sendfile (2),
|
||||
.BR sendmsg (2),
|
||||
.BR socket (2),
|
||||
.BR sysctl (2),
|
||||
.BR ip (7),
|
||||
.BR socket (7)
|
||||
.sp
|
||||
RFC\ 793 for the TCP specification.
|
||||
.br
|
||||
RFC\ 1122 for the TCP requirements and a description
|
||||
of the Nagle algorithm.
|
||||
.br
|
||||
RFC\ 1323 for TCP timestamp and window scaling options.
|
||||
.br
|
||||
RFC\ 1644 for a description of TIME_WAIT assassination
|
||||
hazards.
|
||||
.br
|
||||
RFC\ 3168 for a description of Explicit Congestion
|
||||
Notification.
|
||||
.br
|
||||
RFC\ 2581 for TCP congestion control algorithms.
|
||||
.br
|
||||
RFC\ 2018 and RFC\ 2883 for SACK and extensions to SACK.
|
||||
|
|
201
man7/udp.7
|
@ -1,8 +1,193 @@
|
|||
.TH UDP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH UDP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH UDP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH UDP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH UDP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH UDP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH UDP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH UDP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
|
||||
.\" Permission is granted to distribute possibly modified copies
|
||||
.\" of this page provided the header is included verbatim,
|
||||
.\" and in case of nontrivial modification author and date
|
||||
.\" of the modification is added to the header.
|
||||
.\" $Id: udp.7,v 1.7 2000/01/22 01:55:05 freitag Exp $
|
||||
.\"
|
||||
.TH UDP 7 1998-10-02 "Linux" "Linux Programmer's Manual"
|
||||
.SH NAME
|
||||
udp \- User Datagram Protocol for IPv4
|
||||
.SH SYNOPSIS
|
||||
.B #include <sys/socket.h>
|
||||
.br
|
||||
.B #include <netinet/in.h>
|
||||
.sp
|
||||
.B udp_socket = socket(PF_INET, SOCK_DGRAM, 0);
|
||||
.SH DESCRIPTION
|
||||
This is an implementation of the User Datagram Protocol
|
||||
described in RFC\ 768.
|
||||
It implements a connectionless, unreliable datagram packet service.
|
||||
Packets may be reordered or duplicated before they arrive.
|
||||
UDP generates and checks checksums to catch transmission errors.
|
||||
|
||||
When a UDP socket is created,
|
||||
its local and remote addresses are unspecified.
|
||||
Datagrams can be sent immediately using
|
||||
.BR sendto (2)
|
||||
or
|
||||
.BR sendmsg (2)
|
||||
with a valid destination address as an argument.
|
||||
When
|
||||
.BR connect (2)
|
||||
is called on the socket the default destination address is set and
|
||||
datagrams can now be sent using
|
||||
.BR send (2)
|
||||
or
|
||||
.BR write (2)
|
||||
without specifying a destination address.
|
||||
It is still possible to send to other destinations by passing an
|
||||
address to
|
||||
.BR sendto (2)
|
||||
or
|
||||
.BR sendmsg (2).
|
||||
In order to receive packets the socket can be bound to a local
|
||||
address first by using
|
||||
.BR bind (2).
|
||||
Otherwise the socket layer will automatically assign
|
||||
a free local port out of the range defined by
|
||||
.I net.ipv4.ip_local_port_range
|
||||
and bind the socket to
|
||||
.BR INADDR_ANY .
|
||||
|
||||
All receive operations return only one packet.
|
||||
When the packet is smaller than the passed buffer only that much
|
||||
data is returned; when it is bigger, the packet is truncated and the
|
||||
.B MSG_TRUNC
|
||||
flag is set.
|
||||
.B MSG_WAITALL
|
||||
is not supported.
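The truncation behavior described above can be observed with recvmsg(2), which reports MSG_TRUNC in the msg_flags field of the message header.
The helper below is a sketch with a hypothetical name; it receives exactly one datagram and tells the caller whether part of it was discarded.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Receive one datagram into buf (capacity cap).
 * On success, returns the number of bytes stored and sets *truncated
 * to 1 if the datagram was longer than cap, otherwise to 0.
 * Returns -1 with errno set on failure. */
ssize_t recv_datagram(int fd, void *buf, size_t cap, int *truncated)
{
    struct iovec iov = { .iov_base = buf, .iov_len = cap };
    struct msghdr msg;
    ssize_t n;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    n = recvmsg(fd, &msg, 0);
    if (n == -1)
        return -1;
    *truncated = (msg.msg_flags & MSG_TRUNC) != 0;
    return n;
}
```

If a 10-byte datagram is read into a 4-byte buffer this way, the call stores 4 bytes, sets *truncated, and the remaining 6 bytes are lost; a later receive returns the next datagram, not the remainder.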
|
||||
|
||||
IP options may be sent or received using the socket options described in
|
||||
.BR ip (7).
|
||||
They are only processed by the kernel when the appropriate sysctl
|
||||
is enabled (but still passed to the user even when it is turned off).
|
||||
See
|
||||
.BR ip (7).
|
||||
|
||||
When the
|
||||
.B MSG_DONTROUTE
|
||||
flag is set on sending, the destination address must refer to a local
|
||||
interface address and the packet is only sent to that interface.
|
||||
|
||||
By default Linux UDP does path MTU (Maximum Transmission Unit) discovery.
|
||||
This means the kernel
|
||||
will keep track of the MTU to a specific target IP address and return
|
||||
.B EMSGSIZE
|
||||
when a UDP packet write exceeds it.
|
||||
When this happens the application should decrease the packet size.
|
||||
Path MTU discovery can also be turned off using the
|
||||
.B IP_MTU_DISCOVER
|
||||
socket option or the
|
||||
.I ip_no_pmtu_disc
|
||||
sysctl, see
|
||||
.BR ip (7)
|
||||
for details.
|
||||
When turned off, UDP will fragment outgoing UDP packets
|
||||
that exceed the interface MTU.
|
||||
However, disabling it is not recommended
|
||||
for performance and reliability reasons.
|
||||
.SS "Address Format"
|
||||
UDP uses the IPv4
|
||||
.I sockaddr_in
|
||||
address format described in
|
||||
.BR ip (7).
|
||||
.SS "Error Handling"
|
||||
All fatal errors will be passed to the user as an error return even
|
||||
when the socket is not connected.
|
||||
This includes asynchronous errors
|
||||
received from the network.
|
||||
You may get an error for an earlier packet
|
||||
that was sent on the same socket.
|
||||
This behavior differs from many other BSD socket implementations
|
||||
which don't pass any errors unless the socket is connected.
|
||||
Linux's behavior is mandated by
|
||||
.BR RFC\ 1122 .
|
||||
|
||||
For compatibility with legacy code, in Linux 2.0 and 2.2
|
||||
it was possible to set the
|
||||
.B SO_BSDCOMPAT
|
||||
.B SOL_SOCKET
|
||||
option to receive remote errors only when the socket has been
|
||||
connected (except for
|
||||
.B EPROTO
|
||||
and
|
||||
.BR EMSGSIZE ).
|
||||
Locally generated errors are always passed.
|
||||
Support for this socket option was removed in later kernels; see
|
||||
.BR socket (7)
|
||||
for further information.
|
||||
|
||||
When the
|
||||
.B IP_RECVERR
|
||||
option is enabled all errors are stored in the socket error queue
|
||||
and can be received by
|
||||
.BR recvmsg (2)
|
||||
with the
|
||||
.B MSG_ERRQUEUE
|
||||
flag set.
|
||||
.SS "Socket Options"
|
||||
To set or get a UDP socket option, call
|
||||
.BR getsockopt (2)
|
||||
to read or
|
||||
.BR setsockopt (2)
|
||||
to write the option with the option level argument set to
|
||||
.BR IPPROTO_UDP .
|
||||
.TP
|
||||
.BR UDP_CORK " (since Linux 2.5.44)"
|
||||
If this option is enabled, then all data output on this socket
|
||||
is accumulated into a single datagram that is transmitted when
|
||||
the option is disabled.
|
||||
This option should not be used in code intended to be
|
||||
portable.
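As an illustration of UDP_CORK, the sketch below sends a header and a body as a single datagram on a connected UDP socket.
The function names are hypothetical, and the code assumes that UDP_CORK is made available by <netinet/udp.h>.

```c
#include <netinet/in.h>     /* IPPROTO_UDP */
#include <netinet/udp.h>    /* UDP_CORK */
#include <sys/socket.h>

/* Enable (on != 0) or disable (on == 0) corking on a UDP socket. */
int udp_set_cork(int fd, int on)
{
    return setsockopt(fd, IPPROTO_UDP, UDP_CORK, &on, sizeof(on));
}

/* Send head followed by body as one datagram on a connected socket.
 * Returns 0 on success, -1 with errno set on failure; after a failure
 * the socket may still be corked and is best closed by the caller. */
int udp_send_two_parts(int fd, const void *head, size_t head_len,
                       const void *body, size_t body_len)
{
    if (udp_set_cork(fd, 1) == -1)
        return -1;
    if (send(fd, head, head_len, 0) == -1)
        return -1;
    if (send(fd, body, body_len, 0) == -1)
        return -1;
    /* Disabling the option transmits the accumulated datagram. */
    return udp_set_cork(fd, 0);
}
```

The receiver sees one datagram whose length is head_len + body_len, not two separate datagrams.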
|
||||
.\" FIXME document UDP_ENCAP (new in kernel 2.5.67)
|
||||
.SS Ioctls
|
||||
These ioctls can be accessed using
|
||||
.BR ioctl (2).
|
||||
The correct syntax is:
|
||||
.PP
|
||||
.RS
|
||||
.nf
|
||||
.BI int " value";
|
||||
.IB error " = ioctl(" udp_socket ", " ioctl_type ", &" value ");"
|
||||
.fi
|
||||
.RE
|
||||
.TP
|
||||
.BR FIONREAD " (" SIOCINQ )
|
||||
Takes a pointer to an integer as argument.
|
||||
Returns the size of the next pending datagram in the integer in bytes,
|
||||
or 0 when no datagram is pending.
|
||||
.TP
|
||||
.BR TIOCOUTQ " (" SIOCOUTQ )
|
||||
Returns the number of data bytes in the local send queue.
|
||||
Only supported with Linux 2.4 and above.
|
||||
.PP
|
||||
In addition all ioctls documented in
|
||||
.BR ip (7)
|
||||
and
|
||||
.BR socket (7)
|
||||
are supported.
|
||||
.SH ERRORS
|
||||
All errors documented for
|
||||
.BR socket (7)
|
||||
or
|
||||
.BR ip (7)
|
||||
may be returned by a send or receive on a UDP socket.
|
||||
|
||||
.B ECONNREFUSED
|
||||
No receiver was associated with the destination address.
|
||||
This might be caused by a previous packet sent over the socket.
|
||||
.SH VERSIONS
|
||||
IP_RECVERR is a new feature in Linux 2.2.
|
||||
.\" .SH CREDITS
|
||||
.\" This man page was written by Andi Kleen.
|
||||
.SH "SEE ALSO"
|
||||
.BR ip (7),
|
||||
.BR raw (7),
|
||||
.BR socket (7)
|
||||
|
||||
RFC\ 768 for the User Datagram Protocol.
|
||||
.br
|
||||
RFC\ 1122 for the host requirements.
|
||||
.br
|
||||
RFC\ 1191 for a description of path MTU discovery.
|
||||
|
|
367
man7/unix.7
|
@ -1,8 +1,359 @@
|
|||
.TH UNIX 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH UNIX 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH UNIX 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH UNIX 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH UNIX 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH UNIX 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH UNIX 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH UNIX 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
|
||||
.\" Permission is granted to distribute possibly modified copies
|
||||
.\" of this page provided the header is included verbatim,
|
||||
.\" and in case of nontrivial modification author and date
|
||||
.\" of the modification is added to the header.
|
||||
.\"
|
||||
.\" Modified, 2003-12-02, Michael Kerrisk, <mtk.manpages@gmail.com>
|
||||
.\" Modified, 2003-09-23, Adam Langley
|
||||
.\" Modified, 2004-05-27, Michael Kerrisk, <mtk.manpages@gmail.com>
|
||||
.\" Added SOCK_SEQPACKET
|
||||
.\" 2008-05-27, mtk, Provide a clear description of the three types of
|
||||
.\" address that can appear in the sockaddr_un structure: pathname,
|
||||
.\" unnamed, and abstract.
|
||||
.\"
|
||||
.TH UNIX 7 2008-06-17 "Linux" "Linux Programmer's Manual"
|
||||
.SH NAME
|
||||
unix, PF_UNIX, AF_UNIX, PF_LOCAL, AF_LOCAL \- Sockets for local
|
||||
interprocess communication
|
||||
.SH SYNOPSIS
|
||||
.B #include <sys/socket.h>
|
||||
.br
|
||||
.B #include <sys/un.h>
|
||||
|
||||
.IB unix_socket " = socket(PF_UNIX, type, 0);"
|
||||
.br
|
||||
.IB error " = socketpair(PF_UNIX, type, 0, int *" sv ");"
|
||||
.SH DESCRIPTION
|
||||
The
|
||||
.B PF_UNIX
|
||||
(also known as
|
||||
.BR PF_LOCAL )
|
||||
socket family is used to communicate between processes on the same machine
|
||||
efficiently.
|
||||
Traditionally, Unix sockets can be either unnamed,
|
||||
or bound to a file system pathname (marked as being of type socket).
|
||||
Linux also supports an abstract namespace which is independent of the
|
||||
file system.
|
||||
|
||||
Valid types are:
|
||||
.BR SOCK_STREAM ,
|
||||
for a stream-oriented socket and
|
||||
.BR SOCK_DGRAM ,
|
||||
for a datagram-oriented socket that preserves message boundaries
|
||||
(as on most Unix implementations, Unix domain datagram
|
||||
sockets are always reliable and don't reorder datagrams);
|
||||
and (since Linux 2.6.4)
|
||||
.BR SOCK_SEQPACKET ,
|
||||
for a connection-oriented socket that preserves message boundaries
|
||||
and delivers messages in the order that they were sent.
|
||||
|
||||
Unix sockets support passing file descriptors or process credentials
|
||||
to other processes using ancillary data.
|
||||
.SS Address Format
|
||||
A Unix domain socket address is represented in the following structure:
|
||||
.in +4n
|
||||
.nf
|
||||
|
||||
#define UNIX_PATH_MAX 108
|
||||
|
||||
struct sockaddr_un {
|
||||
sa_family_t sun_family; /* AF_UNIX */
|
||||
char sun_path[UNIX_PATH_MAX]; /* pathname */
|
||||
};
|
||||
.fi
|
||||
.in
|
||||
.PP
|
||||
.I sun_family
|
||||
always contains
|
||||
.BR AF_UNIX .
|
||||
|
||||
Three types of address are distinguished in this structure:
|
||||
.IP * 3
|
||||
.IR pathname :
|
||||
a Unix domain socket can be bound to a null-terminated file
|
||||
system pathname using
|
||||
.BR bind (2).
|
||||
When the address of the socket is returned by
|
||||
.BR getsockname (2),
|
||||
.BR getpeername (2),
|
||||
and
|
||||
.BR accept (2),
|
||||
its length is
|
||||
.IR "sizeof(sa_family_t) + strlen(sun_path) + 1" ,
|
||||
and
|
||||
.I sun_path
|
||||
contains the null-terminated pathname.
|
||||
.IP *
|
||||
.IR unnamed :
|
||||
A stream socket that has not been bound to a pathname using
|
||||
.BR bind (2)
|
||||
has no name.
|
||||
Likewise, the two sockets created by
|
||||
.BR socketpair (2)
|
||||
are unnamed.
|
||||
When the address of an unnamed socket is returned by
|
||||
.BR getsockname (2),
|
||||
.BR getpeername (2),
|
||||
and
|
||||
.BR accept (2),
|
||||
its length is
|
||||
.IR "sizeof(sa_family_t)" ,
|
||||
and
|
||||
.I sun_path
|
||||
should not be inspected.
|
||||
.\" There is quite some variation across implementations: FreeBSD
|
||||
.\" says the length is 16 bytes, HP-UX 11 says it's zero bytes.
|
||||
.IP *
|
||||
.IR abstract :
|
||||
an abstract socket address is distinguished by the fact that
|
||||
.IR sun_path[0]
|
||||
is a null byte ('\\0').
|
||||
All of the remaining bytes in
|
||||
.I sun_path
|
||||
define the "name" of the socket.
|
||||
(Null bytes in the name have no special significance.)
|
||||
The name has no connection with file system pathnames.
|
||||
The socket's address in this namespace is given by the rest of the
|
||||
bytes in
|
||||
.IR sun_path .
|
||||
When the address of an abstract socket is returned by
|
||||
.BR getsockname (2),
|
||||
.BR getpeername (2),
|
||||
and
|
||||
.BR accept (2),
|
||||
its length is
|
||||
.IR "sizeof(struct sockaddr_un)" ,
|
||||
and
|
||||
.I sun_path
|
||||
contains the abstract name.
|
||||
The abstract socket namespace is a non-portable Linux extension.
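The abstract address type can be constructed as in the following sketch.
The helper names are hypothetical.
In line with the description above, the whole structure is passed to bind(2), so every byte of sun_path after the leading null byte, including the zero padding, is treated as part of the name.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill *addr with an abstract-namespace address for name.
 * Returns 0 on success, -1 if name is empty or does not fit. */
int make_abstract_address(struct sockaddr_un *addr, const char *name)
{
    size_t n = strlen(name);

    if (n == 0 || n > sizeof(addr->sun_path) - 1)
        return -1;
    memset(addr, 0, sizeof(*addr));   /* sun_path[0] == '\0': abstract */
    addr->sun_family = AF_UNIX;
    memcpy(addr->sun_path + 1, name, n);
    return 0;
}

/* Bind fd to the abstract name, passing the full structure size.
 * Returns 0 on success, -1 with errno set on failure. */
int bind_abstract(int fd, const char *name)
{
    struct sockaddr_un addr;

    if (make_abstract_address(&addr, name) == -1)
        return -1;
    return bind(fd, (struct sockaddr *) &addr, sizeof(addr));
}
```

A peer must build the address in exactly the same way before calling connect(2); no file system object is created, so nothing needs to be unlinked afterward.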
|
||||
.SS Socket Options
|
||||
For historical reasons these socket options are specified with a
|
||||
.B SOL_SOCKET
|
||||
type even though they are
|
||||
.B PF_UNIX
|
||||
specific.
|
||||
They can be set with
|
||||
.BR setsockopt (2)
|
||||
and read with
|
||||
.BR getsockopt (2)
|
||||
by specifying
|
||||
.B SOL_SOCKET
|
||||
as the option level.
|
||||
.TP
|
||||
.B SO_PASSCRED
|
||||
Enables the receiving of the credentials of the sending process
|
||||
in an ancillary message.
|
||||
When this option is set and the socket is not yet connected
|
||||
a unique name in the abstract namespace will be generated automatically.
|
||||
Expects an integer boolean flag.
|
||||
.SS (Un)supported Features
|
||||
The following paragraphs describe domain-specific details and
|
||||
unsupported features of the sockets API for Unix domain sockets on Linux.
|
||||
|
||||
Unix domain sockets do not support the transmission of
|
||||
out-of-band data (the
|
||||
.B MSG_OOB
|
||||
flag for
|
||||
.BR send (2)
|
||||
and
|
||||
.BR recv (2)).
|
||||
|
||||
The
|
||||
.BR send (2)
|
||||
.B MSG_MORE
|
||||
flag is not supported by Unix domain sockets.
|
||||
|
||||
The
|
||||
.B SO_SNDBUF
|
||||
socket option does have an effect for Unix domain sockets, but the
|
||||
.B SO_RCVBUF
|
||||
option does not.
|
||||
For datagram sockets, the
|
||||
.B SO_SNDBUF
|
||||
value imposes an upper limit on the size of outgoing datagrams.
|
||||
This limit is calculated as the doubled (see
|
||||
.BR socket (7))
|
||||
option value less 32 bytes used for overhead.
|
||||
.SS Ancillary Messages
|
||||
Ancillary data is sent and received using
|
||||
.BR sendmsg (2)
|
||||
and
|
||||
.BR recvmsg (2).
|
||||
For historical reasons the ancillary message types listed below
|
||||
are specified with a
|
||||
.B SOL_SOCKET
|
||||
type even though they are
|
||||
.B PF_UNIX
|
||||
specific.
|
||||
To send them set the
|
||||
.I cmsg_level
|
||||
field of the struct
|
||||
.I cmsghdr
|
||||
to
|
||||
.B SOL_SOCKET
|
||||
and the
|
||||
.I cmsg_type
|
||||
field to the type.
|
||||
For more information see
|
||||
.BR cmsg (3).
|
||||
.TP
|
||||
.B SCM_RIGHTS
|
||||
Send or receive a set of open file descriptors from another process.
|
||||
The data portion contains an integer array of the file descriptors.
|
||||
The passed file descriptors behave as though they have been created with
|
||||
.BR dup (2).
|
||||
.TP
|
||||
.B SCM_CREDENTIALS
|
||||
Send or receive Unix credentials.
|
||||
This can be used for authentication.
|
||||
The credentials are passed as a
|
||||
.I struct ucred
|
||||
ancillary message.
|
||||
|
||||
.in +4n
|
||||
.nf
|
||||
struct ucred {
|
||||
pid_t pid; /* process ID of the sending process */
|
||||
uid_t uid; /* user ID of the sending process */
|
||||
gid_t gid; /* group ID of the sending process */
|
||||
};
|
||||
.fi
|
||||
.in
|
||||
|
||||
The credentials which the sender specifies are checked by the kernel.
|
||||
A process with effective user ID 0 is allowed to specify values that do
|
||||
not match its own.
|
||||
The sender must specify its own process ID (unless it has the capability
|
||||
.BR CAP_SYS_ADMIN ),
|
||||
its user ID, effective user ID, or saved set-user-ID (unless it has
|
||||
.BR CAP_SETUID ),
|
||||
and its group ID, effective group ID, or saved set-group-ID
|
||||
(unless it has
|
||||
.BR CAP_SETGID ).
|
||||
To receive a
|
||||
.I struct ucred
|
||||
message, the
|
||||
.B SO_PASSCRED
|
||||
option must be enabled on the socket.
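The credential-passing steps described above can be sketched as follows. This is an illustrative example under stated assumptions, not the page author's code: the function name roundtrip_credentials is hypothetical, both ends of the socket live in the same process, and _GNU_SOURCE is defined so that glibc exposes struct ucred.

```c
#define _GNU_SOURCE          /* assumption: glibc, exposes struct ucred */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one byte together with our own credentials
 * over a socket pair and read them back on the other end.
 * Returns 0 and fills *out on success, -1 on failure. */
int roundtrip_credentials(struct ucred *out)
{
    int sv[2], on = 1, rc = -1;
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                  /* union forces correct cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    struct ucred cred;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    /* The receiver must enable SO_PASSCRED to get the message. */
    if (setsockopt(sv[1], SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) == -1)
        goto done;

    /* An unprivileged sender must specify its own IDs. */
    cred.pid = getpid();
    cred.uid = getuid();
    cred.gid = getgid();

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_CREDENTIALS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(cred));
    memcpy(CMSG_DATA(cmsg), &cred, sizeof(cred));

    if (sendmsg(sv[0], &msg, 0) != 1)
        goto done;

    /* Reuse the same buffers for the receiving side. */
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);
    if (recvmsg(sv[1], &msg, 0) != 1)
        goto done;

    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SCM_CREDENTIALS) {
            memcpy(out, CMSG_DATA(cmsg), sizeof(*out));
            rc = 0;
        }
    }

done:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

In a real program the sender and receiver would normally be separate processes, and the receiver would compare the received IDs against whatever policy it enforces.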
|
||||
.SH ERRORS
|
||||
.TP
|
||||
.B EADDRINUSE
|
||||
Selected local address is already taken or file system socket
|
||||
object already exists.
|
||||
.TP
|
||||
.B ECONNREFUSED
|
||||
.BR connect (2)
|
||||
called with a socket object that isn't listening.
|
||||
This can happen when
|
||||
the remote socket does not exist or the filename is not a socket.
|
||||
.TP
|
||||
.B ECONNRESET
|
||||
Remote socket was unexpectedly closed.
|
||||
.TP
|
||||
.B EFAULT
|
||||
User memory address was not valid.
|
||||
.TP
|
||||
.B EINVAL
|
||||
Invalid argument passed.
|
||||
A common cause is failing to set AF_UNIX
|
||||
in the
|
||||
.I sun_family
|
||||
field of passed addresses or the socket being in an
|
||||
invalid state for the applied operation.
|
||||
.TP
|
||||
.B EISCONN
|
||||
.BR connect (2)
|
||||
called on an already connected socket or a target address was
|
||||
specified on a connected socket.
|
||||
.TP
|
||||
.B ENOMEM
|
||||
Out of memory.
|
||||
.TP
|
||||
.B ENOTCONN
|
||||
Socket operation needs a target address, but the socket is not connected.
|
||||
.TP
|
||||
.B EOPNOTSUPP
|
||||
Stream operation called on non-stream oriented socket or tried to
|
||||
use the out-of-band data option.
|
||||
.TP
|
||||
.B EPERM
|
||||
The sender passed invalid credentials in the
|
||||
.IR "struct ucred" .
|
||||
.TP
|
||||
.B EPIPE
|
||||
Remote socket was closed on a stream socket.
|
||||
If enabled, a
|
||||
.B SIGPIPE
|
||||
is sent as well.
|
||||
This can be avoided by passing the
|
||||
.B MSG_NOSIGNAL
|
||||
flag to
|
||||
.BR sendmsg (2)
|
||||
or
|
||||
.BR send (2).
|
||||
.TP
|
||||
.B EPROTONOSUPPORT
|
||||
Passed protocol is not PF_UNIX.
|
||||
.TP
|
||||
.B EPROTOTYPE
|
||||
Remote socket does not match the local socket type
|
||||
.RB ( SOCK_DGRAM
|
||||
vs.
|
||||
.BR SOCK_STREAM ).
|
||||
.TP
|
||||
.B ESOCKTNOSUPPORT
|
||||
Unknown socket type.
|
||||
.PP
|
||||
Other errors can be generated by the generic socket layer or
|
||||
by the file system while generating a file system socket object.
|
||||
See the appropriate manual pages for more information.
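The EPIPE entry in the list above can be illustrated with a short sketch. It assumes a connected stream socket pair in one process; the helper name errno_after_peer_close is hypothetical.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: write to a stream socket whose peer has been
 * closed. With MSG_NOSIGNAL the call is expected to fail with EPIPE
 * instead of raising SIGPIPE.
 * Returns the errno observed, 0 if send() unexpectedly succeeded,
 * or -1 if the socket pair could not be created. */
int errno_after_peer_close(void)
{
    int sv[2], err = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    close(sv[1]);                       /* the peer goes away */

    if (send(sv[0], "x", 1, MSG_NOSIGNAL) == -1)
        err = errno;

    close(sv[0]);
    return err;
}
```

Without MSG_NOSIGNAL the same write would also deliver SIGPIPE, which terminates the process unless the signal is ignored or handled.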
|
||||
.SH VERSIONS
|
||||
.B SCM_CREDENTIALS
|
||||
and the abstract namespace were introduced with Linux 2.2 and should not
|
||||
be used in portable programs.
|
||||
(Some BSD-derived systems also support credential passing,
|
||||
but the implementation details differ.)
|
||||
.SH NOTES
|
||||
In the Linux implementation, sockets which are visible in the
|
||||
file system honor the permissions of the directory they are in.
|
||||
Their owner, group, and permissions can be changed.
|
||||
Creation of a new socket will fail if the process does not have write and
|
||||
search (execute) permission on the directory the socket is created in.
|
||||
Connecting to the socket object requires read/write permission.
|
||||
This behavior differs from many BSD-derived systems which
|
||||
ignore permissions for Unix sockets.
|
||||
Portable programs should not rely on
|
||||
this feature for security.
|
||||
|
||||
Binding to a socket with a filename creates a socket
|
||||
in the file system that must be deleted by the caller when it is no
|
||||
longer needed (using
|
||||
.BR unlink (2)).
|
||||
The usual Unix close-behind semantics apply; the socket can be unlinked
|
||||
at any time and will be finally removed from the file system when the last
|
||||
reference to it is closed.
|
||||
|
||||
To pass file descriptors or credentials over a
|
||||
.B SOCK_STREAM
|
||||
socket, you need
|
||||
to send or receive at least one byte of non-ancillary data in the same
|
||||
.BR sendmsg (2)
|
||||
or
|
||||
.BR recvmsg (2)
|
||||
call.
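The requirement above, one byte of ordinary data accompanying the ancillary message, can be sketched for file descriptor passing as follows. The helper names send_fd and recv_fd are hypothetical, and the sketch passes exactly one descriptor per message.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send descriptor `fd` over `sock` together with
 * one byte of non-ancillary data. Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                  /* union forces correct cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg != NULL && cmsg->cmsg_level == SOL_SOCKET &&
        cmsg->cmsg_type == SCM_RIGHTS &&
        cmsg->cmsg_len == CMSG_LEN(sizeof(int)))
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));

    return fd;
}
```

The received descriptor is a new descriptor number in the receiver that refers to the same open file, in the same way a descriptor created with dup(2) would.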
|
||||
|
||||
Unix domain stream sockets do not support the notion of out-of-band data.
|
||||
.SH EXAMPLE
|
||||
See
|
||||
.BR bind (2).
|
||||
.SH "SEE ALSO"
|
||||
.BR recvmsg (2),
|
||||
.BR sendmsg (2),
|
||||
.BR socket (2),
|
||||
.BR socketpair (2),
|
||||
.BR cmsg (3),
|
||||
.BR capabilities (7),
|
||||
.BR credentials (7),
|
||||
.BR socket (7)
|
||||
|
|
130
man7/x25.7
|
@ -1,8 +1,122 @@
|
|||
.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual"
|
||||
.\" This man page is Copyright (C) 1998 Heiner Eisen.
|
||||
.\" Permission is granted to distribute possibly modified copies
|
||||
.\" of this page provided the header is included verbatim,
|
||||
.\" and in case of nontrivial modification author and date
|
||||
.\" of the modification is added to the header.
|
||||
.\" $Id: x25.7,v 1.4 1999/05/18 10:35:12 freitag Exp $
|
||||
.TH X25 7 1998-12-01 "Linux" "Linux Programmer's Manual"
|
||||
.SH NAME
|
||||
x25, PF_X25 \- ITU-T X.25 / ISO-8208 protocol interface
|
||||
.SH SYNOPSIS
|
||||
.B #include <sys/socket.h>
|
||||
.br
|
||||
.B #include <linux/x25.h>
|
||||
.sp
|
||||
.B x25_socket = socket(PF_X25, SOCK_SEQPACKET, 0);
|
||||
.SH DESCRIPTION
|
||||
X25 sockets provide an interface to the X.25 packet layer protocol.
|
||||
This allows applications to
|
||||
communicate over a public X.25 data network as standardized by
|
||||
International Telecommunication Union's recommendation X.25
|
||||
(X.25 DTE-DCE mode).
|
||||
X25 sockets can also be used for communication
|
||||
without an intermediate X.25 network (X.25 DTE-DTE mode) as described
|
||||
in ISO-8208.
|
||||
.PP
|
||||
Message boundaries are preserved \(em a
|
||||
.BR read (2)
|
||||
from a socket will
|
||||
retrieve the same chunk of data as output with the corresponding
|
||||
.BR write (2)
|
||||
to the peer socket.
|
||||
When necessary, the kernel takes care
|
||||
of segmenting and re-assembling long messages by means of
|
||||
the X.25 M-bit.
|
||||
There is no hard-coded upper limit for the
|
||||
message size.
|
||||
However, re-assembling of a long message might fail if
|
||||
there is a temporary lack of system resources or when other constraints
|
||||
(such as socket memory or buffer size limits) become effective.
|
||||
If that
|
||||
occurs, the X.25 connection will be reset.
|
||||
.SS Socket Addresses
|
||||
The
|
||||
.B AF_X25
|
||||
socket address family uses the
|
||||
.I struct sockaddr_x25
|
||||
for representing network addresses as defined in ITU-T
|
||||
recommendation X.121.
|
||||
.PP
|
||||
.in +4n
|
||||
.nf
|
||||
struct sockaddr_x25 {
|
||||
sa_family_t sx25_family; /* must be AF_X25 */
|
||||
x25_address sx25_addr; /* X.121 Address */
|
||||
};
|
||||
.fi
|
||||
.in
|
||||
.PP
|
||||
.I sx25_addr
|
||||
contains a char array
|
||||
.I x25_addr[]
|
||||
to be interpreted as a null-terminated string.
|
||||
.I sx25_addr.x25_addr[]
|
||||
consists of up to 15 (not counting the terminating 0) ASCII
|
||||
characters forming the X.121 address.
|
||||
Only the decimal digit characters from \(aq0\(aq to \(aq9\(aq are allowed.
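The address rules above can be checked before a struct sockaddr_x25 is handed to the kernel. The following is a sketch only: the helper name x25_fill_address is hypothetical, and rejecting an empty string is an assumption of this example rather than something the text above states.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/x25.h>

/* Hypothetical helper: validate an X.121 address string and store it in
 * *sa. Accepts 1 to 15 decimal digits (rejecting the empty string is an
 * assumption of this sketch). Returns 0 on success, -1 on bad input. */
int x25_fill_address(struct sockaddr_x25 *sa, const char *digits)
{
    size_t len = strlen(digits);
    size_t i;

    if (len == 0 || len > 15)
        return -1;

    for (i = 0; i < len; i++)
        if (digits[i] < '0' || digits[i] > '9')
            return -1;

    memset(sa, 0, sizeof(*sa));
    sa->sx25_family = AF_X25;
    /* Copy the digits plus the terminating null byte. */
    memcpy(sa->sx25_addr.x25_addr, digits, len + 1);
    return 0;
}
```

The filled structure could then be passed to bind(2) or connect(2) on an X.25 socket, provided the kernel has X.25 support available.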
|
||||
.SS Socket Options
|
||||
The following X.25-specific socket options can be set by using
|
||||
.BR setsockopt (2)
|
||||
and read with
|
||||
.BR getsockopt (2)
|
||||
with the
|
||||
.I level
|
||||
argument set to
|
||||
.BR SOL_X25 .
|
||||
.TP
|
||||
.B X25_QBITINCL
|
||||
Controls whether the X.25 Q-bit (Qualified Data Bit) is accessible by the
|
||||
user.
|
||||
It expects an integer argument.
|
||||
If set to 0 (default),
|
||||
the Q-bit is never set for outgoing packets and the Q-bit of incoming
|
||||
packets is ignored.
|
||||
If set to 1, an additional first byte is prepended
|
||||
to each message read from or written to the socket.
|
||||
For data read from
|
||||
the socket, a 0 first byte indicates that the Q-bits of the corresponding
|
||||
incoming data packets were not set.
|
||||
A first byte with value 1 indicates
|
||||
that the Q-bit of the corresponding incoming data packets was set.
|
||||
If the first byte of the data written to the socket is 1, the Q-bit of the
|
||||
corresponding outgoing data packets will be set.
|
||||
If the first byte is 0,
|
||||
the Q-bit will not be set.
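The extra first byte described above can be handled with two small buffer helpers. This is a sketch under stated assumptions: the function names x25_frame_with_qbit and x25_split_qbit are hypothetical, and the code only builds and parses buffers; it does not open an X.25 socket.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build a message for a socket with X25_QBITINCL
 * set to 1. The first byte carries the Q-bit flag (1 or 0), followed by
 * the payload. Returns the total length, or -1 if `out` is too small. */
long x25_frame_with_qbit(unsigned char *out, size_t cap, int qbit,
                         const void *payload, size_t len)
{
    if (cap == 0 || len > cap - 1)
        return -1;

    out[0] = qbit ? 1 : 0;
    memcpy(out + 1, payload, len);
    return (long) (len + 1);
}

/* Hypothetical helper: split a message read from such a socket into
 * its Q-bit flag and payload. Returns the payload length, or -1 if the
 * message is empty and therefore lacks the flag byte. */
long x25_split_qbit(const unsigned char *in, size_t n, int *qbit,
                    const unsigned char **payload)
{
    if (n == 0)
        return -1;

    *qbit = (in[0] == 1);
    *payload = in + 1;
    return (long) (n - 1);
}
```

Before such buffers are written or read, the option itself would be enabled with setsockopt(2) at level SOL_X25, passing an integer 1 for X25_QBITINCL.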
|
||||
.SH VERSIONS
|
||||
The PF_X25 protocol family is a new feature of Linux 2.2.
|
||||
.SH BUGS
|
||||
Plenty, as the X.25 PLP implementation is
|
||||
.BR CONFIG_EXPERIMENTAL .
|
||||
.PP
|
||||
This man page is incomplete.
|
||||
.PP
|
||||
There is no dedicated application programmer's header file yet;
|
||||
you need to include the kernel header file
|
||||
.IR <linux/x25.h> .
|
||||
.B CONFIG_EXPERIMENTAL
|
||||
might also imply that future versions of the
|
||||
interface are not binary compatible.
|
||||
.PP
|
||||
X.25 N-Reset events are not propagated to the user process yet.
|
||||
Thus,
|
||||
if a reset occurred, data might be lost without notice.
|
||||
.SH "SEE ALSO"
|
||||
.BR socket (2),
|
||||
.BR socket (7)
|
||||
.PP
|
||||
Jonathan Simon Naylor:
|
||||
\(lqThe Re-Analysis and Re-Implementation of X.25.\(rq
|
||||
The URL is
|
||||
.RS
|
||||
.I ftp://ftp.pspt.fi/pub/ham/linux/ax25/x25doc.tgz
|
||||
.RE
|
||||
|
|