Reverting blunder in commit 4699

Michael Kerrisk 2008-08-08 16:41:48 +00:00
parent 10874173db
commit 77117f4fc5
17 changed files with 6614 additions and 127 deletions

Changes

@@ -38,6 +38,29 @@ initrd.4
mtk
Fix mis-ordered (.SH) sections.
connect.2
socket.2
rtnetlink.3
arp.7
ddp.7
ip.7
ipv6.7
netlink.7
packet.7
raw.7
rtnetlink.7
socket.7
tcp.7
udp.7
unix.7
x25.7
mtk
s/PF_/AF_/ for socket family constants. Reasons: the AF_ and
PF_ constants have always had the same values; there never has
been a protocol family that had more than one address family,
and POSIX.1-2001 only specifies the AF_* constants.
Typographical or grammatical errors have been corrected in several
other places.

@@ -1,8 +1,268 @@
.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual"
.\" Hey Emacs! This file is -*- nroff -*- source.
.\"
.\" Copyright 1993 Rickard E. Faith (faith@cs.unc.edu)
.\" Portions extracted from /usr/include/sys/socket.h, which does not have
.\" any authorship information in it. It is probably available under the GPL.
.\"
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\"
.\"
.\" Other portions are from the 6.9 (Berkeley) 3/10/91 man page:
.\"
.\" Copyright (c) 1983 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\" must display the following acknowledgement:
.\" This product includes software developed by the University of
.\" California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" Modified 1997-01-31 by Eric S. Raymond <esr@thyrsus.com>
.\" Modified 1998, 1999 by Andi Kleen
.\" Modified 2004-06-23 by Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.TH CONNECT 2 2007-12-28 "Linux" "Linux Programmer's Manual"
.SH NAME
connect \- initiate a connection on a socket
.SH SYNOPSIS
.nf
.BR "#include <sys/types.h>" " /* See NOTES */"
.br
.B #include <sys/socket.h>
.sp
.BI "int connect(int " sockfd ", const struct sockaddr *" serv_addr ,
.BI " socklen_t " addrlen );
.fi
.SH DESCRIPTION
The
.BR connect ()
system call connects the socket referred to by the file descriptor
.I sockfd
to the address specified by
.IR serv_addr .
The
.I addrlen
argument specifies the size of
.IR serv_addr .
The format of the address in
.I serv_addr
is determined by the address space of the socket
.IR sockfd ;
see
.BR socket (2)
for further details.
If the socket
.I sockfd
is of type
.B SOCK_DGRAM
then
.I serv_addr
is the address to which datagrams are sent by default, and the only
address from which datagrams are received.
If the socket is of type
.B SOCK_STREAM
or
.BR SOCK_SEQPACKET ,
this call attempts to make a connection to the socket that is bound
to the address specified by
.IR serv_addr .
.PP
Generally, connection-based protocol sockets may successfully
.BR connect ()
only once; connectionless protocol sockets may use
.BR connect ()
multiple times to change their association.
Connectionless sockets may
dissolve the association by connecting to an address with the
.I sa_family
member of
.I sockaddr
set to
.BR AF_UNSPEC
(supported on Linux since kernel 2.2).
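.PP
The association and its dissolution described above can be sketched as follows.
This is a minimal illustration, not part of the original page: the helper name
connect_then_dissolve and the choice of the IPv4 loopback address are assumptions.
.nf
```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: associate a SOCK_DGRAM socket with
   127.0.0.1:<port>, then dissolve the association again.
   Returns 0 on success, -1 on error (errno set by connect()). */
static int
connect_then_dissolve(int sockfd, unsigned short port)
{
    struct sockaddr_in peer;
    struct sockaddr unspec;

    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    peer.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    /* For a datagram socket this only records the default peer;
       no packets are exchanged. */
    if (connect(sockfd, (struct sockaddr *) &peer, sizeof(peer)) == -1)
        return -1;

    /* Dissolve the association by connecting to an address whose
       sa_family is AF_UNSPEC. */
    memset(&unspec, 0, sizeof(unspec));
    unspec.sa_family = AF_UNSPEC;
    return connect(sockfd, &unspec, sizeof(unspec));
}
```
.fi
After the second call the socket again has no default peer, so datagrams
must be addressed explicitly, for example with sendto(2).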
.SH "RETURN VALUE"
If the connection or binding succeeds, zero is returned.
On error, \-1 is returned, and
.I errno
is set appropriately.
.SH ERRORS
The following are general socket errors only.
There may be other domain-specific error codes.
.TP
.B EACCES
For Unix domain sockets, which are identified by pathname:
Write permission is denied on the socket file,
or search permission is denied for one of the directories
in the path prefix.
(See also
.BR path_resolution (7).)
.TP
.BR EACCES ", " EPERM
The user tried to connect to a broadcast address without having the socket
broadcast flag enabled or the connection request failed because of a local
firewall rule.
.TP
.B EADDRINUSE
Local address is already in use.
.TP
.B EAFNOSUPPORT
The passed address didn't have the correct address family in its
.I sa_family
field.
.TP
.B EAGAIN
No more free local ports or insufficient entries in the routing cache.
For
.B PF_INET
see the
.I net.ipv4.ip_local_port_range
sysctl in
.BR ip (7)
on how to increase the number of local ports.
.TP
.B EALREADY
The socket is non-blocking and a previous connection attempt has not yet
been completed.
.TP
.B EBADF
The file descriptor is not a valid index in the descriptor table.
.TP
.B ECONNREFUSED
No-one listening on the remote address.
.TP
.B EFAULT
The socket structure address is outside the user's address space.
.TP
.B EINPROGRESS
The socket is non-blocking and the connection cannot be completed
immediately.
It is possible to
.BR select (2)
or
.BR poll (2)
for completion by selecting the socket for writing.
After
.BR select (2)
indicates writability, use
.BR getsockopt (2)
to read the
.B SO_ERROR
option at level
.B SOL_SOCKET
to determine whether
.BR connect ()
completed successfully
.RB ( SO_ERROR
is zero) or unsuccessfully
.RB ( SO_ERROR
is one of the usual error codes listed here,
explaining the reason for the failure).
.TP
.B EINTR
The system call was interrupted by a signal that was caught; see
.BR signal (7).
.\" For TCP, the connection will complete asynchronously.
.\" See http://lkml.org/lkml/2005/7/12/254
.TP
.B EISCONN
The socket is already connected.
.TP
.B ENETUNREACH
Network is unreachable.
.TP
.B ENOTSOCK
The file descriptor is not associated with a socket.
.TP
.B ETIMEDOUT
Timeout while attempting connection.
The server may be too
busy to accept new connections.
Note that for IP sockets the timeout may
be very long when syncookies are enabled on the server.
.SH "CONFORMING TO"
SVr4, 4.4BSD, (the
.BR connect ()
function first appeared in 4.2BSD), POSIX.1-2001.
.\" SVr4 documents the additional
.\" general error codes
.\" .BR EADDRNOTAVAIL ,
.\" .BR EINVAL ,
.\" .BR EAFNOSUPPORT ,
.\" .BR EALREADY ,
.\" .BR EINTR ,
.\" .BR EPROTOTYPE ,
.\" and
.\" .BR ENOSR .
.\" It also
.\" documents many additional error conditions not described here.
.SH NOTES
POSIX.1-2001 does not require the inclusion of
.IR <sys/types.h> ,
and this header file is not required on Linux.
However, some historical (BSD) implementations required this header
file, and portable applications are probably wise to include it.
The third argument of
.BR connect ()
is in reality an
.I int
(and this is what 4.x BSD and libc4 and libc5 have).
Some POSIX confusion resulted in the present
.IR socklen_t ,
also used by glibc.
See also
.BR accept (2).
.SH EXAMPLE
An example of the use of
.BR connect ()
is shown in
.BR getaddrinfo (3).
.SH "SEE ALSO"
.BR accept (2),
.BR bind (2),
.BR getsockname (2),
.BR listen (2),
.BR socket (2),
.BR path_resolution (7)

@@ -1,8 +1,382 @@
.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual"
'\" t
.\" Copyright (c) 1983, 1991 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\" must display the following acknowledgement:
.\" This product includes software developed by the University of
.\" California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $Id: socket.2,v 1.4 1999/05/13 11:33:42 freitag Exp $
.\"
.\" Modified 1993-07-24 by Rik Faith <faith@cs.unc.edu>
.\" Modified 1996-10-22 by Eric S. Raymond <esr@thyrsus.com>
.\" Modified 1998, 1999 by Andi Kleen <ak@muc.de>
.\" Modified 2002-07-17 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" Modified 2004-06-17 by Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.TH SOCKET 2 2004-06-17 "Linux" "Linux Programmer's Manual"
.SH NAME
socket \- create an endpoint for communication
.SH SYNOPSIS
.BR "#include <sys/types.h>" " /* See NOTES */"
.br
.B #include <sys/socket.h>
.sp
.BI "int socket(int " domain ", int " type ", int " protocol );
.SH DESCRIPTION
.BR socket ()
creates an endpoint for communication and returns a descriptor.
.PP
The
.I domain
argument specifies a communication domain; this selects the protocol
family which will be used for communication.
These families are defined in
.IR <sys/socket.h> .
The currently understood formats include:
.TS
tab(:);
l l l.
Name:Purpose:Man page
T{
.BR PF_UNIX ", " PF_LOCAL
T}:T{
Local communication
T}:T{
.BR unix (7)
T}
T{
.B PF_INET
T}:IPv4 Internet protocols:T{
.BR ip (7)
T}
T{
.B PF_INET6
T}:IPv6 Internet protocols:T{
.BR ipv6 (7)
T}
T{
.B PF_IPX
T}:IPX \- Novell protocols:
T{
.B PF_NETLINK
T}:T{
Kernel user interface device
T}:T{
.BR netlink (7)
T}
T{
.B PF_X25
T}:ITU-T X.25 / ISO-8208 protocol:T{
.BR x25 (7)
T}
T{
.B PF_AX25
T}:T{
Amateur radio AX.25 protocol
T}:
T{
.B PF_ATMPVC
T}:Access to raw ATM PVCs:
T{
.B PF_APPLETALK
T}:Appletalk:T{
.BR ddp (7)
T}
T{
.B PF_PACKET
T}:T{
Low level packet interface
T}:T{
.BR packet (7)
T}
.TE
.PP
The socket has the indicated
.IR type ,
which specifies the communication semantics.
Currently defined types
are:
.TP
.B SOCK_STREAM
Provides sequenced, reliable, two-way, connection-based byte streams.
An out-of-band data transmission mechanism may be supported.
.TP
.B SOCK_DGRAM
Supports datagrams (connectionless, unreliable messages of a fixed
maximum length).
.TP
.B SOCK_SEQPACKET
Provides a sequenced, reliable, two-way connection-based data
transmission path for datagrams of fixed maximum length; a consumer is
required to read an entire packet with each input system call.
.TP
.B SOCK_RAW
Provides raw network protocol access.
.TP
.B SOCK_RDM
Provides a reliable datagram layer that does not guarantee ordering.
.TP
.B SOCK_PACKET
Obsolete and should not be used in new programs;
see
.BR packet (7).
.PP
Some socket types may not be implemented by all protocol families;
for example,
.B SOCK_SEQPACKET
is not implemented for
.BR AF_INET .
.PP
The
.I protocol
specifies a particular protocol to be used with the socket.
Normally only a single protocol exists to support a particular
socket type within a given protocol family, in which case
.I protocol
can be specified as 0.
However, it is possible that many protocols may exist, in
which case a particular protocol must be specified in this manner.
The protocol number to use is specific to the \*(lqcommunication domain\*(rq
in which communication is to take place; see
.BR protocols (5).
See
.BR getprotoent (3)
on how to map protocol name strings to protocol numbers.
.PP
Sockets of type
.B SOCK_STREAM
are full-duplex byte streams, similar to pipes.
They do not preserve
record boundaries.
A stream socket must be in
a
.I connected
state before any data may be sent or received on it.
A connection to
another socket is created with a
.BR connect (2)
call.
Once connected, data may be transferred using
.BR read (2)
and
.BR write (2)
calls or some variant of the
.BR send (2)
and
.BR recv (2)
calls.
When a session has been completed a
.BR close (2)
may be performed.
Out-of-band data may also be transmitted as described in
.BR send (2)
and received as described in
.BR recv (2).
.PP
The communications protocols which implement a
.B SOCK_STREAM
ensure that data is not lost or duplicated.
If a piece of data for which
the peer protocol has buffer space cannot be successfully transmitted
within a reasonable length of time, then the connection is considered
to be dead.
When
.B SO_KEEPALIVE
is enabled on the socket the protocol checks in a protocol-specific
manner if the other end is still alive.
A
.B SIGPIPE
signal is raised if a process sends or receives
on a broken stream; this causes naive processes,
which do not handle the signal, to exit.
.B SOCK_SEQPACKET
sockets employ the same system calls as
.B SOCK_STREAM
sockets.
The only difference is that
.BR read (2)
calls will return only the amount of data requested,
and any data remaining in the arriving packet will be discarded.
Also all message boundaries in incoming datagrams are preserved.
.PP
.B SOCK_DGRAM
and
.B SOCK_RAW
sockets allow sending of datagrams to correspondents named in
.BR sendto (2)
calls.
Datagrams are generally received with
.BR recvfrom (2),
which returns the next datagram along with the address of its sender.
.PP
.B SOCK_PACKET
is an obsolete socket type to receive raw packets directly from the
device driver.
Use
.BR packet (7)
instead.
.PP
An
.BR fcntl (2)
.B F_SETOWN
operation can be used to specify a process or process group to receive a
.B SIGURG
signal when the out-of-band data arrives or
.B SIGPIPE
signal when a
.B SOCK_STREAM
connection breaks unexpectedly.
This operation may also be used to set the process or process group
that receives the I/O and asynchronous notification of I/O events via
.BR SIGIO .
Using
.B F_SETOWN
is equivalent to an
.BR ioctl (2)
call with the
.B FIOSETOWN
or
.B SIOCSPGRP
argument.
.PP
When the network signals an error condition to the protocol module (e.g.,
using an ICMP message for IP) the pending error flag is set for the socket.
The next operation on this socket will return the error code of the pending
error.
For some protocols it is possible to enable a per-socket error queue
to retrieve detailed information about the error; see
.B IP_RECVERR
in
.BR ip (7).
.PP
The operation of sockets is controlled by socket level
.IR options .
These options are defined in
.IR <sys/socket.h> .
The functions
.BR setsockopt (2)
and
.BR getsockopt (2)
are used to set and get options, respectively.
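.PP
As an illustration of setting and reading back a socket-level option, the
following sketch enables SO_KEEPALIVE (mentioned earlier in this page).
The helper name enable_keepalive is hypothetical.
.nf
```c
#include <sys/socket.h>

/* Hypothetical helper: turn on SO_KEEPALIVE and read the option back.
   Returns 1 if the option is now enabled, 0 if not, -1 on error. */
static int
enable_keepalive(int sockfd)
{
    int on = 1, val = 0;
    socklen_t len = sizeof(val);

    if (setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE,
                   &on, sizeof(on)) == -1)
        return -1;
    if (getsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &val, &len) == -1)
        return -1;
    return val != 0;
}
```
.fi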
.SH "RETURN VALUE"
On success, a file descriptor for the new socket is returned.
On error, \-1 is returned, and
.I errno
is set appropriately.
.SH ERRORS
.TP
.B EACCES
Permission to create a socket of the specified type and/or protocol
is denied.
.TP
.B EAFNOSUPPORT
The implementation does not support the specified address family.
.TP
.B EINVAL
Unknown protocol, or protocol family not available.
.TP
.B EMFILE
Process file table overflow.
.TP
.B ENFILE
The system limit on the total number of open files has been reached.
.TP
.BR ENOBUFS " or " ENOMEM
Insufficient memory is available.
The socket cannot be
created until sufficient resources are freed.
.TP
.B EPROTONOSUPPORT
The protocol type or the specified protocol is not
supported within this domain.
.PP
Other errors may be generated by the underlying protocol modules.
.SH "CONFORMING TO"
4.4BSD, POSIX.1-2001.
.BR socket ()
appeared in 4.2BSD.
It is generally portable to/from
non-BSD systems supporting clones of the BSD socket layer (including
System V variants).
.SH NOTES
POSIX.1-2001 does not require the inclusion of
.IR <sys/types.h> ,
and this header file is not required on Linux.
However, some historical (BSD) implementations required this header
file, and portable applications are probably wise to include it.
The manifest constants used under 4.x BSD for protocol families
are
.BR PF_UNIX ,
.BR PF_INET ,
etc., while
.B AF_UNIX
etc. are used for address
families.
However, the BSD man page already promises: "The protocol
family generally is the same as the address family", and subsequent
standards use AF_* everywhere.
.SH BUGS
.B SOCK_UUCP
is not implemented yet.
.SH EXAMPLE
An example of the use of
.BR socket ()
is shown in
.BR getaddrinfo (3).
.SH "SEE ALSO"
.BR accept (2),
.BR bind (2),
.BR connect (2),
.BR fcntl (2),
.BR getpeername (2),
.BR getsockname (2),
.BR getsockopt (2),
.BR ioctl (2),
.BR listen (2),
.BR read (2),
.BR recv (2),
.BR select (2),
.BR send (2),
.BR shutdown (2),
.BR socketpair (2),
.BR write (2),
.BR getprotoent (3),
.BR ip (7),
.BR socket (7),
.BR tcp (7),
.BR udp (7),
.BR unix (7)
.PP
\(lqAn Introductory 4.3BSD Interprocess Communication Tutorial\(rq
is reprinted in
.I UNIX Programmer's Supplementary Documents Volume 1.
.PP
\(lqBSD Interprocess Communication Tutorial\(rq
is reprinted in
.I UNIX Programmer's Supplementary Documents Volume 1.

@@ -1,8 +1,118 @@
.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual"
.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
.\" Permission is granted to distribute possibly modified copies
.\" of this page provided the header is included verbatim,
.\" and in case of nontrivial modification author and date
.\" of the modification is added to the header.
.\" $Id: rtnetlink.3,v 1.2 1999/05/18 10:35:10 freitag Exp $
.TH RTNETLINK 3 1999-05-14 "GNU" "Linux Programmer's Manual"
.SH NAME
rtnetlink \- macros to manipulate rtnetlink messages
.SH SYNOPSIS
.B #include <asm/types.h>
.br
.B #include <linux/netlink.h>
.br
.B #include <linux/rtnetlink.h>
.br
.B #include <sys/socket.h>
.BI "rtnetlink_socket = socket(PF_NETLINK, int " socket_type \
", NETLINK_ROUTE);"
.sp
.BI "int RTA_OK(struct rtattr *" rta ", int " rtabuflen );
.sp
.BI "void *RTA_DATA(struct rtattr *" rta );
.sp
.BI "unsigned int RTA_PAYLOAD(struct rtattr *" rta );
.sp
.BI "struct rtattr *RTA_NEXT(struct rtattr *" rta \
", unsigned int " rtabuflen );
.sp
.BI "unsigned int RTA_LENGTH(unsigned int " length );
.sp
.BI "unsigned int RTA_SPACE(unsigned int " length );
.SH DESCRIPTION
All
.BR rtnetlink (7)
messages consist of a
.BR netlink (7)
message header and appended attributes.
The attributes should be manipulated only
using the macros provided here.
.PP
.BI RTA_OK( rta ", " attrlen )
returns true if
.I rta
points to a valid routing attribute;
.I attrlen
is the running length of the attribute buffer.
If the result is false, you must assume that there are no more attributes
in the message, even if
.I attrlen
is non-zero.
.PP
.BI RTA_DATA( rta )
returns a pointer to the start of this attribute's data.
.PP
.BI RTA_PAYLOAD( rta )
returns the length of this attribute's data.
.PP
.BI RTA_NEXT( rta ", " attrlen )
gets the next attribute after
.IR rta .
Calling this macro will update
.IR attrlen .
You should use
.B RTA_OK
to check the validity of the returned pointer.
.PP
.BI RTA_LENGTH( len )
returns the length which is required for
.I len
bytes of data plus the header.
.PP
.BI RTA_SPACE( len )
returns the amount of space which will be needed in a message with
.I len
bytes of data.
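.PP
The macros above can be exercised entirely in user space.
The following sketch builds an attribute buffer and then walks it; the helper
names put_attr and count_attrs are hypothetical, and the buffer passed in is
assumed to be suitably aligned for a struct rtattr.
.nf
```c
#include <string.h>
#include <sys/socket.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: append one attribute at offset *used of buf.
   Returns 0 on success, -1 if the attribute does not fit. */
static int
put_attr(char *buf, size_t bufsize, size_t *used,
         unsigned short type, const void *data, unsigned short len)
{
    struct rtattr *rta;

    if (*used + RTA_SPACE(len) > bufsize)
        return -1;
    rta = (struct rtattr *) (buf + *used);
    rta->rta_type = type;
    rta->rta_len = RTA_LENGTH(len);     /* header + payload, unpadded */
    memcpy(RTA_DATA(rta), data, len);
    *used += RTA_SPACE(len);            /* header + payload + padding */
    return 0;
}

/* Hypothetical helper: count the attributes in the first 'used' bytes
   of buf by walking them with RTA_OK() and RTA_NEXT(). */
static int
count_attrs(char *buf, size_t used)
{
    struct rtattr *rta = (struct rtattr *) buf;
    int attrlen = (int) used;
    int n = 0;

    while (RTA_OK(rta, attrlen)) {
        n++;
        rta = RTA_NEXT(rta, attrlen);   /* also decreases attrlen */
    }
    return n;
}
```
.fi
Note the division of labor: RTA_LENGTH gives the value to store in rta_len,
while RTA_SPACE gives the amount by which to advance in the buffer.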
.SH CONFORMING TO
These macros are non-standard Linux extensions.
.SH BUGS
This manual page is incomplete.
.SH EXAMPLE
.\" FIXME ? would be better to use libnetlink in the EXAMPLE code here
Creating a rtnetlink message to set the MTU of a device:
.nf
struct {
    struct nlmsghdr  nh;
    struct ifinfomsg ifi;       /* "if" is a C keyword */
    char             attrbuf[512];
} req;
struct rtattr *rta;
unsigned int mtu = 1000;
int rtnetlink_sk = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE);
memset(&req, 0, sizeof(req));
req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
req.nh.nlmsg_flags = NLM_F_REQUEST;
req.nh.nlmsg_type = RTM_NEWLINK;
req.ifi.ifi_family = AF_UNSPEC;
req.ifi.ifi_index = INTERFACE_INDEX;
req.ifi.ifi_change = 0xffffffff; /* ???*/
rta = (struct rtattr *)(((char *) &req) +
                        NLMSG_ALIGN(req.nh.nlmsg_len));
rta\->rta_type = IFLA_MTU;
rta\->rta_len = RTA_LENGTH(sizeof(mtu));
req.nh.nlmsg_len = NLMSG_ALIGN(req.nh.nlmsg_len) +
                   RTA_LENGTH(sizeof(mtu));
memcpy(RTA_DATA(rta), &mtu, sizeof(mtu));
send(rtnetlink_sk, &req, req.nh.nlmsg_len, 0);
.fi
.SH "SEE ALSO"
.BR netlink (3),
.BR netlink (7),
.BR rtnetlink (7)

@@ -1,8 +1,275 @@
.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
'\" t
.\" This man page is Copyright (C) 1999 Matthew Wilcox <willy@bofh.ai>.
.\" Permission is granted to distribute possibly modified copies
.\" of this page provided the header is included verbatim,
.\" and in case of nontrivial modification author and date
.\" of the modification is added to the header.
.\" Modified June 1999 Andi Kleen
.\" $Id: arp.7,v 1.10 2000/04/27 19:31:38 ak Exp $
.TH ARP 7 2007-07-27 "Linux" "Linux Programmer's Manual"
.SH NAME
arp \- Linux ARP kernel module.
.SH DESCRIPTION
This kernel protocol module implements the Address Resolution
Protocol defined in RFC\ 826.
It is used to convert between Layer2 hardware addresses
and IPv4 protocol addresses on directly connected networks.
The user normally doesn't interact directly with this module except to
configure it;
instead it provides a service for other protocols in the kernel.
A user process can receive ARP packets by using
.BR packet (7)
sockets.
There is also a mechanism for managing the ARP cache
in user-space by using
.BR netlink (7)
sockets.
The ARP table can also be controlled via
.BR ioctl (2)
on any
.B PF_INET
socket.
The ARP module maintains a cache of mappings between hardware addresses
and protocol addresses.
The cache has a limited size so old and less
frequently used entries are garbage-collected.
Entries which are marked
as permanent are never deleted by the garbage-collector.
The cache can
be directly manipulated by the use of ioctls and its behavior can be
tuned by the sysctls defined below.
When there is no positive feedback for an existing mapping after some
time (see the sysctls below) a neighbor cache entry is considered stale.
Positive feedback can be gotten from a higher layer; for example from
a successful TCP ACK.
Other protocols can signal forward progress
using the
.B MSG_CONFIRM
flag to
.BR sendmsg (2).
When there is no forward progress ARP tries to reprobe.
It first tries to ask a local arp daemon
.B app_solicit
times for an updated MAC address.
If that fails and an old MAC address is known, a unicast probe is sent
.B ucast_solicit
times.
If that fails too it will broadcast a new ARP
request to the network.
Requests are only sent when there is data queued
for sending.
Linux will automatically add a non-permanent proxy arp entry when it
receives a request for an address it forwards to and proxy arp is
enabled on the receiving interface.
When there is a reject route for the target no proxy arp entry is added.
.SS Ioctls
Three ioctls are available on all
.B PF_INET
sockets.
They take a pointer to a
.I struct arpreq
as their argument.
.in +4n
.nf
struct arpreq {
struct sockaddr arp_pa; /* protocol address */
struct sockaddr arp_ha; /* hardware address */
int arp_flags; /* flags */
struct sockaddr arp_netmask; /* netmask of protocol address */
char arp_dev[16];
};
.fi
.in
.BR SIOCSARP ", " SIOCDARP " and " SIOCGARP
respectively set, delete and get an ARP mapping.
Setting and deleting ARP maps are privileged operations and may
only be performed by a process with the
.B CAP_NET_ADMIN
capability or an effective UID of 0.
.I arp_pa
must be an
.B AF_INET
address and
.I arp_ha
must have the same type as the device which is specified in
.IR arp_dev .
.I arp_dev
is a zero-terminated string which names a device.
.RS
.TS
tab(:) allbox;
c s
l l.
\fIarp_flags\fR
flag:meaning
ATF_COM:Lookup complete
ATF_PERM:Permanent entry
ATF_PUBL:Publish entry
ATF_USETRAILERS:Trailers requested
ATF_NETMASK:Use a netmask
ATF_DONTPUB:Don't answer
.TE
.RE
.PP
If the
.B ATF_NETMASK
flag is set, then
.I arp_netmask
should be valid.
Linux 2.2 does not support proxy network ARP entries, so this
should be set to 0xffffffff, or 0 to remove an existing proxy arp entry.
.B ATF_USETRAILERS
is obsolete and should not be used.
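.PP
A lookup with SIOCGARP might be sketched as follows.
The helper name arp_lookup is hypothetical, and the code assumes a 6-byte
(Ethernet-style) hardware address, which this page does not guarantee for
every device type.
.nf
```c
#include <arpa/inet.h>
#include <net/if_arp.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Hypothetical helper: look up the hardware address cached for the
   IPv4 address ip (dotted-quad string) on device dev, using any
   PF_INET socket. Returns 0 and fills mac[] on success, -1 otherwise. */
static int
arp_lookup(int sockfd, const char *ip, const char *dev,
           unsigned char mac[6])
{
    struct arpreq req;
    struct sockaddr_in *sin;

    memset(&req, 0, sizeof(req));
    sin = (struct sockaddr_in *) &req.arp_pa;
    sin->sin_family = AF_INET;
    if (inet_pton(AF_INET, ip, &sin->sin_addr) != 1)
        return -1;                  /* not a valid IPv4 address */
    /* arp_dev must stay zero-terminated; memset() above ensures it. */
    strncpy(req.arp_dev, dev, sizeof(req.arp_dev) - 1);

    if (ioctl(sockfd, SIOCGARP, &req) == -1)
        return -1;                  /* no entry, no such device, ... */
    if (!(req.arp_flags & ATF_COM))
        return -1;                  /* entry present but incomplete */
    memcpy(mac, req.arp_ha.sa_data, 6);   /* assumes 6-byte address */
    return 0;
}
```
.fi
Reading an entry needs no special privilege; only the set and delete
operations are restricted as described above.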
.SS Sysctls
ARP supports a sysctl interface to configure parameters on a global
or per-interface basis.
The sysctls can be accessed by reading or writing the
.I /proc/sys/net/ipv4/neigh/*/*
files or with the
.BR sysctl (2)
interface.
Each interface in the system has its own directory in
/proc/sys/net/ipv4/neigh/.
The setting in the "default" directory is used for all newly created
devices.
Unless otherwise specified time-related sysctls are specified
in seconds.
.TP
.B anycast_delay
The maximum number of jiffies to delay before replying to an
IPv6 neighbor solicitation message.
Anycast support is not yet implemented.
Defaults to 1 second.
.TP
.B app_solicit
The maximum number of probes to send to the user space ARP daemon via
netlink before dropping back to multicast probes (see
.IR mcast_solicit ).
Defaults to 0.
.TP
.B base_reachable_time
Once a neighbor has been found, the entry is considered to be valid
for at least a random value between
.IR base_reachable_time "/2 and 3*" base_reachable_time /2.
An entry's validity will be extended if it receives positive feedback
from higher level protocols.
Defaults to 30 seconds.
.TP
.B delay_first_probe_time
Delay before first probe after it has been decided that a neighbor
is stale.
Defaults to 5 seconds.
.TP
.B gc_interval
How frequently the garbage collector for neighbor entries
should attempt to run.
Defaults to 30 seconds.
.TP
.B gc_stale_time
Determines how often to check for stale neighbor entries.
When a neighbor entry is considered stale it is resolved again before
sending data to it.
Defaults to 60 seconds.
.TP
.B gc_thresh1
The minimum number of entries to keep in the ARP cache.
The garbage collector will not run if there are fewer than
this number of entries in the cache.
Defaults to 128.
.TP
.B gc_thresh2
The soft maximum number of entries to keep in the ARP cache.
The garbage collector will allow the number of entries to exceed
this for 5 seconds before collection will be performed.
Defaults to 512.
.TP
.B gc_thresh3
The hard maximum number of entries to keep in the ARP cache.
The garbage collector will always run if there are more than
this number of entries in the cache.
Defaults to 1024.
.TP
.B locktime
The minimum number of jiffies to keep an ARP entry in the cache.
This prevents ARP cache thrashing if there is more than one potential
mapping (generally due to network misconfiguration).
Defaults to 1 second.
.TP
.B mcast_solicit
The maximum number of attempts to resolve an address by
multicast/broadcast before marking the entry as unreachable.
Defaults to 3.
.TP
.B proxy_delay
When an ARP request for a known proxy-ARP address is received, delay up to
.I proxy_delay
jiffies before replying.
This is used to prevent network flooding in some cases.
Defaults to 0.8 seconds.
.TP
.B proxy_qlen
The maximum number of packets which may be queued to proxy-ARP addresses.
Defaults to 64.
.TP
.B retrans_time
The number of jiffies to delay before retransmitting a request.
Defaults to 1 second.
.TP
.B ucast_solicit
The maximum number of attempts to send unicast probes before asking
the ARP daemon (see
.IR app_solicit ).
Defaults to 3.
.TP
.B unres_qlen
The maximum number of packets which may be queued for each unresolved
address by other network layers.
Defaults to 3.
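.PP
Reading one of these sysctls through the /proc interface can be sketched as
follows.
The helper name read_neigh_sysctl is hypothetical, and the value is returned
exactly as the file presents it, without any unit conversion.
.nf
```c
#include <stdio.h>

/* Hypothetical helper: read an integer-valued sysctl such as
   "gc_thresh1" for interface ifname (use "default" for the settings
   applied to newly created devices).
   Returns 0 on success, -1 on error. */
static int
read_neigh_sysctl(const char *ifname, const char *name, long *value)
{
    char path[256];
    FILE *fp;
    int n, ok;

    n = snprintf(path, sizeof(path), "/proc/sys/net/ipv4/neigh/%s/%s",
                 ifname, name);
    if (n < 0 || n >= (int) sizeof(path))
        return -1;                  /* path would be truncated */
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;                  /* no such interface or sysctl */
    ok = (fscanf(fp, "%ld", value) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}
```
.fi
Writing a new value works the same way with a file opened for writing, but
typically requires privilege.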
.SH VERSIONS
The
.I struct arpreq
changed in Linux 2.0 to include the
.I arp_dev
member and the ioctl numbers changed at the same time.
Support for the old ioctls was dropped in Linux 2.2.
Support for proxy arp entries for networks (netmask not equal to 0xffffffff)
was dropped in Linux 2.2.
It is replaced by automatic proxy arp setup by
the kernel for all reachable hosts on other interfaces (when
forwarding and proxy arp are enabled for the interface).
The
.I neigh/*
sysctls did not exist before Linux 2.2.
.SH BUGS
Some timer settings are specified in jiffies, which is architecture-
and kernel version-dependent; see
.BR time (7).
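.PP
As a minimal sketch of why the tick rate matters, the hypothetical helper
below (not part of any kernel or libc interface) converts a tick count to
milliseconds for a caller-supplied rate.
The same count of 100 is one second at 100 Hz but only 100 ms at 1000 Hz.
.in +4n
.nf
```c
#include <assert.h>

/* Convert a tick count to milliseconds, given the tick rate in Hz.
 * The rate is a parameter because it differs between architectures
 * and kernel versions. */
long ticks_to_ms(long ticks, long hz)
{
    assert(hz > 0);
    return ticks * 1000 / hz;
}
```
.fi
.in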
There is no way to signal positive feedback from user space.
This means connection oriented protocols implemented in user space
will generate excessive ARP traffic, because ndisc will regularly
reprobe the MAC address.
The same problem applies for some kernel protocols (e.g., NFS over UDP).
This man page mixes IPv4-specific functionality with functionality
shared between IPv4 and IPv6.
.SH "SEE ALSO"
.BR capabilities (7),
.BR ip (7)
.PP
RFC\ 826 for a description of ARP.
.br
RFC\ 2461 for a description of IPv6 neighbor discovery and the base
algorithms used.
.LP
Linux 2.2+ IPv4 ARP uses the IPv6 algorithms when applicable.


@ -1,8 +1,251 @@
.TH DDP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
.\" This man page is Copyright (C) 1998 Alan Cox.
.\" Permission is granted to distribute possibly modified copies
.\" of this page provided the header is included verbatim,
.\" and in case of nontrivial modification author and date
.\" of the modification is added to the header.
.\" $Id: ddp.7,v 1.3 1999/05/13 11:33:22 freitag Exp $
.TH DDP 7 1999-05-01 "Linux" "Linux Programmer's Manual"
.SH NAME
ddp \- Linux AppleTalk protocol implementation
.SH SYNOPSIS
.B #include <sys/socket.h>
.br
.B #include <netatalk/at.h>
.sp
.IB ddp_socket " = socket(PF_APPLETALK, SOCK_DGRAM, 0);"
.br
.IB raw_socket " = socket(PF_APPLETALK, SOCK_RAW, " protocol ");"
.SH DESCRIPTION
Linux implements the AppleTalk protocols described in
.IR "Inside AppleTalk" .
Only the DDP layer and AARP are present in
the kernel.
They are designed to be used via the
.B netatalk
protocol
libraries.
This page documents the interface for those who wish or need to
use the DDP layer directly.
.PP
The communication between AppleTalk and the user program works using a
BSD-compatible socket interface.
For more information on sockets, see
.BR socket (7).
.PP
An AppleTalk socket is created by calling the
.BR socket (2)
function with a
.B PF_APPLETALK
socket family argument.
Valid socket types are
.B SOCK_DGRAM
to open a
.B ddp
socket or
.B SOCK_RAW
to open a
.B raw
socket.
.I protocol
is the AppleTalk protocol to be received or sent.
For
.B SOCK_RAW
you must specify
.BR ATPROTO_DDP .
.PP
Raw sockets may be opened only by a process with effective user ID 0
or when the process has the
.B CAP_NET_RAW
capability.
.SS "Address Format"
An AppleTalk socket address is defined as a combination of a network number,
a node number, and a port number.
.PP
.in +4n
.nf
struct at_addr {
    unsigned short s_net;
    unsigned char  s_node;
};

struct sockaddr_atalk {
    sa_family_t    sat_family;    /* address family */
    unsigned char  sat_port;      /* port */
    struct at_addr sat_addr;      /* net/node */
};
.fi
.in
.PP
.I sat_family
is always set to
.BR AF_APPLETALK .
.I sat_port
contains the port.
The port numbers below 129 are known as
.I reserved ports.
Only processes with the effective user ID 0 or the
.B CAP_NET_BIND_SERVICE
capability may
.BR bind (2)
to these sockets.
.I sat_addr
is the host address.
The
.I s_net
member of
.I struct at_addr
contains the host network in network byte order.
The value of
.B AT_ANYNET
is a
wildcard and also implies \(lqthis network.\(rq
The
.I s_node
member of
.I struct at_addr
contains the host node number.
The value of
.B AT_ANYNODE
is a
wildcard and also implies \(lqthis node.\(rq
The value of
.B ATADDR_BCAST
is a link
local broadcast address.
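.PP
The reserved-port rule described above can be sketched as a small check.
The helper name
.I ddp_port_is_reserved
is hypothetical and is not provided by netatalk or the kernel headers;
the limit of 129 is taken from the text above.
.in +4n
.nf
```c
#include <assert.h>

/* Return 1 if binding to this DDP port requires effective user ID 0
 * or CAP_NET_BIND_SERVICE, 0 otherwise. */
int ddp_port_is_reserved(int port)
{
    assert(port >= 0 && port <= 255);   /* sat_port is an unsigned char */
    return port < 129;
}
```
.fi
.in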
.\" FIXME this doesn't make sense [johnl]
.SS "Socket Options"
No protocol-specific socket options are supported.
.SS Sysctls
AppleTalk supports a sysctl interface to configure some global
parameters.
The sysctls can be accessed by reading or writing the
.I /proc/sys/net/atalk/*
files or with the
.BR sysctl (2)
interface.
.TP
.B aarp-expiry-time
The time interval (in seconds) before an AARP cache entry expires.
.TP
.B aarp-resolve-time
The time interval (in seconds) before an AARP cache entry is resolved.
.TP
.B aarp-retransmit-limit
The number of retransmissions of an AARP query before the node is declared
dead.
.TP
.B aarp-tick-time
The timer rate (in seconds) for the timer driving AARP.
.PP
The default values match the specification and should never need to be
changed.
.SS Ioctls
All ioctls described in
.BR socket (7)
apply to ddp.
.\" FIXME Add a section about multicasting
.SH ERRORS
.\" FIXME document all errors. We should really fix the kernels to
.\" give more uniform error returns (ENOMEM vs ENOBUFS, EPERM vs
.\" EACCES etc.)
.TP
.B EACCES
The user tried to execute an operation without the necessary permissions.
These include sending to a broadcast address without
having the broadcast flag set,
and trying to bind to a reserved port without effective user ID 0 or
.BR CAP_NET_BIND_SERVICE .
.TP
.B EADDRINUSE
Tried to bind to an address already in use.
.TP
.B EADDRNOTAVAIL
A nonexistent interface was requested or the requested source address was
not local.
.TP
.B EAGAIN
Operation on a non-blocking socket would block.
.TP
.B EALREADY
A connection operation on a non-blocking socket is already in progress.
.TP
.B ECONNABORTED
A connection was closed during an
.BR accept (2).
.TP
.B EHOSTUNREACH
No routing table entry matches the destination address.
.TP
.B EINVAL
Invalid argument passed.
.TP
.B EISCONN
.BR connect (2)
was called on an already connected socket.
.TP
.B EMSGSIZE
Datagram is bigger than the DDP MTU.
.TP
.B ENODEV
Network device not available or not capable of sending AppleTalk packets.
.TP
.B ENOENT
.B SIOCGSTAMP
was called on a socket where no packet arrived.
.TP
.BR ENOMEM " and " ENOBUFS
Not enough memory available.
.TP
.B ENOPKG
A kernel subsystem was not configured.
.TP
.BR ENOPROTOOPT " and " EOPNOTSUPP
Invalid socket option passed.
.TP
.B ENOTCONN
The operation is only defined on a connected socket, but the socket wasn't
connected.
.TP
.B EPERM
User doesn't have permission to set high priority,
make a configuration change,
or send signals to the requested process or group.
.TP
.B EPIPE
The connection was unexpectedly closed or shut down by the other end.
.TP
.B ESOCKTNOSUPPORT
The socket was unconfigured, or an unknown socket type was requested.
.SH VERSIONS
AppleTalk is supported by Linux 2.0 or higher.
The
.B sysctl
interface is
new in Linux 2.2.
.SH NOTES
Be very careful with the
.B SO_BROADCAST
option \- it is not privileged in Linux.
It is easy to overload the network
with careless sending to broadcast addresses.
.SS Compatibility
The basic AppleTalk socket interface is compatible with
.B netatalk
on BSD-derived systems.
Many BSD systems fail to check
.B SO_BROADCAST
when sending broadcast frames; this can lead to compatibility problems.
.PP
The
raw
socket mode is unique to Linux and exists to support the alternative CAP
package and AppleTalk monitoring tools more easily.
.SH BUGS
There are too many inconsistent error values.
.PP
The ioctls used to configure routing tables, devices,
AARP tables and other devices are not yet described.
.SH "SEE ALSO"
.BR recvmsg (2),
.BR sendmsg (2),
.BR capabilities (7),
.BR socket (7)

1041
man7/ip.7

File diff suppressed because it is too large


@ -1,8 +1,327 @@
.TH IPV6 7 2008-08-07 "Linux" "Linux Programmer's Manual"
.\" This man page is Copyright (C) 2000 Andi Kleen <ak@muc.de>.
.\" Permission is granted to distribute possibly modified copies
.\" of this page provided the header is included verbatim,
.\" and in case of nontrivial modification author and date
.\" of the modification is added to the header.
.\" $Id: ipv6.7,v 1.3 2000/12/20 18:10:31 ak Exp $
.TH IPV6 7 2008-07-17 "Linux" "Linux Programmer's Manual"
.SH NAME
ipv6, PF_INET6 \- Linux IPv6 protocol implementation
.SH SYNOPSIS
.B #include <sys/socket.h>
.br
.B #include <netinet/in.h>
.sp
.IB tcp6_socket " = socket(PF_INET6, SOCK_STREAM, 0);"
.br
.IB raw6_socket " = socket(PF_INET6, SOCK_RAW, " protocol ");"
.br
.IB udp6_socket " = socket(PF_INET6, SOCK_DGRAM, " protocol ");"
.SH DESCRIPTION
Linux 2.2 optionally implements the Internet Protocol, version 6.
This man page contains a description of the IPv6 basic API as
implemented by the Linux kernel and glibc 2.1.
The interface
is based on the BSD sockets interface; see
.BR socket (7).
.PP
The IPv6 API aims to be mostly compatible with the
.BR ip (7)
v4 API.
Only differences are described in this man page.
.PP
To bind an
.B AF_INET6
socket to any process the local address should be copied from the
.I in6addr_any
variable which has
.I in6_addr
type.
In static initializations
.B IN6ADDR_ANY_INIT
may also be used, which expands to a constant expression.
Both of them are in network byte order.
.PP
The IPv6 loopback address (::1) is available in the global
.I in6addr_loopback
variable.
For initializations
.B IN6ADDR_LOOPBACK_INIT
should be used.
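.PP
A minimal sketch of both initialization styles follows.
The helper names
.I fill_any_addr6
and
.I loopback_addr6
are hypothetical and introduced only for illustration;
the variables and macros are the ones named above.
.in +4n
.nf
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Fill *sa with the IPv6 wildcard address and the given port
 * (host byte order), ready to be passed to bind(2). */
void fill_any_addr6(struct sockaddr_in6 *sa, uint16_t port)
{
    assert(sa != NULL);
    memset(sa, 0, sizeof(*sa));
    sa->sin6_family = AF_INET6;
    sa->sin6_port = htons(port);
    sa->sin6_addr = in6addr_any;    /* copied from the global variable */
}

/* Static initialization uses the macro instead of the variable. */
static const struct in6_addr loopback6 = IN6ADDR_LOOPBACK_INIT;

const struct in6_addr *loopback_addr6(void)
{
    return &loopback6;
}
```
.fi
.in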
.PP
IPv4 connections can be handled with the v6 API by using the
v4-mapped-on-v6 address type;
thus a program needs to support only this API type to
support both protocols.
This is handled transparently by the address
handling functions in libc.
.PP
IPv4 and IPv6 share the local port space.
When you get an IPv4 connection
or packet to an IPv6 socket, its source address will be mapped
to v6.
.SS "Address Format"
.in +4n
.nf
struct sockaddr_in6 {
    uint16_t        sin6_family;   /* AF_INET6 */
    uint16_t        sin6_port;     /* port number */
    uint32_t        sin6_flowinfo; /* IPv6 flow information */
    struct in6_addr sin6_addr;     /* IPv6 address */
    uint32_t        sin6_scope_id; /* Scope ID (new in 2.4) */
};

struct in6_addr {
    unsigned char   s6_addr[16];   /* IPv6 address */
};
.fi
.in
.sp
.I sin6_family
is always set to
.BR AF_INET6 ;
.I sin6_port
is the protocol port (see
.I sin_port
in
.BR ip (7));
.I sin6_flowinfo
is the IPv6 flow identifier;
.I sin6_addr
is the 128-bit IPv6 address.
.I sin6_scope_id
is an ID depending on the scope of the address.
It is new in Linux 2.4.
Linux only supports it for link scope addresses; in that case
.I sin6_scope_id
contains the interface index (see
.BR netdevice (7)).
.PP
IPv6 supports several address types: unicast to address a single
host, multicast to address a group of hosts,
anycast to address the nearest member of a group of hosts
(not implemented in Linux), IPv4-on-IPv6 to
address an IPv4 host, and other reserved address types.
.PP
The address notation for IPv6 is a group of 8 4-digit hexadecimal
numbers, separated with a \(aq:\(aq.
\&"::" stands for a string of 0 bits.
Special addresses are ::1 for loopback and ::FFFF:<IPv4 address>
for IPv4-mapped-on-IPv6.
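.PP
The notation and the IPv4-mapped form can be illustrated with the standard
.BR inet_pton (3)
conversion.
This is only a sketch; the helper names
.I map_v4_to_v6
and
.I parse_addr6
are hypothetical.
.in +4n
.nf
```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Build the IPv4-mapped IPv6 address ::ffff:a.b.c.d from an IPv4 address. */
struct in6_addr map_v4_to_v6(struct in_addr v4)
{
    struct in6_addr v6;

    memset(&v6, 0, sizeof(v6));              /* bytes 0..9 stay zero */
    v6.s6_addr[10] = 0xff;                   /* bytes 10..11 are ff ff */
    v6.s6_addr[11] = 0xff;
    memcpy(&v6.s6_addr[12], &v4.s_addr, 4);  /* already in network order */
    return v6;
}

/* Parse a textual IPv6 address; returns 1 on success, 0 otherwise. */
int parse_addr6(const char *text, struct in6_addr *out)
{
    assert(text != NULL && out != NULL);
    return inet_pton(AF_INET6, text, out) == 1;
}
```
.fi
.in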
.PP
The port space of IPv6 is shared with IPv4.
.SS "Socket Options"
IPv6 supports some protocol-specific socket options that can be set with
.BR setsockopt (2)
and read with
.BR getsockopt (2).
The socket option level for IPv6 is
.BR IPPROTO_IPV6 .
A boolean integer flag is zero when it is false, otherwise true.
.TP
.B IPV6_ADDRFORM
Turn an
.B AF_INET6
socket into a socket of a different address family.
Only
.B AF_INET
is currently supported for that.
It is only allowed for IPv6 sockets
that are connected and bound to a v4-mapped-on-v6 address.
The argument is a pointer to an integer containing
.BR AF_INET .
This is useful to pass v4-mapped sockets as file descriptors to
programs that don't know how to deal with the IPv6 API.
.TP
.B IPV6_ADD_MEMBERSHIP, IPV6_DROP_MEMBERSHIP
Control membership in multicast groups.
Argument is a pointer to a
.I struct ipv6_mreq
structure.
.\" FIXME IPV6_CHECKSUM is not documented, and probably should be
.\" FIXME IPV6_JOIN_ANYCAST is not documented, and probably should be
.\" FIXME IPV6_LEAVE_ANYCAST is not documented, and probably should be
.\" FIXME IPV6_RECVPKTINFO is not documented, and probably should be
.\" FIXME IPV6_2292PKTINFO is not documented, and probably should be
.\" FIXME there are probably many other IPV6_* socket options that
.\" should be documented
.TP
.B IPV6_MTU
Set the MTU to be used for the socket.
The MTU is limited by the device
MTU or the path MTU when path MTU discovery is enabled.
Argument is a pointer to integer.
.TP
.B IPV6_MTU_DISCOVER
Control path MTU discovery on the socket.
See
.B IP_MTU_DISCOVER
in
.BR ip (7)
for details.
.TP
.B IPV6_MULTICAST_HOPS
Set the multicast hop limit for the socket.
Argument is a pointer to an
integer.
\-1 in the value means use the route default, otherwise it should be
between 0 and 255.
.TP
.B IPV6_MULTICAST_IF
Set the device for outgoing multicast packets on the socket.
This is only allowed
for
.B SOCK_DGRAM
and
.B SOCK_RAW
sockets.
The argument is a pointer to an interface index (see
.BR netdevice (7))
in an integer.
.TP
.B IPV6_MULTICAST_LOOP
Control whether the socket sees multicast packets that it has sent itself.
Argument is a pointer to boolean.
.TP
.B IPV6_PKTINFO
Set delivery of the
.B IPV6_PKTINFO
control message on incoming datagrams.
Only allowed for
.B SOCK_DGRAM
or
.B SOCK_RAW
sockets.
Argument is a pointer to a boolean value in an integer.
.TP
.nh
.B IPV6_RTHDR, IPV6_AUTHHDR, IPV6_DSTOPTS, IPV6_HOPOPTS, IPV6_FLOWINFO, IPV6_HOPLIMIT
.hy
Set delivery of control messages for incoming datagrams containing
extension headers from the received packet.
.B IPV6_RTHDR
delivers the routing header,
.B IPV6_AUTHHDR
delivers the authentication header,
.B IPV6_DSTOPTS
delivers the destination options,
.B IPV6_HOPOPTS
delivers the hop options,
.B IPV6_FLOWINFO
delivers an integer containing the flow ID,
.B IPV6_HOPLIMIT
delivers an integer containing the hop count of the packet.
The control messages have the same type as the socket option.
All these header options can also be set for outgoing packets
by putting the appropriate control message into the control buffer of
.BR sendmsg (2).
Only allowed for
.B SOCK_DGRAM
or
.B SOCK_RAW
sockets.
Argument is a pointer to a boolean value.
.TP
.B IPV6_RECVERR
Control receiving of asynchronous error options.
See
.B IP_RECVERR
in
.BR ip (7)
for details.
Argument is a pointer to boolean.
.TP
.B IPV6_ROUTER_ALERT
Pass forwarded packets containing a router alert hop-by-hop option to
this socket.
Only allowed for
.B SOCK_RAW
sockets.
The tapped packets are not forwarded by the kernel; it is the
user's responsibility to send them out again.
Argument is a pointer to an integer.
A positive integer indicates a router alert option value to intercept.
Packets carrying a router alert option with a value field containing
this integer will be delivered to the socket.
A negative integer disables delivery of packets with router alert options
to this socket.
.TP
.B IPV6_UNICAST_HOPS
Set the unicast hop limit for the socket.
Argument is a pointer to an integer.
\-1 in the value means use the route default,
otherwise it should be between 0 and 255.
.TP
.BR IPV6_V6ONLY " (since Linux 2.4.21 and 2.6)"
.\" See RFC 3493
If this flag is set to true (non-zero), then the socket is restricted
to sending and receiving IPv6 packets only.
In this case, an IPv4 and an IPv6 application can bind
to a single port at the same time.
If this flag is set to false (zero),
then the socket can be used to send and receive packets
to and from an IPv6 address or an IPv4-mapped IPv6 address.
The argument is a pointer to a boolean value in an integer.
The default value for this flag is defined by the contents of the file
.BR /proc/sys/net/ipv6/bindv6only .
The default value for that file is 0 (false).
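.PP
As a hedged sketch of how a boolean option from the list above is set and
read back, the hypothetical helpers below wrap
.BR setsockopt (2)
and
.BR getsockopt (2)
for
.BR IPV6_V6ONLY .
The option is assumed to be changed before the socket is bound.
.in +4n
.nf
```c
#include <assert.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Restrict (on != 0) or unrestrict (on == 0) an AF_INET6 socket to
 * IPv6 traffic.  Returns 0 on success, -1 with errno set on failure. */
int set_v6only(int fd, int on)
{
    int flag = on ? 1 : 0;    /* boolean value carried in an integer */

    assert(fd >= 0);
    return setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &flag, sizeof(flag));
}

/* Read the flag back; returns 0 or 1, or -1 on failure. */
int get_v6only(int fd)
{
    int flag = 0;
    socklen_t len = sizeof(flag);

    assert(fd >= 0);
    if (getsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &flag, &len) == -1)
        return -1;
    return flag != 0;
}
```
.fi
.in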
.\" FLOWLABEL_MGR, FLOWINFO_SEND
.SH VERSIONS
The older
.I libinet6
libc5 based IPv6 API implementation for Linux is not described here
and may vary in details.
.PP
Linux 2.4 will break binary compatibility for the
.I sockaddr_in6
for 64-bit
hosts by changing the alignment of
.I in6_addr
and adding an additional
.I sin6_scope_id
field.
The kernel interfaces stay compatible, but a program including
.I sockaddr_in6
or
.I in6_addr
into other structures may not be.
This is not
a problem for 32-bit hosts like i386.
.PP
The
.I sin6_flowinfo
field is new in Linux 2.4.
It is transparently passed/read by the kernel
when the passed address length contains it.
Some programs that pass a longer address buffer and then
check the outgoing address length may break.
.SH "NOTES"
The
.I sockaddr_in6
structure is bigger than the generic
.IR sockaddr .
Programs that assume that all address types can be stored safely in a
.I struct sockaddr
need to be changed to use
.I struct sockaddr_storage
for that instead.
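.PP
A minimal sketch of the recommended pattern follows; the helper name
.I store_addr6
is hypothetical.
.in +4n
.nf
```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Copy an IPv6 socket address into generic storage that is large
 * enough for any address family; returns the length actually used. */
socklen_t store_addr6(struct sockaddr_storage *ss,
                      const struct sockaddr_in6 *sa6)
{
    assert(ss != NULL && sa6 != NULL);
    memset(ss, 0, sizeof(*ss));
    memcpy(ss, sa6, sizeof(*sa6));
    return sizeof(*sa6);
}
```
.fi
.in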
.SH BUGS
The IPv6 extended API as in RFC\ 2292 is currently only partly
implemented;
although the 2.2 kernel has near complete support for receiving options,
the macros for generating IPv6 options are missing in glibc 2.1.
.PP
IPSec support for EH and AH headers is missing.
.PP
Flow label management is not complete and not documented here.
.PP
This man page is not complete.
.SH "SEE ALSO"
.BR cmsg (3),
.BR ip (7)
.PP
RFC\ 2553: IPv6 BASIC API.
Linux tries to be compliant with this.
.PP
RFC\ 2460: IPv6 specification.


@ -1,8 +1,460 @@
.TH NETLINK 7 2008-08-07 "Linux" "Linux Programmer's Manual"
'\" t
.\" Don't change the first line, it tells man that tbl is needed.
.\" This man page is Copyright (c) 1998 by Andi Kleen. Subject to the GPL.
.\" Based on the original comments from Alexey Kuznetsov
.\" Modified 2005-12-27 by Hasso Tepper <hasso@estpak.ee>
.\" $Id: netlink.7,v 1.8 2000/06/22 13:23:00 ak Exp $
.TH NETLINK 7 2005-12-27 "Linux" "Linux Programmer's Manual"
.SH NAME
netlink \- Communication between kernel and userspace (PF_NETLINK)
.SH SYNOPSIS
.nf
.B #include <asm/types.h>
.B #include <sys/socket.h>
.B #include <linux/netlink.h>
.BI "netlink_socket = socket(PF_NETLINK, " socket_type ", " netlink_family );
.fi
.SH DESCRIPTION
Netlink is used to transfer information between kernel and
userspace processes.
It consists of a standard sockets-based interface for userspace
processes and an internal kernel API for kernel modules.
The internal kernel interface is not documented in this manual page.
There is also an obsolete netlink interface
via netlink character devices; this interface is not documented here
and is only provided for backwards compatibility.
Netlink is a datagram-oriented service.
Both
.B SOCK_RAW
and
.B SOCK_DGRAM
are valid values for
.IR socket_type .
However, the netlink protocol does not distinguish between datagram
and raw sockets.
.I netlink_family
selects the kernel module or netlink group to communicate with.
The currently assigned netlink families are:
.TP
.B NETLINK_ROUTE
Receives routing and link updates and may be used to modify the routing
tables (both IPv4 and IPv6), IP addresses, link parameters,
neighbor setups, queueing disciplines, traffic classes and
packet classifiers (see
.BR rtnetlink (7)).
.TP
.B NETLINK_W1
Messages from 1-wire subsystem.
.TP
.B NETLINK_USERSOCK
Reserved for user-mode socket protocols.
.TP
.B NETLINK_FIREWALL
Transport IPv4 packets from netfilter to userspace.
Used by
.I ip_queue
kernel module.
.TP
.B NETLINK_INET_DIAG
.\" FIXME More details on NETLINK_INET_DIAG needed.
INET socket monitoring.
.TP
.B NETLINK_NFLOG
Netfilter/iptables ULOG.
.TP
.B NETLINK_XFRM
.\" FIXME More details on NETLINK_XFRM needed.
IPsec.
.TP
.B NETLINK_SELINUX
SELinux event notifications.
.TP
.B NETLINK_ISCSI
.\" FIXME More details on NETLINK_ISCSI needed.
Open-iSCSI.
.TP
.B NETLINK_AUDIT
.\" FIXME More details on NETLINK_AUDIT needed.
Auditing.
.TP
.B NETLINK_FIB_LOOKUP
.\" FIXME More details on NETLINK_FIB_LOOKUP needed.
Access to FIB lookup from userspace.
.TP
.B NETLINK_CONNECTOR
Kernel connector.
See
.I Documentation/connector/*
in the kernel source for further information.
.TP
.B NETLINK_NETFILTER
.\" FIXME More details on NETLINK_NETFILTER needed.
Netfilter subsystem.
.TP
.B NETLINK_IP6_FW
Transport IPv6 packets from netfilter to userspace.
Used by
.I ip6_queue
kernel module.
.TP
.B NETLINK_DNRTMSG
DECnet routing messages.
.TP
.B NETLINK_KOBJECT_UEVENT
.\" FIXME More details on NETLINK_KOBJECT_UEVENT needed.
Kernel messages to userspace.
.TP
.B NETLINK_GENERIC
Generic netlink family for simplified netlink usage.
.PP
Netlink messages consist of a byte stream with one or multiple
.I nlmsghdr
headers and associated payload.
The byte stream should only be accessed with the standard
.B NLMSG_*
macros.
See
.BR netlink (3)
for further information.
In multipart messages (multiple
.I nlmsghdr
headers with associated payload in one byte stream) the first and all
following headers have the
.B NLM_F_MULTI
flag set, except for the last header which has the type
.BR NLMSG_DONE .
After each
.I nlmsghdr
the payload follows.
.in +4n
.nf
struct nlmsghdr {
    __u32 nlmsg_len;    /* Length of message including header. */
    __u16 nlmsg_type;   /* Type of message content. */
    __u16 nlmsg_flags;  /* Additional flags. */
    __u32 nlmsg_seq;    /* Sequence number. */
    __u32 nlmsg_pid;    /* PID of the sending process. */
};
.fi
.in
.I nlmsg_type
can be one of the standard message types:
.B NLMSG_NOOP
message is to be ignored,
.B NLMSG_ERROR
message signals an error and the payload contains an
.I nlmsgerr
structure,
.B NLMSG_DONE
message terminates a multipart message.
.in +4n
.nf
struct nlmsgerr {
    int error;           /* Negative errno or 0 for acknowledgements */
    struct nlmsghdr msg; /* Message header that caused the error */
};
.fi
.in
A netlink family usually specifies more message types, see the
appropriate manual pages for that, for example,
.BR rtnetlink (7)
for
.BR NETLINK_ROUTE .
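.PP
The framing rules can be sketched entirely in user space with the
.B NLMSG_*
macros from
.IR <linux/netlink.h> .
The helper names
.I append_nlmsg
and
.I count_nlmsgs
are hypothetical, and the buffer is assumed to be suitably aligned for a
.IR "struct nlmsghdr" .
.in +4n
.nf
```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Append one message with the given type and payload at offset *used
 * of buf; returns 0 on success, -1 if the buffer is too small. */
int append_nlmsg(char *buf, size_t cap, size_t *used,
                 unsigned short type, const void *payload, size_t plen)
{
    size_t off;
    struct nlmsghdr *nh;

    assert(buf != NULL && used != NULL && payload != NULL);
    off = NLMSG_ALIGN(*used);
    if (off + NLMSG_SPACE(plen) > cap)
        return -1;
    nh = (struct nlmsghdr *) (buf + off);
    memset(nh, 0, NLMSG_SPACE(plen));
    nh->nlmsg_len = NLMSG_LENGTH(plen);     /* header plus payload */
    nh->nlmsg_type = type;
    memcpy(NLMSG_DATA(nh), payload, plen);
    *used = off + NLMSG_SPACE(plen);        /* includes alignment padding */
    return 0;
}

/* Count the messages in buf[0..len) using only the NLMSG_* macros. */
int count_nlmsgs(void *buf, size_t len)
{
    struct nlmsghdr *nh;
    int remaining = (int) len;
    int n = 0;

    for (nh = buf; NLMSG_OK(nh, remaining); nh = NLMSG_NEXT(nh, remaining))
        n++;
    return n;
}
```
.fi
.in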
Standard flag bits in
.I nlmsg_flags
.br
---------------------------------
.TS
tab(:);
lB l.
NLM_F_REQUEST:Must be set on all request messages.
NLM_F_MULTI:T{
The message is part of a multipart message terminated by
.BR NLMSG_DONE .
T}
NLM_F_ACK:Request for an acknowledgment on success.
NLM_F_ECHO:Echo this request.
.TE
Additional flag bits for GET requests
.br
-------------------------------------
.TS
tab(:);
lB l.
NLM_F_ROOT:Return the complete table instead of a single entry.
NLM_F_MATCH:T{
Return all entries matching criteria passed in message content.
Not implemented yet.
T}
.\" FIXME NLM_F_ATOMIC is not used any more?
NLM_F_ATOMIC:Return an atomic snapshot of the table.
NLM_F_DUMP:Convenience macro; equivalent to (NLM_F_ROOT|NLM_F_MATCH).
.TE
Note that
.B NLM_F_ATOMIC
requires the
.B CAP_NET_ADMIN
capability or an effective UID of 0.
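.PP
As an illustration of how the GET flags combine, the hypothetical helper
below prepares a header for a request that asks for a complete table.
It is a sketch only: the message type is family-specific, and most
families also expect a payload after the header, which is omitted here.
.in +4n
.nf
```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Initialize a header for a GET request that asks for the whole table.
 * The message type is supplied by the caller. */
void init_dump_request(struct nlmsghdr *nh, unsigned short type,
                       unsigned int seq)
{
    assert(nh != NULL);
    memset(nh, 0, sizeof(*nh));
    nh->nlmsg_len = NLMSG_LENGTH(0);    /* header only, no payload */
    nh->nlmsg_type = type;
    nh->nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    nh->nlmsg_seq = seq;
}
```
.fi
.in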
Additional flag bits for NEW requests
.br
-------------------------------------
.TS
tab(:);
lB l.
NLM_F_REPLACE:Replace existing matching object.
NLM_F_EXCL:Don't replace if the object already exists.
NLM_F_CREATE:Create object if it doesn't already exist.
NLM_F_APPEND:Add to the end of the object list.
.TE
.I nlmsg_seq
and
.I nlmsg_pid
are used to track messages.
.I nlmsg_pid
shows the origin of the message.
Note that there isn't a 1:1 relationship between
.I nlmsg_pid
and the PID of the process if the message originated from a netlink
socket.
See the
.B ADDRESS FORMATS
section for further information.
Both
.I nlmsg_seq
and
.I nlmsg_pid
.\" FIXME Explain more about nlmsg_seq and nlmsg_pid.
are opaque to netlink core.
Netlink is not a reliable protocol.
It tries its best to deliver a message to its destination(s),
but may drop messages when an out-of-memory condition or
other error occurs.
For reliable transfer the sender can request an
acknowledgement from the receiver by setting the
.B NLM_F_ACK
flag.
An acknowledgment is an
.B NLMSG_ERROR
packet with the error field set to 0.
The application must generate acknowledgements for
received messages itself.
The kernel tries to send an
.B NLMSG_ERROR
message for every failed packet.
A user process should follow this convention too.
However, reliable transmissions from kernel to user are impossible
in any case.
The kernel can't send a netlink message if the socket buffer is full:
the message will be dropped and the kernel and the userspace process will
no longer have the same view of kernel state.
It is up to the application to detect when this happens (via the
.B ENOBUFS
error returned by
.BR recvmsg (2))
and resynchronize.
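.PP
The acknowledgement convention can be sketched as a classifier for a
single received message.
The helper name
.I classify_nlmsg
and its return convention are assumptions made for this illustration.
.in +4n
.nf
```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Classify one received message:
 *    0  acknowledgement (NLMSG_ERROR with the error field set to 0)
 *   <0  negative errno reported by the peer, or -EBADMSG if the
 *       message is too short to hold a struct nlmsgerr
 *    1  any other message type */
int classify_nlmsg(struct nlmsghdr *nh)
{
    struct nlmsgerr *err;

    assert(nh != NULL);
    if (nh->nlmsg_type != NLMSG_ERROR)
        return 1;
    if (nh->nlmsg_len < NLMSG_LENGTH(sizeof(*err)))
        return -EBADMSG;
    err = NLMSG_DATA(nh);
    return err->error;
}
```
.fi
.in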
.SS Address Formats
The
.I sockaddr_nl
structure describes a netlink client in user space or in the kernel.
A
.I sockaddr_nl
can be either unicast (only sent to one peer) or sent to
netlink multicast groups
.RI ( nl_groups
not equal to 0).
.in +4n
.nf
struct sockaddr_nl {
    sa_family_t    nl_family;  /* AF_NETLINK */
    unsigned short nl_pad;     /* Zero. */
    pid_t          nl_pid;     /* Process ID. */
    __u32          nl_groups;  /* Multicast groups mask. */
};
.fi
.in
.I nl_pid
is the unicast address of the netlink socket.
It's always 0 if the destination is in the kernel.
For a userspace process,
.I nl_pid
is usually the PID of the process owning the destination socket.
However,
.I nl_pid
identifies a netlink socket, not a process.
If a process owns several netlink
sockets, then
.I nl_pid
can only be equal to the process ID for at most one socket.
There are two ways to assign
.I nl_pid
to a netlink socket.
If the application sets
.I nl_pid
before calling
.BR bind (2),
then it is up to the application to make sure that
.I nl_pid
is unique.
If the application sets it to 0, the kernel takes care of assigning it.
The kernel assigns the process ID to the first netlink socket the process
opens and assigns a unique
.I nl_pid
to every netlink socket that the process subsequently creates.
.I nl_groups
is a bit mask with every bit representing a netlink group number.
Each netlink family has a set of 32 multicast groups.
When
.BR bind (2)
is called on the socket, the
.I nl_groups
field in the
.I sockaddr_nl
should be set to a bit mask of the groups which it wishes to listen to.
The default value for this field is zero which means that no multicasts
will be received.
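.PP
A sketch of building such a bit mask follows.
It assumes that group number n corresponds to bit n \- 1, which is how the
.B RTNLGRP_*
group numbers in
.I <linux/rtnetlink.h>
relate to the
.B RTMGRP_*
masks; the helper names are hypothetical.
.in +4n
.nf
```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Map a netlink group number (1..32) to its bit in nl_groups. */
uint32_t nl_group_mask(unsigned int group)
{
    assert(group >= 1 && group <= 32);
    return UINT32_C(1) << (group - 1);
}

/* Mask for link events plus IPv4 address events on NETLINK_ROUTE. */
uint32_t link_and_ipv4_addr_groups(void)
{
    return nl_group_mask(RTNLGRP_LINK) |
           nl_group_mask(RTNLGRP_IPV4_IFADDR);
}
```
.fi
.in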
A socket may multicast messages to any of the multicast groups by setting
.I nl_groups
to a bit mask of the groups it wishes to send to when it calls
.BR sendmsg (2)
or does a
.BR connect (2).
Only processes with an effective UID of 0 or the
.B CAP_NET_ADMIN
capability may send or listen to a netlink multicast group.
Any replies to a message received for a multicast group should be
sent back to the sending PID and the multicast group.
.SH VERSIONS
The socket interface to netlink is a new feature of Linux 2.2.
Linux 2.0 supported a more primitive device based netlink interface
(which is still available as a compatibility option).
This obsolete interface is not described here.
NETLINK_SELINUX appeared in Linux 2.6.4.
NETLINK_AUDIT appeared in Linux 2.6.6.
NETLINK_KOBJECT_UEVENT appeared in Linux 2.6.10.
NETLINK_W1 and NETLINK_FIB_LOOKUP appeared in Linux 2.6.13.
NETLINK_INET_DIAG, NETLINK_CONNECTOR and NETLINK_NETFILTER appeared in
Linux 2.6.14.
NETLINK_GENERIC and NETLINK_ISCSI appeared in Linux 2.6.15.
.SH NOTES
It is often better to use netlink via
.I libnetlink
or
.I libnl
than via the low-level kernel interface.
.SH BUGS
This manual page is not complete.
.SH EXAMPLE
The following example creates a
.B NETLINK_ROUTE
netlink socket which will listen to the
.B RTMGRP_LINK
(network interface create/delete/up/down events) and
.B RTMGRP_IPV4_IFADDR
(IPv4 addresses add/delete events) multicast groups.
.in +4n
.nf
struct sockaddr_nl sa;

memset(&sa, 0, sizeof(sa));
sa.nl_family = AF_NETLINK;
sa.nl_groups = RTMGRP_LINK | RTMGRP_IPV4_IFADDR;

fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
bind(fd, (struct sockaddr *) &sa, sizeof(sa));
.fi
.in
The next example demonstrates how to send a netlink message to the
kernel (pid 0).
Note that the application must take care of message sequence numbers
in order to reliably track acknowledgements.
.in +4n
.nf
struct nlmsghdr *nh;    /* The nlmsghdr with payload to send. */
struct sockaddr_nl sa;
struct iovec iov = { (void *) nh, nh\->nlmsg_len };
struct msghdr msg = {
    .msg_name = &sa, .msg_namelen = sizeof(sa),
    .msg_iov = &iov, .msg_iovlen = 1
};

memset(&sa, 0, sizeof(sa));
sa.nl_family = AF_NETLINK;
nh\->nlmsg_pid = 0;
nh\->nlmsg_seq = ++sequence_number;
/* Request an ack from kernel by setting NLM_F_ACK. */
nh\->nlmsg_flags |= NLM_F_ACK;

sendmsg(fd, &msg, 0);
.fi
.in
The last example shows how to read netlink messages.
.in +4n
.nf
int len;
char buf[4096];
struct iovec iov = { buf, sizeof(buf) };
struct sockaddr_nl sa;
struct msghdr msg = {
    .msg_name = &sa, .msg_namelen = sizeof(sa),
    .msg_iov = &iov, .msg_iovlen = 1
};
struct nlmsghdr *nh;

len = recvmsg(fd, &msg, 0);

for (nh = (struct nlmsghdr *) buf; NLMSG_OK (nh, len);
     nh = NLMSG_NEXT (nh, len)) {
    /* The end of multipart message. */
    if (nh\->nlmsg_type == NLMSG_DONE)
        return;

    if (nh\->nlmsg_type == NLMSG_ERROR)
        /* Do some error handling. */
    ...

    /* Continue with parsing payload. */
    ...
}
.fi
.in
.SH "SEE ALSO"
.BR cmsg (3),
.BR netlink (3),
.BR capabilities (7),
.BR rtnetlink (7)
.PP
ftp://ftp.inr.ac.ru/ip-routing/iproute2*
for information about libnetlink.
http://people.suug.ch/~tgr/libnl/
for information about libnl.
RFC 3549 "Linux Netlink as an IP Services Protocol"


@ -1,8 +1,402 @@
.TH PACKET 7 2008-08-07 "Linux" "Linux Programmer's Manual"
.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
.\" Permission is granted to distribute possibly modified copies
.\" of this page provided the header is included verbatim,
.\" and in case of nontrivial modification author and date
.\" of the modification is added to the header.
.\" $Id: packet.7,v 1.13 2000/08/14 08:03:45 ak Exp $
.TH PACKET 7 1999-04-29 "Linux" "Linux Programmer's Manual"
.SH NAME
packet, PF_PACKET \- packet interface on device level.
.SH SYNOPSIS
.nf
.B #include <sys/socket.h>
.br
.B #include <netpacket/packet.h>
.br
.B #include <net/ethernet.h> /* the L2 protocols */
.sp
.BI "packet_socket = socket(PF_PACKET, int " socket_type ", int " protocol );
.fi
.SH DESCRIPTION
Packet sockets are used to receive or send raw packets at the device driver
(OSI Layer 2) level.
They allow the user to implement protocol modules in user space
on top of the physical layer.
The
.I socket_type
is either
.B SOCK_RAW
for raw packets including the link level header or
.B SOCK_DGRAM
for cooked packets with the link level header removed.
The link level
header information is available in a common format in a
.IR sockaddr_ll .
.I protocol
is the IEEE 802.3 protocol number in network byte order.
See the
.I <linux/if_ether.h>
include file for a list of allowed protocols.
When protocol
is set to
.B htons(ETH_P_ALL)
then all protocols are received.
All incoming packets of that protocol type will be passed to the packet
socket before they are passed to the protocols implemented in the kernel.
Only processes with effective UID 0 or the
.B CAP_NET_RAW
capability may open packet sockets.
.B SOCK_RAW
packets are passed to and from the device driver without any changes in
the packet data.
When receiving a packet, the address is still parsed and
passed in a standard
.I sockaddr_ll
address structure.
When transmitting a packet, the user supplied buffer
should contain the physical layer header.
That packet is then
queued unmodified to the network driver of the interface defined by the
destination address.
Some device drivers always add other headers.
.B SOCK_RAW
is similar to but not compatible with the obsolete
.B PF_INET/SOCK_PACKET
of Linux 2.0.
.B SOCK_DGRAM
operates on a slightly higher level.
The physical header is removed before the packet is passed to the user.
Packets sent through a
.B SOCK_DGRAM
packet socket get a suitable physical layer header based on the
information in the
.I sockaddr_ll
destination address before they are queued.
By default all packets of the specified protocol type
are passed to a packet socket.
To only get packets from a specific interface use
.BR bind (2)
specifying an address in a
.I struct sockaddr_ll
to bind the packet socket to an interface.
Only the
.I sll_protocol
and the
.I sll_ifindex
address fields are used for purposes of binding.
The
.BR connect (2)
operation is not supported on packet sockets.
When the
.B MSG_TRUNC
flag is passed to
.BR recvmsg (2),
.BR recv (2),
or
.BR recvfrom (2),
the real length of the packet on the wire is always returned,
even when it is longer than the buffer.
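.PP
As an illustration of the
.B MSG_TRUNC
behavior described above, the following minimal sketch wraps
.BR recv (2)
so that the caller learns both how many bytes were stored and how long
the packet really was.
The helper name
.I recv_with_wire_len
is hypothetical and not part of any library.
.in +4n
.nf
```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: receive one packet into buf.
 * Returns the number of bytes stored in buf, or -1 on error.
 * On success *wire_len holds the real packet length, which is
 * larger than buflen when the packet had to be truncated.
 */
ssize_t
recv_with_wire_len(int fd, void *buf, size_t buflen, size_t *wire_len)
{
    /* With MSG_TRUNC the return value is the real packet length. */
    ssize_t n = recv(fd, buf, buflen, MSG_TRUNC);

    if (n == -1)
        return -1;
    *wire_len = (size_t) n;
    return (size_t) n > buflen ? (ssize_t) buflen : n;
}
```
.fi
.in
A return value smaller than
.I *wire_len
tells the caller that the packet did not fit into the buffer.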
.SS Address Types
The sockaddr_ll is a device independent physical layer address.
.in +4n
.nf
struct sockaddr_ll {
    unsigned short sll_family;   /* Always AF_PACKET */
    unsigned short sll_protocol; /* Physical layer protocol */
    int            sll_ifindex;  /* Interface number */
    unsigned short sll_hatype;   /* Header type */
    unsigned char  sll_pkttype;  /* Packet type */
    unsigned char  sll_halen;    /* Length of address */
    unsigned char  sll_addr[8];  /* Physical layer address */
};
.fi
.in
.I sll_protocol
is the standard ethernet protocol type in network order as defined
in the
.I <linux/if_ether.h>
include file.
It defaults to the socket's protocol.
.I sll_ifindex
is the interface index of the interface
(see
.BR netdevice (7));
0 matches any interface (only permitted for binding).
.I sll_hatype
is an ARP type as defined in the
.I <linux/if_arp.h>
include file.
.I sll_pkttype
contains the packet type.
Valid types are
.B PACKET_HOST
for a packet addressed to the local host,
.B PACKET_BROADCAST
for a physical layer broadcast packet,
.B PACKET_MULTICAST
for a packet sent to a physical layer multicast address,
.B PACKET_OTHERHOST
for a packet to some other host that has been caught by a device driver
in promiscuous mode, and
.B PACKET_OUTGOING
for a packet originated from the local host that is looped back to a packet
socket.
These types make sense only for receiving.
.I sll_addr
and
.I sll_halen
contain the physical layer (e.g., IEEE 802.3) address and its length.
The exact interpretation depends on the device.
When you send packets it is enough to specify
.IR sll_family ,
.IR sll_addr ,
.IR sll_halen ,
and
.IR sll_ifindex .
The other fields should be 0.
.I sll_hatype
and
.I sll_pkttype
are set on received packets for your information.
For bind only
.I sll_protocol
and
.I sll_ifindex
are used.
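.PP
The field rules above can be sketched as two small helpers, one for
.BR bind (2)
and one for sending.
This is only an illustration: the function names
.I ll_bind_addr
and
.I ll_send_addr
are hypothetical, and the interface index is assumed to have been
obtained elsewhere, for example with
.BR if_nametoindex (3).
.in +4n
.nf
```c
#include <arpa/inet.h>        /* htons() */
#include <net/ethernet.h>     /* the L2 protocols, e.g. ETH_P_IP */
#include <netpacket/packet.h> /* struct sockaddr_ll */
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: address for bind(2).
   Only sll_protocol and sll_ifindex are used for binding. */
struct sockaddr_ll
ll_bind_addr(int ifindex, unsigned short ethertype)
{
    struct sockaddr_ll sll;

    memset(&sll, 0, sizeof(sll));
    sll.sll_family = AF_PACKET;
    sll.sll_protocol = htons(ethertype);   /* network byte order */
    sll.sll_ifindex = ifindex;
    return sll;
}

/* Hypothetical helper: destination address for sending.
   Family, hardware address, its length and the interface index
   are set; every other field stays 0. */
struct sockaddr_ll
ll_send_addr(int ifindex, const unsigned char *hwaddr, unsigned char halen)
{
    struct sockaddr_ll sll;

    memset(&sll, 0, sizeof(sll));
    if (halen > sizeof(sll.sll_addr))      /* sll_addr holds 8 bytes */
        halen = sizeof(sll.sll_addr);
    sll.sll_family = AF_PACKET;
    sll.sll_ifindex = ifindex;
    sll.sll_halen = halen;
    memcpy(sll.sll_addr, hwaddr, halen);
    return sll;
}
```
.fi
.in
The returned structure would be passed, cast to
.IR "struct sockaddr *" ,
together with its size to
.BR bind (2)
or
.BR sendto (2).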
.SS Socket Options
Packet sockets can be used to configure physical layer multicasting
and promiscuous mode.
This works by calling
.BR setsockopt (2)
on a packet socket for
.B SOL_PACKET
and one of the options
.B PACKET_ADD_MEMBERSHIP
to add a binding or
.B PACKET_DROP_MEMBERSHIP
to drop it.
They both expect a
.B packet_mreq
structure as argument:
.in +4n
.nf
struct packet_mreq {
    int            mr_ifindex;    /* interface index */
    unsigned short mr_type;       /* action */
    unsigned short mr_alen;       /* address length */
    unsigned char  mr_address[8]; /* physical layer address */
};
.fi
.in
.B mr_ifindex
contains the interface index for the interface whose status
should be changed.
The
.B mr_type
parameter specifies which action to perform.
.B PACKET_MR_PROMISC
enables receiving all packets on a shared medium (often known as
"promiscuous mode"),
.B PACKET_MR_MULTICAST
binds the socket to the physical layer multicast group specified in
.B mr_address
and
.BR mr_alen ,
and
.B PACKET_MR_ALLMULTI
sets the socket up to receive all multicast packets arriving at
the interface.
In addition, the traditional ioctls
.BR SIOCSIFFLAGS ,
.BR SIOCADDMULTI ,
and
.B SIOCDELMULTI
can be used for the same purpose.
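.PP
A minimal sketch of switching promiscuous mode on or off with these
options might look as follows.
The helper names
.I promisc_request
and
.I set_promisc
are hypothetical; the fallback definition of
.B SOL_PACKET
is the workaround given under BUGS.
.in +4n
.nf
```c
#include <netpacket/packet.h> /* struct packet_mreq, PACKET_* */
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_PACKET
#define SOL_PACKET 263        /* workaround for old glibc; see BUGS */
#endif

/* Hypothetical helper: request describing promiscuous mode
   on the interface with the given index. */
struct packet_mreq
promisc_request(int ifindex)
{
    struct packet_mreq mreq;

    memset(&mreq, 0, sizeof(mreq));
    mreq.mr_ifindex = ifindex;
    mreq.mr_type = PACKET_MR_PROMISC;
    return mreq;
}

/* Hypothetical helper: enable (on != 0) or disable promiscuous mode
   on packet socket fd.  Returns 0 on success, -1 on error. */
int
set_promisc(int fd, int ifindex, int on)
{
    struct packet_mreq mreq = promisc_request(ifindex);
    int opt = on ? PACKET_ADD_MEMBERSHIP : PACKET_DROP_MEMBERSHIP;

    return setsockopt(fd, SOL_PACKET, opt, &mreq, sizeof(mreq));
}
```
.fi
.in
For
.B PACKET_MR_MULTICAST
the same structure would additionally carry the group address in
.I mr_address
and its length in
.IR mr_alen .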
.SS Ioctls
.B SIOCGSTAMP
can be used to receive the timestamp of the last received packet.
The argument is a
.IR "struct timeval" .
In addition all standard ioctls defined in
.BR netdevice (7)
and
.BR socket (7)
are valid on packet sockets.
.SS Error Handling
Packet sockets do no error handling other than for errors that occur
while passing the packet to the device driver.
They don't have the concept of a pending error.
.SH ERRORS
.TP
.B EADDRNOTAVAIL
Unknown multicast group address passed.
.TP
.B EFAULT
User passed invalid memory address.
.TP
.B EINVAL
Invalid argument.
.TP
.B EMSGSIZE
Packet is bigger than interface MTU.
.TP
.B ENETDOWN
Interface is not up.
.TP
.B ENOBUFS
Not enough memory to allocate the packet.
.TP
.B ENODEV
Unknown device name or interface index specified in interface address.
.TP
.B ENOENT
No packet received.
.TP
.B ENOTCONN
No interface address passed.
.TP
.B ENXIO
Interface address contained an invalid interface index.
.TP
.B EPERM
User has insufficient privileges to carry out this operation.
.PP
In addition, other errors may be generated by the low-level driver.
.SH VERSIONS
.B PF_PACKET
is a new feature in Linux 2.2.
Earlier Linux versions supported only
.BR SOCK_PACKET .
.PP
The include file
.I <netpacket/packet.h>
is present since glibc 2.1.
Older systems need:
.sp
.in +4n
.nf
#include <asm/types.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h> /* The L2 protocols */
.fi
.in
.SH NOTES
For portable programs it is suggested to use
.B PF_PACKET
via
.BR pcap (3);
although this only covers a subset of the
.B PF_PACKET
features.
The
.B SOCK_DGRAM
packet sockets make no attempt to create or parse the IEEE 802.2 LLC
header for an IEEE 802.3 frame.
When
.B ETH_P_802_3
is specified as protocol for sending, the kernel creates the
802.3 frame and fills out the length field; the user has to supply the LLC
header to get a fully conforming packet.
Incoming 802.3 packets are not multiplexed on the DSAP/SSAP protocol
fields; instead they are supplied to the user as protocol
.B ETH_P_802_2
with the LLC header prepended.
It is thus not possible to bind to
.BR ETH_P_802_3 ;
bind to
.B ETH_P_802_2
instead and do the protocol multiplex yourself.
The default for sending is the standard Ethernet DIX
encapsulation with the protocol filled in.
Packet sockets are not subject to the input or output firewall chains.
.SS Compatibility
In Linux 2.0, the only way to get a packet socket was by calling
.BI "socket(PF_INET, SOCK_PACKET, " protocol )\fR.
This is still supported but strongly deprecated.
The main difference between the two methods is that
.B SOCK_PACKET
uses the old
.I struct sockaddr_pkt
to specify an interface, which doesn't provide physical layer
independence.
.in +4n
.nf
struct sockaddr_pkt {
    unsigned short spkt_family;
    unsigned char  spkt_device[14];
    unsigned short spkt_protocol;
};
.fi
.in
.I spkt_family
contains
the device type,
.I spkt_protocol
is the IEEE 802.3 protocol type as defined in
.I <sys/if_ether.h>
and
.I spkt_device
is the device name as a null-terminated string, for example, eth0.
This structure is obsolete and should not be used in new code.
.SH BUGS
glibc 2.1 does not have a define for
.BR SOL_PACKET .
The suggested workaround is to use:
.in +4n
.nf
#ifndef SOL_PACKET
#define SOL_PACKET 263
#endif
.fi
.in
This is fixed in later glibc versions and also does not occur on
libc5 systems.
The IEEE 802.2/802.3 LLC handling could be considered a bug.
Socket filters are not documented.
The
.B MSG_TRUNC
.BR recvmsg (2)
extension is an ugly hack and should be replaced by a control message.
There is currently no way to get the original destination address of
packets via
.BR SOCK_DGRAM .
.\" .SH CREDITS
.\" This man page was written by Andi Kleen with help from Matthew Wilcox.
.\" PF_PACKET in Linux 2.2 was implemented
.\" by Alexey Kuznetsov, based on code by Alan Cox and others.
.SH "SEE ALSO"
.BR socket (2),
.BR pcap (3),
.BR capabilities (7),
.BR ip (7),
.BR raw (7),
.BR socket (7)
.PP
RFC\ 894 for the standard IP Ethernet encapsulation.
RFC\ 1700 for the IEEE 802.3 IP encapsulation.
The
.I <linux/if_ether.h>
include file for physical layer protocols.


@ -1,8 +1,278 @@
.TH RAW 7 2008-08-07 "Linux" "Linux Programmer's Manual"
'\" t
.\" Don't change the first line, it tells man that we need tbl.
.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
.\" Permission is granted to distribute possibly modified copies
.\" of this page provided the header is included verbatim,
.\" and in case of nontrivial modification author and date
.\" of the modification is added to the header.
.\" $Id: raw.7,v 1.6 1999/06/05 10:32:08 freitag Exp $
.TH RAW 7 1998-10-02 "Linux" "Linux Programmer's Manual"
.SH NAME
raw, SOCK_RAW \- Linux IPv4 raw sockets
.SH SYNOPSIS
.B #include <sys/socket.h>
.br
.B #include <netinet/in.h>
.br
.BI "raw_socket = socket(PF_INET, SOCK_RAW, int " protocol );
.SH DESCRIPTION
Raw sockets allow new IPv4 protocols to be implemented in user space.
A raw socket receives or sends the raw datagram not
including link level headers.
The IPv4 layer generates an IP header when sending a packet unless the
.B IP_HDRINCL
socket option is enabled on the socket.
When it is enabled, the packet must contain an IP header.
When receiving, the IP header is always included in the packet.
Only processes with an effective user ID of 0 or the
.B CAP_NET_RAW
capability are allowed to open raw sockets.
All packets or errors matching the
.I protocol
number specified
for the raw socket are passed to this socket.
For a list of the allowed protocols see RFC\ 1700 assigned numbers and
.BR getprotobyname (3).
A protocol of
.B IPPROTO_RAW
implies enabled
.B IP_HDRINCL
and is able to send any IP protocol that is specified in the passed
header.
Receiving of all IP protocols via
.B IPPROTO_RAW
is not possible using raw sockets.
.RS
.TS
tab(:) allbox;
c s
l l.
IP Header fields modified on sending by \fBIP_HDRINCL\fP
IP Checksum:Always filled in.
Source Address:Filled in when zero.
Packet Id:Filled in when zero.
Total Length:Always filled in.
.TE
.RE
.sp
.PP
If
.B IP_HDRINCL
is specified and the IP header has a non-zero destination address then
the destination address of the socket is used to route the packet.
When
.B MSG_DONTROUTE
is specified the destination address should refer to a local interface,
otherwise a routing table lookup is done anyway but gatewayed routes
are ignored.
If
.B IP_HDRINCL
isn't set then IP header options can be set on raw sockets with
.BR setsockopt (2);
see
.BR ip (7)
for more information.
In Linux 2.2 all IP header fields and options can be set using
IP socket options.
This means raw sockets are usually only needed for new
protocols or protocols with no user interface (like ICMP).
When a packet is received, it is passed to any raw sockets which have
been bound to its protocol before it is passed to other protocol handlers
(e.g., kernel protocol modules).
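.PP
Because the kernel fills in several header fields when
.B IP_HDRINCL
is enabled (see the table above), a sender can leave them zero.
The following sketch prepares such a header.
The function name
.I hdrincl_header
and the TTL value of 64 are assumptions made for the example only.
.in +4n
.nf
```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/ip.h>       /* struct iphdr */
#include <string.h>

/* Hypothetical helper: minimal IPv4 header without options for use
   with IP_HDRINCL.  daddr must be in network byte order. */
struct iphdr
hdrincl_header(in_addr_t daddr, unsigned char protocol)
{
    struct iphdr ip;

    memset(&ip, 0, sizeof(ip));
    ip.version = 4;
    ip.ihl = sizeof(ip) / 4;  /* header length in 32-bit words */
    ip.ttl = 64;              /* assumed default, not mandated */
    ip.protocol = protocol;
    ip.daddr = daddr;
    /* check and tot_len are always filled in by the kernel;
       saddr and id are filled in because they are left zero. */
    return ip;
}
```
.fi
.in
The header would be placed at the start of the buffer handed to
.BR sendto (2),
followed by the payload of the chosen protocol.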
.SS Address Format
Raw sockets use the standard
.I sockaddr_in
address structure defined in
.BR ip (7).
The
.I sin_port
field could be used to specify the IP protocol number,
but it is ignored for sending in Linux 2.2 and should always be
set to 0 (see BUGS).
For incoming packets
.I sin_port
is set to the protocol of the packet.
See the
.I <netinet/in.h>
include file for valid IP protocols.
.SS Socket Options
Raw socket options can be set with
.BR setsockopt (2)
and read with
.BR getsockopt (2)
by passing the
.B IPPROTO_RAW
.\" Or SOL_RAW on Linux
family flag.
.TP
.B ICMP_FILTER
Enable a special filter for raw sockets bound to the
.B IPPROTO_ICMP
protocol.
The value has a bit set for each ICMP message type which
should be filtered out.
The default is to filter no ICMP messages.
.PP
In addition all
.BR ip (7)
.B IPPROTO_IP
socket options valid for datagram sockets are supported.
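.PP
As a sketch of how an
.B ICMP_FILTER
value is built, the following helper keeps a single ICMP message type
and filters out all others that the 32-bit mask can describe.
The function names are hypothetical.
The example assumes that
.I <linux/icmp.h>
provides
.I struct icmp_filter
and that
.B SOL_RAW
(which has the same value as
.BR IPPROTO_RAW )
is available from
.IR <sys/socket.h> .
.in +4n
.nf
```c
#include <sys/socket.h>
#include <linux/icmp.h>       /* ICMP_FILTER, struct icmp_filter */

/* Hypothetical helper: filter out every ICMP type from 0 to 31
   except keep_type.  A set bit means "filter this type out". */
struct icmp_filter
icmp_filter_only(unsigned int keep_type)
{
    struct icmp_filter filt;

    filt.data = 0xFFFFFFFFu;
    if (keep_type < 32)
        filt.data &= ~(1u << keep_type);
    return filt;
}

/* Hypothetical helper: install the filter on a raw ICMP socket.
   Returns 0 on success, -1 on error. */
int
set_icmp_filter(int fd, const struct icmp_filter *filt)
{
    return setsockopt(fd, SOL_RAW, ICMP_FILTER, filt, sizeof(*filt));
}
```
.fi
.in
A program interested only in echo replies could pass
.B ICMP_ECHOREPLY
as
.IR keep_type .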
.SS Error Handling
Errors originating from the network are only passed to the user when the
socket is connected or the
.B IP_RECVERR
flag is enabled.
For connected sockets only
.B EMSGSIZE
and
.B EPROTO
are passed for compatibility.
With
.B IP_RECVERR
all network errors are saved in the error queue.
.SH ERRORS
.TP
.B EACCES
User tried to send to a broadcast address without having the
broadcast flag set on the socket.
.TP
.B EFAULT
An invalid memory address was supplied.
.TP
.B EINVAL
Invalid argument.
.TP
.B EMSGSIZE
Packet too big.
Either Path MTU Discovery is enabled (the
.B IP_MTU_DISCOVER
socket flag) or the packet size exceeds the maximum allowed IPv4
packet size of 64KB.
.TP
.B EOPNOTSUPP
Invalid flag has been passed to a socket call (like
.BR MSG_OOB ).
.TP
.B EPERM
The user doesn't have permission to open raw sockets.
Only processes with an effective user ID of 0 or the
.B CAP_NET_RAW
capability may do that.
.TP
.B EPROTO
An ICMP error has arrived reporting a parameter problem.
.SH VERSIONS
.B IP_RECVERR
and
.B ICMP_FILTER
are new in Linux 2.2.
They are Linux extensions and should not be used in portable programs.
Linux 2.0 enabled some bug-to-bug compatibility with BSD in the
raw socket code when the
.B SO_BSDCOMPAT
socket option was set \(em since Linux 2.2,
this option no longer has that effect.
.SH NOTES
By default raw sockets do path MTU (Maximum Transmission Unit) discovery.
This means the kernel
will keep track of the MTU to a specific target IP address and return
.B EMSGSIZE
when a raw packet write exceeds it.
When this happens the application should decrease the packet size.
Path MTU discovery can also be turned off using the
.B IP_MTU_DISCOVER
socket option or the
.I ip_no_pmtu_disc
sysctl, see
.BR ip (7)
for details.
When turned off raw sockets will fragment outgoing packets
that exceed the interface MTU.
However disabling it is not recommended
for performance and reliability reasons.
A raw socket can be bound to a specific local address using the
.BR bind (2)
call.
If it isn't bound all packets with the specified IP protocol are received.
In addition a RAW socket can be bound to a specific network device using
.BR SO_BINDTODEVICE ;
see
.BR socket (7).
An
.B IPPROTO_RAW
socket is send only.
If you really want to receive all IP packets use a
.BR packet (7)
socket with the
.B ETH_P_IP
protocol.
Note that packet sockets don't reassemble IP fragments,
unlike raw sockets.
If you want to receive all ICMP packets for a datagram socket
it is often better to use
.B IP_RECVERR
on that particular socket; see
.BR ip (7).
Raw sockets may tap all IP protocols in Linux, even
protocols like ICMP or TCP which have a protocol module in the kernel.
In this case the packets are passed to both the kernel module and the raw
socket(s).
This should not be relied upon in portable programs; many other BSD
socket implementations have limitations here.
Linux never changes headers passed from the user (except for filling
in some zeroed fields as described for
.BR IP_HDRINCL ).
This differs from many other implementations of raw sockets.
RAW sockets are generally rather unportable and should be avoided in
programs intended to be portable.
Sending on raw sockets should take the IP protocol from
.IR sin_port ;
this ability was lost in Linux 2.2.
The workaround is to use
.BR IP_HDRINCL .
.SH BUGS
Transparent proxy extensions are not described.
When the
.B IP_HDRINCL
option is set datagrams will not be fragmented and are limited to
the interface MTU.
Setting the IP protocol for sending in
.I sin_port
got lost in Linux 2.2.
The protocol that the socket was bound to or that
was specified in the initial
.BR socket (2)
call is always used.
.\" .SH AUTHORS
.\" This man page was written by Andi Kleen.
.SH "SEE ALSO"
.BR recvmsg (2),
.BR sendmsg (2),
.BR capabilities (7),
.BR ip (7),
.BR socket (7)
.PP
.B RFC\ 1191
for path MTU discovery.
.B RFC\ 791
and the
.I <linux/ip.h>
include file for the IP protocol.

View File

@ -1,8 +1,449 @@
'\" t
.\" Don't remove the line above, it tells man that tbl is needed.
.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
.\" Permission is granted to distribute possibly modified copies
.\" of this page provided the header is included verbatim,
.\" and in case of nontrivial modification author and date
.\" of the modification is added to the header.
.\" Based on the original comments from Alexey Kuznetsov, written with
.\" help from Matthew Wilcox.
.\" $Id: rtnetlink.7,v 1.8 2000/01/22 01:55:04 freitag Exp $
.TH RTNETLINK 7 1999-04-30 "Linux" "Linux Programmer's Manual"
.SH NAME
rtnetlink, NETLINK_ROUTE \- Linux IPv4 routing socket
.SH SYNOPSIS
.B #include <asm/types.h>
.br
.B #include <linux/netlink.h>
.br
.B #include <linux/rtnetlink.h>
.br
.B #include <sys/socket.h>
.sp
.BI "rtnetlink_socket = socket(PF_NETLINK, int " socket_type ", NETLINK_ROUTE);"
.SH DESCRIPTION
Rtnetlink allows the kernel's routing tables to be read and altered.
It is used within the kernel to communicate between
various subsystems, though this usage is not documented here, and for
communication with user-space programs.
Network routes, IP addresses, link parameters, neighbor setups, queueing
disciplines, traffic classes and packet classifiers may all be controlled
through
.B NETLINK_ROUTE
sockets.
It is based on netlink messages; see
.BR netlink (7)
for more information.
.\" FIXME ? all these macros could be moved to rtnetlink(3)
.SS "Routing Attributes"
Some rtnetlink messages have optional attributes after the initial header:
.in +4n
.nf
struct rtattr {
    unsigned short rta_len;    /* Length of option */
    unsigned short rta_type;   /* Type of option */
    /* Data follows */
};
.fi
.in
These attributes should be manipulated only using the RTA_* macros
or libnetlink; see
.BR rtnetlink (3).
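.PP
The following sketch shows one way to append attributes to a buffer
and to search a sequence of attributes using only the RTA_* macros.
The helper names
.I rta_append
and
.I rta_find
are hypothetical, and the buffer is assumed to be suitably aligned for a
.IR "struct rtattr" .
.in +4n
.nf
```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: append one attribute at offset *len of buf.
   Returns 0 on success, -1 if the attribute does not fit. */
int
rta_append(void *buf, size_t bufsize, size_t *len,
           unsigned short type, const void *data, size_t datalen)
{
    struct rtattr *rta;

    if (*len + RTA_SPACE(datalen) > bufsize)
        return -1;
    rta = (struct rtattr *) ((char *) buf + *len);
    memset(rta, 0, RTA_SPACE(datalen));    /* also clears padding */
    rta->rta_type = type;
    rta->rta_len = RTA_LENGTH(datalen);    /* header plus payload */
    memcpy(RTA_DATA(rta), data, datalen);
    *len += RTA_SPACE(datalen);            /* keep next one aligned */
    return 0;
}

/* Hypothetical helper: return the first attribute of the given type
   in the len bytes starting at buf, or NULL if there is none. */
struct rtattr *
rta_find(void *buf, size_t len, unsigned short type)
{
    struct rtattr *rta = buf;
    int remaining = (int) len;

    for (; RTA_OK(rta, remaining); rta = RTA_NEXT(rta, remaining))
        if (rta->rta_type == type)
            return rta;
    return NULL;
}
```
.fi
.in
The payload of a found attribute is reached with
.BR RTA_DATA ()
and its size with
.BR RTA_PAYLOAD ().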
.SS Messages
Rtnetlink consists of these message types
(in addition to standard netlink messages):
.TP
.BR RTM_NEWLINK ", " RTM_DELLINK ", " RTM_GETLINK
Create, remove or get information about a specific network interface.
These messages contain an
.I ifinfomsg
structure followed by a series of
.I rtattr
structures.
.nf
struct ifinfomsg {
    unsigned char  ifi_family; /* AF_UNSPEC */
    unsigned short ifi_type;   /* Device type */
    int            ifi_index;  /* Interface index */
    unsigned int   ifi_flags;  /* Device flags */
    unsigned int   ifi_change; /* change mask */
};
.fi
.\" FIXME ifi_type
.I ifi_flags
contains the device flags, see
.BR netdevice (7);
.I ifi_index
is the unique interface index,
.I ifi_change
is reserved for future use and should always be set to 0xFFFFFFFF.
.TS
tab(:);
c
l l l.
Routing attributes
rta_type:value type:description
_
IFLA_UNSPEC:-:unspecified.
IFLA_ADDRESS:hardware address:interface L2 address
IFLA_BROADCAST:hardware address:L2 broadcast address.
IFLA_IFNAME:asciiz string:Device name.
IFLA_MTU:unsigned int:MTU of the device.
IFLA_LINK:int:Link type.
IFLA_QDISC:asciiz string:Queueing discipline.
IFLA_STATS:T{
see below
T}:Interface Statistics.
.TE
.sp
The value type for IFLA_STATS is \fIstruct net_device_stats\fP.
.TP
.BR RTM_NEWADDR ", " RTM_DELADDR ", " RTM_GETADDR
Add, remove or receive information about an IP address associated with
an interface.
In Linux 2.2 an interface can carry multiple IP addresses,
this replaces the alias device concept in 2.0.
In Linux 2.2 these messages
support IPv4 and IPv6 addresses.
They contain an
.I ifaddrmsg
structure, optionally followed by
.I rtattr
routing attributes.
.nf
struct ifaddrmsg {
    unsigned char ifa_family;    /* Address type */
    unsigned char ifa_prefixlen; /* Prefixlength of address */
    unsigned char ifa_flags;     /* Address flags */
    unsigned char ifa_scope;     /* Address scope */
    int           ifa_index;     /* Interface index */
};
.fi
.I ifa_family
is the address family type (currently
.B AF_INET
or
.BR AF_INET6 ),
.I ifa_prefixlen
is the length of the address mask of the address if defined for the
family (like for IPv4),
.I ifa_scope
is the address scope,
.I ifa_index
is the interface index of the interface the address is associated with.
.I ifa_flags
is a flag word of
.B IFA_F_SECONDARY
for secondary address (old alias interface),
.B IFA_F_PERMANENT
for a permanent address set by the user and other undocumented flags.
.TS
tab(:);
c
l l l.
Attributes
rta_type:value type:description
_
IFA_UNSPEC:-:unspecified.
IFA_ADDRESS:raw protocol address:interface address
IFA_LOCAL:raw protocol address:local address
IFA_LABEL:asciiz string:name of the interface
IFA_BROADCAST:raw protocol address:broadcast address.
IFA_ANYCAST:raw protocol address:anycast address
IFA_CACHEINFO:struct ifa_cacheinfo:Address information.
.TE
.\" FIXME struct ifa_cacheinfo
.TP
.BR RTM_NEWROUTE ", " RTM_DELROUTE ", " RTM_GETROUTE
Create, remove or receive information about a network route.
These messages contain an
.I rtmsg
structure with an optional sequence of
.I rtattr
structures following.
For
.B RTM_GETROUTE
setting
.I rtm_dst_len
and
.I rtm_src_len
to 0 means you get all entries for the specified routing table.
For the other fields except
.I rtm_table
and
.I rtm_protocol
0 is the wildcard.
.nf
struct rtmsg {
    unsigned char rtm_family;   /* Address family of route */
    unsigned char rtm_dst_len;  /* Length of destination */
    unsigned char rtm_src_len;  /* Length of source */
    unsigned char rtm_tos;      /* TOS filter */
    unsigned char rtm_table;    /* Routing table ID */
    unsigned char rtm_protocol; /* Routing protocol; see below */
    unsigned char rtm_scope;    /* See below */
    unsigned char rtm_type;     /* See below */
    unsigned int  rtm_flags;
};
.fi
.TS
tab(:);
l l.
rtm_type:Route type
_
RTN_UNSPEC:unknown route
RTN_UNICAST:a gateway or direct route
RTN_LOCAL:a local interface route
RTN_BROADCAST:T{
a local broadcast route (sent as a broadcast)
T}
RTN_ANYCAST:T{
a local broadcast route (sent as a unicast)
T}
RTN_MULTICAST:a multicast route
RTN_BLACKHOLE:a packet dropping route
RTN_UNREACHABLE:an unreachable destination
RTN_PROHIBIT:a packet rejection route
RTN_THROW:continue routing lookup in another table
RTN_NAT:a network address translation rule
RTN_XRESOLVE:T{
refer to an external resolver (not implemented)
T}
.TE
.TS
tab(:);
l l.
rtm_protocol:Route origin.
_
RTPROT_UNSPEC:unknown
RTPROT_REDIRECT:T{
by an ICMP redirect (currently unused)
T}
RTPROT_KERNEL:by the kernel
RTPROT_BOOT:during boot
RTPROT_STATIC:by the administrator
.TE
Values larger than
.B RTPROT_STATIC
are not interpreted by the kernel, they are just for user information.
They may be used to tag the source of a routing information or to
distinguish between multiple routing daemons.
See
.I <linux/rtnetlink.h>
for the routing daemon identifiers which are already assigned.
.I rtm_scope
is the distance to the destination:
.TS
tab(:);
l l.
RT_SCOPE_UNIVERSE:global route
RT_SCOPE_SITE:T{
interior route in the local autonomous system
T}
RT_SCOPE_LINK:route on this link
RT_SCOPE_HOST:route on the local host
RT_SCOPE_NOWHERE:destination doesn't exist
.TE
The values between
.B RT_SCOPE_UNIVERSE
and
.B RT_SCOPE_SITE
are available to the user.
The
.I rtm_flags
have the following meanings:
.TS
tab(:);
l l.
RTM_F_NOTIFY:T{
if the route changes, notify the user via rtnetlink
T}
RTM_F_CLONED:route is cloned from another route
RTM_F_EQUALIZE:a multipath equalizer (not yet implemented)
.TE
.I rtm_table
specifies the routing table
.TS
tab(:);
l l.
RT_TABLE_UNSPEC:an unspecified routing table
RT_TABLE_DEFAULT:the default table
RT_TABLE_MAIN:the main table
RT_TABLE_LOCAL:the local table
.TE
The user may assign arbitrary values between
.B RT_TABLE_UNSPEC
and
.BR RT_TABLE_DEFAULT .
.TS
tab(:);
c
l l l.
Attributes
rta_type:value type:description
_
RTA_UNSPEC:-:ignored.
RTA_DST:protocol address:Route destination address.
RTA_SRC:protocol address:Route source address.
RTA_IIF:int:Input interface index.
RTA_OIF:int:Output interface index.
RTA_GATEWAY:protocol address:The gateway of the route
RTA_PRIORITY:int:Priority of route.
RTA_PREFSRC::
RTA_METRICS:int:Route metric
RTA_MULTIPATH::
RTA_PROTOINFO::
RTA_FLOW::
RTA_CACHEINFO::
.TE
.B Fill these values in!
.TP
.BR RTM_NEWNEIGH ", " RTM_DELNEIGH ", " RTM_GETNEIGH
Add, remove or receive information about a neighbor table
entry (e.g., an ARP entry).
The message contains an
.I ndmsg
structure.
.nf
struct ndmsg {
    unsigned char ndm_family;
    int           ndm_ifindex;  /* Interface index */
    __u16         ndm_state;    /* State */
    __u8          ndm_flags;    /* Flags */
    __u8          ndm_type;
};

struct nda_cacheinfo {
    __u32         ndm_confirmed;
    __u32         ndm_used;
    __u32         ndm_updated;
    __u32         ndm_refcnt;
};
.fi
.I ndm_state
is a bit mask of the following states:
.TS
tab(:);
l l.
NUD_INCOMPLETE:a currently resolving cache entry
NUD_REACHABLE:a confirmed working cache entry
NUD_STALE:an expired cache entry
NUD_DELAY:an entry waiting for a timer
NUD_PROBE:a cache entry that is currently reprobed
NUD_FAILED:an invalid cache entry
NUD_NOARP:a device with no destination cache
NUD_PERMANENT:a static entry
.TE
Valid
.I ndm_flags
are:
.TS
tab(:);
l l.
NTF_PROXY:a proxy arp entry
NTF_ROUTER:an IPv6 router
.TE
.\" FIXME
.\" document the members of the struct better
The
.I rtattr
struct has the following meanings for the
.I rta_type
field:
.TS
tab(:);
l l.
NDA_UNSPEC:unknown type
NDA_DST:a neighbor cache network layer destination address
NDA_LLADDR:a neighbor cache link layer address
NDA_CACHEINFO:cache statistics.
.TE
If the
.I rta_type
field is
.B NDA_CACHEINFO
then a
.I struct nda_cacheinfo
header follows.
.TP
.BR RTM_NEWRULE ", " RTM_DELRULE ", " RTM_GETRULE
Add, delete or retrieve a routing rule.
Carries a
.IR "struct rtmsg" .
.TP
.BR RTM_NEWQDISC ", " RTM_DELQDISC ", " RTM_GETQDISC
Add, remove or get a queueing discipline.
The message contains a
.I struct tcmsg
and may be followed by a series of
attributes.
.nf
struct tcmsg {
    unsigned char tcm_family;
    int           tcm_ifindex;  /* interface index */
    __u32         tcm_handle;   /* Qdisc handle */
    __u32         tcm_parent;   /* Parent qdisc */
    __u32         tcm_info;
};
.fi
.TS
tab(:);
c
l l l.
Attributes
rta_type:value type:Description
_
TCA_UNSPEC:-:unspecified
TCA_KIND:asciiz string:Name of queueing discipline
TCA_OPTIONS:byte sequence:Qdisc-specific options follow
TCA_STATS:struct tc_stats:Qdisc statistics.
TCA_XSTATS:qdisc specific:Module-specific statistics.
TCA_RATE:struct tc_estimator:Rate limit.
.TE
In addition various other qdisc module specific attributes are allowed.
For more information see the appropriate include files.
.TP
.BR RTM_NEWTCLASS ", " RTM_DELTCLASS ", " RTM_GETTCLASS
Add, remove or get a traffic class.
These messages contain a
.I struct tcmsg
as described above.
.TP
.BR RTM_NEWTFILTER ", " RTM_DELTFILTER ", " RTM_GETTFILTER
Add, remove or receive information about a traffic filter.
These messages contain a
.I struct tcmsg
as described above.
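.PP
As a sketch of how the message layouts above are used, the following
code builds an
.B RTM_GETLINK
request that asks for information about all interfaces.
The structure and function names are hypothetical; the
.B NLM_F_REQUEST
and
.B NLM_F_DUMP
flags and the
.I nlmsghdr
header are taken from
.BR netlink (7),
not from this page.
.in +4n
.nf
```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout: netlink header followed by the
   ifinfomsg structure that RTM_GETLINK messages carry. */
struct link_dump_req {
    struct nlmsghdr  nlh;
    struct ifinfomsg ifi;
};

/* Hypothetical helper: build a request that dumps all interfaces. */
struct link_dump_req
make_link_dump_req(unsigned int seq)
{
    struct link_dump_req req;

    memset(&req, 0, sizeof(req));
    req.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req.ifi));
    req.nlh.nlmsg_type = RTM_GETLINK;
    req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req.nlh.nlmsg_seq = seq;
    req.ifi.ifi_family = AF_UNSPEC;
    req.ifi.ifi_change = 0xFFFFFFFF;       /* as required above */
    return req;
}
```
.fi
.in
The request would be written with
.BR send (2)
to a socket created as shown in SYNOPSIS; see
.BR netlink (7)
for how the replies are framed.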
.SH VERSIONS
.B rtnetlink
is a new feature of Linux 2.2.
.SH BUGS
This manual page is incomplete.
.SH "SEE ALSO"
.BR cmsg (3),
.BR rtnetlink (3),
.BR ip (7),
.BR netlink (7)

View File

@ -1,8 +1,728 @@
.TH SOCKET 7 2008-08-07 Linux "Linux Programmer's Manual"
'\" t
.\" Don't change the first line, it tells man that we need tbl.
.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
.\" and copyright (c) 1999 Matthew Wilcox.
.\" Permission is granted to distribute possibly modified copies
.\" of this page provided the header is included verbatim,
.\" and in case of nontrivial modification author and date
.\" of the modification is added to the header.
.\"
.\" 2002-10-30, Michael Kerrisk, <mtk.manpages@gmail.com>
.\" Added description of SO_ACCEPTCONN
.\" 2004-05-20, aeb, added SO_RCVTIMEO/SO_SNDTIMEO text.
.\" Modified, 27 May 2004, Michael Kerrisk <mtk.manpages@gmail.com>
.\" Added notes on capability requirements
.\" A few small grammar fixes
.\"
.\" FIXME probably all PF_* should be AF_* in this page, since
.\" POSIX only specifies the latter values.
.\"
.TH SOCKET 7 2007-12-28 Linux "Linux Programmer's Manual"
.SH NAME
socket \- Linux socket interface
.SH SYNOPSIS
.B #include <sys/socket.h>
.sp
.IB mysocket " = socket(int " socket_family ", int " socket_type ", int " protocol );
.SH DESCRIPTION
This manual page describes the Linux networking socket layer user
interface.
The BSD compatible sockets
are the uniform interface
between the user process and the network protocol stacks in the kernel.
The protocol modules are grouped into
.I protocol families
like
.BR PF_INET ", " PF_IPX ", " PF_PACKET
and
.I socket types
like
.B SOCK_STREAM
or
.BR SOCK_DGRAM .
See
.BR socket (2)
for more information on families and types.
.SS Socket Layer Functions
These functions are used by the user process to send or receive packets
and to do other socket operations.
For more information see their respective manual pages.
.BR socket (2)
creates a socket,
.BR connect (2)
connects a socket to a remote socket address,
the
.BR bind (2)
function binds a socket to a local socket address,
.BR listen (2)
tells the socket that new connections shall be accepted, and
.BR accept (2)
is used to get a new socket with a new incoming connection.
.BR socketpair (2)
returns two connected anonymous sockets (only implemented for a few
local families like
.BR PF_UNIX ).
.PP
.BR send (2),
.BR sendto (2),
and
.BR sendmsg (2)
send data over a socket, and
.BR recv (2),
.BR recvfrom (2),
and
.BR recvmsg (2)
receive data from a socket.
.BR poll (2)
and
.BR select (2)
wait for arriving data or a readiness to send data.
In addition, the standard I/O operations like
.BR write (2),
.BR writev (2),
.BR sendfile (2),
.BR read (2),
and
.BR readv (2)
can be used to read and write data.
.PP
.BR getsockname (2)
returns the local socket address and
.BR getpeername (2)
returns the remote socket address.
.BR getsockopt (2)
and
.BR setsockopt (2)
are used to set or get socket layer or protocol options.
.BR ioctl (2)
can be used to set or read some other options.
.PP
.BR close (2)
is used to close a socket.
.BR shutdown (2)
closes parts of a full-duplex socket connection.
.PP
Seeking, or calling
.BR pread (2)
or
.BR pwrite (2)
with a non-zero position is not supported on sockets.
.PP
It is possible to do non-blocking I/O on sockets by setting the
.B O_NONBLOCK
flag on a socket file descriptor using
.BR fcntl (2).
Then all operations that would block will (usually)
return with
.B EAGAIN
(operation should be retried later);
.BR connect (2)
will return
.B EINPROGRESS
error.
The user can then wait for various events via
.BR poll (2)
or
.BR select (2).
.TS
tab(:) allbox;
c s s
l l l.
I/O events
Event:Poll flag:Occurrence
Read:POLLIN:T{
New data arrived.
T}
Read:POLLIN:T{
A connection setup has been completed
(for connection-oriented sockets)
T}
Read:POLLHUP:T{
A disconnection request has been initiated by the other end.
T}
Read:POLLHUP:T{
A connection is broken (only for connection-oriented protocols).
When the socket is written to,
.B SIGPIPE
is also sent.
T}
Write:POLLOUT:T{
Socket has enough send buffer space for writing new data.
T}
Read/Write:T{
POLLIN|
.br
POLLOUT
T}:T{
An outgoing
.BR connect (2)
finished.
T}
Read/Write:POLLERR:An asynchronous error occurred.
Read/Write:POLLHUP:The other end has shut down one direction.
Exception:POLLPRI:T{
Urgent data arrived.
.B SIGURG
is sent then.
T}
.\" FIXME . The following is not true currently:
.\" It is no I/O event when the connection
.\" is broken from the local end using
.\" .BR shutdown (2)
.\" or
.\" .BR close (2).
.TE
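.PP
The non-blocking behavior and the POLLIN row of the table above can be
sketched as follows, using a local socket pair so that no network is needed.
The helper names set_nonblocking and nonblocking_poll_demo are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Returns 0 if every step behaves as described, -1 otherwise. */
int nonblocking_poll_demo(void)
{
    int sv[2], rc = -1;
    char c = 0;
    struct pollfd pfd;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    if (set_nonblocking(sv[0]) == -1)
        goto out;

    /* Nothing to read yet: the call fails at once instead of blocking. */
    if (recv(sv[0], &c, 1, 0) != -1 ||
        (errno != EAGAIN && errno != EWOULDBLOCK))
        goto out;

    pfd.fd = sv[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, 0) != 0)          /* no event is pending */
        goto out;

    if (send(sv[1], "x", 1, 0) != 1)    /* new data arrives */
        goto out;
    if (poll(&pfd, 1, 1000) != 1 || !(pfd.revents & POLLIN))
        goto out;

    if (recv(sv[0], &c, 1, 0) == 1 && c == 'x')
        rc = 0;
out:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```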
.PP
An alternative to
.BR poll (2)
and
.BR select (2)
is to let the kernel inform the application about events
via a
.B SIGIO
signal.
For that the
.B O_ASYNC
flag must be set on a socket file descriptor via
.BR fcntl (2)
and a valid signal handler for
.B SIGIO
must be installed via
.BR sigaction (2).
See the
.I Signals
discussion below.
.SS Socket Options
These socket options can be set by using
.BR setsockopt (2)
and read with
.BR getsockopt (2)
with the socket level set to
.B SOL_SOCKET
for all sockets:
.\" SO_ACCEPTCONN is in POSIX.1-2001, and its origin is explained in
.\" W R Stevens, UNPv1
.TP
.B SO_ACCEPTCONN
Returns a value indicating whether or not this socket has been marked
to accept connections with
.BR listen (2).
The value 0 indicates that this is not a listening socket;
the value 1 indicates that this is a listening socket.
Can only be read
with
.BR getsockopt (2).
.TP
.B SO_BINDTODEVICE
Bind this socket to a particular device like \(lqeth0\(rq,
as specified in the passed interface name.
If the
name is an empty string or the option length is zero, the socket device
binding is removed.
The passed option is a variable-length null-terminated
interface name string with the maximum size of
.BR IFNAMSIZ .
If a socket is bound to an interface,
only packets received from that particular interface are processed by the
socket.
Note that this only works for some socket types, particularly
.B AF_INET
sockets.
It is not supported for packet sockets (use normal
.BR bind (2)
there).
.TP
.B SO_BROADCAST
Set or get the broadcast flag.
When enabled, datagram sockets
receive packets sent to a broadcast address and they are allowed to send
packets to a broadcast address.
This option has no effect on stream-oriented sockets.
.TP
.B SO_BSDCOMPAT
Enable BSD bug-to-bug compatibility.
This is used by the UDP protocol module in Linux 2.0 and 2.2.
If enabled, ICMP errors received for a UDP socket will not be passed
to the user program.
In later kernel versions, support for this option has been phased out:
Linux 2.4 silently ignores it, and Linux 2.6 generates a kernel warning
(printk()) if a program uses this option.
Linux 2.0 also enabled BSD bug-to-bug compatibility
options (random header changing, skipping of the broadcast flag) for raw
sockets with this option, but that was removed in Linux 2.2.
.TP
.B SO_DEBUG
Enable socket debugging.
Only allowed for processes with the
.B CAP_NET_ADMIN
capability or an effective user ID of 0.
.TP
.B SO_ERROR
Get and clear the pending socket error.
Only valid as a
.BR getsockopt (2).
Expects an integer.
.TP
.B SO_DONTROUTE
Don't send via a gateway, only send to directly connected hosts.
The same effect can be achieved by setting the
.B MSG_DONTROUTE
flag on a socket
.BR send (2)
operation.
Expects an integer boolean flag.
.TP
.B SO_KEEPALIVE
Enable sending of keep-alive messages on connection-oriented sockets.
Expects an integer boolean flag.
.TP
.B SO_LINGER
Sets or gets the
.B SO_LINGER
option.
The argument is a
.I linger
structure.
.sp
.in +4n
.nf
struct linger {
int l_onoff; /* linger active */
int l_linger; /* how many seconds to linger for */
};
.fi
.in
.IP
When enabled, a
.BR close (2)
or
.BR shutdown (2)
will not return until all queued messages for the socket have been
successfully sent or the linger timeout has been reached.
Otherwise,
the call returns immediately and the closing is done in the background.
When the socket is closed as part of
.BR exit (2),
it always lingers in the background.
.TP
.B SO_OOBINLINE
If this option is enabled,
out-of-band data is directly placed into the receive data stream.
Otherwise, out-of-band data is only passed when the
.B MSG_OOB
flag is set during receiving.
.\" don't document it because it can do too much harm.
.\".B SO_NO_CHECK
.TP
.B SO_PASSCRED
Enable or disable the receiving of the
.B SCM_CREDENTIALS
control message.
For more information see
.BR unix (7).
.\" FIXME Document SO_PASSSEC, added in 2.6.18; there is some info
.\" in the 2.6.18 ChangeLog
.TP
.B SO_PEERCRED
Return the credentials of the foreign process connected to this socket.
This is only possible for connected
.B PF_UNIX
stream sockets and
.B PF_UNIX
stream and datagram socket pairs created using
.BR socketpair (2);
see
.BR unix (7).
The returned credentials are those that were in effect at the time
of the call to
.BR connect (2)
or
.BR socketpair (2).
Argument is a
.I ucred
structure.
Only valid as a
.BR getsockopt (2).
.TP
.B SO_PRIORITY
Set the protocol-defined priority for all packets to be sent on
this socket.
Linux uses this value to order the networking queues:
packets with a higher priority may be processed first depending
on the selected device queueing discipline.
For
.BR ip (7),
this also sets the IP type-of-service (TOS) field for outgoing packets.
Setting a priority outside the range 0 to 6 requires the
.B CAP_NET_ADMIN
capability.
.TP
.B SO_RCVBUF
Sets or gets the maximum socket receive buffer in bytes.
The kernel doubles this value (to allow space for bookkeeping overhead)
when it is set using
.\" Most (all?) other implementations do not do this -- MTK, Dec 05
.BR setsockopt (2),
and this doubled value is returned by
.BR getsockopt (2).
The default value is set by the
.I rmem_default
sysctl and the maximum allowed value is set by the
.I rmem_max
sysctl.
The minimum (doubled) value for this option is 256.
.TP
.BR SO_RCVBUFFORCE " (since Linux 2.6.14)"
Using this socket option, a privileged
.RB ( CAP_NET_ADMIN )
process can perform the same task as
.BR SO_RCVBUF ,
but the
.I rmem_max
limit can be overridden.
.TP
.BR SO_RCVLOWAT " and " SO_SNDLOWAT
Specify the minimum number of bytes in the buffer until the socket layer
will pass the data to the protocol
.RB ( SO_SNDLOWAT )
or the user on receiving
.RB ( SO_RCVLOWAT ).
These two values are initialized to 1.
.B SO_SNDLOWAT
is not changeable on Linux
.RB ( setsockopt (2)
fails with the error
.BR ENOPROTOOPT ).
.B SO_RCVLOWAT
is changeable
only since Linux 2.4.
The
.BR select (2)
and
.BR poll (2)
system calls currently do not respect the
.B SO_RCVLOWAT
setting on Linux,
and mark a socket readable when even a single byte of data is available.
A subsequent read from the socket will block until
.B SO_RCVLOWAT
bytes are available.
.\" See http://marc.theaimsgroup.com/?l=linux-kernel&m=111049368106984&w=2
.\" Tested on kernel 2.6.14 -- mtk, 30 Nov 05
.TP
.BR SO_RCVTIMEO " and " SO_SNDTIMEO
.\" Not implemented in 2.0.
.\" Implemented in 2.1.11 for getsockopt: always return a zero struct.
.\" Implemented in 2.3.41 for setsockopt, and actually used.
Specify the receiving or sending timeouts until reporting an error.
The argument is a
.IR "struct timeval" .
If an input or output function blocks for this period of time, and
data has been sent or received, the return value of that function
will be the amount of data transferred; if no data has been transferred
and the timeout has been reached then \-1 is returned with
.I errno
set to
.B EAGAIN
or
.B EWOULDBLOCK
.\" in fact to EAGAIN
just as if the socket was specified to be non-blocking.
If the timeout is set to zero (the default)
then the operation will never time out.
Timeouts only have effect for system calls that perform socket I/O (e.g.,
.BR read (2),
.BR recvmsg (2),
.BR send (2),
.BR sendmsg (2));
timeouts have no effect for
.BR select (2),
.BR poll (2),
.BR epoll_wait (2),
etc.
.TP
.B SO_REUSEADDR
Indicates that the rules used in validating addresses supplied in a
.BR bind (2)
call should allow reuse of local addresses.
For
.B PF_INET
sockets this
means that a socket may bind, except when there
is an active listening socket bound to the address.
When the listening socket is bound to
.B INADDR_ANY
with a specific port then it is not possible
to bind to this port for any local address.
Argument is an integer boolean flag.
.TP
.B SO_SNDBUF
Sets or gets the maximum socket send buffer in bytes.
The kernel doubles this value (to allow space for bookkeeping overhead)
when it is set using
.\" Most (all?) other implementations do not do this -- MTK, Dec 05
.BR setsockopt (2),
and this doubled value is returned by
.BR getsockopt (2).
The default value is set by the
.I wmem_default
sysctl and the maximum allowed value is set by the
.I wmem_max
sysctl.
The minimum (doubled) value for this option is 2048.
.TP
.BR SO_SNDBUFFORCE " (since Linux 2.6.14)"
Using this socket option, a privileged
.RB ( CAP_NET_ADMIN )
process can perform the same task as
.BR SO_SNDBUF ,
but the
.I wmem_max
limit can be overridden.
.TP
.B SO_TIMESTAMP
Enable or disable the receiving of the
.B SO_TIMESTAMP
control message.
The timestamp control message is sent with level
.B SOL_SOCKET
and the
.I cmsg_data
field is a
.I "struct timeval"
indicating the
reception time of the last packet passed to the user in this call.
See
.BR cmsg (3)
for details on control messages.
.TP
.B SO_TYPE
Gets the socket type as an integer (like
.BR SOCK_STREAM ).
Can only be read
with
.BR getsockopt (2).
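.PP
As one worked example of the options above, the following sketch sets
SO_RCVTIMEO on one end of a socket pair and then reads from it while no data
is ever sent.
The helper names set_recv_timeout_ms and recv_timeout_errno are hypothetical.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Set a receive timeout of ms milliseconds on fd.
 * Returns 0 on success, -1 on error. */
int set_recv_timeout_ms(int fd, long ms)
{
    struct timeval tv;

    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Call recv() on an idle socket that has a receive timeout.
 * Returns the errno value of the failed call, 0 if recv() unexpectedly
 * succeeded, and -1 on setup error. */
int recv_timeout_errno(long ms)
{
    int sv[2], result = -1;
    char c;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    if (set_recv_timeout_ms(sv[0], ms) == 0) {
        /* No data is sent, so this blocks for roughly ms milliseconds
         * and then fails. */
        if (recv(sv[0], &c, 1, 0) == -1)
            result = errno;
        else
            result = 0;
    }

    close(sv[0]);
    close(sv[1]);
    return result;
}
```

The error reported after the timeout is the same one a non-blocking socket
would report, so existing retry logic can usually be shared.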
.SS Signals
When writing onto a connection-oriented socket that has been shut down
(by the local or the remote end)
.B SIGPIPE
is sent to the writing process and
.B EPIPE
is returned.
The signal is not sent when the write call
specified the
.B MSG_NOSIGNAL
flag.
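.PP
A minimal sketch of the MSG_NOSIGNAL case, assuming a stream socket pair whose
other end has already been closed (the function name send_to_closed_peer is
hypothetical):

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Write to a stream socket whose peer has gone away, with SIGPIPE
 * suppressed.  Returns the errno value of the failed send(), 0 if the
 * send unexpectedly succeeded, and -1 on setup error. */
int send_to_closed_peer(void)
{
    int sv[2], result;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    close(sv[1]);                       /* the other end goes away */

    /* Without MSG_NOSIGNAL this call would also raise SIGPIPE, whose
     * default action terminates the process. */
    if (send(sv[0], "x", 1, MSG_NOSIGNAL) == -1)
        result = errno;
    else
        result = 0;

    close(sv[0]);
    return result;
}
```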
.PP
When requested with the
.B FIOSETOWN
.BR fcntl (2)
or
.B SIOCSPGRP
.BR ioctl (2),
.B SIGIO
is sent when an I/O event occurs.
It is possible to use
.BR poll (2)
or
.BR select (2)
in the signal handler to find out which socket the event occurred on.
An alternative (in Linux 2.2) is to set a real-time signal using the
.B F_SETSIG
.BR fcntl (2);
the handler of the real-time signal will be called with
the file descriptor in the
.I si_fd
field of its
.IR siginfo_t .
See
.BR fcntl (2)
for more information.
.PP
Under some circumstances (e.g., multiple processes accessing a
single socket), the condition that caused the
.B SIGIO
may have already disappeared when the process reacts to the signal.
If this happens, the process should wait again because Linux
will resend the signal later.
.\" .SS Ancillary Messages
.SS Sysctls
The core socket networking sysctls can be accessed using the
.I /proc/sys/net/core/*
files or with the
.BR sysctl (2)
interface.
.TP
.I rmem_default
contains the default setting in bytes of the socket receive buffer.
.TP
.I rmem_max
contains the maximum socket receive buffer size in bytes which a user may
set by using the
.B SO_RCVBUF
socket option.
.TP
.I wmem_default
contains the default setting in bytes of the socket send buffer.
.TP
.I wmem_max
contains the maximum socket send buffer size in bytes which a user may
set by using the
.B SO_SNDBUF
socket option.
.TP
.BR message_cost " and " message_burst
configure the token bucket filter used to load limit warning messages
caused by external network events.
.TP
.I netdev_max_backlog
Maximum number of packets in the global input queue.
.TP
.I optmem_max
Maximum length of ancillary data and user control data like the iovecs
per socket.
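.PP
Reading one of these values through the /proc interface can be sketched as
follows.
The helper name read_sysctl_long is hypothetical, and the sketch assumes that
the file holds a single decimal integer, as rmem_max and wmem_max do.

```c
#include <stdio.h>

/* Read a single integer from a sysctl file such as
 * /proc/sys/net/core/rmem_max.  Returns 0 and stores the value on
 * success; returns -1 if the file cannot be opened or parsed. */
int read_sysctl_long(const char *path, long *value)
{
    FILE *fp = fopen(path, "r");
    int ok;

    if (fp == NULL)
        return -1;
    ok = (fscanf(fp, "%ld", value) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}
```

A program that wants a large receive buffer could read rmem_max first and
limit its SO_RCVBUF request to that value.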
.\" netdev_fastroute is not documented because it is experimental
.SS Ioctls
These operations can be accessed using
.BR ioctl (2):
.in +4n
.nf
.IB error " = ioctl(" ip_socket ", " ioctl_type ", " &value_result ");"
.fi
.in
.TP
.B SIOCGSTAMP
Return a
.I struct timeval
with the receive timestamp of the last packet passed to the user.
This is useful for accurate round trip time measurements.
See
.BR setitimer (2)
for a description of
.IR "struct timeval" .
.\"
This ioctl should only be used if the socket option
.B SO_TIMESTAMP
is not set on the socket.
Otherwise, it returns the timestamp of the
last packet that was received while
.B SO_TIMESTAMP
was not set, or it fails if no such packet has been received
(i.e.,
.BR ioctl (2)
returns \-1 with
.I errno
set to
.BR ENOENT ).
.TP
.B SIOCSPGRP
Set the process or process group to send
.B SIGIO
or
.B SIGURG
signals
to when an
asynchronous I/O operation has finished or urgent data is available.
The argument is a pointer to a
.IR pid_t .
If the argument is positive, send the signals to that process.
If the
argument is negative, send the signals to the process group with the ID
of the absolute value of the argument.
The process may only choose itself or its own process group to receive
signals unless it has the
.B CAP_KILL
capability or an effective UID of 0.
.TP
.B FIOASYNC
Change the
.B O_ASYNC
flag to enable or disable asynchronous I/O mode of the socket.
Asynchronous I/O mode means that the
.B SIGIO
signal or the signal set with
.B F_SETSIG
is raised when a new I/O event occurs.
.IP
Argument is an integer boolean flag.
(This operation is synonymous with the use of
.BR fcntl (2)
to set the
.B O_ASYNC
flag.)
.\"
.TP
.B SIOCGPGRP
Get the current process or process group that receives
.B SIGIO
or
.B SIGURG
signals,
or 0
when none is set.
.PP
Valid
.BR fcntl (2)
operations:
.TP
.B FIOGETOWN
The same as the
.B SIOCGPGRP
.BR ioctl (2).
.TP
.B FIOSETOWN
The same as the
.B SIOCSPGRP
.BR ioctl (2).
.SH VERSIONS
.B SO_BINDTODEVICE
was introduced in Linux 2.0.30.
.B SO_PASSCRED
is new in Linux 2.2.
The sysctls are new in Linux 2.2.
.B SO_RCVTIMEO
and
.B SO_SNDTIMEO
are supported since Linux 2.3.41.
Earlier, timeouts were fixed to
a protocol-specific setting, and could not be read or written.
.SH NOTES
Linux assumes that half of the send/receive buffer is used for internal
kernel structures; thus the sysctls are twice what can be observed
on the wire.
Linux will only allow port re-use with the
.B SO_REUSEADDR
option
when this option was set both in the previous program that performed a
.BR bind (2)
to the port and in the program that wants to re-use the port.
This differs from some implementations (e.g., FreeBSD)
where only the later program needs to set the
.B SO_REUSEADDR
option.
Typically this difference is invisible, since, for example, a server
program is designed to always set this option.
.SH BUGS
The
.B CONFIG_FILTER
socket options
.B SO_ATTACH_FILTER
and
.B SO_DETACH_FILTER
are
not documented.
The suggested interface to use them is via the libpcap
library.
.\" .SH AUTHORS
.\" This man page was written by Andi Kleen.
.SH "SEE ALSO"
.BR getsockopt (2),
.BR setsockopt (2),
.BR socket (2),
.BR capabilities (7),
.BR ddp (7),
.BR ip (7),
.BR packet (7)

@ -1,8 +1,947 @@
.TH TCP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
.\" Permission is granted to distribute possibly modified copies
.\" of this page provided the header is included verbatim,
.\" and in case of nontrivial modification author and date
.\" of the modification is added to the header.
.\"
.\" 2.4 Updates by Nivedita Singhvi 4/20/02 <nivedita@us.ibm.com>.
.\" Modified, 2004-11-11, Michael Kerrisk and Andries Brouwer
.\" Updated details of interaction of TCP_CORK and TCP_NODELAY.
.\"
.\" FIXME 2.6.17-rc1 adds the following /proc files, which need to be
.\" documented: tcp_mtu_probing, tcp_base_mss, and
.\" tcp_workaround_signed_windows
.\"
.TH TCP 7 2007-11-25 "Linux" "Linux Programmer's Manual"
.SH NAME
tcp \- TCP protocol
.SH SYNOPSIS
.B #include <sys/socket.h>
.br
.B #include <netinet/in.h>
.br
.B #include <netinet/tcp.h>
.sp
.B tcp_socket = socket(PF_INET, SOCK_STREAM, 0);
.SH DESCRIPTION
This is an implementation of the TCP protocol defined in
RFC\ 793, RFC\ 1122 and RFC\ 2001 with the NewReno and SACK
extensions.
It provides a reliable, stream-oriented,
full-duplex connection between two sockets on top of
.BR ip (7),
for both v4 and v6 versions.
TCP guarantees that the data arrives in order and
retransmits lost packets.
It generates and checks a per-packet checksum to catch
transmission errors.
TCP does not preserve record boundaries.
A newly created TCP socket has no remote or local address and is not
fully specified.
To create an outgoing TCP connection use
.BR connect (2)
to establish a connection to another TCP socket.
To receive new incoming connections, first
.BR bind (2)
the socket to a local address and port and then call
.BR listen (2)
to put the socket into the listening state.
After that a new
socket for each incoming connection can be accepted
using
.BR accept (2).
A socket which has had
.BR accept (2)
or
.BR connect (2)
successfully called on it is fully specified and may
transmit data.
Data cannot be transmitted on listening or
not yet connected sockets.
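.PP
The sequence described above can be sketched in a single process over the
loopback interface.
This assumes that 127.0.0.1 is usable, and the function name tcp_loopback_demo
is hypothetical; AF_INET is used for the family (it has the same value as
PF_INET).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 if one byte travels from a connect()ed socket to the
 * accept()ed socket over loopback, and -1 otherwise. */
int tcp_loopback_demo(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lfd, cfd, afd = -1, rc = -1;
    char c = 0;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel pick a port */

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd == -1 || cfd == -1)
        goto out;

    /* Passive side: bind to a local address, then listen. */
    if (bind(lfd, (struct sockaddr *) &addr, sizeof(addr)) == -1 ||
        listen(lfd, 1) == -1 ||
        getsockname(lfd, (struct sockaddr *) &addr, &len) == -1)
        goto out;

    /* Active side: connect to the port chosen above.  The handshake
     * is completed by the kernel, so connect() can return before
     * accept() is called. */
    if (connect(cfd, (struct sockaddr *) &addr, sizeof(addr)) == -1)
        goto out;
    afd = accept(lfd, NULL, NULL);
    if (afd == -1)
        goto out;

    /* Both connected sockets are now fully specified. */
    if (send(cfd, "x", 1, 0) == 1 && recv(afd, &c, 1, 0) == 1 && c == 'x')
        rc = 0;
out:
    if (afd != -1)
        close(afd);
    if (cfd != -1)
        close(cfd);
    if (lfd != -1)
        close(lfd);
    return rc;
}
```

A real server would normally call accept(2) in a loop and handle each
returned socket separately from the listening socket.
.PP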
Linux supports RFC\ 1323 TCP high performance
extensions.
These include Protection Against Wrapped
Sequence Numbers (PAWS), Window Scaling and
Timestamps.
Window scaling allows the use
of large (> 64K) TCP windows in order to support links with high
latency or bandwidth.
To make use of them, the send and
receive buffer sizes must be increased.
They can be set globally with the
.I net.ipv4.tcp_wmem
and
.I net.ipv4.tcp_rmem
sysctl variables, or on individual sockets by using the
.B SO_SNDBUF
and
.B SO_RCVBUF
socket options with the
.BR setsockopt (2)
call.
The maximum sizes for socket buffers declared via the
.B SO_SNDBUF
and
.B SO_RCVBUF
mechanisms are limited by the global
.I net.core.rmem_max
and
.I net.core.wmem_max
sysctls.
Note that TCP actually allocates twice the size of
the buffer requested in the
.BR setsockopt (2)
call, and so a succeeding
.BR getsockopt (2)
call will not return the same size of buffer as requested
in the
.BR setsockopt (2)
call.
TCP uses the extra space for administrative purposes and internal
kernel structures, and the sysctl variables reflect the
larger sizes compared to the actual TCP windows.
On individual connections, the socket buffer size must be
set prior to the
.BR listen (2)
or
.BR connect (2)
calls in order to have it take effect.
See
.BR socket (7)
for more information.
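.PP
The buffer-size behavior described above can be observed with a small helper
that sets SO_RCVBUF and reads the value back.
The name set_rcvbuf_and_query is hypothetical; for a TCP socket it would be
called before listen(2) or connect(2).

```c
#include <sys/socket.h>

/* Ask for a receive buffer of requested bytes on fd and return the
 * size that the kernel reports afterwards, or -1 on error.  On Linux
 * the reported size is expected to be larger than the request because
 * of the extra space reserved for bookkeeping. */
int set_rcvbuf_and_query(int fd, int requested)
{
    int actual = 0;
    socklen_t len = sizeof(actual);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &requested, sizeof(requested)) == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) == -1)
        return -1;
    return actual;
}
```

Comparing the returned value with the request shows whether the request was
clamped by the global maximum.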
.PP
TCP supports urgent data.
Urgent data is used to signal the
receiver that some important message is part of the data
stream and that it should be processed as soon as possible.
To send urgent data specify the
.B MSG_OOB
option to
.BR send (2).
When urgent data is received, the kernel sends a
.B SIGURG
signal to the process or process group that has been set as the
socket "owner" using the
.B SIOCSPGRP
or
.B FIOSETOWN
ioctls (or the POSIX.1-2001-specified
.BR fcntl (2)
.B F_SETOWN
operation).
When the
.B SO_OOBINLINE
socket option is enabled, urgent data is put into the normal
data stream (a program can test for its location using the
.B SIOCATMARK
ioctl described below),
otherwise it can be only received when the
.B MSG_OOB
flag is set for
.BR recv (2)
or
.BR recvmsg (2).
Linux 2.4 introduced a number of changes for improved
throughput and scaling, as well as enhanced functionality.
Some of these features include support for zero-copy
.BR sendfile (2),
Explicit Congestion Notification, new
management of TIME_WAIT sockets, keep-alive socket options
and support for Duplicate SACK extensions.
.SS Address Formats
TCP is built on top of IP (see
.BR ip (7)).
The address formats defined by
.BR ip (7)
apply to TCP.
TCP only supports point-to-point
communication; broadcasting and multicasting are not
supported.
.SS Sysctls
These variables can be accessed by the
.I /proc/sys/net/ipv4/*
files or with the
.BR sysctl (2)
interface.
In addition, most IP sysctls also apply to TCP; see
.BR ip (7).
Variables described as
.I Boolean
take an integer value, with a non-zero value ("true") meaning that
the corresponding option is enabled, and a zero value ("false")
meaning that the option is disabled.
.\" FIXME As at Sept 2006, kernel 2.6.18-rc5, the following are
.\" not yet documented (shown with default values):
.\"
.\" /proc/sys/net/ipv4/tcp_congestion_control (since 2.6.13)
.\" bic
.\" /proc/sys/net/ipv4/tcp_moderate_rcvbuf
.\" 1
.\" /proc/sys/net/ipv4/tcp_no_metrics_save
.\" 0
.TP
.BR tcp_abort_on_overflow " (Boolean; default: disabled)"
Enable resetting connections if the listening service is too
slow and unable to keep up and accept them.
With the default (disabled) setting, a connection attempt that overflows
the queue because of a burst is able to recover.
Enable this option
.I only
if you are really sure that the listening daemon
cannot be tuned to accept connections faster.
Enabling this
option can harm the clients of your server.
.TP
.BR tcp_adv_win_scale " (integer; default: 2)"
Count buffering overhead as
.IR "bytes/2^tcp_adv_win_scale" ,
if
.I tcp_adv_win_scale
is greater than 0; or
.IR "bytes-bytes/2^(\-tcp_adv_win_scale)" ,
if
.I tcp_adv_win_scale
is less than or equal to zero.
The socket receive buffer space is shared between the
application and kernel.
TCP maintains part of the buffer as
the TCP window; this is the size of the receive window
advertised to the other end.
The rest of the space is used
as the "application" buffer, used to isolate the network
from scheduling and application latencies.
The
.I tcp_adv_win_scale
default value of 2 implies that the space
used for the application buffer is one fourth that of the
total.
.TP
.BR tcp_app_win " (integer; default: 31)"
This variable defines how many
bytes of the TCP window are reserved for buffering
overhead.
Of the window, max(\fIwindow/2^tcp_app_win\fP, mss) bytes
are reserved for the application buffer.
A value of 0
implies that no amount is reserved.
.\"
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
.TP
.BR tcp_bic " (Boolean; default: disabled)"
Enable BIC TCP congestion control algorithm.
BIC-TCP is a sender-side only change that ensures a linear RTT
fairness under large windows while offering both scalability and
bounded TCP-friendliness.
The protocol combines two schemes
called additive increase and binary search increase.
When the
congestion window is large, additive increase with a large
increment ensures linear RTT fairness as well as good
scalability.
Under small congestion windows, binary search
increase provides TCP friendliness.
.\"
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
.TP
.BR tcp_bic_low_window " (integer; default: 14)"
Sets the threshold window (in packets) where BIC TCP starts to
adjust the congestion window.
Below this threshold BIC TCP behaves
the same as the default TCP Reno.
.\"
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
.TP
.BR tcp_bic_fast_convergence " (Boolean; default: enabled)"
Forces BIC TCP to more quickly respond to changes in congestion
window.
Allows two flows sharing the same connection to converge
more rapidly.
.TP
.BR tcp_dsack " (Boolean; default: enabled)"
Enable RFC\ 2883 TCP Duplicate SACK support.
.TP
.BR tcp_ecn " (Boolean; default: disabled)"
Enable RFC\ 3168 Explicit Congestion Notification.
When enabled, connectivity to some
destinations could be affected due to older, misbehaving
routers along the path causing connections to be dropped.
.TP
.BR tcp_fack " (Boolean; default: enabled)"
Enable TCP Forward Acknowledgement support.
.TP
.BR tcp_fin_timeout " (integer; default: 60)"
This specifies how many seconds to wait for a final FIN packet before the
socket is forcibly closed.
This is strictly a violation of
the TCP specification, but required to prevent
denial-of-service attacks.
In Linux 2.2, the default value was 180.
.\"
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
.TP
.BR tcp_frto " (Boolean; default: disabled)"
Enables F-RTO, an enhanced recovery algorithm for TCP retransmission
timeouts.
It is particularly beneficial in wireless environments
where packet loss is typically due to random radio interference
rather than intermediate router congestion.
.TP
.BR tcp_keepalive_intvl " (integer; default: 75)"
The number of seconds between TCP keep-alive probes.
.TP
.BR tcp_keepalive_probes " (integer; default: 9)"
The maximum number of TCP keep-alive probes to send
before giving up and killing the connection if
no response is obtained from the other end.
.TP
.BR tcp_keepalive_time " (integer; default: 7200)"
The number of seconds a connection needs to be idle
before TCP begins sending out keep-alive probes.
Keep-alives are only sent when the
.B SO_KEEPALIVE
socket option is enabled.
The default value is 7200 seconds (2 hours).
An idle connection is terminated after
approximately an additional 11 minutes (9 probes at intervals
of 75 seconds) when keep-alive is enabled.
Note that underlying connection tracking mechanisms and
application timeouts may be much shorter.
.\"
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
.TP
.BR tcp_low_latency " (Boolean; default: disabled)"
If enabled, the TCP stack makes decisions that prefer lower
latency as opposed to higher throughput.
If this option is disabled, then higher throughput is preferred.
An example of an application where this default should be
changed would be a Beowulf compute cluster.
.TP
.BR tcp_max_orphans " (integer; default: see below)"
The maximum number of orphaned (not attached to any user file
handle) TCP sockets allowed in the system.
When this number
is exceeded, the orphaned connection is reset and a warning
is printed.
This limit exists only to prevent simple denial-of-service attacks.
Lowering this limit is not recommended.
Network conditions might require you to increase the number of
orphans allowed, but note that each orphan can eat up to ~64K
of unswappable memory.
The default initial value is set
equal to the kernel parameter NR_FILE.
This initial default is adjusted depending on the memory in the system.
.TP
.BR tcp_max_syn_backlog " (integer; default: see below)"
The maximum number of queued connection requests which have
still not received an acknowledgement from the connecting client.
If this number is exceeded, the kernel will begin
dropping requests.
The default value of 256 is increased to
1024 when the memory present in the system is adequate or
greater (>= 128MB), and reduced to 128 for those systems with
very low memory (<= 32MB).
It is recommended that if this
needs to be increased above 1024, TCP_SYNQ_HSIZE in
.I include/net/tcp.h
be modified to keep
TCP_SYNQ_HSIZE*16<=tcp_max_syn_backlog, and the kernel be
recompiled.
.TP
.BR tcp_max_tw_buckets " (integer; default: see below)"
The maximum number of sockets in TIME_WAIT state allowed in
the system.
This limit exists only to prevent simple denial-of-service
attacks.
The default value of NR_FILE*2 is adjusted
depending on the memory in the system.
If this number is
exceeded, the socket is closed and a warning is printed.
.TP
.I tcp_mem
This is a vector of 3 integers: [low, pressure, high].
These bounds are used by TCP to track its memory usage.
The
defaults are calculated at boot time from the amount of
available memory.
(TCP can only use
.I "low memory"
for this, which is limited to around 900 megabytes on 32-bit systems.
64-bit systems do not suffer this limitation.)
.I low
- TCP doesn't regulate its memory allocation when the number
of pages it has allocated globally is below this number.
.I pressure
- when the amount of memory allocated by TCP
exceeds this number of pages, TCP moderates its memory consumption.
This memory pressure state is exited
once the number of pages allocated falls below
the
.I low
mark.
.I high
- the maximum number of pages, globally, that TCP
will allocate.
This value overrides any other limits
imposed by the kernel.
.TP
.BR tcp_orphan_retries " (integer; default: 8)"
The maximum number of attempts made to probe the other
end of a connection which has been closed by our end.
.TP
.BR tcp_reordering " (integer; default: 3)"
The maximum a packet can be reordered in a TCP packet stream
without TCP assuming packet loss and going into slow start.
It is not advisable to change this number.
This is a packet reordering detection metric designed to
minimize unnecessary back off and retransmits provoked by
reordering of packets on a connection.
.TP
.BR tcp_retrans_collapse " (Boolean; default: enabled)"
Try to send full-sized packets during retransmit.
.TP
.BR tcp_retries1 " (integer; default: 3)"
The number of times TCP will attempt to retransmit a
packet on an established connection normally,
without the extra effort of getting the network
layers involved.
Once we exceed this number of
retransmits, we first have the network layer
update the route if possible before each new retransmit.
The default is the RFC specified minimum of 3.
.TP
.BR tcp_retries2 " (integer; default: 15)"
The maximum number of times a TCP packet is retransmitted
in established state before giving up.
The default
value is 15, which corresponds to a duration of
between approximately 13 and 30 minutes, depending
on the retransmission timeout.
The RFC\ 1122 specified
minimum limit of 100 seconds is typically deemed too
short.
.TP
.BR tcp_rfc1337 " (Boolean; default: disabled)"
Enable TCP behavior conformant with RFC\ 1337.
When disabled,
if a RST is received in TIME_WAIT state, we close
the socket immediately without waiting for the end
of the TIME_WAIT period.
.TP
.I tcp_rmem
This is a vector of 3 integers: [min, default,
max].
These parameters are used by TCP to regulate receive
buffer sizes.
TCP dynamically adjusts the size of the
receive buffer from the defaults listed below, in the range
of these sysctl variables, depending on memory available
in the system.
.RS
.TP 9
.I min
minimum size of the receive buffer used by each TCP socket.
The default value is 4K, and is lowered to
.B PAGE_SIZE
bytes in low-memory systems.
This value
is used to ensure that in memory pressure mode,
allocations below this size will still succeed.
This is not
used to bound the size of the receive buffer declared
using
.B SO_RCVBUF
on a socket.
.TP
.I default
the default size of the receive buffer for a TCP socket.
This value overwrites the initial default buffer size from
the generic global
.I net.core.rmem_default
defined for all protocols.
The default value is 87380
bytes, and is lowered to 43689 in low-memory systems.
If larger receive buffer sizes are desired, this value should
be increased (to affect all sockets).
To employ large TCP
windows, the
.I net.ipv4.tcp_window_scaling
must be enabled (default).
.TP
.I max
the maximum size of the receive buffer used by
each TCP socket.
This value does not override the global
.IR net.core.rmem_max .
This is not used to limit the size of the receive buffer
declared using
.B SO_RCVBUF
on a socket.
The default value of 87380*2 bytes is lowered to 87380
in low-memory systems.
.RE
.TP
.BR tcp_sack " (Boolean; default: enabled)"
Enable RFC\ 2018 TCP Selective Acknowledgements.
.TP
.BR tcp_stdurg " (Boolean; default: disabled)"
If this option is enabled, then use the RFC\ 1122 interpretation
of the TCP urgent-pointer field.
.\" RFC 793 was ambiguous in its specification of the meaning of the
.\" urgent pointer. RFC 1122 (and RFC 961) fixed on a particular
.\" resolution of this ambiguity (unfortunately the "wrong" one).
According to this interpretation, the urgent pointer points
to the last byte of urgent data.
If this option is disabled, then use the BSD-compatible interpretation of
the urgent pointer:
the urgent pointer points to the first byte after the urgent data.
Enabling this option may lead to interoperability problems.
.TP
.BR tcp_synack_retries " (integer; default: 5)"
The maximum number of times a SYN/ACK segment
for a passive TCP connection will be retransmitted.
This number should not be higher than 255.
.TP
.BR tcp_syncookies " (Boolean)"
Enable TCP syncookies.
The kernel must be compiled with
.BR CONFIG_SYN_COOKIES .
Send out syncookies when the syn backlog queue of a socket
overflows.
The syncookies feature attempts to protect a
socket from a SYN flood attack.
This should be used as a
last resort, if at all.
This is a violation of the TCP
protocol, and conflicts with other areas of TCP such as TCP
extensions.
It can cause problems for clients and relays.
It is not recommended as a tuning mechanism for heavily
loaded servers to help with overloaded or misconfigured
conditions.
For recommended alternatives see
.IR tcp_max_syn_backlog ,
.IR tcp_synack_retries ,
and
.IR tcp_abort_on_overflow .
.TP
.BR tcp_syn_retries " (integer; default: 5)"
The maximum number of times initial SYNs for an active TCP
connection attempt will be retransmitted.
This value should
not be higher than 255.
The default value is 5, which
corresponds to approximately 180 seconds.
.TP
.BR tcp_timestamps " (Boolean; default: enabled)"
Enable RFC\ 1323 TCP timestamps.
.TP
.BR tcp_tw_recycle " (Boolean; default: disabled)"
Enable fast recycling of TIME_WAIT sockets.
Enabling this option is not
recommended since this causes problems when working
with NAT (Network Address Translation).
.\"
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
.TP
.BR tcp_tw_reuse " (Boolean; default: disabled)"
Allow reuse of TIME_WAIT sockets for new connections when it is
safe from the protocol viewpoint.
It should not be changed without advice/request of technical
experts.
.TP
.BR tcp_window_scaling " (Boolean; default: enabled)"
Enable RFC\ 1323 TCP window scaling.
This feature allows the use of a large window
(> 64K) on a TCP connection, should the other end support it.
Normally, the 16 bit window length field in the TCP header
limits the window size to less than 64K bytes.
If larger
windows are desired, applications can increase the size of
their socket buffers and the window scaling option will be
employed.
If
.I tcp_window_scaling
is disabled, TCP will not negotiate the use of window
scaling with the other end during connection setup.
.\"
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
.TP
.BR tcp_vegas_cong_avoid " (Boolean; default: disabled)"
Enable TCP Vegas congestion avoidance algorithm.
TCP Vegas is a sender-side only change to TCP that anticipates
the onset of congestion by estimating the bandwidth.
TCP Vegas
adjusts the sending rate by modifying the congestion
window.
TCP Vegas should provide less packet loss, but it is
not as aggressive as TCP Reno.
.\"
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
.TP
.BR tcp_westwood " (Boolean; default: disabled)"
Enable TCP Westwood+ congestion control algorithm.
TCP Westwood+ is a sender-side only modification of the TCP Reno
protocol stack that optimizes the performance of TCP congestion
control.
It is based on end-to-end bandwidth estimation to set
congestion window and slow start threshold after a congestion
episode.
Using this estimation, TCP Westwood+ adaptively sets a
slow start threshold and a congestion window which takes into
account the bandwidth used at the time congestion is experienced.
TCP Westwood+ significantly increases fairness with respect to
TCP Reno in wired networks and throughput over wireless links.
.TP
.I tcp_wmem
This is a vector of 3 integers: [min, default, max].
These parameters are used by TCP to regulate send buffer sizes.
TCP dynamically adjusts the size of the send buffer from the
default values listed below, in the range of these sysctl
variables, depending on memory available.
.RS
.TP 9
.I min
minimum size of the send buffer used by each TCP socket.
The default value is 4K bytes.
This value is used to ensure that in memory pressure mode,
allocations below this size will still succeed.
This is not
used to bound the size of the send buffer declared
using
.B SO_SNDBUF
on a socket.
.TP
.I default
the default size of the send buffer for a TCP socket.
This value overwrites the initial default buffer size from
the generic global
.I net.core.wmem_default
defined for all protocols.
The default value is 16K bytes.
If larger send buffer sizes are desired, this value
should be increased (to affect all sockets).
To employ large TCP windows, the sysctl variable
.I net.ipv4.tcp_window_scaling
must be enabled (default).
.TP
.I max
the maximum size of the send buffer used by
each TCP socket.
This value does not override the global
.IR net.core.wmem_max .
This is not used to limit the size of the send buffer
declared using
.B SO_SNDBUF
on a socket.
The default value is 128K bytes.
It is lowered to 64K
depending on the memory available in the system.
.RE
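.PP
The [min, default, max] vectors described above can be inspected from a
program.
The following is a minimal sketch, assuming the sysctl files are exposed
under the usual
.I /proc/sys/net/ipv4/
directory; the helper names
.I parse_tcp_mem
and
.I read_tcp_mem
are illustrative and not part of any API.
.nf
```c
#include <assert.h>
#include <stdio.h>

/* Parse a "min default max" triple such as the contents of the
 * tcp_rmem or tcp_wmem sysctl file.
 * Returns 0 on success, -1 if three integers were not found. */
int parse_tcp_mem(const char *text, long vec[3])
{
    assert(text != NULL && vec != NULL);
    if (sscanf(text, "%ld %ld %ld", &vec[0], &vec[1], &vec[2]) != 3)
        return -1;
    return 0;
}

/* Read one vector from a file chosen by the caller, for example
 * /proc/sys/net/ipv4/tcp_rmem (assumed location).
 * Returns 0 on success, -1 on any failure. */
int read_tcp_mem(const char *path, long vec[3])
{
    char line[128];
    FILE *fp = fopen(path, "r");
    int rc = -1;

    if (fp == NULL)
        return -1;
    if (fgets(line, sizeof(line), fp) != NULL)
        rc = parse_tcp_mem(line, vec);
    fclose(fp);
    return rc;
}
```
.fi
.PP
On success, vec[0], vec[1] and vec[2] hold the min, default and max
values in bytes.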
.SS Socket Options
To set or get a TCP socket option, call
.BR getsockopt (2)
to read or
.BR setsockopt (2)
to write the option with the option level argument set to
.BR IPPROTO_TCP .
.\" or SOL_TCP on Linux
In addition,
most
.B IPPROTO_IP
socket options are valid on TCP sockets.
For more information see
.BR ip (7).
.\" FIXME Document TCP_CONGESTION (new in 2.6.13)
.TP
.B TCP_CORK
If set, don't send out partial frames.
All queued
partial frames are sent when the option is cleared again.
This is useful for prepending headers before calling
.BR sendfile (2),
or for throughput optimization.
As currently implemented, there is a 200 millisecond ceiling on the time
for which output is corked by
.BR TCP_CORK .
If this ceiling is reached, then queued data is automatically transmitted.
This option can be
combined with
.B TCP_NODELAY
only since Linux 2.5.71.
This option should not be used in code intended to be
portable.
.TP
.B TCP_DEFER_ACCEPT
Allows a listener to be awakened only when data arrives on
the socket.
Takes an integer value (seconds); this can
bound the maximum number of attempts TCP will make to
complete the connection.
This option should not be used in
code intended to be portable.
.TP
.B TCP_INFO
Used to collect information about this socket.
The kernel returns a \fIstruct tcp_info\fP as defined in the file
.IR /usr/include/linux/tcp.h .
This option should not be used in code intended to be portable.
.TP
.B TCP_KEEPCNT
The maximum number of keepalive probes TCP should send
before dropping the connection.
This option should not be
used in code intended to be portable.
.TP
.B TCP_KEEPIDLE
The time (in seconds) the connection needs to remain idle
before TCP starts sending keepalive probes, if the socket
option
.B SO_KEEPALIVE
has been set on this socket.
This option should not be used in code intended to be portable.
.TP
.B TCP_KEEPINTVL
The time (in seconds) between individual keepalive probes.
This option should not be used in code intended to be
portable.
.TP
.B TCP_LINGER2
The lifetime of orphaned FIN_WAIT2 state sockets.
This option can be used to override the system wide sysctl
.I tcp_fin_timeout
on this socket.
This is not to be confused with the
.BR socket (7)
level option
.BR SO_LINGER .
This option should not be used in code intended to be
portable.
.TP
.B TCP_MAXSEG
The maximum segment size for outgoing TCP packets.
If this option is set before connection establishment, it also
changes the MSS value announced to the other end in the
initial packet.
Values greater than the (eventual) interface MTU have no effect.
TCP will also impose
its minimum and maximum bounds over the value provided.
.TP
.B TCP_NODELAY
If set, disable the Nagle algorithm.
This means that segments
are always sent as soon as possible, even if there is only a
small amount of data.
When not set, data is buffered until there
is a sufficient amount to send out, thereby avoiding the
frequent sending of small packets, which results in poor
utilization of the network.
This option is overridden by
.BR TCP_CORK ;
however, setting this option forces an explicit flush of
pending output, even if
.B TCP_CORK
is currently set.
.TP
.B TCP_QUICKACK
Enable quickack mode if set or disable quickack
mode if cleared.
In quickack mode, acks are sent
immediately, rather than delayed if needed in accordance
with normal TCP operation.
This flag is not permanent;
it only enables a switch to or from quickack mode.
Subsequent operation of the TCP protocol will
once again enter/leave quickack mode depending on
internal protocol processing and factors such as
delayed ack timeouts occurring and data transfer.
This option should not be used in code intended to be
portable.
.TP
.B TCP_SYNCNT
Set the number of SYN retransmits that TCP should send before
aborting the attempt to connect.
It cannot exceed 255.
This option should not be used in code intended to be
portable.
.TP
.B TCP_WINDOW_CLAMP
Bound the size of the advertised window to this value.
The kernel imposes a minimum size of SOCK_MIN_RCVBUF/2.
This option should not be used in code intended to be
portable.
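.PP
As an illustration of the calling convention described at the start of
this section, the following sketch sets and reads back a few of the
integer-valued options.
The helper names are hypothetical, and the keepalive values passed by a
caller are arbitrary; this is not a recommended configuration.
.nf
```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable (on != 0) or re-enable (on == 0) the Nagle algorithm. */
int tcp_set_nodelay(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Enable keepalive probing on this socket and tune it.
 * idle and intvl are in seconds; cnt is a number of probes. */
int tcp_set_keepalive(int fd, int idle, int intvl, int cnt)
{
    int on = 1;

    assert(idle > 0 && intvl > 0 && cnt > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) == -1)
        return -1;
    return setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt));
}

/* Read back an integer-valued TCP option; returns -1 on error. */
int tcp_get_int_option(int fd, int optname)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_TCP, optname, &val, &len) == -1)
        return -1;
    return val;
}
```
.fi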
.SS Ioctls
The following
.BR ioctl (2)
calls return information in
.IR value .
The correct syntax is:
.PP
.RS
.nf
.BI int " value";
.IB error " = ioctl(" tcp_socket ", " ioctl_type ", &" value ");"
.fi
.RE
.PP
.I ioctl_type
is one of the following:
.TP
.B SIOCINQ
Returns the amount of queued unread data in the receive buffer.
The socket must not be in LISTEN state, otherwise an error
.RB ( EINVAL )
is returned.
.TP
.B SIOCATMARK
Returns true (i.e.,
.I value
is non-zero) if the inbound data stream is at the urgent mark.
If the
.B SO_OOBINLINE
socket option is set, and
.B SIOCATMARK
returns true, then the
next read from the socket will return the urgent data.
If the
.B SO_OOBINLINE
socket option is not set, and
.B SIOCATMARK
returns true, then the
next read from the socket will return the bytes following
the urgent data (to actually read the urgent data requires the
.B recv(MSG_OOB)
flag).
Note that a read never reads across the urgent mark.
If an application is informed of the presence of urgent data via
.BR select (2)
(using the
.I exceptfds
argument) or through delivery of a
.B SIGURG
signal,
then it can advance up to the mark using a loop which repeatedly tests
.B SIOCATMARK
and performs a read (requesting any number of bytes) as long as
.B SIOCATMARK
returns false.
.TP
.B SIOCOUTQ
Returns the amount of unsent data in the socket send queue.
The socket must not be in LISTEN state, otherwise an error
.RB ( EINVAL )
is returned.
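.PP
The queue-length ioctls can be wrapped as shown in the following sketch.
It assumes that
.B SIOCINQ
and
.B SIOCOUTQ
are obtained from
.IR <linux/sockios.h> ;
the wrapper names are illustrative only.
.nf
```c
#include <assert.h>
#include <errno.h>
#include <linux/sockios.h>
#include <netinet/in.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Amount of queued unread data in the receive buffer.
 * Returns -1 with errno set on failure (EINVAL in LISTEN state). */
int tcp_unread_bytes(int tcp_socket)
{
    int value = 0;

    assert(tcp_socket >= 0);
    if (ioctl(tcp_socket, SIOCINQ, &value) == -1)
        return -1;
    return value;
}

/* Amount of data in the socket send queue.
 * Returns -1 with errno set on failure (EINVAL in LISTEN state). */
int tcp_send_queue_bytes(int tcp_socket)
{
    int value = 0;

    assert(tcp_socket >= 0);
    if (ioctl(tcp_socket, SIOCOUTQ, &value) == -1)
        return -1;
    return value;
}
```
.fi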
.SS Error Handling
When a network error occurs, TCP tries to resend the packet.
If it doesn't succeed after some time, either
.B ETIMEDOUT
or the last received error on this connection is reported.
.PP
Some applications require a quicker error notification.
This can be enabled with the
.B IPPROTO_IP
level
.B IP_RECVERR
socket option.
When this option is enabled, all incoming
errors are immediately passed to the user program.
Use this
option with care \(em it makes TCP less tolerant to routing
changes and other normal network conditions.
.SH ERRORS
.TP
.B EAFNOTSUPPORT
Passed socket address type in
.I sin_family
was not
.BR AF_INET .
.TP
.B EPIPE
The other end closed the socket unexpectedly or a read is
executed on a shut down socket.
.TP
.B ETIMEDOUT
The other end didn't acknowledge retransmitted data after
some time.
.PP
Any errors defined for
.BR ip (7)
or the generic socket layer may also be returned for TCP.
.SH VERSIONS
Support for Explicit Congestion Notification, zero-copy
.BR sendfile (2),
reordering support and some SACK extensions
(DSACK) were introduced in 2.4.
Support for forward acknowledgement (FACK), TIME_WAIT recycling,
per connection keepalive socket options and sysctls
were introduced in 2.3.
The default values and descriptions for the sysctl variables
given above are applicable for the 2.4 kernel.
.SH NOTES
TCP has no real out-of-band data; it has urgent data.
In Linux this means if the other end sends newer out-of-band
data the older urgent data is inserted as normal data into
the stream (even when
.B SO_OOBINLINE
is not set).
This differs from BSD-based stacks.
.PP
Linux uses the BSD compatible interpretation of the urgent
pointer field by default.
This violates RFC\ 1122, but is
required for interoperability with other stacks.
It can be changed by the
.I tcp_stdurg
sysctl.
.SH BUGS
Not all errors are documented.
.br
IPv6 is not described.
.\" Only a single Linux kernel version is described
.\" Info for 2.2 was lost. Should be added again,
.\" or put into a separate page.
.\" .SH AUTHORS
.\" This man page was originally written by Andi Kleen.
.\" It was updated for 2.4 by Nivedita Singhvi with input from
.\" Alexey Kuznetsov's Documentation/networking/ip-sysctl.txt
.\" document.
.SH "SEE ALSO"
.BR accept (2),
.BR bind (2),
.BR connect (2),
.BR getsockopt (2),
.BR listen (2),
.BR recvmsg (2),
.BR sendfile (2),
.BR sendmsg (2),
.BR socket (2),
.BR sysctl (2),
.BR ip (7),
.BR socket (7)
.sp
RFC\ 793 for the TCP specification.
.br
RFC\ 1122 for the TCP requirements and a description
of the Nagle algorithm.
.br
RFC\ 1323 for TCP timestamp and window scaling options.
.br
RFC\ 1644 for a description of TIME_WAIT assassination
hazards.
.br
RFC\ 3168 for a description of Explicit Congestion
Notification.
.br
RFC\ 2581 for TCP congestion control algorithms.
.br
RFC\ 2018 and RFC\ 2883 for SACK and extensions to SACK.


@ -1,8 +1,193 @@
.TH UDP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
.\" Permission is granted to distribute possibly modified copies
.\" of this page provided the header is included verbatim,
.\" and in case of nontrivial modification author and date
.\" of the modification is added to the header.
.\" $Id: udp.7,v 1.7 2000/01/22 01:55:05 freitag Exp $
.\"
.TH UDP 7 1998-10-02 "Linux" "Linux Programmer's Manual"
.SH NAME
udp \- User Datagram Protocol for IPv4
.SH SYNOPSIS
.B #include <sys/socket.h>
.br
.B #include <netinet/in.h>
.sp
.B udp_socket = socket(PF_INET, SOCK_DGRAM, 0);
.SH DESCRIPTION
This is an implementation of the User Datagram Protocol
described in RFC\ 768.
It implements a connectionless, unreliable datagram packet service.
Packets may be reordered or duplicated before they arrive.
UDP generates and checks checksums to catch transmission errors.
When a UDP socket is created,
its local and remote addresses are unspecified.
Datagrams can be sent immediately using
.BR sendto (2)
or
.BR sendmsg (2)
with a valid destination address as an argument.
When
.BR connect (2)
is called on the socket the default destination address is set and
datagrams can now be sent using
.BR send (2)
or
.BR write (2)
without specifying a destination address.
It is still possible to send to other destinations by passing an
address to
.BR sendto (2)
or
.BR sendmsg (2).
In order to receive packets the socket can be bound to a local
address first by using
.BR bind (2).
Otherwise the socket layer will automatically assign
a free local port out of the range defined by
.I net.ipv4.ip_local_port_range
and bind the socket to
.BR INADDR_ANY .
All receive operations return only one packet.
When the packet is smaller than the passed buffer only that much
data is returned; when it is bigger the packet is truncated and the
.B MSG_TRUNC
flag is set.
.B MSG_WAITALL
is not supported.
IP options may be sent or received using the socket options described in
.BR ip (7).
They are only processed by the kernel when the appropriate sysctl
is enabled (but still passed to the user even when it is turned off).
See
.BR ip (7).
When the
.B MSG_DONTROUTE
flag is set on sending the destination address must refer to a local
interface address and the packet is only sent to that interface.
By default Linux UDP does path MTU (Maximum Transmission Unit) discovery.
This means the kernel
will keep track of the MTU to a specific target IP address and return
.B EMSGSIZE
when a UDP packet write exceeds it.
When this happens the application should decrease the packet size.
Path MTU discovery can also be turned off using the
.B IP_MTU_DISCOVER
socket option or the
.I ip_no_pmtu_disc
sysctl, see
.BR ip (7)
for details.
When turned off UDP will fragment outgoing UDP packets
that exceed the interface MTU.
However disabling it is not recommended
for performance and reliability reasons.
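.PP
The one-datagram-per-receive behavior and the
.B MSG_TRUNC
flag can be observed with a short sketch.
It assumes the loopback interface is available; the helper names
.I udp_bind_loopback
and
.I udp_recv_one
are hypothetical.
.nf
```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Create a UDP socket bound to a kernel-chosen port on the loopback
 * address; the assigned address is stored in *addr.
 * Returns the socket, or -1 on failure. */
int udp_bind_loopback(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    assert(addr != NULL);
    if (fd == -1)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *) addr, sizeof(*addr)) == -1 ||
        getsockname(fd, (struct sockaddr *) addr, &len) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Receive exactly one datagram. *truncated is set to 1 when the
 * datagram did not fit in buf and the excess was discarded. */
ssize_t udp_recv_one(int fd, void *buf, size_t size, int *truncated)
{
    struct iovec iov = { .iov_base = buf, .iov_len = size };
    struct msghdr msg;
    ssize_t n;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    n = recvmsg(fd, &msg, 0);
    if (n != -1)
        *truncated = (msg.msg_flags & MSG_TRUNC) != 0;
    return n;
}
```
.fi
.PP
A sender can reach the receiving socket by passing the address filled in
by the first helper to
.BR sendto (2).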
.SS "Address Format"
UDP uses the IPv4
.I sockaddr_in
address format described in
.BR ip (7).
.SS "Error Handling"
All fatal errors will be passed to the user as an error return even
when the socket is not connected.
This includes asynchronous errors
received from the network.
You may get an error for an earlier packet
that was sent on the same socket.
This behavior differs from many other BSD socket implementations
which don't pass any errors unless the socket is connected.
Linux's behavior is mandated by
.BR RFC\ 1122 .
For compatibility with legacy code in Linux 2.0 and 2.2
it was possible to set the
.B SO_BSDCOMPAT
.B SOL_SOCKET
option to receive remote errors only when the socket has been
connected (except for
.B EPROTO
and
.BR EMSGSIZE ).
Locally generated errors are always passed.
Support for this socket option was removed in later kernels; see
.BR socket (7)
for further information.
When the
.B IP_RECVERR
option is enabled all errors are stored in the socket error queue
and can be received by
.BR recvmsg (2)
with the
.B MSG_ERRQUEUE
flag set.
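.PP
A minimal sketch of the error-queue interface follows.
It enables
.B IP_RECVERR
and removes one queued entry without blocking; it does not decode the
ancillary data that accompanies a queued error.
The helper names are illustrative only.
.nf
```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Ask the kernel to store errors in the socket error queue. */
int enable_recverr(int fd)
{
    int on = 1;

    assert(fd >= 0);
    return setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on));
}

/* Remove one entry from the error queue without blocking.
 * Returns 1 if an entry was removed, 0 if the queue was empty,
 * and -1 on any other failure. */
int pop_queued_error(int fd)
{
    char data[512], control[512];
    struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control;
    msg.msg_controllen = sizeof(control);
    if (recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT) == -1)
        return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
    return 1;
}
```
.fi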
.SS "Socket Options"
To set or get a UDP socket option, call
.BR getsockopt (2)
to read or
.BR setsockopt (2)
to write the option with the option level argument set to
.BR IPPROTO_UDP .
.TP
.BR UDP_CORK " (since Linux 2.5.44)"
If this option is enabled, then all data output on this socket
is accumulated into a single datagram that is transmitted when
the option is disabled.
This option should not be used in code intended to be
portable.
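.PP
The following sketch shows one way
.B UDP_CORK
might be used to send a header and a body as a single datagram.
It assumes a connected UDP socket and that
.B UDP_CORK
is defined by
.IR <netinet/udp.h> ;
the helper names are hypothetical.
.nf
```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/udp.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Set (on != 0) or clear (on == 0) UDP_CORK. */
int udp_set_cork(int fd, int on)
{
    return setsockopt(fd, IPPROTO_UDP, UDP_CORK, &on, sizeof(on));
}

/* Send hdr followed by body as one datagram on a connected socket.
 * Returns 0 on success, -1 on failure. */
int udp_send_two_parts(int fd, const void *hdr, size_t hdrlen,
                       const void *body, size_t bodylen)
{
    int rc = 0;

    assert(fd >= 0);
    if (udp_set_cork(fd, 1) == -1)
        return -1;
    if (send(fd, hdr, hdrlen, 0) == -1 || send(fd, body, bodylen, 0) == -1)
        rc = -1;
    /* Clearing the option transmits whatever has been accumulated. */
    if (udp_set_cork(fd, 0) == -1)
        rc = -1;
    return rc;
}
```
.fi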
.\" FIXME document UDP_ENCAP (new in kernel 2.5.67)
.SS Ioctls
These ioctls can be accessed using
.BR ioctl (2).
The correct syntax is:
.PP
.RS
.nf
.BI int " value";
.IB error " = ioctl(" udp_socket ", " ioctl_type ", &" value ");"
.fi
.RE
.TP
.BR FIONREAD " (" SIOCINQ )
Takes a pointer to an integer as argument.
Stores in that integer the size in bytes of the next pending datagram,
or 0 when no datagram is pending.
.TP
.BR TIOCOUTQ " (" SIOCOUTQ )
Returns the number of data bytes in the local send queue.
Only supported with Linux 2.4 and above.
.PP
In addition all ioctls documented in
.BR ip (7)
and
.BR socket (7)
are supported.
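.PP
As a sketch of the
.B FIONREAD
call described above, the following hypothetical helper waits briefly for
a datagram and then asks for its size.
The use of
.BR poll (2)
is a choice made for this example, not something the ioctl requires.
.nf
```c
#include <assert.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Wait up to timeout_ms for a datagram, then report the size in
 * bytes of the next pending one. Returns 0 when nothing is pending
 * (a zero-length datagram also yields 0) and -1 on error. */
int udp_next_datagram_size(int udp_socket, int timeout_ms)
{
    struct pollfd pfd = { .fd = udp_socket, .events = POLLIN };
    int value = 0;

    assert(udp_socket >= 0);
    if (poll(&pfd, 1, timeout_ms) == -1)
        return -1;
    if (ioctl(udp_socket, FIONREAD, &value) == -1)
        return -1;
    return value;
}
```
.fi
.PP
The result can be used to size the buffer passed to the following
receive call.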
.SH ERRORS
All errors documented for
.BR socket (7)
or
.BR ip (7)
may be returned by a send or receive on a UDP socket.
.TP
.B ECONNREFUSED
No receiver was associated with the destination address.
This might be caused by a previous packet sent over the socket.
.SH VERSIONS
IP_RECVERR is a new feature in Linux 2.2.
.\" .SH CREDITS
.\" This man page was written by Andi Kleen.
.SH "SEE ALSO"
.BR ip (7),
.BR raw (7),
.BR socket (7)
.sp
RFC\ 768 for the User Datagram Protocol.
.br
RFC\ 1122 for the host requirements.
.br
RFC\ 1191 for a description of path MTU discovery.


@ -1,8 +1,359 @@
.TH UNIX 7 2008-08-07 "Linux" "Linux Programmer's Manual"
.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
.\" Permission is granted to distribute possibly modified copies
.\" of this page provided the header is included verbatim,
.\" and in case of nontrivial modification author and date
.\" of the modification is added to the header.
.\"
.\" Modified, 2003-12-02, Michael Kerrisk, <mtk.manpages@gmail.com>
.\" Modified, 2003-09-23, Adam Langley
.\" Modified, 2004-05-27, Michael Kerrisk, <mtk.manpages@gmail.com>
.\" Added SOCK_SEQPACKET
.\" 2008-05-27, mtk, Provide a clear description of the three types of
.\" address that can appear in the sockaddr_un structure: pathname,
.\" unnamed, and abstract.
.\"
.TH UNIX 7 2008-06-17 "Linux" "Linux Programmer's Manual"
.SH NAME
unix, PF_UNIX, AF_UNIX, PF_LOCAL, AF_LOCAL \- Sockets for local
interprocess communication
.SH SYNOPSIS
.B #include <sys/socket.h>
.br
.B #include <sys/un.h>
.sp
.IB unix_socket " = socket(PF_UNIX, type, 0);"
.br
.IB error " = socketpair(PF_UNIX, type, 0, int *" sv ");"
.SH DESCRIPTION
The
.B PF_UNIX
(also known as
.BR PF_LOCAL )
socket family is used to communicate between processes on the same machine
efficiently.
Traditionally, Unix sockets can be either unnamed,
or bound to a file system pathname (marked as being of type socket).
Linux also supports an abstract namespace which is independent of the
file system.
Valid types are:
.BR SOCK_STREAM ,
for a stream-oriented socket and
.BR SOCK_DGRAM ,
for a datagram-oriented socket that preserves message boundaries
(as on most Unix implementations, Unix domain datagram
sockets are always reliable and don't reorder datagrams);
and (since Linux 2.6.4)
.BR SOCK_SEQPACKET ,
for a connection-oriented socket that preserves message boundaries
and delivers messages in the order that they were sent.
Unix sockets support passing file descriptors or process credentials
to other processes using ancillary data.
.SS Address Format
A Unix domain socket address is represented in the following structure:
.in +4n
.nf
#define UNIX_PATH_MAX    108
struct sockaddr_un {
    sa_family_t sun_family;               /* AF_UNIX */
    char        sun_path[UNIX_PATH_MAX];  /* pathname */
};
.fi
.in
.PP
.I sun_family
always contains
.BR AF_UNIX .
Three types of address are distinguished in this structure:
.IP * 3
.IR pathname :
a Unix domain socket can be bound to a null-terminated file
system pathname using
.BR bind (2).
When the address of the socket is returned by
.BR getsockname (2),
.BR getpeername (2),
and
.BR accept (2),
its length is
.IR "sizeof(sa_family_t) + strlen(sun_path) + 1" ,
and
.I sun_path
contains the null-terminated pathname.
.IP *
.IR unnamed :
A stream socket that has not been bound to a pathname using
.BR bind (2)
has no name.
Likewise, the two sockets created by
.BR socketpair (2)
are unnamed.
When the address of an unnamed socket is returned by
.BR getsockname (2),
.BR getpeername (2),
and
.BR accept (2),
its length is
.IR "sizeof(sa_family_t)" ,
and
.I sun_path
should not be inspected.
.\" There is quite some variation across implementations: FreeBSD
.\" says the length is 16 bytes, HP-UX 11 says it's zero bytes.
.IP *
.IR abstract :
an abstract socket address is distinguished by the fact that
.IR sun_path[0]
is a null byte ('\\0').
All of the remaining bytes in
.I sun_path
define the "name" of the socket.
(Null bytes in the name have no special significance.)
The name has no connection with file system pathnames.
The socket's address in this namespace is given by the rest of the
bytes in
.IR sun_path .
When the address of an abstract socket is returned by
.BR getsockname (2),
.BR getpeername (2),
and
.BR accept (2),
its length is
.IR "sizeof(struct sockaddr_un)" ,
and
.I sun_path
contains the abstract name.
The abstract socket namespace is a non-portable Linux extension.
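.PP
The pathname and abstract address types can be constructed as in the
following sketch.
The helper names are hypothetical.
For the abstract case the sketch passes the whole structure as the
address, so the zero bytes after the name count as part of the name, in
line with the description above.
.nf
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill in a pathname address. Returns the address length to pass to
 * bind(2) or connect(2), or 0 if the pathname is empty or too long. */
socklen_t unix_pathname_addr(struct sockaddr_un *addr, const char *path)
{
    size_t n;

    assert(addr != NULL && path != NULL);
    n = strlen(path);
    if (n == 0 || n >= sizeof(addr->sun_path))
        return 0;
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    memcpy(addr->sun_path, path, n + 1);    /* include the null byte */
    return (socklen_t) (offsetof(struct sockaddr_un, sun_path) + n + 1);
}

/* Fill in an abstract address: sun_path[0] is a null byte and the
 * name starts at sun_path[1]. Returns the address length, or 0 if
 * the name does not fit. */
socklen_t unix_abstract_addr(struct sockaddr_un *addr,
                             const void *name, size_t namelen)
{
    assert(addr != NULL && name != NULL);
    if (namelen > sizeof(addr->sun_path) - 1)
        return 0;
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    memcpy(addr->sun_path + 1, name, namelen);
    return (socklen_t) sizeof(*addr);
}
```
.fi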
.SS Socket Options
For historical reasons these socket options are specified with a
.B SOL_SOCKET
type even though they are
.B PF_UNIX
specific.
They can be set with
.BR setsockopt (2)
and read with
.BR getsockopt (2)
by specifying
.B SOL_SOCKET
as the option level.
.TP
.B SO_PASSCRED
Enables the receiving of the credentials of the sending process
in an ancillary message.
When this option is set and the socket is not yet connected
a unique name in the abstract namespace will be generated automatically.
Expects an integer boolean flag.
.SS (Un)supported Features
The following paragraphs describe domain-specific details and
unsupported features of the sockets API for Unix domain sockets on Linux.
Unix domain sockets do not support the transmission of
out-of-band data (the
.B MSG_OOB
flag for
.BR send (2)
and
.BR recv (2)).
The
.BR send (2)
.B MSG_MORE
flag is not supported by Unix domain sockets.
The
.B SO_SNDBUF
socket option does have an effect for Unix domain sockets, but the
.B SO_RCVBUF
option does not.
For datagram sockets, the
.B SO_SNDBUF
value imposes an upper limit on the size of outgoing datagrams.
This limit is calculated as the doubled (see
.BR socket (7))
option value less 32 bytes used for overhead.
.SS Ancillary Messages
Ancillary data is sent and received using
.BR sendmsg (2)
and
.BR recvmsg (2).
For historical reasons the ancillary message types listed below
are specified with a
.B SOL_SOCKET
type even though they are
.B PF_UNIX
specific.
To send them set the
.I cmsg_level
field of the struct
.I cmsghdr
to
.B SOL_SOCKET
and the
.I cmsg_type
field to the type.
For more information see
.BR cmsg (3).
.TP
.B SCM_RIGHTS
Send or receive a set of open file descriptors from another process.
The data portion contains an integer array of the file descriptors.
The passed file descriptors behave as though they have been created with
.BR dup (2).
.TP
.B SCM_CREDENTIALS
Send or receive Unix credentials.
This can be used for authentication.
The credentials are passed as a
.I struct ucred
ancillary message.
.in +4n
.nf
struct ucred {
    pid_t pid;    /* process ID of the sending process */
    uid_t uid;    /* user ID of the sending process */
    gid_t gid;    /* group ID of the sending process */
};
.fi
.in
The credentials which the sender specifies are checked by the kernel.
A process with effective user ID 0 is allowed to specify values that do
not match its own.
The sender must specify its own process ID (unless it has the capability
.BR CAP_SYS_ADMIN ),
its user ID, effective user ID, or saved set-user-ID (unless it has
.BR CAP_SETUID ),
and its group ID, effective group ID, or saved set-group-ID
(unless it has
.BR CAP_SETGID ).
To receive a
.I struct ucred
message the
.B SO_PASSCRED
option must be enabled on the socket.
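.PP
Passing a single file descriptor with
.B SCM_RIGHTS
might look like the following sketch.
The function names
.I send_fd
and
.I recv_fd
are illustrative, and the one byte of ordinary data that accompanies the
ancillary message is an arbitrary choice.
.nf
```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one open file descriptor over a Unix domain socket.
 * Returns 0 on success, -1 on failure. */
int send_fd(int sock, int fd)
{
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                         /* keeps the buffer aligned */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(sock >= 0 && fd >= 0);
    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns it, or -1 on failure. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```
.fi
.PP
The descriptor returned by the receiver refers to the same open file as
the one that was sent.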
.SH ERRORS
.TP
.B EADDRINUSE
Selected local address is already taken or file system socket
object already exists.
.TP
.B ECONNREFUSED
.BR connect (2)
called with a socket object that isn't listening.
This can happen when
the remote socket does not exist or the filename is not a socket.
.TP
.B ECONNRESET
Remote socket was unexpectedly closed.
.TP
.B EFAULT
User memory address was not valid.
.TP
.B EINVAL
Invalid argument passed.
A common cause is that
.B AF_UNIX
was not specified in the
.I sun_family
field of passed addresses, or that the socket was in an
invalid state for the applied operation.
.TP
.B EISCONN
.BR connect (2)
called on an already connected socket or a target address was
specified on a connected socket.
.TP
.B ENOMEM
Out of memory.
.TP
.B ENOTCONN
Socket operation needs a target address, but the socket is not connected.
.TP
.B EOPNOTSUPP
Stream operation called on non-stream oriented socket or tried to
use the out-of-band data option.
.TP
.B EPERM
The sender passed invalid credentials in the
.IR "struct ucred" .
.TP
.B EPIPE
Remote socket was closed on a stream socket.
If enabled, a
.B SIGPIPE
is sent as well.
This can be avoided by passing the
.B MSG_NOSIGNAL
flag to
.BR send (2)
or
.BR sendmsg (2).
.TP
.B EPROTONOSUPPORT
Passed protocol is not PF_UNIX.
.TP
.B EPROTOTYPE
Remote socket does not match the local socket type
.RB ( SOCK_DGRAM
vs.
.BR SOCK_STREAM ).
.TP
.B ESOCKTNOSUPPORT
Unknown socket type.
.PP
Other errors can be generated by the generic socket layer or
by the file system while generating a file system socket object.
See the appropriate manual pages for more information.
.SH VERSIONS
.B SCM_CREDENTIALS
and the abstract namespace were introduced with Linux 2.2 and should not
be used in portable programs.
(Some BSD-derived systems also support credential passing,
but the implementation details differ.)
.SH NOTES
In the Linux implementation, sockets which are visible in the
file system honor the permissions of the directory they are in.
Their owner, group and their permissions can be changed.
Creation of a new socket will fail if the process does not have write and
search (execute) permission on the directory the socket is created in.
Connecting to the socket object requires read/write permission.
This behavior differs from many BSD-derived systems which
ignore permissions for Unix sockets.
Portable programs should not rely on
this feature for security.
Binding to a socket with a filename creates a socket
in the file system that must be deleted by the caller when it is no
longer needed (using
.BR unlink (2)).
The usual Unix close-behind semantics apply; the socket can be unlinked
at any time and will be finally removed from the file system when the last
reference to it is closed.
To pass file descriptors or credentials over a
.B SOCK_STREAM
socket, you need
to send or receive at least one byte of non-ancillary data in the same
.BR sendmsg (2)
or
.BR recvmsg (2)
call.
Unix domain stream sockets do not support the notion of out-of-band data.
.SH EXAMPLE
See
.BR bind (2).
.SH "SEE ALSO"
.BR recvmsg (2),
.BR sendmsg (2),
.BR socket (2),
.BR socketpair (2),
.BR cmsg (3),
.BR capabilities (7),
.BR credentials (7),
.BR socket (7)


@ -1,8 +1,122 @@
.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual"
.\" This man page is Copyright (C) 1998 Heiner Eisen.
.\" Permission is granted to distribute possibly modified copies
.\" of this page provided the header is included verbatim,
.\" and in case of nontrivial modification author and date
.\" of the modification is added to the header.
.\" $Id: x25.7,v 1.4 1999/05/18 10:35:12 freitag Exp $
.TH X25 7 1998-12-01 "Linux" "Linux Programmer's Manual"
.SH NAME
x25, PF_X25 \- ITU-T X.25 / ISO-8208 protocol interface.
.SH SYNOPSIS
.B #include <sys/socket.h>
.br
.B #include <linux/x25.h>
.sp
.B x25_socket = socket(PF_X25, SOCK_SEQPACKET, 0);
.SH DESCRIPTION
X25 sockets provide an interface to the X.25 packet layer protocol.
This allows applications to
communicate over a public X.25 data network as standardized by
International Telecommunication Union's recommendation X.25
(X.25 DTE-DCE mode).
X25 sockets can also be used for communication
without an intermediate X.25 network (X.25 DTE-DTE mode) as described
in ISO-8208.
.PP
Message boundaries are preserved \(em a
.BR read (2)
from a socket will
retrieve the same chunk of data as output with the corresponding
.BR write (2)
to the peer socket.
When necessary, the kernel takes care
of segmenting and re-assembling long messages by means of
the X.25 M-bit.
There is no hard-coded upper limit for the
message size.
However, re-assembling of a long message might fail if
there is a temporary lack of system resources or when other constraints
(such as socket memory or buffer size limits) become effective.
If that
occurs, the X.25 connection will be reset.
.SS Socket Addresses
The
.B AF_X25
socket address family uses the
.I struct sockaddr_x25
for representing network addresses as defined in ITU-T
recommendation X.121.
.PP
.in +4n
.nf
struct sockaddr_x25 {
    sa_family_t sx25_family;    /* must be AF_X25 */
    x25_address sx25_addr;      /* X.121 Address */
};
.fi
.in
.PP
.I sx25_addr
contains a char array
.I x25_addr[]
to be interpreted as a null-terminated string.
.I sx25_addr.x25_addr[]
consists of up to 15 (not counting the terminating 0) ASCII
characters forming the X.121 address.
Only the decimal digit characters from \(aq0\(aq to \(aq9\(aq are allowed.
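.PP
The address rules above can be checked before a structure is filled in.
The following is a sketch only; the helper name
.I x25_make_addr
is hypothetical, and the structure definitions are assumed to come from
.IR <linux/x25.h> .
.nf
```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/x25.h>

/* Fill in an AF_X25 socket address from a string of decimal digits.
 * Returns 0 on success, -1 if the string is longer than 15
 * characters or contains a character other than a decimal digit. */
int x25_make_addr(struct sockaddr_x25 *sa, const char *digits)
{
    size_t n, i;

    assert(sa != NULL && digits != NULL);
    n = strlen(digits);
    if (n > 15)
        return -1;
    for (i = 0; i < n; i++)
        if (digits[i] < '0' || digits[i] > '9')
            return -1;
    memset(sa, 0, sizeof(*sa));
    sa->sx25_family = AF_X25;
    memcpy(sa->sx25_addr.x25_addr, digits, n + 1);  /* with null byte */
    return 0;
}
```
.fi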
.SS Socket Options
The following X.25-specific socket options can be set by using
.BR setsockopt (2)
and read with
.BR getsockopt (2)
with the
.I level
argument set to
.BR SOL_X25 .
.TP
.B X25_QBITINCL
Controls whether the X.25 Q-bit (Qualified Data Bit) is accessible by the
user.
It expects an integer argument.
If set to 0 (default),
the Q-bit is never set for outgoing packets and the Q-bit of incoming
packets is ignored.
If set to 1, an additional first byte is prepended
to each message read from or written to the socket.
For data read from
the socket, a 0 first byte indicates that the Q-bits of the corresponding
incoming data packets were not set.
A first byte with value 1 indicates
that the Q-bit of the corresponding incoming data packets was set.
If the first byte of the data written to the socket is 1 the Q-bit of the
corresponding outgoing data packets will be set.
If the first byte is 0
the Q-bit will not be set.
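.PP
The extra leading byte used when
.B X25_QBITINCL
is set to 1 can be handled as in the following sketch.
Both helper names are hypothetical, and the option constants are assumed
to come from
.I <sys/socket.h>
and
.IR <linux/x25.h> .
.nf
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/x25.h>

/* Turn the leading Q-bit byte on (1) or off (0) for this socket. */
int x25_set_qbitincl(int x25_socket, int on)
{
    return setsockopt(x25_socket, SOL_X25, X25_QBITINCL, &on, sizeof(on));
}

/* Build the buffer to write when X25_QBITINCL is 1: one byte holding
 * the Q-bit value, followed by the payload. Returns the number of
 * bytes to write, or 0 if out is too small. */
size_t x25_frame_with_qbit(unsigned char *out, size_t outsize, int qbit,
                           const void *payload, size_t len)
{
    assert(out != NULL && payload != NULL);
    if (outsize < len + 1)
        return 0;
    out[0] = qbit ? 1 : 0;
    memcpy(out + 1, payload, len);
    return len + 1;
}
```
.fi
.PP
On the reading side, the first byte of each message carries the Q-bit
value and the message data starts at the second byte.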
.SH VERSIONS
The PF_X25 protocol family is a new feature of Linux 2.2.
.SH BUGS
Plenty, as the X.25 PLP implementation is
.BR CONFIG_EXPERIMENTAL .
.PP
This man page is incomplete.
.PP
There is no dedicated application programmer's header file yet;
you need to include the kernel header file
.IR <linux/x25.h> .
.B CONFIG_EXPERIMENTAL
might also imply that future versions of the
interface are not binary compatible.
.PP
X.25 N-Reset events are not propagated to the user process yet.
Thus,
if a reset occurred, data might be lost without notice.
.SH "SEE ALSO"
.BR socket (2),
.BR socket (7)
.PP
Jonathan Simon Naylor:
\(lqThe Re-Analysis and Re-Implementation of X.25.\(rq
The URL is
.RS
.I ftp://ftp.pspt.fi/pub/ham/linux/ax25/x25doc.tgz
.RE