diff --git a/Changes b/Changes index b360280ec..d26917cc4 100644 --- a/Changes +++ b/Changes @@ -38,6 +38,29 @@ initrd.4 mtk Fix mis-ordered (.SH) sections. +connect.2 +socket.2 +rtnetlink.3 +arp.7 +ddp.7 +ip.7 +ipv6.7 +netlink.7 +packet.7 +raw.7 +rtnetlink.7 +socket.7 +tcp.7 +udp.7 +unix.7 +x25.7 + mtk + s/PF_/AF_/ for socket family constants. Reasons: the AF_ and + PF_ constants have always had the same values; there never has + been a protocol family that had more than one address family, + and POSIX.1-2001 specifies only the AF_* constants. + + Typographical or grammatical errors have been corrected in several other places. diff --git a/man2/connect.2 b/man2/connect.2 index 5c14002c9..b78e375be 100644 --- a/man2/connect.2 +++ b/man2/connect.2 @@ -1,8 +1,268 @@ -.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual" +.\" Hey Emacs! This file is -*- nroff -*- source. +.\" +.\" Copyright 1993 Rickard E. Faith (faith@cs.unc.edu) +.\" Portions extracted from /usr/include/sys/socket.h, which does not have +.\" any authorship information in it. It is probably available under the GPL. +.\" +.\" Permission is granted to make and distribute verbatim copies of this +.\" manual provided the copyright notice and this permission notice are +.\" preserved on all copies. +.\" +.\" Permission is granted to copy and distribute modified versions of this +.\" manual under the conditions for verbatim copying, provided that the +.\" entire resulting derived work is distributed under the terms of a +.\" permission notice identical to this one.
+.\" +.\" Since the Linux kernel and libraries are constantly changing, this +.\" manual page may be incorrect or out-of-date. The author(s) assume no +.\" responsibility for errors or omissions, or for damages resulting from +.\" the use of the information contained herein. The author(s) may not +.\" have taken the same level of care in the production of this manual, +.\" which is licensed free of charge, as they might when working +.\" professionally. +.\" +.\" Formatted or processed versions of this manual, if unaccompanied by +.\" the source, must acknowledge the copyright and authors of this work. +.\" +.\" +.\" Other portions are from the 6.9 (Berkeley) 3/10/91 man page: +.\" +.\" Copyright (c) 1983 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" Modified 1997-01-31 by Eric S. Raymond +.\" Modified 1998, 1999 by Andi Kleen +.\" Modified 2004-06-23 by Michael Kerrisk +.\" +.TH CONNECT 2 2007-12-28 "Linux" "Linux Programmer's Manual" +.SH NAME +connect \- initiate a connection on a socket +.SH SYNOPSIS +.nf +.BR "#include " " /* See NOTES */" +.br +.B #include +.sp +.BI "int connect(int " sockfd ", const struct sockaddr *" serv_addr , +.BI " socklen_t " addrlen ); +.fi +.SH DESCRIPTION +The +.BR connect () +system call connects the socket referred to by the file descriptor +.I sockfd +to the address specified by +.IR serv_addr . +The +.I addrlen +argument specifies the size of +.IR serv_addr . +The format of the address in +.I serv_addr +is determined by the address space of the socket +.IR sockfd ; +see +.BR socket (2) +for further details. + +If the socket +.I sockfd +is of type +.B SOCK_DGRAM +then +.I serv_addr +is the address to which datagrams are sent by default, and the only +address from which datagrams are received. +If the socket is of type +.B SOCK_STREAM +or +.BR SOCK_SEQPACKET , +this call attempts to make a connection to the socket that is bound +to the address specified by +.IR serv_addr . 
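The SOCK_DGRAM case can be illustrated with a minimal sketch. It assumes an IPv4 loopback interface is available. The function name udp_default_peer_demo and the "ping" payload are hypothetical. After connect(), the sender calls send(2) without naming a destination, and the datagram goes to the connected peer.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Bind a receiver to an ephemeral loopback port, connect() a UDP sender
   to it, then send() without naming a destination. Returns the number of
   bytes the receiver got, or -1 on error. */
ssize_t udp_default_peer_demo(char *buf, size_t buflen)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    ssize_t result = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx == -1 || tx == -1)
        goto out;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel pick a port */
    if (bind(rx, (struct sockaddr *) &addr, sizeof(addr)) == -1 ||
        getsockname(rx, (struct sockaddr *) &addr, &len) == -1)
        goto out;

    /* After connect(), addr is the default destination for send(). */
    if (connect(tx, (struct sockaddr *) &addr, sizeof(addr)) == -1 ||
        send(tx, "ping", 4, 0) != 4)
        goto out;

    result = recv(rx, buf, buflen, 0);
out:
    if (rx != -1)
        close(rx);
    if (tx != -1)
        close(tx);
    return result;
}
```

The same connected sender will also accept datagrams only from that peer, which is the second half of the behavior described above.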
+.PP +Generally, connection-based protocol sockets may successfully +.BR connect () +only once; connectionless protocol sockets may use +.BR connect () +multiple times to change their association. +Connectionless sockets may +dissolve the association by connecting to an address with the +.I sa_family +member of +.I sockaddr +set to +.BR AF_UNSPEC +(supported on Linux since kernel 2.2). +.SH "RETURN VALUE" +If the connection or binding succeeds, zero is returned. +On error, \-1 is returned, and +.I errno +is set appropriately. +.SH ERRORS +The following are general socket errors only. +There may be other domain-specific error codes. +.TP +.B EACCES +For Unix domain sockets, which are identified by pathname: +Write permission is denied on the socket file, +or search permission is denied for one of the directories +in the path prefix. +(See also +.BR path_resolution (7).) +.TP +.BR EACCES ", " EPERM +The user tried to connect to a broadcast address without having the socket +broadcast flag enabled or the connection request failed because of a local +firewall rule. +.TP +.B EADDRINUSE +Local address is already in use. +.TP +.B EAFNOSUPPORT +The passed address didn't have the correct address family in its +.I sa_family +field. +.TP +.B EAGAIN +No more free local ports or insufficient entries in the routing cache. +For +.B PF_INET +see the +.I net.ipv4.ip_local_port_range +sysctl in +.BR ip (7) +on how to increase the number of local ports. +.TP +.B EALREADY +The socket is non-blocking and a previous connection attempt has not yet +been completed. +.TP +.B EBADF +The file descriptor is not a valid index in the descriptor table. +.TP +.B ECONNREFUSED +No-one listening on the remote address. +.TP +.B EFAULT +The socket structure address is outside the user's address space. +.TP +.B EINPROGRESS +The socket is non-blocking and the connection cannot be completed +immediately. 
+It is possible to +.BR select (2) +or +.BR poll (2) +for completion by selecting the socket for writing. +After +.BR select (2) +indicates writability, use +.BR getsockopt (2) +to read the +.B SO_ERROR +option at level +.B SOL_SOCKET +to determine whether +.BR connect () +completed successfully +.RB ( SO_ERROR +is zero) or unsuccessfully +.RB ( SO_ERROR +is one of the usual error codes listed here, +explaining the reason for the failure). +.TP +.B EINTR +The system call was interrupted by a signal that was caught; see +.BR signal (7). +.\" For TCP, the connection will complete asynchronously. +.\" See http://lkml.org/lkml/2005/7/12/254 +.TP +.B EISCONN +The socket is already connected. +.TP +.B ENETUNREACH +Network is unreachable. +.TP +.B ENOTSOCK +The file descriptor is not associated with a socket. +.TP +.B ETIMEDOUT +Timeout while attempting connection. +The server may be too +busy to accept new connections. +Note that for IP sockets the timeout may +be very long when syncookies are enabled on the server. +.SH "CONFORMING TO" +SVr4, 4.4BSD, (the +.BR connect () +function first appeared in 4.2BSD), POSIX.1-2001. +.\" SVr4 documents the additional +.\" general error codes +.\" .BR EADDRNOTAVAIL , +.\" .BR EINVAL , +.\" .BR EAFNOSUPPORT , +.\" .BR EALREADY , +.\" .BR EINTR , +.\" .BR EPROTOTYPE , +.\" and +.\" .BR ENOSR . +.\" It also +.\" documents many additional error conditions not described here. +.SH NOTES +POSIX.1-2001 does not require the inclusion of +.IR , +and this header file is not required on Linux. +However, some historical (BSD) implementations required this header +file, and portable applications are probably wise to include it. + +The third argument of +.BR connect () +is in reality an +.I int +(and this is what 4.x BSD and libc4 and libc5 have). +Some POSIX confusion resulted in the present +.IR socklen_t , +also used by glibc. +See also +.BR accept (2). +.SH EXAMPLE +An example of the use of +.BR connect () +is shown in +.BR getaddrinfo (3). 
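The EINPROGRESS procedure described above can be sketched as follows: wait for writability, then read SO_ERROR. This is a minimal illustration, not a definitive implementation. It assumes IPv4 and uses poll(2) rather than select(2). The names connect_with_timeout and fail_close are hypothetical helpers.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Close fd while preserving the errno that caused the failure. */
static int fail_close(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/* Connect a non-blocking TCP socket to *addr, waiting at most timeout_ms
   for completion. Returns the connected (still non-blocking) descriptor,
   or -1 with errno set to the reason for the failure. */
int connect_with_timeout(const struct sockaddr_in *addr, int timeout_ms)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    int flags = fcntl(fd, F_GETFL);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return fail_close(fd);

    if (connect(fd, (const struct sockaddr *) addr, sizeof(*addr)) == 0)
        return fd;                      /* completed immediately */
    if (errno != EINPROGRESS)
        return fail_close(fd);

    /* Writability signals that the connection attempt has finished. */
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int n = poll(&pfd, 1, timeout_ms);
    if (n == -1)
        return fail_close(fd);
    if (n == 0) {
        errno = ETIMEDOUT;
        return fail_close(fd);
    }

    /* SO_ERROR is 0 on success, otherwise the error code for the failure. */
    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
        return fail_close(fd);
    if (err != 0) {
        errno = err;
        return fail_close(fd);
    }
    return fd;
}
```

The caller can clear O_NONBLOCK again with fcntl(2) if it wants blocking I/O after the connection is established.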
+.SH "SEE ALSO" +.BR accept (2), +.BR bind (2), +.BR getsockname (2), +.BR listen (2), +.BR socket (2), +.BR path_resolution (7) diff --git a/man2/socket.2 b/man2/socket.2 index 4dde28bfe..9a17b2cfd 100644 --- a/man2/socket.2 +++ b/man2/socket.2 @@ -1,8 +1,382 @@ -.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual" +'\" t +.\" Copyright (c) 1983, 1991 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" 3. All advertising materials mentioning features or use of this software +.\" must display the following acknowledgement: +.\" This product includes software developed by the University of +.\" California, Berkeley and its contributors. +.\" 4. Neither the name of the University nor the names of its contributors +.\" may be used to endorse or promote products derived from this software +.\" without specific prior written permission. 
+.\" +.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" $Id: socket.2,v 1.4 1999/05/13 11:33:42 freitag Exp $ +.\" +.\" Modified 1993-07-24 by Rik Faith +.\" Modified 1996-10-22 by Eric S. Raymond +.\" Modified 1998, 1999 by Andi Kleen +.\" Modified 2002-07-17 by Michael Kerrisk +.\" Modified 2004-06-17 by Michael Kerrisk +.\" +.TH SOCKET 2 2004-06-17 "Linux" "Linux Programmer's Manual" +.SH NAME +socket \- create an endpoint for communication +.SH SYNOPSIS +.BR "#include " " /* See NOTES */" +.br +.B #include +.sp +.BI "int socket(int " domain ", int " type ", int " protocol ); +.SH DESCRIPTION +.BR socket () +creates an endpoint for communication and returns a descriptor. +.PP +The +.I domain +argument specifies a communication domain; this selects the protocol +family which will be used for communication. +These families are defined in +.IR . +The currently understood formats include: +.TS +tab(:); +l l l. 
+Name:Purpose:Man page +T{ +.BR PF_UNIX ", " PF_LOCAL +T}:T{ +Local communication +T}:T{ +.BR unix (7) +T} +T{ +.B PF_INET +T}:IPv4 Internet protocols:T{ +.BR ip (7) +T} +T{ +.B PF_INET6 +T}:IPv6 Internet protocols:T{ +.BR ipv6 (7) +T} +T{ +.B PF_IPX +T}:IPX \- Novell protocols: +T{ +.B PF_NETLINK +T}:T{ +Kernel user interface device +T}:T{ +.BR netlink (7) +T} +T{ +.B PF_X25 +T}:ITU-T X.25 / ISO-8208 protocol:T{ +.BR x25 (7) +T} +T{ +.B PF_AX25 +T}:T{ +Amateur radio AX.25 protocol +T}: +T{ +.B PF_ATMPVC +T}:Access to raw ATM PVCs: +T{ +.B PF_APPLETALK +T}:Appletalk:T{ +.BR ddp (7) +T} +T{ +.B PF_PACKET +T}:T{ +Low level packet interface +T}:T{ +.BR packet (7) +T} +.TE +.PP +The socket has the indicated +.IR type , +which specifies the communication semantics. +Currently defined types +are: +.TP +.B SOCK_STREAM +Provides sequenced, reliable, two-way, connection-based byte streams. +An out-of-band data transmission mechanism may be supported. +.TP +.B SOCK_DGRAM +Supports datagrams (connectionless, unreliable messages of a fixed +maximum length). +.TP +.B SOCK_SEQPACKET +Provides a sequenced, reliable, two-way connection-based data +transmission path for datagrams of fixed maximum length; a consumer is +required to read an entire packet with each input system call. +.TP +.B SOCK_RAW +Provides raw network protocol access. +.TP +.B SOCK_RDM +Provides a reliable datagram layer that does not guarantee ordering. +.TP +.B SOCK_PACKET +Obsolete and should not be used in new programs; +see +.BR packet (7). +.PP +Some socket types may not be implemented by all protocol families; +for example, +.B SOCK_SEQPACKET +is not implemented for +.BR AF_INET . +.PP +The +.I protocol +specifies a particular protocol to be used with the socket. +Normally only a single protocol exists to support a particular +socket type within a given protocol family, in which case +.I protocol +can be specified as 0. 
+However, it is possible that many protocols may exist, in +which case a particular protocol must be specified in this manner. +The protocol number to use is specific to the \*(lqcommunication domain\*(rq +in which communication is to take place; see +.BR protocols (5). +See +.BR getprotoent (3) +on how to map protocol name strings to protocol numbers. +.PP +Sockets of type +.B SOCK_STREAM +are full-duplex byte streams, similar to pipes. +They do not preserve +record boundaries. +A stream socket must be in +a +.I connected +state before any data may be sent or received on it. +A connection to +another socket is created with a +.BR connect (2) +call. +Once connected, data may be transferred using +.BR read (2) +and +.BR write (2) +calls or some variant of the +.BR send (2) +and +.BR recv (2) +calls. +When a session has been completed a +.BR close (2) +may be performed. +Out-of-band data may also be transmitted as described in +.BR send (2) +and received as described in +.BR recv (2). +.PP +The communications protocols which implement a +.B SOCK_STREAM +ensure that data is not lost or duplicated. +If a piece of data for which +the peer protocol has buffer space cannot be successfully transmitted +within a reasonable length of time, then the connection is considered +to be dead. +When +.B SO_KEEPALIVE +is enabled on the socket the protocol checks in a protocol-specific +manner if the other end is still alive. +A +.B SIGPIPE +signal is raised if a process sends or receives +on a broken stream; this causes naive processes, +which do not handle the signal, to exit. +.B SOCK_SEQPACKET +sockets employ the same system calls as +.B SOCK_STREAM +sockets. +The only difference is that +.BR read (2) +calls will return only the amount of data requested, +and any data remaining in the arriving packet will be discarded. +Also all message boundaries in incoming datagrams are preserved. 
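The difference between SOCK_SEQPACKET and SOCK_STREAM reads can be observed with a UNIX domain socket pair. This sketch assumes Linux, where AF_UNIX supports SOCK_SEQPACKET, and the function name is hypothetical. It reads each 5-byte record with a 3-byte buffer. The tail of each record is discarded, and the next read starts at the next record.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send two 5-byte records over an AF_UNIX SOCK_SEQPACKET pair, then read
   each with a 3-byte buffer. first and second receive NUL-terminated
   copies of what was read. Returns 0 on success, -1 on error. */
int seqpacket_truncation_demo(char first[4], char second[4])
{
    int sv[2];
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) == -1)
        return -1;

    memset(first, 0, 4);
    memset(second, 0, 4);
    if (write(sv[0], "hello", 5) == 5 && write(sv[0], "world", 5) == 5 &&
        read(sv[1], first, 3) == 3 &&   /* gets "hel"; "lo" is discarded */
        read(sv[1], second, 3) == 3)    /* next record: "wor" */
        rc = 0;

    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

With SOCK_STREAM, the second read would instead return "lo" followed by the start of "world", because a stream keeps no record boundaries.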
+.PP +.B SOCK_DGRAM +and +.B SOCK_RAW +sockets allow sending of datagrams to correspondents named in +.BR sendto (2) +calls. +Datagrams are generally received with +.BR recvfrom (2), +which returns the next datagram along with the address of its sender. +.PP +.B SOCK_PACKET +is an obsolete socket type to receive raw packets directly from the +device driver. +Use +.BR packet (7) +instead. +.PP +An +.BR fcntl (2) +.B F_SETOWN +operation can be used to specify a process or process group to receive a +.B SIGURG +signal when the out-of-band data arrives or +.B SIGPIPE +signal when a +.B SOCK_STREAM +connection breaks unexpectedly. +This operation may also be used to set the process or process group +that receives the I/O and asynchronous notification of I/O events via +.BR SIGIO . +Using +.B F_SETOWN +is equivalent to an +.BR ioctl (2) +call with the +.B FIOSETOWN +or +.B SIOCSPGRP +argument. +.PP +When the network signals an error condition to the protocol module (e.g., +using an ICMP message for IP), the pending error flag is set for the socket. +The next operation on this socket will return the error code of the pending +error. +For some protocols it is possible to enable a per-socket error queue +to retrieve detailed information about the error; see +.B IP_RECVERR +in +.BR ip (7). +.PP +The operation of sockets is controlled by socket level +.IR options . +These options are defined in +.IR . +The functions +.BR setsockopt (2) +and +.BR getsockopt (2) +are used to set and get options, respectively. +.SH "RETURN VALUE" +On success, a file descriptor for the new socket is returned. +On error, \-1 is returned, and +.I errno +is set appropriately. +.SH ERRORS +.TP +.B EACCES +Permission to create a socket of the specified type and/or protocol +is denied. +.TP +.B EAFNOSUPPORT +The implementation does not support the specified address family. +.TP +.B EINVAL +Unknown protocol, or protocol family not available. +.TP +.B EMFILE +Process file table overflow.
+.TP +.B ENFILE +The system limit on the total number of open files has been reached. +.TP +.BR ENOBUFS " or " ENOMEM +Insufficient memory is available. +The socket cannot be +created until sufficient resources are freed. +.TP +.B EPROTONOSUPPORT +The protocol type or the specified protocol is not +supported within this domain. +.PP +Other errors may be generated by the underlying protocol modules. +.SH "CONFORMING TO" +4.4BSD, POSIX.1-2001. +.BR socket () +appeared in 4.2BSD. +It is generally portable to/from +non-BSD systems supporting clones of the BSD socket layer (including +System V variants). +.SH NOTES +POSIX.1-2001 does not require the inclusion of +.IR , +and this header file is not required on Linux. +However, some historical (BSD) implementations required this header +file, and portable applications are probably wise to include it. + +The manifest constants used under 4.x BSD for protocol families +are +.BR PF_UNIX , +.BR PF_INET , +etc., while +.B AF_UNIX +etc. are used for address +families. +However, the BSD man page already promises: "The protocol +family generally is the same as the address family", and subsequent +standards use AF_* everywhere. +.SH BUGS +.B SOCK_UUCP +is not implemented yet. +.SH EXAMPLE +An example of the use of +.BR socket () +is shown in +.BR getaddrinfo (3). +.SH "SEE ALSO" +.BR accept (2), +.BR bind (2), +.BR connect (2), +.BR fcntl (2), +.BR getpeername (2), +.BR getsockname (2), +.BR getsockopt (2), +.BR ioctl (2), +.BR listen (2), +.BR read (2), +.BR recv (2), +.BR select (2), +.BR send (2), +.BR shutdown (2), +.BR socketpair (2), +.BR write (2), +.BR getprotoent (3), +.BR ip (7), +.BR socket (7), +.BR tcp (7), +.BR udp (7), +.BR unix (7) +.PP +\(lqAn Introductory 4.3BSD Interprocess Communication Tutorial\(rq +is reprinted in +.I UNIX Programmer's Supplementary Documents Volume 1. +.PP +\(lqBSD Interprocess Communication Tutorial\(rq +is reprinted in +.I UNIX Programmer's Supplementary Documents Volume 1.
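Some of the errors listed above can be provoked deliberately. The sketch below uses a hypothetical helper, socket_errno, that returns the errno set by a failed socket() call. On Linux, an out-of-range domain such as 12345 is expected to give EAFNOSUPPORT. Asking for IPPROTO_UDP on a SOCK_STREAM socket is expected to give EPROTONOSUPPORT. Other systems may report different codes.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to create a socket and report why it failed.
   Returns 0 if socket() succeeded (the descriptor is closed again),
   otherwise the errno value that socket() set. */
int socket_errno(int domain, int type, int protocol)
{
    int fd = socket(domain, type, protocol);
    if (fd == -1)
        return errno;
    close(fd);
    return 0;
}
```

For example, socket_errno(AF_INET, SOCK_STREAM, 0) should return 0, because protocol 0 selects the default protocol (TCP) for that type.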
diff --git a/man3/rtnetlink.3 b/man3/rtnetlink.3 index a7d7d2f37..581e2409d 100644 --- a/man3/rtnetlink.3 +++ b/man3/rtnetlink.3 @@ -1,8 +1,118 @@ -.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual" -.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual" -.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual" -.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual" -.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual" -.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual" -.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual" -.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual" +.\" This man page is Copyright (C) 1999 Andi Kleen . +.\" Permission is granted to distribute possibly modified copies +.\" of this page provided the header is included verbatim, +.\" and in case of nontrivial modification author and date +.\" of the modification is added to the header. +.\" $Id: rtnetlink.3,v 1.2 1999/05/18 10:35:10 freitag Exp $ +.TH RTNETLINK 3 1999-05-14 "GNU" "Linux Programmer's Manual" +.SH NAME +rtnetlink \- macros to manipulate rtnetlink messages +.SH SYNOPSIS +.B #include +.br +.B #include +.br +.B #include +.br +.B #include + +.BI "rtnetlink_socket = socket(PF_NETLINK, int " socket_type \ +", NETLINK_ROUTE);" +.sp +.BI "int RTA_OK(struct rtattr *" rta ", int " rtabuflen ); +.sp +.BI "void *RTA_DATA(struct rtattr *" rta ); +.sp +.BI "unsigned int RTA_PAYLOAD(struct rtattr *" rta ); +.sp +.BI "struct rtattr *RTA_NEXT(struct rtattr *" rta \ +", unsigned int " rtabuflen ); +.sp +.BI "unsigned int RTA_LENGTH(unsigned int " length ); +.sp +.BI "unsigned int RTA_SPACE(unsigned int " length ); +.SH DESCRIPTION +All +.BR rtnetlink (7) +messages consist of a +.BR netlink (7) +message header and appended attributes. +The attributes should be manipulated only +using the macros provided here.
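The attribute macros in the SYNOPSIS are typically used together. RTA_LENGTH and RTA_SPACE are used when building a message. RTA_OK, RTA_NEXT, RTA_DATA, and RTA_PAYLOAD are used when parsing one. The following minimal sketch packs two attributes into a local buffer and walks them again, without opening a netlink socket. The attribute types (IFLA_MTU, IFLA_TXQLEN), the values 1500 and 42, and the function name walk_attrs are arbitrary choices for illustration.

```c
#include <linux/rtnetlink.h>
#include <string.h>

/* Pack two unsigned-int attributes into a buffer with RTA_LENGTH and
   RTA_SPACE, then walk them with RTA_OK and RTA_NEXT.
   Returns the number of attributes visited; *sum receives the sum of
   their payloads. */
int walk_attrs(unsigned int *sum)
{
    unsigned int storage[32];           /* unsigned int keeps buf aligned */
    char *buf = (char *) storage;
    const unsigned short types[2] = { IFLA_MTU, IFLA_TXQLEN };
    const unsigned int values[2] = { 1500, 42 };
    int len = 0;

    memset(storage, 0, sizeof(storage));
    for (int i = 0; i < 2; i++) {
        struct rtattr *rta = (struct rtattr *) (buf + len);
        rta->rta_type = types[i];
        rta->rta_len = RTA_LENGTH(sizeof(values[i]));  /* header + data */
        memcpy(RTA_DATA(rta), &values[i], sizeof(values[i]));
        len += RTA_SPACE(sizeof(values[i]));           /* aligned size */
    }

    int count = 0;
    int remaining = len;                /* RTA_NEXT decrements this */
    *sum = 0;
    for (struct rtattr *rta = (struct rtattr *) buf; RTA_OK(rta, remaining);
         rta = RTA_NEXT(rta, remaining)) {
        unsigned int v;
        if (RTA_PAYLOAD(rta) != sizeof(v))
            continue;                   /* skip unexpected payload sizes */
        memcpy(&v, RTA_DATA(rta), sizeof(v));
        *sum += v;
        count++;
    }
    return count;
}
```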
+.PP +.BI RTA_OK( rta ", " attrlen ) +returns true if +.I rta +points to a valid routing attribute; +.I attrlen +is the running length of the attribute buffer. +When it returns false, you must assume there are no more attributes in the +message, even if +.I attrlen +is non-zero. +.PP +.BI RTA_DATA( rta ) +returns a pointer to the start of this attribute's data. +.PP +.BI RTA_PAYLOAD( rta ) +returns the length of this attribute's data. +.PP +.BI RTA_NEXT( rta ", " attrlen ) +gets the next attribute after +.IR rta . +Calling this macro will update +.IR attrlen . +You should use +.B RTA_OK +to check the validity of the returned pointer. +.PP +.BI RTA_LENGTH( len ) +returns the length which is required for +.I len +bytes of data plus the header. +.PP +.BI RTA_SPACE( len ) +returns the amount of space which will be needed in a message with +.I len +bytes of data. +.SH CONFORMING TO +These macros are non-standard Linux extensions. +.SH BUGS +This manual page is incomplete. +.SH EXAMPLE +.\" FIXME ?
would be better to use libnetlink in the EXAMPLE code here + +Creating an rtnetlink message to set the MTU of a device: +.nf + + struct { + struct nlmsghdr nh; + struct ifinfomsg ifi; + char attrbuf[512]; + } req; + + struct rtattr *rta; + unsigned int mtu = 1000; + + int rtnetlink_sk = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE); + + memset(&req, 0, sizeof(req)); + req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)); + req.nh.nlmsg_flags = NLM_F_REQUEST; + req.nh.nlmsg_type = RTM_NEWLINK; + req.ifi.ifi_family = AF_UNSPEC; + req.ifi.ifi_index = INTERFACE_INDEX; + req.ifi.ifi_change = 0xffffffff; /* ???*/ + rta = (struct rtattr *)(((char *) &req) + + NLMSG_ALIGN(req.nh.nlmsg_len)); + rta\->rta_type = IFLA_MTU; + rta\->rta_len = RTA_LENGTH(sizeof(mtu)); + req.nh.nlmsg_len = NLMSG_ALIGN(req.nh.nlmsg_len) + + RTA_LENGTH(sizeof(mtu)); + memcpy(RTA_DATA(rta), &mtu, sizeof(mtu)); + send(rtnetlink_sk, &req, req.nh.nlmsg_len, 0); +.fi +.SH "SEE ALSO" +.BR netlink (3), +.BR netlink (7), +.BR rtnetlink (7) diff --git a/man7/arp.7 b/man7/arp.7 index 768543d49..c1879c230 100644 --- a/man7/arp.7 +++ b/man7/arp.7 @@ -1,8 +1,275 @@ -.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual" +'\" t +.\" This man page is Copyright (C) 1999 Matthew Wilcox . +.\" Permission is granted to distribute possibly modified copies +.\" of this page provided the header is included verbatim, +.\" and in case of nontrivial modification author and date +.\" of the modification is added to the header.
+.\" Modified June 1999 Andi Kleen +.\" $Id: arp.7,v 1.10 2000/04/27 19:31:38 ak Exp $ +.TH ARP 7 2007-07-27 "Linux" "Linux Programmer's Manual" +.SH NAME +arp \- Linux ARP kernel module. +.SH DESCRIPTION +This kernel protocol module implements the Address Resolution +Protocol defined in RFC\ 826. +It is used to convert between layer 2 hardware addresses +and IPv4 protocol addresses on directly connected networks. +The user normally doesn't interact directly with this module except to +configure it; +instead it provides a service for other protocols in the kernel. + +A user process can receive ARP packets by using +.BR packet (7) +sockets. +There is also a mechanism for managing the ARP cache +in user-space by using +.BR netlink (7) +sockets. +The ARP table can also be controlled via +.BR ioctl (2) +on any +.B PF_INET +socket. + +The ARP module maintains a cache of mappings between hardware addresses +and protocol addresses. +The cache has a limited size, so old and less +frequently used entries are garbage-collected. +Entries which are marked +as permanent are never deleted by the garbage-collector. +The cache can +be directly manipulated by the use of ioctls and its behavior can be +tuned by the sysctls defined below. + +When there is no positive feedback for an existing mapping after some +time (see the sysctls below), a neighbor cache entry is considered stale. +Positive feedback can be obtained from a higher layer, for example, from +a successful TCP ACK. +Other protocols can signal forward progress +using the +.B MSG_CONFIRM +flag to +.BR sendmsg (2). +When there is no forward progress, ARP tries to reprobe. +It first tries to ask a local arp daemon +.B app_solicit +times for an updated MAC address. +If that fails and an old MAC address is known, a unicast probe is sent +.B ucast_solicit +times. +If that fails too, it will broadcast a new ARP +request to the network. +Requests are sent only when there is data queued +for sending.
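A lookup in the ARP cache via ioctl(2) might look like the sketch below. It fills a struct arpreq, as described under Ioctls, and issues SIOCGARP on an ordinary AF_INET datagram socket. This is a minimal illustration that assumes glibc's <net/if_arp.h> definitions. The name arp_lookup is hypothetical, and the entry must already be in the cache for the lookup to succeed.

```c
#include <arpa/inet.h>
#include <net/if_arp.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Look up the cached hardware address for IPv4 address ip on device dev
   with SIOCGARP. Returns 0 and fills mac[6] when a complete entry exists,
   -1 otherwise (bad address string, no entry, or ioctl failure). */
int arp_lookup(const char *ip, const char *dev, unsigned char mac[6])
{
    struct arpreq req;
    struct sockaddr_in sin;

    memset(&req, 0, sizeof(req));
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1)
        return -1;
    memcpy(&req.arp_pa, &sin, sizeof(sin));  /* protocol address: AF_INET */
    strncpy(req.arp_dev, dev, sizeof(req.arp_dev) - 1);  /* stays NUL-terminated */

    int fd = socket(AF_INET, SOCK_DGRAM, 0); /* any AF_INET socket will do */
    if (fd == -1)
        return -1;
    int rc = ioctl(fd, SIOCGARP, &req);
    close(fd);

    /* ATF_COM marks an entry whose hardware address has been resolved. */
    if (rc == -1 || !(req.arp_flags & ATF_COM))
        return -1;
    memcpy(mac, req.arp_ha.sa_data, 6);      /* assumes 6-byte Ethernet MAC */
    return 0;
}
```

SIOCGARP only reads the table, so unlike SIOCSARP and SIOCDARP it does not need CAP_NET_ADMIN.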
+ +Linux will automatically add a non-permanent proxy arp entry when it +receives a request for an address it forwards to and proxy arp is +enabled on the receiving interface. +When there is a reject route for the target, no proxy arp entry is added. +.SS Ioctls +Three ioctls are available on all +.B PF_INET +sockets. +They take a pointer to a +.I struct arpreq +as their argument. + +.in +4n +.nf +struct arpreq { + struct sockaddr arp_pa; /* protocol address */ + struct sockaddr arp_ha; /* hardware address */ + int arp_flags; /* flags */ + struct sockaddr arp_netmask; /* netmask of protocol address */ + char arp_dev[16]; +}; +.fi +.in + +.BR SIOCSARP ", " SIOCDARP " and " SIOCGARP +respectively set, delete and get an ARP mapping. +Setting and deleting ARP mappings are privileged operations and may +only be performed by a process with the +.B CAP_NET_ADMIN +capability or an effective UID of 0. + +.I arp_pa +must be an +.B AF_INET +address and +.I arp_ha +must have the same type as the device which is specified in +.IR arp_dev . +.I arp_dev +is a zero-terminated string which names a device. +.RS +.TS +tab(:) allbox; +c s +l l. +\fIarp_flags\fR +flag:meaning +ATF_COM:Lookup complete +ATF_PERM:Permanent entry +ATF_PUBL:Publish entry +ATF_USETRAILERS:Trailers requested +ATF_NETMASK:Use a netmask +ATF_DONTPUB:Don't answer +.TE +.RE + +.PP +If the +.B ATF_NETMASK +flag is set, then +.I arp_netmask +should be valid. +Linux 2.2 does not support proxy network ARP entries, so this +should be set to 0xffffffff, or 0 to remove an existing proxy arp entry. +.B ATF_USETRAILERS +is obsolete and should not be used. +.SS Sysctls +ARP supports a sysctl interface to configure parameters on a global +or per-interface basis. +The sysctls can be accessed by reading or writing the +.I /proc/sys/net/ipv4/neigh/*/* +files or with the +.BR sysctl (2) +interface. +Each interface in the system has its own directory in +/proc/sys/net/ipv4/neigh/.
+The setting in the "default" directory is used for all newly created +devices. +Unless otherwise specified, time-related sysctls are specified +in seconds. +.TP +.B anycast_delay +The maximum number of jiffies to delay before replying to an +IPv6 neighbor solicitation message. +Anycast support is not yet implemented. +Defaults to 1 second. +.TP +.B app_solicit +The maximum number of probes to send to the user space ARP daemon via +netlink before dropping back to multicast probes (see +.IR mcast_solicit ). +Defaults to 0. +.TP +.B base_reachable_time +Once a neighbor has been found, the entry is considered to be valid +for at least a random value between +.IR base_reachable_time "/2 and 3*" base_reachable_time /2. +An entry's validity will be extended if it receives positive feedback +from higher-level protocols. +Defaults to 30 seconds. +.TP +.B delay_first_probe_time +Delay before the first probe after it has been decided that a neighbor +is stale. +Defaults to 5 seconds. +.TP +.B gc_interval +How frequently the garbage collector for neighbor entries +should attempt to run. +Defaults to 30 seconds. +.TP +.B gc_stale_time +Determines how often to check for stale neighbor entries. +When a neighbor entry is considered stale, it is resolved again before +sending data to it. +Defaults to 60 seconds. +.TP +.B gc_thresh1 +The minimum number of entries to keep in the ARP cache. +The garbage collector will not run if there are fewer than +this number of entries in the cache. +Defaults to 128. +.TP +.B gc_thresh2 +The soft maximum number of entries to keep in the ARP cache. +The garbage collector will allow the number of entries to exceed +this for 5 seconds before collection will be performed. +Defaults to 512. +.TP +.B gc_thresh3 +The hard maximum number of entries to keep in the ARP cache. +The garbage collector will always run if there are more than +this number of entries in the cache. +Defaults to 1024.
+.TP +.B locktime +The minimum number of jiffies to keep an ARP entry in the cache. +This prevents ARP cache thrashing if there is more than one potential +mapping (generally due to network misconfiguration). +Defaults to 1 second. +.TP +.B mcast_solicit +The maximum number of attempts to resolve an address by +multicast/broadcast before marking the entry as unreachable. +Defaults to 3. +.TP +.B proxy_delay +When an ARP request for a known proxy-ARP address is received, delay up to +.I proxy_delay +jiffies before replying. +This is used to prevent network flooding in some cases. +Defaults to 0.8 seconds. +.TP +.B proxy_qlen +The maximum number of packets which may be queued to proxy-ARP addresses. +Defaults to 64. +.TP +.B retrans_time +The number of jiffies to delay before retransmitting a request. +Defaults to 1 second. +.TP +.B ucast_solicit +The maximum number of attempts to send unicast probes before asking +the ARP daemon (see +.IR app_solicit ). +Defaults to 3. +.TP +.B unres_qlen +The maximum number of packets which may be queued for each unresolved +address by other network layers. +Defaults to 3. +.SH VERSIONS +The +.I struct arpreq +changed in Linux 2.0 to include the +.I arp_dev +member and the ioctl numbers changed at the same time. +Support for the old ioctls was dropped in Linux 2.2. + +Support for proxy arp entries for networks (netmask not equal to 0xffffffff) +was dropped in Linux 2.2. +It is replaced by automatic proxy arp setup by +the kernel for all reachable hosts on other interfaces (when +forwarding and proxy arp are enabled for the interface). + +The +.I neigh/* +sysctls did not exist before Linux 2.2. +.SH BUGS +Some timer settings are specified in jiffies, which are architecture- +and kernel version-dependent; see +.BR time (7). + +There is no way to signal positive feedback from user space.
+This means connection-oriented protocols implemented in user space
+will generate excessive ARP traffic, because ndisc will regularly
+reprobe the MAC address.
+The same problem applies for some kernel protocols (e.g., NFS over UDP).
+
+This man page mashes together IPv4-specific functionality and
+functionality that is shared between IPv4 and IPv6.
+.SH "SEE ALSO"
+.BR capabilities (7),
+.BR ip (7)
+.PP
+RFC\ 826 for a description of ARP.
+.br
+RFC\ 2461 for a description of IPv6 neighbor discovery and the base
+algorithms used.
+.LP
+Linux 2.2+ IPv4 ARP uses the IPv6 algorithms when applicable.
diff --git a/man7/ddp.7 b/man7/ddp.7
index 9b672eb86..0831993fb 100644
--- a/man7/ddp.7
+++ b/man7/ddp.7
@@ -1,8 +1,251 @@
-.TH DDP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH DDP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH DDP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH DDP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH DDP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH DDP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH DDP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH DDP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
+.\" This man page is Copyright (C) 1998 Alan Cox.
+.\" Permission is granted to distribute possibly modified copies
+.\" of this page provided the header is included verbatim,
+.\" and in case of nontrivial modification author and date
+.\" of the modification is added to the header.
+.\" $Id: ddp.7,v 1.3 1999/05/13 11:33:22 freitag Exp $
+.TH DDP 7 1999-05-01 "Linux" "Linux Programmer's Manual"
+.SH NAME
+ddp \- Linux AppleTalk protocol implementation
+.SH SYNOPSIS
+.B #include
+.br
+.B #include
+.sp
+.IB ddp_socket " = socket(PF_APPLETALK, SOCK_DGRAM, 0);"
+.br
+.IB raw_socket " = socket(PF_APPLETALK, SOCK_RAW, " protocol ");"
+.SH DESCRIPTION
+Linux implements the AppleTalk protocols described in
+.IR "Inside AppleTalk" .
+Only the DDP layer and AARP are present in
+the kernel.
+They are designed to be used via the
+.B netatalk
+protocol
+libraries.
+This page documents the interface for those who wish or need to
+use the DDP layer directly.
+.PP
+The communication between AppleTalk and the user program works using a
+BSD-compatible socket interface.
+For more information on sockets, see
+.BR socket (7).
+.PP
+An AppleTalk socket is created by calling the
+.BR socket (2)
+function with a
+.B PF_APPLETALK
+socket family argument.
+Valid socket types are
+.B SOCK_DGRAM
+to open a
+.B ddp
+socket or
+.B SOCK_RAW
+to open a
+.B raw
+socket.
+.I protocol
+is the AppleTalk protocol to be received or sent.
+For
+.B SOCK_RAW
+you must specify
+.BR ATPROTO_DDP .
+.PP
+Raw sockets may be opened only by a process with effective user ID 0
+or when the process has the
+.B CAP_NET_RAW
+capability.
+.SS "Address Format"
+An AppleTalk socket address is defined as a combination of a network number,
+a node number, and a port number.
+.PP
+.in +4n
+.nf
+struct at_addr {
+    unsigned short s_net;
+    unsigned char  s_node;
+};
+
+struct sockaddr_atalk {
+    sa_family_t    sat_family;    /* address family */
+    unsigned char  sat_port;      /* port */
+    struct at_addr sat_addr;      /* net/node */
+};
+.fi
+.in
+.PP
+.I sat_family
+is always set to
+.BR AF_APPLETALK .
+.I sat_port
+contains the port.
+The port numbers below 129 are known as
+.IR "reserved ports" .
+Only processes with the effective user ID 0 or the
+.B CAP_NET_BIND_SERVICE
+capability may
+.BR bind (2)
+to these ports.
+.I sat_addr
+is the host address.
+The
+.I s_net
+member of
+.I struct at_addr
+contains the host network in network byte order.
+The value of
+.B AT_ANYNET
+is a
+wildcard and also implies \(lqthis network.\(rq
+The
+.I s_node
+member of
+.I struct at_addr
+contains the host node number.
+The value of
+.B AT_ANYNODE
+is a
+wildcard and also implies \(lqthis node.\(rq The value of
+.B ATADDR_BCAST
+is a link-local
+broadcast address.
+.\" FIXME this doesn't make sense [johnl]
+.SS "Socket Options"
+No protocol-specific socket options are supported.
+.SS Sysctls
+AppleTalk supports a sysctl interface to configure some global
+parameters.
+The sysctls can be accessed by reading or writing the
+.I /proc/sys/net/atalk/*
+files or with the
+.BR sysctl (2)
+interface.
+.TP
+.B aarp-expiry-time
+The time interval (in seconds) before an AARP cache entry expires.
+.TP
+.B aarp-resolve-time
+The time interval (in seconds) before an AARP cache entry is resolved.
+.TP
+.B aarp-retransmit-limit
+The number of retransmissions of an AARP query before the node is declared
+dead.
+.TP
+.B aarp-tick-time
+The timer rate (in seconds) for the timer driving AARP.
+.PP
+The default values match the specification and should never need to be
+changed.
+.SS Ioctls
+All ioctls described in
+.BR socket (7)
+apply to ddp.
+.\" FIXME Add a section about multicasting
+.SH ERRORS
+.\" FIXME document all errors. We should really fix the kernels to
+.\" give more uniform error returns (ENOMEM vs ENOBUFS, EPERM vs
+.\" EACCES etc.)
+.TP
+.B EACCES
+The user tried to execute an operation without the necessary permissions.
+These include sending to a broadcast address without
+having the broadcast flag set,
+and trying to bind to a reserved port without effective user ID 0 or
+.BR CAP_NET_BIND_SERVICE .
+.TP
+.B EADDRINUSE
+Tried to bind to an address already in use.
+.TP
+.B EADDRNOTAVAIL
+A nonexistent interface was requested or the requested source address was
+not local.
+.TP
+.B EAGAIN
+Operation on a non-blocking socket would block.
+.TP
+.B EALREADY
+A connection operation on a non-blocking socket is already in progress.
+.TP
+.B ECONNABORTED
+A connection was closed during an
+.BR accept (2).
+.TP
+.B EHOSTUNREACH
+No routing table entry matches the destination address.
+.TP
+.B EINVAL
+Invalid argument passed.
+.TP
+.B EISCONN
+.BR connect (2)
+was called on an already connected socket.
+.TP
+.B EMSGSIZE
+Datagram is bigger than the DDP MTU.
+.TP
+.B ENODEV
+Network device not available or not capable of sending IP.
+.TP
+.B ENOENT
+.B SIOCGSTAMP
+was called on a socket where no packet arrived.
+.TP
+.BR ENOMEM " and " ENOBUFS
+Not enough memory available.
+.TP
+.B ENOPKG
+A kernel subsystem was not configured.
+.TP
+.BR ENOPROTOOPT " and " EOPNOTSUPP
+Invalid socket option passed.
+.TP
+.B ENOTCONN
+The operation is only defined on a connected socket, but the socket wasn't
+connected.
+.TP
+.B EPERM
+User doesn't have permission to set high priority,
+make a configuration change,
+or send signals to the requested process or group.
+.TP
+.B EPIPE
+The connection was unexpectedly closed or shut down by the other end.
+.TP
+.B ESOCKTNOSUPPORT
+The socket was unconfigured, or an unknown socket type was requested.
+.SH VERSIONS
+AppleTalk is supported by Linux 2.0 or higher.
+The
+.B sysctl
+interface is
+new in Linux 2.2.
+.SH NOTES
+Be very careful with the
+.B SO_BROADCAST
+option \- it is not privileged in Linux.
+It is easy to overload the network
+with careless sending to broadcast addresses.
+.SS Compatibility
+The basic AppleTalk socket interface is compatible with
+.B netatalk
+on BSD-derived systems.
+Many BSD systems fail to check
+.B SO_BROADCAST
+when sending broadcast frames; this can lead to compatibility problems.
+.PP
+The
+raw
+socket mode is unique to Linux and exists to support the alternative CAP
+package and AppleTalk monitoring tools more easily.
+.SH BUGS
+There are too many inconsistent error values.
+.PP
+The ioctls used to configure routing tables, devices,
+AARP tables and other devices are not yet described.
+.SH "SEE ALSO"
+.BR recvmsg (2),
+.BR sendmsg (2),
+.BR capabilities (7),
+.BR socket (7)
diff --git a/man7/ip.7 b/man7/ip.7
index 336ae5a9f..f26f31bb2 100644
--- a/man7/ip.7
+++ b/man7/ip.7
@@ -1,8 +1,1033 @@
-.TH IP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH IP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH IP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH IP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH IP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH IP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH IP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH IP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
+'\" t
+.\" Don't change the line above. it tells man that tbl is needed.
+.\" This man page is Copyright (C) 1999 Andi Kleen .
+.\" Permission is granted to distribute possibly modified copies
+.\" of this page provided the header is included verbatim,
+.\" and in case of nontrivial modification author and date
+.\" of the modification is added to the header.
+.\" $Id: ip.7,v 1.19 2000/12/20 18:10:31 ak Exp $
+.TH IP 7 2001-06-19 "Linux" "Linux Programmer's Manual"
+.SH NAME
+ip \- Linux IPv4 protocol implementation
+.SH SYNOPSIS
+.B #include
+.br
+.\" .B #include -- does not exist anymore
+.\" .B #include -- never include
+.B #include
+.br
+.B #include \fR/* superset of previous */
+.sp
+.IB tcp_socket " = socket(PF_INET, SOCK_STREAM, 0);"
+.br
+.IB udp_socket " = socket(PF_INET, SOCK_DGRAM, 0);"
+.br
+.IB raw_socket " = socket(PF_INET, SOCK_RAW, " protocol ");"
+.SH DESCRIPTION
+Linux implements the Internet Protocol, version 4,
+described in RFC\ 791 and RFC\ 1122.
+.B ip
+contains a level 2
+multicasting implementation conforming to RFC\ 1112.
+It also contains an IP router including a packet filter.
+.\" FIXME has someone verified that 2.1 is really 1812 compliant?
+.PP
+The programming interface is BSD sockets compatible.
+For more information on sockets, see
+.BR socket (7).
+.PP
+An IP socket is created by calling the
+.BR socket (2)
+function as
+.BR "socket(PF_INET, socket_type, protocol)" .
+Valid socket types are
+.B SOCK_STREAM
+to open a
+.BR tcp (7)
+socket,
+.B SOCK_DGRAM
+to open a
+.BR udp (7)
+socket, or
+.B SOCK_RAW
+to open a
+.BR raw (7)
+socket to access the IP protocol directly.
+.I protocol
+is the IP protocol in the IP header to be received or sent.
+The only valid values for
+.I protocol
+are
+.B 0
+and
+.B IPPROTO_TCP
+for TCP sockets and
+.B 0
+and
+.B IPPROTO_UDP
+for UDP sockets.
+For
+.B SOCK_RAW
+you may specify
+a valid IANA IP protocol defined in
+RFC\ 1700
+assigned numbers.
+.PP
+.\" FIXME ip current does an autobind in listen, but I'm not sure
+.\" if that should be documented.
+When a process wants to receive new incoming packets or connections, it
+should bind a socket to a local interface address using
+.BR bind (2).
+Only one IP socket may be bound to any given local (address, port) pair.
+When
+.B INADDR_ANY
+is specified in the bind call, the socket will be bound to
+.I all
+local interfaces.
+When
+.BR listen (2)
+or
+.BR connect (2)
+are called on an unbound socket, it is automatically bound to a
+random free port with the local address set to
+.BR INADDR_ANY .
+
+A TCP local socket address that has been bound is unavailable for
+some time after closing,
+unless the
+.B SO_REUSEADDR
+flag has been set.
+Care should be taken when using this flag as it
+makes TCP less reliable.
+.SS Address Format
+An IP socket address is defined as a combination of an IP interface
+address and a 16-bit port number.
+The basic IP protocol does not supply port numbers; they
+are implemented by higher-level protocols like
+.BR udp (7)
+and
+.BR tcp (7).
+On raw sockets,
+.I sin_port
+is set to the IP protocol.
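The binding rules above can be illustrated with a short C sketch (added for illustration; it is not part of the original page). It fills in a struct sockaddr_in with INADDR_ANY and a port converted to network byte order, then binds a UDP socket to it. The helper names make_any_addr() and bind_udp_any() are hypothetical, the sketch uses AF_INET (which has the same value as PF_INET), and it assumes a Linux system with glibc headers.

```c
#include <arpa/inet.h>   /* htonl(), htons() */
#include <errno.h>       /* errno */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_ANY */
#include <stdint.h>      /* uint16_t */
#include <string.h>      /* memset() */
#include <sys/socket.h>  /* socket(), bind() */
#include <unistd.h>      /* close() */

/* Hypothetical helper: wildcard IPv4 address for a host-order port. */
struct sockaddr_in make_any_addr(uint16_t port)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));            /* also zeroes the padding */
    addr.sin_family = AF_INET;                 /* must always be set */
    addr.sin_port = htons(port);               /* network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* all local interfaces */
    return addr;
}

/* Hypothetical helper: return a UDP socket bound to INADDR_ANY:port,
   or -1 with errno set by socket(2) or bind(2). */
int bind_udp_any(uint16_t port)
{
    struct sockaddr_in addr = make_any_addr(port);
    int fd, saved_errno;

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1) {
        saved_errno = errno;   /* keep bind()'s error across close() */
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return fd;
}
```

Passing port 0 asks the kernel to choose a free port, which can be read back with getsockname(2); a port below 1024 is expected to fail with EACCES unless the process has CAP_NET_BIND_SERVICE, as described under ERRORS.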
+.PP
+.in +4n
+.nf
+struct sockaddr_in {
+    sa_family_t    sin_family; /* address family: AF_INET */
+    uint16_t       sin_port;   /* port in network byte order */
+    struct in_addr sin_addr;   /* internet address */
+};
+
+/* Internet address. */
+struct in_addr {
+    uint32_t       s_addr;     /* address in network byte order */
+};
+.fi
+.in
+.PP
+.I sin_family
+is always set to
+.BR AF_INET .
+This is required; in Linux 2.2 most networking functions return
+.B EINVAL
+when this setting is missing.
+.I sin_port
+contains the port in network byte order.
+The port numbers below 1024 are called
+.IR "reserved ports" .
+Only privileged processes (i.e., those having the
+.B CAP_NET_BIND_SERVICE
+capability) may
+.BR bind (2)
+to these ports.
+Note that the raw IPv4 protocol as such has no concept of a
+port; ports are implemented only by higher-level protocols like
+.BR tcp (7)
+and
+.BR udp (7).
+.PP
+.I sin_addr
+is the IP host address.
+The
+.I s_addr
+member of
+.I struct in_addr
+contains the host interface address in network byte order.
+.I in_addr
+should be assigned one of the INADDR_* values (e.g.,
+.BR INADDR_ANY )
+or set using the
+.BR inet_aton (3),
+.BR inet_addr (3),
+.BR inet_makeaddr (3)
+library functions or directly with the name resolver (see
+.BR gethostbyname (3)).
+IPv4 addresses are divided into unicast, broadcast
+and multicast addresses.
+Unicast addresses specify a single interface of a host,
+broadcast addresses specify all hosts on a network and multicast
+addresses address all hosts in a multicast group.
+Datagrams to broadcast addresses can be sent or received only when the
+.B SO_BROADCAST
+socket flag is set.
+In the current implementation, connection-oriented sockets are only allowed
+to use unicast addresses.
+.\" Leave a loophole for XTP @)
+
+Note that the address and the port are always stored in
+network byte order.
+In particular, this means that you need to call
+.BR htons (3)
+on the number that is assigned to a port.
+All address/port manipulation
+functions in the standard library work in network byte order.
+
+There are several special addresses:
+.B INADDR_LOOPBACK
+(127.0.0.1)
+always refers to the local host via the loopback device;
+.B INADDR_ANY
+(0.0.0.0)
+means any address for binding;
+.B INADDR_BROADCAST
+(255.255.255.255)
+means any host and has the same effect on bind as
+.B INADDR_ANY
+for historical reasons.
+.SS Socket Options
+IP supports some protocol-specific socket options that can be set with
+.BR setsockopt (2)
+and read with
+.BR getsockopt (2).
+The socket option level for IP is
+.BR IPPROTO_IP .
+.\" or SOL_IP on Linux
+A boolean integer flag is zero when it is false, otherwise true.
+.\"
+.\" FIXME Document IP_FREEBIND
+.\"
+.TP
+.B IP_OPTIONS
+Set or get the IP options to be sent with every packet from this
+socket.
+The arguments are a pointer to a memory buffer containing the options
+and the option length.
+The
+.BR setsockopt (2)
+call sets the IP options associated with a socket.
+The maximum option size for IPv4 is 40 bytes.
+See RFC\ 791 for the allowed
+options.
+When the initial connection request packet for a
+.B SOCK_STREAM
+socket contains IP options, the IP options will be set automatically
+to the options from the initial packet with routing headers reversed.
+Incoming packets are not allowed to change options after the connection
+is established.
+The processing of all incoming source routing options
+is disabled by default and can be enabled by using the
+.B accept_source_route
+sysctl.
+Other options like timestamps are still handled.
+For datagram sockets, IP options can be set only by the local user.
+Calling
+.BR getsockopt (2)
+with
+.B IP_OPTIONS
+puts the current IP options used for sending into the supplied buffer.
+.TP
+.B IP_PKTINFO
+Pass an
+.B IP_PKTINFO
+ancillary message that contains an
+.I in_pktinfo
+structure that supplies some information about the incoming packet.
+This only works for datagram-oriented sockets.
+The argument is a flag that tells the socket whether the
+.B IP_PKTINFO
+message should be passed or not.
+The message itself can only be sent/retrieved
+as a control message with a packet using
+.BR recvmsg (2)
+or
+.BR sendmsg (2).
+.IP
+.in +4n
+.nf
+struct in_pktinfo {
+    unsigned int   ipi_ifindex;  /* Interface index */
+    struct in_addr ipi_spec_dst; /* Local address */
+    struct in_addr ipi_addr;     /* Header destination
+                                    address */
+};
+.fi
+.in
+.IP
+.\" FIXME elaborate on that.
+.I ipi_ifindex
+is the unique index of the interface the packet was received on.
+.I ipi_spec_dst
+is the local address of the packet and
+.I ipi_addr
+is the destination address in the packet header.
+If
+.B IP_PKTINFO
+is passed to
+.BR sendmsg (2)
+and
+.\" This field is grossly misnamed
+.I ipi_spec_dst
+is not zero, then it is used as the local source address for the routing
+table lookup and for setting up IP source route options.
+When
+.I ipi_ifindex
+is not zero, the primary local address of the interface specified by the
+index overwrites
+.I ipi_spec_dst
+for the routing table lookup.
+.TP
+.B IP_RECVTOS
+If enabled, the
+.B IP_TOS
+ancillary message is passed with incoming packets.
+It contains a byte which specifies the Type of Service/Precedence
+field of the packet header.
+Expects a boolean integer flag.
+.TP
+.B IP_RECVTTL
+When this flag is set,
+pass an
+.B IP_TTL
+control message with the time-to-live
+field of the received packet as a byte.
+Not supported for
+.B SOCK_STREAM
+sockets.
+.TP
+.B IP_RECVOPTS
+Pass all incoming IP options to the user in an
+.B IP_OPTIONS
+control message.
+The routing header and other options are already filled in
+for the local host.
+Not supported for
+.B SOCK_STREAM
+sockets.
+.TP
+.B IP_RETOPTS
+Identical to
+.B IP_RECVOPTS
+but returns raw unprocessed options with timestamp and route record
+options not filled in for this hop.
+.TP
+.B IP_TOS
+Set or receive the Type-Of-Service (TOS) field that is sent
+with every IP packet originating from this socket.
+It is used to prioritize packets on the network.
+TOS is a byte.
+There are some standard TOS flags defined:
+.B IPTOS_LOWDELAY
+to minimize delays for interactive traffic,
+.B IPTOS_THROUGHPUT
+to optimize throughput,
+.B IPTOS_RELIABILITY
+to optimize for reliability, and
+.B IPTOS_MINCOST
+for "filler data" where slow transmission doesn't matter.
+At most one of these TOS values can be specified.
+Other bits are invalid and shall be cleared.
+Linux sends
+.B IPTOS_LOWDELAY
+datagrams first by default,
+but the exact behavior depends on the configured queueing discipline.
+.\" FIXME elaborate on this
+Some high priority levels may require superuser privileges (the
+.B CAP_NET_ADMIN
+capability).
+The priority can also be set in a protocol-independent way by the
+.RB ( SOL_SOCKET ", " SO_PRIORITY )
+socket option (see
+.BR socket (7)).
+.TP
+.B IP_TTL
+Set or retrieve the current time-to-live field that is used in every packet
+sent from this socket.
+.TP
+.B IP_HDRINCL
+If enabled,
+the user supplies an IP header in front of the user data.
+Only valid for
+.B SOCK_RAW
+sockets.
+See
+.BR raw (7)
+for more information.
+When this flag is enabled, the values set by
+.BR IP_OPTIONS ,
+.B IP_TTL
+and
+.B IP_TOS
+are ignored.
+.TP
+.BR IP_RECVERR " (defined in \fI\fP)"
+Enable extended reliable error message passing.
+When enabled on a datagram socket, all
+generated errors will be queued in a per-socket error queue.
+When the user
+receives an error from a socket operation, the errors can
+be received by calling
+.BR recvmsg (2)
+with the
+.B MSG_ERRQUEUE
+flag set.
+The
+.I sock_extended_err
+structure describing the error will be passed in an ancillary message with
+the type
+.B IP_RECVERR
+and the level
+.BR IPPROTO_IP .
+.\" or SOL_IP on Linux
+This is useful for reliable error handling on unconnected sockets.
+The received data portion of the error queue
+contains the error packet.
+.IP
+The
+.B IP_RECVERR
+control message contains a
+.I sock_extended_err
+structure:
+.IP
+.in +4n
+.ne 18
+.nf
+#define SO_EE_ORIGIN_NONE    0
+#define SO_EE_ORIGIN_LOCAL   1
+#define SO_EE_ORIGIN_ICMP    2
+#define SO_EE_ORIGIN_ICMP6   3
+
+struct sock_extended_err {
+    uint32_t ee_errno;   /* error number */
+    uint8_t  ee_origin;  /* where the error originated */
+    uint8_t  ee_type;    /* type */
+    uint8_t  ee_code;    /* code */
+    uint8_t  ee_pad;
+    uint32_t ee_info;    /* additional information */
+    uint32_t ee_data;    /* other data */
+    /* More data may follow */
+};
+
+struct sockaddr *SO_EE_OFFENDER(struct sock_extended_err *);
+.fi
+.in
+.IP
+.I ee_errno
+contains the
+.I errno
+number of the queued error.
+.I ee_origin
+is the origin code of where the error originated.
+The other fields are protocol-specific.
+The macro
+.B SO_EE_OFFENDER
+returns a pointer to the address of the network object
+where the error originated, given a pointer to the ancillary message.
+If this address is not known, the
+.I sa_family
+member of the
+.I sockaddr
+contains
+.B AF_UNSPEC
+and the other fields of the
+.I sockaddr
+are undefined.
+.IP
+IP uses the
+.I sock_extended_err
+structure as follows:
+.I ee_origin
+is set to
+.B SO_EE_ORIGIN_ICMP
+for errors received as an ICMP packet, or
+.B SO_EE_ORIGIN_LOCAL
+for locally generated errors.
+Unknown values should be ignored.
+.I ee_type
+and
+.I ee_code
+are set from the type and code fields of the ICMP header.
+.I ee_info
+contains the discovered MTU for
+.B EMSGSIZE
+errors.
+The message also contains the
+.IR sockaddr_in " of the node"
+that caused the error, which can be accessed with the
+.B SO_EE_OFFENDER
+macro.
+The
+.I sin_family
+field of the SO_EE_OFFENDER address is
+.B AF_UNSPEC
+when the source was unknown.
+When the error originated from the network, all IP options
+.RI ( IP_OPTIONS ", " IP_TTL ", "
+etc.)
 enabled on the socket and contained in the
+error packet are passed as control messages.
+The payload of the packet
+causing the error is returned as normal payload.
+.\" FIXME . Is it a good idea to document that? It is a dubious feature.
+.\" On
+.\" .B SOCK_STREAM
+.\" sockets,
+.\" .B IP_RECVERR
+.\" has slightly different semantics. Instead of
+.\" saving the errors for the next timeout, it passes all incoming
+.\" errors immediately to the user.
+.\" This might be useful for very short-lived TCP connections which
+.\" need fast error handling. Use this option with care:
+.\" it makes TCP unreliable
+.\" by not allowing it to recover properly from routing
+.\" shifts and other normal
+.\" conditions and breaks the protocol specification.
+Note that TCP has no error queue;
+.B MSG_ERRQUEUE
+is not permitted on
+.B SOCK_STREAM
+sockets.
+.B IP_RECVERR
+is valid for TCP, but all errors are
+returned only via the return value of the socket call or the
+.B SO_ERROR
+socket option.
+.IP
+For raw sockets,
+.B IP_RECVERR
+enables passing of all received ICMP errors to the
+application; otherwise, errors are only reported on connected sockets.
+.IP
+It sets or retrieves an integer boolean flag.
+.B IP_RECVERR
+defaults to off.
+.TP
+.B IP_MTU_DISCOVER
+Set or retrieve the Path MTU Discovery setting
+for a socket.
+When enabled, Linux will perform Path MTU Discovery
+as defined in RFC\ 1191
+on this socket.
+The don't-fragment flag is set on all outgoing datagrams.
+The system-wide default is controlled by the
+.B ip_no_pmtu_disc
+sysctl for
+.B SOCK_STREAM
+sockets, and disabled on all others.
+For
+.RB non- SOCK_STREAM
+sockets, it is the user's responsibility to packetize the data
+in MTU-sized chunks and to do the retransmits if necessary.
+The kernel will reject (with
+.BR EMSGSIZE )
+datagrams that are bigger than the known
+path MTU if this flag is set.
+.TS
+tab(:);
+c l
+l l.
+Path MTU discovery flags:Meaning
+IP_PMTUDISC_WANT:Use per-route settings.
+IP_PMTUDISC_DONT:Never do Path MTU Discovery.
+IP_PMTUDISC_DO:Always do Path MTU Discovery.
+IP_PMTUDISC_PROBE:Set DF but ignore Path MTU.
+.TE
+
+When PMTU discovery is enabled, the kernel automatically keeps track of
+the path MTU per destination host.
+When it is connected to a specific peer with
+.BR connect (2),
+the currently known path MTU can be retrieved conveniently using the
+.B IP_MTU
+socket option (e.g., after an
+.B EMSGSIZE
+error occurred).
+It may change over time.
+For connectionless sockets with many destinations,
+the new MTU for a given destination can also be accessed using the
+error queue (see
+.BR IP_RECVERR ).
+A new error will be queued for every incoming MTU update.
+
+While MTU discovery is in progress, initial packets from datagram sockets
+may be dropped.
+Applications using UDP should be aware of this and not
+take it into account for their packet retransmit strategy.
+
+To bootstrap the path MTU discovery process on unconnected sockets, it
+is possible to start with a big datagram size
+(up to 64\ kB minus the headers) and let it shrink by updates of the
+path MTU.
+.\" FIXME this is an ugly hack
+
+To get an initial estimate of the
+path MTU, connect a datagram socket to the destination address using
+.BR connect (2)
+and retrieve the MTU by calling
+.BR getsockopt (2)
+with the
+.B IP_MTU
+option.
+
+It is possible to implement RFC\ 4821 MTU probing with
+.B SOCK_DGRAM
+or
+.B SOCK_RAW
+sockets by setting a value of
+.BR IP_PMTUDISC_PROBE .
+This is also particularly useful for diagnostic tools such as
+.BR tracepath (8)
+that wish to deliberately send probe packets larger than
+the observed Path MTU.
+.TP
+.B IP_MTU
+Retrieve the current known path MTU of the current socket.
+Only valid when the socket has been connected.
+Returns an integer.
+Only valid as a
+.BR getsockopt (2).
+.\"
+.TP
+.B IP_ROUTER_ALERT
+Pass all to-be-forwarded packets with the
+IP Router Alert
+option
+set to this socket.
+Only valid for raw sockets.
+This is useful, for instance, for user
+space RSVP daemons.
+The tapped packets are not forwarded by the kernel; it is
+the user's responsibility to send them out again.
+Socket binding is ignored;
+such packets are only filtered by protocol.
+Expects an integer flag.
+.\"
+.TP
+.B IP_MULTICAST_TTL
+Set or read the time-to-live value of outgoing multicast packets for this
+socket.
+It is very important for multicast packets to set the smallest TTL possible.
+The default is 1, which means that multicast packets don't leave the local
+network unless the user program explicitly requests it.
+Argument is an
+integer.
+.\"
+.TP
+.B IP_MULTICAST_LOOP
+Set or read a boolean integer argument that determines whether sent multicast
+packets should be looped back to the local sockets.
+.\"
+.TP
+.B IP_ADD_MEMBERSHIP
+Join a multicast group.
+Argument is an
+.I ip_mreqn
+structure.
+.sp
+.in +4n
+.nf
+struct ip_mreqn {
+    struct in_addr imr_multiaddr; /* IP multicast group
+                                     address */
+    struct in_addr imr_address;   /* IP address of local
+                                     interface */
+    int            imr_ifindex;   /* interface index */
+};
+.fi
+.in
+.sp
+.I imr_multiaddr
+contains the address of the multicast group the application
+wants to join or leave.
+It must be a valid multicast address.
+.I imr_address
+is the address of the local interface with which the system
+should join the multicast
+group; if it is equal to
+.BR INADDR_ANY ,
+an appropriate interface is chosen by the system.
+.I imr_ifindex
+is the interface index of the interface that should join/leave the
+.I imr_multiaddr
+group, or 0 to indicate any interface.
+.IP
+For compatibility, the old
+.I ip_mreq
+structure is still supported.
+It differs from
+.I ip_mreqn
+only by not including
+the
+.I imr_ifindex
+field.
+Only valid as a
+.BR setsockopt (2).
+.\"
+.TP
+.B IP_DROP_MEMBERSHIP
+Leave a multicast group.
+Argument is an
+.I ip_mreqn
+or
+.I ip_mreq
+structure similar to
+.BR IP_ADD_MEMBERSHIP .
+.\"
+.TP
+.B IP_MULTICAST_IF
+Set the local device for a multicast socket.
+Argument is an
+.I ip_mreqn
+or
+.I ip_mreq
+structure similar to
+.BR IP_ADD_MEMBERSHIP .
+.IP
+When an invalid socket option is passed,
+.B ENOPROTOOPT
+is returned.
+.SS Sysctls
+The IP protocol
+supports the sysctl interface to configure some global options.
+The sysctls can be accessed by reading or writing the
+.I /proc/sys/net/ipv4/*
+files or using the
+.\" FIXME As at 2.6.12, 14 Jun 2005, the following are undocumented:
+.\" ip_queue_maxlen
+.\" ip_conntrack_max
+.BR sysctl (2)
+interface.
+Variables described as
+.I Boolean
+take an integer value, with a non-zero value ("true") meaning that
+the corresponding option is enabled, and a zero value ("false")
+meaning that the option is disabled.
+.\"
+.TP
+.BR ip_always_defrag " (Boolean)"
+[New with kernel 2.2.13; in earlier kernel versions this feature
+was controlled at compile time by the
+.B CONFIG_IP_ALWAYS_DEFRAG
+option; this option is not present in 2.4.x and later]
+
+When this boolean flag is enabled (not equal 0), incoming fragments
+(parts of IP packets
+that arose when some host between origin and destination decided
+that the packets were too large and cut them into pieces) will be
+reassembled (defragmented) before being processed, even if they are
+about to be forwarded.
+
+Only enable if running either a firewall that is the sole link
+to your network or a transparent proxy; never ever use it for a
+normal router or host.
+Otherwise, fragmented communication can be disturbed
+if the fragments travel over different links.
+Defragmentation also has a large memory and CPU time cost.
+
+This is automatically turned on when masquerading or transparent
+proxying are configured.
+.\"
+.TP
+.B ip_autoconfig
+.\" FIXME document ip_autoconfig
+Not documented.
+.\"
+.TP
+.BR ip_default_ttl " (integer; default: 64)"
+Set the default time-to-live value of outgoing packets.
+This can be changed per socket with the
+.B IP_TTL
+option.
+.\"
+.TP
+.BR ip_dynaddr " (Boolean; default: disabled)"
+Enable dynamic socket address and masquerading entry rewriting on interface
+address change.
+This is useful for dial-up interfaces with changing IP addresses.
+0 means no rewriting, 1 turns it on and 2 enables verbose mode.
+.\"
+.TP
+.BR ip_forward " (Boolean; default: disabled)"
+Enable IP forwarding with a boolean flag.
+IP forwarding can be also set on a per-interface basis.
+.\"
+.TP
+.B ip_local_port_range
+Contains two integers that define the default local port range
+allocated to sockets.
+Allocation starts with the first number and ends with the second number.
+Note that these should not conflict with the ports used by masquerading
+(although the case is handled).
+Also, arbitrary choices may cause problems with some firewall packet
+filters that make assumptions about the local ports in use.
+The first number should be greater than 1024, or better, greater than 4096,
+to avoid clashes with well-known ports and to minimize firewall problems.
+.\"
+.TP
+.BR ip_no_pmtu_disc " (Boolean; default: disabled)"
+If enabled, don't do Path MTU Discovery for TCP sockets by default.
+Path MTU discovery may fail if misconfigured firewalls (that drop
+all ICMP packets) or misconfigured interfaces (e.g., a point-to-point
+link where the two ends don't agree on the MTU) are on the path.
+It is better to fix the broken routers on the path than to turn off
+Path MTU Discovery globally, because not doing it incurs a high cost
+to the network.
+.\"
+.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.BR ip_nonlocal_bind " (Boolean; default: disabled)"
+If set, allows processes to
+.BR bind (2)
+to non-local IP addresses,
+which can be quite useful, but may break some applications.
+.\" +.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt +.TP +.BR ip6frag_time " (integer; default 30)" +Time in seconds to keep an IPv6 fragment in memory. +.\" +.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt +.TP +.BR ip6frag_secret_interval " (integer; default 600)" +Regeneration interval (in seconds) of the hash secret (or lifetime +for the hash secret) for IPv6 fragments. +.TP +.BR ipfrag_high_thresh " (integer), " ipfrag_low_thresh " (integer)" +If the amount of queued IP fragments reaches +.BR ipfrag_high_thresh , +the queue +is pruned down to +.BR ipfrag_low_thresh . +Contains an integer with the number of +bytes. +.TP +.B neigh/* +See +.BR arp (7). +.\" FIXME Document the conf/*/* sysctls +.\" FIXME Document the route/* sysctls +.\" FIXME document them all +.SS Ioctls +All ioctls described in +.BR socket (7) +apply to ip. +.\" 2006-04-02, mtk +.\" commented out the following because ipchains is obsolete +.\" .PP +.\" The ioctls to configure firewalling are documented in +.\" .BR ipfw (4) +.\" from the +.\" .B ipchains +.\" package. +.PP +Ioctls to configure generic device parameters are described in +.BR netdevice (7). +.\" FIXME Add a discussion of multicasting +.SH ERRORS +.\" FIXME document all errors. +.\" We should really fix the kernels to give more uniform +.\" error returns (ENOMEM vs ENOBUFS, EPERM vs EACCES etc.) +.TP +.B EACCES +The user tried to execute an operation without the necessary permissions. +These include: +sending a packet to a broadcast address without having the +.B SO_BROADCAST +flag set; +sending a packet via a +.I prohibit +route; +modifying firewall settings without superuser privileges (the +.B CAP_NET_ADMIN +capability); +binding to a reserved port without superuser privileges (the +.B CAP_NET_BIND_SERVICE +capability). +.TP +.B EADDRINUSE +Tried to bind to an address already in use. 
+.TP
+.B EADDRNOTAVAIL
+A nonexistent interface was requested or the requested source
+address was not local.
+.TP
+.B EAGAIN
+Operation on a non-blocking socket would block.
+.TP
+.B EALREADY
+A connection operation on a non-blocking socket is already in progress.
+.TP
+.B ECONNABORTED
+A connection was closed during an
+.BR accept (2).
+.TP
+.B EHOSTUNREACH
+No valid routing table entry matches the destination address.
+This error can be caused by an ICMP message from a remote router or
+by the local routing table.
+.TP
+.B EINVAL
+Invalid argument passed.
+For send operations, this can be caused by sending to a
+.I blackhole
+route.
+.TP
+.B EISCONN
+.BR connect (2)
+was called on an already connected socket.
+.TP
+.B EMSGSIZE
+Datagram is bigger than an MTU on the path and it cannot be fragmented.
+.TP
+.BR ENOBUFS ", " ENOMEM
+Not enough free memory.
+This often means that the memory allocation is limited by the socket
+buffer limits, not by the system memory, but this is not
+100% consistent.
+.TP
+.B ENOENT
+.B SIOCGSTAMP
+was called on a socket where no packet arrived.
+.TP
+.B ENOPKG
+A kernel subsystem was not configured.
+.TP
+.BR ENOPROTOOPT " and " EOPNOTSUPP
+Invalid socket option passed.
+.TP
+.B ENOTCONN
+The operation is only defined on a connected socket, but the socket wasn't
+connected.
+.TP
+.B EPERM
+User doesn't have permission to set high priority, change configuration,
+or send signals to the requested process or group.
+.TP
+.B EPIPE
+The connection was unexpectedly closed or shut down by the other end.
+.TP
+.B ESOCKTNOSUPPORT
+The socket is not configured or an unknown socket type was requested.
+.PP
+Other errors may be generated by the overlaying protocols; see
+.BR tcp (7),
+.BR raw (7),
+.BR udp (7)
+and
+.BR socket (7).
+.SH VERSIONS
+.BR IP_MTU ,
+.BR IP_MTU_DISCOVER ,
+.BR IP_PKTINFO ,
+.B IP_RECVERR
+and
+.B IP_ROUTER_ALERT
+are new options in Linux 2.2.
+They are also all Linux-specific and should not be used in
+programs intended to be portable.
+.PP
+.\" FIXME
+.\" To be confirmed that IP_PMTUDISC_PROBE makes it into kernel 2.6.22
+.B IP_PMTUDISC_PROBE
+is new in Linux 2.6.22.
+.PP
+.I struct ip_mreqn
+is new in Linux 2.2.
+Linux 2.0 only supported
+.BR ip_mreq .
+.PP
+The sysctls were introduced with Linux 2.2.
+.SH NOTES
+Be very careful with the
+.B SO_BROADCAST
+option \- it is not privileged in Linux.
+It is easy to overload the network
+with careless broadcasts.
+For new application protocols
+it is better to use a multicast group instead of broadcasting.
+Broadcasting is discouraged.
+.PP
+Some other BSD sockets implementations provide
+.B IP_RCVDSTADDR
+and
+.B IP_RECVIF
+socket options to get the destination address and the interface of
+received datagrams.
+Linux has the more general
+.B IP_PKTINFO
+for the same task.
+.PP
+Some BSD sockets implementations also provide an
+.B IP_RECVTTL
+option, but an ancillary message with type
+.B IP_RECVTTL
+is passed with the incoming packet.
+This is different from the
+.B IP_TTL
+option used in Linux.
+.PP
+Using the
+.B SOL_IP
+socket option level isn't portable; BSD-based stacks use the
+.B IPPROTO_IP
+level.
+.SS Compatibility
+For compatibility with Linux 2.0, the obsolete
+.BI "socket(PF_INET, SOCK_PACKET, " protocol )
+syntax is still supported to open a
+.BR packet (7)
+socket.
+This is deprecated and should be replaced by
+.BI "socket(PF_PACKET, SOCK_RAW, " protocol )
+instead.
+The main difference is the new
+.I sockaddr_ll
+address structure for generic link layer information instead of the old
+.BR sockaddr_pkt .
+.SH BUGS
+There are too many inconsistent error values.
+.PP
+The ioctls to configure IP-specific interface options and ARP tables are
+not described.
+.PP
+Some versions of glibc forget to declare
+.IR in_pktinfo .
+The current workaround is to copy it into your program from this man page.
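The workaround above can be sketched in C. This is a hypothetical example, not part of the original page: the structure name in_pktinfo_copy and the helpers enable_pktinfo() and find_pktinfo() are illustrative names. The structure mirrors the in_pktinfo layout documented in this page, and the lookup uses the standard CMSG_* macros from cmsg(3) on a msghdr filled in by recvmsg(2).

```c
#include <netinet/in.h>   /* IPPROTO_IP, IP_PKTINFO, struct in_addr */
#include <string.h>
#include <sys/socket.h>   /* setsockopt(), struct msghdr, CMSG_* macros */

/* Hypothetical local copy of struct in_pktinfo for C libraries that do
   not declare it; the layout follows the definition given in ip(7). */
struct in_pktinfo_copy {
    unsigned int   ipi_ifindex;  /* Interface index */
    struct in_addr ipi_spec_dst; /* Local address */
    struct in_addr ipi_addr;     /* Header destination address */
};

/* Ask the kernel to attach an IP_PKTINFO control message to every
   datagram received on fd.  Returns 0 on success, -1 on error. */
int enable_pktinfo(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));
}

/* Walk the ancillary data of a message received with recvmsg(2) and copy
   the first IP_PKTINFO payload into *out.  Returns 0 if found, else -1. */
int find_pktinfo(struct msghdr *msg, struct in_pktinfo_copy *out)
{
    struct cmsghdr *cm;

    for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
        if (cm->cmsg_level == IPPROTO_IP && cm->cmsg_type == IP_PKTINFO) {
            /* memcpy, because CMSG_DATA() need not be suitably aligned */
            memcpy(out, CMSG_DATA(cm), sizeof(*out));
            return 0;
        }
    }
    return -1;
}
```

A caller would call enable_pktinfo() once, pass a control buffer of at least CMSG_SPACE(sizeof(struct in_pktinfo_copy)) bytes in msg_control to recvmsg(2), and then call find_pktinfo() on the returned msghdr.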
+.PP
+Receiving the original destination address with
+.B MSG_ERRQUEUE
+in
+.I msg_name
+by
+.BR recvmsg (2)
+does not work in some 2.2 kernels.
+.\" .SH AUTHORS
+.\" This man page was written by Andi Kleen.
+.SH "SEE ALSO"
+.BR recvmsg (2),
+.BR sendmsg (2),
+.BR byteorder (3),
+.BR ipfw (4),
+.BR capabilities (7),
+.BR netlink (7),
+.BR raw (7),
+.BR socket (7),
+.BR tcp (7),
+.BR udp (7)
+.PP
+RFC\ 791 for the original IP specification.
+.br
+RFC\ 1122 for the IPv4 host requirements.
+.br
+RFC\ 1812 for the IPv4 router requirements.
+.\" FIXME autobind INADDR REUSEADDR
diff --git a/man7/ipv6.7 b/man7/ipv6.7
index c4e0bedfc..024cd5073 100644
--- a/man7/ipv6.7
+++ b/man7/ipv6.7
@@ -1,8 +1,327 @@
-.TH IPV6 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH IPV6 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH IPV6 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH IPV6 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH IPV6 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH IPV6 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH IPV6 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH IPV6 7 2008-08-07 "Linux" "Linux Programmer's Manual"
+.\" This man page is Copyright (C) 2000 Andi Kleen .
+.\" Permission is granted to distribute possibly modified copies
+.\" of this page provided the header is included verbatim,
+.\" and in case of nontrivial modification author and date
+.\" of the modification is added to the header.
+.\" $Id: ipv6.7,v 1.3 2000/12/20 18:10:31 ak Exp $
+.TH IPV6 7 2008-07-17 "Linux" "Linux Programmer's Manual"
+.SH NAME
+ipv6, PF_INET6 \- Linux IPv6 protocol implementation
+.SH SYNOPSIS
+.B #include
+.br
+.B #include
+.sp
+.IB tcp6_socket " = socket(PF_INET6, SOCK_STREAM, 0);"
+.br
+.IB raw6_socket " = socket(PF_INET6, SOCK_RAW, " protocol ");"
+.br
+.IB udp6_socket " = socket(PF_INET6, SOCK_DGRAM, " protocol ");"
+.SH DESCRIPTION
+Linux 2.2 optionally implements the Internet Protocol, version 6.
+This man page contains a description of the IPv6 basic API as
+implemented by the Linux kernel and glibc 2.1.
+The interface
+is based on the BSD sockets interface; see
+.BR socket (7).
+.PP
+The IPv6 API aims to be mostly compatible with the
+.BR ip (7)
+v4 API.
+Only differences are described in this man page.
+.PP
+To bind an
+.B AF_INET6
+socket to any process, the local address should be copied from the
+.I in6addr_any
+variable, which has
+.I in6_addr
+type.
+In static initializations
+.B IN6ADDR_ANY_INIT
+may also be used, which expands to a constant expression.
+Both of them are in network order.
+.PP
+The IPv6 loopback address (::1) is available in the global
+.I in6addr_loopback
+variable.
+For initializations
+.B IN6ADDR_LOOPBACK_INIT
+should be used.
+.PP
+IPv4 connections can be handled with the v6 API by using the
+v4-mapped-on-v6 address type;
+thus a program needs to support only this API type to
+support both protocols.
+This is handled transparently by the address
+handling functions in libc.
+.PP
+IPv4 and IPv6 share the local port space.
+When you get an IPv4 connection
+or packet to an IPv6 socket, its source address will be mapped
+to v6.
+.SS "Address Format"
+.in +4n
+.nf
+struct sockaddr_in6 {
+    uint16_t        sin6_family;   /* AF_INET6 */
+    uint16_t        sin6_port;     /* port number */
+    uint32_t        sin6_flowinfo; /* IPv6 flow information */
+    struct in6_addr sin6_addr;     /* IPv6 address */
+    uint32_t        sin6_scope_id; /* Scope ID (new in 2.4) */
+};

+struct in6_addr {
+    unsigned char   s6_addr[16];   /* IPv6 address */
+};
+.fi
+.in
+.sp
+.I sin6_family
+is always set to
+.BR AF_INET6 ;
+.I sin6_port
+is the protocol port (see
+.I sin_port
+in
+.BR ip (7));
+.I sin6_flowinfo
+is the IPv6 flow identifier;
+.I sin6_addr
+is the 128-bit IPv6 address.
+.I sin6_scope_id
+is an ID that depends on the scope of the address.
+It is new in Linux 2.4.
+Linux supports it only for link-scope addresses; in that case
+.I sin6_scope_id
+contains the interface index (see
+.BR netdevice (7)).
+.PP
+IPv6 supports several address types: unicast to address a single
+host, multicast to address a group of hosts,
+anycast to address the nearest member of a group of hosts
+(not implemented in Linux), IPv4-on-IPv6 to
+address an IPv4 host, and other reserved address types.
+.PP
+The address notation for IPv6 is a group of 8 4-digit hexadecimal
+numbers, separated with a \(aq:\(aq.
+\&"::" stands for a string of 0 bits.
+Special addresses are ::1 for loopback and ::FFFF:
+for IPv4-mapped-on-IPv6.
+.PP
+The port space of IPv6 is shared with IPv4.
+.SS "Socket Options"
+IPv6 supports some protocol-specific socket options that can be set with
+.BR setsockopt (2)
+and read with
+.BR getsockopt (2).
+The socket option level for IPv6 is
+.BR IPPROTO_IPV6 .
+A boolean integer flag is zero when it is false, otherwise true.
+.TP
+.B IPV6_ADDRFORM
+Turn an
+.B AF_INET6
+socket into a socket of a different address family.
+Only
+.B AF_INET
+is currently supported for that.
+It is only allowed for IPv6 sockets
+that are connected and bound to a v4-mapped-on-v6 address.
+The argument is a pointer to an integer containing
+.BR AF_INET .
+This is useful to pass v4-mapped sockets as file descriptors to
+programs that don't know how to deal with the IPv6 API.
+.TP
+.B IPV6_ADD_MEMBERSHIP, IPV6_DROP_MEMBERSHIP
+Control membership in multicast groups.
+Argument is a pointer to a
+.I struct ipv6_mreq
+structure.
+.\" FIXME IPV6_CHECKSUM is not documented, and probably should be
+.\" FIXME IPV6_JOIN_ANYCAST is not documented, and probably should be
+.\" FIXME IPV6_LEAVE_ANYCAST is not documented, and probably should be
+.\" FIXME IPV6_RECVPKTINFO is not documented, and probably should be
+.\" FIXME IPV6_2292PKTINFO is not documented, and probably should be
+.\" FIXME there are probably many other IPV6_* socket options that
+.\"       should be documented
+.TP
+.B IPV6_MTU
+Set the MTU to be used for the socket.
+The MTU is limited by the device
+MTU or the path MTU when path MTU discovery is enabled.
+Argument is a pointer to an integer.
+.TP
+.B IPV6_MTU_DISCOVER
+Control path MTU discovery on the socket.
+See
+.B IP_MTU_DISCOVER
+in
+.BR ip (7)
+for details.
+.TP
+.B IPV6_MULTICAST_HOPS
+Set the multicast hop limit for the socket.
+Argument is a pointer to an
+integer.
+\-1 in the value means use the route default, otherwise it should be
+between 0 and 255.
+.TP
+.B IPV6_MULTICAST_IF
+Set the device for outgoing multicast packets on the socket.
+This is only allowed
+for
+.B SOCK_DGRAM
+and
+.B SOCK_RAW
+sockets.
+The argument is a pointer to an interface index (see
+.BR netdevice (7))
+in an integer.
+.TP
+.B IPV6_MULTICAST_LOOP
+Control whether the socket sees multicast packets that it has sent itself.
+Argument is a pointer to a boolean.
+.TP
+.B IPV6_PKTINFO
+Set delivery of the
+.B IPV6_PKTINFO
+control message on incoming datagrams.
+Only allowed for
+.B SOCK_DGRAM
+or
+.B SOCK_RAW
+sockets.
+Argument is a pointer to a boolean value in an integer.
+.TP
+.nh
+.B IPV6_RTHDR, IPV6_AUTHHDR, IPV6_DSTOPTS, IPV6_HOPOPTS, IPV6_FLOWINFO, IPV6_HOPLIMIT
+.hy
+Set delivery of control messages for incoming datagrams containing
+extension headers from the received packet.
+.B IPV6_RTHDR
+delivers the routing header,
+.B IPV6_AUTHHDR
+delivers the authentication header,
+.B IPV6_DSTOPTS
+delivers the destination options,
+.B IPV6_HOPOPTS
+delivers the hop options,
+.B IPV6_FLOWINFO
+delivers an integer containing the flow ID,
+.B IPV6_HOPLIMIT
+delivers an integer containing the hop count of the packet.
+The control messages have the same type as the socket option.
+All these header options can also be set for outgoing packets
+by putting the appropriate control message into the control buffer of
+.BR sendmsg (2).
+Only allowed for
+.B SOCK_DGRAM
+or
+.B SOCK_RAW
+sockets.
+Argument is a pointer to a boolean value.
+.TP
+.B IPV6_RECVERR
+Control receiving of asynchronous error options.
+See
+.B IP_RECVERR
+in
+.BR ip (7)
+for details.
+Argument is a pointer to a boolean.
+.TP
+.B IPV6_ROUTER_ALERT
+Pass forwarded packets containing a router alert hop-by-hop option to
+this socket.
+Only allowed for
+.B SOCK_RAW
+sockets.
+The tapped packets are not forwarded by the kernel; it is the
+user's responsibility to send them out again.
+Argument is a pointer to an integer.
+A positive integer indicates a router alert option value to intercept.
+Packets carrying a router alert option with a value field containing
+this integer will be delivered to the socket.
+A negative integer disables delivery of packets with router alert options
+to this socket.
+.TP
+.B IPV6_UNICAST_HOPS
+Set the unicast hop limit for the socket.
+Argument is a pointer to an integer.
+\-1 in the value means use the route default,
+otherwise it should be between 0 and 255.
+.TP
+.BR IPV6_V6ONLY " (since Linux 2.4.21 and 2.6)"
+.\" See RFC 3493
+If this flag is set to true (non-zero), then the socket is restricted
+to sending and receiving IPv6 packets only.
+In this case, an IPv4 and an IPv6 application can bind
+to a single port at the same time.

+If this flag is set to false (zero),
+then the socket can be used to send and receive packets
+to and from an IPv6 address or an IPv4-mapped IPv6 address.

+The argument is a pointer to a boolean value in an integer.

+The default value for this flag is defined by the contents of the file
+.BR /proc/sys/net/ipv6/bindv6only .
+The default value for that file is 0 (false).
+.\" FLOWLABEL_MGR, FLOWINFO_SEND
+.SH VERSIONS
+The older
+.I libinet6
+libc5-based IPv6 API implementation for Linux is not described here
+and may vary in details.
+.PP
+Linux 2.4 will break binary compatibility for the
+.I sockaddr_in6
+for 64-bit
+hosts by changing the alignment of
+.I in6_addr
+and adding an additional
+.I sin6_scope_id
+field.
+The kernel interfaces stay compatible, but a program including
+.I sockaddr_in6
+or
+.I in6_addr
+into other structures may not be.
+This is not
+a problem for 32-bit hosts like i386.
+.PP
+The
+.I sin6_flowinfo
+field is new in Linux 2.4.
+It is transparently passed/read by the kernel
+when the passed address length contains it.
+Some programs that pass a longer address buffer and then
+check the outgoing address length may break.
+.SH "NOTES"
+The
+.I sockaddr_in6
+structure is bigger than the generic
+.IR sockaddr .
+Programs that assume that all address types can be stored safely in a
+.I struct sockaddr
+need to be changed to use
+.I struct sockaddr_storage
+for that instead.
+.SH BUGS
+The IPv6 extended API as in RFC\ 2292 is currently only partly
+implemented;
+although the 2.2 kernel has near complete support for receiving options,
+the macros for generating IPv6 options are missing in glibc 2.1.
+.PP
+IPSec support for EH and AH headers is missing.
+.PP
+Flow label management is not complete and not documented here.
+.PP
+This man page is not complete.
+.SH "SEE ALSO"
+.BR cmsg (3),
+.BR ip (7)
+.PP
+RFC\ 2553: IPv6 BASIC API.
+Linux tries to be compliant with this.
+.PP
+RFC\ 2460: IPv6 specification.
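The v4-mapped-on-v6 address type, the in6addr_any wildcard and the IPV6_V6ONLY option described in this page can be illustrated with a few small C helpers. This is a minimal sketch under stated assumptions, not part of the page: the function names map_v4_to_v6(), any_v6(), set_v6only() and v6_to_text() are hypothetical, and the code assumes a glibc-style <netinet/in.h> that exports in6addr_any and IPV6_V6ONLY.

```c
#include <arpa/inet.h>    /* inet_ntop() */
#include <netinet/in.h>   /* struct sockaddr_in6, in6addr_any, IPV6_V6ONLY */
#include <string.h>
#include <sys/socket.h>

/* Fill *sa with the v4-mapped-on-v6 form (::ffff:a.b.c.d) of an IPv4
   address.  v4 and port must already be in network byte order. */
void map_v4_to_v6(struct in_addr v4, in_port_t port, struct sockaddr_in6 *sa)
{
    memset(sa, 0, sizeof(*sa));
    sa->sin6_family = AF_INET6;
    sa->sin6_port = port;
    sa->sin6_addr.s6_addr[10] = 0xff;   /* 80 zero bits, 16 one bits, */
    sa->sin6_addr.s6_addr[11] = 0xff;   /* then the 32-bit IPv4 address */
    memcpy(&sa->sin6_addr.s6_addr[12], &v4.s_addr, 4);
}

/* Fill *sa with the wildcard address in6addr_any, ready for bind(2);
   port must be in network byte order. */
void any_v6(in_port_t port, struct sockaddr_in6 *sa)
{
    memset(sa, 0, sizeof(*sa));
    sa->sin6_family = AF_INET6;
    sa->sin6_port = port;
    sa->sin6_addr = in6addr_any;
}

/* only != 0 restricts fd to IPv6; 0 also allows v4-mapped traffic.
   Call before bind(2).  Returns 0 on success, -1 on error. */
int set_v6only(int fd, int only)
{
    return setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &only, sizeof(only));
}

/* Format the address part of *sa as text into buf (at least
   INET6_ADDRSTRLEN bytes).  Returns buf, or NULL on error. */
const char *v6_to_text(const struct sockaddr_in6 *sa, char *buf, socklen_t len)
{
    return inet_ntop(AF_INET6, &sa->sin6_addr, buf, len);
}
```

A server that wants to accept both protocols on one socket could call set_v6only(fd, 0) before binding to the address built by any_v6(); IPv4 clients then show up with v4-mapped source addresses such as ::ffff:192.0.2.1.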
diff --git a/man7/netlink.7 b/man7/netlink.7
index fcc5470df..58835fe8d 100644
--- a/man7/netlink.7
+++ b/man7/netlink.7
@@ -1,8 +1,460 @@
-.TH NETLINK 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH NETLINK 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH NETLINK 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH NETLINK 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH NETLINK 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH NETLINK 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH NETLINK 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH NETLINK 7 2008-08-07 "Linux" "Linux Programmer's Manual"
+'\" t
+.\" Don't change the first line, it tells man that tbl is needed.
+.\" This man page is Copyright (c) 1998 by Andi Kleen. Subject to the GPL.
+.\" Based on the original comments from Alexey Kuznetsov
+.\" Modified 2005-12-27 by Hasso Tepper
+.\" $Id: netlink.7,v 1.8 2000/06/22 13:23:00 ak Exp $
+.TH NETLINK 7 2005-12-27 "Linux" "Linux Programmer's Manual"
+.SH NAME
+netlink \- Communication between kernel and userspace (PF_NETLINK)
+.SH SYNOPSIS
+.nf
+.B #include
+.B #include
+.B #include

+.BI "netlink_socket = socket(PF_NETLINK, " socket_type ", " netlink_family );
+.fi
+.SH DESCRIPTION
+Netlink is used to transfer information between kernel and
+userspace processes.
+It consists of a standard sockets-based interface for userspace
+processes and an internal kernel API for kernel modules.
+The internal kernel interface is not documented in this manual page.
+There is also an obsolete netlink interface
+via netlink character devices; this interface is not documented here
+and is only provided for backwards compatibility.

+Netlink is a datagram-oriented service.
+Both
+.B SOCK_RAW
+and
+.B SOCK_DGRAM
+are valid values for
+.IR socket_type .
+However, the netlink protocol does not distinguish between datagram
+and raw sockets.

+.I netlink_family
+selects the kernel module or netlink group to communicate with.
+The currently assigned netlink families are:
+.TP
+.B NETLINK_ROUTE
+Receives routing and link updates and may be used to modify the routing
+tables (both IPv4 and IPv6), IP addresses, link parameters,
+neighbor setups, queueing disciplines, traffic classes and
+packet classifiers (see
+.BR rtnetlink (7)).
+.TP
+.B NETLINK_W1
+Messages from the 1-wire subsystem.
+.TP
+.B NETLINK_USERSOCK
+Reserved for user-mode socket protocols.
+.TP
+.B NETLINK_FIREWALL
+Transport IPv4 packets from netfilter to userspace.
+Used by the
+.I ip_queue
+kernel module.
+.TP
+.B NETLINK_INET_DIAG
+.\" FIXME More details on NETLINK_INET_DIAG needed.
+INET socket monitoring.
+.TP
+.B NETLINK_NFLOG
+Netfilter/iptables ULOG.
+.TP
+.B NETLINK_XFRM
+.\" FIXME More details on NETLINK_XFRM needed.
+IPsec.
+.TP
+.B NETLINK_SELINUX
+SELinux event notifications.
+.TP
+.B NETLINK_ISCSI
+.\" FIXME More details on NETLINK_ISCSI needed.
+Open-iSCSI.
+.TP
+.B NETLINK_AUDIT
+.\" FIXME More details on NETLINK_AUDIT needed.
+Auditing.
+.TP
+.B NETLINK_FIB_LOOKUP
+.\" FIXME More details on NETLINK_FIB_LOOKUP needed.
+Access to FIB lookup from userspace.
+.TP
+.B NETLINK_CONNECTOR
+Kernel connector.
+See
+.I Documentation/connector/*
+in the kernel source for further information.
+.TP
+.B NETLINK_NETFILTER
+.\" FIXME More details on NETLINK_NETFILTER needed.
+Netfilter subsystem.
+.TP
+.B NETLINK_IP6_FW
+Transport IPv6 packets from netfilter to userspace.
+Used by the
+.I ip6_queue
+kernel module.
+.TP
+.B NETLINK_DNRTMSG
+DECnet routing messages.
+.TP
+.B NETLINK_KOBJECT_UEVENT
+.\" FIXME More details on NETLINK_KOBJECT_UEVENT needed.
+Kernel messages to userspace.
+.TP
+.B NETLINK_GENERIC
+Generic netlink family for simplified netlink usage.
+.PP
+Netlink messages consist of a byte stream with one or multiple
+.I nlmsghdr
+headers and associated payload.
+The byte stream should only be accessed with the standard
+.B NLMSG_*
+macros.
+See
+.BR netlink (3)
+for further information.
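To make the role of the NLMSG_* macros concrete, here is a minimal sketch in C that packs one header and payload into a buffer and then walks a byte stream of messages. The helper names nl_build() and nl_count() are hypothetical; struct nlmsghdr and the NLMSG_LENGTH, NLMSG_SPACE, NLMSG_ALIGN, NLMSG_DATA, NLMSG_OK and NLMSG_NEXT macros are taken from <linux/netlink.h>, and netlink(3) remains the authoritative description of them.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>  /* struct nlmsghdr, NLMSG_* macros */

/* Hypothetical helper: write one netlink message (header + payload) to
   buf, which must be suitably aligned for struct nlmsghdr.  Returns the
   value stored in nlmsg_len, or 0 if buf is too small. */
size_t nl_build(void *buf, size_t buflen, unsigned short type,
                unsigned short flags, unsigned int seq,
                const void *payload, size_t plen)
{
    struct nlmsghdr *nh = buf;

    if (buflen < NLMSG_SPACE(plen))
        return 0;
    memset(buf, 0, NLMSG_SPACE(plen));   /* zero header and padding */
    nh->nlmsg_len = NLMSG_LENGTH(plen);  /* header + unpadded payload */
    nh->nlmsg_type = type;
    nh->nlmsg_flags = flags;
    nh->nlmsg_seq = seq;
    nh->nlmsg_pid = 0;                   /* 0, as in the EXAMPLE section */
    if (plen > 0)
        memcpy(NLMSG_DATA(nh), payload, plen);
    return nh->nlmsg_len;
}

/* Hypothetical helper: count the messages in a byte stream of len bytes,
   stepping from header to header with NLMSG_OK and NLMSG_NEXT. */
int nl_count(void *buf, int len)
{
    struct nlmsghdr *nh;
    int n = 0;

    for (nh = buf; NLMSG_OK(nh, len); nh = NLMSG_NEXT(nh, len))
        n++;
    return n;
}
```

nl_build() reports nlmsg_len, which is what an iovec for a single message would carry; when several messages share one buffer, each must start at an NLMSG_ALIGN boundary, which is what NLMSG_NEXT assumes when it steps through them.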

+In multipart messages (multiple
+.I nlmsghdr
+headers with associated payload in one byte stream), the first and all
+following headers have the
+.B NLM_F_MULTI
+flag set, except for the last header which has the type
+.BR NLMSG_DONE .

+After each
+.I nlmsghdr
+the payload follows.

+.in +4n
+.nf
+struct nlmsghdr {
+    __u32 nlmsg_len;    /* Length of message including header. */
+    __u16 nlmsg_type;   /* Type of message content. */
+    __u16 nlmsg_flags;  /* Additional flags. */
+    __u32 nlmsg_seq;    /* Sequence number. */
+    __u32 nlmsg_pid;    /* PID of the sending process. */
+};
+.fi
+.in

+.I nlmsg_type
+can be one of the standard message types:
+.B NLMSG_NOOP
+message is to be ignored,
+.B NLMSG_ERROR
+message signals an error and the payload contains an
+.I nlmsgerr
+structure,
+.B NLMSG_DONE
+message terminates a multipart message.

+.in +4n
+.nf
+struct nlmsgerr {
+    int error;        /* Negative errno or 0 for acknowledgements */
+    struct nlmsghdr msg;  /* Message header that caused the error */
+};
+.fi
+.in

+A netlink family usually specifies more message types; see the
+appropriate manual pages for that, for example,
+.BR rtnetlink (7)
+for
+.BR NETLINK_ROUTE .

+Standard flag bits in
+.I nlmsg_flags
+.br
+---------------------------------
+.TS
+tab(:);
+lB l.
+NLM_F_REQUEST:Must be set on all request messages.
+NLM_F_MULTI:T{
+The message is part of a multipart message terminated by
+.BR NLMSG_DONE .
+T}
+NLM_F_ACK:Request for an acknowledgment on success.
+NLM_F_ECHO:Echo this request.
+.TE

+Additional flag bits for GET requests
+.br
+-------------------------------------
+.TS
+tab(:);
+lB l.
+NLM_F_ROOT:Return the complete table instead of a single entry.
+NLM_F_MATCH:T{
+Return all entries matching criteria passed in message content.
+Not implemented yet.
+T}
+.\" FIXME NLM_F_ATOMIC is not used any more?
+NLM_F_ATOMIC:Return an atomic snapshot of the table.
+NLM_F_DUMP:Convenience macro; equivalent to (NLM_F_ROOT|NLM_F_MATCH).
+.TE

+Note that
+.B NLM_F_ATOMIC
+requires the
+.B CAP_NET_ADMIN
+capability or an effective UID of 0.

+Additional flag bits for NEW requests
+.br
+-------------------------------------
+.TS
+tab(:);
+lB l.
+NLM_F_REPLACE:Replace existing matching object.
+NLM_F_EXCL:Don't replace if the object already exists.
+NLM_F_CREATE:Create object if it doesn't already exist.
+NLM_F_APPEND:Add to the end of the object list.
+.TE

+.I nlmsg_seq
+and
+.I nlmsg_pid
+are used to track messages.
+.I nlmsg_pid
+shows the origin of the message.
+Note that there isn't a 1:1 relationship between
+.I nlmsg_pid
+and the PID of the process if the message originated from a netlink
+socket.
+See the
+.B ADDRESS FORMATS
+section for further information.

+Both
+.I nlmsg_seq
+and
+.I nlmsg_pid
+.\" FIXME Explain more about nlmsg_seq and nlmsg_pid.
+are opaque to the netlink core.

+Netlink is not a reliable protocol.
+It tries its best to deliver a message to its destination(s),
+but may drop messages when an out-of-memory condition or
+other error occurs.
+For reliable transfer, the sender can request an
+acknowledgement from the receiver by setting the
+.B NLM_F_ACK
+flag.
+An acknowledgment is an
+.B NLMSG_ERROR
+packet with the error field set to 0.
+The application must generate acknowledgements for
+received messages itself.
+The kernel tries to send an
+.B NLMSG_ERROR
+message for every failed packet.
+A user process should follow this convention too.

+However, reliable transmissions from kernel to user are impossible
+in any case.
+The kernel can't send a netlink message if the socket buffer is full:
+the message will be dropped and the kernel and the userspace process will
+no longer have the same view of kernel state.
+It is up to the application to detect when this happens (via the
+.B ENOBUFS
+error returned by
+.BR recvmsg (2))
+and resynchronize.
+.SS Address Formats
+The
+.I sockaddr_nl
+structure describes a netlink client in user space or in the kernel.
+A
+.I sockaddr_nl
+can be either unicast (only sent to one peer) or sent to
+netlink multicast groups
+.RI ( nl_groups
+not equal to 0).

+.in +4n
+.nf
+struct sockaddr_nl {
+    sa_family_t     nl_family;  /* AF_NETLINK */
+    unsigned short  nl_pad;     /* Zero. */
+    pid_t           nl_pid;     /* Process ID. */
+    __u32           nl_groups;  /* Multicast groups mask. */
+};
+.fi
+.in

+.I nl_pid
+is the unicast address of the netlink socket.
+It's always 0 if the destination is in the kernel.
+For a userspace process,
+.I nl_pid
+is usually the PID of the process owning the destination socket.
+However,
+.I nl_pid
+identifies a netlink socket, not a process.
+If a process owns several netlink
+sockets, then
+.I nl_pid
+can only be equal to the process ID for at most one socket.
+There are two ways to assign
+.I nl_pid
+to a netlink socket.
+If the application sets
+.I nl_pid
+before calling
+.BR bind (2),
+then it is up to the application to make sure that
+.I nl_pid
+is unique.
+If the application sets it to 0, the kernel takes care of assigning it.
+The kernel assigns the process ID to the first netlink socket the process
+opens and assigns a unique
+.I nl_pid
+to every netlink socket that the process subsequently creates.

+.I nl_groups
+is a bit mask with every bit representing a netlink group number.
+Each netlink family has a set of 32 multicast groups.
+When
+.BR bind (2)
+is called on the socket, the
+.I nl_groups
+field in the
+.I sockaddr_nl
+should be set to a bit mask of the groups which it wishes to listen to.
+The default value for this field is zero, which means that no multicasts
+will be received.
+A socket may multicast messages to any of the multicast groups by setting
+.I nl_groups
+to a bit mask of the groups it wishes to send to when it calls
+.BR sendmsg (2)
+or does a
+.BR connect (2).
+Only processes with an effective UID of 0 or the
+.B CAP_NET_ADMIN
+capability may send or listen to a netlink multicast group.
+Any replies to a message received for a multicast group should be
+sent back to the sending PID and the multicast group.
+.SH VERSIONS
+The socket interface to netlink is a new feature of Linux 2.2.

+Linux 2.0 supported a more primitive device-based netlink interface
+(which is still available as a compatibility option).
+This obsolete interface is not described here.

+NETLINK_SELINUX appeared in Linux 2.6.4.

+NETLINK_AUDIT appeared in Linux 2.6.6.

+NETLINK_KOBJECT_UEVENT appeared in Linux 2.6.10.

+NETLINK_W1 and NETLINK_FIB_LOOKUP appeared in Linux 2.6.13.

+NETLINK_INET_DIAG, NETLINK_CONNECTOR and NETLINK_NETFILTER appeared in
+Linux 2.6.14.

+NETLINK_GENERIC and NETLINK_ISCSI appeared in Linux 2.6.15.
+.SH NOTES
+It is often better to use netlink via
+.I libnetlink
+or
+.I libnl
+than via the low-level kernel interface.
+.SH BUGS
+This manual page is not complete.
+.SH EXAMPLE
+The following example creates a
+.B NETLINK_ROUTE
+netlink socket which will listen to the
+.B RTMGRP_LINK
+(network interface create/delete/up/down events) and
+.B RTMGRP_IPV4_IFADDR
+(IPv4 addresses add/delete events) multicast groups.

+.in +4n
+.nf
+struct sockaddr_nl sa;
+int fd;

+memset(&sa, 0, sizeof(sa));
+sa.nl_family = AF_NETLINK;
+sa.nl_groups = RTMGRP_LINK | RTMGRP_IPV4_IFADDR;

+fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
+bind(fd, (struct sockaddr *) &sa, sizeof(sa));
+.fi
+.in

+The next example demonstrates how to send a netlink message to the
+kernel (pid 0).
+Note that the application must take care of message sequence numbers
+in order to reliably track acknowledgements.

+.in +4n
+.nf
+struct nlmsghdr *nh;    /* The nlmsghdr with payload to send. */
+struct sockaddr_nl sa;
+struct iovec iov = { nh, nh\->nlmsg_len };
+struct msghdr msg = { .msg_name = &sa, .msg_namelen = sizeof(sa),
+                      .msg_iov = &iov, .msg_iovlen = 1 };

+memset(&sa, 0, sizeof(sa));
+sa.nl_family = AF_NETLINK;
+nh\->nlmsg_pid = 0;
+nh\->nlmsg_seq = ++sequence_number;
+/* Request an ack from kernel by setting NLM_F_ACK. */
+nh\->nlmsg_flags |= NLM_F_ACK;

+sendmsg(fd, &msg, 0);
+.fi
+.in

+The last example demonstrates how to read a netlink message.

+.in +4n
+.nf
+int len;
+char buf[4096];
+struct iovec iov = { buf, sizeof(buf) };
+struct sockaddr_nl sa;
+struct msghdr msg = { .msg_name = &sa, .msg_namelen = sizeof(sa),
+                      .msg_iov = &iov, .msg_iovlen = 1 };
+struct nlmsghdr *nh;

+len = recvmsg(fd, &msg, 0);

+for (nh = (struct nlmsghdr *) buf; NLMSG_OK (nh, len);
+     nh = NLMSG_NEXT (nh, len)) {
+    /* The end of multipart message. */
+    if (nh\->nlmsg_type == NLMSG_DONE)
+        return;

+    if (nh\->nlmsg_type == NLMSG_ERROR)
+        /* Do some error handling. */
+    ...

+    /* Continue with parsing payload. */
+    ...
+}
+.fi
+.in
+.SH "SEE ALSO"
+.BR cmsg (3),
+.BR netlink (3),
+.BR capabilities (7),
+.BR rtnetlink (7)
+.PP
+ftp://ftp.inr.ac.ru/ip-routing/iproute2*
+for information about libnetlink.

+http://people.suug.ch/~tgr/libnl/
+for information about libnl.

+RFC 3549 "Linux Netlink as an IP Services Protocol"
diff --git a/man7/packet.7 b/man7/packet.7
index ec8a82144..787ac7c37 100644
--- a/man7/packet.7
+++ b/man7/packet.7
@@ -1,8 +1,402 @@
-.TH PACKET 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH PACKET 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH PACKET 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH PACKET 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH PACKET 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH PACKET 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH PACKET 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH PACKET 7 2008-08-07 "Linux" "Linux Programmer's Manual"
+.\" This man page is Copyright (C) 1999 Andi Kleen .
+.\" Permission is granted to distribute possibly modified copies +.\" of this page provided the header is included verbatim, +.\" and in case of nontrivial modification author and date +.\" of the modification is added to the header. +.\" $Id: packet.7,v 1.13 2000/08/14 08:03:45 ak Exp $ +.TH PACKET 7 1999-04-29 "Linux" "Linux Programmer's Manual" +.SH NAME +packet, PF_PACKET \- packet interface on device level. +.SH SYNOPSIS +.nf +.B #include +.br +.B #include +.br +.B #include /* the L2 protocols */ +.sp +.BI "packet_socket = socket(PF_PACKET, int " socket_type ", int "protocol ); +.fi +.SH DESCRIPTION +Packet sockets are used to receive or send raw packets at the device driver +(OSI Layer 2) level. +They allow the user to implement protocol modules in user space +on top of the physical layer. + +The +.I socket_type +is either +.B SOCK_RAW +for raw packets including the link level header or +.B SOCK_DGRAM +for cooked packets with the link level header removed. +The link level +header information is available in a common format in a +.IR sockaddr_ll . +.I protocol +is the IEEE 802.3 protocol number in network order. +See the +.I +include file for a list of allowed protocols. +When protocol +is set to +.B htons(ETH_P_ALL) +then all protocols are received. +All incoming packets of that protocol type will be passed to the packet +socket before they are passed to the protocols implemented in the kernel. + +Only processes with effective UID 0 or the +.B CAP_NET_RAW +capability may open packet sockets. + +.B SOCK_RAW +packets are passed to and from the device driver without any changes in +the packet data. +When receiving a packet, the address is still parsed and +passed in a standard +.I sockaddr_ll +address structure. +When transmitting a packet, the user supplied buffer +should contain the physical layer header. +That packet is then +queued unmodified to the network driver of the interface defined by the +destination address. 
+Some device drivers always add other headers.
+.B SOCK_RAW
+is similar to but not compatible with the obsolete
+.B PF_INET/SOCK_PACKET
+of Linux 2.0.

+.B SOCK_DGRAM
+operates on a slightly higher level.
+The physical header is removed before the packet is passed to the user.
+Packets sent through a
+.B SOCK_DGRAM
+packet socket get a suitable physical layer header based on the
+information in the
+.I sockaddr_ll
+destination address before they are queued.

+By default, all packets of the specified protocol type
+are passed to a packet socket.
+To get packets only from a specific interface, use
+.BR bind (2)
+specifying an address in a
+.I struct sockaddr_ll
+to bind the packet socket to an interface.
+Only the
+.I sll_protocol
+and the
+.I sll_ifindex
+address fields are used for purposes of binding.

+The
+.BR connect (2)
+operation is not supported on packet sockets.

+When the
+.B MSG_TRUNC
+flag is passed to
+.BR recvmsg (2),
+.BR recv (2),
+or
+.BR recvfrom (2),
+the real length of the packet on the wire is always returned,
+even when it is longer than the buffer.
+.SS Address Types
+The
+.I sockaddr_ll
+structure is a device-independent physical-layer address.

+.in +4n
+.nf
+struct sockaddr_ll {
+    unsigned short sll_family;   /* Always AF_PACKET */
+    unsigned short sll_protocol; /* Physical layer protocol */
+    int            sll_ifindex;  /* Interface number */
+    unsigned short sll_hatype;   /* Header type */
+    unsigned char  sll_pkttype;  /* Packet type */
+    unsigned char  sll_halen;    /* Length of address */
+    unsigned char  sll_addr[8];  /* Physical layer address */
+};
+.fi
+.in

+.I sll_protocol
+is the standard Ethernet protocol type in network order as defined
+in the
+.I
+include file.
+It defaults to the socket's protocol.
+.I sll_ifindex
+is the interface index of the interface
+(see
+.BR netdevice (7));
+0 matches any interface (only permitted for binding).
+.I sll_hatype
+is an ARP type as defined in the
+.I
+include file.
+.I sll_pkttype
+contains the packet type.
+Valid types are
+.B PACKET_HOST
+for a packet addressed to the local host,
+.B PACKET_BROADCAST
+for a physical layer broadcast packet,
+.B PACKET_MULTICAST
+for a packet sent to a physical layer multicast address,
+.B PACKET_OTHERHOST
+for a packet to some other host that has been caught by a device driver
+in promiscuous mode, and
+.B PACKET_OUTGOING
+for a packet originating from the local host that is looped back to a packet
+socket.
+These types are meaningful only for receiving.
+.I sll_addr
+and
+.I sll_halen
+contain the physical layer (e.g., IEEE 802.3) address and its length.
+The exact interpretation depends on the device.
+
+When you send packets, it is enough to specify
+.IR sll_family ,
+.IR sll_addr ,
+.IR sll_halen ,
+and
+.IR sll_ifindex .
+The other fields should be 0.
+.I sll_hatype
+and
+.I sll_pkttype
+are set on received packets for your information.
+For binding, only
+.I sll_protocol
+and
+.I sll_ifindex
+are used.
+.SS Socket Options
+Packet sockets can be used to configure physical layer multicasting
+and promiscuous mode.
+This is done by calling
+.BR setsockopt (2)
+on a packet socket for
+.B SOL_PACKET
+and one of the options
+.B PACKET_ADD_MEMBERSHIP
+to add a binding or
+.B PACKET_DROP_MEMBERSHIP
+to drop it.
+They both expect a
+.B packet_mreq
+structure as an argument:
+
+.in +4n
+.nf
+struct packet_mreq {
+    int            mr_ifindex;    /* interface index */
+    unsigned short mr_type;       /* action */
+    unsigned short mr_alen;       /* address length */
+    unsigned char  mr_address[8]; /* physical layer address */
+};
+.fi
+.in
+
+.B mr_ifindex
+contains the interface index for the interface whose status
+should be changed.
+The
+.B mr_type
+parameter specifies which action to perform.
+.B PACKET_MR_PROMISC
+enables receiving all packets on a shared medium (often known as
+"promiscuous mode"),
+.B PACKET_MR_MULTICAST
+binds the socket to the physical layer multicast group specified in
+.B mr_address
+and
+.BR mr_alen ,
+and
+.B PACKET_MR_ALLMULTI
+sets the socket up to receive all multicast packets arriving at
+the interface.
+
+In addition, the traditional ioctls
+.BR SIOCSIFFLAGS ,
+.BR SIOCADDMULTI ,
+and
+.B SIOCDELMULTI
+can be used for the same purpose.
+.SS Ioctls
+.B SIOCGSTAMP
+can be used to receive the timestamp of the last received packet.
+The argument is a
+.IR "struct timeval" .
+
+In addition, all standard ioctls defined in
+.BR netdevice (7)
+and
+.BR socket (7)
+are valid on packet sockets.
+.SS Error Handling
+Packet sockets do no error handling other than reporting errors that occur
+while passing the packet to the device driver.
+They don't have the concept of a pending error.
+.SH ERRORS
+.TP
+.B EADDRNOTAVAIL
+Unknown multicast group address passed.
+.TP
+.B EFAULT
+User passed an invalid memory address.
+.TP
+.B EINVAL
+Invalid argument.
+.TP
+.B EMSGSIZE
+Packet is bigger than interface MTU.
+.TP
+.B ENETDOWN
+Interface is not up.
+.TP
+.B ENOBUFS
+Not enough memory to allocate the packet.
+.TP
+.B ENODEV
+Unknown device name or interface index specified in interface address.
+.TP
+.B ENOENT
+No packet received.
+.TP
+.B ENOTCONN
+No interface address passed.
+.TP
+.B ENXIO
+Interface address contained an invalid interface index.
+.TP
+.B EPERM
+User has insufficient privileges to carry out this operation.
+
+In addition, other errors may be generated by the low-level driver.
+.SH VERSIONS
+.B PF_PACKET
+is a new feature in Linux 2.2.
+Earlier Linux versions supported only
+.BR SOCK_PACKET .
+.PP
+The include file
+.I
+has been present since glibc 2.1.
+Older systems need:
+.sp
+.in +4n
+.nf
+#include
+#include
+#include /* The L2 protocols */
+.fi
+.in
+.SH NOTES
+For portable programs, it is suggested to use
+.B PF_PACKET
+via
+.BR pcap (3),
+although this covers only a subset of the
+.B PF_PACKET
+features.
+
+The
+.B SOCK_DGRAM
+packet sockets make no attempt to create or parse the IEEE 802.2 LLC
+header for an IEEE 802.3 frame.
+When
+.B ETH_P_802_3
+is specified as protocol for sending, the kernel creates the
+802.3 frame and fills out the length field; the user has to supply the LLC
+header to get a fully conforming packet.
+Incoming 802.3 packets are not multiplexed on the DSAP/SSAP protocol
+fields; instead they are supplied to the user as protocol
+.B ETH_P_802_2
+with the LLC header prepended.
+It is thus not possible to bind to
+.BR ETH_P_802_3 ;
+bind to
+.B ETH_P_802_2
+instead and do the protocol demultiplexing yourself.
+The default for sending is the standard Ethernet DIX
+encapsulation with the protocol filled in.
+
+Packet sockets are not subject to the input or output firewall chains.
+.SS Compatibility
+In Linux 2.0, the only way to get a packet socket was by calling
+.BI "socket(PF_INET, SOCK_PACKET, " protocol )\fR.
+This is still supported but strongly deprecated.
+The main difference between the two methods is that
+.B SOCK_PACKET
+uses the old
+.I struct sockaddr_pkt
+to specify an interface, which doesn't provide physical layer
+independence.
+
+.in +4n
+.nf
+struct sockaddr_pkt {
+    unsigned short spkt_family;
+    unsigned char  spkt_device[14];
+    unsigned short spkt_protocol;
+};
+.fi
+.in
+
+.I spkt_family
+contains
+the device type,
+.I spkt_protocol
+is the IEEE 802.3 protocol type as defined in
+.I
+and
+.I spkt_device
+is the device name as a null-terminated string, for example, eth0.
+
+This structure is obsolete and should not be used in new code.
+.SH BUGS
+glibc 2.1 does not have a define for
+.BR SOL_PACKET .
+The suggested workaround is to use:
+.in +4n
+.nf
+
+#ifndef SOL_PACKET
+#define SOL_PACKET 263
+#endif
+
+.fi
+.in
+This is fixed in later glibc versions and also does not occur on
+libc5 systems.
+
+The IEEE 802.2/802.3 LLC handling could be considered a bug.
+
+Socket filters are not documented.
+
+The
+.B MSG_TRUNC
+.BR recvmsg (2)
+extension is an ugly hack and should be replaced by a control message.
+There is currently no way to get the original destination address of
+packets via
+.BR SOCK_DGRAM .
+.\" .SH CREDITS
+.\" This man page was written by Andi Kleen with help from Matthew Wilcox.
+.\" PF_PACKET in Linux 2.2 was implemented
+.\" by Alexey Kuznetsov, based on code by Alan Cox and others.
+.SH "SEE ALSO"
+.BR socket (2),
+.BR pcap (3),
+.BR capabilities (7),
+.BR ip (7),
+.BR raw (7),
+.BR socket (7)
+
+RFC\ 894 for the standard IP Ethernet encapsulation.
+
+RFC\ 1700 for the IEEE 802.3 IP encapsulation.
+
+The
+.I
+include file for physical layer protocols.
diff --git a/man7/raw.7 b/man7/raw.7
index a1b9c6924..3b42f75ee 100644
--- a/man7/raw.7
+++ b/man7/raw.7
@@ -1,8 +1,278 @@
-.TH RAW 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH RAW 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH RAW 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH RAW 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH RAW 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH RAW 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH RAW 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH RAW 7 2008-08-07 "Linux" "Linux Programmer's Manual"
+'\" t
+.\" Don't change the first line, it tells man that we need tbl.
+.\" This man page is Copyright (C) 1999 Andi Kleen .
+.\" Permission is granted to distribute possibly modified copies
+.\" of this page provided the header is included verbatim,
+.\" and in case of nontrivial modification author and date
+.\" of the modification is added to the header.
+.\" $Id: raw.7,v 1.6 1999/06/05 10:32:08 freitag Exp $
+.TH RAW 7 1998-10-02 "Linux" "Linux Programmer's Manual"
+.SH NAME
+raw, SOCK_RAW \- Linux IPv4 raw sockets
+.SH SYNOPSIS
+.B #include
+.br
+.B #include
+.br
+.BI "raw_socket = socket(PF_INET, SOCK_RAW, int " protocol );
+.SH DESCRIPTION
+Raw sockets allow new IPv4 protocols to be implemented in user space.
+A raw socket receives or sends the raw datagram, not
+including link-level headers.
+
+The IPv4 layer generates an IP header when sending a packet unless the
+.B IP_HDRINCL
+socket option is enabled on the socket.
+When it is enabled, the packet must contain an IP header.
+When receiving, the IP header is always included in the packet.
+
+Only processes with an effective user ID of 0 or the
+.B CAP_NET_RAW
+capability are allowed to open raw sockets.
+
+All packets or errors matching the
+.I protocol
+number specified
+for the raw socket are passed to this socket.
+For a list of the allowed protocols, see RFC\ 1700 assigned numbers and
+.BR getprotobyname (3).
+
+A protocol of
+.B IPPROTO_RAW
+implies that
+.B IP_HDRINCL
+is enabled, and such a socket is able to send any IP protocol that is
+specified in the passed header.
+Receiving all IP protocols via
+.B IPPROTO_RAW
+is not possible using raw sockets.
+.RS
+.TS
+tab(:) allbox;
+c s
+l l.
+IP Header fields modified on sending by \fBIP_HDRINCL\fP
+IP Checksum:Always filled in.
+Source Address:Filled in when zero.
+Packet Id:Filled in when zero.
+Total Length:Always filled in.
+.TE
+.RE
+.sp
+.PP
+If
+.B IP_HDRINCL
+is specified and the IP header has a non-zero destination address, then
+the destination address of the socket is used to route the packet.
+When
+.B MSG_DONTROUTE
+is specified, the destination address should refer to a local interface;
+otherwise, a routing table lookup is done anyway but gatewayed routes
+are ignored.
+
+If
+.B IP_HDRINCL
+isn't set, then IP header options can be set on raw sockets with
+.BR setsockopt (2);
+see
+.BR ip (7)
+for more information.
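While IP_HDRINCL is off, the kernel builds the IP header, so per-socket IP options from ip(7) decide what goes into it. The following is a minimal sketch, assuming Linux with glibc headers. `set_raw_ttl` is a hypothetical helper, not part of any library. It sets the TTL of the kernel-generated header with IP_TTL and reads the value back. Opening a raw socket needs CAP_NET_RAW, so the helper takes a descriptor that is already open.

```c
#include <netinet/in.h>   /* IPPROTO_IP, IP_TTL */
#include <sys/socket.h>   /* setsockopt(), getsockopt(), socklen_t */

/*
 * Hypothetical helper: set the TTL that the kernel writes into the IP
 * header it generates for fd. This affects outgoing packets only while
 * IP_HDRINCL is disabled; with IP_HDRINCL the caller supplies the TTL.
 * Returns the TTL read back with getsockopt(2), or -1 on error
 * (for example EINVAL for a TTL outside 1..255).
 */
int set_raw_ttl(int fd, int ttl)
{
    int val = ttl;
    socklen_t len = sizeof(val);

    if (setsockopt(fd, IPPROTO_IP, IP_TTL, &val, sizeof(val)) == -1)
        return -1;

    val = 0;
    if (getsockopt(fd, IPPROTO_IP, IP_TTL, &val, &len) == -1)
        return -1;
    return val;
}
```

For a raw socket, `fd` would come from something like `socket(AF_INET, SOCK_RAW, IPPROTO_ICMP)`. IP_TTL behaves the same way on any AF_INET socket, so the helper can also be tried on an unprivileged UDP socket.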
+
+In Linux 2.2, all IP header fields and options can be set using
+IP socket options.
+This means raw sockets are usually needed only for new
+protocols or protocols with no user interface (like ICMP).
+
+When a packet is received, it is passed to any raw sockets which have
+been bound to its protocol before it is passed to other protocol handlers
+(e.g., kernel protocol modules).
+.SS Address Format
+Raw sockets use the standard
+.I sockaddr_in
+address structure defined in
+.BR ip (7).
+The
+.I sin_port
+field could be used to specify the IP protocol number,
+but it is ignored for sending in Linux 2.2 and should always be
+set to 0 (see BUGS).
+For incoming packets,
+.I sin_port
+is set to the protocol of the packet.
+See the
+.I
+include file for valid IP protocols.
+.SS Socket Options
+Raw socket options can be set with
+.BR setsockopt (2)
+and read with
+.BR getsockopt (2)
+by passing the
+.B IPPROTO_RAW
+.\" Or SOL_RAW on Linux
+family flag.
+.TP
+.B ICMP_FILTER
+Enable a special filter for raw sockets bound to the
+.B IPPROTO_ICMP
+protocol.
+The value has a bit set for each ICMP message type which
+should be filtered out.
+The default is to filter no ICMP messages.
+.PP
+In addition, all
+.BR ip (7)
+.B IPPROTO_IP
+socket options valid for datagram sockets are supported.
+.SS Error Handling
+Errors originating from the network are passed to the user only when the
+socket is connected or the
+.B IP_RECVERR
+flag is enabled.
+For connected sockets, only
+.B EMSGSIZE
+and
+.B EPROTO
+are passed for compatibility.
+With
+.BR IP_RECVERR ,
+all network errors are saved in the error queue.
+.SH ERRORS
+.TP
+.B EACCES
+User tried to send to a broadcast address without having the
+broadcast flag set on the socket.
+.TP
+.B EFAULT
+An invalid memory address was supplied.
+.TP
+.B EINVAL
+Invalid argument.
+.TP
+.B EMSGSIZE
+Packet too big.
+Either Path MTU Discovery is enabled (the
+.B IP_MTU_DISCOVER
+socket flag) or the packet size exceeds the maximum allowed IPv4
+packet size of 64KB.
+.TP
+.B EOPNOTSUPP
+An invalid flag has been passed to a socket call (like
+.BR MSG_OOB ).
+.TP
+.B EPERM
+The user doesn't have permission to open raw sockets.
+Only processes with an effective user ID of 0 or the
+.B CAP_NET_RAW
+capability may do that.
+.TP
+.B EPROTO
+An ICMP error has arrived reporting a parameter problem.
+.SH VERSIONS
+.B IP_RECVERR
+and
+.B ICMP_FILTER
+are new in Linux 2.2.
+They are Linux extensions and should not be used in portable programs.
+
+Linux 2.0 enabled some bug-to-bug compatibility with BSD in the
+raw socket code when the
+.B SO_BSDCOMPAT
+socket option was set; since Linux 2.2,
+this option no longer has that effect.
+.SH NOTES
+By default, raw sockets do path MTU (Maximum Transmission Unit) discovery.
+This means the kernel
+will keep track of the MTU to a specific target IP address and return
+.B EMSGSIZE
+when a raw packet write exceeds it.
+When this happens, the application should decrease the packet size.
+Path MTU discovery can also be turned off using the
+.B IP_MTU_DISCOVER
+socket option or the
+.I ip_no_pmtu_disc
+sysctl; see
+.BR ip (7)
+for details.
+When it is turned off, raw sockets will fragment outgoing packets
+that exceed the interface MTU.
+However, disabling it is not recommended
+for performance and reliability reasons.
+
+A raw socket can be bound to a specific local address using the
+.BR bind (2)
+call.
+If it isn't bound, all packets with the specified IP protocol are received.
+In addition, a raw socket can be bound to a specific network device using
+.BR SO_BINDTODEVICE ;
+see
+.BR socket (7).
+
+An
+.B IPPROTO_RAW
+socket is send-only.
+If you really want to receive all IP packets, use a
+.BR packet (7)
+socket with the
+.B ETH_P_IP
+protocol.
+Note that packet sockets don't reassemble IP fragments,
+unlike raw sockets.
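Following the advice above, a packet socket opened with ETH_P_IP receives every IPv4 packet. The sketch below is minimal and assumes Linux with glibc. All function names are hypothetical. It uses SOCK_DGRAM so that the link-level header is stripped and each received buffer starts at the IP header, as described in packet(7). Since packet sockets do not reassemble fragments, the caller sees individual fragments.

```c
#include <arpa/inet.h>        /* htons() */
#include <errno.h>
#include <linux/if_ether.h>   /* ETH_P_IP */
#include <stddef.h>
#include <sys/socket.h>       /* socket(), recv(), AF_PACKET */
#include <sys/types.h>        /* ssize_t */

/* Open a packet socket that receives IPv4 packets from all interfaces.
 * Needs CAP_NET_RAW; otherwise returns -1 with errno set (usually EPERM). */
int open_all_ipv4_socket(void)
{
    return socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_IP));
}

/* Receive one packet, retrying if a signal interrupts the call. */
ssize_t recv_ipv4_packet(int fd, unsigned char *buf, size_t len)
{
    ssize_t n;

    do
        n = recv(fd, buf, len, 0);
    while (n == -1 && errno == EINTR);
    return n;
}

/* Return the IPv4 header length in bytes, or -1 if the n bytes in buf
 * do not start with a plausible IPv4 header. */
int ipv4_header_len(const unsigned char *buf, size_t n)
{
    int ihl;

    if (n < 20 || (buf[0] >> 4) != 4)
        return -1;
    ihl = (buf[0] & 0x0f) * 4;      /* IHL counts 32-bit words */
    return (ihl >= 20 && (size_t) ihl <= n) ? ihl : -1;
}
```

A receive loop would pass each buffer filled by `recv_ipv4_packet()` to `ipv4_header_len()` to find the start of the transport header.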
+
+If you want to receive all ICMP packets for a datagram socket,
+it is often better to use
+.B IP_RECVERR
+on that particular socket; see
+.BR ip (7).
+
+Raw sockets may tap all IP protocols in Linux, even
+protocols like ICMP or TCP which have a protocol module in the kernel.
+In this case, the packets are passed to both the kernel module and the raw
+socket(s).
+This should not be relied upon in portable programs; many other BSD
+socket implementations have limitations here.
+
+Linux never changes headers passed from the user (except for filling
+in some zeroed fields as described for
+.BR IP_HDRINCL ).
+This differs from many other implementations of raw sockets.
+
+Raw sockets are generally rather unportable and should be avoided in
+programs intended to be portable.
+
+Sending on raw sockets should take the IP protocol from
+.IR sin_port ;
+this ability was lost in Linux 2.2.
+The workaround is to use
+.BR IP_HDRINCL .
+.SH BUGS
+Transparent proxy extensions are not described.
+
+When the
+.B IP_HDRINCL
+option is set, datagrams will not be fragmented and are limited to
+the interface MTU.
+
+The ability to set the IP protocol for sending in
+.I sin_port
+was lost in Linux 2.2.
+The protocol that the socket was bound to or that
+was specified in the initial
+.BR socket (2)
+call is always used.
+.\" .SH AUTHORS
+.\" This man page was written by Andi Kleen.
+.SH "SEE ALSO"
+.BR recvmsg (2),
+.BR sendmsg (2),
+.BR capabilities (7),
+.BR ip (7),
+.BR socket (7)
+
+.B RFC\ 1191
+for path MTU discovery.
+
+.B RFC\ 791
+and the
+.I
+include file for the IP protocol.
diff --git a/man7/rtnetlink.7 b/man7/rtnetlink.7
index 667254e45..5cbe681ac 100644
--- a/man7/rtnetlink.7
+++ b/man7/rtnetlink.7
@@ -1,8 +1,449 @@
+'\" t
+.\" Don't remove the line above, it tells man that tbl is needed.
+.\" This man page is Copyright (C) 1999 Andi Kleen .
+.\" Permission is granted to distribute possibly modified copies
+.\" of this page provided the header is included verbatim,
+.\" and in case of nontrivial modification author and date
+.\" of the modification is added to the header.
+.\" Based on the original comments from Alexey Kuznetsov, written with
+.\" help from Matthew Wilcox.
+.\" $Id: rtnetlink.7,v 1.8 2000/01/22 01:55:04 freitag Exp $
 .TH RTNETLINK 7 1999-04-30 "Linux" "Linux Programmer's Manual"
-.TH RTNETLINK 7 1999-04-30 "Linux" "Linux Programmer's Manual"
-.TH RTNETLINK 7 1999-04-30 "Linux" "Linux Programmer's Manual"
-.TH RTNETLINK 7 1999-04-30 "Linux" "Linux Programmer's Manual"
-.TH RTNETLINK 7 1999-04-30 "Linux" "Linux Programmer's Manual"
-.TH RTNETLINK 7 1999-04-30 "Linux" "Linux Programmer's Manual"
-.TH RTNETLINK 7 1999-04-30 "Linux" "Linux Programmer's Manual"
-.TH RTNETLINK 7 1999-04-30 "Linux" "Linux Programmer's Manual"
+.SH NAME
+rtnetlink, NETLINK_ROUTE \- Linux IPv4 routing socket
+.SH SYNOPSIS
+.B #include
+.br
+.B #include
+.br
+.B #include
+.br
+.B #include
+.sp
+.BI "rtnetlink_socket = socket(PF_NETLINK, int " socket_type ", NETLINK_ROUTE);"
+.SH DESCRIPTION
+Rtnetlink allows the kernel's routing tables to be read and altered.
+It is used within the kernel to communicate between
+various subsystems, though this usage is not documented here, and for
+communication with user-space programs.
+Network routes, IP addresses, link parameters, neighbor setups, queueing
+disciplines, traffic classes, and packet classifiers may all be controlled
+through
+.B NETLINK_ROUTE
+sockets.
+It is based on netlink messages; see
+.BR netlink (7)
+for more information.
+.\" FIXME ?
+.\" all these macros could be moved to rtnetlink(3)
+.SS "Routing Attributes"
+Some rtnetlink messages have optional attributes after the initial header:
+
+.in +4n
+.nf
+struct rtattr {
+    unsigned short rta_len;    /* Length of option */
+    unsigned short rta_type;   /* Type of option */
+    /* Data follows */
+};
+.fi
+.in
+
+These attributes should be manipulated only using the RTA_* macros
+or libnetlink; see
+.BR rtnetlink (3).
+.SS Messages
+Rtnetlink consists of these message types
+(in addition to standard netlink messages):
+.TP
+.BR RTM_NEWLINK ", " RTM_DELLINK ", " RTM_GETLINK
+Create, remove, or get information about a specific network interface.
+These messages contain an
+.I ifinfomsg
+structure followed by a series of
+.I rtattr
+structures.
+
+.nf
+struct ifinfomsg {
+    unsigned char  ifi_family; /* AF_UNSPEC */
+    unsigned short ifi_type;   /* Device type */
+    int            ifi_index;  /* Interface index */
+    unsigned int   ifi_flags;  /* Device flags */
+    unsigned int   ifi_change; /* change mask */
+};
+.fi
+
+.\" FIXME ifi_type
+.I ifi_flags
+contains the device flags (see
+.BR netdevice (7));
+.I ifi_index
+is the unique interface index;
+.I ifi_change
+is reserved for future use and should always be set to 0xFFFFFFFF.
+.TS
+tab(:);
+c
+l l l.
+Routing attributes
+rta_type:value type:description
+_
+IFLA_UNSPEC:-:unspecified.
+IFLA_ADDRESS:hardware address:interface L2 address
+IFLA_BROADCAST:hardware address:L2 broadcast address.
+IFLA_IFNAME:asciiz string:Device name.
+IFLA_MTU:unsigned int:MTU of the device.
+IFLA_LINK:int:Link type.
+IFLA_QDISC:asciiz string:Queueing discipline.
+IFLA_STATS:T{
+see below
+T}:Interface Statistics.
+.TE
+.sp
+The value type for IFLA_STATS is \fIstruct net_device_stats\fP.
+.TP
+.BR RTM_NEWADDR ", " RTM_DELADDR ", " RTM_GETADDR
+Add, remove, or receive information about an IP address associated with
+an interface.
+In Linux 2.2, an interface can carry multiple IP addresses;
+this replaces the alias device concept in 2.0.
+In Linux 2.2, these messages
+support IPv4 and IPv6 addresses.
+They contain an
+.I ifaddrmsg
+structure, optionally followed by
+.I rtattr
+routing attributes.
+
+.nf
+struct ifaddrmsg {
+    unsigned char ifa_family;    /* Address type */
+    unsigned char ifa_prefixlen; /* Prefix length of address */
+    unsigned char ifa_flags;     /* Address flags */
+    unsigned char ifa_scope;     /* Address scope */
+    int           ifa_index;     /* Interface index */
+};
+.fi
+
+.I ifa_family
+is the address family type (currently
+.B AF_INET
+or
+.BR AF_INET6 ),
+.I ifa_prefixlen
+is the length of the address mask of the address if defined for the
+family (like for IPv4),
+.I ifa_scope
+is the address scope,
+.I ifa_index
+is the interface index of the interface the address is associated with.
+.I ifa_flags
+is a flag word of
+.B IFA_F_SECONDARY
+for a secondary address (old alias interface),
+.B IFA_F_PERMANENT
+for a permanent address set by the user, and other undocumented flags.
+.TS
+tab(:);
+c
+l l l.
+Attributes
+rta_type:value type:description
+_
+IFA_UNSPEC:-:unspecified.
+IFA_ADDRESS:raw protocol address:interface address
+IFA_LOCAL:raw protocol address:local address
+IFA_LABEL:asciiz string:name of the interface
+IFA_BROADCAST:raw protocol address:broadcast address.
+IFA_ANYCAST:raw protocol address:anycast address
+IFA_CACHEINFO:struct ifa_cacheinfo:Address information.
+.TE
+.\" FIXME struct ifa_cacheinfo
+.TP
+.BR RTM_NEWROUTE ", " RTM_DELROUTE ", " RTM_GETROUTE
+Create, remove, or receive information about a network route.
+These messages contain an
+.I rtmsg
+structure with an optional sequence of
+.I rtattr
+structures following.
+For
+.BR RTM_GETROUTE ,
+setting
+.I rtm_dst_len
+and
+.I rtm_src_len
+to 0 means you get all entries for the specified routing table.
+For the other fields, except
+.I rtm_table
+and
+.IR rtm_protocol ,
+0 is the wildcard.
+
+.nf
+struct rtmsg {
+    unsigned char rtm_family;   /* Address family of route */
+    unsigned char rtm_dst_len;  /* Length of destination */
+    unsigned char rtm_src_len;  /* Length of source */
+    unsigned char rtm_tos;      /* TOS filter */
+
+    unsigned char rtm_table;    /* Routing table ID */
+    unsigned char rtm_protocol; /* Routing protocol; see below */
+    unsigned char rtm_scope;    /* See below */
+    unsigned char rtm_type;     /* See below */
+
+    unsigned int  rtm_flags;
+};
+.fi
+.TS
+tab(:);
+l l.
+rtm_type:Route type
+_
+RTN_UNSPEC:unknown route
+RTN_UNICAST:a gateway or direct route
+RTN_LOCAL:a local interface route
+RTN_BROADCAST:T{
+a local broadcast route (sent as a broadcast)
+T}
+RTN_ANYCAST:T{
+a local broadcast route (sent as a unicast)
+T}
+RTN_MULTICAST:a multicast route
+RTN_BLACKHOLE:a packet dropping route
+RTN_UNREACHABLE:an unreachable destination
+RTN_PROHIBIT:a packet rejection route
+RTN_THROW:continue routing lookup in another table
+RTN_NAT:a network address translation rule
+RTN_XRESOLVE:T{
+refer to an external resolver (not implemented)
+T}
+.TE
+.TS
+tab(:);
+l l.
+rtm_protocol:Route origin
+_
+RTPROT_UNSPEC:unknown
+RTPROT_REDIRECT:T{
+by an ICMP redirect (currently unused)
+T}
+RTPROT_KERNEL:by the kernel
+RTPROT_BOOT:during boot
+RTPROT_STATIC:by the administrator
+.TE
+
+Values larger than
+.B RTPROT_STATIC
+are not interpreted by the kernel; they are just for user information.
+They may be used to tag the source of routing information or to
+distinguish between multiple routing daemons.
+See
+.I
+for the routing daemon identifiers which are already assigned.
+
+.I rtm_scope
+is the distance to the destination:
+.TS
+tab(:);
+l l.
+RT_SCOPE_UNIVERSE:global route
+RT_SCOPE_SITE:T{
+interior route in the local autonomous system
+T}
+RT_SCOPE_LINK:route on this link
+RT_SCOPE_HOST:route on the local host
+RT_SCOPE_NOWHERE:destination doesn't exist
+.TE
+
+The values between
+.B RT_SCOPE_UNIVERSE
+and
+.B RT_SCOPE_SITE
+are available to the user.
+
+The
+.I rtm_flags
+have the following meanings:
+.TS
+tab(:);
+l l.
+RTM_F_NOTIFY:T{
+if the route changes, notify the user via rtnetlink
+T}
+RTM_F_CLONED:route is cloned from another route
+RTM_F_EQUALIZE:a multipath equalizer (not yet implemented)
+.TE
+
+.I rtm_table
+specifies the routing table:
+.TS
+tab(:);
+l l.
+RT_TABLE_UNSPEC:an unspecified routing table
+RT_TABLE_DEFAULT:the default table
+RT_TABLE_MAIN:the main table
+RT_TABLE_LOCAL:the local table
+.TE
+
+The user may assign arbitrary values between
+.B RT_TABLE_UNSPEC
+and
+.BR RT_TABLE_DEFAULT .
+.TS
+tab(:);
+c
+l l l.
+Attributes
+rta_type:value type:description
+_
+RTA_UNSPEC:-:ignored.
+RTA_DST:protocol address:Route destination address.
+RTA_SRC:protocol address:Route source address.
+RTA_IIF:int:Input interface index.
+RTA_OIF:int:Output interface index.
+RTA_GATEWAY:protocol address:The gateway of the route.
+RTA_PRIORITY:int:Priority of route.
+RTA_PREFSRC::
+RTA_METRICS:int:Route metric.
+RTA_MULTIPATH::
+RTA_PROTOINFO::
+RTA_FLOW::
+RTA_CACHEINFO::
+.TE
+
+.B Fill these values in!
+.TP
+.BR RTM_NEWNEIGH ", " RTM_DELNEIGH ", " RTM_GETNEIGH
+Add, remove, or receive information about a neighbor table
+entry (e.g., an ARP entry).
+The message contains an
+.I ndmsg
+structure.
+
+.nf
+struct ndmsg {
+    unsigned char ndm_family;
+    int           ndm_ifindex;  /* Interface index */
+    __u16         ndm_state;    /* State */
+    __u8          ndm_flags;    /* Flags */
+    __u8          ndm_type;
+};
+
+struct nda_cacheinfo {
+    __u32         ndm_confirmed;
+    __u32         ndm_used;
+    __u32         ndm_updated;
+    __u32         ndm_refcnt;
+};
+.fi
+
+.I ndm_state
+is a bit mask of the following states:
+.TS
+tab(:);
+l l.
+NUD_INCOMPLETE:a currently resolving cache entry
+NUD_REACHABLE:a confirmed working cache entry
+NUD_STALE:an expired cache entry
+NUD_DELAY:an entry waiting for a timer
+NUD_PROBE:a cache entry that is currently reprobed
+NUD_FAILED:an invalid cache entry
+NUD_NOARP:a device with no destination cache
+NUD_PERMANENT:a static entry
+.TE
+
+Valid
+.I ndm_flags
+are:
+.TS
+tab(:);
+l l.
+NTF_PROXY:a proxy ARP entry
+NTF_ROUTER:an IPv6 router
+.TE
+
+.\" FIXME
+.\" document the members of the struct better
+The
+.I rtattr
+struct has the following meanings for the
+.I rta_type
+field:
+.TS
+tab(:);
+l l.
+NDA_UNSPEC:unknown type
+NDA_DST:a neighbor cache network layer destination address
+NDA_LLADDR:a neighbor cache link layer address
+NDA_CACHEINFO:cache statistics.
+.TE
+
+If the
+.I rta_type
+field is
+.BR NDA_CACHEINFO ,
+then a
+.I struct nda_cacheinfo
+header follows.
+.TP
+.BR RTM_NEWRULE ", " RTM_DELRULE ", " RTM_GETRULE
+Add, delete, or retrieve a routing rule.
+Carries a
+.IR "struct rtmsg" .
+.TP
+.BR RTM_NEWQDISC ", " RTM_DELQDISC ", " RTM_GETQDISC
+Add, remove, or get a queueing discipline.
+The message contains a
+.I struct tcmsg
+and may be followed by a series of
+attributes.
+
+.nf
+struct tcmsg {
+    unsigned char tcm_family;
+    int           tcm_ifindex;   /* interface index */
+    __u32         tcm_handle;    /* Qdisc handle */
+    __u32         tcm_parent;    /* Parent qdisc */
+    __u32         tcm_info;
+};
+.fi
+.TS
+tab(:);
+c
+l l l.
+Attributes
+rta_type:value type:description
+_
+TCA_UNSPEC:-:unspecified
+TCA_KIND:asciiz string:Name of queueing discipline
+TCA_OPTIONS:byte sequence:Qdisc-specific options follow
+TCA_STATS:struct tc_stats:Qdisc statistics.
+TCA_XSTATS:qdisc specific:Module-specific statistics.
+TCA_RATE:struct tc_estimator:Rate limit.
+.TE
+
+In addition, various other qdisc-module-specific attributes are allowed.
+For more information, see the appropriate include files.
+.TP
+.BR RTM_NEWTCLASS ", " RTM_DELTCLASS ", " RTM_GETTCLASS
+Add, remove, or get a traffic class.
+These messages contain a
+.I struct tcmsg
+as described above.
+.TP
+.BR RTM_NEWTFILTER ", " RTM_DELTFILTER ", " RTM_GETTFILTER
+Add, remove, or receive information about a traffic filter.
+These messages contain a
+.I struct tcmsg
+as described above.
+.SH VERSIONS
+.B rtnetlink
+is a new feature of Linux 2.2.
+.SH BUGS
+This manual page is incomplete.
+.SH "SEE ALSO"
+.BR cmsg (3),
+.BR rtnetlink (3),
+.BR ip (7),
+.BR netlink (7)
diff --git a/man7/socket.7 b/man7/socket.7
index a191ad44b..41b37f1b4 100644
--- a/man7/socket.7
+++ b/man7/socket.7
@@ -1,8 +1,728 @@
-.TH SOCKET 7 2008-08-07 Linux "Linux Programmer's Manual"
-.TH SOCKET 7 2008-08-07 Linux "Linux Programmer's Manual"
-.TH SOCKET 7 2008-08-07 Linux "Linux Programmer's Manual"
-.TH SOCKET 7 2008-08-07 Linux "Linux Programmer's Manual"
-.TH SOCKET 7 2008-08-07 Linux "Linux Programmer's Manual"
-.TH SOCKET 7 2008-08-07 Linux "Linux Programmer's Manual"
-.TH SOCKET 7 2008-08-07 Linux "Linux Programmer's Manual"
-.TH SOCKET 7 2008-08-07 Linux "Linux Programmer's Manual"
+'\" t
+.\" Don't change the first line, it tells man that we need tbl.
+.\" This man page is Copyright (C) 1999 Andi Kleen .
+.\" and copyright (c) 1999 Matthew Wilcox.
+.\" Permission is granted to distribute possibly modified copies
+.\" of this page provided the header is included verbatim,
+.\" and in case of nontrivial modification author and date
+.\" of the modification is added to the header.
+.\"
+.\" 2002-10-30, Michael Kerrisk,
+.\" Added description of SO_ACCEPTCONN
+.\" 2004-05-20, aeb, added SO_RCVTIMEO/SO_SNDTIMEO text.
+.\" Modified, 27 May 2004, Michael Kerrisk
+.\" Added notes on capability requirements
+.\" A few small grammar fixes
+.\"
+.\" FIXME probably all PF_* should be AF_* in this page, since
+.\" POSIX only specifies the latter values.
+.\"
+.TH SOCKET 7 2007-12-28 Linux "Linux Programmer's Manual"
+.SH NAME
+socket \- Linux socket interface
+.SH SYNOPSIS
+.B #include
+.sp
+.IB mysocket " = socket(int " socket_family ", int " socket_type ", int " protocol );
+.SH DESCRIPTION
+This manual page describes the Linux networking socket layer user
+interface.
+The BSD-compatible sockets
+are the uniform interface
+between the user process and the network protocol stacks in the kernel.
+The protocol modules are grouped into
+.I protocol families
+like
+.BR PF_INET ", " PF_IPX ", " PF_PACKET
+and
+.I socket types
+like
+.B SOCK_STREAM
+or
+.BR SOCK_DGRAM .
+See
+.BR socket (2)
+for more information on families and types.
+.SS Socket Layer Functions
+These functions are used by the user process to send or receive packets
+and to do other socket operations.
+For more information, see their respective manual pages.
+
+.BR socket (2)
+creates a socket,
+.BR connect (2)
+connects a socket to a remote socket address,
+the
+.BR bind (2)
+function binds a socket to a local socket address,
+.BR listen (2)
+tells the socket that new connections shall be accepted, and
+.BR accept (2)
+is used to get a new socket with a new incoming connection.
+.BR socketpair (2)
+returns two connected anonymous sockets (only implemented for a few
+local families like
+.BR PF_UNIX ).
+.PP
+.BR send (2),
+.BR sendto (2),
+and
+.BR sendmsg (2)
+send data over a socket, and
+.BR recv (2),
+.BR recvfrom (2),
+and
+.BR recvmsg (2)
+receive data from a socket.
+.BR poll (2)
+and
+.BR select (2)
+wait for arriving data or a readiness to send data.
+In addition, the standard I/O operations like
+.BR write (2),
+.BR writev (2),
+.BR sendfile (2),
+.BR read (2),
+and
+.BR readv (2)
+can be used to read and write data.
+.PP
+.BR getsockname (2)
+returns the local socket address and
+.BR getpeername (2)
+returns the remote socket address.
+.BR getsockopt (2)
+and
+.BR setsockopt (2)
+are used to set or get socket layer or protocol options.
+.BR ioctl (2)
+can be used to set or read some other options.
+.PP
+.BR close (2)
+is used to close a socket.
+.BR shutdown (2)
+closes parts of a full-duplex socket connection.
+.PP
+Seeking, or calling
+.BR pread (2)
+or
+.BR pwrite (2)
+with a non-zero position is not supported on sockets.
+.PP
+It is possible to do non-blocking I/O on sockets by setting the
+.B O_NONBLOCK
+flag on a socket file descriptor using
+.BR fcntl (2).
+Then all operations that would block will (usually)
+return with
+.B EAGAIN
+(operation should be retried later);
+.BR connect (2)
+will return the
+.B EINPROGRESS
+error.
+The user can then wait for various events via
+.BR poll (2)
+or
+.BR select (2).
+.TS
+tab(:) allbox;
+c s s
+l l l.
+I/O events
+Event:Poll flag:Occurrence
+Read:POLLIN:T{
+New data arrived.
+T}
+Read:POLLIN:T{
+A connection setup has been completed
+(for connection-oriented sockets)
+T}
+Read:POLLHUP:T{
+A disconnection request has been initiated by the other end.
+T}
+Read:POLLHUP:T{
+A connection is broken (only for connection-oriented protocols).
+When the socket is written to,
+.B SIGPIPE
+is also sent.
+T}
+Write:POLLOUT:T{
+Socket has enough send buffer space for writing new data.
+T}
+Read/Write:T{
+POLLIN|
+.br
+POLLOUT
+T}:T{
+An outgoing
+.BR connect (2)
+finished.
+T}
+Read/Write:POLLERR:An asynchronous error occurred.
+Read/Write:POLLHUP:The other end has shut down one direction.
+Exception:POLLPRI:T{
+Urgent data arrived.
+Then
+.B SIGURG
+is sent.
+T}
+.\" FIXME . The following is not true currently:
+.\" It is no I/O event when the connection
+.\" is broken from the local end using
+.\" .BR shutdown (2)
+.\" or
+.\" .BR close (2).
+.TE
+
+.PP
+An alternative to
+.BR poll (2)
+and
+.BR select (2)
+is to let the kernel inform the application about events
+via a
+.B SIGIO
+signal.
+For that, the
+.B O_ASYNC
+flag must be set on a socket file descriptor via
+.BR fcntl (2)
+and a valid signal handler for
+.B SIGIO
+must be installed via
+.BR sigaction (2).
+See the
+.I Signals
+discussion below.
+.SS Socket Options
+These socket options can be set by using
+.BR setsockopt (2)
+and read with
+.BR getsockopt (2)
+with the socket level set to
+.B SOL_SOCKET
+for all sockets:
+.\" SO_ACCEPTCONN is in POSIX.1-2001, and its origin is explained in
+.\" W R Stevens, UNPv1
+.TP
+.B SO_ACCEPTCONN
+Returns a value indicating whether or not this socket has been marked
+to accept connections with
+.BR listen (2).
+The value 0 indicates that this is not a listening socket;
+the value 1 indicates that this is a listening socket.
+Can only be read
+with
+.BR getsockopt (2).
+.TP
+.B SO_BINDTODEVICE
+Bind this socket to a particular device like \(lqeth0\(rq,
+as specified in the passed interface name.
+If the
+name is an empty string or the option length is zero, the socket device
+binding is removed.
+The passed option is a variable-length null-terminated
+interface name string with the maximum size of
+.BR IFNAMSIZ .
+If a socket is bound to an interface,
+only packets received from that particular interface are processed by the
+socket.
+Note that this only works for some socket types, particularly
+.B AF_INET
+sockets.
+It is not supported for packet sockets (use normal
+.BR bind (2)
+there).
+.TP
+.B SO_BROADCAST
+Set or get the broadcast flag.
+When enabled, datagram sockets
+receive packets sent to a broadcast address, and they are allowed to send
+packets to a broadcast address.
+This option has no effect on stream-oriented sockets.
+.TP
+.B SO_BSDCOMPAT
+Enable BSD bug-to-bug compatibility.
+This is used by the UDP protocol module in Linux 2.0 and 2.2.
+If enabled, ICMP errors received for a UDP socket will not be passed
+to the user program.
+In later kernel versions, support for this option has been phased out:
+Linux 2.4 silently ignores it, and Linux 2.6 generates a kernel warning
+(printk()) if a program uses this option.
+Linux 2.0 also enabled BSD bug-to-bug compatibility +options (random header changing, skipping of the broadcast flag) for raw +sockets with this option, but that was removed in Linux 2.2. +.TP +.B SO_DEBUG +Enable socket debugging. +Only allowed for processes with the +.B CAP_NET_ADMIN +capability or an effective user ID of 0. +.TP +.B SO_ERROR +Get and clear the pending socket error. +Only valid as a +.BR getsockopt (2). +Expects an integer. +.TP +.B SO_DONTROUTE +Don't send via a gateway, only send to directly connected hosts. +The same effect can be achieved by setting the +.B MSG_DONTROUTE +flag on a socket +.BR send (2) +operation. +Expects an integer boolean flag. +.TP +.B SO_KEEPALIVE +Enable sending of keep-alive messages on connection-oriented sockets. +Expects an integer boolean flag. +.TP +.B SO_LINGER +Sets or gets the +.B SO_LINGER +option. +The argument is a +.I linger +structure. +.sp +.in +4n +.nf +struct linger { + int l_onoff; /* linger active */ + int l_linger; /* how many seconds to linger for */ +}; +.fi +.in +.IP +When enabled, a +.BR close (2) +or +.BR shutdown (2) +will not return until all queued messages for the socket have been +successfully sent or the linger timeout has been reached. +Otherwise, +the call returns immediately and the closing is done in the background. +When the socket is closed as part of +.BR exit (2), +it always lingers in the background. +.TP +.B SO_OOBINLINE +If this option is enabled, +out-of-band data is directly placed into the receive data stream. +Otherwise out-of-band data is only passed when the +.B MSG_OOB +flag is set during receiving. +.\" don't document it because it can do too much harm. +.\".B SO_NO_CHECK +.TP +.B SO_PASSCRED +Enable or disable the receiving of the +.B SCM_CREDENTIALS +control message. +For more information see +.BR unix (7). 
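A minimal sketch of the SO_KEEPALIVE, SO_LINGER, and SO_ERROR options described above, assuming a Linux system and an ordinary TCP socket. The helper names `configure_close_behavior` and `pending_socket_error` are hypothetical illustrations, not part of any library; the code only uses `setsockopt(2)`/`getsockopt(2)` at level `SOL_SOCKET` with the argument types the text gives (an integer boolean flag, a `struct linger`, and an integer).

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: enable keep-alive probes and a bounded linger on
 * close(2).  Returns 0 on success, -1 on error (errno set by setsockopt). */
int configure_close_behavior(int fd, int linger_seconds)
{
    int on = 1;                     /* integer boolean flag */
    struct linger lg;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
        return -1;

    lg.l_onoff = 1;                 /* linger active */
    lg.l_linger = linger_seconds;   /* how many seconds to linger for */
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) == -1)
        return -1;
    return 0;
}

/* Hypothetical helper: get (and thereby clear) the pending socket error.
 * Returns 0 if there is none, the error number otherwise, or -1 if
 * getsockopt(2) itself fails. */
int pending_socket_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
        return -1;
    return err;
}
```

Because reading SO_ERROR clears the error, a program would typically read it once at the point where it needs the result (for example, after a non-blocking connect(2) reports POLLOUT) rather than polling it repeatedly.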
+.\" FIXME Document SO_PASSSEC, added in 2.6.18; there is some info
+.\" in the 2.6.18 ChangeLog
+.TP
+.B SO_PEERCRED
+Return the credentials of the foreign process connected to this socket.
+This is only possible for connected
+.B AF_UNIX
+stream sockets and
+.B AF_UNIX
+stream and datagram socket pairs created using
+.BR socketpair (2);
+see
+.BR unix (7).
+The returned credentials are those that were in effect at the time
+of the call to
+.BR connect (2)
+or
+.BR socketpair (2).
+The argument is a
+.I ucred
+structure.
+Only valid as a
+.BR getsockopt (2).
+.TP
+.B SO_PRIORITY
+Set the protocol-defined priority for all packets to be sent on
+this socket.
+Linux uses this value to order the networking queues:
+packets with a higher priority may be processed first depending
+on the selected device queueing discipline.
+For
+.BR ip (7),
+this also sets the IP type-of-service (TOS) field for outgoing packets.
+Setting a priority outside the range 0 to 6 requires the
+.B CAP_NET_ADMIN
+capability.
+.TP
+.B SO_RCVBUF
+Sets or gets the maximum socket receive buffer in bytes.
+The kernel doubles this value (to allow space for bookkeeping overhead)
+when it is set using
+.\" Most (all?) other implementations do not do this -- MTK, Dec 05
+.BR setsockopt (2),
+and this doubled value is returned by
+.BR getsockopt (2).
+The default value is set by the
+.I rmem_default
+sysctl and the maximum allowed value is set by the
+.I rmem_max
+sysctl.
+The minimum (doubled) value for this option is 256.
+.TP
+.BR SO_RCVBUFFORCE " (since Linux 2.6.14)"
+Using this socket option, a privileged
+.RB ( CAP_NET_ADMIN )
+process can perform the same task as
+.BR SO_RCVBUF ,
+but the
+.I rmem_max
+limit can be overridden.
+.TP
+.BR SO_RCVLOWAT " and " SO_SNDLOWAT
+Specify the minimum number of bytes in the buffer until the socket layer
+will pass the data to the protocol
+.RB ( SO_SNDLOWAT )
+or the user on receiving
+.RB ( SO_RCVLOWAT ).
+These two values are initialized to 1.
+.B SO_SNDLOWAT
+is not changeable on Linux
+.RB ( setsockopt (2)
+fails with the error
+.BR ENOPROTOOPT ).
+.B SO_RCVLOWAT
+is changeable
+only since Linux 2.4.
+The
+.BR select (2)
+and
+.BR poll (2)
+system calls currently do not respect the
+.B SO_RCVLOWAT
+setting on Linux,
+and mark a socket readable even if only a single byte of data is available.
+A subsequent read from the socket will block until
+.B SO_RCVLOWAT
+bytes are available.
+.\" See http://marc.theaimsgroup.com/?l=linux-kernel&m=111049368106984&w=2
+.\" Tested on kernel 2.6.14 -- mtk, 30 Nov 05
+.TP
+.BR SO_RCVTIMEO " and " SO_SNDTIMEO
+.\" Not implemented in 2.0.
+.\" Implemented in 2.1.11 for getsockopt: always return a zero struct.
+.\" Implemented in 2.3.41 for setsockopt, and actually used.
+Specify the receiving or sending timeouts until reporting an error.
+The argument is a
+.IR "struct timeval" .
+If an input or output function blocks for this period of time, and
+data has been sent or received, the return value of that function
+will be the amount of data transferred; if no data has been transferred
+and the timeout has been reached, then \-1 is returned with
+.I errno
+set to
+.B EAGAIN
+or
+.BR EWOULDBLOCK ,
+.\" in fact to EAGAIN
+just as if the socket were specified to be non-blocking.
+If the timeout is set to zero (the default),
+then the operation will never time out.
+Timeouts only have effect for system calls that perform socket I/O (e.g.,
+.BR read (2),
+.BR recvmsg (2),
+.BR send (2),
+.BR sendmsg (2));
+timeouts have no effect for
+.BR select (2),
+.BR poll (2),
+.BR epoll_wait (2),
+etc.
+.TP
+.B SO_REUSEADDR
+Indicates that the rules used in validating addresses supplied in a
+.BR bind (2)
+call should allow reuse of local addresses.
+For
+.B AF_INET
+sockets, this
+means that a socket may bind, except when there
+is an active listening socket bound to the address.
+When the listening socket is bound to +.B INADDR_ANY +with a specific port then it is not possible +to bind to this port for any local address. +Argument is an integer boolean flag. +.TP +.B SO_SNDBUF +Sets or gets the maximum socket send buffer in bytes. +The kernel doubles this value (to allow space for bookkeeping overhead) +when it is set using +.\" Most (all?) other implementations do not do this -- MTK, Dec 05 +.BR setsockopt (2), +and this doubled value is returned by +.BR getsockopt (2). +The default value is set by the +.I wmem_default +sysctl and the maximum allowed value is set by the +.I wmem_max +sysctl. +The minimum (doubled) value for this option is 2048. +.TP +.BR SO_SNDBUFFORCE " (since Linux 2.6.14)" +Using this socket option, a privileged +.RB ( CAP_NET_ADMIN ) +process can perform the same task as +.BR SO_SNDBUF , +but the +.I wmem_max +limit can be overridden. +.TP +.B SO_TIMESTAMP +Enable or disable the receiving of the +.B SO_TIMESTAMP +control message. +The timestamp control message is sent with level +.B SOL_SOCKET +and the +.I cmsg_data +field is a +.I "struct timeval" +indicating the +reception time of the last packet passed to the user in this call. +See +.BR cmsg (3) +for details on control messages. +.TP +.B SO_TYPE +Gets the socket type as an integer (like +.BR SOCK_STREAM ). +Can only be read +with +.BR getsockopt (2). +.SS Signals +When writing onto a connection-oriented socket that has been shut down +(by the local or the remote end) +.B SIGPIPE +is sent to the writing process and +.B EPIPE +is returned. +The signal is not sent when the write call +specified the +.B MSG_NOSIGNAL +flag. +.PP +When requested with the +.B FIOSETOWN +.BR fcntl (2) +or +.B SIOCSPGRP +.BR ioctl (2), +.B SIGIO +is sent when an I/O event occurs. +It is possible to use +.BR poll (2) +or +.BR select (2) +in the signal handler to find out which socket the event occurred on. 
+An alternative (in Linux 2.2) is to set a real-time signal using the
+.B F_SETSIG
+.BR fcntl (2);
+the handler of the real-time signal will be called with
+the file descriptor in the
+.I si_fd
+field of its
+.IR siginfo_t .
+See
+.BR fcntl (2)
+for more information.
+.PP
+Under some circumstances (e.g., multiple processes accessing a
+single socket), the condition that caused the
+.B SIGIO
+may have already disappeared when the process reacts to the signal.
+If this happens, the process should wait again because Linux
+will resend the signal later.
+.\" .SS Ancillary Messages
+.SS Sysctls
+The core socket networking sysctls can be accessed using the
+.I /proc/sys/net/core/*
+files or with the
+.BR sysctl (2)
+interface.
+.TP
+.I rmem_default
+contains the default setting in bytes of the socket receive buffer.
+.TP
+.I rmem_max
+contains the maximum socket receive buffer size in bytes which a user may
+set by using the
+.B SO_RCVBUF
+socket option.
+.TP
+.I wmem_default
+contains the default setting in bytes of the socket send buffer.
+.TP
+.I wmem_max
+contains the maximum socket send buffer size in bytes which a user may
+set by using the
+.B SO_SNDBUF
+socket option.
+.TP
+.BR message_cost " and " message_burst
+configure the token bucket filter used to rate-limit warning messages
+caused by external network events.
+.TP
+.I netdev_max_backlog
+Maximum number of packets in the global input queue.
+.TP
+.I optmem_max
+Maximum length of ancillary data and user control data like the iovecs
+per socket.
+.\" netdev_fastroute is not documented because it is experimental
+.SS Ioctls
+These operations can be accessed using
+.BR ioctl (2):

+.in +4n
+.nf
+.IB error " = ioctl(" ip_socket ", " ioctl_type ", " &value_result ");"
+.fi
+.in
+.TP
+.B SIOCGSTAMP
+Return a
+.I struct timeval
+with the receive timestamp of the last packet passed to the user.
+This is useful for accurate round-trip time measurements.
+See
+.BR setitimer (2)
+for a description of
+.IR "struct timeval" .
+.\"
+This ioctl should only be used if the socket option
+.B SO_TIMESTAMP
+is not set on the socket.
+Otherwise, it returns the timestamp of the
+last packet that was received while
+.B SO_TIMESTAMP
+was not set, or it fails if no such packet has been received
+(i.e.,
+.BR ioctl (2)
+returns \-1 with
+.I errno
+set to
+.BR ENOENT ).
+.TP
+.B SIOCSPGRP
+Set the process or process group to send
+.B SIGIO
+or
+.B SIGURG
+signals
+to when an
+asynchronous I/O operation has finished or urgent data is available.
+The argument is a pointer to a
+.IR pid_t .
+If the argument is positive, send the signals to that process.
+If the
+argument is negative, send the signals to the process group with the ID
+of the absolute value of the argument.
+The process may only choose itself or its own process group to receive
+signals unless it has the
+.B CAP_KILL
+capability or an effective UID of 0.
+.TP
+.B FIOASYNC
+Change the
+.B O_ASYNC
+flag to enable or disable asynchronous I/O mode of the socket.
+Asynchronous I/O mode means that the
+.B SIGIO
+signal or the signal set with
+.B F_SETSIG
+is raised when a new I/O event occurs.
+.IP
+The argument is an integer boolean flag.
+(This operation is synonymous with the use of
+.BR fcntl (2)
+to set the
+.B O_ASYNC
+flag.)
+.\"
+.TP
+.B SIOCGPGRP
+Get the current process or process group that receives
+.B SIGIO
+or
+.B SIGURG
+signals,
+or 0
+when none is set.
+.PP
+Valid
+.BR fcntl (2)
+operations:
+.TP
+.B FIOGETOWN
+The same as the
+.B SIOCGPGRP
+.BR ioctl (2).
+.TP
+.B FIOSETOWN
+The same as the
+.B SIOCSPGRP
+.BR ioctl (2).
+.SH VERSIONS
+.B SO_BINDTODEVICE
+was introduced in Linux 2.0.30.
+.B SO_PASSCRED
+is new in Linux 2.2.
+The sysctls are new in Linux 2.2.
+.B SO_RCVTIMEO
+and
+.B SO_SNDTIMEO
+are supported since Linux 2.3.41.
+Earlier, timeouts were fixed to
+a protocol-specific setting, and could not be read or written.
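The SO_RCVTIMEO behavior described under Socket Options can be sketched as follows, assuming Linux 2.3.41 or later. The helper names `set_recv_timeout` and `recv_timed_out` are hypothetical; the sketch converts a millisecond timeout into the `struct timeval` the option expects and reports whether a `recv(2)` gave up with EAGAIN or EWOULDBLOCK because no data arrived in time.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: set a receive timeout in milliseconds.
 * A value of 0 restores the default "never time out" behavior. */
int set_recv_timeout(int fd, long timeout_ms)
{
    struct timeval tv;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Hypothetical helper: returns 1 if recv(2) failed because the timeout
 * expired with no data transferred, 0 otherwise (data read or other error). */
int recv_timed_out(int fd)
{
    char buf[64];
    ssize_t n = recv(fd, buf, sizeof(buf), 0);

    return n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK);
}
```

An AF_UNIX socket pair is convenient for trying this out, since the timeout is handled by the generic socket layer; the same calls apply to TCP sockets.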
+.SH NOTES +Linux assumes that half of the send/receive buffer is used for internal +kernel structures; thus the sysctls are twice what can be observed +on the wire. + +Linux will only allow port re-use with the +.B SO_REUSEADDR +option +when this option was set both in the previous program that performed a +.BR bind (2) +to the port and in the program that wants to re-use the port. +This differs from some implementations (e.g., FreeBSD) +where only the later program needs to set the +.B SO_REUSEADDR +option. +Typically this difference is invisible, since, for example, a server +program is designed to always set this option. +.SH BUGS +The +.B CONFIG_FILTER +socket options +.B SO_ATTACH_FILTER +and +.B SO_DETACH_FILTER +are +not documented. +The suggested interface to use them is via the libpcap +library. +.\" .SH AUTHORS +.\" This man page was written by Andi Kleen. +.SH "SEE ALSO" +.BR getsockopt (2), +.BR setsockopt (2), +.BR socket (2), +.BR capabilities (7), +.BR ddp (7), +.BR ip (7), +.BR packet (7) diff --git a/man7/tcp.7 b/man7/tcp.7 index 5ad2fef09..c641d2f90 100644 --- a/man7/tcp.7 +++ b/man7/tcp.7 @@ -1,8 +1,947 @@ -.TH TCP 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH TCP 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH TCP 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH TCP 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH TCP 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH TCP 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH TCP 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH TCP 7 2008-08-07 "Linux" "Linux Programmer's Manual" +.\" This man page is Copyright (C) 1999 Andi Kleen . +.\" Permission is granted to distribute possibly modified copies +.\" of this page provided the header is included verbatim, +.\" and in case of nontrivial modification author and date +.\" of the modification is added to the header. +.\" +.\" 2.4 Updates by Nivedita Singhvi 4/20/02 . 
+.\" Modified, 2004-11-11, Michael Kerrisk and Andries Brouwer
+.\" Updated details of interaction of TCP_CORK and TCP_NODELAY.
+.\"
+.\" FIXME 2.6.17-rc1 adds the following /proc files, which need to be
+.\" documented: tcp_mtu_probing, tcp_base_mss, and
+.\" tcp_workaround_signed_windows
+.\"
+.TH TCP 7 2007-11-25 "Linux" "Linux Programmer's Manual"
+.SH NAME
+tcp \- TCP protocol
+.SH SYNOPSIS
+.B #include 
+.br
+.B #include 
+.br
+.B #include 
+.sp
+.B tcp_socket = socket(AF_INET, SOCK_STREAM, 0);
+.SH DESCRIPTION
+This is an implementation of the TCP protocol defined in
+RFC\ 793, RFC\ 1122 and RFC\ 2001 with the NewReno and SACK
+extensions.
+It provides a reliable, stream-oriented,
+full-duplex connection between two sockets on top of
+.BR ip (7),
+for both v4 and v6 versions.
+TCP guarantees that the data arrives in order and
+retransmits lost packets.
+It generates and checks a per-packet checksum to catch
+transmission errors.
+TCP does not preserve record boundaries.

+A newly created TCP socket has no remote or local address and is not
+fully specified.
+To create an outgoing TCP connection, use
+.BR connect (2)
+to establish a connection to another TCP socket.
+To receive new incoming connections, first
+.BR bind (2)
+the socket to a local address and port and then call
+.BR listen (2)
+to put the socket into the listening state.
+After that, a new
+socket for each incoming connection can be accepted
+using
+.BR accept (2).
+A socket which has had
+.BR accept (2)
+or
+.BR connect (2)
+successfully called on it is fully specified and may
+transmit data.
+Data cannot be transmitted on listening or
+not yet connected sockets.

+Linux supports RFC\ 1323 TCP high-performance
+extensions.
+These include Protection Against Wrapped
+Sequence Numbers (PAWS), Window Scaling and
+Timestamps.
+Window scaling allows the use
+of large (> 64K) TCP windows in order to support links with high
+latency or bandwidth.
+To make use of them, the send and
+receive buffer sizes must be increased.
+They can be set globally with the
+.I net.ipv4.tcp_wmem
+and
+.I net.ipv4.tcp_rmem
+sysctl variables, or on individual sockets by using the
+.B SO_SNDBUF
+and
+.B SO_RCVBUF
+socket options with the
+.BR setsockopt (2)
+call.

+The maximum sizes for socket buffers declared via the
+.B SO_SNDBUF
+and
+.B SO_RCVBUF
+mechanisms are limited by the global
+.I net.core.wmem_max
+and
+.I net.core.rmem_max
+sysctls, respectively.
+Note that TCP actually allocates twice the size of
+the buffer requested in the
+.BR setsockopt (2)
+call, and so a subsequent
+.BR getsockopt (2)
+call will not return the same size of buffer as requested
+in the
+.BR setsockopt (2)
+call.
+TCP uses the extra space for administrative purposes and internal
+kernel structures, and the sysctl variables reflect the
+larger sizes compared to the actual TCP windows.
+On individual connections, the socket buffer size must be
+set prior to the
+.BR listen (2)
+or
+.BR connect (2)
+calls in order to have it take effect.
+See
+.BR socket (7)
+for more information.
+.PP
+TCP supports urgent data.
+Urgent data is used to signal the
+receiver that some important message is part of the data
+stream and that it should be processed as soon as possible.
+To send urgent data, specify the
+.B MSG_OOB
+flag to
+.BR send (2).
+When urgent data is received, the kernel sends a
+.B SIGURG
+signal to the process or process group that has been set as the
+socket "owner" using the
+.B SIOCSPGRP
+or
+.B FIOSETOWN
+ioctls (or the POSIX.1-2001-specified
+.BR fcntl (2)
+.B F_SETOWN
+operation).
+When the
+.B SO_OOBINLINE
+socket option is enabled, urgent data is put into the normal
+data stream (a program can test for its location using the
+.B SIOCATMARK
+ioctl described below);
+otherwise, it can only be received when the
+.B MSG_OOB
+flag is set for
+.BR recv (2)
+or
+.BR recvmsg (2).
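Setting per-socket buffer sizes before calling listen(2) or connect(2), and reading back what the kernel recorded, can be sketched as below. The helper `set_tcp_buffers` is hypothetical, and the values read back depend on the running kernel: on Linux they are typically about twice the request, after the request has been capped by net.core.wmem_max or net.core.rmem_max, so the sketch does not promise particular numbers.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: request send and receive buffer sizes (bytes) on a
 * socket that has not yet called listen(2) or connect(2), then store the
 * sizes reported by getsockopt(2) in *snd_out and *rcv_out.
 * Returns 0 on success, -1 on error. */
int set_tcp_buffers(int fd, int snd, int rcv, int *snd_out, int *rcv_out)
{
    socklen_t len;

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &snd, sizeof(snd)) == -1)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcv, sizeof(rcv)) == -1)
        return -1;

    /* On Linux the reported values are usually the (capped) requests
     * doubled, to leave room for kernel bookkeeping. */
    len = sizeof(*snd_out);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, snd_out, &len) == -1)
        return -1;
    len = sizeof(*rcv_out);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, rcv_out, &len) == -1)
        return -1;
    return 0;
}
```

Calling this before connect(2) matters because, as noted above, later changes may not affect the window negotiated during connection setup.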

+Linux 2.4 introduced a number of changes for improved
+throughput and scaling, as well as enhanced functionality.
+Some of these features include support for zero-copy
+.BR sendfile (2),
+Explicit Congestion Notification, new
+management of TIME_WAIT sockets, keep-alive socket options
+and support for Duplicate SACK extensions.
+.SS Address Formats
+TCP is built on top of IP (see
+.BR ip (7)).
+The address formats defined by
+.BR ip (7)
+apply to TCP.
+TCP only supports point-to-point
+communication; broadcasting and multicasting are not
+supported.
+.SS Sysctls
+These variables can be accessed by the
+.I /proc/sys/net/ipv4/*
+files or with the
+.BR sysctl (2)
+interface.
+In addition, most IP sysctls also apply to TCP; see
+.BR ip (7).
+Variables described as
+.I Boolean
+take an integer value, with a non-zero value ("true") meaning that
+the corresponding option is enabled, and a zero value ("false")
+meaning that the option is disabled.
+.\" FIXME As at Sept 2006, kernel 2.6.18-rc5, the following are
+.\" not yet documented (shown with default values):
+.\"
+.\" /proc/sys/net/ipv4/tcp_congestion_control (since 2.6.13)
+.\" bic
+.\" /proc/sys/net/ipv4/tcp_moderate_rcvbuf
+.\" 1
+.\" /proc/sys/net/ipv4/tcp_no_metrics_save
+.\" 0
+.TP
+.BR tcp_abort_on_overflow " (Boolean; default: disabled)"
+Enable resetting connections if the listening service is too
+slow and unable to keep up and accept them.
+Leaving this option disabled (the default) means that if an overflow
+occurs because of a burst, the connection will recover.
+Enable this option
+.I only
+if you are really sure that the listening daemon
+cannot be tuned to accept connections faster.
+Enabling this
+option can harm the clients of your server.
+.TP
+.BR tcp_adv_win_scale " (integer; default: 2)"
+Count buffering overhead as
+.IR "bytes/2^tcp_adv_win_scale" ,
+if
+.I tcp_adv_win_scale
+is greater than 0; or
+.IR "bytes-bytes/2^(\-tcp_adv_win_scale)" ,
+if
+.I tcp_adv_win_scale
+is less than or equal to zero.
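The tcp_adv_win_scale rule above is plain arithmetic, so it can be checked with a small worked example. The functions below are hypothetical illustrations of the two-case formula, not kernel code; they assume a non-negative buffer size and a scale strictly between \-31 and 31.

```c
#include <assert.h>

/* Hypothetical helper: bytes of a receive buffer of `space` bytes that the
 * tcp_adv_win_scale rule counts as buffering overhead. */
long adv_win_overhead(long space, int scale)
{
    if (scale > 0)
        return space / (1L << scale);          /* bytes/2^scale */
    return space - space / (1L << -scale);     /* bytes - bytes/2^(-scale) */
}

/* The complement: bytes left over to advertise as the TCP window. */
long adv_win_window(long space, int scale)
{
    return space - adv_win_overhead(space, scale);
}
```

With the default scale of 2 and an 87380-byte buffer (the tcp_rmem default given below), one fourth of the buffer (21845 bytes) is counted as overhead and 65535 bytes remain for the advertised window.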

+The socket receive buffer space is shared between the
+application and kernel.
+TCP maintains part of the buffer as
+the TCP window; this is the size of the receive window
+advertised to the other end.
+The rest of the space is used
+as the "application" buffer, used to isolate the network
+from scheduling and application latencies.
+The
+.I tcp_adv_win_scale
+default value of 2 implies that the space
+used for the application buffer is one fourth that of the
+total.
+.TP
+.BR tcp_app_win " (integer; default: 31)"
+This variable defines how many
+bytes of the TCP window are reserved for buffering
+overhead.

+max(\fIwindow/2^tcp_app_win\fP, mss) bytes of the window
+are reserved for the application buffer.
+A value of 0
+implies that no amount is reserved.
+.\"
+.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.BR tcp_bic " (Boolean; default: disabled)"
+Enable the BIC TCP congestion control algorithm.
+BIC-TCP is a sender-side only change that ensures a linear RTT
+fairness under large windows while offering both scalability and
+bounded TCP-friendliness.
+The protocol combines two schemes
+called additive increase and binary search increase.
+When the
+congestion window is large, additive increase with a large
+increment ensures linear RTT fairness as well as good
+scalability.
+Under small congestion windows, binary search
+increase provides TCP friendliness.
+.\"
+.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.BR tcp_bic_low_window " (integer; default: 14)"
+Sets the threshold window (in packets) where BIC TCP starts to
+adjust the congestion window.
+Below this threshold, BIC TCP behaves
+the same as the default TCP Reno.
+.\"
+.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.BR tcp_bic_fast_convergence " (Boolean; default: enabled)"
+Forces BIC TCP to respond more quickly to changes in congestion
+window.
+Allows two flows sharing the same connection to converge
+more rapidly.
+.TP
+.BR tcp_dsack " (Boolean; default: enabled)"
+Enable RFC\ 2883 TCP Duplicate SACK support.
+.TP
+.BR tcp_ecn " (Boolean; default: disabled)"
+Enable RFC\ 3168 Explicit Congestion Notification.
+When enabled, connectivity to some
+destinations could be affected due to older, misbehaving
+routers along the path causing connections to be dropped.
+.TP
+.BR tcp_fack " (Boolean; default: enabled)"
+Enable TCP Forward Acknowledgement support.
+.TP
+.BR tcp_fin_timeout " (integer; default: 60)"
+This specifies how many seconds to wait for a final FIN packet before the
+socket is forcibly closed.
+This is strictly a violation of
+the TCP specification, but is required to prevent
+denial-of-service attacks.
+In Linux 2.2, the default value was 180.
+.\"
+.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.BR tcp_frto " (Boolean; default: disabled)"
+Enables F-RTO, an enhanced recovery algorithm for TCP retransmission
+timeouts.
+It is particularly beneficial in wireless environments
+where packet loss is typically due to random radio interference
+rather than intermediate router congestion.
+.TP
+.BR tcp_keepalive_intvl " (integer; default: 75)"
+The number of seconds between TCP keep-alive probes.
+.TP
+.BR tcp_keepalive_probes " (integer; default: 9)"
+The maximum number of TCP keep-alive probes to send
+before giving up and killing the connection if
+no response is obtained from the other end.
+.TP
+.BR tcp_keepalive_time " (integer; default: 7200)"
+The number of seconds a connection needs to be idle
+before TCP begins sending out keep-alive probes.
+Keep-alives are only sent when the
+.B SO_KEEPALIVE
+socket option is enabled.
+The default value is 7200 seconds (2 hours).
+An idle connection is terminated after
+approximately an additional 11 minutes (9 probes at intervals
+of 75 seconds) when keep-alive is enabled.
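The keep-alive timing above can be worked through with a small sketch. The function name is hypothetical, and it assumes the simple model that the connection is dropped one probe interval after the last of the unanswered probes, which is how the "additional 11 minutes" figure arises.

```c
#include <assert.h>

/* Hypothetical helper, illustrative arithmetic only: seconds from the start
 * of idleness until an SO_KEEPALIVE connection to an unresponsive peer is
 * dropped.  Parameters mirror tcp_keepalive_time, tcp_keepalive_probes and
 * tcp_keepalive_intvl. */
long keepalive_dead_after(long keepalive_time, long keepalive_probes,
                          long keepalive_intvl)
{
    /* Idle period, then `probes` unanswered probes spaced `intvl` apart. */
    return keepalive_time + keepalive_probes * keepalive_intvl;
}
```

With the defaults this gives 7200 + 9 * 75 = 7875 seconds: the 2-hour idle period plus 675 seconds (11.25 minutes) of probing.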

+Note that underlying connection tracking mechanisms and
+application timeouts may be much shorter.
+.\"
+.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.BR tcp_low_latency " (Boolean; default: disabled)"
+If enabled, the TCP stack makes decisions that prefer lower
+latency as opposed to higher throughput.
+If this option is disabled, then higher throughput is preferred.
+An example of an application where this default should be
+changed would be a Beowulf compute cluster.
+.TP
+.BR tcp_max_orphans " (integer; default: see below)"
+The maximum number of orphaned (not attached to any user file
+handle) TCP sockets allowed in the system.
+When this number
+is exceeded, the orphaned connection is reset and a warning
+is printed.
+This limit exists only to prevent simple denial-of-service attacks.
+Lowering this limit is not recommended.
+Network conditions might require you to increase the number of
+orphans allowed, but note that each orphan can eat up to ~64K
+of unswappable memory.
+The default initial value is set
+equal to the kernel parameter NR_FILE.
+This initial default is adjusted depending on the memory in the system.
+.TP
+.BR tcp_max_syn_backlog " (integer; default: see below)"
+The maximum number of queued connection requests which have
+still not received an acknowledgement from the connecting client.
+If this number is exceeded, the kernel will begin
+dropping requests.
+The default value of 256 is increased to
+1024 when the memory present in the system is adequate or
+greater (>= 128 MB), and reduced to 128 for those systems with
+very low memory (<= 32 MB).
+It is recommended that if this
+needs to be increased above 1024, TCP_SYNQ_HSIZE in
+.I include/net/tcp.h
+be modified to keep
+TCP_SYNQ_HSIZE*16<=tcp_max_syn_backlog, and the kernel be
+recompiled.
+.TP
+.BR tcp_max_tw_buckets " (integer; default: see below)"
+The maximum number of sockets in TIME_WAIT state allowed in
+the system.
+This limit exists only to prevent simple denial-of-service +attacks. +The default value of NR_FILE*2 is adjusted +depending on the memory in the system. +If this number is +exceeded, the socket is closed and a warning is printed. +.TP +.I tcp_mem +This is a vector of 3 integers: [low, pressure, high]. +These bounds are used by TCP to track its memory usage. +The +defaults are calculated at boot time from the amount of +available memory. +(TCP can only use +.I "low memory" +for this, which is limited to around 900 megabytes on 32-bit systems. +64-bit systems do not suffer this limitation.) + +.I low +- TCP doesn't regulate its memory allocation when the number +of pages it has allocated globally is below this number. + +.I pressure +- when the amount of memory allocated by TCP +exceeds this number of pages, TCP moderates its memory consumption. +This memory pressure state is exited +once the number of pages allocated falls below +the +.I low +mark. + +.I high +- the maximum number of pages, globally, that TCP +will allocate. +This value overrides any other limits +imposed by the kernel. +.TP +.BR tcp_orphan_retries " (integer; default: 8)" +The maximum number of attempts made to probe the other +end of a connection which has been closed by our end. +.TP +.BR tcp_reordering " (integer; default: 3)" +The maximum a packet can be reordered in a TCP packet stream +without TCP assuming packet loss and going into slow start. +It is not advisable to change this number. +This is a packet reordering detection metric designed to +minimize unnecessary back off and retransmits provoked by +reordering of packets on a connection. +.TP +.BR tcp_retrans_collapse " (Boolean; default: enabled)" +Try to send full-sized packets during retransmit. +.TP +.BR tcp_retries1 " (integer; default: 3)" +The number of times TCP will attempt to retransmit a +packet on an established connection normally, +without the extra effort of getting the network +layers involved. 
+Once we exceed this number of
+retransmits, we first have the network layer
+update the route if possible before each new retransmit.
+The default is the RFC-specified minimum of 3.
+.TP
+.BR tcp_retries2 " (integer; default: 15)"
+The maximum number of times a TCP packet is retransmitted
+in established state before giving up.
+The default
+value is 15, which corresponds to a duration of
+between approximately 13 and 30 minutes, depending
+on the retransmission timeout.
+The minimum limit of 100 seconds specified by
+RFC\ 1122 is typically deemed too
+short.
+.TP
+.BR tcp_rfc1337 " (Boolean; default: disabled)"
+Enable TCP behavior conformant with RFC\ 1337.
+When disabled,
+if an RST is received in TIME_WAIT state, we close
+the socket immediately without waiting for the end
+of the TIME_WAIT period.
+.TP
+.I tcp_rmem
+This is a vector of 3 integers: [min, default,
+max].
+These parameters are used by TCP to regulate receive
+buffer sizes.
+TCP dynamically adjusts the size of the
+receive buffer from the defaults listed below, in the range
+of these sysctl variables, depending on memory available
+in the system.
+.RS
+.TP 9
+.I min
+minimum size of the receive buffer used by each TCP socket.
+The default value is 4K, and is lowered to
+.B PAGE_SIZE
+bytes in low-memory systems.
+This value
+is used to ensure that in memory pressure mode,
+allocations below this size will still succeed.
+This is not
+used to bound the size of the receive buffer declared
+using
+.B SO_RCVBUF
+on a socket.
+.TP
+.I default
+the default size of the receive buffer for a TCP socket.
+This value overwrites the initial default buffer size from
+the generic global
+.I net.core.rmem_default
+defined for all protocols.
+The default value is 87380
+bytes, and is lowered to 43689 in low-memory systems.
+If larger receive buffer sizes are desired, this value should
+be increased (to affect all sockets).
+To employ large TCP
+windows, the
+.I net.ipv4.tcp_window_scaling
+sysctl must be enabled (default).
+.TP +.I max +the maximum size of the receive buffer used by +each TCP socket. +This value does not override the global +.IR net.core.rmem_max . +This is not used to limit the size of the receive buffer +declared using +.B SO_RCVBUF +on a socket. +The default value of 87380*2 bytes is lowered to 87380 +in low-memory systems. +.RE +.TP +.BR tcp_sack " (Boolean; default: enabled)" +Enable RFC\ 2018 TCP Selective Acknowledgements. +.TP +.BR tcp_stdurg " (Boolean; default: disabled)" +If this option is enabled, then use the RFC\ 1122 interpretation +of the TCP urgent-pointer field. +.\" RFC 793 was ambiguous in its specification of the meaning of the +.\" urgent pointer. RFC 1122 (and RFC 961) fixed on a particular +.\" resolution of this ambiguity (unfortunately the "wrong" one). +According to this interpretation, the urgent pointer points +to the last byte of urgent data. +If this option is disabled, then use the BSD-compatible interpretation of +the urgent pointer: +the urgent pointer points to the first byte after the urgent data. +Enabling this option may lead to interoperability problems. +.TP +.BR tcp_synack_retries " (integer; default: 5)" +The maximum number of times a SYN/ACK segment +for a passive TCP connection will be retransmitted. +This number should not be higher than 255. +.TP +.BR tcp_syncookies " (Boolean)" +Enable TCP syncookies. +The kernel must be compiled with +.BR CONFIG_SYN_COOKIES . +Send out syncookies when the syn backlog queue of a socket +overflows. +The syncookies feature attempts to protect a +socket from a SYN flood attack. +This should be used as a +last resort, if at all. +This is a violation of the TCP +protocol, and conflicts with other areas of TCP such as TCP +extensions. +It can cause problems for clients and relays. +It is not recommended as a tuning mechanism for heavily +loaded servers to help with overloaded or misconfigured +conditions. 
+For recommended alternatives, see
+.IR tcp_max_syn_backlog ,
+.IR tcp_synack_retries ,
+and
+.IR tcp_abort_on_overflow .
+.TP
+.BR tcp_syn_retries " (integer; default: 5)"
+The maximum number of times initial SYNs for an active TCP
+connection attempt will be retransmitted.
+This value should
+not be higher than 255.
+The default value is 5, which
+corresponds to approximately 180 seconds.
+.TP
+.BR tcp_timestamps " (Boolean; default: enabled)"
+Enable RFC\ 1323 TCP timestamps.
+.TP
+.BR tcp_tw_recycle " (Boolean; default: disabled)"
+Enable fast recycling of TIME_WAIT sockets.
+Enabling this option is not
+recommended since this causes problems when working
+with NAT (Network Address Translation).
+.\"
+.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.BR tcp_tw_reuse " (Boolean; default: disabled)"
+Allow reuse of TIME_WAIT sockets for new connections when it is
+safe from a protocol viewpoint.
+It should not be changed without advice/request of technical
+experts.
+.TP
+.BR tcp_window_scaling " (Boolean; default: enabled)"
+Enable RFC\ 1323 TCP window scaling.
+This feature allows the use of a large window
+(> 64K) on a TCP connection, should the other end support it.
+Normally, the 16-bit window length field in the TCP header
+limits the window size to less than 64K bytes.
+If larger
+windows are desired, applications can increase the size of
+their socket buffers and the window scaling option will be
+employed.
+If
+.I tcp_window_scaling
+is disabled, TCP will not negotiate the use of window
+scaling with the other end during connection setup.
+.\"
+.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.BR tcp_vegas_cong_avoid " (Boolean; default: disabled)"
+Enable the TCP Vegas congestion avoidance algorithm.
+TCP Vegas is a sender-side only change to TCP that anticipates
+the onset of congestion by estimating the bandwidth.
+TCP Vegas
+adjusts the sending rate by modifying the congestion
+window.
+TCP Vegas should provide less packet loss, but it is
+not as aggressive as TCP Reno.
+.\"
+.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.BR tcp_westwood " (Boolean; default: disabled)"
+Enable the TCP Westwood+ congestion control algorithm.
+TCP Westwood+ is a sender-side only modification of the TCP Reno
+protocol stack that optimizes the performance of TCP congestion
+control.
+It is based on end-to-end bandwidth estimation to set the
+congestion window and slow start threshold after a congestion
+episode.
+Using this estimation, TCP Westwood+ adaptively sets a
+slow start threshold and a congestion window which take into
+account the bandwidth used at the time congestion is experienced.
+TCP Westwood+ significantly increases fairness with respect to
+TCP Reno in wired networks and throughput over wireless links.
+.TP
+.I tcp_wmem
+This is a vector of 3 integers: [min, default, max].
+These parameters are used by TCP to regulate send buffer sizes.
+TCP dynamically adjusts the size of the send buffer from the
+default values listed below, in the range of these sysctl
+variables, depending on memory available.

+.I min
+- minimum size of the send buffer used by each TCP socket.
+The default value is 4K bytes.
+This value is used to ensure that in memory pressure mode,
+allocations below this size will still succeed.
+This is not
+used to bound the size of the send buffer declared
+using
+.B SO_SNDBUF
+on a socket.

+.I default
+- the default size of the send buffer for a TCP socket.
+This value overrides the initial default buffer size from
+the generic global
+.I net.core.wmem_default
+defined for all protocols.
+The default value is 16K bytes.
+If larger send buffer sizes are desired, this value
+should be increased (to affect all sockets).
+To employ large TCP windows, the sysctl variable
+.I net.ipv4.tcp_window_scaling
+must be enabled (default).

+.I max
+- the maximum size of the send buffer used by
+each TCP socket.
+This value does not override the global
+.IR net.core.wmem_max .
+This is not used to limit the size of the send buffer
+declared using
+.B SO_SNDBUF
+on a socket.
+The default value is 128K bytes.
+It is lowered to 64K
+depending on the memory available in the system.
+.SS Socket Options
+To set or get a TCP socket option, call
+.BR getsockopt (2)
+to read or
+.BR setsockopt (2)
+to write the option with the option level argument set to
+.BR IPPROTO_TCP .
+.\" or SOL_TCP on Linux
+In addition,
+most
+.B IPPROTO_IP
+socket options are valid on TCP sockets.
+For more information, see
+.BR ip (7).
+.\" FIXME Document TCP_CONGESTION (new in 2.6.13)
+.TP
+.B TCP_CORK
+If set, don't send out partial frames.
+All queued
+partial frames are sent when the option is cleared again.
+This is useful for prepending headers before calling
+.BR sendfile (2),
+or for throughput optimization.
+As currently implemented, there is a 200-millisecond ceiling on the time
+for which output is corked by
+.BR TCP_CORK .
+If this ceiling is reached, then queued data is automatically transmitted.
+This option can be
+combined with
+.B TCP_NODELAY
+only since Linux 2.5.71.
+This option should not be used in code intended to be
+portable.
+.TP
+.B TCP_DEFER_ACCEPT
+Allows a listener to be awakened only when data arrives on
+the socket.
+Takes an integer value (seconds); this can
+bound the maximum number of attempts TCP will make to
+complete the connection.
+This option should not be used in
+code intended to be portable.
+.TP
+.B TCP_INFO
+Used to collect information about this socket.
+The kernel returns a \fIstruct tcp_info\fP as defined in the file
+.IR /usr/include/linux/tcp.h .
+This option should not be used in code intended to be portable.
+.TP
+.B TCP_KEEPCNT
+The maximum number of keepalive probes TCP should send
+before dropping the connection.
+This option should not be
+used in code intended to be portable.
+.TP
+.B TCP_KEEPIDLE
+The time (in seconds) the connection needs to remain idle
+before TCP starts sending keepalive probes, if the socket
+option
+.B SO_KEEPALIVE
+has been set on this socket.
+This option should not be used in code intended to be portable.
+.TP
+.B TCP_KEEPINTVL
+The time (in seconds) between individual keepalive probes.
+This option should not be used in code intended to be
+portable.
+.TP
+.B TCP_LINGER2
+The lifetime of orphaned FIN_WAIT2 state sockets.
+This option can be used to override the system-wide sysctl
+.I tcp_fin_timeout
+on this socket.
+This is not to be confused with the
+.BR socket (7)
+level option
+.BR SO_LINGER .
+This option should not be used in code intended to be
+portable.
+.TP
+.B TCP_MAXSEG
+The maximum segment size for outgoing TCP packets.
+If this option is set before connection establishment, it also
+changes the MSS value announced to the other end in the
+initial packet.
+Values greater than the (eventual) interface MTU have no effect.
+TCP will also impose
+its minimum and maximum bounds over the value provided.
+.TP
+.B TCP_NODELAY
+If set, disable the Nagle algorithm.
+This means that segments
+are always sent as soon as possible, even if there is only a
+small amount of data.
+When not set, data is buffered until there
+is a sufficient amount to send out, thereby avoiding the
+frequent sending of small packets, which results in poor
+utilization of the network.
+This option is overridden by
+.BR TCP_CORK ;
+however, setting this option forces an explicit flush of
+pending output, even if
+.B TCP_CORK
+is currently set.
+.TP
+.B TCP_QUICKACK
+Enable quickack mode if set or disable quickack
+mode if cleared.
+In quickack mode, acks are sent
+immediately, rather than delayed if needed in accordance
+with normal TCP operation.
+This flag is not permanent;
+it only enables a switch to or from quickack mode.
+Subsequent operation of the TCP protocol will
+once again enter/leave quickack mode depending on
+internal protocol processing and factors such as
+delayed ack timeouts occurring and data transfer.
+This option should not be used in code intended to be
+portable.
+.TP
+.B TCP_SYNCNT
+Set the number of SYN retransmits that TCP should send before
+aborting the attempt to connect.
+It cannot exceed 255.
+This option should not be used in code intended to be
+portable.
+.TP
+.B TCP_WINDOW_CLAMP
+Bound the size of the advertised window to this value.
+The kernel imposes a minimum size of SOCK_MIN_RCVBUF/2.
+This option should not be used in code intended to be
+portable.
+.SS Ioctls
+The following
+.BR ioctl (2)
+calls return information in
+.IR value .
+The correct syntax is:
+.PP
+.RS
+.nf
+.BI int " value";
+.IB error " = ioctl(" tcp_socket ", " ioctl_type ", &" value ");"
+.fi
+.RE
+.PP
+.I ioctl_type
+is one of the following:
+.TP
+.B SIOCINQ
+Returns the amount of queued unread data in the receive buffer.
+The socket must not be in LISTEN state; otherwise, an error
+.RB ( EINVAL )
+is returned.
+.TP
+.B SIOCATMARK
+Returns true (i.e.,
+.I value
+is non-zero) if the inbound data stream is at the urgent mark.

+If the
+.B SO_OOBINLINE
+socket option is set, and
+.B SIOCATMARK
+returns true, then the
+next read from the socket will return the urgent data.
+If the
+.B SO_OOBINLINE
+socket option is not set, and
+.B SIOCATMARK
+returns true, then the
+next read from the socket will return the bytes following
+the urgent data (to actually read the urgent data requires the
+.B recv(MSG_OOB)
+flag).

+Note that a read never reads across the urgent mark.
+If an application is informed of the presence of urgent data via
+.BR select (2)
+(using the
+.I exceptfds
+argument) or through delivery of a
+.B SIGURG
+signal,
+then it can advance up to the mark using a loop which repeatedly tests
+.B SIOCATMARK
+and performs a read (requesting any number of bytes) as long as
+.B SIOCATMARK
+returns false.
+.TP
+.B SIOCOUTQ
+Returns the amount of unsent data in the socket send queue.
+The socket must not be in LISTEN state; otherwise, an error
+.RB ( EINVAL )
+is returned.
+.SS Error Handling
+When a network error occurs, TCP tries to resend the packet.
+If it doesn't succeed after some time, either
+.B ETIMEDOUT
+or the last received error on this connection is reported.
+.PP
+Some applications require a quicker error notification.
+This can be enabled with the
+.B IPPROTO_IP
+level
+.B IP_RECVERR
+socket option.
+When this option is enabled, all incoming
+errors are immediately passed to the user program.
+Use this
+option with care \(em it makes TCP less tolerant of routing
+changes and other normal network conditions.
+.SH ERRORS
+.TP
+.B EAFNOSUPPORT
+Passed socket address type in
+.I sin_family
+was not
+.BR AF_INET .
+.TP
+.B EPIPE
+The other end closed the socket unexpectedly or a read is
+executed on a shut down socket.
+.TP
+.B ETIMEDOUT
+The other end didn't acknowledge retransmitted data after
+some time.
+.PP
+Any errors defined for
+.BR ip (7)
+or the generic socket layer may also be returned for TCP.
+.SH VERSIONS
+Support for Explicit Congestion Notification, zero-copy
+.BR sendfile (2),
+reordering support and some SACK extensions
+(DSACK) were introduced in 2.4.
+Support for forward acknowledgement (FACK), TIME_WAIT recycling,
+per-connection keepalive socket options and sysctls
+were introduced in 2.3.

+The default values and descriptions for the sysctl variables
+given above are applicable for the 2.4 kernel.
+.SH NOTES
+TCP has no real out-of-band data; it has urgent data.
+In Linux, this means that if the other end sends newer out-of-band
+data, the older urgent data is inserted as normal data into
+the stream (even when
+.B SO_OOBINLINE
+is not set).
+This differs from BSD-based stacks.
+.PP
+Linux uses the BSD-compatible interpretation of the urgent
+pointer field by default.
+This violates RFC\ 1122, but is
+required for interoperability with other stacks.
+It can be changed by the
+.I tcp_stdurg
+sysctl.
+.SH BUGS
+Not all errors are documented.
+.br
+IPv6 is not described.
+.\" Only a single Linux kernel version is described
+.\" Info for 2.2 was lost. Should be added again,
+.\" or put into a separate page.
+.\" .SH AUTHORS
+.\" This man page was originally written by Andi Kleen.
+.\" It was updated for 2.4 by Nivedita Singhvi with input from
+.\" Alexey Kuznetsov's Documentation/networking/ip-sysctl.txt
+.\" document.
+.SH "SEE ALSO"
+.BR accept (2),
+.BR bind (2),
+.BR connect (2),
+.BR getsockopt (2),
+.BR listen (2),
+.BR recvmsg (2),
+.BR sendfile (2),
+.BR sendmsg (2),
+.BR socket (2),
+.BR sysctl (2),
+.BR ip (7),
+.BR socket (7)
+.sp
+RFC\ 793 for the TCP specification.
+.br
+RFC\ 1122 for the TCP requirements and a description
+of the Nagle algorithm.
+.br
+RFC\ 1323 for TCP timestamp and window scaling options.
+.br
+RFC\ 1337 for a description of TIME_WAIT assassination
+hazards.
+.br
+RFC\ 3168 for a description of Explicit Congestion
+Notification.
+.br
+RFC\ 2581 for TCP congestion control algorithms.
+.br
+RFC\ 2018 and RFC\ 2883 for SACK and extensions to SACK.
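The TCP socket options and ioctls described in tcp.7 above can be tried out with the C sketch below. It assumes Linux with glibc and the kernel UAPI header <linux/sockios.h>, which defines SIOCINQ, SIOCOUTQ, and SIOCATMARK. All function names are illustrative, and tcp_loopback_pair() is a hypothetical helper that only exists to provide a connected socket over 127.0.0.1. skip_to_urgent_mark() follows the loop described under SIOCATMARK, after wait_for_urgent() has been notified through the exceptfds argument of select(2).

```c
/* Sketch of the tcp(7) socket options and ioctls described above.
 * Assumes Linux with glibc and the kernel UAPI headers installed. */
#include <netinet/in.h>
#include <netinet/tcp.h>     /* TCP_NODELAY */
#include <sys/ioctl.h>
#include <linux/sockios.h>   /* SIOCINQ, SIOCOUTQ, SIOCATMARK */
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Disable the Nagle algorithm at level IPPROTO_TCP and read it back.
 * Returns the value reported by getsockopt(2), or -1 on error. */
static int set_nodelay(int fd)
{
    int on = 1, val = 0;
    socklen_t len = sizeof(val);

    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) == -1 ||
        getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) == -1)
        return -1;
    return val;
}

/* ioctl(2) with SIOCINQ (unread bytes) or SIOCOUTQ (unsent bytes).
 * Returns -1 on error, e.g. EINVAL on a listening socket. */
static int queued_bytes(int fd, unsigned long request)
{
    int value;

    return ioctl(fd, request, &value) == -1 ? -1 : value;
}

/* Wait until select(2) reports urgent data through exceptfds.
 * Returns 1 when urgent data is pending, 0 on timeout, -1 on error. */
static int wait_for_urgent(int fd, int seconds)
{
    fd_set exceptfds;
    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };

    FD_ZERO(&exceptfds);
    FD_SET(fd, &exceptfds);
    return select(fd + 1, NULL, NULL, &exceptfds, &tv);
}

/* The loop described under SIOCATMARK: read (and discard) ordinary
 * data until the urgent mark is reached. Returns the number of bytes
 * skipped, or -1 on error or if the peer closes the connection first. */
static long skip_to_urgent_mark(int fd)
{
    char buf[512];
    long total = 0;

    for (;;) {
        int atmark;
        ssize_t n;

        if (ioctl(fd, SIOCATMARK, &atmark) == -1)
            return -1;
        if (atmark)
            return total;
        n = read(fd, buf, sizeof(buf));   /* never reads across the mark */
        if (n <= 0)
            return -1;
        total += n;
    }
}

/* Hypothetical helper for trying the above: a connected TCP pair on
 * 127.0.0.1, with sv[0] the client and sv[1] the accepted socket. */
static int tcp_loopback_pair(int sv[2])
{
    struct sockaddr_in addr = { .sin_family = AF_INET };   /* port 0 */
    socklen_t len = sizeof(addr);
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int ok;

    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    ok = lfd != -1 &&
         bind(lfd, (struct sockaddr *) &addr, sizeof(addr)) == 0 &&
         listen(lfd, 1) == 0 &&
         getsockname(lfd, (struct sockaddr *) &addr, &len) == 0 &&
         (sv[0] = socket(AF_INET, SOCK_STREAM, 0)) != -1 &&
         connect(sv[0], (struct sockaddr *) &addr, sizeof(addr)) == 0 &&
         (sv[1] = accept(lfd, NULL, NULL)) != -1;
    if (lfd != -1)
        close(lfd);
    return ok ? 0 : -1;
}
```

A receiver would typically call wait_for_urgent(), then skip_to_urgent_mark(), and finally fetch the urgent byte with recv(2) and MSG_OOB; that last step assumes SO_OOBINLINE is not set, as described under SIOCATMARK.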
diff --git a/man7/udp.7 b/man7/udp.7 index a3d4e989a..243d31346 100644 --- a/man7/udp.7 +++ b/man7/udp.7 @@ -1,8 +1,193 @@ -.TH UDP 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH UDP 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH UDP 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH UDP 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH UDP 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH UDP 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH UDP 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH UDP 7 2008-08-07 "Linux" "Linux Programmer's Manual" +.\" This man page is Copyright (C) 1999 Andi Kleen . +.\" Permission is granted to distribute possibly modified copies +.\" of this page provided the header is included verbatim, +.\" and in case of nontrivial modification author and date +.\" of the modification is added to the header. +.\" $Id: udp.7,v 1.7 2000/01/22 01:55:05 freitag Exp $ +.\" +.TH UDP 7 1998-10-02 "Linux" "Linux Programmer's Manual" +.SH NAME +udp \- User Datagram Protocol for IPv4 +.SH SYNOPSIS +.B #include +.br +.B #include +.sp +.B udp_socket = socket(PF_INET, SOCK_DGRAM, 0); +.SH DESCRIPTION +This is an implementation of the User Datagram Protocol +described in RFC\ 768. +It implements a connectionless, unreliable datagram packet service. +Packets may be reordered or duplicated before they arrive. +UDP generates and checks checksums to catch transmission errors. + +When a UDP socket is created, +its local and remote addresses are unspecified. +Datagrams can be sent immediately using +.BR sendto (2) +or +.BR sendmsg (2) +with a valid destination address as an argument. +When +.BR connect (2) +is called on the socket the default destination address is set and +datagrams can now be sent using +.BR send (2) +or +.BR write (2) +without specifying a destination address. +It is still possible to send to other destinations by passing an +address to +.BR sendto (2) +or +.BR sendmsg (2). 
+In order to receive packets, the socket can be bound to a local
+address first by using
+.BR bind (2).
+Otherwise, the socket layer will automatically assign
+a free local port out of the range defined by
+.I net.ipv4.ip_local_port_range
+and bind the socket to
+.BR INADDR_ANY .

+All receive operations return only one packet.
+When the packet is smaller than the passed buffer, only that much
+data is returned; when it is bigger, the packet is truncated and the
+.B MSG_TRUNC
+flag is set.
+.B MSG_WAITALL
+is not supported.

+IP options may be sent or received using the socket options described in
+.BR ip (7).
+They are only processed by the kernel when the appropriate sysctl
+is enabled (but still passed to the user even when it is turned off).
+See
+.BR ip (7).

+When the
+.B MSG_DONTROUTE
+flag is set on sending, the destination address must refer to a local
+interface address and the packet is only sent to that interface.

+By default, Linux UDP does path MTU (Maximum Transmission Unit) discovery.
+This means the kernel
+will keep track of the MTU to a specific target IP address and return
+.B EMSGSIZE
+when a UDP packet write exceeds it.
+When this happens, the application should decrease the packet size.
+Path MTU discovery can also be turned off using the
+.B IP_MTU_DISCOVER
+socket option or the
+.I ip_no_pmtu_disc
+sysctl; see
+.BR ip (7)
+for details.
+When turned off, UDP will fragment outgoing UDP packets
+that exceed the interface MTU.
+However, disabling it is not recommended
+for performance and reliability reasons.
+.SS "Address Format"
+UDP uses the IPv4
+.I sockaddr_in
+address format described in
+.BR ip (7).
+.SS "Error Handling"
+All fatal errors will be passed to the user as an error return even
+when the socket is not connected.
+This includes asynchronous errors
+received from the network.
+You may get an error for an earlier packet
+that was sent on the same socket.
+This behavior differs from many other BSD socket implementations,
+which don't pass any errors unless the socket is connected.
+Linux's behavior is mandated by
+RFC\ 1122.

+For compatibility with legacy code in Linux 2.0 and 2.2,
+it was possible to set the
+.B SO_BSDCOMPAT
+.B SOL_SOCKET
+option to receive remote errors only when the socket has been
+connected (except for
+.B EPROTO
+and
+.BR EMSGSIZE ).
+Locally generated errors are always passed.
+Support for this socket option was removed in later kernels; see
+.BR socket (7)
+for further information.

+When the
+.B IP_RECVERR
+option is enabled, all errors are stored in the socket error queue
+and can be received by
+.BR recvmsg (2)
+with the
+.B MSG_ERRQUEUE
+flag set.
+.SS "Socket Options"
+To set or get a UDP socket option, call
+.BR getsockopt (2)
+to read or
+.BR setsockopt (2)
+to write the option with the option level argument set to
+.BR IPPROTO_UDP .
+.TP
+.BR UDP_CORK " (since Linux 2.5.44)"
+If this option is enabled, then all data output on this socket
+is accumulated into a single datagram that is transmitted when
+the option is disabled.
+This option should not be used in code intended to be
+portable.
+.\" FIXME document UDP_ENCAP (new in kernel 2.5.67)
+.SS Ioctls
+These ioctls can be accessed using
+.BR ioctl (2).
+The correct syntax is:
+.PP
+.RS
+.nf
+.BI int " value";
+.IB error " = ioctl(" udp_socket ", " ioctl_type ", &" value ");"
+.fi
+.RE
+.TP
+.BR FIONREAD " (" SIOCINQ )
+Takes a pointer to an integer as its argument.
+Returns the size of the next pending datagram in the integer in bytes,
+or 0 when no datagram is pending.
+.TP
+.BR TIOCOUTQ " (" SIOCOUTQ )
+Returns the number of data bytes in the local send queue.
+Only supported with Linux 2.4 and above.
+.PP
+In addition, all ioctls documented in
+.BR ip (7)
+and
+.BR socket (7)
+are supported.
+.SH ERRORS
+All errors documented for
+.BR socket (7)
+or
+.BR ip (7)
+may be returned by a send or receive on a UDP socket.
+ +.B ECONNREFUSED +No receiver was associated with the destination address. +This might be caused by a previous packet sent over the socket. +.SH VERSIONS +IP_RECVERR is a new feature in Linux 2.2. +.\" .SH CREDITS +.\" This man page was written by Andi Kleen. +.SH "SEE ALSO" +.BR ip (7), +.BR raw (7), +.BR socket (7) + +RFC\ 768 for the User Datagram Protocol. +.br +RFC\ 1122 for the host requirements. +.br +RFC\ 1191 for a description of path MTU discovery. diff --git a/man7/unix.7 b/man7/unix.7 index 631a99c5a..3f66b3742 100644 --- a/man7/unix.7 +++ b/man7/unix.7 @@ -1,8 +1,359 @@ -.TH UNIX 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH UNIX 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH UNIX 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH UNIX 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH UNIX 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH UNIX 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH UNIX 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH UNIX 7 2008-08-07 "Linux" "Linux Programmer's Manual" +.\" This man page is Copyright (C) 1999 Andi Kleen . +.\" Permission is granted to distribute possibly modified copies +.\" of this page provided the header is included verbatim, +.\" and in case of nontrivial modification author and date +.\" of the modification is added to the header. +.\" +.\" Modified, 2003-12-02, Michael Kerrisk, +.\" Modified, 2003-09-23, Adam Langley +.\" Modified, 2004-05-27, Michael Kerrisk, +.\" Added SOCK_SEQPACKET +.\" 2008-05-27, mtk, Provide a clear description of the three types of +.\" address that can appear in the sockaddr_un structure: pathname, +.\" unnamed, and abstract. 
+.\" +.TH UNIX 7 2008-06-17 "Linux" "Linux Programmer's Manual" +.SH NAME +unix, PF_UNIX, AF_UNIX, PF_LOCAL, AF_LOCAL \- Sockets for local +interprocess communication +.SH SYNOPSIS +.B #include +.br +.B #include + +.IB unix_socket " = socket(PF_UNIX, type, 0);" +.br +.IB error " = socketpair(PF_UNIX, type, 0, int *" sv ");" +.SH DESCRIPTION +The +.B PF_UNIX +(also known as +.BR PF_LOCAL ) +socket family is used to communicate between processes on the same machine +efficiently. +Traditionally, Unix sockets can be either unnamed, +or bound to a file system pathname (marked as being of type socket). +Linux also supports an abstract namespace which is independent of the +file system. + +Valid types are: +.BR SOCK_STREAM , +for a stream-oriented socket and +.BR SOCK_DGRAM , +for a datagram-oriented socket that preserves message boundaries +(as on most Unix implementations, Unix domain datagram +sockets are always reliable and don't reorder datagrams); +and (since Linux 2.6.4) +.BR SOCK_SEQPACKET , +for a connection-oriented socket that preserves message boundaries +and delivers messages in the order that they were sent. + +Unix sockets support passing file descriptors or process credentials +to other processes using ancillary data. +.SS Address Format +A Unix domain socket address is represented in the following structure: +.in +4n +.nf + +#define UNIX_PATH_MAX 108 + +struct sockaddr_un { + sa_family_t sun_family; /* AF_UNIX */ + char sun_path[UNIX_PATH_MAX]; /* pathname */ +}; +.fi +.in +.PP +.I sun_family +always contains +.BR AF_UNIX . + +Three types of address are distinguished in this structure: +.IP * 3 +.IR pathname : +a Unix domain socket can be bound to a null-terminated file +system pathname using +.BR bind (2). +When the address of the socket is returned by +.BR getsockname (2), +.BR getpeername (2), +and +.BR accept (2), +its length is +.IR "sizeof(sa_family_t) + strlen(sun_path) + 1" , +and +.I sun_path +contains the null-terminated pathname. 
+.IP *
+.IR unnamed :
+A stream socket that has not been bound to a pathname using
+.BR bind (2)
+has no name.
+Likewise, the two sockets created by
+.BR socketpair (2)
+are unnamed.
+When the address of an unnamed socket is returned by
+.BR getsockname (2),
+.BR getpeername (2),
+and
+.BR accept (2),
+its length is
+.IR "sizeof(sa_family_t)" ,
+and
+.I sun_path
+should not be inspected.
+.\" There is quite some variation across implementations: FreeBSD
+.\" says the length is 16 bytes, HP-UX 11 says it's zero bytes.
+.IP *
+.IR abstract :
+an abstract socket address is distinguished by the fact that
+.IR sun_path[0]
+is a null byte ('\\0').
+All of the remaining bytes in
+.I sun_path
+define the "name" of the socket.
+(Null bytes in the name have no special significance.)
+The name has no connection with file system pathnames.
+The socket's address in this namespace is given by the rest of the
+bytes in
+.IR sun_path .
+When the address of an abstract socket is returned by
+.BR getsockname (2),
+.BR getpeername (2),
+and
+.BR accept (2),
+its length is
+.IR "sizeof(struct sockaddr_un)" ,
+and
+.I sun_path
+contains the abstract name.
+The abstract socket namespace is a non-portable Linux extension.
+.SS Socket Options
+For historical reasons, these socket options are specified with a
+.B SOL_SOCKET
+type even though they are
+.B PF_UNIX
+specific.
+They can be set with
+.BR setsockopt (2)
+and read with
+.BR getsockopt (2)
+by specifying
+.B SOL_SOCKET
+as the level argument.
+.TP
+.B SO_PASSCRED
+Enables the receiving of the credentials of the sending process
+in an ancillary message.
+When this option is set and the socket is not yet connected,
+a unique name in the abstract namespace will be generated automatically.
+Expects an integer boolean flag.
+.SS (Un)supported Features
+The following paragraphs describe domain-specific details and
+unsupported features of the sockets API for Unix domain sockets on Linux.
+ +Unix domain sockets do not support the transmission of +out-of-band data (the +.B MSG_OOB +flag for +.BR send (2) +and +.BR recv (2)). + +The +.BR send (2) +.B MSG_MORE +flag is not supported by Unix domain sockets. + +The +.B SO_SNDBUF +socket option does have an effect for Unix domain sockets, but the +.B SO_RCVBUF +option does not. +For datagram sockets, the +.B SO_SNDBUF +value imposes an upper limit on the size of outgoing datagrams. +This limit is calculated as the doubled (see +.BR socket (7)) +option value less 32 bytes used for overhead. +.SS Ancillary Messages +Ancillary data is sent and received using +.BR sendmsg (2) +and +.BR recvmsg (2). +For historical reasons the ancillary message types listed below +are specified with a +.B SOL_SOCKET +type even though they are +.B PF_UNIX +specific. +To send them set the +.I cmsg_level +field of the struct +.I cmsghdr +to +.B SOL_SOCKET +and the +.I cmsg_type +field to the type. +For more information see +.BR cmsg (3). +.TP +.B SCM_RIGHTS +Send or receive a set of open file descriptors from another process. +The data portion contains an integer array of the file descriptors. +The passed file descriptors behave as though they have been created with +.BR dup (2). +.TP +.B SCM_CREDENTIALS +Send or receive Unix credentials. +This can be used for authentication. +The credentials are passed as a +.I struct ucred +ancillary message. + +.in +4n +.nf +struct ucred { + pid_t pid; /* process ID of the sending process */ + uid_t uid; /* user ID of the sending process */ + gid_t gid; /* group ID of the sending process */ +}; +.fi +.in + +The credentials which the sender specifies are checked by the kernel. +A process with effective user ID 0 is allowed to specify values that do +not match its own. 
+The sender must specify its own process ID (unless it has the capability
+.BR CAP_SYS_ADMIN ),
+its user ID, effective user ID, or saved set-user-ID (unless it has
+.BR CAP_SETUID ),
+and its group ID, effective group ID, or saved set-group-ID
+(unless it has
+.BR CAP_SETGID ).
+To receive a
+.I struct ucred
+message, the
+.B SO_PASSCRED
+option must be enabled on the socket.
+.SH ERRORS
+.TP
+.B EADDRINUSE
+Selected local address is already taken or file system socket
+object already exists.
+.TP
+.B ECONNREFUSED
+.BR connect (2)
+called with a socket object that isn't listening.
+This can happen when
+the remote socket does not exist or the filename is not a socket.
+.TP
+.B ECONNRESET
+Remote socket was unexpectedly closed.
+.TP
+.B EFAULT
+User memory address was not valid.
+.TP
+.B EINVAL
+Invalid argument passed.
+A common cause is not setting AF_UNIX
+in the
+.I sun_family
+field of passed addresses or the socket being in an
+invalid state for the applied operation.
+.TP
+.B EISCONN
+.BR connect (2)
+called on an already connected socket or a target address was
+specified on a connected socket.
+.TP
+.B ENOMEM
+Out of memory.
+.TP
+.B ENOTCONN
+Socket operation needs a target address, but the socket is not connected.
+.TP
+.B EOPNOTSUPP
+Stream operation called on non-stream oriented socket or tried to
+use the out-of-band data option.
+.TP
+.B EPERM
+The sender passed invalid credentials in the
+.IR "struct ucred" .
+.TP
+.B EPIPE
+Remote socket was closed on a stream socket.
+If enabled, a
+.B SIGPIPE
+is sent as well.
+This can be avoided by passing the
+.B MSG_NOSIGNAL
+flag to
+.BR send (2)
+or
+.BR sendmsg (2).
+.TP
+.B EPROTONOSUPPORT
+Passed protocol is not PF_UNIX.
+.TP
+.B EPROTOTYPE
+Remote socket does not match the local socket type
+.RB ( SOCK_DGRAM
+vs.
+.BR SOCK_STREAM ).
+.TP
+.B ESOCKTNOSUPPORT
+Unknown socket type.
+.PP
+Other errors can be generated by the generic socket layer or
+by the file system while generating a file system socket object.
+See the appropriate manual pages for more information.
+.SH VERSIONS
+.B SCM_CREDENTIALS
+and the abstract namespace were introduced with Linux 2.2 and should not
+be used in portable programs.
+(Some BSD-derived systems also support credential passing,
+but the implementation details differ.)
+.SH NOTES
+In the Linux implementation, sockets which are visible in the
+file system honor the permissions of the directory they are in.
+Their owner, group, and permissions can be changed.
+Creation of a new socket will fail if the process does not have write and
+search (execute) permission on the directory the socket is created in.
+Connecting to the socket object requires read/write permission.
+This behavior differs from many BSD-derived systems, which
+ignore permissions for Unix sockets.
+Portable programs should not rely on
+this feature for security.

+Binding to a socket with a filename creates a socket
+in the file system that must be deleted by the caller when it is no
+longer needed (using
+.BR unlink (2)).
+The usual Unix close-behind semantics apply; the socket can be unlinked
+at any time and will finally be removed from the file system when the last
+reference to it is closed.

+To pass file descriptors or credentials over a
+.BR SOCK_STREAM " socket,"
+you need
+to send or receive at least one byte of non-ancillary data in the same
+.BR sendmsg (2)
+or
+.BR recvmsg (2)
+call.

+Unix domain stream sockets do not support the notion of out-of-band data.
+.SH EXAMPLE
+See
+.BR bind (2).
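As a complement to the bind(2) reference in the EXAMPLE section, the following C sketch passes an open file descriptor over a connected Unix domain stream socket using an SCM_RIGHTS ancillary message with cmsg_level set to SOL_SOCKET. It also sends the single byte of ordinary data that the NOTES above require on a SOCK_STREAM socket. The names send_fd(), recv_fd(), and union fd_cmsg are illustrative assumptions; the CMSG_* macros are the standard ones documented in cmsg(3).

```c
/* Sketch: pass one descriptor with SCM_RIGHTS over a Unix domain socket. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer for one cmsghdr carrying a single int, suitably aligned. */
union fd_cmsg {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send fd plus the one byte of ordinary data that SOCK_STREAM needs.
 * Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';                 /* arbitrary payload byte */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(). The result refers to the same
 * open file, as if created with dup(2). Returns -1 on error. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1 || (msg.msg_flags & MSG_CTRUNC))
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

For instance, after socketpair(AF_UNIX, SOCK_STREAM, 0, sv), one process can call send_fd(sv[0], fd) and another recv_fd(sv[1]); writes through the received descriptor then reach the same open file as fd, in line with the dup(2)-like behavior described under SCM_RIGHTS.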
+.SH "SEE ALSO" +.BR recvmsg (2), +.BR sendmsg (2), +.BR socket (2), +.BR socketpair (2), +.BR cmsg (3), +.BR capabilities (7), +.BR credentials (7), +.BR socket (7) diff --git a/man7/x25.7 b/man7/x25.7 index f1c71ac9a..92a77e06b 100644 --- a/man7/x25.7 +++ b/man7/x25.7 @@ -1,8 +1,122 @@ -.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual" -.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual" +.\" This man page is Copyright (C) 1998 Heiner Eisen. +.\" Permission is granted to distribute possibly modified copies +.\" of this page provided the header is included verbatim, +.\" and in case of nontrivial modification author and date +.\" of the modification is added to the header. +.\" $Id: x25.7,v 1.4 1999/05/18 10:35:12 freitag Exp $ +.TH X25 7 1998-12-01 "Linux" "Linux Programmer's Manual" +.SH NAME +x25, PF_X25 \- ITU-T X.25 / ISO-8208 protocol interface. +.SH SYNOPSIS +.B #include +.br +.B #include +.sp +.B x25_socket = socket(PF_X25, SOCK_SEQPACKET, 0); +.SH DESCRIPTION +X25 sockets provide an interface to the X.25 packet layer protocol. +This allows applications to +communicate over a public X.25 data network as standardized by +International Telecommunication Union's recommendation X.25 +(X.25 DTE-DCE mode). +X25 sockets can also be used for communication +without an intermediate X.25 network (X.25 DTE-DTE mode) as described +in ISO-8208. +.PP +Message boundaries are preserved \(em a +.BR read (2) +from a socket will +retrieve the same chunk of data as output with the corresponding +.BR write (2) +to the peer socket. 
+When necessary, the kernel takes care +of segmenting and re-assembling long messages by means of +the X.25 M-bit. +There is no hard-coded upper limit for the +message size. +However, re-assembling of a long message might fail if +there is a temporary lack of system resources or when other constraints +(such as socket memory or buffer size limits) become effective. +If that +occurs, the X.25 connection will be reset. +.SS Socket Addresses +The +.B AF_X25 +socket address family uses the +.I struct sockaddr_x25 +for representing network addresses as defined in ITU-T +recommendation X.121. +.PP +.in +4n +.nf +struct sockaddr_x25 { + sa_family_t sx25_family; /* must be AF_X25 */ + x25_address sx25_addr; /* X.121 Address */ +}; +.fi +.in +.PP +.I sx25_addr +contains a char array +.I x25_addr[] +to be interpreted as a null-terminated string. +.I sx25_addr.x25_addr[] +consists of up to 15 (not counting the terminating 0) ASCII +characters forming the X.121 address. +Only the decimal digit characters from \(aq0\(aq to \(aq9\(aq are allowed. +.SS Socket Options +The following X.25-specific socket options can be set by using +.BR setsockopt (2) +and read with +.BR getsockopt (2) +with the +.I level +argument set to +.BR SOL_X25 . +.TP +.B X25_QBITINCL +Controls whether the X.25 Q-bit (Qualified Data Bit) is accessible by the +user. +It expects an integer argument. +If set to 0 (default), +the Q-bit is never set for outgoing packets and the Q-bit of incoming +packets is ignored. +If set to 1, an additional first byte is prepended +to each message read from or written to the socket. +For data read from +the socket, a 0 first byte indicates that the Q-bits of the corresponding +incoming data packets were not set. +A first byte with value 1 indicates +that the Q-bit of the corresponding incoming data packets was set. +If the first byte of the data written to the socket is 1 the Q-bit of the +corresponding outgoing data packets will be set. 
+If the first byte is 0 +the Q-bit will not be set. +.SH VERSIONS +The PF_X25 protocol family is a new feature of Linux 2.2. +.SH BUGS +Plenty, as the X.25 PLP implementation is +.BR CONFIG_EXPERIMENTAL . +.PP +This man page is incomplete. +.PP +There is no dedicated application programmer's header file yet; +you need to include the kernel header file +.IR . +.B CONFIG_EXPERIMENTAL +might also imply that future versions of the +interface are not binary compatible. +.PP +X.25 N-Reset events are not propagated to the user process yet. +Thus, +if a reset occurred, data might be lost without notice. +.SH "SEE ALSO" +.BR socket (2), +.BR socket (7) +.PP +Jonathan Simon Naylor: +\(lqThe Re-Analysis and Re-Implementation of X.25.\(rq +The URL is +.RS +.I ftp://ftp.pspt.fi/pub/ham/linux/ax25/x25doc.tgz +.RE
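The address rules given under Socket Addresses in x25.7 (up to 15 characters, not counting the terminating null byte, decimal digits only) can be checked before an address is copied into sx25_addr.x25_addr[]. The following C sketch is a hypothetical helper, not part of any X.25 API; it works on a plain 16-byte buffer and deliberately avoids the kernel X.25 header, so it makes no claim about that header's exact definitions.

```c
#include <stddef.h>
#include <string.h>

/* From the text: at most 15 characters, not counting the terminating
 * null byte, so a buffer shaped like x25_addr[] needs 16 bytes. */
#define X121_MAX_DIGITS 15

/* Hypothetical helper: validate an X.121 address (decimal digits only,
 * at most X121_MAX_DIGITS of them) and copy it, null-terminated, into
 * dst. Returns 0 on success, -1 if the address is rejected. */
static int x121_copy(char dst[X121_MAX_DIGITS + 1], const char *src)
{
    size_t len = strlen(src);

    if (len > X121_MAX_DIGITS)
        return -1;
    for (size_t i = 0; i < len; i++)
        if (src[i] < '0' || src[i] > '9')
            return -1;
    memcpy(dst, src, len + 1);   /* copies the terminating null byte too */
    return 0;
}
```

Rejecting a bad address in user space like this only mirrors the documented format; the kernel remains the authority on what it accepts.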