Reverting blunder in commit 4699

2008-08-08 16:41:48 +00:00 · 2008-08-08 16:41:48 +00:00 · 77117f4fc5
parent 10874173db
commit 77117f4fc5
17 changed files with 6614 additions and 127 deletions
--- a/23
+++ b/23
@@ -38,6 +38,29 @@ initrd.4
    mtk
        Fix mis-ordered (.SH) sections.

+connect.2
+socket.2
+rtnetlink.3
+arp.7
+ddp.7
+ip.7
+ipv6.7
+netlink.7
+packet.7
+raw.7
+rtnetlink.7
+socket.7
+tcp.7
+udp.7
+unix.7
+x25.7
+    mtk
+        s/PF_/AF_/ for socket family constants.  Reasons: the AF_ and
+	PF_ constants have always had the same values; there never has
+	been a protocol family that had more than one address family,
+	and POSIX.1-2001 only specifies the AF_* constants.
+
+
 Typographical or grammatical errors have been corrected in several
 other places.

--- a/man2/connect.2
+++ b/man2/connect.2
@@ -1,8 +1,268 @@
-.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH CONNECT 2 2008-08-07 "Linux" "Linux Programmer's Manual"
+.\" Hey Emacs! This file is -*- nroff -*- source.
+.\"
+.\" Copyright 1993 Rickard E. Faith (faith@cs.unc.edu)
+.\" Portions extracted from /usr/include/sys/socket.h, which does not have
+.\" any authorship information in it.  It is probably available under the GPL.
+.\"
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date.  The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein.  The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\"
+.\"
+.\" Other portions are from the 6.9 (Berkeley) 3/10/91 man page:
+.\"
+.\" Copyright (c) 1983 The Regents of the University of California.
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\"    notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\"    notice, this list of conditions and the following disclaimer in the
+.\"    documentation and/or other materials provided with the distribution.
+.\" 3. All advertising materials mentioning features or use of this software
+.\"    must display the following acknowledgement:
+.\"     This product includes software developed by the University of
+.\"     California, Berkeley and its contributors.
+.\" 4. Neither the name of the University nor the names of its contributors
+.\"    may be used to endorse or promote products derived from this software
+.\"    without specific prior written permission.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" Modified 1997-01-31 by Eric S. Raymond <esr@thyrsus.com>
+.\" Modified 1998, 1999 by Andi Kleen
+.\" Modified 2004-06-23 by Michael Kerrisk <mtk.manpages@gmail.com>
+.\"
+.TH CONNECT 2 2007-12-28 "Linux" "Linux Programmer's Manual"
+.SH NAME
+connect \- initiate a connection on a socket
+.SH SYNOPSIS
+.nf
+.BR "#include <sys/types.h>" "          /* See NOTES */"
+.br
+.B #include <sys/socket.h>
+.sp
+.BI "int connect(int " sockfd ", const struct sockaddr *" serv_addr ,
+.BI "            socklen_t " addrlen );
+.fi
+.SH DESCRIPTION
+The
+.BR connect ()
+system call connects the socket referred to by the file descriptor
+.I sockfd
+to the address specified by
+.IR serv_addr .
+The
+.I addrlen
+argument specifies the size of
+.IR serv_addr .
+The format of the address in
+.I serv_addr
+is determined by the address space of the socket
+.IR sockfd ;
+see
+.BR socket (2)
+for further details.
+
+If the socket
+.I sockfd
+is of type
+.B SOCK_DGRAM
+then
+.I serv_addr
+is the address to which datagrams are sent by default, and the only
+address from which datagrams are received.
+If the socket is of type
+.B SOCK_STREAM
+or
+.BR SOCK_SEQPACKET ,
+this call attempts to make a connection to the socket that is bound
+to the address specified by
+.IR serv_addr .
+.PP
+Generally, connection-based protocol sockets may successfully
+.BR connect ()
+only once; connectionless protocol sockets may use
+.BR connect ()
+multiple times to change their association.
+Connectionless sockets may
+dissolve the association by connecting to an address with the
+.I sa_family
+member of
+.I sockaddr
+set to
+.BR AF_UNSPEC
+(supported on Linux since kernel 2.2).
+.SH "RETURN VALUE"
+If the connection or binding succeeds, zero is returned.
+On error, \-1 is returned, and
+.I errno
+is set appropriately.
+.SH ERRORS
+The following are general socket errors only.
+There may be other domain-specific error codes.
+.TP
+.B EACCES
+For Unix domain sockets, which are identified by pathname:
+Write permission is denied on the socket file,
+or search permission is denied for one of the directories
+in the path prefix.
+(See also
+.BR path_resolution (7).)
+.TP
+.BR EACCES ", " EPERM
+The user tried to connect to a broadcast address without having the socket
+broadcast flag enabled or the connection request failed because of a local
+firewall rule.
+.TP
+.B EADDRINUSE
+Local address is already in use.
+.TP
+.B EAFNOSUPPORT
+The passed address didn't have the correct address family in its
+.I sa_family
+field.
+.TP
+.B EAGAIN
+No more free local ports or insufficient entries in the routing cache.
+For
+.B PF_INET
+see the
+.I net.ipv4.ip_local_port_range
+sysctl in
+.BR ip (7)
+on how to increase the number of local ports.
+.TP
+.B EALREADY
+The socket is non-blocking and a previous connection attempt has not yet
+been completed.
+.TP
+.B EBADF
+The file descriptor is not a valid index in the descriptor table.
+.TP
+.B ECONNREFUSED
+No-one listening on the remote address.
+.TP
+.B EFAULT
+The socket structure address is outside the user's address space.
+.TP
+.B EINPROGRESS
+The socket is non-blocking and the connection cannot be completed
+immediately.
+It is possible to
+.BR select (2)
+or
+.BR poll (2)
+for completion by selecting the socket for writing.
+After
+.BR select (2)
+indicates writability, use
+.BR getsockopt (2)
+to read the
+.B SO_ERROR
+option at level
+.B SOL_SOCKET
+to determine whether
+.BR connect ()
+completed successfully
+.RB ( SO_ERROR
+is zero) or unsuccessfully
+.RB ( SO_ERROR
+is one of the usual error codes listed here,
+explaining the reason for the failure).
+.TP
+.B EINTR
+The system call was interrupted by a signal that was caught; see
+.BR signal (7).
+.\" For TCP, the connection will complete asynchronously.
+.\" See http://lkml.org/lkml/2005/7/12/254
+.TP
+.B EISCONN
+The socket is already connected.
+.TP
+.B ENETUNREACH
+Network is unreachable.
+.TP
+.B ENOTSOCK
+The file descriptor is not associated with a socket.
+.TP
+.B ETIMEDOUT
+Timeout while attempting connection.
+The server may be too
+busy to accept new connections.
+Note that for IP sockets the timeout may
+be very long when syncookies are enabled on the server.
+.SH "CONFORMING TO"
+SVr4, 4.4BSD, (the
+.BR connect ()
+function first appeared in 4.2BSD), POSIX.1-2001.
+.\" SVr4 documents the additional
+.\" general error codes
+.\" .BR EADDRNOTAVAIL ,
+.\" .BR EINVAL ,
+.\" .BR EAFNOSUPPORT ,
+.\" .BR EALREADY ,
+.\" .BR EINTR ,
+.\" .BR EPROTOTYPE ,
+.\" and
+.\" .BR ENOSR .
+.\" It also
+.\" documents many additional error conditions not described here.
+.SH NOTES
+POSIX.1-2001 does not require the inclusion of
+.IR <sys/types.h> ,
+and this header file is not required on Linux.
+However, some historical (BSD) implementations required this header
+file, and portable applications are probably wise to include it.
+
+The third argument of
+.BR connect ()
+is in reality an
+.I int
+(and this is what 4.x BSD and libc4 and libc5 have).
+Some POSIX confusion resulted in the present
+.IR socklen_t ,
+also used by glibc.
+See also
+.BR accept (2).
+.SH EXAMPLE
+An example of the use of
+.BR connect ()
+is shown in
+.BR getaddrinfo (3).
+.SH "SEE ALSO"
+.BR accept (2),
+.BR bind (2),
+.BR getsockname (2),
+.BR listen (2),
+.BR socket (2),
+.BR path_resolution (7)
--- a/man2/socket.2
+++ b/man2/socket.2
@@ -1,8 +1,382 @@
-.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH SOCKET 2 2008-08-07 "Linux" "Linux Programmer's Manual"
+'\" t
+.\" Copyright (c) 1983, 1991 The Regents of the University of California.
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\"    notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\"    notice, this list of conditions and the following disclaimer in the
+.\"    documentation and/or other materials provided with the distribution.
+.\" 3. All advertising materials mentioning features or use of this software
+.\"    must display the following acknowledgement:
+.\"	This product includes software developed by the University of
+.\"	California, Berkeley and its contributors.
+.\" 4. Neither the name of the University nor the names of its contributors
+.\"    may be used to endorse or promote products derived from this software
+.\"    without specific prior written permission.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\"     $Id: socket.2,v 1.4 1999/05/13 11:33:42 freitag Exp $
+.\"
+.\" Modified 1993-07-24 by Rik Faith <faith@cs.unc.edu>
+.\" Modified 1996-10-22 by Eric S. Raymond <esr@thyrsus.com>
+.\" Modified 1998, 1999 by Andi Kleen <ak@muc.de>
+.\" Modified 2002-07-17 by Michael Kerrisk <mtk.manpages@gmail.com>
+.\" Modified 2004-06-17 by Michael Kerrisk <mtk.manpages@gmail.com>
+.\"
+.TH SOCKET 2 2004-06-17 "Linux" "Linux Programmer's Manual"
+.SH NAME
+socket \- create an endpoint for communication
+.SH SYNOPSIS
+.BR "#include <sys/types.h>" "          /* See NOTES */"
+.br
+.B #include <sys/socket.h>
+.sp
+.BI "int socket(int " domain ", int " type ", int " protocol );
+.SH DESCRIPTION
+.BR socket ()
+creates an endpoint for communication and returns a descriptor.
+.PP
+The
+.I domain
+argument specifies a communication domain; this selects the protocol
+family which will be used for communication.
+These families are defined in
+.IR <sys/socket.h> .
+The currently understood formats include:
+.TS
+tab(:);
+l l l.
+Name:Purpose:Man page
+T{
+.BR PF_UNIX ", " PF_LOCAL
+T}:T{
+Local communication
+T}:T{
+.BR unix (7)
+T}
+T{
+.B PF_INET
+T}:IPv4 Internet protocols:T{
+.BR ip (7)
+T}
+T{
+.B PF_INET6
+T}:IPv6 Internet protocols:T{
+.BR ipv6 (7)
+T}
+T{
+.B PF_IPX
+T}:IPX \- Novell protocols:
+T{
+.B PF_NETLINK
+T}:T{
+Kernel user interface device
+T}:T{
+.BR netlink (7)
+T}
+T{
+.B PF_X25
+T}:ITU-T X.25 / ISO-8208 protocol:T{
+.BR x25 (7)
+T}
+T{
+.B PF_AX25
+T}:T{
+Amateur radio AX.25 protocol
+T}:
+T{
+.B PF_ATMPVC
+T}:Access to raw ATM PVCs:
+T{
+.B PF_APPLETALK
+T}:Appletalk:T{
+.BR ddp (7)
+T}
+T{
+.B PF_PACKET
+T}:T{
+Low level packet interface
+T}:T{
+.BR packet (7)
+T}
+.TE
+.PP
+The socket has the indicated
+.IR type ,
+which specifies the communication semantics.
+Currently defined types
+are:
+.TP
+.B SOCK_STREAM
+Provides sequenced, reliable, two-way, connection-based byte streams.
+An out-of-band data transmission mechanism may be supported.
+.TP
+.B SOCK_DGRAM
+Supports datagrams (connectionless, unreliable messages of a fixed
+maximum length).
+.TP
+.B SOCK_SEQPACKET
+Provides a sequenced, reliable, two-way connection-based data
+transmission path for datagrams of fixed maximum length; a consumer is
+required to read an entire packet with each input system call.
+.TP
+.B SOCK_RAW
+Provides raw network protocol access.
+.TP
+.B SOCK_RDM
+Provides a reliable datagram layer that does not guarantee ordering.
+.TP
+.B SOCK_PACKET
+Obsolete and should not be used in new programs;
+see
+.BR packet (7).
+.PP
+Some socket types may not be implemented by all protocol families;
+for example,
+.B SOCK_SEQPACKET
+is not implemented for
+.BR AF_INET .
+.PP
+The
+.I protocol
+specifies a particular protocol to be used with the socket.
+Normally only a single protocol exists to support a particular
+socket type within a given protocol family, in which case
+.I protocol
+can be specified as 0.
+However, it is possible that many protocols may exist, in
+which case a particular protocol must be specified in this manner.
+The protocol number to use is specific to the \*(lqcommunication domain\*(rq
+in which communication is to take place; see
+.BR protocols (5).
+See
+.BR getprotoent (3)
+on how to map protocol name strings to protocol numbers.
+.PP
+Sockets of type
+.B SOCK_STREAM
+are full-duplex byte streams, similar to pipes.
+They do not preserve
+record boundaries.
+A stream socket must be in
+a
+.I connected
+state before any data may be sent or received on it.
+A connection to
+another socket is created with a
+.BR connect (2)
+call.
+Once connected, data may be transferred using
+.BR read (2)
+and
+.BR write (2)
+calls or some variant of the
+.BR send (2)
+and
+.BR recv (2)
+calls.
+When a session has been completed a
+.BR close (2)
+may be performed.
+Out-of-band data may also be transmitted as described in
+.BR send (2)
+and received as described in
+.BR recv (2).
+.PP
+The communications protocols which implement a
+.B SOCK_STREAM
+ensure that data is not lost or duplicated.
+If a piece of data for which
+the peer protocol has buffer space cannot be successfully transmitted
+within a reasonable length of time, then the connection is considered
+to be dead.
+When
+.B SO_KEEPALIVE
+is enabled on the socket the protocol checks in a protocol-specific
+manner if the other end is still alive.
+A
+.B SIGPIPE
+signal is raised if a process sends or receives
+on a broken stream; this causes naive processes,
+which do not handle the signal, to exit.
+.B SOCK_SEQPACKET
+sockets employ the same system calls as
+.B SOCK_STREAM
+sockets.
+The only difference is that
+.BR read (2)
+calls will return only the amount of data requested,
+and any data remaining in the arriving packet will be discarded.
+Also all message boundaries in incoming datagrams are preserved.
+.PP
+.B SOCK_DGRAM
+and
+.B SOCK_RAW
+sockets allow sending of datagrams to correspondents named in
+.BR sendto (2)
+calls.
+Datagrams are generally received with
+.BR recvfrom (2),
+which returns the next datagram along with the address of its sender.
+.PP
+.B SOCK_PACKET
+is an obsolete socket type to receive raw packets directly from the
+device driver.
+Use
+.BR packet (7)
+instead.
+.PP
+An
+.BR fcntl (2)
+.B F_SETOWN
+operation can be used to specify a process or process group to receive a
+.B SIGURG
+signal when the out-of-band data arrives or
+.B SIGPIPE
+signal when a
+.B SOCK_STREAM
+connection breaks unexpectedly.
+This operation may also be used to set the process or process group
+that receives the I/O and asynchronous notification of I/O events via
+.BR SIGIO .
+Using
+.B F_SETOWN
+is equivalent to an
+.BR ioctl (2)
+call with the
+.B FIOSETOWN
+or
+.B SIOCSPGRP
+argument.
+.PP
+When the network signals an error condition to the protocol module (e.g.,
+using an ICMP message for IP) the pending error flag is set for the socket.
+The next operation on this socket will return the error code of the pending
+error.
+For some protocols it is possible to enable a per-socket error queue
+to retrieve detailed information about the error; see
+.B IP_RECVERR
+in
+.BR ip (7).
+.PP
+The operation of sockets is controlled by socket level
+.IR options .
+These options are defined in
+.IR <sys/socket.h> .
+The functions
+.BR setsockopt (2)
+and
+.BR getsockopt (2)
+are used to set and get options, respectively.
+.SH "RETURN VALUE"
+On success, a file descriptor for the new socket is returned.
+On error, \-1 is returned, and
+.I errno
+is set appropriately.
+.SH ERRORS
+.TP
+.B EACCES
+Permission to create a socket of the specified type and/or protocol
+is denied.
+.TP
+.B EAFNOSUPPORT
+The implementation does not support the specified address family.
+.TP
+.B EINVAL
+Unknown protocol, or protocol family not available.
+.TP
+.B EMFILE
+Process file table overflow.
+.TP
+.B ENFILE
+The system limit on the total number of open files has been reached.
+.TP
+.BR ENOBUFS " or " ENOMEM
+Insufficient memory is available.
+The socket cannot be
+created until sufficient resources are freed.
+.TP
+.B EPROTONOSUPPORT
+The protocol type or the specified protocol is not
+supported within this domain.
+.PP
+Other errors may be generated by the underlying protocol modules.
+.SH "CONFORMING TO"
+4.4BSD, POSIX.1-2001.
+.BR socket ()
+appeared in 4.2BSD.
+It is generally portable to/from
+non-BSD systems supporting clones of the BSD socket layer (including
+System V variants).
+.SH NOTES
+POSIX.1-2001 does not require the inclusion of
+.IR <sys/types.h> ,
+and this header file is not required on Linux.
+However, some historical (BSD) implementations required this header
+file, and portable applications are probably wise to include it.
+
+The manifest constants used under 4.x BSD for protocol families
+are
+.BR PF_UNIX ,
+.BR PF_INET ,
+etc., while
+.B AF_UNIX
+etc. are used for address
+families.
+However, the BSD man page already promises: "The protocol
+family generally is the same as the address family", and subsequent
+standards use AF_* everywhere.
+.SH BUGS
+.B SOCK_UUCP
+is not implemented yet.
+.SH EXAMPLE
+An example of the use of
+.BR socket ()
+is shown in
+.BR getaddrinfo (3).
+.SH "SEE ALSO"
+.BR accept (2),
+.BR bind (2),
+.BR connect (2),
+.BR fcntl (2),
+.BR getpeername (2),
+.BR getsockname (2),
+.BR getsockopt (2),
+.BR ioctl (2),
+.BR listen (2),
+.BR read (2),
+.BR recv (2),
+.BR select (2),
+.BR send (2),
+.BR shutdown (2),
+.BR socketpair (2),
+.BR write (2),
+.BR getprotoent (3),
+.BR ip (7),
+.BR socket (7),
+.BR tcp (7),
+.BR udp (7),
+.BR unix (7)
+.PP
+\(lqAn Introductory 4.3BSD Interprocess Communication Tutorial\(rq
+is reprinted in
+.I UNIX Programmer's Supplementary Documents Volume 1.
+.PP
+\(lqBSD Interprocess Communication Tutorial\(rq
+is reprinted in
+.I UNIX Programmer's Supplementary Documents Volume 1.
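
The socket.2 text above explains the domain, type, and protocol arguments and notes that the PF_* and AF_* constants carry the same values. As a small illustration, the sketch below creates a socket with protocol 0 and asks the kernel which type it recorded, using the SO_TYPE socket option. The helper name socket_type_of() is an assumption made for this example and does not appear in the man page.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of the man page): create a socket in the
   given domain with the given type and the default protocol (0), then
   return the type reported by SO_TYPE, or -1 on any error. */
int socket_type_of(int domain, int type)
{
    int fd = socket(domain, type, 0);
    if (fd == -1)
        return -1;

    int got = -1;
    socklen_t len = sizeof(got);
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &got, &len) == -1)
        got = -1;

    close(fd);
    return got;
}
```

For example, socket_type_of(AF_UNIX, SOCK_DGRAM) should report SOCK_DGRAM, and passing PF_UNIX instead of AF_UNIX behaves identically on Linux because the two constants are defined to the same value.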
--- a/man3/rtnetlink.3
+++ b/man3/rtnetlink.3
@@ -1,8 +1,118 @@
-.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual"
-.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual"
-.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual"
-.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual"
-.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual"
-.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual"
-.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual"
-.TH RTNETLINK 3 2008-08-07 "GNU" "Linux Programmer's Manual"
+.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
+.\" Permission is granted to distribute possibly modified copies
+.\" of this page provided the header is included verbatim,
+.\" and in case of nontrivial modification author and date
+.\" of the modification is added to the header.
+.\" $Id: rtnetlink.3,v 1.2 1999/05/18 10:35:10 freitag Exp $
+.TH RTNETLINK 3 1999-05-14 "GNU" "Linux Programmer's Manual"
+.SH NAME
+rtnetlink \- macros to manipulate rtnetlink messages
+.SH SYNOPSIS
+.B #include <asm/types.h>
+.br
+.B #include <linux/netlink.h>
+.br
+.B #include <linux/rtnetlink.h>
+.br
+.B #include <sys/socket.h>
+
+.BI "rtnetlink_socket = socket(PF_NETLINK, int " socket_type \
+", NETLINK_ROUTE);"
+.sp
+.BI "int RTA_OK(struct rtattr *" rta ", int " rtabuflen );
+.sp
+.BI "void *RTA_DATA(struct rtattr *" rta );
+.sp
+.BI "unsigned int RTA_PAYLOAD(struct rtattr *" rta );
+.sp
+.BI "struct rtattr *RTA_NEXT(struct rtattr *" rta \
+", unsigned int " rtabuflen );
+.sp
+.BI "unsigned int RTA_LENGTH(unsigned int " length );
+.sp
+.BI "unsigned int RTA_SPACE(unsigned int " length );
+.SH DESCRIPTION
+All
+.BR rtnetlink (7)
+messages consist of a
+.BR netlink (7)
+message header and appended attributes.
+The attributes should be manipulated
+only by using the macros provided here.
+.PP
+.BI RTA_OK( rta ", " attrlen )
+returns true if
+.I rta
+points to a valid routing attribute;
+.I attrlen
+is the running length of the attribute buffer.
+If it is not true, you must assume there are no more attributes in the
+message, even if
+.I attrlen
+is non-zero.
+.PP
+.BI RTA_DATA( rta )
+returns a pointer to the start of this attribute's data.
+.PP
+.BI RTA_PAYLOAD( rta )
+returns the length of this attribute's data.
+.PP
+.BI RTA_NEXT( rta ", " attrlen )
+gets the next attribute after
+.IR rta .
+Calling this macro will update
+.IR attrlen .
+You should use
+.B RTA_OK
+to check the validity of the returned pointer.
+.PP
+.BI RTA_LENGTH( len )
+returns the length which is required for
+.I len
+bytes of data plus the header.
+.PP
+.BI RTA_SPACE( len )
+returns the amount of space which will be needed in a message with
+.I len
+bytes of data.
+.SH CONFORMING TO
+These macros are non-standard Linux extensions.
+.SH BUGS
+This manual page is incomplete.
+.SH EXAMPLE
+.\" FIXME ? would be better to use libnetlink in the EXAMPLE code here
+
+Creating a rtnetlink message to set the MTU of a device:
+.nf
+
+    struct {
+        struct nlmsghdr  nh;
+        struct ifinfomsg ifi;    /* "if" is a C keyword */
+        char             attrbuf[512];
+    } req;
+
+    struct rtattr *rta;
+    unsigned int mtu = 1000;
+
+    int rtnetlink_sk = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE);
+
+    memset(&req, 0, sizeof(req));
+    req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
+    req.nh.nlmsg_flags = NLM_F_REQUEST;
+    req.nh.nlmsg_type = RTM_NEWLINK;
+    req.ifi.ifi_family = AF_UNSPEC;
+    req.ifi.ifi_index = INTERFACE_INDEX;
+    req.ifi.ifi_change = 0xffffffff; /* ??? */
+    rta = (struct rtattr *)(((char *) &req) +
+                                  NLMSG_ALIGN(req.nh.nlmsg_len));
+    rta\->rta_type = IFLA_MTU;
+    rta\->rta_len = RTA_LENGTH(sizeof(mtu));
+    req.nh.nlmsg_len = NLMSG_ALIGN(req.nh.nlmsg_len) +
+                                  RTA_LENGTH(sizeof(mtu));
+    memcpy(RTA_DATA(rta), &mtu, sizeof(mtu));
+    send(rtnetlink_sk, &req, req.nh.nlmsg_len, 0);
+.fi
+.SH "SEE ALSO"
+.BR netlink (3),
+.BR netlink (7),
+.BR rtnetlink (7)
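
The rtnetlink.3 page above describes RTA_OK, RTA_NEXT, RTA_DATA, RTA_LENGTH and RTA_SPACE but only shows them being used to build a message. The sketch below, which is not taken from the man page, builds a small attribute buffer in memory and then walks it with RTA_OK and RTA_NEXT, without opening a netlink socket. The helper names put_attr() and count_attrs() are hypothetical, and the buffer is assumed to be suitably aligned for struct rtattr.

```c
#include <stddef.h>
#include <string.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: append one attribute at offset *used in buf.
   buf must be aligned for struct rtattr (4 bytes).
   Returns 0 on success, -1 if the attribute does not fit. */
int put_attr(char *buf, size_t bufsize, size_t *used,
             unsigned short type, const void *data, unsigned short datalen)
{
    if (*used + RTA_SPACE(datalen) > bufsize)
        return -1;

    struct rtattr *rta = (struct rtattr *) (buf + *used);
    rta->rta_type = type;
    rta->rta_len = RTA_LENGTH(datalen);   /* header plus payload */
    memcpy(RTA_DATA(rta), data, datalen);
    *used += RTA_SPACE(datalen);          /* length including padding */
    return 0;
}

/* Hypothetical helper: count the attributes in the first `used` bytes. */
int count_attrs(char *buf, size_t used)
{
    int n = 0;
    int remaining = (int) used;           /* RTA_NEXT updates this */

    for (struct rtattr *rta = (struct rtattr *) buf;
         RTA_OK(rta, remaining);
         rta = RTA_NEXT(rta, remaining))
        n++;
    return n;
}
```

Note that the loop re-checks RTA_OK after every RTA_NEXT, as the page recommends, and stops as soon as it fails even if some bytes remain.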
--- a/man7/arp.7
+++ b/man7/arp.7
@@ -1,8 +1,275 @@
-.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH ARP 7 2008-08-07 "Linux" "Linux Programmer's Manual"
+'\" t
+.\" This man page is Copyright (C) 1999 Matthew Wilcox <willy@bofh.ai>.
+.\" Permission is granted to distribute possibly modified copies
+.\" of this page provided the header is included verbatim,
+.\" and in case of nontrivial modification author and date
+.\" of the modification is added to the header.
+.\" Modified June 1999 Andi Kleen
+.\" $Id: arp.7,v 1.10 2000/04/27 19:31:38 ak Exp $
+.TH ARP 7 2007-07-27 "Linux" "Linux Programmer's Manual"
+.SH NAME
+arp \- Linux ARP kernel module.
+.SH DESCRIPTION
+This kernel protocol module implements the Address Resolution
+Protocol defined in RFC\ 826.
+It is used to convert between Layer2 hardware addresses
+and IPv4 protocol addresses on directly connected networks.
+The user normally doesn't interact directly with this module except to
+configure it;
+instead it provides a service for other protocols in the kernel.
+
+A user process can receive ARP packets by using
+.BR packet (7)
+sockets.
+There is also a mechanism for managing the ARP cache
+in user-space by using
+.BR netlink (7)
+sockets.
+The ARP table can also be controlled via
+.BR ioctl (2)
+on any
+.B PF_INET
+socket.
+
+The ARP module maintains a cache of mappings between hardware addresses
+and protocol addresses.
+The cache has a limited size so old and less
+frequently used entries are garbage-collected.
+Entries which are marked
+as permanent are never deleted by the garbage-collector.
+The cache can
+be directly manipulated by the use of ioctls and its behavior can be
+tuned by the sysctls defined below.
+
+When there is no positive feedback for an existing mapping after some
+time (see the sysctls below) a neighbor cache entry is considered stale.
+Positive feedback can be gotten from a higher layer; for example from
+a successful TCP ACK.
+Other protocols can signal forward progress
+using the
+.B MSG_CONFIRM
+flag to
+.BR sendmsg (2).
+When there is no forward progress ARP tries to reprobe.
+It first tries to ask a local arp daemon
+.B app_solicit
+times for an updated MAC address.
+If that fails and an old MAC address is known a unicast probe is sent
+.B ucast_solicit
+times.
+If that fails too it will broadcast a new ARP
+request to the network.
+Requests are only sent when there is data queued
+for sending.
+
+Linux will automatically add a non-permanent proxy arp entry when it
+receives a request for an address it forwards to and proxy arp is
+enabled on the receiving interface.
+When there is a reject route for the target no proxy arp entry is added.
+.SS Ioctls
+Three ioctls are available on all
+.B PF_INET
+sockets.
+They take a pointer to a
+.I struct arpreq
+as their argument.
+
+.in +4n
+.nf
+struct arpreq {
+    struct sockaddr arp_pa;      /* protocol address */
+    struct sockaddr arp_ha;      /* hardware address */
+    int             arp_flags;   /* flags */
+    struct sockaddr arp_netmask; /* netmask of protocol address */
+    char            arp_dev[16];
+};
+.fi
+.in
+
+.BR SIOCSARP ", " SIOCDARP " and " SIOCGARP
+respectively set, delete and get an ARP mapping.
+Setting and deleting ARP mappings are privileged operations and may
+only be performed by a process with the
+.B CAP_NET_ADMIN
+capability or an effective UID of 0.
+
+.I arp_pa
+must be an
+.B AF_INET
+socket and
+.I arp_ha
+must have the same type as the device which is specified in
+.IR arp_dev .
+.I arp_dev
+is a zero-terminated string which names a device.
+.RS
+.TS
+tab(:) allbox;
+c s
+l l.
+\fIarp_flags\fR
+flag:meaning
+ATF_COM:Lookup complete
+ATF_PERM:Permanent entry
+ATF_PUBL:Publish entry
+ATF_USETRAILERS:Trailers requested
+ATF_NETMASK:Use a netmask
+ATF_DONTPUB:Don't answer
+.TE
+.RE
+
+.PP
+If the
+.B ATF_NETMASK
+flag is set, then
+.I arp_netmask
+should be valid.
+Linux 2.2 does not support proxy network ARP entries, so this
+should be set to 0xffffffff, or 0 to remove an existing proxy arp entry.
+.B ATF_USETRAILERS
+is obsolete and should not be used.
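
The Ioctls text above says that SIOCGARP, SIOCSARP and SIOCDARP take a pointer to a struct arpreq whose arp_pa holds an AF_INET address and whose arp_dev names a device. The sketch below only shows how such a request might be filled in before the ioctl call. It is an illustration under stated assumptions, not code from the man page: the helper name fill_arpreq() is hypothetical, and struct arpreq is assumed to come from <net/if_arp.h> as on glibc systems.

```c
#include <string.h>
#include <arpa/inet.h>
#include <net/if_arp.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of the man page): prepare an arpreq for
   the dotted-quad IPv4 address `ipv4` on device `dev`.
   Returns 0 on success, -1 if the address or device name is unusable. */
int fill_arpreq(struct arpreq *req, const char *ipv4, const char *dev)
{
    struct sockaddr_in sin;

    /* arp_dev is a fixed-size, zero-terminated string. */
    if (strlen(dev) >= sizeof(req->arp_dev))
        return -1;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    if (inet_pton(AF_INET, ipv4, &sin.sin_addr) != 1)
        return -1;

    memset(req, 0, sizeof(*req));
    memcpy(&req->arp_pa, &sin, sizeof(sin));   /* protocol address */
    strcpy(req->arp_dev, dev);                 /* length checked above */
    return 0;
}
```

After a successful call, the request could be passed to ioctl(2) with SIOCGARP on any AF_INET socket; on success the kernel would fill in arp_ha and arp_flags. Setting or deleting entries additionally needs the privileges described above.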
+.SS Sysctls
+ARP supports a sysctl interface to configure parameters on a global
+or per-interface basis.
+The sysctls can be accessed by reading or writing the
+.I /proc/sys/net/ipv4/neigh/*/*
+files or with the
+.BR sysctl (2)
+interface.
+Each interface in the system has its own directory in
+/proc/sys/net/ipv4/neigh/.
+The setting in the "default" directory is used for all newly created
+devices.
+Unless otherwise specified time-related sysctls are specified
+in seconds.
+.TP
+.B anycast_delay
+The maximum number of jiffies to delay before replying to an
+IPv6 neighbor solicitation message.
+Anycast support is not yet implemented.
+Defaults to 1 second.
+.TP
+.B app_solicit
+The maximum number of probes to send to the user space ARP daemon via
+netlink before dropping back to multicast probes (see
+.IR mcast_solicit ).
+Defaults to 0.
+.TP
+.B base_reachable_time
+Once a neighbor has been found, the entry is considered to be valid
+for at least a random value between
+.IR base_reachable_time "/2 and 3*" base_reachable_time /2.
+An entry's validity will be extended if it receives positive feedback
+from higher level protocols.
+Defaults to 30 seconds.
+.TP
+.B delay_first_probe_time
+Delay before first probe after it has been decided that a neighbor
+is stale.
+Defaults to 5 seconds.
+.TP
+.B gc_interval
+How frequently the garbage collector for neighbor entries
+should attempt to run.
+Defaults to 30 seconds.
+.TP
+.B gc_stale_time
+Determines how often to check for stale neighbor entries.
+When a neighbor entry is considered stale it is resolved again before
+sending data to it.
+Defaults to 60 seconds.
+.TP
+.B gc_thresh1
+The minimum number of entries to keep in the ARP cache.
+The garbage collector will not run if there are fewer than
+this number of entries in the cache.
+Defaults to 128.
+.TP
+.B gc_thresh2
+The soft maximum number of entries to keep in the ARP cache.
+The garbage collector will allow the number of entries to exceed
+this for 5 seconds before collection will be performed.
+Defaults to 512.
+.TP
+.B gc_thresh3
+The hard maximum number of entries to keep in the ARP cache.
+The garbage collector will always run if there are more than
+this number of entries in the cache.
+Defaults to 1024.
+.TP
+.B locktime
+The minimum number of jiffies to keep an ARP entry in the cache.
+This prevents ARP cache thrashing if there is more than one potential
+mapping (generally due to network misconfiguration).
+Defaults to 1 second.
+.TP
+.B mcast_solicit
+The maximum number of attempts to resolve an address by
+multicast/broadcast before marking the entry as unreachable.
+Defaults to 3.
+.TP
+.B proxy_delay
+When an ARP request for a known proxy-ARP address is received, delay up to
+.I proxy_delay
+jiffies before replying.
+This is used to prevent network flooding in some cases.
+Defaults to 0.8 seconds.
+.TP
+.B proxy_qlen
+The maximum number of packets which may be queued to proxy-ARP addresses.
+Defaults to 64.
+.TP
+.B retrans_time
+The number of jiffies to delay before retransmitting a request.
+Defaults to 1 second.
+.TP
+.B ucast_solicit
+The maximum number of attempts to send unicast probes before asking
+the ARP daemon (see
+.IR app_solicit ).
+Defaults to 3.
+.TP
+.B unres_qlen
+The maximum number of packets which may be queued for each unresolved
+address by other network layers.
+Defaults to 3.
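The base_reachable_time rule in the list above (validity is a random value between base_reachable_time/2 and 3*base_reachable_time/2) can be made concrete with a small C sketch. The struct and function names here are hypothetical and are not kernel interfaces; the code only computes the bounds of the interval and does not model how the kernel draws the random value.

```c
/* Hypothetical helper types and names; not part of any kernel or libc API. */
struct reachable_window {
    unsigned int min_secs;   /* base_reachable_time / 2 */
    unsigned int max_secs;   /* 3 * base_reachable_time / 2 */
};

/* Compute the bounds of the validity interval for a given base value. */
static struct reachable_window
compute_reachable_window(unsigned int base_reachable_time)
{
    struct reachable_window w;

    w.min_secs = base_reachable_time / 2;
    w.max_secs = 3 * base_reachable_time / 2;
    return w;
}
```

With the documented default of 30 seconds, the bounds work out to 15 and 45 seconds.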
+.SH VERSIONS
+The
+.I struct arpreq
+changed in Linux 2.0 to include the
+.I arp_dev
+member and the ioctl numbers changed at the same time.
+Support for the old ioctls was dropped in Linux 2.2.
+
+Support for proxy arp entries for networks (netmask not equal to 0xffffffff)
+was dropped in Linux 2.2.
+It is replaced by automatic proxy arp setup by
+the kernel for all reachable hosts on other interfaces (when
+forwarding and proxy arp are enabled for the interface).
+
+The
+.I neigh/*
+sysctls did not exist before Linux 2.2.
+.SH BUGS
+Some timer settings are specified in jiffies, which is architecture-
+and kernel version-dependent; see
+.BR time (7).
+
+There is no way to signal positive feedback from user space.
+This means connection oriented protocols implemented in user space
+will generate excessive ARP traffic, because ndisc will regularly
+reprobe the MAC address.
+The same problem applies for some kernel protocols (e.g., NFS over UDP).
+
+This man page mashes together functionality that is IPv4-specific and
+functionality that is shared between IPv4 and IPv6.
+.SH "SEE ALSO"
+.BR capabilities (7),
+.BR ip (7)
+.PP
+RFC\ 826 for a description of ARP.
+.br
+RFC\ 2461 for a description of IPv6 neighbor discovery and the base
+algorithms used.
+.LP
+Linux 2.2+ IPv4 ARP uses the IPv6 algorithms when applicable.
--- a/man7/ddp.7
+++ b/man7/ddp.7
@ -1,8 +1,251 @@
-.TH DDP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH DDP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH DDP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH DDP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH DDP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH DDP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH DDP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH DDP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
+.\" This man page is Copyright (C) 1998 Alan Cox.
+.\" Permission is granted to distribute possibly modified copies
+.\" of this page provided the header is included verbatim,
+.\" and in case of nontrivial modification author and date
+.\" of the modification is added to the header.
+.\" $Id: ddp.7,v 1.3 1999/05/13 11:33:22 freitag Exp $
+.TH DDP  7 1999-05-01 "Linux" "Linux Programmer's Manual"
+.SH NAME
+ddp \- Linux AppleTalk protocol implementation
+.SH SYNOPSIS
+.B #include <sys/socket.h>
+.br
+.B #include <netatalk/at.h>
+.sp
+.IB ddp_socket " = socket(PF_APPLETALK, SOCK_DGRAM, 0);"
+.br
+.IB raw_socket " = socket(PF_APPLETALK, SOCK_RAW, " protocol ");"
+.SH DESCRIPTION
+Linux implements the AppleTalk protocols described in
+.IR "Inside AppleTalk" .
+Only the DDP layer and AARP are present in
+the kernel.
+They are designed to be used via the
+.B netatalk
+protocol
+libraries.
+This page documents the interface for those who wish or need to
+use the DDP layer directly.
+.PP
+The communication between AppleTalk and the user program works using a
+BSD-compatible socket interface.
+For more information on sockets, see
+.BR socket (7).
+.PP
+An AppleTalk socket is created by calling the
+.BR socket (2)
+function with a
+.B PF_APPLETALK
+socket family argument.
+Valid socket types are
+.B SOCK_DGRAM
+to open a
+.B ddp
+socket or
+.B SOCK_RAW
+to open a
+.B raw
+socket.
+.I protocol
+is the AppleTalk protocol to be received or sent.
+For
+.B SOCK_RAW
+you must specify
+.BR ATPROTO_DDP .
+.PP
+Raw sockets may be opened only by a process with effective user ID 0
+or when the process has the
+.B CAP_NET_RAW
+capability.
+.SS "Address Format"
+An AppleTalk socket address is defined as a combination of a network number,
+a node number, and a port number.
+.PP
+.in +4n
+.nf
+struct at_addr {
+    unsigned short s_net;
+    unsigned char  s_node;
+};
+
+struct sockaddr_atalk {
+    sa_family_t    sat_family;    /* address family */
+    unsigned char  sat_port;      /* port */
+    struct at_addr sat_addr;      /* net/node */
+};
+.fi
+.in
+.PP
+.I sat_family
+is always set to
+.BR AF_APPLETALK .
+.I sat_port
+contains the port.
+The port numbers below 129 are known as
+.I reserved ports.
+Only processes with the effective user ID 0 or the
+.B CAP_NET_BIND_SERVICE
+capability may
+.BR bind (2)
+to these sockets.
+.I sat_addr
+is the host address.
+The
+.I s_net
+member of
+.I struct at_addr
+contains the host network in network byte order.
+The value of
+.B AT_ANYNET
+is a
+wildcard and also implies \(lqthis network.\(rq
+The
+.I s_node
+member of
+.I struct at_addr
+contains the host node number.
+The value of
+.B AT_ANYNODE
+is a
+wildcard and also implies \(lqthis node.\(rq
+The value of
+.B ATADDR_BCAST
+is a link
+local broadcast address.
+.\" FIXME this doesn't make sense [johnl]
+.SS "Socket Options"
+No protocol-specific socket options are supported.
+.SS Sysctls
+The AppleTalk implementation supports a sysctl interface to configure some
+global parameters.
+The sysctls can be accessed by reading or writing the
+.I /proc/sys/net/atalk/*
+files or with the
+.BR sysctl (2)
+interface.
+.TP
+.B aarp-expiry-time
+The time interval (in seconds) before an AARP cache entry expires.
+.TP
+.B aarp-resolve-time
+The time interval (in seconds) before an AARP cache entry is resolved.
+.TP
+.B aarp-retransmit-limit
+The number of retransmissions of an AARP query before the node is declared
+dead.
+.TP
+.B aarp-tick-time
+The timer rate (in seconds) for the timer driving AARP.
+.PP
+The default values match the specification and should never need to be
+changed.
+.SS Ioctls
+All ioctls described in
+.BR socket (7)
+apply to ddp.
+.\" FIXME Add a section about multicasting
+.SH ERRORS
+.\" FIXME document all errors. We should really fix the kernels to
+.\" give more uniform error returns (ENOMEM vs ENOBUFS, EPERM vs
+.\" EACCES etc.)
+.TP
+.B EACCES
+The user tried to execute an operation without the necessary permissions.
+These include sending to a broadcast address without
+having the broadcast flag set,
+and trying to bind to a reserved port without effective user ID 0 or
+.BR CAP_NET_BIND_SERVICE .
+.TP
+.B EADDRINUSE
+Tried to bind to an address already in use.
+.TP
+.B EADDRNOTAVAIL
+A nonexistent interface was requested or the requested source address was
+not local.
+.TP
+.B EAGAIN
+Operation on a non-blocking socket would block.
+.TP
+.B EALREADY
+A connection operation on a non-blocking socket is already in progress.
+.TP
+.B ECONNABORTED
+A connection was closed during an
+.BR accept (2).
+.TP
+.B EHOSTUNREACH
+No routing table entry matches the destination address.
+.TP
+.B EINVAL
+Invalid argument passed.
+.TP
+.B EISCONN
+.BR connect (2)
+was called on an already connected socket.
+.TP
+.B EMSGSIZE
+Datagram is bigger than the DDP MTU.
+.TP
+.B ENODEV
+Network device not available or not capable of sending IP.
+.TP
+.B ENOENT
+.B SIOCGSTAMP
+was called on a socket where no packet arrived.
+.TP
+.BR ENOMEM " and " ENOBUFS
+Not enough memory available.
+.TP
+.B ENOPKG
+A kernel subsystem was not configured.
+.TP
+.BR ENOPROTOOPT " and " EOPNOTSUPP
+Invalid socket option passed.
+.TP
+.B ENOTCONN
+The operation is only defined on a connected socket, but the socket wasn't
+connected.
+.TP
+.B EPERM
+User doesn't have permission to set high priority,
+make a configuration change,
+or send signals to the requested process or group.
+.TP
+.B EPIPE
+The connection was unexpectedly closed or shut down by the other end.
+.TP
+.B ESOCKTNOSUPPORT
+The socket was unconfigured, or an unknown socket type was requested.
+.SH VERSIONS
+AppleTalk is supported by Linux 2.0 or higher.
+The
+.B sysctl
+interface is
+new in Linux 2.2.
+.SH NOTES
+Be very careful with the
+.B SO_BROADCAST
+option \- it is not privileged in Linux.
+It is easy to overload the network
+with careless sending to broadcast addresses.
+.SS Compatibility
+The basic AppleTalk socket interface is compatible with
+.B netatalk
+on BSD-derived systems.
+Many BSD systems fail to check
+.B SO_BROADCAST
+when sending broadcast frames; this can lead to compatibility problems.
+.PP
+The
+raw
+socket mode is unique to Linux and exists to support the alternative CAP
+package and AppleTalk monitoring tools more easily.
+.SH BUGS
+There are too many inconsistent error values.
+.PP
+The ioctls used to configure routing tables, devices,
+AARP tables and other devices are not yet described.
+.SH "SEE ALSO"
+.BR recvmsg (2),
+.BR sendmsg (2),
+.BR capabilities (7),
+.BR socket (7)
--- a/man7/ip.7
+++ b/man7/ip.7
--- a/man7/ipv6.7
+++ b/man7/ipv6.7
@ -1,8 +1,327 @@
-.TH IPV6 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH IPV6 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH IPV6 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH IPV6 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH IPV6 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH IPV6 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH IPV6 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH IPV6 7 2008-08-07 "Linux" "Linux Programmer's Manual"
+.\" This man page is Copyright (C) 2000 Andi Kleen <ak@muc.de>.
+.\" Permission is granted to distribute possibly modified copies
+.\" of this page provided the header is included verbatim,
+.\" and in case of nontrivial modification author and date
+.\" of the modification is added to the header.
+.\" $Id: ipv6.7,v 1.3 2000/12/20 18:10:31 ak Exp $
+.TH IPV6 7 2008-07-17 "Linux" "Linux Programmer's Manual"
+.SH NAME
+ipv6, PF_INET6 \- Linux IPv6 protocol implementation
+.SH SYNOPSIS
+.B #include <sys/socket.h>
+.br
+.B #include <netinet/in.h>
+.sp
+.IB tcp6_socket " = socket(PF_INET6, SOCK_STREAM, 0);"
+.br
+.IB raw6_socket " = socket(PF_INET6, SOCK_RAW, " protocol ");"
+.br
+.IB udp6_socket " = socket(PF_INET6, SOCK_DGRAM, " protocol ");"
+.SH DESCRIPTION
+Linux 2.2 optionally implements the Internet Protocol, version 6.
+This man page contains a description of the IPv6 basic API as
+implemented by the Linux kernel and glibc 2.1.
+The interface
+is based on the BSD sockets interface; see
+.BR socket (7).
+.PP
+The IPv6 API aims to be mostly compatible with the
+.BR ip (7)
+v4 API.
+Only differences are described in this man page.
+.PP
+To bind an
+.B AF_INET6
+socket to any local address (the wildcard address),
+the local address should be copied from the
+.I in6addr_any
+variable, which has
+.I in6_addr
+type.
+In static initializations
+.B IN6ADDR_ANY_INIT
+may also be used, which expands to a constant expression.
+Both of them are in network byte order.
+.PP
+The IPv6 loopback address (::1) is available in the global
+.I in6addr_loopback
+variable.
+For initializations
+.B IN6ADDR_LOOPBACK_INIT
+should be used.
+.PP
+IPv4 connections can be handled with the v6 API by using the
+v4-mapped-on-v6 address type;
+thus a program needs to support only this API type to
+support both protocols.
+This is handled transparently by the address
+handling functions in libc.
+.PP
+IPv4 and IPv6 share the local port space.
+When you get an IPv4 connection
+or packet to an IPv6 socket,
+its source address will be mapped to v6.
+.SS "Address Format"
+.in +4n
+.nf
+struct sockaddr_in6 {
+    uint16_t        sin6_family;   /* AF_INET6 */
+    uint16_t        sin6_port;     /* port number */
+    uint32_t        sin6_flowinfo; /* IPv6 flow information */
+    struct in6_addr sin6_addr;     /* IPv6 address */
+    uint32_t        sin6_scope_id; /* Scope ID (new in 2.4) */
+};
+
+struct in6_addr {
+    unsigned char   s6_addr[16];   /* IPv6 address */
+};
+.fi
+.in
+.sp
+.I sin6_family
+is always set to
+.BR AF_INET6 ;
+.I sin6_port
+is the protocol port (see
+.I sin_port
+in
+.BR ip (7));
+.I sin6_flowinfo
+is the IPv6 flow identifier;
+.I sin6_addr
+is the 128-bit IPv6 address.
+.I sin6_scope_id
+is an ID depending on the scope of the address.
+It is new in Linux 2.4.
+Linux only supports it for link scope addresses; in that case
+.I sin6_scope_id
+contains the interface index (see
+.BR netdevice (7)).
+.PP
+IPv6 supports several address types: unicast to address a single
+host, multicast to address a group of hosts,
+anycast to address the nearest member of a group of hosts
+(not implemented in Linux), IPv4-on-IPv6 to
+address an IPv4 host, and other reserved address types.
+.PP
+The address notation for IPv6 is a group of 8 4-digit hexadecimal
+numbers, separated with a \(aq:\(aq.
+\&"::" stands for a string of 0 bits.
+Special addresses are ::1 for loopback and ::FFFF:<IPv4 address>
+for IPv4-mapped-on-IPv6.
+.PP
+The port space of IPv6 is shared with IPv4.
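The IPv4-mapped form ::FFFF:<IPv4 address> mentioned above can be built by hand, which makes the byte layout visible. This is a minimal sketch, assuming a glibc-style environment; the function name make_v4mapped is illustrative only and is not part of any library.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Build the IPv4-mapped IPv6 address ::FFFF:<IPv4 address> from a
 * dotted-quad string.
 * Returns 0 on success, -1 if the string is not a valid IPv4 address.
 * The function name is illustrative only.
 */
static int
make_v4mapped(const char *dotted_quad, struct in6_addr *out)
{
    struct in_addr v4;

    if (inet_pton(AF_INET, dotted_quad, &v4) != 1)
        return -1;

    memset(out, 0, sizeof(*out));      /* bytes 0..9 stay zero */
    out->s6_addr[10] = 0xff;           /* bytes 10..11 are FFFF */
    out->s6_addr[11] = 0xff;
    /* inet_pton() already produced network byte order. */
    memcpy(&out->s6_addr[12], &v4, sizeof(v4));
    return 0;
}
```

The standard macro IN6_IS_ADDR_V4MAPPED() from <netinet/in.h> can be used to recognize such an address on the receiving side.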
+.SS "Socket Options"
+IPv6 supports some protocol-specific socket options that can be set with
+.BR setsockopt (2)
+and read with
+.BR getsockopt (2).
+The socket option level for IPv6 is
+.BR IPPROTO_IPV6 .
+A boolean integer flag is zero when it is false, otherwise true.
+.TP
+.B IPV6_ADDRFORM
+Turn an
+.B AF_INET6
+socket into a socket of a different address family.
+Only
+.B AF_INET
+is currently supported for that.
+It is only allowed for IPv6 sockets
+that are connected and bound to a v4-mapped-on-v6 address.
+The argument is a pointer to an integer containing
+.BR AF_INET .
+This is useful to pass v4-mapped sockets as file descriptors to
+programs that don't know how to deal with the IPv6 API.
+.TP
+.B IPV6_ADD_MEMBERSHIP, IPV6_DROP_MEMBERSHIP
+Control membership in multicast groups.
+Argument is a pointer to a
+.I struct ipv6_mreq
+structure.
+.\" FIXME IPV6_CHECKSUM is not documented, and probably should be
+.\" FIXME IPV6_JOIN_ANYCAST is not documented, and probably should be
+.\" FIXME IPV6_LEAVE_ANYCAST is not documented, and probably should be
+.\" FIXME IPV6_RECVPKTINFO is not documented, and probably should be
+.\" FIXME IPV6_2292PKTINFO is not documented, and probably should be
+.\" FIXME there are probably many other IPV6_* socket options that
+.\" should be documented
+.TP
+.B IPV6_MTU
+Set the MTU to be used for the socket.
+The MTU is limited by the device
+MTU or the path MTU when path MTU discovery is enabled.
+Argument is a pointer to integer.
+.TP
+.B IPV6_MTU_DISCOVER
+Control path MTU discovery on the socket.
+See
+.B IP_MTU_DISCOVER
+in
+.BR ip (7)
+for details.
+.TP
+.B IPV6_MULTICAST_HOPS
+Set the multicast hop limit for the socket.
+Argument is a pointer to an
+integer.
+\-1 in the value means use the route default, otherwise it should be
+between 0 and 255.
+.TP
+.B IPV6_MULTICAST_IF
+Set the device for outgoing multicast packets on the socket.
+This is only allowed
+for
+.B SOCK_DGRAM
+and
+.B SOCK_RAW
+sockets.
+The argument is a pointer to an interface index (see
+.BR netdevice (7))
+in an integer.
+.TP
+.B IPV6_MULTICAST_LOOP
+Control whether the socket sees multicast packets that it has sent itself.
+Argument is a pointer to boolean.
+.TP
+.B IPV6_PKTINFO
+Set delivery of the
+.B IPV6_PKTINFO
+control message on incoming datagrams.
+Only allowed for
+.B SOCK_DGRAM
+or
+.B SOCK_RAW
+sockets.
+Argument is a pointer to a boolean value in an integer.
+.TP
+.nh
+.B IPV6_RTHDR, IPV6_AUTHHDR, IPV6_DSTOPTS, IPV6_HOPOPTS, IPV6_FLOWINFO, IPV6_HOPLIMIT
+.hy
+Set delivery of control messages for incoming datagrams containing
+extension headers from the received packet.
+.B IPV6_RTHDR
+delivers the routing header,
+.B IPV6_AUTHHDR
+delivers the authentication header,
+.B IPV6_DSTOPTS
+delivers the destination options,
+.B IPV6_HOPOPTS
+delivers the hop options,
+.B IPV6_FLOWINFO
+delivers an integer containing the flow ID,
+.B IPV6_HOPLIMIT
+delivers an integer containing the hop count of the packet.
+The control messages have the same type as the socket option.
+All these header options can also be set for outgoing packets
+by putting the appropriate control message into the control buffer of
+.BR sendmsg (2).
+Only allowed for
+.B SOCK_DGRAM
+or
+.B SOCK_RAW
+sockets.
+Argument is a pointer to a boolean value.
+.TP
+.B IPV6_RECVERR
+Control receiving of asynchronous error options.
+See
+.B IP_RECVERR
+in
+.BR ip (7)
+for details.
+Argument is a pointer to boolean.
+.TP
+.B IPV6_ROUTER_ALERT
+Pass forwarded packets containing a router alert hop-by-hop option to
+this socket.
+Only allowed for SOCK_RAW sockets.
+The tapped packets are not forwarded by the kernel; it is the
+user's responsibility to send them out again.
+Argument is a pointer to an integer.
+A positive integer indicates a router alert option value to intercept.
+Packets carrying a router alert option with a value field containing
+this integer will be delivered to the socket.
+A negative integer disables delivery of packets with router alert options
+to this socket.
+.TP
+.B IPV6_UNICAST_HOPS
+Set the unicast hop limit for the socket.
+Argument is a pointer to an integer.
+\-1 in the value means use the route default,
+otherwise it should be between 0 and 255.
+.TP
+.BR IPV6_V6ONLY " (since Linux 2.4.21 and 2.6)"
+.\" See RFC 3493
+If this flag is set to true (non-zero), then the socket is restricted
+to sending and receiving IPv6 packets only.
+In this case, an IPv4 and an IPv6 application can bind
+to a single port at the same time.
+
+If this flag is set to false (zero),
+then the socket can be used to send and receive packets
+to and from an IPv6 address or an IPv4-mapped IPv6 address.
+
+The argument is a pointer to a boolean value in an integer.
+
+The default value for this flag is defined by the contents of the file
+.IR /proc/sys/net/ipv6/bindv6only .
+The default value for that file is 0 (false).
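Setting and reading back IPV6_V6ONLY, as described above, might look like the following sketch. The wrapper names set_v6only and get_v6only are hypothetical; only setsockopt(2), getsockopt(2), IPPROTO_IPV6 and IPV6_V6ONLY come from the system headers.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Set (on != 0) or clear (on == 0) IPV6_V6ONLY on an AF_INET6 socket.
 * Returns the result of setsockopt(): 0 on success, -1 on error.
 */
static int
set_v6only(int fd, int on)
{
    return setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on));
}

/* Read the flag back. Returns 0 or 1, or -1 on error. */
static int
get_v6only(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &value, &len) == -1)
        return -1;
    return value != 0;
}
```

A caller would typically invoke set_v6only(fd, 1) on a freshly created AF_INET6 socket, before any call to bind(2).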
+.\" FLOWLABEL_MGR, FLOWINFO_SEND
+.SH VERSIONS
+The older
+.I libinet6
+libc5 based IPv6 API implementation for Linux is not described here
+and may vary in details.
+.PP
+Linux 2.4 will break binary compatibility for the
+.I sockaddr_in6
+for 64-bit
+hosts by changing the alignment of
+.I in6_addr
+and adding an additional
+.I sin6_scope_id
+field.
+The kernel interfaces stay compatible, but a program including
+.I sockaddr_in6
+or
+.I in6_addr
+into other structures may not be.
+This is not
+a problem for 32-bit hosts like i386.
+.PP
+The
+.I sin6_flowinfo
+field is new in Linux 2.4.
+It is transparently passed/read by the kernel
+when the passed address length contains it.
+Some programs that pass a longer address buffer and then
+check the outgoing address length may break.
+.SH "NOTES"
+The
+.I sockaddr_in6
+structure is bigger than the generic
+.IR sockaddr .
+Programs that assume that all address types can be stored safely in a
+.I struct sockaddr
+need to be changed to use
+.I struct sockaddr_storage
+for that instead.
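The size issue described in this NOTES section can be handled by keeping addresses in a struct sockaddr_storage. The following is a minimal sketch; the helper name store_address is illustrative and not a libc function.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Copy a socket address of any family into generic storage.
 * Returns 0 on success, -1 if the address would not fit.
 * The helper name is illustrative only.
 */
static int
store_address(struct sockaddr_storage *dst, const struct sockaddr *src,
              socklen_t src_len)
{
    if (src_len > sizeof(*dst))
        return -1;
    memset(dst, 0, sizeof(*dst));
    memcpy(dst, src, src_len);
    return 0;
}
```

The ss_family member of the stored address can then be inspected before casting the storage back to struct sockaddr_in or struct sockaddr_in6.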
+.SH BUGS
+The IPv6 extended API as in RFC\ 2292 is currently only partly
+implemented;
+although the 2.2 kernel has near complete support for receiving options,
+the macros for generating IPv6 options are missing in glibc 2.1.
+.PP
+IPSec support for EH and AH headers is missing.
+.PP
+Flow label management is not complete and not documented here.
+.PP
+This man page is not complete.
+.SH "SEE ALSO"
+.BR cmsg (3),
+.BR ip (7)
+.PP
+RFC\ 2553: IPv6 BASIC API.
+Linux tries to be compliant with this.
+.PP
+RFC\ 2460: IPv6 specification.
--- a/man7/netlink.7
+++ b/man7/netlink.7
@ -1,8 +1,460 @@
-.TH NETLINK  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH NETLINK  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH NETLINK  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH NETLINK  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH NETLINK  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH NETLINK  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH NETLINK  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH NETLINK  7 2008-08-07 "Linux" "Linux Programmer's Manual"
+'\" t
+.\" Don't change the first line, it tells man that tbl is needed.
+.\" This man page is Copyright (c) 1998 by Andi Kleen. Subject to the GPL.
+.\" Based on the original comments from Alexey Kuznetsov
+.\" Modified 2005-12-27 by Hasso Tepper <hasso@estpak.ee>
+.\" $Id: netlink.7,v 1.8 2000/06/22 13:23:00 ak Exp $
+.TH NETLINK  7 2005-12-27 "Linux" "Linux Programmer's Manual"
+.SH NAME
+netlink \- Communication between kernel and userspace (PF_NETLINK)
+.SH SYNOPSIS
+.nf
+.B #include <asm/types.h>
+.B #include <sys/socket.h>
+.B #include <linux/netlink.h>
+
+.BI "netlink_socket = socket(PF_NETLINK, " socket_type ", " netlink_family );
+.fi
+.SH DESCRIPTION
+Netlink is used to transfer information between kernel and
+userspace processes.
+It consists of a standard sockets-based interface for userspace
+processes and an internal kernel API for kernel modules.
+The internal kernel interface is not documented in this manual page.
+There is also an obsolete netlink interface
+via netlink character devices; this interface is not documented here
+and is only provided for backwards compatibility.
+
+Netlink is a datagram-oriented service.
+Both
+.B SOCK_RAW
+and
+.B SOCK_DGRAM
+are valid values for
+.IR socket_type .
+However, the netlink protocol does not distinguish between datagram
+and raw sockets.
+
+.I netlink_family
+selects the kernel module or netlink group to communicate with.
+The currently assigned netlink families are:
+.TP
+.B NETLINK_ROUTE
+Receives routing and link updates and may be used to modify the routing
+tables (both IPv4 and IPv6), IP addresses, link parameters,
+neighbor setups, queueing disciplines, traffic classes and
+packet classifiers (see
+.BR rtnetlink (7)).
+.TP
+.B NETLINK_W1
+Messages from 1-wire subsystem.
+.TP
+.B NETLINK_USERSOCK
+Reserved for user-mode socket protocols.
+.TP
+.B NETLINK_FIREWALL
+Transport IPv4 packets from netfilter to userspace.
+Used by
+.I ip_queue
+kernel module.
+.TP
+.B NETLINK_INET_DIAG
+.\" FIXME More details on NETLINK_INET_DIAG needed.
+INET socket monitoring.
+.TP
+.B NETLINK_NFLOG
+Netfilter/iptables ULOG.
+.TP
+.B NETLINK_XFRM
+.\" FIXME More details on NETLINK_XFRM needed.
+IPsec.
+.TP
+.B NETLINK_SELINUX
+SELinux event notifications.
+.TP
+.B NETLINK_ISCSI
+.\" FIXME More details on NETLINK_ISCSI needed.
+Open-iSCSI.
+.TP
+.B NETLINK_AUDIT
+.\" FIXME More details on NETLINK_AUDIT needed.
+Auditing.
+.TP
+.B NETLINK_FIB_LOOKUP
+.\" FIXME More details on NETLINK_FIB_LOOKUP needed.
+Access to FIB lookup from userspace.
+.TP
+.B NETLINK_CONNECTOR
+Kernel connector.
+See
+.I Documentation/connector/*
+in the kernel source for further information.
+.TP
+.B NETLINK_NETFILTER
+.\" FIXME More details on NETLINK_NETFILTER needed.
+Netfilter subsystem.
+.TP
+.B NETLINK_IP6_FW
+Transport IPv6 packets from netfilter to userspace.
+Used by
+.I ip6_queue
+kernel module.
+.TP
+.B NETLINK_DNRTMSG
+DECnet routing messages.
+.TP
+.B NETLINK_KOBJECT_UEVENT
+.\" FIXME More details on NETLINK_KOBJECT_UEVENT needed.
+Kernel messages to userspace.
+.TP
+.B NETLINK_GENERIC
+Generic netlink family for simplified netlink usage.
+.PP
+Netlink messages consist of a byte stream with one or multiple
+.I nlmsghdr
+headers and associated payload.
+The byte stream should only be accessed with the standard
+.B NLMSG_*
+macros.
+See
+.BR netlink (3)
+for further information.
+
+In multipart messages (multiple
+.I nlmsghdr
+headers with associated payload in one byte stream) the first and all
+following headers have the
+.B NLM_F_MULTI
+flag set, except for the last header which has the type
+.BR NLMSG_DONE .
+
+After each
+.I nlmsghdr
+the payload follows.
+
+.in +4n
+.nf
+struct nlmsghdr {
+    __u32 nlmsg_len;    /* Length of message including header. */
+    __u16 nlmsg_type;   /* Type of message content. */
+    __u16 nlmsg_flags;  /* Additional flags. */
+    __u32 nlmsg_seq;    /* Sequence number. */
+    __u32 nlmsg_pid;    /* PID of the sending process. */
+};
+.fi
+.in
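The layout of headers and payloads in a netlink byte stream can be illustrated without opening a socket. The sketch below uses only the NLMSG_* macros from <linux/netlink.h>; the function names append_msg and count_msgs are hypothetical, and the buffer passed in is assumed to be suitably aligned for a struct nlmsghdr.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <string.h>

/*
 * Append one message with the given type and payload at offset *used.
 * Returns 0 on success, -1 if the message does not fit.
 */
static int
append_msg(char *buf, size_t buf_size, size_t *used, unsigned short type,
           const void *payload, size_t payload_len)
{
    struct nlmsghdr *nh;

    if (*used + NLMSG_SPACE(payload_len) > buf_size)
        return -1;

    nh = (struct nlmsghdr *) (buf + *used);
    memset(nh, 0, NLMSG_SPACE(payload_len));
    nh->nlmsg_len = NLMSG_LENGTH(payload_len);  /* header + payload */
    nh->nlmsg_type = type;
    nh->nlmsg_flags = NLM_F_MULTI;
    if (payload_len > 0)
        memcpy(NLMSG_DATA(nh), payload, payload_len);

    *used += NLMSG_SPACE(payload_len);   /* keeps the next header aligned */
    return 0;
}

/* Count the messages that precede NLMSG_DONE in a byte stream. */
static int
count_msgs(void *buf, int len)
{
    struct nlmsghdr *nh;
    int count = 0;

    for (nh = (struct nlmsghdr *) buf; NLMSG_OK(nh, len);
         nh = NLMSG_NEXT(nh, len)) {
        if (nh->nlmsg_type == NLMSG_DONE)
            break;
        count++;
    }
    return count;
}
```

NLMSG_LENGTH() gives the value stored in nlmsg_len, while NLMSG_SPACE() gives the padded size that the message occupies in the stream; NLMSG_NEXT() relies on that padding to find the next header.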
+
+.I nlmsg_type
+can be one of the standard message types:
+.B NLMSG_NOOP
+message is to be ignored,
+.B NLMSG_ERROR
+message signals an error and the payload contains an
+.I nlmsgerr
+structure,
+.B NLMSG_DONE
+message terminates a multipart message.
+
+.in +4n
+.nf
+struct nlmsgerr {
+    int error;        /* Negative errno or 0 for acknowledgements */
+    struct nlmsghdr msg;  /* Message header that caused the error */
+};
+.fi
+.in
+
+A netlink family usually specifies more message types; see the
+appropriate manual pages, for example,
+.BR rtnetlink (7)
+for
+.BR NETLINK_ROUTE .
+
+Standard flag bits in
+.I nlmsg_flags
+.br
+---------------------------------
+.TS
+tab(:);
+lB l.
+NLM_F_REQUEST:Must be set on all request messages.
+NLM_F_MULTI:T{
+The message is part of a multipart message terminated by
+.BR NLMSG_DONE .
+T}
+NLM_F_ACK:Request for an acknowledgment on success.
+NLM_F_ECHO:Echo this request.
+.TE
+
+Additional flag bits for GET requests
+.br
+-------------------------------------
+.TS
+tab(:);
+lB l.
+NLM_F_ROOT:Return the complete table instead of a single entry.
+NLM_F_MATCH:T{
+Return all entries matching criteria passed in message content.
+Not implemented yet.
+T}
+.\" FIXME NLM_F_ATOMIC is not used any more?
+NLM_F_ATOMIC:Return an atomic snapshot of the table.
+NLM_F_DUMP:Convenience macro; equivalent to (NLM_F_ROOT|NLM_F_MATCH).
+.TE
+
+Note that
+.B NLM_F_ATOMIC
+requires the
+.B CAP_NET_ADMIN
+capability or an effective UID of 0.
+
+Additional flag bits for NEW requests
+.br
+-------------------------------------
+.TS
+tab(:);
+lB l.
+NLM_F_REPLACE:Replace existing matching object.
+NLM_F_EXCL:Don't replace if the object already exists.
+NLM_F_CREATE:Create object if it doesn't already exist.
+NLM_F_APPEND:Add to the end of the object list.
+.TE
+
+.I nlmsg_seq
+and
+.I nlmsg_pid
+are used to track messages.
+.I nlmsg_pid
+shows the origin of the message.
+Note that there isn't a 1:1 relationship between
+.I nlmsg_pid
+and the PID of the process if the message originated from a netlink
+socket.
+See the
+.B ADDRESS FORMATS
+section for further information.
+
+Both
+.I nlmsg_seq
+and
+.I nlmsg_pid
+.\" FIXME Explain more about nlmsg_seq and nlmsg_pid.
+are opaque to the netlink core.
+
+Netlink is not a reliable protocol.
+It tries its best to deliver a message to its destination(s),
+but may drop messages when an out-of-memory condition or
+other error occurs.
+For reliable transfer the sender can request an
+acknowledgement from the receiver by setting the
+.B NLM_F_ACK
+flag.
+An acknowledgment is an
+.B NLMSG_ERROR
+packet with the error field set to 0.
+The application must generate acknowledgements for
+received messages itself.
+The kernel tries to send an
+.B NLMSG_ERROR
+message for every failed packet.
+A user process should follow this convention too.
+
+However, reliable transmissions from kernel to user are impossible
+in any case.
+The kernel can't send a netlink message if the socket buffer is full:
+the message will be dropped and the kernel and the userspace process will
+no longer have the same view of kernel state.
+It is up to the application to detect when this happens (via the
+.B ENOBUFS
+error returned by
+.BR recvmsg (2))
+and resynchronize.
+.SS Address Formats
+The
+.I sockaddr_nl
+structure describes a netlink client in user space or in the kernel.
+A
+.I sockaddr_nl
+can be either unicast (only sent to one peer) or sent to
+netlink multicast groups
+.RI ( nl_groups
+not equal to 0).
+
+.in +4n
+.nf
+struct sockaddr_nl {
+    sa_family_t     nl_family;  /* AF_NETLINK */
+    unsigned short  nl_pad;     /* Zero. */
+    pid_t           nl_pid;     /* Process ID. */
+    __u32           nl_groups;  /* Multicast groups mask. */
+};
+.fi
+.in
+
+.I nl_pid
+is the unicast address of the netlink socket.
+It's always 0 if the destination is in the kernel.
+For a userspace process,
+.I nl_pid
+is usually the PID of the process owning the destination socket.
+However,
+.I nl_pid
+identifies a netlink socket, not a process.
+If a process owns several netlink
+sockets, then
+.I nl_pid
+can only be equal to the process ID for at most one socket.
+There are two ways to assign
+.I nl_pid
+to a netlink socket.
+If the application sets
+.I nl_pid
+before calling
+.BR bind (2),
+then it is up to the application to make sure that
+.I nl_pid
+is unique.
+If the application sets it to 0, the kernel takes care of assigning it.
+The kernel assigns the process ID to the first netlink socket the process
+opens and assigns a unique
+.I nl_pid
+to every netlink socket that the process subsequently creates.
+
+.I nl_groups
+is a bit mask with every bit representing a netlink group number.
+Each netlink family has a set of 32 multicast groups.
+When
+.BR bind (2)
+is called on the socket, the
+.I nl_groups
+field in the
+.I sockaddr_nl
+should be set to a bit mask of the groups which it wishes to listen to.
+The default value for this field is zero which means that no multicasts
+will be received.
+A socket may multicast messages to any of the multicast groups by setting
+.I nl_groups
+to a bit mask of the groups it wishes to send to when it calls
+.BR sendmsg (2)
+or does a
+.BR connect (2).
+Only processes with an effective UID of 0 or the
+.B CAP_NET_ADMIN
+capability may send or listen to a netlink multicast group.
+Any replies to a message received for a multicast group should be
+sent back to the sending PID and the multicast group.
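The nl_groups bit mask described above can be derived from group numbers. This is a sketch under the assumption that group number n (counting from 1) corresponds to bit n-1 of the mask, which is how the RTMGRP_* mask constants relate to the RTNLGRP_* group numbers in <linux/rtnetlink.h>; the helper name nl_group_mask is hypothetical.

```c
#include <stdint.h>

/*
 * Return the nl_groups bit for multicast group number `group` (1..32),
 * or 0 if the number is out of range for the 32-bit mask.
 * Hypothetical helper; not part of the netlink headers.
 */
static uint32_t
nl_group_mask(unsigned int group)
{
    if (group < 1 || group > 32)
        return 0;
    return (uint32_t) 1 << (group - 1);
}
```

Masks for several groups are combined with a bitwise OR before being stored in nl_groups.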
+.SH VERSIONS
+The socket interface to netlink is a new feature of Linux 2.2.
+
+Linux 2.0 supported a more primitive device based netlink interface
+(which is still available as a compatibility option).
+This obsolete interface is not described here.
+
+NETLINK_SELINUX appeared in Linux 2.6.4.
+
+NETLINK_AUDIT appeared in Linux 2.6.6.
+
+NETLINK_KOBJECT_UEVENT appeared in Linux 2.6.10.
+
+NETLINK_W1 and NETLINK_FIB_LOOKUP appeared in Linux 2.6.13.
+
+NETLINK_INET_DIAG, NETLINK_CONNECTOR and NETLINK_NETFILTER appeared in
+Linux 2.6.14.
+
+NETLINK_GENERIC and NETLINK_ISCSI appeared in Linux 2.6.15.
+.SH NOTES
+It is often better to use netlink via
+.I libnetlink
+or
+.I libnl
+than via the low-level kernel interface.
+.SH BUGS
+This manual page is not complete.
+.SH EXAMPLE
+The following example creates a
+.B NETLINK_ROUTE
+netlink socket which will listen to the
+.B RTMGRP_LINK
+(network interface create/delete/up/down events) and
+.B RTMGRP_IPV4_IFADDR
+(IPv4 addresses add/delete events) multicast groups.
+
+.in +4n
+.nf
+struct sockaddr_nl sa;
+
+memset(&sa, 0, sizeof(sa));
+sa.nl_family = AF_NETLINK;
+sa.nl_groups = RTMGRP_LINK | RTMGRP_IPV4_IFADDR;
+
+fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
+bind(fd, (struct sockaddr*)&sa, sizeof(sa));
+.fi
+.in
+
+The next example demonstrates how to send a netlink message to the
+kernel (pid 0).
+Note that the application must take care of message sequence numbers
+in order to reliably track acknowledgements.
+
+.in +4n
+.nf
+struct nlmsghdr *nh;    /* The nlmsghdr with payload to send. */
+struct sockaddr_nl sa;
+struct iovec iov = { (void *) nh, nh\->nlmsg_len };
+struct msghdr msg = { (void *)&sa, sizeof(sa), &iov, 1, NULL, 0, 0 };
+
+memset(&sa, 0, sizeof(sa));
+sa.nl_family = AF_NETLINK;
+nh\->nlmsg_pid = 0;
+nh\->nlmsg_seq = ++sequence_number;
+/* Request an ack from kernel by setting NLM_F_ACK. */
+nh\->nlmsg_flags |= NLM_F_ACK;
+
+sendmsg(fd, &msg, 0);
+.fi
+.in
+
+The last example shows how to read a netlink message.
+
+.in +4n
+.nf
+int len;
+char buf[4096];
+struct iovec iov = { buf, sizeof(buf) };
+struct sockaddr_nl sa;
+struct msghdr msg = { (void *)&sa, sizeof(sa), &iov, 1, NULL, 0, 0 };
+struct nlmsghdr *nh;
+
+len = recvmsg(fd, &msg, 0);
+
+for (nh = (struct nlmsghdr *) buf; NLMSG_OK (nh, len);
+     nh = NLMSG_NEXT (nh, len)) {
+    /* The end of multipart message. */
+    if (nh\->nlmsg_type == NLMSG_DONE)
+        return;
+
+    if (nh\->nlmsg_type == NLMSG_ERROR)
+        /* Do some error handling. */
+    ...
+
+    /* Continue with parsing payload. */
+    ...
+}
+.fi
+.in
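+
+The receive loop above can be tried out without opening a socket.
+The following is a minimal sketch under stated assumptions:
+.I put_msg
+and
+.I count_msgs
+are hypothetical helper names chosen for illustration and are not
+part of any library.
+It builds header-only messages in a buffer and walks them with
+.B NLMSG_OK
+and
+.BR NLMSG_NEXT .

```c
#include <assert.h>
#include <string.h>
#include <linux/netlink.h>

/* Hypothetical helper: append one header-only message at buf + off.
   Returns the offset at which the next message would start. */
static size_t
put_msg(char *buf, size_t off, unsigned short type, unsigned int seq)
{
    struct nlmsghdr *nh = (struct nlmsghdr *) (buf + off);

    memset(nh, 0, NLMSG_HDRLEN);
    nh->nlmsg_len = NLMSG_LENGTH(0);    /* header only, no payload */
    nh->nlmsg_type = type;
    nh->nlmsg_seq = seq;
    return off + NLMSG_ALIGN(nh->nlmsg_len);
}

/* Hypothetical helper: count the messages that precede NLMSG_DONE. */
static int
count_msgs(char *buf, int len)
{
    struct nlmsghdr *nh;
    int n = 0;

    for (nh = (struct nlmsghdr *) buf; NLMSG_OK(nh, len);
         nh = NLMSG_NEXT(nh, len)) {
        if (nh->nlmsg_type == NLMSG_DONE)
            break;
        n++;
    }
    return n;
}
```

+In a real program the buffer would come from
+.BR recvmsg (2)
+rather than from
+.IR put_msg ;
+the buffer should be suitably aligned for a
+.IR "struct nlmsghdr" .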
+.SH "SEE ALSO"
+.BR cmsg (3),
+.BR netlink (3),
+.BR capabilities (7),
+.BR rtnetlink (7)
+.PP
+ftp://ftp.inr.ac.ru/ip-routing/iproute2*
+for information about libnetlink.
+
+http://people.suug.ch/~tgr/libnl/
+for information about libnl.
+
+RFC 3549 "Linux Netlink as an IP Services Protocol"
--- a/man7/packet.7
+++ b/man7/packet.7
@ -1,8 +1,402 @@
-.TH PACKET  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH PACKET  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH PACKET  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH PACKET  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH PACKET  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH PACKET  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH PACKET  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH PACKET  7 2008-08-07 "Linux" "Linux Programmer's Manual"
+.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
+.\" Permission is granted to distribute possibly modified copies
+.\" of this page provided the header is included verbatim,
+.\" and in case of nontrivial modification author and date
+.\" of the modification is added to the header.
+.\" $Id: packet.7,v 1.13 2000/08/14 08:03:45 ak Exp $
+.TH PACKET  7 1999-04-29 "Linux" "Linux Programmer's Manual"
+.SH NAME
+packet, PF_PACKET \- packet interface on device level.
+.SH SYNOPSIS
+.nf
+.B #include <sys/socket.h>
+.br
+.B #include <netpacket/packet.h>
+.br
+.B #include <net/ethernet.h>     /* the L2 protocols */
+.sp
+.BI "packet_socket = socket(PF_PACKET, int " socket_type ", int "protocol );
+.fi
+.SH DESCRIPTION
+Packet sockets are used to receive or send raw packets at the device driver
+(OSI Layer 2) level.
+They allow the user to implement protocol modules in user space
+on top of the physical layer.
+
+The
+.I socket_type
+is either
+.B SOCK_RAW
+for raw packets including the link level header or
+.B SOCK_DGRAM
+for cooked packets with the link level header removed.
+The link level
+header information is available in a common format in a
+.IR sockaddr_ll .
+.I protocol
+is the IEEE 802.3 protocol number in network byte order.
+See the
+.I <linux/if_ether.h>
+include file for a list of allowed protocols.
+When protocol
+is set to
+.B htons(ETH_P_ALL)
+then all protocols are received.
+All incoming packets of that protocol type will be passed to the packet
+socket before they are passed to the protocols implemented in the kernel.
+
+Only processes with effective UID 0 or the
+.B CAP_NET_RAW
+capability may open packet sockets.
+
+.B SOCK_RAW
+packets are passed to and from the device driver without any changes in
+the packet data.
+When receiving a packet, the address is still parsed and
+passed in a standard
+.I sockaddr_ll
+address structure.
+When transmitting a packet, the user supplied buffer
+should contain the physical layer header.
+That packet is then
+queued unmodified to the network driver of the interface defined by the
+destination address.
+Some device drivers always add other headers.
+.B SOCK_RAW
+is similar to but not compatible with the obsolete
+.B PF_INET/SOCK_PACKET
+of Linux 2.0.
+
+.B SOCK_DGRAM
+operates on a slightly higher level.
+The physical header is removed before the packet is passed to the user.
+Packets sent through a
+.B SOCK_DGRAM
+packet socket get a suitable physical layer header based on the
+information in the
+.I sockaddr_ll
+destination address before they are queued.
+
+By default all packets of the specified protocol type
+are passed to a packet socket.
+To only get packets from a specific interface use
+.BR bind (2)
+specifying an address in a
+.I struct sockaddr_ll
+to bind the packet socket to an interface.
+Only the
+.I sll_protocol
+and the
+.I sll_ifindex
+address fields are used for purposes of binding.
+
+The
+.BR connect (2)
+operation is not supported on packet sockets.
+
+When the
+.B MSG_TRUNC
+flag is passed to
+.BR recvmsg (2),
+.BR recv (2),
+or
+.BR recvfrom (2),
+the real length of the packet on the wire is always returned,
+even when it is longer than the buffer.
+.SS Address Types
+The sockaddr_ll structure is a device-independent physical layer address.
+
+.in +4n
+.nf
+struct sockaddr_ll {
+    unsigned short sll_family;   /* Always AF_PACKET */
+    unsigned short sll_protocol; /* Physical layer protocol */
+    int            sll_ifindex;  /* Interface number */
+    unsigned short sll_hatype;   /* Header type */
+    unsigned char  sll_pkttype;  /* Packet type */
+    unsigned char  sll_halen;    /* Length of address */
+    unsigned char  sll_addr[8];  /* Physical layer address */
+};
+.fi
+.in
+
+.I sll_protocol
+is the standard Ethernet protocol type in network byte order as defined
+in the
+.I <linux/if_ether.h>
+include file.
+It defaults to the socket's protocol.
+.I sll_ifindex
+is the interface index of the interface
+(see
+.BR netdevice (7));
+0 matches any interface (only permitted for binding).
+.I sll_hatype
+is an ARP type as defined in the
+.I <linux/if_arp.h>
+include file.
+.I sll_pkttype
+contains the packet type.
+Valid types are
+.B PACKET_HOST
+for a packet addressed to the local host,
+.B PACKET_BROADCAST
+for a physical layer broadcast packet,
+.B PACKET_MULTICAST
+for a packet sent to a physical layer multicast address,
+.B PACKET_OTHERHOST
+for a packet to some other host that has been caught by a device driver
+in promiscuous mode, and
+.B PACKET_OUTGOING
+for a packet originated from the local host that is looped back to a packet
+socket.
+These types make sense only for receiving.
+.I sll_addr
+and
+.I sll_halen
+contain the physical layer (e.g., IEEE 802.3) address and its length.
+The exact interpretation depends on the device.
+
+When you send packets it is enough to specify
+.IR sll_family ,
+.IR sll_addr ,
+.IR sll_halen ,
+and
+.IR sll_ifindex .
+The other fields should be 0.
+.I sll_hatype
+and
+.I sll_pkttype
+are set on received packets for your information.
+For bind only
+.I sll_protocol
+and
+.I sll_ifindex
+are used.
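+
+The binding rule above can be sketched as follows.
+This is an illustration only:
+.I fill_bind_addr
+is a hypothetical helper name, and the interface index is assumed to
+have been obtained elsewhere (for example with
+.BR if_nametoindex (3)).

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>          /* htons() */
#include <net/ethernet.h>       /* ETH_P_* constants */
#include <netpacket/packet.h>   /* struct sockaddr_ll */
#include <sys/socket.h>

/* Hypothetical helper: prepare an address for bind(2).
   Only sll_protocol and sll_ifindex are used for binding;
   every other field except sll_family is left zero. */
static void
fill_bind_addr(struct sockaddr_ll *addr, int ifindex, unsigned short proto)
{
    memset(addr, 0, sizeof(*addr));
    addr->sll_family = AF_PACKET;
    addr->sll_protocol = htons(proto);  /* network byte order */
    addr->sll_ifindex = ifindex;
}
```

+The filled structure would then be passed to
+.BR bind (2),
+cast to
+.IR "struct sockaddr *" ,
+together with
+.IR "sizeof(struct sockaddr_ll)" .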
+.SS Socket Options
+Packet sockets can be used to configure physical layer multicasting
+and promiscuous mode.
+It works by calling
+.BR setsockopt (2)
+on a packet socket for
+.B SOL_PACKET
+and one of the options
+.B PACKET_ADD_MEMBERSHIP
+to add a binding or
+.B PACKET_DROP_MEMBERSHIP
+to drop it.
+They both expect a
+.B packet_mreq
+structure as argument:
+
+.in +4n
+.nf
+struct packet_mreq {
+    int            mr_ifindex;    /* interface index */
+    unsigned short mr_type;       /* action */
+    unsigned short mr_alen;       /* address length */
+    unsigned char  mr_address[8]; /* physical layer address */
+};
+.fi
+.in
+
+.B mr_ifindex
+contains the interface index for the interface whose status
+should be changed.
+The
+.B mr_type
+parameter specifies which action to perform.
+.B PACKET_MR_PROMISC
+enables receiving all packets on a shared medium (often known as
+"promiscuous mode"),
+.B PACKET_MR_MULTICAST
+binds the socket to the physical layer multicast group specified in
+.B mr_address
+and
+.BR mr_alen ,
+and
+.B PACKET_MR_ALLMULTI
+sets the socket up to receive all multicast packets arriving at
+the interface.
+
+In addition the traditional ioctls
+.BR SIOCSIFFLAGS ,
+.BR SIOCADDMULTI ,
+and
+.B SIOCDELMULTI
+can be used for the same purpose.
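+
+As a minimal sketch of the membership options described above,
+the following code toggles promiscuous mode on one interface.
+The helper names
+.I fill_promisc_req
+and
+.I set_promisc
+are hypothetical, and the fallback definition of
+.B SOL_PACKET
+is only an assumption for C libraries that lack it.

```c
#include <assert.h>
#include <string.h>
#include <netpacket/packet.h>   /* struct packet_mreq, PACKET_MR_PROMISC */
#include <sys/socket.h>

#ifndef SOL_PACKET
#define SOL_PACKET 263          /* fallback for older C libraries */
#endif

/* Hypothetical helper: describe a promiscuous-mode request.
   mr_alen and mr_address stay zero; they matter only for
   PACKET_MR_MULTICAST. */
static void
fill_promisc_req(struct packet_mreq *mr, int ifindex)
{
    memset(mr, 0, sizeof(*mr));
    mr->mr_ifindex = ifindex;
    mr->mr_type = PACKET_MR_PROMISC;
}

/* Hypothetical helper: add or drop the membership on a packet socket.
   Returns the setsockopt(2) result: 0 on success, -1 with errno set. */
static int
set_promisc(int fd, int ifindex, int enable)
{
    struct packet_mreq mr;

    fill_promisc_req(&mr, ifindex);
    return setsockopt(fd, SOL_PACKET,
                      enable ? PACKET_ADD_MEMBERSHIP
                             : PACKET_DROP_MEMBERSHIP,
                      &mr, sizeof(mr));
}
```

+Using the membership options rather than
+.B SIOCSIFFLAGS
+ties the setting to the socket that requested it.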
+.SS Ioctls
+.B SIOCGSTAMP
+can be used to receive the timestamp of the last received packet.
+The argument is a
+.IR "struct timeval" .
+
+In addition all standard ioctls defined in
+.BR netdevice (7)
+and
+.BR socket (7)
+are valid on packet sockets.
+.SS Error Handling
+Packet sockets do no error handling other than for errors that occur
+while passing the packet to the device driver.
+They don't have the concept of a pending error.
+.SH ERRORS
+.TP
+.B EADDRNOTAVAIL
+Unknown multicast group address passed.
+.TP
+.B EFAULT
+User passed invalid memory address.
+.TP
+.B EINVAL
+Invalid argument.
+.TP
+.B EMSGSIZE
+Packet is bigger than interface MTU.
+.TP
+.B ENETDOWN
+Interface is not up.
+.TP
+.B ENOBUFS
+Not enough memory to allocate the packet.
+.TP
+.B ENODEV
+Unknown device name or interface index specified in interface address.
+.TP
+.B ENOENT
+No packet received.
+.TP
+.B ENOTCONN
+No interface address passed.
+.TP
+.B ENXIO
+Interface address contained an invalid interface index.
+.TP
+.B EPERM
+User has insufficient privileges to carry out this operation.
+
+In addition other errors may be generated by the low-level driver.
+.SH VERSIONS
+.B PF_PACKET
+is a new feature in Linux 2.2.
+Earlier Linux versions supported only
+.BR SOCK_PACKET .
+.PP
+The include file
+.I <netpacket/packet.h>
+is present since glibc 2.1.
+Older systems need:
+.sp
+.in +4n
+.nf
+#include <asm/types.h>
+#include <linux/if_packet.h>
+#include <linux/if_ether.h>  /* The L2 protocols */
+.fi
+.in
+.SH NOTES
+For portable programs it is suggested to use
+.B PF_PACKET
+via
+.BR pcap (3);
+although this only covers a subset of the
+.B PF_PACKET
+features.
+
+The
+.B SOCK_DGRAM
+packet sockets make no attempt to create or parse the IEEE 802.2 LLC
+header for an IEEE 802.3 frame.
+When
+.B ETH_P_802_3
+is specified as protocol for sending, the kernel creates the
+802.3 frame and fills out the length field; the user has to supply the LLC
+header to get a fully conforming packet.
+Incoming 802.3 packets are not multiplexed on the DSAP/SSAP protocol
+fields; instead they are supplied to the user as protocol
+.B ETH_P_802_2
+with the LLC header prepended.
+It is thus not possible to bind to
+.BR ETH_P_802_3 ;
+bind to
+.B ETH_P_802_2
+instead and do the protocol multiplex yourself.
+The default for sending is the standard Ethernet DIX
+encapsulation with the protocol filled in.
+
+Packet sockets are not subject to the input or output firewall chains.
+.SS Compatibility
+In Linux 2.0, the only way to get a packet socket was by calling
+.BI "socket(PF_INET, SOCK_PACKET, " protocol )\fR.
+This is still supported but strongly deprecated.
+The main difference between the two methods is that
+.B SOCK_PACKET
+uses the old
+.I struct sockaddr_pkt
+to specify an interface, which doesn't provide physical layer
+independence.
+
+.in +4n
+.nf
+struct sockaddr_pkt {
+    unsigned short spkt_family;
+    unsigned char  spkt_device[14];
+    unsigned short spkt_protocol;
+};
+.fi
+.in
+
+.I spkt_family
+contains
+the device type,
+.I spkt_protocol
+is the IEEE 802.3 protocol type as defined in
+.I <sys/if_ether.h>
+and
+.I spkt_device
+is the device name as a null-terminated string, for example, eth0.
+
+This structure is obsolete and should not be used in new code.
+.SH BUGS
+glibc 2.1 does not have a define for
+.BR SOL_PACKET .
+The suggested workaround is to use:
+.in +4n
+.nf
+
+#ifndef SOL_PACKET
+#define SOL_PACKET 263
+#endif
+
+.fi
+.in
+This is fixed in later glibc versions and also does not occur on
+libc5 systems.
+
+The IEEE 802.2/802.3 LLC handling could be considered as a bug.
+
+Socket filters are not documented.
+
+The
+.B MSG_TRUNC
+.BR recvmsg (2)
+extension is an ugly hack and should be replaced by a control message.
+There is currently no way to get the original destination address of
+packets via
+.BR SOCK_DGRAM .
+.\" .SH CREDITS
+.\" This man page was written by Andi Kleen with help from Matthew Wilcox.
+.\" PF_PACKET in Linux 2.2 was implemented
+.\" by Alexey Kuznetsov, based on code by Alan Cox and others.
+.SH "SEE ALSO"
+.BR socket (2),
+.BR pcap (3),
+.BR capabilities (7),
+.BR ip (7),
+.BR raw (7),
+.BR socket (7)
+
+RFC\ 894 for the standard IP Ethernet encapsulation.
+
+RFC\ 1700 for the IEEE 802.3 IP encapsulation.
+
+The
+.I <linux/if_ether.h>
+include file for physical layer protocols.
--- a/man7/raw.7
+++ b/man7/raw.7
@ -1,8 +1,278 @@
-.TH RAW  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH RAW  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH RAW  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH RAW  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH RAW  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH RAW  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH RAW  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH RAW  7 2008-08-07 "Linux" "Linux Programmer's Manual"
+'\" t
+.\" Don't change the first line, it tells man that we need tbl.
+.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
+.\" Permission is granted to distribute possibly modified copies
+.\" of this page provided the header is included verbatim,
+.\" and in case of nontrivial modification author and date
+.\" of the modification is added to the header.
+.\" $Id: raw.7,v 1.6 1999/06/05 10:32:08 freitag Exp $
+.TH RAW  7 1998-10-02 "Linux" "Linux Programmer's Manual"
+.SH NAME
+raw, SOCK_RAW \- Linux IPv4 raw sockets
+.SH SYNOPSIS
+.B #include <sys/socket.h>
+.br
+.B #include <netinet/in.h>
+.br
+.BI "raw_socket = socket(PF_INET, SOCK_RAW, int " protocol );
+.SH DESCRIPTION
+Raw sockets allow new IPv4 protocols to be implemented in user space.
+A raw socket receives or sends the raw datagram not
+including link level headers.
+
+The IPv4 layer generates an IP header when sending a packet unless the
+.B IP_HDRINCL
+socket option is enabled on the socket.
+When it is enabled, the packet must contain an IP header.
+For receiving, the IP header is always included in the packet.
+
+Only processes with an effective user ID of 0 or the
+.B CAP_NET_RAW
+capability are allowed to open raw sockets.
+
+All packets or errors matching the
+.I protocol
+number specified
+for the raw socket are passed to this socket.
+For a list of the allowed protocols see RFC\ 1700 assigned numbers and
+.BR getprotobyname (3).
+
+A protocol of
+.B IPPROTO_RAW
+implies enabled
+.B IP_HDRINCL
+and is able to send any IP protocol that is specified in the passed
+header.
+Receiving of all IP protocols via
+.B IPPROTO_RAW
+is not possible using raw sockets.
+.RS
+.TS
+tab(:) allbox;
+c s
+l l.
+IP Header fields modified on sending by \fBIP_HDRINCL\fP
+IP Checksum:Always filled in.
+Source Address:Filled in when zero.
+Packet Id:Filled in when zero.
+Total Length:Always filled in.
+.TE
+.RE
+.sp
+.PP
+If
+.B IP_HDRINCL
+is specified and the IP header has a non-zero destination address then
+the destination address of the socket is used to route the packet.
+When
+.B MSG_DONTROUTE
+is specified the destination address should refer to a local interface,
+otherwise a routing table lookup is done anyway but gatewayed routes
+are ignored.
+
+If
+.B IP_HDRINCL
+isn't set then IP header options can be set on raw sockets with
+.BR setsockopt (2);
+see
+.BR ip (7)
+for more information.
+
+In Linux 2.2 all IP header fields and options can be set using
+IP socket options.
+This means raw sockets are usually only needed for new
+protocols or protocols with no user interface (like ICMP).
+
+When a packet is received, it is passed to any raw sockets which have
+been bound to its protocol before it is passed to other protocol handlers
+(e.g., kernel protocol modules).
+.SS Address Format
+Raw sockets use the standard
+.I sockaddr_in
+address structure defined in
+.BR ip (7).
+The
+.I sin_port
+field could be used to specify the IP protocol number,
+but it is ignored for sending in Linux 2.2 and should be always
+set to 0 (see BUGS).
+For incoming packets
+.I sin_port
+is set to the protocol of the packet.
+See the
+.I <netinet/in.h>
+include file for valid IP protocols.
+.SS Socket Options
+Raw socket options can be set with
+.BR setsockopt (2)
+and read with
+.BR getsockopt (2)
+by passing the
+.B IPPROTO_RAW
+.\" Or SOL_RAW on Linux
+family flag.
+.TP
+.B ICMP_FILTER
+Enable a special filter for raw sockets bound to the
+.B IPPROTO_ICMP
+protocol.
+The value has a bit set for each ICMP message type which
+should be filtered out.
+The default is to filter no ICMP messages.
+.PP
+In addition all
+.BR ip (7)
+.B IPPROTO_IP
+socket options valid for datagram sockets are supported.
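+
+The
+.B ICMP_FILTER
+bit mask can be sketched as follows.
+This is an illustration under stated assumptions:
+.I only_type
+is a hypothetical helper name, and the structure and constants are
+taken from
+.IR <linux/icmp.h> .
+A set bit means that the ICMP type with that number is filtered out.

```c
#include <assert.h>
#include <linux/icmp.h>   /* struct icmp_filter, ICMP_FILTER, ICMP_* types */

/* Hypothetical helper: build a filter that drops every ICMP type
   in the range 0..31 except `wanted` (which must also be 0..31). */
static struct icmp_filter
only_type(unsigned int wanted)
{
    struct icmp_filter f;

    f.data = ~(1U << wanted);   /* clear only the bit we want to see */
    return f;
}
```

+The resulting structure would be handed to
+.BR setsockopt (2)
+with the
+.B ICMP_FILTER
+option on a raw socket created for
+.BR IPPROTO_ICMP .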
+.SS Error Handling
+Errors originating from the network are only passed to the user when the
+socket is connected or the
+.B IP_RECVERR
+flag is enabled.
+For connected sockets only
+.B EMSGSIZE
+and
+.B EPROTO
+are passed for compatibility.
+With
+.B IP_RECVERR
+all network errors are saved in the error queue.
+.SH ERRORS
+.TP
+.B EACCES
+User tried to send to a broadcast address without having the
+broadcast flag set on the socket.
+.TP
+.B EFAULT
+An invalid memory address was supplied.
+.TP
+.B EINVAL
+Invalid argument.
+.TP
+.B EMSGSIZE
+Packet too big.
+Either Path MTU Discovery is enabled (the
+.B IP_MTU_DISCOVER
+socket flag) or the packet size exceeds the maximum allowed IPv4
+packet size of 64KB.
+.TP
+.B EOPNOTSUPP
+Invalid flag has been passed to a socket call (like
+.BR MSG_OOB ).
+.TP
+.B EPERM
+The user doesn't have permission to open raw sockets.
+Only processes with an effective user ID of 0 or the
+.B CAP_NET_RAW
+attribute may do that.
+.TP
+.B EPROTO
+An ICMP error has arrived reporting a parameter problem.
+.SH VERSIONS
+.B IP_RECVERR
+and
+.B ICMP_FILTER
+are new in Linux 2.2.
+They are Linux extensions and should not be used in portable programs.
+
+Linux 2.0 enabled some bug-to-bug compatibility with BSD in the
+raw socket code when the
+.B SO_BSDCOMPAT
+socket option was set \(em since Linux 2.2,
+this option no longer has that effect.
+.SH NOTES
+By default raw sockets do path MTU (Maximum Transmission Unit) discovery.
+This means the kernel
+will keep track of the MTU to a specific target IP address and return
+.B EMSGSIZE
+when a raw packet write exceeds it.
+When this happens the application should decrease the packet size.
+Path MTU discovery can also be turned off using the
+.B IP_MTU_DISCOVER
+socket option or the
+.I ip_no_pmtu_disc
+sysctl; see
+.BR ip (7)
+for details.
+When turned off raw sockets will fragment outgoing packets
+that exceed the interface MTU.
+However disabling it is not recommended
+for performance and reliability reasons.
+
+A raw socket can be bound to a specific local address using the
+.BR bind (2)
+call.
+If it isn't bound all packets with the specified IP protocol are received.
+In addition a RAW socket can be bound to a specific network device using
+.BR SO_BINDTODEVICE ;
+see
+.BR socket (7).
+
+An
+.B IPPROTO_RAW
+socket is send only.
+If you really want to receive all IP packets use a
+.BR packet (7)
+socket with the
+.B ETH_P_IP
+protocol.
+Note that packet sockets don't reassemble IP fragments,
+unlike raw sockets.
+
+If you want to receive all ICMP packets for a datagram socket
+it is often better to use
+.B IP_RECVERR
+on that particular socket; see
+.BR ip (7).
+
+Raw sockets may tap all IP protocols in Linux, even
+protocols like ICMP or TCP which have a protocol module in the kernel.
+In this case the packets are passed to both the kernel module and the raw
+socket(s).
+This should not be relied upon in portable programs; many other BSD
+socket implementations have limitations here.
+
+Linux never changes headers passed from the user (except for filling
+in some zeroed fields as described for
+.BR IP_HDRINCL ).
+This differs from many other implementations of raw sockets.
+
+RAW sockets are generally rather unportable and should be avoided in
+programs intended to be portable.
+
+Sending on raw sockets should take the IP protocol from
+.IR sin_port ;
+this ability was lost in Linux 2.2.
+The workaround is to use
+.BR IP_HDRINCL .
+.SH BUGS
+Transparent proxy extensions are not described.
+
+When the
+.B IP_HDRINCL
+option is set datagrams will not be fragmented and are limited to
+the interface MTU.
+
+Setting the IP protocol for sending in
+.I sin_port
+got lost in Linux 2.2.
+The protocol that the socket was bound to or that
+was specified in the initial
+.BR socket (2)
+call is always used.
+.\" .SH AUTHORS
+.\" This man page was written by Andi Kleen.
+.SH "SEE ALSO"
+.BR recvmsg (2),
+.BR sendmsg (2),
+.BR capabilities (7),
+.BR ip (7),
+.BR socket (7)
+
+.B RFC\ 1191
+for path MTU discovery.
+
+.B RFC\ 791
+and the
+.I <linux/ip.h>
+include file for the IP protocol.
--- a/man7/rtnetlink.7
+++ b/man7/rtnetlink.7
@ -1,8 +1,449 @@
+'\" t
+.\" Don't remove the line above, it tells man that tbl is needed.
+.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
+.\" Permission is granted to distribute possibly modified copies
+.\" of this page provided the header is included verbatim,
+.\" and in case of nontrivial modification author and date
+.\" of the modification is added to the header.
+.\" Based on the original comments from Alexey Kuznetsov, written with
+.\" help from Matthew Wilcox.
+.\" $Id: rtnetlink.7,v 1.8 2000/01/22 01:55:04 freitag Exp $
 .TH RTNETLINK  7 1999-04-30 "Linux" "Linux Programmer's Manual"
-.TH RTNETLINK  7 1999-04-30 "Linux" "Linux Programmer's Manual"
-.TH RTNETLINK  7 1999-04-30 "Linux" "Linux Programmer's Manual"
-.TH RTNETLINK  7 1999-04-30 "Linux" "Linux Programmer's Manual"
-.TH RTNETLINK  7 1999-04-30 "Linux" "Linux Programmer's Manual"
-.TH RTNETLINK  7 1999-04-30 "Linux" "Linux Programmer's Manual"
-.TH RTNETLINK  7 1999-04-30 "Linux" "Linux Programmer's Manual"
-.TH RTNETLINK  7 1999-04-30 "Linux" "Linux Programmer's Manual"
+.SH NAME
+rtnetlink, NETLINK_ROUTE \- Linux IPv4 routing socket
+.SH SYNOPSIS
+.B #include <asm/types.h>
+.br
+.B #include <linux/netlink.h>
+.br
+.B #include <linux/rtnetlink.h>
+.br
+.B #include <sys/socket.h>
+.sp
+.BI "rtnetlink_socket = socket(PF_NETLINK, int " socket_type ", NETLINK_ROUTE);"
+.SH DESCRIPTION
+Rtnetlink allows the kernel's routing tables to be read and altered.
+It is used within the kernel to communicate between
+various subsystems, though this usage is not documented here, and for
+communication with user-space programs.
+Network routes, IP addresses, link parameters, neighbor setups, queueing
+disciplines, traffic classes and packet classifiers may all be controlled
+through
+.B NETLINK_ROUTE
+sockets.
+It is based on netlink messages, see
+.BR netlink (7)
+for more information.
+.\" FIXME ? all these macros could be moved to rtnetlink(3)
+.SS "Routing Attributes"
+Some rtnetlink messages have optional attributes after the initial header:
+
+.in +4n
+.nf
+struct rtattr {
+    unsigned short rta_len;    /* Length of option */
+    unsigned short rta_type;   /* Type of option */
+    /* Data follows */
+};
+.fi
+.in
+
+These attributes should be manipulated only using the RTA_* macros
+or libnetlink, see
+.BR rtnetlink (3).
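+
+The following minimal sketch shows the RTA_* macros at work on a
+plain buffer.
+The helper names
+.I put_attr
+and
+.I find_attr
+are hypothetical; in a real message the attributes would follow the
+message-specific header.

```c
#include <assert.h>
#include <string.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: append one attribute at buf + off.
   Returns the aligned offset at which the next attribute starts. */
static size_t
put_attr(char *buf, size_t off, unsigned short type,
         const void *data, unsigned short len)
{
    struct rtattr *rta = (struct rtattr *) (buf + off);

    rta->rta_type = type;
    rta->rta_len = RTA_LENGTH(len);     /* header plus payload */
    memcpy(RTA_DATA(rta), data, len);
    return off + RTA_ALIGN(rta->rta_len);
}

/* Hypothetical helper: return the first attribute of the given type,
   or NULL if the buffer holds none. */
static struct rtattr *
find_attr(char *buf, int len, unsigned short type)
{
    struct rtattr *rta;

    for (rta = (struct rtattr *) buf; RTA_OK(rta, len);
         rta = RTA_NEXT(rta, len))
        if (rta->rta_type == type)
            return rta;
    return NULL;
}
```

+Going through
+.B RTA_LENGTH
+and
+.B RTA_ALIGN
+instead of hand-computed offsets keeps the 4-byte attribute padding
+consistent between writer and reader.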
+.SS Messages
+Rtnetlink consists of these message types
+(in addition to standard netlink messages):
+.TP
+.BR RTM_NEWLINK ", " RTM_DELLINK ", " RTM_GETLINK
+Create, remove or get information about a specific network interface.
+These messages contain an
+.I ifinfomsg
+structure followed by a series of
+.I rtattr
+structures.
+
+.nf
+struct ifinfomsg {
+    unsigned char  ifi_family; /* AF_UNSPEC */
+    unsigned short ifi_type;   /* Device type */
+    int            ifi_index;  /* Interface index */
+    unsigned int   ifi_flags;  /* Device flags  */
+    unsigned int   ifi_change; /* change mask */
+};
+.fi
+
+.\" FIXME ifi_type
+.I ifi_flags
+contains the device flags, see
+.BR netdevice (7);
+.I ifi_index
+is the unique interface index,
+.I ifi_change
+is reserved for future use and should always be set to 0xFFFFFFFF.
+.TS
+tab(:);
+c
+l l l.
+Routing attributes
+rta_type:value type:description
+_
+IFLA_UNSPEC:-:unspecified.
+IFLA_ADDRESS:hardware address:interface L2 address
+IFLA_BROADCAST:hardware address:L2 broadcast address.
+IFLA_IFNAME:asciiz string:Device name.
+IFLA_MTU:unsigned int:MTU of the device.
+IFLA_LINK:int:Link type.
+IFLA_QDISC:asciiz string:Queueing discipline.
+IFLA_STATS:T{
+see below
+T}:Interface Statistics.
+.TE
+.sp
+The value type for IFLA_STATS is \fIstruct net_device_stats\fP.
+.TP
+.BR RTM_NEWADDR ", " RTM_DELADDR ", " RTM_GETADDR
+Add, remove or receive information about an IP address associated with
+an interface.
+In Linux 2.2 an interface can carry multiple IP addresses;
+this replaces the alias device concept in 2.0.
+In Linux 2.2 these messages
+support IPv4 and IPv6 addresses.
+They contain an
+.I ifaddrmsg
+structure, optionally followed by
+.I rtattr
+routing attributes.
+
+.nf
+struct ifaddrmsg {
+    unsigned char ifa_family;    /* Address type */
+    unsigned char ifa_prefixlen; /* Prefix length of address */
+    unsigned char ifa_flags;     /* Address flags */
+    unsigned char ifa_scope;     /* Address scope */
+    int           ifa_index;     /* Interface index */
+};
+.fi
+
+.I ifa_family
+is the address family type (currently
+.B AF_INET
+or
+.BR AF_INET6 ),
+.I ifa_prefixlen
+is the length of the address mask of the address if defined for the
+family (like for IPv4),
+.I ifa_scope
+is the address scope,
+.I ifa_index
+is the interface index of the interface the address is associated with.
+.I ifa_flags
+is a flag word of
+.B IFA_F_SECONDARY
+for secondary address (old alias interface),
+.B IFA_F_PERMANENT
+for a permanent address set by the user and other undocumented flags.
+.TS
+tab(:);
+c
+l l l.
+Attributes
+rta_type:value type:description
+_
+IFA_UNSPEC:-:unspecified.
+IFA_ADDRESS:raw protocol address:interface address
+IFA_LOCAL:raw protocol address:local address
+IFA_LABEL:asciiz string:name of the interface
+IFA_BROADCAST:raw protocol address:broadcast address.
+IFA_ANYCAST:raw protocol address:anycast address
+IFA_CACHEINFO:struct ifa_cacheinfo:Address information.
+.TE
+.\" FIXME struct ifa_cacheinfo
+.TP
+.BR RTM_NEWROUTE ", " RTM_DELROUTE ", " RTM_GETROUTE
+Create, remove or receive information about a network route.
+These messages contain an
+.I rtmsg
+structure with an optional sequence of
+.I rtattr
+structures following.
+For
+.B RTM_GETROUTE
+setting
+.I rtm_dst_len
+and
+.I rtm_src_len
+to 0 means you get all entries for the specified routing table.
+For the other fields except
+.I rtm_table
+and
+.I rtm_protocol
+0 is the wildcard.
+
+.nf
+struct rtmsg {
+    unsigned char rtm_family;   /* Address family of route */
+    unsigned char rtm_dst_len;  /* Length of destination */
+    unsigned char rtm_src_len;  /* Length of source */
+    unsigned char rtm_tos;      /* TOS filter */
+
+    unsigned char rtm_table;    /* Routing table ID */
+    unsigned char rtm_protocol; /* Routing protocol; see below */
+    unsigned char rtm_scope;    /* See below */
+    unsigned char rtm_type;     /* See below */
+
+    unsigned int  rtm_flags;
+};
+.fi
+.TS
+tab(:);
+l l.
+rtm_type:Route type
+_
+RTN_UNSPEC:unknown route
+RTN_UNICAST:a gateway or direct route
+RTN_LOCAL:a local interface route
+RTN_BROADCAST:T{
+a local broadcast route (sent as a broadcast)
+T}
+RTN_ANYCAST:T{
+a local broadcast route (sent as a unicast)
+T}
+RTN_MULTICAST:a multicast route
+RTN_BLACKHOLE:a packet dropping route
+RTN_UNREACHABLE:an unreachable destination
+RTN_PROHIBIT:a packet rejection route
+RTN_THROW:continue routing lookup in another table
+RTN_NAT:a network address translation rule
+RTN_XRESOLVE:T{
+refer to an external resolver (not implemented)
+T}
+.TE
+.TS
+tab(:);
+l l.
+rtm_protocol:Route origin.
+_
+RTPROT_UNSPEC:unknown
+RTPROT_REDIRECT:T{
+by an ICMP redirect (currently unused)
+T}
+RTPROT_KERNEL:by the kernel
+RTPROT_BOOT:during boot
+RTPROT_STATIC:by the administrator
+.TE
+
+Values larger than
+.B RTPROT_STATIC
+are not interpreted by the kernel; they are just for user information.
+They may be used to tag the source of routing information or to
+distinguish between multiple routing daemons.
+See
+.I <linux/rtnetlink.h>
+for the routing daemon identifiers which are already assigned.
+
+.I rtm_scope
+is the distance to the destination:
+.TS
+tab(:);
+l l.
+RT_SCOPE_UNIVERSE:global route
+RT_SCOPE_SITE:T{
+interior route in the local autonomous system
+T}
+RT_SCOPE_LINK:route on this link
+RT_SCOPE_HOST:route on the local host
+RT_SCOPE_NOWHERE:destination doesn't exist
+.TE
+
+The values between
+.B RT_SCOPE_UNIVERSE
+and
+.B RT_SCOPE_SITE
+are available to the user.
+
+The
+.I rtm_flags
+have the following meanings:
+.TS
+tab(:);
+l l.
+RTM_F_NOTIFY:T{
+if the route changes, notify the user via rtnetlink
+T}
+RTM_F_CLONED:route is cloned from another route
+RTM_F_EQUALIZE:a multipath equalizer (not yet implemented)
+.TE
+
+.I rtm_table
+specifies the routing table
+.TS
+tab(:);
+l l.
+RT_TABLE_UNSPEC:an unspecified routing table
+RT_TABLE_DEFAULT:the default table
+RT_TABLE_MAIN:the main table
+RT_TABLE_LOCAL:the local table
+.TE
+
+The user may assign arbitrary values between
+.B RT_TABLE_UNSPEC
+and
+.BR RT_TABLE_DEFAULT .
+.TS
+tab(:);
+c
+l l l.
+Attributes
+rta_type:value type:description
+_
+RTA_UNSPEC:-:ignored.
+RTA_DST:protocol address:Route destination address.
+RTA_SRC:protocol address:Route source address.
+RTA_IIF:int:Input interface index.
+RTA_OIF:int:Output interface index.
+RTA_GATEWAY:protocol address:The gateway of the route
+RTA_PRIORITY:int:Priority of route.
+RTA_PREFSRC::
+RTA_METRICS:int:Route metric
+RTA_MULTIPATH::
+RTA_PROTOINFO::
+RTA_FLOW::
+RTA_CACHEINFO::
+.TE
+
+.B Fill these values in!
+.TP
+.BR RTM_NEWNEIGH ", " RTM_DELNEIGH  ", " RTM_GETNEIGH
+Add, remove or receive information about a neighbor table
+entry (e.g., an ARP entry).
+The message contains an
+.I ndmsg
+structure.
+
+.nf
+struct ndmsg {
+    unsigned char ndm_family;
+    int           ndm_ifindex;  /* Interface index */
+    __u16         ndm_state;    /* State */
+    __u8          ndm_flags;    /* Flags */
+    __u8          ndm_type;
+};
+
+struct nda_cacheinfo {
+    __u32         ndm_confirmed;
+    __u32         ndm_used;
+    __u32         ndm_updated;
+    __u32         ndm_refcnt;
+};
+.fi
+
+.I ndm_state
+is a bit mask of the following states:
+.TS
+tab(:);
+l l.
+NUD_INCOMPLETE:a currently resolving cache entry
+NUD_REACHABLE:a confirmed working cache entry
+NUD_STALE:an expired cache entry
+NUD_DELAY:an entry waiting for a timer
+NUD_PROBE:a cache entry that is currently reprobed
+NUD_FAILED:an invalid cache entry
+NUD_NOARP:a device with no destination cache
+NUD_PERMANENT:a static entry
+.TE
+
+Valid
+.I ndm_flags
+are:
+.TS
+tab(:);
+l l.
+NTF_PROXY:a proxy arp entry
+NTF_ROUTER:an IPv6 router
+.TE
+
+.\" FIXME
+.\" document the members of the struct better
+The
+.I rtattr
+struct has the following meanings for the
+.I rta_type
+field:
+.TS
+tab(:);
+l l.
+NDA_UNSPEC:unknown type
+NDA_DST:a neighbor cache n/w layer destination address
+NDA_LLADDR:a neighbor cache link layer address
+NDA_CACHEINFO:cache statistics.
+.TE
+
+If the
+.I rta_type
+field is
+.B NDA_CACHEINFO
+then a
+.I struct nda_cacheinfo
+header follows.
+.TP
+.BR RTM_NEWRULE ", " RTM_DELRULE ", " RTM_GETRULE
+Add, delete or retrieve a routing rule.
+Carries a
+.IR "struct rtmsg" .
+.TP
+.BR RTM_NEWQDISC ", " RTM_DELQDISC ", " RTM_GETQDISC
+Add, remove or get a queueing discipline.
+The message contains a
+.I struct tcmsg
+and may be followed by a series of
+attributes.
+
+.nf
+struct tcmsg {
+    unsigned char    tcm_family;
+    int              tcm_ifindex;   /* interface index */
+    __u32            tcm_handle;    /* Qdisc handle */
+    __u32            tcm_parent;    /* Parent qdisc */
+    __u32            tcm_info;
+};
+.fi
+.TS
+tab(:);
+c
+l l l.
+Attributes
+rta_type:value type:Description
+_
+TCA_UNSPEC:-:unspecified
+TCA_KIND:asciiz string:Name of queueing discipline
+TCA_OPTIONS:byte sequence:Qdisc-specific options follow
+TCA_STATS:struct tc_stats:Qdisc statistics.
+TCA_XSTATS:qdisc specific:Module-specific statistics.
+TCA_RATE:struct tc_estimator:Rate limit.
+.TE
+
+In addition various other qdisc module specific attributes are allowed.
+For more information see the appropriate include files.
+.TP
+.BR RTM_NEWTCLASS ", " RTM_DELTCLASS ", " RTM_GETTCLASS
+Add, remove or get a traffic class.
+These messages contain a
+.I struct tcmsg
+as described above.
+.TP
+.BR RTM_NEWTFILTER ", " RTM_DELTFILTER ", " RTM_GETTFILTER
+Add, remove or receive information about a traffic filter.
+These messages contain a
+.I struct tcmsg
+as described above.
+.SH VERSIONS
+.B rtnetlink
+is a new feature of Linux 2.2.
+.SH BUGS
+This manual page is incomplete.
+.SH "SEE ALSO"
+.BR cmsg (3),
+.BR rtnetlink (3),
+.BR ip (7),
+.BR netlink (7)
--- a/man7/socket.7
+++ b/man7/socket.7
@ -1,8 +1,728 @@
-.TH SOCKET 7 2008-08-07 Linux "Linux Programmer's Manual"
-.TH SOCKET 7 2008-08-07 Linux "Linux Programmer's Manual"
-.TH SOCKET 7 2008-08-07 Linux "Linux Programmer's Manual"
-.TH SOCKET 7 2008-08-07 Linux "Linux Programmer's Manual"
-.TH SOCKET 7 2008-08-07 Linux "Linux Programmer's Manual"
-.TH SOCKET 7 2008-08-07 Linux "Linux Programmer's Manual"
-.TH SOCKET 7 2008-08-07 Linux "Linux Programmer's Manual"
-.TH SOCKET 7 2008-08-07 Linux "Linux Programmer's Manual"
+'\" t
+.\" Don't change the first line, it tells man that we need tbl.
+.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
+.\" and copyright (c) 1999 Matthew Wilcox.
+.\" Permission is granted to distribute possibly modified copies
+.\" of this page provided the header is included verbatim,
+.\" and in case of nontrivial modification author and date
+.\" of the modification is added to the header.
+.\"
+.\" 2002-10-30, Michael Kerrisk, <mtk.manpages@gmail.com>
+.\"	Added description of SO_ACCEPTCONN
+.\" 2004-05-20, aeb, added SO_RCVTIMEO/SO_SNDTIMEO text.
+.\" Modified, 27 May 2004, Michael Kerrisk <mtk.manpages@gmail.com>
+.\"     Added notes on capability requirements
+.\"	A few small grammar fixes
+.\"
+.\" FIXME probably all PF_* should be AF_* in this page, since
+.\"        POSIX only specifies the latter values.
+.\"
+.TH SOCKET 7 2007-12-28 Linux "Linux Programmer's Manual"
+.SH NAME
+socket \- Linux socket interface
+.SH SYNOPSIS
+.B #include <sys/socket.h>
+.sp
+.IB mysocket " = socket(int " socket_family ", int " socket_type ", int " protocol );
+.SH DESCRIPTION
+This manual page describes the Linux networking socket layer user
+interface.
+The BSD compatible sockets
+are the uniform interface
+between the user process and the network protocol stacks in the kernel.
+The protocol modules are grouped into
+.I protocol families
+like
+.BR PF_INET ", " PF_IPX ", " PF_PACKET
+and
+.I socket types
+like
+.B SOCK_STREAM
+or
+.BR SOCK_DGRAM .
+See
+.BR socket (2)
+for more information on families and types.
+.SS Socket Layer Functions
+These functions are used by the user process to send or receive packets
+and to do other socket operations.
+For more information see their respective manual pages.
+
+.BR socket (2)
+creates a socket,
+.BR connect (2)
+connects a socket to a remote socket address,
+the
+.BR bind (2)
+function binds a socket to a local socket address,
+.BR listen (2)
+tells the socket that new connections shall be accepted, and
+.BR accept (2)
+is used to get a new socket with a new incoming connection.
+.BR socketpair (2)
+returns two connected anonymous sockets (only implemented for a few
+local families like
+.BR PF_UNIX ).
+.PP
+.BR send (2),
+.BR sendto (2),
+and
+.BR sendmsg (2)
+send data over a socket, and
+.BR recv (2),
+.BR recvfrom (2),
+.RB "and " recvmsg (2)
+receive data from a socket.
+.BR poll (2)
+and
+.BR select (2)
+wait for arriving data or a readiness to send data.
+In addition, the standard I/O operations like
+.BR write (2),
+.BR writev (2),
+.BR sendfile (2),
+.BR read (2),
+and
+.BR readv (2)
+can be used to read and write data.
+.PP
+.BR getsockname (2)
+returns the local socket address and
+.BR getpeername (2)
+returns the remote socket address.
+.BR getsockopt (2)
+and
+.BR setsockopt (2)
+are used to get or set socket layer or protocol options.
+.BR ioctl (2)
+can be used to set or read some other options.
+.PP
+.BR close (2)
+is used to close a socket.
+.BR shutdown (2)
+closes parts of a full-duplex socket connection.
+.PP
+Seeking, or calling
+.BR pread (2)
+or
+.BR pwrite (2)
+with a non-zero position is not supported on sockets.
+.PP
+It is possible to do non-blocking I/O on sockets by setting the
+.B O_NONBLOCK
+flag on a socket file descriptor using
+.BR fcntl (2).
+Then all operations that would block will (usually)
+return with
+.B EAGAIN
+(operation should be retried later);
+.BR connect (2)
+will return the
+.B EINPROGRESS
+error.
+The user can then wait for various events via
+.BR poll (2)
+or
+.BR select (2).
+.TS
+tab(:) allbox;
+c s s
+l l l.
+I/O events
+Event:Poll flag:Occurrence
+Read:POLLIN:T{
+New data arrived.
+T}
+Read:POLLIN:T{
+A connection setup has been completed
+(for connection-oriented sockets)
+T}
+Read:POLLHUP:T{
+A disconnection request has been initiated by the other end.
+T}
+Read:POLLHUP:T{
+A connection is broken (only for connection-oriented protocols).
+When the socket is written to,
+.B SIGPIPE
+is also sent.
+T}
+Write:POLLOUT:T{
+Socket has enough send buffer space for writing new data.
+T}
+Read/Write:T{
+POLLIN|
+.br
+POLLOUT
+T}:T{
+An outgoing
+.BR connect (2)
+finished.
+T}
+Read/Write:POLLERR:An asynchronous error occurred.
+Read/Write:POLLHUP:The other end has shut down one direction.
+Exception:POLLPRI:T{
+Urgent data arrived.
+.B SIGURG
+is sent then.
+T}
+.\" FIXME . The following is not true currently:
+.\" It is no I/O event when the connection
+.\" is broken from the local end using
+.\" .BR shutdown (2)
+.\" or
+.\" .BR close (2).
+.TE
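+.PP
+The non-blocking
+.BR connect (2)
+sequence described above can be sketched as follows.
+This is a minimal illustration under stated assumptions, not a
+definitive implementation:
+.I connect_nonblock
+is a hypothetical helper name, and a robust program would also restart
+.BR poll (2)
+when it fails with
+.BR EINTR .
+.in +4n
+.nf
+#include <errno.h>
+#include <fcntl.h>
+#include <poll.h>
+#include <sys/socket.h>
+
+/* Hypothetical helper: returns 0 on success, \-1 with errno set */
+int
+connect_nonblock(int fd, const struct sockaddr *addr, socklen_t len,
+                 int timeout_ms)
+{
+    struct pollfd pfd;
+    int flags, err, n;
+    socklen_t errlen = sizeof(err);
+
+    flags = fcntl(fd, F_GETFL, 0);
+    if (flags == \-1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == \-1)
+        return \-1;
+
+    if (connect(fd, addr, len) == 0)
+        return 0;               /* connected immediately */
+    if (errno != EINPROGRESS)
+        return \-1;
+
+    pfd.fd = fd;
+    pfd.events = POLLOUT;       /* writable once connect finishes */
+    n = poll(&pfd, 1, timeout_ms);
+    if (n == 0)
+        errno = ETIMEDOUT;
+    if (n <= 0)
+        return \-1;
+
+    /* SO_ERROR fetches and clears the pending socket error */
+    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) == \-1)
+        return \-1;
+    if (err != 0) {
+        errno = err;
+        return \-1;
+    }
+    return 0;
+}
+.fi
+.in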
+
+.PP
+An alternative to
+.BR poll (2)
+and
+.BR select (2)
+is to let the kernel inform the application about events
+via a
+.B SIGIO
+signal.
+For that the
+.B O_ASYNC
+flag must be set on a socket file descriptor via
+.BR fcntl (2)
+and a valid signal handler for
+.B SIGIO
+must be installed via
+.BR sigaction (2).
+See the
+.I Signals
+discussion below.
+.SS Socket Options
+These socket options can be set by using
+.BR setsockopt (2)
+and read with
+.BR getsockopt (2)
+with the socket level set to
+.B SOL_SOCKET
+for all sockets:
+.\" SO_ACCEPTCONN is in POSIX.1-2001, and its origin is explained in
+.\" W R Stevens, UNPv1
+.TP
+.B SO_ACCEPTCONN
+Returns a value indicating whether or not this socket has been marked
+to accept connections with
+.BR listen (2).
+The value 0 indicates that this is not a listening socket;
+the value 1 indicates that this is a listening socket.
+Can only be read
+with
+.BR getsockopt (2).
+.TP
+.B SO_BINDTODEVICE
+Bind this socket to a particular device like \(lqeth0\(rq,
+as specified in the passed interface name.
+If the
+name is an empty string or the option length is zero, the socket device
+binding is removed.
+The passed option is a variable-length null-terminated
+interface name string with the maximum size of
+.BR IFNAMSIZ .
+If a socket is bound to an interface,
+only packets received from that particular interface are processed by the
+socket.
+Note that this only works for some socket types, particularly
+.B AF_INET
+sockets.
+It is not supported for packet sockets (use normal
+.BR bind (2)
+there).
+.TP
+.B SO_BROADCAST
+Set or get the broadcast flag.
+When enabled, datagram sockets
+receive packets sent to a broadcast address and they are allowed to send
+packets to a broadcast address.
+This option has no effect on stream-oriented sockets.
+.TP
+.B SO_BSDCOMPAT
+Enable BSD bug-to-bug compatibility.
+This is used by the UDP protocol module in Linux 2.0 and 2.2.
+If enabled, ICMP errors received for a UDP socket will not be passed
+to the user program.
+In later kernel versions, support for this option has been phased out:
+Linux 2.4 silently ignores it, and Linux 2.6 generates a kernel warning
+(printk()) if a program uses this option.
+Linux 2.0 also enabled BSD bug-to-bug compatibility
+options (random header changing, skipping of the broadcast flag) for raw
+sockets with this option, but that was removed in Linux 2.2.
+.TP
+.B SO_DEBUG
+Enable socket debugging.
+Only allowed for processes with the
+.B CAP_NET_ADMIN
+capability or an effective user ID of 0.
+.TP
+.B SO_ERROR
+Get and clear the pending socket error.
+Only valid as a
+.BR getsockopt (2).
+Expects an integer.
+.TP
+.B SO_DONTROUTE
+Don't send via a gateway, only send to directly connected hosts.
+The same effect can be achieved by setting the
+.B MSG_DONTROUTE
+flag on a socket
+.BR send (2)
+operation.
+Expects an integer boolean flag.
+.TP
+.B SO_KEEPALIVE
+Enable sending of keep-alive messages on connection-oriented sockets.
+Expects an integer boolean flag.
+.TP
+.B SO_LINGER
+Sets or gets the
+.B SO_LINGER
+option.
+The argument is a
+.I linger
+structure.
+.sp
+.in +4n
+.nf
+struct linger {
+    int l_onoff;    /* linger active */
+    int l_linger;   /* how many seconds to linger for */
+};
+.fi
+.in
+.IP
+When enabled, a
+.BR close (2)
+or
+.BR shutdown (2)
+will not return until all queued messages for the socket have been
+successfully sent or the linger timeout has been reached.
+Otherwise,
+the call returns immediately and the closing is done in the background.
+When the socket is closed as part of
+.BR exit (2),
+it always lingers in the background.
+.TP
+.B SO_OOBINLINE
+If this option is enabled,
+out-of-band data is directly placed into the receive data stream.
+Otherwise out-of-band data is only passed when the
+.B MSG_OOB
+flag is set during receiving.
+.\" don't document it because it can do too much harm.
+.\".B SO_NO_CHECK
+.TP
+.B SO_PASSCRED
+Enable or disable the receiving of the
+.B SCM_CREDENTIALS
+control message.
+For more information see
+.BR unix (7).
+.\" FIXME Document SO_PASSSEC, added in 2.6.18; there is some info
+.\" in the 2.6.18 ChangeLog
+.TP
+.B SO_PEERCRED
+Return the credentials of the foreign process connected to this socket.
+This is only possible for connected
+.B PF_UNIX
+stream sockets and
+.B PF_UNIX
+stream and datagram socket pairs created using
+.BR socketpair (2);
+see
+.BR unix (7).
+The returned credentials are those that were in effect at the time
+of the call to
+.BR connect (2)
+or
+.BR socketpair (2).
+Argument is a
+.I ucred
+structure.
+Only valid as a
+.BR getsockopt (2).
+.TP
+.B SO_PRIORITY
+Set the protocol-defined priority for all packets to be sent on
+this socket.
+Linux uses this value to order the networking queues:
+packets with a higher priority may be processed first depending
+on the selected device queueing discipline.
+For
+.BR ip (7),
+this also sets the IP type-of-service (TOS) field for outgoing packets.
+Setting a priority outside the range 0 to 6 requires the
+.B CAP_NET_ADMIN
+capability.
+.TP
+.B SO_RCVBUF
+Sets or gets the maximum socket receive buffer in bytes.
+The kernel doubles this value (to allow space for bookkeeping overhead)
+when it is set using
+.\" Most (all?) other implementations do not do this -- MTK, Dec 05
+.BR setsockopt (2),
+and this doubled value is returned by
+.BR getsockopt (2).
+The default value is set by the
+.I rmem_default
+sysctl and the maximum allowed value is set by the
+.I rmem_max
+sysctl.
+The minimum (doubled) value for this option is 256.
+.TP
+.BR SO_RCVBUFFORCE " (since Linux 2.6.14)"
+Using this socket option, a privileged
+.RB ( CAP_NET_ADMIN )
+process can perform the same task as
+.BR SO_RCVBUF ,
+but the
+.I rmem_max
+limit can be overridden.
+.TP
+.BR SO_RCVLOWAT " and " SO_SNDLOWAT
+Specify the minimum number of bytes in the buffer until the socket layer
+will pass the data to the protocol
+.RB ( SO_SNDLOWAT )
+or the user on receiving
+.RB ( SO_RCVLOWAT ).
+These two values are initialized to 1.
+.B SO_SNDLOWAT
+is not changeable on Linux
+.RB ( setsockopt (2)
+fails with the error
+.BR ENOPROTOOPT ).
+.B SO_RCVLOWAT
+is changeable
+only since Linux 2.4.
+The
+.BR select (2)
+and
+.BR poll (2)
+system calls currently do not respect the
+.B SO_RCVLOWAT
+setting on Linux,
+and mark a socket readable when even a single byte of data is available.
+A subsequent read from the socket will block until
+.B SO_RCVLOWAT
+bytes are available.
+.\" See http://marc.theaimsgroup.com/?l=linux-kernel&m=111049368106984&w=2
+.\" Tested on kernel 2.6.14 -- mtk, 30 Nov 05
+.TP
+.BR SO_RCVTIMEO " and " SO_SNDTIMEO
+.\" Not implemented in 2.0.
+.\" Implemented in 2.1.11 for getsockopt: always return a zero struct.
+.\" Implemented in 2.3.41 for setsockopt, and actually used.
+Specify the receiving or sending timeouts until reporting an error.
+The argument is a
+.IR "struct timeval" .
+If an input or output function blocks for this period of time, and
+data has been sent or received, the return value of that function
+will be the amount of data transferred; if no data has been transferred
+and the timeout has been reached then \-1 is returned with
+.I errno
+set to
+.B EAGAIN
+or
+.B EWOULDBLOCK
+.\" in fact to EAGAIN
+just as if the socket was specified to be non-blocking.
+If the timeout is set to zero (the default)
+then the operation will never time out.
+Timeouts only have effect for system calls that perform socket I/O (e.g.,
+.BR read (2),
+.BR recvmsg (2),
+.BR send (2),
+.BR sendmsg (2));
+timeouts have no effect for
+.BR select (2),
+.BR poll (2),
+.BR epoll_wait (2),
+etc.
+.TP
+.B SO_REUSEADDR
+Indicates that the rules used in validating addresses supplied in a
+.BR bind (2)
+call should allow reuse of local addresses.
+For
+.B PF_INET
+sockets this
+means that a socket may bind, except when there
+is an active listening socket bound to the address.
+When the listening socket is bound to
+.B INADDR_ANY
+with a specific port then it is not possible
+to bind to this port for any local address.
+Argument is an integer boolean flag.
+.TP
+.B SO_SNDBUF
+Sets or gets the maximum socket send buffer in bytes.
+The kernel doubles this value (to allow space for bookkeeping overhead)
+when it is set using
+.\" Most (all?) other implementations do not do this -- MTK, Dec 05
+.BR setsockopt (2),
+and this doubled value is returned by
+.BR getsockopt (2).
+The default value is set by the
+.I wmem_default
+sysctl and the maximum allowed value is set by the
+.I wmem_max
+sysctl.
+The minimum (doubled) value for this option is 2048.
+.TP
+.BR SO_SNDBUFFORCE " (since Linux 2.6.14)"
+Using this socket option, a privileged
+.RB ( CAP_NET_ADMIN )
+process can perform the same task as
+.BR SO_SNDBUF ,
+but the
+.I wmem_max
+limit can be overridden.
+.TP
+.B SO_TIMESTAMP
+Enable or disable the receiving of the
+.B SO_TIMESTAMP
+control message.
+The timestamp control message is sent with level
+.B SOL_SOCKET
+and the
+.I cmsg_data
+field is a
+.I "struct timeval"
+indicating the
+reception time of the last packet passed to the user in this call.
+See
+.BR cmsg (3)
+for details on control messages.
+.TP
+.B SO_TYPE
+Gets the socket type as an integer (like
+.BR SOCK_STREAM ).
+Can only be read
+with
+.BR getsockopt (2).
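+.PP
+As an illustration of the options above, the following sketch sets
+.B SO_RCVBUF
+and
+.BR SO_RCVTIMEO .
+It is a minimal example only:
+.I set_rcvbuf
+and
+.I set_rcv_timeout
+are hypothetical helper names, and the value reported back by
+.BR getsockopt (2)
+depends on the doubling and the
+.I rmem_max
+limit described under
+.BR SO_RCVBUF .
+.in +4n
+.nf
+#include <sys/socket.h>
+#include <sys/time.h>
+
+/* Hypothetical helper: request a receive buffer of req bytes and
+   return the size the kernel reports afterwards, or \-1 on error */
+int
+set_rcvbuf(int fd, int req)
+{
+    int got = 0;
+    socklen_t len = sizeof(got);
+
+    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &req, sizeof(req)) == \-1)
+        return \-1;
+    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len) == \-1)
+        return \-1;
+    return got;     /* on Linux this is the doubled value */
+}
+
+/* Hypothetical helper: make blocking receives fail with EAGAIN
+   after the given time; a zero timeout means wait forever */
+int
+set_rcv_timeout(int fd, long sec, long usec)
+{
+    struct timeval tv;
+
+    tv.tv_sec = sec;
+    tv.tv_usec = usec;
+    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
+}
+.fi
+.in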
+.SS Signals
+When writing onto a connection-oriented socket that has been shut down
+(by the local or the remote end)
+.B SIGPIPE
+is sent to the writing process and
+.B EPIPE
+is returned.
+The signal is not sent when the write call
+specified the
+.B MSG_NOSIGNAL
+flag.
+.PP
+When requested with the
+.B FIOSETOWN
+.BR fcntl (2)
+or
+.B SIOCSPGRP
+.BR ioctl (2),
+.B SIGIO
+is sent when an I/O event occurs.
+It is possible to use
+.BR poll (2)
+or
+.BR select (2)
+in the signal handler to find out which socket the event occurred on.
+An alternative (in Linux 2.2) is to set a real-time signal using the
+.B F_SETSIG
+.BR fcntl (2);
+the handler of the real-time signal will be called with
+the file descriptor in the
+.I si_fd
+field of its
+.IR siginfo_t .
+See
+.BR fcntl (2)
+for more information.
+.PP
+Under some circumstances (e.g., multiple processes accessing a
+single socket), the condition that caused the
+.B SIGIO
+may have already disappeared when the process reacts to the signal.
+If this happens, the process should wait again because Linux
+will resend the signal later.
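+.PP
+A minimal sketch of requesting
+.B SIGIO
+delivery for a socket is shown below.
+It is an illustration under stated assumptions:
+.I enable_sigio
+and
+.I sigio_handler
+are hypothetical names, and the owner is set with the
+.B F_SETOWN
+.BR fcntl (2)
+operation rather than the
+.B FIOSETOWN
+or
+.B SIOCSPGRP
+requests named above.
+.in +4n
+.nf
+#include <fcntl.h>
+#include <signal.h>
+#include <string.h>
+#include <unistd.h>
+
+static volatile sig_atomic_t got_sigio;
+
+static void
+sigio_handler(int sig)
+{
+    (void) sig;
+    got_sigio = 1;      /* the main loop can then call poll(2) */
+}
+
+/* Hypothetical helper: returns 0 on success, \-1 on error */
+int
+enable_sigio(int fd)
+{
+    struct sigaction sa;
+    int flags;
+
+    memset(&sa, 0, sizeof(sa));
+    sa.sa_handler = sigio_handler;
+    sigemptyset(&sa.sa_mask);
+    if (sigaction(SIGIO, &sa, NULL) == \-1)
+        return \-1;
+
+    /* Deliver the signal to this process */
+    if (fcntl(fd, F_SETOWN, getpid()) == \-1)
+        return \-1;
+
+    flags = fcntl(fd, F_GETFL, 0);
+    if (flags == \-1)
+        return \-1;
+    return fcntl(fd, F_SETFL, flags | O_ASYNC) == \-1 ? \-1 : 0;
+}
+.fi
+.in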
+.\" .SS Ancillary Messages
+.SS Sysctls
+The core socket networking sysctls can be accessed using the
+.I /proc/sys/net/core/*
+files or with the
+.BR sysctl (2)
+interface.
+.TP
+.I rmem_default
+contains the default setting in bytes of the socket receive buffer.
+.TP
+.I rmem_max
+contains the maximum socket receive buffer size in bytes which a user may
+set by using the
+.B SO_RCVBUF
+socket option.
+.TP
+.I wmem_default
+contains the default setting in bytes of the socket send buffer.
+.TP
+.I wmem_max
+contains the maximum socket send buffer size in bytes which a user may
+set by using the
+.B SO_SNDBUF
+socket option.
+.TP
+.BR message_cost " and " message_burst
+configure the token bucket filter used to load limit warning messages
+caused by external network events.
+.TP
+.I netdev_max_backlog
+Maximum number of packets in the global input queue.
+.TP
+.I optmem_max
+Maximum length of ancillary data and user control data like the iovecs
+per socket.
+.\" netdev_fastroute is not documented because it is experimental
+.SS Ioctls
+These operations can be accessed using
+.BR ioctl (2):
+
+.in +4n
+.nf
+.IB error " = ioctl(" ip_socket ", " ioctl_type ", " &value_result ");"
+.fi
+.in
+.TP
+.B SIOCGSTAMP
+Return a
+.I struct timeval
+with the receive timestamp of the last packet passed to the user.
+This is useful for accurate round trip time measurements.
+See
+.BR setitimer (2)
+for a description of
+.IR "struct timeval" .
+.\"
+This ioctl should only be used if the socket option
+.B SO_TIMESTAMP
+is not set on the socket.
+Otherwise, it returns the timestamp of the
+last packet that was received while
+.B SO_TIMESTAMP
+was not set, or it fails if no such packet has been received
+(i.e.,
+.BR ioctl (2)
+returns \-1 with
+.I errno
+set to
+.BR ENOENT ).
+.TP
+.B SIOCSPGRP
+Set the process or process group to send
+.B SIGIO
+or
+.B SIGURG
+signals
+to when an
+asynchronous I/O operation has finished or urgent data is available.
+The argument is a pointer to a
+.IR pid_t .
+If the argument is positive, send the signals to that process.
+If the
+argument is negative, send the signals to the process group with the ID
+of the absolute value of the argument.
+The process may only choose itself or its own process group to receive
+signals unless it has the
+.B CAP_KILL
+capability or an effective UID of 0.
+.TP
+.B FIOASYNC
+Change the
+.B O_ASYNC
+flag to enable or disable asynchronous I/O mode of the socket.
+Asynchronous I/O mode means that the
+.B SIGIO
+signal or the signal set with
+.B F_SETSIG
+is raised when a new I/O event occurs.
+.IP
+Argument is an integer boolean flag.
+(This operation is synonymous with the use of
+.BR fcntl (2)
+to set the
+.B O_ASYNC
+flag.)
+.\"
+.TP
+.B SIOCGPGRP
+Get the current process or process group that receives
+.B SIGIO
+or
+.B SIGURG
+signals,
+or 0
+when none is set.
+.PP
+Valid
+.BR fcntl (2)
+operations:
+.TP
+.B FIOGETOWN
+The same as the
+.B SIOCGPGRP
+.BR ioctl (2).
+.TP
+.B FIOSETOWN
+The same as the
+.B SIOCSPGRP
+.BR ioctl (2).
+.SH VERSIONS
+.B SO_BINDTODEVICE
+was introduced in Linux 2.0.30.
+.B SO_PASSCRED
+is new in Linux 2.2.
+The sysctls are new in Linux 2.2.
+.B SO_RCVTIMEO
+and
+.B SO_SNDTIMEO
+are supported since Linux 2.3.41.
+Earlier, timeouts were fixed to
+a protocol-specific setting, and could not be read or written.
+.SH NOTES
+Linux assumes that half of the send/receive buffer is used for internal
+kernel structures; thus the sysctls are twice what can be observed
+on the wire.
+
+Linux will only allow port re-use with the
+.B SO_REUSEADDR
+option
+when this option was set both in the previous program that performed a
+.BR bind (2)
+to the port and in the program that wants to re-use the port.
+This differs from some implementations (e.g., FreeBSD)
+where only the later program needs to set the
+.B SO_REUSEADDR
+option.
+Typically this difference is invisible, since, for example, a server
+program is designed to always set this option.
+.SH BUGS
+The
+.B CONFIG_FILTER
+socket options
+.B SO_ATTACH_FILTER
+and
+.B SO_DETACH_FILTER
+are
+not documented.
+The suggested interface to use them is via the libpcap
+library.
+.\" .SH AUTHORS
+.\" This man page was written by Andi Kleen.
+.SH "SEE ALSO"
+.BR getsockopt (2),
+.BR setsockopt (2),
+.BR socket (2),
+.BR capabilities (7),
+.BR ddp (7),
+.BR ip (7),
+.BR packet (7)
--- a/man7/tcp.7
+++ b/man7/tcp.7
@ -1,8 +1,947 @@
-.TH TCP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH TCP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH TCP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH TCP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH TCP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH TCP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH TCP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH TCP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
+.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
+.\" Permission is granted to distribute possibly modified copies
+.\" of this page provided the header is included verbatim,
+.\" and in case of nontrivial modification author and date
+.\" of the modification is added to the header.
+.\"
+.\" 2.4 Updates by Nivedita Singhvi 4/20/02 <nivedita@us.ibm.com>.
+.\" Modified, 2004-11-11, Michael Kerrisk and Andries Brouwer
+.\"	Updated details of interaction of TCP_CORK and TCP_NODELAY.
+.\"
+.\" FIXME 2.6.17-rc1 adds the following /proc files, which need to be
+.\" 	  documented: tcp_mtu_probing, tcp_base_mss, and
+.\"	  tcp_workaround_signed_windows
+.\"
+.TH TCP  7 2007-11-25 "Linux" "Linux Programmer's Manual"
+.SH NAME
+tcp \- TCP protocol
+.SH SYNOPSIS
+.B #include <sys/socket.h>
+.br
+.B #include <netinet/in.h>
+.br
+.B #include <netinet/tcp.h>
+.sp
+.B tcp_socket = socket(PF_INET, SOCK_STREAM, 0);
+.SH DESCRIPTION
+This is an implementation of the TCP protocol defined in
+RFC\ 793, RFC\ 1122 and RFC\ 2001 with the NewReno and SACK
+extensions.
+It provides a reliable, stream-oriented,
+full-duplex connection between two sockets on top of
+.BR ip (7),
+for both v4 and v6 versions.
+TCP guarantees that the data arrives in order and
+retransmits lost packets.
+It generates and checks a per-packet checksum to catch
+transmission errors.
+TCP does not preserve record boundaries.
+
+A newly created TCP socket has no remote or local address and is not
+fully specified.
+To create an outgoing TCP connection use
+.BR connect (2)
+to establish a connection to another TCP socket.
+To receive new incoming connections, first
+.BR bind (2)
+the socket to a local address and port and then call
+.BR listen (2)
+to put the socket into the listening state.
+After that a new
+socket for each incoming connection can be accepted
+using
+.BR accept (2).
+A socket which has had
+.BR accept (2)
+or
+.BR connect (2)
+successfully called on it is fully specified and may
+transmit data.
+Data cannot be transmitted on listening or
+not yet connected sockets.
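+.PP
+The passive-open sequence described above can be sketched as follows.
+This is a minimal illustration, not a definitive implementation:
+.I accept_one
+is a hypothetical helper name, and the loopback address and the
+backlog of 5 are arbitrary choices made for the example.
+.in +4n
+.nf
+#include <arpa/inet.h>
+#include <netinet/in.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <unistd.h>
+
+/* Hypothetical helper: accept one connection on the given
+   loopback port; returns the connected socket, or \-1 on error */
+int
+accept_one(unsigned short port)
+{
+    struct sockaddr_in addr;
+    int lfd, cfd;
+
+    lfd = socket(PF_INET, SOCK_STREAM, 0);
+    if (lfd == \-1)
+        return \-1;
+
+    memset(&addr, 0, sizeof(addr));
+    addr.sin_family = AF_INET;
+    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+    addr.sin_port = htons(port);
+
+    if (bind(lfd, (struct sockaddr *) &addr, sizeof(addr)) == \-1 ||
+        listen(lfd, 5) == \-1) {
+        close(lfd);
+        return \-1;
+    }
+
+    cfd = accept(lfd, NULL, NULL);  /* blocks until a peer connects */
+    close(lfd);                     /* listener is no longer needed */
+    return cfd;
+}
+.fi
+.in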
+
+Linux supports RFC\ 1323 TCP high performance
+extensions.
+These include Protection Against Wrapped
+Sequence Numbers (PAWS), Window Scaling and
+Timestamps.
+Window scaling allows the use
+of large (> 64K) TCP windows in order to support links with high
+latency or bandwidth.
+To make use of them, the send and
+receive buffer sizes must be increased.
+They can be set globally with the
+.I net.ipv4.tcp_wmem
+and
+.I net.ipv4.tcp_rmem
+sysctl variables, or on individual sockets by using the
+.B SO_SNDBUF
+and
+.B SO_RCVBUF
+socket options with the
+.BR setsockopt (2)
+call.
+
+The maximum sizes for socket buffers declared via the
+.B SO_SNDBUF
+and
+.B SO_RCVBUF
+mechanisms are limited by the global
+.I net.core.rmem_max
+and
+.I net.core.wmem_max
+sysctls.
+Note that TCP actually allocates twice the size of
+the buffer requested in the
+.BR setsockopt (2)
+call, and so a succeeding
+.BR getsockopt (2)
+call will not return the same size of buffer as requested
+in the
+.BR setsockopt (2)
+call.
+TCP uses the extra space for administrative purposes and internal
+kernel structures, and the sysctl variables reflect the
+larger sizes compared to the actual TCP windows.
+On individual connections, the socket buffer size must be
+set prior to the
+.BR listen (2)
+or
+.BR connect (2)
+calls in order to have it take effect.
+See
+.BR socket (7)
+for more information.
+.PP
+TCP supports urgent data.
+Urgent data is used to signal the
+receiver that some important message is part of the data
+stream and that it should be processed as soon as possible.
+To send urgent data specify the
+.B MSG_OOB
+option to
+.BR send (2).
+When urgent data is received, the kernel sends a
+.B SIGURG
+signal to the process or process group that has been set as the
+socket "owner" using the
+.B SIOCSPGRP
+or
+.B FIOSETOWN
+ioctls (or the POSIX.1-2001-specified
+.BR fcntl (2)
+.B F_SETOWN
+operation).
+When the
+.B SO_OOBINLINE
+socket option is enabled, urgent data is put into the normal
+data stream (a program can test for its location using the
+.B SIOCATMARK
+ioctl described below),
+otherwise it can be only received when the
+.B MSG_OOB
+flag is set for
+.BR recv (2)
+or
+.BR recvmsg (2).
+
+Linux 2.4 introduced a number of changes for improved
+throughput and scaling, as well as enhanced functionality.
+Some of these features include support for zero-copy
+.BR sendfile (2),
+Explicit Congestion Notification, new
+management of TIME_WAIT sockets, keep-alive socket options
+and support for Duplicate SACK extensions.
+.SS Address Formats
+TCP is built on top of IP (see
+.BR ip (7)).
+The address formats defined by
+.BR ip (7)
+apply to TCP.
+TCP only supports point-to-point
+communication; broadcasting and multicasting are not
+supported.
+.SS Sysctls
+These variables can be accessed by the
+.I /proc/sys/net/ipv4/*
+files or with the
+.BR sysctl (2)
+interface.
+In addition, most IP sysctls also apply to TCP; see
+.BR ip (7).
+Variables described as
+.I Boolean
+take an integer value, with a non-zero value ("true") meaning that
+the corresponding option is enabled, and a zero value ("false")
+meaning that the option is disabled.
+.\" FIXME As at Sept 2006, kernel 2.6.18-rc5, the following are
+.\"	not yet documented (shown with default values):
+.\"
+.\"     /proc/sys/net/ipv4/tcp_congestion_control (since 2.6.13)
+.\"     bic
+.\"     /proc/sys/net/ipv4/tcp_moderate_rcvbuf
+.\"     1
+.\"     /proc/sys/net/ipv4/tcp_no_metrics_save
+.\"     0
+.TP
+.BR tcp_abort_on_overflow " (Boolean; default: disabled)"
+Enable resetting connections if the listening service is too
+slow and unable to keep up and accept them.
+With the default (disabled) setting, if overflow occurred due
+to a burst, the connection will recover.
+Enable this option
+.I only
+if you are really sure that the listening daemon
+cannot be tuned to accept connections faster.
+Enabling this
+option can harm the clients of your server.
+.TP
+.BR tcp_adv_win_scale " (integer; default: 2)"
+Count buffering overhead as
+.IR "bytes/2^tcp_adv_win_scale" ,
+if
+.I tcp_adv_win_scale
+is greater than 0; or
+.IR "bytes-bytes/2^(\-tcp_adv_win_scale)" ,
+if
+.I tcp_adv_win_scale
+is less than or equal to zero.
+
+The socket receive buffer space is shared between the
+application and kernel.
+TCP maintains part of the buffer as
+the TCP window; this is the size of the receive window
+advertised to the other end.
+The rest of the space is used
+as the "application" buffer, used to isolate the network
+from scheduling and application latencies.
+The
+.I tcp_adv_win_scale
+default value of 2 implies that the space
+used for the application buffer is one fourth that of the
+total.
+.TP
+.BR tcp_app_win  " (integer; default: 31)"
+This variable defines how many
+bytes of the TCP window are reserved for buffering
+overhead.
+
+A maximum of (\fIwindow/2^tcp_app_win\fP, mss) bytes in the window
+are reserved for the application buffer.
+A value of 0
+implies that no amount is reserved.
+.\"
+.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.BR tcp_bic " (Boolean; default: disabled)"
+Enable BIC TCP congestion control algorithm.
+BIC-TCP is a sender-side only change that ensures a linear RTT
+fairness under large windows while offering both scalability and
+bounded TCP-friendliness.
+The protocol combines two schemes
+called additive increase and binary search increase.
+When the
+congestion window is large, additive increase with a large
+increment ensures linear RTT fairness as well as good
+scalability.
+Under small congestion windows, binary search
+increase provides TCP friendliness.
+.\"
+.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.BR tcp_bic_low_window " (integer; default: 14)"
+Sets the threshold window (in packets) where BIC TCP starts to
+adjust the congestion window.
+Below this threshold BIC TCP behaves
+the same as the default TCP Reno.
+.\"
+.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.BR tcp_bic_fast_convergence " (Boolean; default: enabled)"
+Forces BIC TCP to more quickly respond to changes in congestion
+window.
+Allows two flows sharing the same connection to converge
+more rapidly.
+.TP
+.BR tcp_dsack " (Boolean; default: enabled)"
+Enable RFC\ 2883 TCP Duplicate SACK support.
+.TP
+.BR tcp_ecn " (Boolean; default: disabled)"
+Enable RFC\ 3168 Explicit Congestion Notification.
+When enabled, connectivity to some
+destinations could be affected due to older, misbehaving
+routers along the path causing connections to be dropped.
+.TP
+.BR tcp_fack " (Boolean; default: enabled)"
+Enable TCP Forward Acknowledgement support.
+.TP
+.BR tcp_fin_timeout " (integer; default: 60)"
+This specifies how many seconds to wait for a final FIN packet before the
+socket is forcibly closed.
+This is strictly a violation of
+the TCP specification, but required to prevent
+denial-of-service attacks.
+In Linux 2.2, the default value was 180.
+.\"
+.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.BR tcp_frto " (Boolean; default: disabled)"
+Enables F-RTO, an enhanced recovery algorithm for TCP retransmission
+timeouts.
+It is particularly beneficial in wireless environments
+where packet loss is typically due to random radio interference
+rather than intermediate router congestion.
+.TP
+.BR tcp_keepalive_intvl " (integer; default: 75)"
+The number of seconds between TCP keep-alive probes.
+.TP
+.BR tcp_keepalive_probes " (integer; default: 9)"
+The maximum number of TCP keep-alive probes to send
+before giving up and killing the connection if
+no response is obtained from the other end.
+.TP
+.BR tcp_keepalive_time " (integer; default: 7200)"
+The number of seconds a connection needs to be idle
+before TCP begins sending out keep-alive probes.
+Keep-alives are only sent when the
+.B SO_KEEPALIVE
+socket option is enabled.
+The default value is 7200 seconds (2 hours).
+An idle connection is terminated after
+approximately an additional 11 minutes (9 probes at intervals
+of 75 seconds) when keep-alive is enabled.
+
+Note that underlying connection tracking mechanisms and
+application timeouts may be much shorter.
+.\"
+.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.BR tcp_low_latency  " (Boolean; default: disabled)"
+If enabled, the TCP stack makes decisions that prefer lower
+latency as opposed to higher throughput.
+If this option is disabled, then higher throughput is preferred.
+An example of an application where this default should be
+changed would be a Beowulf compute cluster.
+.TP
+.BR tcp_max_orphans  " (integer; default: see below)"
+The maximum number of orphaned (not attached to any user file
+handle) TCP sockets allowed in the system.
+When this number
+is exceeded, the orphaned connection is reset and a warning
+is printed.
+This limit exists only to prevent simple denial-of-service attacks.
+Lowering this limit is not recommended.
+Network conditions might require you to increase the number of
+orphans allowed, but note that each orphan can eat up to ~64K
+of unswappable memory.
+The default initial value is set
+equal to the kernel parameter NR_FILE.
+This initial default is adjusted depending on the memory in the system.
+.TP
+.BR tcp_max_syn_backlog " (integer; default: see below)"
+The maximum number of queued connection requests which have
+still not received an acknowledgement from the connecting client.
+If this number is exceeded, the kernel will begin
+dropping requests.
+The default value of 256 is increased to
+1024 when the memory present in the system is adequate or
+greater (>= 128MB), and reduced to 128 for those systems with
+very low memory (<= 32MB).
+It is recommended that if this
+needs to be increased above 1024, TCP_SYNQ_HSIZE in
+.I include/net/tcp.h
+be modified to keep
+TCP_SYNQ_HSIZE*16<=tcp_max_syn_backlog, and the kernel be
+recompiled.
+.TP
+.BR tcp_max_tw_buckets " (integer; default: see below)"
+The maximum number of sockets in TIME_WAIT state allowed in
+the system.
+This limit exists only to prevent simple denial-of-service
+attacks.
+The default value of NR_FILE*2 is adjusted
+depending on the memory in the system.
+If this number is
+exceeded, the socket is closed and a warning is printed.
+.TP
+.I tcp_mem
+This is a vector of 3 integers: [low, pressure, high].
+These bounds are used by TCP to track its memory usage.
+The
+defaults are calculated at boot time from the amount of
+available memory.
+(TCP can only use
+.I "low memory"
+for this, which is limited to around 900 megabytes on 32-bit systems.
+64-bit systems do not suffer this limitation.)
+
+.I low
+- TCP doesn't regulate its memory allocation when the number
+of pages it has allocated globally is below this number.
+
+.I pressure
+- when the amount of memory allocated by TCP
+exceeds this number of pages, TCP moderates its memory consumption.
+This memory pressure state is exited
+once the number of pages allocated falls below
+the
+.I low
+mark.
+
+.I high
+- the maximum number of pages, globally, that TCP
+will allocate.
+This value overrides any other limits
+imposed by the kernel.
+.TP
+.BR tcp_orphan_retries " (integer; default: 8)"
+The maximum number of attempts made to probe the other
+end of a connection which has been closed by our end.
+.TP
+.BR tcp_reordering " (integer; default: 3)"
+The maximum a packet can be reordered in a TCP packet stream
+without TCP assuming packet loss and going into slow start.
+It is not advisable to change this number.
+This is a packet reordering detection metric designed to
+minimize unnecessary back off and retransmits provoked by
+reordering of packets on a connection.
+.TP
+.BR tcp_retrans_collapse " (Boolean; default: enabled)"
+Try to send full-sized packets during retransmit.
+.TP
+.BR tcp_retries1 " (integer; default: 3)"
+The number of times TCP will attempt to retransmit a
+packet on an established connection normally,
+without the extra effort of getting the network
+layers involved.
+Once we exceed this number of
+retransmits, we first have the network layer
+update the route if possible before each new retransmit.
+The default is the RFC specified minimum of 3.
+.TP
+.BR tcp_retries2 " (integer; default: 15)"
+The maximum number of times a TCP packet is retransmitted
+in established state before giving up.
+The default
+value is 15, which corresponds to a duration of
+approximately 13 to 30 minutes, depending
+on the retransmission timeout.
+The RFC\ 1122 specified
+minimum limit of 100 seconds is typically deemed too
+short.
+.TP
+.BR tcp_rfc1337 " (Boolean; default: disabled)"
+Enable TCP behavior conformant with RFC\ 1337.
+When disabled,
+if a RST is received in TIME_WAIT state, we close
+the socket immediately without waiting for the end
+of the TIME_WAIT period.
+.TP
+.I tcp_rmem
+This is a vector of 3 integers: [min, default,
+max].
+These parameters are used by TCP to regulate receive
+buffer sizes.
+TCP dynamically adjusts the size of the
+receive buffer from the defaults listed below, in the range
+of these sysctl variables, depending on memory available
+in the system.
+.RS
+.TP 9
+.I min
+minimum size of the receive buffer used by each TCP socket.
+The default value is 4K, and is lowered to
+.B PAGE_SIZE
+bytes in low-memory systems.
+This value
+is used to ensure that in memory pressure mode,
+allocations below this size will still succeed.
+This is not
+used to bound the size of the receive buffer declared
+using
+.B SO_RCVBUF
+on a socket.
+.TP
+.I default
+the default size of the receive buffer for a TCP socket.
+This value overwrites the initial default buffer size from
+the generic global
+.I net.core.rmem_default
+defined for all protocols.
+The default value is 87380
+bytes, and is lowered to 43689 in low-memory systems.
+If larger receive buffer sizes are desired, this value should
+be increased (to affect all sockets).
+To employ large TCP
+windows, the
+.I net.ipv4.tcp_window_scaling
+must be enabled (default).
+.TP
+.I max
+the maximum size of the receive buffer used by
+each TCP socket.
+This value does not override the global
+.IR net.core.rmem_max .
+This is not used to limit the size of the receive buffer
+declared using
+.B SO_RCVBUF
+on a socket.
+The default value of 87380*2 bytes is lowered to 87380
+in low-memory systems.
+.RE
+.TP
+.BR tcp_sack " (Boolean; default: enabled)"
+Enable RFC\ 2018 TCP Selective Acknowledgements.
+.TP
+.BR tcp_stdurg " (Boolean; default: disabled)"
+If this option is enabled, then use the RFC\ 1122 interpretation
+of the TCP urgent-pointer field.
+.\" RFC 793 was ambiguous in its specification of the meaning of the
+.\" urgent pointer.  RFC 1122 (and RFC 961) fixed on a particular
+.\" resolution of this ambiguity (unfortunately the "wrong" one).
+According to this interpretation, the urgent pointer points
+to the last byte of urgent data.
+If this option is disabled, then use the BSD-compatible interpretation of
+the urgent pointer:
+the urgent pointer points to the first byte after the urgent data.
+Enabling this option may lead to interoperability problems.
+.TP
+.BR tcp_synack_retries " (integer; default: 5)"
+The maximum number of times a SYN/ACK segment
+for a passive TCP connection will be retransmitted.
+This number should not be higher than 255.
+.TP
+.BR tcp_syncookies " (Boolean)"
+Enable TCP syncookies.
+The kernel must be compiled with
+.BR CONFIG_SYN_COOKIES .
+Send out syncookies when the syn backlog queue of a socket
+overflows.
+The syncookies feature attempts to protect a
+socket from a SYN flood attack.
+This should be used as a
+last resort, if at all.
+This is a violation of the TCP
+protocol, and conflicts with other areas of TCP such as TCP
+extensions.
+It can cause problems for clients and relays.
+It is not recommended as a tuning mechanism for heavily
+loaded servers to help with overloaded or misconfigured
+conditions.
+For recommended alternatives see
+.IR tcp_max_syn_backlog ,
+.IR tcp_synack_retries ,
+and
+.IR tcp_abort_on_overflow .
+.TP
+.BR tcp_syn_retries  " (integer; default: 5)"
+The maximum number of times initial SYNs for an active TCP
+connection attempt will be retransmitted.
+This value should
+not be higher than 255.
+The default value is 5, which
+corresponds to approximately 180 seconds.
+.TP
+.BR tcp_timestamps " (Boolean; default: enabled)"
+Enable RFC\ 1323 TCP timestamps.
+.TP
+.BR tcp_tw_recycle " (Boolean; default: disabled)"
+Enable fast recycling of TIME_WAIT sockets.
+Enabling this option is not
+recommended since this causes problems when working
+with NAT (Network Address Translation).
+.\"
+.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.BR tcp_tw_reuse " (Boolean; default: disabled)"
+Allow reuse of TIME_WAIT sockets for new connections when it is
+safe from the protocol viewpoint.
+It should not be changed without advice/request of technical
+experts.
+.TP
+.BR tcp_window_scaling " (Boolean; default: enabled)"
+Enable RFC\ 1323 TCP window scaling.
+This feature allows the use of a large window
+(> 64K) on a TCP connection, should the other end support it.
+Normally, the 16-bit window length field in the TCP header
+limits the window size to less than 64K bytes.
+If larger
+windows are desired, applications can increase the size of
+their socket buffers and the window scaling option will be
+employed.
+If
+.I tcp_window_scaling
+is disabled, TCP will not negotiate the use of window
+scaling with the other end during connection setup.
+.\"
+.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.BR tcp_vegas_cong_avoid  " (Boolean; default: disabled)"
+Enable TCP Vegas congestion avoidance algorithm.
+TCP Vegas is a sender-side only change to TCP that anticipates
+the onset of congestion by estimating the bandwidth.
+TCP Vegas
+adjusts the sending rate by modifying the congestion
+window.
+TCP Vegas should provide less packet loss, but it is
+not as aggressive as TCP Reno.
+.\"
+.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
+.TP
+.BR tcp_westwood " (Boolean; default: disabled)"
+Enable TCP Westwood+ congestion control algorithm.
+TCP Westwood+ is a sender-side only modification of the TCP Reno
+protocol stack that optimizes the performance of TCP congestion
+control.
+It is based on end-to-end bandwidth estimation to set
+congestion window and slow start threshold after a congestion
+episode.
+Using this estimation, TCP Westwood+ adaptively sets a
+slow start threshold and a congestion window which takes into
+account the bandwidth used at the time congestion is experienced.
+TCP Westwood+ significantly increases fairness with respect to
+TCP Reno in wired networks and throughput over wireless links.
+.TP
+.I tcp_wmem
+This is a vector of 3 integers: [min, default, max].
+These parameters are used by TCP to regulate send buffer sizes.
+TCP dynamically adjusts the size of the send buffer from the
+default values listed below, in the range of these sysctl
+variables, depending on memory available.
+
+.I min
+- minimum size of the send buffer used by each TCP socket.
+The default value is 4K bytes.
+This value is used to ensure that in memory pressure mode,
+allocations below this size will still succeed.
+This is not
+used to bound the size of the send buffer declared
+using
+.B SO_SNDBUF
+on a socket.
+
+.I default
+- the default size of the send buffer for a TCP socket.
+This value overwrites the initial default buffer size from
+the generic global
+.I net.core.wmem_default
+defined for all protocols.
+The default value is 16K bytes.
+If larger send buffer sizes are desired, this value
+should be increased (to affect all sockets).
+To employ large TCP windows, the sysctl variable
+.I net.ipv4.tcp_window_scaling
+must be enabled (default).
+
+.I max
+- the maximum size of the send buffer used by
+each TCP socket.
+This value does not override the global
+.IR net.core.wmem_max .
+This is not used to limit the size of the send buffer
+declared using
+.B SO_SNDBUF
+on a socket.
+The default value is 128K bytes.
+It is lowered to 64K
+depending on the memory available in the system.
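+.PP
+The three-integer vectors described above
+.RI ( tcp_mem ,
+.IR tcp_rmem ,
+and
+.IR tcp_wmem )
+can be read by a program as plain text.
+The following is a minimal sketch, not taken from the kernel documentation.
+It assumes that the sysctls are exposed as files under
+.IR /proc/sys/net/ipv4/ ;
+the names
+.IR tcp_vec3 ,
+.IR parse_vec3 ,
+and
+.I read_vec3
+are hypothetical helpers introduced only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for one three-integer sysctl vector,
   e.g. [min, default, max] for tcp_rmem or [low, pressure, high]
   for tcp_mem. */
struct tcp_vec3 {
    long first;
    long second;
    long third;
};

/* Parse a line such as "4096 87380 174760" (fields separated by
   spaces or tabs). Returns 0 on success, -1 if fewer than three
   integers were found. */
int parse_vec3(const char *line, struct tcp_vec3 *out)
{
    assert(line != NULL && out != NULL);
    if (sscanf(line, "%ld %ld %ld",
               &out->first, &out->second, &out->third) != 3)
        return -1;
    return 0;
}

/* Read one vector from a file; the path is an assumption, for
   example "/proc/sys/net/ipv4/tcp_rmem". Returns 0 or -1. */
int read_vec3(const char *path, struct tcp_vec3 *out)
{
    char line[128];
    FILE *fp = fopen(path, "r");
    int rc;

    if (fp == NULL)
        return -1;
    rc = (fgets(line, sizeof(line), fp) != NULL)
             ? parse_vec3(line, out) : -1;
    fclose(fp);
    return rc;
}
```

+.PP
+Note that
+.I tcp_mem
+is expressed in pages, whereas
+.I tcp_rmem
+and
+.I tcp_wmem
+are expressed in bytes, so the caller must interpret the three fields
+according to the variable that was read.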
+.SS Socket Options
+To set or get a TCP socket option, call
+.BR getsockopt (2)
+to read or
+.BR setsockopt (2)
+to write the option with the option level argument set to
+.BR IPPROTO_TCP .
+.\" or SOL_TCP on Linux
+In addition,
+most
+.B IPPROTO_IP
+socket options are valid on TCP sockets.
+For more information see
+.BR ip (7).
+.\" FIXME Document TCP_CONGESTION (new in 2.6.13)
+.TP
+.B TCP_CORK
+If set, don't send out partial frames.
+All queued
+partial frames are sent when the option is cleared again.
+This is useful for prepending headers before calling
+.BR sendfile (2),
+or for throughput optimization.
+As currently implemented, there is a 200 millisecond ceiling on the time
+for which output is corked by
+.BR TCP_CORK .
+If this ceiling is reached, then queued data is automatically transmitted.
+This option can be
+combined with
+.B TCP_NODELAY
+only since Linux 2.5.71.
+This option should not be used in code intended to be
+portable.
+.TP
+.B TCP_DEFER_ACCEPT
+Allows a listener to be awakened only when data arrives on
+the socket.
+Takes an integer value (seconds); this can
+bound the maximum number of attempts TCP will make to
+complete the connection.
+This option should not be used in
+code intended to be portable.
+.TP
+.B TCP_INFO
+Used to collect information about this socket.
+The kernel returns a \fIstruct tcp_info\fP as defined in the file
+.IR /usr/include/linux/tcp.h .
+This option should not be used in code intended to be portable.
+.TP
+.B TCP_KEEPCNT
+The maximum number of keepalive probes TCP should send
+before dropping the connection.
+This option should not be
+used in code intended to be portable.
+.TP
+.B TCP_KEEPIDLE
+The time (in seconds) the connection needs to remain idle
+before TCP starts sending keepalive probes, if the socket
+option
+.B SO_KEEPALIVE
+has been set on this socket.
+This option should not be used in code intended to be portable.
+.TP
+.B TCP_KEEPINTVL
+The time (in seconds) between individual keepalive probes.
+This option should not be used in code intended to be
+portable.
+.TP
+.B TCP_LINGER2
+The lifetime of orphaned FIN_WAIT2 state sockets.
+This option can be used to override the system-wide sysctl
+.I tcp_fin_timeout
+on this socket.
+This is not to be confused with the
+.BR socket (7)
+level option
+.BR SO_LINGER .
+This option should not be used in code intended to be
+portable.
+.TP
+.B TCP_MAXSEG
+The maximum segment size for outgoing TCP packets.
+If this option is set before connection establishment, it also
+changes the MSS value announced to the other end in the
+initial packet.
+Values greater than the (eventual) interface MTU have no effect.
+TCP will also impose
+its minimum and maximum bounds over the value provided.
+.TP
+.B TCP_NODELAY
+If set, disable the Nagle algorithm.
+This means that segments
+are always sent as soon as possible, even if there is only a
+small amount of data.
+When not set, data is buffered until there
+is a sufficient amount to send out, thereby avoiding the
+frequent sending of small packets, which results in poor
+utilization of the network.
+This option is overridden by
+.BR TCP_CORK ;
+however, setting this option forces an explicit flush of
+pending output, even if
+.B TCP_CORK
+is currently set.
+.TP
+.B TCP_QUICKACK
+Enable quickack mode if set or disable quickack
+mode if cleared.
+In quickack mode, acks are sent
+immediately, rather than delayed if needed in accordance
+with normal TCP operation.
+This flag is not permanent;
+it only enables a switch to or from quickack mode.
+Subsequent operation of the TCP protocol will
+once again enter/leave quickack mode depending on
+internal protocol processing and factors such as
+delayed ack timeouts occurring and data transfer.
+This option should not be used in code intended to be
+portable.
+.TP
+.B TCP_SYNCNT
+Set the number of SYN retransmits that TCP should send before
+aborting the attempt to connect.
+It cannot exceed 255.
+This option should not be used in code intended to be
+portable.
+.TP
+.B TCP_WINDOW_CLAMP
+Bound the size of the advertised window to this value.
+The kernel imposes a minimum size of SOCK_MIN_RCVBUF/2.
+This option should not be used in code intended to be
+portable.
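+.PP
+As an illustration of the calling convention described at the start of
+this section, the following minimal sketch enables keep-alive on a TCP
+socket and overrides the system-wide timings for that socket only.
+The function names
+.I enable_keepalive
+and
+.I keepalive_deadline
+are hypothetical and are not part of any library;
+the deadline calculation is only the approximation used in the
+description of
+.I tcp_keepalive_time
+(idle time plus one interval per unanswered probe).

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable SO_KEEPALIVE on fd and set the per-socket keep-alive
   timings. Returns 0 on success, or -1 with errno set by the
   failing setsockopt(2) call. */
int enable_keepalive(int fd, int idle_secs, int intvl_secs, int probes)
{
    int on = 1;

    assert(idle_secs > 0 && intvl_secs > 0 && probes > 0);

    /* SO_KEEPALIVE lives at the generic socket level. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
        return -1;
    /* The timing options use the IPPROTO_TCP level. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof(idle_secs)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &intvl_secs, sizeof(intvl_secs)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &probes, sizeof(probes)) == -1)
        return -1;
    return 0;
}

/* Approximate number of seconds after which an idle connection to
   an unresponsive peer is dropped. */
int keepalive_deadline(int idle_secs, int intvl_secs, int probes)
{
    return idle_secs + intvl_secs * probes;
}
```

+.PP
+With the default sysctl values (7200, 75 and 9), the approximation gives
+7200 + 9 * 75 = 7875 seconds.
+These options are Linux-specific, so this sketch is not portable.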
+.SS Ioctls
+The following
+.BR ioctl (2)
+calls return information in
+.IR value .
+The correct syntax is:
+.PP
+.RS
+.nf
+.BI int " value";
+.IB error " = ioctl(" tcp_socket ", " ioctl_type ", &" value ");"
+.fi
+.RE
+.PP
+.I ioctl_type
+is one of the following:
+.TP
+.B SIOCINQ
+Returns the amount of queued unread data in the receive buffer.
+The socket must not be in LISTEN state, otherwise an error
+.RB ( EINVAL )
+is returned.
+.TP
+.B SIOCATMARK
+Returns true (i.e.,
+.I value
+is non-zero) if the inbound data stream is at the urgent mark.
+
+If the
+.B SO_OOBINLINE
+socket option is set, and
+.B SIOCATMARK
+returns true, then the
+next read from the socket will return the urgent data.
+If the
+.B SO_OOBINLINE
+socket option is not set, and
+.B SIOCATMARK
+returns true, then the
+next read from the socket will return the bytes following
+the urgent data (to actually read the urgent data requires the
+.B recv(MSG_OOB)
+flag).
+
+Note that a read never reads across the urgent mark.
+If an application is informed of the presence of urgent data via
+.BR select (2)
+(using the
+.I exceptfds
+argument) or through delivery of a
+.B SIGURG
+signal,
+then it can advance up to the mark using a loop which repeatedly tests
+.B SIOCATMARK
+and performs a read (requesting any number of bytes) as long as
+.B SIOCATMARK
+returns false.
+.TP
+.B SIOCOUTQ
+Returns the amount of unsent data in the socket send queue.
+The socket must not be in LISTEN state, otherwise an error
+.RB ( EINVAL )
+is returned.
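+.PP
+The queue-length ioctls can be wrapped as shown in the following
+minimal sketch.
+It assumes that
+.B SIOCINQ
+and
+.B SIOCOUTQ
+are obtained from
+.IR <linux/sockios.h> ;
+the wrapper names
+.I tcp_unread_bytes
+and
+.I tcp_unsent_bytes
+are hypothetical, as is the convention of returning a negated
+.I errno
+value on failure.

```c
#include <assert.h>
#include <errno.h>
#include <linux/sockios.h>   /* SIOCINQ, SIOCOUTQ */
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Number of unread bytes queued in the receive buffer, or a
   negated errno value (for example -EINVAL when the socket is in
   LISTEN state). */
int tcp_unread_bytes(int tcp_socket)
{
    int value = 0;

    assert(tcp_socket >= 0);
    if (ioctl(tcp_socket, SIOCINQ, &value) == -1)
        return -errno;
    return value;
}

/* Number of unsent bytes in the send queue, or a negated errno
   value on failure. */
int tcp_unsent_bytes(int tcp_socket)
{
    int value = 0;

    assert(tcp_socket >= 0);
    if (ioctl(tcp_socket, SIOCOUTQ, &value) == -1)
        return -errno;
    return value;
}
```

+.PP
+A caller that only wants to know whether data is pending can compare
+the result with zero; negative results must be treated as errors
+rather than as byte counts.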
+.SS Error Handling
+When a network error occurs, TCP tries to resend the packet.
+If it doesn't succeed after some time, either
+.B ETIMEDOUT
+or the last received error on this connection is reported.
+.PP
+Some applications require a quicker error notification.
+This can be enabled with the
+.B IPPROTO_IP
+level
+.B IP_RECVERR
+socket option.
+When this option is enabled, all incoming
+errors are immediately passed to the user program.
+Use this
+option with care \(em it makes TCP less tolerant to routing
+changes and other normal network conditions.
+.SH ERRORS
+.TP
+.B EAFNOTSUPPORT
+Passed socket address type in
+.I sin_family
+was not
+.BR AF_INET .
+.TP
+.B EPIPE
+The other end closed the socket unexpectedly or a read is
+executed on a shut down socket.
+.TP
+.B ETIMEDOUT
+The other end didn't acknowledge retransmitted data after
+some time.
+.PP
+Any errors defined for
+.BR ip (7)
+or the generic socket layer may also be returned for TCP.
+.SH VERSIONS
+Support for Explicit Congestion Notification, zero-copy
+.BR sendfile (2),
+reordering support and some SACK extensions
+(DSACK) were introduced in 2.4.
+Support for forward acknowledgement (FACK), TIME_WAIT recycling,
+per-connection keepalive socket options and sysctls
+were introduced in 2.3.
+
+The default values and descriptions for the sysctl variables
+given above are applicable for the 2.4 kernel.
+.SH NOTES
+TCP has no real out-of-band data; it has urgent data.
+In Linux this means that if the other end sends newer out-of-band
+data, the older urgent data is inserted as normal data into
+the stream (even when
+.B SO_OOBINLINE
+is not set).
+This differs from BSD-based stacks.
+.PP
+Linux uses the BSD-compatible interpretation of the urgent
+pointer field by default.
+This violates RFC\ 1122, but is
+required for interoperability with other stacks.
+It can be changed by the
+.I tcp_stdurg
+sysctl.
+.SH BUGS
+Not all errors are documented.
+.br
+IPv6 is not described.
+.\" Only a single Linux kernel version is described
+.\" Info for 2.2 was lost. Should be added again,
+.\" or put into a separate page.
+.\" .SH AUTHORS
+.\" This man page was originally written by Andi Kleen.
+.\" It was updated for 2.4 by Nivedita Singhvi with input from
+.\" Alexey Kuznetsov's Documentation/networking/ip-sysctl.txt
+.\" document.
+.SH "SEE ALSO"
+.BR accept (2),
+.BR bind (2),
+.BR connect (2),
+.BR getsockopt (2),
+.BR listen (2),
+.BR recvmsg (2),
+.BR sendfile (2),
+.BR sendmsg (2),
+.BR socket (2),
+.BR sysctl (2),
+.BR ip (7),
+.BR socket (7)
+.sp
+RFC\ 793 for the TCP specification.
+.br
+RFC\ 1122 for the TCP requirements and a description
+of the Nagle algorithm.
+.br
+RFC\ 1323 for TCP timestamp and window scaling options.
+.br
+RFC\ 1337 for a description of TIME_WAIT assassination
+hazards.
+.br
+RFC\ 3168 for a description of Explicit Congestion
+Notification.
+.br
+RFC\ 2581 for TCP congestion control algorithms.
+.br
+RFC\ 2018 and RFC\ 2883 for SACK and extensions to SACK.
--- a/man7/udp.7
+++ b/man7/udp.7
@ -1,8 +1,193 @@
-.TH UDP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH UDP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH UDP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH UDP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH UDP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH UDP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH UDP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH UDP  7 2008-08-07 "Linux" "Linux Programmer's Manual"
+.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
+.\" Permission is granted to distribute possibly modified copies
+.\" of this page provided the header is included verbatim,
+.\" and in case of nontrivial modification author and date
+.\" of the modification is added to the header.
+.\" $Id: udp.7,v 1.7 2000/01/22 01:55:05 freitag Exp $
+.\"
+.TH UDP  7 1998-10-02 "Linux" "Linux Programmer's Manual"
+.SH NAME
+udp \- User Datagram Protocol for IPv4
+.SH SYNOPSIS
+.B #include <sys/socket.h>
+.br
+.B #include <netinet/in.h>
+.sp
+.B udp_socket = socket(PF_INET, SOCK_DGRAM, 0);
+.SH DESCRIPTION
+This is an implementation of the User Datagram Protocol
+described in RFC\ 768.
+It implements a connectionless, unreliable datagram packet service.
+Packets may be reordered or duplicated before they arrive.
+UDP generates and checks checksums to catch transmission errors.
+
+When a UDP socket is created,
+its local and remote addresses are unspecified.
+Datagrams can be sent immediately using
+.BR sendto (2)
+or
+.BR sendmsg (2)
+with a valid destination address as an argument.
+When
+.BR connect (2)
+is called on the socket, the default destination address is set and
+datagrams can now be sent using
+.BR send (2)
+or
+.BR write (2)
+without specifying a destination address.
+It is still possible to send to other destinations by passing an
+address to
+.BR sendto (2)
+or
+.BR sendmsg (2).
+In order to receive packets, the socket can be bound to a local
+address first by using
+.BR bind (2).
+Otherwise, the socket layer will automatically assign
+a free local port out of the range defined by
+.I net.ipv4.ip_local_port_range
+and bind the socket to
+.BR INADDR_ANY .
+
+All receive operations return only one packet.
+When the packet is smaller than the passed buffer, only that much
+data is returned; when it is bigger, the packet is truncated and the
+.B MSG_TRUNC
+flag is set.
+.B MSG_WAITALL
+is not supported.
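+.PP
+The truncation rule can be observed with
+.BR recvmsg (2),
+which reports
+.B MSG_TRUNC
+in the
+.I msg_flags
+field of the message header.
+The following is a minimal sketch; the helper name
+.I recv_one_datagram
+is hypothetical, and the function works on any datagram socket,
+not only on UDP sockets.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Receive exactly one datagram into buf (at most len bytes).
   Returns the number of bytes stored, or -1 with errno set.
   *truncated is set to 1 if the datagram was larger than len and
   the excess was discarded, otherwise to 0. */
ssize_t recv_one_datagram(int fd, void *buf, size_t len, int *truncated)
{
    struct iovec iov;
    struct msghdr msg;
    ssize_t n;

    assert(truncated != NULL);

    iov.iov_base = buf;
    iov.iov_len = len;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    n = recvmsg(fd, &msg, 0);
    if (n == -1)
        return -1;

    /* The kernel sets MSG_TRUNC when the datagram did not fit. */
    *truncated = (msg.msg_flags & MSG_TRUNC) != 0;
    return n;
}
```

+.PP
+A caller that sees
+.I truncated
+set cannot recover the discarded bytes; it can only use a larger buffer
+for subsequent datagrams.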
+
+IP options may be sent or received using the socket options described in
+.BR ip (7).
+They are only processed by the kernel when the appropriate sysctl
+is enabled (but still passed to the user even when it is turned off).
+See
+.BR ip (7).
+
+When the
+.B MSG_DONTROUTE
+flag is set on sending, the destination address must refer to a local
+interface address and the packet is only sent to that interface.
+
+By default, Linux UDP does path MTU (Maximum Transmission Unit) discovery.
+This means the kernel
+will keep track of the MTU to a specific target IP address and return
+.B EMSGSIZE
+when a UDP packet write exceeds it.
+When this happens, the application should decrease the packet size.
+Path MTU discovery can also be turned off using the
+.B IP_MTU_DISCOVER
+socket option or the
+.I ip_no_pmtu_disc
+sysctl; see
+.BR ip (7)
+for details.
+When turned off, UDP will fragment outgoing UDP packets
+that exceed the interface MTU.
+However, disabling it is not recommended
+for performance and reliability reasons.
+.SS "Address Format"
+UDP uses the IPv4
+.I sockaddr_in
+address format described in
+.BR ip (7).
+.SS "Error Handling"
+All fatal errors will be passed to the user as an error return even
+when the socket is not connected.
+This includes asynchronous errors
+received from the network.
+You may get an error for an earlier packet
+that was sent on the same socket.
+This behavior differs from many other BSD socket implementations
+which don't pass any errors unless the socket is connected.
+Linux's behavior is mandated by
+.BR RFC\ 1122 .
+
+For compatibility with legacy code, in Linux 2.0 and 2.2
+it was possible to set the
+.B SO_BSDCOMPAT
+.B SOL_SOCKET
+option to receive remote errors only when the socket has been
+connected (except for
+.B EPROTO
+and
+.BR EMSGSIZE ).
+Locally generated errors are always passed.
+Support for this socket option was removed in later kernels; see
+.BR socket (7)
+for further information.
+
+When the
+.B IP_RECVERR
+option is enabled, all errors are stored in the socket error queue
+and can be received by
+.BR recvmsg (2)
+with the
+.B MSG_ERRQUEUE
+flag set.
+.SS "Socket Options"
+To set or get a UDP socket option, call
+.BR getsockopt (2)
+to read or
+.BR setsockopt (2)
+to write the option with the option level argument set to
+.BR IPPROTO_UDP .
+.TP
+.BR UDP_CORK " (since Linux 2.5.44)"
+If this option is enabled, then all data output on this socket
+is accumulated into a single datagram that is transmitted when
+the option is disabled.
+This option should not be used in code intended to be
+portable.
+.\" FIXME document UDP_ENCAP (new in kernel 2.5.67)
+.SS Ioctls
+These ioctls can be accessed using
+.BR ioctl (2).
+The correct syntax is:
+.PP
+.RS
+.nf
+.BI int " value";
+.IB error " = ioctl(" udp_socket ", " ioctl_type ", &" value ");"
+.fi
+.RE
+.TP
+.BR FIONREAD " (" SIOCINQ )
+Takes a pointer to an integer as argument.
+Returns the size in bytes of the next pending datagram in the integer,
+or 0 when no datagram is pending.
+.TP
+.BR TIOCOUTQ " (" SIOCOUTQ )
+Returns the number of data bytes in the local send queue.
+Only supported with Linux 2.4 and above.
+.PP
+In addition, all ioctls documented in
+.BR ip (7)
+and
+.BR socket (7)
+are supported.
+.SH ERRORS
+All errors documented for
+.BR socket (7)
+or
+.BR ip (7)
+may be returned by a send or receive on a UDP socket.
+.TP
+.B ECONNREFUSED
+No receiver was associated with the destination address.
+This might be caused by a previous packet sent over the socket.
+.SH VERSIONS
+IP_RECVERR is a new feature in Linux 2.2.
+.\" .SH CREDITS
+.\" This man page was written by Andi Kleen.
+.SH "SEE ALSO"
+.BR ip (7),
+.BR raw (7),
+.BR socket (7)
+
+RFC\ 768 for the User Datagram Protocol.
+.br
+RFC\ 1122 for the host requirements.
+.br
+RFC\ 1191 for a description of path MTU discovery.
--- a/man7/unix.7
+++ b/man7/unix.7
@ -1,8 +1,359 @@
-.TH UNIX  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH UNIX  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH UNIX  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH UNIX  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH UNIX  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH UNIX  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH UNIX  7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH UNIX  7 2008-08-07 "Linux" "Linux Programmer's Manual"
+.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
+.\" Permission is granted to distribute possibly modified copies
+.\" of this page provided the header is included verbatim,
+.\" and in case of nontrivial modification author and date
+.\" of the modification is added to the header.
+.\"
+.\" Modified, 2003-12-02, Michael Kerrisk, <mtk.manpages@gmail.com>
+.\" Modified, 2003-09-23, Adam Langley
+.\" Modified, 2004-05-27, Michael Kerrisk, <mtk.manpages@gmail.com>
+.\"	Added SOCK_SEQPACKET
+.\" 2008-05-27, mtk, Provide a clear description of the three types of
+.\"     address that can appear in the sockaddr_un structure: pathname,
+.\"     unnamed, and abstract.
+.\"
+.TH UNIX  7 2008-06-17 "Linux" "Linux Programmer's Manual"
+.SH NAME
+unix, PF_UNIX, AF_UNIX, PF_LOCAL, AF_LOCAL \- Sockets for local
+interprocess communication
+.SH SYNOPSIS
+.B #include <sys/socket.h>
+.br
+.B #include <sys/un.h>
+
+.IB unix_socket " = socket(PF_UNIX, type, 0);"
+.br
+.IB error " = socketpair(PF_UNIX, type, 0, int *" sv ");"
+.SH DESCRIPTION
+The
+.B PF_UNIX
+(also known as
+.BR PF_LOCAL )
+socket family is used to communicate between processes on the same machine
+efficiently.
+Traditionally, Unix sockets can be either unnamed,
+or bound to a file system pathname (marked as being of type socket).
+Linux also supports an abstract namespace which is independent of the
+file system.
+
+Valid types are:
+.BR SOCK_STREAM ,
+for a stream-oriented socket and
+.BR SOCK_DGRAM ,
+for a datagram-oriented socket that preserves message boundaries
+(as on most Unix implementations, Unix domain datagram
+sockets are always reliable and don't reorder datagrams);
+and (since Linux 2.6.4)
+.BR SOCK_SEQPACKET ,
+for a connection-oriented socket that preserves message boundaries
+and delivers messages in the order that they were sent.
+
+Unix sockets support passing file descriptors or process credentials
+to other processes using ancillary data.
+.SS Address Format
+A Unix domain socket address is represented in the following structure:
+.in +4n
+.nf
+
+#define UNIX_PATH_MAX    108
+
+struct sockaddr_un {
+    sa_family_t sun_family;               /* AF_UNIX */
+    char        sun_path[UNIX_PATH_MAX];  /* pathname */
+};
+.fi
+.in
+.PP
+.I sun_family
+always contains
+.BR AF_UNIX .
+
+Three types of address are distinguished in this structure:
+.IP * 3
+.IR pathname :
+a Unix domain socket can be bound to a null-terminated file
+system pathname using
+.BR bind (2).
+When the address of the socket is returned by
+.BR getsockname (2),
+.BR getpeername (2),
+and
+.BR accept (2),
+its length is
+.IR "sizeof(sa_family_t) + strlen(sun_path) + 1" ,
+and
+.I sun_path
+contains the null-terminated pathname.
+.IP *
+.IR unnamed :
+A stream socket that has not been bound to a pathname using
+.BR bind (2)
+has no name.
+Likewise, the two sockets created by
+.BR socketpair (2)
+are unnamed.
+When the address of an unnamed socket is returned by
+.BR getsockname (2),
+.BR getpeername (2),
+and
+.BR accept (2),
+its length is
+.IR "sizeof(sa_family_t)" ,
+and
+.I sun_path
+should not be inspected.
+.\" There is quite some variation across implementations: FreeBSD
+.\" says the length is 16 bytes, HP-UX 11 says it's zero bytes.
+.IP *
+.IR abstract :
+an abstract socket address is distinguished by the fact that
+.IR sun_path[0]
+is a null byte ('\\0').
+All of the remaining bytes in
+.I sun_path
+define the "name" of the socket.
+(Null bytes in the name have no special significance.)
+The name has no connection with file system pathnames.
+The socket's address in this namespace is given by the rest of the
+bytes in
+.IR sun_path .
+When the address of an abstract socket is returned by
+.BR getsockname (2),
+.BR getpeername (2),
+and
+.BR accept (2),
+its length is
+.IR "sizeof(struct sockaddr_un)" ,
+and
+.I sun_path
+contains the abstract name.
+The abstract socket namespace is a non-portable Linux extension.
+.SS Socket Options
+For historical reasons these socket options are specified with a
+.B SOL_SOCKET
+type even though they are
+.B PF_UNIX
+specific.
+They can be set with
+.BR setsockopt (2)
+and read with
+.BR getsockopt (2)
+by specifying
+.B SOL_SOCKET
+as the socket family.
+.TP
+.B SO_PASSCRED
+Enables the receiving of the credentials of the sending process
+in an ancillary message.
+When this option is set and the socket is not yet connected,
+a unique name in the abstract namespace will be generated automatically.
+Expects an integer boolean flag.
+.SS (Un)supported Features
+The following paragraphs describe domain-specific details and
+unsupported features of the sockets API for Unix domain sockets on Linux.
+
+Unix domain sockets do not support the transmission of
+out-of-band data (the
+.B MSG_OOB
+flag for
+.BR send (2)
+and
+.BR recv (2)).
+
+The
+.BR send (2)
+.B MSG_MORE
+flag is not supported by Unix domain sockets.
+
+The
+.B SO_SNDBUF
+socket option does have an effect for Unix domain sockets, but the
+.B SO_RCVBUF
+option does not.
+For datagram sockets, the
+.B SO_SNDBUF
+value imposes an upper limit on the size of outgoing datagrams.
+This limit is calculated as the doubled (see
+.BR socket (7))
+option value less 32 bytes used for overhead.
+.SS Ancillary Messages
+Ancillary data is sent and received using
+.BR sendmsg (2)
+and
+.BR recvmsg (2).
+For historical reasons the ancillary message types listed below
+are specified with a
+.B SOL_SOCKET
+type even though they are
+.B PF_UNIX
+specific.
+To send them set the
+.I cmsg_level
+field of the struct
+.I cmsghdr
+to
+.B SOL_SOCKET
+and the
+.I cmsg_type
+field to the type.
+For more information see
+.BR cmsg (3).
+.TP
+.B SCM_RIGHTS
+Send or receive a set of open file descriptors from another process.
+The data portion contains an integer array of the file descriptors.
+The passed file descriptors behave as though they have been created with
+.BR dup (2).
+.TP
+.B SCM_CREDENTIALS
+Send or receive Unix credentials.
+This can be used for authentication.
+The credentials are passed as a
+.I struct ucred
+ancillary message.
+
+.in +4n
+.nf
+struct ucred {
+    pid_t pid;    /* process ID of the sending process */
+    uid_t uid;    /* user ID of the sending process */
+    gid_t gid;    /* group ID of the sending process */
+};
+.fi
+.in
+
+The credentials which the sender specifies are checked by the kernel.
+A process with effective user ID 0 is allowed to specify values that do
+not match its own.
+The sender must specify its own process ID (unless it has the capability
+.BR CAP_SYS_ADMIN ),
+its user ID, effective user ID, or saved set-user-ID (unless it has
+.BR CAP_SETUID ),
+and its group ID, effective group ID, or saved set-group-ID
+(unless it has
+.BR CAP_SETGID ).
+To receive a
+.I struct ucred
+message the
+.B SO_PASSCRED
+option must be enabled on the socket.
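A minimal sketch of the receiving side follows, assuming a Linux system with glibc, where _GNU_SOURCE exposes struct ucred and SCM_CREDENTIALS. The helper names enable_passcred() and recv_creds() are hypothetical. The sender here does not attach credentials explicitly; the sketch relies on the kernel supplying them once SO_PASSCRED is enabled on the receiving socket.

```c
#define _GNU_SOURCE             /* for struct ucred and SCM_CREDENTIALS */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: enable SO_PASSCRED on fd.  Returns 0 or -1. */
int enable_passcred(int fd)
{
    int on = 1;                 /* integer boolean flag */

    return setsockopt(fd, SOL_SOCKET, SO_PASSCRED, &on, sizeof(on));
}

/* Hypothetical helper: receive one data byte together with the
 * sender's credentials.  Returns 0 and fills *cred, or -1. */
int recv_creds(int fd, struct ucred *cred)
{
    char data;
    struct iovec iov = { .iov_base = &data, .iov_len = sizeof(data) };
    union {                     /* union keeps the buffer aligned */
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(fd, &msg, 0) <= 0)
        return -1;

    /* Walk the ancillary messages looking for the credentials. */
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SCM_CREDENTIALS) {
            memcpy(cred, CMSG_DATA(cmsg), sizeof(*cred));
            return 0;
        }
    }
    return -1;
}
```

On a socketpair(2) within one process, the returned pid would be expected to match getpid(2).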
+.SH ERRORS
+.TP
+.B EADDRINUSE
+Selected local address is already taken or file system socket
+object already exists.
+.TP
+.B ECONNREFUSED
+.BR connect (2)
+called with a socket object that isn't listening.
+This can happen when
+the remote socket does not exist or the filename is not a socket.
+.TP
+.B ECONNRESET
+Remote socket was unexpectedly closed.
+.TP
+.B EFAULT
+User memory address was not valid.
+.TP
+.B EINVAL
+Invalid argument passed.
+A common cause is that
+.B AF_UNIX
+was not set in the
+.I sun_family
+field of a passed address, or that the socket was in an
+invalid state for the applied operation.
+.TP
+.B EISCONN
+.BR connect (2)
+called on an already connected socket or a target address was
+specified on a connected socket.
+.TP
+.B ENOMEM
+Out of memory.
+.TP
+.B ENOTCONN
+Socket operation needs a target address, but the socket is not connected.
+.TP
+.B EOPNOTSUPP
+Stream operation called on non-stream oriented socket or tried to
+use the out-of-band data option.
+.TP
+.B EPERM
+The sender passed invalid credentials in the
+.IR "struct ucred" .
+.TP
+.B EPIPE
+Remote socket was closed on a stream socket.
+If enabled, a
+.B SIGPIPE
+is sent as well.
+This can be avoided by passing the
+.B MSG_NOSIGNAL
+flag to
+.BR send (2)
+or
+.BR sendmsg (2).
+.TP
+.B EPROTONOSUPPORT
+Passed protocol is not PF_UNIX.
+.TP
+.B EPROTOTYPE
+Remote socket does not match the local socket type
+.RB ( SOCK_DGRAM
+vs.
+.BR SOCK_STREAM ).
+.TP
+.B ESOCKTNOSUPPORT
+Unknown socket type.
+.PP
+Other errors can be generated by the generic socket layer or
+by the file system while generating a file system socket object.
+See the appropriate manual pages for more information.
+.SH VERSIONS
+.B SCM_CREDENTIALS
+and the abstract namespace were introduced with Linux 2.2 and should not
+be used in portable programs.
+(Some BSD-derived systems also support credential passing,
+but the implementation details differ.)
+.SH NOTES
+In the Linux implementation, sockets which are visible in the
+file system honor the permissions of the directory they are in.
+Their owner, group and their permissions can be changed.
+Creation of a new socket will fail if the process does not have write and
+search (execute) permission on the directory the socket is created in.
+Connecting to the socket object requires read/write permission.
+This behavior differs from many BSD-derived systems which
+ignore permissions for Unix sockets.
+Portable programs should not rely on
+this feature for security.
+
+Binding to a socket with a filename creates a socket
+in the file system that must be deleted by the caller when it is no
+longer needed (using
+.BR unlink (2)).
+The usual Unix close-behind semantics apply; the socket can be unlinked
+at any time and will be finally removed from the file system when the last
+reference to it is closed.
+
+To pass file descriptors or credentials over a
+.B SOCK_STREAM
+socket, you need
+to send or receive at least one byte of non-ancillary data in the same
+.BR sendmsg (2)
+or
+.BR recvmsg (2)
+call.
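The one-byte requirement can be illustrated with a pair of helpers that pass a single descriptor using SCM_RIGHTS. This is a sketch under stated assumptions: send_fd() and recv_fd() are hypothetical names, only one descriptor is transferred per call, and the accompanying data byte carries no meaning.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for exactly one descriptor; the union keeps
 * the buffer suitably aligned for struct cmsghdr. */
union fd_ctl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Hypothetical helper: send fd_to_send over sock.  Returns 0 or -1. */
int send_fd(int sock, int fd_to_send)
{
    char data = 0;              /* the one byte of non-ancillary data */
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor from sock.
 * Returns the new descriptor, or -1. */
int recv_fd(int sock)
{
    char data;
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
    return fd;
}
```

The received descriptor is a new number in the receiver that refers to the same open file, much as if it had come from dup(2).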
+
+Unix domain stream sockets do not support the notion of out-of-band data.
+.SH EXAMPLE
+See
+.BR bind (2).
+.SH "SEE ALSO"
+.BR recvmsg (2),
+.BR sendmsg (2),
+.BR socket (2),
+.BR socketpair (2),
+.BR cmsg (3),
+.BR capabilities (7),
+.BR credentials (7),
+.BR socket (7)
--- a/man7/x25.7
+++ b/man7/x25.7
@ -1,8 +1,122 @@
-.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual"
-.TH X25 7 2008-08-07 "Linux" "Linux Programmer's Manual"
+.\" This man page is Copyright (C) 1998 Heiner Eisen.
+.\" Permission is granted to distribute possibly modified copies
+.\" of this page provided the header is included verbatim,
+.\" and in case of nontrivial modification author and date
+.\" of the modification is added to the header.
+.\" $Id: x25.7,v 1.4 1999/05/18 10:35:12 freitag Exp $
+.TH X25 7 1998-12-01 "Linux" "Linux Programmer's Manual"
+.SH NAME
+x25, PF_X25 \- ITU-T X.25 / ISO-8208 protocol interface
+.SH SYNOPSIS
+.B #include <sys/socket.h>
+.br
+.B #include <linux/x25.h>
+.sp
+.B x25_socket = socket(PF_X25, SOCK_SEQPACKET, 0);
+.SH DESCRIPTION
+X25 sockets provide an interface to the X.25 packet layer protocol.
+This allows applications to
+communicate over a public X.25 data network as standardized by
+the International Telecommunication Union's recommendation X.25
+(X.25 DTE-DCE mode).
+X25 sockets can also be used for communication
+without an intermediate X.25 network (X.25 DTE-DTE mode) as described
+in ISO-8208.
+.PP
+Message boundaries are preserved \(em a
+.BR read (2)
+from a socket will
+retrieve the same chunk of data as output with the corresponding
+.BR write (2)
+to the peer socket.
+When necessary, the kernel takes care
+of segmenting and re-assembling long messages by means of
+the X.25 M-bit.
+There is no hard-coded upper limit for the
+message size.
+However, re-assembling of a long message might fail if
+there is a temporary lack of system resources or when other constraints
+(such as socket memory or buffer size limits) become effective.
+If that
+occurs, the X.25 connection will be reset.
+.SS Socket Addresses
+The
+.B AF_X25
+socket address family uses the
+.I struct sockaddr_x25
+for representing network addresses as defined in ITU-T
+recommendation X.121.
+.PP
+.in +4n
+.nf
+struct sockaddr_x25 {
+    sa_family_t sx25_family;    /* must be AF_X25 */
+    x25_address sx25_addr;      /* X.121 Address */
+};
+.fi
+.in
+.PP
+.I sx25_addr
+contains a char array
+.I x25_addr[]
+to be interpreted as a null-terminated string.
+.I sx25_addr.x25_addr[]
+consists of up to 15 (not counting the terminating 0) ASCII
+characters forming the X.121 address.
+Only the decimal digit characters from \(aq0\(aq to \(aq9\(aq are allowed.
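The address constraints above can be checked before the string is copied into sx25_addr.x25_addr[]. The following is a minimal sketch; x121_is_valid() and X121_MAX_DIGITS are hypothetical names, and rejecting the empty string is an assumption of this example rather than something the page states.

```c
#include <stddef.h>

#define X121_MAX_DIGITS 15      /* limit stated above; name is hypothetical */

/* Hypothetical helper: return 1 if s is a usable X.121 address
 * (1 to 15 decimal digits), 0 otherwise. */
int x121_is_valid(const char *s)
{
    size_t n;

    if (s == NULL)
        return 0;

    for (n = 0; s[n] != '\0'; n++) {
        if (n >= X121_MAX_DIGITS)       /* a 16th character is too many */
            return 0;
        if (s[n] < '0' || s[n] > '9')   /* only decimal digits allowed */
            return 0;
    }
    return n > 0;                       /* assumption: empty is rejected */
}
```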
+.SS Socket Options
+The following X.25-specific socket options can be set by using
+.BR setsockopt (2)
+and read with
+.BR getsockopt (2)
+with the
+.I level
+argument set to
+.BR SOL_X25 .
+.TP
+.B X25_QBITINCL
+Controls whether the X.25 Q-bit (Qualified Data Bit) is accessible by the
+user.
+It expects an integer argument.
+If set to 0 (default),
+the Q-bit is never set for outgoing packets and the Q-bit of incoming
+packets is ignored.
+If set to 1, an additional first byte is prepended
+to each message read from or written to the socket.
+For data read from
+the socket, a 0 first byte indicates that the Q-bits of the corresponding
+incoming data packets were not set.
+A first byte with value 1 indicates
+that the Q-bit of the corresponding incoming data packets was set.
+If the first byte of the data written to the socket is 1 the Q-bit of the
+corresponding outgoing data packets will be set.
+If the first byte is 0
+the Q-bit will not be set.
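With X25_QBITINCL set to 1, each message carries one extra leading byte. The framing can be sketched with two small buffer helpers; x25_qbit_frame() and x25_qbit_of() are hypothetical names that operate only on user-space buffers and make no system calls.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build a message for write(2) on a socket with
 * X25_QBITINCL enabled.  Returns the total length, or 0 if the
 * payload plus the leading byte does not fit in out. */
size_t x25_qbit_frame(unsigned char *out, size_t cap, int qbit,
                      const void *payload, size_t len)
{
    if (len >= cap)             /* need len + 1 bytes in total */
        return 0;

    out[0] = qbit ? 1 : 0;      /* first byte selects the Q-bit */
    memcpy(out + 1, payload, len);
    return len + 1;
}

/* Hypothetical helper: inspect a message obtained with read(2).
 * Returns 1 if the Q-bit was set, 0 if not, -1 if msg is empty. */
int x25_qbit_of(const unsigned char *msg, size_t len)
{
    if (len == 0)
        return -1;
    return msg[0] == 1;
}
```

The payload proper starts at offset 1 in both directions, so a reader would skip the first byte before handing the data to the application.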
+.SH VERSIONS
+The PF_X25 protocol family is a new feature of Linux 2.2.
+.SH BUGS
+Plenty, as the X.25 PLP implementation is
+.BR CONFIG_EXPERIMENTAL .
+.PP
+This man page is incomplete.
+.PP
+There is no dedicated application programmer's header file yet;
+you need to include the kernel header file
+.IR <linux/x25.h> .
+.B CONFIG_EXPERIMENTAL
+might also imply that future versions of the
+interface are not binary compatible.
+.PP
+X.25 N-Reset events are not propagated to the user process yet.
+Thus,
+if a reset occurred, data might be lost without notice.
+.SH "SEE ALSO"
+.BR socket (2),
+.BR socket (7)
+.PP
+Jonathan Simon Naylor:
+\(lqThe Re-Analysis and Re-Implementation of X.25.\(rq
+The URL is
+.RS
+.I ftp://ftp.pspt.fi/pub/ham/linux/ax25/x25doc.tgz
+.RE