.\" mirror of https://github.com/mkerrisk/man-pages
.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
|
|
.\"
|
|
.\" %%%LICENSE_START(VERBATIM_ONE_PARA)
|
|
.\" Permission is granted to distribute possibly modified copies
|
|
.\" of this page provided the header is included verbatim,
|
|
.\" and in case of nontrivial modification author and date
|
|
.\" of the modification is added to the header.
|
|
.\" %%%LICENSE_END
|
|
.\"
|
|
.\" 2.4 Updates by Nivedita Singhvi 4/20/02 <nivedita@us.ibm.com>.
|
|
.\" Modified, 2004-11-11, Michael Kerrisk and Andries Brouwer
|
|
.\" Updated details of interaction of TCP_CORK and TCP_NODELAY.
|
|
.\"
|
|
.\" 2008-11-21, mtk, many, many updates.
|
|
.\" The descriptions of /proc files and socket options should now
|
|
.\" be more or less up to date and complete as at Linux 2.6.27
|
|
.\" (other than the remaining FIXMEs in the page source below).
|
|
.\"
|
|
.\" FIXME The following need to be documented
|
|
.\" TCP_CONGESTION (new in 2.6.13)
|
|
.\" commit 5f8ef48d240963093451bcf83df89f1a1364f51d
|
|
.\" Author: Stephen Hemminger <shemminger@osdl.org>
|
|
.\" TCP_MD5SIG (2.6.20)
|
|
.\" commit cfb6eeb4c860592edd123fdea908d23c6ad1c7dc
|
|
.\" Author was yoshfuji@linux-ipv6.org
|
|
.\" Needs CONFIG_TCP_MD5SIG
|
|
.\" From net/inet/Kconfig
|
|
.\" bool "TCP: MD5 Signature Option support (RFC2385) (EXPERIMENTAL)"
|
|
.\" RFC2385 specifies a method of giving MD5 protection to TCP sessions.
|
|
.\" Its main (only?) use is to protect BGP sessions between core routers
|
|
.\" on the Internet.
|
|
.\"
|
|
.\" There is a TCP_MD5SIG option documented in FreeBSD's tcp(4),
|
|
.\" but probably many details are different on Linux
|
|
.\" http://thread.gmane.org/gmane.linux.network/47490
|
|
.\" http://www.daemon-systems.org/man/tcp.4.html
|
|
.\" http://article.gmane.org/gmane.os.netbsd.devel.network/3767/match=tcp_md5sig+freebsd
|
|
.\" TCP_COOKIE_TRANSACTIONS (2.6.33)
|
|
.\" commit 519855c508b9a17878c0977a3cdefc09b59b30df
|
|
.\" Author: William Allen Simpson <william.allen.simpson@gmail.com>
|
|
.\" commit e56fb50f2b7958b931c8a2fc0966061b3f3c8f3a
|
|
.\" Author: William Allen Simpson <william.allen.simpson@gmail.com>
|
|
.\" TCP_THIN_LINEAR_TIMEOUTS (2.6.34)
|
|
.\" commit 36e31b0af58728071e8023cf8e20c5166b700717
|
|
.\" Author: Andreas Petlund <apetlund@simula.no>
|
|
.\" TCP_THIN_DUPACK (2.6.34)
|
|
.\" commit 7e38017557bc0b87434d184f8804cadb102bb903
|
|
.\" Author: Andreas Petlund <apetlund@simula.no>
|
|
.\" TCP_USER_TIMEOUT (new in 2.6.37)
|
|
.\" Author: Jerry Chu <hkchu@google.com>
|
|
.\" commit dca43c75e7e545694a9dd6288553f55c53e2a3a3
|
|
.\" TCP_REPAIR (3.5)
|
|
.\" commit ee9952831cfd0bbe834f4a26489d7dce74582e37
|
|
.\" Author: Pavel Emelyanov <xemul@parallels.com>
|
|
.\" TCP_REPAIR_QUEUE (3.5)
|
|
.\" commit ee9952831cfd0bbe834f4a26489d7dce74582e37
|
|
.\" Author: Pavel Emelyanov <xemul@parallels.com>
|
|
.\" TCP_QUEUE_SEQ (3.5)
|
|
.\" commit ee9952831cfd0bbe834f4a26489d7dce74582e37
|
|
.\" Author: Pavel Emelyanov <xemul@parallels.com>
|
|
.\" TCP_REPAIR_OPTIONS (3.5)
|
|
.\" commit b139ba4e90dccbf4cd4efb112af96a5c9e0b098c
|
|
.\" Author: Pavel Emelyanov <xemul@parallels.com>
|
|
.\"
|
|
.TH TCP 7 2012-04-23 "Linux" "Linux Programmer's Manual"
|
|
.SH NAME
|
|
tcp \- TCP protocol
|
|
.SH SYNOPSIS
|
|
.B #include <sys/socket.h>
|
|
.br
|
|
.B #include <netinet/in.h>
|
|
.br
|
|
.B #include <netinet/tcp.h>
|
|
.sp
|
|
.B tcp_socket = socket(AF_INET, SOCK_STREAM, 0);
|
|
.SH DESCRIPTION
|
|
This is an implementation of the TCP protocol defined in
|
|
RFC\ 793, RFC\ 1122 and RFC\ 2001 with the NewReno and SACK
|
|
extensions.
|
|
It provides a reliable, stream-oriented,
|
|
full-duplex connection between two sockets on top of
|
|
.BR ip (7),
|
|
for both v4 and v6 versions.
|
|
TCP guarantees that the data arrives in order and
|
|
retransmits lost packets.
|
|
It generates and checks a per-packet checksum to catch
|
|
transmission errors.
|
|
TCP does not preserve record boundaries.
|
|
|
|
A newly created TCP socket has no remote or local address and is not
|
|
fully specified.
|
|
To create an outgoing TCP connection, use
|
|
.BR connect (2)
|
|
to establish a connection to another TCP socket.
|
|
To receive new incoming connections, first
|
|
.BR bind (2)
|
|
the socket to a local address and port and then call
|
|
.BR listen (2)
|
|
to put the socket into the listening state.
|
|
After that a new socket for each incoming connection can be accepted using
|
|
.BR accept (2).
|
|
A socket which has had
|
|
.BR accept (2)
|
|
or
|
|
.BR connect (2)
|
|
successfully called on it is fully specified and may transmit data.
|
|
Data cannot be transmitted on listening or not yet connected sockets.
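.PP
The call sequence described above can be sketched in C roughly as follows.
This is a minimal illustration under stated assumptions, not a complete program:
the helper names, the use of the loopback address and
the backlog of 16 are all chosen for the example.
.PP
.in +4n
.nf
```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: build the address 127.0.0.1:port. */
static struct sockaddr_in loopback_addr(unsigned short port)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    return addr;
}

/* Passive side: bind(2), then listen(2).  A port of 0 lets the
   kernel pick one.  Returns the listening descriptor, or -1. */
int tcp_listen_loopback(unsigned short port)
{
    struct sockaddr_in addr = loopback_addr(port);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;
    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1 ||
        listen(fd, 16) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Report the local port of a bound socket, or 0 on error. */
unsigned short tcp_local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    if (getsockname(fd, (struct sockaddr *) &addr, &len) == -1)
        return 0;
    return ntohs(addr.sin_port);
}

/* Active side: connect(2).  Returns the connected descriptor, or -1. */
int tcp_connect_loopback(unsigned short port)
{
    struct sockaddr_in addr = loopback_addr(port);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;
    if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```
.fi
.in
.PP
The descriptor returned by the listening helper is passed to
.BR accept (2);
data is then exchanged on the accepted and the connected descriptors,
never on the listening one.
.PP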
|
|
|
|
Linux supports RFC\ 1323 TCP high performance
|
|
extensions.
|
|
These include Protection Against Wrapped
|
|
Sequence Numbers (PAWS), Window Scaling and Timestamps.
|
|
Window scaling allows the use
|
|
of large (> 64K) TCP windows in order to support links with high
|
|
latency or bandwidth.
|
|
To make use of them, the send and receive buffer sizes must be increased.
|
|
They can be set globally with the
|
|
.I /proc/sys/net/ipv4/tcp_wmem
|
|
and
|
|
.I /proc/sys/net/ipv4/tcp_rmem
|
|
files, or on individual sockets by using the
|
|
.B SO_SNDBUF
|
|
and
|
|
.B SO_RCVBUF
|
|
socket options with the
|
|
.BR setsockopt (2)
|
|
call.
|
|
|
|
The maximum sizes for socket buffers declared via the
|
|
.B SO_SNDBUF
|
|
and
|
|
.B SO_RCVBUF
|
|
mechanisms are limited by the values in the
|
|
.I /proc/sys/net/core/rmem_max
|
|
and
|
|
.I /proc/sys/net/core/wmem_max
|
|
files.
|
|
Note that TCP actually allocates twice the size of
|
|
the buffer requested in the
|
|
.BR setsockopt (2)
|
|
call, and so a succeeding
|
|
.BR getsockopt (2)
|
|
call will not return the same size of buffer as requested in the
|
|
.BR setsockopt (2)
|
|
call.
|
|
TCP uses the extra space for administrative purposes and internal
|
|
kernel structures, and the
|
|
.I /proc
|
|
file values reflect the
|
|
larger sizes compared to the actual TCP windows.
|
|
On individual connections, the socket buffer size must be set prior to the
|
|
.BR listen (2)
|
|
or
|
|
.BR connect (2)
|
|
calls in order to have it take effect.
|
|
See
|
|
.BR socket (7)
|
|
for more information.
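.PP
As a hedged illustration of the buffer sizing described above,
the following sketch requests a receive buffer size and then reads back
what the kernel reports.
The function name
.I tcp_set_rcvbuf
is an assumption made for the example.
.PP
.in +4n
.nf
```c
#include <sys/socket.h>

/* Request a receive buffer of 'bytes' on a TCP socket and return the
   size that getsockopt(2) reports afterward, or -1 on error.
   Call this before connect(2) or listen(2). */
int tcp_set_rcvbuf(int fd, int bytes)
{
    int actual = 0;
    socklen_t len = sizeof(actual);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) == -1)
        return -1;
    return actual;   /* typically larger than 'bytes', as noted above */
}
```
.fi
.in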
|
|
.PP
|
|
TCP supports urgent data.
|
|
Urgent data is used to signal the
|
|
receiver that some important message is part of the data
|
|
stream and that it should be processed as soon as possible.
|
|
To send urgent data, specify the
|
|
.B MSG_OOB
|
|
option to
|
|
.BR send (2).
|
|
When urgent data is received, the kernel sends a
|
|
.B SIGURG
|
|
signal to the process or process group that has been set as the
|
|
socket "owner" using the
|
|
.B SIOCSPGRP
|
|
or
|
|
.B FIOSETOWN
|
|
ioctls (or the POSIX.1-2001-specified
|
|
.BR fcntl (2)
|
|
.B F_SETOWN
|
|
operation).
|
|
When the
|
|
.B SO_OOBINLINE
|
|
socket option is enabled, urgent data is put into the normal
|
|
data stream (a program can test for its location using the
|
|
.B SIOCATMARK
|
|
ioctl described below),
|
|
otherwise it can be received only when the
|
|
.B MSG_OOB
|
|
flag is set for
|
|
.BR recv (2)
|
|
or
|
|
.BR recvmsg (2).
|
|
|
|
Linux 2.4 introduced a number of changes for improved
|
|
throughput and scaling, as well as enhanced functionality.
|
|
Some of these features include support for zero-copy
|
|
.BR sendfile (2),
|
|
Explicit Congestion Notification, new
|
|
management of TIME_WAIT sockets, keep-alive socket options
|
|
and support for Duplicate SACK extensions.
|
|
.SS Address formats
|
|
TCP is built on top of IP (see
|
|
.BR ip (7)).
|
|
The address formats defined by
|
|
.BR ip (7)
|
|
apply to TCP.
|
|
TCP supports point-to-point communication only;
|
|
broadcasting and multicasting are not
|
|
supported.
|
|
.SS /proc interfaces
|
|
System-wide TCP parameter settings can be accessed by files in the directory
|
|
.IR /proc/sys/net/ipv4/ .
|
|
In addition, most IP
|
|
.I /proc
|
|
interfaces also apply to TCP; see
|
|
.BR ip (7).
|
|
Variables described as
|
|
.I Boolean
|
|
take an integer value, with a nonzero value ("true") meaning that
|
|
the corresponding option is enabled, and a zero value ("false")
|
|
meaning that the option is disabled.
|
|
.TP
|
|
.IR tcp_abc " (Integer; default: 0; since Linux 2.6.15)"
|
|
.\" The following is from 2.6.28-rc4: Documentation/networking/ip-sysctl.txt
|
|
Control the Appropriate Byte Count (ABC), defined in RFC 3465.
|
|
ABC is a way of increasing the congestion window
|
|
.RI ( cwnd )
|
|
more slowly in response to partial acknowledgments.
|
|
Possible values are:
|
|
.RS
|
|
.IP 0 3
|
|
increase
|
|
.I cwnd
|
|
once per acknowledgment (no ABC)
|
|
.IP 1
|
|
increase
|
|
.I cwnd
|
|
once per acknowledgment of a full-sized segment
|
|
.IP 2
|
|
allow
.I cwnd
to increase by two if the acknowledgment covers
two segments, to compensate for delayed acknowledgments.
|
|
.RE
|
|
.TP
|
|
.IR tcp_abort_on_overflow " (Boolean; default: disabled; since Linux 2.4)"
|
|
.\" Since 2.3.41
|
|
Enable resetting connections if the listening service is too
|
|
slow and unable to keep up and accept them.
|
|
By default this option is disabled, which means that if the overflow
occurred due to a burst, the connection will recover.
|
|
Enable this option
|
|
.I only
|
|
if you are really sure that the listening daemon
|
|
cannot be tuned to accept connections faster.
|
|
Enabling this option can harm the clients of your server.
|
|
.TP
|
|
.IR tcp_adv_win_scale " (integer; default: 2; since Linux 2.4)"
|
|
.\" Since 2.4.0-test7
|
|
Count buffering overhead as
|
|
.IR "bytes/2^tcp_adv_win_scale" ,
|
|
if
|
|
.I tcp_adv_win_scale
|
|
is greater than 0; or
|
|
.IR "bytes-bytes/2^(\-tcp_adv_win_scale)" ,
|
|
if
|
|
.I tcp_adv_win_scale
|
|
is less than or equal to zero.
|
|
|
|
The socket receive buffer space is shared between the
|
|
application and kernel.
|
|
TCP maintains part of the buffer as
|
|
the TCP window; this is the size of the receive window
|
|
advertised to the other end.
|
|
The rest of the space is used
|
|
as the "application" buffer, used to isolate the network
|
|
from scheduling and application latencies.
|
|
The
|
|
.I tcp_adv_win_scale
|
|
default value of 2 implies that the space
|
|
used for the application buffer is one fourth that of the total.
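.IP
The two cases above can be written out as a small calculation.
This is only a sketch of the arithmetic as stated in this entry;
the function names are assumptions and the code is not taken from the kernel.
.IP
.nf
```c
/* Buffering overhead, in bytes, for a receive buffer of 'bytes' bytes.
   'scale' stands for the tcp_adv_win_scale setting. */
long adv_win_overhead(long bytes, int scale)
{
    if (scale > 0)
        return bytes >> scale;            /* bytes / 2^scale */
    return bytes - (bytes >> -scale);     /* bytes - bytes / 2^(-scale) */
}

/* Space left for the advertised window once the overhead is removed. */
long adv_win_window(long bytes, int scale)
{
    return bytes - adv_win_overhead(bytes, scale);
}
```
.fi
.IP
With a scale of 2, an 87380-byte buffer gives an overhead of 21845 bytes
and leaves 65535 bytes for the window.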
|
|
.TP
|
|
.IR tcp_allowed_congestion_control " (String; default: see text; since Linux 2.4.20)"
|
|
.\" The following is from 2.6.28-rc4: Documentation/networking/ip-sysctl.txt
|
|
Show/set the congestion control algorithm choices available to unprivileged
|
|
processes (see the description of the
|
|
.B TCP_CONGESTION
|
|
socket option).
|
|
The list is a subset of those listed in
|
|
.IR tcp_available_congestion_control .
|
|
.\" FIXME How are the items in this delimited? Null bytes, spaces, commas?
|
|
The default value for this list is "reno" plus the default setting of
|
|
.IR tcp_congestion_control .
|
|
.TP
|
|
.IR tcp_available_congestion_control " (String; read-only; since Linux 2.4.20)"
|
|
.\" The following is from 2.6.28-rc4: Documentation/networking/ip-sysctl.txt
|
|
Show a list of the congestion-control algorithms
|
|
that are registered.
|
|
.\" FIXME How are the items in this delimited? Null bytes, spaces, commas?
|
|
This list is a limiting set for the list in
|
|
.IR tcp_allowed_congestion_control .
|
|
More congestion-control algorithms may be available as modules,
|
|
but not loaded.
|
|
.TP
|
|
.IR tcp_app_win " (integer; default: 31; since Linux 2.4)"
|
|
.\" Since 2.4.0-test7
|
|
This variable defines how many
|
|
bytes of the TCP window are reserved for buffering overhead.
|
|
|
|
Of the window, max(\fIwindow/2^tcp_app_win\fP, mss) bytes
are reserved for the application buffer.
|
|
A value of 0 implies that no amount is reserved.
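.IP
The reservation can be illustrated with a short calculation.
The sketch assumes that the parenthesized expression above is to be read as
max(window/2^tcp_app_win, mss);
the function name is hypothetical and the code is not the kernel's.
.IP
.nf
```c
/* Bytes of the window reserved for the application buffer. */
unsigned long app_win_reserved(unsigned long window,
                               unsigned int tcp_app_win,
                               unsigned long mss)
{
    unsigned long share;

    if (tcp_app_win == 0)
        return 0;                 /* 0 means nothing is reserved */
    /* Shifting by the width of the type or more is undefined in C. */
    if (tcp_app_win >= sizeof(window) * 8)
        share = 0;
    else
        share = window >> tcp_app_win;
    return share > mss ? share : mss;
}
```
.fi
.IP
With the default of 31 the shifted term is 0 for any realistic window,
so under this reading the reservation is one MSS.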
|
|
.\"
|
|
.\" The following is from 2.6.28-rc4: Documentation/networking/ip-sysctl.txt
|
|
.TP
|
|
.IR tcp_base_mss " (Integer; default: 512; since Linux 2.6.17)"
|
|
The initial value of
|
|
.I search_low
|
|
to be used by the packetization layer Path MTU discovery (MTU probing).
|
|
If MTU probing is enabled,
|
|
this is the initial MSS used by the connection.
|
|
.\"
|
|
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
|
.TP
|
|
.IR tcp_bic " (Boolean; default: disabled; Linux 2.4.27/2.6.6 to 2.6.13)"
|
|
Enable BIC TCP congestion control algorithm.
|
|
BIC-TCP is a sender-side only change that ensures a linear RTT
|
|
fairness under large windows while offering both scalability and
|
|
bounded TCP-friendliness.
|
|
The protocol combines two schemes
|
|
called additive increase and binary search increase.
|
|
When the congestion window is large, additive increase with a large
|
|
increment ensures linear RTT fairness as well as good scalability.
|
|
Under small congestion windows, binary search
|
|
increase provides TCP friendliness.
|
|
.\"
|
|
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
|
.TP
|
|
.IR tcp_bic_low_window " (integer; default: 14; Linux 2.4.27/2.6.6 to 2.6.13)"
|
|
Set the threshold window (in packets) where BIC TCP starts to
|
|
adjust the congestion window.
|
|
Below this threshold BIC TCP behaves the same as the default TCP Reno.
|
|
.\"
|
|
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
|
.TP
|
|
.IR tcp_bic_fast_convergence " (Boolean; default: enabled; Linux 2.4.27/2.6.6 to 2.6.13)"
|
|
Force BIC TCP to more quickly respond to changes in congestion window.
|
|
Allows two flows sharing the same connection to converge more rapidly.
|
|
.TP
|
|
.IR tcp_congestion_control " (String; default: see text; since Linux 2.4.13)"
|
|
.\" The following is from 2.6.28-rc4: Documentation/networking/ip-sysctl.txt
|
|
Set the default congestion-control algorithm to be used for new connections.
|
|
The algorithm "reno" is always available,
|
|
but additional choices may be available depending on kernel configuration.
|
|
The default value for this file is set as part of kernel configuration.
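.IP
A program can inspect the current setting by reading the file directly.
The following is a minimal sketch; the helper name
.I read_first_word
and the 64-byte buffer are assumptions made for the example.
.IP
.nf
```c
#include <stdio.h>

/* Read the first whitespace-delimited word of 'path' into 'buf'.
   Returns 0 on success, -1 if the file cannot be read. */
int read_first_word(const char *path, char buf[64])
{
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = fscanf(f, "%63s", buf) == 1;   /* at most 63 chars plus NUL */
    fclose(f);
    return ok ? 0 : -1;
}
```
.fi
.IP
For instance, passing
.I /proc/sys/net/ipv4/tcp_congestion_control
as the path would be expected to yield the name of the default algorithm.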
|
|
.TP
|
|
.IR tcp_dma_copybreak " (integer; default: 4096; since Linux 2.6.24)"
|
|
Lower limit, in bytes, of the size of socket reads that will be
|
|
offloaded to a DMA copy engine, if one is present in the system
|
|
and the kernel was configured with the
|
|
.B CONFIG_NET_DMA
|
|
option.
|
|
.TP
|
|
.IR tcp_dsack " (Boolean; default: enabled; since Linux 2.4)"
|
|
.\" Since 2.4.0-test7
|
|
Enable RFC\ 2883 TCP Duplicate SACK support.
|
|
.TP
|
|
.IR tcp_ecn " (Boolean; default: disabled; since Linux 2.4)"
|
|
.\" Since 2.4.0-test7
|
|
Enable RFC\ 3168 Explicit Congestion Notification.
|
|
When enabled, connectivity to some
|
|
destinations could be affected due to older, misbehaving
|
|
routers along the path causing connections to be dropped.
|
|
.TP
|
|
.IR tcp_fack " (Boolean; default: enabled; since Linux 2.2)"
|
|
.\" Since 2.1.92
|
|
Enable TCP Forward Acknowledgement support.
|
|
.TP
|
|
.IR tcp_fin_timeout " (integer; default: 60; since Linux 2.2)"
|
|
.\" Since 2.1.53
|
|
This specifies how many seconds to wait for a final FIN packet before the
|
|
socket is forcibly closed.
|
|
This is strictly a violation of the TCP specification,
|
|
but required to prevent denial-of-service attacks.
|
|
In Linux 2.2, the default value was 180.
|
|
.\"
|
|
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
|
.TP
|
|
.IR tcp_frto " (integer; default: 0; since Linux 2.4.21/2.6)"
|
|
.\" Since 2.4.21/2.5.43
|
|
Enable F-RTO, an enhanced recovery algorithm for TCP retransmission
|
|
timeouts (RTOs).
|
|
It is particularly beneficial in wireless environments
|
|
where packet loss is typically due to random radio interference
|
|
rather than intermediate router congestion.
|
|
See RFC 4138 for more details.
|
|
|
|
This file can have one of the following values:
|
|
.RS
|
|
.IP 0 3
|
|
Disabled.
|
|
.IP 1
|
|
The basic version of the F-RTO algorithm is enabled.
|
|
.IP 2
|
|
Enable SACK-enhanced F-RTO if flow uses SACK.
|
|
The basic version can also be used when
SACK is in use, though in that case scenarios exist where F-RTO
interacts badly with the packet counting of the SACK-enabled TCP flow.
|
|
.RE
|
|
.IP
|
|
Before Linux 2.6.22, this parameter was a Boolean value,
|
|
supporting just values 0 and 1 above.
|
|
.TP
|
|
.IR tcp_frto_response " (integer; default: 0; since Linux 2.6.22)"
|
|
When F-RTO has detected that a TCP retransmission timeout was spurious
|
|
(i.e., the timeout would have been avoided had TCP set a
|
|
longer retransmission timeout),
|
|
TCP has several options concerning what to do next.
|
|
Possible values are:
|
|
.RS
|
|
.IP 0 3
|
|
Rate halving based; a smooth and conservative response,
|
|
results in halved congestion window
|
|
.RI ( cwnd )
|
|
and slow-start threshold
|
|
.RI ( ssthresh )
|
|
after one RTT.
|
|
.IP 1
|
|
Very conservative response; not recommended because,
although valid, it interacts poorly with the rest of Linux TCP; halves
|
|
.I cwnd
|
|
and
|
|
.I ssthresh
|
|
immediately.
|
|
.IP 2
|
|
Aggressive response; undoes congestion-control measures
|
|
that are now known to be unnecessary
|
|
(ignoring the possibility of a lost retransmission that would require
|
|
TCP to be more cautious);
|
|
.I cwnd
|
|
and
|
|
.I ssthresh
|
|
are restored to the values prior to timeout.
|
|
.RE
|
|
.TP
|
|
.IR tcp_keepalive_intvl " (integer; default: 75; since Linux 2.4)"
|
|
.\" Since 2.3.18
|
|
The number of seconds between TCP keep-alive probes.
|
|
.TP
|
|
.IR tcp_keepalive_probes " (integer; default: 9; since Linux 2.2)"
|
|
.\" Since 2.1.43
|
|
The maximum number of TCP keep-alive probes to send
|
|
before giving up and killing the connection if
|
|
no response is obtained from the other end.
|
|
.TP
|
|
.IR tcp_keepalive_time " (integer; default: 7200; since Linux 2.2)"
|
|
.\" Since 2.1.43
|
|
The number of seconds a connection needs to be idle
|
|
before TCP begins sending out keep-alive probes.
|
|
Keep-alives are sent only when the
|
|
.B SO_KEEPALIVE
|
|
socket option is enabled.
|
|
The default value is 7200 seconds (2 hours).
|
|
An idle connection is terminated after
|
|
approximately an additional 11 minutes (9 probes at intervals
of 75 seconds) when keep-alive is enabled.
|
|
|
|
Note that underlying connection tracking mechanisms and
|
|
application timeouts may be much shorter.
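.IP
The timing above is simple arithmetic over the three keep-alive settings.
The sketch below shows that calculation together with the
.B SO_KEEPALIVE
call that switches keep-alives on for a socket;
both function names are assumptions made for the example.
.IP
.nf
```c
#include <sys/socket.h>

/* Enable keep-alive probes on a socket.  Returns 0 or -1. */
int enable_keepalive(int fd)
{
    int on = 1;

    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}

/* Seconds from the last activity until an unresponsive peer is
   dropped: the idle time, then 'probes' probes 'intvl' seconds apart. */
long keepalive_drop_seconds(long idle, long probes, long intvl)
{
    return idle + probes * intvl;
}
```
.fi
.IP
With the defaults (7200, 9 and 75) the result is 7875 seconds:
two hours of idle time plus 675 seconds of probing.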
|
|
.\"
|
|
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
|
.TP
|
|
.IR tcp_low_latency " (Boolean; default: disabled; since Linux 2.4.21/2.6)"
|
|
.\" Since 2.4.21/2.5.60
|
|
If enabled, the TCP stack makes decisions that prefer lower
|
|
latency as opposed to higher throughput.
|
|
If this option is disabled, then higher throughput is preferred.
|
|
An example of an application where this default should be
|
|
changed would be a Beowulf compute cluster.
|
|
.TP
|
|
.IR tcp_max_orphans " (integer; default: see below; since Linux 2.4)"
|
|
.\" Since 2.3.41
|
|
The maximum number of orphaned (not attached to any user file
|
|
handle) TCP sockets allowed in the system.
|
|
When this number is exceeded,
|
|
the orphaned connection is reset and a warning is printed.
|
|
This limit exists only to prevent simple denial-of-service attacks.
|
|
Lowering this limit is not recommended.
|
|
Network conditions might require you to increase the number of
|
|
orphans allowed, but note that each orphan can eat up to ~64K
|
|
of unswappable memory.
|
|
The default initial value is set equal to the kernel parameter NR_FILE.
|
|
This initial default is adjusted depending on the memory in the system.
|
|
.TP
|
|
.IR tcp_max_syn_backlog " (integer; default: see below; since Linux 2.2)"
|
|
.\" Since 2.1.53
|
|
The maximum number of queued connection requests which have
|
|
still not received an acknowledgement from the connecting client.
|
|
If this number is exceeded, the kernel will begin
|
|
dropping requests.
|
|
The default value of 256 is increased to
|
|
1024 when the memory present in the system is adequate or
|
|
greater (>= 128MB), and reduced to 128 for those systems with
very low memory (<= 32MB).
|
|
It is recommended that if this
|
|
needs to be increased above 1024, TCP_SYNQ_HSIZE in
|
|
.I include/net/tcp.h
|
|
be modified to keep
|
|
TCP_SYNQ_HSIZE*16<=tcp_max_syn_backlog, and the kernel be
|
|
recompiled.
|
|
.TP
|
|
.IR tcp_max_tw_buckets " (integer; default: see below; since Linux 2.4)"
|
|
.\" Since 2.3.41
|
|
The maximum number of sockets in TIME_WAIT state allowed in
|
|
the system.
|
|
This limit exists only to prevent simple denial-of-service attacks.
|
|
The default value of NR_FILE*2 is adjusted
|
|
depending on the memory in the system.
|
|
If this number is
|
|
exceeded, the socket is closed and a warning is printed.
|
|
.TP
|
|
.IR tcp_moderate_rcvbuf " (Boolean; default: enabled; since Linux 2.4.17/2.6.7)"
|
|
.\" The following is from 2.6.28-rc4: Documentation/networking/ip-sysctl.txt
|
|
If enabled, TCP performs receive buffer auto-tuning,
|
|
attempting to automatically size the buffer (no greater than
|
|
.IR tcp_rmem[2] )
|
|
to match the size required by the path for full throughput.
|
|
.TP
|
|
.IR tcp_mem " (since Linux 2.4)"
|
|
.\" Since 2.4.0-test7
|
|
This is a vector of 3 integers: [low, pressure, high].
|
|
These bounds, measured in units of the system page size,
|
|
are used by TCP to track its memory usage.
|
|
The defaults are calculated at boot time from the amount of
|
|
available memory.
|
|
(TCP can only use
|
|
.I "low memory"
|
|
for this, which is limited to around 900 megabytes on 32-bit systems.
|
|
64-bit systems do not suffer this limitation.)
|
|
.RS
|
|
.TP 10
|
|
.I low
|
|
TCP doesn't regulate its memory allocation when the number
|
|
of pages it has allocated globally is below this number.
|
|
.TP
|
|
.I pressure
|
|
When the amount of memory allocated by TCP
|
|
exceeds this number of pages, TCP moderates its memory consumption.
|
|
This memory pressure state is exited
|
|
once the number of pages allocated falls below
|
|
the
|
|
.I low
|
|
mark.
|
|
.TP
|
|
.I high
|
|
The maximum number of pages, globally, that TCP will allocate.
|
|
This value overrides any other limits imposed by the kernel.
|
|
.RE
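.IP
The
.I low
and
.I pressure
marks form a hysteresis: the pressure state is entered above one mark
and left only below the other.
A minimal sketch of that rule, using a hypothetical function name
and not the kernel's own code, could look like this:
.IP
.nf
```c
/* Returns 1 if TCP is in the memory pressure state after observing
   'allocated' pages, 0 otherwise.  'was_pressure' is the state
   before the observation. */
int tcp_mem_pressure(int was_pressure, long allocated,
                     long low, long pressure)
{
    if (allocated > pressure)
        return 1;             /* above the pressure mark: enter */
    if (allocated < low)
        return 0;             /* below the low mark: leave */
    return was_pressure;      /* in between: state is unchanged */
}
```
.fi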
|
|
.TP
|
|
.IR tcp_mtu_probing " (integer; default: 0; since Linux 2.6.17)"
|
|
.\" The following is from 2.6.28-rc4: Documentation/networking/ip-sysctl.txt
|
|
This parameter controls TCP Packetization-Layer Path MTU Discovery.
|
|
The following values may be assigned to the file:
|
|
.RS
|
|
.IP 0 3
|
|
Disabled
|
|
.IP 1
|
|
Disabled by default, enabled when an ICMP black hole is detected
|
|
.IP 2
|
|
Always enabled, use initial MSS of
|
|
.IR tcp_base_mss .
|
|
.RE
|
|
.TP
|
|
.IR tcp_no_metrics_save " (Boolean; default: disabled; since Linux 2.6.6)"
|
|
.\" The following is from 2.6.28-rc4: Documentation/networking/ip-sysctl.txt
|
|
By default, TCP saves various connection metrics in the route cache
|
|
when the connection closes, so that connections established in the
|
|
near future can use these to set initial conditions.
|
|
Usually, this increases overall performance,
|
|
but it may sometimes cause performance degradation.
|
|
If
|
|
.I tcp_no_metrics_save
|
|
is enabled, TCP will not cache metrics on closing connections.
|
|
.TP
|
|
.IR tcp_orphan_retries " (integer; default: 8; since Linux 2.4)"
|
|
.\" Since 2.3.41
|
|
The maximum number of attempts made to probe the other
|
|
end of a connection which has been closed by our end.
|
|
.TP
|
|
.IR tcp_reordering " (integer; default: 3; since Linux 2.4)"
|
|
.\" Since 2.4.0-test7
|
|
The maximum a packet can be reordered in a TCP packet stream
|
|
without TCP assuming packet loss and going into slow start.
|
|
It is not advisable to change this number.
|
|
This is a packet reordering detection metric designed to
|
|
minimize unnecessary back off and retransmits provoked by
|
|
reordering of packets on a connection.
|
|
.TP
|
|
.IR tcp_retrans_collapse " (Boolean; default: enabled; since Linux 2.2)"
|
|
.\" Since 2.1.96
|
|
Try to send full-sized packets during retransmit.
|
|
.TP
|
|
.IR tcp_retries1 " (integer; default: 3; since Linux 2.2)"
|
|
.\" Since 2.1.43
|
|
The number of times TCP will attempt to retransmit a
|
|
packet on an established connection normally,
|
|
without the extra effort of getting the network layers involved.
|
|
Once we exceed this number of
|
|
retransmits, we first have the network layer
|
|
update the route if possible before each new retransmit.
|
|
The default is the RFC specified minimum of 3.
|
|
.TP
|
|
.IR tcp_retries2 " (integer; default: 15; since Linux 2.2)"
|
|
.\" Since 2.1.43
|
|
The maximum number of times a TCP packet is retransmitted
|
|
in established state before giving up.
|
|
The default value is 15, which corresponds to a duration of
|
|
approximately 13 to 30 minutes, depending
|
|
on the retransmission timeout.
|
|
The RFC\ 1122 specified
|
|
minimum limit of 100 seconds is typically deemed too short.
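.IP
How a retry count turns into a duration can be illustrated by summing
an exponentially backed-off timeout.
The sketch below is an approximation under stated assumptions:
the timeout is taken to start at a floor, double after every transmission
and stop growing at a ceiling.
The floor of 200 ms and ceiling of 120 s used in the note are assumed values,
not taken from this page, and real timeouts depend on the measured round-trip time.
.IP
.nf
```c
/* Total time, in milliseconds, spent waiting across the initial
   transmission and 'retries' retransmissions. */
long retrans_total_ms(int retries, long rto_min_ms, long rto_max_ms)
{
    long rto = rto_min_ms;
    long total = 0;
    int i;

    for (i = 0; i <= retries; i++) {
        total += rto;
        /* Double the timeout, but never beyond the ceiling. */
        rto = rto * 2 > rto_max_ms ? rto_max_ms : rto * 2;
    }
    return total;
}
```
.fi
.IP
Under these assumptions, 15 retries add up to 924600 ms, or about 15 minutes,
which falls inside the range given above.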
|
|
.TP
|
|
.IR tcp_rfc1337 " (Boolean; default: disabled; since Linux 2.2)"
|
|
.\" Since 2.1.90
|
|
Enable TCP behavior conformant with RFC\ 1337.
|
|
When disabled,
|
|
if a RST is received in TIME_WAIT state, we close
|
|
the socket immediately without waiting for the end
|
|
of the TIME_WAIT period.
|
|
.TP
|
|
.IR tcp_rmem " (since Linux 2.4)"
|
|
.\" Since 2.4.0-test7
|
|
This is a vector of 3 integers: [min, default, max].
|
|
These parameters are used by TCP to regulate receive buffer sizes.
|
|
TCP dynamically adjusts the size of the
|
|
receive buffer from the defaults listed below, in the range
|
|
of these values, depending on memory available in the system.
|
|
.RS
|
|
.TP 10
|
|
.I min
|
|
Minimum size of the receive buffer used by each TCP socket.
|
|
The default value is the system page size.
|
|
(On Linux 2.4, the default value is 4K, lowered to
|
|
.B PAGE_SIZE
|
|
bytes in low-memory systems.)
|
|
This value
|
|
is used to ensure that in memory pressure mode,
|
|
allocations below this size will still succeed.
|
|
This is not
|
|
used to bound the size of the receive buffer declared
|
|
using
|
|
.B SO_RCVBUF
|
|
on a socket.
|
|
.TP
|
|
.I default
|
|
The default size of the receive buffer for a TCP socket.
|
|
This value overwrites the initial default buffer size from
|
|
the generic global
|
|
.I net.core.rmem_default
|
|
defined for all protocols.
|
|
The default value is 87380 bytes.
|
|
(On Linux 2.4, this will be lowered to 43689 in low-memory systems.)
|
|
If larger receive buffer sizes are desired, this value should
|
|
be increased (to affect all sockets).
|
|
To employ large TCP windows,
.I net.ipv4.tcp_window_scaling
must be enabled (the default).
|
|
.TP
|
|
.I max
|
|
The maximum size of the receive buffer used by each TCP socket.
|
|
This value does not override the global
|
|
.IR net.core.rmem_max .
|
|
This is not used to limit the size of the receive buffer declared using
|
|
.B SO_RCVBUF
|
|
on a socket.
|
|
The default value is calculated using the formula
|
|
|
|
max(87380, min(4MB, \fItcp_mem\fP[1]*PAGE_SIZE/128))
|
|
|
|
(On Linux 2.4, the default is 87380*2 bytes,
|
|
lowered to 87380 in low-memory systems).
|
|
.RE
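.IP
The formula for the default
.I max
value can be evaluated as in the sketch below.
The function name is hypothetical, and "4MB" is assumed here to mean
4 * 1024 * 1024 bytes.
.IP
.nf
```c
/* Default for the max field of tcp_rmem, in bytes.
   'pressure_pages' stands for tcp_mem[1]. */
unsigned long long tcp_rmem_max_default(unsigned long long pressure_pages,
                                        unsigned long long page_size)
{
    const unsigned long long floor_bytes = 87380ULL;
    const unsigned long long cap_bytes = 4ULL * 1024 * 1024;
    unsigned long long scaled = pressure_pages * page_size / 128;
    unsigned long long capped = scaled < cap_bytes ? scaled : cap_bytes;

    return capped > floor_bytes ? capped : floor_bytes;
}
```
.fi
.IP
For example, with 4096-byte pages a
.I tcp_mem[1]
of 100000 pages gives 3200000 bytes, while very small or very large
values are clamped to 87380 and 4194304 bytes respectively.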
|
|
.TP
|
|
.IR tcp_sack " (Boolean; default: enabled; since Linux 2.2)"
|
|
.\" Since 2.1.36
|
|
Enable RFC\ 2018 TCP Selective Acknowledgements.
|
|
.TP
|
|
.IR tcp_slow_start_after_idle " (Boolean; default: enabled; since Linux 2.6.18)"
|
|
.\" The following is from 2.6.28-rc4: Documentation/networking/ip-sysctl.txt
|
|
If enabled, provide RFC 2861 behavior and time out the congestion
|
|
window after an idle period.
|
|
An idle period is defined as the current RTO (retransmission timeout).
|
|
If disabled, the congestion window will not
|
|
be timed out after an idle period.
|
|
.TP
|
|
.IR tcp_stdurg " (Boolean; default: disabled; since Linux 2.2)"
|
|
.\" Since 2.1.44
|
|
If this option is enabled, then use the RFC\ 1122 interpretation
|
|
of the TCP urgent-pointer field.
|
|
.\" RFC 793 was ambiguous in its specification of the meaning of the
|
|
.\" urgent pointer. RFC 1122 (and RFC 961) fixed on a particular
|
|
.\" resolution of this ambiguity (unfortunately the "wrong" one).
|
|
According to this interpretation, the urgent pointer points
|
|
to the last byte of urgent data.
|
|
If this option is disabled, then use the BSD-compatible interpretation of
|
|
the urgent pointer:
|
|
the urgent pointer points to the first byte after the urgent data.
|
|
Enabling this option may lead to interoperability problems.
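.IP
The difference between the two interpretations is an off-by-one in where
the urgent data ends.
A small sketch, with a hypothetical function name, makes this explicit:
.IP
.nf
```c
/* Offset, relative to the segment's sequence number, of the last byte
   of urgent data.  'urg_ptr' is the urgent-pointer field (nonzero) and
   'stdurg' mirrors the tcp_stdurg setting. */
unsigned int last_urgent_offset(unsigned int urg_ptr, int stdurg)
{
    /* RFC 1122: the pointer is the last urgent byte.
       BSD: the pointer is the first byte after the urgent data. */
    return stdurg ? urg_ptr : urg_ptr - 1;
}
```
.fi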
|
|
.TP
|
|
.IR tcp_syn_retries " (integer; default: 5; since Linux 2.2)"
|
|
.\" Since 2.1.38
|
|
The maximum number of times initial SYNs for an active TCP
|
|
connection attempt will be retransmitted.
|
|
This value should not be higher than 255.
|
|
The default value is 5, which corresponds to approximately 180 seconds.
|
|
.TP
|
|
.IR tcp_synack_retries " (integer; default: 5; since Linux 2.2)"
|
|
.\" Since 2.1.38
|
|
The maximum number of times a SYN/ACK segment
|
|
for a passive TCP connection will be retransmitted.
|
|
This number should not be higher than 255.
|
|
.TP
|
|
.IR tcp_syncookies " (Boolean; since Linux 2.2)"
|
|
.\" Since 2.1.43
|
|
Enable TCP syncookies.
|
|
The kernel must be compiled with
|
|
.BR CONFIG_SYN_COOKIES .
|
|
Send out syncookies when the syn backlog queue of a socket overflows.
|
|
The syncookies feature attempts to protect a
|
|
socket from a SYN flood attack.
|
|
This should be used as a last resort, if at all.
|
|
This is a violation of the TCP protocol,
|
|
and conflicts with other areas of TCP such as TCP extensions.
|
|
It can cause problems for clients and relays.
|
|
It is not recommended as a tuning mechanism for heavily
|
|
loaded servers to help with overloaded or misconfigured conditions.
|
|
For recommended alternatives see
|
|
.IR tcp_max_syn_backlog ,
|
|
.IR tcp_synack_retries ,
|
|
and
|
|
.IR tcp_abort_on_overflow .
|
|
.TP
|
|
.IR tcp_timestamps " (Boolean; default: enabled; since Linux 2.2)"
|
|
.\" Since 2.1.36
|
|
Enable RFC\ 1323 TCP timestamps.
|
|
.TP
|
|
.IR tcp_tso_win_divisor " (integer; default: 3; since Linux 2.6.9)"
|
|
This parameter controls what percentage of the congestion window
|
|
can be consumed by a single TCP Segmentation Offload (TSO) frame.
|
|
The setting of this parameter is a tradeoff between burstiness and
|
|
building larger TSO frames.
|
|
.TP
|
|
.IR tcp_tw_recycle " (Boolean; default: disabled; since Linux 2.4)"
|
|
.\" Since 2.3.15
|
|
Enable fast recycling of TIME_WAIT sockets.
|
|
Enabling this option is not
|
|
recommended since this causes problems when working
|
|
with NAT (Network Address Translation).
|
|
.\"
|
|
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
|
.TP
|
|
.IR tcp_tw_reuse " (Boolean; default: disabled; since Linux 2.4.19/2.6)"
|
|
.\" Since 2.4.19/2.5.43
|
|
Allow reuse of TIME_WAIT sockets for new connections when it is
safe from the protocol viewpoint.
|
|
It should not be changed without the advice or request of technical experts.
|
|
.\"
|
|
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
|
.TP
|
|
.IR tcp_vegas_cong_avoid " (Boolean; default: disabled; Linux 2.2 to 2.6.13)"
|
|
.\" Since 2.1.8; removed in 2.6.13
|
|
Enable TCP Vegas congestion avoidance algorithm.
|
|
TCP Vegas is a sender-side only change to TCP that anticipates
|
|
the onset of congestion by estimating the bandwidth.
|
|
TCP Vegas adjusts the sending rate by modifying the congestion window.
|
|
TCP Vegas should provide less packet loss, but it is
|
|
not as aggressive as TCP Reno.
|
|
.\"
|
|
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
|
.TP
|
|
.IR tcp_westwood " (Boolean; default: disabled; Linux 2.4.26/2.6.3 to 2.6.13)"
|
|
Enable TCP Westwood+ congestion control algorithm.
|
|
TCP Westwood+ is a sender-side only modification of the TCP Reno
|
|
protocol stack that optimizes the performance of TCP congestion control.
|
|
It is based on end-to-end bandwidth estimation to set
|
|
the congestion window and slow start threshold after a congestion episode.
|
|
Using this estimation, TCP Westwood+ adaptively sets a
|
|
slow start threshold and a congestion window which take into
|
|
account the bandwidth used at the time congestion is experienced.
|
|
TCP Westwood+ significantly increases fairness with respect to
|
|
TCP Reno in wired networks and throughput over wireless links.
|
|
.TP
|
|
.IR tcp_window_scaling " (Boolean; default: enabled; since Linux 2.2)"
|
|
.\" Since 2.1.36
|
|
Enable RFC\ 1323 TCP window scaling.
|
|
This feature allows the use of a large window
|
|
(> 64K) on a TCP connection, should the other end support it.
|
|
Normally, the 16-bit window length field in the TCP header
|
|
limits the window size to less than 64K bytes.
|
|
If larger windows are desired, applications can increase the size of
|
|
their socket buffers and the window scaling option will be employed.
|
|
If
|
|
.I tcp_window_scaling
|
|
is disabled, TCP will not negotiate the use of window
|
|
scaling with the other end during connection setup.
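.IP
As a minimal sketch, a program could check a Boolean setting such as
.I /proc/sys/net/ipv4/tcp_window_scaling
by reading a single integer from the file.
The helper name
.I read_sysctl_int
is hypothetical and is not part of any library.
.nf
```c
#include <stdio.h>

/* Hypothetical helper: read one integer from a sysctl-style file.
 * Returns 0 and stores the value in *out on success, -1 on failure. */
int read_sysctl_int(const char *path, int *out)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    /* A Boolean sysctl file holds a single decimal number. */
    int ok = (fscanf(fp, "%d", out) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}
```
.fi
A caller would pass the full path of the setting and treat a nonzero
result in
.I *out
as enabled.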
|
|
.TP
|
|
.IR tcp_wmem " (since Linux 2.4)"
|
|
.\" Since 2.4.0-test7
|
|
This is a vector of 3 integers: [min, default, max].
|
|
These parameters are used by TCP to regulate send buffer sizes.
|
|
TCP dynamically adjusts the size of the send buffer from the
|
|
default values listed below, in the range of these values,
|
|
depending on memory available.
|
|
.RS
|
|
.TP 10
|
|
.I min
|
|
Minimum size of the send buffer used by each TCP socket.
|
|
The default value is the system page size.
|
|
(On Linux 2.4, the default value is 4K bytes.)
|
|
This value is used to ensure that in memory pressure mode,
|
|
allocations below this size will still succeed.
|
|
This is not used to bound the size of the send buffer declared using
|
|
.B SO_SNDBUF
|
|
on a socket.
|
|
.TP
|
|
.I default
|
|
The default size of the send buffer for a TCP socket.
|
|
This value overrides the initial default buffer size from
|
|
the generic global
|
|
.I /proc/sys/net/core/wmem_default
|
|
defined for all protocols.
|
|
The default value is 16K bytes.
|
|
.\" True in Linux 2.4 and 2.6
|
|
If larger send buffer sizes are desired, this value
|
|
should be increased (to affect all sockets).
|
|
To employ large TCP windows, the
|
|
.I /proc/sys/net/ipv4/tcp_window_scaling
|
|
must be set to a nonzero value (default).
|
|
.TP
|
|
.I max
|
|
The maximum size of the send buffer used by each TCP socket.
|
|
This value does not override the value in
|
|
.IR /proc/sys/net/core/wmem_max .
|
|
This is not used to limit the size of the send buffer declared using
|
|
.B SO_SNDBUF
|
|
on a socket.
|
|
The default value is calculated using the formula
|
|
|
|
max(65536, min(4MB, \fItcp_mem\fP[1]*PAGE_SIZE/128))
|
|
|
|
(On Linux 2.4, the default value is 128K bytes,
lowered to 64K on low-memory systems.)
|
|
.RE
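.IP
The formula for the default
.I max
value can be written out as a small function.
This is only a sketch of the documented arithmetic, not kernel code;
the name
.I default_wmem_max
and its parameters are hypothetical, and
.I tcp_mem_pressure
stands for
.IR tcp_mem [1],
measured in pages.
.nf
```c
/* Hypothetical helper mirroring the documented formula:
 *   max(65536, min(4MB, tcp_mem[1] * PAGE_SIZE / 128))
 * tcp_mem_pressure is tcp_mem[1], expressed in pages. */
unsigned long long default_wmem_max(unsigned long long tcp_mem_pressure,
                                    unsigned long long page_size)
{
    const unsigned long long four_mb = 4ULL * 1024 * 1024;
    unsigned long long v = tcp_mem_pressure * page_size / 128;

    if (v > four_mb)    /* min(4MB, ...) */
        v = four_mb;
    if (v < 65536)      /* max(65536, ...) */
        v = 65536;
    return v;
}
```
.fi
For example, with 4096-byte pages and
.IR tcp_mem [1]
equal to 4096 pages, the formula gives 4096*4096/128 = 131072 bytes,
which lies between the two bounds.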
|
|
.TP
|
|
.IR tcp_workaround_signed_windows " (Boolean; default: disabled; since Linux 2.6.26)"
|
|
If enabled, assume that no receipt of a window-scaling option means that the
|
|
remote TCP is broken and treats the window as a signed quantity.
|
|
If disabled, assume that the remote TCP is not broken even if we do
|
|
not receive a window scaling option from it.
|
|
.SS Socket options
|
|
To set or get a TCP socket option, call
|
|
.BR getsockopt (2)
|
|
to read or
|
|
.BR setsockopt (2)
|
|
to write the option with the option level argument set to
|
|
.BR IPPROTO_TCP .
|
|
.\" or SOL_TCP on Linux
|
|
In addition,
|
|
most
|
|
.B IPPROTO_IP
|
|
socket options are valid on TCP sockets.
|
|
For more information see
|
|
.BR ip (7).
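.PP
The calling convention described above can be sketched with two small
wrappers for integer-valued options.
The wrapper names are hypothetical; only
.BR setsockopt (2),
.BR getsockopt (2),
and the
.B IPPROTO_TCP
level come from the text.
.nf
```c
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_* option names */
#include <sys/socket.h>

/* Hypothetical wrapper: set an integer TCP option. 0 on success, -1 on error. */
int set_tcp_option(int fd, int optname, int value)
{
    return setsockopt(fd, IPPROTO_TCP, optname, &value, sizeof(value));
}

/* Hypothetical wrapper: read an integer TCP option into *value. */
int get_tcp_option(int fd, int optname, int *value)
{
    socklen_t len = sizeof(*value);
    return getsockopt(fd, IPPROTO_TCP, optname, value, &len);
}
```
.fi
For instance,
.I set_tcp_option(fd, TCP_NODELAY, 1)
would disable the Nagle algorithm on
.IR fd .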
|
|
.TP
|
|
.BR TCP_CORK " (since Linux 2.2)"
|
|
.\" precisely: since 2.1.127
|
|
If set, don't send out partial frames.
|
|
All queued partial frames are sent when the option is cleared again.
|
|
This is useful for prepending headers before calling
|
|
.BR sendfile (2),
|
|
or for throughput optimization.
|
|
As currently implemented, there is a 200 millisecond ceiling on the time
|
|
for which output is corked by
|
|
.BR TCP_CORK .
|
|
If this ceiling is reached, then queued data is automatically transmitted.
|
|
This option can be combined with
|
|
.B TCP_NODELAY
|
|
only since Linux 2.5.71.
|
|
This option should not be used in code intended to be portable.
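.IP
The header-then-body pattern mentioned above might look like the
following sketch.
The function names are hypothetical, and for brevity the sketch does
not retry short writes, which real code would need to handle.
.nf
```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: set (on != 0) or clear (on == 0) TCP_CORK. */
int set_cork(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/* Queue a header and a body while corked, then uncork so that any
 * remaining partial frame is sent. Returns bytes queued, or -1. */
ssize_t send_corked(int fd, const void *hdr, size_t hlen,
                    const void *body, size_t blen)
{
    if (set_cork(fd, 1) == -1)
        return -1;

    ssize_t a = send(fd, hdr, hlen, MSG_NOSIGNAL);
    ssize_t b = (a == -1) ? -1 : send(fd, body, blen, MSG_NOSIGNAL);

    int saved = errno;      /* keep the send error, if any */
    set_cork(fd, 0);        /* clearing the option flushes queued data */
    errno = saved;

    return (a == -1 || b == -1) ? -1 : a + b;
}
```
.fi
The second
.BR send (2)
could equally be a call to
.BR sendfile (2).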
|
|
.TP
|
|
.BR TCP_DEFER_ACCEPT " (since Linux 2.4)"
|
|
.\" Precisely: since 2.3.38
|
|
Allow a listener to be awakened only when data arrives on the socket.
|
|
Takes an integer value (seconds); this can
|
|
bound the maximum number of attempts TCP will make to
|
|
complete the connection.
|
|
This option should not be used in code intended to be portable.
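.IP
A minimal sketch of setting this option on a listening socket follows.
The function name
.I defer_accept
is hypothetical.
.nf
```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: ask the kernel to wake the listener only once
 * data has arrived, with the bound given in seconds. */
int defer_accept(int listen_fd, int seconds)
{
    return setsockopt(listen_fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
                      &seconds, sizeof(seconds));
}
```
.fi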
|
|
.TP
|
|
.BR TCP_INFO " (since Linux 2.4)"
|
|
Used to collect information about this socket.
|
|
The kernel returns a \fIstruct tcp_info\fP as defined in the file
|
|
.IR /usr/include/linux/tcp.h .
|
|
This option should not be used in code intended to be portable.
|
|
.TP
|
|
.BR TCP_KEEPCNT " (since Linux 2.4)"
|
|
.\" Precisely: since 2.3.18
|
|
The maximum number of keepalive probes TCP should send
|
|
before dropping the connection.
|
|
This option should not be
|
|
used in code intended to be portable.
|
|
.TP
|
|
.BR TCP_KEEPIDLE " (since Linux 2.4)"
|
|
.\" Precisely: since 2.3.18
|
|
The time (in seconds) the connection needs to remain idle
|
|
before TCP starts sending keepalive probes, if the socket
|
|
option
|
|
.B SO_KEEPALIVE
|
|
has been set on this socket.
|
|
This option should not be used in code intended to be portable.
|
|
.TP
|
|
.BR TCP_KEEPINTVL " (since Linux 2.4)"
|
|
.\" Precisely: since 2.3.18
|
|
The time (in seconds) between individual keepalive probes.
|
|
This option should not be used in code intended to be portable.
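.IP
The three keepalive options are typically set together with
.BR SO_KEEPALIVE .
The sketch below shows one way to do this;
the function name
.I enable_keepalive
and its parameter names are hypothetical.
.nf
```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable keepalive on fd and override the
 * idle time, probe interval (both in seconds), and probe count. */
int enable_keepalive(int fd, int idle_s, int intvl_s, int count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_s, sizeof(idle_s)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &intvl_s, sizeof(intvl_s)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &count, sizeof(count)) == -1)
        return -1;
    return 0;
}
```
.fi
With the arguments (fd, 60, 10, 5), probing would start after 60 seconds
of idle time and the connection would be dropped after 5 unanswered
probes sent 10 seconds apart.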
|
|
.TP
|
|
.BR TCP_LINGER2 " (since Linux 2.4)"
|
|
.\" Precisely: since 2.3.41
|
|
The lifetime of orphaned FIN_WAIT2 state sockets.
|
|
This option can be used to override the system-wide setting in the file
|
|
.I /proc/sys/net/ipv4/tcp_fin_timeout
|
|
for this socket.
|
|
This is not to be confused with the
|
|
.BR socket (7)
|
|
level option
|
|
.BR SO_LINGER .
|
|
This option should not be used in code intended to be portable.
|
|
.TP
|
|
.B TCP_MAXSEG
|
|
.\" Present in Linux 1.0
|
|
The maximum segment size for outgoing TCP packets.
|
|
In Linux 2.2 and earlier, and in Linux 2.6.28 and later,
|
|
if this option is set before connection establishment, it also
|
|
changes the MSS value announced to the other end in the initial packet.
|
|
Values greater than the (eventual) interface MTU have no effect.
|
|
TCP will also impose
|
|
its minimum and maximum bounds over the value provided.
|
|
.TP
|
|
.B TCP_NODELAY
|
|
.\" Present in Linux 1.0
|
|
If set, disable the Nagle algorithm.
|
|
This means that segments
|
|
are always sent as soon as possible, even if there is only a
|
|
small amount of data.
|
|
When not set, data is buffered until there
|
|
is a sufficient amount to send out, thereby avoiding the
|
|
frequent sending of small packets, which results in poor
|
|
utilization of the network.
|
|
This option is overridden by
|
|
.BR TCP_CORK ;
|
|
however, setting this option forces an explicit flush of
|
|
pending output, even if
|
|
.B TCP_CORK
|
|
is currently set.
|
|
.TP
|
|
.BR TCP_QUICKACK " (since Linux 2.4.4)"
|
|
Enable quickack mode if set or disable quickack
|
|
mode if cleared.
|
|
In quickack mode, acks are sent
|
|
immediately, rather than delayed if needed in accordance
|
|
with normal TCP operation.
|
|
This flag is not permanent;
|
|
it only enables a switch to or from quickack mode.
|
|
Subsequent operation of the TCP protocol will
|
|
once again enter/leave quickack mode depending on
|
|
internal protocol processing and factors such as
|
|
delayed ack timeouts occurring and data transfer.
|
|
This option should not be used in code intended to be
|
|
portable.
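.IP
Because the flag is not permanent, an application that wants quickack
behavior throughout a transfer may need to set it again after each read.
The following is a sketch of that pattern under this assumption;
the function names are hypothetical.
.nf
```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: switch quickack mode on (1) or off (0). */
int set_quickack(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on));
}

/* Receive data, then request quickack mode again, since the kernel
 * may have left that mode during protocol processing. */
ssize_t recv_quickack(int fd, void *buf, size_t len)
{
    ssize_t n = recv(fd, buf, len, 0);
    int saved = errno;      /* preserve the recv() error, if any */

    set_quickack(fd, 1);
    errno = saved;
    return n;
}
```
.fi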
|
|
.TP
|
|
.BR TCP_SYNCNT " (since Linux 2.4)"
|
|
.\" Precisely: since 2.3.18
|
|
Set the number of SYN retransmits that TCP should send before
|
|
aborting the attempt to connect.
|
|
It cannot exceed 255.
|
|
This option should not be used in code intended to be portable.
|
|
.TP
|
|
.BR TCP_WINDOW_CLAMP " (since Linux 2.4)"
|
|
.\" Precisely: since 2.3.41
|
|
Bound the size of the advertised window to this value.
|
|
The kernel imposes a minimum size of SOCK_MIN_RCVBUF/2.
|
|
This option should not be used in code intended to be
|
|
portable.
|
|
.SS Sockets API
|
|
TCP provides limited support for out-of-band data,
|
|
in the form of (a single byte of) urgent data.
|
|
In Linux this means that if the other end sends newer out-of-band
data, the older urgent data is inserted as normal data into
|
|
the stream (even when
|
|
.B SO_OOBINLINE
|
|
is not set).
|
|
This differs from BSD-based stacks.
|
|
.PP
|
|
Linux uses the BSD-compatible interpretation of the urgent
|
|
pointer field by default.
|
|
This violates RFC\ 1122, but is
|
|
required for interoperability with other stacks.
|
|
It can be changed via
|
|
.IR /proc/sys/net/ipv4/tcp_stdurg .
|
|
|
|
It is possible to peek at out-of-band data using the
|
|
.BR recv (2)
|
|
.B MSG_PEEK
|
|
flag.
|
|
|
|
Since version 2.4, Linux supports the use of
|
|
.B MSG_TRUNC
|
|
in the
|
|
.I flags
|
|
argument of
|
|
.BR recv (2)
|
|
(and
|
|
.BR recvmsg (2)).
|
|
This flag causes the received bytes of data to be discarded,
|
|
rather than passed back in a caller-supplied buffer.
|
|
Since Linux 2.4.4,
|
|
.BR MSG_PEEK
|
|
also has this effect when used in conjunction with
|
|
.BR MSG_OOB
|
|
to receive out-of-band data.
|
|
.SS Ioctls
|
|
The following
|
|
.BR ioctl (2)
|
|
calls return information in
|
|
.IR value .
|
|
The correct syntax is:
|
|
.PP
|
|
.RS
|
|
.nf
|
|
.BI int " value";
|
|
.IB error " = ioctl(" tcp_socket ", " ioctl_type ", &" value ");"
|
|
.fi
|
|
.RE
|
|
.PP
|
|
.I ioctl_type
|
|
is one of the following:
|
|
.TP
|
|
.B SIOCINQ
|
|
Returns the amount of queued unread data in the receive buffer.
|
|
The socket must not be in LISTEN state, otherwise an error
|
|
.RB ( EINVAL )
|
|
is returned.
|
|
.B SIOCINQ
|
|
is defined in
|
|
.IR <linux/sockios.h> .
|
|
.\" FIXME http://sources.redhat.com/bugzilla/show_bug.cgi?id=12002,
|
|
.\" filed 2010-09-10, may cause SIOCINQ to be defined in glibc headers
|
|
Alternatively,
|
|
you can use the synonymous
|
|
.BR FIONREAD ,
|
|
defined in
|
|
.IR <sys/ioctl.h> .
|
|
.TP
|
|
.B SIOCATMARK
|
|
Returns true (i.e.,
|
|
.I value
|
|
is nonzero) if the inbound data stream is at the urgent mark.
|
|
|
|
If the
|
|
.B SO_OOBINLINE
|
|
socket option is set, and
|
|
.B SIOCATMARK
|
|
returns true, then the
|
|
next read from the socket will return the urgent data.
|
|
If the
|
|
.B SO_OOBINLINE
|
|
socket option is not set, and
|
|
.B SIOCATMARK
|
|
returns true, then the
|
|
next read from the socket will return the bytes following
|
|
the urgent data (to actually read the urgent data requires the
|
|
.B recv(MSG_OOB)
|
|
flag).
|
|
|
|
Note that a read never reads across the urgent mark.
|
|
If an application is informed of the presence of urgent data via
|
|
.BR select (2)
|
|
(using the
|
|
.I exceptfds
|
|
argument) or through delivery of a
|
|
.B SIGURG
|
|
signal,
|
|
then it can advance up to the mark using a loop which repeatedly tests
|
|
.B SIOCATMARK
|
|
and performs a read (requesting any number of bytes) as long as
|
|
.B SIOCATMARK
|
|
returns false.
|
|
.TP
|
|
.B SIOCOUTQ
|
|
Returns the amount of unsent data in the socket send queue.
|
|
The socket must not be in LISTEN state, otherwise an error
|
|
.RB ( EINVAL )
|
|
is returned.
|
|
.B SIOCOUTQ
|
|
is defined in
|
|
.IR <linux/sockios.h> .
|
|
.\" FIXME http://sources.redhat.com/bugzilla/show_bug.cgi?id=12002,
|
|
.\" filed 2010-09-10, may cause SIOCOUTQ to be defined in glibc headers
|
|
Alternatively,
|
|
you can use the synonymous
|
|
.BR TIOCOUTQ ,
|
|
defined in
|
|
.IR <sys/ioctl.h> .
|
|
.SS Error handling
|
|
When a network error occurs, TCP tries to resend the packet.
|
|
If it doesn't succeed after some time, either
|
|
.B ETIMEDOUT
|
|
or the last received error on this connection is reported.
|
|
.PP
|
|
Some applications require a quicker error notification.
|
|
This can be enabled with the
|
|
.B IPPROTO_IP
|
|
level
|
|
.B IP_RECVERR
|
|
socket option.
|
|
When this option is enabled, all incoming
|
|
errors are immediately passed to the user program.
|
|
Use this option with care \(em it makes TCP less tolerant to routing
|
|
changes and other normal network conditions.
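.PP
Enabling the option is a single
.BR setsockopt (2)
call at the
.B IPPROTO_IP
level, as in the following sketch.
The function name
.I enable_recverr
is hypothetical.
.nf
```c
#include <netinet/in.h>   /* IPPROTO_IP, IP_RECVERR */
#include <sys/socket.h>

/* Hypothetical helper: request immediate delivery of incoming
 * errors on this socket. Returns 0 on success, -1 on error. */
int enable_recverr(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on));
}
```
.fi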
|
|
.SH ERRORS
|
|
.TP
|
|
.B EAFNOTSUPPORT
|
|
Passed socket address type in
|
|
.I sin_family
|
|
was not
|
|
.BR AF_INET .
|
|
.TP
|
|
.B EPIPE
|
|
The other end closed the socket unexpectedly or a read is
|
|
executed on a shut down socket.
|
|
.TP
|
|
.B ETIMEDOUT
|
|
The other end didn't acknowledge retransmitted data after some time.
|
|
.PP
|
|
Any errors defined for
|
|
.BR ip (7)
|
|
or the generic socket layer may also be returned for TCP.
|
|
.SH VERSIONS
|
|
Support for Explicit Congestion Notification, zero-copy
|
|
.BR sendfile (2),
|
|
reordering support and some SACK extensions
|
|
(DSACK) were introduced in 2.4.
|
|
Support for forward acknowledgement (FACK), TIME_WAIT recycling,
|
|
and per-connection keepalive socket options were introduced in 2.3.
|
|
.SH BUGS
|
|
Not all errors are documented.
|
|
.br
|
|
IPv6 is not described.
|
|
.\" Only a single Linux kernel version is described
|
|
.\" Info for 2.2 was lost. Should be added again,
|
|
.\" or put into a separate page.
|
|
.\" .SH AUTHORS
|
|
.\" This man page was originally written by Andi Kleen.
|
|
.\" It was updated for 2.4 by Nivedita Singhvi with input from
|
|
.\" Alexey Kuznetsov's Documentation/networking/ip-sysctl.txt
|
|
.\" document.
|
|
.SH SEE ALSO
|
|
.BR accept (2),
|
|
.BR bind (2),
|
|
.BR connect (2),
|
|
.BR getsockopt (2),
|
|
.BR listen (2),
|
|
.BR recvmsg (2),
|
|
.BR sendfile (2),
|
|
.BR sendmsg (2),
|
|
.BR socket (2),
|
|
.BR ip (7),
|
|
.BR socket (7)
|
|
.sp
|
|
RFC\ 793 for the TCP specification.
|
|
.br
|
|
RFC\ 1122 for the TCP requirements and a description of the Nagle algorithm.
|
|
.br
|
|
RFC\ 1323 for TCP timestamp and window scaling options.
|
|
.br
|
|
RFC\ 1337 for a description of TIME_WAIT assassination hazards.
|
|
.br
|
|
RFC\ 3168 for a description of Explicit Congestion Notification.
|
|
.br
|
|
RFC\ 2581 for TCP congestion control algorithms.
|
|
.br
|
|
RFC\ 2018 and RFC\ 2883 for SACK and extensions to SACK.
|