2004-11-03 13:51:07 +00:00
|
|
|
.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
|
|
|
|
.\" Permission is granted to distribute possibly modified copies
|
|
|
|
.\" of this page provided the header is included verbatim,
|
|
|
|
.\" and in case of nontrivial modification author and date
|
|
|
|
.\" of the modification is added to the header.
|
|
|
|
.\"
|
|
|
|
.\" 2.4 Updates by Nivedita Singhvi 4/20/02 <nivedita@us.ibm.com>.
|
2004-11-11 16:11:50 +00:00
|
|
|
.\" Modified, 2004-11-11, Michael Kerrisk and Andries Brouwer
|
|
|
|
.\" Updated details of interaction of TCP_CORK and TCP_NODELAY.
|
2006-04-03 23:03:27 +00:00
|
|
|
.\"
|
|
|
.\" FIXME 2.6.17-rc1 adds the following /proc files, which need to
.\" be documented: tcp_mtu_probing, tcp_base_mss, and
.\" tcp_workaround_signed_windows
|
2004-11-03 13:51:07 +00:00
|
|
|
.\"
|
2005-06-15 12:56:21 +00:00
|
|
|
.TH TCP 7 2005-06-15 "Linux Man Page" "Linux Programmer's Manual"
|
2004-11-03 13:51:07 +00:00
|
|
|
.SH NAME
|
|
|
|
tcp \- TCP protocol
|
|
|
|
.SH SYNOPSIS
|
|
|
|
.B #include <sys/socket.h>
|
|
|
|
.br
|
|
|
|
.B #include <netinet/in.h>
|
|
|
|
.br
|
|
|
|
.B #include <netinet/tcp.h>
|
|
|
|
.br
|
|
|
|
.B tcp_socket = socket(PF_INET, SOCK_STREAM, 0);
|
|
|
|
.SH DESCRIPTION
|
|
|
|
This is an implementation of the TCP protocol defined in
|
2005-06-21 14:46:08 +00:00
|
|
|
RFC\ 793, RFC\ 1122 and RFC\ 2001 with the NewReno and SACK
|
2005-06-14 15:24:55 +00:00
|
|
|
extensions. It provides a reliable, stream-oriented,
|
|
|
|
full-duplex connection between two sockets on top of
|
2004-11-03 13:51:07 +00:00
|
|
|
.BR ip (7),
|
|
|
|
for both v4 and v6 versions.
|
|
|
|
TCP guarantees that the data arrives in order and
|
2005-06-14 15:24:55 +00:00
|
|
|
retransmits lost packets.
|
2006-01-03 10:58:34 +00:00
|
|
|
It generates and checks a per-packet checksum to catch
|
|
|
|
transmission errors.
|
2005-06-14 15:24:55 +00:00
|
|
|
TCP does not preserve record boundaries.
|
2004-11-03 13:51:07 +00:00
|
|
|
|
2005-06-14 15:24:55 +00:00
|
|
|
A newly created TCP socket has no remote or local address and is not
|
2004-11-03 13:51:07 +00:00
|
|
|
fully specified. To create an outgoing TCP connection use
|
|
|
|
.BR connect (2)
|
|
|
|
to establish a connection to another TCP socket.
|
2005-06-14 15:24:55 +00:00
|
|
|
To receive new incoming connections, first
|
2004-11-03 13:51:07 +00:00
|
|
|
.BR bind (2)
|
2005-06-14 15:24:55 +00:00
|
|
|
the socket to a local address and port and then call
|
2004-11-03 13:51:07 +00:00
|
|
|
.BR listen (2)
|
2005-06-14 15:24:55 +00:00
|
|
|
to put the socket into the listening state. After that a new
|
2004-11-03 13:51:07 +00:00
|
|
|
socket for each incoming connection can be accepted
|
|
|
|
using
|
|
|
|
.BR accept (2).
|
|
|
|
A socket which has had
|
2005-10-19 07:07:02 +00:00
|
|
|
.BR accept ()
|
2004-11-03 13:51:07 +00:00
|
|
|
or
|
2005-10-19 07:07:02 +00:00
|
|
|
.BR connect ()
|
2004-11-03 13:51:07 +00:00
|
|
|
successfully called on it is fully specified and may
|
|
|
|
transmit data. Data cannot be transmitted on listening or
|
|
|
|
not yet connected sockets.
|
|
|
|
|
2005-06-21 14:46:08 +00:00
|
|
|
Linux supports RFC\ 1323 TCP high performance
|
2004-11-03 13:51:07 +00:00
|
|
|
extensions. These include Protection Against Wrapped
|
|
|
|
Sequence Numbers (PAWS), Window Scaling and
|
|
|
|
Timestamps. Window scaling allows the use
|
|
|
|
of large (> 64K) TCP windows in order to support links with high
|
|
|
|
latency or bandwidth. To make use of them, the send and
|
|
|
|
receive buffer sizes must be increased.
|
|
|
|
They can be set globally with the
|
|
|
|
.B net.ipv4.tcp_wmem
|
|
|
|
and
|
|
|
|
.B net.ipv4.tcp_rmem
|
|
|
|
sysctl variables, or on individual sockets by using the
|
|
|
|
.B SO_SNDBUF
|
|
|
|
and
|
|
|
|
.B SO_RCVBUF
|
|
|
|
socket options with the
|
|
|
|
.BR setsockopt (2)
|
|
|
|
call.
|
|
|
|
|
|
|
|
The maximum sizes for socket buffers declared via the
|
|
|
|
.B SO_SNDBUF
|
|
|
|
and
|
|
|
|
.B SO_RCVBUF
|
|
|
|
mechanisms are limited by the global
|
|
|
|
.B net.core.rmem_max
|
|
|
|
and
|
|
|
|
.B net.core.wmem_max
|
|
|
|
sysctls. Note that TCP actually allocates twice the size of
|
|
|
|
the buffer requested in the
|
|
|
|
.BR setsockopt (2)
|
|
|
|
call, and so a succeeding
|
|
|
|
.BR getsockopt (2)
|
|
|
|
call will not return the same size of buffer as requested
|
|
|
|
in the
|
|
|
|
.BR setsockopt (2)
|
2006-01-03 10:58:34 +00:00
|
|
|
call.
|
|
|
|
TCP uses the extra space for administrative purposes and internal
|
2004-11-03 13:51:07 +00:00
|
|
|
kernel structures, and the sysctl variables reflect the
|
|
|
|
larger sizes compared to the actual TCP windows.
|
|
|
|
On individual connections, the socket buffer size must be
|
|
|
|
set prior to the
|
2005-10-19 07:07:02 +00:00
|
|
|
.BR listen ()
|
2004-11-03 13:51:07 +00:00
|
|
|
or
|
2005-10-19 07:07:02 +00:00
|
|
|
.BR connect ()
|
2004-11-03 13:51:07 +00:00
|
|
|
calls in order to have it take effect. See
|
|
|
|
.BR socket (7)
|
|
|
|
for more information.
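.PP
The buffer doubling described above can be observed from user space.
The following is a minimal sketch in Python, whose socket module wraps the same
setsockopt and getsockopt calls.
The helper name request_rcvbuf and the 8192-byte request are illustrative
assumptions, and the value reported depends on the limits of the running system.
.nf
```python
import socket


def request_rcvbuf(requested):
    """Ask for a receive buffer size and report what the kernel granted."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Set the size before connect() or listen() so that it takes effect.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        sock.close()


if __name__ == "__main__":
    # On Linux the reported size is expected to be larger than the request.
    print("requested 8192, kernel reports", request_rcvbuf(8192))
```
.fi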
|
|
|
|
.PP
|
|
|
|
TCP supports urgent data. Urgent data is used to signal the
|
|
|
|
receiver that some important message is part of the data
|
|
|
|
stream and that it should be processed as soon as possible.
|
|
|
|
To send urgent data specify the
|
|
|
|
.B MSG_OOB
|
|
|
|
option to
|
|
|
|
.BR send (2).
|
|
|
|
When urgent data is received, the kernel sends a
|
|
|
|
.B SIGURG
|
2005-07-05 13:53:03 +00:00
|
|
|
signal to the process or process group that has been set as the
|
2005-06-21 14:46:08 +00:00
|
|
|
socket "owner" using the
|
2004-11-03 13:51:07 +00:00
|
|
|
.B SIOCSPGRP
|
|
|
|
or
|
|
|
|
.B FIOSETOWN
|
2005-06-21 14:46:08 +00:00
|
|
|
ioctls (or the SUSv3-specified
|
|
|
|
.BR fcntl (2)
|
|
|
|
.B F_SETOWN
|
|
|
|
operation).
|
|
|
|
When the
|
2004-11-03 13:51:07 +00:00
|
|
|
.B SO_OOBINLINE
|
|
|
|
socket option is enabled, urgent data is put into the normal
|
2005-06-14 15:24:55 +00:00
|
|
|
data stream (a program can test for its location using the
|
2004-11-03 13:51:07 +00:00
|
|
|
.B SIOCATMARK
|
2005-06-21 14:46:08 +00:00
|
|
|
ioctl described below),
|
2004-11-03 13:51:07 +00:00
|
|
|
otherwise it can be only received when the
|
|
|
|
.B MSG_OOB
|
|
|
|
flag is set for
|
2005-06-16 10:23:59 +00:00
|
|
|
.BR recv (2)
|
|
|
|
or
|
|
|
|
.BR recvmsg (2).
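.PP
As a rough illustration of the urgent data mechanism, the sketch below sends
one urgent byte over a loopback connection and reads it back with MSG_OOB.
It is written in Python and assumes that SO_OOBINLINE is left disabled;
the function name and the payloads are hypothetical.
Instead of a SIGURG handler it waits with select, which reports pending
urgent data as an exceptional condition.
.nf
```python
import select
import socket


def send_and_receive_urgent():
    """Send one urgent byte over loopback and read it out of band."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a port
    listener.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    server, _peer = listener.accept()
    try:
        client.send(b"normal")
        client.send(b"!", socket.MSG_OOB)  # the urgent byte
        # Pending urgent data shows up in the exceptional set.
        _r, _w, exceptional = select.select([], [], [server], 5.0)
        if not exceptional:
            return None, None
        urgent = server.recv(1, socket.MSG_OOB)
        # SO_OOBINLINE is off, so the urgent byte is not part of this read.
        normal = server.recv(16)
        return urgent, normal
    finally:
        for sock in (client, server, listener):
            sock.close()


if __name__ == "__main__":
    print(send_and_receive_urgent())
```
.fi
.PP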
|
2004-11-03 13:51:07 +00:00
|
|
|
|
|
|
|
Linux 2.4 introduced a number of changes for improved
|
|
|
|
throughput and scaling, as well as enhanced functionality.
|
2005-06-14 15:24:55 +00:00
|
|
|
Some of these features include support for zero-copy
|
2004-11-03 13:51:07 +00:00
|
|
|
.BR sendfile (2),
|
|
|
|
Explicit Congestion Notification, new
|
|
|
|
management of TIME_WAIT sockets, keep-alive socket options
|
|
|
|
and support for Duplicate SACK extensions.
|
|
|
|
.SH "ADDRESS FORMATS"
|
|
|
|
TCP is built on top of IP (see
|
|
|
|
.BR ip (7)).
|
|
|
|
The address formats defined by
|
|
|
|
.BR ip (7)
|
|
|
|
apply to TCP. TCP only supports point-to-point
|
|
|
|
communication; broadcasting and multicasting are not
|
|
|
|
supported.
|
|
|
|
.SH SYSCTLS
|
|
|
|
These variables can be accessed by the
|
2005-11-02 13:55:25 +00:00
|
|
|
.I /proc/sys/net/ipv4/*
|
2004-11-03 13:51:07 +00:00
|
|
|
files or with the
|
|
|
|
.BR sysctl (2)
|
|
|
|
interface. In addition, most IP sysctls also apply to TCP; see
|
|
|
|
.BR ip (7).
|
2005-06-21 14:46:08 +00:00
|
|
|
Variables described as
|
|
|
|
.I Boolean
|
|
|
|
take an integer value, with a non-zero value ("true") meaning that
|
|
|
|
the corresponding option is enabled, and a zero value ("false")
|
|
|
|
meaning that the option is disabled.
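.PP
The /proc interface can be read like any other text file.
The following Python sketch shows one way to do so; the helper names are
hypothetical, and it assumes the variable holds one or more integers,
which is the case for the Boolean, integer and vector variables listed here.
.nf
```python
from pathlib import Path

PROC_IPV4 = Path("/proc/sys/net/ipv4")


def parse_sysctl(text):
    """Split the contents of a sysctl file into a list of integers."""
    return [int(field) for field in text.split()]


def read_tcp_sysctl(name):
    """Read an integer-valued variable such as tcp_sack or tcp_rmem."""
    return parse_sysctl((PROC_IPV4 / name).read_text())


def is_enabled(values):
    """Apply the Boolean convention: any non-zero value means enabled."""
    return values[0] != 0


if __name__ == "__main__":
    for name in ("tcp_sack", "tcp_rmem"):
        try:
            print(name, read_tcp_sysctl(name))
        except OSError as err:
            print(name, "not readable:", err)
```
.fi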
|
2006-03-23 02:13:08 +00:00
|
|
|
.\" FIXME As at 14 Jun 2005, kernel 2.6.12, the following are
|
2005-06-15 12:56:21 +00:00
|
|
|
.\" not yet documented (shown with default values):
|
|
|
|
.\"
|
|
|
|
.\" /proc/sys/net/ipv4/tcp_bic_beta
|
|
|
|
.\" 819
|
|
|
|
.\" /proc/sys/net/ipv4/tcp_moderate_rcvbuf
|
|
|
|
.\" 1
|
|
|
|
.\" /proc/sys/net/ipv4/tcp_no_metrics_save
|
|
|
|
.\" 0
|
|
|
|
.\" /proc/sys/net/ipv4/tcp_vegas_alpha
|
|
|
|
.\" 2
|
|
|
|
.\" /proc/sys/net/ipv4/tcp_vegas_beta
|
|
|
|
.\" 6
|
|
|
|
.\" /proc/sys/net/ipv4/tcp_vegas_gamma
|
|
|
|
.\" 2
|
2004-11-03 13:51:07 +00:00
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_abort_on_overflow " (Boolean; default: disabled)"
|
2004-11-03 13:51:07 +00:00
|
|
|
Enable resetting connections if the listening service is too
|
2005-06-15 12:56:21 +00:00
|
|
|
slow and unable to keep up and accept them.
|
|
|
|
Leaving this option disabled means that if overflow occurred due
to a burst, the connection will recover. Enable this option
|
2005-06-14 15:24:55 +00:00
|
|
|
.I only
|
|
|
|
if you are really sure that the listening daemon
|
2004-11-03 13:51:07 +00:00
|
|
|
cannot be tuned to accept connections faster. Enabling this
|
|
|
|
option can harm the clients of your server.
|
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_adv_win_scale " (integer; default: 2)"
|
2004-11-03 13:51:07 +00:00
|
|
|
Count buffering overhead as bytes/2^tcp_adv_win_scale
|
2005-07-06 12:57:38 +00:00
|
|
|
(if tcp_adv_win_scale > 0) or bytes-bytes/2^(\-tcp_adv_win_scale),
|
2005-06-15 12:56:21 +00:00
|
|
|
if it is <= 0.
|
2004-11-03 13:51:07 +00:00
|
|
|
|
|
|
|
The socket receive buffer space is shared between the
|
|
|
|
application and kernel. TCP maintains part of the buffer as
|
|
|
|
the TCP window; this is the size of the receive window
|
|
|
|
advertised to the other end. The rest of the space is used
|
|
|
|
as the "application" buffer, used to isolate the network
|
|
|
|
from scheduling and application latencies. The
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_adv_win_scale
|
2004-11-03 13:51:07 +00:00
|
|
|
default value of 2 implies that the space
|
|
|
|
used for the application buffer is one fourth that of the
|
|
|
|
total.
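.IP
The two formulas can be written out as a short Python sketch.
It assumes integer division, and the function names are hypothetical;
it restates the arithmetic given above and is not taken from the kernel source.
.nf
```python
def buffering_overhead(space, tcp_adv_win_scale):
    """Bytes of receive buffer space counted as buffering overhead."""
    if tcp_adv_win_scale > 0:
        return space // 2 ** tcp_adv_win_scale
    return space - space // 2 ** (-tcp_adv_win_scale)


def window_space(space, tcp_adv_win_scale):
    """Bytes left for the TCP window once the overhead is set aside."""
    return space - buffering_overhead(space, tcp_adv_win_scale)


if __name__ == "__main__":
    # With the default scale of 2, one fourth of the space is overhead.
    print(buffering_overhead(87380, 2), window_space(87380, 2))
```
.fi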
|
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_app_win " (integer; default: 31)"
|
2004-11-03 13:51:07 +00:00
|
|
|
This variable defines how many
|
|
|
|
bytes of the TCP window are reserved for buffering
|
|
|
|
overhead.
|
|
|
|
|
|
|
|
max(window/2^tcp_app_win, mss) bytes in the window
are reserved for the application buffer. A value of 0
|
2005-06-15 12:56:21 +00:00
|
|
|
implies that no amount is reserved.
|
|
|
|
.\"
|
2005-06-20 14:45:09 +00:00
|
|
|
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
2004-11-03 13:51:07 +00:00
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_bic " (Boolean; default: disabled)"
|
|
|
|
Enable BIC TCP congestion control algorithm.
|
|
|
|
BIC-TCP is a sender-side only change that ensures a linear RTT
|
|
|
|
fairness under large windows while offering both scalability and
|
|
|
|
bounded TCP-friendliness. The protocol combines two schemes
|
|
|
|
called additive increase and binary search increase. When the
|
|
|
|
congestion window is large, additive increase with a large
|
|
|
|
increment ensures linear RTT fairness as well as good
|
|
|
|
scalability. Under small congestion windows, binary search
|
|
|
|
increase provides TCP friendliness.
|
|
|
|
.\"
|
2005-06-20 14:45:09 +00:00
|
|
|
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
2005-06-15 12:56:21 +00:00
|
|
|
.TP
|
|
|
|
.BR tcp_bic_low_window " (integer; default: 14)"
|
|
|
|
Sets the threshold window (in packets) where BIC TCP starts to
|
|
|
|
adjust the congestion window. Below this threshold BIC TCP behaves
|
|
|
|
the same as the default TCP Reno.
|
|
|
|
.\"
|
2005-06-20 14:45:09 +00:00
|
|
|
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
2005-06-15 12:56:21 +00:00
|
|
|
.TP
|
|
|
|
.BR tcp_bic_fast_convergence " (Boolean; default: enabled)"
|
|
|
|
Forces BIC TCP to more quickly respond to changes in congestion
|
|
|
|
window. Allows two flows sharing the same connection to converge
|
|
|
|
more rapidly.
|
|
|
|
.TP
|
|
|
|
.BR tcp_dsack " (Boolean; default: enabled)"
|
2005-06-21 14:46:08 +00:00
|
|
|
Enable RFC\ 2883 TCP Duplicate SACK support.
|
2004-11-03 13:51:07 +00:00
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_ecn " (Boolean; default: disabled)"
|
2005-06-21 14:46:08 +00:00
|
|
|
Enable RFC\ 3168 Explicit Congestion Notification.
|
2005-06-15 12:56:21 +00:00
|
|
|
When enabled, connectivity to some
|
2004-11-03 13:51:07 +00:00
|
|
|
destinations could be affected due to older, misbehaving
|
|
|
|
routers along the path causing connections to be dropped.
|
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_fack " (Boolean; default: enabled)"
|
|
|
|
Enable TCP Forward Acknowledgement support.
|
2004-11-03 13:51:07 +00:00
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_fin_timeout " (integer; default: 60)"
|
2005-06-14 15:24:55 +00:00
|
|
|
This specifies how many seconds to wait for a final FIN packet before the
|
2004-11-03 13:51:07 +00:00
|
|
|
socket is forcibly closed. This is strictly a violation of
|
|
|
|
the TCP specification, but required to prevent
|
2005-06-15 12:56:21 +00:00
|
|
|
denial-of-service attacks.
|
|
|
|
In Linux 2.2, the default value was 180.
|
|
|
|
.\"
|
2005-06-20 14:45:09 +00:00
|
|
|
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
2005-06-15 12:56:21 +00:00
|
|
|
.TP
|
|
|
|
.BR tcp_frto " (Boolean; default: disabled)"
|
|
|
|
Enables F-RTO, an enhanced recovery algorithm for TCP retransmission
|
|
|
|
timeouts. It is particularly beneficial in wireless environments
|
|
|
|
where packet loss is typically due to random radio interference
|
|
|
|
rather than intermediate router congestion.
|
2004-11-03 13:51:07 +00:00
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_keepalive_intvl " (integer; default: 75)"
|
2004-11-03 13:51:07 +00:00
|
|
|
The number of seconds between TCP keep-alive probes.
|
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_keepalive_probes " (integer; default: 9)"
|
2004-11-03 13:51:07 +00:00
|
|
|
The maximum number of TCP keep-alive probes to send
|
|
|
|
before giving up and killing the connection if
|
|
|
|
no response is obtained from the other end.
|
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_keepalive_time " (integer; default: 7200)"
|
2004-11-03 13:51:07 +00:00
|
|
|
The number of seconds a connection needs to be idle
|
|
|
|
before TCP begins sending out keep-alive probes.
|
|
|
|
Keep-alives are only sent when the
|
|
|
|
.B SO_KEEPALIVE
|
|
|
|
socket option is enabled. The default value is 7200 seconds
|
|
|
|
(2 hours). An idle connection is terminated after
approximately an additional 11 minutes (9 probes at intervals
of 75 seconds) when keep-alive is enabled.
|
|
|
|
|
|
|
|
Note that underlying connection tracking mechanisms and
|
|
|
|
application timeouts may be much shorter.
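.IP
The arithmetic behind these figures is sketched below in Python.
The function name is hypothetical, and the default arguments are the
default values of the three keep-alive variables listed above.
.nf
```python
def keepalive_deadline(idle_time=7200, interval=75, probes=9):
    """Seconds from the last activity until a dead peer is given up on."""
    return idle_time + probes * interval


if __name__ == "__main__":
    total = keepalive_deadline()
    extra = total - 7200
    print(total, "seconds in total;", extra / 60, "minutes of probing")
```
.fi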
|
2005-06-15 12:56:21 +00:00
|
|
|
.\"
|
2005-06-20 14:45:09 +00:00
|
|
|
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
2005-06-15 12:56:21 +00:00
|
|
|
.TP
|
|
|
|
.BR tcp_low_latency " (Boolean; default: disabled)"
|
|
|
|
If enabled, the TCP stack makes decisions that prefer lower
|
|
|
|
latency as opposed to higher throughput.
|
|
|
|
If this option is disabled, then higher throughput is preferred.
|
|
|
|
An example of an application where this default should be
|
|
|
|
changed would be a Beowulf compute cluster.
|
2004-11-03 13:51:07 +00:00
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_max_orphans " (integer; default: see below)"
|
2004-11-03 13:51:07 +00:00
|
|
|
The maximum number of orphaned (not attached to any user file
|
|
|
|
handle) TCP sockets allowed in the system. When this number
|
|
|
|
is exceeded, the orphaned connection is reset and a warning
|
2005-06-14 15:24:55 +00:00
|
|
|
is printed. This limit exists only to prevent simple denial-of-service
|
2004-11-03 13:51:07 +00:00
|
|
|
attacks. Lowering this limit is not recommended. Network
|
|
|
|
conditions might require you to increase the number of
|
|
|
|
orphans allowed, but note that each orphan can eat up to ~64K
|
|
|
|
of unswappable memory. The default initial value is set
|
|
|
|
equal to the kernel parameter NR_FILE. This initial default
|
|
|
|
is adjusted depending on the memory in the system.
|
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_max_syn_backlog " (integer; default: see below)"
|
2004-11-03 13:51:07 +00:00
|
|
|
The maximum number of queued connection requests which have
|
|
|
|
still not received an acknowledgement from the connecting
|
|
|
|
client. If this number is exceeded, the kernel will begin
|
|
|
|
dropping requests. The default value of 256 is increased to
|
|
|
|
1024 when the memory present in the system is adequate or
|
|
|
|
greater (>= 128MB), and reduced to 128 for those systems with
very low memory (<= 32MB). It is recommended that if this
|
|
|
|
needs to be increased above 1024, TCP_SYNQ_HSIZE in
|
|
|
|
include/net/tcp.h be modified to keep
|
|
|
|
TCP_SYNQ_HSIZE*16<=tcp_max_syn_backlog, and the kernel be
|
|
|
|
recompiled.
|
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_max_tw_buckets " (integer; default: see below)"
|
2004-11-03 13:51:07 +00:00
|
|
|
The maximum number of sockets in TIME_WAIT state allowed in
|
2005-06-14 15:24:55 +00:00
|
|
|
the system. This limit exists only to prevent simple denial-of-service
|
2004-11-03 13:51:07 +00:00
|
|
|
attacks. The default value of NR_FILE*2 is adjusted
|
|
|
|
depending on the memory in the system. If this number is
|
|
|
|
exceeded, the socket is closed and a warning is printed.
|
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_mem
|
2004-11-03 13:51:07 +00:00
|
|
|
This is a vector of 3 integers: [low, pressure, high]. These
|
|
|
|
bounds are used by TCP to track its memory usage. The
|
|
|
|
defaults are calculated at boot time from the amount of
|
|
|
|
available memory.
|
|
|
|
|
|
|
|
.I low
|
|
|
|
- TCP doesn't regulate its memory allocation when the number
|
|
|
|
of pages it has allocated globally is below this number.
|
|
|
|
|
|
|
|
.I pressure
|
|
|
|
- when the amount of memory allocated by TCP
|
|
|
|
exceeds this number of pages, TCP moderates its memory
|
|
|
|
consumption. This memory pressure state is exited
|
|
|
|
once the number of pages allocated falls below
|
|
|
|
the
|
2005-11-02 13:55:25 +00:00
|
|
|
.I low
|
2004-11-03 13:51:07 +00:00
|
|
|
mark.
|
|
|
|
|
|
|
|
.I high
|
|
|
|
- the maximum number of pages, globally, that TCP
|
|
|
|
will allocate. This value overrides any other limits
|
|
|
|
imposed by the kernel.
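.IP
The low and pressure marks form a hysteresis: the pressure state is entered
above one threshold and left below the other.
The Python sketch below is a toy model of that description only;
the function name and the page counts are hypothetical.
.nf
```python
def under_pressure(was_under_pressure, allocated, low, pressure):
    """Return the memory pressure state after an allocation change."""
    if allocated > pressure:
        return True  # above the pressure mark: moderate consumption
    if allocated < low:
        return False  # below the low mark: pressure state is exited
    return was_under_pressure  # between the marks: state is unchanged


if __name__ == "__main__":
    state = False
    for pages in (90, 150, 120, 90):
        state = under_pressure(state, pages, low=100, pressure=140)
        print(pages, "pages allocated, under pressure:", state)
```
.fi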
|
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_orphan_retries " (integer; default: 8)"
|
2004-11-03 13:51:07 +00:00
|
|
|
The maximum number of attempts made to probe the other
|
|
|
|
end of a connection which has been closed by our end.
|
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_reordering " (integer; default: 3)"
|
2004-11-03 13:51:07 +00:00
|
|
|
The maximum a packet can be reordered in a TCP packet stream
|
|
|
|
without TCP assuming packet loss and going into slow start.
|
2005-06-15 12:56:21 +00:00
|
|
|
It is not advisable to change this number.
|
2004-11-03 13:51:07 +00:00
|
|
|
This is a packet reordering detection metric designed to
|
|
|
|
minimize unnecessary back off and retransmits provoked by
|
|
|
|
reordering of packets on a connection.
|
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_retrans_collapse " (Boolean; default: enabled)"
|
2004-11-03 13:51:07 +00:00
|
|
|
Try to send full-sized packets during retransmit.
|
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_retries1 " (integer; default: 3)"
|
2004-11-03 13:51:07 +00:00
|
|
|
The number of times TCP will attempt to retransmit a
|
|
|
|
packet on an established connection normally,
|
|
|
|
without the extra effort of getting the network
|
|
|
|
layers involved. Once we exceed this number of
|
|
|
|
retransmits, we first have the network layer
|
|
|
|
update the route if possible before each new retransmit.
|
|
|
|
The default is the RFC specified minimum of 3.
|
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_retries2 " (integer; default: 15)"
|
2004-11-03 13:51:07 +00:00
|
|
|
The maximum number of times a TCP packet is retransmitted
|
|
|
|
in established state before giving up. The default
|
|
|
|
value is 15, which corresponds to a duration of
|
|
|
|
approximately 13 to 30 minutes, depending
|
2005-06-21 14:46:08 +00:00
|
|
|
on the retransmission timeout. The RFC\ 1122 specified
|
2004-11-03 13:51:07 +00:00
|
|
|
minimum limit of 100 seconds is typically deemed too
|
|
|
|
short.
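.IP
The dependence on the retransmission timeout can be estimated with a short
calculation.
The Python sketch below assumes that the timeout doubles after every
retransmission and is capped at a maximum.
The defaults of 0.2 and 120 seconds are assumptions modelled on the minimum and
maximum timeouts used by Linux; they are not stated in this page.
.nf
```python
def time_until_give_up(retries, initial_rto=0.2, max_rto=120.0):
    """Total seconds of waiting before the connection is abandoned.

    Counts the timeout before each of the retransmissions plus the
    final timeout after the last one.
    """
    return sum(min(initial_rto * 2 ** n, max_rto) for n in range(retries + 1))


if __name__ == "__main__":
    for rto in (0.2, 3.0):
        minutes = time_until_give_up(15, initial_rto=rto) / 60
        print("initial timeout", rto, "s:", round(minutes, 1), "minutes")
```
.fi
.IP
A larger initial timeout reaches the cap sooner and therefore lengthens the
total, which is why the duration is quoted as a range.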
|
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_rfc1337 " (Boolean; default: disabled)"
|
2005-07-20 07:50:45 +00:00
|
|
|
Enable TCP behaviour conformant with RFC\ 1337.
|
2005-06-15 12:56:21 +00:00
|
|
|
When disabled,
|
2004-11-03 13:51:07 +00:00
|
|
|
if a RST is received in TIME_WAIT state, we close
|
|
|
|
the socket immediately without waiting for the end
|
|
|
|
of the TIME_WAIT period.
|
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_rmem
|
2004-11-03 13:51:07 +00:00
|
|
|
This is a vector of 3 integers: [min, default,
|
|
|
|
max]. These parameters are used by TCP to regulate receive
|
|
|
|
buffer sizes. TCP dynamically adjusts the size of the
|
|
|
|
receive buffer from the defaults listed below, in the range
|
|
|
|
of these sysctl variables, depending on memory available
|
|
|
|
in the system.
|
|
|
|
|
|
|
|
.I min
|
|
|
|
- minimum size of the receive buffer used by each TCP
|
|
|
|
socket. The default value is 4K, and is lowered to
|
2005-06-14 15:24:55 +00:00
|
|
|
PAGE_SIZE bytes in low-memory systems. This value
|
2004-11-03 13:51:07 +00:00
|
|
|
is used to ensure that in memory pressure mode,
|
|
|
|
allocations below this size will still succeed. This is not
|
|
|
|
used to bound the size of the receive buffer declared
|
|
|
|
using
|
|
|
|
.B SO_RCVBUF
|
|
|
|
on a socket.
|
|
|
|
|
|
|
|
.I default
|
|
|
|
- the default size of the receive buffer for a TCP socket.
|
|
|
|
This value overwrites the initial default buffer size from
|
|
|
|
the generic global
|
|
|
|
.B net.core.rmem_default
|
|
|
|
defined for all protocols. The default value is 87380
|
2005-06-14 15:24:55 +00:00
|
|
|
bytes, and is lowered to 43689 in low-memory systems. If
|
2004-11-03 13:51:07 +00:00
|
|
|
larger receive buffer sizes are desired, this value should
|
|
|
|
be increased (to affect all sockets). To employ large TCP
|
|
|
|
windows, the sysctl variable
|
|
|
|
.B net.ipv4.tcp_window_scaling
|
|
|
|
must be enabled (default).
|
|
|
|
|
|
|
|
.I max
|
|
|
|
- the maximum size of the receive buffer used by
|
|
|
|
each TCP socket. This value does not override the global
|
|
|
|
.BR net.core.rmem_max .
|
|
|
|
This is not used to limit the size of the receive buffer
|
|
|
|
declared using
|
|
|
|
.B SO_RCVBUF
|
|
|
|
on a socket.
|
|
|
|
The default value of 87380*2 bytes is lowered to 87380
|
2005-06-14 15:24:55 +00:00
|
|
|
in low-memory systems.
|
2004-11-03 13:51:07 +00:00
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_sack " (Boolean; default: enabled)"
|
2005-06-21 14:46:08 +00:00
|
|
|
Enable RFC\ 2018 TCP Selective Acknowledgements.
|
2004-11-03 13:51:07 +00:00
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_stdurg " (Boolean; default: disabled)"
|
2005-06-21 14:46:08 +00:00
|
|
|
If this option is enabled, then use the RFC\ 1122 interpretation
|
2005-06-20 14:45:09 +00:00
|
|
|
of the TCP urgent-pointer field.
|
2005-06-21 14:46:08 +00:00
|
|
|
.\" RFC 793 was ambiguous in its specification of the meaning of the
|
|
|
|
.\" urgent pointer. RFC 1122 (and RFC 961) fixed on a particular
|
|
|
|
.\" resolution of this ambiguity (unfortunately the "wrong" one).
|
2005-06-20 14:45:09 +00:00
|
|
|
According to this interpretation, the urgent pointer points
|
|
|
|
to the last byte of urgent data.
|
|
|
|
If this option is disabled, then use the BSD-compatible interpretation of
|
2005-06-21 14:46:08 +00:00
|
|
|
the urgent pointer:
|
2005-06-20 14:45:09 +00:00
|
|
|
the urgent pointer points to the first byte after the urgent data.
|
|
|
|
Enabling this option may lead to interoperability problems.
|
2004-11-03 13:51:07 +00:00
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_synack_retries " (integer; default: 5)"
|
2004-11-03 13:51:07 +00:00
|
|
|
The maximum number of times a SYN/ACK segment
|
|
|
|
for a passive TCP connection will be retransmitted.
|
2005-06-15 12:56:21 +00:00
|
|
|
This number should not be higher than 255.
|
2004-11-03 13:51:07 +00:00
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_syncookies " (Boolean)"
|
2004-11-03 13:51:07 +00:00
|
|
|
Enable TCP syncookies. The kernel must be compiled with
|
|
|
|
.BR CONFIG_SYN_COOKIES .
|
|
|
|
Send out syncookies when the syn backlog queue of a socket
|
|
|
|
overflows. The syncookies feature attempts to protect a
|
|
|
|
socket from a SYN flood attack. This should be used as a
|
|
|
|
last resort, if at all. This is a violation of the TCP
|
|
|
|
protocol, and conflicts with other areas of TCP such as TCP
|
|
|
|
extensions. It can cause problems for clients and relays.
|
|
|
|
It is not recommended as a tuning mechanism for heavily
|
|
|
|
loaded servers to help with overloaded or misconfigured
|
|
|
|
conditions. For recommended alternatives see
|
|
|
|
.BR tcp_max_syn_backlog ,
|
|
|
|
.BR tcp_synack_retries ,
|
2005-06-15 12:56:21 +00:00
|
|
|
and
|
2004-11-03 13:51:07 +00:00
|
|
|
.BR tcp_abort_on_overflow .
|
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_syn_retries " (integer; default: 5)"
|
2004-11-03 13:51:07 +00:00
|
|
|
The maximum number of times initial SYNs for an active TCP
|
|
|
|
connection attempt will be retransmitted. This value should
|
|
|
|
not be higher than 255. The default value is 5, which
|
|
|
|
corresponds to approximately 180 seconds.
|
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_timestamps " (Boolean; default: enabled)"
|
2005-06-21 14:46:08 +00:00
|
|
|
Enable RFC\ 1323 TCP timestamps.
|
2004-11-03 13:51:07 +00:00
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_tw_recycle " (Boolean; default: disabled)"
|
|
|
|
Enable fast recycling of TIME-WAIT sockets.
|
|
|
|
Enabling this option is not
|
2004-11-03 13:51:07 +00:00
|
|
|
recommended since this causes problems when working
|
|
|
|
with NAT (Network Address Translation).
|
2005-06-15 12:56:21 +00:00
|
|
|
.\"
|
2005-06-20 14:45:09 +00:00
|
|
|
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
2005-06-15 12:56:21 +00:00
|
|
|
.TP
|
|
|
|
.BR tcp_tw_reuse " (Boolean; default: disabled)"
|
|
|
|
Allow reuse of TIME-WAIT sockets for new connections when it is
safe from the protocol viewpoint.
|
|
|
|
It should not be changed without advice/request of technical
|
|
|
|
experts.
|
2004-11-03 13:51:07 +00:00
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_window_scaling " (Boolean; default: enabled)"
|
2005-06-21 14:46:08 +00:00
|
|
|
Enable RFC\ 1323 TCP window scaling.
|
2005-06-15 12:56:21 +00:00
|
|
|
This feature allows the use of a large window
|
2004-11-03 13:51:07 +00:00
|
|
|
(> 64K) on a TCP connection, should the other end support it.
|
|
|
|
Normally, the 16 bit window length field in the TCP header
|
|
|
|
limits the window size to less than 64K bytes. If larger
|
|
|
|
windows are desired, applications can increase the size of
|
|
|
|
their socket buffers and the window scaling option will be
|
|
|
|
employed. If
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_window_scaling
|
2004-11-03 13:51:07 +00:00
|
|
|
is disabled, TCP will not negotiate the use of window
|
|
|
|
scaling with the other end during connection setup.
|
2005-06-15 12:56:21 +00:00
|
|
|
.\"
|
2005-06-20 14:45:09 +00:00
|
|
|
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
2005-06-15 12:56:21 +00:00
|
|
|
.TP
|
|
|
|
.BR tcp_vegas_cong_avoid " (Boolean; default: disabled)"
|
|
|
|
Enable TCP Vegas congestion avoidance algorithm.
|
|
|
|
TCP Vegas is a sender-side only change to TCP that anticipates
|
|
|
|
the onset of congestion by estimating the bandwidth. TCP Vegas
|
|
|
|
adjusts the sending rate by modifying the congestion
|
|
|
|
window. TCP Vegas should provide less packet loss, but it is
|
|
|
|
not as aggressive as TCP Reno.
|
|
|
|
.\"
|
2005-06-20 14:45:09 +00:00
|
|
|
.\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
|
2005-06-15 12:56:21 +00:00
|
|
|
.TP
|
|
|
|
.BR tcp_westwood " (Boolean; default: disabled)"
|
|
|
|
Enable TCP Westwood+ congestion control algorithm.
|
|
|
|
TCP Westwood+ is a sender-side only modification of the TCP Reno
|
|
|
|
protocol stack that optimizes the performance of TCP congestion
|
|
|
|
control. It is based on end-to-end bandwidth estimation to set
|
|
|
|
congestion window and slow start threshold after a congestion
|
|
|
|
episode. Using this estimation, TCP Westwood+ adaptively sets a
|
|
|
|
slow start threshold and a congestion window which takes into
|
|
|
|
account the bandwidth used at the time congestion is experienced.
|
2005-06-21 14:46:08 +00:00
|
|
|
TCP Westwood+ significantly increases fairness with respect to
|
|
|
|
TCP Reno in wired networks and throughput over wireless links.
|
2004-11-03 13:51:07 +00:00
|
|
|
.TP
|
2005-06-15 12:56:21 +00:00
|
|
|
.BR tcp_wmem
|
2004-11-03 13:51:07 +00:00
|
|
|
This is a vector of 3 integers: [min, default, max]. These
|
|
|
|
parameters are used by TCP to regulate send buffer sizes.
|
|
|
|
TCP dynamically adjusts the size of the send buffer from the
|
|
|
|
default values listed below, in the range of these sysctl
|
|
|
|
variables, depending on memory available.
|
|
|
|
|
|
|
|
.I min
|
|
|
|
- minimum size of the send buffer used by each TCP socket.
|
|
|
|
The default value is 4K bytes.
|
|
|
|
This value is used to ensure that in memory pressure mode,
|
|
|
|
allocations below this size will still succeed. This is not
|
|
|
|
used to bound the size of the send buffer declared
|
|
|
|
using
|
|
|
|
.B SO_SNDBUF
|
|
|
|
on a socket.
|
|
|
|
|
|
|
|
.I default
|
|
|
|
- the default size of the send buffer for a TCP socket.
|
|
|
|
This value overwrites the initial default buffer size from
|
|
|
|
the generic global
|
|
|
|
.B net.core.wmem_default
|
|
|
|
defined for all protocols. The default value is 16K bytes.
|
|
|
|
If larger send buffer sizes are desired, this value
|
|
|
|
should be increased (to affect all sockets). To employ
|
|
|
|
large TCP windows, the sysctl variable
|
|
|
|
.B net.ipv4.tcp_window_scaling
|
|
|
|
must be enabled (default).
|
|
|
|
|
|
|
|
.I max
|
|
|
|
- the maximum size of the send buffer used by
|
|
|
|
each TCP socket. This value does not override the global
|
|
|
|
.BR net.core.wmem_max .
|
|
|
|
This is not used to limit the size of the send buffer
|
|
|
|
declared using
|
|
|
|
.B SO_SNDBUF
|
|
|
|
on a socket.
|
|
|
|
The default value is 128K bytes. It is lowered to 64K
|
|
|
|
depending on the memory available in the system.
|
|
|
|
.SH "SOCKET OPTIONS"
|
|
|
|
To set or get a TCP socket option, call
|
|
|
|
.BR getsockopt (2)
|
|
|
|
to read or
|
|
|
|
.BR setsockopt (2)
|
|
|
|
to write the option with the option level argument set to
|
2005-08-16 11:47:35 +00:00
|
|
|
.BR IPPROTO_TCP .
|
|
|
|
.\" or SOL_TCP on Linux
|
2004-11-03 13:51:07 +00:00
|
|
|
In addition,
|
|
|
|
most
|
2005-08-16 11:47:35 +00:00
|
|
|
.B IPPROTO_IP
|
2004-11-03 13:51:07 +00:00
|
|
|
socket options are valid on TCP sockets. For more
|
|
|
|
information see
|
|
|
|
.BR ip (7).
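.PP
A minimal sketch of the calling convention follows, written in Python.
The helper name set_tcp_flag is hypothetical, and TCP_NODELAY is used only as
a convenient example of a flag-valued option at the IPPROTO_TCP level.
.nf
```python
import socket


def set_tcp_flag(sock, option, enabled):
    """Write a flag option at the TCP level, then read it back."""
    sock.setsockopt(socket.IPPROTO_TCP, option, 1 if enabled else 0)
    return sock.getsockopt(socket.IPPROTO_TCP, option) != 0


if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("TCP_NODELAY:", set_tcp_flag(sock, socket.TCP_NODELAY, True))
    sock.close()
```
.fi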
|
2005-12-05 16:20:34 +00:00
|
|
|
.\" FIXME Document TCP_CONGESTION (new in 2.6.13)
|
2004-11-03 13:51:07 +00:00
|
|
|
.TP
.B TCP_CORK
If set, don't send out partial frames. All queued
partial frames are sent when the option is cleared again.
This is useful for prepending headers before calling
.BR sendfile (2),
or for throughput optimization.
As currently implemented, there is a 200 millisecond ceiling on the time
for which output is corked by
.BR TCP_CORK .
If this ceiling is reached, then queued data is automatically transmitted.
This option can be
combined with
.B TCP_NODELAY
only since Linux 2.5.71.
This option should not be used in code intended to be
portable.
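.PP
The cork-then-uncork pattern described for
.B TCP_CORK
might look like the following sketch.
The helper names set_cork and send_corked are hypothetical, and short writes are not retried here.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: turn TCP_CORK on (1) or off (0).
 * Returns 0 on success, -1 on error. */
int set_cork(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/* Hypothetical helper: queue a header and a body under the cork, then
 * uncork so that any remaining partial frame is sent.
 * Returns 0 on success, -1 on error. */
int send_corked(int fd, const void *hdr, size_t hlen,
                const void *body, size_t blen)
{
    int rc = 0;

    if (set_cork(fd, 1) == -1)
        return -1;
    if (write(fd, hdr, hlen) == -1 || write(fd, body, blen) == -1)
        rc = -1;
    if (set_cork(fd, 0) == -1)   /* always uncork, even after a failed write */
        rc = -1;
    return rc;
}
```

The same shape applies when the body is sent with sendfile(2) instead of write(2).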
.TP
.B TCP_DEFER_ACCEPT
Allows a listener to be awakened only when data arrives on
the socket. Takes an integer value (seconds), which can
bound the maximum number of attempts TCP will make to
complete the connection. This option should not be used in
code intended to be portable.
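.PP
A minimal sketch of enabling
.B TCP_DEFER_ACCEPT
on a listening socket follows.
The helper name defer_accept is hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: ask the kernel to wake the listener only once
 * data has arrived, with `seconds` as the bound described above.
 * Intended to be called on the listening socket.
 * Returns 0 on success, -1 on error. */
int defer_accept(int listen_fd, int seconds)
{
    return setsockopt(listen_fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
                      &seconds, sizeof(seconds));
}
```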
.TP
.B TCP_INFO
Used to collect information about this socket. The kernel
returns a \fIstruct tcp_info\fP as defined in the file
.IR /usr/include/linux/tcp.h .
This option should not be used in
code intended to be portable.
.TP
.B TCP_KEEPCNT
The maximum number of keepalive probes TCP should send
before dropping the connection. This option should not be
used in code intended to be portable.
.TP
.B TCP_KEEPIDLE
The time (in seconds) the connection needs to remain idle
before TCP starts sending keepalive probes, if the socket
option
.B SO_KEEPALIVE
has been set on this socket. This
option should not be used in code intended to be portable.
.TP
.B TCP_KEEPINTVL
The time (in seconds) between individual keepalive probes.
This option should not be used in code intended to be
portable.
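.PP
The three keepalive options are usually set together with
.BR SO_KEEPALIVE .
The following is a sketch under that assumption; the helper name enable_keepalive and the example values are hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable keepalive so that the first probe is sent
 * after `idle` seconds without traffic, further probes follow every
 * `intvl` seconds, and the connection is dropped after `cnt` unanswered
 * probes.  Returns 0 on success, -1 on error. */
int enable_keepalive(int fd, int idle, int intvl, int cnt)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) == -1)
        return -1;
    return setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt));
}
```

For example, enable_keepalive(fd, 60, 10, 5) would let an unresponsive peer be detected after roughly 60 + 10 * 5 = 110 seconds.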
.TP
.B TCP_LINGER2
The lifetime of orphaned FIN_WAIT2 state sockets. This
option can be used to override the system-wide sysctl
.B tcp_fin_timeout
on this socket. This is not to be confused with the
.BR socket (7)
level option
.BR SO_LINGER .
This option should not be used in code intended to be
portable.
.TP
.B TCP_MAXSEG
The maximum segment size for outgoing TCP packets. If this
option is set before connection establishment, it also
changes the MSS value announced to the other end in the
initial packet. Values greater than the (eventual)
interface MTU have no effect. TCP will also impose
its minimum and maximum bounds on the value provided.
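.PP
A sketch of setting
.B TCP_MAXSEG
before connecting, and of reading the value back, is shown below.
The helper names set_max_segment and get_max_segment are hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: request a maximum segment size.  To influence
 * the MSS announced to the peer, call this before connect(2).
 * Returns 0 on success, -1 on error. */
int set_max_segment(int fd, int mss)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss));
}

/* Hypothetical helper: return the segment size the kernel currently
 * reports for this socket, or -1 on error. */
int get_max_segment(int fd)
{
    int mss = 0;
    socklen_t len = sizeof(mss);

    if (getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) == -1)
        return -1;
    return mss;
}
```

Because TCP applies its own bounds, the value read back may differ from the one requested.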
.TP
.B TCP_NODELAY
If set, disable the Nagle algorithm. This means that segments
are always sent as soon as possible, even if there is only a
small amount of data. When not set, data is buffered until there
is a sufficient amount to send out, thereby avoiding the
frequent sending of small packets, which results in poor
utilization of the network.
This option is overridden by
.BR TCP_CORK ;
however, setting this option forces an explicit flush of
pending output, even if
.B TCP_CORK
is currently set.
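.PP
Disabling the Nagle algorithm comes down to a single call; the sketch below wraps it in a hypothetical helper named set_nodelay.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: disable (on = 1) or re-enable (on = 0) the Nagle
 * algorithm on a TCP socket.  Returns 0 on success, -1 on error. */
int set_nodelay(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}
```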
.TP
.B TCP_QUICKACK
Enable quickack mode if set or disable quickack
mode if cleared. In quickack mode, acks are sent
immediately, rather than delayed if needed in accordance
with normal TCP operation. This flag is not permanent;
it only enables a switch to or from quickack mode.
Subsequent operation of the TCP protocol will
once again enter/leave quickack mode depending on
internal protocol processing and factors such as
delayed ack timeouts occurring and data transfer.
This option should not be used in code intended to be
portable.
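.PP
Because the flag is not permanent, an application that wants quickack behavior throughout may need to set it again after receiving data.
The following sketch shows one way to do that; the helper names set_quickack and recv_quickack are hypothetical, and re-arming after every receive is an assumption about usage, not a requirement stated above.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: switch quickack mode on (1) or off (0).
 * Returns 0 on success, -1 on error. */
int set_quickack(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on));
}

/* Hypothetical helper: receive into buf, then re-enable quickack mode,
 * since the kernel may have left that mode in the meantime.
 * Returns what recv(2) returned and preserves its errno. */
ssize_t recv_quickack(int fd, void *buf, size_t len)
{
    ssize_t n = recv(fd, buf, len, 0);
    int saved_errno = errno;

    (void)set_quickack(fd, 1);   /* best effort */
    errno = saved_errno;
    return n;
}
```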
.TP
.B TCP_SYNCNT
Set the number of SYN retransmits that TCP should send before
aborting the attempt to connect. It cannot exceed 255.
This option should not be used in code intended to be
portable.
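.PP
Limiting SYN retransmits before a connection attempt could be sketched as follows.
The helper name set_syn_retries is hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: set the number of SYN retransmits for this
 * socket; call before connect(2).  Values above the kernel's limit are
 * rejected.  Returns 0 on success, -1 on error. */
int set_syn_retries(int fd, int count)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_SYNCNT, &count, sizeof(count));
}
```

A small count makes connect(2) give up sooner when the peer does not answer.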
.TP
.B TCP_WINDOW_CLAMP
Bound the size of the advertised window to this value. The
kernel imposes a minimum size of SOCK_MIN_RCVBUF/2.
This option should not be used in code intended to be
portable.
.SH IOCTLS
The following
.BR ioctl (2)
calls return information in
.IR value .
The correct syntax is:
.PP
.RS
.nf
.BI int " value";
.IB error " = ioctl(" tcp_socket ", " ioctl_type ", &" value ");"
.fi
.RE
.PP
.I ioctl_type
is one of the following:
.TP
.B SIOCINQ
Returns the amount of queued unread data in the receive buffer.
The socket must not be in LISTEN state, otherwise an error (EINVAL)
is returned.
.TP
.B SIOCATMARK
Returns true (i.e.,
.I value
is non-zero) if the inbound data stream is at the urgent mark.
.sp
If the
.B SO_OOBINLINE
socket option is set, and
.B SIOCATMARK
returns true, then the
next read from the socket will return the urgent data.
If the
.B SO_OOBINLINE
socket option is not set, and
.B SIOCATMARK
returns true, then the
next read from the socket will return the bytes following
the urgent data (actually reading the urgent data requires calling
.BR recv (2)
with the
.B MSG_OOB
flag).
.sp
Note that a read never reads across the urgent mark.
If an application is informed of the presence of urgent data via
.BR select (2)
(using the
.I exceptfds
argument) or through delivery of a
.B SIGURG
signal,
then it can advance up to the mark using a loop which repeatedly tests
.B SIOCATMARK
and performs a read (requesting any number of bytes) as long as
.B SIOCATMARK
returns false.
.TP
.B SIOCOUTQ
Returns the amount of unsent data in the socket send queue.
The socket must not be in LISTEN state, otherwise an error (EINVAL)
is returned.
.SH "ERROR HANDLING"
When a network error occurs, TCP tries to resend the
packet. If it doesn't succeed after some time, either
.B ETIMEDOUT
or the last received error on this connection is reported.
.PP
Some applications require a quicker error notification.
This can be enabled with the
.B IPPROTO_IP
level
.B IP_RECVERR
socket option. When this option is enabled, all incoming
errors are immediately passed to the user program. Use this
option with care \(em it makes TCP less tolerant of routing
changes and other normal network conditions.
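.PP
Enabling the quicker notification might look like this sketch.
The helper name enable_recverr is hypothetical.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: have incoming errors passed to the program
 * immediately.  Note the option level is IPPROTO_IP, not IPPROTO_TCP.
 * Returns 0 on success, -1 on error. */
int enable_recverr(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on));
}
```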
.SH NOTES
TCP has no real out-of-band data; it has urgent data. In
Linux this means that if the other end sends newer out-of-band
data, the older urgent data is inserted as normal data into
the stream (even when
.B SO_OOBINLINE
is not set). This differs from BSD-based stacks.
.PP
Linux uses the BSD-compatible interpretation of the urgent
pointer field by default. This violates RFC\ 1122, but is
required for interoperability with other stacks. It can be
changed by the
.B tcp_stdurg
sysctl.
.SH ERRORS
.TP
.B EPIPE
The other end closed the socket unexpectedly or a read is
executed on a shut down socket.
.TP
.B ETIMEDOUT
The other end didn't acknowledge retransmitted data after
some time.
.TP
.B EAFNOTSUPPORT
The socket address type passed in
.I sin_family
was not
.BR AF_INET .
.PP
Any errors defined for
.BR ip (7)
or the generic socket layer may also be returned for TCP.
.SH BUGS
Not all errors are documented.
.br
IPv6 is not described.
.\" Only a single Linux kernel version is described
.\" Info for 2.2 was lost. Should be added again,
.\" or put into a separate page.
.SH VERSIONS
Support for Explicit Congestion Notification, zero-copy
.BR sendfile (2),
reordering support and some SACK extensions
(DSACK) were introduced in 2.4.
Support for forward acknowledgement (FACK), TIME_WAIT recycling,
per-connection keepalive socket options and sysctls
were introduced in 2.3.

The default values and descriptions for the sysctl variables
given above are applicable for the 2.4 kernel.
.SH AUTHORS
This man page was originally written by Andi Kleen.
It was updated for 2.4 by Nivedita Singhvi with input from
Alexey Kuznetsov's Documentation/networking/ip-sysctls.txt
document.
.SH "SEE ALSO"
.BR accept (2),
.BR bind (2),
.BR connect (2),
.BR getsockopt (2),
.BR listen (2),
.BR recvmsg (2),
.BR sendfile (2),
.BR sendmsg (2),
.BR socket (2),
.BR sysctl (2),
.BR ip (7),
.BR socket (7)
.sp
RFC\ 793 for the TCP specification.
.br
RFC\ 1122 for the TCP requirements and a description
of the Nagle algorithm.
.br
RFC\ 1323 for TCP timestamp and window scaling options.
.br
RFC\ 1644 for a description of TIME_WAIT assassination
hazards.
.br
RFC\ 2481 for a description of Explicit Congestion
Notification.
.br
RFC\ 2581 for TCP congestion control algorithms.
.br
RFC\ 2018 and RFC\ 2883 for SACK and extensions to SACK.