mirror of https://github.com/tLDP/LDP
728 lines
35 KiB
XML
728 lines
35 KiB
XML
<!-- start of file -->
|
||
|
||
<!-- This .xml file is part of the Traffic-Control-HOWTO document -->
|
||
|
||
<!-- $Id$ -->
|
||
|
||
<!--
|
||
|
||
The article was authored by Martin A. Brown <martin@linux-ip.net>
|
||
for the linux community, and has been released under the GNU Free
|
||
Documentation License (GFDL) through The Linux Documentation
|
||
Project (TLDP).
|
||
|
||
This HOWTO is likely available at the following address:
|
||
|
||
http://tldp.org/HOWTO/Traffic-Control-HOWTO/
|
||
|
||
-->
|
||
|
||
<!-- conventions used in this documentation....
|
||
|
||
- each section is a separate file
|
||
|
||
-->
|
||
|
||
<section id="overview">
|
||
|
||
<title>Overview of Concepts</title>
|
||
<para>
|
||
This section will
|
||
<link linkend="o-what-is">introduce traffic control</link> and
|
||
<link linkend="o-why-use">examine reasons for it</link>,
|
||
identify a few
|
||
<link linkend="o-advantages">advantages</link> and
|
||
<link linkend="o-disadvantages">disadvantages</link> and
|
||
introduce key concepts used in traffic control.
|
||
</para>
|
||
|
||
<section id="o-what-is">
|
||
<title>What is it?</title>
|
||
<para>
|
||
Traffic control is the name given to the sets of queuing systems and
|
||
mechanisms by which packets are received and transmitted on a router.
|
||
This includes deciding (if and) which packets to accept at what
|
||
rate on the input of an interface and determining which packets to
|
||
transmit in what order at what rate on the output of an interface.
|
||
</para>
|
||
<para>
|
||
In the simplest possible model, traffic control consists of
|
||
a single queue which collects entering packets and dequeues them as
|
||
quickly as the hardware (or underlying device) can accept them. This
|
||
sort of queue is a &sch_fifo;. This is like a single toll booth for
|
||
entering a highway. Every car must stop and pay the toll. Other cars
|
||
wait their turn.
|
||
</para>
|
||
<para>
|
||
Linux provides this simplest traffic control tool (&sch_fifo;), and
|
||
in addition offers a wide variety of other tools that allow all sorts of
|
||
control over packet handling.
|
||
</para>
|
||
<note>
|
||
<para>
|
||
The default qdisc under Linux is the &link-sch_pfifo_fast;, which is
|
||
slightly more complex than the &link-sch_fifo;.
|
||
</para>
|
||
</note>
|
||
<para>
|
||
There are examples of queues in all sorts of software. The queue is a
|
||
way of organizing the pending tasks or data (see also
|
||
<xref linkend="o-queues"/>). Because network links typically carry data
|
||
in a serialized fashion, a queue is required to manage the outbound
|
||
data packets.
|
||
</para>
|
||
<para>
|
||
In the case of a desktop machine and an efficient webserver sharing the
|
||
same uplink to the Internet, the following contention for bandwidth may
|
||
occur. The web server may be able to fill up the output queue on the
|
||
router faster than the data can be transmitted across the link, at which
|
||
point the router starts to drop packets (its buffer is full!). Now, the
|
||
desktop machine (with an interactive application user) may be faced with
|
||
packet loss and high latency. Note that high latency sometimes leads to
|
||
screaming users! By separating the internal queues used to service
|
||
these two different classes of application, there can be better sharing
|
||
of the network resource between the two applications.
|
||
</para>
|
||
<para>
|
||
Traffic control is a set of tools allowing an administrator granular
|
||
control over these queues and the queuing mechanisms of a
|
||
networked device. The power to rearrange traffic flows and packets with
|
||
these tools is tremendous and can be complicated, but is no substitute
|
||
for adequate bandwidth.
|
||
</para>
|
||
<para>
|
||
The term Quality of Service (QoS) is often used as a synonym for traffic
|
||
control at an IP-layer.
|
||
</para>
|
||
</section>
|
||
|
||
<section id="o-why-use">
|
||
<title>Why use it?</title>
|
||
<para>
|
||
Traffic control tools allow the implementer to apply preferences,
|
||
organizational or business policies to packets or network flows
|
||
transmitted into a network. This control allows stewardship over the
|
||
network resources such as throughput or latency.
|
||
</para>
|
||
<para>
|
||
Fundamentally, traffic control becomes a necessity because of packet
|
||
switching in networks.
|
||
</para>
|
||
<sidebar>
|
||
<para>
|
||
For a brief digression, to explain the novelty and cleverness
|
||
of packet switching, think about the circuit-switched telephone networks
|
||
that were built over the entire 20th century. In order to set up a
|
||
call, the network gear knew rules about call establishment and when a
|
||
caller tried to connect, the network employed the rules to reserve a
|
||
circuit for the entire duration of the call or connection. While one
|
||
call was engaged, using that resource, no other call or caller could use
|
||
that resource. This meant many individual pieces of equipment could
|
||
block call setup because of resource unavailability.
|
||
</para>
|
||
<para>
|
||
Let's return to packet-switched networks, a mid-20th century invention,
|
||
later in wide use, and nearly ubiquitous in the 21st century.
|
||
Packet-switched networks differ from circuit based networks in one very
|
||
important regard. The unit of data handled by the network gear is not a
|
||
circuit, but rather a small chunk of data called a packet. Inexactly
|
||
speaking, the packet is a letter in an envelope with a destination
|
||
address. The packet-switched network had only a very small amount of
|
||
work to do, reading the destination identifier and transmitting the
|
||
packet.
|
||
</para>
|
||
<para>
|
||
Sometimes, packet-switched networks are described as stateless because
|
||
they do not need to track all of the flows (analogy to a circuit) that
|
||
are active in the network. In order to be function, the packet-handling
|
||
machines must know how to reach the destinations addresses. One analogy
|
||
is a package-handling service like your postal service, UPS or DHL.
|
||
</para>
|
||
<para>
|
||
If there's a sudden influx of packets into a packet-switched network
|
||
(or, by analogy, the increase of cards and packages sent by mail and
|
||
other carriers at Christmas), the network can become slow or
|
||
unresponsive. Lack of differentiation between importance of specific
|
||
packets or network flows is, therefore, a weakness of such
|
||
packet-switched networks. The network can be overloaded with data
|
||
packets all competing.
|
||
</para>
|
||
</sidebar>
|
||
<para>
|
||
In simplest terms, the traffic control tools allow somebody to enqueue
|
||
packets into the network differently based on attributes of the packet.
|
||
The various different tools each solve a different problem and many can
|
||
be combined, to implement complex rules to meet a preference or business
|
||
goal.
|
||
</para>
|
||
<para>
|
||
There are many practical reasons to consider traffic control, and many
|
||
scenarios in which using traffic control makes sense. Below are some
|
||
examples of common problems which can be solved or at least ameliorated
|
||
with these tools.
|
||
</para>
|
||
<para>
|
||
The list below is not an exhaustive list of the sorts of solutions
|
||
available to users of traffic control, but shows the
|
||
types of common problems that can be solved by using traffic control
|
||
tools to maximize the usability of the network.
|
||
</para>
|
||
<itemizedlist>
|
||
<title>Common traffic control solutions</title>
|
||
<listitem>
|
||
<para>
|
||
Limit total bandwidth to a known rate; &link-sch_tbf;,
|
||
&link-sch_htb; with child class(es).
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Limit the bandwidth of a particular user, service or client;
|
||
&link-sch_htb; classes and &elements-classifying; with a
|
||
&linux-filter;.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Maximize TCP throughput on an asymmetric link; prioritize
|
||
transmission of ACK packets, &link-wondershaper;.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Reserve bandwidth for a particular application or user;
|
||
&link-sch_htb; with children classes and &elements-classifying;.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Prefer latency sensitive traffic; &link-sch_prio; inside an
|
||
&link-sch_htb; class.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Managed oversubscribed bandwidth; &link-sch_htb; with borrowing.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Allow equitable distribution of unreserved bandwidth; &link-sch_htb;
|
||
with borrowing.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Ensure that a particular type of traffic is dropped; &linux-policer;
|
||
attached to a &linux-filter; with a &linux-drop; action.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
<para>
|
||
Remember that, sometimes, it is simply better to purchase more
|
||
bandwidth. Traffic control does not solve all problems!
|
||
</para>
|
||
</section>
|
||
|
||
<section id="o-advantages">
|
||
<title>Advantages</title>
|
||
<para>
|
||
When properly employed, traffic control should lead to more predictable
|
||
usage of network resources and less volatile contention for these
|
||
resources. The network then meets the goals of the traffic control
|
||
configuration. Bulk download traffic can be allocated a reasonable
|
||
amount of bandwidth even as higher priority interactive traffic is
|
||
simultaneously
|
||
serviced. Even low priority data transfer such as mail can
|
||
be allocated bandwidth without tremendously affecting the other classes
|
||
of traffic.
|
||
</para>
|
||
<para>
|
||
In a larger picture, if the traffic control configuration represents
|
||
policy which has been communicated to the users, then users (and,
|
||
by extension, applications) know what to expect from the network.
|
||
</para>
|
||
<para>
|
||
</para>
|
||
</section>
|
||
|
||
<section id="o-disadvantages">
|
||
<title>Disdvantages</title>
|
||
<para>
|
||
</para>
|
||
<para>
|
||
Complexity is easily one of the most significant disadvantages of using
|
||
traffic control. There are ways to become familiar with traffic control
|
||
tools which ease the learning curve about traffic control and its
|
||
mechanisms, but identifying a traffic control misconfiguration can be
|
||
quite a challenge.
|
||
</para>
|
||
<para>
|
||
Traffic control when used appropriately can lead to more equitable
|
||
distribution of network resources. It can just as easily be installed
|
||
in an inappropriate manner leading to further and more divisive
|
||
contention for resources.
|
||
</para>
|
||
<para>
|
||
The computing resources required on a router to support a traffic
|
||
control scenario need to be capable of handling the increased cost of
|
||
maintaining the traffic control structures. Fortunately, this is a
|
||
small incremental cost, but can become more significant as the
|
||
configuration grows in size and complexity.
|
||
</para>
|
||
<para>
|
||
For personal use, there's no training cost associated with the use of
|
||
traffic control, but a company may find that purchasing more bandwidth
|
||
is a simpler solution than employing traffic control. Training
|
||
employees and ensuring depth of knowledge may be more costly than
|
||
investing in more bandwidth.
|
||
</para>
|
||
<para>
|
||
Although
|
||
traffic control on packet-switched networks covers a larger conceptual
|
||
area, you can think of traffic control as a way to provide [some of] the
|
||
statefulness of a circuit-based network to a packet-switched.
|
||
</para>
|
||
</section>
|
||
|
||
<section id="o-queues">
|
||
<title>Queues</title>
|
||
<para>
|
||
Queues form the backdrop for all of traffic control and are the integral
|
||
concept behind scheduling. A queue is a location (or buffer) containing
|
||
a finite number of items waiting for an action or service. In
|
||
networking, a queue is the place where packets (our units) wait to be
|
||
transmitted by the hardware (the service). In the simplest model,
|
||
packets are transmitted in a first-come first-serve basis
|
||
<footnote>
|
||
<para>
|
||
This queueing model has long been used in civilized countries to
|
||
distribute scant food or provisions equitably. William Faulkner is
|
||
reputed to have walked to the front of the line for to fetch his
|
||
share of ice, proving that not everybody likes the FIFO model, and
|
||
providing us a model for considering priority queuing.
|
||
</para>
|
||
</footnote>.
|
||
In the discipline of computer networking (and more generally
|
||
computer science), this sort of a queue is known as a &sch_fifo;.
|
||
</para>
|
||
<para>
|
||
Without any other mechanisms, a queue doesn't offer any promise for
|
||
traffic control. There are only two interesting actions in a queue.
|
||
Anything entering a queue is enqueued into the queue. To remove an item
|
||
from a queue is to dequeue that item.
|
||
</para>
|
||
<para>
|
||
A queue becomes much more interesting when coupled with other mechanisms
|
||
which can delay packets, rearrange, drop and prioritize packets in
|
||
multiple queues. A queue can also use subqueues, which allow for
|
||
complexity of behaviour in a scheduling operation.
|
||
</para>
|
||
<para>
|
||
From the perspective of the higher layer software, a packet is simply
|
||
enqueued for transmission, and the manner and order in which the
|
||
enqueued packets are transmitted is immaterial to the higher layer. So,
|
||
to the higher layer, the entire traffic control system may appear as a
|
||
single queue
|
||
<footnote>
|
||
<para>
|
||
Similarly, the entire traffic control system appears as a queue or
|
||
scheduler to the higher layer which is enqueuing packets into this
|
||
layer.
|
||
</para>
|
||
</footnote>.
|
||
It is only by examining the internals of this layer that
|
||
the traffic control structures become exposed and available.
|
||
</para>
|
||
<para>
|
||
In the image below a simplified high level overview of the queues on
|
||
the transmit path of the Linux network stack:
|
||
</para>
|
||
<mediaobject id="img-Figure1">
|
||
<imageobject>
|
||
<imagedata fileref="images/Figure1.svg" format="SVG"/>
|
||
</imageobject>
|
||
<imageobject>
|
||
<imagedata fileref="images/Figure1.png" format="PNG"/>
|
||
</imageobject>
|
||
<imageobject>
|
||
<imagedata fileref="images/Figure1.jpg" format="JPG"/>
|
||
</imageobject>
|
||
<textobject>
|
||
<phrase>Figure 1: Simplified high level overview of the queues on the transmit
|
||
path of the Linux network stack.</phrase>
|
||
</textobject>
|
||
<caption>
|
||
<para><command>Figure 1: </command><emphasis>Simplified high level overview of the queues on the transmit path of the Linux network stack</emphasis>.
|
||
</para>
|
||
</caption>
|
||
</mediaobject>
|
||
<para>
|
||
See <link linkend="o-nic">2.9</link> for details about NIC interface and <link linkend="c-driver-queue">4.9</link>
|
||
for details about <link linkend="c-driver-queue">driver queue</link>.
|
||
</para>
|
||
</section>
|
||
|
||
<section id="o-flows">
|
||
<title>Flows</title>
|
||
<para>
|
||
A flow is a distinct connection or conversation between two hosts. Any
|
||
unique set of packets between two hosts can be regarded as a flow.
|
||
Under TCP the concept of a connection with a source IP and port and
|
||
destination IP and port represents a flow. A UDP flow can be similarly
|
||
defined.
|
||
</para>
|
||
<para>
|
||
Traffic control mechanisms frequently separate traffic into classes of
|
||
flows which can be aggregated and transmitted as an aggregated flow
|
||
(consider DiffServ). Alternate mechanisms may attempt to divide
|
||
bandwidth equally based on the individual flows.
|
||
</para>
|
||
<para>
|
||
Flows become important when attempting to divide bandwidth equally among
|
||
a set of competing flows, especially when some applications deliberately
|
||
build a large number of flows.
|
||
</para>
|
||
</section>
|
||
|
||
<section id="o-tokens">
|
||
<title>Tokens and buckets</title>
|
||
<anchor id="o-buckets" xreflabel="Tokens and buckets"/>
|
||
<para>
|
||
Two of the key underpinnings of a &elements-shaping; mechanisms are
|
||
the interrelated concepts of tokens and buckets.
|
||
</para>
|
||
<para>
|
||
In order to control the rate of dequeuing, an implementation can count
|
||
the number of packets or bytes dequeued as each item is dequeued,
|
||
although this requires complex usage of timers and measurements to limit
|
||
accurately. Instead of calculating the current usage and time, one
|
||
method, used widely in traffic control, is to generate tokens at a
|
||
desired rate, and only dequeue packets or bytes if a token is available.
|
||
</para>
|
||
<para>
|
||
Consider the analogy of an amusement park ride with a queue of people
|
||
waiting to experience the ride. Let's imagine a track on which carts
|
||
traverse a fixed track. The carts arrive at the head of the queue at a
|
||
fixed rate. In order to enjoy the ride, each person must wait for an
|
||
available cart. The cart is analogous to a token and the person is
|
||
analogous to a packet. Again, this mechanism is a rate-limiting or
|
||
&elements-shaping; mechanism. Only a certain number of people can
|
||
experience the ride in a particular period.
|
||
</para>
|
||
<para>
|
||
To extend the analogy, imagine an empty line for the amusement park
|
||
ride and a large number of carts sitting on the track ready to carry
|
||
people. If a large number of people entered the line together many
|
||
(maybe all) of them could experience the ride because of the carts
|
||
available and waiting. The number of carts available is a concept
|
||
analogous to the bucket. A bucket contains a number of tokens and can
|
||
use all of the tokens in bucket without regard for passage of time.
|
||
</para>
|
||
<para>
|
||
And to complete the analogy, the carts on the amusement park ride (our
|
||
tokens) arrive at a fixed rate and are only kept available up to the
|
||
size of the bucket. So, the bucket is filled with tokens according to
|
||
the rate, and if the tokens are not used, the bucket can fill up. If
|
||
tokens are used the bucket will not fill up. Buckets are a key concept
|
||
in supporting bursty traffic such as HTTP.
|
||
</para>
|
||
<para>
|
||
The &link-sch_tbf; qdisc is a classical example of a shaper (the section
|
||
on &sch_tbf; includes a diagram which may help to visualize the token
|
||
and bucket concepts). The &sch_tbf; generates ¶m-rate; tokens and
|
||
only transmits packets when a token is available. Tokens are a generic
|
||
shaping concept.
|
||
</para>
|
||
<para>
|
||
In the case that a queue does not need tokens immediately, the tokens
|
||
can be collected until they are needed. To collect tokens indefinitely
|
||
would negate any benefit of shaping so tokens are collected until a
|
||
certain number of tokens has been reached. Now, the queue has tokens
|
||
available for a large number of packets or bytes which need to be
|
||
dequeued. These intangible tokens are stored in an intangible bucket,
|
||
and the number of tokens that can be stored depends on the size of the
|
||
bucket.
|
||
</para>
|
||
<para>
|
||
This also means that a bucket full of tokens may be available at any
|
||
instant. Very predictable regular traffic can be handled by small
|
||
buckets. Larger buckets may be required for burstier traffic, unless
|
||
one of the desired goals is to reduce the burstiness of the
|
||
&concepts-flows;.
|
||
</para>
|
||
<para>
|
||
In summary, tokens are generated at rate, and a maximum of a bucket's
|
||
worth of tokens may be collected. This allows bursty traffic to be
|
||
handled, while smoothing and shaping the transmitted traffic.
|
||
</para>
|
||
<para>
|
||
The concepts of tokens and buckets are closely interrelated and are used
|
||
in both &link-sch_tbf; (one of the &classless-qdiscs;) and
|
||
&link-sch_htb; (one of the &classful-qdiscs;).
|
||
Within the &tcng; language, the use of two- and three-color meters is
|
||
indubitably a token and bucket concept.
|
||
</para>
|
||
</section>
|
||
|
||
<section id="o-packets">
|
||
<title>Packets and frames</title>
|
||
<anchor id="o-frames"/>
|
||
<para>
|
||
The terms for data sent across network changes depending on the layer
|
||
the user is examining. This document will rather impolitely (and
|
||
incorrectly) gloss over the technical distinction between
|
||
packets and frames although they are outlined here.
|
||
</para>
|
||
<para>
|
||
The word frame is typically used to describe a layer 2 (data link) unit
|
||
of data to be forwarded to the next recipient. Ethernet interfaces, PPP
|
||
interfaces, and T1 interfaces all name their layer 2 data unit a frame.
|
||
The frame is actually the unit on which traffic control is performed.
|
||
</para>
|
||
<para>
|
||
A packet, on the other hand, is a higher layer concept, representing
|
||
layer 3 (network) units. The term packet is preferred in this
|
||
documentation, although it is slightly inaccurate.
|
||
</para>
|
||
</section>
|
||
|
||
<section id="o-nic">
|
||
<title>NIC, Network Interface Controller</title>
|
||
<para>
|
||
A network interface controller is a computer hardware component,
|
||
differently from previous ones thar are software components, that
|
||
connects a computer to a computer network. The network controller
|
||
implements the electronic circuitry required to communicate using a
|
||
specific data link layer and physical layer standard such as
|
||
Ethernet, Fibre Channel, Wi-Fi or Token Ring. Traffic control must
|
||
deal with the physical constraints and characteristics of the NIC
|
||
interface.
|
||
</para>
|
||
<section id="o-huge-packet">
|
||
<title>Huge Packets from the Stack</title>
|
||
<para>
|
||
Most NICs have a fixed maximum transmission unit
|
||
(<acronym>MTU</acronym>) which is the biggest frame which can be
|
||
transmitted by the physical medium. For Ethernet the default MTU
|
||
is 1500 bytes but some Ethernet networks support Jumbo Frames
|
||
of up to 9000 bytes. Inside the IP network stack, the MTU can
|
||
manifest as a limit on the size of the packets which are sent to
|
||
the device for transmission. For example, if an application
|
||
writes 2000 bytes to a TCP socket then the IP stack needs to
|
||
create two IP packets to keep the packet size less than or equal
|
||
to a 1500 byte MTU. For large data transfers the comparably small
|
||
MTU causes a large number of small packets to be created and
|
||
transferred through the <link linkend="c-driver-queue">driver
|
||
queue</link>.
|
||
</para>
|
||
<para>
|
||
In order to avoid the overhead associated with a large number of packets on the transmit path, the Linux kernel implements several optimizations: TCP segmentation offload (TSO), UDP fragmentation offload (UFO) and generic segmentation offload (GSO). All of these optimizations allow the IP stack to create packets which are larger than the MTU of the outgoing NIC. For IPv4, packets as large as the IPv4 maximum of 65,536 bytes can be created and queued to the <link linkend="c-driver-queue">driver queue</link>. In the case of TSO and UFO, the NIC hardware takes responsibility for breaking the single large packet into packets small enough to be transmitted on the physical interface. For NICs without hardware support, GSO performs the same operation in software immediately before queueing to the <link linkend="c-driver-queue">driver queue</link>.
|
||
</para>
|
||
<para>
|
||
Recall from earlier that the <link linkend="c-driver-queue">driver queue</link> contains a fixed number of descriptors which each point to packets of varying sizes, Since TSO, UFO and GSO allow for much larger packets these optimizations have the side effect of greatly increasing the number of bytes which can be queued in the <link linkend="c-driver-queue">driver queue</link>. Figure 3 illustrates this concept.
|
||
</para>
|
||
<mediaobject id="img-Figure2">
|
||
<imageobject>
|
||
<imagedata fileref="images/Figure2.svg" format="SVG"/>
|
||
</imageobject>
|
||
<imageobject>
|
||
<imagedata fileref="images/Figure2.png" format="PNG"/>
|
||
</imageobject>
|
||
<imageobject>
|
||
<imagedata fileref="images/Figure2.jpg" format="JPG"/>
|
||
</imageobject>
|
||
<textobject>
|
||
<phrase>Figure 2: Large packets can be sent to the NIC when TSO, UFO or GSO are enabled. This can greatly increase the number of bytes in the queudrivere.
|
||
</phrase>
|
||
</textobject>
|
||
<caption>
|
||
<para><command>Figure 2:</command><emphasis> Large packets can be sent to the NIC when TSO, UFO or GSO are enabled. This can greatly increase the number of bytes in the <link linkend="c-driver-queue">driver queue</link>.</emphasis>
|
||
</para>
|
||
</caption>
|
||
</mediaobject>
|
||
</section>
|
||
</section>
|
||
|
||
<section id="o-starv-lat">
|
||
<title>Starvation and Latency</title>
|
||
<para>
|
||
The queue between the IP stack and the hardware (see <link linkend="c-driver-queue">chapter 4.2</link> for detail about <link linkend="c-driver-queue">driver queue</link> or see <link linkend="s-ethtool">chapter 5.5</link> for how manage it) introduces two problems: starvation and latency.
|
||
</para>
|
||
<para>
|
||
If the NIC driver wakes to pull packets off of the queue for transmission and the queue is empty the hardware will miss a transmission opportunity thereby reducing the throughput of the system. This is referred to as starvation. Note that an empty queue when the system does not have anything to transmit is not starvation – this is normal. The complication associated with avoiding starvation is that the IP stack which is filling the queue and the hardware driver draining the queue run asynchronously. Worse, the duration between fill or drain events varies with the load on the system and external conditions such as the network interface’s physical medium. For example, on a busy system the IP stack will get fewer opportunities to add packets to the buffer which increases the chances that the hardware will drain the buffer before more packets are queued. For this reason it is advantageous to have a very large buffer to reduce the probability of starvation and ensures high throughput.
|
||
</para>
|
||
<para>
|
||
While a large queue is necessary for a busy system to maintain high throughput, it has the downside of allowing for the introduction of a large amount of latency.
|
||
</para>
|
||
<mediaobject id="img-Figure3">
|
||
<imageobject>
|
||
<imagedata fileref="images/Figure3.svg" format="SVG"/>
|
||
</imageobject>
|
||
<imageobject>
|
||
<imagedata fileref="images/Figure3.png" format="PNG"/>
|
||
</imageobject>
|
||
<imageobject>
|
||
<imagedata fileref="images/Figure3.jpg" format="JPG"/>
|
||
</imageobject>
|
||
<textobject>
|
||
<phrase>Figure 3: Interactive packet (yellow) behind bulk flow packets (blue)
|
||
</phrase>
|
||
</textobject>
|
||
<caption>
|
||
<para><command>Figure 3:</command> <emphasis>Interactive packet (yellow) behind bulk flow packets (blue)</emphasis>
|
||
</para>
|
||
</caption>
|
||
</mediaobject>
|
||
<para>
|
||
Figure 3 shows a <link linkend="c-driver-queue">driver queue</link> which is almost full with TCP segments for a single high bandwidth, bulk traffic flow (blue). Queued last is a packet from a VoIP or gaming flow (yellow). Interactive applications like VoIP or gaming typically emit small packets at fixed intervals which are latency sensitive while a high bandwidth data transfer generates a higher packet rate and larger packets. This higher packet rate can fill the buffer between interactive packets causing the transmission of the interactive packet to be delayed. To further illustrate this behaviour consider a scenario based on the following assumptions:
|
||
</para>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
A network interface which is capable of transmitting at 5 Mbit/sec or 5,000,000 bits/sec.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Each packet from the bulk flow is 1,500 bytes or 12,000 bits.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
Each packet from the interactive flow is 500 bytes.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
The depth of the queue is 128 descriptors
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
There are 127 bulk data packets and 1 interactive packet queued last.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
<para>
|
||
Given the above assumptions, the time required to drain the 127 bulk packets and create a transmission opportunity for the interactive packet is (127 * 12,000) / 5,000,000 = 0.304 seconds (304 milliseconds for those who think of latency in terms of ping results). This amount of latency is well beyond what is acceptable for interactive applications and this does not even represent the complete round trip time – it is only the time required transmit the packets queued before the interactive one. As described earlier, the size of the packets in the <link linkend="c-driver-queue">driver queue</link> can be larger than 1,500 bytes if TSO, UFO or GSO are enabled. This makes the latency problem correspondingly worse.
|
||
</para>
|
||
<para>
|
||
Choosing the correct size for the <link linkend="c-driver-queue">driver queue</link> is a Goldilocks problem – it can’t be too small or throughput suffers, it can’t be too big or latency suffers.
|
||
</para>
|
||
|
||
|
||
|
||
</section>
|
||
|
||
<section id="o-throughput-latency">
|
||
<title>Relationship between throughput and latency</title>
|
||
<para>
|
||
In all traffic control systems, there is a relationship between
|
||
throughput and latency. The maximum information rate of a network link
|
||
is termed bandwidth, but for the user of a network the actually achieved
|
||
bandwidth has a dedicated term, throughput.
|
||
</para>
|
||
<variablelist>
|
||
<varlistentry id="otl-latency">
|
||
<term>latency</term>
|
||
<listitem>
|
||
<para>
|
||
the delay in time between a sender's transmission and the
|
||
recipient's decoding or receiving of the data; always non-negative
|
||
and non-zero (time doesn't move backwards, then)
|
||
</para>
|
||
<para>
|
||
in principle, latency is unidirectional, however almost the entire
|
||
Internet networking community talks about bidirectional delay
|
||
—the delay in time between a sender's transmission of data
|
||
and some sort of acknowledgement of receipt of that data; cf.
|
||
<command>ping</command>
|
||
</para>
|
||
<para>
|
||
measured in milliseconds (ms); on Ethernet, latencies are
|
||
typically between 0.3 and 1.0 ms and on wide-area networks (i.e.
|
||
to your ISP, across a large campus or to a remote server) between
|
||
5 to 300 ms
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry id="otl-throughput">
|
||
<term>throughput</term>
|
||
<listitem>
|
||
<para>
|
||
a measure of the total amount of data that can be transmitted
|
||
successfully between a sender and receiver
|
||
</para>
|
||
<para>
|
||
measured in bits per second; the measurement most often
|
||
quoted by complaining users after buying a 10Mbit/s package from
|
||
their provider and receiving 8.2Mbit/s.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
</variablelist>
|
||
<note>
|
||
<para>
|
||
Latency and throughput are general computing terms. For example,
|
||
application developers speak of user-perceived latency when trying to
|
||
build responsive tools. Database and filesystem people speak about
|
||
disk throughput. And, above the network layer, latency of a website
|
||
name lookup in DNS is a major contributor to the perceived performance
|
||
of a website. The remainder of this document concerns latency in the
|
||
network domain, specifically the IP network layer.
|
||
</para>
|
||
</note>
|
||
<para>
|
||
During the millenial fin de siècle, many developed world network service
|
||
providers had learned that users were interested in the highest possible
|
||
download throughput (the above mentioned 10Mbit/s bandwidth figure).
|
||
</para>
|
||
<para>
|
||
In order to maximize this download throughput, gear vendors and
|
||
providers commonly tuned their equipment to hold a large number of data
|
||
packets. When the network was ready to accept another packet, the
|
||
network gear was certain to have one in its queue and could simply send
|
||
another packet. In practice, this meant that the user, who was
|
||
measuring download throughput, would receive a high number and was
|
||
likely to be happy. This was desirable for the provider because the
|
||
delivered throughput could more likely meet the advertised number.
|
||
</para>
|
||
<para>
|
||
This technique effectively maximized throughput, at the cost of latency.
|
||
Imagine that a high priority packet is waiting at the end of the big
|
||
queue of packets mentioned above. Perhaps, the theoretical latency of
|
||
the packet on this network might be 100ms, but it needs to wait its turn
|
||
in the queue to be transmitted.
|
||
</para>
|
||
<para>
|
||
While the decision to maximize
|
||
throughput has been wildly successful, the effect on latency is
|
||
significant.
|
||
</para>
|
||
<anchor id="o-bufferbloat"/>
|
||
<para>
|
||
Despite a general warning from Stuart Cheshire in the mid-1990s called
|
||
<ulink url="http://www.stuartcheshire.org/rants/Latency.html">It's the Latency, Stupid</ulink>,
|
||
it took the novel term, bufferbloat, widely publicized about 15
|
||
years later by Jim Getty in an ACM Queue article
|
||
<ulink url="http://queue.acm.org/detail.cfm?id=2071893">Bufferbloat: Dark Buffers in the Internet</ulink>
|
||
and a
|
||
<ulink url="https://gettys.wordpress.com/bufferbloat-faq/">Bufferbloat FAQ</ulink>
|
||
in his blog, to bring some focus onto the choice for maximizing
|
||
throughput that both gear vendors and providers preferred.
|
||
</para>
|
||
<para>
|
||
The relationship (tension) between latency and throughput in
|
||
packet-switched networks have been well-known in the academic,
|
||
networking and Linux development community. Linux traffic control core
|
||
data structures date back to the 1990s and have been continuously
|
||
developed and extended with new schedulers and features.
|
||
</para>
|
||
</section>
|
||
|
||
|
||
</section>
|
||
|
||
<!-- end of file -->
|