mirror of https://github.com/tLDP/LDP
595 lines
27 KiB
XML
595 lines
27 KiB
XML
<!-- start of file -->
|
||
|
||
<!-- This .xml file is part of the Traffic-Control-HOWTO document -->
|
||
|
||
<!-- $Id$ -->
|
||
|
||
<!--
|
||
|
||
The article was authored by Martin A. Brown <martin@linux-ip.net>
|
||
for the linux community, and has been released under the GNU Free
|
||
Documentation License (GFDL) through The Linux Documentation
|
||
Project (TLDP).
|
||
|
||
This HOWTO is likely available at the following address:
|
||
|
||
http://tldp.org/HOWTO/Traffic-Control-HOWTO/
|
||
|
||
-->
|
||
|
||
<!-- conventions used in this documentation....
|
||
|
||
- each section is a separate file
|
||
|
||
-->
|
||
|
||
<section id="components">
|
||
|
||
<title>Components of Linux Traffic Control</title>
|
||
<para>
|
||
</para>
|
||
<para>
|
||
</para>
|
||
<para>
|
||
</para>
|
||
<table id="tb-c-components-correlation">
|
||
<title>Correlation between traffic control elements and Linux
|
||
components</title>
|
||
<tgroup cols="2" align="left" colsep="1" rowsep="1">
|
||
<colspec colwidth='1*' colname="elem"/>
|
||
<colspec colwidth='3*' colname="comp"/>
|
||
<thead>
|
||
<row>
|
||
<entry>traditional element</entry>
|
||
<entry>Linux component</entry>
|
||
</row>
|
||
</thead>
|
||
<tbody>
|
||
<row>
|
||
<entry colname="elem">enqueuing;</entry>
|
||
<entry colname="comp">FIXME: receiving packets from userspace and
|
||
network.</entry>
|
||
</row>
|
||
<row>
|
||
<entry colname="elem">&elements-shaping;</entry>
|
||
<entry colname="comp">The &linux-class; offers shaping capabilities.</entry>
|
||
</row>
|
||
<row>
|
||
<entry colname="elem">&elements-scheduling;</entry>
|
||
<entry colname="comp">A &linux-qdisc; is a scheduler. Schedulers
|
||
can be simple such as the &sch_fifo; or
|
||
complex, containing classes and other
|
||
qdiscs, such as &sch_htb;.</entry>
|
||
</row>
|
||
<row>
|
||
<entry colname="elem">&elements-classifying;</entry>
|
||
<entry colname="comp">The &linux-filter; object performs the
|
||
classification through the agency of a
|
||
&linux-classifier; object. Strictly speaking,
|
||
Linux classifiers cannot exist outside
|
||
of a filter.</entry>
|
||
</row>
|
||
<row>
|
||
<entry colname="elem">&elements-policing;</entry>
|
||
<entry colname="comp">A &linux-policer; exists in the Linux traffic
|
||
control implementation only as part of a
|
||
&linux-filter;.</entry>
|
||
</row>
|
||
<row>
|
||
<entry colname="elem">&elements-dropping;</entry>
|
||
<entry colname="comp">To &linux-drop; traffic requires a
|
||
&linux-filter; with a &linux-policer; which
|
||
uses <quote>drop</quote> as an action.</entry>
|
||
</row>
|
||
<row>
|
||
<entry colname="elem">&elements-marking;</entry>
|
||
<entry colname="comp">The &linux-dsmark; &linux-qdisc; is used for
|
||
marking.</entry>
|
||
</row>
|
||
<row>
|
||
<entry colname="elem">enqueuing;</entry>
|
||
<entry colname="comp">Between the scheduler's &linux-qdisc; and the <link linkend="o-nic">network interface controller (NIC)</link> lies the <link linkend="c-driver-queue">driver queue</link>. The <link linkend="c-driver-queue">driver queue</link> gives the higher layers (IP stack and traffic control subsystem) a location to queue data asynchronously for the operation of the hardware. The size of that queue is automatically set by <xref linkend="c-bql"/>.</entry>
|
||
</row>
|
||
|
||
</tbody>
|
||
</tgroup>
|
||
</table>
|
||
|
||
<section id="c-qdisc">
|
||
<title><constant>qdisc</constant></title>
|
||
<para>
|
||
Simply put, a qdisc is a scheduler
|
||
(<xref linkend="e-scheduling"/>). Every output interface needs a
|
||
scheduler of some kind, and the default scheduler is a &sch_fifo;.
|
||
Other qdiscs available under Linux will rearrange the packets entering
|
||
the scheduler's queue in accordance with that scheduler's rules.
|
||
</para>
|
||
<para>
|
||
The qdisc is the major building block on which all of Linux traffic
|
||
control is built, and is also called a queuing discipline.
|
||
</para>
|
||
<para>
|
||
The &classful-qdiscs; can contain &linux-class;es, and provide a handle
|
||
to which to attach &linux-filter;s. There is no prohibition on using a
|
||
classful qdisc without child classes, although this will usually consume
|
||
cycles and other system resources for no benefit.
|
||
</para>
|
||
<para>
|
||
The &classless-qdiscs; can contain no classes, nor is it possible to
|
||
attach filter to a classless qdisc. Because a classless qdisc contains
|
||
no children of any kind, there is no utility to &elements-classifying;.
|
||
This means that no filter can be attached to a classless qdisc.
|
||
</para>
|
||
<para>
|
||
A source of terminology confusion is the usage of the terms
|
||
&root-qdisc; and &ingress-qdisc;. These are not
|
||
really queuing disciplines, but rather locations onto which traffic
|
||
control structures can be attached for egress (outbound traffic) and
|
||
ingress (inbound traffic).
|
||
</para>
|
||
<para>
|
||
Each interface contains both. The primary and more common is the
|
||
egress qdisc, known as the &root-qdisc;. It can contain any
|
||
of the queuing disciplines (&linux-qdisc;s) with potential
|
||
&linux-class;es and class structures. The overwhelming majority of
|
||
documentation applies to the &root-qdisc; and its children. Traffic
|
||
transmitted on an interface traverses the egress or &root-qdisc;.
|
||
</para>
|
||
<para>
|
||
For traffic accepted on an interface, the &ingress-qdisc; is traversed.
|
||
With its limited utility, it allows no child &linux-class; to be
|
||
created, and only exists as an object onto which a &linux-filter; can be
|
||
attached. For practical purposes, the &ingress-qdisc; is merely a
|
||
convenient object onto which to attach a &linux-policer; to limit the
|
||
amount of traffic accepted on a network interface.
|
||
</para>
|
||
<para>
|
||
In short, you can do much more with an egress qdisc because it contains
|
||
a real qdisc and the full power of the traffic control system. An
|
||
&ingress-qdisc; can only support a policer. The remainder of the
|
||
documentation will concern itself with traffic control structures
|
||
attached to the &root-qdisc; unless otherwise specified.
|
||
</para>
|
||
</section>
|
||
|
||
<section id="c-class">
|
||
<title><constant>class</constant></title>
|
||
<para>
|
||
Classes only exist inside a classful &linux-qdisc; (⪚, &link-sch_htb;
|
||
and &link-sch_cbq;). Classes are immensely flexible and can always
|
||
contain either multiple children classes or a single child qdisc
|
||
<footnote>
|
||
<para>
|
||
A classful qdisc can only have children classes of its type. For
|
||
example, an HTB qdisc can only have HTB classes as children. A CBQ
|
||
qdisc cannot have HTB classes as children.
|
||
</para>
|
||
</footnote>.
|
||
There is no prohibition against a class containing a classful qdisc
|
||
itself, which facilitates tremendously complex traffic control
|
||
scenarios.
|
||
</para>
|
||
<para>
|
||
Any class can also have an arbitrary number of &linux-filter;s attached
|
||
to it, which allows the selection of a child class or the use of a
|
||
filter to reclassify or drop traffic entering a particular class.
|
||
</para>
|
||
<para>
|
||
A leaf class is a terminal class in a qdisc. It contains a qdisc
|
||
(default &link-sch_fifo;) and will never contain a child class. Any
|
||
class which contains a child class is an inner class (or root class) and
|
||
not a leaf class.
|
||
</para>
|
||
</section>
|
||
|
||
<section id="c-filter">
|
||
<title><constant>filter</constant></title>
|
||
<para>
|
||
The filter is the most complex component in the Linux
|
||
traffic control system. The filter provides a convenient mechanism for
|
||
gluing together several of the key elements of traffic control. The
|
||
simplest and most obvious role of the filter is to classify
|
||
(see <xref linkend="e-classifying"/>) packets. Linux filters allow the
|
||
user to classify packets into an output queue with either several
|
||
different filters or a single filter.
|
||
</para>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
A filter must contain a &linux-classifier; phrase.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
A filter may contain a &linux-policer; phrase.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
<para>
|
||
Filters can be attached either to classful &linux-qdisc;s or to
|
||
&linux-class;es, however the enqueued packet always enters the root
|
||
qdisc first. After the filter attached to the root qdisc has been
|
||
traversed, the packet may be directed to any subclasses (which can have
|
||
their own filters) where the packet may undergo further classification.
|
||
</para>
|
||
<!-- FIXME; discussion not complete here -->
|
||
<para>
|
||
</para>
|
||
</section>
|
||
|
||
<section id="c-classifier">
|
||
<title>classifier</title>
|
||
<para>
|
||
Filter objects, which can be manipulated using &link-tc;, can use several
|
||
different classifying mechanisms, the most common of which is the
|
||
&cls_u32; classifier. The &cls_u32; classifier allows the user to
|
||
select packets based on attributes of the packet.
|
||
</para>
|
||
<para>
|
||
The classifiers are tools which can be used as part of a &linux-filter;
|
||
to identify characteristics of a packet or a packet's metadata. The
|
||
Linux classfier object is a direct analogue to the basic operation and
|
||
elemental mechanism of traffic control &elements-classifying;.
|
||
</para>
|
||
</section>
|
||
|
||
<section id="c-police">
|
||
<title>policer</title>
|
||
<para>
|
||
This elemental mechanism is only used in Linux traffic control as part
|
||
of a &linux-filter;. A policer calls one action above and another
|
||
action below the specified rate. Clever use of policers can simulate
|
||
a three-color meter. See also
|
||
<xref linkend="diagram"/>.
|
||
</para>
|
||
<para>
|
||
Although both &elements-policing; and &elements-shaping; are basic
|
||
elements of traffic control for limiting bandwidth usage a policer will
|
||
never delay traffic. It can only perform an action based on specified
|
||
criteria. See also
|
||
<xref linkend="ex-s-iproute2-tc-filter"/>.
|
||
</para>
|
||
<para>
|
||
</para>
|
||
<para>
|
||
</para>
|
||
</section>
|
||
|
||
<section id="c-drop">
|
||
<title><constant>drop</constant></title>
|
||
<para>
|
||
This basic traffic control mechanism is only used in Linux traffic
|
||
control as part of a &linux-policer;. Any policer attached to
|
||
any &linux-filter; could have a &linux-drop; action.
|
||
</para>
|
||
<note>
|
||
<para>
|
||
The only place in the Linux traffic control system where a packet can be
|
||
explicitly dropped is a policer. A policer can limit packets enqueued
|
||
at a specific rate, or it can be configured to drop all traffic matching
|
||
a particular pattern
|
||
<footnote>
|
||
<para>
|
||
In this case, you'll have a &linux-filter; which uses a
|
||
&linux-classifier; to select the packets you wish to drop. Then
|
||
you'll use a &linux-policer; with a with a drop action like this
|
||
<command>police rate 1bps burst 1 action drop/drop</command>.
|
||
</para>
|
||
</footnote>.
|
||
</para>
|
||
</note>
|
||
<para>
|
||
There are, however, places within the traffic control system where a
|
||
packet may be dropped as a side effect. For example, a packet will be
|
||
dropped if the scheduler employed uses this method to control flows as
|
||
the &link-sch_gred; does.
|
||
</para>
|
||
<para>
|
||
Also, a shaper or scheduler which runs out of its allocated buffer space
|
||
may have to drop a packet during a particularly bursty or overloaded
|
||
period.
|
||
</para>
|
||
<para>
|
||
</para>
|
||
</section>
|
||
|
||
<section id="c-handle">
|
||
<title><constant>handle</constant></title>
|
||
<para>
|
||
Every &linux-class; and classful &linux-qdisc; (see also
|
||
<xref linkend="classful-qdiscs"/>) requires a unique identifier within
|
||
the traffic control structure. This unique identifier is known as a
|
||
handle and has two constituent members, a major number and a minor
|
||
number. These numbers can be assigned arbitrarily by the user in
|
||
accordance with the following rules
|
||
<footnote>
|
||
<para>
|
||
I do not know the range nor base of these numbers. I believe they
|
||
are u32 hexadecimal, but need to confirm this.
|
||
</para>
|
||
</footnote>.
|
||
</para>
|
||
<para>
|
||
</para>
|
||
<variablelist>
|
||
<title>The numbering of handles for classes and qdiscs</title>
|
||
<varlistentry>
|
||
<term><parameter>major</parameter></term>
|
||
<listitem>
|
||
<para>
|
||
This parameter is completely free of meaning to the kernel. The
|
||
user may use an arbitrary numbering scheme, however all objects in
|
||
the traffic control structure with the same parent must share a
|
||
<parameter>major</parameter> handle number. Conventional
|
||
numbering schemes start at 1 for objects attached directly to the
|
||
&root-qdisc;.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><parameter>minor</parameter></term>
|
||
<listitem>
|
||
<para>
|
||
This parameter unambiguously identifies the object as a qdisc if
|
||
<parameter>minor</parameter> is 0. Any other value identifies the
|
||
object as a class. All classes sharing a parent must have unique
|
||
<parameter>minor</parameter> numbers.
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
</variablelist>
|
||
<para>
|
||
The special handle ffff:0 is reserved for the &ingress-qdisc;.
|
||
</para>
|
||
<para>
|
||
The handle is used as the target in <parameter>classid</parameter> and
|
||
<parameter>flowid</parameter> phrases of &tc; &linux-filter; statements.
|
||
These handles are external identifiers for the objects, usable by
|
||
userland applications. The kernel maintains internal identifiers for
|
||
each object.
|
||
</para>
|
||
</section>
|
||
|
||
<section id="c-txqueuelen">
|
||
<title><constant>txqueuelen</constant></title>
|
||
<para>
|
||
The current size of the transmission queue can be obtained from the
|
||
<command>ip</command> and <command>ifconfig</command> commands.
|
||
Confusingly, these commands name the transmission queue length
|
||
differently (emphasized text below):
|
||
</para>
|
||
<programlisting>
|
||
$ifconfig eth0
|
||
|
||
eth0 Link encap:Ethernet HWaddr 00:18:F3:51:44:10
|
||
inet addr:69.41.199.58 Bcast:69.41.199.63 Mask:255.255.255.248
|
||
inet6 addr: fe80::218:f3ff:fe51:4410/64 Scope:Link
|
||
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
|
||
RX packets:435033 errors:0 dropped:0 overruns:0 frame:0
|
||
TX packets:429919 errors:0 dropped:0 overruns:0 carrier:0
|
||
collisions:0 <emphasis>txqueuelen:1000</emphasis>
|
||
RX bytes:65651219 (62.6 MiB) TX bytes:132143593 (126.0 MiB)
|
||
Interrupt:23
|
||
</programlisting>
|
||
|
||
<programlisting>
|
||
$ip link
|
||
|
||
1: lo: mtu 16436 qdisc noqueue state UNKNOWN
|
||
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
|
||
2: eth0: mtu 1500 qdisc pfifo_fast state UP <emphasis>qlen 1000</emphasis>
|
||
link/ether 00:18:f3:51:44:10 brd ff:ff:ff:ff:ff:ff
|
||
</programlisting>
|
||
|
||
<para>
|
||
The length of the transmission queue in Linux defaults to 1000
|
||
packets which could represent a large amount of buffering especially
|
||
at low bandwidths. (To understand why, see the discussion on latency
|
||
and throughput, specifically
|
||
<link linkend="o-bufferbloat">bufferbloat</link>).</para>
|
||
<para>
|
||
More interestingly, <constant>txqueuelen</constant> is only used as a default queue
|
||
length for these queueing disciplines.
|
||
</para>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
<constant>pfifo_fast</constant> (Linux default queueing discipline)
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<constant>sch_fifo</constant>
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<constant>sch_gred</constant>
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<constant>sch_htb</constant> (only for the default queue)
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<constant>sch_plug</constant>
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<constant>sch_sfb</constant>
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<constant>sch_teql</constant>
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
<para>
|
||
Looking back at Figure 1, the txqueuelen parameter controls the size of the queues in the Queueing Discipline box for the QDiscs listed above. For most of these queueing disciplines, the “limit” argument on the tc command line overrides the txqueuelen default. In summary, if you do not use one of the above queueing disciplines or if you override the queue length then the txqueuelen value is meaningless.
|
||
</para>
|
||
<para>
|
||
The length of the transmission queue is configured with the ip or ifconfig commands.
|
||
</para>
|
||
<programlisting>
|
||
ip link set txqueuelen 500 dev eth0
|
||
</programlisting>
|
||
<para>
|
||
Notice that the ip command uses “txqueuelen” but when displaying the interface details it uses “qlen”.
|
||
</para>
|
||
</section>
|
||
|
||
<section id="c-driver-queue">
|
||
<title>Driver Queue (aka ring buffer)</title>
|
||
<para>
|
||
Between the IP stack and the network interface controller (NIC) lies the driver queue. This queue is typically implemented as a first-in, first-out (FIFO) ring buffer – just think of it as a fixed sized buffer. The driver queue does not contain packet data. Instead it consists of descriptors which point to other data structures called socket kernel buffers (SKBs) which hold the packet data and are used throughout the kernel.
|
||
</para>
|
||
<mediaobject id="img-Figure4">
|
||
<imageobject>
|
||
<imagedata fileref="images/Figure4.svg" format="SVG"/>
|
||
</imageobject>
|
||
<imageobject>
|
||
<imagedata fileref="images/Figure4.png" format="PNG"/>
|
||
</imageobject>
|
||
<imageobject>
|
||
<imagedata fileref="images/Figure4.jpg" format="JPG"/>
|
||
</imageobject>
|
||
<textobject>
|
||
<phrase>Figure 4: Partially full driver queue with descriptors pointing to SKBs</phrase>
|
||
</textobject>
|
||
<caption>
|
||
<para><command>Figure 4: </command><emphasis>Partially full driver queue with descriptors pointing to SKBs</emphasis></para>
|
||
</caption>
|
||
</mediaobject>
|
||
<para>
|
||
The input source for the driver queue is the IP stack which queues complete IP packets. The packets may be generated locally or received on one NIC to be routed out another when the device is functioning as an IP router. Packets added to the driver queue by the IP stack are dequeued by the hardware driver and sent across a data bus to the NIC hardware for transmission.
|
||
</para>
|
||
<para>
|
||
The reason the driver queue exists is to ensure that whenever the system has data to transmit, the data is available to the NIC for immediate transmission. That is, the driver queue gives the IP stack a location to queue data asynchronously from the operation of the hardware. One alternative design would be for the NIC to ask the IP stack for data whenever the physical medium is ready to transmit. Since responding to this request cannot be instantaneous this design wastes valuable transmission opportunities resulting in lower throughput. The opposite approach would be for the IP stack to wait after a packet is created until the hardware is ready to transmit. This is also not ideal because the IP stack cannot move on to other work.
|
||
</para>
|
||
<para>
|
||
For detail how to set driver queue see <link linkend="s-ethtool">chapter 5.5</link>.
|
||
</para>
|
||
</section>
|
||
|
||
<section id="c-bql" xreflabel="Byte Queue Limits (BQL)">
|
||
<title>Byte Queue Limits (<acronym>BQL</acronym>)</title>
|
||
<para>
|
||
Byte Queue Limits (BQL) is a new feature in recent Linux kernels (> 3.3.0) which attempts to solve the problem of driver queue sizing automatically. This is accomplished by adding a layer which enables and disables queuing to the driver queue based on calculating the minimum buffer size required to avoid <link linkend="o-starv-lat">starvation</link> under the current system conditions. Recall from earlier that the smaller the amount of queued data, the lower the maximum <link linkend="o-starv-lat">latency</link> experienced by queued packets.
|
||
</para>
|
||
<para>
|
||
It is key to understand that the actual size of the driver queue is not changed by BQL. Rather BQL calculates a limit of how much data (in bytes) can be queued at the current time. Any bytes over this limit must be held or dropped by the layers above the driver queue..
|
||
</para>
|
||
<para>
|
||
The BQL mechanism operates when two events occur: when packets are enqueued to the driver queue and when a transmission to the wire has completed. A simplified version of the BQL algorithm is outlined below. LIMIT refers to the value calculated by BQL.
|
||
</para>
|
||
<programlisting>
|
||
****
|
||
** After adding packets to the queue
|
||
****
|
||
|
||
if the number of queued bytes is over the current LIMIT value then
|
||
disable the queueing of more data to the driver queue
|
||
</programlisting>
|
||
<para>
|
||
Notice that the amount of queued data can exceed LIMIT because data is queued before the LIMIT check occurs. Since a large number of bytes can be queued in a single operation when TSO, UFO or GSO (see chapter 2.9.1 aggiungi link for details) are enabled these throughput optimizations have the side effect of allowing a higher than desirable amount of data to be queued. If you care about <link linkend="o-starv-lat">latency</link> you probably want to disable these features.
|
||
</para>
|
||
<para>
|
||
The second stage of BQL is executed after the hardware has completed a transmission (simplified pseudo-code):
|
||
</para>
|
||
<programlisting>
|
||
****
|
||
** When the hardware has completed sending a batch of packets
|
||
** (Referred to as the end of an interval)
|
||
****
|
||
|
||
if the hardware was starved in the interval
|
||
increase LIMIT
|
||
|
||
else if the hardware was busy during the entire interval (not starved) and there are bytes to transmit
|
||
decrease LIMIT by the number of bytes not transmitted in the interval
|
||
|
||
if the number of queued bytes is less than LIMIT
|
||
enable the queueing of more data to the buffer
|
||
</programlisting>
|
||
<para>
|
||
As you can see, BQL is based on testing whether the device was starved. If it was starved, then LIMIT is increased allowing more data to be queued which reduces the chance of <link linkend="o-starv-lat">starvation</link>. If the device was busy for the entire interval and there are still bytes to be transferred in the queue then the queue is bigger than is necessary for the system under the current conditions and LIMIT is decreased to constrain the <link linkend="o-starv-lat">latency</link>.
|
||
</para>
|
||
<para>
|
||
A real world example may help provide a sense of how much BQL affects the amount of data which can be queued. On one of my servers the driver queue size defaults to 256 descriptors. Since the Ethernet MTU is 1,500 bytes this means up to 256 * 1,500 = 384,000 bytes can be queued to the driver queue (TSO, GSO etc are disabled or this would be much higher). However, the limit value calculated by BQL is 3,012 bytes. As you can see, BQL greatly constrains the amount of data which can be queued.
|
||
</para>
|
||
<para>
|
||
An interesting aspect of BQL can be inferred from the first word in the name – byte. Unlike the size of the driver queue and most other packet queues, BQL operates on bytes. This is because the number of bytes has a more direct relationship with the time required to transmit to the physical medium than the number of packets or descriptors since the later are variably sized.
|
||
</para>
|
||
<para>
|
||
BQL reduces network <link linkend="o-starv-lat">latency</link> by limiting the amount of queued data to the minimum required to avoid <link linkend="o-starv-lat">starvation</link>. It also has the very important side effect of moving the point where most packets are queued from the driver queue which is a simple FIFO to the queueing discipline (QDisc) layer which is capable of implementing much more complicated queueing strategies. The next section introduces the Linux QDisc layer.
|
||
</para>
|
||
|
||
<section>
|
||
<title>Set BQL</title>
|
||
<para>
|
||
The BQL algorithm is self tuning so you probably don’t need to mess with this too much. However, if you are concerned about optimal latencies at low bitrates then you may want override the upper limit on the calculated LIMIT value. BQL state and configuration can be found in a /sys directory based on the location and name of the NIC. On my server the directory for eth0 is:
|
||
</para>
|
||
<programlisting>
|
||
/sys/devices/pci0000:00/0000:00:14.0/net/eth0/queues/tx-0/byte_queue_limits
|
||
</programlisting>
|
||
<para>
|
||
The files in this directory are:
|
||
</para>
|
||
<itemizedlist>
|
||
<listitem>
|
||
<para>
|
||
<emphasis>hold_time:</emphasis> Time between modifying LIMIT in milliseconds.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<emphasis>inflight:</emphasis> The number of queued but not yet transmitted bytes.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<emphasis>limit:</emphasis> The LIMIT value calculated by BQL. 0 if BQL is not supported in the NIC driver.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<emphasis>limit_max:</emphasis> A configurable maximum value for LIMIT. Set this value lower to optimize for <link linkend="o-starv-lat">latency</link>.
|
||
</para>
|
||
</listitem>
|
||
<listitem>
|
||
<para>
|
||
<emphasis>limit_min:</emphasis> A configurable minimum value for LIMIT. Set this value higher to optimize for throughput.
|
||
</para>
|
||
</listitem>
|
||
</itemizedlist>
|
||
<para>
|
||
To place a hard upper limit on the number of bytes which can be queued write the new value to the limit_max fie.
|
||
</para>
|
||
<programlisting>
|
||
echo "3000" > limit_max
|
||
</programlisting>
|
||
|
||
</section>
|
||
|
||
</section>
|
||
|
||
|
||
|
||
<!--
|
||
<section id="c-dsmark">
|
||
<title><constant>dsmark</constant></title>
|
||
<para>
|
||
</para>
|
||
<para>
|
||
</para>
|
||
</section>
|
||
-->
|
||
|
||
</section>
|
||
|
||
<!-- end of file -->
|