<constant>sch_htb</constant> (only for the default queue)
</para>
</listitem>
<listitem>
<para>
<constant>sch_plug</constant>
</para>
</listitem>
<listitem>
<para>
<constant>sch_sfb</constant>
</para>
</listitem>
<listitem>
<para>
<constant>sch_teql</constant>
</para>
</listitem>
</itemizedlist>
<para>
Looking back at Figure 1, the txqueuelen parameter controls the size of the queues in the Queueing Discipline box for the QDiscs listed above. For most of these queueing disciplines, the “limit” argument on the tc command line overrides the txqueuelen default. In summary, if you do not use one of the above queueing disciplines or if you override the queue length, then the txqueuelen value is meaningless.
</para>
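<para>
For example, the following tc invocation (assuming an interface named eth0) attaches a pfifo qdisc whose explicit limit of 100 packets overrides the txqueuelen default:
</para>
<programlisting>
tc qdisc add dev eth0 root pfifo limit 100
</programlisting>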
<para>
The length of the transmission queue is configured with the ip or ifconfig commands.
</para>
<programlisting>
ip link set txqueuelen 500 dev eth0
</programlisting>
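<para>
On systems where the legacy ifconfig tool is still used, the equivalent invocation is:
</para>
<programlisting>
ifconfig eth0 txqueuelen 500
</programlisting>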
<para>
Notice that the ip command uses “txqueuelen” but when displaying the interface details it uses “qlen”.
</para>
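<para>
For example, displaying a hypothetical eth0 produces output like the following; note the qlen field:
</para>
<programlisting>
$ ip link show eth0
2: eth0: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 500
</programlisting>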
</section>
<section id="c-driver-queue">
<title>Driver Queue (aka ring buffer)</title>
<para>
Between the IP stack and the network interface controller (NIC) lies the driver queue. This queue is typically implemented as a first-in, first-out (FIFO) ring buffer – just think of it as a fixed-size buffer. The driver queue does not contain packet data. Instead it consists of descriptors which point to other data structures called socket kernel buffers (SKBs), which hold the packet data and are used throughout the kernel.
</para>
<mediaobject>
<textobject>
<phrase>Figure 4: Partially full driver queue with descriptors pointing to SKBs</phrase>
</textobject>
<caption>
<para><command>Figure 4: </command><emphasis>Partially full driver queue with descriptors pointing to SKBs</emphasis></para>
</caption>
</mediaobject>
<para>
The input source for the driver queue is the IP stack, which queues complete IP packets. The packets may be generated locally or, when the device is functioning as an IP router, received on one NIC to be routed out another. Packets added to the driver queue by the IP stack are dequeued by the hardware driver and sent across a data bus to the NIC hardware for transmission.
</para>
<para>
The reason the driver queue exists is to ensure that whenever the system has data to transmit, the data is available to the NIC for immediate transmission. That is, the driver queue gives the IP stack a location to queue data asynchronously from the operation of the hardware. One alternative design would be for the NIC to ask the IP stack for data whenever the physical medium is ready to transmit. Since responding to this request cannot be instantaneous, this design wastes valuable transmission opportunities and results in lower throughput. The opposite approach, where the IP stack waits after a packet is created until the hardware is ready to transmit, is also not ideal because the IP stack cannot move on to other work.
</para>
<para>
For details on how to set the driver queue size, see <link linkend="s-ethtool">chapter 5.5</link>.
</para>
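<para>
As a quick preview (a sketch, assuming an eth0 whose driver supports ring size changes), ethtool's -g and -G options query and set the driver queue size:
</para>
<programlisting>
# show the current and maximum RX/TX ring sizes
ethtool -g eth0

# set the transmit ring to 256 descriptors
ethtool -G eth0 tx 256
</programlisting>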
</section>
<section id="c-bql">
<title>Byte Queue Limits (BQL)</title>
<para>
Byte Queue Limits (BQL) is a feature introduced in Linux 3.3.0 which attempts to solve the problem of driver queue sizing automatically. This is accomplished by adding a layer which enables and disables queueing to the driver queue based on calculating the minimum buffer size required to avoid <link linkend="o-starv-lat">starvation</link> under the current system conditions. Recall from earlier that the smaller the amount of queued data, the lower the maximum <link linkend="o-starv-lat">latency</link> experienced by queued packets.
</para>
<para>
It is key to understand that the actual size of the driver queue is not changed by BQL. Rather, BQL calculates a limit of how much data (in bytes) can be queued at the current time. Any bytes over this limit must be held or dropped by the layers above the driver queue.
</para>
<para>
The BQL mechanism operates when two events occur: when packets are enqueued to the driver queue and when a transmission to the wire has completed. A simplified version of the BQL algorithm is outlined below. LIMIT refers to the value calculated by BQL.
</para>
<programlisting>
****
** After adding packets to the queue
****

if the number of queued bytes is over the current LIMIT value then
        disable the queueing of more data to the driver queue
</programlisting>
<para>
Notice that the amount of queued data can exceed LIMIT because data is queued before the LIMIT check occurs. Since a large number of bytes can be queued in a single operation when TSO, UFO or GSO (see chapter 2.9.1 for details) are enabled, these throughput optimizations have the side effect of allowing a higher than desirable amount of data to be queued. If you care about <link linkend="o-starv-lat">latency</link> you probably want to disable these features.
</para>
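<para>
The second half of the algorithm runs when the hardware signals that a transmission has completed, referred to as the end of an interval. The following is a sketch of that logic, reconstructed from the behaviour described below; the authoritative implementation is the kernel's dynamic queue limits (DQL) library:
</para>
<programlisting>
****
** When the hardware has completed sending a batch of packets
** (at the end of an interval)
****

if the hardware was starved during the interval then
        increase LIMIT

else if the hardware was busy for the entire interval and bytes remain queued then
        decrease LIMIT

if the number of queued bytes is now under the current LIMIT value then
        enable the queueing of more data to the driver queue
</programlisting>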
<para>
As you can see, BQL is based on testing whether the device was starved. If it was starved, then LIMIT is increased, allowing more data to be queued, which reduces the chance of <link linkend="o-starv-lat">starvation</link>. If the device was busy for the entire interval and there are still bytes to be transferred in the queue, then the queue is bigger than necessary for the system under the current conditions and LIMIT is decreased to constrain the <link linkend="o-starv-lat">latency</link>.
</para>
<para>
A real world example may help provide a sense of how much BQL affects the amount of data which can be queued. On one of my servers the driver queue size defaults to 256 descriptors. Since the Ethernet MTU is 1,500 bytes, this means up to 256 * 1,500 = 384,000 bytes can be queued to the driver queue (TSO, GSO and similar features are disabled, or this would be much higher). However, the limit value calculated by BQL is 3,012 bytes. As you can see, BQL greatly constrains the amount of data which can be queued.
</para>
<para>
An interesting aspect of BQL can be inferred from the first word in the name – byte. Unlike the size of the driver queue and most other packet queues, BQL operates on bytes. This is because the number of bytes has a more direct relationship with the time required to transmit to the physical medium than the number of packets or descriptors, since the latter are variably sized.
</para>
<para>
BQL reduces network <link linkend="o-starv-lat">latency</link> by limiting the amount of queued data to the minimum required to avoid <link linkend="o-starv-lat">starvation</link>. It also has the very important side effect of moving the point where most packets are queued from the driver queue, which is a simple FIFO, to the queueing discipline (QDisc) layer, which is capable of implementing much more complicated queueing strategies. The next section introduces the Linux QDisc layer.
</para>
<para>
The BQL algorithm is self-tuning so you probably do not need to adjust it. However, if you are concerned about optimal latencies at low bitrates then you may want to override the upper limit on the calculated LIMIT value. BQL state and configuration can be found in a /sys directory based on the location and name of the NIC. On my server the directory for eth0 is:
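</para>
<para>
The exact path is system specific because it encodes the bus address of the NIC. As an illustrative sketch (assuming a single-queue eth0), the same per-queue BQL state can also be reached through the /sys/class/net symlinks:
</para>
<programlisting>
$ ls /sys/class/net/eth0/queues/tx-0/byte_queue_limits/
hold_time  inflight  limit  limit_max  limit_min
</programlisting>
<para>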
<emphasis>limit_max:</emphasis> A configurable maximum value for LIMIT. Set this value lower to optimize for <link linkend="o-starv-lat">latency</link>.