<!-- start of file -->
<!-- This .xml file is part of the Traffic-Control-HOWTO document -->
<!--
The article was authored by Martin A. Brown <martin@linux-ip.net>
for the linux community, and has been released under the GNU Free
Documentation License (GFDL) through The Linux Documentation
Project (TLDP).
This HOWTO is likely available at the following address:
http://tldp.org/HOWTO/Traffic-Control-HOWTO/
-->
<!-- conventions used in this documentation....
- each section is a separate file
-->
<section id="software">
<title>Software and Tools</title>
<para>
</para>
<section id="s-kernel">
<title>Kernel requirements</title>
<para>
Many distributions provide kernels with modular or monolithic support
for traffic control (Quality of Service). Custom kernels may not
already provide support (modular or not) for the required features. If
not, this is a very brief listing of the required kernel options.
</para>
<para>
The user who has little or no experience compiling a kernel is
recommended to &url-kernel-howto;. Experienced kernel compilers should
be able to determine which of the below options apply to the desired
configuration, after reading a bit more about traffic control and
planning.
</para>
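<para>
Before recompiling, it may be worth verifying whether the running kernel
already provides these options. A quick check, assuming your distribution
installs the kernel configuration under /boot (some expose it as
/proc/config.gz instead):
</para>
<programlisting>
<prompt>[root@leander]# </prompt><userinput>grep -E 'CONFIG_NET_SCHED|CONFIG_NET_SCH_|CONFIG_NET_CLS' /boot/config-$(uname -r)</userinput>
</programlisting>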
<example id="ex-s-kernel-options">
<title>Kernel compilation options
<footnote>
<para>
The options listed in this example are taken from a 2.4.20 kernel
source tree. The exact options may differ slightly from kernel
release to kernel release depending on patches and new schedulers
and classifiers.
</para>
</footnote>
</title>
<programlisting>
#
# QoS and/or fair queueing
#
CONFIG_NET_SCHED=y
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_CSZ=m
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
CONFIG_NET_SCH_INGRESS=m
CONFIG_NET_QOS=y
CONFIG_NET_ESTIMATOR=y
CONFIG_NET_CLS=y
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_ROUTE4=m
CONFIG_NET_CLS_ROUTE=y
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
CONFIG_NET_CLS_RSVP=m
CONFIG_NET_CLS_RSVP6=m
CONFIG_NET_CLS_POLICE=y
</programlisting>
</example>
<para>
A kernel compiled with the above set of options will provide modular
support for almost everything discussed in this documentation. The user
may need to <command>modprobe
<replaceable>module</replaceable></command> before using a given
feature. Again, the confused user is recommended to the
&url-kernel-howto;, as this document cannot adequately address questions
about the use of the Linux kernel.
</para>
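<para>
For example, with the modular configuration shown above, the HTB qdisc and
the u32 classifier can be loaded by hand before use (module names as found
in 2.4/2.6 kernel trees; many kernels load them automatically on demand):
</para>
<programlisting>
<prompt>[root@leander]# </prompt><userinput>modprobe sch_htb</userinput>
<prompt>[root@leander]# </prompt><userinput>modprobe cls_u32</userinput>
</programlisting>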
</section>
<section id="s-iproute2">
<title>&iproute2; tools (&tc;)</title>
<para>
&iproute2; is a suite of command line utilities which
manipulate kernel structures for IP networking
configuration on a machine. For technical documentation on these tools,
see the &url-iproute2-docs; and for a more expository discussion, the
documentation at &url-linux-ip.net;. Of the tools in the &iproute2;
package, the binary &tc; is the only one used for traffic control. This
HOWTO will ignore the other tools in the suite.
</para>
<anchor id="s-iproute2-tc"/>
<para>
Because it interacts with the kernel to direct the creation, deletion
and modification of traffic control structures, the &tc; binary needs to
be compiled with support for all of the &linux-qdisc;s you wish to use.
In particular, the &sch_htb; qdisc is not supported yet in the upstream
&iproute2; package. See
<xref linkend="qc-htb"/> for more information.
</para>
<para>
The &tc; tool performs all of the configuration of the kernel structures
required to support traffic control. As a result of its many uses, the
command syntax can be described (at best) as arcane. The utility takes
as its first non-option argument one of three Linux traffic control
components, &linux-qdisc;, &linux-class; or &linux-filter;.
</para>
<example id="ex-s-iproute2-tc">
<title>&tc; command usage</title>
<programlisting>
<prompt>[root@leander]# </prompt><userinput>tc</userinput>
<computeroutput>Usage: tc [ OPTIONS ] OBJECT { COMMAND | help }
where OBJECT := { qdisc | class | filter }
OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] }</computeroutput>
</programlisting>
</example>
<para>
Each object accepts further and different options, and will be
incompletely described and documented below. The hints in the examples
below are designed to introduce the vagaries of &tc; command line
syntax. For more examples, consult the &url-lartc-howto;. For even
better understanding, consult the kernel and &iproute2; code.
</para>
<example id="ex-s-iproute2-tc-qdisc">
<title>&tc; &linux-qdisc;</title>
<programlisting>
<prompt>[root@leander]# </prompt><userinput>tc qdisc add \</userinput> <co id="ex-s-itcq-tc" linkends="ex-s-itcq-tc-text"/>
<prompt>&gt; </prompt><userinput> dev eth0 \</userinput> <co id="ex-s-itcq-dev" linkends="ex-s-itcq-dev-text"/>
<prompt>&gt; </prompt><userinput> root \</userinput> <co id="ex-s-itcq-root" linkends="ex-s-itcq-root-text"/>
<prompt>&gt; </prompt><userinput> handle 1:0 \</userinput> <co id="ex-s-itcq-handle" linkends="ex-s-itcq-handle-text"/>
<prompt>&gt; </prompt><userinput> htb</userinput> <co id="ex-s-itcq-qdisc" linkends="ex-s-itcq-qdisc-text"/>
</programlisting>
<calloutlist>
<callout
arearefs="ex-s-itcq-tc"
id="ex-s-itcq-tc-text">
<para>
Add a queuing discipline. The verb could also be
<constant>del</constant>.
</para>
</callout>
<callout
arearefs="ex-s-itcq-dev"
id="ex-s-itcq-dev-text">
<para>
Specify the device onto which we are attaching the new queuing
discipline.
</para>
</callout>
<callout
arearefs="ex-s-itcq-root"
id="ex-s-itcq-root-text">
<para>
This means <quote>egress</quote> to &tc;. The word
<constant>root</constant> must be used, however. Another
qdisc with limited functionality, the &ingress-qdisc;, can be
attached to the same device.
</para>
</callout>
<callout
arearefs="ex-s-itcq-handle"
id="ex-s-itcq-handle-text">
<para>
The &linux-handle; is a user-specified number of the form
<replaceable>major</replaceable>:<replaceable>minor</replaceable>.
The minor number for any queueing discipline handle must always be
zero (0). An acceptable shorthand for a &linux-qdisc; handle is
the syntax "1:", where the minor number is assumed to be zero (0)
if not specified.
</para>
</callout>
<callout
arearefs="ex-s-itcq-qdisc"
id="ex-s-itcq-qdisc-text">
<para>
This is the queuing discipline to attach, &sch_htb; in this
example. Queuing discipline specific parameters will follow this.
In the example here, we add no qdisc-specific parameters.
</para>
</callout>
</calloutlist>
</example>
<para>
Above was the simplest use of the &tc; utility for adding a queuing
discipline to a device. Here's an example of the use of &tc; to add a
class to an existing parent class.
</para>
<example id="ex-s-iproute2-tc-class">
<title>&tc; &linux-class;</title>
<programlisting>
<prompt>[root@leander]# </prompt><userinput>tc class add \</userinput> <co id="ex-s-itcc-tc" linkends="ex-s-itcc-tc-text"/>
<prompt>&gt; </prompt><userinput> dev eth0 \</userinput> <co id="ex-s-itcc-dev" linkends="ex-s-itcc-dev-text"/>
<prompt>&gt; </prompt><userinput> parent 1:1 \</userinput> <co id="ex-s-itcc-parent" linkends="ex-s-itcc-parent-text"/>
<prompt>&gt; </prompt><userinput> classid 1:6 \</userinput> <co id="ex-s-itcc-classid" linkends="ex-s-itcc-classid-text"/>
<prompt>&gt; </prompt><userinput> htb \</userinput> <co id="ex-s-itcc-classtype" linkends="ex-s-itcc-classtype-text"/>
<prompt>&gt; </prompt><userinput> rate 256kbit \</userinput> <co id="ex-s-itcc-htb-rate" linkends="ex-s-itcc-htb-only-text"/>
<prompt>&gt; </prompt><userinput> ceil 512kbit</userinput> <co id="ex-s-itcc-htb-ceil" linkends="ex-s-itcc-htb-only-text"/>
</programlisting>
<calloutlist>
<callout
arearefs="ex-s-itcc-tc"
id="ex-s-itcc-tc-text">
<para>
Add a class. The verb could also be <constant>del</constant>.
</para>
</callout>
<callout
arearefs="ex-s-itcc-dev"
id="ex-s-itcc-dev-text">
<para>
Specify the device onto which we are attaching the new class.
</para>
</callout>
<callout
arearefs="ex-s-itcc-parent"
id="ex-s-itcc-parent-text">
<para>
Specify the parent &linux-handle; to which we are attaching the new class.
</para>
</callout>
<callout
arearefs="ex-s-itcc-classid"
id="ex-s-itcc-classid-text">
<para>
This is a unique &linux-handle;
(<replaceable>major</replaceable>:<replaceable>minor</replaceable>)
identifying this class. The minor number must be a non-zero
number.
</para>
</callout>
<callout
arearefs="ex-s-itcc-classtype"
id="ex-s-itcc-classtype-text">
<para>
Both of the &classful-qdiscs; require that any children classes be
classes of the same type as the parent. Thus an &sch_htb; qdisc
will contain HTB classes.
</para>
</callout>
<callout
arearefs="ex-s-itcc-htb-rate ex-s-itcc-htb-ceil"
id="ex-s-itcc-htb-only-text">
<para>
This is a class specific parameter. Consult
<xref linkend="qc-htb"/> for more detail on these parameters.
</para>
</callout>
</calloutlist>
</example>
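<para>
After adding &linux-qdisc;s and &linux-class;es, the resulting configuration
can be inspected with the <constant>show</constant> verb; the -s flag adds
statistics.
</para>
<programlisting>
<prompt>[root@leander]# </prompt><userinput>tc -s qdisc show dev eth0</userinput>
<prompt>[root@leander]# </prompt><userinput>tc -s class show dev eth0</userinput>
</programlisting>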
<para>
</para>
<example id="ex-s-iproute2-tc-filter">
<title>&tc; &linux-filter;</title>
<programlisting>
<prompt>[root@leander]# </prompt><userinput>tc filter add \</userinput> <co id="ex-s-itcf-tc" linkends="ex-s-itcf-tc-text"/>
<prompt>&gt; </prompt><userinput> dev eth0 \</userinput> <co id="ex-s-itcf-dev" linkends="ex-s-itcf-dev-text"/>
<prompt>&gt; </prompt><userinput> parent 1:0 \</userinput> <co id="ex-s-itcf-parent" linkends="ex-s-itcf-parent-text"/>
<prompt>&gt; </prompt><userinput> protocol ip \</userinput> <co id="ex-s-itcf-protocol" linkends="ex-s-itcf-protocol-text"/>
<prompt>&gt; </prompt><userinput> prio 5 \</userinput> <co id="ex-s-itcf-prio" linkends="ex-s-itcf-prio-text"/>
<prompt>&gt; </prompt><userinput> u32 \</userinput> <co id="ex-s-itcf-classifier" linkends="ex-s-itcf-classifier-text"/>
<prompt>&gt; </prompt><userinput> match ip port 22 0xffff \</userinput> <co id="ex-s-itcf-match-port" linkends="ex-s-itcf-match-text"/>
<prompt>&gt; </prompt><userinput> match ip tos 0x10 0xff \</userinput> <co id="ex-s-itcf-match-tos" linkends="ex-s-itcf-match-text"/>
<prompt>&gt; </prompt><userinput> flowid 1:6 \</userinput> <co id="ex-s-itcf-flowid" linkends="ex-s-itcf-flowid-text"/>
<prompt>&gt; </prompt><userinput> police \</userinput> <co id="ex-s-itcf-police" linkends="ex-s-itcf-police-text"/>
<prompt>&gt; </prompt><userinput> rate 32000bps \</userinput> <co id="ex-s-itcf-prate" linkends="ex-s-itcf-prate-text"/>
<prompt>&gt; </prompt><userinput> burst 10240 \</userinput> <co id="ex-s-itcf-burst" linkends="ex-s-itcf-burst-text"/>
<prompt>&gt; </prompt><userinput> mpu 0 \</userinput> <co id="ex-s-itcf-mpu" linkends="ex-s-itcf-mpu-text"/>
<prompt>&gt; </prompt><userinput> action drop/continue</userinput> <co id="ex-s-itcf-action" linkends="ex-s-itcf-action-text"/>
</programlisting>
<calloutlist>
<callout
arearefs="ex-s-itcf-tc"
id="ex-s-itcf-tc-text">
<para>
Add a filter. The verb could also be <constant>del</constant>.
</para>
</callout>
<callout
arearefs="ex-s-itcf-dev"
id="ex-s-itcf-dev-text">
<para>
Specify the device onto which we are attaching the new filter.
</para>
</callout>
<callout
arearefs="ex-s-itcf-parent"
id="ex-s-itcf-parent-text">
<para>
Specify the parent handle to which we are attaching the new
filter.
</para>
</callout>
<callout
arearefs="ex-s-itcf-protocol"
id="ex-s-itcf-protocol-text">
<para>
This parameter is required, and names the protocol (here, IP) to
which this filter applies.
</para>
</callout>
<callout
arearefs="ex-s-itcf-prio"
id="ex-s-itcf-prio-text">
<para>
The <parameter>prio</parameter> parameter allows a given filter to
be preferred above another. The <parameter>pref</parameter> is a
synonym.
</para>
</callout>
<callout
arearefs="ex-s-itcf-classifier"
id="ex-s-itcf-classifier-text">
<para>
This is a &linux-classifier;, and is a required phrase in every
&tc; &linux-filter; command.
</para>
</callout>
<callout
arearefs="ex-s-itcf-match-port ex-s-itcf-match-tos"
id="ex-s-itcf-match-text">
<para>
These are parameters to the classifier. In this case, packets
with a type of service flag (indicating interactive usage) and
matching port 22 will be selected by this statement.
</para>
</callout>
<callout
arearefs="ex-s-itcf-flowid"
id="ex-s-itcf-flowid-text">
<para>
The <parameter>flowid</parameter> specifies the &linux-handle; of
the target class (or qdisc) to which a matching filter should send
its selected packets.
</para>
</callout>
<callout
arearefs="ex-s-itcf-police"
id="ex-s-itcf-police-text">
<para>
This is the &linux-policer;, and is an optional phrase in every
&tc; &linux-filter; command.
</para>
</callout>
<callout
arearefs="ex-s-itcf-prate"
id="ex-s-itcf-prate-text">
<para>
The policer will perform one action above this rate, and another
action below (see
<xref linkend="ex-s-itcf-action-text"/>).
</para>
</callout>
<callout
arearefs="ex-s-itcf-burst"
id="ex-s-itcf-burst-text">
<para>
The &param-burst; is an exact analog to &param-burst; in
&link-sch_htb; (&param-burst; is a &concepts-buckets; concept).
</para>
</callout>
<callout
arearefs="ex-s-itcf-mpu"
id="ex-s-itcf-mpu-text">
<para>
The minimum policed unit. To count all traffic, use an
&param-mpu; of zero (0).
</para>
</callout>
<callout xreflabel="action parameter"
arearefs="ex-s-itcf-action"
id="ex-s-itcf-action-text">
<para>
The &param-action; indicates what should be done depending on
whether the &param-rate; of the policer has been exceeded. The
first word specifies the action to take if the policer has been
exceeded. The second word specifies the action to take otherwise.
</para>
</callout>
</calloutlist>
</example>
<para>
As evidenced above, the &tc; command line utility has an arcane and
complex syntax, even for simple operations such as these examples show.
It should come as no surprise to the reader that there exists an easier
way to configure Linux traffic control. See the next section,
<xref linkend="s-tcng"/>.
</para>
</section>
<section id="s-tcng">
<title>&tcng;, Traffic Control Next Generation</title>
<para>
FIXME; sing the praises of tcng. See also &url-tcng-htb-howto; and
&url-tcng-docs;.
</para>
<para>
Traffic control next generation (hereafter, &tcng;) provides all of the
power of traffic control under Linux with twenty percent of the
headache.
</para>
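<para>
As a taste of the language, here is a minimal sketch of what a &tcng;
configuration for an &sch_htb; setup like the one above might look like.
The device name, rates and the tcc compiler invocation are illustrative
assumptions; consult &url-tcng-htb-howto; for tested examples.
</para>
<programlisting>
/* example.tcc -- translated into tc commands by the tcng compiler (tcc) */
dev eth0 {
    egress {
        class ( &lt;$ssh&gt; )     if tcp_sport == 22 ;
        class ( &lt;$default&gt; ) if 1 ;

        htb () {
            class ( rate 512kbps, ceil 512kbps ) {
                $ssh     = class ( rate 128kbps, ceil 512kbps ) ;
                $default = class ( rate 384kbps, ceil 512kbps ) ;
            }
        }
    }
}
</programlisting>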
<para>
</para>
</section>
<section id="s-netfilter">
<title>Netfilter</title>
<para>
Netfilter is a framework provided by the Linux kernel that allows various networking-related operations to be implemented in the form of customized handlers. Netfilter offers various functions and operations for packet filtering, network address translation, and port translation, which provide the functionality required for directing packets through a network, as well as the ability to prevent packets from reaching sensitive locations within a computer network.
</para>
<para>
Netfilter represents a set of hooks inside the Linux kernel, allowing specific kernel modules to register callback functions with the kernel's networking stack. Those functions, usually applied to the traffic in the form of filtering and modification rules, are called for every packet that traverses the respective hook within the networking stack.
</para>
<section id="s-iptables">
<title><constant>iptables</constant></title>
<para>
iptables is a user-space application program that allows a system administrator to configure the tables provided by the Linux kernel firewall (implemented as different Netfilter modules) and the chains and rules it stores. Different kernel modules and programs are currently used for different protocols; iptables applies to IPv4, ip6tables to IPv6, arptables to ARP, and ebtables to Ethernet frames.
</para>
<para>
iptables requires elevated privileges to operate and must be executed by user root; otherwise it fails to function. On most Linux systems, iptables is installed as /usr/sbin/iptables and documented in its man pages, which can be opened using man iptables when installed. It may also be found in /sbin/iptables, but since iptables is more of a service than an "essential binary", the preferred location remains /usr/sbin.
</para>
<para>
The term iptables is also commonly used to refer inclusively to the kernel-level components. x_tables is the name of the kernel module carrying the shared code portion used by all four modules, which also provides the API used for extensions; consequently, Xtables is more or less used to refer to the entire firewall (v4, v6, arp, and eb) architecture.
</para>
<para>
Xtables allows the system administrator to define tables containing chains of rules for the treatment of packets. Each table is associated with a different kind of packet processing. Packets are processed by sequentially traversing the rules in chains. A rule in a chain can cause a goto or jump to another chain, and this can be repeated to whatever level of nesting is desired. (A jump is like a “call”, i.e. the point that was jumped from is remembered.) Every network packet arriving at or leaving from the computer traverses at least one chain.
</para>
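<para>
The tables, chains, rules and packet counters on a running system can be
listed with the -L flag; for example, for the filter table, numerically and
verbosely:
</para>
<programlisting>
iptables -t filter -L -n -v
</programlisting>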
<mediaobject id="img-Figure5">
<imageobject>
<imagedata fileref="images/Figure5.svg" format="SVG"/>
</imageobject>
<imageobject>
<imagedata fileref="images/Figure5.png" format="PNG"/>
</imageobject>
<imageobject>
<imagedata fileref="images/Figure5.jpg" format="JPG"/>
</imageobject>
<textobject>
<phrase>Figure 5: Packet flow paths. Packets start at a given box and will flow along a certain path, depending on the circumstances.</phrase>
</textobject>
<caption>
<para><command>Figure 5: </command><emphasis>Packet flow paths. Packets start at a given box and will flow along a certain path, depending on the circumstances.</emphasis>
</para>
</caption>
</mediaobject>
<para>
The origin of the packet determines which chain it traverses initially. There are five predefined chains (mapping to the five available Netfilter hooks, see figure 5), though a table may not have all chains.
</para>
<mediaobject id="img-Figure6">
<imageobject>
<imagedata fileref="images/Figure6.svg" format="SVG"/>
</imageobject>
<imageobject>
<imagedata fileref="images/Figure6.png" format="PNG"/>
</imageobject>
<imageobject>
<imagedata fileref="images/Figure6.jpg" format="JPG"/>
</imageobject>
<textobject>
<phrase>Figure 6: Netfilter hooks</phrase>
</textobject>
<caption>
<para><command>Figure 6: </command><emphasis>Netfilter hooks</emphasis>
</para>
</caption>
</mediaobject>
<para>
Predefined chains have a policy, for example DROP, which is applied to the packet if it reaches the end of the chain. The system administrator can create as many other chains as desired. These chains have no policy; if a packet reaches the end of the chain it is returned to the chain which called it. A chain may be empty.
</para>
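<para>
For example, a default-deny policy on the built-in
<constant>INPUT</constant> chain together with a small user-defined chain
could be set up as follows (the chain name and port are arbitrary examples):
</para>
<programlisting>
iptables -P INPUT DROP
iptables -N ssh-in
iptables -A ssh-in -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -j ssh-in
</programlisting>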
<itemizedlist>
<listitem>
<para>
<constant>PREROUTING:</constant> Packets will enter this chain before a routing decision is made (point 1 in Figure 6).
</para>
</listitem>
<listitem>
<para>
<constant>INPUT:</constant> Packet is going to be locally delivered. It does not have anything to do with processes having an open socket; local delivery is controlled by the <quote>local-delivery</quote> routing table: <command>ip route show table local</command> (point 2 in Figure 6).
</para>
</listitem>
<listitem>
<para>
<constant>FORWARD:</constant> All packets that have been routed and were not for local delivery will traverse this chain (point 3 in Figure 6).
</para>
</listitem>
<listitem>
<para>
<constant>OUTPUT:</constant> Packets sent from the machine itself will visit this chain (point 5 in Figure 6).
</para>
</listitem>
<listitem>
<para>
<constant>POSTROUTING:</constant> The routing decision has been made. Packets enter this chain just before being handed off to the hardware (point 4 in Figure 6).
</para>
</listitem>
</itemizedlist>
<para>
Each rule in a chain contains the specification of which packets it matches. It may also contain a target (used for extensions) or verdict (one of the built-in decisions). As a packet traverses a chain, each rule in turn is examined. If a rule does not match the packet, the packet is passed to the next rule. If a rule does match the packet, the rule takes the action indicated by the target/verdict, which may or may not allow the packet to continue along the chain. Matches make up the large part of rulesets, as they contain the conditions packets are tested for. These can match on nearly any layer in the OSI model, as with e.g. the --mac-source and -p tcp --dport parameters, and there are also protocol-independent matches, such as -m time.
</para>
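<para>
A single rule combining several matches with a target might look like the
following (the interface name is an arbitrary example):
</para>
<programlisting>
iptables -A INPUT -i eth0 -p tcp --dport 22 -m state --state NEW -j ACCEPT
</programlisting>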
<para>
The packet continues to traverse the chain until either
</para>
<itemizedlist>
<listitem>
<para>
a rule matches the packet and decides the ultimate fate of the packet, for example by issuing one of the <constant>ACCEPT</constant> or <constant>DROP</constant> verdicts, or a module returning such an ultimate fate; or
</para>
</listitem>
<listitem>
<para>
a rule calls the <constant>RETURN</constant> verdict, in which case processing returns to the calling chain; or
</para>
</listitem>
<listitem>
<para>
the end of the chain is reached; traversal either continues in the parent chain (as if RETURN was used), or the base chain policy, which is an ultimate fate, is used.
</para>
</listitem>
</itemizedlist>
<para>
Targets also return a verdict like ACCEPT (NAT modules will do this) or DROP (e.g. the REJECT module), but may also imply CONTINUE (e.g. the LOG module; CONTINUE is an internal name) to continue with the next rule as if no target/verdict was specified at all.
</para>
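<para>
The interplay of targets and verdicts can be seen in a small logging chain
(a sketch; the chain name, port and log prefix are arbitrary): the LOG
target continues with the next rule, while DROP ends traversal.
</para>
<programlisting>
iptables -N log-drop
iptables -A log-drop -j LOG --log-prefix "dropped: "
iptables -A log-drop -j DROP
iptables -A INPUT -p tcp --dport 23 -j log-drop
</programlisting>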
</section>
</section>
<section id="s-imq">
<title>IMQ, Intermediate Queuing device</title>
<para>
The Intermediate queueing device is not a qdisc, but its usage is tightly bound to qdiscs. Within Linux, qdiscs are attached to network devices and everything that is queued to the device is first queued to the qdisc and then to the driver queue. From this concept, two limitations arise:
</para>
<itemizedlist>
<listitem>
<para>
Only egress shaping is possible (an ingress qdisc exists, but its possibilities are very limited compared to classful qdiscs, as seen before).
</para>
</listitem>
<listitem>
<para>
A qdisc can only see the traffic of one interface, so global limits cannot be placed.
</para>
</listitem>
</itemizedlist>
<para>
IMQ is there to help solve those two limitations. In short, you can put everything you choose in a qdisc. Specially marked packets get intercepted in netfilter NF_IP_PRE_ROUTING and NF_IP_POST_ROUTING hooks and pass through the qdisc attached to an imq device. An iptables target is used for marking the packets.
</para>
<para>
This enables you to do ingress shaping, as you can simply mark packets coming in from somewhere and/or treat interfaces as classes to set global limits. You can also do many other things, such as putting only your HTTP traffic in a qdisc, putting new connection requests in a qdisc, and so on.
</para>
<section id="s-sample-configuration">
<title>Sample configuration</title>
<para>
The first thing that might come to mind is to use ingress shaping to give yourself a high guaranteed bandwidth. Configuration is just like with any other interface:
</para>
<programlisting>
tc qdisc add dev imq0 root handle 1: htb default 20
tc class add dev imq0 parent 1: classid 1:1 htb rate 2mbit burst 15k
tc class add dev imq0 parent 1:1 classid 1:10 htb rate 1mbit
tc class add dev imq0 parent 1:1 classid 1:20 htb rate 1mbit
tc qdisc add dev imq0 parent 1:10 handle 10: pfifo
tc qdisc add dev imq0 parent 1:20 handle 20: sfq
</programlisting>
<programlisting>
tc filter add dev imq0 parent 10:0 protocol ip prio 1 u32 match \
   ip dst 10.0.0.230/32 flowid 1:10
</programlisting>
<para>
In this example u32 is used for classification. Other classifiers should work as expected. Next, traffic has to be selected and marked so that it is enqueued to imq0.
</para>
<programlisting>
iptables -t mangle -A PREROUTING -i eth0 -j IMQ --todev 0
ip link set imq0 up
</programlisting>
<para>
The IMQ iptables target is valid in the PREROUTING and POSTROUTING chains of the mangle table. Its syntax is
</para>
<programlisting>
IMQ [ --todev n ] n : number of imq device
</programlisting>
<para>
An ip6tables target is also provided.
</para>
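<para>
Its usage mirrors the IPv4 case, for example (assuming the same imq0 device):
</para>
<programlisting>
ip6tables -t mangle -A PREROUTING -i eth0 -j IMQ --todev 0
</programlisting>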
<para>
Please note traffic is not enqueued when the target is hit but afterwards. The exact location where traffic enters the imq device depends on the direction of the traffic (in/out). These are the predefined netfilter hooks used by iptables:
</para>
<programlisting>
enum nf_ip_hook_priorities {
        NF_IP_PRI_FIRST = INT_MIN,
        NF_IP_PRI_CONNTRACK = -200,
        NF_IP_PRI_MANGLE = -150,
        NF_IP_PRI_NAT_DST = -100,
        NF_IP_PRI_FILTER = 0,
        NF_IP_PRI_NAT_SRC = 100,
        NF_IP_PRI_LAST = INT_MAX,
};
</programlisting>
<para>
For ingress traffic, imq registers itself with NF_IP_PRI_MANGLE + 1 priority which means packets enter the imq device directly after the mangle PREROUTING chain has been passed.
</para>
<para>
For egress, imq uses NF_IP_PRI_LAST, which honours the fact that packets dropped by the filter table won't occupy bandwidth.
</para>
</section>
</section>
<section id="s-ethtool">
<title><constant>ethtool</constant>, Driver Queue</title>
<para>
The ethtool command is used to control the driver queue size for Ethernet devices. ethtool also provides low level interface statistics as well as the ability to enable and disable IP stack and driver features.
</para>
<para>
The -g flag to ethtool displays the driver queue (ring) parameters (see Figure 1):
</para>
<programlisting>
$ ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX: 16384
RX Mini: 0
RX Jumbo: 0
TX: 16384
Current hardware settings:
RX: 512
RX Mini: 0
RX Jumbo: 0
TX: 256
</programlisting>
<para>
You can see from the above output that the driver for this NIC defaults to 256 descriptors in the transmission queue. It was often recommended to reduce the size of the driver queue in order to reduce latency. With the introduction of BQL (assuming your NIC driver supports it), there is no longer any reason to modify the driver queue size (see below for how to configure BQL).
</para>
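<para>
Should you still want to change the ring size on a driver without BQL, the
-G flag sets it; with BQL, the per-queue limit can instead be capped through
sysfs. A sketch (assuming eth0, a BQL-capable kernel for the sysfs path, and
that the driver accepts the requested value):
</para>
<programlisting>
# shrink the transmit ring (pre-BQL approach)
$ ethtool -G eth0 tx 128

# cap the BQL limit for the first transmit queue (path assumes BQL support)
$ echo 3000 &gt; /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit_max
</programlisting>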
<para>
Ethtool also allows you to manage optimization features such as <link linkend="o-huge-packet">TSO, UFO and GSO</link>. The -k flag displays the current offload settings and -K modifies them.
</para>
<programlisting>
$ ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: off
tx-vlan-offload: off
ntuple-filters: off
receive-hashing: off
</programlisting>
<para>
Since <link linkend="o-huge-packet">TSO, GSO, UFO</link> and GRO greatly increase the number of bytes which can be queued in the driver queue, you should disable these optimizations if you want to optimize for latency over throughput. It's doubtful you will notice any CPU impact or throughput decrease when disabling these features unless the system is handling very high data rates.
</para>
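<para>
For example, to trade some throughput for latency by disabling these
optimizations (the exact feature names accepted by -K can vary slightly
between ethtool versions and drivers):
</para>
<programlisting>
$ ethtool -K eth0 tso off gso off gro off ufo off
</programlisting>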
</section>
</section>
<!-- end of file -->