<!-- $Id$ -->
<chapter id="ch-routing">
<title>IP Routing</title>
<indexterm zone="ch-routing">
<primary>IP Routing</primary>
<see>routing</see>
</indexterm>
<indexterm zone="ch-routing">
<primary>Routing</primary>
</indexterm>
<para>
Routing is fundamental to the design of the Internet Protocol. IP
routing has been cleverly designed to minimize the complexity for leaf
nodes and networks. Linux can be used as a leaf node, such as a
workstation, where setting the IP address, netmask and
default gateway suffices for all routing needs. Alternatively, the same
routing subsystem can be used in the core of a network connecting
multiple public and private networks.
</para>
<para>
This chapter will begin with the
<link linkend="routing-intro">basics of IP routing with linux</link>,
<link linkend="routing-local">routing to locally connected
destinations</link>,
<link linkend="routing-default">routing to destinations through the
default gateway</link>, and
<link linkend="routing-forwarding">using linux as a router</link>.
Subsequent topics will include
<link linkend="routing-selection">the kernel's route selection
algorithm</link>, the
<link linkend="routing-cache">routing cache</link>,
<link linkend="routing-tables">routing tables</link>, the
<link linkend="routing-rpdb">routing policy database</link>, and
<link linkend="routing-icmp">issues with ICMP and routing</link>.
</para>
<para>
The scope of this documentation is primarily static routing. Though
dynamic routing is important to large networks, Internet service
providers, and backbone providers, this documentation is targeted at
smaller networks, particularly networks which use static routing.
Nonetheless, the concepts governing the manipulation of a packet in the
kernel, and how routing decisions are made by the kernel are applicable to
dynamic routing environments.
</para>
<para>
The linux routing subsystem has been designed with large
scale networks in mind, without forgetting the need for easy
configurability for leaf nodes, such as workstations and servers.
</para>
<section id="routing-intro">
<title>Introduction to Linux Routing</title>
<para>
The design of IP routing allows for very simple route
definitions for small networks, while not hindering the flexibility of
routing in complex environments. A key concept in IP routing is
the distinction between addresses which are locally reachable and
destinations which are not directly known. Every IP capable host knows
about at least three classes of destination: itself, locally connected
computers, and everywhere else.
</para>
<para>
Most fully-featured IP-aware networked operating systems
(all unix-like operating systems with IP stacks,
modern Macintoshes, and modern Windows) include support for a loopback
device and a loopback IP address. This is an IP address and range
configured on the host machine itself which allows the machine to talk
to itself. Linux systems can
communicate over IP on any locally configured IP address, whether on the
loopback device or not. This is the first class of destinations:
locally hosted addresses.
</para>
<para>
The second class of IP addresses are addresses in the locally
connected network segment. Each machine with a connection to an IP
network can reach a subset of the entire IP address space on its
directly connected network interface.
</para>
<para>
All other hosts or destination IPs fall into a third class. Any IP
which is not on the machine itself or locally reachable (i.e. connected
to the same media segment) is only reachable through an IP routing
device. This routing device must have an IP address in a locally
reachable IP address range.
</para>
<para>
All IP networking is a permutation of these three fundamental concepts
of reachability. This list summarizes the three possible
classifications for reachability of destination IP addresses from any
single source machine.
</para>
<anchor id="list-routing-intro"/>
<orderedlist>
<listitem>
<para>
The IP address is reachable on the machine itself. Under linux
this is considered
<link linkend="tb-tools-ip-addr-scope">scope host</link> and is used
for IPs bound to any network device including loopback devices,
and the network range for the loopback device. Addresses of this
nature are called local IPs or locally hosted IPs.
</para>
</listitem>
<listitem>
<para>
The IP address is reachable on the directly connected link layer
medium. Addresses of this type are called locally reachable or
(preferred) directly reachable IPs.
</para>
</listitem>
<listitem>
<para>
The IP address is ultimately reachable through a router which
is reachable on a directly connected link layer medium. This class
of IP addresses is only reachable through a gateway.
</para>
</listitem>
</orderedlist>
<para>
As a practical description of the above, this partial diagram of the
<link linkend="ax-example-network">example network</link> shows two
machines connected to 192.168.99.0/24. On &tristan; the IP addresses
127.0.0.1 (loopback--not pictured) and 192.168.99.35 are considered
locally hosted IP addresses. The directly reachable IP addresses fall
inside the 192.168.99.0/24 network. Any other destination addresses are
only reachable through a gateway, probably &masq-gw;.
</para>
<example id="routing-intro-classes">
<title>Classes of IP addresses</title>
<mediaobject id="image-routing-intro">
<imageobject>
<imagedata fileref="images/routing-intro.png" format="PNG"/>
</imageobject>
<imageobject>
<imagedata fileref="images/routing-intro.svg" format="SVG"/>
</imageobject>
</mediaobject>
</example>
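<para>
As a rough illustration of the three classifications, the following
Python sketch sorts a destination address into one of them, using the
addresses given above for &tristan;. The function name
<function>classify</function> and the sample destination 205.254.211.17
are assumptions made for this illustration only, and the sketch says
nothing about how the kernel itself stores this information.
</para>
<programlisting>
```python
import ipaddress


def classify(destination, local_addresses, connected_networks):
    """Return which of the three reachability classes a destination falls in."""
    dest = ipaddress.ip_address(destination)
    hosted = {ipaddress.ip_address(address) for address in local_addresses}
    # Class 1: hosted on this machine, including the whole loopback range.
    if dest.is_loopback or dest in hosted:
        return "locally hosted"
    # Class 2: inside a network on a directly connected link layer medium.
    if any(dest in ipaddress.ip_network(net) for net in connected_networks):
        return "directly reachable"
    # Class 3: everything else has to be handed to a router.
    return "via gateway"


# Values for tristan, taken from the paragraph above.
tristan_addresses = ["127.0.0.1", "192.168.99.35"]
tristan_networks = ["192.168.99.0/24"]

for destination in ("192.168.99.35", "192.168.99.1", "205.254.211.17"):
    print(destination, "is",
          classify(destination, tristan_addresses, tristan_networks))
```
</programlisting>
<para>
The order of the tests matters: a locally hosted address such as
192.168.99.35 also falls inside 192.168.99.0/24, so the check for
locally hosted addresses has to come first.
</para>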
<para>
Before examining the routing system in more detail, there are some terms
to identify and define. These terms are general IP networking terms
and should be familiar to users who have used IP on other operating
systems and networking equipment.
</para>
<variablelist id="list-routing-intro-ipdefs">
<varlistentry id="list-routing-intro-ipdefs-octet">
<term>octet</term>
<listitem>
<indexterm zone="list-routing-intro-ipdefs-octet">
<primary>octet</primary>
<see>IP addressing, octet</see>
</indexterm>
<indexterm zone="list-routing-intro-ipdefs-octet">
<primary>IP addressing</primary>
<secondary>octet</secondary>
</indexterm>
<para>
A single number between 0 and 255 in decimal, or 0x00 and 0xff in
hexadecimal. An octet is eight bits (a single byte) in size.
</para>
<para>
Examples: <emphasis>140</emphasis>, <emphasis>254</emphasis>,
<emphasis>255</emphasis>, <emphasis>1</emphasis>,
<emphasis>0</emphasis>, <emphasis>7</emphasis>.
</para>
</listitem>
</varlistentry>
<varlistentry id="list-routing-intro-ipdefs-ipaddr">
<term>IP address</term>
<term>IP</term>
<listitem>
<indexterm zone="list-routing-intro-ipdefs-ipaddr">
<primary>IP address</primary>
<seealso>IP addressing, address</seealso>
</indexterm>
<indexterm zone="list-routing-intro-ipdefs-ipaddr">
<primary>IP addressing</primary>
<secondary>address</secondary>
</indexterm>
<para>
A locally unique four
<link linkend="list-routing-intro-ipdefs-octet">octet</link>
logical identifier which a machine
can use to communicate using the Internet Protocol. This
address is determined by combining the
<link linkend="list-routing-intro-ipdefs-netaddr">network
address</link> and the administratively assigned host address.
Simply put, the IP address is a unique number identifying
a host on a network.
</para>
<para>
Examples: <emphasis>192.168.99.35</emphasis>,
<emphasis>140.71.38.7</emphasis>,
<emphasis>205.254.210.186</emphasis>.
</para>
</listitem>
</varlistentry>
<varlistentry id="list-routing-intro-ipdefs-hostaddr">
<term>host address portion</term>
<listitem>
<indexterm zone="list-routing-intro-ipdefs-hostaddr">
<primary>IP addressing</primary>
<secondary>host address portion</secondary>
</indexterm>
<para>
The rightmost bits (frequently
<link linkend="list-routing-intro-ipdefs-octet">octets</link>)
in an
<link linkend="list-routing-intro-ipdefs-ipaddr">IP
address</link> which are not a part of the
<link linkend="list-routing-intro-ipdefs-netaddr">network
address</link>. The part of an IP address which identifies the
computer on a network independent of the network.
</para>
<para>
Examples: 192.168.1.<emphasis>27</emphasis>/24,
10.<emphasis>10.17.24</emphasis>/8,
172.20.<emphasis>158.75</emphasis>/16.
</para>
</listitem>
</varlistentry>
<varlistentry id="list-routing-intro-ipdefs-netaddr">
<term>network address</term>
<term>network</term>
<term>network prefix</term>
<term>subnetwork address</term>
<listitem>
<indexterm zone="list-routing-intro-ipdefs-netaddr">
<primary>network address</primary>
<see>IP addressing, network address</see>
</indexterm>
<indexterm zone="list-routing-intro-ipdefs-netaddr">
<primary>IP addressing</primary>
<secondary>network address</secondary>
</indexterm>
<para>
A four
<link linkend="list-routing-intro-ipdefs-octet">octet</link>
address and
<link linkend="list-routing-intro-ipdefs-netmask">network
mask</link>
identifying the usable range of
<link linkend="list-routing-intro-ipdefs-ipaddr">IP
addresses</link>. Conventional and CIDR notations combine
the four bare octets with the netmask or prefix length to
define this address. Briefly, a network address is the first
address in a range, and is reserved to identify the entire
network.
<footnote>
<para>
At least one reader (CAO) has pointed out to me that there is
ambiguity in the meaning and common usage of the
term <wordasword>network
address</wordasword>. While occasionally used to refer to a
single IP address at the top of a range of addresses, the
primary meaning requires the implicit
<link linkend="list-routing-intro-ipdefs-netmask">network
mask</link>.
</para>
<para>
Historically, this term has always meant the IP address at the
top of a range AND the netmask identifying the set of
available addresses. Without this latter piece of
information, the <wordasword>network address</wordasword> is
simply an
<link linkend="list-routing-intro-ipdefs-ipaddr">IP
address</link>.
</para>
<para>
Technically, the use of this term to mean a single IP
at the top of the range is incorrect, although not uncommon.
</para>
</footnote>
</para>
<para>
Examples: <emphasis>192.168.187.0/24</emphasis>,
<emphasis>205.254.211.192/26</emphasis>,
<emphasis>4.20.17.128/255.255.255.248</emphasis>,
<emphasis>10.0.0.0/255.0.0.0</emphasis>,
<emphasis>12.35.17.112/28</emphasis>.
</para>
</listitem>
</varlistentry>
<varlistentry id="list-routing-intro-ipdefs-netmask">
<term>network mask</term>
<term>netmask</term>
<term>network bitmask</term>
<listitem>
<indexterm zone="list-routing-intro-ipdefs-netmask">
<primary>netmask</primary>
<see>IP addressing, network mask</see>
</indexterm>
<indexterm zone="list-routing-intro-ipdefs-netmask">
<primary>network mask</primary>
<see>IP addressing, network mask</see>
</indexterm>
<indexterm zone="list-routing-intro-ipdefs-netmask">
<primary>IP addressing</primary>
<secondary>network mask</secondary>
</indexterm>
<para>
A four-octet set of bits which, when AND'd with a particular
<link linkend="list-routing-intro-ipdefs-ipaddr">IP
address</link> produces the
<link linkend="list-routing-intro-ipdefs-netaddr">network
address</link>. Combined with a network address or IP address,
the netmask identifies the range of IP addresses which are
directly reachable.
</para>
<para>
Examples: <emphasis>255.255.255.0</emphasis>,
<emphasis>255.255.0.0</emphasis>,
<emphasis>255.255.192.0</emphasis>,
<emphasis>255.255.255.224</emphasis>,
<emphasis>255.0.0.0</emphasis>.
</para>
</listitem>
</varlistentry>
<varlistentry id="list-routing-intro-ipdefs-prefix">
<term>prefix length</term>
<listitem>
<indexterm zone="list-routing-intro-ipdefs-prefix">
<primary>prefix length</primary>
<see>IP addressing, prefix length</see>
</indexterm>
<indexterm zone="list-routing-intro-ipdefs-prefix">
<primary>IP addressing</primary>
<secondary>prefix length</secondary>
</indexterm>
<para>
An alternate representation of
<link linkend="list-routing-intro-ipdefs-netmask">network
mask</link>, this is a single integer between 0 and 32,
identifying the number of significant bits in an
<link linkend="list-routing-intro-ipdefs-ipaddr">IP
address</link> or
<link linkend="list-routing-intro-ipdefs-netaddr">network
address</link>. This is the "slash-number" component of a
CIDR address.
</para>
<para>
Examples: 4.20.17.0<emphasis>/24</emphasis>,
66.14.17.116<emphasis>/30</emphasis>,
10.158.42.72<emphasis>/29</emphasis>,
10.48.7.198<emphasis>/9</emphasis>,
192.168.154.64<emphasis>/26</emphasis>.
</para>
</listitem>
</varlistentry>
<varlistentry id="list-routing-intro-ipdefs-bcast">
<term>broadcast address</term>
<listitem>
<indexterm zone="list-routing-intro-ipdefs-bcast">
<primary>broadcast address (IP)</primary>
<see>IP addressing, broadcast address</see>
</indexterm>
<indexterm zone="list-routing-intro-ipdefs-bcast">
<primary>IP addressing</primary>
<secondary>broadcast address</secondary>
</indexterm>
<para>
A four
<link linkend="list-routing-intro-ipdefs-octet">octet</link>
address derived by setting every bit of the
<link linkend="list-routing-intro-ipdefs-hostaddr">host address
portion</link> of a
<link linkend="list-routing-intro-ipdefs-netaddr">network
address</link> to 1, which is equivalent to an OR operation between
the network address and the inverted network mask.
The broadcast is the highest address in a given network,
and is reserved for broadcast traffic.
</para>
<para>
Examples: <emphasis>192.168.205.255/24</emphasis>,
<emphasis>172.18.255.255/16</emphasis>,
<emphasis>12.7.149.63/26</emphasis>.
</para>
</listitem>
</varlistentry>
</variablelist>
<para>
These definitions are common to IP networking in general, and are
widely understood in the IP networking community. For less terse
introductory material on matters of IP network addressing in general,
see
<xref linkend="links-general-ip"/>.
</para>
<para>
As is apparent from the interdependencies amongst the above
definitions, each term describes one aspect of the
relationship between an IP address and its network. A good
<link linkend="tools-ipcalc">IP calculator</link> can assist in
mastering these IP fundamentals.
</para>
<example id="ex-routing-intro-ipcalc">
<title>Using ipcalc to display IP information</title>
<programlisting>
<prompt>[user@workstation]$ </prompt><userinput>ipcalc -n 12.7.149.0/26</userinput>
Address: 12.7.149.0 00001100.00000111.10010101.00 000000
Netmask: 255.255.255.192 = 26 11111111.11111111.11111111.11 000000
Wildcard: 0.0.0.63 00000000.00000000.00000000.00 111111
=>
Network: 12.7.149.0/26 00001100.00000111.10010101.00 000000 (Class A)
Broadcast: 12.7.149.63 00001100.00000111.10010101.00 111111
HostMin: 12.7.149.1 00001100.00000111.10010101.00 000001
HostMax: 12.7.149.62 00001100.00000111.10010101.00 111110
Hosts/Net: 62
</programlisting>
</example>
<para>
A tool similar to the one shown in
<xref linkend="ex-routing-intro-ipcalc"/> can assist in visualizing the
relationships among IP addressing concepts.
</para>
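<para>
The same relationships can also be computed with the
<literal>ipaddress</literal> module from the Python standard library.
The following is a minimal sketch for the network used in the ipcalc
example, not a replacement for ipcalc; the dictionary name
<varname>summary</varname> is chosen only for this illustration.
</para>
<programlisting>
```python
import ipaddress

network = ipaddress.ip_network("12.7.149.0/26")
hosts = list(network.hosts())  # usable addresses, without network and broadcast

summary = {
    "Network": str(network),
    "Netmask": str(network.netmask),
    "Wildcard": str(network.hostmask),  # the inverted netmask
    "Broadcast": str(network.broadcast_address),
    "HostMin": str(hosts[0]),
    "HostMax": str(hosts[-1]),
    "Hosts/Net": len(hosts),
}

for label, value in summary.items():
    print(label, value)
```
</programlisting>
<para>
Each value corresponds to one of the terms defined above: the netmask
and prefix length describe the size of the network, and the broadcast
address is the network address with every host bit set.
</para>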
<para>
Subsequently, this chapter will introduce some concrete examples of
routing in a real network. The
<link linkend="ax-example-network">example network</link> illustrates
this network and all of the addresses involved.
</para>
</section>
<section id="routing-local">
<title>Routing to Locally Connected Networks</title>
<indexterm zone="routing-local">
<primary>routing</primary>
<secondary>to locally reachable networks</secondary>
</indexterm>
<para>
Any IP network is defined by two numbers: network address and
netmask. By convention, there are two ways to represent these two
numbers. Netmask notation is the convention and tradition in IP
networking, although the more succinct CIDR notation is gaining
popularity.
</para>
<para>
In the
<link linkend="ax-example-network">example network</link>, &isolde; has
IP address 192.168.100.17.
In CIDR notation, &isolde;'s address is 192.168.100.17/24, and in
traditional netmask notation, 192.168.100.17/255.255.255.0.
Any of the
<link linkend="tools-ipcalc">IP calculators</link> confirms that the
first usable IP address is 192.168.100.1 and the last usable IP address
is 192.168.100.254.
Importantly, the IP network address, 192.168.100.0/24, is reachable
through the directly connected Ethernet interface (refer to
<link linkend="list-routing-intro">classification 2</link>).
Therefore, &isolde; should be able to reach any IP address in
this range directly on the locally connected Ethernet segment.
</para>
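<para>
The equivalence of the two notations can be checked with a short Python
sketch, again using the standard <literal>ipaddress</literal> module.
The variable names are illustrative, and 192.168.100.32 is used simply
as a sample neighbour on the same segment.
</para>
<programlisting>
```python
import ipaddress

# isolde's address written in CIDR and in traditional netmask notation.
cidr = ipaddress.ip_interface("192.168.100.17/24")
netmask = ipaddress.ip_interface("192.168.100.17/255.255.255.0")

same_interface = cidr == netmask      # both notations name the same thing
usable = list(cidr.network.hosts())   # excludes network and broadcast address
first_usable, last_usable = usable[0], usable[-1]
directly_reachable = ipaddress.ip_address("192.168.100.32") in cidr.network

print(cidr.with_prefixlen, "is the same as", cidr.with_netmask)
print("usable range:", first_usable, "to", last_usable)
print("192.168.100.32 directly reachable:", directly_reachable)
```
</programlisting>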
<para>
Below is the routing table for &isolde;, first shown with the
conventional <command>route -n</command> output
<footnote>
<para>
The <command>route -n</command> output can also be produced with
<command>netstat -rn</command> and is commonly used by
administrators who rely on platform independent behaviour across
heterogeneous Unix and Unix-like systems. This traditional
routing table output uses conventional netmask notation to
denote network size.
</para>
</footnote>
and then with the
<command>ip route show</command>
<footnote>
<para>
Refer to the
<link linkend="tools-ip-route"><command>ip route</command></link>
section for a fuller discussion of this linux specific tool.
The routing table output from <command>ip route</command> uses
exclusively CIDR notation.
</para>
</footnote>
command. Both tools read and display the same kernel routing table.
For more on the routing table displayed in
<xref linkend="ex-routing-local"/>, consult
<xref linkend="routing-table-main"/>.
</para>
<example id="ex-routing-local">
<title>Identifying the locally connected networks with
<command>route</command></title>
<programlisting>
<prompt>[root@isolde]# </prompt><userinput>route -n</userinput>
<computeroutput>Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
192.168.100.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
127.0.0.0 0.0.0.0 255.0.0.0 U 0 0 0 lo
0.0.0.0 192.168.100.254 0.0.0.0 UG 0 0 0 eth0</computeroutput>
<prompt>[root@isolde]# </prompt><userinput>ip route show</userinput>
<computeroutput>192.168.100.0/24 dev eth0 scope link
127.0.0.0/8 dev lo scope link
default via 192.168.100.254 dev eth0</computeroutput>
</programlisting>
</example>
<para>
In the above example, the locally reachable destination is
192.168.100.0/255.255.255.0 which can also be written 192.168.100.0/24
as in <command>ip route show</command>. In classful networking
terms, the network to which &isolde; is directly connected is called a
class C sized network.
</para>
<para>
When a process on &isolde; needs to send a packet to another
machine on the locally connected network, packets will be sent from
192.168.100.17 (&isolde;'s IP). The kernel will consult
the routing table to determine the route and the source address to use
when sending this packet.
Assuming the destination is 192.168.100.32, the kernel will find that
192.168.100.32 falls inside the IP address range 192.168.100.0/24 and
will select this route for the outbound packet. For further details on
source address selection, see
<xref linkend="routing-saddr-selection"/>. The source address on the
outbound packet conveys vital information to the host receiving the
packet. In order for a reply to be able to return, three conditions
must hold: &isolde; has to use a source IP address that is locally
available, 192.168.100.32 has to have a route to &isolde;, and neither
host may block the packet.
</para>
<para>
The packet will be sent to the locally connected network segment
directly, because &isolde; determines from the routing table
that 192.168.100.32 is directly reachable through the physical network
connection on eth0.
</para>
<para>
Occasionally, a machine will be directly connected to two different
IP networks on the same device.
The routing table will show that both networks are reachable through
the same physical device. For more on this topic, see
<xref linkend="adv-media-share"/>. Similarly, multi-homed hosts will
have a route for each locally connected network through the
corresponding network interface. For more on this sort of
configuration, see
<xref linkend="adv-multi-homed"/>.
</para>
<para>
This covers the classification of IP destinations which are available on
a locally connected network. This highlights the importance of an
accurate netmask and network address. The next section will cover
IP ranges which are neither locally hosted
nor fall in the range of the locally reachable networks. These
destinations must be reached through a router.
</para>
</section>
<section id="routing-default">
<title>Sending Packets Through a Gateway</title>
<indexterm zone="routing-default">
<primary>routing</primary>
<secondary>to a default gateway</secondary>
</indexterm>
<para>
By comparison to the total number of publicly accessible hosts on the
Internet there is an almost insignificant number of hosts inside any
locally reachable network. This means that the majority of potential
destinations are only available via a router.
</para>
<para>
Any machine which will accept and forward packets between two networks
is a router. Every router is at least dual-homed; one interface
connects to one network, and a second interface connects to another
network. Each interface is frequently an independent NIC, although it
might be a virtual interface, such as a VLAN interface. Machines
connected to either network learn by a routing protocol or are
statically configured to pass traffic for the other network to the
router.
</para>
<para>
For &tristan;, there are two different paths out of 192.168.99.0/24.
One path leads to another leaf network, 192.168.98.0/24, and the other
path leads to many networks, including the Internet. The routing table
on &tristan; should then contain two different routes out of the network.
One destination, 192.168.98.0/24, will be reachable through 192.168.99.1.
So, if &tristan; has a packet with a destination IP address in the range
of this branch office network, it will choose to send the packet directly
to &isdn-router;.
<para>
The default route is another way to say the route for destination 0/0.
This is the most general possible route.
It is the catch-all route. If no more specific
route exists in a routing table, a default route will be used.
Many servers and workstations are connected to leaf networks
with only one router, hence
<xref linkend="ex-routing-local"/>
shows a very common sort of routing table. There's a route for
localhost, for the locally connected IP network, and a default
route.
</para>
<para>
For Internet-connected hosts, the default route is customarily set to
the IP of the locally reachable router which has a path to the Internet.
Each router in turn has a default gateway pointing to another
Internet-connected router until the packet is handed off to an Internet
Service Provider's network.
</para>
</section>
<section id="routing-forwarding">
<title>Operating as a Router</title>
<indexterm zone="routing-forwarding">
<primary>router</primary>
<secondary>operating as a</secondary>
</indexterm>
<indexterm zone="routing-forwarding">
<primary>IP forwarding</primary>
</indexterm>
<indexterm zone="routing-forwarding">
<primary>forwarding</primary>
<see>IP forwarding</see>
</indexterm>
<indexterm zone="routing-forwarding">
<primary>sysctl</primary>
<secondary><constant>ip_forward</constant></secondary>
</indexterm>
<para>
Operating as a router allows a linux machine to accept packets on one
interface and transmit them on another. This is the nature of a router.
The process of accepting and transmitting IP packets is known as
forwarding. IP forwarding is a requirement for many of the networking
techniques identified here. Stateless NAT and firewalling, transparent
proxying and masquerading all require the support of IP forwarding in
order to function correctly.
</para>
<para>
The sysctl <filename>net/ipv4/ip_forward</filename> toggles the IP
forwarding functionality on a linux box. Note that setting this sysctl
alters other routing-related sysctl entries, so it is wise to set this
first, and then alter other entries.
Frequently, an administrator will forget this simple and crucial detail
when configuring a new machine to operate as a router only to be
frustrated at the simple error.
</para>
<para>
The sysctl <filename>net/ipv4/conf/$DEV/forwarding</filename> defaults to
the value of <filename>net/ipv4/ip_forward</filename>, but can be
independently modified. This sysctl can be employed to allow forwarding
of packets between two interfaces while prohibiting such behaviour on a
third interface.
</para>
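<para>
When troubleshooting a machine which refuses to forward packets, it can
help to inspect the global and per-interface settings together. The
Python sketch below reads them from the <filename>/proc/sys</filename>
tree. It assumes the per-interface files are named
<filename>forwarding</filename>, as on current kernels; the function
name <function>forwarding_state</function> and its
<parameter>proc_root</parameter> parameter are hypothetical, chosen for
this illustration. The sketch only reads the settings and never changes
them.
</para>
<programlisting>
```python
from pathlib import Path


def forwarding_state(proc_root="/proc/sys"):
    """Map each forwarding sysctl name to True (enabled) or False."""
    ipv4 = Path(proc_root) / "net" / "ipv4"
    # The global switch described in this section.
    state = {
        "net/ipv4/ip_forward": (ipv4 / "ip_forward").read_text().strip() == "1"
    }
    # One entry per interface, plus the special "all" and "default" entries.
    for entry in sorted((ipv4 / "conf").glob("*/forwarding")):
        name = "net/ipv4/conf/" + entry.parent.name + "/forwarding"
        state[name] = entry.read_text().strip() == "1"
    return state
```
</programlisting>
<para>
Calling <function>forwarding_state()</function> without arguments on a
linux machine returns a dictionary keyed by sysctl name. An interface
whose entry is false while <filename>net/ipv4/ip_forward</filename> is
true has been excluded from forwarding individually.
</para>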
</section>
<section id="routing-selection">
<title>Route Selection</title>
<indexterm zone="routing-selection">
<primary>route selection</primary>
</indexterm>
<para>
Crucial to the ability of hosts to exchange IP packets is the
correct selection of a route to the destination. The selection of a
route is traditionally made on a
<!-- note: per-hop-basis? PHB; consider -->
hop-by-hop basis
<footnote>
<para>
This document could stand to allude to MPLS implementations under
linux, for those who want to look at traffic engineering and packet
tagging on backbones. This is certainly not in the scope of this
chapter, and should be in a separate chapter, which covers
developing technologies.
</para>
</footnote>
based solely upon the destination address of the packet. Linux
behaves as a conventional routing device in this way, but can also
provide a more flexible capability. Routes can be chosen and
prioritized based on other packet characteristics.
</para>
<para>
The route selection algorithm under linux has been generalized to
enable the powerful latter scenario without complicating the
overwhelmingly common case of the former scenario.
</para>
<section id="routing-selection-common">
<title>The Common Case</title>
<para>
The above sections on routing to a
<link linkend="routing-local">local network</link> and
<link linkend="routing-default">the default gateway</link>
expose the importance of destination address for route selection.
In this simplified model, the kernel need only know the destination
address of the packet, which it compares against the routing tables to
determine the route by which to send the packet.
</para>
<para>
The kernel searches for a matching entry for the destination first in
the routing cache and then the main routing table.
In the case that the machine has recently transmitted a
packet to the destination address, the
<link linkend="routing-cache">routing cache</link> will contain an
entry for the destination. The kernel will select the same route, and
transmit the packet accordingly.
</para>
<para>
If the linux machine has not recently transmitted a packet to this
destination address, it will look up the destination in its routing
table using a technique known as longest prefix match
<footnote>
<para>
Refer to
<ulink url="http://www.isi.edu/in-notes/rfc3222.txt">RFC
3222</ulink> for further details.
</para>
</footnote>.
In practical terms, the concept of longest prefix match means that the
most specific route to the destination will be chosen.
</para>
<anchor id="routing-selection-lpm"/>
<indexterm zone="routing-selection-lpm">
<primary>longest prefix match</primary>
<see>route selection, longest prefix match</see>
</indexterm>
<indexterm zone="routing-selection-lpm">
<primary>route selection</primary>
<secondary>longest prefix match</secondary>
</indexterm>
<para>
The use of the
longest prefix match allows routes for large networks to be
overridden by more specific host or network routes, as required in
<xref linkend="ex-basic-del-static"/>, for example. Conversely, it is
this same property of longest prefix match which allows routes to
individual destinations to be aggregated into larger network
addresses. Instead of entering individual routes for each host, large
numbers of contiguous network addresses can be aggregated. This is
the realized promise of CIDR networking. See
<xref linkend="links-general-ip"/> for further details.
</para>
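<para>
Longest prefix match can be sketched in a few lines of Python. The
routes below are those shown for &isolde; in
<xref linkend="ex-routing-local"/>, with the default route written out
as 0.0.0.0/0. The function name
<function>longest_prefix_match</function> is an assumption of this
illustration, and the linear search is chosen for clarity; it is not a
description of the data structures the kernel uses.
</para>
<programlisting>
```python
import ipaddress

# isolde's routing table as printed by "ip route show" in this chapter.
ROUTES = [
    ("192.168.100.0/24", "dev eth0 scope link"),
    ("127.0.0.0/8", "dev lo scope link"),
    ("0.0.0.0/0", "via 192.168.100.254 dev eth0"),  # the default route
]


def longest_prefix_match(destination, routes):
    """Return the (prefix, route) pair with the longest matching prefix."""
    dest = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(prefix), route)
        for prefix, route in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        return None  # no route at all, not even a default route
    network, route = max(candidates, key=lambda item: item[0].prefixlen)
    return str(network), route


print(longest_prefix_match("192.168.100.32", ROUTES))
print(longest_prefix_match("205.254.211.17", ROUTES))

# A hypothetical host route, added only to show a more specific route
# overriding the network route for a single destination.
with_host_route = ROUTES + [("192.168.100.32/32", "via 192.168.100.1 dev eth0")]
print(longest_prefix_match("192.168.100.32", with_host_route))
```
</programlisting>
<para>
Because 0.0.0.0/0 has a prefix length of zero, the default route matches
every destination but loses to any other matching route, which is why it
behaves as the catch-all.
</para>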
<para>
In the common case, route selection is based completely on the
destination address. Conventional (as opposed to policy-based) IP
networking relies on only the destination address to select a route
for a packet.
</para>
<para>
Because the majority of linux systems have no need of policy
based routing
features, they use the conventional routing technique of longest
prefix match. While this meets a large subset of
linux networking needs, the policy routing features of
a machine operating in this fashion go unused.
</para>
</section>
<section id="routing-selection-adv">
<title>The Whole Story</title>
<para>
With the prevalence of low cost bandwidth, easily configured VPN
tunnels, and increasing reliance on networks, the technique of
selecting a route based solely on the destination IP address range no
longer suffices for all situations.
The discussion of the common case
of route selection under linux neglects one
of the most powerful features in the linux IP stack.
Since kernel 2.2, linux has
supported policy based routing through the use of
<link linkend="routing-tables">multiple routing tables</link> and the
<link linkend="routing-rpdb">routing policy database (RPDB)</link>.
Together, they allow a network
administrator to configure a machine to select different routing
tables and routes based on a number of criteria.
</para>
<para>
Selectors available for use in policy-based routing are
attributes of a packet
passing through the linux routing code. The source address of a
packet, the ToS flags, an fwmark (a mark carried through the kernel in
the data structure representing the packet), and the interface name on
which the packet was received are attributes which can be used as
selectors. By selecting a routing table based
on packet attributes, an administrator can have
granular control over the network path of any packet.
</para>
<para>
With this knowledge of the RPDB and multiple
routing tables, let's revisit in detail the method by which the
kernel selects the proper route for a packet. Understanding
the series of steps the kernel takes for route selection should
demystify advanced routing. In fact, advanced routing could more
accurately be called policy-based networking.
</para>
<para>
When determining the route by which to send a packet, the kernel always
<link linkend="routing-cache">consults the routing cache first</link>.
The routing cache is a hash table used for quick access to recently
used routes. If the kernel finds an entry in the routing cache, the
corresponding entry will be used. If there is no entry in the
routing cache, the kernel begins the process of route selection. For
details on the method of matching a route in the routing cache, see
<xref linkend="routing-cache"/>.
</para>
<para>
The kernel begins iterating by priority through the routing policy
database. For each matching entry in the RPDB, the kernel will try to
find a matching route to the destination IP
address in the specified routing table using the aforementioned
longest prefix match selection algorithm. When a matching destination
is found, the kernel will select the matching route, and forward the
packet. If no matching entry is found in the specified routing table,
the kernel will pass to the next rule in the RPDB, until it finds a
match or falls through the end of the RPDB and all consulted routing
tables.
</para>
<para>
Here is a snippet of python-esque pseudocode to illustrate the
kernel's route selection process again. Each of the lookups below
occurs in kernel hash tables which are accessible to the user through
the use of various &iproute2; tools.
<indexterm zone="routing-selection-algorithm">
<primary>route selection</primary>
<secondary>algorithm</secondary>
</indexterm>
<example id="routing-selection-algorithm">
<title>Routing Selection Algorithm in Pseudo-code</title>
<programlisting>
if packet.routeCacheLookupKey in routeCache :
    route = routeCache[ packet.routeCacheLookupKey ]
else :
    for rule in rpdb :
        if packet.rpdbLookupKey in rule :
            routeTable = rule[ lookupTable ]
            if packet.routeLookupKey in routeTable :
                route = routeTable[ packet.routeLookupKey ]
                break
</programlisting>
</example>
<!--
I don't know if this is correct! Need to learn about how the routing
cache is populated with information. 2003-02-05
route_cache[ packet.routeCacheLookupKey ] = route
-->
This pseudocode provides some explanation of the decisions
required to find a route. The final piece of information
required to understand the decision making process is the set of
keys used in each of the three hash table lookups. In
<xref linkend="tb-routing-selection-adv"/>, each key is listed in order
of importance. Optional keys are listed in italics and represent keys
that will be matched if they are present.
</para>
<indexterm zone="tb-routing-selection-adv">
<primary>route selection</primary>
<secondary>lookup keys</secondary>
</indexterm>
<table id="tb-routing-selection-adv">
<title>Keys used for hash table lookups during route selection</title>
<tgroup cols="3" align="center" colsep="1" rowsep="1">
<thead>
<row>
<entry>route cache</entry>
<entry>RPDB</entry>
<entry>route table</entry>
</row>
</thead>
<tbody>
<row>
<entry>destination</entry>
<entry>source</entry>
<entry>destination</entry>
</row>
<row>
<entry>source</entry>
<entry><emphasis>destination</emphasis></entry>
<entry><emphasis>ToS</emphasis></entry>
</row>
<row>
<entry><emphasis>ToS</emphasis></entry>
<entry><emphasis>ToS</emphasis></entry>
<entry><emphasis><link linkend="tb-tools-ip-addr-scope">scope</link></emphasis></entry>
</row>
<row>
<entry><emphasis>fwmark</emphasis></entry>
<entry><emphasis>fwmark</emphasis></entry>
<entry><emphasis>oif</emphasis></entry>
</row>
<row>
<entry><emphasis>iif</emphasis></entry>
<entry><emphasis>iif</emphasis></entry>
<entry></entry>
</row>
</tbody>
</tgroup>
</table>
<para>
The route cache (also known as the forwarding information base) can be
displayed using
<link linkend="tools-ip-route-show-cache"><command>ip route show
cache</command></link>. The routing policy database (RPDB) can be
manipulated with the
<link linkend="tools-ip-rule"><command>ip rule</command></link>
utility. Individual route tables can be manipulated and displayed
with the
<link linkend="tools-ip-route"><command>ip route</command></link>
command line tool.
</para>
<example id="ex-routing-selection-adv-ip-rule">
<title>Listing the Routing Policy Database (RPDB)</title>
<programlisting>
<prompt>[root@isolde]# </prompt><userinput>ip rule show</userinput>
<computeroutput>0: from all lookup local
32766: from all lookup main
32767: from all lookup 253</computeroutput>
</programlisting>
</example>
<para>
Observation of the output of <command>ip rule show</command> in
<xref linkend="ex-routing-selection-adv-ip-rule"/>
on a box whose RPDB has not been changed should reveal a
high priority rule, rule 0. This rule, created at RPDB
initialization, instructs the kernel to try to find a match for the
destination in the
<link linkend="routing-table-local">local routing table</link>. If
there is no match for the packet in the local routing table, then,
per rule 32766, the kernel will perform a route lookup in the
main routing table. Normally, the main routing table will contain a
default route if not a more specific route.
Failing a route lookup in the main routing table, the final rule
(32767) instructs the kernel to perform a route lookup in table 253.
</para>
<!--
FIXME; include an XREF here to the State vs Statless discussion
-->
<para>
A common mistake when working with multiple routing tables involves
forgetting about the statelessness of IP routing. This manifests when
the user configuring the policy routing machine accounts for outbound
packets (via &fwmark;, or <command>ip rule</command>
selectors), but forgets to account for the return packets.
</para>
</section>
<section id="routing-selection-summary">
<title>Summary</title>
<para>
For more ideas on how to use policy routing, how to work with
multiple routing tables, and how to troubleshoot, see
<xref linkend="adv-rpdb"/>.
</para>
</section>
</section>
<section id="routing-saddr-selection">
<title>Source Address Selection</title>
<indexterm zone="routing-saddr-selection">
<primary>source address selection</primary>
<seealso>route selection</seealso>
</indexterm>
<para>
The selection of the correct source address is key to correct
communication between hosts with multiple IP addresses. If a host
chooses an address from a private network to communicate with a public
Internet host, it is likely that the return half of the communication
will never arrive.
</para>
<para>
The initial source address for an outbound packet is chosen according
to the following series of rules. The application can request a
particular IP
<footnote>
<para>
Many networking applications accept a command line option to prefer
a particular source address. The call to select a particular
IP is known as <function>bind()</function>, so the command
line option frequently
contains the word <wordasword>bind</wordasword>, e.g.,
<option>--bind-address</option>.
Examples of command line tools allowing specification of the source
address are <command>nc -s $BINDADDR $DEST $PORT</command> or
<command>socat -
TCP4:$REMOTEHOST:$REMOTEPORT,bind=$BINDADDR</command>.
</para>
</footnote>,
the kernel will use the &src; hint from the chosen
route path
<footnote>
<para>
In this case, the route has already been selected (see
<xref linkend="routing-selection"/>) and the chosen route entry
includes a hint for preferred source address on outbound packets
specifically for this purpose. For examples on configuring the
routing tables to include this parameter, see
<xref linkend="ex-tools-ip-route-add-src"/>.
</para>
</footnote>,
or, lacking this hint, the kernel will choose the first address
configured on the interface which falls in the same network as the
destination address or the nexthop router.
</para>
<para>
The following list recapitulates the manner by which the kernel
determines the source address of an outbound packet.
</para>
<itemizedlist>
<listitem>
<para>
The application is already using the socket, in which case, the
source address has been chosen. Also, the application can
specifically request a particular address (not necessarily a
locally hosted IP; see
<xref linkend="adv-nonlocal-bind"/>) using the
<function>bind</function> call.
</para>
</listitem>
<listitem>
<para>
The kernel performs a
<link linkend="routing-selection">route lookup</link> and finds an
outbound route for the destination. If the route contains the
&src; parameter, the kernel selects this IP
address for the outbound packet.
</para>
</listitem>
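<listitem>
<para>
Lacking a &src; hint on the route, the kernel chooses the first address
configured on the outbound interface which falls in the same network as
the destination address or the nexthop router.
</para>
</listitem>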
</itemizedlist>
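<para>
The order of these checks can be sketched as follows. This is a
simplified model under stated assumptions, not the kernel's code: the
function <function>choose_source</function>, the dictionary layout of
the route and the choice to test the destination before the nexthop
are all hypothetical.
</para>
<programlisting><![CDATA[
```python
import ipaddress

def choose_source(bound_addr, route, interface_addrs):
    """Pick a source address following the rules listed above.

    bound_addr      -- address requested by the application via bind(), or None
    route           -- dict with 'dst', an optional 'src' hint, an optional 'via'
    interface_addrs -- addresses on the outbound interface, in configured
                       order, written as 'address/prefixlen'
    """
    # The application already chose an address.
    if bound_addr is not None:
        return bound_addr
    # The selected route carries a preferred source hint.
    if route.get("src"):
        return route["src"]
    # Otherwise take the first interface address sharing a network with
    # the destination or, failing that, with the nexthop router.
    # (Trying the destination first is an assumption of this sketch.)
    for target in (route["dst"], route.get("via")):
        if target is None:
            continue
        target_ip = ipaddress.ip_address(target)
        for addr in interface_addrs:
            iface = ipaddress.ip_interface(addr)
            if target_ip in iface.network:
                return str(iface.ip)
    return None

# Hypothetical interface configuration and routes.
addrs = ["10.10.20.89/24", "192.168.254.254/24"]
via_gateway = {"dst": "203.0.113.9", "via": "192.168.254.1"}
with_hint = {"dst": "203.0.113.9", "via": "192.168.254.1", "src": "10.10.20.89"}

print(choose_source(None, via_gateway, addrs))
print(choose_source(None, with_hint, addrs))
print(choose_source("10.10.20.89", via_gateway, addrs))
```
]]></programlisting>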
<para>
Also refer to this
<ulink url="http://linux-ip.net/gl/ip-cref/node155.html">excerpt</ulink>
from the &iproute2; command reference.
</para>
</section>
<section id="routing-cache">
<title>Routing Cache</title>
<indexterm zone="routing-cache">
<primary>routing cache</primary>
</indexterm>
<indexterm zone="routing-cache">
<primary>forwarding information base</primary>
<see>routing cache</see>
</indexterm>
<para>
The routing cache is also known as the forwarding information base (FIB).
This term may be familiar to users of other routing systems.
</para>
<para>
The routing cache stores recently used routing entries in a fast and
convenient hash lookup table, and is consulted before the routing
tables. If the kernel finds a matching entry during route cache lookup,
it will forward the packet immediately and stop traversing the routing
tables.
</para>
<para>
Because the routing cache is maintained by the kernel separately from
the routing tables, manipulating the routing tables may not have an
immediate effect on the kernel's choice of path for a given packet.
To avoid a non-deterministic lag between the time that a new route
is entered into the kernel routing tables and the time that a new lookup
in those route tables is performed, use
<link linkend="tools-ip-route-flush-cache"><command>ip route flush
cache</command></link>. Once the route cache has been emptied, new
route lookups (if not by a packet, then manually with
<link linkend="tools-ip-route-get"><command>ip route
get</command></link>) will result in a new lookup to the kernel routing
tables.
</para>
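<para>
The lag described above can be modelled with a small dictionary-backed
cache. This is a toy sketch only: the class
<classname>RouteCache</classname>, its methods and the single-entry
table are hypothetical, and the real cache is considerably more
elaborate.
</para>
<programlisting><![CDATA[
```python
class RouteCache:
    """Toy model: a dict consulted before a slower routing table lookup."""

    def __init__(self, lookup_tables):
        self._lookup_tables = lookup_tables  # callable doing the full lookup
        self._entries = {}

    @staticmethod
    def key(dst, src, tos=0, fwmark=0, iif=None):
        # dst and src are always part of the key; the rest are optional.
        return (dst, src, tos, fwmark, iif)

    def get(self, dst, src, **optional):
        k = self.key(dst, src, **optional)
        if k not in self._entries:  # cache miss: consult the tables
            self._entries[k] = self._lookup_tables(dst)
        return self._entries[k]

    def flush(self):
        """Rough analogue of `ip route flush cache`."""
        self._entries.clear()

# Hypothetical routing table holding only a default route.
table = {"default": "via 10.1.0.1"}
cache = RouteCache(lambda dst: table["default"])

first = cache.get("198.51.100.7", "192.168.0.10")
table["default"] = "via 10.2.0.1"  # the administrator changes the route
stale = cache.get("198.51.100.7", "192.168.0.10")
cache.flush()
fresh = cache.get("198.51.100.7", "192.168.0.10")
print(first, stale, fresh, sep=" | ")
```
]]></programlisting>
<para>
In this model the second lookup still returns the old gateway, and only
the lookup after the flush sees the changed route.
</para>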
<para>
The following is a listing of the hash lookup keys
in the routing cache and a description of each key. Compare this list
with the elements identified in
<xref linkend="tb-routing-selection-adv"/>.
</para>
<variablelist id="list-routing-cache-lookup-keys">
<varlistentry id="list-routing-cache-lookup-keys-dst">
<term>dst</term>
<term>Destination Address</term>
<listitem>
<indexterm zone="list-routing-cache-lookup-keys-dst">
<primary>routing cache</primary>
<secondary>lookup keys</secondary>
<tertiary>dst</tertiary>
</indexterm>
<para>
The destination IP address of the packet. This is the destination
address on the packet at the time of the route lookup. The address
is a host address. All 32 bits are significant during this lookup.
</para>
</listitem>
</varlistentry>
<varlistentry id="list-routing-cache-lookup-keys-src">
<term>src</term>
<term>Source Address</term>
<listitem>
<indexterm zone="list-routing-cache-lookup-keys-src">
<primary>routing cache</primary>
<secondary>lookup keys</secondary>
<tertiary>src</tertiary>
</indexterm>
<para>
The source IP address of the packet. This is the source address
on the packet at the time of the route lookup. The address is a
host address. All 32 bits are significant during this lookup.
</para>
</listitem>
</varlistentry>
<varlistentry id="list-routing-cache-lookup-keys-tos">
<term>tos</term>
<term>Type of Service</term>
<listitem>
<indexterm zone="list-routing-cache-lookup-keys-tos">
<primary>routing cache</primary>
<secondary>lookup keys</secondary>
<tertiary>tos</tertiary>
</indexterm>
<para>
The ToS marking on the packet. If there is no ToS marking on the
packet (tos == 0), this lookup key is unused. If there is a ToS
marking, the kernel will search for a match with this ToS value.
If no matching (dst, src, tos) is found, the kernel will continue
the search for a route by traversing the RPDB.
</para>
</listitem>
</varlistentry>
<varlistentry id="list-routing-cache-lookup-keys-fwmark">
<term>fwmark</term>
<listitem>
<indexterm zone="list-routing-cache-lookup-keys-fwmark">
<primary>routing cache</primary>
<secondary>lookup keys</secondary>
<tertiary>fwmark</tertiary>
</indexterm>
<para>
The mark on a packet added administratively by the packet
filtering engine (<command>ipchains</command> or
<command>iptables</command>).
This mark is not part of the physical IP packet, and only exists
as part of the data structure held in memory on the routing device
to represent the IP
packet. If there is no fwmark on the packet, this lookup key is
unused. When present, the kernel will search for a matching
(dst, src, tos?, fwmark) entry. If no matching entry is found,
the kernel will continue the search for a route by traversing the
RPDB.
</para>
</listitem>
</varlistentry>
<varlistentry id="list-routing-cache-lookup-keys-iif">
<term>iif</term>
<term>inbound interface</term>
<listitem>
<indexterm zone="list-routing-cache-lookup-keys-iif">
<primary>routing cache</primary>
<secondary>lookup keys</secondary>
<tertiary>iif</tertiary>
</indexterm>
<para>
The name of the interface on which the packet arrived.
</para>
</listitem>
</varlistentry>
</variablelist>
<para>
The following attributes may be stored for each entry in the routing
cache.
</para>
<variablelist id="list-routing-cache-attrs">
<varlistentry id="list-routing-cache-attrs-cwnd">
<term>cwnd</term>
<term>Congestion Window</term>
<listitem>
<indexterm zone="list-routing-cache-attrs-cwnd">
<primary>routing cache</primary>
<secondary>attributes</secondary>
<tertiary>cwnd</tertiary>
</indexterm>
<para>
The clamp (upper bound) on the TCP congestion window for connections
using this route.
</para>
</listitem>
</varlistentry>
<varlistentry id="list-routing-cache-attrs-advmss">
<term>advmss</term>
<term>Advertised Maximum Segment Size</term>
<listitem>
<indexterm zone="list-routing-cache-attrs-advmss">
<primary>routing cache</primary>
<secondary>attributes</secondary>
<tertiary>advmss</tertiary>
</indexterm>
<para>
</para>
</listitem>
</varlistentry>
<varlistentry id="list-routing-cache-attrs-src">
<term>src</term>
<term>(Preferred Local) Source Address</term>
<listitem>
<indexterm zone="list-routing-cache-attrs-src">
<primary>routing cache</primary>
<secondary>attributes</secondary>
<tertiary>src</tertiary>
</indexterm>
<para>
</para>
</listitem>
</varlistentry>
<varlistentry id="list-routing-cache-attrs-mtu">
<term>mtu</term>
<term>Maximum Transmission Unit</term>
<listitem>
<indexterm zone="list-routing-cache-attrs-mtu">
<primary>routing cache</primary>
<secondary>attributes</secondary>
<tertiary>mtu</tertiary>
</indexterm>
<para>
</para>
</listitem>
</varlistentry>
<varlistentry id="list-routing-cache-attrs-rtt">
<term>rtt</term>
<term>Round Trip Time</term>
<listitem>
<indexterm zone="list-routing-cache-attrs-rtt">
<primary>routing cache</primary>
<secondary>attributes</secondary>
<tertiary>rtt</tertiary>
</indexterm>
<para>
</para>
</listitem>
</varlistentry>
<varlistentry id="list-routing-cache-attrs-rttvar">
<term>rttvar</term>
<term>Round Trip Time Variation</term>
<listitem>
<indexterm zone="list-routing-cache-attrs-rttvar">
<primary>routing cache</primary>
<secondary>attributes</secondary>
<tertiary>rttvar</tertiary>
</indexterm>
<para>
The estimate of the variance in round trip time for traffic using this
route.
</para>
</listitem>
</varlistentry>
<varlistentry id="list-routing-cache-attrs-age">
<term>age</term>
<listitem>
<indexterm zone="list-routing-cache-attrs-age">
<primary>routing cache</primary>
<secondary>attributes</secondary>
<tertiary>age</tertiary>
</indexterm>
<para>
</para>
</listitem>
</varlistentry>
<varlistentry id="list-routing-cache-attrs-users">
<term>users</term>
<listitem>
<indexterm zone="list-routing-cache-attrs-users">
<primary>routing cache</primary>
<secondary>attributes</secondary>
<tertiary>users</tertiary>
</indexterm>
<para>
</para>
</listitem>
</varlistentry>
<varlistentry id="list-routing-cache-attrs-used">
<term>used</term>
<listitem>
<indexterm zone="list-routing-cache-attrs-used">
<primary>routing cache</primary>
<secondary>attributes</secondary>
<tertiary>used</tertiary>
</indexterm>
<para>
</para>
</listitem>
</varlistentry>
</variablelist>
<para>
Collectively the hash keys uniquely identify routes in the forwarding
information base (routing cache) and each entry provides attributes of
the route.
</para>
</section>
<section id="routing-tables">
<title>Routing Tables</title>
<indexterm zone="routing-tables">
<primary>routing tables</primary>
<secondary>multiple</secondary>
</indexterm>
<para>
Linux kernels 2.2 and 2.4 support multiple routing tables
<footnote>
<para>
The kernel must be compiled with the option
<constant>CONFIG_IP_MULTIPLE_TABLES=y</constant>. This is common
in vendor and stock kernels, both 2.2 and 2.4.
</para>
</footnote>.
Beyond the two commonly used routing tables
(<link linkend="routing-table-local">the local</link> and
<link linkend="routing-table-main">main</link> routing tables), the
kernel supports up to 252 additional routing tables.
</para>
<para>
The multiple routing table system provides a flexible infrastructure on
top of which to implement policy routing. By allowing multiple
traditional routing tables (keyed primarily to destination address)
to be combined with the
<link linkend="routing-rpdb">routing policy database (RPDB)</link>
(keyed primarily to source address), the
kernel supports a well-known and well-understood interface while
simultaneously expanding and extending its routing capabilities.
Each routing table still operates in the traditional and expected
fashion. Linux simply allows you to choose from a
number of routing tables, and to traverse routing tables in a
user-definable sequence until a matching route is found.
</para>
<anchor id="routing-tables-keys"/>
<indexterm zone="routing-tables-keys">
<primary>routing tables</primary>
<secondary>key fields</secondary>
</indexterm>
<para>
Any given routing table can contain an arbitrary number of entries,
each of which is keyed on the following characteristics (cf.
<xref linkend="tb-routing-selection-adv"/>):
<itemizedlist>
<listitem>
<para>
destination address; a network or host address (primary key)
</para>
</listitem>
<listitem>
<para>
tos; Type of Service
</para>
</listitem>
<listitem>
<para>
<link linkend="tb-tools-ip-addr-scope">scope</link>
</para>
</listitem>
<listitem>
<para>
output interface
</para>
</listitem>
</itemizedlist>
</para>
<para>
For practical purposes, this means that (even) a single routing table can
contain multiple routes to the same destination if the ToS differs
on each route or if the route applies to a different interface
<footnote>
<para>
If somebody has used scope or oif as additional keys in a routing
table, and has an example, I'd love to see it, for possible
inclusion in this documentation.
</para>
</footnote>.
</para>
<para>
Kernels supporting multiple routing tables refer to routing tables by
unique integer slots between 0 and 255
<footnote>
<para>
Can anybody describe to me what is in table 0? It looks almost like
an aggregation of the routing entries in routing tables 254 and 255.
</para>
</footnote>.
The two routing tables normally employed are
<link linkend="routing-table-local">table 255, the
&local; routing table</link>, and
<link linkend="routing-table-main">table 254, the
&main; routing table</link>. For
examples of using multiple routing tables, see
<xref linkend="ch-advanced"/>, in particular,
<xref linkend="ex-adv-multi-internet-outbound-ip-routing"/>,
<xref linkend="ex-adv-multi-internet-outbound-ip-rule"/> and
<xref linkend="ex-adv-multi-internet-inbound"/>. Also be sure
to read
<xref linkend="adv-rpdb"/> and
<xref linkend="routing-rpdb"/>.
</para>
<para>
The <command>ip route</command> and <command>ip rule</command> commands
have built-in support for the special tables &main; and &local;.
Any other routing table can be referred to by number or by a name
defined in an administratively maintained mapping file,
<filename>/etc/iproute2/rt_tables</filename>.
</para>
<para>
The format of this file is extraordinarily simple. Each line represents
one mapping of an arbitrary string to an integer. Comments are allowed.
</para>
<example id="ex-routing-tables-rt-table">
<title>Typical content of
<filename>/etc/iproute2/rt_tables</filename></title>
<programlisting>
<computeroutput>#
# reserved values
#
255 local <co id="id-rtrt-local" linkends="id-rtrt-local-text"/>
254 main <co id="id-rtrt-main" linkends="id-rtrt-main-text"/>
253 default <co id="id-rtrt-default" linkends="id-rtrt-default-text"/>
0 unspec <co id="id-rtrt-unspec" linkends="id-rtrt-unspec-text"/>
#
# local
#
1 inr.ruhep <co id="id-rtrt-user" linkends="id-rtrt-user-text"/></computeroutput>
</programlisting>
<calloutlist>
<callout
arearefs="id-rtrt-local"
id="id-rtrt-local-text">
<simpara>
The &local; table is a special routing table maintained by the
kernel. Users can remove entries from the local routing table
at their own risk. Users cannot add entries to the local
routing table. The file
<filename>/etc/iproute2/rt_tables</filename> need not exist, as
the &iproute2; tools have a hard-coded entry for the &local;
table.
</simpara>
</callout>
<callout
arearefs="id-rtrt-main"
id="id-rtrt-main-text">
<simpara>
The main routing table is the table operated upon by
<command>route</command> and, when not otherwise specified, by
<command>ip route</command>. The file
<filename>/etc/iproute2/rt_tables</filename> need not exist, as
the &iproute2; tools have a hard-coded entry for the &main;
table.
</simpara>
</callout>
<callout
arearefs="id-rtrt-default"
id="id-rtrt-default-text">
<simpara>
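The <constant>default</constant> routing table is another
special routing table. It is normally empty, and is consulted by
the last of the default RPDB rules (priority 32767) for any packet
that no earlier rule has routed.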
</simpara>
</callout>
<callout
arearefs="id-rtrt-unspec"
id="id-rtrt-unspec-text">
<simpara>
Operating on the <constant>unspec</constant> routing table
(table 0) appears to operate on all routing tables
simultaneously. Listing it, for example, appears to show the
entries from every table at once.
</simpara>
</callout>
<callout
arearefs="id-rtrt-user"
id="id-rtrt-user-text">
<simpara>
This is an example indicating that table 1 is known by the name
inr.ruhep. Any reference to <userinput>table
inr.ruhep</userinput> in an <command>ip rule</command>
or <command>ip route</command> command will substitute the
value 1 for the word inr.ruhep.
</simpara>
</callout>
</calloutlist>
</example>
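<para>
The mapping file format shown above is simple enough to parse in a few
lines. The following Python sketch is illustrative only and is not how
the &iproute2; tools read the file: the function names are hypothetical,
and it assumes decimal table numbers and whitespace-separated fields.
</para>
<programlisting><![CDATA[
```python
def parse_rt_tables(text):
    """Map table names to integer ids, ignoring comments and blank lines."""
    names = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or incomplete line
        number, name = fields[:2]
        names[name] = int(number)
    return names

def resolve_table(ref, names):
    """Accept either a table number or a name from the mapping."""
    return int(ref) if ref.isdigit() else names[ref]

sample = """\
#
# reserved values
#
255 local
254 main
253 default
0 unspec
#
# local
#
1 inr.ruhep
"""
names = parse_rt_tables(sample)
print(resolve_table("inr.ruhep", names), resolve_table("254", names))
```
]]></programlisting>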
<para>
The routing table manipulated by the conventional
<link linkend="tools-route"><command>route</command></link> command
is the &main; routing table. Additionally, the use of both
<link linkend="tools-ip-address"><command>ip address</command></link> and
<link linkend="tools-ifconfig"><command>ifconfig</command></link>
will cause the kernel to alter the local routing table (and usually the
main routing table). For further documentation on how to manipulate
the other routing tables, see the command description of
<link linkend="tools-ip-route"><command>ip route</command></link>.
</para>
<section id="routing-table-entries">
<title>Routing Table Entries (Routes)</title>
<indexterm zone="routing-tables-keys">
<primary>route types</primary>
<see>routing tables, entry types</see>
</indexterm>
<indexterm zone="routing-tables-keys">
<primary>routing tables</primary>
<secondary>entry types</secondary>
</indexterm>
<para>
Each routing table can contain an arbitrary number of route entries.
Aside from the
<link linkend="routing-table-local">local routing table</link>, which
is maintained by the kernel, and the
<link linkend="routing-table-main">main routing table</link> which is
partially maintained by the kernel,
all routing tables are controlled by the administrator or routing
software. All routes on a machine can be changed or removed
<footnote>
<para>
Once again, I recommend caution when altering the local routing
table. Removing local route types from the local routing table
can break networking in strange and wonderful ways.
</para>
</footnote>.
</para>
<para>
Each of the following route types is available for use with
the <command>ip route</command> command. Each route type causes a
particular sort of behaviour, which is identified in the textual
description. Compare the route types described below with the
<link linkend="list-routing-rule-types">rule types</link> available
for use in the RPDB.
</para>
<variablelist id="list-routing-route-types">
<varlistentry id="list-routing-route-types-unicast">
<term>unicast</term>
<listitem>
<indexterm zone="list-routing-route-types-unicast">
<primary>routing tables</primary>
<secondary>entry types</secondary>
<tertiary>unicast</tertiary>
</indexterm>
<para>
A unicast route is the most common route in routing tables.
This is a typical route to a destination network address, which
describes the path to the destination. Even complex routes,
such as nexthop routes, are considered unicast routes. If no
route type is specified on the command line, the route is
assumed to be a unicast route.
</para>
<example id="ex-list-route-unicast">
<title>unicast route types</title>
<programlisting>
<userinput>ip route add unicast 192.168.0.0/24 via 192.168.100.5</userinput>
<userinput>ip route add default via 193.7.255.1</userinput>
<userinput>ip route add unicast default via 206.59.29.193</userinput>
<userinput>ip route add 10.40.0.0/16 via 10.72.75.254</userinput>
</programlisting>
</example>
</listitem>
</varlistentry>
<varlistentry id="list-routing-route-types-broadcast">
<term>broadcast</term>
<listitem>
<indexterm zone="list-routing-route-types-broadcast">
<primary>routing tables</primary>
<secondary>entry types</secondary>
<tertiary>broadcast</tertiary>
</indexterm>
<para>
This route type is used for link layer devices (such as Ethernet
cards) which support the notion of a broadcast address. This
route type is used only in the local routing table
<footnote>
<para>
OK, I'm not absolutely sure you can't use the broadcast
route in other routing tables, but I believe you can't.
Testing forthcoming...
</para>
</footnote>
and is typically handled by the kernel.
</para>
<example id="ex-list-route-broadcast">
<title>broadcast route types</title>
<programlisting>
<userinput>ip route add table local broadcast 10.10.20.255 dev eth0 proto kernel scope link src 10.10.20.67</userinput>
<userinput>ip route add table local broadcast 192.168.43.31 dev eth4 proto kernel scope link src 192.168.43.14</userinput>
</programlisting>
</example>
</listitem>
</varlistentry>
<varlistentry id="list-routing-route-types-local">
<term>local</term>
<listitem>
<indexterm zone="list-routing-route-types-local">
<primary>routing tables</primary>
<secondary>entry types</secondary>
<tertiary>local</tertiary>
</indexterm>
<para>
The kernel will add entries into the local routing table when
IP addresses are added to an interface. This means that the IPs
are locally hosted IPs
<footnote>
<para>
Ibid. I'm not sure that local route types can be used
in any routing table other than the local routing table.
Testing forthcoming...
</para>
</footnote>.
</para>
<example id="ex-list-route-local">
<title>local route types</title>
<programlisting>
<userinput>ip route add table local local 10.10.20.64 dev eth0 proto kernel scope host src 10.10.20.67</userinput>
<userinput>ip route add table local local 192.168.43.12 dev eth4 proto kernel scope host src 192.168.43.14</userinput>
</programlisting>
</example>
</listitem>
</varlistentry>
<varlistentry id="list-routing-route-types-nat">
<term>nat</term>
<listitem>
<indexterm zone="list-routing-route-types-nat">
<primary>routing tables</primary>
<secondary>entry types</secondary>
<tertiary>nat</tertiary>
</indexterm>
<para>
This route entry is added by the kernel in the local routing
table, when the user attempts to configure stateless NAT. See
<xref linkend="nat-stateless"/> for a fuller discussion of
network address translation in general
<footnote>
<para>
Ibid. nat route types might be ineffectual outside
the local routing table. Testing forthcoming...
</para>
</footnote>.
</para>
<example id="ex-list-route-nat">
<title>nat route types</title>
<programlisting>
<userinput>ip route add nat 193.7.255.184 via 172.16.82.184</userinput>
<userinput>ip route add nat 10.40.0.0/16 via 172.40.0.0</userinput>
</programlisting>
</example>
</listitem>
</varlistentry>
<varlistentry id="list-routing-route-types-unreachable">
<term>unreachable</term>
<listitem>
<indexterm zone="list-routing-route-types-unreachable">
<primary>routing tables</primary>
<secondary>entry types</secondary>
<tertiary>unreachable</tertiary>
</indexterm>
<para>
When a request for a routing decision returns a destination
with an unreachable route type, an ICMP unreachable is
generated and returned to the source address.
</para>
<example id="ex-list-route-unreachable">
<title>unreachable route types</title>
<programlisting>
<userinput>ip route add unreachable 172.16.82.184</userinput>
<userinput>ip route add unreachable 192.168.14.0/26</userinput>
<userinput>ip route add unreachable 209.10.26.51</userinput>
</programlisting>
</example>
</listitem>
</varlistentry>
<varlistentry id="list-routing-route-types-prohibit">
<term>prohibit</term>
<listitem>
<indexterm zone="list-routing-route-types-prohibit">
<primary>routing tables</primary>
<secondary>entry types</secondary>
<tertiary>prohibit</tertiary>
</indexterm>
<para>
When a request for a routing decision returns a destination with
a prohibit route type, the kernel generates an ICMP prohibited
to return to the source address.
</para>
<example id="ex-list-route-prohibit">
<title>prohibit route types</title>
<programlisting>
<userinput>ip route add prohibit 10.21.82.157</userinput>
<userinput>ip route add prohibit 172.28.113.0/28</userinput>
<userinput>ip route add prohibit 209.10.26.51</userinput>
</programlisting>
</example>
</listitem>
</varlistentry>
<varlistentry id="list-routing-route-types-blackhole">
<term>blackhole</term>
<listitem>
<indexterm zone="list-routing-route-types-blackhole">
<primary>routing tables</primary>
<secondary>entry types</secondary>
<tertiary>blackhole</tertiary>
</indexterm>
<para>
A packet matching a route with the route type blackhole is
discarded. No ICMP is sent and no packet is forwarded.
</para>
<example id="ex-list-route-blackhole">
<title>blackhole route types</title>
<programlisting>
<userinput>ip route add blackhole default</userinput>
<userinput>ip route add blackhole 202.143.170.0/24</userinput>
<userinput>ip route add blackhole 64.65.64.0/18</userinput>
</programlisting>
</example>
</listitem>
</varlistentry>
<varlistentry id="list-routing-route-types-throw">
<term>throw</term>
<listitem>
<indexterm zone="list-routing-route-types-throw">
<primary>routing tables</primary>
<secondary>entry types</secondary>
<tertiary>throw</tertiary>
</indexterm>
<para>
The throw route type is a convenient route type which causes
a route lookup in a routing table to fail, returning the
<link linkend="routing-selection-adv">routing selection
process</link> to the RPDB. This is useful when there are
additional routing tables. Note that there is an implicit throw
if no default route exists in a routing table, so the route
created by the first command in the example is superfluous,
although legal.
</para>
<example id="ex-list-route-throw">
<title>throw route types</title>
<programlisting>
<userinput>ip route add throw default</userinput>
<userinput>ip route add throw 10.79.0.0/16</userinput>
<userinput>ip route add throw 172.16.0.0/12</userinput>
</programlisting>
</example>
</listitem>
</varlistentry>
</variablelist>
<para>
The power of these route types when combined with the
<link linkend="routing-rpdb">routing policy database</link> can hardly
be overstated. All of these route types can be used without the
RPDB, although the throw route doesn't make much sense outside of a
multiple routing table installation.
</para>
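<para>
The behaviour of the route types described above, and in particular the
way a throw route hands the lookup back to the RPDB, can be sketched as
follows. This is a simplified model: the names are hypothetical, the
local-table route types are omitted, and destinations are matched by
exact key with a fallback to <constant>default</constant> rather than
by longest prefix.
</para>
<programlisting><![CDATA[
```python
# What each route type does once it has matched a packet.
# The second field says whether the lookup returns to the RPDB.
ROUTE_TYPE_ACTIONS = {
    "unicast": ("forward", False),
    "unreachable": ("send ICMP unreachable", False),
    "prohibit": ("send ICMP prohibited", False),
    "blackhole": ("drop silently", False),
    "throw": (None, True),
}

def route_packet(tables_in_rule_order, dst_key):
    """Try each table in turn; throw or no match moves on to the next one."""
    for table in tables_in_rule_order:
        route_type = table.get(dst_key, table.get("default"))
        if route_type is None:
            continue  # nothing matched: behaves like an implicit throw
        action, back_to_rpdb = ROUTE_TYPE_ACTIONS[route_type]
        if not back_to_rpdb:
            return action
    return "no route found"

# Hypothetical pair of tables, consulted in this order by the RPDB.
tables = [
    {"10.79.0.0/16": "throw", "default": "blackhole"},
    {"default": "unicast"},
]
print(route_packet(tables, "10.79.0.0/16"))
print(route_packet(tables, "202.143.170.0/24"))
```
]]></programlisting>
<para>
Here the throw route exempts 10.79.0.0/16 from the blackhole default in
the first table, so that destination is forwarded by the second table
while everything else is discarded.
</para>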
</section>
<section id="routing-table-local">
<title>The Local Routing Table</title>
<indexterm zone="routing-table-local">
<primary>local routing table</primary>
<see>routing tables, local</see>
</indexterm>
<indexterm zone="routing-table-local">
<primary>routing tables</primary>
<secondary>local</secondary>
</indexterm>
<para>
The local routing table is maintained by the kernel. Normally, the
local routing table should not be manipulated,
but it is available for viewing. In
<xref linkend="ex-tools-ip-route-show-local"/>, you'll see two of the
common uses of the local routing table. The first common use is the
specification of broadcast addresses, necessary only for link layers
which support broadcast addressing. The second common type of entry
in a local routing table is a route to a locally hosted IP.
</para>
<para>
The route types found in the local routing table
are <constant>local</constant>, <constant>nat</constant> and
<constant>broadcast</constant>. These route types are not relevant in
other routing tables, and other route types cannot be used in the
local routing table.
</para>
<para>
If the machine has several IP addresses on one Ethernet interface,
there will be a route to each locally hosted IP in the local routing
table. This is a normal
<link linkend="list-basic-ifconfig-side-effects-up">side effect</link>
of bringing up an IP address on an interface under linux.
Maintenance of the broadcast and local routes in the local routing
table can only be done by the kernel.
</para>
<example id="ex-routing-table-local-maint">
<title>Kernel maintenance of the &local; routing table</title>
<programlisting>
<prompt>[root@real-server]# </prompt><userinput>ip address show dev eth1</userinput>
<computeroutput>6: eth1: &lt;BROADCAST,MULTICAST,UP&gt; mtu 1500 qdisc pfifo_fast qlen 100
link/ether 00:80:c8:e8:1e:fc brd ff:ff:ff:ff:ff:ff
inet 10.10.20.89/24 brd 10.10.20.255 scope global eth1</computeroutput>
<prompt>[root@real-server]# </prompt><userinput>ip route show dev eth1</userinput>
<computeroutput>10.10.20.0/24 proto kernel scope link src 10.10.20.89</computeroutput>
<prompt>[root@real-server]# </prompt><userinput>ip route show dev eth1 table local</userinput>
<computeroutput>broadcast 10.10.20.0 proto kernel scope link src 10.10.20.89
broadcast 10.10.20.255 proto kernel scope link src 10.10.20.89
local 10.10.20.89 proto kernel scope host src 10.10.20.89</computeroutput>
<prompt>[root@real-server]# </prompt><userinput>ip address add 192.168.254.254/24 brd + dev eth1</userinput>
<prompt>[root@real-server]# </prompt><userinput>ip address show dev eth1</userinput>
<computeroutput>6: eth1: &lt;BROADCAST,MULTICAST,UP&gt; mtu 1500 qdisc pfifo_fast qlen 100
link/ether 00:80:c8:e8:1e:fc brd ff:ff:ff:ff:ff:ff
inet 10.10.20.89/24 brd 10.10.20.255 scope global eth1
inet 192.168.254.254/24 brd 192.168.254.255 scope global eth1</computeroutput>
<prompt>[root@real-server]# </prompt><userinput>ip route show dev eth1</userinput>
<computeroutput>10.10.20.0/24 proto kernel scope link src 10.10.20.89
192.168.254.0/24 proto kernel scope link src 192.168.254.254</computeroutput>
<prompt>[root@real-server]# </prompt><userinput>ip route show dev eth1 table local</userinput>
<computeroutput>broadcast 10.10.20.0 proto kernel scope link src 10.10.20.89
broadcast 192.168.254.0 proto kernel scope link src 192.168.254.254
broadcast 10.10.20.255 proto kernel scope link src 10.10.20.89
local 192.168.254.254 proto kernel scope host src 192.168.254.254
local 10.10.20.89 proto kernel scope host src 10.10.20.89
broadcast 192.168.254.255 proto kernel scope link src 192.168.254.254</computeroutput>
</programlisting>
</example>
<para>
Note in
<xref linkend="ex-routing-table-local-maint"/> that the kernel adds
not only the route for the locally connected network in the &main;
routing table, but also the three required special addresses in the
&local; routing table. Any IP address which is locally hosted on
the box will have a &local; entry in the &local; table. The
<link linkend="list-routing-intro-ipdefs-netaddr">network
address</link> and
<link linkend="list-routing-intro-ipdefs-bcast">broadcast
address</link> are both entered as <constant>broadcast</constant> type
addresses on the interface to which they have been bound.
Conceptually, there is a distinction between a network address and a
broadcast address, but in practice both the linux kernel and other
networking gear treat them alike.
</para>
<para>
There is one other type of route which commonly ends up in the &local;
routing table. When using &iproute2; NAT, there will
be entries in the local routing table for each network address
translation. Refer to
<xref linkend="ex-tools-ip-route-nat-simple"/> and
<xref linkend="ex-tools-ip-route-nat-network"/> for example output.
</para>
</section>
<section id="routing-table-main">
<title>The Main Routing Table</title>
<indexterm zone="routing-table-main">
<primary>main routing table</primary>
<see>routing tables, main</see>
</indexterm>
<indexterm zone="routing-table-main">
<primary>routing tables</primary>
<secondary>main</secondary>
</indexterm>
<para>
The &main; routing table is the routing table most people think of when
considering a linux routing table. When no table is specified to an
<command>ip route</command> command, the kernel assumes the &main;
routing table. The <command>route</command> command only manipulates
the &main; routing table.
</para>
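<para>
As a quick illustration of this default, and assuming a system with
both &iproute2; and the traditional <command>route</command> utility
installed, the first two commands below should list the same set of
routes, and the third should show the same routes in the older tabular
format. The host name in the prompt is only a placeholder borrowed
from the example network.
</para>
<example id="ex-routing-table-main-default">
<title>Viewing the &main; routing table (sketch)</title>
<programlisting>
<prompt>[root@tristan]# </prompt><userinput>ip route show</userinput>
<prompt>[root@tristan]# </prompt><userinput>ip route show table main</userinput>
<prompt>[root@tristan]# </prompt><userinput>route -n</userinput>
</programlisting>
</example>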
<para>
Like the &local; table, the &main; table is populated
automatically by the kernel when new interfaces are brought up
with IP addresses. Consult the &main; routing table before and after
<userinput>ip address add 192.168.254.254/24 brd + dev eth1</userinput>
in
<xref linkend="ex-routing-table-local-maint"/> for a concrete example
of this kernel behaviour. Also, visit
<link linkend="list-basic-ifconfig-side-effects-up">this summary of
side effects</link> of interface definition and activation with
<command>ifconfig</command> or <command>ip address</command>.
</para>
</section>
</section>
<section id="routing-rpdb">
<title>Routing Policy Database (RPDB)</title>
<indexterm zone="routing-rpdb">
<primary>routing policy database</primary>
<see>RPDB</see>
</indexterm>
<indexterm zone="routing-rpdb">
<primary>RPDB</primary>
</indexterm>
<para>
The routing policy database (RPDB) controls the order in which the
kernel searches through the routing tables. Each rule has a priority,
and rules are examined sequentially from rule 0 through rule 32767.
</para>
<para>
When a new packet arrives for routing (assuming the routing cache
holds no entry for it), the kernel begins at the highest priority rule
in the RPDB, rule 0. The kernel iterates over each rule in turn until
the packet to be routed matches a rule. When this happens, the kernel
follows the instructions in that rule. Typically, this causes the
kernel to perform a route lookup in a specified routing table. If a
matching route is found in the routing table, the kernel uses that
route. If no such route is found, the kernel returns to the RPDB and
continues with the next rule, until a route has been found or every
rule has been exhausted.
</para>
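<para>
To make the traversal described above concrete, the following sketch
shows what <command>ip rule show</command> typically prints on a
machine where no policy routing has been configured. The exact output
may differ between kernel and &iproute2; versions, and the host name
in the prompt is only a placeholder. With these three rules, a lookup
first consults the &local; table, then the &main; table, and finally
the (usually empty) <constant>default</constant> table.
</para>
<example id="ex-routing-rpdb-default-rules">
<title>Typical default contents of the RPDB (sketch)</title>
<programlisting>
<prompt>[root@tristan]# </prompt><userinput>ip rule show</userinput>
<computeroutput>0:	from all lookup local
32766:	from all lookup main
32767:	from all lookup default</computeroutput>
</programlisting>
</example>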
<para>
The priority-based rule system provides a flexible way to define routes
while taking advantage of the traditional routing table concept.
For a complete picture of the entire route selection process including
the RPDB, see
<link linkend="routing-selection-adv">the section on routing
selection</link>.
</para>
<para>
There are a number of different rule types available for use in the
routing policy database. These rule types have a striking similarity to
the
<link linkend="list-routing-route-types">route types</link> available
for route entries.
</para>
<variablelist id="list-routing-rule-types">
<varlistentry id="list-routing-rule-types-unicast">
<term>unicast</term>
<listitem>
<indexterm zone="list-routing-rule-types-unicast">
<primary>RPDB</primary>
<secondary>entry types</secondary>
<tertiary>unicast</tertiary>
</indexterm>
<para>
A unicast rule entry is the most common rule type. This rule type
simply causes the kernel to refer to the specified routing table
in the search for a route. If no rule type is specified on the
command line, the rule is assumed to be a unicast rule.
</para>
<example id="ex-list-rule-unicast">
<title>unicast rule type</title>
<programlisting>
<userinput>ip rule add unicast from 192.168.100.17 table 5</userinput>
<userinput>ip rule add unicast iif eth7 table 5</userinput>
<userinput>ip rule add unicast fwmark 4 table 4</userinput>
</programlisting>
</example>
</listitem>
</varlistentry>
<varlistentry id="list-routing-rule-types-nat">
<term>nat</term>
<listitem>
<indexterm zone="list-routing-rule-types-nat">
<primary>RPDB</primary>
<secondary>entry types</secondary>
<tertiary>nat</tertiary>
</indexterm>
<para>
The nat rule type is required for correct operation of stateless
NAT. This rule is typically coupled with a corresponding nat
route entry. The RPDB nat entry causes the kernel to rewrite the
source address of an outbound packet. See
<xref linkend="nat-stateless"/> for a fuller discussion of network
address translation in general.
</para>
<example id="ex-list-rule-nat">
<title>nat rule type</title>
<programlisting>
<userinput>ip rule add nat 193.7.255.184 from 172.16.82.184</userinput>
<userinput>ip rule add nat 10.40.0.0 from 172.40.0.0/16</userinput>
</programlisting>
</example>
</listitem>
</varlistentry>
<varlistentry id="list-routing-rule-types-unreachable">
<term>unreachable</term>
<listitem>
<indexterm zone="list-routing-rule-types-unreachable">
<primary>RPDB</primary>
<secondary>entry types</secondary>
<tertiary>unreachable</tertiary>
</indexterm>
<para>
Any route lookup matching a rule entry with an unreachable rule
type will cause the kernel to send an ICMP unreachable message to
the source address of the packet.
</para>
<example id="ex-list-rule-unreachable">
<title>unreachable rule type</title>
<programlisting>
<userinput>ip rule add unreachable iif eth2 tos 0xc0</userinput>
<userinput>ip rule add unreachable iif wan0 fwmark 5</userinput>
<userinput>ip rule add unreachable from 192.168.7.0/25</userinput>
</programlisting>
</example>
</listitem>
</varlistentry>
<varlistentry id="list-routing-rule-types-prohibit">
<term>prohibit</term>
<listitem>
<indexterm zone="list-routing-rule-types-prohibit">
<primary>RPDB</primary>
<secondary>entry types</secondary>
<tertiary>prohibit</tertiary>
</indexterm>
<para>
Any route lookup matching a rule entry with a prohibit rule type
will cause the kernel to send an ICMP prohibited message to the
source address of the packet.
</para>
<example id="ex-list-rule-prohibit">
<title>prohibit rule type</title>
<programlisting>
<userinput>ip rule add prohibit from 209.10.26.51</userinput>
<userinput>ip rule add prohibit to 64.65.64.0/18</userinput>
<userinput>ip rule add prohibit fwmark 7</userinput>
</programlisting>
</example>
</listitem>
</varlistentry>
<varlistentry id="list-routing-rule-types-blackhole">
<term>blackhole</term>
<listitem>
<indexterm zone="list-routing-rule-types-blackhole">
<primary>RPDB</primary>
<secondary>entry types</secondary>
<tertiary>blackhole</tertiary>
</indexterm>
<para>
While traversing the RPDB, any route lookup which matches a rule
with the blackhole rule type will cause the packet to be dropped.
No ICMP will be sent and no packet will be forwarded.
</para>
<example id="ex-list-rule-blackhole">
<title>blackhole rule type</title>
<programlisting>
<userinput>ip rule add blackhole from 209.10.26.51</userinput>
<userinput>ip rule add blackhole from 172.19.40.0/24</userinput>
<userinput>ip rule add blackhole to 10.182.17.64/28</userinput>
</programlisting>
</example>
</listitem>
</varlistentry>
</variablelist>
<para>
The routing policy database provides the core of functionality around
which the policy routing and advanced routing features can be built.
</para>
</section>
<section id="routing-icmp">
<title>ICMP and Routing</title>
<!--
#
# content to add here:
#
# ignoring ICMP redirects (sysctl or pf); side effects, how to
# sending ICMP redirects
# suppressing generation of ICMP redirects (sysctl); how to
#
-->
<para>
ICMP is a very important part of the communication between hosts on
IP networks. Used by routers and endpoints (clients and servers),
ICMP communicates error conditions in networks and
provides a means for endpoints to receive information
about a network path or requested connection.
</para>
<para>
One of the most common uses of ICMP by the administrator of a network
is the use of
<link linkend="tools-ping"><command>ping</command></link> to detect the
state of a machine in the network. There are other types of ICMP which
are used for other inter-computer communication. One other common type
of ICMP is the ICMP returned by a router or host which is not accepting
connections. Essentially, the host returns the ICMP as a polite method
of saying <quote>Go away.</quote>
</para>
<section id="routing-icmp-mtu">
<title>MTU, MSS, and ICMP</title>
<!--
#
# content to add here:
#
# discuss path MTU, remove MSS; discuss necessary ICMP
# for communication with other hosts; xref pf-necessary-icmp
#
-->
<para>
One important use of ICMP, which is completely transparent
to most users (and indeed many admins), is the use of ICMP to discover
the Path Maximum Transmission Unit (PMTU). By discovering the Path MTU
and transmitting packets no larger than this MTU, a host can
minimize the delay of traffic due to fragmentation, and
(theoretically) attain a more even rate of data transmission. Because
each destination may have a different MTU due to different network
paths, the MTU is a per-route attribute stored in the
<link linkend="routing-cache">routing cache</link>.
</para>
<!-- FIXME; make sure to make a full discussion of PMTU -->
<!--
Example from Giovanni Quadriglio. Needs to be incorporated into the
document.
As usual I've forgotten the PMTU example
- - Example PMTU - playing with Path MTU Discovering
eth = 0 1 0 0
- - - - - - - - - - - -
|server| - - - |router| - - - |client|
- - - - - - - - - - - -
MTU = 1500 1000 1500 1500
[root@server]# nc -l -p 9999
[root@router]# ifconfig eth1 mtu 1000
Now if on router we issue:
[root@client]# tcpdump -i eth0
and later on client we issue:
[root@client]# cat data | nc server 9999
(data is a file of 2000 byte in size for example)
we can see router sends the client the ICMP error:
server unreachable - need to frag but DF bit set (mtu=1000) !
now if PMTU discovery is enabled on client the new packet len. will be
recalculated with this new MTU in mind so that DF is always set
and the packet will reach server without being fragmented
if on client we had issued:
[root@client]# sysctl -w net.ipv4.ip_no_pmtu_disc=1
PMTU discovery on client would have been disabled. New packets starting from
client
will not have DF bit set and fragmentation will occur during the
path from client to server (i.e. router fragments the packet).
It may be necessary to change this parameter because of bad ICMP filtering
on some router.
-->
<para>
Path MTU discovery can be quite easily broken if any single hop along
the way blocks all ICMP. Be sure to allow ICMP
unreachable/fragmentation needed packets into and out of your network.
This will prevent you from being one of the unclueful network admins
who cause PMTU problems.
</para>
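<para>
As one possible way to follow this advice, and assuming the packet
filter in use is <command>iptables</command> with its built-in
<constant>INPUT</constant>, <constant>OUTPUT</constant> and
<constant>FORWARD</constant> chains, the sketch below accepts the ICMP
<quote>fragmentation needed</quote> message in each chain. This is
only an illustration, not a complete firewall. Because the rules are
appended, they will have no effect if an earlier rule in the same
chain already drops ICMP, so adapt the placement to your own ruleset.
</para>
<example id="ex-routing-icmp-mtu-allow-fragneeded">
<title>Allowing ICMP fragmentation needed messages (sketch)</title>
<programlisting>
<prompt>[root@masq-gw]# </prompt><userinput>iptables -A INPUT -p icmp --icmp-type fragmentation-needed -j ACCEPT</userinput>
<prompt>[root@masq-gw]# </prompt><userinput>iptables -A OUTPUT -p icmp --icmp-type fragmentation-needed -j ACCEPT</userinput>
<prompt>[root@masq-gw]# </prompt><userinput>iptables -A FORWARD -p icmp --icmp-type fragmentation-needed -j ACCEPT</userinput>
</programlisting>
</example>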
<!-- FIXME; XREF link to minimum firewall for ICMP -->
</section>
<section id="routing-icmp-redirect">
<title>ICMP Redirects and Routing</title>
<para>
An ICMP redirect is a router's way of communicating
that there is a better path out of this network or into another one
than the one the host had chosen. In
<link linkend="example-network-netmap">the example network</link>,
&tristan; has a route to the world through &masq-gw; and a route to
192.168.98.0/24 through &isdn-router;. If &tristan; sends a packet
for 192.168.98.0/24 to &masq-gw;, the optimal outcome is for
&masq-gw; to suggest with an ICMP redirect that &tristan; send such
packets via &isdn-router; instead.
</para>
<para>
By this method, hosts can learn what networks are reachable
through which routers on the local network segment. ICMP redirect
messages, however, are easy to forge, and were (at one time) used to
subvert poorly configured machines. While this is infrequently a
problem on the Internet today,
it's still good practice to ignore ICMP redirect
messages from public networks. Create static routes where
necessary on private and public networks to
prevent ICMP redirect messages from being generated on your network.
</para>
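<para>
A minimal sketch of how redirects might be ignored and suppressed with
<command>sysctl</command> follows. The interface name
<constant>eth0</constant> is an assumption made for illustration.
The way the kernel combines the <constant>all</constant> setting with
the per-interface setting differs between the two keys and depends on
whether forwarding is enabled, so this sketch sets both and should be
checked against the kernel documentation for your version.
</para>
<example id="ex-routing-icmp-redirect-sysctl">
<title>Ignoring and suppressing ICMP redirects with sysctl (sketch)</title>
<programlisting>
<prompt>[root@masq-gw]# </prompt><userinput>sysctl -w net.ipv4.conf.all.accept_redirects=0</userinput>
<prompt>[root@masq-gw]# </prompt><userinput>sysctl -w net.ipv4.conf.eth0.accept_redirects=0</userinput>
<prompt>[root@masq-gw]# </prompt><userinput>sysctl -w net.ipv4.conf.all.send_redirects=0</userinput>
<prompt>[root@masq-gw]# </prompt><userinput>sysctl -w net.ipv4.conf.eth0.send_redirects=0</userinput>
</programlisting>
</example>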
<para>
To examine an example of ICMP redirect in action, we simply
need to send a packet directly from &tristan; to
&morgan;. We assume that &masq-gw; has a route to 192.168.98.0/24
via 192.168.99.1 (&isdn-router;), and that &tristan; has no
such route.
</para>
<example id="ex-routing-icmp-redirect">
<title>ICMP Redirect on the Wire
<footnote>
<para>
Consult <xref linkend="tb-example-network-hosts"/> for details on
the IP and MAC addresses of the hosts referred to in this
example.
</para>
</footnote>
</title>
<programlisting>
<prompt>[root@tristan]# </prompt><userinput>echo test | nc 192.168.98.82 22</userinput>
<prompt>[root@tristan]# </prompt><userinput>tcpdump -nneqti eth0</userinput>
<computeroutput>0:80:c8:f8:4a:51 0:80:c8:f8:5c:71 74: 192.168.99.35.54510 > 192.168.98.82.22: tcp 0 (DF)
0:80:c8:f8:5c:71 0:80:c8:f8:4a:51 102: 192.168.99.254 > 192.168.99.35: icmp: redirect 192.168.98.82 to host 192.168.99.1 [tos 0xc0]
0:80:c8:f8:5c:71 0:c0:7b:45:6a:39 74: 192.168.99.35.54510 > 192.168.98.82.22: tcp 0 (DF)</computeroutput>
</programlisting>
</example>
<para>
There's a great deal of information above, so let's examine the
important parts. We have the first three packets which passed by our
NIC as a result of this attempt to establish a session. First, we see
a packet from &tristan; bound for &morgan; with &tristan;'s source MAC
and &masq-gw;'s destination MAC. Because &masq-gw; is &tristan;'s
default gateway, &tristan; will send all packets there.
</para>
<para>
The next packet is the ICMP redirect, informing &tristan; of a
better route. It includes several pieces of information.
Implicitly, the source IP indicates what router is suggesting the
alternate route, and the contents specify what the intended
destination was, and what the better route is. Note that &masq-gw;
suggests using 192.168.99.1 (&isdn-router;) as the gateway for this
destination.
</para>
<para>
The final packet is part of the intended session, but has the MAC
address of &masq-gw; as its source. &masq-gw; has (courteously) informed us
that we should not use it as a route for the intended destination, but
has also (courteously) forwarded the packet as we had requested. In
this small network, it is acceptable to allow ICMP redirect messages,
although these should always be dropped at network borders, both
inbound and outbound.
</para>
<para>
So, in summary, ICMP redirect messages are not intrinsically dangerous
or problematic, but they shouldn't exist in well-maintained networks.
If you happen to see them growing in the shadows of your network, some
careful observation should show you what hosts are affected and which
routing tables could use some attention.
</para>
</section>
</section>
</chapter>