mirror of https://github.com/tLDP/LDP
2254 lines
97 KiB
XML
<!-- $Id$ -->
|
|
|
|
<chapter id="ch-routing">
|
|
<title>IP Routing</title>
|
|
<indexterm zone="ch-routing">
|
|
<primary>IP Routing</primary>
|
|
<see>routing</see>
|
|
</indexterm>
|
|
<indexterm zone="ch-routing">
|
|
<primary>Routing</primary>
|
|
</indexterm>
|
|
<para>
|
|
Routing is fundamental to the design of the Internet Protocol. IP
|
|
routing has been cleverly designed to minimize the complexity for leaf
|
|
nodes and networks. Linux can be used as a leaf node, such as a
|
|
workstation, where setting the IP address, netmask and
|
|
default gateway suffices for all routing needs. Alternatively, the same
|
|
routing subsystem can be used in the core of a network connecting
|
|
multiple public and private networks.
|
|
</para>
|
|
<para>
|
|
This chapter will begin with the
|
|
<link linkend="routing-intro">basics of IP routing with linux</link>,
|
|
<link linkend="routing-local">routing to locally connected
|
|
destinations</link>,
|
|
<link linkend="routing-default">routing to destinations through the
|
|
default gateway</link>, and
|
|
<link linkend="routing-forwarding">using linux as a router</link>.
|
|
Subsequent topics will include
|
|
<link linkend="routing-selection">the kernel's route selection
|
|
algorithm</link>, the
|
|
<link linkend="routing-cache">routing cache</link>,
|
|
<link linkend="routing-tables">routing tables</link>, the
|
|
<link linkend="routing-rpdb">routing policy database</link>, and
|
|
<link linkend="routing-icmp">issues with ICMP and routing</link>.
|
|
</para>
|
|
<para>
|
|
The scope of this documentation is primarily static routing. Though
dynamic routing is important to large networks, Internet service
providers, and backbone providers, this documentation is targeted at
smaller networks, particularly networks which use static routing.
Nonetheless, the concepts governing the manipulation of a packet in the
kernel and the way the kernel makes routing decisions are applicable to
dynamic routing environments.
|
|
</para>
|
|
<para>
|
|
The linux routing subsystem has been designed with large
|
|
scale networks in mind, without forgetting the need for easy
|
|
configurability for leaf nodes, such as workstations and servers.
|
|
</para>
|
|
<section id="routing-intro">
|
|
<title>Introduction to Linux Routing</title>
|
|
<para>
|
|
The design of IP routing allows for very simple route
|
|
definitions for small networks, while not hindering the flexibility of
|
|
routing in complex environments. A key concept in IP routing is
the distinction between addresses which are locally reachable and
destinations which are not directly known. Every IP capable host knows
about at least three classes of destination: itself, locally connected
computers and everywhere else.
|
|
</para>
|
|
<para>
|
|
Most fully-featured IP-aware networked operating systems
|
|
(all unix-like operating systems with IP stacks,
|
|
modern Macintoshes, and modern Windows) include support for the loopback
device and its IP address. This is an IP address and range configured on
the host machine itself, which allows the machine to talk to itself. Linux systems can
|
|
communicate over IP on any locally configured IP address, whether on the
|
|
loopback device or not. This is the first class of destinations:
|
|
locally hosted addresses.
|
|
</para>
|
|
<para>
|
|
The second class of IP addresses consists of addresses in the locally
|
|
connected network segment. Each machine with a connection to an IP
|
|
network can reach a subset of the entire IP address space on its
|
|
directly connected network interface.
|
|
</para>
|
|
<para>
|
|
All other hosts or destination IPs fall into a third range. Any IP
|
|
which is not on the machine itself or locally reachable (i.e. connected
|
|
to the same media segment) is only reachable through an IP routing
|
|
device. This routing device must have an IP address in a locally
|
|
reachable IP address range.
|
|
</para>
|
|
<para>
|
|
All IP networking is a permutation of these three fundamental concepts
|
|
of reachability. This list summarizes the three possible
|
|
classifications for reachability of destination IP addresses from any
|
|
single source machine.
|
|
</para>
|
|
<anchor id="list-routing-intro"/>
|
|
<orderedlist>
|
|
<listitem>
|
|
<para>
|
|
The IP address is reachable on the machine itself. Under linux
|
|
this is considered
|
|
<link linkend="tb-tools-ip-addr-scope">scope host</link> and is used
|
|
for IPs bound to any network device including loopback devices,
|
|
and the network range for the loopback device. Addresses of this
|
|
nature are called local IPs or locally hosted IPs.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The IP address is reachable on the directly connected link layer
|
|
medium. Addresses of this type are called locally reachable or
|
|
(preferred) directly reachable IPs.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The IP address is ultimately reachable through a router which
|
|
is reachable on a directly connected link layer medium. This class
|
|
of IP addresses is only reachable through a gateway.
|
|
</para>
|
|
</listitem>
|
|
</orderedlist>
|
|
<para>
|
|
As a practical description of the above, this partial diagram of the
|
|
<link linkend="ax-example-network">example network</link> shows two
|
|
machines connected to 192.168.99.0/24. On &tristan; the IP addresses
|
|
127.0.0.1 (loopback--not pictured) and 192.168.99.35 are considered
|
|
locally hosted IP addresses. The directly reachable IP addresses fall
|
|
inside the 192.168.99.0/24 network. Any other destination addresses are
|
|
only reachable through a gateway, probably &masq-gw;.
|
|
</para>
|
|
<example id="routing-intro-classes">
|
|
<title>Classes of IP addresses</title>
|
|
<mediaobject id="image-routing-intro">
|
|
<imageobject>
|
|
<imagedata fileref="images/routing-intro.png" format="PNG"/>
|
|
</imageobject>
|
|
<imageobject>
|
|
<imagedata fileref="images/routing-intro.svg" format="SVG"/>
|
|
</imageobject>
|
|
</mediaobject>
|
|
</example>
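<para>
As a rough illustration of the three classifications, the following
Python sketch sorts a destination address into one of them from the
point of view of &tristan;. The addresses come from the example above;
the function name <function>classify</function> and the returned labels
are hypothetical, and the kernel does not implement the decision this
way.
</para>
<programlisting><![CDATA[
```python
import ipaddress

# Addresses configured on the host itself (tristan in the example network).
LOCAL_ADDRESSES = {
    ipaddress.ip_address("127.0.0.1"),
    ipaddress.ip_address("192.168.99.35"),
}
# The network range of the loopback device is also locally hosted.
LOOPBACK_NETWORK = ipaddress.ip_network("127.0.0.0/8")
# Networks reachable on a directly connected link layer medium.
CONNECTED_NETWORKS = [ipaddress.ip_network("192.168.99.0/24")]


def classify(destination):
    """Return which of the three reachability classes a destination falls in."""
    addr = ipaddress.ip_address(destination)
    if addr in LOCAL_ADDRESSES or addr in LOOPBACK_NETWORK:
        return "locally hosted"
    if any(addr in net for net in CONNECTED_NETWORKS):
        return "directly reachable"
    # Everything else must be handed to a router on a connected network.
    return "via gateway"


for dest in ("192.168.99.35", "127.0.0.1", "192.168.99.254", "192.168.98.82"):
    print(dest, "->", classify(dest))
```
]]></programlisting>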
|
|
<para>
|
|
Before examining the routing system in more detail, there are some terms
|
|
to identify and define. These terms are general IP networking terms
|
|
and should be familiar to users who have used IP on other operating
|
|
systems and networking equipment.
|
|
</para>
|
|
<variablelist id="list-routing-intro-ipdefs">
|
|
<varlistentry id="list-routing-intro-ipdefs-octet">
|
|
<term>octet</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-intro-ipdefs-octet">
|
|
<primary>octet</primary>
|
|
<see>IP addressing, octet</see>
|
|
</indexterm>
|
|
<indexterm zone="list-routing-intro-ipdefs-octet">
|
|
<primary>IP addressing</primary>
|
|
<secondary>octet</secondary>
|
|
</indexterm>
|
|
<para>
|
|
A single number between decimal 0 and 255, hexadecimal 0x00 and
|
|
0xff. An octet is a single byte in size.
|
|
</para>
|
|
<para>
|
|
Examples: <emphasis>140</emphasis>, <emphasis>254</emphasis>,
|
|
<emphasis>255</emphasis>, <emphasis>1</emphasis>,
|
|
<emphasis>0</emphasis>, <emphasis>7</emphasis>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-intro-ipdefs-ipaddr">
|
|
<term>IP address</term>
|
|
<term>IP</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-intro-ipdefs-ipaddr">
|
|
<primary>IP address</primary>
|
|
<seealso>IP addressing, address</seealso>
|
|
</indexterm>
|
|
<indexterm zone="list-routing-intro-ipdefs-ipaddr">
|
|
<primary>IP addressing</primary>
|
|
<secondary>address</secondary>
|
|
</indexterm>
|
|
<para>
|
|
A locally unique four
|
|
<link linkend="list-routing-intro-ipdefs-octet">octet</link>
|
|
logical identifier which a machine
|
|
can use to communicate using the Internet Protocol. This
|
|
address is determined by combining the
|
|
<link linkend="list-routing-intro-ipdefs-netaddr">network
|
|
address</link> and the administratively assigned host address.
|
|
Simply put, the IP address is a unique number identifying
|
|
a host on a network.
|
|
</para>
|
|
<para>
|
|
Examples: <emphasis>192.168.99.35</emphasis>,
|
|
<emphasis>140.71.38.7</emphasis>,
|
|
<emphasis>205.254.210.186</emphasis>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-intro-ipdefs-hostaddr">
|
|
<term>host address portion</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-intro-ipdefs-hostaddr">
|
|
<primary>IP addressing</primary>
|
|
<secondary>host address portion</secondary>
|
|
</indexterm>
|
|
<para>
|
|
The rightmost bits (frequently
|
|
<link linkend="list-routing-intro-ipdefs-octet">octets</link>)
|
|
in an
|
|
<link linkend="list-routing-intro-ipdefs-ipaddr">IP
|
|
address</link> which are not a part of the
|
|
<link linkend="list-routing-intro-ipdefs-netaddr">network
|
|
address</link>. The part of an IP address which identifies the
|
|
computer on a network independent of the network.
|
|
</para>
|
|
<para>
|
|
Examples: 192.168.1.<emphasis>27</emphasis>/24,
|
|
10.<emphasis>10.17.24</emphasis>/8,
|
|
172.20.<emphasis>158.75</emphasis>/16.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-intro-ipdefs-netaddr">
|
|
<term>network address</term>
|
|
<term>network</term>
|
|
<term>network prefix</term>
|
|
<term>subnetwork address</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-intro-ipdefs-netaddr">
|
|
<primary>network address</primary>
|
|
<see>IP addressing, network address</see>
|
|
</indexterm>
|
|
<indexterm zone="list-routing-intro-ipdefs-netaddr">
|
|
<primary>IP addressing</primary>
|
|
<secondary>network address</secondary>
|
|
</indexterm>
|
|
<para>
|
|
A four
|
|
<link linkend="list-routing-intro-ipdefs-octet">octet</link>
|
|
address and
|
|
<link linkend="list-routing-intro-ipdefs-netmask">network
|
|
mask</link>
|
|
identifying the usable range of
|
|
<link linkend="list-routing-intro-ipdefs-ipaddr">IP
|
|
addresses</link>. Conventional and CIDR notations combine
|
|
the four bare octets with the netmask or prefix length to
|
|
define this address. Briefly, a network address is the first
|
|
address in a range, and is reserved to identify the entire
|
|
network.
|
|
<footnote>
|
|
<para>
|
|
At least one reader (CAO) has pointed out to me that there is
|
|
ambiguity in the meaning and common usage of the
|
|
term <wordasword>network
|
|
address</wordasword>. While occasionally used to refer to a
|
|
single IP address at the top of a range of addresses, the
|
|
primary meaning requires the implicit
|
|
<link linkend="list-routing-intro-ipdefs-netmask">network
|
|
mask</link>.
|
|
</para>
|
|
<para>
|
|
Historically, this term has always meant the IP address at the
|
|
top of a range AND the netmask identifying the set of
|
|
available addresses. Without this latter piece of
|
|
information, the <wordasword>network address</wordasword> is
|
|
simply an
|
|
<link linkend="list-routing-intro-ipdefs-ipaddr">IP
|
|
address</link>.
|
|
</para>
|
|
<para>
|
|
Technically, the use of this term to mean a single IP
|
|
at the top of the range is incorrect, although not uncommon.
|
|
</para>
|
|
</footnote>
|
|
</para>
|
|
<para>
|
|
Examples: <emphasis>192.168.187.0/24</emphasis>,
|
|
<emphasis>205.254.211.192/26</emphasis>,
|
|
<emphasis>4.20.17.128/255.255.255.248</emphasis>,
|
|
<emphasis>10.0.0.0/255.0.0.0</emphasis>,
|
|
<emphasis>12.35.17.112/28</emphasis>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-intro-ipdefs-netmask">
|
|
<term>network mask</term>
|
|
<term>netmask</term>
|
|
<term>network bitmask</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-intro-ipdefs-netmask">
|
|
<primary>netmask</primary>
|
|
<see>IP addressing, network mask</see>
|
|
</indexterm>
|
|
<indexterm zone="list-routing-intro-ipdefs-netmask">
|
|
<primary>network mask</primary>
|
|
<see>IP addressing, network mask</see>
|
|
</indexterm>
|
|
<indexterm zone="list-routing-intro-ipdefs-netmask">
|
|
<primary>IP addressing</primary>
|
|
<secondary>network mask</secondary>
|
|
</indexterm>
|
|
<para>
|
|
A four-octet set of bits which, when AND'd with a particular
|
|
<link linkend="list-routing-intro-ipdefs-ipaddr">IP
|
|
address</link> produces the
|
|
<link linkend="list-routing-intro-ipdefs-netaddr">network
|
|
address</link>. Combined with a network address or IP address,
|
|
the netmask identifies the range of IP addresses which are
|
|
directly reachable.
|
|
</para>
|
|
<para>
|
|
Examples: <emphasis>255.255.255.0</emphasis>,
|
|
<emphasis>255.255.0.0</emphasis>,
|
|
<emphasis>255.255.192.0</emphasis>,
|
|
<emphasis>255.255.255.224</emphasis>,
|
|
<emphasis>255.0.0.0</emphasis>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-intro-ipdefs-prefix">
|
|
<term>prefix length</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-intro-ipdefs-prefix">
|
|
<primary>prefix length</primary>
|
|
<see>IP addressing, prefix length</see>
|
|
</indexterm>
|
|
<indexterm zone="list-routing-intro-ipdefs-prefix">
|
|
<primary>IP addressing</primary>
|
|
<secondary>prefix length</secondary>
|
|
</indexterm>
|
|
<para>
|
|
An alternate representation of
|
|
<link linkend="list-routing-intro-ipdefs-netmask">network
|
|
mask</link>, this is a single integer between 0 and 32,
|
|
identifying the number of significant bits in an
|
|
<link linkend="list-routing-intro-ipdefs-ipaddr">IP
|
|
address</link> or
|
|
<link linkend="list-routing-intro-ipdefs-netaddr">network
|
|
address</link>. This is the "slash-number" component of a
|
|
CIDR address.
|
|
</para>
|
|
<para>
|
|
Examples: 4.20.17.0<emphasis>/24</emphasis>,
|
|
66.14.17.116<emphasis>/30</emphasis>,
|
|
10.158.42.72<emphasis>/29</emphasis>,
|
|
10.48.7.198<emphasis>/9</emphasis>,
|
|
192.168.154.64<emphasis>/26</emphasis>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-intro-ipdefs-bcast">
|
|
<term>broadcast address</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-intro-ipdefs-bcast">
|
|
<primary>broadcast address (IP)</primary>
|
|
<see>IP addressing, broadcast address</see>
|
|
</indexterm>
|
|
<indexterm zone="list-routing-intro-ipdefs-bcast">
|
|
<primary>IP addressing</primary>
|
|
<secondary>broadcast address</secondary>
|
|
</indexterm>
|
|
<para>
|
|
A four
|
|
<link linkend="list-routing-intro-ipdefs-octet">octet</link>
|
|
address derived by setting every bit of the
<link linkend="list-routing-intro-ipdefs-hostaddr">host address
portion</link> of a
<link linkend="list-routing-intro-ipdefs-netaddr">network
address</link> to 1 (equivalently, an OR operation between the
network address and the bitwise complement of the netmask).
|
|
The broadcast is the highest allowable address in a given network,
|
|
and is reserved for broadcast traffic.
|
|
</para>
|
|
<para>
|
|
Examples: <emphasis>192.168.205.255/24</emphasis>,
|
|
<emphasis>172.18.255.255/16</emphasis>,
|
|
<emphasis>12.7.149.63/26</emphasis>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
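<para>
The relationships among these terms can be sketched with plain integer
arithmetic. The Python fragment below is only an illustration: the
function name <function>describe</function> and the keys of the
returned dictionary are made up for this example, and the sample
address is an arbitrary host inside 12.7.149.0/26.
</para>
<programlisting><![CDATA[
```python
import ipaddress


def describe(address, prefix_length):
    """Derive netmask, network, broadcast and host portion for an IPv4 address."""
    addr = int(ipaddress.IPv4Address(address))
    host_bits = 32 - prefix_length
    wildcard = (1 << host_bits) - 1      # all host bits set to 1
    netmask = 0xFFFFFFFF ^ wildcard      # all network bits set to 1
    network = addr & netmask             # IP address AND netmask
    broadcast = network | wildcard       # network address with host bits set
    host_part = addr & wildcard          # host address portion as an integer
    return {
        "netmask": str(ipaddress.IPv4Address(netmask)),
        "network": str(ipaddress.IPv4Address(network)),
        "broadcast": str(ipaddress.IPv4Address(broadcast)),
        "host_part": host_part,
    }


print(describe("12.7.149.40", 26))
```
]]></programlisting>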
|
|
<para>
|
|
These definitions are common to IP networking in general, and are
|
|
understood by all in the IP networking community. For less terse
|
|
introductory material on matters of IP network addressing in general,
|
|
see
|
|
<xref linkend="links-general-ip"/>.
|
|
</para>
|
|
<para>
|
|
As is apparent from the interdependencies amongst the above
|
|
definitions, each term defines a separate part of the concept of
|
|
the relationships between an IP address and its network. A good
|
|
<link linkend="tools-ipcalc">IP calculator</link> can assist in
|
|
mastering these IP fundamentals.
|
|
</para>
|
|
<example id="ex-routing-intro-ipcalc">
|
|
<title>Using ipcalc to display IP information</title>
|
|
<programlisting>
|
|
<prompt>[user@workstation]$ </prompt><userinput>ipcalc -n 12.7.149.0/26</userinput>
|
|
|
|
Address: 12.7.149.0 00001100.00000111.10010101.00 000000
|
|
Netmask: 255.255.255.192 = 26 11111111.11111111.11111111.11 000000
|
|
Wildcard: 0.0.0.63 00000000.00000000.00000000.00 111111
|
|
=>
|
|
Network: 12.7.149.0/26 00001100.00000111.10010101.00 000000 (Class A)
|
|
Broadcast: 12.7.149.63 00001100.00000111.10010101.00 111111
|
|
HostMin: 12.7.149.1 00001100.00000111.10010101.00 000001
|
|
HostMax: 12.7.149.62 00001100.00000111.10010101.00 111110
|
|
Hosts/Net: 62
|
|
</programlisting>
|
|
</example>
|
|
<para>
|
|
A tool similar to the one shown in
|
|
<xref linkend="ex-routing-intro-ipcalc"/> can assist in visualizing the
|
|
relationships among IP addressing concepts.
|
|
</para>
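<para>
Where no IP calculator is installed, the Python standard library module
<filename>ipaddress</filename> can serve a similar purpose. The sketch
below reports the same quantities for the network used above; the
helper name <function>summarize</function> is hypothetical and the
output layout only loosely imitates that of an IP calculator.
</para>
<programlisting><![CDATA[
```python
import ipaddress


def summarize(cidr):
    """Return the values an IP calculator reports for a network in CIDR form."""
    net = ipaddress.ip_network(cidr)
    hosts = list(net.hosts())            # usable host addresses only
    return {
        "Netmask": str(net.netmask),
        "Wildcard": str(net.hostmask),
        "Network": str(net),
        "Broadcast": str(net.broadcast_address),
        "HostMin": str(hosts[0]),
        "HostMax": str(hosts[-1]),
        "Hosts/Net": len(hosts),
    }


for label, value in summarize("12.7.149.0/26").items():
    print((label + ":").ljust(11) + str(value))
```
]]></programlisting>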
|
|
<para>
|
|
Subsequently, this chapter will introduce some concrete examples of
|
|
routing in a real network. The
|
|
<link linkend="ax-example-network">example network</link> illustrates
|
|
this network and all of the addresses involved.
|
|
</para>
|
|
</section>
|
|
<section id="routing-local">
|
|
<title>Routing to Locally Connected Networks</title>
|
|
<indexterm zone="routing-local">
|
|
<primary>routing</primary>
|
|
<secondary>to locally reachable networks</secondary>
|
|
</indexterm>
|
|
<para>
|
|
Any IP network is defined by two sets of numbers: network address and
|
|
netmask. By convention, there are two ways to represent these two
|
|
numbers. Netmask notation is the convention and tradition in IP
networking, although the more succinct CIDR notation is gaining popularity.
|
|
</para>
|
|
<para>
|
|
In the
|
|
<link linkend="ax-example-network">example network</link>, &isolde; has
|
|
IP address 192.168.100.17.
|
|
In CIDR notation, &isolde;'s address is 192.168.100.17/24, and in
|
|
traditional netmask notation, 192.168.100.17/255.255.255.0.
|
|
Any of the
|
|
<link linkend="tools-ipcalc">IP calculators</link> confirms that the
|
|
first usable IP address is 192.168.100.1 and the last usable IP address
|
|
is 192.168.100.254.
|
|
Importantly, the IP network address, 192.168.100.0/24, is reachable
|
|
through the directly connected Ethernet interface (refer to
|
|
<link linkend="list-routing-intro">classification 2</link>).
|
|
Therefore, &isolde; should be able to reach any IP address in
|
|
this range directly on the locally connected Ethernet segment.
|
|
</para>
|
|
<para>
|
|
Below is the routing table for &isolde;, first shown with the
|
|
conventional <command>route -n</command> output
|
|
<footnote>
|
|
<para>
|
|
The <command>route -n</command> output can also be produced with
|
|
<command>netstat -rn</command> and is commonly used by
|
|
administrators who rely on platform independent behaviour across
|
|
heterogeneous Unix and Unix-like systems. This traditional
|
|
routing table output uses conventional netmask notation to
|
|
denote network size.
|
|
</para>
|
|
</footnote>
|
|
and then with the
|
|
<command>ip route show</command>
|
|
<footnote>
|
|
<para>
|
|
Refer to the
|
|
<link linkend="tools-ip-route"><command>ip route</command></link>
|
|
section for a fuller discussion of this linux specific tool.
|
|
The routing table output from <command>ip route</command> uses
|
|
exclusively CIDR notation.
|
|
</para>
|
|
</footnote>
|
|
command. Each of these tools conveys
|
|
the same routing table and operates on the same kernel routing table.
|
|
For more on the routing table displayed in
|
|
<xref linkend="ex-routing-local"/>, consult
|
|
<xref linkend="routing-table-main"/>.
|
|
</para>
|
|
<example id="ex-routing-local">
|
|
<title>Identifying the locally connected networks with
|
|
<command>route</command></title>
|
|
<programlisting>
|
|
<prompt>[root@isolde]# </prompt><userinput>route -n</userinput>
|
|
<computeroutput>Kernel IP routing table
|
|
Destination Gateway Genmask Flags Metric Ref Use Iface
|
|
192.168.100.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
|
|
127.0.0.0 0.0.0.0 255.0.0.0 U 0 0 0 lo
|
|
0.0.0.0 192.168.100.254 0.0.0.0 UG 0 0 0 eth0</computeroutput>
|
|
<prompt>[root@isolde]# </prompt><userinput>ip route show</userinput>
|
|
<computeroutput>192.168.100.0/24 dev eth0 scope link
|
|
127.0.0.0/8 dev lo scope link
|
|
default via 192.168.100.254 dev eth0</computeroutput>
|
|
</programlisting>
|
|
</example>
|
|
<para>
|
|
In the above example, the locally reachable destination is
|
|
192.168.100.0/255.255.255.0 which can also be written 192.168.100.0/24
|
|
as in <command>ip route show</command>. In classful networking
|
|
terms, the network to which &isolde; is directly connected is called a
|
|
class C sized network.
|
|
</para>
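<para>
The equivalence of the two notations can be checked mechanically. The
following Python sketch converts the destination and netmask columns of
the routing table shown above into CIDR form; the function name
<function>to_cidr</function> is hypothetical, and the rows are copied by
hand rather than parsed from real command output.
</para>
<programlisting><![CDATA[
```python
import ipaddress

# (Destination, Genmask) pairs copied from the "route -n" output above.
ROUTE_N_ROWS = [
    ("192.168.100.0", "255.255.255.0"),
    ("127.0.0.0", "255.0.0.0"),
    ("0.0.0.0", "0.0.0.0"),
]


def to_cidr(destination, genmask):
    """Convert a destination and dotted netmask to CIDR notation."""
    network = ipaddress.ip_network(f"{destination}/{genmask}")
    # "ip route show" prints the 0.0.0.0/0 route as "default".
    return "default" if network.prefixlen == 0 else str(network)


for destination, genmask in ROUTE_N_ROWS:
    print(f"{destination}/{genmask} -> {to_cidr(destination, genmask)}")
```
]]></programlisting>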
|
|
<para>
|
|
When a process on &isolde; needs to send a packet to another
|
|
machine on the locally connected network, packets will be sent from
|
|
192.168.100.17 (&isolde;'s IP). The kernel will consult
|
|
the routing table to determine the route and the source address to use
|
|
when sending this packet.
|
|
Assuming the destination is 192.168.100.32, the kernel will find that
|
|
192.168.100.32 falls inside the IP address range 192.168.100.0/24 and
|
|
will select this route for the outbound packet. For further details on
|
|
source address selection, see
|
|
<xref linkend="routing-saddr-selection"/>. The source address on the
|
|
outbound packet conveys vital information to the host receiving the
|
|
packet. In order for a reply to be able to return, &isolde; has to
use an IP address that is locally available, 192.168.100.32 has to have
a route to &isolde;, and neither host may block the packet.
|
|
</para>
|
|
<para>
|
|
The packet will be sent to the locally connected network segment
|
|
directly, because &isolde; interprets from the routing table
|
|
that 192.168.100.32 is directly reachable through the physical network
|
|
connection on eth0.
|
|
</para>
|
|
<para>
|
|
Occasionally, a machine will be directly connected to two different
|
|
IP networks on the same device.
|
|
The routing table will show that both networks are reachable through
|
|
the same physical device. For more on this topic, see
|
|
<xref linkend="adv-media-share"/>. Similarly, multi-homed hosts will
|
|
have routes for all locally connected networks through the
|
|
locally-connected network interface. For more on this sort of
|
|
configuration, see
|
|
<xref linkend="adv-multi-homed"/>.
|
|
</para>
|
|
<para>
|
|
This covers the classification of IP destinations which are available on
|
|
a locally connected network. This highlights the importance of an
|
|
accurate netmask and network address. The next section will cover
|
|
IP ranges which are neither locally hosted
|
|
nor fall in the range of the locally reachable networks. These
|
|
destinations must be reached through a router.
|
|
</para>
|
|
</section>
|
|
<section id="routing-default">
|
|
<title>Sending Packets Through a Gateway</title>
|
|
<indexterm zone="routing-default">
|
|
<primary>routing</primary>
|
|
<secondary>to a default gateway</secondary>
|
|
</indexterm>
|
|
<para>
|
|
By comparison to the total number of publicly accessible hosts on the
|
|
Internet there is an almost insignificant number of hosts inside any
|
|
locally reachable network. This means that the majority of potential
|
|
destinations are only available via a router.
|
|
</para>
|
|
<para>
|
|
Any machine which will accept and forward packets between two networks
|
|
is a router. Every router is at least dual-homed; one interface
|
|
connects to one network, and a second interface connects to another
|
|
network. This interface is frequently an independent NIC, although it
|
|
might be a virtual interface, such as a VLAN interface. Machines
|
|
connected to either network learn by a routing protocol or are
|
|
statically configured to pass traffic for the other network to the
|
|
router.
|
|
</para>
|
|
<para>
|
|
For &tristan;, there are two different paths out of 192.168.99.0/24.
|
|
One path has another leaf network, 192.168.98.0/24, and the other path
|
|
has many networks, including the Internet. The routing table on
|
|
&tristan; should then contain two different routes out of the network.
|
|
One destination, 192.168.98.0/24, will be reachable through 192.168.99.1.
|
|
So, if &tristan; has a packet with a destination IP address in the range
|
|
of the branch office network, it will choose to send the packet directly
|
|
to &isdn-router;.
|
|
</para>
|
|
<para>
|
|
The default route is another way to say the route for destination 0/0.
|
|
This is the most general possible route.
|
|
It is the catch-all route. If no more specific
|
|
route exists in a routing table, a default route will be used.
|
|
Many servers and workstations are connected to leaf networks
|
|
with only one router, hence
|
|
<xref linkend="ex-routing-local"/>
|
|
shows a very common sort of routing table. There's a route for
|
|
localhost, for the locally connected IP network, and a default
|
|
route.
|
|
</para>
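<para>
A small Python sketch may help show why the default route acts as a
catch-all. Because 0.0.0.0/0 contains every IPv4 address but has a
prefix length of zero, it is chosen only when nothing more specific
matches. The table below loosely models the routes described for
&tristan;; the default gateway address 192.168.99.254 and the function
name <function>select_route</function> are assumptions made for
illustration.
</para>
<programlisting><![CDATA[
```python
import ipaddress

# (prefix, gateway) pairs; a gateway of None means directly connected.
# The default gateway address 192.168.99.254 is an assumption.
ROUTES = [
    ("127.0.0.0/8", None),
    ("192.168.99.0/24", None),
    ("192.168.98.0/24", "192.168.99.1"),
    ("0.0.0.0/0", "192.168.99.254"),
]


def select_route(destination):
    """Return (prefix, gateway) of the most specific route for a destination."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), gateway)
        for prefix, gateway in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    # Every IPv4 address matches 0.0.0.0/0, so matches is never empty here.
    network, gateway = max(matches, key=lambda match: match[0].prefixlen)
    return str(network), gateway


for dest in ("192.168.99.77", "192.168.98.82", "205.254.211.17"):
    print(dest, "->", select_route(dest))
```
]]></programlisting>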
|
|
<para>
|
|
For Internet-connected hosts, the default route is customarily set to
|
|
the IP of the locally reachable router which has a path to the Internet.
|
|
Each router in turn has a default gateway pointing to another
|
|
Internet-connected router until the packet is handed off to an Internet
|
|
Service Provider's network.
|
|
</para>
|
|
</section>
|
|
<section id="routing-forwarding">
|
|
<title>Operating as a Router</title>
|
|
<indexterm zone="routing-forwarding">
|
|
<primary>router</primary>
|
|
<secondary>operating as a</secondary>
|
|
</indexterm>
|
|
<indexterm zone="routing-forwarding">
|
|
<primary>IP forwarding</primary>
|
|
</indexterm>
|
|
<indexterm zone="routing-forwarding">
|
|
<primary>forwarding</primary>
|
|
<see>IP forwarding</see>
|
|
</indexterm>
|
|
<indexterm zone="routing-forwarding">
|
|
<primary>sysctl</primary>
|
|
<secondary><constant>ip_forward</constant></secondary>
|
|
</indexterm>
|
|
<para>
|
|
Operating as a router allows a linux machine to accept packets on one
|
|
interface and transmit them on another. This is the nature of a router.
|
|
The process of accepting and transmitting IP packets is known as
|
|
forwarding. IP forwarding is a requirement for many of the networking
|
|
techniques identified here. Stateless NAT and firewalling, transparent
|
|
proxying and masquerading all require the support of IP forwarding in
|
|
order to function correctly.
|
|
</para>
|
|
<para>
|
|
The sysctl <filename>net/ipv4/ip_forward</filename> toggles the IP
|
|
forwarding functionality on a linux box. Note that setting this sysctl
|
|
alters other routing-related sysctl entries, so it is wise to set this
|
|
first, and then alter other entries.
|
|
Frequently, an administrator will forget this simple and crucial detail
|
|
when configuring a new machine to operate as a router only to be
|
|
frustrated at the simple error.
|
|
</para>
|
|
<para>
|
|
The sysctl <filename>net/ipv4/conf/$DEV/forwarding</filename> defaults to
|
|
the value of <filename>net/ipv4/ip_forward</filename>, but can be
|
|
independently modified. In order to allow forwarding of packets between
|
|
two interfaces while prohibiting such behaviour on a third interface,
|
|
this sysctl can be employed.
|
|
</para>
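<para>
Sysctl entries are exposed as files under
<filename>/proc/sys</filename>, so the current forwarding state can be
inspected from any language. The Python sketch below only reads the
value; the helper names <function>sysctl_path</function> and
<function>read_sysctl</function> are hypothetical. Enabling forwarding
means writing <constant>1</constant> to the same file, which requires
root privileges and is deliberately left out here.
</para>
<programlisting><![CDATA[
```python
from pathlib import Path

PROC_SYS = Path("/proc/sys")


def sysctl_path(name):
    """Map a sysctl name such as net/ipv4/ip_forward to its /proc/sys file."""
    return PROC_SYS / name


def read_sysctl(name):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        return sysctl_path(name).read_text().strip()
    except OSError:
        # Entry missing, not a Linux system, or insufficient permissions.
        return None


print("ip_forward:", read_sysctl("net/ipv4/ip_forward"))
```
]]></programlisting>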
|
|
</section>
|
|
<section id="routing-selection">
|
|
<title>Route Selection</title>
|
|
<indexterm zone="routing-selection">
|
|
<primary>route selection</primary>
|
|
</indexterm>
|
|
<para>
|
|
Crucial to the proper ability of hosts to exchange IP packets is the
|
|
correct selection of a route to the destination. The rules for the
|
|
selection of route path are traditionally made on a
|
|
<!-- note: per-hop-basis? PHB; consider -->
|
|
hop-by-hop basis
|
|
<footnote>
|
|
<para>
|
|
This document could stand to allude to MPLS implementations under
|
|
linux, for those who want to look at traffic engineering and packet
|
|
tagging on backbones. This is certainly not in the scope of this
|
|
chapter, and should be in a separate chapter, which covers
|
|
developing technologies.
|
|
</para>
|
|
</footnote>
|
|
based solely upon the destination address of the packet. Linux
|
|
behaves as a conventional routing device in this way, but can also
|
|
provide a more flexible capability. Routes can be chosen and
|
|
prioritized based on other packet characteristics.
|
|
</para>
|
|
<para>
|
|
The route selection algorithm under linux has been generalized to
|
|
enable the powerful latter scenario without complicating the
|
|
overwhelmingly common case of the former scenario.
|
|
</para>
|
|
<section id="routing-selection-common">
|
|
<title>The Common Case</title>
|
|
<para>
|
|
The above sections on routing to a
|
|
<link linkend="routing-local">local network</link> and
|
|
<link linkend="routing-default">the default gateway</link>
|
|
expose the importance of destination address for route selection.
|
|
In this simplified model, the kernel need only know the destination
|
|
address of the packet, which it compares against the routing tables to
|
|
determine the route by which to send the packet.
|
|
</para>
|
|
<para>
|
|
The kernel searches for a matching entry for the destination first in
|
|
the routing cache and then the main routing table.
|
|
In the case that the machine has recently transmitted a
|
|
packet to the destination address, the
|
|
<link linkend="routing-cache">routing cache</link> will contain an
|
|
entry for the destination. The kernel will select the same route, and
|
|
transmit the packet accordingly.
|
|
</para>
|
|
<para>
|
|
If the linux machine has not recently transmitted a packet to this
|
|
destination address, it will look up the destination in its routing
|
|
table using a technique known as longest prefix match
|
|
<footnote>
|
|
<para>
|
|
Refer to
|
|
<ulink url="http://www.isi.edu/in-notes/rfc3222.txt">RFC
|
|
3222</ulink> for further details.
|
|
</para>
|
|
</footnote>.
|
|
In practical terms, the concept of longest prefix match means that the
|
|
most specific route to the destination will be chosen.
|
|
</para>
|
|
<anchor id="routing-selection-lpm"/>
|
|
<indexterm zone="routing-selection-lpm">
|
|
<primary>longest prefix match</primary>
|
|
<see>route selection, longest prefix match</see>
|
|
</indexterm>
|
|
<indexterm zone="routing-selection-lpm">
|
|
<primary>route selection</primary>
|
|
<secondary>longest prefix match</secondary>
|
|
</indexterm>
|
|
<para>
|
|
The use of the
|
|
longest prefix match allows routes for large networks to be
|
|
overridden by more specific host or network routes, as required in
|
|
<xref linkend="ex-basic-del-static"/>, for example. Conversely, it is
|
|
this same property of longest prefix match which allows routes to
|
|
individual destinations to be aggregated into larger network
|
|
addresses. Instead of entering individual routes for each host, large
|
|
numbers of contiguous network addresses can be aggregated. This is
|
|
the realized promise of CIDR networking. See
|
|
<xref linkend="links-general-ip"/> for further details.
|
|
</para>
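<para>
Both properties can be demonstrated in a few lines of Python. In this
sketch, four contiguous /26 networks are aggregated into a single /24
route, and a host route then overrides the aggregate for one address.
All addresses and gateway names are invented for illustration, and the
linear search in <function>longest_prefix_match</function> (a
hypothetical helper) stands in for the kernel's far more efficient
lookup structures.
</para>
<programlisting><![CDATA[
```python
import ipaddress

# Four contiguous /26 networks aggregate into a single /24.
subnets = [ipaddress.ip_network(f"10.38.7.{start}/26") for start in (0, 64, 128, 192)]
aggregate = list(ipaddress.collapse_addresses(subnets))
print(aggregate)


def longest_prefix_match(routes, destination):
    """Return the value stored for the most specific matching prefix, if any."""
    addr = ipaddress.ip_address(destination)
    candidates = [net for net in routes if addr in net]
    if not candidates:
        return None
    return routes[max(candidates, key=lambda net: net.prefixlen)]


# The host route is more specific than the aggregate, so it wins.
routes = {
    aggregate[0]: "gateway-a",
    ipaddress.ip_network("10.38.7.130/32"): "gateway-b",
}
print(longest_prefix_match(routes, "10.38.7.130"))
print(longest_prefix_match(routes, "10.38.7.131"))
```
]]></programlisting>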
|
|
<para>
|
|
In the common case, route selection is based completely on the
|
|
destination address. Conventional (as opposed to policy-based) IP
|
|
networking relies on only the destination address to select a route
|
|
for a packet.
|
|
</para>
|
|
<para>
|
|
Because the majority of linux systems have no need of policy
|
|
based routing
|
|
features, they use the conventional routing technique of longest
|
|
prefix match. While this satisfies a large subset of
|
|
linux networking needs, there are unrealized policy routing features
|
|
in a machine operating in this fashion.
|
|
</para>
|
|
</section>
|
|
<section id="routing-selection-adv">
|
|
<title>The Whole Story</title>
|
|
<para>
|
|
With the prevalence of low cost bandwidth, easily configured VPN
|
|
tunnels, and increasing reliance on networks, the technique of
|
|
selecting a route based solely on the destination IP address range no
|
|
longer suffices for all situations.
|
|
The discussion of the common case
|
|
of route selection under linux neglects one
|
|
of the most powerful features in the linux IP stack.
|
|
Since kernel 2.2, linux has
|
|
supported policy based routing through the use of
|
|
<link linkend="routing-tables">multiple routing tables</link> and the
|
|
<link linkend="routing-rpdb">routing policy database (RPDB)</link>.
|
|
Together, they allow a network
administrator to configure a machine to select different routing
tables and routes based on a number of criteria.
|
|
</para>
|
|
<para>
|
|
Selectors available for use in policy-based routing are
|
|
attributes of a packet
|
|
passing through the linux routing code. The source address of a
|
|
packet, the ToS flags, an fwmark (a mark carried through the kernel in
|
|
the data structure representing the packet), and the interface name on
|
|
which the packet was received are attributes which can be used as
|
|
selectors. By selecting a routing table based
|
|
on packet attributes, an administrator can have
|
|
granular control over the network path of any packet.
|
|
</para>
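<para>
Each of these selectors corresponds to a keyword accepted by
<command>ip rule add</command>. The following sketch shows one rule per
selector. The table number 100, the source network, the mark value and
the interface name are arbitrary placeholders, and in practice a single
rule often combines several selectors.
</para>
<example id="ex-routing-selection-adv-selectors-sketch">
<title>Sketch: one RPDB rule for each selector</title>
<programlisting>
# source address of the packet
<userinput>ip rule add from 192.0.2.0/24 table 100</userinput>
# ToS flags (0x10 requests minimal delay)
<userinput>ip rule add tos 0x10 table 100</userinput>
# fwmark set earlier by the packet filter
<userinput>ip rule add fwmark 1 table 100</userinput>
# interface on which the packet was received
<userinput>ip rule add iif eth1 table 100</userinput>
</programlisting>
</example>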
|
|
<para>
|
|
With this knowledge of the RPDB and multiple
|
|
routing tables, let's revisit in detail the method by which the
|
|
kernel selects the proper route for a packet. Understanding
|
|
the series of steps the kernel takes for route selection should
|
|
demystify advanced routing. In fact, advanced routing could more
|
|
accurately be called policy-based networking.
|
|
</para>
|
|
<para>
|
|
When determining the route by which to send a packet, the kernel always
|
|
<link linkend="routing-cache">consults the routing cache first</link>.
|
|
The routing cache is a hash table used for quick access to recently
|
|
used routes. If the kernel finds an entry in the routing cache, the
|
|
corresponding entry will be used. If there is no entry in the
|
|
routing cache, the kernel begins the process of route selection. For
|
|
details on the method of matching a route in the routing cache, see
|
|
<xref linkend="routing-cache"/>.
|
|
</para>
|
|
<para>
|
|
The kernel begins iterating by priority through the routing policy
|
|
database. For each matching entry in the RPDB, the kernel will try to
|
|
find a matching route to the destination IP
|
|
address in the specified routing table using the aforementioned
|
|
longest prefix match selection algorithm. When a matching destination
|
|
is found, the kernel will select the matching route, and forward the
|
|
packet. If no matching entry is found in the specified routing table,
|
|
the kernel will pass to the next rule in the RPDB, until it finds a
|
|
match or falls through the end of the RPDB and all consulted routing
|
|
tables.
|
|
</para>
|
|
<para>
|
|
Here is a snippet of python-esque pseudocode to illustrate the
|
|
kernel's route selection process again. Each of the lookups below
|
|
occurs in kernel hash tables which are accessible to the user through
|
|
the use of various &iproute2; tools.
|
|
<indexterm zone="routing-selection-algorithm">
|
|
<primary>route selection</primary>
|
|
<secondary>algorithm</secondary>
|
|
</indexterm>
|
|
<example id="routing-selection-algorithm">
|
|
<title>Routing Selection Algorithm in Pseudo-code</title>
|
|
<programlisting>
|
|
# pseudocode only; not meant to be run
if packet.routeCacheLookupKey in routeCache:
    route = routeCache[packet.routeCacheLookupKey]
else:
    for rule in rpdb:                      # rules are visited in priority order
        if packet.rpdbLookupKey in rule:
            routeTable = rule.lookupTable
            if packet.routeLookupKey in routeTable:
                route = routeTable[packet.routeLookupKey]
                break                      # the first matching route is used
|
|
</programlisting>
|
|
</example>
|
|
<!--
|
|
|
|
I don't know if this is correct! Need to learn about how the routing
|
|
cache is populated with information. 2003-02-05
|
|
|
|
route_cache[ packet.routeCacheLookupKey ] = route
|
|
|
|
-->
|
|
|
|
|
|
This pseudocode provides some explanation of the decisions
required to find a route. The final piece of information
required to understand the decision making process is the set of
keys used in each of the three hash table lookups. In
<xref linkend="tb-routing-selection-adv"/>, each key is listed in order
of importance. Optional keys are listed in italics and are
matched only if they are present.
|
|
</para>
|
|
<indexterm zone="tb-routing-selection-adv">
|
|
<primary>route selection</primary>
|
|
<secondary>lookup keys</secondary>
|
|
</indexterm>
|
|
<table id="tb-routing-selection-adv">
|
|
<title>Keys used for hash table lookups during route selection</title>
|
|
<tgroup cols="3" align="center" colsep="1" rowsep="1">
|
|
<thead>
|
|
<row>
|
|
<entry>route cache</entry>
|
|
<entry>RPDB</entry>
|
|
<entry>route table</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry>destination</entry>
|
|
<entry>source</entry>
|
|
<entry>destination</entry>
|
|
</row>
|
|
<row>
|
|
<entry>source</entry>
|
|
<entry><emphasis>destination</emphasis></entry>
|
|
<entry><emphasis>ToS</emphasis></entry>
|
|
</row>
|
|
<row>
|
|
<entry><emphasis>ToS</emphasis></entry>
|
|
<entry><emphasis>ToS</emphasis></entry>
|
|
<entry><emphasis><link linkend="tb-tools-ip-addr-scope">scope</link></emphasis></entry>
|
|
</row>
|
|
<row>
|
|
<entry><emphasis>fwmark</emphasis></entry>
|
|
<entry><emphasis>fwmark</emphasis></entry>
|
|
<entry><emphasis>oif</emphasis></entry>
|
|
</row>
|
|
<row>
|
|
<entry><emphasis>iif</emphasis></entry>
|
|
<entry><emphasis>iif</emphasis></entry>
|
|
<entry></entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
<para>
|
|
The route cache (also the forwarding information base) can be
|
|
displayed using
|
|
<link linkend="tools-ip-route-show-cache"><command>ip route show
|
|
cache</command></link>. The routing policy database (RPDB) can be
|
|
manipulated with the
|
|
<link linkend="tools-ip-rule"><command>ip rule</command></link>
|
|
utility. Individual route tables can be manipulated and displayed
|
|
with the
|
|
<link linkend="tools-ip-route"><command>ip route</command></link>
|
|
command line tool.
|
|
</para>
|
|
<example id="ex-routing-selection-adv-ip-rule">
|
|
<title>Listing the Routing Policy Database (RPDB)</title>
|
|
<programlisting>
|
|
<prompt>[root@isolde]# </prompt><userinput>ip rule show</userinput>
|
|
<computeroutput>0: from all lookup local
|
|
32766: from all lookup main
|
|
32767: from all lookup 253</computeroutput>
|
|
</programlisting>
|
|
</example>
|
|
<para>
|
|
The output of <command>ip rule show</command> in
<xref linkend="ex-routing-selection-adv-ip-rule"/>,
taken on a box whose RPDB has not been changed, reveals a
high priority rule, rule 0. This rule, created at RPDB
initialization, instructs the kernel to try to find a match for the
destination in the
<link linkend="routing-table-local">local routing table</link>. If
there is no match for the packet in the local routing table, then,
per rule 32766, the kernel will perform a route lookup in the
main routing table. Normally, the main routing table will contain a
default route, if not a more specific route, for the destination.
If the lookup in the main routing table fails, the final rule
(32767) instructs the kernel to perform a route lookup in table 253.
|
|
</para>
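<para>
To see what each of these three rules would consult, the corresponding
tables can be listed one at a time. This is only a sketch; the output
depends entirely on the addresses and routes configured on the machine,
so none is shown here.
</para>
<example id="ex-routing-selection-adv-tables-sketch">
<title>Sketch: listing the tables named by the default rules</title>
<programlisting>
# consulted by rule 0
<prompt>[root@isolde]# </prompt><userinput>ip route show table local</userinput>
# consulted by rule 32766
<prompt>[root@isolde]# </prompt><userinput>ip route show table main</userinput>
# consulted by rule 32767
<prompt>[root@isolde]# </prompt><userinput>ip route show table 253</userinput>
</programlisting>
</example>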
|
|
|
|
<!--
|
|
|
|
FIXME; include an XREF here to the State vs Statless discussion
|
|
|
|
-->
|
|
|
|
<para>
|
|
A common mistake when working with multiple routing tables involves
forgetting about the statelessness of IP routing. Each packet is
routed independently, so a rule which selects a routing table for
outbound packets says nothing about the replies. The mistake manifests
when the user configuring the policy routing machine accounts for
outbound packets (via &fwmark;, or <command>ip rule</command>
selectors), but forgets to account for the return packets.
|
|
</para>
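<para>
One way to catch this mistake is to ask the kernel how it would route a
packet in each direction. In the sketch below, the internal network
192.0.2.0/24, the upstream gateway 198.51.100.1, the remote host
203.0.113.7, table 100 and the interface names are all assumptions made
for illustration.
</para>
<example id="ex-routing-selection-adv-return-sketch">
<title>Sketch: checking both directions of a policy-routed flow</title>
<programlisting>
# outbound policy: packets from the internal network use table 100
<userinput>ip rule add from 192.0.2.0/24 table 100</userinput>
<userinput>ip route add default via 198.51.100.1 dev eth1 table 100</userinput>
# outbound check: a packet from an internal host, received on eth0
<userinput>ip route get 203.0.113.7 from 192.0.2.10 iif eth0</userinput>
# return check: the reply is addressed to 192.0.2.10, so the "from" rule
# above does not match it; confirm that it still finds a route back
<userinput>ip route get 192.0.2.10 from 203.0.113.7 iif eth1</userinput>
</programlisting>
</example>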
|
|
</section>
|
|
<section id="routing-selection-summary">
|
|
<title>Summary</title>
|
|
<para>
|
|
For more ideas on how to use policy routing, how to work with
|
|
multiple routing tables, and how to troubleshoot, see
|
|
<xref linkend="adv-rpdb"/>.
|
|
</para>
|
|
<para>
|
|
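In short, the kernel consults the routing cache first, then walks the
RPDB in priority order, applying longest prefix match within each
routing table that a matching rule selects.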
|
|
</para>
|
|
</section>
|
|
</section>
|
|
<section id="routing-saddr-selection">
|
|
<title>Source Address Selection</title>
|
|
<indexterm zone="routing-saddr-selection">
|
|
<primary>source address selection</primary>
|
|
<seealso>route selection</seealso>
|
|
</indexterm>
|
|
<para>
|
|
The selection of the correct source address is key to correct
|
|
communication between hosts with multiple IP addresses. If a host
|
|
chooses an address from a private network to communicate with a public
|
|
Internet host, it is likely that the return half of the communication
|
|
will never arrive.
|
|
</para>
|
|
<para>
|
|
</para>
|
|
<para>
|
|
The initial source address for an outbound packet is chosen according
to the following series of rules. The application can request a
|
|
particular IP
|
|
<footnote>
|
|
<para>
|
|
Many networking applications accept a command line option to prefer
|
|
a particular source address. The call to select a particular
|
|
IP is known as <function>bind()</function>, so the command
|
|
line option frequently
|
|
contains the word <wordasword>bind</wordasword>, e.g.,
|
|
<option>--bind-address</option>.
|
|
Examples of command line tools allowing specification of the source
|
|
address are <command>nc -s $BINDADDR $DEST $PORT</command> or
|
|
<command>socat -
|
|
TCP4:$REMOTEHOST:$REMOTEPORT,bind=$BINDADDR</command>.
|
|
</para>
|
|
</footnote>,
|
|
the kernel will use the &src; hint from the chosen
|
|
route path
|
|
<footnote>
|
|
<para>
|
|
In this case, the route has already been selected (see
|
|
<xref linkend="routing-selection"/>) and the chosen route entry
|
|
includes a hint for preferred source address on outbound packets
|
|
specifically for this purpose. For examples on configuring the
|
|
routing tables to include this parameter, see
|
|
<xref linkend="ex-tools-ip-route-add-src"/>.
|
|
</para>
|
|
</footnote>,
|
|
or, lacking this hint, the kernel will choose the first address
|
|
configured on the interface which falls in the same network as the
|
|
destination address or the nexthop router.
|
|
</para>
|
|
<para>
|
|
The following list recapitulates the manner by which the kernel
determines the source address of an outbound packet.
|
|
</para>
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
The application is already using the socket, in which case, the
|
|
source address has been chosen. Also, the application can
|
|
specifically request a particular address (not necessarily a
|
|
locally hosted IP; see
|
|
<xref linkend="adv-nonlocal-bind"/>) using the
|
|
<function>bind</function> call.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The kernel performs a
|
|
<link linkend="routing-selection">route lookup</link> and finds an
|
|
outbound route for the destination. If the route contains the
|
|
&src; parameter, the kernel selects this IP
|
|
address for the outbound packet.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
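<para>
If the route carries no &src; hint, the kernel chooses the first
address configured on the outbound interface which falls in the same
network as the destination address or the nexthop router.
</para>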
|
|
</listitem>
|
|
</itemizedlist>
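<para>
The effect of the &src; hint can be sketched as follows. The
destination network 203.0.113.0/24, the gateway 198.51.100.1 and the
preferred source address 198.51.100.20 are placeholders, and the sketch
assumes that 198.51.100.20 is already configured on the machine.
</para>
<example id="ex-routing-saddr-selection-src-sketch">
<title>Sketch: a route carrying a preferred source address</title>
<programlisting>
# assumption: 198.51.100.20 is a locally configured address
<userinput>ip route add 203.0.113.0/24 via 198.51.100.1 src 198.51.100.20</userinput>
# ask the kernel which route and source address it would choose
<userinput>ip route get 203.0.113.5</userinput>
</programlisting>
</example>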
|
|
|
|
<para>
|
|
Also refer to this
|
|
<ulink url="http://linux-ip.net/gl/ip-cref/node155.html">excerpt</ulink>
|
|
from the &iproute2; command reference.
|
|
</para>
|
|
</section>
|
|
<section id="routing-cache">
|
|
<title>Routing Cache</title>
|
|
<indexterm zone="routing-cache">
|
|
<primary>routing cache</primary>
|
|
</indexterm>
|
|
<indexterm zone="routing-cache">
|
|
<primary>forwarding information base</primary>
|
|
<see>routing cache</see>
|
|
</indexterm>
|
|
<para>
|
|
The routing cache is also known as the forwarding information base (FIB).
|
|
This term may be familiar to users of other routing systems.
|
|
</para>
|
|
<para>
|
|
The routing cache stores recently used routing entries in a fast and
|
|
convenient hash lookup table, and is consulted before the routing
|
|
tables. If the kernel finds a matching entry during route cache lookup,
|
|
it will forward the packet immediately and stop traversing the routing
|
|
tables.
|
|
</para>
|
|
<para>
|
|
Because the routing cache is maintained by the kernel separately from
|
|
the routing tables, manipulating the routing tables may not have an
|
|
immediate effect on the kernel's choice of path for a given packet.
|
|
To avoid a non-deterministic lag between the time that a new route
|
|
is entered into the kernel routing tables and the time that a new lookup
|
|
in those route tables is performed, use
|
|
<link linkend="tools-ip-route-flush-cache"><command>ip route flush
|
|
cache</command></link>. Once the route cache has been emptied, new
|
|
route lookups (if not by a packet, then manually with
|
|
<link linkend="tools-ip-route-get"><command>ip route
|
|
get</command></link>) will result in a new lookup to the kernel routing
|
|
tables.
|
|
</para>
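<para>
The sequence described above might look like the following sketch. The
network and gateway are placeholders; the point is only the order of
operations: change the route, flush the cache, then verify with a fresh
lookup.
</para>
<example id="ex-routing-cache-flush-sketch">
<title>Sketch: flushing the routing cache after a route change</title>
<programlisting>
# change a route (placeholder addresses)
<userinput>ip route replace 203.0.113.0/24 via 198.51.100.2</userinput>
# discard cached entries so that the change is seen by the next lookup
<userinput>ip route flush cache</userinput>
# trigger a new lookup by hand and inspect the result
<userinput>ip route get 203.0.113.5</userinput>
# list whatever is now held in the routing cache
<userinput>ip route show cache</userinput>
</programlisting>
</example>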
|
|
<para>
|
|
The following is a listing of the hash lookup keys
|
|
in the routing cache and a description of each key. Compare this list
|
|
with the elements identified in
|
|
<xref linkend="tb-routing-selection-adv"/>.
|
|
</para>
|
|
<variablelist id="list-routing-cache-lookup-keys">
|
|
<varlistentry id="list-routing-cache-lookup-keys-dst">
|
|
<term>dst</term>
|
|
<term>Destination Address</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-cache-lookup-keys-dst">
|
|
<primary>routing cache</primary>
|
|
<secondary>lookup keys</secondary>
|
|
<tertiary>dst</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
The destination IP address of the packet. This is the destination
|
|
address on the packet at the time of the route lookup. The address
|
|
is a host address. All 32 bits are significant during this lookup.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-cache-lookup-keys-src">
|
|
<term>src</term>
|
|
<term>Source Address</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-cache-lookup-keys-src">
|
|
<primary>routing cache</primary>
|
|
<secondary>lookup keys</secondary>
|
|
<tertiary>src</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
The source IP address of the packet. This is the source address
|
|
on the packet at the time of the route lookup. The address is a
|
|
host address. All 32 bits are significant during this lookup.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-cache-lookup-keys-tos">
|
|
<term>tos</term>
|
|
<term>Type of Service</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-cache-lookup-keys-tos">
|
|
<primary>routing cache</primary>
|
|
<secondary>lookup keys</secondary>
|
|
<tertiary>tos</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
The ToS marking on the packet. If there is no ToS marking on the
|
|
packet (tos == 0), this lookup key is unused. If there is a ToS
|
|
marking, the kernel will search for a match with this ToS value.
|
|
If no matching (dst, src, tos) is found, the kernel will continue
|
|
the search for a route by traversing the RPDB.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-cache-lookup-keys-fwmark">
|
|
<term>fwmark</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-cache-lookup-keys-fwmark">
|
|
<primary>routing cache</primary>
|
|
<secondary>lookup keys</secondary>
|
|
<tertiary>fwmark</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
The mark on a packet added administratively by the packet
|
|
filtering engine (<command>ipchains</command> or
|
|
<command>iptables</command>).
|
|
This mark is not part of the physical IP packet, and only exists
|
|
as part of the data structure held in memory on the routing device
|
|
to represent the IP
|
|
packet. If there is no fwmark on the packet, this lookup key is
|
|
unused. When present, the kernel will search for a matching
|
|
(dst, src, tos?, fwmark) entry. If no matching entry is found,
|
|
the kernel will continue the search for a route by traversing the
|
|
RPDB.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-cache-lookup-keys-iif">
|
|
<term>iif</term>
|
|
<term>inbound interface</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-cache-lookup-keys-iif">
|
|
<primary>routing cache</primary>
|
|
<secondary>lookup keys</secondary>
|
|
<tertiary>iif</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
The name of the interface on which the packet arrived.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
|
|
<para>
|
|
The following attributes may be stored for each entry in the routing
|
|
cache.
|
|
</para>
|
|
<variablelist id="list-routing-cache-attrs">
|
|
<varlistentry id="list-routing-cache-attrs-cwnd">
|
|
<term>cwnd</term>
|
|
<term>Congestion Window</term>
<listitem>
<indexterm zone="list-routing-cache-attrs-cwnd">
<primary>routing cache</primary>
<secondary>attributes</secondary>
<tertiary>cwnd</tertiary>
</indexterm>
<para>
The clamp (upper bound) on the TCP congestion window for
connections using this route.
</para>
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-cache-attrs-advmss">
|
|
<term>advmss</term>
|
|
<term>Advertised Maximum Segment Size</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-cache-attrs-advmss">
|
|
<primary>routing cache</primary>
|
|
<secondary>attributes</secondary>
|
|
<tertiary>advmss</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-cache-attrs-src">
|
|
<term>src</term>
|
|
<term>(Preferred Local) Source Address</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-cache-attrs-src">
|
|
<primary>routing cache</primary>
|
|
<secondary>attributes</secondary>
|
|
<tertiary>src</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-cache-attrs-mtu">
|
|
<term>mtu</term>
|
|
<term>Maximum Transmission Unit</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-cache-attrs-mtu">
|
|
<primary>routing cache</primary>
|
|
<secondary>attributes</secondary>
|
|
<tertiary>mtu</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-cache-attrs-rtt">
|
|
<term>rtt</term>
|
|
<term>Round Trip Time</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-cache-attrs-rtt">
|
|
<primary>routing cache</primary>
|
|
<secondary>attributes</secondary>
|
|
<tertiary>rtt</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-cache-attrs-rttvar">
|
|
<term>rttvar</term>
|
|
<term>Round Trip Time Variation</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-cache-attrs-rttvar">
|
|
<primary>routing cache</primary>
|
|
<secondary>attributes</secondary>
|
|
<tertiary>rttvar</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
The initial estimate of the variance in round trip time for this route.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-cache-attrs-age">
|
|
<term>age</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-cache-attrs-age">
|
|
<primary>routing cache</primary>
|
|
<secondary>attributes</secondary>
|
|
<tertiary>age</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-cache-attrs-users">
|
|
<term>users</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-cache-attrs-users">
|
|
<primary>routing cache</primary>
|
|
<secondary>attributes</secondary>
|
|
<tertiary>users</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-cache-attrs-used">
|
|
<term>used</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-cache-attrs-used">
|
|
<primary>routing cache</primary>
|
|
<secondary>attributes</secondary>
|
|
<tertiary>used</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
<para>
|
|
Collectively the hash keys uniquely identify routes in the forwarding
|
|
information base (routing cache) and each entry provides attributes of
|
|
the route.
|
|
</para>
|
|
|
|
</section>
|
|
<section id="routing-tables">
|
|
<title>Routing Tables</title>
|
|
<indexterm zone="routing-tables">
|
|
<primary>routing tables</primary>
|
|
<secondary>multiple</secondary>
|
|
</indexterm>
|
|
<para>
|
|
Linux kernel 2.2 and 2.4 support multiple routing tables
|
|
<footnote>
|
|
<para>
|
|
The kernel must be compiled with the option
|
|
<constant>CONFIG_IP_MULTIPLE_TABLES=y</constant>. This is common
|
|
in vendor and stock kernels, both 2.2 and 2.4.
|
|
</para>
|
|
</footnote>.
|
|
Beyond the two commonly used routing tables
|
|
(<link linkend="routing-table-local">the local</link> and
|
|
<link linkend="routing-table-main">main</link> routing tables), the
|
|
kernel supports up to 252 additional routing tables.
|
|
</para>
|
|
<para>
|
|
The multiple routing table system provides a flexible infrastructure on
|
|
top of which to implement policy routing. By allowing multiple
|
|
traditional routing tables (keyed primarily to destination address)
|
|
to be combined with the
|
|
<link linkend="routing-rpdb">routing policy database (RPDB)</link>
|
|
(keyed primarily to source address), the
|
|
kernel supports a well-known and well-understood interface while
|
|
simultaneously expanding and extending its routing capabilities.
|
|
Each routing table still operates in the traditional and expected
|
|
fashion. Linux simply allows you to choose from a
|
|
number of routing tables, and to traverse routing tables in a
|
|
user-definable sequence until a matching route is found.
|
|
</para>
|
|
<anchor id="routing-tables-keys"/>
|
|
<indexterm zone="routing-tables-keys">
|
|
<primary>routing tables</primary>
|
|
<secondary>key fields</secondary>
|
|
</indexterm>
|
|
<para>
|
|
Any given routing table can contain an arbitrary number of entries,
|
|
each of which is keyed on the following characteristics (cf.
|
|
<xref linkend="tb-routing-selection-adv"/>):
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
destination address; a network or host address (primary key)
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
tos; Type of Service
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<link linkend="tb-tools-ip-addr-scope">scope</link>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
output interface
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
<para>
|
|
For practical purposes, this means that (even) a single routing table can
|
|
contain multiple routes to the same destination if the ToS differs
|
|
on each route or if the route applies to a different interface
|
|
<footnote>
|
|
<para>
|
|
If somebody has used scope or oif as additional keys in a routing
|
|
table, and has an example, I'd love to see it, for possible
|
|
inclusion in this documentation.
|
|
</para>
|
|
</footnote>.
|
|
</para>
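<para>
As a sketch of two routes to the same destination which differ only in
ToS, consider the following. The destination network and both gateways
are placeholders chosen for illustration.
</para>
<example id="ex-routing-tables-tos-sketch">
<title>Sketch: two routes to one destination, keyed on ToS</title>
<programlisting>
# route used when no ToS-specific route matches
<userinput>ip route add 203.0.113.0/24 via 198.51.100.1</userinput>
# route used only by packets marked with ToS 0x10 (minimize delay)
<userinput>ip route add 203.0.113.0/24 tos 0x10 via 198.51.100.2</userinput>
# both entries should now be listed for the same prefix
<userinput>ip route show to exact 203.0.113.0/24</userinput>
</programlisting>
</example>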
|
|
<para>
|
|
Kernels supporting multiple routing tables refer to routing tables by
|
|
unique integer slots between 0 and 255
|
|
<footnote>
|
|
<para>
|
|
Can anybody describe to me what is in table 0? It looks almost like
|
|
an aggregation of the routing entries in routing tables 254 and 255.
|
|
</para>
|
|
</footnote>.
|
|
The two routing tables normally employed are
|
|
<link linkend="routing-table-local">table 255, the
|
|
&local; routing table</link>, and
|
|
<link linkend="routing-table-main">table 254, the
|
|
&main; routing table</link>. For
|
|
examples of using multiple routing tables, see
|
|
<xref linkend="ch-advanced"/>, in particular,
|
|
<xref linkend="ex-adv-multi-internet-outbound-ip-routing"/>,
|
|
<xref linkend="ex-adv-multi-internet-outbound-ip-rule"/> and
|
|
<xref linkend="ex-adv-multi-internet-inbound"/>. Also be sure
|
|
to read
|
|
<xref linkend="adv-rpdb"/> and
|
|
<xref linkend="routing-rpdb"/>.
|
|
</para>
|
|
<para>
|
|
The <command>ip route</command> and <command>ip rule</command> commands
|
|
have built in support for the special tables &main; and &local;.
|
|
Any other routing table can be referred to by number or by a name
defined in the administratively maintained mapping file,
<filename>/etc/iproute2/rt_tables</filename>.
|
|
</para>
|
|
<para>
|
|
The format of this file is extraordinarily simple. Each line represents
|
|
one mapping of an arbitrary string to an integer. Comments are allowed.
|
|
</para>
|
|
<example id="ex-routing-tables-rt-table">
|
|
<title>Typical content of
|
|
<filename>/etc/iproute2/rt_tables</filename></title>
|
|
<programlisting>
|
|
<computeroutput>#
|
|
# reserved values
|
|
#
|
|
255 local <co id="id-rtrt-local" linkends="id-rtrt-local-text"/>
|
|
254 main <co id="id-rtrt-main" linkends="id-rtrt-main-text"/>
|
|
253 default <co id="id-rtrt-default" linkends="id-rtrt-default-text"/>
|
|
0 unspec <co id="id-rtrt-unspec" linkends="id-rtrt-unspec-text"/>
|
|
#
|
|
# local
|
|
#
|
|
1 inr.ruhep <co id="id-rtrt-user" linkends="id-rtrt-user-text"/></computeroutput>
|
|
</programlisting>
|
|
<calloutlist>
|
|
<callout
|
|
arearefs="id-rtrt-local"
|
|
id="id-rtrt-local-text">
|
|
<simpara>
|
|
The &local; table is a special routing table maintained by the
|
|
kernel. Users can remove entries from the local routing table
|
|
at their own risk. Users cannot add entries to the local
|
|
routing table. The file
|
|
<filename>/etc/iproute2/rt_tables</filename> need not exist, as
|
|
the &iproute2; tools have a hard-coded entry for the &local;
|
|
table.
|
|
</simpara>
|
|
</callout>
|
|
<callout
|
|
arearefs="id-rtrt-main"
|
|
id="id-rtrt-main-text">
|
|
<simpara>
|
|
The main routing table is the table operated upon by
|
|
<command>route</command> and, when not otherwise specified, by
|
|
<command>ip route</command>. The file
|
|
<filename>/etc/iproute2/rt_tables</filename> need not exist, as
|
|
the &iproute2; tools have a hard-coded entry for the &main;
|
|
table.
|
|
</simpara>
|
|
</callout>
|
|
<callout
|
|
arearefs="id-rtrt-default"
|
|
id="id-rtrt-default-text">
|
|
<simpara>
|
|
The <constant>default</constant> routing table is another
special routing table. It is normally empty, and is consulted
last, by rule 32767, for packets which no earlier rule has routed.
|
|
</simpara>
|
|
</callout>
|
|
<callout
|
|
arearefs="id-rtrt-unspec"
|
|
id="id-rtrt-unspec-text">
|
|
<simpara>
|
|
Operating on the <constant>unspec</constant> routing table
|
|
appears to operate on all routing tables simultaneously. Is
|
|
this true!? What does that imply?
|
|
</simpara>
|
|
</callout>
|
|
<callout
|
|
arearefs="id-rtrt-user"
|
|
id="id-rtrt-user-text">
|
|
<simpara>
|
|
This is an example indicating that table 1 is known by the name
inr.ruhep. Any reference to <userinput>table
inr.ruhep</userinput> in an <command>ip rule</command>
or <command>ip route</command> command will substitute the
value 1 for the word inr.ruhep.
|
|
</simpara>
|
|
</callout>
|
|
</calloutlist>
|
|
</example>
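<para>
Adding a name for a new table and then using it might look like the
following sketch. The table number 100, the name
<userinput>isp2</userinput>, and the addresses are hypothetical values
picked for illustration.
</para>
<example id="ex-routing-tables-rt-table-use-sketch">
<title>Sketch: naming a routing table and referring to it by name</title>
<programlisting>
# map the (hypothetical) name isp2 to table 100
<userinput>echo "100 isp2" &gt;&gt; /etc/iproute2/rt_tables</userinput>
# populate the table and select it for one source network
<userinput>ip route add default via 198.51.100.1 table isp2</userinput>
<userinput>ip rule add from 192.0.2.0/24 table isp2</userinput>
# the table can now be displayed by name or by number
<userinput>ip route show table isp2</userinput>
<userinput>ip route show table 100</userinput>
</programlisting>
</example>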
|
|
<para>
|
|
The routing table manipulated by the conventional
|
|
<link linkend="tools-route"><command>route</command></link> command
|
|
is the &main; routing table. Additionally, the use of both
|
|
<link linkend="tools-ip-address"><command>ip address</command></link> and
|
|
<link linkend="tools-ifconfig"><command>ifconfig</command></link>
|
|
will cause the kernel to alter the local routing table (and usually the
|
|
main routing table). For further documentation on how to manipulate
|
|
the other routing tables, see the command description of
|
|
<link linkend="tools-ip-route"><command>ip route</command></link>.
|
|
</para>
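<para>
The side effects of adding an address can be observed directly. In the
sketch below, the address 192.0.2.5/24 and the interface
<userinput>eth0</userinput> are placeholders. After the address is
added, the local routing table should list entries for the new address
on that interface, and the main routing table should usually list a
route for the directly connected network.
</para>
<example id="ex-routing-tables-addr-side-effects-sketch">
<title>Sketch: routes created by the kernel when an address is added</title>
<programlisting>
# add a (placeholder) address to an interface
<userinput>ip address add 192.0.2.5/24 dev eth0</userinput>
# entries maintained by the kernel in the local routing table
<userinput>ip route show table local dev eth0</userinput>
# the directly connected network in the main routing table
<userinput>ip route show table main dev eth0</userinput>
</programlisting>
</example>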
|
|
|
|
<section id="routing-table-entries">
|
|
<title>Routing Table Entries (Routes)</title>
|
|
<indexterm zone="routing-tables-keys">
|
|
<primary>route types</primary>
|
|
<see>routing tables, entry types</see>
|
|
</indexterm>
|
|
<indexterm zone="routing-tables-keys">
|
|
<primary>routing tables</primary>
|
|
<secondary>entry types</secondary>
|
|
</indexterm>
|
|
<para>
|
|
Each routing table can contain an arbitrary number of route entries.
|
|
Aside from the
|
|
<link linkend="routing-table-local">local routing table</link>, which
|
|
is maintained by the kernel, and the
|
|
<link linkend="routing-table-main">main routing table</link> which is
|
|
partially maintained by the kernel,
|
|
all routing tables are controlled by the administrator or routing
|
|
software. All routes on a machine can be changed or removed
|
|
<footnote>
|
|
<para>
|
|
Once again, I recommend caution when altering the local routing
|
|
table. Removing local route types from the local routing table
|
|
can break networking in strange and wonderful ways.
|
|
</para>
|
|
</footnote>.
|
|
</para>
|
|
<para>
|
|
Each of the following route types is available for use with
|
|
the <command>ip route</command> command. Each route type causes a
|
|
particular sort of behaviour, which is identified in the textual
|
|
description. Compare the route types described below with the
|
|
<link linkend="list-routing-rule-types">rule types</link> available
|
|
for use in the RPDB.
|
|
</para>
|
|
<variablelist id="list-routing-route-types">
|
|
<varlistentry id="list-routing-route-types-unicast">
|
|
<term>unicast</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-route-types-unicast">
|
|
<primary>routing tables</primary>
|
|
<secondary>entry types</secondary>
|
|
<tertiary>unicast</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
A unicast route is the most common route in routing tables.
This is a typical route to a destination network address, which
describes the path to the destination. Even complex routes,
such as nexthop routes, are considered unicast routes. If no
route type is specified on the command line, the route is
assumed to be a unicast route.
|
|
</para>
|
|
<example id="ex-list-route-unicast">
|
|
<title>unicast route types</title>
|
|
<programlisting>
|
|
<userinput>ip route add unicast 192.168.0.0/24 via 192.168.100.5</userinput>
|
|
<userinput>ip route add default via 193.7.255.1</userinput>
|
|
<userinput>ip route add unicast default via 206.59.29.193</userinput>
|
|
<userinput>ip route add 10.40.0.0/16 via 10.72.75.254</userinput>
|
|
</programlisting>
|
|
</example>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-route-types-broadcast">
|
|
<term>broadcast</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-route-types-broadcast">
|
|
<primary>routing tables</primary>
|
|
<secondary>entry types</secondary>
|
|
<tertiary>broadcast</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
This route type is used for link layer devices (such as Ethernet
|
|
cards) which support the notion of a broadcast address. This
|
|
route type is used only in the local routing table
|
|
<footnote>
|
|
<para>
|
|
OK, I'm not absolutely sure you can't use the broadcast
|
|
route in other routing tables, but I believe you can't.
|
|
Testing forthcoming...
|
|
</para>
|
|
</footnote>
|
|
and is typically handled by the kernel.
|
|
</para>
|
|
<example id="ex-list-route-broadcast">
|
|
<title>broadcast route types</title>
|
|
<programlisting>
|
|
<userinput>ip route add table local broadcast 10.10.20.255 dev eth0 proto kernel scope link src 10.10.20.67</userinput>
|
|
<userinput>ip route add table local broadcast 192.168.43.31 dev eth4 proto kernel scope link src 192.168.43.14</userinput>
|
|
</programlisting>
|
|
</example>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-route-types-local">
|
|
<term>local</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-route-types-local">
|
|
<primary>routing tables</primary>
|
|
<secondary>entry types</secondary>
|
|
<tertiary>local</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
The kernel will add entries into the local routing table when
|
|
IP addresses are added to an interface. This means that the IPs
|
|
are locally hosted IPs
|
|
<footnote>
|
|
<para>
|
|
Ibid. I'm not sure that local route types can be used
|
|
in any routing table other than the local routing table.
|
|
Testing forthcoming...
|
|
</para>
|
|
</footnote>.
|
|
</para>
|
|
<example id="ex-list-route-local">
|
|
<title>local route types</title>
|
|
<programlisting>
|
|
<userinput>ip route add table local local 10.10.20.64 dev eth0 proto kernel scope host src 10.10.20.67</userinput>
|
|
<userinput>ip route add table local local 192.168.43.12 dev eth4 proto kernel scope host src 192.168.43.14</userinput>
|
|
</programlisting>
|
|
</example>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-route-types-nat">
|
|
<term>nat</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-route-types-nat">
|
|
<primary>routing tables</primary>
|
|
<secondary>entry types</secondary>
|
|
<tertiary>nat</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
This route entry is added by the kernel in the local routing
|
|
table, when the user attempts to configure stateless NAT. See
|
|
<xref linkend="nat-stateless"/> for a fuller discussion of
|
|
network address translation in general<footnote>
<para>
As with the broadcast route type, nat route types might be
ineffectual outside the local routing table. Testing forthcoming...
</para>
</footnote>.
|
|
</para>
|
|
<example id="ex-list-route-nat">
|
|
<title>nat route types</title>
|
|
<programlisting>
|
|
<userinput>ip route add nat 193.7.255.184 via 172.16.82.184</userinput>
|
|
<userinput>ip route add nat 10.40.0.0/16 via 172.40.0.0</userinput>
|
|
</programlisting>
|
|
</example>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-route-types-unreachable">
|
|
<term>unreachable</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-route-types-unreachable">
|
|
<primary>routing tables</primary>
|
|
<secondary>entry types</secondary>
|
|
<tertiary>unreachable</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
When a request for a routing decision returns a destination
|
|
with an unreachable route type, an ICMP unreachable is
|
|
generated and returned to the source address.
|
|
</para>
|
|
<example id="ex-list-route-unreachable">
|
|
<title>unreachable route types</title>
|
|
<programlisting>
|
|
<userinput>ip route add unreachable 172.16.82.184</userinput>
|
|
<userinput>ip route add unreachable 192.168.14.0/26</userinput>
|
|
<userinput>ip route add unreachable 209.10.26.51</userinput>
|
|
</programlisting>
|
|
</example>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-route-types-prohibit">
|
|
<term>prohibit</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-route-types-prohibit">
|
|
<primary>routing tables</primary>
|
|
<secondary>entry types</secondary>
|
|
<tertiary>prohibit</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
When a request for a routing decision returns a destination with
|
|
a prohibit route type, the kernel generates an ICMP prohibited
|
|
to return to the source address.
|
|
</para>
|
|
<example id="ex-list-route-prohibit">
|
|
<title>prohibit route types</title>
|
|
<programlisting>
|
|
<userinput>ip route add prohibit 10.21.82.157</userinput>
|
|
<userinput>ip route add prohibit 172.28.113.0/28</userinput>
|
|
<userinput>ip route add prohibit 209.10.26.51</userinput>
|
|
</programlisting>
|
|
</example>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-route-types-blackhole">
|
|
<term>blackhole</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-route-types-blackhole">
|
|
<primary>routing tables</primary>
|
|
<secondary>entry types</secondary>
|
|
<tertiary>blackhole</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
A packet matching a route with the route type blackhole is
|
|
discarded. No ICMP is sent and no packet is forwarded.
|
|
</para>
|
|
<example id="ex-list-route-blackhole">
|
|
<title>blackhole route types</title>
|
|
<programlisting>
|
|
<userinput>ip route add blackhole default</userinput>
|
|
<userinput>ip route add blackhole 202.143.170.0/24</userinput>
|
|
<userinput>ip route add blackhole 64.65.64.0/18</userinput>
|
|
</programlisting>
|
|
</example>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-route-types-throw">
|
|
<term>throw</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-route-types-throw">
|
|
<primary>routing tables</primary>
|
|
<secondary>entry types</secondary>
|
|
<tertiary>throw</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
The throw route type is a convenient route type which causes
|
|
a route lookup in a routing table to fail, returning the
|
|
<link linkend="routing-selection-adv">routing selection
|
|
process</link> to the RPDB. This is useful when there are
|
|
additional routing tables. Note that there is an implicit throw
|
|
if no default route exists in a routing table, so the route
|
|
created by the first command in the example is superfluous,
|
|
although legal.
|
|
</para>
|
|
<example id="ex-list-route-throw">
|
|
<title>throw route types</title>
|
|
<programlisting>
|
|
<userinput>ip route add throw default</userinput>
|
|
<userinput>ip route add throw 10.79.0.0/16</userinput>
|
|
<userinput>ip route add throw 172.16.0.0/12</userinput>
|
|
</programlisting>
|
|
</example>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
<para>
|
|
The power of these route types when combined with the
<link linkend="routing-rpdb">routing policy database</link> can hardly
be overstated. All of these route types can be used without the
|
|
RPDB, although the throw route doesn't make much sense outside of a
|
|
multiple routing table installation.
|
|
</para>
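<para>
The behaviour of the special route types described above can be
summarized in a small lookup table. The following Python sketch is only
a model for the reader, not kernel code; the names
<constant>Disposition</constant> and <constant>disposition</constant>
are invented for this illustration, and the
<constant>unicast</constant> entry is assumed from the ordinary
forwarding case.
</para>
<programlisting>
```python
from enum import Enum


class Disposition(Enum):
    """Modelled outcome once the type of the matching route is known."""
    FORWARD = "forward the packet to the next hop"
    ICMP_UNREACHABLE = "drop, return ICMP unreachable to the source"
    ICMP_PROHIBITED = "drop, return ICMP prohibited to the source"
    SILENT_DROP = "drop silently, send no ICMP"
    CONTINUE_RPDB = "lookup in this table fails, return to the RPDB"


# Keys are route type names as given to "ip route add".
# The mapping restates the prose descriptions and is illustrative only.
ROUTE_TYPE_DISPOSITION = {
    "unicast": Disposition.FORWARD,
    "unreachable": Disposition.ICMP_UNREACHABLE,
    "prohibit": Disposition.ICMP_PROHIBITED,
    "blackhole": Disposition.SILENT_DROP,
    "throw": Disposition.CONTINUE_RPDB,
}


def disposition(route_type: str) -> Disposition:
    """Return the modelled outcome for a route type (hypothetical helper)."""
    return ROUTE_TYPE_DISPOSITION[route_type]


if __name__ == "__main__":
    for name in ("unreachable", "prohibit", "blackhole", "throw"):
        print(f"{name:12} {disposition(name).value}")
```
</programlisting>
<para>
Only the throw type hands control back to the RPDB; the other three
special types end the route selection for the packet.
</para>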
|
|
<para>
|
|
</para>
|
|
<para>
|
|
</para>
|
|
<para>
|
|
</para>
|
|
</section>
|
|
<section id="routing-table-local">
|
|
<title>The Local Routing Table</title>
|
|
<indexterm zone="routing-table-local">
|
|
<primary>local routing table</primary>
|
|
<see>routing tables, local</see>
|
|
</indexterm>
|
|
<indexterm zone="routing-table-local">
|
|
<primary>routing tables</primary>
|
|
<secondary>local</secondary>
|
|
</indexterm>
|
|
<para>
|
|
The local routing table is maintained by the kernel. Normally, the
|
|
local routing table should not be manipulated,
|
|
but it is available for viewing. In
|
|
<xref linkend="ex-tools-ip-route-show-local"/>, you'll see two of the
|
|
common uses of the local routing table. The first common use is the
|
|
specification of broadcast addresses, necessary only for link layers
|
|
which support broadcast addressing. The second common type of entry
|
|
in a local routing table is a route to a locally hosted IP.
|
|
</para>
|
|
<para>
|
|
The route types found in the local routing table
|
|
are <constant>local</constant>, <constant>nat</constant> and
|
|
<constant>broadcast</constant>. These route types are not relevant in
|
|
other routing tables, and other route types cannot be used in the
|
|
local routing table.
|
|
</para>
|
|
<para>
|
|
If the machine has several IP addresses on one Ethernet interface,
|
|
there will be a route to each locally hosted IP in the local routing
|
|
table. This is a normal
|
|
<link linkend="list-basic-ifconfig-side-effects-up">side effect</link>
|
|
of bringing up an IP address on an interface under linux.
|
|
Maintenance of the broadcast and local routes in the local routing
table is normally left to the kernel.
|
|
</para>
|
|
<example id="ex-routing-table-local-maint">
|
|
<title>Kernel maintenance of the &local; routing table</title>
|
|
<programlisting>
|
|
<prompt>[root@real-server]# </prompt><userinput>ip address show dev eth1</userinput>
|
|
<computeroutput>6: eth1: &lt;BROADCAST,MULTICAST,UP&gt; mtu 1500 qdisc pfifo_fast qlen 100
|
|
link/ether 00:80:c8:e8:1e:fc brd ff:ff:ff:ff:ff:ff
|
|
inet 10.10.20.89/24 brd 10.10.20.255 scope global eth1</computeroutput>
|
|
<prompt>[root@real-server]# </prompt><userinput>ip route show dev eth1</userinput>
|
|
<computeroutput>10.10.20.0/24 proto kernel scope link src 10.10.20.89</computeroutput>
|
|
<prompt>[root@real-server]# </prompt><userinput>ip route show dev eth1 table local</userinput>
|
|
<computeroutput>broadcast 10.10.20.0 proto kernel scope link src 10.10.20.89
|
|
broadcast 10.10.20.255 proto kernel scope link src 10.10.20.89
|
|
local 10.10.20.89 proto kernel scope host src 10.10.20.89</computeroutput>
|
|
<prompt>[root@real-server]# </prompt><userinput>ip address add 192.168.254.254/24 brd + dev eth1</userinput>
|
|
<prompt>[root@real-server]# </prompt><userinput>ip address show dev eth1</userinput>
|
|
<computeroutput>6: eth1: &lt;BROADCAST,MULTICAST,UP&gt; mtu 1500 qdisc pfifo_fast qlen 100
|
|
link/ether 00:80:c8:e8:1e:fc brd ff:ff:ff:ff:ff:ff
|
|
inet 10.10.20.89/24 brd 10.10.20.255 scope global eth1
|
|
inet 192.168.254.254/24 brd 192.168.254.255 scope global eth1</computeroutput>
|
|
<prompt>[root@real-server]# </prompt><userinput>ip route show dev eth1</userinput>
|
|
<computeroutput>10.10.20.0/24 proto kernel scope link src 10.10.20.89
|
|
192.168.254.0/24 proto kernel scope link src 192.168.254.254</computeroutput>
|
|
<prompt>[root@real-server]# </prompt><userinput>ip route show dev eth1 table local</userinput>
|
|
<computeroutput>broadcast 10.10.20.0 proto kernel scope link src 10.10.20.89
|
|
broadcast 192.168.254.0 proto kernel scope link src 192.168.254.254
|
|
broadcast 10.10.20.255 proto kernel scope link src 10.10.20.89
|
|
local 192.168.254.254 proto kernel scope host src 192.168.254.254
|
|
local 10.10.20.89 proto kernel scope host src 10.10.20.89
|
|
broadcast 192.168.254.255 proto kernel scope link src 192.168.254.254</computeroutput>
|
|
</programlisting>
|
|
</example>
|
|
<para>
|
|
Note in
<xref linkend="ex-routing-table-local-maint"/> that the kernel adds
|
|
not only the route for the locally connected network in the &main;
|
|
routing table, but also the three required special addresses in the
|
|
&local; routing table. Any IP addresses which are locally hosted on
|
|
the box will have &local; entries in the &local; table. The
|
|
<link linkend="list-routing-intro-ipdefs-netaddr">network
|
|
address</link> and
|
|
<link linkend="list-routing-intro-ipdefs-bcast">broadcast
|
|
address</link> are both entered as <constant>broadcast</constant> type
|
|
addresses on the interface to which they have been bound.
|
|
Conceptually, there is significance to the distinction between a
|
|
network and broadcast address, but practically, they are treated
|
|
analogously, by other networking gear as well as the linux kernel.
|
|
</para>
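<para>
The three special addresses can be derived from nothing more than the
address and prefix length assigned to the interface. The Python sketch
below uses the standard <constant>ipaddress</constant> module to
reproduce that derivation; the function name
<constant>local_table_entries</constant> is made up for this example,
and the sketch models only which entries appear, not their
<constant>proto</constant>, <constant>scope</constant> or
<constant>src</constant> attributes.
</para>
<programlisting>
```python
import ipaddress
from typing import List, Tuple


def local_table_entries(address: str) -> List[Tuple[str, str]]:
    """Model the local-table entries created for one interface address.

    address is given in CIDR form, for example "10.10.20.89/24".
    Returns (route type, destination) pairs.
    """
    iface = ipaddress.ip_interface(address)
    net = iface.network
    return [
        ("broadcast", str(net.network_address)),    # network address
        ("broadcast", str(net.broadcast_address)),  # broadcast address
        ("local", str(iface.ip)),                   # locally hosted IP
    ]


if __name__ == "__main__":
    for route_type, destination in local_table_entries("10.10.20.89/24"):
        print(route_type, destination)
```
</programlisting>
<para>
Calling the function with 192.168.254.254/24 yields the three
additional entries that appear after the second address is added in
the example.
</para>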
|
|
<para>
|
|
There is one other type of route which commonly ends up in the &local;
|
|
routing table. When using &iproute2; NAT, there will
|
|
be entries in the local routing table for each network address
|
|
translation. Refer to
|
|
<xref linkend="ex-tools-ip-route-nat-simple"/> and
|
|
<xref linkend="ex-tools-ip-route-nat-network"/> for example output.
|
|
</para>
|
|
</section>
|
|
<section id="routing-table-main">
|
|
<title>The Main Routing Table</title>
|
|
<indexterm zone="routing-table-main">
|
|
<primary>main routing table</primary>
|
|
<see>routing tables, main</see>
|
|
</indexterm>
|
|
<indexterm zone="routing-table-main">
|
|
<primary>routing tables</primary>
|
|
<secondary>main</secondary>
|
|
</indexterm>
|
|
<para>
|
|
The &main; routing table is the routing table most people think of when
|
|
considering a linux routing table. When no table is specified to an
|
|
<command>ip route</command> command, the kernel assumes the &main;
|
|
routing table. The <command>route</command> command only manipulates
|
|
the &main; routing table.
|
|
</para>
|
|
<para>
|
|
Like the &local; table, the &main; table is populated
|
|
automatically by the kernel when new interfaces are brought up
|
|
with IP addresses. Consult the &main; routing table before and after
|
|
<userinput>ip address add 192.168.254.254/24 brd + dev eth1</userinput>
|
|
in
|
|
<xref linkend="ex-routing-table-local-maint"/> for a concrete example
|
|
of this kernel behaviour. Also, visit
|
|
<link linkend="list-basic-ifconfig-side-effects-up">this summary of
|
|
side effects</link> of interface definition and activation with
|
|
<command>ifconfig</command> or <command>ip address</command>.
|
|
</para>
|
|
<para>
|
|
</para>
|
|
</section>
|
|
</section>
|
|
<section id="routing-rpdb">
|
|
<title>Routing Policy Database (RPDB)</title>
|
|
<indexterm zone="routing-rpdb">
|
|
<primary>routing policy database</primary>
|
|
<see>RPDB</see>
|
|
</indexterm>
|
|
<indexterm zone="routing-rpdb">
|
|
<primary>RPDB</primary>
|
|
</indexterm>
|
|
<para>
|
|
The routing policy database (RPDB) controls the order in which the
|
|
kernel searches through the routing tables. Each rule has a priority,
|
|
and rules are examined sequentially from rule 0 through rule 32767.
|
|
</para>
|
|
<para>
|
|
When a new packet arrives for routing (assuming the routing cache
|
|
is empty), the kernel begins at the highest priority rule in the
|
|
RPDB--rule 0. The kernel iterates over each rule in turn until the
|
|
packet to be routed matches a rule. When this happens the kernel
|
|
follows the instructions in that rule. Typically, this causes the
|
|
kernel to perform a route lookup in a specified routing table. If a
|
|
matching route is found in the routing table, the kernel uses that
|
|
route. If no such route is found, the kernel returns to the RPDB and
continues with the next matching rule, until every rule has been tried.
|
|
</para>
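<para>
The traversal just described can be sketched in a few lines of Python.
This is a simplified model under several assumptions: rules select
only on the source address, each routing table is a plain dictionary
of prefix to next hop, and special route types and the routing cache
are ignored. All names, the rule at priority 1000, and the table
contents are invented for illustration; the priorities 0, 32766 and
32767 are the ones <command>ip rule show</command> reports on an
unmodified system.
</para>
<programlisting>
```python
import ipaddress
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    priority: int                   # lower number is examined first
    matches: Callable[[str], bool]  # selector applied to the source address
    table: str                      # routing table to consult on a match


def lookup_table(table: Dict[str, str], dst: str) -> Optional[str]:
    """Longest-prefix match of dst within a single routing table."""
    addr = ipaddress.ip_address(dst)
    best_len = -1
    best_hop = None
    for prefix, nexthop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, nexthop
    return best_hop


def select_route(rules: List[Rule], tables: Dict[str, Dict[str, str]],
                 src: str, dst: str) -> Optional[Tuple[str, str]]:
    """Walk the rules by priority; the first table with a route wins."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if not rule.matches(src):
            continue  # the rule does not apply to this packet
        nexthop = lookup_table(tables.get(rule.table, {}), dst)
        if nexthop is not None:
            return rule.table, nexthop
        # no route in this table: fall through to the next rule
    return None


def anyone(_src: str) -> bool:
    """Selector that matches every packet."""
    return True


# Hypothetical tables and rules, for illustration only.
TABLES = {
    "5": {"10.79.0.0/16": "192.168.100.1"},
    "main": {"192.168.100.0/24": "direct", "0.0.0.0/0": "192.168.100.254"},
}
RULES = [
    Rule(0, anyone, "local"),
    Rule(1000, lambda src: src == "192.168.100.17", "5"),
    Rule(32766, anyone, "main"),
    Rule(32767, anyone, "default"),
]

if __name__ == "__main__":
    print(select_route(RULES, TABLES, "192.168.100.17", "10.79.3.4"))
    print(select_route(RULES, TABLES, "192.168.100.17", "203.0.113.9"))
```
</programlisting>
<para>
In the second call the packet matches the rule at priority 1000, but
table 5 holds no route for the destination, so the search continues
and the default route in the main table is used.
</para>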
|
|
<para>
|
|
The priority-based rule system provides a flexible way to define routes
|
|
while taking advantage of the traditional routing table concept.
|
|
For a complete picture of the entire route selection process including
|
|
the RPDB, see
|
|
<link linkend="routing-selection-adv">the section on routing
|
|
selection</link>.
|
|
</para>
|
|
<para>
|
|
There are a number of different rule types available for use in the
|
|
routing policy database. These rule types have a striking similarity to
|
|
the
|
|
<link linkend="list-routing-route-types">route types</link> available
|
|
for route entries.
|
|
</para>
|
|
<variablelist id="list-routing-rule-types">
|
|
<varlistentry id="list-routing-rule-types-unicast">
|
|
<term>unicast</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-rule-types-unicast">
|
|
<primary>RPDB</primary>
|
|
<secondary>entry types</secondary>
|
|
<tertiary>unicast</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
A unicast rule entry is the most common rule type. This rule type
|
|
simply causes the kernel to refer to the specified routing table
|
|
in the search for a route. If no rule type is specified on the
|
|
command line, the rule is assumed to be a unicast rule.
|
|
</para>
|
|
<example id="ex-list-rule-unicast">
|
|
<title>unicast rule type</title>
|
|
<programlisting>
|
|
<userinput>ip rule add unicast from 192.168.100.17 table 5</userinput>
|
|
<userinput>ip rule add unicast iif eth7 table 5</userinput>
|
|
<userinput>ip rule add unicast fwmark 4 table 4</userinput>
|
|
</programlisting>
|
|
</example>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-rule-types-nat">
|
|
<term>nat</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-rule-types-nat">
|
|
<primary>RPDB</primary>
|
|
<secondary>entry types</secondary>
|
|
<tertiary>nat</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
The nat rule type is required for correct operation of stateless
|
|
NAT. This rule is typically coupled with a corresponding nat
|
|
route entry. The RPDB nat entry causes the kernel to rewrite the
|
|
source address of an outbound packet. See
|
|
<xref linkend="nat-stateless"/> for a fuller discussion of network
|
|
address translation in general.
|
|
</para>
|
|
<example id="ex-list-rule-nat">
|
|
<title>nat rule type</title>
|
|
<programlisting>
|
|
<userinput>ip rule add nat 193.7.255.184 from 172.16.82.184</userinput>
|
|
<userinput>ip rule add nat 10.40.0.0 from 172.40.0.0/16</userinput>
|
|
</programlisting>
|
|
</example>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-rule-types-unreachable">
|
|
<term>unreachable</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-rule-types-unreachable">
|
|
<primary>RPDB</primary>
|
|
<secondary>entry types</secondary>
|
|
<tertiary>unreachable</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
Any route lookup matching a rule entry with an unreachable rule
|
|
type will cause the kernel to generate an ICMP unreachable to
|
|
the source address of the packet.
|
|
</para>
|
|
<example id="ex-list-rule-unreachable">
|
|
<title>unreachable rule type</title>
|
|
<programlisting>
|
|
<userinput>ip rule add unreachable iif eth2 tos 0xc0</userinput>
|
|
<userinput>ip rule add unreachable iif wan0 fwmark 5</userinput>
|
|
<userinput>ip rule add unreachable from 192.168.7.0/25</userinput>
|
|
</programlisting>
|
|
</example>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-rule-types-prohibit">
|
|
<term>prohibit</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-rule-types-prohibit">
|
|
<primary>RPDB</primary>
|
|
<secondary>entry types</secondary>
|
|
<tertiary>prohibit</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
Any route lookup matching a rule entry with a prohibit rule type
|
|
will cause the kernel to generate an ICMP prohibited to the source
|
|
address of the packet.
|
|
</para>
|
|
<example id="ex-list-rule-prohibit">
|
|
<title>prohibit rule type</title>
|
|
<programlisting>
|
|
<userinput>ip rule add prohibit from 209.10.26.51</userinput>
|
|
<userinput>ip rule add prohibit to 64.65.64.0/18</userinput>
|
|
<userinput>ip rule add prohibit fwmark 7</userinput>
|
|
</programlisting>
|
|
</example>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry id="list-routing-rule-types-blackhole">
|
|
<term>blackhole</term>
|
|
<listitem>
|
|
<indexterm zone="list-routing-rule-types-blackhole">
|
|
<primary>RPDB</primary>
|
|
<secondary>entry types</secondary>
|
|
<tertiary>blackhole</tertiary>
|
|
</indexterm>
|
|
<para>
|
|
While traversing the RPDB, any route lookup which matches a rule
|
|
with the blackhole rule type will cause the packet to be dropped.
|
|
No ICMP will be sent and no packet will be forwarded.
|
|
</para>
|
|
<example id="ex-list-rule-blackhole">
|
|
<title>blackhole rule type</title>
|
|
<programlisting>
|
|
<userinput>ip rule add blackhole from 209.10.26.51</userinput>
|
|
<userinput>ip rule add blackhole from 172.19.40.0/24</userinput>
|
|
<userinput>ip rule add blackhole to 10.182.17.64/28</userinput>
|
|
</programlisting>
|
|
</example>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
<para>
|
|
The routing policy database provides the core of functionality around
|
|
which the policy routing and advanced routing features can be built.
|
|
</para>
|
|
</section>
|
|
<section id="routing-icmp">
|
|
<title>ICMP and Routing</title>
|
|
<!--
|
|
|
|
#
|
|
# content to add here:
|
|
#
|
|
# ignoring ICMP redirects (sysctl or pf); side effects, how to
|
|
# sending ICMP redirects
|
|
# suppressing generation of ICMP redirects (sysctl); how to
|
|
#
|
|
|
|
-->
|
|
<para>
|
|
ICMP is a very important part of the communication between hosts on
|
|
IP networks. Used by routers and endpoints (clients and servers),
|
|
ICMP communicates error conditions in networks and
|
|
provides a means for endpoints to receive information
|
|
about a network path or requested connection.
|
|
</para>
|
|
<para>
|
|
One of the commonest uses of ICMP by the administrator of a network is
|
|
the use of
|
|
<link linkend="tools-ping"><command>ping</command></link> to detect the
|
|
state of a machine in the network. There are other types of ICMP which
|
|
are used for other inter-computer communication. One other common type
|
|
of ICMP is the ICMP returned by a router or host which is not accepting
|
|
connections. Essentially, the host returns the ICMP as a polite method
|
|
of saying <quote>Go away</quote>.
|
|
</para>
|
|
<para>
|
|
</para>
|
|
<section id="routing-icmp-mtu">
|
|
<title>MTU, MSS, and ICMP</title>
|
|
<!--
|
|
|
|
#
|
|
# content to add here:
|
|
#
|
|
# discuss path MTU, remove MSS; discuss necessary ICMP
|
|
# for communication with other hosts; xref pf-necessary-icmp
|
|
#
|
|
|
|
-->
|
|
<para>
|
|
One important use of ICMP, which is completely transparent
|
|
to most users (and indeed many admins), is the use of ICMP to discover
|
|
the Path Maximum Transmission Unit (PMTU). By discovering the Path MTU
|
|
and transmitting packets no larger than this MTU, a host can
minimize the delay of traffic due to fragmentation, and
(theoretically) attain a more even rate of data transmission. Because
each destination may have a different MTU due to different network
paths, the MTU is a per-route attribute stored in the
|
|
<link linkend="routing-cache">routing cache</link>.
|
|
</para>
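<para>
The arithmetic behind the Path MTU is simple enough to sketch. The
Python fragment below assumes IPv4 and TCP headers of 20 bytes each
(no options), and the link MTU values are invented for illustration;
<constant>path_mtu</constant> and <constant>tcp_mss</constant> are
hypothetical helper names, not part of any tool.
</para>
<programlisting>
```python
from typing import Iterable

IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def path_mtu(link_mtus: Iterable[int]) -> int:
    """The Path MTU is the smallest MTU of any link along the path."""
    return min(link_mtus)


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    # Hypothetical path: one hop has its MTU lowered to 1000 bytes.
    links = [1500, 1000, 1500]
    pmtu = path_mtu(links)
    print("path MTU:", pmtu)
    print("TCP MSS: ", tcp_mss(pmtu))
```
</programlisting>
<para>
A sender that knows only its own interface MTU would try 1500 byte
packets on this path; the ICMP message from the constrained hop is
what lets it learn the smaller value.
</para>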
|
|
<!-- FIXME; make sure to make a full discussion of PMTU -->
|
|
<!--
|
|
|
|
Example from Giovanni Quadriglio. Needs to be incorporated into the
|
|
document.
|
|
|
|
As usual I've forgotten the PMTU example
|
|
|
|
- - Example PMTU - playing with Path MTU Discovering
|
|
|
|
eth = 0 1 0 0
|
|
- - - - - - - - - - - -
|
|
|server| - - - |router| - - - |client|
|
|
- - - - - - - - - - - -
|
|
MTU = 1500 1000 1500 1500
|
|
|
|
|
|
[root@server]# nc -l -p 9999
|
|
[root@router]# ifconfig eth1 mtu 1000
|
|
|
|
Now if on router we issue:
|
|
|
|
[root@client]# tcpdump -i eth0
|
|
|
|
and later on client we issue:
|
|
|
|
[root@client]# cat data | nc server 9999
|
|
|
|
(data is a file of 2000 byte in size for example)
|
|
|
|
we can see router sends the client the ICMP error:
|
|
|
|
server unreachable - need to frag but DF bit set (mtu=1000) !
|
|
|
|
now if PMTU discovery is enabled on client the new packet len. will be
|
|
recalculated with this new MTU in mind so that DF is always set
|
|
and the packet will reach server without being fragmented
|
|
|
|
if on client we had issued:
|
|
[root@client]# sysctl -w net.ipv4.ip_no_pmtu_disc=1
|
|
|
|
PMTU discovery on client would have been disabled. New packets starting from
client
will not have DF bit set and fragmentation will occur during the
path from client to server (i.e. router fragments the packet).
|
|
|
|
It could happen to touch this parameter because of bad ICMP filtering
|
|
on some router.
|
|
|
|
|
|
-->
|
|
<para>
|
|
Path MTU discovery can be quite easily broken if any single hop along the way
|
|
blocks all ICMP. Be sure to allow ICMP unreachable/fragmentation
|
|
needed packets into and out of your network. This will prevent you
|
|
from being one of the unclueful network admins who cause PMTU
|
|
problems.
|
|
</para>
|
|
<!-- FIXME; XREF link to minimum firewall for ICMP -->
|
|
<para>
|
|
</para>
|
|
</section>
|
|
<section id="routing-icmp-redirect">
|
|
<title>ICMP Redirects and Routing</title>
|
|
<para>
|
|
An ICMP redirect is a router's way of communicating
|
|
that there is a better path out of this network or into another one
|
|
than the one the host had chosen. In
|
|
<link linkend="example-network-netmap">the example network</link>,
|
|
&tristan; has a route to the world through &masq-gw; and a route to
|
|
192.168.98.0/24 through &isdn-router;. If &tristan; sends a packet
|
|
for 192.168.98.0/24 to &masq-gw;, the optimal outcome is for
|
|
&masq-gw; to suggest with an ICMP redirect that &tristan; send such
|
|
packets via &isdn-router; instead.
|
|
</para>
|
|
<para>
|
|
By this method, hosts can learn what networks are reachable
|
|
through which routers on the local network segment. ICMP redirect
|
|
messages, however, are easy to forge, and were (at one time) used to
|
|
subvert poorly configured machines. While this is infrequently a
|
|
problem on the Internet today,
|
|
it's still good practice to ignore ICMP redirect
|
|
messages from public networks. Create static routes where
|
|
necessary on private and public networks to
|
|
prevent ICMP redirect messages from being generated on your network.
|
|
</para>
|
|
<para>
|
|
To examine an example of ICMP redirect in action, we simply
|
|
need to send a packet directly from &tristan; to
|
|
&morgan;. We assume that &masq-gw; has a route to 192.168.98.0/24
|
|
via 192.168.99.1 (&isdn-router;), and that &tristan; has no
|
|
such route.
|
|
</para>
|
|
<example id="ex-routing-icmp-redirect">
|
|
<title>ICMP Redirect on the Wire
|
|
<footnote>
|
|
<para>
|
|
Consult <xref linkend="tb-example-network-hosts"/> for details on
|
|
the IP and MAC addresses of the hosts referred to in this
|
|
example.
|
|
</para>
|
|
</footnote>
|
|
</title>
|
|
<programlisting>
|
|
<prompt>[root@tristan]# </prompt><userinput>echo test | nc 192.168.98.82 22</userinput>
|
|
<prompt>[root@tristan]# </prompt><userinput>tcpdump -nneqti eth0</userinput>
|
|
<computeroutput>0:80:c8:f8:4a:51 0:80:c8:f8:5c:71 74: 192.168.99.35.54510 > 192.168.98.82.22: tcp 0 (DF)
|
|
0:80:c8:f8:5c:71 0:80:c8:f8:4a:51 102: 192.168.99.254 > 192.168.99.35: icmp: redirect 192.168.98.82 to host 192.168.99.1 [tos 0xc0]
|
|
0:80:c8:f8:5c:71 0:c0:7b:45:6a:39 74: 192.168.99.35.54510 > 192.168.98.82.22: tcp 0 (DF)</computeroutput>
|
|
</programlisting>
|
|
</example>
|
|
<para>
|
|
There's a great deal of information above, so let's examine the
|
|
important parts. We have the first three packets which passed by our
|
|
NIC as a result of this attempt to establish a session. First, we see
|
|
a packet from &tristan; bound for &morgan; with &tristan;'s source MAC
|
|
and &masq-gw;'s destination MAC. Because &masq-gw; is &tristan;'s
|
|
default gateway, &tristan; will send all packets there.
|
|
</para>
|
|
<para>
|
|
The next packet is the ICMP redirect, informing &tristan; of a
|
|
better route. It includes several pieces of information.
|
|
Implicitly, the source IP indicates what router is suggesting the
|
|
alternate route, and the contents specify what the intended
|
|
destination was, and what the better route is. Note that &masq-gw;
|
|
suggests using 192.168.99.1 (&isdn-router;) as the gateway for this
|
|
destination.
|
|
</para>
|
|
<para>
|
|
The final packet is part of the intended session, but carries the MAC
address of &masq-gw; as its source. &masq-gw; has (courteously) informed us
|
|
that we should not use it as a route for the intended destination, but
|
|
has also (courteously) forwarded the packet as we had requested. In
|
|
this small network, it is acceptable to allow ICMP redirect messages,
|
|
although these should always be dropped at network borders, both
|
|
inbound and outbound.
|
|
</para>
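<para>
The decision &masq-gw; makes in this exchange can be approximated by
two checks: the packet would leave through the interface it arrived
on, and both the sender and the better next hop sit on that
interface's network. The Python sketch below is a simplified model of
those two checks only, not the kernel's implementation; the function
name, the interface name <constant>eth0</constant> and the /24 prefix
length are assumptions made for the example.
</para>
<programlisting>
```python
import ipaddress


def should_send_redirect(in_iface: str, out_iface: str, src_ip: str,
                         next_hop: str, iface_network: str) -> bool:
    """Simplified model of a router's decision to send an ICMP redirect.

    in_iface / out_iface: names of the receiving and forwarding interface
    src_ip:               source address of the packet being forwarded
    next_hop:             gateway the router itself would forward to
    iface_network:        network configured on the receiving interface
    """
    net = ipaddress.ip_network(iface_network)
    same_iface = in_iface == out_iface
    same_segment = (ipaddress.ip_address(src_ip) in net
                    and ipaddress.ip_address(next_hop) in net)
    return same_iface and same_segment


if __name__ == "__main__":
    # Addresses taken from the capture above; eth0 and /24 are assumed.
    print(should_send_redirect("eth0", "eth0", "192.168.99.35",
                               "192.168.99.1", "192.168.99.0/24"))
```
</programlisting>
<para>
If the better next hop were reachable only through a different
interface, the sender could not use it directly and a redirect would
serve no purpose.
</para>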
|
|
<para>
|
|
So, in summary, ICMP redirect messages are not intrinsically dangerous
|
|
or problematic, but they shouldn't exist in well-maintained networks.
|
|
If you happen to see them growing in the shadows of your network, some
|
|
careful observation should show you what hosts are affected and which
|
|
routing tables could use some attention.
|
|
</para>
|
|
</section>
|
|
</section>
|
|
</chapter>
|