IP Routing

Routing is fundamental to the design of the Internet Protocol. IP routing has been cleverly designed to minimize the complexity for leaf nodes and networks. Linux can be used as a leaf node, such as a workstation, where setting the IP address, netmask and default gateway suffices for all routing needs. Alternatively, the same routing subsystem can be used in the core of a network connecting multiple public and private networks.

This chapter begins with the basics of IP routing with Linux: routing to locally connected destinations, routing to destinations through the default gateway, and using Linux as a router. Subsequent topics include the kernel's route selection algorithm, the routing cache, routing tables, the routing policy database, and issues with ICMP and routing.

The scope of this documentation is primarily static routing. Dynamic routing is important to large networks, Internet service providers and backbone providers. This documentation, however, is targeted at smaller networks, particularly networks which use static routing. Nonetheless, the concepts governing how the kernel manipulates a packet and how it makes routing decisions apply equally to dynamic routing environments. The Linux routing subsystem has been designed with large scale networks in mind, without forgetting the need for easy configuration of leaf nodes such as workstations and servers.

Introduction to Linux Routing

The design of IP routing allows for very simple route definitions for small networks, while not hindering the flexibility of routing in complex environments. A key concept in IP routing is the ability to distinguish locally reachable addresses from destinations that are not directly known. Every IP-capable host knows about at least three classes of destination: itself, locally connected computers, and everywhere else.

Most fully-featured IP-aware networked operating systems (all Unix-like operating systems with IP stacks, modern Macintoshes, and modern Windows) include support for the loopback device. This is an IP address and range configured on the host machine itself which allows the machine to talk to itself. Linux systems can communicate over IP on any locally configured IP address, whether on the loopback device or not. This is the first class of destinations: locally hosted addresses.

The second class consists of addresses in the locally connected network segment. Each machine with a connection to an IP network can reach a subset of the entire IP address space on its directly connected network interface.

All other hosts or destination IPs fall into a third class. Any IP which is neither on the machine itself nor locally reachable (i.e. connected to the same media segment) is only reachable through an IP routing device. This routing device must have an IP address in a locally reachable IP address range. All IP networking is a permutation of these three fundamental concepts of reachability.

This list summarizes the three possible classifications for reachability of destination IP addresses from any single source machine.

1. The IP address is reachable on the machine itself. Under Linux this is considered scope host and is used for IPs bound to any network device, including loopback devices, and for the network range of the loopback device. Addresses of this nature are called local IPs or locally hosted IPs.
2. The IP address is reachable on the directly connected link layer medium. Addresses of this type are called locally reachable or (preferably) directly reachable IPs.
3. The IP address is ultimately reachable through a router which is reachable on a directly connected link layer medium. This class of IP addresses is only reachable through a gateway.

As a practical illustration of the above, this partial diagram of the example network shows two machines connected to 192.168.99.0/24. On &tristan; the IP addresses 127.0.0.1 (loopback, not pictured) and 192.168.99.35 are considered locally hosted IP addresses. The directly reachable IP addresses fall inside the 192.168.99.0/24 network. Any other destination addresses are only reachable through a gateway, probably &masq-gw;.

Classes of IP addresses

Before examining the routing system in more detail, there are some terms to identify and define. These terms are general IP networking terms and should be familiar to users who have used IP on other operating systems and networking equipment.

octet
    A single number between decimal 0 and 255, hexadecimal 0x00 and 0xff. An octet is a single byte in size. Examples: 140, 254, 255, 1, 0, 7.

IP address
    A locally unique four octet logical identifier which a machine can use to communicate using the Internet Protocol. This address is determined by combining the network address and the administratively assigned host address. Simply put, the IP address is a unique number identifying a host on a network. Examples: 192.168.99.35, 140.71.38.7, 205.254.210.186.

host address portion
    The rightmost bits (frequently octets) in an IP address which are not a part of the network address: the part of an IP address which identifies the computer on a network independent of the network. Examples: 192.168.1.27/24, 10.10.17.24/8, 172.20.158.75/16.
network address
    A four octet address and network mask identifying the usable range of IP addresses. Conventional and CIDR notations combine the four bare octets with the netmask or prefix length to define this address. Briefly, a network address is the first address in a range, and is reserved to identify the entire network.

    At least one reader (CAO) has pointed out to me that there is ambiguity in the meaning and common usage of the term network address. While occasionally used to refer to a single IP address at the top of a range of addresses, the primary meaning requires the implicit network mask. Historically, this term has always meant the IP address at the top of a range AND the netmask identifying the set of available addresses. Without this latter piece of information, the network address is simply an IP address. Technically, the use of this term to mean a single IP at the top of the range is incorrect, although not uncommon.

    Examples: 192.168.187.0/24, 205.254.211.192/26, 4.20.17.128/255.255.255.248, 10.0.0.0/255.0.0.0, 12.35.17.112/28.

network mask (netmask)
    A four-octet set of bits which, when ANDed with a particular IP address, produces the network address. Combined with a network address or IP address, the netmask identifies the range of IP addresses which are directly reachable. Examples: 255.255.255.0, 255.255.0.0, 255.255.192.0, 255.255.255.224, 255.0.0.0.

prefix length
    An alternate representation of the network mask: a single integer between 0 and 32 identifying the number of significant bits in an IP address or network address. This is the "slash-number" component of a CIDR address. Examples: 4.20.17.0/24, 66.14.17.116/30, 10.158.42.72/29, 10.48.7.198/9, 192.168.154.64/26.

broadcast address
    A four octet address derived by setting every bit in the host address portion of a network address to 1, that is, by ORing the network address with the inverse of the netmask. The broadcast is the highest address in a given network, and is reserved for broadcast traffic. Examples: 192.168.205.255/24, 172.18.255.255/16, 12.7.149.63/26.

These definitions are common to IP networking in general, and are widely understood in the IP networking community. For less terse introductory material on matters of IP network addressing in general, see . As the interdependencies amongst the above definitions show, each term defines a separate part of the relationship between an IP address and its network. A good IP calculator can assist in mastering these IP fundamentals.

Using ipcalc to display IP information

    [user@workstation]$ ipcalc -n 12.7.149.0/26
    Address:   12.7.149.0           00001100.00000111.10010101.00 000000
    Netmask:   255.255.255.192 = 26 11111111.11111111.11111111.11 000000
    Wildcard:  0.0.0.63             00000000.00000000.00000000.00 111111
    =>
    Network:   12.7.149.0/26        00001100.00000111.10010101.00 000000 (Class A)
    Broadcast: 12.7.149.63          00001100.00000111.10010101.00 111111
    HostMin:   12.7.149.1           00001100.00000111.10010101.00 000001
    HostMax:   12.7.149.62          00001100.00000111.10010101.00 111110
    Hosts/Net: 62

A tool such as ipcalc, shown above, can assist in visualizing the relationships among IP addressing concepts. Subsequently, this chapter will introduce some concrete examples of routing in a real network. The example network illustrates this network and all of the addresses involved.
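The netmask, wildcard and broadcast definitions above are simple bitwise operations. The following is a minimal Python sketch, not a replacement for ipcalc, that derives the same values ipcalc printed. It uses only the standard ipaddress module. The function name describe is hypothetical, and the sketch assumes a prefix length of /30 or shorter.

```python
import ipaddress


def describe(cidr: str) -> dict:
    """Compute ipcalc-style values for an IPv4 address in CIDR notation.

    Sketch only: assumes a prefix length of /30 or shorter, because /31 and
    /32 networks have no separate network and broadcast addresses to exclude.
    """
    iface = ipaddress.IPv4Interface(cidr)      # e.g. "12.7.149.0/26"
    mask = int(iface.netmask)                  # netmask as a 32-bit integer
    wildcard = mask ^ 0xFFFFFFFF               # bitwise inverse of the netmask
    network = int(iface.ip) & mask             # IP address AND netmask
    broadcast = network | wildcard             # all host bits set to 1
    prefixlen = iface.network.prefixlen
    return {
        "netmask": str(iface.netmask),
        "prefixlen": prefixlen,
        "wildcard": str(ipaddress.IPv4Address(wildcard)),
        "network": f"{ipaddress.IPv4Address(network)}/{prefixlen}",
        "broadcast": str(ipaddress.IPv4Address(broadcast)),
        "hostmin": str(ipaddress.IPv4Address(network + 1)),
        "hostmax": str(ipaddress.IPv4Address(broadcast - 1)),
        "hosts": broadcast - network - 1,      # usable host addresses
    }
```

Calling describe("12.7.149.0/26") gives the same broadcast (12.7.149.63), HostMin/HostMax (.1 and .62) and host count (62) as the ipcalc listing. Calling describe("192.168.100.17/24") gives the first and last usable addresses quoted for &isolde; in the next section.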

Routing to Locally Connected Networks

Any IP network is defined by two sets of numbers: network address and netmask. By convention, there are two ways to represent these two numbers. Netmask notation is the convention and tradition in IP networking, although the more succinct CIDR notation is gaining popularity.

In the example network, &isolde; has IP address 192.168.100.17. In CIDR notation, &isolde;'s address is 192.168.100.17/24, and in traditional netmask notation, 192.168.100.17/255.255.255.0. Any IP calculator will confirm that the first usable IP address is 192.168.100.1 and the last usable IP address is 192.168.100.254. Importantly, the IP network address, 192.168.100.0/24, is reachable through the directly connected Ethernet interface (refer to classification 2). Therefore, &isolde; should be able to reach any IP address in this range directly on the locally connected Ethernet segment.

Below is the routing table for &isolde;, first shown with the conventional route -n output and then with the ip route show command. Each of these tools conveys the same routing table and operates on the same kernel routing table. For more on the routing table displayed in , consult .

(The route -n output can also be produced with netstat -rn, which is commonly used by administrators who rely on platform-independent behaviour across heterogeneous Unix and Unix-like systems. This traditional routing table output uses conventional netmask notation to denote network size. The routing table output from ip route, by contrast, uses CIDR notation exclusively. Refer to the ip route section for a fuller discussion of this Linux-specific tool.)
Identifying the locally connected networks with route

    [root@isolde]# route -n
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    192.168.100.0   0.0.0.0         255.255.255.0   U     0      0        0 eth0
    127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
    0.0.0.0         192.168.100.254 0.0.0.0         UG    0      0        0 eth0
    [root@isolde]# ip route show
    192.168.100.0/24 dev eth0  scope link
    127.0.0.0/8 dev lo  scope link
    default via 192.168.100.254 dev eth0

In the above example, the locally reachable destination is 192.168.100.0/255.255.255.0, which can also be written 192.168.100.0/24 as in ip route show. In classful networking terms, the network to which &isolde; is directly connected is called a class C sized network.

When a process on &isolde; needs to send a packet to another machine on the locally connected network, packets will be sent from 192.168.100.17 (&isolde;'s IP). The kernel consults the routing table to determine the route and the source address to use when sending this packet. Assuming the destination is 192.168.100.32, the kernel will find that 192.168.100.32 falls inside the IP address range 192.168.100.0/24 and will select this route for the outbound packet. For further details on source address selection, see .

The source address on the outbound packet conveys vital information to the host receiving the packet. For a reply to return, three conditions must hold: &isolde; has to use an IP address that is locally available, 192.168.100.32 has to have a route to &isolde;, and neither host may block the packet. The packet will be sent directly to the locally connected network segment, because &isolde; interprets from the routing table that 192.168.100.32 is directly reachable through the physical network connection on eth0.

Occasionally, a machine will be directly connected to two different IP networks on the same device. The routing table will show that both networks are reachable through the same physical device. For more on this topic, see .
Similarly, multi-homed hosts will have a route for each locally connected network through the corresponding network interface. For more on this sort of configuration, see .

This covers the classification of IP destinations which are available on a locally connected network, and highlights the importance of an accurate netmask and network address. The next section covers IP ranges which are neither locally hosted nor within the locally reachable networks. These destinations must be reached through a router.
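The three reachability classes can be checked mechanically from a host's configured addresses. The following Python sketch does this using &isolde;'s addresses from the routing table above. The function classify and the address list are hypothetical illustrations, not a kernel interface. Following classification 1, the sketch treats the whole loopback range as locally hosted.

```python
import ipaddress

# Hypothetical model of isolde's configured addresses (from the example above).
ISOLDE_ADDRESSES = ["127.0.0.1/8", "192.168.100.17/24"]


def classify(dest: str, addresses=ISOLDE_ADDRESSES) -> str:
    """Return 'local', 'direct' or 'gateway' for a destination IPv4 address."""
    ip = ipaddress.ip_address(dest)
    ifaces = [ipaddress.ip_interface(a) for a in addresses]
    # Class 1: an address hosted on this machine (the whole loopback range counts).
    for iface in ifaces:
        if ip == iface.ip or (iface.ip.is_loopback and ip in iface.network):
            return "local"
    # Class 2: inside a directly connected network.
    for iface in ifaces:
        if ip in iface.network:
            return "direct"
    # Class 3: everything else must go through a gateway.
    return "gateway"
```

With these addresses, 192.168.100.32 classifies as "direct" and 192.168.99.35 as "gateway". This is why an incorrect netmask silently moves destinations between classes 2 and 3.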

Sending Packets Through a Gateway

Compared to the total number of publicly accessible hosts on the Internet, the number of hosts inside any locally reachable network is almost insignificant. This means that the majority of potential destinations are only available via a router.

Any machine which will accept and forward packets between two networks is a router. Every router is at least dual-homed: one interface connects to one network, and a second interface connects to another network. The second interface is frequently an independent NIC, although it might be a virtual interface, such as a VLAN interface. Machines on either network are configured, statically or through a routing protocol, to pass traffic for the other network to the router.

For &tristan;, there are two different paths out of 192.168.99.0/24. One path leads to another leaf network, 192.168.98.0/24, and the other path leads to many networks, including the Internet. The routing table on &tristan; should therefore contain two different routes out of the network. The destination 192.168.98.0/24 will be reachable through 192.168.99.1. So, if &tristan; has a packet with a destination IP address in the range of the branch office network, it will send the packet directly to &isdn-router;. All other non-local destinations are covered by the default route.

The default route is another way to say the route for destination 0/0 (0.0.0.0/0). This is the most general possible route: the catch-all route. If no more specific route exists in a routing table, the default route will be used. Many servers and workstations are connected to leaf networks with only one router. The routing table shown above for &isolde; is therefore a very common sort of routing table: there is a route for localhost, a route for the locally connected IP network, and a default route. For Internet-connected hosts, the default route is customarily set to the IP of the locally reachable router which has a path to the Internet.
Each router in turn has a default gateway pointing to another Internet-connected router until the packet is handed off to an Internet Service Provider's network.

Operating as a Router

Operating as a router allows a Linux machine to accept packets on one interface and transmit them on another. This is the nature of a router. The process of accepting and transmitting IP packets is known as forwarding. IP forwarding is a requirement for many of the networking techniques identified here: stateless NAT and firewalling, transparent proxying and masquerading all require IP forwarding in order to function correctly.

The sysctl net/ipv4/ip_forward toggles the IP forwarding functionality on a Linux box. Note that setting this sysctl alters other routing-related sysctl entries, so it is wise to set it first, and then alter other entries. Administrators frequently forget to enable this sysctl when configuring a new machine to operate as a router, and are then frustrated when the machine does not forward packets.

The sysctl net/ipv4/conf/$DEV/forwarding defaults to the value of net/ipv4/ip_forward, but can be modified independently. It can be used to allow forwarding of packets between two interfaces while prohibiting such behaviour on a third interface.
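The ordering advice above can be captured in a small helper. The following Python sketch writes the sysctls through their /proc/sys files, which requires root on a real system. The helper names (sysctl_path, read_sysctl, write_sysctl, enable_router) are hypothetical. The root parameter exists only so the logic can be exercised against a scratch directory.

```python
from pathlib import Path


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a slash-separated sysctl name, e.g. net/ipv4/ip_forward, to its file."""
    return Path(root, *name.strip("/").split("/"))


def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    return sysctl_path(name, root).read_text().strip()


def write_sysctl(name: str, value: int, root: str = "/proc/sys") -> None:
    sysctl_path(name, root).write_text(f"{value}\n")


def enable_router(no_forward_devs=(), root: str = "/proc/sys") -> None:
    # Global switch first: writing it alters other routing-related sysctls,
    # so per-interface settings made before this point could be overwritten.
    write_sysctl("net/ipv4/ip_forward", 1, root)
    # Then turn forwarding back off for the selected interfaces only.
    for dev in no_forward_devs:
        write_sysctl(f"net/ipv4/conf/{dev}/forwarding", 0, root)
```

Run as root, a call such as enable_router(no_forward_devs=["eth2"]) would forward between the remaining interfaces while refusing to forward packets arriving on the (hypothetical) eth2.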

Route Selection

Crucial to the ability of hosts to exchange IP packets is the correct selection of a route to the destination. Traditionally, routes are selected on a hop-by-hop basis, based solely upon the destination address of the packet. (Label-switching approaches such as MPLS, used for traffic engineering and packet tagging on backbones, are outside the scope of this chapter.) Linux behaves as a conventional routing device in this way, but can also provide a more flexible capability: routes can be chosen and prioritized based on other packet characteristics. The route selection algorithm under Linux has been generalized to support this more flexible scenario without complicating the overwhelmingly common case of destination-only routing.

The Common Case

The above sections on routing to a local network and the default gateway expose the importance of destination address for route selection. In this simplified model, the kernel need only know the destination address of the packet, which it compares against the routing tables to determine the route by which to send the packet.

The kernel searches for a matching entry for the destination first in the routing cache and then in the main routing table. If the machine has recently transmitted a packet to the destination address, the routing cache will contain an entry for the destination. The kernel will then select the same route and transmit the packet accordingly.

If the Linux machine has not recently transmitted a packet to this destination address, it will look up the destination in its routing table using a technique known as longest prefix match (refer to RFC 3222 for further details). In practical terms, longest prefix match means that the most specific route to the destination will be chosen.

The use of longest prefix match allows routes for large networks to be overridden by more specific host or network routes, as required in , for example. Conversely, it is this same property of longest prefix match which allows routes to individual destinations to be aggregated into larger network addresses. Instead of entering individual routes for each host, large numbers of contiguous network addresses can be aggregated. This is the realized promise of CIDR networking. See  for further details.

In the common case, route selection is based completely on the destination address. Conventional (as opposed to policy-based) IP networking relies on only the destination address to select a route for a packet.
Because the majority of Linux systems have no need of policy-based routing features, they use the conventional routing technique of longest prefix match. While this meets most networking needs, a machine operating in this fashion leaves its policy routing features unused.
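Longest prefix match can be sketched in a few lines of Python. The table below mirrors &isolde;'s main routing table. Route, route and lookup are hypothetical names. The linear scan is for clarity only, since the kernel uses far more efficient data structures.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    dest: ipaddress.IPv4Network
    gateway: Optional[str]  # None means directly reachable
    dev: str


def route(dest: str, gateway: Optional[str], dev: str) -> Route:
    return Route(ipaddress.ip_network(dest), gateway, dev)


# isolde's main routing table, as listed by `ip route show` above.
ISOLDE_MAIN = [
    route("192.168.100.0/24", None, "eth0"),
    route("127.0.0.0/8", None, "lo"),
    route("0.0.0.0/0", "192.168.100.254", "eth0"),  # default route
]


def lookup(table: List[Route], dest: str) -> Optional[Route]:
    """Longest prefix match: of all routes containing dest, pick the most specific."""
    ip = ipaddress.ip_address(dest)
    candidates = [r for r in table if ip in r.dest]
    return max(candidates, key=lambda r: r.dest.prefixlen, default=None)
```

Suppose a more specific route is added, for example a hypothetical 205.254.210.0/24 via 192.168.100.1. Traffic for that range then stops using the default route, because /24 beats /0. This is the override-and-aggregate property described above.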

The Whole Story

With the prevalence of low cost bandwidth, easily configured VPN tunnels, and increasing reliance on networks, selecting a route based solely on the destination IP address range no longer suffices for all situations. The discussion of the common case of route selection under Linux neglects one of the most powerful features in the Linux IP stack. Since kernel 2.2, Linux has supported policy-based routing through the use of multiple routing tables and the routing policy database (RPDB). Together, they allow a network administrator to configure a machine to select different routing tables and routes based on a number of criteria.

Selectors available for use in policy-based routing are attributes of a packet passing through the Linux routing code. The following attributes can be used as selectors:
- the source address of the packet;
- the ToS flags;
- an fwmark (a mark carried through the kernel in the data structure representing the packet);
- the name of the interface on which the packet was received.

By selecting a routing table based on packet attributes, an administrator can have granular control over the network path of any packet.

With this knowledge of the RPDB and multiple routing tables, let's revisit in detail the method by which the kernel selects the proper route for a packet. Understanding the series of steps the kernel takes for route selection should demystify advanced routing. In fact, advanced routing could more accurately be called policy-based networking.

When determining the route by which to send a packet, the kernel always consults the routing cache first. The routing cache is a hash table used for quick access to recently used routes. If the kernel finds an entry in the routing cache, that entry will be used. If there is no entry in the routing cache, the kernel begins the process of route selection. For details on the method of matching a route in the routing cache, see .
The kernel iterates by priority through the routing policy database. For each matching entry in the RPDB, the kernel tries to find a matching route to the destination IP address in the specified routing table, using the longest prefix match algorithm described above. When a matching destination is found, the kernel selects the matching route and forwards the packet. If no matching entry is found in the specified routing table, the kernel passes to the next rule in the RPDB. This continues until it finds a match or falls through the end of the RPDB and all consulted routing tables.

Here is a snippet of Python-esque pseudocode to illustrate the kernel's route selection process again. Each of the lookups below occurs in kernel hash tables which are accessible to the user through the use of various &iproute2; tools.

Route selection algorithm in pseudocode

    if packet.routeCacheLookupKey in routeCache:
        route = routeCache[packet.routeCacheLookupKey]
    else:
        for rule in rpdb:                            # rules in priority order
            if packet.rpdbLookupKey in rule:
                routeTable = rule[lookupTable]
                if packet.routeLookupKey in routeTable:
                    route = routeTable[packet.routeLookupKey]
                    break                            # first matching route wins

This pseudocode provides some explanation of the decisions required to find a route. The final piece of information required to understand the decision making process is the lookup process for each of the three hash table lookups. In the table below, each key is listed in order of importance. Some keys are optional; they are matched only when they are present.

Keys used for hash table lookups during route selection

    route cache    RPDB           route table
    -----------    -----------    -----------
    destination    source         destination
    source         destination    ToS
    ToS            ToS            scope
    fwmark         fwmark         oif
    iif            iif
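The pseudocode above can be turned into a runnable model of the RPDB walk. The following Python sketch is simplified. It matches rules only on source, fwmark and iif, and ignores ToS. Its tables hold plain (prefix, next hop) pairs. The "vpn" table and every class and function name are hypothetical. Note the key behaviour: a rule whose table has no route simply passes control to the next rule.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

RouteEntry = Tuple[str, str]  # (destination prefix, next hop or device)


@dataclass
class Packet:
    dst: str
    src: str = "0.0.0.0"
    fwmark: int = 0
    iif: Optional[str] = None


@dataclass
class Rule:
    priority: int
    table: str
    src: str = "0.0.0.0/0"        # "from all"
    fwmark: Optional[int] = None  # None: selector not used
    iif: Optional[str] = None

    def matches(self, pkt: Packet) -> bool:
        if ipaddress.ip_address(pkt.src) not in ipaddress.ip_network(self.src):
            return False
        if self.fwmark is not None and pkt.fwmark != self.fwmark:
            return False
        return self.iif is None or pkt.iif == self.iif


def longest_prefix_match(routes: List[RouteEntry], dst: str) -> Optional[RouteEntry]:
    ip = ipaddress.ip_address(dst)
    hits = [r for r in routes if ip in ipaddress.ip_network(r[0])]
    return max(hits, key=lambda r: ipaddress.ip_network(r[0]).prefixlen, default=None)


def select_route(pkt: Packet, rpdb: List[Rule], tables: Dict[str, List[RouteEntry]]):
    for rule in sorted(rpdb, key=lambda r: r.priority):  # lowest number first
        if not rule.matches(pkt):
            continue
        found = longest_prefix_match(tables.get(rule.table, []), pkt.dst)
        if found is not None:
            return rule.table, found
        # No route in this table: fall through to the next rule.
    return None  # fell off the end of the RPDB


# Default RPDB as shown by `ip rule show`, plus illustrative tables.
DEFAULT_RPDB = [Rule(0, "local"), Rule(32766, "main"), Rule(32767, "default")]
TABLES = {
    "local": [("192.168.100.17/32", "local")],
    "main": [("192.168.100.0/24", "eth0"), ("0.0.0.0/0", "via 192.168.100.254")],
    "default": [],
    "vpn": [("0.0.0.0/0", "via 192.168.100.1")],  # hypothetical policy table
}
```

Adding Rule(100, "vpn", fwmark=1) to DEFAULT_RPDB sends marked packets through the hypothetical vpn table. Unmarked packets still reach the main table at priority 32766.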

The route cache (also the forwarding information base) can be displayed using ip route show cache. The routing policy database (RPDB) can be manipulated with the ip rule utility. Individual route tables can be manipulated and displayed with the ip route command line tool.

Listing the Routing Policy Database (RPDB)

    [root@isolde]# ip rule show
    0:      from all lookup local
    32766:  from all lookup main
    32767:  from all lookup 253

The listing above, taken from a box whose RPDB has not been changed, reveals a high priority rule, rule 0. This rule, created at RPDB initialization, instructs the kernel to try to find a match for the destination in the local routing table. If there is no match for the packet in the local routing table, then, per rule 32766, the kernel will perform a route lookup in the main routing table. Normally, the main routing table will contain a default route if not a more specific route. Failing a route lookup in the main routing table, the final rule (32767) instructs the kernel to perform a route lookup in table 253.

A common mistake when working with multiple routing tables involves forgetting about the statelessness of IP routing. This manifests when the user configuring the policy routing machine accounts for outbound packets (via &fwmark; or ip rule selectors), but forgets to account for the return packets.
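The priority numbers are what order the RPDB walk, and a lower number means the rule is consulted earlier. The following Python sketch reads the simple ip rule show lines into ordered (priority, selector, table) tuples. The function parse_ip_rule is hypothetical. The sketch handles only the "PRIORITY: from SELECTOR lookup TABLE" form shown above and skips lines that use other selectors.

```python
import re
from typing import List, Tuple

# Matches only the simple "PRIORITY: from SELECTOR lookup TABLE" form.
RULE_LINE = re.compile(r"^(\d+):\s+from\s+(\S+)\s+lookup\s+(\S+)$")


def parse_ip_rule(output: str) -> List[Tuple[int, str, str]]:
    """Turn `ip rule show` text into (priority, source selector, table) tuples,
    sorted in the order the RPDB is consulted (lowest priority number first)."""
    rules = []
    for line in output.splitlines():
        m = RULE_LINE.match(line.strip())
        if m:  # lines using other selectors (fwmark, iif, to, ...) are skipped
            rules.append((int(m.group(1)), m.group(2), m.group(3)))
    return sorted(rules)
```

Applied to the listing above, the sketch yields the local, main and 253 lookups in that order.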

Summary

For more ideas on how to use policy routing, how to work with multiple routing tables, and how to troubleshoot, see .

Source Address Selection

The selection of the correct source address is key to correct communication between hosts with multiple IP addresses. If a host chooses an address from a private network to communicate with a public Internet host, it is likely that the return half of the communication will never arrive.

The initial source address for an outbound packet is chosen according to the following series of rules. The application can request a particular IP. Otherwise, the kernel will use the &src; hint from the chosen route path. Lacking this hint, the kernel will choose the first address configured on the interface which falls in the same network as the destination address or the nexthop router.

(Many networking applications accept a command line option to prefer a particular source address. The call to select a particular IP is known as bind(), so the command line option frequently contains the word bind, e.g., . Examples of command line tools allowing specification of the source address are nc -s $BINDADDR $DEST $PORT or socat - TCP4:$REMOTEHOST:$REMOTEPORT,bind=$BINDADDR. In the &src; case, the route has already been selected (see ), and the chosen route entry includes a hint for the preferred source address on outbound packets specifically for this purpose. For examples on configuring the routing tables to include this parameter, see .)

The following list recapitulates the manner by which the kernel determines the source address of an outbound packet.

1. The application is already using the socket, in which case the source address has already been chosen. Also, the application can specifically request a particular address (not necessarily a locally hosted IP; see ) using the bind call.
2. The kernel performs a route lookup and finds an outbound route for the destination. If the route contains the &src; parameter, the kernel selects this IP address for the outbound packet.
3. Lacking a &src; hint, the kernel chooses the first address configured on the outbound interface which falls in the same network as the destination address or the nexthop router.

Also refer to this excerpt from the &iproute2; command reference.
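The precedence among these rules can be modelled directly. The following Python sketch assumes the caller already knows any bind() address, the src hint of the chosen route, and the addresses on the outbound interface in configured order. The function choose_source and its parameters are hypothetical. The sketch returns None where the rules above give no answer, rather than guessing at further kernel fallbacks.

```python
import ipaddress
from typing import Optional, Sequence


def choose_source(dest: str,
                  iface_addrs: Sequence[str],
                  bind_addr: Optional[str] = None,
                  route_src: Optional[str] = None,
                  nexthop: Optional[str] = None) -> Optional[str]:
    """Pick a source address following the three rules described above.

    iface_addrs: addresses (CIDR form) on the outbound interface, in configured order.
    """
    if bind_addr:                    # 1. the application called bind()
        return bind_addr
    if route_src:                    # 2. the chosen route carries a src hint
        return route_src
    targets = [ipaddress.ip_address(dest)]
    if nexthop:
        targets.append(ipaddress.ip_address(nexthop))
    for addr in iface_addrs:         # 3. first address sharing a network
        iface = ipaddress.ip_interface(addr)
        if any(t in iface.network for t in targets):
            return str(iface.ip)
    return None                      # not covered by this sketch
```

For &isolde; sending to an Internet host via 192.168.100.254, rule 3 picks 192.168.100.17, because the nexthop shares its network. A route carrying a src hint would override that choice.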

Routing Cache

The routing cache is also known as the forwarding information base (FIB). This term may be familiar to users of other routing systems. The routing cache stores recently used routing entries in a fast and convenient hash lookup table, and is consulted before the routing tables. If the kernel finds a matching entry during route cache lookup, it will forward the packet immediately and stop traversing the routing tables.

The kernel maintains the routing cache separately from the routing tables. Manipulating the routing tables may therefore not have an immediate effect on the kernel's choice of path for a given packet. To avoid a non-deterministic lag between entering a new route into the kernel routing tables and the next lookup in those tables, use ip route flush cache. Once the route cache has been emptied, new route lookups go to the kernel routing tables, whether they are triggered by a packet or performed manually with ip route get.

The following is a listing of the hash lookup keys in the routing cache and a description of each key. Compare this list with the elements identified in .

dst (Destination Address)
    The destination IP address of the packet at the time of the route lookup. The address is a host address; all 32 bits are significant during this lookup.

src (Source Address)
    The source IP address of the packet at the time of the route lookup. The address is a host address; all 32 bits are significant during this lookup.

tos (Type of Service)
    The ToS marking on the packet. If there is no ToS marking on the packet (tos == 0), this lookup key is unused. If there is a ToS marking, the kernel will search for a match with this ToS value. If no matching (dst, src, tos) is found, the kernel will continue the search for a route by traversing the RPDB.

fwmark
    The mark added to a packet administratively by the packet filtering engine (ipchains or iptables). This mark is not part of the physical IP packet. It exists only in the in-memory data structure that represents the packet on the routing device. If there is no fwmark on the packet, this lookup key is unused. When present, the kernel will search for a matching (dst, src, tos?, fwmark) entry. If no matching entry is found, the kernel will continue the search for a route by traversing the RPDB.

iif (inbound interface)
    The name of the interface on which the packet arrived.

The following attributes may be stored for each entry in the routing cache.

    cwnd      Congestion Window
    advmss    Advertised Maximum Segment Size
    src       (Preferred Local) Source Address
    mtu       Maximum Transmission Unit
    rtt       Round Trip Time
    rttvar    Round Trip Time Variation
    age
    users
    used

Collectively the hash keys uniquely identify routes in the forwarding information base (routing cache), and each entry provides attributes of the route.
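The cache behaviour described above, exact-key hits and stale entries until a flush, can be demonstrated with a toy model. The following Python sketch is an illustration, not the kernel's hash table. RouteCache and slow_lookup are hypothetical names. The key tuple mirrors the five lookup keys listed above, and exact host addresses serve as keys with no prefix matching.

```python
from typing import Callable, Dict, Optional, Tuple

# (dst, src, tos, fwmark, iif): exact host addresses, no prefix matching.
CacheKey = Tuple[str, str, int, int, Optional[str]]


class RouteCache:
    def __init__(self, slow_lookup: Callable[[CacheKey], str]):
        self._entries: Dict[CacheKey, str] = {}
        self._slow_lookup = slow_lookup  # stands in for the RPDB/routing table walk
        self.misses = 0

    def get(self, dst: str, src: str, tos: int = 0, fwmark: int = 0,
            iif: Optional[str] = None) -> str:
        key = (dst, src, tos, fwmark, iif)
        if key not in self._entries:     # miss: do the full lookup once
            self.misses += 1
            self._entries[key] = self._slow_lookup(key)
        return self._entries[key]        # hit: routing tables are not consulted

    def flush(self) -> None:             # analogous to `ip route flush cache`
        self._entries.clear()
```

If the model's routing table changes after a destination has been cached, get keeps returning the old next hop until flush is called. This is the lag that ip route flush cache avoids.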

Routing Tables routing tables multiple Linux kernel 2.2 and 2.4 support multiple routing tables The kernel must be compiled with the option CONFIG_IP_MULTIPLE_TABLES=y. This is common in vendor and stock kernels, both 2.2 and 2.4. . Beyond the two commonly used routing tables (the local and main routing tables), the kernel supports up to 252 additional routing tables. The multiple routing table system provides a flexible infrastructure on top of which to implement policy routing. By allowing multiple traditional routing tables (keyed primarily to destination address) to be combined with the routing policy database (RPDB) (keyed primarily to source address), the kernel supports a well-known and well-understood interface while simultaneously expanding and extending its routing capabilities. Each routing table still operates in the traditional and expected fashion. Linux simply allows you to choose from a number of routing tables, and to traverse routing tables in a user-definable sequence until a matching route is found. routing tables key fields Any given routing table can contain an arbitrary number of entries, each of which is keyed on the following characteristics (cf. ) destination address; a network or host address (primary key) tos; Type of Service scope output interface For practical purposes, this means that (even) a single routing table can contain multiple routes to the same destination if the ToS differs on each route or if the route applies to a different interface If somebody has used scope or oif as additional keys in a routing table, and has an example, I'd love to see it, for possible inclusion in this documentation. . Kernels supporting multiple routing tables refer to routing tables by unique integer slots between 0 and 255 Can anybody describe to me what is in table 0? It looks almost like an aggregation of the routing entries in routing tables 254 and 255. . 
The two routing tables normally employed are table 255, the local routing table, and table 254, the main routing table. For examples of using multiple routing tables, see , in particular, , and . Also be sure to read and .

The ip route and ip rule commands have built-in support for the special tables main and local. Any other routing table can be referred to by number or by a name from an administratively maintained mapping file, /etc/iproute2/rt_tables. The format of this file is extraordinarily simple. Each line maps an arbitrary string to an integer, and comments are allowed.

Typical content of /etc/iproute2/rt_tables

    #
    # reserved values
    #
    255     local
    254     main
    253     default
    0       unspec
    #
    # local
    #
    1       inr.ruhep

255 local: The local table is a special routing table maintained by the kernel. Users can add or remove entries in the local routing table, but only at their own risk. The file /etc/iproute2/rt_tables need not exist, as the iproute2 tools have a hard-coded entry for the local table.

254 main: The main routing table is the table operated upon by route and, when not otherwise specified, by ip route. The file /etc/iproute2/rt_tables need not exist, as the iproute2 tools have a hard-coded entry for the main table.

253 default: The default routing table is another reserved table. It is empty unless the administrator populates it, and it is consulted by the last of the default RPDB rules (priority 32767), after the local and main tables.

0 unspec: When listing routes, ip route show table all (table 0) displays the entries from every routing table at once, which is why its output resembles an aggregation of tables 254 and 255 on a typical machine.

1 inr.ruhep: This example indicates that table 1 is known by the name inr.ruhep. Any reference to table inr.ruhep in an ip rule or ip route command will substitute the value 1 for the word inr.ruhep.

The routing table manipulated by the conventional route command is the main routing table. Additionally, using either ip address or ifconfig causes the kernel to alter the local routing table (and usually the main routing table).
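The name-to-number mapping is simple enough to sketch. The Python below is a hypothetical parser for rt_tables-style text, assuming one "number name" pair per line and `#` comments as in the listing above; following the text, only local and main are treated as built in. It illustrates the file format and is not iproute2's own parser.

```python
from __future__ import annotations

# Names the text says the iproute2 tools know even without the file.
BUILTIN_TABLES = {"local": 255, "main": 254}


def parse_rt_tables(text: str) -> dict[str, int]:
    """Map table names to numbers from rt_tables-style text."""
    tables = dict(BUILTIN_TABLES)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        number, name = line.split(None, 1)
        value = int(number)
        if not 0 <= value <= 255:
            raise ValueError(f"table number out of range: {value}")
        tables[name.strip()] = value
    return tables


def resolve_table(arg: str, tables: dict[str, int]) -> int:
    """Accept either a table number or a mapped name."""
    if arg.isdigit():
        return int(arg)
    return tables[arg]


SAMPLE = """\
#
# reserved values
#
255     local
254     main
253     default
0       unspec
#
# local
#
1       inr.ruhep
"""

if __name__ == "__main__":
    print(resolve_table("inr.ruhep", parse_rt_tables(SAMPLE)))  # 1
```

With the sample file, a rule that names table inr.ruhep resolves to table 1, exactly as if the number had been typed.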
For further documentation on how to manipulate the other routing tables, see the command description of ip route.

Routing Table Entries (Routes)

Each routing table can contain an arbitrary number of route entries. Aside from the local routing table, which is maintained by the kernel, and the main routing table, which is partially maintained by the kernel, all routing tables are controlled by the administrator or by routing software. All routes on a machine can be changed or removed. (Once again, I recommend caution when altering the local routing table. Removing local route types from the local routing table can break networking in strange and wonderful ways.)

Each of the following route types is available for use with the ip route command. Each route type causes a particular behaviour, which is described in its entry. Compare the route types described below with the rule types available for use in the RPDB.

unicast
A unicast route is the most common route in routing tables. This is a typical route to a destination network address, which describes the path to the destination. Even complex routes, such as nexthop routes, are considered unicast routes. If no route type is specified on the command line, the route is assumed to be a unicast route.

unicast route types

    ip route add unicast 192.168.0.0/24 via 192.168.100.5
    ip route add default via 193.7.255.1
    ip route add unicast default via 206.59.29.193
    ip route add 10.40.0.0/16 via 10.72.75.254

broadcast
This route type is used for link layer devices (such as Ethernet cards) which support the notion of a broadcast address. This route type is used only in the local routing table and is typically handled by the kernel. (I believe, but have not yet verified, that the broadcast route type cannot be used in other routing tables.)
broadcast route types

    ip route add table local broadcast 10.10.20.255 dev eth0 proto kernel scope link src 10.10.20.67
    ip route add table local broadcast 192.168.43.31 dev eth4 proto kernel scope link src 192.168.43.14

local
The kernel adds entries to the local routing table when IP addresses are added to an interface. This means that the IPs are locally hosted IPs. (As with broadcast routes, I'm not sure that local route types can be used in any routing table other than the local routing table.)

local route types

    ip route add table local local 10.10.20.64 dev eth0 proto kernel scope host src 10.10.20.67
    ip route add table local local 192.168.43.12 dev eth4 proto kernel scope host src 192.168.43.14

nat
This route entry is added by the kernel in the local routing table when the user attempts to configure stateless NAT. See for a fuller discussion of network address translation in general. (Likewise, nat route types might be ineffectual outside the local routing table.)

nat route types

    ip route add nat 193.7.255.184 via 172.16.82.184
    ip route add nat 10.40.0.0/16 via 172.40.0.0

unreachable
When a request for a routing decision returns a destination with an unreachable route type, an ICMP unreachable is generated and returned to the source address.

unreachable route types

    ip route add unreachable 172.16.82.184
    ip route add unreachable 192.168.14.0/26
    ip route add unreachable 209.10.26.51

prohibit
When a request for a routing decision returns a destination with a prohibit route type, the kernel generates an ICMP prohibited message and returns it to the source address.

prohibit route types

    ip route add prohibit 10.21.82.157
    ip route add prohibit 172.28.113.0/28
    ip route add prohibit 209.10.26.51

blackhole
A packet matching a route with the route type blackhole is discarded.
No ICMP is sent and no packet is forwarded.

blackhole route types

    ip route add blackhole default
    ip route add blackhole 202.143.170.0/24
    ip route add blackhole 64.65.64.0/18

throw
The throw route type is a convenient route type which causes a route lookup in a routing table to fail, returning the route selection process to the RPDB. This is useful when there are additional routing tables. Note that there is an implicit throw if no default route exists in a routing table, so the route created by the first command in the example is superfluous, although legal.

throw route types

    ip route add throw default
    ip route add throw 10.79.0.0/16
    ip route add throw 172.16.0.0/12

The power of these route types when combined with the routing policy database can hardly be overstated. All of these route types can be used without the RPDB, although the throw route doesn't make much sense outside of a multiple routing table installation.
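The route types above differ mainly in what happens once a lookup lands on them. This small Python table summarizes that as a sketch; the `Outcome` names are hypothetical labels for the behaviours described in the text, and broadcast and nat entries are deliberately left out of the model.

```python
from __future__ import annotations

from enum import Enum
from typing import Optional


class Outcome(Enum):
    FORWARD = "forward along the route"
    DELIVER_LOCALLY = "deliver to a locally hosted IP"
    ICMP_UNREACHABLE = "discard and send ICMP unreachable to the source"
    ICMP_PROHIBITED = "discard and send ICMP prohibited to the source"
    DISCARD_SILENTLY = "discard; no ICMP, nothing forwarded"
    BACK_TO_RPDB = "lookup fails; continue with the next RPDB rule"


# broadcast and nat are left out of this simplified model.
ROUTE_TYPE_OUTCOMES = {
    "unicast": Outcome.FORWARD,
    "local": Outcome.DELIVER_LOCALLY,
    "unreachable": Outcome.ICMP_UNREACHABLE,
    "prohibit": Outcome.ICMP_PROHIBITED,
    "blackhole": Outcome.DISCARD_SILENTLY,
    "throw": Outcome.BACK_TO_RPDB,
}


def route_type(explicit_type: Optional[str]) -> str:
    """A route added without a type is a unicast route."""
    return explicit_type or "unicast"


def outcome(matched_type: Optional[str]) -> Outcome:
    """matched_type is the type of the matching route, or None if none matched."""
    if matched_type is None:
        return Outcome.BACK_TO_RPDB  # the implicit throw
    return ROUTE_TYPE_OUTCOMES[matched_type]
```

Note how an explicit throw and a failed lookup end in the same place: back in the RPDB, which is why `ip route add throw default` is legal but superfluous.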

The Local Routing Table

The local routing table is maintained by the kernel. Normally, the local routing table should not be manipulated, but it is available for viewing. In the example below, you'll see two of the common uses of the local routing table. The first common use is the specification of broadcast addresses, necessary only for link layers which support broadcast addressing. The second common type of entry in a local routing table is a route to a locally hosted IP. The route types found in the local routing table are local, nat and broadcast. These route types are not relevant in other routing tables, and other route types cannot be used in the local routing table. If the machine has several IP addresses on one Ethernet interface, there will be a route to each locally hosted IP in the local routing table. This is a normal side effect of bringing up an IP address on an interface under linux. Maintenance of the broadcast and local routes in the local routing table is normally left to the kernel.
Kernel maintenance of the local routing table

    [root@real-server]# ip address show dev eth1
    6: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
        link/ether 00:80:c8:e8:1e:fc brd ff:ff:ff:ff:ff:ff
        inet 10.10.20.89/24 brd 10.10.20.255 scope global eth1
    [root@real-server]# ip route show dev eth1
    10.10.20.0/24 proto kernel scope link src 10.10.20.89
    [root@real-server]# ip route show dev eth1 table local
    broadcast 10.10.20.0 proto kernel scope link src 10.10.20.89
    broadcast 10.10.20.255 proto kernel scope link src 10.10.20.89
    local 10.10.20.89 proto kernel scope host src 10.10.20.89
    [root@real-server]# ip address add 192.168.254.254/24 brd + dev eth1
    [root@real-server]# ip address show dev eth1
    6: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
        link/ether 00:80:c8:e8:1e:fc brd ff:ff:ff:ff:ff:ff
        inet 10.10.20.89/24 brd 10.10.20.255 scope global eth1
        inet 192.168.254.254/24 brd 192.168.254.255 scope global eth1
    [root@real-server]# ip route show dev eth1
    10.10.20.0/24 proto kernel scope link src 10.10.20.89
    192.168.254.0/24 proto kernel scope link src 192.168.254.254
    [root@real-server]# ip route show dev eth1 table local
    broadcast 10.10.20.0 proto kernel scope link src 10.10.20.89
    broadcast 192.168.254.0 proto kernel scope link src 192.168.254.254
    broadcast 10.10.20.255 proto kernel scope link src 10.10.20.89
    local 192.168.254.254 proto kernel scope host src 192.168.254.254
    local 10.10.20.89 proto kernel scope host src 10.10.20.89
    broadcast 192.168.254.255 proto kernel scope link src 192.168.254.254

Note in this example that the kernel adds not only the route for the locally connected network in the main routing table, but also the three required special addresses in the local routing table. Any IP addresses which are locally hosted on the box will have local entries in the local table. The network address and broadcast address are both entered as broadcast type addresses on the interface to which they have been bound.
Conceptually, there is significance to the distinction between a network address and a broadcast address, but in practice both are treated analogously by other networking gear as well as by the linux kernel. There is one other type of route which commonly ends up in the local routing table. When using iproute2 NAT, there will be entries in the local routing table for each network address translation. Refer to and for example output.
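The bookkeeping shown in the transcript is mechanical, so it can be reproduced with Python's `ipaddress` module. The sketch below is hypothetical: it assumes `brd +` (an all-ones broadcast address) and a prefix of /30 or shorter, and it mirrors the 2.4-era output above rather than the exact behaviour of every kernel version. It formats entries the way `ip route show dev eth1` prints them, without the `dev` field.

```python
from __future__ import annotations

import ipaddress


def kernel_routes_for(address_with_prefix: str) -> dict[str, list[str]]:
    """Entries added when an IPv4 address is brought up (simplified model)."""
    iface = ipaddress.IPv4Interface(address_with_prefix)
    net, ip = iface.network, iface.ip
    if net.prefixlen > 30:
        raise ValueError("this sketch only models prefixes of /30 or shorter")
    return {
        # main table: the directly connected network
        "main": [f"{net} proto kernel scope link src {ip}"],
        # local table: network and broadcast addresses (both of type
        # broadcast) plus the locally hosted IP itself
        "local": [
            f"broadcast {net.network_address} proto kernel scope link src {ip}",
            f"broadcast {net.broadcast_address} proto kernel scope link src {ip}",
            f"local {ip} proto kernel scope host src {ip}",
        ],
    }
```

Calling `kernel_routes_for("192.168.254.254/24")` yields the one main entry and three local entries that appeared in the transcript after `ip address add 192.168.254.254/24 brd + dev eth1`.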

The Main Routing Table

The main routing table is the routing table most people think of when considering a linux routing table. When no table is specified to an ip route command, the main routing table is assumed. The route command only manipulates the main routing table. As with the local table, the main table is populated automatically by the kernel when new interfaces are brought up with IP addresses. For a concrete example of this kernel behaviour, consult the main routing table before and after ip address add 192.168.254.254/24 brd + dev eth1 in the example of kernel maintenance of the local routing table. Also see the summary of side effects of interface definition and activation with ifconfig or ip address.

Routing Policy Database (RPDB)

The routing policy database (RPDB) controls the order in which the kernel searches through the routing tables. Each rule has a priority, and rules are examined sequentially from rule 0 through rule 32767. When a new packet arrives for routing (assuming the routing cache holds no matching entry), the kernel begins at the highest priority rule in the RPDB, rule 0. The kernel iterates over each rule in turn until the packet to be routed matches a rule. When this happens, the kernel follows the instructions in that rule. Typically, this causes the kernel to perform a route lookup in a specified routing table. If a matching route is found in the routing table, the kernel uses that route. If no such route is found, the kernel returns to the RPDB and continues with the next rule, until every option has been exhausted.

The priority-based rule system provides a flexible way to define routes while taking advantage of the traditional routing table concept. For a complete picture of the entire route selection process including the RPDB, see the section on route selection.

There are a number of different rule types available for use in the routing policy database. These rule types bear a striking similarity to the route types available for route entries.

unicast
A unicast rule entry is the most common rule type. This rule type simply causes the kernel to refer to the specified routing table in the search for a route. If no rule type is specified on the command line, the rule is assumed to be a unicast rule.

unicast rule type

    ip rule add unicast from 192.168.100.17 table 5
    ip rule add unicast iif eth7 table 5
    ip rule add unicast fwmark 4 table 4

nat
The nat rule type is required for correct operation of stateless NAT. This rule is typically coupled with a corresponding nat route entry. The RPDB nat entry causes the kernel to rewrite the source address of an outbound packet.
See for a fuller discussion of network address translation in general.

nat rule type

    ip rule add nat 193.7.255.184 from 172.16.82.184
    ip rule add nat 10.40.0.0 from 172.40.0.0/16

unreachable
Any route lookup matching a rule entry with an unreachable rule type will cause the kernel to generate an ICMP unreachable to the source address of the packet.

unreachable rule type

    ip rule add unreachable iif eth2 tos 0xc0
    ip rule add unreachable iif wan0 fwmark 5
    ip rule add unreachable from 192.168.7.0/25

prohibit
Any route lookup matching a rule entry with a prohibit rule type will cause the kernel to generate an ICMP prohibited to the source address of the packet.

prohibit rule type

    ip rule add prohibit from 209.10.26.51
    ip rule add prohibit to 64.65.64.0/18
    ip rule add prohibit fwmark 7

blackhole
While traversing the RPDB, any route lookup which matches a rule with the blackhole rule type will cause the packet to be dropped. No ICMP will be sent and no packet will be forwarded.

blackhole rule type

    ip rule add blackhole from 209.10.26.51
    ip rule add blackhole from 172.19.40.0/24
    ip rule add blackhole to 10.182.17.64/28

The routing policy database provides the core functionality around which the policy routing and advanced routing features are built.
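The traversal described above (rules in priority order, a lookup in each referenced table, and a return to the RPDB on a miss or throw) can be sketched in Python. This is a simplified model under stated assumptions: rules match on source prefix only (no tos, fwmark or iif), nat rules and the routing cache are ignored, each table is a plain list searched by longest prefix, and the names `Route`, `Rule` and `select_route` are hypothetical. The three default rules (local, main, default) follow the table numbers given earlier.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Route:
    dst: str                 # network prefix, "0.0.0.0/0" for default
    via: Optional[str] = None
    type: str = "unicast"


@dataclass
class Rule:
    priority: int
    table: Optional[int] = None
    src: str = "0.0.0.0/0"   # "from all"
    type: str = "unicast"


# Default RPDB: local (255), then main (254), then default (253).
DEFAULT_RULES = [Rule(0, table=255), Rule(32766, table=254), Rule(32767, table=253)]


def lookup_table(routes: list[Route], dst: str) -> Optional[Route]:
    """Longest-prefix match within one table; None means nothing matched."""
    addr = ipaddress.ip_address(dst)
    best: Optional[Route] = None
    best_len = -1
    for route in routes:
        net = ipaddress.ip_network(route.dst)
        if addr in net and net.prefixlen > best_len:
            best, best_len = route, net.prefixlen
    return best


def select_route(rules: list[Rule], tables: dict[int, list[Route]],
                 src: str, dst: str) -> Union[Route, str]:
    for rule in sorted(rules, key=lambda r: r.priority):  # rule 0 first
        if ipaddress.ip_address(src) not in ipaddress.ip_network(rule.src):
            continue  # this rule's selector does not match the packet
        if rule.type in ("unreachable", "prohibit", "blackhole"):
            return rule.type  # terminal rule types; no table is consulted
        hit = lookup_table(tables.get(rule.table, []), dst)
        if hit is None or hit.type == "throw":
            continue  # explicit or implicit throw: back to the RPDB
        return hit
    return "unreachable"  # every rule exhausted: no route to the destination
```

For example, with a rule sending traffic from 192.168.100.17 to table 5, and table 5 containing only 10.0.0.0/8 plus `throw default`, packets from that host to 10.1.2.3 use table 5, while packets to 8.8.8.8 are thrown back and fall through to the main table's default route.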

ICMP and Routing

ICMP is a very important part of the communication between hosts on IP networks. Used by routers and endpoints (clients and servers), ICMP communicates error conditions in networks and provides a means for endpoints to receive information about a network path or requested connection. One of the commonest uses of ICMP by a network administrator is running ping to detect the state of a machine in the network. Other types of ICMP are used for other inter-computer communication. One other common type is the ICMP returned by a router or host which is not accepting connections. Essentially, the host returns the ICMP as a polite way of saying "Go away."

MTU, MSS, and ICMP

One important use of ICMP, which is completely transparent to most users (and indeed many admins), is the discovery of the Path Maximum Transmission Unit (PMTU). By discovering the Path MTU and transmitting packets of that size, a host can minimize the delay of traffic due to fragmentation, and (theoretically) attain a more even rate of data transmission. Because each destination may have a different MTU due to different network paths, the MTU is a per-route attribute stored in the routing cache. Path MTU discovery can be quite easily broken if any single hop along the way blocks all ICMP. Be sure to allow ICMP unreachable/fragmentation needed packets into and out of your network. This will prevent you from being one of the unclueful network admins who cause PMTU problems.
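The relationship between MTU and MSS is plain arithmetic, sketched below in Python. It assumes IPv4 and TCP headers without options (20 bytes each); the link MTUs in the usage note are hypothetical, chosen only to show that the smallest link on the path determines the Path MTU.

```python
from __future__ import annotations

IPV4_HEADER_BYTES = 20  # IPv4 header without options
TCP_HEADER_BYTES = 20   # TCP header without options


def path_mtu(link_mtus: list[int]) -> int:
    """The Path MTU is the smallest MTU of any link along the path."""
    return min(link_mtus)


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES
```

On a hypothetical path whose links have MTUs of 1500, 1492 and 1500 bytes, the Path MTU is 1492 and the corresponding MSS is 1452 rather than the 1460 a host would assume from its own 1500-byte Ethernet link.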

ICMP Redirects and Routing

An ICMP redirect is a router's way of communicating that there is a better path out of this network, or into another one, than the one the host had chosen. In the example network, the world is reachable through masq-gw and 192.168.98.0/24 is reachable through isdn-router. If tristan sends a packet for 192.168.98.0/24 to masq-gw, the optimal outcome is for masq-gw to suggest, with an ICMP redirect, that tristan send such packets via isdn-router instead. By this method, hosts can learn which networks are reachable through which routers on the local network segment.

ICMP redirect messages, however, are easy to forge, and were (at one time) used to subvert poorly configured machines. While this is infrequently a problem on the Internet today, it's still good practice to ignore ICMP redirect messages from public networks. Create static routes where necessary on private and public networks to prevent ICMP redirect messages from being generated on your network.

To examine an example of ICMP redirect in action, we simply need to send a packet directly from tristan to morgan. We assume that masq-gw has a route to 192.168.98.0/24 via 192.168.99.1 (isdn-router), and that tristan has no such route.

ICMP Redirect on the Wire (Consult the table of example network hosts for details on the IP and MAC addresses of the hosts referred to in this example.)

    [root@tristan]# echo test | nc 192.168.98.82 22
    [root@tristan]# tcpdump -nneqti eth0
    0:80:c8:f8:4a:51 0:80:c8:f8:5c:71 74: 192.168.99.35.54510 > 192.168.98.82.22: tcp 0 (DF)
    0:80:c8:f8:5c:71 0:80:c8:f8:4a:51 102: 192.168.99.254 > 192.168.99.35: icmp: redirect 192.168.98.82 to host 192.168.99.1 [tos 0xc0]
    0:80:c8:f8:5c:71 0:c0:7b:45:6a:39 74: 192.168.99.35.54510 > 192.168.98.82.22: tcp 0 (DF)

There's a great deal of information above, so let's examine the important parts.
These are the first three packets which passed our NIC as a result of this attempt to establish a session. First, we see a packet from tristan bound for morgan with tristan's source MAC and masq-gw's destination MAC. Because masq-gw is tristan's default gateway, tristan will send all such packets there. The next packet is the ICMP redirect, informing tristan of a better route. It includes several pieces of information. Implicitly, the source IP indicates which router is suggesting the alternate route, and the contents specify the intended destination and the better route. Note that masq-gw suggests using 192.168.99.1 (isdn-router) as the gateway for this destination. The final packet is part of the intended session, but carries masq-gw's MAC address as its source, showing that masq-gw forwarded it. masq-gw has (courteously) informed us that we should not use it as a route for the intended destination, but has also (courteously) forwarded the packet as we had requested.

In this small network, it is acceptable to allow ICMP redirect messages, although these should always be dropped at network borders, both inbound and outbound. So, in summary, ICMP redirect messages are not intrinsically dangerous or problematic, but they shouldn't exist in well-maintained networks. If you happen to see them growing in the shadows of your network, some careful observation should show you which hosts are affected and which routing tables could use some attention.
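Why did masq-gw send a redirect here? A common simplified statement of the condition is that the router is about to forward the packet out of the same interface it arrived on, and both the sender and the better next hop sit on that interface's connected network. The Python below sketches that test; it is an approximation rather than the kernel's exact logic, and the interface name `inside` is a hypothetical stand-in for masq-gw's LAN interface, which the example does not name.

```python
from __future__ import annotations

import ipaddress


def should_send_redirect(in_dev: str, out_dev: str, src: str, next_hop: str,
                         connected: dict[str, str]) -> bool:
    """Simplified redirect test. `connected` maps an interface name to its
    directly connected prefix (hypothetical input format)."""
    if in_dev != out_dev:
        return False  # packet leaves by a different interface: no redirect
    net = ipaddress.ip_network(connected[out_dev])
    # Redirect only if the sender could reach the better next hop directly.
    return ipaddress.ip_address(src) in net and ipaddress.ip_address(next_hop) in net
```

Applied to the capture above, tristan (192.168.99.35) and isdn-router (192.168.99.1) share 192.168.99.0/24 with masq-gw's inside interface, so the test succeeds and a redirect is sent; the static route suggested in the text would stop tristan from triggering it at all.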