Ethernet
The most common link layer network in use today is Ethernet. Although
there are several common speeds of Ethernet devices, they function
identically with regard to higher layer protocols. As this documentation
focusses on higher layer protocols (IP), some fine distinctions about
different types of Ethernet will be overlooked in favor of depicting the
uniform manner in which IP networks overlay Ethernets.
Address Resolution Protocol provides the necessary mapping between link
layer
addresses and IP addresses for machines connected to Ethernets. Linux
offers control of ARP requests and replies via several
not-well-known /proc interfaces:
net/ipv4/conf/$DEV/proxy_arp,
net/ipv4/conf/$DEV/medium_id, and
net/ipv4/conf/$DEV/hidden. For even
finer control of ARP requests than is available in stock kernels,
there are kernel and iproute2 patches.
This chapter will introduce the
ARP conversation, discuss the
ARP cache,
a volatile mapping of the reachable IPs and MAC addresses on a
segment, examine
the ARP flux problem,
and explore several
ARP filtering and suppression
techniques. A section on
VLAN technology and
channel bonding will round out the
chapter on Ethernet.
Address Resolution Protocol (ARP)
Address Resolution Protocol (ARP) hovers in the shadows of most networks.
Because of its simplicity, by comparison to higher layer protocols, ARP
rarely intrudes upon the network administrator's routine. All modern
IP-capable operating systems provide support for ARP. The uncommon
alternative to ARP is static link-layer-to-IP mappings.
ARP defines the exchanges between network interfaces connected to an
Ethernet media segment in order to map an IP address to a link layer
address on demand. Link layer addresses are hardware addresses (although
they are not immutable)
on Ethernet cards and IP addresses are logical addresses
assigned to machines attached to the Ethernet. Subsequently in this
chapter, link layer addresses may be known by many different names:
Ethernet addresses, Media Access Control (MAC) addresses, and even
hardware addresses.
Disputably, the correct term from the kernel's perspective is "link
layer address" because this address can be changed (on many Ethernet
cards) via command line tools. Nevertheless, these terms are not
realistically distinct and can be used interchangeably.
Overview of Address Resolution Protocol
Address Resolution Protocol (ARP) exists solely to glue together the
IP and Ethernet networking layers. Since networking hardware
such as switches, hubs, and bridges operate on Ethernet frames, they
are unaware of the higher layer data carried by these frames.
(Some networking equipment vendors have built devices which are
sold as high performance switches and are capable of performing
operations on higher layer contents of Ethernet frames.
Typically, however, a switching device is not capable of
operating on IP packets.)
Similarly, IP layer devices, operating on IP packets, need to be able
to transmit their IP data on Ethernets. ARP defines the
conversation by which IP capable hosts can exchange mappings of
their Ethernet and IP addressing.
ARP request
ARP is used to locate the Ethernet address associated with a desired IP
address. When a machine has a packet bound for another IP on a locally
connected Ethernet network, it will send a broadcast Ethernet frame
containing an ARP request onto the Ethernet. All machines with the same
Ethernet broadcast address will receive this packet.
(The kernel uses the Ethernet broadcast address configured on the
link layer device. This is rarely anything but ff:ff:ff:ff:ff:ff.
In the extraordinary event that this is not the Ethernet broadcast
address in your network, the broadcast address configured on the
device must be changed to match.)
If a machine receives the ARP request and it hosts the IP requested,
it will respond with the link layer address on which it will receive
packets for that IP address.
N.B., the
arp_filter
sysctl will alter this behaviour
somewhat.
ARP reply
Once the requestor receives the response packet, it associates
the MAC address and the IP address. This information is stored in the
ARP cache. The ARP cache
can be manipulated with the
ip neighbor
and
arp commands.
To learn how and when to manipulate the ARP cache, see
.
In an earlier example, we used ping to
test reachability of masq-gw. Using a packet sniffer to capture
the sequence of packets on the Ethernet as a result of tristan's
attempt to ping provides an example of ARP in flagrante
delicto. Consult the
example network map for a
visual representation of the network layout in which this traffic
occurs.
This is an archetypal conversation between two
computers exchanging relevant hardware addressing in order that they
can pass IP packets, and consists of two Ethernet frames.
ARP conversation captured with tcpdump
tcpdump is one of a number of utilities for
watching packets visible to an interface. For further
introduction to tcpdump, see
.
[root@masq-gw]# tcpdump -ennqti eth0 \( arp or icmp \)
tcpdump: listening on eth0
0:80:c8:f8:4a:51 ff:ff:ff:ff:ff:ff 42: arp who-has 192.168.99.254 tell 192.168.99.35
0:80:c8:f8:5c:73 0:80:c8:f8:4a:51 60: arp reply 192.168.99.254 is-at 0:80:c8:f8:5c:73
0:80:c8:f8:4a:51 0:80:c8:f8:5c:73 98: 192.168.99.35 > 192.168.99.254: icmp: echo request (DF)
0:80:c8:f8:5c:73 0:80:c8:f8:4a:51 98: 192.168.99.254 > 192.168.99.35: icmp: echo reply
ARP request
This broadcast Ethernet frame, identifiable by the
destination Ethernet address with all bits set
(ff:ff:ff:ff:ff:ff), contains an ARP request from tristan
for IP address 192.168.99.254. The request includes the
source link layer address and the IP address of
the requestor, which provides enough information for the
owner of the IP address to reply with its link layer address.
ARP reply
The ARP reply from masq-gw includes its link layer address
and declaration of ownership of the requested IP address.
Note that the ARP reply is a unicast response to a broadcast
request. The payload of the ARP reply contains the link layer
address mapping.
The machine which initiated the ARP request (tristan)
now has enough information to encapsulate an IP packet in
an Ethernet frame and forward it to the link layer address
of the recipient (00:80:c8:f8:5c:73).
The final two packets in
the capture above display the link
layer header and the encapsulated ICMP packets
exchanged between these two hosts. Examining the ARP
cache on each of these hosts would reveal entries on
each host for the other host's link layer address.
This is the most common example of ARP traffic on an Ethernet.
In summary, an ARP request is transmitted in a broadcast Ethernet
frame. The ARP reply is a unicast response, containing the desired
information, sent to the requestor's link layer address.
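The broadcast request and unicast reply pattern summarized above can be checked mechanically against a capture. What follows is only a minimal sketch, assuming tcpdump -e style output like the capture shown earlier (source link layer address in the first field, destination in the second). The function name classify_arp_frames is hypothetical and not part of any tool.

```sh
#!/bin/sh
# classify_arp_frames (hypothetical helper): read "tcpdump -e" style lines on
# stdin and report, for each ARP frame, whether it was sent to the Ethernet
# broadcast address or to a single link layer address.
# Assumption: field 1 is the source address and field 2 the destination.
classify_arp_frames() {
    awk '
        / arp who-has / { kind = "request" }
        / arp reply /   { kind = "reply" }
        kind != "" {
            cast = ($2 == "ff:ff:ff:ff:ff:ff") ? "broadcast" : "unicast"
            print kind, cast, "from", $1
            kind = ""
        }
    '
}

# Usage (as root): tcpdump -ennqti eth0 arp | classify_arp_frames
```

Non-ARP lines, such as the ICMP packets in the capture, are ignored.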
A rarer usage of ARP is gratuitous ARP, where a machine
announces its ownership of an IP address on a media segment. The
arping utility
can generate these gratuitous ARP frames. Linux kernels will
respect gratuitous ARP frames.
(I have repeatedly tested using arping in
gratuitous ARP mode, and have found that linux kernels appear to
respect gratuitous ARP. This is a surprise. Does anybody have
ideas about this? Must research!)
Gratuitous ARP reply frames
[root@tristan]# arping -q -c 3 -A -I eth0 192.168.99.35
[root@masq-gw]# tcpdump -c 3 -nni eth2 arp
tcpdump: listening on eth2
06:02:50.626330 arp reply 192.168.99.35 is-at 0:80:c8:f8:4a:51 (0:80:c8:f8:4a:51)
06:02:51.622727 arp reply 192.168.99.35 is-at 0:80:c8:f8:4a:51 (0:80:c8:f8:4a:51)
06:02:52.620954 arp reply 192.168.99.35 is-at 0:80:c8:f8:4a:51 (0:80:c8:f8:4a:51)
The frames generated in
the example above are ARP replies to a
question never asked. This sort of ARP is common in failover
solutions and also for nefarious purposes, with tools such as
ettercap.
Unsolicited ARP request frames, on the other hand, are broadcast
ARP requests initiated by a host owning an IP address.
Unsolicited ARP request frames
[root@tristan]# arping -q -c 3 -U -I eth0 192.168.99.35
[root@masq-gw]# tcpdump -c 3 -nni eth2 arp
tcpdump: listening on eth2
06:28:23.172068 arp who-has 192.168.99.35 (ff:ff:ff:ff:ff:ff) tell 192.168.99.35
06:28:24.167290 arp who-has 192.168.99.35 (ff:ff:ff:ff:ff:ff) tell 192.168.99.35
06:28:25.167250 arp who-has 192.168.99.35 (ff:ff:ff:ff:ff:ff) tell 192.168.99.35
[root@masq-gw]# ip neigh show
These two uses of arping can help diagnose Ethernet
and ARP problems--particularly hosts replying for addresses which do
not belong to them.
To avoid IP address collisions on dynamic networks (where hosts are
turning on and off, connecting and disconnecting and otherwise
changing IP addresses) duplicate address detection becomes important.
Fortunately, arping provides this functionality as
well. A startup script could include the arping
utility in duplicate address detection mode to select between
IP addresses or methods of acquiring an IP address.
Duplicate Address Detection with ARP
[root@tristan]# arping -D -I eth0 192.168.99.147; echo $?
ARPING 192.168.99.47 from 0.0.0.0 eth0
Unicast reply from 192.168.99.47 [00:80:C8:E8:1E:FC] for 192.168.99.47 [00:80:C8:E8:1E:FC] 0.702ms
Sent 1 probes (1 broadcast(s))
Received 1 response(s)
1
[root@tristan]# tcpdump -eqtnni eth2 arp
tcpdump: listening on eth2
0:80:c8:f8:4a:51 ff:ff:ff:ff:ff:ff 60: arp who-has 192.168.99.147 (ff:ff:ff:ff:ff:ff) tell 0.0.0.0
0:80:c8:e8:1e:fc 0:80:c8:f8:4a:51 42: arp reply 192.168.99.147 is-at 0:80:c8:e8:1e:fc (0:80:c8:e8:1e:fc)
[root@masq-gw]# ip neigh show
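A startup script of the kind described above might look something like the following sketch. It relies on arping -D exiting with status 0 when no host answers for the probed address, and non-zero (as in the session above) when a reply arrives. The function name first_free_address and the ARPING_CMD override are assumptions made for illustration; the override exists only so the selection logic can be exercised without touching a network.

```sh
#!/bin/sh
# ARPING_CMD may be overridden; this is a hypothetical hook for testing only.
ARPING_CMD=${ARPING_CMD:-arping}

# first_free_address DEV CANDIDATE...
# Print the first candidate IP address for which duplicate address detection
# finds no current owner on the segment attached to DEV.
first_free_address() {
    dev=$1
    shift
    for candidate in "$@"; do
        # arping -D exits 0 when no reply is received for the candidate
        if "$ARPING_CMD" -q -D -c 2 -I "$dev" "$candidate"; then
            echo "$candidate"
            return 0
        fi
    done
    return 1   # every candidate was already claimed by another host
}

# Usage (as root): addr=$(first_free_address eth0 192.168.99.147 192.168.99.148)
```

A non-zero return from the function could be the cue to fall back to another method of acquiring an address.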
Address Resolution Protocol, which provides a method to connect
physical network addresses with logical network addresses,
is a key element in the deployment of IP on Ethernet networks.
The ARP cache
In simplest terms, an ARP cache is a stored mapping of IP addresses
with link layer addresses. An ARP cache obviates the need for an ARP
request/reply conversation for each IP packet exchanged. Naturally,
this efficiency comes with a price. Each host maintains its own ARP
cache, which can become outdated when a host is replaced, or an IP
address moves from one host to another. The ARP cache is also known
as the neighbor table.
To display the ARP cache, the venerable and cross-platform
arp admirably dispatches its duty. As with many of
the iproute2 tools, more information is available
via ip neighbor than with arp.
The example below illustrates the differences
between the output of these two tools.
ARP cache listings with arp and ip neighbor
[root@tristan]# arp -na
? (192.168.99.7) at 00:80:C8:E8:1E:FC [ether] on eth0
? (192.168.99.254) at 00:80:C8:F8:5C:73 [ether] on eth0
[root@tristan]# ip neighbor show
192.168.99.7 dev eth0 lladdr 00:80:c8:e8:1e:fc nud reachable
192.168.99.254 dev eth0 lladdr 00:80:c8:f8:5c:73 nud reachable
A major difference between the information reported by ip
neighbor and arp is the state of the
proxy ARP table. The only way to list permanently advertised entries
in the neighbor table (proxy ARP entries) is with
arp.
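Because ip neighbor reports the state of each entry, its output is easy to filter. The following is a small sketch, assuming that the state is the last field on each line, as it is in the listings in this chapter. The function name neighbors_in_state is hypothetical.

```sh
#!/bin/sh
# neighbors_in_state STATE (hypothetical helper): read "ip neighbor show"
# output on stdin and print the IP address of every entry in the given state.
# Assumption: the state is the last field of each line; it is compared
# case-insensitively, since not every version prints it in lower case.
neighbors_in_state() {
    awk -v want="$1" 'NF > 1 && tolower($NF) == want { print $1 }'
}

# Usage: ip neighbor show | neighbors_in_state stale
```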
Entries in the ARP cache are periodically and automatically
verified unless continually used. Along with
net/ipv4/neigh/$DEV/gc_stale_time,
there are a number of other parameters in
net/ipv4/neigh/$DEV which control the
expiration of entries in the ARP cache.
When a host is down or disconnected from the Ethernet, there is a
period of time during which other hosts may have an ARP cache entry
for the disconnected host. Any other machine may display a neighbor
table with the link layer address of the recently disconnected host.
Because there is a recently known-good link layer address on which
the IP was reachable, the entry will abide. At
gc_stale_time the state of the entry will change,
reflecting the need to verify the reachability of the link layer
address. When the disconnected host fails to respond to ARP requests,
the neighbor table entry will be marked as
incomplete.
Here are the possible states for entries in the neighbor table.
Active ARP cache entry states
ARP cache entry state   meaning                                   action if used
permanent               never expires; never verified             reset use counter
noarp                   normal expiration; never verified         reset use counter
reachable               normal expiration                         reset use counter
stale                   still usable; needs verification          reset use counter; change state to delay
delay                   schedule ARP request; needs verification  reset use counter
probe                   sending ARP request                       reset use counter
incomplete              first ARP request sent                    send ARP request
failed                  no response received                      send ARP request
To resume, a host (192.168.99.7) in tristan's ARP cache on the
example network has just
been disconnected. There are a series of events which
will occur as tristan's ARP cache entry for 192.168.99.7 expires and
gets scheduled for verification. Imagine that the following commands
are run to capture each of these states immediately before state
change.
ARP cache timeout
[root@tristan]# ip neighbor show 192.168.99.7
192.168.99.7 dev eth0 lladdr 00:80:c8:e8:1e:fc nud reachable
[root@tristan]# ip neighbor show 192.168.99.7
192.168.99.7 dev eth0 lladdr 00:80:c8:e8:1e:fc nud stale
[root@tristan]# ip neighbor show 192.168.99.7
192.168.99.7 dev eth0 lladdr 00:80:c8:e8:1e:fc nud delay
[root@tristan]# ip neighbor show 192.168.99.7
192.168.99.7 dev eth0 lladdr 00:80:c8:e8:1e:fc nud probe
[root@tristan]# ip neighbor show 192.168.99.7
192.168.99.7 dev eth0 nud incomplete
reachable
This is the state before the entry has expired for 192.168.99.7, but
after the host has been disconnected from the network. During this
time, tristan will continue to send out Ethernet frames with
the destination frame address set to the link layer address
according to this entry.
stale
It has been gc_stale_time seconds since
the entry was last verified, so the state has changed to
stale.
delay
This entry in the neighbor table has been requested.
Because the entry was in a stale state, the link layer
address was used, but now the kernel needs to verify
the accuracy of the address. The kernel will soon send
an ARP request for the destination IP address.
probe
The kernel is actively performing address resolution for the
entry. It will send a total of
ucast_solicit frames to the last known
link layer address to attempt to verify reachability of the
address. Failing this, it will send
mcast_solicit broadcast frames before
altering the ARP cache state and returning an error to any
higher layer services.
incomplete
After all attempts to reach the destination address have
failed, the entry will appear in the neighbor table in this
state.
The remaining neighbor table flags are visible when initial ARP
requests are made. If no ARP cache entry exists for a requested
destination IP, the kernel will generate
mcast_solicit ARP requests until receiving an
answer.
During this discovery period, the ARP cache
entry will be listed in an incomplete state. If
the lookup does not succeed after the specified number of ARP
requests, the ARP cache entry will be listed in a
failed state. If the lookup does succeed, that is, after receipt of a
corresponding ARP reply, the kernel enters the response into the ARP
cache and resets the confirmation and update timers.
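The parameters named in this section (gc_stale_time, ucast_solicit and mcast_solicit) can be read directly from the per-device directory under /proc/sys/net/ipv4/neigh. The following is a minimal sketch for inspecting them. The function name show_neigh_timers is hypothetical, and the optional second argument is an assumption added only so the function can be pointed at some other directory tree.

```sh
#!/bin/sh
# show_neigh_timers DEV [BASE] (hypothetical helper): print the neighbor table
# tunables discussed in this section for one device.
# BASE defaults to the usual procfs location.
show_neigh_timers() {
    dev=$1
    base=${2:-/proc/sys/net/ipv4/neigh}
    for name in gc_stale_time ucast_solicit mcast_solicit; do
        file=$base/$dev/$name
        # silently skip tunables that are missing on this kernel
        if [ -r "$file" ]; then
            printf '%s = %s\n' "$name" "$(cat "$file")"
        fi
    done
}

# Usage: show_neigh_timers eth0
```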
For machines not using a static mapping for link layer and IP
addresses, ARP provides on demand mappings. The remainder of this
section will cover the methods available under linux to control the
address resolution protocol.
ARP Suppression
Complete ARP suppression is not difficult at all. ARP suppression can
be accomplished under linux on a per-interface basis by setting the
noarp flag on any Ethernet interface.
Disabling ARP will require static neighbor table mappings
for all hosts wishing to exchange packets across the Ethernet.
To suppress ARP on an interface simply use ip
link set dev $DEV arp off
or ifconfig $DEV -arp. Complete ARP suppression
will prevent the host from sending any ARP requests or responding with
any ARP replies.
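Since suppressing ARP means every peer needs a static neighbor table entry, the two steps usually travel together. The sketch below prints the commands rather than running them, unless RUN=1 is set. The helper names run and suppress_arp, and the RUN variable, are hypothetical conveniences; the ip commands themselves are ordinary iproute2 invocations.

```sh
#!/bin/sh
# run CMD...: echo the command, or execute it when RUN=1 (requires root).
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# suppress_arp DEV IP LLADDR (hypothetical helper): turn ARP off on DEV and
# add one permanent neighbor entry so that IP remains reachable without ARP.
suppress_arp() {
    run ip link set dev "$1" arp off
    run ip neighbor add "$2" lladdr "$3" dev "$1" nud permanent
}

# Usage: suppress_arp eth0 192.168.99.254 00:80:c8:f8:5c:73
```

One such neighbor entry is needed for each host that must exchange packets across the interface, and the peers need matching entries of their own.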
The ARP Flux Problem
When a linux box is connected to a network segment with multiple
network cards, a potential problem with the link layer address
to IP address mapping can occur.
The machine may respond to ARP requests from both Ethernet interfaces.
On the machine creating the ARP request, these multiple answers can
cause confusion, or worse yet, non-deterministic population
of the ARP cache. Known as ARP flux
(I have seen it called names other than ARP flux--anybody out there
heard of this called anything besides ARP flux?),
this can lead to the possibly puzzling effect that an IP migrates
non-deterministically through multiple link layer addresses. It's
important to understand that ARP flux typically only affects hosts
which have multiple physical connections to the same medium or
broadcast domain.
This is a simple illustration of the problem in a network where a
server has two Ethernet adapters connected to the same media
segment. They need not have IP addresses in the same IP network for
the ARP reply to be generated by each interface. Note the first
two replies received in response to the ARP broadcast request.
These replies arrive from conflicting link layer addresses in
response to this request. Also notice the greater time required for
the sending and receiving hosts to process the broadcast ARP request
frames than the unicast frames which follow (probes two and three).
ARP flux
[root@real-client]# arping -I eth0 -c 3 10.10.20.67
ARPING 10.10.20.67 from 10.10.20.33 eth0
Unicast reply from 10.10.20.67 [00:80:C8:7E:71:D4] 11.298ms
Unicast reply from 10.10.20.67 [00:80:C8:E8:1E:FC] 12.077ms
Unicast reply from 10.10.20.67 [00:80:C8:E8:1E:FC] 1.542ms
Unicast reply from 10.10.20.67 [00:80:C8:E8:1E:FC] 1.547ms
Sent 3 probes (1 broadcast(s))
Received 4 response(s)
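A quick way to spot ARP flux from a client is to count how many distinct link layer addresses answer for a single IP. Here is a rough sketch that does this for arping output in the format shown above. The function name count_reply_macs is hypothetical, and the field positions are an assumption based on that output format.

```sh
#!/bin/sh
# count_reply_macs (hypothetical helper): read arping output on stdin and
# print the number of distinct link layer addresses found in
# "Unicast reply" lines. More than one suggests ARP flux (or a duplicate IP).
# Assumption: the bracketed link layer address is the fifth field.
count_reply_macs() {
    awk '
        /^Unicast reply from/ {
            mac = $5
            gsub(/\[|\]/, "", mac)      # strip the surrounding brackets
            if (!(mac in seen)) { seen[mac] = 1; n++ }
        }
        END { print n + 0 }
    '
}

# Usage: arping -I eth0 -c 3 10.10.20.67 | count_reply_macs
```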
There are four solutions to this problem. The common solution for
kernel 2.4 harnesses the
arp_filter
sysctl, while the common solution for kernel 2.2 takes
advantage of the
hidden
sysctl. These two solutions alter the behaviour of ARP on a
per interface basis and only if the functionality has been enabled.
Alternate solutions which provide much greater control of ARP
(possibly documented
here at a later date)
include Julian Anastasov's
ip
arp tool and his
noarp
route flag. While these tools were conceived in the course of
the
Linux Virtual
Server project, they have practical application outside this
realm.
ARP flux prevention with arp_filter
One method for preventing ARP flux involves the use of
net/ipv4/conf/$DEV/arp_filter. In
short, the use of arp_filter causes the recipient
(in the
case below,
real-server) to perform a route lookup to
determine the interface through which to send the
reply, instead of the default behaviour
(shown above), replying
from all Ethernet interfaces which receive the request.
The arp_filter solution can have unintended
effects if the only route to the destination
is through one of the network cards. In
the example below, real-client will
demonstrate this. This instructive example should highlight
the shortcomings of the arp_filter solution in
very complex networks where finer-grained control is required.
In general, the arp_filter solution
sufficiently solves the ARP flux problem. First, hosts do not
generate ARP requests for networks to which they do not have a
direct route, and second, when such a route
exists, the host normally
chooses a source
address in the same network as the destination. So, the
arp_filter solution is a good general solution,
but does not adequately address the occasional need for more control
over ARP requests and replies.
Correction of ARP flux with conf/$DEV/arp_filter
[root@real-server]# echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
[root@real-server]# echo 1 > /proc/sys/net/ipv4/conf/eth0/arp_filter
[root@real-server]# echo 1 > /proc/sys/net/ipv4/conf/eth1/arp_filter
[root@real-server]# ip address show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:80:c8:e8:1e:fc brd ff:ff:ff:ff:ff:ff
    inet 10.10.20.67/24 scope global eth0
[root@real-server]# ip address show dev eth1
3: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:80:c8:7e:71:d4 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.1/24 brd 192.168.100.255 scope global eth1
[root@real-client]# arping -I eth0 -c 3 10.10.20.67
ARPING 10.10.20.67 from 10.10.20.33 eth0
Unicast reply from 10.10.20.67 [00:80:C8:E8:1E:FC] 0.882ms
Unicast reply from 10.10.20.67 [00:80:C8:E8:1E:FC] 1.221ms
Unicast reply from 10.10.20.67 [00:80:C8:E8:1E:FC] 1.487ms
Sent 3 probes (1 broadcast(s))
Received 3 response(s)
[root@real-client]# arping -I eth0 -c 3 192.168.100.1
ARPING 192.168.100.1 from 10.10.20.33 eth0
Unicast reply from 192.168.100.1 [00:80:C8:E8:1E:FC] 0.877ms
Unicast reply from 192.168.100.1 [00:80:C8:E8:1E:FC] 1.517ms
Unicast reply from 192.168.100.1 [00:80:C8:E8:1E:FC] 1.661ms
Sent 3 probes (1 broadcast(s))
Received 3 response(s)
[root@real-client]# ip neighbor del 192.168.100.1 dev eth0
[root@real-client]# ip address add 192.168.100.2/24 brd + dev eth0
[root@real-client]# arping -I eth0 -c 3 192.168.100.1
ARPING 192.168.100.1 from 192.168.100.2 eth0
Unicast reply from 192.168.100.1 [00:80:C8:7E:71:D4] 0.804ms
Unicast reply from 192.168.100.1 [00:80:C8:7E:71:D4] 1.381ms
Unicast reply from 192.168.100.1 [00:80:C8:7E:71:D4] 2.487ms
Sent 3 probes (1 broadcast(s))
Received 3 response(s)
Set the sysctl variables to enable the
arp_filter functionality. After this,
you might expect that ARP replies for 10.10.20.67 would only
advertise the link layer address on eth0 (00:80:c8:e8:1e:fc).
Here is the expected behaviour. Only one reply comes in for
the IP 10.10.20.67 after the arp_filter
sysctl has been enabled. The reply originates from the
interface on real-server which actually hosts the IP
address. Note that the source address on the ARP queries is
10.10.20.33, and that the ARP query causes real-server to
perform a route lookup on 10.10.20.33 to choose an interface
from which to send the reply.
Here, real-client requests the link layer address of the
host 192.168.100.1, but the source IP on the request packet
(chosen according to the
rules for source
address selection) is 10.10.20.33. When
real-server looks up a route to this destination, it
chooses its eth0, and replies with the link layer address of
its eth0. Conventional networking needs should not run
afoul of this oddity of the arp_filter
ARP flux prevention technique.
Remove the entry in the neighbor table before testing again.
By adding an IP address in the same network as the intended
destination (which would be
rather common where multiple IP networks share the same
medium or broadcast domain), the kernel can now select a
different source address for the ARP request packets.
Note the source address of the ARP queries is now
192.168.100.2. When real-server performs a route lookup
for the 192.168.100.0/24 destination, the chosen path is
through eth1. The ARP reply packets now have the correct
link layer address.
In general, the arp_filter solution should
suffice, but this knowledge can be key in determining whether or not
an alternate solution, such as an
ARP filtering solution,
is necessary.
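The three echo commands used to enable arp_filter generalize naturally to a small loop. The following is a sketch under stated assumptions: the function name set_conf_flag and the CONF_BASE variable are hypothetical, and CONF_BASE exists only so the function can be tried against a scratch directory instead of the live /proc tree.

```sh
#!/bin/sh
# CONF_BASE is where the per-interface sysctl directories live.
CONF_BASE=${CONF_BASE:-/proc/sys/net/ipv4/conf}

# set_conf_flag NAME VALUE IFACE... (hypothetical helper): write VALUE to the
# sysctl NAME for each listed interface. Needs root on a real system.
set_conf_flag() {
    name=$1
    value=$2
    shift 2
    for iface in "$@"; do
        echo "$value" > "$CONF_BASE/$iface/$name" || return 1
    done
}

# Usage (as root): set_conf_flag arp_filter 1 all eth0 eth1
```

The same function would serve for other per-interface flags in the same directory, on kernels where they exist.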
ARP flux prevention with hidden
The ARP flux problem can also be combatted with a
kernel
patch by Julian Anastasov, which was incorporated into the
2.2.14+ kernel series, but never into the 2.4+ kernel series.
Therefore, the functionality may not be available in all
kernels.
The sysctl net/ipv4/conf/$DEV/hidden toggles
the generation of ARP replies for requested IPs. It marks an
interface and all of its IP addresses invisible to other
interfaces for the purpose of ARP
requests. When an ARP request arrives on any interface, the kernel
tests to see if the IP address is locally hosted anywhere on the
machine. If the IP is found on any interface, the kernel will
generate a reply.
Since this is not always desirable, the hidden
sysctl can be employed. This prevents the kernel from finding the
IP address when testing to see what IP addresses are locally hosted.
The kernel can always find IPs hosted on the interface on which the
packet arrived, but it cannot find addresses which are
hidden.
As shown in
the example below, not only can ARP flux be
corrected, but sensitive information about the IP addresses
available on a linux box can be safeguarded.
(Consider a masquerading firewall which answers ARP requests on a
public segment for IPs hosted on an internal interface. This
amounts to inadvertent exposure of internal addressing, and can be
used by an attacker as part of a data-gathering or reconnaissance
operation on a network.)
This makes the hidden sysctl useful for
preventing unwanted IP disclosure via ARP on multi-homed hosts,
in addition to preventing ARP flux on hosts connected to the
same network medium.
Correction of ARP flux with net/$DEV/hidden
[root@real-client]# arping -I eth0 -c 1 172.19.22.254
ARPING 172.19.22.254 from 172.19.22.2 eth0
Unicast reply from 172.19.22.254 [00:60:F5:08:8A:2D] 0.704ms
Unicast reply from 172.19.22.254 [00:60:F5:08:8A:2E] 0.844ms
Unicast reply from 172.19.22.254 [00:60:F5:08:8A:2F] 0.918ms
Unicast reply from 172.19.22.254 [00:60:F5:08:8A:2C] 0.974ms
Sent 1 probes (1 broadcast(s))
Received 4 response(s)
[root@real-server]# for i in all eth2 eth3 eth4 eth5 ; do
> echo 1 > /proc/sys/net/ipv4/conf/$i/hidden
> done
[root@real-client]# arping -I eth0 -c 2 172.19.22.254
ARPING 172.19.22.254 from 172.19.22.2 eth0
Unicast reply from 172.19.22.254 [00:60:F5:08:8A:2D] 0.710ms
Unicast reply from 172.19.22.254 [00:60:F5:08:8A:2D] 0.624ms
Sent 2 probes (1 broadcast(s))
Received 2 response(s)
These are two examples of methods to prevent ARP flux. Other
alternatives for correcting this problem are documented in
the section on ARP filtering, where much more sophisticated
tools are available for manipulation and control over the ARP
functions of linux.
Proxy ARP
Occasionally, an IP network must be split into separate segments. Proxy
ARP can be used for increased control over packets exchanged between two
hosts or to limit exposure between two hosts in a single IP network.
The technique of proxy ARP is commonly used to interpose a device with
higher layer functionality between two other hosts. From a practical
standpoint, there is little difference between the functions of a
packet-filtering bridge and
a firewall performing proxy ARP. The manner by which the interposed
device receives the packets, however, is tremendously different.
Proxy ARP Network Diagram
The device performing proxy ARP (masq-gw) responds for all ARP queries
on behalf of IPs reachable on interfaces other than the interface on
which the query arrives.
FIXME; manual proxy ARP, kernel proxy ARP, and the newly
supported sysctl net/ipv4/conf/$DEV/medium_id.
For a brief description of the use of medium_id, see
Julian's
remarks.
FIXME; Kernel proxy ARP with the sysctl
net/ipv4/conf/$DEV/proxy_arp.
Note....until this section is written, this
post
by Don Cohen is rather instructive.
ARP filtering
This section should be part of the "ghetto" which will
include documentation on ip arp. There's nothing
more to add here at the moment (low priority).
# ip arp help
Usage: ip arp [ list | flush ] [ RULE ]
ip arp [ append | prepend | add | del | change | replace | test ] RULE
RULE := [ table TABLE_NAME ] [ pref NUMBER ] [ from PREFIX ] [ to PREFIX ]
[ iif STRING ] [ oif STRING ] [ llfrom PREFIX ] [ llto PREFIX ]
[ broadcasts ] [ unicasts ] [ ACTION ] [ ALTER ]
TABLE_NAME := [ input | forward | output ]
ACTION := [ deny | allow ]
ALTER := [ src IP ] [ llsrc LLADDR ] [ lldst LLADDR ]
The
ip
arp tool.
Patches and code for the
noarp
route flag.
FIXME; add a few paragraphs on ip arp and the noarp
flag.
Connecting to an Ethernet 802.1q VLAN
Virtual LANs are a way to take a single switch and subdivide it into
logical media segments. A single switch port in a VLAN-capable switch
can carry packets from multiple virtual LANs and linux can understand
the format of these Ethernet frames. For more on this, see
the linux
802.1q VLAN implementation site.
Kernels in the late 2.4 series have support for VLAN incorporated into
the stock release. The vconfig tool, however, needs
to be compiled against the kernel source in order to provide userland
configurability of the kernel support for VLANs.
There are a few items of note which may prevent quick adoption of VLAN
support under linux. Ben McKeegan wrote a
good
summary of the MTU/MRU issues involved with VLANs and 10/100
Ethernet. Gigabit Ethernet drivers are not hamstrung with this problem.
Consider using gigabit Ethernet cards from the outset to avoid these
potential problems.
Bringing up a VLAN interface
[root@real-router]# vconfig add eth0 7
[root@real-router]# ip addr add dev eth0.7 192.168.30.254/24 brd +
[root@real-router]# ip link set dev eth0.7 up
Each interface defined using the vconfig utility
takes its name from the base device to which it has been bound, and
appends the VLAN tag ID, as shown in
the example above.
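On systems where vconfig is not installed, the ip link command from current iproute2 releases can create the same kind of interface with its vlan link type. The sketch below only prints the commands; the function name vlan_commands is hypothetical, and it follows the vconfig naming scheme described above (base device, a dot, then the VLAN ID).

```sh
#!/bin/sh
# vlan_commands BASE_DEV VLAN_ID ADDRESS (hypothetical helper): print the
# iproute2 commands that create, address and bring up a VLAN interface.
# Nothing is executed; pipe the output to a root shell to apply it.
vlan_commands() {
    vif="$1.$2"    # e.g. eth0.7, the same name vconfig would choose
    echo "ip link add link $1 name $vif type vlan id $2"
    echo "ip addr add dev $vif $3 brd +"
    echo "ip link set dev $vif up"
}

# Usage: vlan_commands eth0 7 192.168.30.254/24
```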
This documentation is sparse. Visit the
main
site and the
VLAN mailing list
archives.
Link Aggregation and High Availability with Bonding
Networking vendors have long offered a functionality for aggregating
bandwidth across multiple physical links to a switch.
This allows a machine (frequently a server) to treat multiple
physical connections to switch units as a single logical link.
The standard moniker for this technology is IEEE 802.3ad, although
it is known by the common names of trunking, port trunking
and link aggregation. The conventional use of bonding under linux is
an implementation of this
link aggregation.
A separate use of the same driver allows the kernel to present a single
logical interface for two physical links to two separate
switches. Only one link is used at any given time. By using media
independent interface signal failure to detect when a switch or link
becomes unusable, the kernel can, transparently to userspace and
application layer services, fail over to the backup physical connection.
Though not common, the failure of switches, network interfaces, and
cables can cause outages. As a component of high availability planning,
these bonding techniques
can help reduce the number of single points of failure.
For more information on bonding, see the
Documentation/networking/bonding.txt from the linux
source code tree.
Link Aggregation
Bonding for link aggregation must be supported by both endpoints.
Two linux machines connected via crossover cables can take advantage
of link aggregation. A single machine connected with two physical
cables to a switch which supports port trunking can use link
aggregation to the switch.
Any conventional switch
will become ineffably confused by a hardware address appearing on
multiple ports simultaneously.
Link aggregation bonding
[root@real-server root]# modprobe bonding
[root@real-server root]# ip addr add 192.168.100.33/24 brd + dev bond0
[root@real-server root]# ip link set dev bond0 up
[root@real-server root]# ifenslave bond0 eth2 eth3
master has no hw address assigned; getting one from slave!
The interface eth2 is up, shutting it down it to enslave it.
The interface eth3 is up, shutting it down it to enslave it.
[root@real-server root]# ifenslave bond0 eth2 eth3
[root@real-server root]# cat /proc/net/bond0/info
Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Slave Interface: eth3
MII Status: up
Link Failure Count: 0
FIXME; Need an experiment here....maybe a tcpdump to show how the
management frames appear on the wire.
This
Beowulf
software page describes in a bit more detail the rationale and
a practical application of linux channel bonding (for link
aggregation).
High Availability
Bonding support under linux is part of a high availability solution.
For an entry point into the complexity of high availability in
conjunction with linux, see the
linux-ha.org site. To guard
against layer two (switch) and layer one (cable) failure, a machine
can be configured with multiple physical connections to separate
switch devices while presenting a single logical interface to
userspace.
The name of the interface can be specified by the user. It is
commonly bond0 or something similar. As a
logical interface, it can be used in routing tables and by
tcpdump.
The bond interface, when created, has no link layer address. In the
example below, an address is manually added to the interface. See
the link aggregation example above for an example of the
bonding driver reporting setting the link layer address when the first
device is enslaved to the bond (doesn't that sound cruel!).
High availability bonding
[root@real-server root]# modprobe bonding mode=1 miimon=100 downdelay=200 updelay=200
[root@real-server root]# ip link set dev bond0 addr 00:80:c8:e7:ab:5c
[root@real-server root]# ip addr add 192.168.100.33/24 brd + dev bond0
[root@real-server root]# ip link set dev bond0 up
[root@real-server root]# ifenslave bond0 eth2 eth3
The interface eth2 is up, shutting it down it to enslave it.
The interface eth3 is up, shutting it down it to enslave it.
[root@real-server root]# ip link show eth2 ; ip link show eth3 ; ip link show bond0
4: eth2: <BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc pfifo_fast master bond0 qlen 100
link/ether 00:80:c8:e7:ab:5c brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST,NOARP,SLAVE,DEBUG,AUTOMEDIA,PORTSEL,NOTRAILERS,UP> mtu 1500 qdisc pfifo_fast master bond0 qlen 100
link/ether 00:80:c8:e7:ab:5c brd ff:ff:ff:ff:ff:ff
58: bond0: <BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue
link/ether 00:80:c8:e7:ab:5c brd ff:ff:ff:ff:ff:ff
Immediately noticeable, there is a new flag in the ip link
show output. The MASTER and
SLAVE flags clearly report the nature of the
relationship between the interfaces. Also, the Ethernet interfaces
indicate the master interface via the keywords master
bond0.
Note also, that all three of the interfaces share the same link layer
address, 00:80:c8:e7:ab:5c.
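The master keyword makes it straightforward to list the devices enslaved to a bond from ip link show output. The following is a minimal sketch, assuming output in the format shown above, where the interface name is the second field followed by a colon. The function name slaves_of is hypothetical.

```sh
#!/bin/sh
# slaves_of MASTER (hypothetical helper): read "ip link show" output on stdin
# and print the name of each interface whose line carries "master MASTER".
slaves_of() {
    awk -v master="$1" '{
        for (i = 1; i < NF; i++)
            if ($i == "master" && $(i + 1) == master) {
                name = $2
                sub(/:$/, "", name)    # drop the trailing colon from "eth2:"
                print name
            }
    }'
}

# Usage: ip link show | slaves_of bond0
```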
FIXME; What do DEBUG,AUTOMEDIA,PORTSEL,NOTRAILERS mean?