
<!-- $Id$ -->

<chapter id="ch-ether">
  <title>Ethernet</title>
  <indexterm>
    <primary>Ethernet</primary>
  </indexterm>
  <para>
    The most common link layer network in use today is Ethernet.  Although
    there are several common speeds of Ethernet devices, they function
    identically with regard to higher layer protocols.  As this documentation
    focuses on higher layer protocols (IP), some fine distinctions about
    different types of Ethernet will be overlooked in favor of depicting the
    uniform manner in which IP networks overlay Ethernets.
  </para>
  <para>
    Address Resolution Protocol (ARP) provides the necessary mapping between
    link layer addresses and IP addresses for machines connected to
    Ethernets.  Linux offers control of ARP requests and replies via several
    little-known <filename>/proc</filename> interfaces:
    <filename>net/ipv4/conf/$DEV/proxy_arp</filename>,
    <filename>net/ipv4/conf/$DEV/medium_id</filename>, and
    <filename>net/ipv4/conf/$DEV/hidden</filename>.  For even
    finer control of ARP requests than is available in stock kernels,
    there are kernel and &iproute2; patches.
  </para>
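  <para>
    As a minimal sketch of how such an interface is read and set, the
    commands below inspect and then enable
    <filename>proxy_arp</filename>.  They assume that
    <filename>/proc</filename> is mounted in the usual place, that the files
    named above live under <filename>/proc/sys</filename>, and that the
    interface of interest is called <constant>eth0</constant>; substitute
    the local interface name as needed.  The last two commands are
    equivalent ways of writing the same value.
  </para>
  <programlisting>
<prompt>[root@masq-gw]# </prompt><userinput>cat /proc/sys/net/ipv4/conf/eth0/proxy_arp</userinput>
<prompt>[root@masq-gw]# </prompt><userinput>echo 1 &gt; /proc/sys/net/ipv4/conf/eth0/proxy_arp</userinput>
<prompt>[root@masq-gw]# </prompt><userinput>sysctl -w net.ipv4.conf.eth0.proxy_arp=1</userinput>
  </programlisting>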
  <para>
    This chapter will introduce the
    <link linkend="ether-arp-overview">ARP conversation</link>, discuss the
    <link linkend="ether-arp-cache">ARP cache</link>,
    a volatile mapping of the reachable IPs and MAC addresses on a
    segment, examine
    <link linkend="ether-arp-flux">the ARP flux problem</link>,
    and explore several
    <link linkend="ether-arp-filtering">ARP filtering and suppression
    techniques</link>.  Sections on
    <link linkend="ether-vlan">VLAN technology</link> and
    <link linkend="ether-bonding">channel bonding</link> round out the
    chapter on Ethernet.
  </para>
  <indexterm zone="ether-arp">
    <primary>Address Resolution Protocol</primary>
    <see>ARP</see>
  </indexterm>
  <indexterm zone="ether-arp">
    <primary>ARP</primary>
  </indexterm>
  <section id="ether-arp">
    <title>Address Resolution Protocol (ARP)</title>
    <para>
      Address Resolution Protocol (ARP) hovers in the shadows of most networks.
      Because of its simplicity, by comparison to higher layer protocols, ARP
      rarely intrudes upon the network administrator's routine.  All modern
      IP-capable operating systems provide support for ARP.  The uncommon
      alternative to ARP is static link-layer-to-IP mappings.
    </para>
    <para>
      ARP defines the exchanges between network interfaces connected to an
      Ethernet media segment in order to map an IP address to a link layer
      address on demand.  Link layer addresses are hardware addresses (although
      <link linkend="tools-ip-link-set-address">they are not immutable</link>)
      on Ethernet cards, and IP addresses are logical addresses
      assigned to machines attached to the Ethernet.  In the rest of this
      chapter, link layer addresses go by several names:
      Ethernet addresses, Media Access Control (MAC) addresses, and even
      hardware addresses.
      Arguably, the correct term from the kernel's perspective is "link
      layer address" because this address can be changed (on many Ethernet
      cards) via command line tools.  Nevertheless, these terms are not
      distinct in practice and are used interchangeably.
    </para>
    <section id="ether-arp-overview">
      <title>Overview of Address Resolution Protocol</title>
      <para>
        Address Resolution Protocol (ARP) exists solely to glue together the
        IP and Ethernet networking layers.  Since networking hardware
        such as switches, hubs, and bridges operates on Ethernet frames, it
        is unaware of the higher layer data carried by these frames
          <footnote>
            <para>
              Some networking equipment vendors have built devices which are
              sold as high performance switches and are capable of performing
              operations on higher layer contents of Ethernet frames.
              Typically, however, a switching device is not capable of
              operating on IP packets.
            </para>
          </footnote>.
        Similarly, IP layer devices, which operate on IP packets, need to be
        able to transmit their IP data on Ethernets.  ARP defines the
        conversation by which IP-capable hosts exchange mappings of
        their Ethernet and IP addresses.
      </para>
      <indexterm zone="ether-arp-request">
        <primary>ARP request</primary>
      </indexterm>
      <anchor id="ether-arp-request"/>
      <para>
        ARP is used to locate the Ethernet address associated with a desired IP
        address.  When a machine has a packet bound for another IP on a locally
        connected Ethernet network, it will send a broadcast Ethernet frame
        containing an ARP request onto the Ethernet.  All machines with the same
        Ethernet broadcast address will receive this packet
        <footnote>
          <para>
            The kernel uses the Ethernet broadcast address configured on the
            link layer device.  This is rarely anything but ff:ff:ff:ff:ff:ff.
            In the extraordinary event that this is not the Ethernet broadcast
            address in your network, see
            <xref linkend="tools-ip-link-set-address"/>.
          </para>
        </footnote>.
        If a machine receives the ARP request and it hosts the IP requested,
        it will respond with the link layer address on which it will receive
        packets for that IP address.
        <foreignphrase>N.B.</foreignphrase>, the
        <link linkend="ether-arp-flux-arpfilter"><constant>arp_filter</constant>
        sysctl</link> will alter this behaviour
        somewhat.
      </para>
      <indexterm zone="ether-arp-reply">
        <primary>ARP reply</primary>
      </indexterm>
      <anchor id="ether-arp-reply"/>
      <para>
        Once the requestor receives the response packet, it associates
        the MAC address and the IP address.  This information is stored in the
        <link linkend="ether-arp-cache">ARP cache</link>.  The ARP cache
        can be manipulated with the
        <link linkend="tools-ip-neighbor"><command>ip neighbor</command></link>
        and
        <link linkend="tools-arp"><command>arp</command></link> commands.
        To learn how and when to manipulate the ARP cache, see
        <xref linkend="tools-arp"/>.
      </para>
      <para>
        In <xref linkend="ex-basic-ping"/>, we used <command>ping</command> to
        test reachability of &masq-gw;.  Capturing, with a packet sniffer,
        the sequence of packets placed on the Ethernet by &tristan;'s
        attempt to ping provides an example of ARP <foreignphrase>in flagrante
        delicto</foreignphrase>.  Consult the
        <link linkend="example-network-netmap">example network map</link> for a
        visual representation of the network layout in which this traffic
        occurs.
      </para>
      <para>
        The ARP portion of this archetypal conversation, in which two
        computers exchange the hardware addresses they need in order to
        pass IP packets, consists of two Ethernet frames.
      </para>
      <indexterm zone="ex-ether-arp-overview">
        <primary><command>arping</command></primary>
        <secondary>basic usage</secondary>
      </indexterm>
      <example id="ex-ether-arp-overview">
        <title>ARP conversation captured with tcpdump
          <footnote>
            <para>
              <command>tcpdump</command> is one of a number of utilities for
              watching packets visible to an interface.  For further
              introduction to <command>tcpdump</command>, see
              <xref linkend="tools-tcpdump"/>.
            </para>
          </footnote>
        </title>
        <programlisting>
<prompt>[root@masq-gw]# </prompt><userinput>tcpdump -ennqti eth0 \( arp or icmp \)</userinput>
<computeroutput>tcpdump: listening on eth0
0:80:c8:f8:4a:51 ff:ff:ff:ff:ff:ff 42: arp who-has 192.168.99.254 tell 192.168.99.35             <co id="ex-eao-request" linkends="ex-eao-request-text"/>
0:80:c8:f8:5c:73 0:80:c8:f8:4a:51 60: arp reply 192.168.99.254 is-at 0:80:c8:f8:5c:73            <co id="ex-eao-reply" linkends="ex-eao-reply-text"/>
0:80:c8:f8:4a:51 0:80:c8:f8:5c:73 98: 192.168.99.35 &gt; 192.168.99.254: icmp: echo request (DF)    <co id="ex-eao-ip" linkends="ex-eao-ip-text"/>
0:80:c8:f8:5c:73 0:80:c8:f8:4a:51 98: 192.168.99.254 &gt; 192.168.99.35: icmp: echo reply           <co id="ex-eao-ip2" linkends="ex-eao-ip-text"/></computeroutput>
        </programlisting>
        <calloutlist>
          <callout
            arearefs="ex-eao-request"
            id="ex-eao-request-text">
              <indexterm zone="ex-eao-request-text">
                <primary>ARP request</primary>
              </indexterm>
              <simpara>
                This broadcast Ethernet frame, identifiable by the
                destination Ethernet address with all bits set
                (ff:ff:ff:ff:ff:ff), contains an ARP request from &tristan;
                for IP address 192.168.99.254.  The request includes the
                source link layer address and the IP address of
                the requestor, which provides enough information for the
                owner of the IP address to reply with its link layer address.
              </simpara>
          </callout>
          <callout
            arearefs="ex-eao-reply"
            id="ex-eao-reply-text">
              <indexterm zone="ex-eao-reply-text">
                <primary>ARP reply</primary>
              </indexterm>
              <simpara>
                The ARP reply from &masq-gw; includes its link layer address
                and declaration of ownership of the requested IP address.
                Note that the ARP reply is a unicast response to a broadcast
                request.  The payload of the ARP reply contains the link layer
                address mapping.
              </simpara>
              <simpara>
                The machine which initiated the ARP request (&tristan;)
                now has enough information to encapsulate an IP packet in
                an Ethernet frame and forward it to the link layer address
                of the recipient (00:80:c8:f8:5c:73).
              </simpara>
          </callout>
          <callout
            arearefs="ex-eao-ip ex-eao-ip2"
            id="ex-eao-ip-text">
              <simpara>
                The final two packets in
                <xref linkend="ex-ether-arp-overview"/> display the link
                layer header and the encapsulated ICMP packets
                exchanged between these two hosts.  Examining the ARP
                cache on each of these hosts would reveal entries on
                each host for the other host's link layer address.
              </simpara>
          </callout>
        </calloutlist>
      </example>
      <para>
        This is the most common form of ARP traffic on an Ethernet.
        In summary, an ARP request is transmitted in a broadcast Ethernet
        frame.  The ARP reply is a unicast response, containing the desired
        information, sent to the requestor's link layer address.
      </para>
      <para>
        A less common usage of ARP is gratuitous ARP, where a machine
        announces its ownership of an IP address on a media segment.  The
        <link linkend="tools-arping"><command>arping</command></link> utility
        can generate these gratuitous ARP frames.  Linux kernels will
        respect gratuitous ARP frames
        <footnote>
          <para>
            I have repeatedly tested using <command>arping</command> in
            gratuitous ARP mode, and have found that linux kernels appear to
            respect gratuitous ARP.  This is a surprise.  Does anybody have
            ideas about this?  Must research!
          </para>
        </footnote>.
      </para>
      <indexterm zone="ex-ether-arp-gratuitous">
        <primary>ARP</primary>
        <secondary>gratuitous</secondary>
        <seealso>ARP reply</seealso>
      </indexterm>
      <indexterm zone="ex-ether-arp-gratuitous">
        <primary><command>arping</command></primary>
        <secondary>gratuitous</secondary>
      </indexterm>
      <example id="ex-ether-arp-gratuitous">
        <title>Gratuitous ARP reply frames</title>
        <programlisting>
<prompt>[root@tristan]# </prompt><userinput>arping -q -c 3 -A -I eth0 192.168.99.35</userinput>
<prompt>[root@masq-gw]# </prompt><userinput>tcpdump -c 3 -nni eth2 arp</userinput>
<computeroutput>tcpdump: listening on eth2
06:02:50.626330 arp reply 192.168.99.35 is-at 0:80:c8:f8:4a:51 (0:80:c8:f8:4a:51)
06:02:51.622727 arp reply 192.168.99.35 is-at 0:80:c8:f8:4a:51 (0:80:c8:f8:4a:51)
06:02:52.620954 arp reply 192.168.99.35 is-at 0:80:c8:f8:4a:51 (0:80:c8:f8:4a:51)</computeroutput>
        </programlisting>
      </example>
      <para>
        The frames generated in
        <xref linkend="ex-ether-arp-gratuitous"/> are ARP replies to a
        question never asked.  This sort of ARP is common in failover
        solutions, and is also used for nefarious purposes by tools such as
        <ulink url="http://ettercap.sourceforge.net/"><command>ettercap</command></ulink>.
      </para>
      <para>
        Unsolicited ARP request frames, on the other hand, are broadcast
        ARP requests initiated by a host owning an IP address.
      </para>
      <indexterm zone="ex-ether-arp-unsolicited">
        <primary>ARP</primary>
        <secondary>unsolicited</secondary>
        <seealso>ARP request</seealso>
      </indexterm>
      <indexterm zone="ex-ether-arp-unsolicited">
        <primary><command>arping</command></primary>
        <secondary>unsolicited</secondary>
      </indexterm>
      <example id="ex-ether-arp-unsolicited">
        <title>Unsolicited ARP request frames</title>
        <programlisting>
<prompt>[root@tristan]# </prompt><userinput>arping -q -c 3 -U -I eth0 192.168.99.35</userinput>
<prompt>[root@masq-gw]# </prompt><userinput>tcpdump -c 3 -nni eth2 arp</userinput>
<computeroutput>tcpdump: listening on eth2
06:28:23.172068 arp who-has 192.168.99.35 (ff:ff:ff:ff:ff:ff) tell 192.168.99.35
06:28:24.167290 arp who-has 192.168.99.35 (ff:ff:ff:ff:ff:ff) tell 192.168.99.35
06:28:25.167250 arp who-has 192.168.99.35 (ff:ff:ff:ff:ff:ff) tell 192.168.99.35</computeroutput>
<prompt>[root@masq-gw]# </prompt><userinput>ip neigh show</userinput>
        </programlisting>
      </example>
      <para>
        These two uses of <command>arping</command> can help diagnose Ethernet
        and ARP problems, particularly hosts replying for addresses which do
        not belong to them.
      </para>
      <para>
        To avoid IP address collisions on dynamic networks (where hosts are
        turning on and off, connecting and disconnecting and otherwise
        changing IP addresses), duplicate address detection becomes important.
        Fortunately, <command>arping</command> provides this functionality as
        well.  A startup script could include the <command>arping</command>
        utility in duplicate address detection mode to select between
        IP addresses or methods of acquiring an IP address.
      </para>
      <indexterm zone="ex-ether-arp-dad">
        <primary>ARP</primary>
        <secondary>duplicate address detection</secondary>
      </indexterm>
      <indexterm zone="ex-ether-arp-dad">
        <primary><command>arping</command></primary>
        <secondary>duplicate address detection</secondary>
      </indexterm>
      <example id="ex-ether-arp-dad">
        <title>Duplicate Address Detection with ARP</title>
        <programlisting>
<prompt>[root@tristan]# </prompt><userinput>arping -D -I eth0 192.168.99.147; echo $?</userinput>
<computeroutput>ARPING 192.168.99.147 from 0.0.0.0 eth0
Unicast reply from 192.168.99.147 [00:80:C8:E8:1E:FC] for 192.168.99.147 [00:80:C8:E8:1E:FC] 0.702ms
Sent 1 probes (1 broadcast(s))
Received 1 response(s)
1</computeroutput>
<prompt>[root@tristan]# </prompt><userinput>tcpdump -eqtnni eth2 arp</userinput>
<computeroutput>tcpdump: listening on eth2
0:80:c8:f8:4a:51 ff:ff:ff:ff:ff:ff 60: arp who-has 192.168.99.147 (ff:ff:ff:ff:ff:ff) tell 0.0.0.0
0:80:c8:e8:1e:fc 0:80:c8:f8:4a:51 42: arp reply 192.168.99.147 is-at 0:80:c8:e8:1e:fc (0:80:c8:e8:1e:fc)</computeroutput>
<prompt>[root@masq-gw]# </prompt><userinput>ip neigh show</userinput>
        </programlisting>
      </example>
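      <para>
        A startup script might act on that exit status along the following
        lines.  This is only a sketch: the interface name, the candidate
        address, and the <constant>/24</constant> prefix length are
        assumptions made for illustration, and the fallback branch is a
        placeholder for whatever other method of acquiring an address suits
        the local network.
      </para>
      <programlisting>
# Probe for the candidate address; arping -D exits 0 when no host answers.
if arping -q -D -c 2 -I eth0 192.168.99.147; then
    # Nobody replied, so claim the address (prefix length is assumed).
    ip address add 192.168.99.147/24 dev eth0
else
    # Another host already answers for this address.
    echo "192.168.99.147 is already in use; trying another method"
fi
      </programlisting>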
      <para>
        Address Resolution Protocol, which provides a method to connect
        physical network addresses with logical network addresses,
        is a key element to the deployment of IP on Ethernet networks.
      </para>
    </section>
    <section id="ether-arp-cache">
      <title>The ARP cache</title>
      <indexterm zone="ether-arp-cache">
        <primary>ARP cache</primary>
      </indexterm>
      <indexterm zone="ether-arp-cache">
        <primary>neighbor table</primary>
        <seealso>ARP cache</seealso>
      </indexterm>
      <para>
        In simplest terms, an ARP cache is a stored mapping of IP addresses
        to link layer addresses.  An ARP cache obviates the need for an ARP
        request/reply conversation for each IP packet exchanged.  Naturally,
        this efficiency comes with a price.  Each host maintains its own ARP
        cache, which can become outdated when a host is replaced, or an IP
        address moves from one host to another.  The ARP cache is also known
        as the neighbor table.
      </para>
      <para>
        To display the ARP cache, the venerable and cross-platform
        <command>arp</command> admirably dispatches its duty.  As with many of
        the &iproute2; tools, more information is available
        via <command>ip neighbor</command> than with <command>arp</command>.
        <xref linkend="ex-ether-arp-cache"/> below illustrates the differences
        between the output of these two tools.
      </para>
      <indexterm zone="ex-ether-arp-cache">
        <primary>ARP cache</primary>
        <secondary>displaying</secondary>
      </indexterm>
      <example id="ex-ether-arp-cache">
        <title>ARP cache listings with <command>arp</command> and
          <command>ip neighbor</command></title>
        <programlisting>
<prompt>[root@tristan]# </prompt><userinput>arp -na</userinput>
<computeroutput>? (192.168.99.7) at 00:80:C8:E8:1E:FC [ether] on eth0
? (192.168.99.254) at 00:80:C8:F8:5C:73 [ether] on eth0</computeroutput>
<prompt>[root@tristan]# </prompt><userinput>ip neighbor show</userinput>
<computeroutput>192.168.99.7 dev eth0 lladdr 00:80:c8:e8:1e:fc nud reachable
192.168.99.254 dev eth0 lladdr 00:80:c8:f8:5c:73 nud reachable</computeroutput>
        </programlisting>
      </example>
      <para>
        A major difference between the information reported by <command>ip
        neighbor</command> and <command>arp</command> is the state of the
        proxy ARP table.  The only way to list permanently advertised entries
        in the neighbor table (proxy ARP entries) is with
        <command>arp</command>.
      </para>
      <indexterm zone="ether-arp-cache-expiry">
        <primary>ARP cache</primary>
        <secondary>lifetime</secondary>
      </indexterm>
      <indexterm zone="ether-arp-cache-expiry">
        <primary>ARP cache</primary>
        <secondary>expiration</secondary>
      </indexterm>
      <anchor id="ether-arp-cache-expiry"/>
      <para>
        Entries in the ARP cache are periodically and automatically
        verified unless continually used.  Along with
        <filename>net/ipv4/neigh/$DEV/gc_stale_time</filename>,
        there are a number of other parameters in
        <filename>net/ipv4/neigh/$DEV</filename> which control the
        expiration of entries in the ARP cache.
      </para>
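      <para>
        These parameters can be inspected and changed like any other sysctl.
        The sketch below assumes an interface named
        <constant>eth0</constant> and that the files named above live under
        <filename>/proc/sys</filename>; the value
        <constant>120</constant> is an arbitrary illustration, not a
        recommendation.
      </para>
      <programlisting>
<prompt>[root@tristan]# </prompt><userinput>ls /proc/sys/net/ipv4/neigh/eth0/</userinput>
<prompt>[root@tristan]# </prompt><userinput>cat /proc/sys/net/ipv4/neigh/eth0/gc_stale_time</userinput>
<prompt>[root@tristan]# </prompt><userinput>sysctl -w net.ipv4.neigh.eth0.gc_stale_time=120</userinput>
      </programlisting>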
      <para>
        When a host is down or disconnected from the Ethernet, there is a
        period of time during which other hosts may have an ARP cache entry
        for the disconnected host.  Any other machine may display a neighbor
        table with the link layer address of the recently disconnected host.
        Because there is a recently known-good link layer address on which
        the IP was reachable, the entry will abide.  Once the entry's
        reachable time (derived from
        <filename>base_reachable_time</filename>) elapses without fresh
        confirmation, the state of the entry will change,
        reflecting the need to verify the reachability of the link layer
        address.  When the disconnected host fails to respond to ARP requests,
        the neighbor table entry will be marked as
        <constant>incomplete</constant>.
      </para>
      <para>
        Here are the possible states for entries in the neighbor table.
      </para>
      <indexterm zone="tb-ether-arp-cache-states" significance="preferred">
        <primary>ARP cache</primary>
        <secondary>states</secondary>
      </indexterm>
      <table id="tb-ether-arp-cache-states">
        <title>Active ARP cache entry states</title>
        <tgroup cols="3" align="center" colsep="1" rowsep="1">
          <thead>
            <row>
              <entry>ARP cache entry state</entry>
              <entry>meaning</entry>
              <entry>action if used</entry>
            </row>
          </thead>
          <tbody>
            <row>
              <entry>permanent</entry>
              <entry>never expires; never verified</entry>
              <entry>reset use counter</entry>
            </row>
            <row>
              <entry>noarp</entry>
              <entry>normal expiration; never verified</entry>
              <entry>reset use counter</entry>
            </row>
            <row>
              <entry>reachable</entry>
              <entry>normal expiration</entry>
              <entry>reset use counter</entry>
            </row>
            <row>
              <entry>stale</entry>
              <entry>still usable; needs verification</entry>
              <entry>reset use counter; change state to delay</entry>
            </row>
            <row>
              <entry>delay</entry>
              <entry>schedule ARP request; needs verification</entry>
              <entry>reset use counter</entry>
            </row>
            <row>
              <entry>probe</entry>
              <entry>sending ARP request</entry>
              <entry>reset use counter</entry>
            </row>
            <row>
              <entry>incomplete</entry>
              <entry>first ARP request sent</entry>
              <entry>send ARP request</entry>
            </row>
            <row>
              <entry>failed</entry>
              <entry>no response received</entry>
              <entry>send ARP request</entry>
            </row>
          </tbody>
        </tgroup>
      </table>
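      <para>
        The <command>ip neighbor</command> command can restrict its listing
        to entries in a particular state, and its <option>-s</option> option
        adds usage statistics to each entry.  The commands below are a brief
        sketch of both; the interface name <constant>eth0</constant> is an
        assumption, and the output will depend entirely on the contents of
        the local neighbor table.
      </para>
      <programlisting>
<prompt>[root@tristan]# </prompt><userinput>ip neighbor show nud stale</userinput>
<prompt>[root@tristan]# </prompt><userinput>ip neighbor show dev eth0 nud reachable</userinput>
<prompt>[root@tristan]# </prompt><userinput>ip -s neighbor show dev eth0</userinput>
      </programlisting>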
      <para>
        To resume the example, suppose a host (192.168.99.7) in &tristan;'s
        ARP cache on the
        <link linkend="ax-example-network">example network</link> has just
        been disconnected.  A series of events
        will occur as &tristan;'s ARP cache entry for 192.168.99.7 expires and
        gets scheduled for verification.  Imagine that the following commands
        are run to capture each of these states immediately before the state
        changes.
      </para>
      <indexterm zone="ex-ether-arp-cache-timeout">
        <primary>ARP cache</primary>
        <secondary>expiration sequence</secondary>
      </indexterm>
      <example id="ex-ether-arp-cache-timeout">
        <title>ARP cache timeout</title>
          <programlisting>
<prompt>[root@tristan]# </prompt><userinput>ip neighbor show 192.168.99.7</userinput>
<computeroutput>192.168.99.7 dev eth0 lladdr 00:80:c8:e8:1e:fc nud reachable</computeroutput>     <co id="ex-eact-reachable" linkends="ex-eact-reachable-text"/>
<prompt>[root@tristan]# </prompt><userinput>ip neighbor show 192.168.99.7</userinput>
<computeroutput>192.168.99.7 dev eth0 lladdr 00:80:c8:e8:1e:fc nud stale</computeroutput>         <co id="ex-eact-stale" linkends="ex-eact-stale-text"/>
<prompt>[root@tristan]# </prompt><userinput>ip neighbor show 192.168.99.7</userinput>
<computeroutput>192.168.99.7 dev eth0 lladdr 00:80:c8:e8:1e:fc nud delay</computeroutput>         <co id="ex-eact-delay" linkends="ex-eact-delay-text"/>
<prompt>[root@tristan]# </prompt><userinput>ip neighbor show 192.168.99.7</userinput>
<computeroutput>192.168.99.7 dev eth0 lladdr 00:80:c8:e8:1e:fc nud probe</computeroutput>         <co id="ex-eact-probe" linkends="ex-eact-probe-text"/>
<prompt>[root@tristan]# </prompt><userinput>ip neighbor show 192.168.99.7</userinput>
<computeroutput>192.168.99.7 dev eth0  nud incomplete</computeroutput>                            <co id="ex-eact-incomplete" linkends="ex-eact-incomplete-text"/>
          </programlisting>
          <calloutlist>
            <callout
              arearefs="ex-eact-reachable"
              id="ex-eact-reachable-text">
                <simpara>
                  The entry for 192.168.99.7 is shown before it has expired,
                  but after the
                  host has been disconnected from the network.  During this
                  time, &tristan; will continue to send out Ethernet frames with
                  the destination frame address set to the link layer address
                  according to this entry.
                </simpara>
            </callout>
            <callout
              arearefs="ex-eact-stale"
              id="ex-eact-stale-text">
                <simpara>
                  The entry's reachable time (derived from
                  <constant>base_reachable_time</constant>) has passed since
                  the entry was last verified, so the state has changed to
                  stale.
                </simpara>
            </callout>
            <callout
              arearefs="ex-eact-delay"
              id="ex-eact-delay-text">
                <simpara>
                  This entry in the neighbor table has been requested.
                  Because the entry was in a stale state, the link layer
                  address was used, but now the kernel needs to verify
                  the accuracy of the address.  The kernel will soon send
                  an ARP request for the destination IP address.
                </simpara>
            </callout>
            <callout
              arearefs="ex-eact-probe"
              id="ex-eact-probe-text">
                <simpara>
                  The kernel is actively performing address resolution for the
                  entry.  It will send a total of
                  <constant>ucast_solicit</constant> frames to the last known
                  link layer address to attempt to verify reachability of the
                  address.  Failing this, it will send
                  <constant>mcast_solicit</constant> broadcast frames before
                  altering the ARP cache state and returning an error to any
                  higher layer services.
                </simpara>
            </callout>
            <callout
              arearefs="ex-eact-incomplete"
              id="ex-eact-incomplete-text">
                <simpara>
                  After all attempts to reach the destination address have
                  failed, the entry will appear in the neighbor table in this
                  state.
                </simpara>
            </callout>
          </calloutlist>
      </example>
      <para>
        The remaining neighbor table states are visible when initial ARP
        requests are made.  If no ARP cache entry exists for a requested
        destination IP, the kernel will generate up to
        <constant>mcast_solicit</constant> ARP requests while waiting for an
        answer.
        During this discovery period, the ARP cache
        entry will be listed in an <emphasis>incomplete</emphasis> state.  If
        the lookup does not succeed after the specified number of ARP
        requests, the ARP cache entry will be listed in a
        <emphasis>failed</emphasis> state.  If the lookup does succeed, the
        kernel enters the response into the ARP cache and resets the
        confirmation and update timers.
      </para>
      <para>
        For machines not using a static mapping for link layer and IP
        addresses, ARP provides on demand mappings.  The remainder of this
        section will cover the methods available under Linux to control the
        address resolution protocol.
      </para>
    </section>
    <section id="ether-arp-suppression">
      <title>ARP Suppression</title>
      <indexterm zone="ether-arp-suppression">
        <primary>ARP suppression</primary>
      </indexterm>
      <para>
        Complete ARP suppression is not difficult.  ARP suppression can
        be accomplished under Linux on a per-interface basis by setting the
        noarp flag on any Ethernet interface.
        Disabling ARP will require static neighbor table mappings
        for all hosts wishing to exchange packets across the Ethernet.
      </para>
      <para>
        To suppress ARP on an interface, simply use <command>ip
        link set dev $DEV arp off</command> as in
        <xref linkend="ex-tools-ip-link-set"/>
        or <command>ifconfig $DEV -arp</command> as in
        <xref linkend="ex-tools-ifconfig-flags"/>.  Complete ARP suppression
        will prevent the host from sending any ARP requests or responding with
        any ARP replies.
      </para>
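      <para>
        As a sketch of what this means in practice, the commands below
        disable ARP on &tristan;'s interface and then install a static
        neighbor table entry for &masq-gw;, using the addresses seen in the
        earlier examples.  The interface name <constant>eth0</constant> is
        an assumption.  Because &tristan; will no longer answer ARP
        requests, &masq-gw; would need a corresponding static entry for
        &tristan; (192.168.99.35 at 00:80:c8:f8:4a:51) before the two hosts
        could exchange packets.
      </para>
      <programlisting>
<prompt>[root@tristan]# </prompt><userinput>ip link set dev eth0 arp off</userinput>
<prompt>[root@tristan]# </prompt><userinput>ip neighbor add 192.168.99.254 lladdr 00:80:c8:f8:5c:73 nud permanent dev eth0</userinput>
<prompt>[root@tristan]# </prompt><userinput>ip neighbor show dev eth0</userinput>
      </programlisting>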
    </section>

    <!--

      FIXME; new little network map needed to illustrate these ARP examples

      -->

    <section id="ether-arp-flux">
      <title>The ARP Flux Problem</title>
      <indexterm zone="ether-arp-flux">
        <primary>ARP flux</primary>
      </indexterm>
      <para>
        When a Linux box is connected to a single network segment through
        multiple network cards, a potential problem with the link layer
        address to IP address mapping can occur.
        The machine may respond to ARP requests from both Ethernet interfaces.
        On the machine creating the ARP request, these multiple answers can
        cause confusion, or worse yet, non-deterministic population
        of the ARP cache.  Known as ARP flux
        <footnote>
          <para>
            I have seen it called names other than ARP flux--anybody out there
            heard of this called anything besides ARP flux?
          </para>
        </footnote>,
        this can lead to the possibly puzzling effect that an IP migrates
        non-deterministically through multiple link layer addresses.  It's
        important to understand that ARP flux typically only affects hosts
        which have multiple physical connections to the same medium or
        broadcast domain.
      </para>
      <para>
        This is a simple illustration of the problem in a network where a
        server has two Ethernet adapters connected to the same media
        segment.  They need not have IP addresses in the same IP network for
        the ARP reply to be generated by each interface.  Note the first
        two replies received in response to the ARP broadcast request.
        These replies arrive from conflicting link layer addresses in
        response to this request.  Also notice the greater time required for
        the sending and receiving hosts to process the broadcast ARP request
        frames than the unicast frames which follow (probes two and three).
      </para>
      <example id="ex-ether-arp-flux">
        <title>ARP flux</title>
        <programlisting>
<prompt>[root@real-client]# </prompt><userinput>arping -I eth0 -c 3 10.10.20.67</userinput>
<computeroutput>ARPING 10.10.20.67 from 10.10.20.33 eth0
Unicast reply from 10.10.20.67 [00:80:C8:7E:71:D4]  11.298ms
Unicast reply from 10.10.20.67 [00:80:C8:E8:1E:FC]  12.077ms
Unicast reply from 10.10.20.67 [00:80:C8:E8:1E:FC]  1.542ms
Unicast reply from 10.10.20.67 [00:80:C8:E8:1E:FC]  1.547ms
Sent 3 probes (1 broadcast(s))
Received 4 response(s)</computeroutput>
        </programlisting>
      </example>
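      <para>
        The symptom in <xref linkend="ex-ether-arp-flux"/> can also be
        detected mechanically.  The following is only a minimal sketch:
        <command>count_reply_macs</command> is a hypothetical helper (not
        part of <command>arping</command>) which assumes reply lines in the
        format shown above and counts the distinct link layer addresses
        among them.  A count greater than one for a single IP suggests ARP
        flux, or possibly a genuine duplicate address.
      </para>
      <programlisting><![CDATA[
```shell
#!/bin/sh
# count_reply_macs (hypothetical helper): read arping output on stdin and
# print how many distinct link layer addresses appear in the reply lines.
# Assumes replies look like: "Unicast reply from IP [MAC]  time".
count_reply_macs() {
    sed -n 's/.*reply from .*\[\([0-9A-Fa-f:]*\)].*/\1/p' |
        tr 'abcdef' 'ABCDEF' |
        sort -u |
        wc -l |
        tr -d ' '
}
```
]]></programlisting>
      <para>
        A typical invocation might be <command>arping -I eth0 -c 3
        10.10.20.67 | count_reply_macs</command>, run on the host which
        issues the ARP request.
      </para>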
      <para>
        There are four solutions to this problem.  The common solution for
        kernel 2.4 harnesses the
        <link linkend="ether-arp-flux-arpfilter"><constant>arp_filter</constant>
        sysctl</link>, while the common solution for kernel 2.2 takes
        advantage of the
        <link linkend="ether-arp-flux-hidden"><constant>hidden</constant>
        sysctl</link>.  These two solutions alter the behaviour of ARP on a
        per interface basis and only if the functionality has been enabled.
      </para>
      <para>
        Alternate solutions which provide much greater control of ARP
        (possibly documented
        <link linkend="ether-arp-filtering">here</link> at a later date)
        include Julian Anastasov's
        <ulink url="http://www.ssi.bg/~ja/#iparp"><command>ip
        arp</command></ulink> tool and his
        <ulink url="http://www.ssi.bg/~ja/#noarp">noarp
        route flag</ulink>.  While these tools were conceived in the course of
        the
        <ulink url="http://www.linuxvirtualserver.org/">Linux Virtual
        Server</ulink> project, they have practical application outside this
        realm.
      </para>
      <section id="ether-arp-flux-arpfilter">
        <title>ARP flux prevention with <constant>arp_filter</constant></title>
        <indexterm zone="ether-arp-flux-arpfilter">
          <primary><constant>arp_filter</constant></primary>
        </indexterm>
        <indexterm zone="ether-arp-flux-arpfilter">
          <primary>ARP flux</primary>
          <secondary>solving with <constant>arp_filter</constant></secondary>
        </indexterm>
        <para>
          One method for preventing ARP flux involves the use of
          <filename>net/ipv4/conf/$DEV/arp_filter</filename>.  In
          short, the use of <filename>arp_filter</filename> causes the recipient
          (in the
          <link linkend="ex-ether-arp-flux-arpfilter">case below</link>,
          &real-server;) to perform a route lookup to
          determine the interface through which to send the
          reply, instead of the default behaviour
          (<link linkend="ex-ether-arp-flux">shown above</link>), replying
          from all Ethernet interfaces which receive the request.
        </para>
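        <para>
          The outcome of such a route lookup can be previewed from userspace
          with <command>ip route get</command>.  The sketch below is an
          approximation only: it shows what the routing table would choose
          for a given address, not the kernel's own ARP code path.  The
          helper name <command>route_dev</command> is hypothetical, and the
          parsing assumes that the output contains the token
          <constant>dev</constant> followed by an interface name.
        </para>
        <programlisting><![CDATA[
```shell
#!/bin/sh
# route_dev (hypothetical helper): read "ip route get" output on stdin and
# print the interface name that follows the first "dev" token.
route_dev() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "dev") { print $(i + 1); exit }
    }'
}

# Example (run on the server; needs iproute2):
#   ip route get 10.10.20.33 | route_dev
```
]]></programlisting>
        <para>
          If the interface printed for the requester's address differs from
          the interface on which the ARP request arrived, no reply should be
          expected from the receiving interface once
          <filename>arp_filter</filename> is enabled.
        </para>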
        <!--

          FIXME; read the spec, why is this smart?  Doesn't this mean
                 using the informational data in the Ethernet frame?

         -->
        <para>
          The <filename>arp_filter</filename> solution can have unintended
          effects if the only route to the destination
          is through one of the network cards.  In
          <xref linkend="ex-ether-arp-flux-arpfilter"/>, &real-client; will
          demonstrate this.  This instructive example should highlight
          the shortcomings of the <constant>arp_filter</constant> solution in
          very complex networks where finer-grained control is required.
        </para>
        <para>
          In general, the <filename>arp_filter</filename> solution
          sufficiently solves the ARP flux problem.  First, hosts do not
          generate ARP requests for networks to which they do not have a
          direct route (see
          <xref linkend="routing-local"/>) and second, when such a route
          exists, the host normally
          <link linkend="routing-saddr-selection">chooses a source
          address</link> in the same network as the destination.  So, the
          <filename>arp_filter</filename> solution is a good general solution,
          but does not adequately address the occasional need for more control
          over ARP requests and replies.
        </para>
        <example id="ex-ether-arp-flux-arpfilter">
          <title>Correction of ARP flux with
            <filename>conf/$DEV/arp_filter</filename></title>
          <programlisting>
<prompt>[root@real-server]# </prompt><userinput>echo 1 &gt; /proc/sys/net/ipv4/conf/all/arp_filter</userinput>
<prompt>[root@real-server]# </prompt><userinput>echo 1 &gt; /proc/sys/net/ipv4/conf/eth0/arp_filter</userinput>
<prompt>[root@real-server]# </prompt><userinput>echo 1 &gt; /proc/sys/net/ipv4/conf/eth1/arp_filter</userinput>
<prompt>[root@real-server]# </prompt><userinput>ip address show dev eth0</userinput>
<computeroutput>2: eth0: &lt;BROADCAST,MULTICAST,UP&gt; mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:80:c8:e8:1e:fc brd ff:ff:ff:ff:ff:ff
    inet 10.10.20.67/24 scope global eth0</computeroutput>
<prompt>[root@real-server]# </prompt><userinput>ip address show dev eth1</userinput>
<computeroutput>3: eth1: &lt;BROADCAST,MULTICAST,UP&gt; mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:80:c8:7e:71:d4 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.1/24 brd 192.168.100.255 scope global eth1</computeroutput>    <co id="ex-eafa-server-setup" linkends="ex-eafa-server-setup-text"/>
<prompt>[root@real-client]# </prompt><userinput>arping -I eth0 -c 3 10.10.20.67</userinput>
<computeroutput>ARPING 10.10.20.67 from 10.10.20.33 eth0
Unicast reply from 10.10.20.67 [00:80:C8:E8:1E:FC]  0.882ms
Unicast reply from 10.10.20.67 [00:80:C8:E8:1E:FC]  1.221ms
Unicast reply from 10.10.20.67 [00:80:C8:E8:1E:FC]  1.487ms        </computeroutput><!-- gotta wrap the callout in tags so it's in the parent object --><co id="ex-eafa-expected" linkends="ex-eafa-expected-text"/><computeroutput>
Sent 3 probes (1 broadcast(s))
Received 3 response(s)</computeroutput>
<prompt>[root@real-client]# </prompt><userinput>arping -I eth0 -c 3 192.168.100.1</userinput>
<computeroutput>ARPING 192.168.100.1 from 10.10.20.33 eth0
Unicast reply from 192.168.100.1 [00:80:C8:E8:1E:FC]  0.877ms
Unicast reply from 192.168.100.1 [00:80:C8:E8:1E:FC]  1.517ms
Unicast reply from 192.168.100.1 [00:80:C8:E8:1E:FC]  1.661ms      </computeroutput><!-- gotta wrap the callout in tags so it's in the parent object --><co id="ex-eafa-problemzone" linkends="ex-eafa-problemzone-text"/><computeroutput>
Sent 3 probes (1 broadcast(s))
Received 3 response(s)</computeroutput>
<prompt>[root@real-client]# </prompt><userinput>ip neighbor del 192.168.100.1 dev eth0</userinput>         <co id="ex-eafa-clearcache" linkends="ex-eafa-clearcache-text"/>
<prompt>[root@real-client]# </prompt><userinput>ip address add 192.168.100.2/24 brd + dev eth0</userinput> <co id="ex-eafa-newip" linkends="ex-eafa-newip-text"/>
<prompt>[root@real-client]# </prompt><userinput>arping -I eth0 -c 3 192.168.100.1</userinput>
<computeroutput>ARPING 192.168.100.1 from 192.168.100.2 eth0
Unicast reply from 192.168.100.1 [00:80:C8:7E:71:D4]  0.804ms
Unicast reply from 192.168.100.1 [00:80:C8:7E:71:D4]  1.381ms
Unicast reply from 192.168.100.1 [00:80:C8:7E:71:D4]  2.487ms      </computeroutput><!-- gotta wrap the callout in tags so it's in the parent object --><co id="ex-eafa-workaround" linkends="ex-eafa-workaround-text"/><computeroutput>
Sent 3 probes (1 broadcast(s))
Received 3 response(s)</computeroutput>
          </programlisting>
          <calloutlist>
            <callout
              arearefs="ex-eafa-server-setup"
              id="ex-eafa-server-setup-text">
                <simpara>
                  Set the sysctl variables to enable the
                  <filename>arp_filter</filename> functionality.  After this,
                  you might expect that ARP replies for 10.10.20.67 would only
                  advertise the link layer address on eth0 (00:80:c8:e8:1e:fc).
                </simpara>
            </callout>
            <callout
              arearefs="ex-eafa-expected"
              id="ex-eafa-expected-text">
                <simpara>
                  Here is the expected behaviour.  Only one reply comes in for
                  the IP 10.10.20.67 after the <filename>arp_filter</filename>
                  sysctl has been enabled.  The reply originates from the
                  interface on &real-server; which actually hosts the IP
                  address.  Note that the source address on the ARP queries is
                  10.10.20.33, and that the ARP query causes &real-server; to
                  perform a route lookup on 10.10.20.33 to choose an interface
                  from which to send the reply.
                </simpara>
            </callout>
            <callout
              arearefs="ex-eafa-problemzone"
              id="ex-eafa-problemzone-text">
                <simpara>
                  Here, &real-client; requests the link layer address of the
                  host 192.168.100.1, but the source IP on the request packet
                  (chosen according to the
                  <link linkend="routing-saddr-selection">rules for source
                  address selection</link>) is 10.10.20.33.  When
                  &real-server; looks up a route to this destination, it
                  chooses its eth0, and replies with the link layer address of
                  its eth0.  Conventional networking needs should not run
                  afoul of this oddity of the <filename>arp_filter</filename>
                  ARP flux prevention technique.
                </simpara>
            </callout>
            <callout
              arearefs="ex-eafa-clearcache"
              id="ex-eafa-clearcache-text">
                <simpara>
                  Remove the entry in the neighbor table before testing again.
                </simpara>
            </callout>
            <callout
              arearefs="ex-eafa-newip"
              id="ex-eafa-newip-text">
                <simpara>
                  By adding an IP address in the same network as the intended
                  destination (which would be
                  rather common where multiple IP networks share the same
                  medium or broadcast domain), the kernel can now select a
                  different source address for the ARP request packets.
                </simpara>
            </callout>
            <callout
              arearefs="ex-eafa-workaround"
              id="ex-eafa-workaround-text">
                <simpara>
                  Note the source address of the ARP queries is now
                  192.168.100.2.  When &real-server; performs a route lookup
                  for the 192.168.100.0/24 destination, the chosen path is
                  through eth1.  The ARP reply packets now have the correct
                  link layer address.
                </simpara>
            </callout>
          </calloutlist>
        </example>
        <para>
          In general, the <filename>arp_filter</filename> solution should
          suffice, but this knowledge can be key in determining whether or not
          an alternate solution, such as an
          <link linkend="ether-arp-filtering">ARP filtering solution</link>,
          is necessary.
        </para>
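        <para>
          The three <command>echo</command> commands at the top of
          <xref linkend="ex-ether-arp-flux-arpfilter"/> can be generalized to
          every interface.  This is a sketch under stated assumptions:
          <command>set_arp_filter</command> is a hypothetical helper, and its
          optional first argument (a directory standing in for
          <filename>/proc/sys</filename>) exists only so that the loop can be
          rehearsed without touching the running kernel.
        </para>
        <programlisting><![CDATA[
```shell
#!/bin/sh
# set_arp_filter (hypothetical helper): write VALUE (default 1) to the
# arp_filter entry of every interface directory below ROOT.
# ROOT defaults to /proc/sys; pass another directory to rehearse safely.
set_arp_filter() {
    root=${1:-/proc/sys}
    value=${2:-1}
    for f in "$root"/net/ipv4/conf/*/arp_filter; do
        [ -e "$f" ] || continue     # the glob matched nothing
        echo "$value" > "$f"
    done
}
```
]]></programlisting>
        <para>
          Called with no arguments as root, the function writes to the live
          <filename>/proc/sys</filename> tree.  Passing
          <constant>0</constant> as the second argument restores the default
          behaviour.
        </para>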
      </section>
      <section id="ether-arp-flux-hidden">
        <title>ARP flux prevention with <constant>hidden</constant></title>
        <indexterm zone="ether-arp-flux-hidden">
          <primary>sysctl</primary>
          <secondary><constant>hidden</constant></secondary>
        </indexterm>
        <indexterm zone="ether-arp-flux-hidden">
          <primary>ARP flux</primary>
          <secondary>solving with <constant>hidden</constant></secondary>
        </indexterm>
        <para>
          The ARP flux problem can also be combatted with a
          <ulink url="http://www.ssi.bg/~ja/#hidden">kernel
          patch</ulink> by Julian Anastasov, which was incorporated into the
          2.2.14+ kernel series, but never into the 2.4+ kernel series.
          Therefore, the functionality may not be available in all
          kernels.
        </para>
        <para>
          The sysctl <filename>net/ipv4/conf/$DEV/hidden</filename> toggles
          the generation of ARP replies for requested IPs.  It marks an
          interface and all of its IP addresses as invisible to other
          interfaces for the purpose of ARP requests.  By default, when an
          ARP request arrives on any interface, the kernel tests to see if
          the IP address is locally hosted anywhere on the machine.  If the
          IP is found on any interface, the kernel will generate a reply.
        </para>
        <para>
          Since this is not always desirable, the <filename>hidden</filename>
          sysctl can be employed.  This prevents the kernel from finding the
          IP address when testing to see what IP addresses are locally hosted.
          The kernel can always find IPs hosted on the interface on which the
          packet arrived, but it cannot find addresses which are
          <filename>hidden</filename>.
        </para>
        <para>
          As shown in
          <xref linkend="ex-ether-arp-flux-hidden"/>, not only can ARP flux be
          corrected, but sensitive information about the IP addresses
          available on a linux box can be safeguarded
          <footnote>
            <para>
              Consider a masquerading firewall which answers ARP requests on a
              public segment for IPs hosted on an internal interface.  This
              amounts to inadvertent exposure of internal addressing, and can be
              used by an attacker as part of a data-gathering or reconnaissance
              operation on a network.
            </para>
          </footnote>.
          This makes the <filename>hidden</filename> sysctl useful for
          preventing unwanted IP disclosure via ARP on multi-homed hosts,
          in addition to preventing ARP flux on hosts connected to the
          same network medium.
        </para>
        <example id="ex-ether-arp-flux-hidden">
          <title>Correction of ARP flux with
            <filename>conf/$DEV/hidden</filename></title>
            <programlisting>
<prompt>[root@real-client]# </prompt><userinput>arping -I eth0 -c 1 172.19.22.254</userinput>
<computeroutput>ARPING 172.19.22.254 from 172.19.22.2 eth0
Unicast reply from 172.19.22.254 [00:60:F5:08:8A:2D]  0.704ms
Unicast reply from 172.19.22.254 [00:60:F5:08:8A:2E]  0.844ms
Unicast reply from 172.19.22.254 [00:60:F5:08:8A:2F]  0.918ms
Unicast reply from 172.19.22.254 [00:60:F5:08:8A:2C]  0.974ms
Sent 1 probes (1 broadcast(s))
Received 4 response(s)</computeroutput>
<prompt>[root@real-server]# </prompt><userinput>for i in all eth2 eth3 eth4 eth5 ; do</userinput>
<prompt>&gt; </prompt><userinput>echo 1 &gt; /proc/sys/net/ipv4/conf/$i/hidden</userinput>
<prompt>&gt; </prompt><userinput>done</userinput>
<prompt>[root@real-client]# </prompt><userinput>arping -I eth0 -c 2 172.19.22.254</userinput>
<computeroutput>ARPING 172.19.22.254 from 172.19.22.2 eth0
Unicast reply from 172.19.22.254 [00:60:F5:08:8A:2D]  0.710ms
Unicast reply from 172.19.22.254 [00:60:F5:08:8A:2D]  0.624ms
Sent 2 probes (1 broadcast(s))
Received 2 response(s)</computeroutput>
          </programlisting>
        </example>
        <para>
          These are two examples of methods to prevent ARP flux.  Other
          alternatives for correcting this problem are documented in
          <xref linkend="ether-arp-filtering"/>, where much more sophisticated
          tools are available for manipulation and control over the ARP
          functions of linux.
        </para>
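        <para>
          Because the <filename>hidden</filename> sysctl is present only on
          kernels which carry the patch, a script may want to test for it
          before relying on it.  The sketch below is illustrative only:
          <command>arp_knob</command> is a hypothetical helper.  As a
          fallback it also looks for <filename>arp_ignore</filename>, a
          sysctl found in later mainline kernels which can restrict replies
          to addresses configured on the receiving interface.  That sysctl is
          not otherwise covered in this chapter, so treat its presence as an
          assumption about the kernel in use.
        </para>
        <programlisting><![CDATA[
```shell
#!/bin/sh
# arp_knob (hypothetical helper): print which reply-limiting sysctl is
# available for interface DEV below ROOT (default /proc/sys).
# Prints "hidden", "arp_ignore", or "none".
arp_knob() {
    dev=$1
    root=${2:-/proc/sys}
    dir=$root/net/ipv4/conf/$dev
    if [ -e "$dir/hidden" ]; then
        echo hidden
    elif [ -e "$dir/arp_ignore" ]; then
        echo arp_ignore
    else
        echo none
    fi
}
```
]]></programlisting>
        <para>
          For example, <command>arp_knob eth2</command> reports which of the
          two entries exists for eth2 on the running kernel.
        </para>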
      </section>
    </section>
  </section>
  <section id="ether-arp-proxy">
    <title>Proxy ARP</title>
    <indexterm zone="ether-arp-proxy" significance="preferred">
      <primary>ARP, proxy</primary>
    </indexterm>
    <para>
      Occasionally, an IP network must be split into separate segments.  Proxy
      ARP can be used for increased control over packets exchanged between two
      hosts or to limit exposure between two hosts in a single IP network.
      The technique of proxy ARP is commonly used to interpose a device with
      higher layer functionality between two other hosts.  From a practical
      standpoint, there is little difference between the functions of a
      <link linkend="bridging-packet-filter">packet-filtering bridge</link> and
      a firewall performing proxy ARP.  The manner by which the interposed
      device receives the packets, however, is tremendously different.
    </para>
    <example id="ex-ether-arp-proxy">
      <title>Proxy ARP Network Diagram</title>
      <mediaobject id="image-ether-arp-proxy">
        <imageobject>
          <imagedata fileref="images/ether-arp-proxy.png" format="PNG"/>
        </imageobject>
        <imageobject>
          <imagedata fileref="images/ether-arp-proxy.svg" format="SVG"/>
        </imageobject>
      </mediaobject>
    </example>
    <para>
      The device performing proxy ARP (&masq-gw;) answers ARP queries
      on behalf of IPs reachable on interfaces other than the interface on
      which the query arrives.
    </para>
    <para>
      FIXME; manual proxy ARP (see also
      <xref linkend="adv-proxy-arp"/>), kernel proxy ARP, and the newly
      supported sysctl <filename>net/ipv4/conf/$DEV/medium_id</filename>.
    </para>
    <anchor id="ether-arp-proxy-mediumid"/>
    <indexterm zone="ether-arp-proxy-mediumid">
      <primary>sysctl</primary>
      <secondary><constant>medium_id</constant></secondary>
    </indexterm>
    <indexterm zone="ether-arp-proxy-mediumid">
      <primary><constant>ARP, proxy</constant></primary>
      <secondary>with kernel</secondary>
      <tertiary><constant>medium_id</constant></tertiary>
    </indexterm>
    <para>
      For a brief description of the use of <constant>medium_id</constant>, see
      <ulink url="http://www.ssi.bg/~ja/#medium_id">Julian's
      remarks</ulink>.
    </para>
    <anchor id="ether-arp-proxy-kernel"/>
    <indexterm zone="ether-arp-proxy-kernel">
      <primary>ARP, proxy</primary>
      <secondary>with kernel</secondary>
      <tertiary><constant>proxy_arp</constant></tertiary>
    </indexterm>
    <indexterm zone="ether-arp-proxy-kernel">
      <primary>sysctl</primary>
      <secondary><constant>proxy_arp</constant></secondary>
    </indexterm>
    <para>
      FIXME; Kernel proxy ARP with the sysctl
      <filename>net/ipv4/conf/$DEV/proxy_arp</filename>.
    </para>
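    <para>
      As a stopgap, here is a minimal sketch of enabling kernel proxy ARP.
      The helper <command>proxy_arp_cmds</command> is hypothetical and only
      prints commands so they can be reviewed before being run.  It assumes
      that IP forwarding should be switched on as well, since the proxying
      host has to route the packets it attracts.
    </para>
    <programlisting><![CDATA[
```shell
#!/bin/sh
# proxy_arp_cmds (hypothetical helper): print, without running them, the
# commands that would enable kernel proxy ARP on each interface named.
proxy_arp_cmds() {
    echo "echo 1 > /proc/sys/net/ipv4/ip_forward"
    for dev in "$@"; do
        echo "echo 1 > /proc/sys/net/ipv4/conf/$dev/proxy_arp"
    done
}

# Review the output first, then run it as root, for example:
#   proxy_arp_cmds eth0 eth1 | sh
```
]]></programlisting>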
    <para>
      Note: until this section is written, this
      <ulink url="http://mailman.ds9a.nl/pipermail/lartc/2003q2/008315.html">post</ulink>
      by Don Cohen is rather instructive.
    </para>
  </section>
  <section id="ether-arp-filtering">
    <title>ARP filtering</title>
    <indexterm zone="ether-arp-filtering">
      <primary>ARP filtering</primary>
    </indexterm>
    <indexterm zone="ether-arp-filtering">
      <primary><command>ip arp</command></primary>
    </indexterm>
    <para>
      This section should be part of the "ghetto" which will
      include documentation on <command>ip arp</command>.  There's nothing
      more to add here at the moment (low priority).
    </para>
    <para>
      <programlisting>
<prompt># </prompt><userinput>ip arp help</userinput>
<computeroutput>Usage: ip arp [ list | flush ] [ RULE ]
       ip arp [ append | prepend | add | del | change | replace | test ] RULE
RULE := [ table TABLE_NAME ] [ pref NUMBER ] [ from PREFIX ] [ to PREFIX ]
           [ iif STRING ] [ oif STRING ] [ llfrom PREFIX ] [ llto PREFIX ]
           [ broadcasts ] [ unicasts ] [ ACTION ] [ ALTER ]
TABLE_NAME := [ input | forward | output ]
ACTION := [ deny | allow ]
ALTER := [ src IP ] [ llsrc LLADDR ] [ lldst LLADDR ]</computeroutput>
      </programlisting>
    </para>
    <para>
      See Julian Anastasov's
      <ulink url="http://www.ssi.bg/~ja/#iparp"><command>ip
      arp</command></ulink> tool, and the patches and code for the
      <ulink url="http://www.ssi.bg/~ja/#noarp">noarp
      route flag</ulink>.
    </para>
    <para>
      FIXME; add a few paragraphs on <command>ip arp</command> and the noarp
      flag.
    </para>
  </section>
  <section id="ether-vlan">
    <title>Connecting to an Ethernet 802.1q VLAN</title>
    <indexterm zone="ether-vlan">
      <primary>VLAN</primary>
    </indexterm>
    <para>
      Virtual LANs are a way to take a single switch and subdivide it into
      logical media segments.  A single switch port in a VLAN-capable switch
      can carry packets from multiple virtual LANs and linux can understand
      the format of these Ethernet frames.  For more on this, see
      <ulink url="http://www.candelatech.com/~greear/vlan.html">the linux
      802.1q VLAN implementation site</ulink>.
    </para>
    <para>
      Kernels in the late 2.4 series have support for VLAN incorporated into
      the stock release.  The <command>vconfig</command> tool, however, needs
      to be compiled against the kernel source in order to provide userland
      configurability of the kernel support for VLANs.
    </para>
    <para>
      There are a few items of note which may prevent quick adoption of VLAN
      support under linux.  Ben McKeegan wrote a
      <ulink url="http://www.wanfear.com/pipermail/vlan/2002q4/002882.html">good
      summary</ulink> of the MTU/MRU issues involved with VLANs and 10/100
      Ethernet.  Gigabit Ethernet drivers are not hamstrung with this problem.
      Consider using gigabit Ethernet cards from the outset to avoid these
      potential problems.
    </para>
    <example id="ex-ether-vlan">
      <title>Bringing up a VLAN interface</title>
        <programlisting>
<prompt>[root@real-router]# </prompt><userinput>vconfig add eth0 7</userinput>
<prompt>[root@real-router]# </prompt><userinput>ip addr add dev eth0.7 192.168.30.254/24 brd +</userinput>
<prompt>[root@real-router]# </prompt><userinput>ip link set dev eth0.7 up</userinput>
       </programlisting>
    </example>
    <para>
      Each interface defined using the <command>vconfig</command> utility
      takes its name from the base device to which it has been bound, and
      appends the VLAN tag ID, as shown in
      <xref linkend="ex-ether-vlan"/>.
    </para>
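    <para>
      The naming rule can be captured in a few lines of shell.  This is a
      sketch only: <command>vlan_ifname</command> is a hypothetical helper,
      and it assumes the <constant>base.id</constant> naming scheme seen in
      <xref linkend="ex-ether-vlan"/>.  It also rejects IDs outside the
      range 1 to 4094, since 0 and 4095 are reserved by 802.1Q.
    </para>
    <programlisting><![CDATA[
```shell
#!/bin/sh
# vlan_ifname (hypothetical helper): print the interface name expected for
# VLAN ID $2 on base device $1, e.g. "eth0.7".  Returns 1 for IDs outside
# 1-4094 (0 and 4095 are reserved by 802.1Q) or for non-numeric input.
vlan_ifname() {
    base=$1
    id=$2
    case $id in
        ''|0*|*[!0-9]*) return 1 ;;   # empty, leading zero, or not decimal
    esac
    [ "$id" -ge 1 ] && [ "$id" -le 4094 ] || return 1
    echo "$base.$id"
}

# On systems whose iproute2 can manage VLAN links, the name could be used as:
#   ip link add link eth0 name "$(vlan_ifname eth0 7)" type vlan id 7
```
]]></programlisting>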
    <para>
      This documentation is sparse.  Visit the
      <ulink url="http://www.candelatech.com/~greear/vlan.html">main
      site</ulink> and the
      <ulink url="http://www.wanfear.com/pipermail/vlan/">VLAN mailing list
      archives</ulink>.
    </para>
  </section>
  <section id="ether-bonding">
    <title>Link Aggregation and High Availability with Bonding</title>
    <indexterm zone="ether-bonding">
      <primary>bonding</primary>
    </indexterm>
    <para>
      Networking vendors have long offered functionality for aggregating
      bandwidth across multiple physical links to a switch.
      This allows a machine (frequently a server) to treat multiple
      physical connections to switch units as a single logical link.
      The standard moniker for this technology is IEEE 802.3ad, although
      it is known by the common names of trunking, port trunking
      and link aggregation.  The conventional use of bonding under linux is
      an implementation of this
      <link linkend="ether-bonding-aggregation">link aggregation</link>.
    </para>
    <para>
      A separate use of the same driver allows the kernel to present a single
      logical interface for two physical links to two separate
      switches.  Only one link is used at any given time.  By using media
      independent interface (MII) signal failure to detect when a switch or
      link becomes unusable, the kernel can, transparently to userspace and
      application layer services, fail over to the backup physical connection.
      Though not common, the failure of switches, network interfaces, and
      cables can cause outages.  As a component of high availability planning,
      <link linkend="ether-bonding-ha">these bonding techniques</link>
      can help reduce the number of single points of failure.
    </para>
    <para>
      For more information on bonding, see the
      <filename>Documentation/networking/bonding.txt</filename> from the linux
      source code tree.
    </para>
    <section id="ether-bonding-aggregation">
      <title>Link Aggregation</title>
      <indexterm zone="ether-bonding-aggregation">
        <primary>bonding</primary>
        <secondary>link aggregation</secondary>
      </indexterm>
      <indexterm zone="ether-bonding-aggregation">
        <primary>channel bonding</primary>
      </indexterm>
      <para>
        Bonding for link aggregation must be supported by both endpoints.
        Two linux machines connected via crossover cables can take advantage
        of link aggregation.  A single machine connected with two physical
        cables to a switch which supports port trunking can use link
        aggregation to the switch.
        Any conventional switch
        will become ineffably confused by a hardware address appearing on
        multiple ports simultaneously.
      </para>
      <example id="ex-ether-bonding-aggregation">
        <title>Link aggregation bonding</title>
        <programlisting>
<prompt>[root@real-server root]# </prompt><userinput>modprobe  bonding</userinput>
<prompt>[root@real-server root]# </prompt><userinput>ip addr add 192.168.100.33/24 brd + dev bond0</userinput>
<prompt>[root@real-server root]# </prompt><userinput>ip link set dev bond0 up</userinput>
<prompt>[root@real-server root]# </prompt><userinput>ifenslave  bond0 eth2 eth3</userinput>
<computeroutput>master has no hw address assigned; getting one from slave!
The interface eth2 is up, shutting it down it to enslave it.
The interface eth3 is up, shutting it down it to enslave it.</computeroutput>
<prompt>[root@real-server root]# </prompt><userinput>ifenslave  bond0 eth2 eth3</userinput>
<prompt>[root@real-server root]# </prompt><userinput>cat /proc/net/bond0/info</userinput>
<computeroutput>Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Link Failure Count: 0

Slave Interface: eth3
MII Status: up
Link Failure Count: 0</computeroutput>
        </programlisting>
      </example>
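      <para>
        The status file shown in
        <xref linkend="ex-ether-bonding-aggregation"/> lends itself to
        simple monitoring.  The sketch below assumes the layout seen there
        (a <constant>Slave Interface:</constant> line followed by an
        <constant>MII Status:</constant> line); other kernel versions may
        use a different path or format.  The helper name
        <command>bond_slave_status</command> is hypothetical.
      </para>
      <programlisting><![CDATA[
```shell
#!/bin/sh
# bond_slave_status (hypothetical helper): read bonding status text on
# stdin and print one "interface status" pair per enslaved device.
# The bond-wide "MII Status" line before the first slave is skipped.
bond_slave_status() {
    awk -F': ' '
        /^Slave Interface:/ { slave = $2 }
        /^MII Status:/ && slave != "" { print slave, $2; slave = "" }
    '
}

# Example, using the path from the listing above:
#   bond_slave_status < /proc/net/bond0/info
```
]]></programlisting>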
      <para>
        FIXME;  Need an experiment here....maybe a tcpdump to show how the
        management frames appear on the wire.
      </para>
      <para>
        This
        <ulink url="http://www.beowulf.org/software/bonding.html">Beowulf
        software page</ulink> describes in a bit more detail the rationale and
        a practical application of linux channel bonding (for link
        aggregation).
      </para>
    </section>
    <section id="ether-bonding-ha">
      <title>High Availability</title>
      <indexterm zone="ether-bonding-ha">
        <primary>bonding</primary>
        <secondary>high availability</secondary>
      </indexterm>
      <para>
        Bonding support under linux is part of a high availability solution.
        For an entry point into the complexity of high availability in
        conjunction with linux, see the
        <ulink url="http://linux-ha.org/">linux-ha.org</ulink> site.  To guard
        against layer two (switch) and layer one (cable) failure, a machine
        can be configured with multiple physical connections to separate
        switch devices while presenting a single logical interface to
        userspace.
      </para>
      <para>
        The name of the interface can be specified by the user.  It is
        commonly <constant>bond0</constant> or something similar.  As a
        logical interface, it can be used in routing tables and by
        <link linkend="tools-tcpdump"><command>tcpdump</command></link>.
      </para>
      <para>
        The bond interface, when created, has no link layer address.  In the
        example below, an address is manually added to the interface.  See
        <xref linkend="ex-ether-bonding-aggregation"/> for an example in
        which the bonding driver reports taking its link layer address from
        the first device enslaved to the bond (doesn't that sound cruel!).
      </para>
      <example id="ex-ether-bonding-ha">
        <title>High availability bonding</title>
        <programlisting>
<prompt>[root@real-server root]# </prompt><userinput>modprobe bonding mode=1 miimon=100 downdelay=200 updelay=200</userinput>
<prompt>[root@real-server root]# </prompt><userinput>ip link set dev bond0 addr 00:80:c8:e7:ab:5c</userinput>
<prompt>[root@real-server root]# </prompt><userinput>ip addr add 192.168.100.33/24 brd + dev bond0</userinput>
<prompt>[root@real-server root]# </prompt><userinput>ip link set dev bond0 up</userinput>
<prompt>[root@real-server root]# </prompt><userinput>ifenslave  bond0 eth2 eth3</userinput>
<computeroutput>The interface eth2 is up, shutting it down it to enslave it.
The interface eth3 is up, shutting it down it to enslave it.</computeroutput>
<prompt>[root@real-server root]# </prompt><userinput>ip link show eth2 ; ip link show eth3 ; ip link show bond0</userinput>
<computeroutput>4: eth2: &lt;BROADCAST,MULTICAST,SLAVE,UP&gt; mtu 1500 qdisc pfifo_fast master bond0 qlen 100
  link/ether 00:80:c8:e7:ab:5c brd ff:ff:ff:ff:ff:ff
5: eth3: &lt;BROADCAST,MULTICAST,NOARP,SLAVE,DEBUG,AUTOMEDIA,PORTSEL,NOTRAILERS,UP&gt; mtu 1500 qdisc pfifo_fast master bond0 qlen 100
  link/ether 00:80:c8:e7:ab:5c brd ff:ff:ff:ff:ff:ff
58: bond0: &lt;BROADCAST,MULTICAST,MASTER,UP&gt; mtu 1500 qdisc noqueue
  link/ether 00:80:c8:e7:ab:5c brd ff:ff:ff:ff:ff:ff</computeroutput>
        </programlisting>
      </example>
      <para>
        Immediately noticeable are two new flags in the <command>ip link
        show</command> output.  The <constant>MASTER</constant> and
        <constant>SLAVE</constant> flags clearly report the nature of the
        relationship between the interfaces.  Also, the Ethernet interfaces
        indicate the master interface via the keywords <constant>master
        bond0</constant>.
      </para>
      <para>
        Note also that all three of the interfaces share the same link layer
        address, <constant>00:80:c8:e7:ab:5c</constant>.
      </para>
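      <para>
        On systems whose &iproute2; can manage bond devices directly, a
        comparable active-backup bond might be built without
        <command>ifenslave</command>.  This is a sketch under that
        assumption, not a tested replacement for
        <xref linkend="ex-ether-bonding-ha"/>.  The hypothetical helper
        <command>bond_ha_cmds</command> only prints the commands, with timer
        values copied from the <command>modprobe</command> options in that
        example.
      </para>
      <programlisting><![CDATA[
```shell
#!/bin/sh
# bond_ha_cmds (hypothetical helper): print, without running them, iproute2
# commands for an active-backup bond named $1 over the remaining arguments.
# Timer values (milliseconds) mirror miimon=100 downdelay=200 updelay=200.
bond_ha_cmds() {
    bond=$1
    shift
    echo "ip link add $bond type bond mode active-backup miimon 100 updelay 200 downdelay 200"
    for dev in "$@"; do
        echo "ip link set dev $dev down"
        echo "ip link set dev $dev master $bond"
    done
    echo "ip link set dev $bond up"
}

# Review the output first, then run it as root, for example:
#   bond_ha_cmds bond0 eth2 eth3 | sh
```
]]></programlisting>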
      <para>
        FIXME; What do DEBUG,AUTOMEDIA,PORTSEL,NOTRAILERS mean?
      </para>
    </section>

    <!--

      # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #
      #                                                                 #
      #                     link aggregation                            #
      #                                                                 #
      # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #

      #
      # Cisco-think for stuff on "etherchannel", aka 802.3ad.
      #

      http://www.cisco.com/warp/public/473/140.html#pagp
      http://www.cisco.com/univercd/cc/td/doc/product/lan/cat2950/1216ea2b/scg/swgports.htm#xtocid21

      #
      # Here's a thread on 802.3ad under linux started in 2000-08
      #

      http://www.wcug.wwu.edu/lists/netdev/200008/msg00093.html

      #
      # here's link aggregation in a Beowulf cluster; sort of a HOWTO
      #

      http://ilab.usc.edu/beo/

      # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #
      #                                                                 #
      #                      high availability                          #
      #                                                                 #
      # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #

      #
      # HA is a huge topic, encompassing application layer problems
      # as well as network layer problems, but linux-ha tries to solve
      # some of them

      http://linux-ha.org/

      #
      # see also, vrrpd (keepalived) and fake
      #

      http://www.vergenet.net/linux/fake/
      http://www.keepalived.org/

      -->

  </section>
</chapter>