mirror of https://github.com/tLDP/LDP
Commit 6a9f0c5160 (parent d2d7ea38f8): updated
@ -1,4 +1,3 @@
<!doctype linuxdoc system>

<!-- $Id$
@ -31,7 +30,7 @@ This document is dedicated to lots of people, and is my attempt to give
something back. To list but a few:
<p>
<itemize>
<item>Rusty Russell
<item>Alexey N. Kuznetsov
<item>The good folks from Google
<item>The staff of Casema Internet
@ -46,7 +45,7 @@ routing. Unbeknownst to most users, you already run tools which allow you to
do spectacular things. Commands like 'route' and 'ifconfig' are actually
very thin wrappers for the very powerful iproute2 infrastructure.
<p>
I hope that this HOWTO will become as readable as the ones by Rusty Russell
of (amongst other things) netfilter fame.

You can always reach us by writing the <url name="HOWTO team"
@ -60,7 +59,8 @@ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

In short, if your STM-64 backbone breaks down and distributes pornography to
your most esteemed customers - it's never our fault. Sorry.

Copyright (c) 2000 by bert hubert, Gregory Maxwell, Martijn van
Oosterhout, Remco van Mook, Paul B. Schroeder and others.

Please freely copy and distribute (sell or give away) this document in any
format. It's requested that corrections and/or comments be forwarded to the
@ -97,7 +97,7 @@ Here are some other references which might help you learn more:
<descrip>
<tag><url
url="http://netfilter.kernelnotes.org/unreliable-guides/networking-concepts-HOWTO.html"
name="Rusty Russell's networking-concepts-HOWTO"></tag>
Very nice introduction, explaining what a network is, and how it is
connected to other networks.
<tag>Linux Networking-HOWTO (Previously the Net-3 HOWTO)</tag>
@ -180,6 +180,19 @@ A Makefile is supplied which should help you create postscript, dvi, pdf,
html and plain text. You may need to install sgml-tools, ghostscript and
tetex to get all formats.

<sect1>Mailing list
<p>
The authors receive an increasing amount of mail about this HOWTO. Because
of the clear interest of the community, it has been decided to start a
mailing list where people can talk to each other about Advanced Routing and
Traffic Control. You can subscribe to the list
<url url="http://mailman.ds9a.nl/mailman/listinfo/lartc" name="here">.
<p>
It should be pointed out that the authors are very hesitant to answer
questions that are not asked on the list. We would like the archive of the
list to become some kind of knowledge base. If you have a question, please
search the archive, and then post to the mailing list.

<sect1>Layout of this document
<p>
We will be doing interesting stuff almost immediately, which also means that
@ -205,12 +218,12 @@ they show some unexpected behaviour under Linux 2.2 and up. For example, GRE
tunnels are an integral part of routing these days, but require completely
different tools.

With iproute2, tunnels are an integral part of the tool set.

The 2.2 and above Linux kernels include a completely redesigned network
subsystem. This new networking code brings Linux performance and a feature
set with little competition in the general OS arena. In fact, the new
routing, filtering, and classifying code is more featureful than that
provided by many dedicated routers, firewalls and traffic shaping
products.
@ -220,13 +233,13 @@ constant layering of cruft has led to networking code that is filled with
strange behaviour, much like most human languages. In the past, Linux
emulated SunOS's handling of many of these things, which was not ideal.

This new framework makes it possible to clearly express features
previously beyond Linux's reach.

<sect1>iproute2 tour
<p>
Linux has a sophisticated system for bandwidth provisioning called Traffic
Control. This system supports various methods for classifying, prioritizing,
sharing, and limiting both inbound and outbound traffic.
@ -238,14 +251,14 @@ package is called 'iproute' on both RedHat and Debian, and may otherwise be
found at <tt>ftp://ftp.inr.ac.ru/ip-routing/iproute2-2.2.4-now-ss??????.tar.gz</tt>.
Some parts of iproute require you to have certain kernel options enabled.

You can also try <url name="here" url="ftp://ftp.inr.ac.ru/ip-routing/iproute2-current.tar.gz">
for the latest version.

<sect1>Exploring your current configuration
<p>
This may come as a surprise, but iproute2 is already configured! The current
commands <tt>ifconfig</tt> and <tt>route</tt> are already using the advanced
syscalls, but mostly with very default (i.e. boring) settings.

The <tt>ip</tt> tool is central, and we'll ask it to display our interfaces
for us.
@ -270,16 +283,16 @@ home. I'll only explain part of the output as not everything is directly
relevant.

We first see the loopback interface. While your computer may function
somewhat without one, I'd advise against it. The MTU (Maximum Transmission
Unit) is 3924 octets, and the interface is not supposed to queue, which makes
sense because the loopback interface is a figment of your kernel's imagination.

I'll skip the dummy interface for now; it may not be present on your
computer. Then there are my two physical network interfaces, one on the side
of my cable modem, the other serving my home ethernet segment. Furthermore,
we see a ppp0 interface.

Note the absence of IP addresses. iproute separates the concepts of 'links'
and 'IP addresses'. With IP aliasing, the concept of 'the' IP address had
become quite irrelevant anyhow.
@ -305,10 +318,10 @@ ethernet interfaces.
</verb></tscreen>
<p>
This contains more information. It shows all our addresses, and to which
cards they belong. 'inet' stands for Internet (IPv4). There are lots of other
address families, but these don't concern us right now.

Let's examine eth0 more closely. It says that it is related to the inet
address '10.0.0.1/8'. What does this mean? The /8 stands for the number of
bits that make up the Network Address. There are 32 bits, so we have 24 bits
left to number the hosts within our network. The first 8 bits of 10.0.0.1 correspond
@ -317,7 +330,7 @@ to 10.0.0.0, our Network Address, and our netmask is 255.0.0.0.

Every address that shares these first 8 bits is connected to this interface,
so 10.250.3.13 is directly available on eth0, as is 10.0.0.1, for example.
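
The prefix arithmetic above is easy to do by hand, but a small sketch makes each step explicit. It assumes a POSIX shell whose arithmetic is at least 64 bits wide. The function names are made up for this example; iproute2 does not provide them.

```shell
#!/bin/sh
# IPv4 prefix arithmetic: netmask, Network Address and "same network" test.
# A sketch only: no input validation, 64-bit shell arithmetic assumed.

# ip_to_int A.B.C.D: print the address as a 32-bit integer.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# int_to_ip N: print a 32-bit integer in dotted-quad notation.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# prefix_to_mask_int LEN: an integer with the top LEN of 32 bits set.
prefix_to_mask_int() {
    echo $(( (0xFFFFFFFF << (32 - $1)) & 0xFFFFFFFF ))
}

# network_address A.B.C.D LEN: keep the first LEN bits, clear the rest.
network_address() {
    int_to_ip $(( $(ip_to_int "$1") & $(prefix_to_mask_int "$2") ))
}

# same_network A B LEN: succeed if A and B share their first LEN bits,
# i.e. both are directly reachable through the same interface.
same_network() {
    [ "$(network_address "$1" "$3")" = "$(network_address "$2" "$3")" ]
}

int_to_ip "$(prefix_to_mask_int 8)"    # netmask for /8
network_address 10.0.0.1 8             # Network Address of eth0
if same_network 10.250.3.13 10.0.0.1 8; then
    echo "10.250.3.13 is on the same network as 10.0.0.1"
fi
```

The three calls at the end print 255.0.0.0, then 10.0.0.0, then a confirmation that 10.250.3.13 is on the same network as 10.0.0.1. This matches the reasoning above.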

The same concept applies to ppp0, though the numbers are different. Its
address is 212.64.94.251, without a subnet mask. This means that we have a
point-to-point connection and that every address, with the exception of
212.64.94.251, is remote. There is more information, however; it tells us
@ -450,7 +463,7 @@ people, who should be served differently. The routing policy database allows
you to do this by having multiple sets of routing tables.

If you want to use this feature, make sure that your kernel is compiled with
the "IP: advanced router" and "IP: policy routing" features.
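
A quick way to check this before going further is to look for the matching options in the kernel's <tt>.config</tt> file. "IP: advanced router" is CONFIG_IP_ADVANCED_ROUTER. "IP: policy routing" is, as far as we know, CONFIG_IP_MULTIPLE_TABLES. The function name below is hypothetical. Where your <tt>.config</tt> lives depends on how your kernel was built.

```shell
#!/bin/sh
# Check a kernel .config for the options policy routing needs:
#   "IP: advanced router" -> CONFIG_IP_ADVANCED_ROUTER
#   "IP: policy routing"  -> CONFIG_IP_MULTIPLE_TABLES
# Both are yes/no options, so we look for lines ending in =y.

# has_policy_routing FILE: succeed if both options are enabled in FILE;
# otherwise name the first missing option on stderr and fail.
has_policy_routing() {
    for opt in CONFIG_IP_ADVANCED_ROUTER CONFIG_IP_MULTIPLE_TABLES; do
        if ! grep -q "^${opt}=y\$" "$1"; then
            echo "missing: $opt" >&2
            return 1
        fi
    done
    return 0
}
```

For example, run <tt>has_policy_routing /usr/src/linux/.config</tt> if your kernel was built in <tt>/usr/src/linux</tt>. That path is an assumption, so adjust it to your system.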

When the kernel needs to make a routing decision, it finds out which table
needs to be consulted. By default, there are three tables. The old 'route'
@ -472,7 +485,7 @@ If we want to do fancy things, we generate rules which point to different
tables which allow us to override system wide routing rules.

For the exact semantics of what the kernel does when more than one rule
matches, see Alexey's ip-cref documentation.

<sect1>Simple source routing
<p>
@ -540,7 +553,7 @@ in ip-up.
There are 3 kinds of tunnels in Linux: IP in IP tunneling, GRE tunneling, and tunnels that live outside the kernel (PPTP, for example).
<sect1>A few general remarks about tunnels:
<p>
Tunnels can be used to do some very unusual and very cool stuff. They can also make things go horribly wrong when you don't configure them right. Don't point your default route to a tunnel device unless you know <bf>exactly</bf> what you are doing :-). Furthermore, tunneling increases overhead, because it needs an extra set of IP headers. Typically this is 20 bytes per packet, so if the normal packet size (MTU) on a network is 1500 bytes, a packet that is sent through a tunnel can be at most 1480 bytes big. This is not necessarily a problem, but be sure to read up on IP packet fragmentation/reassembly when you plan to connect large networks with tunnels. Oh, and of course, the fastest way to dig a tunnel is to dig at both sides.
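
The same subtraction applies to every tunnel type, so it is worth writing down once. The sketch below assumes an outer IPv4 header without options (20 bytes), as above. For GRE it also assumes the basic 4-byte GRE header with no optional key, checksum or sequence number fields. Both assumptions are ours, not part of the text above.

```shell
#!/bin/sh
# Tunnel MTU arithmetic. Assumptions: the outer IPv4 header has no options
# (20 bytes), and GRE uses its basic 4-byte header without the optional
# key, checksum or sequence number fields.
OUTER_IP_HEADER=20
GRE_HEADER=4

# tunnel_mtu LINK_MTU OVERHEAD: largest inner packet that still fits in a
# single outer packet, so no fragmentation is needed.
tunnel_mtu() {
    echo $(( $1 - $2 ))
}

echo "IP in IP over a 1500 byte link: $(tunnel_mtu 1500 "$OUTER_IP_HEADER")"
echo "GRE over a 1500 byte link:      $(tunnel_mtu 1500 $(( OUTER_IP_HEADER + GRE_HEADER )))"
```

This prints 1480 for IP in IP, matching the figure above, and 1476 for GRE.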

<p>
<sect1>IP in IP tunneling
<p>
@ -599,7 +612,7 @@ GRE is a tunneling protocol that was originally developed by Cisco, and it
can do a few more things than IP-in-IP tunneling. For example, you can also
transport multicast traffic and IPv6 through a GRE tunnel.

In Linux, you'll need the ip_gre.o module.

<sect2>IPv4 Tunneling
<p>
@ -668,15 +681,6 @@ Of course, you can replace netb with neta for router B.
<sect2>IPv6 Tunneling
<p>
A short bit about IPv6 addresses:<p>
IPv6 addresses are, compared to IPv4 addresses, monstrously big. An example:
<verb>3ffe:2502:200:40:281:48fe:dcfe:d9bc</verb>
@ -718,10 +722,11 @@ FIXME: Waiting for our feature editor Stefan to finish his stuff

<sect>Multicast routing
<p>
FIXME: Editor Vacancy! (somebody is working on it, though)

<sect>Using Class Based Queueing for bandwidth management
<p>
Now, when I discovered this, it <em>really</em> blew me away. Linux 2.2 comes with
everything to manage bandwidth in ways comparable to high-end dedicated
bandwidth management systems.
@ -751,14 +756,14 @@ We will explore how our ISP could have used Linux to manage their bandwidth.

<sect1>What is queueing?
<p>
With queueing we determine the order in which data is <em>sent</em>. It is important
to realise that we can only shape data that we transmit. How does changing
the order determine the speed of transmission? Imagine a cash register which
is able to process 3 customers per minute.

People wishing to pay go stand in line at the 'tail end' of the queue. This
is 'FIFO queueing' (First In, First Out). Let's suppose however that we let certain people always
join in the middle of the queue, instead of at the end. These people spend
a lot less time in the queue and are therefore able to shop faster.
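
The cash register analogy can be made concrete with a toy model. It assumes, as above, a register serving 3 customers per minute, so 20 seconds each. The customer names and the queue length of six are invented for the example.

```shell
#!/bin/sh
# Toy model of the cash register: it serves 3 customers per minute, so
# each customer takes 20 seconds. Names and queue length are invented.
SECONDS_PER_CUSTOMER=20

# wait_time AHEAD: seconds before a customer with AHEAD people in front
# of them reaches the register. Only the position in the queue matters.
wait_time() {
    echo $(( $1 * SECONDS_PER_CUSTOMER ))
}

# join_at QUEUE POS NAME: print the space-separated QUEUE with NAME
# placed after the first POS people (or at the tail if POS is too large).
join_at() (
    pos=$2
    name=$3
    out=
    i=0
    for person in $1; do
        if [ "$i" -eq "$pos" ]; then
            out="$out $name"
        fi
        out="$out $person"
        i=$(( i + 1 ))
    done
    if [ "$i" -le "$pos" ]; then
        out="$out $name"
    fi
    echo $out    # unquoted on purpose: drops the leading space
)

queue="ann bob cas dee eve fay"    # six people already waiting
waiting=$(set -- $queue; echo $#)

# FIFO: a newcomer joins at the tail end, behind everyone.
echo "FIFO newcomer waits $(wait_time "$waiting") s"

# Priority: the newcomer may join in the middle of the queue instead.
middle=$(( waiting / 2 ))
queue=$(join_at "$queue" "$middle" vip)
echo "queue is now: $queue"
echo "vip waits $(wait_time "$middle") s"
```

The newcomer at the tail waits 120 seconds, while the one allowed into the middle waits 60. The register still serves three customers per minute in both cases. Only the order, and therefore who waits how long, has changed.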

With the way the internet works, we have no direct control over what people
@ -776,7 +781,7 @@ This is the equivalent of not reading half of your mail, and hoping that
people will stop sending it to you. The difference is that it works for
the Internet :-)

FIXME: explain congestion windows

<tscreen><verb>
[The Internet] ---<E3, T3, whatever>--- [Linux router] --- [Office+ISP]
@ -784,15 +789,15 @@ FIXME: explain that normally, ACKs are used to determine speed
</verb></tscreen>

Now, our Linux router has two interfaces which I shall dub eth0 and eth1.
eth1 is connected to our router which moves packets to and from our
fibre link.

eth0 is connected to a subnet which contains both the corporate firewall and
our network head ends, through which we can connect to our customers.

Because we can only limit what we send, we need two separate but possibly
very similar sets of rules. By modifying queueing on eth0, we determine how
fast data gets sent to our customers, and therefore how much downstream
bandwidth is available for them. In short, their 'download speed'.

On eth1, we determine how fast we send data to The Internet, how fast our
@ -804,7 +809,7 @@ CBQ enables us to generate several classes, and even classes within classes.
The larger divisions might be called 'agencies'. Within these classes may be
things like 'bulk' or 'interactive'.

For example, we may have a 10 megabit connection to 'the internet'
which is to be shared by our customers and our corporate needs. We should
not allow a few people at the office to steal away large amounts of
bandwidth which we should sell to our customers.
@ -817,7 +822,7 @@ create virtual circuits. This works, but frame is not very fine grained, ATM
is terribly inefficient at carrying IP traffic, and neither has a standardised
way to segregate different types of traffic into different VCs.

However, if you do use ATM, Linux can also happily perform deft acts of fancy
traffic classification for you. Another way is to order separate
connections, but this is neither very practical nor very elegant, and
still does not solve all your problems.
@ -842,12 +847,6 @@ mention that on the command line as well. We tell the kernel that it can
allocate 10Mbit and that the average packet size is somewhere around 1000
octets.
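
To get a feeling for what these two numbers mean together, work out how many average-sized packets fit through the link each second and how long each one takes to send. This is only arithmetic on the figures above, assuming 10Mbit means 10,000,000 bits per second. It does not describe how CBQ uses the numbers internally.

```shell
#!/bin/sh
# Back-of-the-envelope numbers for a 10Mbit link carrying packets of about
# 1000 octets. Assumption: 10Mbit is taken as 10,000,000 bits per second.
BANDWIDTH_BPS=10000000
AVPKT_OCTETS=1000

bits_per_packet=$(( AVPKT_OCTETS * 8 ))
packets_per_second=$(( BANDWIDTH_BPS / bits_per_packet ))
# How long one average packet occupies the wire, in microseconds.
usec_per_packet=$(( bits_per_packet * 1000000 / BANDWIDTH_BPS ))

echo "$packets_per_second packets/s, $usec_per_packet us per packet"
```

This prints 1250 packets per second and 800 microseconds per packet.
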
Now we need to generate our root class, from which all others descend:
<tscreen><verb>
# tc class add dev eth0 parent 10:0 classid 10:1 cbq bandwidth 10Mbit rate \
@ -885,17 +884,17 @@ To top it off, we generate the root Office class:
To make this a bit clearer, here is a diagram which shows our classes:

<tscreen><verb>
+-------------[10: 10Mbit]-------------------------+
|+-------------[10:1 root 10Mbit]-----------------+|
||                                                ||
|| +-----[10:100 8Mbit]---------+ [10:200 2Mbit]  ||
|| |                            | |            |  ||
|| |            ISP             | |   Office   |  ||
|| |                            | |            |  ||
|| +----------------------------+ +------------+  ||
||                                                ||
|+------------------------------------------------+|
+--------------------------------------------------+
</verb></tscreen>

OK, now we have told the kernel what our classes are, but not yet how to
@ -1153,14 +1152,38 @@ thus leaving more of the available bandwidth for others.

See the section on protecting your host from SYN floods for an example on
how this works.

<sect1>WRR
<p>
This qdisc is not included in the standard kernels but can be downloaded from
<url url="http://wipl-wrr.dkik.dk/wrr/">.
Currently the qdisc has only been tested with Linux 2.2 kernels but it will
probably work with 2.4 kernels too.

The WRR qdisc distributes bandwidth between its classes using the weighted
round robin scheme. That is, like the CBQ qdisc it contains classes
into which arbitrary qdiscs can be plugged. All classes which have sufficient
demand will get bandwidth proportional to the weights associated with the classes.
The weights can be set manually using the <tt>tc</tt> program, but they
can also be made to decrease automatically for classes transferring a lot of data.

The qdisc has a built-in classifier which assigns packets coming from or
sent to different machines to different classes. Either the MAC or the IP
address, and either the source or the destination address, can be used. The
MAC address can only be used when the Linux box is acting as an ethernet
bridge, however. The classes are automatically assigned to machines based on
the packets seen.

The qdisc can be very useful at sites such as dorms where a lot of unrelated
individuals share an Internet connection. A set of scripts setting up
suitable behaviour for such a site is a central part of the WRR distribution.
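
The weighted round robin idea itself fits in a few lines. The sketch below is a simplified model, not the WRR qdisc's code. It assumes every class always has packets waiting and that weights are positive integers, and it counts packets rather than bytes.

```shell
#!/bin/sh
# Simplified weighted round robin. Assumptions: every class always has
# packets waiting, weights are positive integers, and we count packets
# rather than bytes. An illustration of the scheme, not the qdisc's code.

# wrr_order COUNT WEIGHT...: print the class number (0, 1, ...) of each
# of the first COUNT packets sent. In every round, class i may send up
# to WEIGHT_i packets before the next class gets its turn.
wrr_order() (
    count=$1
    shift
    sent=0
    out=
    while [ "$sent" -lt "$count" ]; do
        class=0
        for weight in "$@"; do
            quota=0
            while [ "$quota" -lt "$weight" ] && [ "$sent" -lt "$count" ]; do
                out="$out $class"
                sent=$(( sent + 1 ))
                quota=$(( quota + 1 ))
            done
            class=$(( class + 1 ))
        done
    done
    echo $out    # unquoted on purpose: drops the leading space
)

wrr_order 8 3 1    # class 0 has weight 3, class 1 has weight 1
```

With weights 3 and 1 the output is <tt>0 0 0 1 0 0 0 1</tt>: class 0 sends three packets for every one that class 1 sends. If all packets are the same size, class 0 gets three quarters of the bandwidth, in proportion to the weights as described above.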

<sect>Netfilter & iproute - marking packets
<p>
So far we've seen how iproute works, and netfilter was mentioned a few
times. This would be a good time to browse through <url name="Rusty's Remarkably
Unreliable Guides"
url="http://netfilter.kernelnotes.org/unreliable-guides/">. Netfilter itself
can be found <url name="here"
url="http://netfilter.filewatcher.org/">.

Netfilter allows us to filter packets, or mangle their headers. One special
feature is that we can mark a packet with a number. This is done with the
@ -1218,6 +1241,8 @@ IP: advanced router (CONFIG_IP_ADVANCED_ROUTER) [Y/n/?]
IP: use netfilter MARK value as routing key (CONFIG_IP_ROUTE_FWMARK) [Y/n/?]
</verb></tscreen>

See also <ref id="SQUID" name="Transparent web-caching using netfilter, iproute2, ipchains and squid">
in the Cookbook.

<sect>More classifiers
<p>
@ -1340,7 +1365,7 @@ The options described apply to all filters, not only U32.
The U32 selector contains the definition of the pattern that will be matched
against the currently processed packet. More precisely, it defines which bits
are to be matched in the packet header and nothing more, but this simple
method is very powerful. Let's take a look at the following examples,
taken directly from a pretty complex, real-world filter:

<tscreen><verb>
@ -1358,7 +1383,7 @@ it's 0xff, so the byte will match if it's exactly 0x10. The <tt>at</tt>
keyword means that the match is to be started at the specified offset (in
bytes), in this case the beginning of the packet. Translating all
that to human language, the packet will match if its Type of Service
field has the `low delay' bit set. Let's analyze another rule:

<tscreen><verb>
# filter parent 1: protocol ip pref 10 u32 fh 800::803 order 2051 key ht 800 bkt 0 flowid 1:3 \
@ -1627,18 +1652,28 @@ kernel.

<sect2>Generic ipv4
<p>
As a generic note, most rate limiting features don't work on loopback, so
don't test them locally. The limits are supplied in 'jiffies', and are
enforced using the earlier mentioned token bucket filter.

The kernel has an internal clock which runs at 'HZ' ticks (or 'jiffies') per
second. On Intel, 'HZ' is usually 100. So setting a *_rate file to, say, 50
would allow for 2 packets per second (100 jiffies per second divided by 50
jiffies per packet). The token bucket filter is also configured to allow for
a burst of at most 6 packets, if enough tokens have been earned.
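
The numbers in the previous paragraph can be explored with a simplified model of such a token bucket. It assumes HZ is 100 and a *_rate value of 50. It also assumes a bucket that holds at most 6 packets' worth of credit and starts full. It illustrates the idea and is not a copy of the kernel's code.

```shell
#!/bin/sh
# Simplified token bucket in the spirit of the ICMP rate limits above.
# Assumptions: HZ=100, a *_rate value of 50 jiffies, room for at most 6
# packets' worth of credit, and a bucket that starts full. Credit is
# counted in jiffies: each jiffy earns one, each packet costs RATE.
HZ=100
RATE=50
BURST=6

credit=$(( BURST * RATE ))
last=0    # jiffy at which allow was last called

# allow NOW: succeed if a packet at jiffy NOW may be sent, and charge for it.
allow() {
    credit=$(( credit + $1 - last ))
    last=$1
    if [ "$credit" -gt $(( BURST * RATE )) ]; then
        credit=$(( BURST * RATE ))    # the bucket cannot hold more than BURST
    fi
    if [ "$credit" -ge "$RATE" ]; then
        credit=$(( credit - RATE ))
        return 0
    fi
    return 1
}

echo "steady state: $(( HZ / RATE )) packets per second"

# Ten packets arrive in the same jiffy: only the burst gets through.
sent=0
n=0
while [ "$n" -lt 10 ]; do
    if allow 0; then
        sent=$(( sent + 1 ))
    fi
    n=$(( n + 1 ))
done
echo "burst: $sent of 10 packets sent"
```

It prints a steady state of 2 packets per second and lets 6 of the 10 simultaneous packets through. After that, one packet per 50 jiffies is allowed.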

<descrip>
<tag>/proc/sys/net/ipv4/icmp_destunreach_rate</tag>
If the kernel decides that it can't deliver a packet, it will drop it, and
send the source of the packet an ICMP notice to this effect. This setting
limits how often such notices are sent.
<tag>/proc/sys/net/ipv4/icmp_echo_ignore_all</tag>
Don't act on echo packets at all. Please don't set this by default, but if
you are used as a relay in a DoS attack, it may be useful.
<tag>/proc/sys/net/ipv4/icmp_echo_ignore_broadcasts [Useful]</tag>
If you ping the broadcast address of a network, all hosts are supposed to
respond. This makes for a dandy denial-of-service tool. Set this to 1 to
ignore these broadcast messages.
<tag>/proc/sys/net/ipv4/icmp_echoreply_rate</tag>
The rate at which echo replies are sent to any one destination.
<tag>/proc/sys/net/ipv4/icmp_ignore_bogus_error_responses</tag>
FIXME: fill this in
<tag>/proc/sys/net/ipv4/icmp_paramprob_rate</tag>
@ -1646,7 +1681,6 @@ FIXME: fill this in
<tag>/proc/sys/net/ipv4/icmp_timeexceed_rate</tag>
This is the famous cause of the 'Solaris middle star' in traceroutes. Limits
the number of ICMP Time Exceeded messages sent.
<tag>/proc/sys/net/ipv4/igmp_max_memberships</tag>
FIXME: fill this in
<tag>/proc/sys/net/ipv4/inet_peer_gc_maxtime</tag>
@ -1667,19 +1701,17 @@ network. Don't do so for fun - routing loops cause much more damage that
way. You might even consider lowering it in some circumstances.
<tag>/proc/sys/net/ipv4/ip_dynaddr</tag>
You need to set this if you use dial-on-demand with a dynamic interface
address. Once your demand interface comes up, any local TCP sockets which
haven't seen replies will be rebound to have the right address. This solves
the problem that the connection that brings up your interface itself does
not work, but the second try does.
<tag>/proc/sys/net/ipv4/ip_forward</tag>
Whether the kernel should attempt to forward packets. Off by default.
<tag>/proc/sys/net/ipv4/ip_local_port_range</tag>
Range of local ports for outgoing connections. Actually quite small by
default, 1024 to 4999.
<tag>/proc/sys/net/ipv4/ip_no_pmtu_disc</tag>
Set this if you want to disable Path MTU discovery, a technique to
determine the largest Maximum Transmission Unit possible on your path.
<tag>/proc/sys/net/ipv4/ipfrag_high_thresh</tag>
FIXME: fill this in
<tag>/proc/sys/net/ipv4/ipfrag_low_thresh</tag>
@ -1713,16 +1745,23 @@ FIXME: fill this in
<tag>/proc/sys/net/ipv4/tcp_rfc1337</tag>
FIXME: fill this in
<tag>/proc/sys/net/ipv4/tcp_sack</tag>
Use Selective ACK, which can be used to signify that specific packets are
missing, thereby helping fast recovery.
<tag>/proc/sys/net/ipv4/tcp_stdurg</tag>
FIXME: fill this in
<tag>/proc/sys/net/ipv4/tcp_syn_retries</tag>
Number of SYN packets the kernel will send before giving up on the new
connection.
<tag>/proc/sys/net/ipv4/tcp_synack_retries</tag>
To open the other side of the connection, the kernel sends a SYN with a
piggybacked ACK on it, to acknowledge the earlier received SYN. This is part
2 of the three-way handshake. This setting determines the number of SYN+ACK
packets sent before the kernel gives up on the connection.
<tag>/proc/sys/net/ipv4/tcp_timestamps</tag>
Timestamps are used, amongst other things, to protect against wrapping
sequence numbers. A 1 gigabit link might conceivably re-encounter a previous
sequence number with an out-of-line value, because it was of a previous
generation. The timestamp will let it recognise this 'ancient packet'.
<tag>/proc/sys/net/ipv4/tcp_tw_recycle</tag>
FIXME: fill this in
<tag>/proc/sys/net/ipv4/tcp_window_scaling</tag>
@ -1754,7 +1793,10 @@ See the section on reverse path filters.
<tag>/proc/sys/net/ipv4/conf/DEV/mc_forwarding</tag>
Whether we do multicast forwarding on this interface.
<tag>/proc/sys/net/ipv4/conf/DEV/proxy_arp</tag>
If you set this to 1, all other interfaces will respond to arp queries
destined for addresses on this interface. Can be very useful when building 'ip
pseudo bridges'. Do take care that your netmasks are correct before
enabling this!
<tag>/proc/sys/net/ipv4/conf/DEV/rp_filter</tag>
See the section on reverse path filters.
<tag>/proc/sys/net/ipv4/conf/DEV/secure_redirects</tag>
@ -1897,7 +1939,7 @@ links with small min's it might be wise to make max perhaps four or
more times larger than min.

Burst controls how the RED algorithm responds to bursts. Burst must be set
larger than min/avpkt. Experimentally, I've found (min+min+max)/(3*avpkt) to
work okay.
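
As a worked example of these rules, take some hypothetical values: min 30000 bytes, max four times that, and avpkt 1000. These numbers are made up for illustration and are not a recommendation.

```shell
#!/bin/sh
# The RED burst rules above, with hypothetical values in bytes. max is
# four times min, following the advice for links with small min values.
MIN=30000
MAX=120000
AVPKT=1000

lower_bound=$(( MIN / AVPKT ))                  # burst must exceed this
burst=$(( (MIN + MIN + MAX) / (3 * AVPKT) ))    # (min+min+max)/(3*avpkt)

echo "burst must exceed $lower_bound; rule of thumb gives $burst"
```

This prints a lower bound of 30 and a suggested burst of 60. Note that (min+min+max)/(3*avpkt) is larger than min/avpkt whenever max is larger than min, so the rule of thumb automatically satisfies the lower bound.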

Additionally, you need to set limit and avpkt. Limit is a safety value, after
@ -1912,7 +1954,7 @@ information.
FIXME: more needed. This means *you* greg :-) - ahu

<sect>Cookbook
<p>
This section contains 'cookbook' entries which may help you solve problems.
A cookbook is no replacement for understanding however, so try and comprehend
@ -1958,7 +2000,7 @@ popular daemons have support for this.

We first attach a CBQ qdisc to eth0:
<tscreen><verb>
# tc qdisc add dev eth0 root handle 1: cbq bandwidth 10Mbit cell 8 avpkt 1000 \
mpu 64
</verb></tscreen>
@ -1989,7 +2031,7 @@ FIXME: why no token bucket filter? is there a default pfifo_fast fallback
somewhere?

<sect1>Protecting your host from SYN floods
<p>From Alexey's iproute documentation, adapted to netfilter and with more
plausible paths. If you use this, take care to adjust the numbers to
reasonable values for your system.
@ -2028,7 +2070,7 @@ $TC qdisc add dev $INDEV handle ffff: ingress
#
# SYN packets are 40 bytes (320 bits) so three SYNs equal
# 960 bits (approximately 1kbit); so we rate limit below
# the incoming SYNs to 3/sec (not very useful really; but
# serves to show the point - JHS)
############################################################
$TC filter add dev $INDEV parent ffff: protocol ip prio 50 handle 1 fw \
@ -2098,7 +2140,7 @@ class:
</verb></tscreen>

<sect1>Prioritizing interactive traffic
<p>
If lots of data is coming down your link, or going up for that matter, and
you are trying to do some maintenance via telnet or ssh, this may not go too
@ -2125,7 +2167,7 @@ author of the ipchains TOS-mangling code, puts it as follows:
<tscreen>
Especially the "Minimum Delay" is important for me. I switch it on for
"interactive" packets in my upstream (Linux) router. I'm
behind a 33k6 modem link. Linux prioritizes packets in 3 queues. This
way I get acceptable interactive performance while doing bulk
downloads at the same time.
</tscreen>
netfilter. On your local box:

-j TOS --set-tos Maximize-Throughput
</verb></tscreen>

<sect1>Transparent web-caching using netfilter, iproute2, ipchains and squid
<p>
<label id="SQUID">
This section was sent in by reader Ram Narula from Internet for Education
(Thailand).

The usual technique for accomplishing this in Linux is probably to use
ipchains AFTER making sure that "outgoing" port 80 (web) traffic gets routed
through the server running squid.

There are three common methods to make sure "outgoing" port 80 traffic gets
routed to the server running squid, and a fourth one is introduced here.

<descrip>
<tag>Making the gateway router do it.</tag>
If your gateway router can be told to match packets with an outgoing
destination port of 80, it can send them to the IP address of the squid
server.
<p>
BUT
<p>
This would put additional load on the router, and some commercial routers
might not even support it.
<tag>Using a Layer 4 switch.</tag>
Layer 4 switches can handle this without any problem.
<p>
BUT
<p>
The cost of this equipment is usually very high. A typical layer 4 switch
would normally cost more than a typical router plus a good Linux server.
<tag>Using the cache server as the network's gateway.</tag>
You can force ALL traffic through the cache server.
<p>
BUT
<p>
This is quite risky because Squid uses a lot of CPU power, which might
result in slower overall network performance; or the server itself might
crash, in which case no one on the network will be able to access the
internet.

<tag>Linux+NetFilter router.</tag>
By using NetFilter, another technique can be implemented: use NetFilter to
"mark" the packets with destination port 80, and use iproute2 to route the
"mark"ed packets to the Squid server.
</descrip>

<tscreen><verb>
|----------------|
| Implementation |
|----------------|

Addresses used
10.0.0.1      naret     (NetFilter server)
10.0.0.2      silom     (Squid server)
10.0.0.3      donmuang  (Router connected to the internet)
10.0.0.4      kaosarn   (other server on network)
10.0.0.5      RAS
10.0.0.0/24   main network
10.0.0.0/19   total network

|---------------|
|Network diagram|
|---------------|

             Internet
                |
             donmuang
                |
------------hub/switch----------
|         |          |         |
naret     silom      kaosarn   RAS etc.
</verb></tscreen>
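The address plan distinguishes the 10.0.0.0/24 main network from the 10.0.0.0/19 total network; the /19 is also the source range that the ipchains rule on silom matches. To see which addresses a prefix covers, here is a small POSIX shell sketch. The helper names are hypothetical, and it assumes plain dotted-quad input without leading zeros.

```shell
#!/bin/sh
# ip_to_int A.B.C.D: print the address as a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the dotted quad into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_subnet ADDR NETWORK PREFIXLEN: succeed if ADDR is inside NETWORK/PREFIXLEN.
in_subnet() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    # Keep the top PREFIXLEN bits of a 32-bit value.
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

in_subnet 10.0.31.255 10.0.0.0 19 && echo "10.0.31.255 is inside the /19"
in_subnet 10.0.32.1 10.0.0.0 19 || echo "10.0.32.1 is outside the /19"
```

A /19 therefore spans 10.0.0.0 through 10.0.31.255, while the /24 main network stops at 10.0.0.255.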

First, make all traffic pass through naret by making sure it is the default
gateway for every host except silom. Silom's default gateway has to be
donmuang (10.0.0.3), or this would create a web traffic loop.

<p>
(All servers on my network had 10.0.0.1 as the default gateway, which was the
former IP address of the donmuang router. So what I did was change the IP
address of donmuang to 10.0.0.3 and give naret the IP address 10.0.0.1.)

<tscreen><verb>
Silom
-----
-set up squid and ipchains
</verb></tscreen>

<p>
Set up the Squid server on silom and make sure it supports transparent
caching/proxying. The default port is usually 3128, so all traffic for port
80 has to be redirected to port 3128 locally. This can be done using ipchains
with the following:

<tscreen><verb>
silom# ipchains -N allow1
silom# ipchains -A allow1 -p TCP -s 10.0.0.0/19 -d 0/0 80 -j REDIRECT 3128
silom# ipchains -I input -j allow1
</verb></tscreen>

<p>
Or, in netfilter lingo:
<tscreen><verb>
silom# iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 3128
</verb></tscreen>

(Note: you might have other entries as well.)

<p>
For more information on setting up the Squid server, please refer to the
Squid FAQ page at <url
url="http://squid.nlanr.net" name="http://squid.nlanr.net">.

<p>
Make sure IP forwarding is enabled on this server and that its default
gateway is the donmuang router (NOT naret).
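The silom requirements can be collected into one helper. This is a hypothetical sketch, not part of the original setup: it uses the iptables REDIRECT rule rather than the ipchains one, assumes eth0 and the addresses listed at the start of this section, and uses `sysctl -w` and `ip route replace` for the forwarding and default-gateway requirements. With DRY_RUN=1 (the default here) it only prints the commands; run it as root with DRY_RUN=0 to apply them.

```shell
#!/bin/sh
# Hypothetical helper for silom (the Squid box).
# DRY_RUN=1 prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
IFACE=eth0
SQUID_PORT=3128
DONMUANG_IP=10.0.0.3      # silom's gateway must be donmuang, never naret

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

silom_setup() {
    # Hand incoming web traffic to the local Squid port (transparent proxy).
    run iptables -t nat -A PREROUTING -i "$IFACE" -p tcp --dport 80 \
        -j REDIRECT --to-ports "$SQUID_PORT"
    # Enable IP forwarding, as required for this server.
    run sysctl -w net.ipv4.ip_forward=1
    # Point the default route at donmuang to avoid a web traffic loop.
    run ip route replace default via "$DONMUANG_IP" dev "$IFACE"
}
```

Sourcing the file and calling `silom_setup` prints the three commands for review; Squid's own transparent-proxy configuration still has to be done separately.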

<tscreen><verb>
Naret
-----
-set up iptables and iproute2
-disable ICMP REDIRECT messages (if needed)
</verb></tscreen>

<enum>
<item>"Mark" packets with destination port 80 using value 2
<tscreen><verb>
naret# iptables -A PREROUTING -i eth0 -t mangle -p tcp --dport 80 \
-j MARK --set-mark 2
</verb></tscreen>
</item>
<item>Set up iproute2 so it will route packets with "mark" 2 to silom
<tscreen><verb>
naret# echo 202 www.out >> /etc/iproute2/rt_tables
naret# ip rule add fwmark 2 table www.out
naret# ip route add default via 10.0.0.2 dev eth0 table www.out
naret# ip route flush cache
</verb></tscreen>
<p>
If donmuang and naret are on the same subnet, then naret should not send out
ICMP REDIRECT messages. In this case they are, so ICMP REDIRECTs have to be
disabled by:
<tscreen><verb>
naret# echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects
naret# echo 0 > /proc/sys/net/ipv4/conf/default/send_redirects
naret# echo 0 > /proc/sys/net/ipv4/conf/eth0/send_redirects
</verb></tscreen>
</item>
</enum>
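The naret steps can be gathered into one script. This is a hypothetical sketch, not the author's script: the function names, the DRY_RUN switch and the RT_TABLES override are assumptions added for illustration, and it uses `sysctl -w` in place of the `echo 0 > /proc/sys/...` lines (both write the same settings). It also checks the rt_tables file before appending, so the `202 www.out` line is not duplicated on a second run.

```shell
#!/bin/sh
# Hypothetical helper for naret.
# DRY_RUN=1 (the default) prints the commands; DRY_RUN=0 runs them as root.
DRY_RUN=${DRY_RUN:-1}
IFACE=eth0
MARK=2
TABLE_ID=202
TABLE_NAME=www.out
SQUID_IP=10.0.0.2                          # silom
RT_TABLES=${RT_TABLES:-/etc/iproute2/rt_tables}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

add_rt_table() {
    # Register "202 www.out" only if it is not there yet.
    if awk -v id="$TABLE_ID" -v name="$TABLE_NAME" \
           '$1 == id && $2 == name { found = 1 } END { exit (found ? 0 : 1) }' \
           "$RT_TABLES" 2>/dev/null; then
        return 0
    fi
    if [ "$DRY_RUN" = 1 ]; then
        echo "echo $TABLE_ID $TABLE_NAME >> $RT_TABLES"
    else
        echo "$TABLE_ID $TABLE_NAME" >> "$RT_TABLES"
    fi
}

naret_setup() {
    # Mark TCP packets for port 80 arriving on eth0.
    run iptables -t mangle -A PREROUTING -i "$IFACE" -p tcp --dport 80 \
        -j MARK --set-mark "$MARK"
    # Route marked packets via table www.out, whose default route is silom.
    add_rt_table
    run ip rule add fwmark "$MARK" table "$TABLE_NAME"
    run ip route add default via "$SQUID_IP" dev "$IFACE" table "$TABLE_NAME"
    run ip route flush cache
    # Stop naret from sending ICMP redirects.
    for scope in all default "$IFACE"; do
        run sysctl -w "net.ipv4.conf.$scope.send_redirects=0"
    done
}
```

Only the rt_tables step is guarded against repetition; running `naret_setup` twice with DRY_RUN=0 would still add a second iptables rule and a second `ip rule` entry.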

The setup is complete. Check the configuration:

<tscreen><verb>
On naret:

naret# iptables -t mangle -L
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
MARK       tcp  --  anywhere             anywhere           tcp dpt:www MARK set 0x2

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

naret# ip rule ls
0:      from all lookup local
32765:  from all fwmark 2 lookup www.out
32766:  from all lookup main
32767:  from all lookup default

naret# ip route list table www.out
default via 10.0.0.2 dev eth0

naret# ip route
10.0.0.1 dev eth0 scope link
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.1
127.0.0.0/8 dev lo scope link
default via 10.0.0.3 dev eth0

(Make sure silom is covered by one of the above lines; in this case
it's the line with 10.0.0.0/24.)

|------|
|-DONE-|
|------|

</verb></tscreen>
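The checks above are done by eye. Here is a sketch of the same checks as shell functions that read command output on stdin. The helper names are hypothetical; the only assumptions about the output format are the field layout shown above and the hex form `fwmark 0x2` that some iproute2 versions print.

```shell
#!/bin/sh
# has_fwmark_rule MARK TABLE: read `ip rule ls` output on stdin and succeed
# if some rule sends packets carrying MARK (decimal or 0x hex) to TABLE.
has_fwmark_rule() {
    awk -v m="$1" -v hx="0x$(printf '%x' "$1")" -v t="$2" '
        {
            mk = ""; tb = ""
            for (i = 1; i < NF; i++) {
                if ($i == "fwmark") mk = $(i + 1)
                if ($i == "lookup") tb = $(i + 1)
            }
            if ((mk == m || mk == hx) && tb == t) found = 1
        }
        END { exit (found ? 0 : 1) }'
}

# has_default_via GATEWAY: read `ip route list table ...` output on stdin and
# succeed if it holds a default route through GATEWAY.
has_default_via() {
    awk -v gw="$1" '
        $1 == "default" && $2 == "via" && $3 == gw { found = 1 }
        END { exit (found ? 0 : 1) }'
}
```

On naret itself: `ip rule ls | has_fwmark_rule 2 www.out && ip route list table www.out | has_default_via 10.0.0.2 && echo OK`.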
<sect2>Traffic flow diagram after implementation
<p>
<tscreen><verb>
|-----------------------------------------|
|Traffic flow diagram after implementation|
|-----------------------------------------|

                    INTERNET
                       /\
                       ||
                       \/
-----------------donmuang router---------------------
    /\                                 /\         ||
    ||                                 ||         ||
    ||                                 \/         ||
  naret                              silom        ||
    *destination port 80 traffic===>(cache)       ||
    /\                                 ||         ||
    ||                                 \/         \/
     \\==============================kaosarn, RAS, etc.
</verb></tscreen>

Note that the network is asymmetric, as there is one extra hop on the
general outgoing path.

<tscreen><verb>
Here is a rundown of packets traversing the network from kaosarn
to and from the internet.

For web/http traffic:
kaosarn http request->naret->silom->donmuang->internet
http replies from internet->donmuang->silom->kaosarn

For non-web/http requests (e.g. telnet):
kaosarn outgoing data->naret->donmuang->internet
incoming data from internet->donmuang->kaosarn
</verb></tscreen>
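The rundown above follows from the order in which naret consults its rules. The toy model below is a simplification, not how the kernel implements policy routing, and its helper names are hypothetical. It mimics the two rules from the `ip rule ls` output shown earlier: a packet to port 80 gets mark 2, the fwmark rule at priority 32765 sends it to table www.out, and everything else falls through to table main. It leaves out the local table and assumes the destination is off the local subnet, so only the default routes matter.

```shell
#!/bin/sh
# Toy model of naret's forwarding decision for a packet bound for the internet.

# Rules as PRIORITY:SELECTOR:TABLE, in the order `ip rule ls` lists them.
RULES="32765:fwmark=2:www.out 32766:all:main"

mark_for_port() {
    # Mirrors the mangle rule for a TCP packet: port 80 gets mark 2, else 0.
    if [ "$1" -eq 80 ]; then echo 2; else echo 0; fi
}

lookup_table() {
    # Walk the rules in priority order; the first match picks the table.
    for rule in $RULES; do
        selector=${rule#*:}
        selector=${selector%%:*}
        table=${rule##*:}
        case "$selector" in
            all | "fwmark=$1")
                echo "$table"
                return 0 ;;
        esac
    done
    return 1
}

default_gateway() {
    # Default route configured in each table in this section.
    case "$1" in
        www.out) echo 10.0.0.2 ;;   # silom
        main)    echo 10.0.0.3 ;;   # donmuang
    esac
}

next_hop() {
    default_gateway "$(lookup_table "$(mark_for_port "$1")")"
}

next_hop 80    # prints 10.0.0.2 (silom)
next_hop 23    # prints 10.0.0.3 (donmuang)
```

Because the fwmark rule has a lower priority number than the main table rule, it is consulted first; swapping their order would send web traffic straight to donmuang.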

<sect>Advanced Linux Routing
<p>
This section is for all you people who either want to understand why the

Lists the steps the kernel takes to classify a packet, etc...

FIXME: Write this.

<sect1>Advanced uses of the packet queueing system
<p>Go through Alexey's extremely tricky example involving the unused bits
in the TOS field.

FIXME: Write this.

it is Linux specific, but it does a fair job discussing the theory and uses
of CBQ.
Very technical stuff, but good reading for those so inclined.

<tag><url url="http://ceti.pl/~kravietz/cbq/NET4_tc.html"
name="http://ceti.pl/~kravietz/cbq/NET4_tc.html"></tag>
Yet another HOWTO, this time in Polish! You can copy/paste command lines,
however; they work just the same in every language. The author is
cooperating with us and may soon author sections of this HOWTO.

well.

<sect>Acknowledgements
<p>
It is our goal to list everybody who has contributed to this HOWTO, or
helped us demystify how things work. While there are currently no plans
for a Netfilter type scoreboard, we do like to recognise the people who are
helping.

<itemize>
<item>Jamal Hadi <hadi%cyberus.ca>
<item>Nadeem Hasan <nhasan@usa.net>
<item>Philippe Latu <philippe.latu%linux-france.org>
<item>Jason Lunz <j@cc.gatech.edu>
<item>Alexey Mahotkin <alexm@formulabez.ru>
<item>Pawel Krawczyk <kravietz%alfa.ceti.pl>
<item>Wim van der Most
<item>Ram Narula <ram@princess1.net>
<item>Rusty Russell (with apologies for always misspelling your name)
<item>Charles Tassell <ctassell%isn.net>
<item>Glen Turner <glen.turner%aarnet.edu.au>
<item>Song Wang <wsong@ece.uci.edu>
</itemize>

</article>