<sect1 id="Services">
<title>Services</title>
<para>
</para>
</sect1>
<sect1 id="Database">
<title>Database</title>
<para>
Most databases are supported under Linux, including Oracle, DB2, Sybase, Informix, MySQL, PostgreSQL,
InterBase and Paradox. Databases, and the Structured Query Language (SQL) they work with, are complex,
and this chapter has neither the space nor the depth to deal with them fully. Read the next section on
PHP to learn how to set up a dynamically generated Web portal in about five minutes.
We'll be using MySQL because it's extremely fast, capable of handling large databases (200 GB databases
aren't unheard of), and has recently been made open source. It also works well with PHP. While it
currently lacks transaction support (due to speed concerns), a future version of MySQL will have this
option.
</para>
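<para>
As a minimal, hedged sketch (the database name, table, and columns below are invented purely for
illustration), creating and querying a small MySQL database from the command-line client looks like this:
</para>
<screen>
$ mysql -u root -p
mysql> CREATE DATABASE portal;
mysql> USE portal;
mysql> CREATE TABLE news (id INT PRIMARY KEY, headline VARCHAR(200));
mysql> INSERT INTO news VALUES (1, 'Site launched');
mysql> SELECT headline FROM news;
</screen>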
* Connecting to MS SQL 6.x+ via Openlink/PHP/ODBC mini-HOWTO
* Sybase Adaptive Server Anywhere for Linux HOWTO
</sect1>
<sect1 id="DHCP">
<title>DHCP</title>
<para>
Maintaining static IP addressing information, such as IP addresses, subnet
masks, DNS names and other details, on client machines can be difficult.
Documentation becomes lost or out-of-date, and network reconfigurations
require details to be modified manually on every machine.
</para>
<para>
DHCP (Dynamic Host Configuration Protocol) solves this problem by providing
arbitrary information (including IP addressing) to clients upon request.
Almost all client OSes support it and it is standard in most large networks.
</para>
<para>
Its greatest impact is that it eases network administration, especially in
large networks or networks with many mobile users.
</para>
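<para>
A minimal server configuration, sketched here for the ISC DHCP server (the choice of server and all
addresses are assumptions for illustration; see the dhcpd.conf man page for the full syntax):
</para>
<screen>
# /etc/dhcpd.conf -- example only; adjust the addresses for your network
subnet 192.168.0.0 netmask 255.255.255.0 {
    range 192.168.0.100 192.168.0.200;
    option routers 192.168.0.1;
    option domain-name-servers 192.168.0.1;
    default-lease-time 86400;    # one day, in seconds
}
</screen>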
<para>
DHCP (Dynamic Host Configuration Protocol) is used to control vital
networking parameters of hosts (running clients) with the help of a server.
DHCP is backward compatible with BOOTP. For more information see RFC 2131
(which obsoletes RFC 1541) and related RFCs. You can also read
http://web.syr.edu/~jmwobus/comfaqs/dhcp.faq.html.
</para>
<para>
Linux Magazine has a pretty good article in their April 2000 issue, Network
Nirvana: How to make Network Configuration as easy as DHCP
(http://www.linux-mag.com/2000-04/networknirvana_01.html), that discusses
the setup for DHCP.
</para>
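<para>
On the client side, a minimal sketch of requesting a lease with the dhcpcd client (the interface name
is an assumption; other clients such as pump and dhclient work similarly):
</para>
<screen>
# dhcpcd eth0              # request and configure a lease for eth0
# cat /etc/resolv.conf     # dhcpcd normally rewrites this with the offered DNS servers
</screen>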
</sect1>
<sect1 id="DNS">
<title>DNS</title>
<para>
See the Setting Up Your New Domain Mini-HOWTO.
</para>
</sect1>
<sect1 id="FTP">
<title>FTP</title>
<para>
File Transfer Protocol (FTP) is an efficient way to transfer files between
machines across networks. Clients and servers exist for almost all platforms,
making FTP the most convenient (and therefore popular) method of transferring
files. FTP was first developed by the University of California, Berkeley for
inclusion in 4.2BSD (Berkeley Unix). The RFC (Request for Comments)
document for the protocol is now known as RFC 959 and is available at
ftp://nic.merit.edu/documents/rfc/rfc0959.txt.
</para>
<para>
There are two typical modes of running an FTP server - either anonymously or
account-based. Anonymous FTP servers are by far the most popular; they allow
any machine to access the FTP server and the files stored on it with the same
permissions. No usernames or passwords are transmitted down the wire.
Account-based FTP allows users to log in with real usernames and passwords.
While it provides greater access control than anonymous FTP, transmitting real
usernames and passwords unencrypted over the Internet is generally avoided for
security reasons.
</para>
<para>
An FTP client is the userland application that provides access to FTP
servers. There are many FTP clients available. Some are graphical, and
some are text-based.
</para>
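<para>
A minimal sketch of an anonymous FTP session with the standard text-based client (the host name is
illustrative):
</para>
<screen>
$ ftp ftp.example.com
Name: anonymous
Password: user@example.com    # by convention, your email address
ftp> cd /pub
ftp> get README
ftp> quit
</screen>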
* FTP HOWTO
</sect1>
<sect1 id="LDAP">
<title>LDAP</title>
<para>
Information about installing, configuring, running and maintaining an LDAP
(Lightweight Directory Access Protocol) server on a Linux machine is
presented in this section, along with details about how to create LDAP
databases and how to add, update and delete information in the directory.
This section is based largely on the University of Michigan LDAP information
pages and on the OpenLDAP Administrator's Guide.
</para>
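<para>
As a brief, hedged sketch (the suffix, manager DN, and entry below are invented for illustration),
adding and then finding an entry with the OpenLDAP command-line tools looks like this:
</para>
<screen>
$ ldapadd -x -D "cn=Manager,dc=example,dc=com" -W -f entry.ldif
$ ldapsearch -x -b "dc=example,dc=com" "(cn=Jane Doe)"
</screen>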
</sect1>
<sect1 id="NFS">
<title>NFS</title>
NFS (Network File System)
The TCP/IP suite's equivalent of file sharing. This protocol operates at the Process/Application
layer of the DOD model, similar to the application layer of the OSI model.
SLIP (Serial Line Internet Protocol) and PPP (Point-to-Point Protocol)
Two protocols commonly used for dial-up access to the Internet. They are typically used with
TCP/IP; while SLIP works only with TCP/IP, PPP can be used with other protocols.
SLIP was the first protocol for dial-up Internet access. It operates at the physical layer of the
OSI model, and provides a simple interface to a UNIX or other dial-up host for Internet access.
SLIP does not provide security, so authentication is handled through prompts before initiating
the SLIP connection.
PPP is a more recent development. It operates at the physical and data link layers of the OSI
model. In addition to the features of SLIP, PPP supports data compression, security (authentication),
and error control. PPP can also dynamically assign network addresses.
Since PPP provides easier authentication and better security, it should be used for dial-up connections
whenever possible. However, you may need to use SLIP to communicate with dial-up servers (particularly
older UNIX machines and dedicated hardware servers) that don't support PPP.
<para>
5.4. Unix Environment
The preferred way to share files in a Unix networking environment is
through NFS. NFS stands for Network File System and it is a protocol
originally developed by Sun Microsystems. It is a way to share files
between machines as if they were local. A client "mounts" a filesystem
"exported" by an NFS server. The mounted filesystem will appear to the
client machine as if it was part of the local filesystem.
It is possible to mount the root filesystem at startup time, thus
allowing diskless clients to boot up and access all files from a
server. In other words, it is possible to have a fully functional
computer without a hard disk.
Coda is a network filesystem (like NFS) that supports disconnected
operation and persistent caching, among other goodies. It's included in
2.2.x kernels. Really handy for slow or unreliable networks and
laptops.
NFS-related documents:
* http://metalab.unc.edu/mdw/HOWTO/mini/NFS-Root.html
* http://metalab.unc.edu/mdw/HOWTO/Diskless-HOWTO.html
* http://metalab.unc.edu/mdw/HOWTO/mini/NFS-Root-Client-mini-HOWTO/index.html
* http://www.redhat.com/support/docs/rhl/NFS-Tips/NFS-Tips.html
* http://metalab.unc.edu/mdw/HOWTO/NFS-HOWTO.html
Coda can be found at: http://www.coda.cs.cmu.edu/
Samba is the Linux implementation of SMB; NFS is the Unix equivalent - a way to import and
export local files to and from remote machines. Like SMB, NFS sends its traffic
unencrypted, so it is best to limit its use to within your local network.
As you know, all storage in Linux is visible within a single tree structure, and new hard disks,
CD-ROMs, Zip drives and other spaces are mounted on a particular directory. NFS shares are also
attached to the system in this manner. NFS is included in most Linux kernels, and the tools
necessary to act as an NFS server or client come with most distributions.
However, users of Linux kernel 2.2 hoping to use NFS may wish to upgrade to
kernel 2.4; while the earlier version of Linux NFS did work, it was far slower than
most other Unix implementations of this protocol.
>Start Config-HOWTO
2.15. Automount Points
If you don't like the mounting/unmounting thing, consider using autofs(5). You tell the autofs daemon what to automount and where, starting with a file, /etc/auto.master. Its structure is simple:
/misc  /etc/auto.misc
/mnt   /etc/auto.mnt
In this example you tell autofs to automount media in /misc and /mnt, while the mount points are specified in /etc/auto.misc and /etc/auto.mnt. An example of /etc/auto.misc:
# an NFS export
server  -ro                  my.buddy.net:/pub/export
# removable media
cdrom   -fstype=iso9660,ro   :/dev/hdb
floppy  -fstype=auto         :/dev/fd0
Start the automounter. From now on, whenever you try to access the nonexistent mount point /misc/cdrom, it will be created and the CD-ROM will be mounted.
>End Config-HOWTO
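The automounter is usually started from a boot script; a typical invocation (the script path varies
by distribution) is:
# /etc/init.d/autofs start
# ls /misc/cdrom    # accessing the mount point triggers the automount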
> Linux NFS-HOWTO
> NFS-Root mini-HOWTO
> NFS-Root-Client Mini-HOWTO
> The Linux NIS(YP)/NYS/NIS+ HOWTO
</para>
Linux NFS-HOWTO
Tavis Barr (tavis dot barr at liu dot edu)
Nicolai Langfeldt (janl at linpro dot no)
Seth Vidal (skvidal at phy dot duke dot edu)
Tom McNeal (trmcneal at attbi dot com)
2002-08-25
Revision History
Revision v3.1 2002-08-25 Revised by: tavis
Typo in firewalling section in 3.0
Revision v3.0 2002-07-16 Revised by: tavis
Updates plus additions to performance, security
-----------------------------------------------------------------------------
Table of Contents
1. Preamble
1.1. Legal stuff
1.2. Disclaimer
1.3. Feedback
1.4. Translation
1.5. Dedication
2. Introduction
2.1. What is NFS?
2.2. What is this HOWTO and what is it not?
2.3. Knowledge Pre-Requisites
2.4. Software Pre-Requisites: Kernel Version and nfs-utils
2.5. Where to get help and further information
3. Setting Up an NFS Server
3.1. Introduction to the server setup
3.2. Setting up the Configuration Files
3.3. Getting the services started
3.4. Verifying that NFS is running
3.5. Making changes to /etc/exports later on
4. Setting up an NFS Client
4.1. Mounting remote directories
4.2. Getting NFS File Systems to Be Mounted at Boot Time
4.3. Mount options
5. Optimizing NFS Performance
5.1. Setting Block Size to Optimize Transfer Speeds
5.2. Packet Size and Network Drivers
5.3. Overflow of Fragmented Packets
5.4. NFS over TCP
5.5. Timeout and Retransmission Values
5.6. Number of Instances of the NFSD Server Daemon
5.7. Memory Limits on the Input Queue
5.8. Turning Off Autonegotiation of NICs and Hubs
5.9. Synchronous vs. Asynchronous Behavior in NFS
5.10. Non-NFS-Related Means of Enhancing Server Performance
6. Security and NFS
6.1. The portmapper
6.2. Server security: nfsd and mountd
6.3. Client Security
6.4. NFS and firewalls (ipchains and netfilter)
6.5. Tunneling NFS through SSH
6.6. Summary
7. Troubleshooting
7.1. Unable to See Files on a Mounted File System
7.2. File requests hang or timeout waiting for access to the file.
7.3. Unable to mount a file system
7.4. I do not have permission to access files on the mounted volume.
7.5. When I transfer really big files, NFS takes over all the CPU cycles
on the server and it screeches to a halt.
7.6. Strange error or log messages
7.7. Real permissions don't match what's in /etc/exports.
7.8. Flaky and unreliable behavior
7.9. nfsd won't start
7.10. File Corruption When Using Multiple Clients
8. Using Linux NFS with Other OSes
8.1. AIX
8.2. BSD
8.3. Tru64 Unix
8.4. HP-UX
8.5. IRIX
8.6. Solaris
8.7. SunOS
1. Preamble
1.1. Legal stuff
Copyright (c) <2002> by Tavis Barr, Nicolai Langfeldt, Seth Vidal, and Tom
McNeal. This material may be distributed only subject to the terms and
conditions set forth in the Open Publication License, v1.0 or later (the
latest version is presently available at http://www.opencontent.org/openpub/).
-----------------------------------------------------------------------------
1.2. Disclaimer
This document is provided without any guarantees, including merchantability
or fitness for a particular use. The maintainers cannot be responsible if
following instructions in this document leads to damaged equipment or data,
angry neighbors, strange habits, divorce, or any other calamity.
-----------------------------------------------------------------------------
1.3. Feedback
This will never be a finished document; we welcome feedback about how it can
be improved. As of February 2002, the Linux NFS home page is being hosted at
http://nfs.sourceforge.net. Check there for
mailing lists, bug fixes, and updates, and also to verify who currently
maintains this document.
-----------------------------------------------------------------------------
1.4. Translation
If you are able to translate this document into another language, we would be
grateful and we will also do our best to assist you. Please notify the
maintainers.
-----------------------------------------------------------------------------
1.5. Dedication
NFS on Linux was made possible by a collaborative effort of many people, but
a few stand out for special recognition. The original version was developed
by Olaf Kirch and Alan Cox. The version 3 server code was solidified by Neil
Brown, based on work from Saadia Khan, James Yarbrough, Allen Morris, H.J.
Lu, and others (including himself). The client code was written by Olaf Kirch
and updated by Trond Myklebust. The version 4 lock manager was developed by
Saadia Khan. Dave Higgen and H.J. Lu both have undertaken the thankless job
of extensive maintenance and bug fixes to get the code to actually work the
way it was supposed to. H.J. has also done extensive development of the
nfs-utils package. Of course this dedication is leaving many people out.
The original version of this document was developed by Nicolai Langfeldt. It
was heavily rewritten in 2000 by Tavis Barr and Seth Vidal to reflect
substantial changes in the workings of NFS for Linux developed between the
2.0 and 2.4 kernels. It was edited again in February 2002, when Tom McNeal
made substantial additions to the performance section. Thomas Emmel, Neil
Brown, Trond Myklebust, Erez Zadok, and Ion Badulescu also provided valuable
comments and contributions.
-----------------------------------------------------------------------------
2. Introduction
2.1. What is NFS?
The Network File System (NFS) was developed to allow machines to mount a disk
partition on a remote machine as if it were on a local hard drive. This
allows for fast, seamless sharing of files across a network.
It also gives the potential for unwanted people to access your hard drive
over the network (and thereby possibly read your email and delete all your
files as well as break into your system) if you set it up incorrectly. So
please read the Security section of this document carefully if you intend to
implement an NFS setup.
There are other systems that provide similar functionality to NFS. Samba
(http://www.samba.org) provides file services to Windows clients. The Andrew
File System from IBM (http://www.transarc.com/Product/EFS/AFS/index.html),
recently open-sourced, provides a file sharing mechanism with some additional
security and performance features. The Coda File System
(http://www.coda.cs.cmu.edu/) is still in development as of this writing but
is designed to work well with disconnected clients. Many of the features of
the Andrew and Coda file systems are slated for inclusion in the next version
of NFS (Version 4) (http://www.nfsv4.org). The advantage of NFS today is that
it is mature, standard, well understood, and supported robustly across a
variety of platforms.
-----------------------------------------------------------------------------
2.2. What is this HOWTO and what is it not?
This HOWTO is intended as a complete, step-by-step guide to setting up NFS
correctly and effectively. Setting up NFS involves two steps, namely
configuring the server and then configuring the client. Each of these steps
is dealt with in order. The document then offers some tips for people with
particular needs and hardware setups, as well as security and troubleshooting
advice.
This HOWTO is not a description of the guts and underlying structure of NFS.
For that you may wish to read Linux NFS and Automounter Administration by
Erez Zadok (Sybex, 2001). The classic NFS book, updated and still quite
useful, is Managing NFS and NIS by Hal Stern, published by O'Reilly &
Associates, Inc. A much more advanced technical description of NFS is
available in NFS Illustrated by Brent Callaghan.
This document is also not intended as a complete reference manual, and does
not contain an exhaustive list of the features of Linux NFS. For that, you
can look at the man pages for nfs(5), exports(5), mount(8), fstab(5), nfsd(8)
, lockd(8), statd(8), rquotad(8), and mountd(8).
It will also not cover PC-NFS, which is considered obsolete (users are
encouraged to use Samba to share files with Windows machines) or NFS Version
4, which is still in development.
-----------------------------------------------------------------------------
2.3. Knowledge Pre-Requisites
You should know some basic things about TCP/IP networking before reading this
HOWTO; if you are in doubt, read the Networking-Overview-HOWTO.
-----------------------------------------------------------------------------
2.4. Software Pre-Requisites: Kernel Version and nfs-utils
The difference between Version 2 NFS and version 3 NFS will be explained
later on; for now, you might simply take the suggestion that you will need
NFS Version 3 if you are installing a dedicated or high-volume file server.
NFS Version 2 should be fine for casual use.
NFS Version 2 has been around for quite some time now (at least since the 1.2
kernel series); however, you will need a kernel version of at least 2.2.18 if
you wish to do any of the following:
* Mix Linux NFS with other operating systems' NFS
* Use file locking reliably over NFS
* Use NFS Version 3.
There are also patches available for kernel versions above 2.2.14 that
provide the above functionality. Some of them can be downloaded from the
Linux NFS homepage. If your kernel version is 2.2.14-2.2.17 and you have the
source code on hand, you can tell if these patches have been added because
NFS Version 3 server support will be a configuration option. However, unless
you have some particular reason to use an older kernel, you should upgrade
because many bugs have been fixed along the way. Kernel 2.2.19 contains some
additional locking improvements over 2.2.18.
Version 3 functionality will also require the nfs-utils package of at least
version 0.1.6, and mount version 2.10m or newer. However because nfs-utils
and mount are fully backwards compatible, and because newer versions have
lots of security and bug fixes, there is no good reason not to install the
newest nfs-utils and mount packages if you are beginning an NFS setup.
All 2.4 and higher kernels have full NFS Version 3 functionality.
In all cases, if you are building your own kernel, you will need to select
NFS and NFS Version 3 support at compile time. Most (but not all) standard
distributions come with kernels that support NFS version 3.
Handling files larger than 2 GB will require a 2.4.x kernel and a 2.2.x
version of glibc.
All kernels after 2.2.18 support NFS over TCP on the client side. As of this
writing, server-side NFS over TCP only exists in a buggy form as an
experimental option in the post-2.2.18 series; patches for 2.4 and 2.5
kernels have been introduced starting with 2.4.17 and 2.5.6. The patches are
believed to be stable, though as of this writing they are relatively new and
have not seen widespread use or integration into the mainstream 2.4 kernel.
Because so many of the above functionalities were introduced in kernel
version 2.2.18, this document was written to be consistent with kernels above
this version (including 2.4.x). If you have an older kernel, this document
may not describe your NFS system correctly.
As we write this document, NFS version 4 has only recently been finalized as
a protocol, and no implementations are considered production-ready. It will
not be dealt with here.
-----------------------------------------------------------------------------
2.5. Where to get help and further information
As of November 2000, the Linux NFS homepage is at http://nfs.sourceforge.net.
Please check there for NFS related mailing lists as well as the latest version
of nfs-utils, NFS kernel patches, and other NFS related packages.
When you encounter a problem or have a question not covered in this manual,
the FAQ, or the man pages, you should send a message to the NFS mailing list
(<nfs@lists.sourceforge.net>). To best help the developers and other users
help you assess your problem, you should include:
* the version of nfs-utils you are using
* the version of the kernel and any non-stock kernel patches applied
* the distribution of Linux you are using
* the version(s) of other operating systems involved.
It is also useful to know the networking configuration connecting the hosts.
If your problem involves the inability to mount or export shares, please also
include:
* a copy of your /etc/exports file
* the output of rpcinfo -p localhost run on the server
* the output of rpcinfo -p servername run on the client
Sending all of this information with a specific question, after reading all
the documentation, is the best way to ensure a helpful response from the
list.
You may also wish to look at the man pages for nfs(5), exports(5), mount(8),
fstab(5), nfsd(8), lockd(8), statd(8), rquotad(8), and mountd(8).
-----------------------------------------------------------------------------
3. Setting Up an NFS Server
3.1. Introduction to the server setup
It is assumed that you will be setting up both a server and a client. If you
are just setting up a client to work off of somebody else's server (say in
your department), you can skip to Section 4. However, every client that is
set up requires modifications on the server to authorize that client (unless
the server setup is done in a very insecure way), so even if you are not
setting up a server you may wish to read this section to get an idea what
kinds of authorization problems to look out for.
Setting up the server will be done in two steps: Setting up the configuration
files for NFS, and then starting the NFS services.
-----------------------------------------------------------------------------
3.2. Setting up the Configuration Files
There are three main configuration files you will need to edit to set up an
NFS server: /etc/exports, /etc/hosts.allow, and /etc/hosts.deny. Strictly
speaking, you only need to edit /etc/exports to get NFS to work, but you
would be left with an extremely insecure setup. You may also need to edit
your startup scripts; see Section 3.3.3 for more on that.
-----------------------------------------------------------------------------
3.2.1. /etc/exports
This file contains a list of entries; each entry indicates a volume that is
shared and how it is shared. Check the man pages (man exports) for a complete
description of all the setup options for the file, although the description
here will probably satisfy most people's needs.
An entry in /etc/exports will typically look like this:
directory machine1(option11,option12) machine2(option21,option22)
where
directory
the directory that you want to share. It may be an entire volume though
it need not be. If you share a directory, then all directories under it
within the same file system will be shared as well.
machine1 and machine2
client machines that will have access to the directory. The machines may
be listed by their DNS address or their IP address (e.g.,
machine.company.com or 192.168.0.8). Using IP addresses is more reliable
and more secure. If you need to use DNS addresses, and they do not seem
to be resolving to the right machine, see Section 7.3.
optionxx
the option listing for each machine will describe what kind of access
that machine will have. Important options are:
* ro: The directory is shared read only; the client machine will not be
  able to write to it. This is the default.
* rw: The client machine will have read and write access to the
  directory.
* no_root_squash: By default, any file request made by user root on the
  client machine is treated as if it is made by user nobody on the
  server. (Exactly which UID the request is mapped to depends on the
  UID of user "nobody" on the server, not the client.) If
  no_root_squash is selected, then root on the client machine will have
  the same level of access to the files on the system as root on the
  server. This can have serious security implications, although it may
  be necessary if you want to perform any administrative work on the
  client machine that involves the exported directories. You should not
  specify this option without a good reason.
* no_subtree_check: If only part of a volume is exported, a routine
  called subtree checking verifies that a file that is requested from
  the client is in the appropriate part of the volume. If the entire
  volume is exported, disabling this check will speed up transfers.
* sync: By default, all but the most recent version (version 1.11) of
  the exportfs command will use async behavior, telling a client
  machine that a file write is complete - that is, has been written to
  stable storage - when NFS has finished handing the write over to the
  filesystem. This behavior may cause data corruption if the server
  reboots, and the sync option prevents this. See Section 5.9 for a
  complete discussion of sync and async behavior.
Suppose we have two client machines, slave1 and slave2, that have IP
addresses 192.168.0.1 and 192.168.0.2, respectively. We wish to share our
software binaries and home directories with these machines. A typical setup
for /etc/exports might look like this:
+---------------------------------------------------------------------------+
| /usr/local 192.168.0.1(ro) 192.168.0.2(ro) |
| /home 192.168.0.1(rw) 192.168.0.2(rw) |
| |
+---------------------------------------------------------------------------+
Here we are sharing /usr/local read-only to slave1 and slave2, because it
probably contains our software and there may not be benefits to allowing
slave1 and slave2 to write to it that outweigh security concerns. On the
other hand, home directories need to be exported read-write if users are to
save work on them.
If you have a large installation, you may find that you have a bunch of
computers all on the same local network that require access to your server.
There are a few ways of simplifying references to large numbers of machines.
First, you can give access to a range of machines at once by specifying a
network and a netmask. For example, if you wanted to allow access to all the
machines with IP addresses between 192.168.0.0 and 192.168.0.255 then you
could have the entries:
+---------------------------------------------------------------------------+
| /usr/local 192.168.0.0/255.255.255.0(ro) |
| /home 192.168.0.0/255.255.255.0(rw) |
| |
+---------------------------------------------------------------------------+
See the Networking-Overview HOWTO
(http://www.linuxdoc.org/HOWTO/Networking-Overview-HOWTO.html) for further
information about how netmasks work, and you may also wish to look at the man
pages for init and hosts.allow.
Second, you can use NIS netgroups in your entry. To specify a netgroup in
your exports file, simply prepend the name of the netgroup with an "@". See
the NIS HOWTO (http://www.linuxdoc.org/HOWTO/NIS-HOWTO.html) for details on
how netgroups work, and see the example entry below.
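For example, if a netgroup named trusted-hosts has been defined in NIS (the
name here is hypothetical), the exports entry would be:
+---------------------------------------------------------------------------+
| /usr/local @trusted-hosts(ro)                                             |
+---------------------------------------------------------------------------+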
Third, you can use wildcards such as *.foo.com or 192.168. instead of
hostnames. There were problems with wildcard implementation in the 2.2 kernel
series that were fixed in kernel 2.2.19.
However, you should keep in mind that any of these simplifications could
cause a security risk if there are machines in your netgroup or local network
that you do not trust completely.
A few cautions are in order about what cannot (or should not) be exported.
First, if a directory is exported, its parent and child directories cannot be
exported if they are in the same filesystem. However, exporting both should
not be necessary because listing the parent directory in the /etc/exports
file will cause all underlying directories within that file system to be
exported.
Second, it is a poor idea to export a FAT or VFAT (i.e., MS-DOS or Windows 95
/98) filesystem with NFS. FAT is not designed for use on a multi-user
machine, and as a result, operations that depend on permissions will not work
well. Moreover, some of the underlying filesystem design is reported to work
poorly with NFS's expectations.
Third, device or other special files may not export correctly to non-Linux
clients. See Section 8 for details on particular operating systems.
-----------------------------------------------------------------------------
3.2.2. /etc/hosts.allow and /etc/hosts.deny
These two files specify which computers on the network can use services on
your machine. Each line of the file contains a single entry listing a service
and a set of machines. When the server gets a request from a machine, it does
the following:
* It first checks hosts.allow to see if the machine matches a description
  listed in there. If it does, then the machine is allowed access.
* If the machine does not match an entry in hosts.allow, the server then
  checks hosts.deny to see if the client matches a listing in there. If it
  does then the machine is denied access.
* If the client matches no listings in either file, then it is allowed
  access.
In addition to controlling access to services handled by inetd (such as
telnet and FTP), this file can also control access to NFS by restricting
connections to the daemons that provide NFS services. Restrictions are done
on a per-service basis.
The first daemon to restrict access to is the portmapper. This daemon
essentially just tells requesting clients how to find all the NFS services on
the system. Restricting access to the portmapper is the best defense against
someone breaking into your system through NFS because completely unauthorized
clients won't know where to find the NFS daemons. However, there are two
things to watch out for. First, restricting portmapper isn't enough if the
intruder already knows for some reason how to find those daemons. And second,
if you are running NIS, restricting portmapper will also restrict requests to
NIS. That should usually be harmless since you usually want to restrict NFS
and NIS in a similar way, but just be cautioned. (Running NIS is generally a
good idea if you are running NFS, because the client machines need a way of
knowing who owns what files on the exported volumes. Of course there are
other ways of doing this such as syncing password files. See the NIS HOWTO
(http://www.linuxdoc.org/HOWTO/NIS-HOWTO.html) for information on setting
up NIS.)
In general it is a good idea with NFS (as with most internet services) to
explicitly deny access to IP addresses that you don't need to allow access
to.
The first step in doing this is to add the following entry to /etc/hosts.deny:
+---------------------------------------------------------------------------+
| portmap:ALL |
| |
+---------------------------------------------------------------------------+
Starting with nfs-utils 0.2.0, you can be a bit more careful by controlling
access to individual daemons. It's a good precaution since an intruder will
often be able to weasel around the portmapper. If you have a newer version of
nfs-utils, add entries for each of the NFS daemons (see the next section to
find out what these daemons are; for now just put entries for them in
hosts.deny):
+---------------------------------------------------------------------------+
| lockd:ALL |
| mountd:ALL |
| rquotad:ALL |
| statd:ALL |
| |
+---------------------------------------------------------------------------+
Even if you have an older version of nfs-utils, adding these entries is at
worst harmless (since they will just be ignored) and at best will save you
some trouble when you upgrade. Some sys admins choose to put the entry ALL:
ALL in the file /etc/hosts.deny, which causes any service that looks at these
files to deny access to all hosts unless it is explicitly allowed. While this
is more secure behavior, it may also get you in trouble when you are
installing new services, you forget you put it there, and you can't figure
out for the life of you why they won't work.
Next, we need to add an entry to hosts.allow to give any hosts access that we
want to have access. (If we just leave the above lines in hosts.deny then
nobody will have access to NFS.) Entries in hosts.allow follow the format
+---------------------------------------------------------------------------+
| service: host [or network/netmask] , host [or network/netmask] |
| |
+---------------------------------------------------------------------------+
Here, host is the IP address of a potential client; it may be possible in some
versions to use the DNS name of the host, but it is strongly discouraged.
Suppose we have the setup above and we just want to allow access to
slave1.foo.com and slave2.foo.com, and suppose that the IP addresses of these
machines are 192.168.0.1 and 192.168.0.2, respectively. We could add the
following entry to /etc/hosts.allow:
+---------------------------------------------------------------------------+
| portmap: 192.168.0.1 , 192.168.0.2 |
| |
+---------------------------------------------------------------------------+
For recent nfs-utils versions, we would also add the following (again, these
entries are harmless even if they are not supported):
+---------------------------------------------------------------------------+
| lockd: 192.168.0.1 , 192.168.0.2 |
| rquotad: 192.168.0.1 , 192.168.0.2 |
| mountd: 192.168.0.1 , 192.168.0.2 |
| statd: 192.168.0.1 , 192.168.0.2 |
| |
+---------------------------------------------------------------------------+
If you intend to run NFS on a large number of machines in a local network, /
etc/hosts.allow also allows for network/netmask style entries in the same
manner as /etc/exports above.
-----------------------------------------------------------------------------
3.3. Getting the services started
3.3.1. Pre-requisites
The NFS server should now be configured and we can start it running. First,
you will need to have the appropriate packages installed. This consists
mainly of a new enough kernel and a new enough version of the nfs-utils
package. See Section 2.4 if you are in doubt.
Next, before you can start NFS, you will need to have TCP/IP networking
functioning correctly on your machine. If you can use telnet, FTP, and so on,
then chances are your TCP networking is fine.
That said, with most recent Linux distributions you may be able to get NFS up
and running simply by rebooting your machine, and the startup scripts should
detect that you have set up your /etc/exports file and will start up NFS
correctly. If you try this, see Section 3.4 Verifying that NFS is running. If
this does not work, or if you are not in a position to reboot your machine,
then the following section will tell you which daemons need to be started in
order to run NFS services. If for some reason nfsd was already running when
you edited your configuration files above, you will have to flush your
configuration; see Section 3.5 for details.
-----------------------------------------------------------------------------
3.3.2. Starting the Portmapper
NFS depends on the portmapper daemon, either called portmap or rpc.portmap.
It will need to be started first. It should be located in /sbin but is
sometimes in /usr/sbin. Most recent Linux distributions start this daemon in
the boot scripts, but it is worth making sure that it is running before you
begin working with NFS (just type ps aux | grep portmap).
-----------------------------------------------------------------------------
3.3.3. The Daemons
NFS serving is taken care of by five daemons: rpc.nfsd, which does most of
the work; rpc.lockd and rpc.statd, which handle file locking; rpc.mountd,
which handles the initial mount requests, and rpc.rquotad, which handles user
file quotas on exported volumes. Starting with 2.2.18, lockd is called by
nfsd upon demand, so you do not need to worry about starting it yourself.
statd will need to be started separately. Most recent Linux distributions
will have startup scripts for these daemons.
The daemons are all part of the nfs-utils package, and may be either in the /
sbin directory or the /usr/sbin directory.
If your distribution does not include them in the startup scripts, then
you should add them, configured to start in the following order:
rpc.portmap
rpc.mountd, rpc.nfsd
rpc.statd, rpc.lockd (if necessary), and rpc.rquotad
The nfs-utils package has sample startup scripts for RedHat and Debian. If
you are using a different distribution, in general you can just copy the
RedHat script, but you will probably have to take out the line that says:
+---------------------------------------------------------------------------+
| . ../init.d/functions |
| |
+---------------------------------------------------------------------------+
to avoid getting error messages.
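If you need to start the daemons by hand rather than from a boot script, the
order above translates roughly into the following (the thread count given to
rpc.nfsd is an illustrative choice; on 2.2.18+ kernels lockd is started by
nfsd on demand, so it is omitted):
# portmap
# rpc.mountd
# rpc.nfsd 8
# rpc.statd
# rpc.rquotad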
-----------------------------------------------------------------------------
3.4. Verifying that NFS is running
To do this, query the portmapper with the command rpcinfo -p to find out what
services it is providing. You should get something like this:
+---------------------------------------------------------------------------+
| program vers proto port |
| 100000 2 tcp 111 portmapper |
| 100000 2 udp 111 portmapper |
| 100011 1 udp 749 rquotad |
| 100011 2 udp 749 rquotad |
| 100005 1 udp 759 mountd |
| 100005 1 tcp 761 mountd |
| 100005 2 udp 764 mountd |
| 100005 2 tcp 766 mountd |
| 100005 3 udp 769 mountd |
| 100005 3 tcp 771 mountd |
| 100003 2 udp 2049 nfs |
| 100003 3 udp 2049 nfs |
| 300019 1 tcp 830 amd |
| 300019 1 udp 831 amd |
| 100024 1 udp 944 status |
| 100024 1 tcp 946 status |
| 100021 1 udp 1042 nlockmgr |
| 100021 3 udp 1042 nlockmgr |
| 100021 4 udp 1042 nlockmgr |
| 100021 1 tcp 1629 nlockmgr |
| 100021 3 tcp 1629 nlockmgr |
| 100021 4 tcp 1629 nlockmgr |
| |
+---------------------------------------------------------------------------+
This says that we have NFS versions 2 and 3, rpc.statd version 1, network
lock manager (the service name for rpc.lockd) versions 1, 3, and 4. There are
also different service listings depending on whether NFS is travelling over
TCP or UDP. Linux systems use UDP by default unless TCP is explicitly
requested; however other OSes such as Solaris default to TCP.
If you do not at least see a line that says portmapper, a line that says nfs,
and a line that says mountd then you will need to backtrack and try again to
start up the daemons (see Section 7, Troubleshooting, if this still doesn't
work).
If you do see these services listed, then you should be ready to set up NFS
clients to access files from your server.
-----------------------------------------------------------------------------
3.5. Making changes to /etc/exports later on
If you come back and change your /etc/exports file, the changes you make may
not take effect immediately. You should run the command exportfs -ra to force
nfsd to re-read the /etc/exports file. If you can't find the exportfs
command, then you can kill nfsd with the -HUP flag (see the man pages for
kill for details).
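For example (the fallback assumes the daemon is listed as nfsd; the process
name may vary on your system):
# exportfs -ra               # preferred: make nfsd re-read /etc/exports
# kill -HUP `pidof nfsd`     # fallback if exportfs is not available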
If that still doesn't work, don't forget to check hosts.allow to make sure
you haven't forgotten to list any new client machines there. Also check the
host listings on any firewalls you may have set up (see Section 7 and Section
6 for more details on firewalls and NFS).
-----------------------------------------------------------------------------
4. Setting up an NFS Client
4.1. Mounting remote directories
Before beginning, you should double-check to make sure your mount program is
new enough (version 2.10m if you want to use Version 3 NFS), and that the
client machine supports NFS mounting, though most standard distributions do.
If you are using a 2.2 or later kernel with the /proc filesystem you can
check the latter by reading the file /proc/filesystems and making sure there
is a line containing nfs. If not, typing insmod nfs may make it magically
appear if NFS has been compiled as a module; otherwise, you will need to
build (or download) a kernel that has NFS support built in. In general,
kernels that do not have NFS compiled in will give a very specific error when
the mount command below is run.
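For instance, the check and the module fallback described above might look
like this (the insmod step only applies if NFS was built as a module):
# grep nfs /proc/filesystems    # look for a line containing nfs
# insmod nfs                    # load the module if the line is absent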
To begin using machine as an NFS client, you will need the portmapper running
on that machine, and to use NFS file locking, you will also need rpc.statd
and rpc.lockd running on both the client and the server. Most recent
distributions start those services by default at boot time; if yours doesn't,
see Section 3.2 for information on how to start them up.
With portmap, lockd, and statd running, you should now be able to mount the
remote directory from your server just the way you mount a local hard drive,
with the mount command. Continuing our example from the previous section,
suppose our server above is called master.foo.com, and we want to mount the /
home directory on slave1.foo.com. Then, all we have to do, from the root
prompt on slave1.foo.com, is type:
+---------------------------------------------------------------------------+
| # mount master.foo.com:/home /mnt/home |
| |
+---------------------------------------------------------------------------+
and the directory /home on master will appear as the directory /mnt/home on
slave1. (Note that this assumes we have created the directory /mnt/home as an
empty mount point beforehand.)
If this does not work, see the Troubleshooting section (Section 7).
You can get rid of the file system by typing
+---------------------------------------------------------------------------+
| # umount /mnt/home |
| |
+---------------------------------------------------------------------------+
just like you would for a local file system.
-----------------------------------------------------------------------------
4.2. Getting NFS File Systems to Be Mounted at Boot Time
NFS file systems can be added to your /etc/fstab file the same way local file
systems can, so that they mount when your system starts up. The only
difference is that the file system type will be set to nfs and the dump and
fsck order (the last two entries) will have to be set to zero. So for our
example above, the entry in /etc/fstab would look like:
# device mountpoint fs-type options dump fsckorder
...
master.foo.com:/home  /mnt/home    nfs   rw            0    0
...
See the man pages for fstab if you are unfamiliar with the syntax of this
file. If you are using an automounter such as amd or autofs, the options in
the corresponding fields of your mount listings should look very similar if
not identical.
At this point you should have NFS working, though a few tweaks may still be
necessary to get it to work well. You should also read Section 6 to be sure
your setup is reasonably secure.
-----------------------------------------------------------------------------
4.3. Mount options
4.3.1. Soft vs. Hard Mounting
There are some options you should consider adding at once. They govern the
way the NFS client handles a server crash or network outage. One of the cool
things about NFS is that it can handle this gracefully, if you set up the
clients right. There are two distinct failure modes:
soft
If a file request fails, the NFS client will report an error to the
process on the client machine requesting the file access. Some programs
can handle this with composure, most won't. We do not recommend using
this setting; it is a recipe for corrupted files and lost data. You
should especially not use this for mail disks --- if you value your mail,
that is.
hard
The program accessing a file on a NFS mounted file system will hang when
the server crashes. The process cannot be interrupted or killed (except
by a "sure kill") unless you also specify intr. When the NFS server is
back online the program will continue undisturbed from where it was. We
recommend using hard,intr on all NFS mounted file systems.
Picking up from the previous example, the fstab entry would now look like:
# device              mountpoint   fs-type    options       dump fsckorder
...
master.foo.com:/home /mnt/home nfs rw,hard,intr 0 0
...
-----------------------------------------------------------------------------
4.3.2. Setting Block Size to Optimize Transfer Speeds
The rsize and wsize mount options specify the size of the chunks of data that
the client and server pass back and forth to each other.
The defaults may be too big or too small; there is no size that works well on
all or most setups. On the one hand, some combinations of Linux kernels and
network cards (largely on older machines) cannot handle blocks that large. On
the other hand, if they can handle larger blocks, a bigger size might be
faster.
Getting the block size right is an important factor in performance and is a
must if you are planning to use the NFS server in a production environment.
See Section 5 for details.
-----------------------------------------------------------------------------
5. Optimizing NFS Performance
Careful analysis of your environment, both from the client and from the
server point of view, is the first step necessary for optimal NFS
performance. The first sections will address issues that are generally
important to the client. Later (Section 5.3 and beyond), server side issues
will be discussed. In both cases, these issues will not be limited
exclusively to one side or the other, but it is useful to separate the two in
order to get a clearer picture of cause and effect.
Aside from the general network configuration - appropriate network capacity,
faster NICs, full duplex settings in order to reduce collisions, agreement in
network speed among the switches and hubs, etc. - one of the most important
client optimization settings are the NFS data transfer buffer sizes,
specified by the mount command options rsize and wsize.
-----------------------------------------------------------------------------
5.1. Setting Block Size to Optimize Transfer Speeds
The mount command options rsize and wsize specify the size of the chunks of
data that the client and server pass back and forth to each other. If no
rsize and wsize options are specified, the default varies by which version of
NFS we are using. The most common default is 4K (4096 bytes), although for
TCP-based mounts in 2.2 kernels, and for all mounts beginning with 2.4
kernels, the server specifies the default block size.
The theoretical limit for the NFS V2 protocol is 8K. For the V3 protocol, the
limit is specific to the server. On the Linux server, the maximum block size
is defined by the value of the kernel constant NFSSVC_MAXBLKSIZE, found in
the Linux kernel source file ./include/linux/nfsd/const.h. The current
maximum block size for the kernel, as of 2.4.17, is 8K (8192 bytes), but the
patch set implementing NFS over TCP/IP transport in the 2.4 series, as of
this writing, uses a value of 32K (defined in the patch as 32*1024) for the
maximum block size.
All 2.4 clients currently support up to 32K block transfer sizes, allowing
the standard 32K block transfers across NFS mounts from other servers, such
as Solaris, without client modification.
The defaults may be too big or too small, depending on the specific
combination of hardware and kernels. On the one hand, some combinations of
Linux kernels and network cards (largely on older machines) cannot handle
blocks that large. On the other hand, if they can handle larger blocks, a
bigger size might be faster.
You will want to experiment and find an rsize and wsize that works and is as
fast as possible. You can test the speed of your options with some simple
commands, if your network environment is not heavily used. Note that your
results may vary widely unless you resort to using more complex benchmarks,
such as Bonnie, Bonnie++, or IOzone.
The first of these commands transfers 16384 blocks of 16k each from the
special file /dev/zero (which if you read it just spits out zeros really
fast) to the mounted partition. We will time it to see how long it takes. So,
from the client machine, type:
# time dd if=/dev/zero of=/mnt/home/testfile bs=16k count=16384
This creates a 256Mb file of zeroed bytes. In general, you should create a
file that's at least twice as large as the system RAM on the server, but make
sure you have enough disk space! Then read back the file into the great black
hole on the client machine (/dev/null) by typing the following:
# time dd if=/mnt/home/testfile of=/dev/null bs=16k
Repeat this a few times and average how long it takes. Be sure to unmount and
remount the filesystem each time (both on the client and, if you are zealous,
locally on the server as well), which should clear out any caches.
Then unmount, and mount again with a larger and smaller block size. They
should be multiples of 1024, and not larger than the maximum block size
allowed by your system. Note that NFS Version 2 is limited to a maximum of
8K, regardless of the maximum block size defined by NFSSVC_MAXBLKSIZE;
Version 3 will support up to 64K, if permitted. The block size should be a
power of two since most of the parameters that would constrain it (such as
file system block sizes and network packet size) are also powers of two.
However, some users have reported better successes with block sizes that are
not powers of two but are still multiples of the file system block size and
the network packet size.
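A sketch of one iteration of this experiment, reusing the example server from
earlier sections with an assumed 8K block size:
# umount /mnt/home
# mount -o rsize=8192,wsize=8192 master.foo.com:/home /mnt/home
# time dd if=/dev/zero of=/mnt/home/testfile bs=16k count=16384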
Directly after mounting with a larger size, cd into the mounted file system
and do things like ls, explore the filesystem a bit to make sure everything
is as it should be. If the rsize/wsize is too large the symptoms are very odd
and not 100% obvious. A typical symptom is incomplete file lists when doing
ls, and no error messages, or reading files failing mysteriously with no
error messages. After establishing that the given rsize/wsize works, you can
do the speed tests again. Different server platforms are likely to have
different optimal sizes.
Remember to edit /etc/fstab to reflect the rsize/wsize you found to be the
most desirable.
If your results seem inconsistent, or doubtful, you may need to analyze your
network more extensively while varying the rsize and wsize values. In that
case, here are several pointers to benchmarks that may prove useful:
* Bonnie: http://www.textuality.com/bonnie/
* Bonnie++: http://www.coker.com.au/bonnie++/
* IOzone file system benchmark: http://www.iozone.org/
* The official NFS benchmark, SPECsfs97: http://www.spec.org/osg/sfs97/
The easiest benchmark with the widest coverage, including an extensive spread
of file sizes and of IO types - reads & writes, rereads & rewrites, random
access, etc. - seems to be IOzone. A recommended invocation of IOzone (for
which you must have root privileges) includes unmounting and remounting the
directory under test, in order to clear out the caches between tests, and
including the file close time in the measurements. Assuming you've already
exported /tmp to everyone from the server foo, and that you've installed
IOzone in the local directory, this should work:
# echo "foo:/tmp /mnt/foo nfs rw,hard,intr,rsize=8192,wsize=8192 0 0"
>> /etc/fstab
# mkdir /mnt/foo
# mount /mnt/foo
# ./iozone -a -R -c -U /mnt/foo -f /mnt/foo/testfile > logfile
The benchmark should take 2-3 hours at most, but of course you will need to
run it for each value of rsize and wsize that is of interest. The web site
gives full documentation of the parameters, but the specific options used
above are:
* -a Full automatic mode, which tests file sizes of 64K to 512M, using
     record sizes of 4K to 16M
* -R Generate report in Excel spreadsheet form (the "surface plot" option
     for graphs is best)
* -c Include the file close time in the tests, which will pick up the NFS
     version 3 commit time
* -U Use the given mount point to unmount and remount between tests; it
     clears out caches
* -f When using unmount, you have to locate the test file in the mounted
     file system
-----------------------------------------------------------------------------
5.2. Packet Size and Network Drivers
While many Linux network card drivers are excellent, some are quite shoddy,
including a few drivers for some fairly standard cards. It is worth
experimenting with your network card directly to find out how it can best
handle traffic.
Try pinging back and forth between the two machines with large packets using
the -f and -s options with ping (see ping(8) for more details) and see if a
lot of packets get dropped, or if they take a long time for a reply. If so,
you may have a problem with the performance of your network card.
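As a sketch (the host name is illustrative, and -f requires root), a flood
ping with near-MTU-sized payloads would be:
    # ping -c 1000 -f -s 1472 server
With a 1500-byte MTU, a 1472-byte payload plus the 8-byte ICMP header and
20-byte IP header fills the frame exactly; watch the loss percentage in the
summary at the end.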
For a more extensive analysis of NFS behavior in particular, use the nfsstat
command to look at nfs transactions, client and server statistics, network
statistics, and so forth. The "-o net" option will show you the number of
dropped packets in relation to the total number of transactions. In UDP
transactions, the most important statistic is the number of retransmissions,
due to dropped packets, socket buffer overflows, general server congestion,
timeouts, etc. This will have a tremendously important effect on NFS
performance, and should be carefully monitored. Note that nfsstat does not
yet implement the -z option, which would zero out all counters, so you must
look at the current nfsstat counter values prior to running the benchmarks.
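For instance, a minimal before-and-after sketch would be:
    # nfsstat -o net        # network-layer statistics, including retransmissions
    # nfsstat -c            # client-side RPC and NFS call counts
Record these counters before the benchmark, run it, record them again, and
subtract by hand, since -z is unavailable.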
To correct network problems, you may wish to reconfigure the packet size that
your network card uses. Very often there is a constraint somewhere else in
the network (such as a router) that causes a smaller maximum packet size
between two machines than what the network cards on the machines are actually
capable of. TCP should autodiscover the appropriate packet size for a
network, but UDP will simply stay at a default value. So determining the
appropriate packet size is especially important if you are using NFS over
UDP.
You can test for the network packet size using the tracepath command: From
the client machine, just type tracepath server 2049 and the path MTU should
be reported at the bottom. You can then set the MTU on your network card
equal to the path MTU, by using the MTU option to ifconfig, and see if fewer
packets get dropped. See the ifconfig man pages for details on how to reset
the MTU.
In addition, netstat -s will give the statistics collected for traffic across
all supported protocols. You may also look at /proc/net/snmp for information
about current network behavior; see the next section for more details.
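Putting these together, a quick check might look like this sketch (the
interface name and MTU value are illustrative):
    # tracepath server 2049          # path MTU is reported at the bottom
    # ifconfig eth0 mtu 1400         # match the card's MTU to the path MTU
    # netstat -s | less              # per-protocol statistics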
-----------------------------------------------------------------------------
5.3. Overflow of Fragmented Packets
Using an rsize or wsize larger than your network's MTU (often set to 1500, in
many networks) will cause IP packet fragmentation when using NFS over UDP. IP
packet fragmentation and reassembly require a significant amount of CPU
resource at both ends of a network connection. In addition, packet
fragmentation also exposes your network traffic to greater unreliability,
since a complete RPC request must be retransmitted if a UDP packet fragment
is dropped for any reason. Any increase of RPC retransmissions, along with
the possibility of increased timeouts, are the single worst impediment to
performance for NFS over UDP.
Packets may be dropped for many reasons. If your network topology is
complex, fragment routes may differ, and may not all arrive at the server for
reassembly. NFS server capacity may also be an issue, since the kernel has a
limit of how many fragments it can buffer before it starts throwing away
packets. With kernels that support the /proc filesystem, you can monitor the
files /proc/sys/net/ipv4/ipfrag_high_thresh and /proc/sys/net/ipv4/
ipfrag_low_thresh. Once the number of unprocessed, fragmented packets reaches
the number specified by ipfrag_high_thresh (in bytes), the kernel will simply
start throwing away fragmented packets until the number of incomplete packets
reaches the number specified by ipfrag_low_thresh.
Another counter to monitor is IP: ReasmFails in the file /proc/net/snmp; this
is the number of fragment reassembly failures. If it goes up too quickly
during heavy file activity, you may have a problem.
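For example (the new threshold value is illustrative only), you can inspect
and raise the fragment buffers like this:
    # cat /proc/sys/net/ipv4/ipfrag_high_thresh
    # cat /proc/sys/net/ipv4/ipfrag_low_thresh
    # echo 524288 > /proc/sys/net/ipv4/ipfrag_high_thresh
    # grep Ip: /proc/net/snmp        # ReasmFails is one of the Ip: counters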
-----------------------------------------------------------------------------
5.4. NFS over TCP
A new feature, available for both 2.4 and 2.5 kernels but not yet integrated
into the mainstream kernel at the time of this writing, is NFS over TCP.
Using TCP has a distinct advantage and a distinct disadvantage over UDP. The
advantage is that it works far better than UDP on lossy networks. When using
TCP, a single dropped packet can be retransmitted, without the retransmission
of the entire RPC request, resulting in better performance on lossy networks.
In addition, TCP will handle network speed differences better than UDP, due
to the underlying flow control at the network level.
The disadvantage of using TCP is that it is not a stateless protocol like
UDP. If your server crashes in the middle of a packet transmission, the
client will hang and any shares will need to be unmounted and remounted.
The overhead incurred by the TCP protocol will result in somewhat slower
performance than UDP under ideal network conditions, but the cost is not
severe, and is often not noticeable without careful measurement. If you are
using gigabit ethernet from end to end, you might also investigate the usage
of jumbo frames, since the high speed network may allow the larger frame
sizes without encountering increased collision rates, particularly if you
have set the network to full duplex.
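If, and only if, every card and switch on the path supports jumbo frames,
the MTU can be raised along the whole path; a sketch (the interface name is
illustrative):
    # ifconfig eth0 mtu 9000
A mixed path where one device cannot handle 9000-byte frames will fragment
or drop them, so verify support end to end before leaving this in place.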
-----------------------------------------------------------------------------
5.5. Timeout and Retransmission Values
Two mount command options, timeo and retrans, control the behavior of UDP
requests when encountering client timeouts due to dropped packets, network
congestion, and so forth. The -o timeo option allows designation of the
length of time, in tenths of seconds, that the client will wait until it
decides it will not get a reply from the server, and must try to send the
request again. The default value is 7 tenths of a second. The -o retrans
option allows designation of the number of timeouts allowed before the client
gives up, and displays the Server not responding message. The default value
is 3 attempts. Once the client displays this message, it will continue to try
to send the request, but only once before displaying the error message if
another timeout occurs. When the client reestablishes contact, it will fall
back to using the correct retrans value, and will display the Server OK
message.
If you are already encountering excessive retransmissions (see the output of
the nfsstat command), or want to increase the block transfer size without
encountering timeouts and retransmissions, you may want to adjust these
values. The specific adjustment will depend upon your environment, and in
most cases, the current defaults are appropriate.
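For example, a mount that doubles the default timeout and allows two extra
retries might look like this sketch (server, paths, and values are
illustrative):
    # mount -o rw,hard,intr,timeo=14,retrans=5 server:/home /mnt/home
The same options can, of course, go in the options column of /etc/fstab.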
-----------------------------------------------------------------------------
5.6. Number of Instances of the NFSD Server Daemon
Most startup scripts, Linux and otherwise, start 8 instances of nfsd. In the
early days of NFS, Sun decided on this number as a rule of thumb, and
everyone else copied. There are no good measures of how many instances are
optimal, but a more heavily-trafficked server may require more. You should
use at the very least one daemon per processor, but four to eight per
processor may be a better rule of thumb. If you are using a 2.4 or higher
kernel and you want to see how heavily each nfsd thread is being used, you
can look at the file /proc/net/rpc/nfsd. The last ten numbers on the th line
in that file indicate the number of seconds that the thread usage was at that
percentage of the maximum allowable. If you have a large number in the top
three deciles, you may wish to increase the number of nfsd instances. This is
done upon starting nfsd using the number of instances as the command line
option, and is specified in the NFS startup script (/etc/rc.d/init.d/nfs on
Red Hat) as RPCNFSDCOUNT. See the nfsd(8) man page for more information.
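A sketch of that procedure on a Red Hat-style system (the count of 16 is
purely illustrative):
    # grep ^th /proc/net/rpc/nfsd          # inspect the thread-usage deciles
    # vi /etc/rc.d/init.d/nfs              # raise RPCNFSDCOUNT, e.g. to 16
    # /etc/rc.d/init.d/nfs restart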
-----------------------------------------------------------------------------
5.7. Memory Limits on the Input Queue
On 2.2 and 2.4 kernels, the socket input queue, where requests sit while they
are currently being processed, has a small default size limit (rmem_default)
of 64k. This queue is important for clients with heavy read loads, and
servers with heavy write loads. As an example, if you are running 8 instances
of nfsd on the server, each will only have 8k to store write requests while
it processes them. In addition, the socket output queue - important for
clients with heavy write loads and servers with heavy read loads - also has a
small default size (wmem_default).
Several published runs of the NFS benchmark SPECsfs
(http://www.spec.org/osg/sfs97/) specify usage of a much higher value for
both the read and write value sets, [rw]mem_default and [rw]mem_max. You
might consider increasing
these values to at least 256k. The read and write limits are set in the proc
file system using (for example) the files /proc/sys/net/core/rmem_default and
/proc/sys/net/core/rmem_max. The rmem_default value can be increased in three
steps; the following method is a bit of a hack but should work and should not
cause any problems:
  * Increase the size listed in the file:
    # echo 262144 > /proc/sys/net/core/rmem_default
    # echo 262144 > /proc/sys/net/core/rmem_max
  * Restart NFS. For example, on Red Hat systems,
    # /etc/rc.d/init.d/nfs restart
  * You might return the size limits to their normal size in case other
    kernel systems depend on it:
    # echo 65536 > /proc/sys/net/core/rmem_default
    # echo 65536 > /proc/sys/net/core/rmem_max
This last step may be necessary because machines have been reported to crash
if these values are left changed for long periods of time.
-----------------------------------------------------------------------------
5.8. Turning Off Autonegotiation of NICs and Hubs
If network cards auto-negotiate badly with hubs and switches, and ports run
at different speeds, or with different duplex configurations, performance
will be severely impacted due to excessive collisions, dropped packets, etc.
If you see excessive numbers of dropped packets in the nfsstat output, or
poor network performance in general, try playing around with the network
speed and duplex settings. If possible, concentrate on establishing a
100BaseT full duplex subnet; the virtual elimination of collisions in full
duplex will remove the most severe performance inhibitor for NFS over UDP. Be
careful when turning off autonegotiation on a card: The hub or switch that
the card is attached to will then resort to other mechanisms (such as
parallel detection) to determine the duplex settings, and some cards default
to half duplex because it is more likely to be supported by an old hub. The
best solution, if the driver supports it, is to force the card to negotiate
100BaseT full duplex.
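If your driver supports the MII interface, mii-tool can show and force the
negotiation; a sketch (the interface name is illustrative):
    # mii-tool -v eth0                      # show current negotiation status
    # mii-tool -F 100baseTx-FD eth0         # force 100BaseT full duplex
Not every driver implements MII; if mii-tool reports that the operation is
not supported, look at your driver's module options instead.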
-----------------------------------------------------------------------------
5.9. Synchronous vs. Asynchronous Behavior in NFS
The default export behavior for both NFS Version 2 and Version 3 protocols,
used by exportfs in nfs-utils versions prior to Version 1.11 (the latter is
in the CVS tree, but not yet released in a package, as of January, 2002) is
"asynchronous". This default permits the server to reply to client requests
as soon as it has processed the request and handed it off to the local file
system, without waiting for the data to be written to stable storage. This is
indicated by the async option denoted in the server's export list. It yields
better performance at the cost of possible data corruption if the server
reboots while still holding unwritten data and/or metadata in its caches.
This possible data corruption is not detectable at the time of occurrence,
since the async option instructs the server to lie to the client, telling the
client that all data has indeed been written to the stable storage,
regardless of the protocol used.
In order to conform with "synchronous" behavior, used as the default for most
proprietary systems supporting NFS (Solaris, HP-UX, RS/6000, etc.), and now
used as the default in the latest version of exportfs, the Linux Server's
file system must be exported with the sync option. Note that specifying
synchronous exports will result in no option being seen in the server's
export list:
  * Export a couple of file systems to everyone, using slightly different
    options:
    # /usr/sbin/exportfs -o rw,sync *:/usr/local
    # /usr/sbin/exportfs -o rw *:/tmp
  * Now we can see what the exported file system parameters look like:
    # /usr/sbin/exportfs -v
    /usr/local *(rw)
    /tmp *(rw,async)
If your kernel is compiled with the /proc filesystem, then the file /proc/fs/
nfs/exports will also show the full list of export options.
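The same intent can be written directly into /etc/exports; a minimal sketch
(the path and host pattern are illustrative):
    /usr/local *(rw,sync)
followed by exportfs -ra to make the change take effect.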
When synchronous behavior is specified, the server will not complete (that
is, reply to the client) an NFS version 2 protocol request until the local
file system has written all data/metadata to the disk. The server will
complete a synchronous NFS version 3 request without this delay, and will
return the status of the data in order to inform the client as to what data
should be maintained in its caches, and what data is safe to discard. There
are 3 possible status values, defined in an enumerated type, nfs3_stable_how,
in
include/linux/nfs.h. The values, along with the subsequent actions taken due
to these results, are as follows:
  * NFS_UNSTABLE - Data/Metadata was not committed to stable storage on the
    server, and must be cached on the client until a subsequent client commit
    request assures that the server does send data to stable storage.
  * NFS_DATA_SYNC - Metadata was not sent to stable storage, and must be
    cached on the client. A subsequent commit is necessary, as is required
    above.
  * NFS_FILE_SYNC - No data/metadata need be cached, and a subsequent commit
    need not be sent for the range covered by this request.
In addition to the above definition of synchronous behavior, the client may
explicitly insist on total synchronous behavior, regardless of the protocol,
by opening all files with the O_SYNC option. In this case, all replies to
client requests will wait until the data has hit the server's disk,
regardless of the protocol used (meaning that, in NFS version 3, all requests
will be NFS_FILE_SYNC requests, and will require that the Server returns this
status). In that case, the performance of NFS Version 2 and NFS Version 3
will be virtually identical.
If, however, the old default async behavior is used, the O_SYNC option has no
effect at all in either version of NFS, since the server will reply to the
client without waiting for the write to complete. In that case the
performance differences between versions will also disappear.
Finally, note that, for NFS version 3 protocol requests, a subsequent commit
request from the NFS client at file close time, or at fsync() time, will
force the server to write any previously unwritten data/metadata to the disk,
and the server will not reply to the client until this has been completed, as
long as sync behavior is followed. If async is used, the commit is
essentially a no-op, since the server once again lies to the client, telling
the client that the data has been sent to stable storage. This again exposes
the client and server to data corruption, since cached data may be discarded
on the client due to its belief that the server now has the data maintained
in stable storage.
-----------------------------------------------------------------------------
5.10. Non-NFS-Related Means of Enhancing Server Performance
In general, server performance and server disk access speed will have an
important effect on NFS performance. Offering general guidelines for setting
up a well-functioning file server is outside the scope of this document, but
a few hints may be worth mentioning:
  * If you have access to RAID arrays, use RAID 1/0 for both write speed and
    redundancy; RAID 5 gives you good read speeds but lousy write speeds.
  * A journalling filesystem will drastically reduce your reboot time in the
    event of a system crash. Currently, ext3
    (ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/) will work correctly with
    NFS version 3. In addition, Reiserfs version 3.6 will work with NFS
    version 3 on 2.4.7 or later kernels (patches are available for previous
    kernels). Earlier versions of Reiserfs did not include room for
    generation numbers in the inode, exposing the possibility of undetected
    data corruption during a server reboot.
  * Additionally, journalled file systems can be configured to maximize
    performance by taking advantage of the fact that journal updates are all
    that is necessary for data protection. One example is using ext3 with
    data=journal so that all updates go first to the journal, and later to
    the main file system (see the sketch after this list). Once the journal
    has been updated, the NFS server can safely issue the reply to the
    clients, and the main file system update can occur at the server's
    leisure.
    The journal in a journalling file system may also reside on a separate
    device such as a flash memory card so that journal updates normally
    require no seeking. With only rotational delay imposing a cost, this
    gives reasonably good synchronous IO performance. Note that ext3
    currently supports journal relocation, and ReiserFS will (officially)
    support it soon. The Reiserfs tool package found at
    ftp://ftp.namesys.com/pub/reiserfsprogs/reiserfsprogs-3.x.0k.tar.gz
    contains the reiserfstune tool, which will allow journal relocation. It
    does, however, require a kernel patch which has not yet been officially
    released as of January, 2002.
  * Using an automounter (such as autofs or amd) may prevent hangs if you
    cross-mount files on your machines (whether on purpose or by oversight)
    and one of those machines goes down. See the Automount Mini-HOWTO
    (http://www.linuxdoc.org/HOWTO/mini/Automount.html) for details.
  * Some manufacturers (Network Appliance, Hewlett Packard, and others)
    provide NFS accelerators in the form of Non-Volatile RAM. NVRAM will
    boost access speed to stable storage up to the equivalent of async
    access.
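Here is the sketch promised above: mounting an ext3 file system with full
data journalling before exporting it (the device and mount point are
illustrative):
    # mount -t ext3 -o data=journal /dev/sdb1 /export
or, equivalently, in /etc/fstab:
    /dev/sdb1 /export ext3 data=journal 1 2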
-----------------------------------------------------------------------------
6. Security and NFS
This list of security tips and explanations will not make your site
completely secure. NOTHING will make your site completely secure. Reading
this section may help you get an idea of the security problems with NFS. This
is not a comprehensive guide and it will always be undergoing changes. If you
have any tips or hints to give us please send them to the HOWTO maintainer.
If you are on a network with no access to the outside world (not even a
modem) and you trust all the internal machines and all your users, then this
section will be of no use to you. However, it's our belief that there are
relatively few networks in this situation, so we would suggest that anyone
setting up NFS read this section thoroughly.
With NFS, there are two steps required for a client to gain access to a file
contained in a remote directory on the server. The first step is mount
access. Mount access is achieved by the client machine attempting to attach
to the server. The security for this is provided by the /etc/exports file.
This file lists the names or IP addresses for machines that are allowed to
access a share point. If the client's IP address matches one of the entries
in the access list then it will be allowed to mount. This is not terribly
secure. If someone is capable of spoofing or taking over a trusted address
then they can access your mount points. To give a real-world example of this
type of "authentication": This is equivalent to someone introducing
themselves to you and you believing they are who they claim to be because
they are wearing a sticker that says "Hello, My Name is ...." Once the
machine has mounted a volume, its operating system will have access to all
files on the volume (with the possible exception of those owned by root; see
below) and write access to those files as well, if the volume was exported
with the rw option.
The second step is file access. This is a function of normal file system
access controls on the client and not a specialized function of NFS. Once the
drive is mounted the user and group permissions on the files determine access
control.
An example: bob on the server maps to the UserID 9999. Bob makes a file on
the server that is only accessible to the user (the equivalent of typing
chmod 600 filename). A client is allowed to mount the drive where the file is
stored. On the client, mary maps to UserID 9999. This means that the client
user mary can access bob's file that is marked as only accessible by him. It
gets worse: if someone has become superuser on the client machine they can
su - username and become any user. NFS will be none the wiser.
It's not all terrible. There are a few measures you can take on the server to
offset the danger of the clients. We will cover those shortly.
If you don't think the security measures apply to you, you're probably wrong.
In Section 6.1 we'll cover securing the portmapper; server and client
security are covered in Section 6.2 and Section 6.3 respectively. Finally, in
Section 6.4 we'll briefly talk about proper firewalling for your nfs server.
Finally, it is critical that all of your nfs daemons and client programs are
current. If you think that a flaw is too recently announced for it to be a
problem for you, then you've probably already been compromised.
A good way to keep up to date on security alerts is to subscribe to the
bugtraq mailing lists. You can read up on how to subscribe and various other
information about bugtraq here:
http://www.securityfocus.com/forums/bugtraq/faq.html
Additionally, searching for NFS at securityfocus.com's search engine
(http://www.securityfocus.com) will show you all security reports
pertaining to NFS.
You should also regularly check CERT advisories. See the CERT web page at
http://www.cert.org.
-----------------------------------------------------------------------------
6.1. The portmapper
The portmapper keeps a list of what services are running on what ports. This
list is used by a connecting machine to see which ports it needs to talk to
in order to access certain services.
The portmapper is not in as bad a shape as a few years ago but it is still a
point of worry for many sys admins. The portmapper, like NFS and NIS, should
not really have connections made to it outside of a trusted local area
network. If you have to expose them to the outside world - be careful and
keep up diligent monitoring of those systems.
Not all Linux distributions were created equal. Some seemingly up-to-date
distributions do not include a securable portmapper. The easy way to check if
your portmapper is good or not is to run strings(1) and see if it reads the
relevant files, /etc/hosts.deny and /etc/hosts.allow. Assuming your
portmapper is /sbin/portmap you can check it with this command:
strings /sbin/portmap | grep hosts.
On a securable machine it comes up something like this:
+---------------------------------------------------------------------------+
| /etc/hosts.allow |
| /etc/hosts.deny |
| @(#) hosts_ctl.c 1.4 94/12/28 17:42:27 |
| @(#) hosts_access.c 1.21 97/02/12 02:13:22 |
| |
+---------------------------------------------------------------------------+
First we edit /etc/hosts.deny. It should contain the line
+---------------------------------------------------------------------------+
| portmap: ALL |
| |
+---------------------------------------------------------------------------+
which will deny access to everyone. While it is closed, run:
+---------------------------------------------------------------------------+
| rpcinfo -p |
| |
+---------------------------------------------------------------------------+
just to check that your portmapper really reads and obeys this file. Rpcinfo
should give no output, or possibly an error message. The files /etc/
hosts.allow and /etc/hosts.deny take effect immediately after you save them.
No daemon needs to be restarted.
Closing the portmapper for everyone is a bit drastic, so we open it again by
editing /etc/hosts.allow. But first we need to figure out what to put in it.
It should basically list all machines that should have access to your
portmapper. On a run of the mill Linux system there are very few machines
that need any access for any reason. The portmapper administers nfsd, mountd,
ypbind/ypserv, rquotad, lockd (which shows up as nlockmgr), statd (which
shows up as status) and 'r' services like ruptime and rusers. Of these only
nfsd, mountd, ypbind/ypserv and perhaps rquotad, lockd and statd are of any
consequence. All machines that need to access services on your machine should
be allowed to do that. Let's say that your machine's address is 192.168.0.254
and that it lives on the subnet 192.168.0.0, and that all machines on the
subnet should have access to it (for an overview of those terms see the
Networking-Overview-HOWTO at
http://www.linuxdoc.org/HOWTO/Networking-Overview-HOWTO.html). Then we write:
+---------------------------------------------------------------------------+
| portmap: 192.168.0.0/255.255.255.0 |
| |
+---------------------------------------------------------------------------+
in /etc/hosts.allow. If you are not sure what your network or netmask are,
you can use the ifconfig command to determine the netmask and the netstat
command to determine the network. For example, for the device eth0 on the
above machine ifconfig should show:
+---------------------------------------------------------------------------+
| ... |
| eth0 Link encap:Ethernet HWaddr 00:60:8C:96:D5:56 |
| inet addr:192.168.0.254 Bcast:192.168.0.255 Mask:255.255.255.0 |
| UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 |
| RX packets:360315 errors:0 dropped:0 overruns:0 |
| TX packets:179274 errors:0 dropped:0 overruns:0 |
| Interrupt:10 Base address:0x320 |
| ... |
| |
+---------------------------------------------------------------------------+
and netstat -rn should show:
+---------------------------------------------------------------------------------+
| Kernel routing table |
| Destination Gateway Genmask Flags Metric Ref Use Iface |
| ... |
| 192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 174412 eth0 |
| ... |
| |
+---------------------------------------------------------------------------------+
(The network address is in the first column).
The /etc/hosts.deny and /etc/hosts.allow files are described in the manual
pages of the same names.
IMPORTANT: Do not put anything but IP NUMBERS in the portmap lines of these
files. Host name lookups can indirectly cause portmap activity which will
trigger host name lookups which can indirectly cause portmap activity which
will trigger...
Versions 0.2.0 and higher of the nfs-utils package also use the hosts.allow
and hosts.deny files, so you should put in entries for lockd, statd, mountd,
and rquotad in these files too. For a complete example, see Section 3.2.2.
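Following the same pattern as the portmap line above, a sketch of the extra
/etc/hosts.allow entries for our example subnet would be:
    lockd: 192.168.0.0/255.255.255.0
    rquotad: 192.168.0.0/255.255.255.0
    mountd: 192.168.0.0/255.255.255.0
    statd: 192.168.0.0/255.255.255.0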
The above things should make your server tighter. The only remaining problem
is if someone gains administrative access to one of your trusted client
machines and is able to send bogus NFS requests. The next section deals with
safeguards against this problem.
-----------------------------------------------------------------------------
6.2. Server security: nfsd and mountd
On the server we can decide that we don't want to trust any requests made as
root on the client. We can do that by using the root_squash option in /etc/
exports:
/home slave1(rw,root_squash)
This is, in fact, the default. It should always be turned on unless you have
a very good reason to turn it off. To turn it off use the no_root_squash
option.
Now, if a user with UID 0 (i.e., root's user ID number) on the client
attempts to access (read, write, delete) the file system, the server
substitutes the UID of the server's 'nobody' account. This means that the
root user on the client can't access or change files that only root on the
server can access or change. That's good, and you should probably use
root_squash on all the file systems you export. "But the root user on the
client can still use su to become any other user and access and change that
users files!" say you. To which the answer is: Yes, and that's the way it is,
and has to be with Unix and NFS. This has one important implication: all
important binaries and files should be owned by root, and not bin or another
non-root account, since the only account the client's root user cannot access
is the server's root account. In the exports(5) man page there are several
other squash options listed so that you can decide to mistrust whomever you
(don't) like on the clients.
The TCP ports 1-1024 are reserved for root's use (and therefore sometimes
referred to as "secure ports"); a non-root user cannot bind to these ports.
Adding the secure option to an /etc/exports entry means that the server will
only listen to requests coming from ports 1-1024 on the client, so that a
malicious non-root user on the client cannot come along and open up a spoofed
NFS dialogue on a non-reserved port. This option is set by default.
-----------------------------------------------------------------------------
6.3. Client Security
6.3.1. The nosuid mount option
On the client we can decide that we don't want to trust the server too much,
in a couple of ways, with options to mount. For example, we can forbid suid
programs from working off the NFS file system with the nosuid option. Some
unix programs, such as passwd, are called "suid" programs: they set the id of
the person running them to whomever is the owner of the file. If a file is
owned by root and is suid, then the program will execute as root, so that it
can perform operations (such as writing to the password file) that only root
is allowed to do. Using the nosuid option is a good idea and you should
consider using it with all NFS mounted disks. It means that the server's root
user cannot make a suid-root program on the file system, log in to the client
as a normal user and then use the suid-root program to become root on the
client too. One
could also forbid execution of files on the mounted file system altogether
with the noexec option. But this is more likely to be impractical than nosuid
since a file system is likely to at least contain some scripts or programs
that need to be executed.
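A sketch of an /etc/fstab entry carrying this option (the server, paths, and
other options are illustrative):
    server:/home /mnt/home nfs rw,hard,intr,nosuid 0 0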
-----------------------------------------------------------------------------
6.3.2. The broken_suid mount option
Some older programs (xterm being one of them) used to rely on the idea that
root can write everywhere. This will break under new kernels on NFS mounts.
The security implication is that programs that do this type of suid action
can potentially be used to change your apparent uid on nfs servers doing uid
mapping. So the default has been to disable broken_suid in the linux kernel.
The long and short of it is this: if you're using an old linux distribution,
some sort of old suid program or an older unix of some type, you might have
to mount from your clients with the broken_suid option to mount. However,
most recent unixes and linux distros ship xterm and such programs as normal
executables with no suid status; they call helper programs to do their setuid
work.
You enter the above options in the options column of /etc/fstab, along with
rsize and wsize, separated by commas.
-----------------------------------------------------------------------------
6.3.3. Securing portmapper, rpc.statd, and rpc.lockd on the client
In the current (2.2.18+) implementation of NFS, full file locking is
supported. This means that rpc.statd and rpc.lockd must be running on the
client in order for locks to function correctly. These services require the
portmapper to be running. So, most of the problems you find with nfs on the
server may also plague you on the client. Read through the portmapper section
above for information on securing the portmapper.
-----------------------------------------------------------------------------
6.4. NFS and firewalls (ipchains and netfilter)
IPchains (under the 2.2.X kernels) and netfilter (under the 2.4.x kernels)
allow a good level of security - instead of relying on the daemon (or perhaps
its TCP wrapper) to determine which machines can connect, the connection
attempt is allowed or disallowed at a lower level. In this case, you can stop
the connection much earlier and more globally, which can protect you from all
sorts of attacks.
Describing how to set up a Linux firewall is well beyond the scope of this
document. Interested readers may wish to read the Firewall-HOWTO
(http://www.linuxdoc.org/HOWTO/Firewall-HOWTO.html) or the IPCHAINS-HOWTO
(http://www.linuxdoc.org/HOWTO/IPCHAINS-HOWTO.html). For users of kernel 2.4
and above you might want to visit the netfilter webpage at
http://netfilter.filewatcher.org. If you are already familiar with the
workings of ipchains or netfilter this section will give you a few tips on
how to better set up your NFS daemons to more easily firewall and protect
them.
A good rule to follow for your firewall configuration is to deny all, and
allow only some - this helps to keep you from accidentally allowing more than
you intended.
In order to understand how to firewall the NFS daemons, it will help to
briefly review how they bind to ports.
When a daemon starts up, it requests a free port from the portmapper. The
portmapper gets the port for the daemon and keeps track of the port currently
used by that daemon. When other hosts or processes need to communicate with
the daemon, they request the port number from the portmapper in order to find
the daemon. So the ports will perpetually float because different ports may
be free at different times and so the portmapper will allocate them
differently each time. This is a pain for setting up a firewall. If you never
know where the daemons are going to be then you don't know precisely which
ports to allow access to. This might not be a big deal for many people
running on a protected or isolated LAN. For those people on a public network,
though, this is horrible.
In kernels 2.4.13 and later with nfs-utils 0.3.3 or later you no longer have
to worry about the floating of ports in the portmapper. Now all of the
daemons pertaining to nfs can be "pinned" to a port. Most of them nicely take
a -p option when they are started; those daemons that are started by the
kernel take some kernel arguments or module options. They are described
below.
Some of the daemons involved in sharing data via nfs are already bound to a
port. portmap is always on port 111 TCP and UDP. nfsd is always on port 2049
TCP and UDP (however, as of kernel 2.4.17, NFS over TCP is considered
experimental and is not for use on production machines).
The other daemons, statd, mountd, lockd, and rquotad, will normally move
around to the first available port they are informed of by the portmapper.
To force statd to bind to a particular port, use the -p portnum option. To
force statd to respond on a particular port, additionally use the -o portnum
option when starting it.
To force mountd to bind to a particular port use the -p portnum option.
For example, to have statd bind to port 32765 and respond on port 32766,
and mountd listen on port 32767, you would type:
# statd -p 32765 -o 32766
# mountd -p 32767
lockd is started by the kernel when it is needed. Therefore you need to pass
module options (if you have it built as a module) or kernel options to force
lockd to listen and respond only on certain ports.
If you are using loadable modules and you would like to specify these options
in your /etc/modules.conf file add a line like this to the file:
options lockd nlm_udpport=32768 nlm_tcpport=32768
The above line would specify the udp and tcp port for lockd to be 32768.
If you are not using loadable modules or if you have compiled lockd into the
kernel instead of building it as a module then you will need to pass it an
option on the kernel boot line.
It should look something like this:
vmlinuz 3 root=/dev/hda1 lockd.udpport=32768 lockd.tcpport=32768
The port numbers do not have to match but it would simply add unnecessary
confusion if they didn't.
If you are using quotas and using rpc.rquotad to make these quotas viewable
over nfs, you will need to also take it into account when setting up your
firewall. There are two rpc.rquotad source trees. One of those is maintained
in the nfs-utils tree. The other is in the quota-tools tree. They do not
operate identically. The one provided with nfs-utils supports binding the
daemon to a port with the -p directive. The one in quota-tools does not.
Consult your distribution's documentation to determine if yours does.
For the sake of this discussion, let's describe a network and set up a
firewall to protect our nfs server. Our nfs server is 192.168.0.42 and our
only client is 192.168.0.45. As in the example above, statd has been started
so that it
only binds to port 32765 for incoming requests and it must answer on port
32766. mountd is forced to bind to port 32767. lockd's module parameters have
been set to bind to 32768. nfsd is, of course, on port 2049 and the
portmapper is on port 111.
We are not using quotas.
Using IPCHAINS, a simple firewall might look something like this:
ipchains -A input -f -j ACCEPT -s 192.168.0.45
ipchains -A input -s 192.168.0.45 -d 0/0 32765:32768 -p 6 -j ACCEPT
ipchains -A input -s 192.168.0.45 -d 0/0 32765:32768 -p 17 -j ACCEPT
ipchains -A input -s 192.168.0.45 -d 0/0 2049 -p 17 -j ACCEPT
ipchains -A input -s 192.168.0.45 -d 0/0 2049 -p 6 -j ACCEPT
ipchains -A input -s 192.168.0.45 -d 0/0 111 -p 6 -j ACCEPT
ipchains -A input -s 192.168.0.45 -d 0/0 111 -p 17 -j ACCEPT
ipchains -A input -s 0/0 -d 0/0 -p 6 -j DENY -y -l
ipchains -A input -s 0/0 -d 0/0 -p 17 -j DENY -l
The equivalent set of commands in netfilter is shown below. Note that
iptables has no DENY target (dropping is done with DROP), logging is done
with the separate LOG target, and port matches require naming the protocol:
 iptables -A INPUT -f -j ACCEPT -s 192.168.0.45
 iptables -A INPUT -s 192.168.0.45 -p tcp --dport 32765:32768 -j ACCEPT
 iptables -A INPUT -s 192.168.0.45 -p udp --dport 32765:32768 -j ACCEPT
 iptables -A INPUT -s 192.168.0.45 -p tcp --dport 2049 -j ACCEPT
 iptables -A INPUT -s 192.168.0.45 -p udp --dport 2049 -j ACCEPT
 iptables -A INPUT -s 192.168.0.45 -p tcp --dport 111 -j ACCEPT
 iptables -A INPUT -s 192.168.0.45 -p udp --dport 111 -j ACCEPT
 iptables -A INPUT -p tcp --syn -j LOG --log-level 5
 iptables -A INPUT -p tcp --syn -j DROP
 iptables -A INPUT -p udp -j LOG --log-level 5
 iptables -A INPUT -p udp -j DROP
The first line says to accept all packet fragments (except the first packet
fragment which will be treated as a normal packet). In theory no packet will
pass through until it is reassembled, and it won't be reassembled unless the
first packet fragment is passed. Of course there are attacks that can be
generated by overloading a machine with packet fragments. But NFS won't work
correctly unless you let fragments through. See Section 7.8 for details.
The other lines allow specific connections from any port on our client host
to the specific ports we have made available on our server. This means that
if, say, 192.168.0.46 attempts to contact the NFS server it will not be able
to mount or see what mounts are available.
With the new port pinning capabilities it is obviously much easier to control
what hosts are allowed to mount your NFS shares. It is worth mentioning that
NFS is not an encrypted protocol and anyone on the same physical network
could sniff the traffic and reassemble the information being passed back and
forth.
-----------------------------------------------------------------------------
6.5. Tunneling NFS through SSH
One method of encrypting NFS traffic over a network is to use the
port-forwarding capabilities of ssh. However, as we shall see, doing so has a
serious drawback if you do not utterly and completely trust the local users
on your server.
The first step will be to export files to the localhost. For example, to
export the /home partition, enter the following into /etc/exports:
/home 127.0.0.1(rw)
The next step is to use ssh to forward ports. For example, ssh can tell the
server to forward to any port on any machine from a port on the client. Let
us assume, as in the previous section, that our server is 192.168.0.42, and
that we have pinned mountd to port 32767 using the argument -p 32767. Then,
on the client, we'll type:
# ssh root@192.168.0.42 -L 250:localhost:2049 -f sleep 60m
# ssh root@192.168.0.42 -L 251:localhost:32767 -f sleep 60m
The above command causes ssh on the client to take any request directed at
the client's port 250 and forward it, first through sshd on the server, and
then on to the server's port 2049. The second line causes a similar type of
forwarding between requests to port 251 on the client and port 32767 on the
server. The localhost is relative to the server; that is, the forwarding will
be done to the server itself. The port could otherwise have been made to
forward to any other machine, and the requests would look to the outside
world as if they were coming from the server. Thus, the requests will appear
to NFSD on the server as if they are coming from the server itself. Note that
in order to bind to a port below 1024 on the client, we have to run this
command as root on the client. Doing this will be necessary if we have
exported our filesystem with the default secure option.
Finally, we are pulling a little trick with the last option, -f sleep 60m.
Normally, when we use ssh, even with the -L option, we will open up a shell
on the remote machine. But instead, we just want the port forwarding to
execute in the background so that we get our shell on the client back. So, we
tell ssh to execute a command in the background on the server to sleep for 60
minutes. This will cause the port to be forwarded for 60 minutes until it
gets a connection; at that point, the port will continue to be forwarded
until the connection dies or until the 60 minutes are up, whichever happens
later. The above command could be put in our startup scripts on the client,
right after the network is started.
Next, we have to mount the filesystem on the client. To do this, we tell the
client to mount a filesystem on the localhost, but at a different port from
the usual 2049. Specifically, an entry in /etc/fstab would look like:
localhost:/home /mnt/home nfs rw,hard,intr,port=250,mountport=251 0 0
Having done this, we can see why the above will be incredibly insecure if we
have any ordinary users who are able to log in to the server locally. If they
can, there is nothing preventing them from doing what we did and using ssh to
forward a privileged port on their own client machine (where they are
legitimately root) to ports 2049 and 32767 on the server. Thus, any ordinary
user on the server can mount our filesystems with the same rights as root on
our client.
If you are using an NFS server that does not have a way for ordinary users to
log in, and you wish to use this method, there are two additional caveats:
First, the connection travels from the client to the server via sshd;
therefore you will have to leave port 22 (where sshd listens) open to your
client on the firewall. However you do not need to leave the other ports,
such as 2049 and 32767, open anymore. Second, file locking will no longer
work. It is not possible to ask statd or the locking manager to make requests
to a particular port for a particular mount; therefore, any locking requests
will cause statd to connect to statd on localhost, i.e., itself, and it will
fail with an error. Any attempt to correct this would require a major rewrite
of NFS.
It may also be possible to use IPSec to encrypt network traffic between your
client and your server, without compromising any local security on the
server; this will not be taken up here. See the FreeS/WAN home page at
http://www.freeswan.org/ for details on using IPSec under Linux.
-----------------------------------------------------------------------------
6.6. Summary
If you use the hosts.allow, hosts.deny, root_squash, nosuid and privileged
port features in the portmapper/NFS software, you avoid many of the presently
known bugs in NFS and can almost feel secure about that at least. But still,
after all that: When an intruder has access to your network, s/he can make
strange commands appear in your .forward or read your mail when /home or /var
/mail is NFS exported. For the same reason, you should never access your PGP
private key over NFS. Or at least you should know the risk involved. And now
you know a bit of it.
NFS and the portmapper make up a complex subsystem and therefore it's not
totally unlikely that new bugs will be discovered, either in the basic design
or in the implementation we use. There might even be holes known now, which
someone is abusing. But that's life.
-----------------------------------------------------------------------------
7. Troubleshooting
This is intended as a step-by-step guide to what to do when things go
wrong using NFS. Usually trouble first rears its head on the client end,
so this diagnostic will begin there.
-----------------------------------------------------------------------------
7.1. Unable to See Files on a Mounted File System
First, check to see if the file system is actually mounted. There are several
ways of doing this. The most reliable way is to look at the file /proc/
mounts, which will list all mounted filesystems and give details about them.
If this doesn't work (for example if you don't have the /proc filesystem
compiled into your kernel), you can type mount -f although you get less
information.
If the file system appears to be mounted, then you may have mounted another
file system on top of it (in which case you should unmount and remount both
volumes), or you may have exported the file system on the server before you
mounted it there, in which case NFS is exporting the underlying mount point
(if so then you need to restart NFS on the server).
If the file system is not mounted, then attempt to mount it. If this does not
work, see Symptom 3.
-----------------------------------------------------------------------------
7.2. File requests hang or time out waiting for access to the file.
This usually means that the client is unable to communicate with the server.
See Symptom 3 letter b.
-----------------------------------------------------------------------------
7.3. Unable to mount a file system
There are two common errors that mount produces when it is unable to mount a
volume. These are:
a. failed, reason given by server: Permission denied
This means that the server does not recognize that you have access to the
volume.
i. Check your /etc/exports file and make sure that the volume is
exported and that your client has the right kind of access to it. For
example, if a client only has read access then you have to mount the
volume with the ro option rather than the rw option.
ii. Make sure that you have told NFS to register any changes you made to
/etc/exports since starting nfsd by running the exportfs command. Be
sure to type exportfs -ra to be extra certain that the exports are
being re-read.
iii. Check the file /proc/fs/nfs/exports and make sure the volume and
client are listed correctly. (You can also look at the file /var/lib/
nfs/xtab for an unabridged list of how all the active export options
are set.) If they are not, then you have not re-exported properly. If
they are listed, make sure the server recognizes your client as being
the machine you think it is. For example, you may have an old listing
for the client in /etc/hosts that is throwing off the server, or you
may not have listed the client's complete address and it may be
resolving to a machine in a different domain. One trick is to log in to
the server from the client via ssh or telnet; if you then type who,
one of the listings should be your login session and the name of your
client machine as the server sees it. Try using this machine name in
your /etc/exports entry. Finally, try to ping the client from the
server, and try to ping the server from the client. If this doesn't
work, or if there is packet loss, you may have lower-level network
problems.
iv. It is not possible to export both a directory and its child (for
example both /usr and /usr/local). You should export the parent
directory with the necessary permissions, and all of its
subdirectories can then be mounted with those same permissions.
b. RPC: Program Not Registered: (or another "RPC" error):
This means that the client does not detect NFS running on the server.
This could be for several reasons.
i. First, check that NFS actually is running on the server by typing
rpcinfo -p on the server. You should see something like this:
+------------------------------------------------------------+
| program vers proto port |
| 100000 2 tcp 111 portmapper |
| 100000 2 udp 111 portmapper |
| 100011 1 udp 749 rquotad |
| 100011 2 udp 749 rquotad |
| 100005 1 udp 759 mountd |
| 100005 1 tcp 761 mountd |
| 100005 2 udp 764 mountd |
| 100005 2 tcp 766 mountd |
| 100005 3 udp 769 mountd |
| 100005 3 tcp 771 mountd |
| 100003 2 udp 2049 nfs |
| 100003 3 udp 2049 nfs |
| 300019 1 tcp 830 amd |
| 300019 1 udp 831 amd |
| 100024 1 udp 944 status |
| 100024 1 tcp 946 status |
| 100021 1 udp 1042 nlockmgr |
| 100021 3 udp 1042 nlockmgr |
| 100021 4 udp 1042 nlockmgr |
| 100021 1 tcp 1629 nlockmgr |
| 100021 3 tcp 1629 nlockmgr |
| 100021 4 tcp 1629 nlockmgr |
| |
+------------------------------------------------------------+
This says that we have NFS versions 2 and 3, rpc.statd version 1,
network lock manager (the service name for rpc.lockd) versions 1, 3,
and 4. There are also different service listings depending on whether
NFS is travelling over TCP or UDP. UDP is usually (but not always)
the default unless TCP is explicitly requested.
If you do not see at least portmapper, nfs, and mountd, then you need
to restart NFS. If you are not able to restart successfully, proceed
to Symptom 9.
ii. Now check to make sure you can see it from the client. On the client,
type rpcinfo -p server where server is the DNS name or IP address of
your server.
If you get a listing, then make sure that the type of mount you are
trying to perform is supported. For example, if you are trying to
mount using Version 3 NFS, make sure Version 3 is listed; if you are
trying to mount using NFS over TCP, make sure that is registered.
(Some non-Linux clients default to TCP). Type man rpcinfo for more
details on how to read the output. If the type of mount you are
trying to perform is not listed, try a different type of mount.
If you get the error No Remote Programs Registered, then you need to
check your /etc/hosts.allow and /etc/hosts.deny files on the server
and make sure your client actually is allowed access. Again, if the
entries appear correct, check /etc/hosts (or your DNS server) and
make sure that the machine is listed correctly, and make sure you can
ping the server from the client. Also check the error logs on the
system for helpful messages: Authentication errors from bad /etc/
hosts.allow entries will usually appear in /var/log/messages, but may
appear somewhere else depending on how your system logs are set up.
The man pages for syslog can help you figure out how your logs are
set up. Finally, some older operating systems may behave badly when
routes between the two machines are asymmetric. Try typing tracepath
[server] from the client and see if the word "asymmetric" shows up
anywhere in the output. If it does then this may be causing packet
loss. However asymmetric routes are not usually a problem on recent
linux distributions.
If you get the error Remote system error - No route to host, but you
can ping the server correctly, then you are the victim of an
overzealous firewall. Check any firewalls that may be set up, either
on the server or on any routers in between the client and the server.
Look at the man pages for ipchains, netfilter, and ipfwadm, as well
as the IPChains-HOWTO (http://www.linuxdoc.org/HOWTO/IPCHAINS-HOWTO.html)
and the Firewall-HOWTO (http://www.linuxdoc.org/HOWTO/Firewall-HOWTO.html)
for help.
-----------------------------------------------------------------------------
7.4. I do not have permission to access files on the mounted volume.
This could be one of two problems.
If it is a write permission problem, check the export options on the server
by looking at /proc/fs/nfs/exports and make sure the filesystem is not
exported read-only. If it is you will need to re-export it read/write (don't
forget to run exportfs -ra after editing /etc/exports). Also, check /proc/
mounts and make sure the volume is mounted read/write (although if it is
mounted read-only you ought to get a more specific error message). If not
then you need to re-mount with the rw option.
The second problem has to do with username mappings, and is different
depending on whether you are trying to do this as root or as a non-root user.
If you are not root, then usernames may not be in sync on the client and the
server. Type id [user] on both the client and the server and make sure they
give the same UID number. If they don't then you are having problems with
NIS, NIS+, rsync, or whatever system you use to sync usernames. Check group
names to make sure that they match as well. Also, make sure you are not
exporting with the all_squash option. If the user names match then the user
has a more general permissions problem unrelated to NFS.
If you are root, then you are probably not exporting with the no_root_squash
option; check /proc/fs/nfs/exports or /var/lib/nfs/xtab on the server and
make sure the option is listed. In general, being able to write to the NFS
server as root is a bad idea unless you have an urgent need -- which is why
Linux NFS prevents it by default. See Section 6 for details.
If you have root squashing, you want to keep it, and you're only trying to
get root to have the same permissions on the file that the user nobody should
have, then remember that it is the server that determines which uid root gets
mapped to. By default, the server uses the UID and GID of nobody in the /etc/
passwd file, but this can also be overridden with the anonuid and anongid
options in the /etc/exports file. Make sure that the client and the server
agree about which UID nobody gets mapped to.
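As a sketch (the host and ID values are illustrative), an export line that
pins the anonymous UID and GID explicitly would be:
    /home slave1(rw,root_squash,anonuid=65534,anongid=65534)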
-----------------------------------------------------------------------------
7.5. When I transfer really big files, NFS takes over all the CPU cycles on
the server and it screeches to a halt.
This is a problem with the fsync() function in 2.2 kernels that causes all
sync-to-disk requests to be cumulative, resulting in a write time that is
quadratic in the file size. If you can, upgrading to a 2.4 kernel should
solve the problem. Also, exporting with the no_wdelay option forces the
program to use O_SYNC instead, which may prove faster.
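A sketch of such an export line (the path and host are illustrative):
    /export/data client1(rw,no_wdelay)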
-----------------------------------------------------------------------------
7.6. Strange error or log messages
a. Messages of the following format:
+-------------------------------------------------------------------------------------------+
| Jan 7 09:15:29 server kernel: fh_verify: mail/guest permission failure, acc=4, error=13 |
| Jan 7 09:23:51 server kernel: fh_verify: ekonomi/test permission failure, acc=4, error=13 |
| |
+-------------------------------------------------------------------------------------------+
These happen when an NFS setattr operation is attempted on a file you
don't have write access to. The messages are harmless.
b. The following messages frequently appear in the logs:
+---------------------------------------------------------------------+
| kernel: nfs: server server.domain.name not responding, still trying |
| kernel: nfs: task 10754 can't get a request slot |
| kernel: nfs: server server.domain.name OK |
| |
+---------------------------------------------------------------------+
The "can't get a request slot" message means that the client-side RPC
code has detected a lot of timeouts (perhaps due to network congestion,
perhaps due to an overloaded server), and is throttling back the number
of concurrent outstanding requests in an attempt to lighten the load. The
cause of these messages is basically sluggish performance. See Section 5
for details.
c. After mounting, the following message appears on the client:
+---------------------------------------------------------------+
|nfs warning: mount version older than kernel |
| |
+---------------------------------------------------------------+
It means what it says: You should upgrade your mount package and/or
am-utils. (If for some reason upgrading is a problem, you may be able to
get away with just recompiling them so that the newer kernel features are
recognized at compile time).
d. Errors in startup/shutdown log for lockd
You may see a message of the following kind in your boot log:
+---------------------------------------------------------------+
|nfslock: rpc.lockd startup failed |
| |
+---------------------------------------------------------------+
They are harmless. Older versions of rpc.lockd needed to be started up
manually, but newer versions are started automatically by nfsd. Many of
the default startup scripts still try to start up lockd by hand, in case
it is necessary. You can alter your startup scripts if you want the
messages to go away.
e. The following message appears in the logs:
+---------------------------------------------------------------+
|kmem_create: forcing size word alignment - nfs_fh |
| |
+---------------------------------------------------------------+
This results from the file handle being 16 bits instead of a multiple of
32 bits, which makes the kernel grimace. It is harmless.
-----------------------------------------------------------------------------
7.7. Real permissions don't match what's in /etc/exports.
/etc/exports is very sensitive to whitespace - so the following statements
are not the same:
/export/dir hostname(rw,no_root_squash)
/export/dir hostname (rw,no_root_squash)
The first will grant hostname rw access to /export/dir without squashing root
privileges. The second will grant hostname rw privileges with root squash and
it will grant everyone else read/write access, without squashing root
privileges. Nice huh?
-----------------------------------------------------------------------------
7.8. Flaky and unreliable behavior
Simple commands such as ls work, but anything that transfers a large amount
of information causes the mount point to lock.
This could be one of two problems:
i. It will happen if you have ipchains on at the server and/or the client
and you are not allowing fragmented packets through the chains. Allow
fragments from the remote host and you'll be able to function again. See
Section 6.4 for details on how to do this.
ii. You may be using a larger rsize and wsize in your mount options than the
server supports. Try reducing rsize and wsize to 1024 and see if the
problem goes away. If it does, then increase them slowly to a more
reasonable value.
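As a rough example of the second test (server name and paths are
illustrative), you might mount with:
   # mount -o rsize=1024,wsize=1024 server.example.com:/export /mnt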
-----------------------------------------------------------------------------
7.9. nfsd won't start
Check the file /etc/exports and make sure root has read permission. Check the
binaries and make sure they are executable. Make sure your kernel was
compiled with NFS server support. You may need to reinstall your binaries if
none of these ideas helps.
-----------------------------------------------------------------------------
7.10. File Corruption When Using Multiple Clients
If a file has been modified within one second of its previous modification
and left the same size, it will continue to generate the same inode number.
Because of this, constant reads and writes to a file by multiple clients may
cause file corruption. Fixing this bug requires changes deep within the
filesystem layer, and therefore it is a 2.5 item.
-----------------------------------------------------------------------------
8. Using Linux NFS with Other OSes
Every operating system, Linux included, has quirks and deviations in the
behavior of its NFS implementation -- sometimes because the protocols are
vague, sometimes because they leave gaping security holes. Linux will work
properly with all major vendors' NFS implementations, as far as we know.
However, there may be extra steps involved to make sure the two OSes are
communicating clearly with one another. This section details those steps.
In general, it is highly ill-advised to attempt to use a Linux machine with a
kernel before 2.2.18 as an NFS server for non-Linux clients. Implementations
with older kernels may work fine as clients; however if you are using one of
these kernels and get stuck, the first piece of advice we would give is to
upgrade your kernel and see if the problems go away. The user-space NFS
implementations also do not work well with non-Linux clients.
Following is a list of known issues for using Linux together with major
operating systems.
-----------------------------------------------------------------------------
8.1. AIX
8.1.1. Linux Clients and AIX Servers
The format for the /etc/exports file for our example in Section 3 is:
/usr slave1.foo.com:slave2.foo.com,access=slave1.foo.com:slave2.foo.com
/home slave1.foo.com:slave2.foo.com,rw=slave1.foo.com:slave2.foo.com
-----------------------------------------------------------------------------
8.1.2. AIX clients and Linux Servers
AIX uses the file /etc/filesystems instead of /etc/fstab. A sample entry,
based on the example in Section 4, looks like this:
/mnt/home:
dev = "/home"
vfs = nfs
nodename = master.foo.com
mount = true
options = bg,hard,intr,rsize=1024,wsize=1024,vers=2,proto=udp
account = false
i. Version 4.3.2 of AIX, and possibly earlier versions as well, requires
that file systems be exported with the insecure option, which causes NFS
to listen to requests from insecure ports (i.e., ports above 1024, to
which non-root users can bind). Older versions of AIX do not seem to
require this.
ii. AIX clients will default to mounting version 3 NFS over TCP. If your
Linux server does not support this, then you may need to specify vers=2
and/or proto=udp in your mount options.
iii. Using netmasks in /etc/exports seems to sometimes cause clients to lose
mounts when another client is reset. This can be fixed by listing out
hosts explicitly.
iv. Apparently automount in AIX 4.3.2 is rather broken.
-----------------------------------------------------------------------------
8.2. BSD
8.2.1. BSD servers and Linux clients
BSD kernels tend to work better with larger block sizes.
-----------------------------------------------------------------------------
8.2.2. Linux servers and BSD clients
Some versions of BSD may make requests to the server from insecure ports, in
which case you will need to export your volumes with the insecure option. See
the man page for exports(5) for more details.
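A hypothetical export line using this option (host name and path are
illustrative) might look like:
   /export/dir bsdclient.example.com(rw,insecure)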
-----------------------------------------------------------------------------
8.3. Tru64 Unix
8.3.1. Tru64 Unix Servers and Linux Clients
In general, Tru64 Unix servers work quite smoothly with Linux clients. The
format for the /etc/exports file for our example in Section 3 is:
/usr slave1.foo.com:slave2.foo.com \
 -access=slave1.foo.com:slave2.foo.com
/home slave1.foo.com:slave2.foo.com \
 -rw=slave1.foo.com:slave2.foo.com \
 -root=slave1.foo.com:slave2.foo.com
(The root option is listed in the last entry for informational purposes only;
its use is not recommended unless necessary.)
Tru64 checks the /etc/exports file every time there is a mount request so you
do not need to run the exportfs command; in fact on many versions of Tru64
Unix the command does not exist.
-----------------------------------------------------------------------------
8.3.2. Linux Servers and Tru64 Unix Clients
There are two issues to watch out for here. First, Tru64 Unix mounts using
Version 3 NFS by default. You will see mount errors if your Linux server does
not support Version 3 NFS. Second, in Tru64 Unix 4.x, NFS locking requests
are made by daemon. You will therefore need to specify the insecure_locks
option on all volumes you export to a Tru64 Unix 4.x client; see the exports
man pages for details.
-----------------------------------------------------------------------------
8.4. HP-UX
8.4.1. HP-UX Servers and Linux Clients
A sample /etc/exports entry on HP-UX looks like this:
/usr -ro,access=slave1.foo.com:slave2.foo.com
/home -rw=slave1.foo.com:slave2.foo.com:root=slave1.foo.com:slave2.foo.com
(The root option is listed in the last entry for informational purposes only;
its use is not recommended unless necessary.)
-----------------------------------------------------------------------------
8.4.2. Linux Servers and HP-UX Clients
HP-UX diskless clients will require at least a kernel version 2.2.19 (or
patched 2.2.18) for device files to export correctly. Also, any exports to an
HP-UX client will need to be exported with the insecure_locks option.
-----------------------------------------------------------------------------
8.5. IRIX
8.5.1. IRIX Servers and Linux Clients
A sample /etc/exports entry on IRIX looks like this:
/usr -ro,access=slave1.foo.com:slave2.foo.com
/home -rw=slave1.foo.com:slave2.foo.com:root=slave1.foo.com:slave2.foo.com
(The root option is listed in the last entry for informational purposes only;
its use is not recommended unless necessary.)
There are reportedly problems when using the nohide option on exports to
linux 2.2-based systems. This problem is fixed in the 2.4 kernel. As a
workaround, you can export and mount lower-down file systems separately.
As of Kernel 2.4.17, there continue to be several minor interoperability
issues that may require a kernel upgrade. In particular:
 * Make sure that Trond Myklebust's seekdir (or dir) kernel patch is
   applied. The latest version (for 2.4.17) is located at:
   http://www.fys.uio.no/~trondmy/src/2.4.17/linux-2.4.17-seekdir.dif
 * IRIX servers do not always use the same fsid attribute field across
   reboots, which results in inode number mismatch errors on a Linux client
   if the mounted IRIX server reboots. A patch is available from:
   http://www.geocrawler.com/lists/3/SourceForge/789/0/7777454/
 * Linux kernels v2.4.9 and above have problems reading large directories
   (hundreds of files) from exported IRIX XFS file systems that were made
   with naming version=1. The reason for the problem can be found at:
   http://www.geocrawler.com/archives/3/789/2001/9/100/6531172/
The naming version can be found by using (on the IRIX server):
xfs_growfs -n mount_point
The workaround is to export these file systems using the -32bitclients
option in the /etc/exports file. The fix is to convert the file system to
'naming version=2'. Unfortunately the only way to do this is by a backup/
mkfs/restore.
mkfs_xfs on IRIX 6.5.14 (and above) creates naming version=2 XFS file
systems by default. On IRIX 6.5.5 to 6.5.13, use:
mkfs_xfs -n version=2 device
Versions of IRIX prior to 6.5.5 do not support naming version=2 XFS file
systems.
-----------------------------------------------------------------------------
8.5.2. IRIX clients and Linux servers
Irix versions up to 6.5.12 have problems mounting file systems exported from
Linux boxes - the mount point "gets lost," e.g.,
# mount linux:/disk1 /mnt
# cd /mnt/xyz/abc
# pwd
/xyz/abc
This is a known IRIX bug (SGI bug 815265 - IRIX not liking file handles of less
than 32 bytes), which is fixed in IRIX 6.5.13. If it is not possible to
upgrade to IRIX 6.5.13, then the unofficial workaround is to force the Linux
nfsd to always use 32 byte file handles.
A number of patches exist - see:
 * http://www.geocrawler.com/archives/3/789/2001/8/50/6371896/
 * http://oss.sgi.com/projects/xfs/mail_archive/0110/msg00006.html
-----------------------------------------------------------------------------
8.6. Solaris
8.6.1. Solaris Servers
Solaris has a slightly different format on the server end from other
operating systems. Instead of /etc/exports, the configuration file is /etc/
dfs/dfstab. Entries are of the form of a share command, where the syntax for
the example in Section 3 would look like
share -o rw=slave1,slave2 -d "Master Usr" /usr
and instead of running exportfs after editing, you run shareall.
Solaris servers are especially sensitive to packet size. If you are using a
Linux client with a Solaris server, be sure to set rsize and wsize to 32768
at mount time.
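For example (server name and paths are illustrative):
   # mount -o rsize=32768,wsize=32768 solaris.example.com:/export /mnt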
Finally, there is an issue with root squashing on Solaris: root gets mapped
to the user noone, which is not the same as the user nobody. If you are
having trouble with file permissions as root on the client machine, be sure
to check that the mapping works as you expect.
-----------------------------------------------------------------------------
8.6.2. Solaris Clients
Solaris clients will regularly produce the following message:
+---------------------------------------------------------------------------+
|svc: unknown program 100227 (me 100003) |
| |
+---------------------------------------------------------------------------+
This happens because Solaris clients, when they mount, try to obtain ACL
information - which Linux obviously does not have. The messages can safely be
ignored.
There are two known issues with diskless Solaris clients: First, a kernel
version of at least 2.2.19 is needed to get /dev/null to export correctly.
Second, the packet size may need to be set extremely small (i.e., 1024) on
diskless sparc clients because the clients do not know how to assemble
packets in reverse order. This can be done from /etc/bootparams on the
clients.
-----------------------------------------------------------------------------
8.7. SunOS
SunOS only has NFS Version 2 over UDP.
-----------------------------------------------------------------------------
8.7.1. SunOS Servers
On the server end, SunOS uses the most traditional format for its /etc/
exports file. The example in Section 3 would look like:
/usr -access=slave1.foo.com,slave2.foo.com
/home -rw=slave1.foo.com,slave2.foo.com,root=slave1.foo.com,slave2.foo.com
Again, the root option is listed for informational purposes and is not
recommended unless necessary.
-----------------------------------------------------------------------------
8.7.2. SunOS Clients
Be advised that SunOS makes all NFS locking requests as daemon, and therefore
you will need to add the insecure_locks option to any volumes you export to a
SunOS machine. See the exports man page for details.
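A hypothetical export line for such a client (host name and path are
illustrative) might look like:
   /export/dir sunos-client.example.com(rw,insecure_locks)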
</sect1 id="NFS">
<sect1 id="Samba">
<title>Samba</title>
8.11. SAMBA - `NetBEUI', `NetBios', `CIFS' support.
SAMBA is an implementation of the Server Message Block (SMB) protocol.
Samba allows Microsoft and other systems to mount and use your disks
and printers.
SAMBA and its configuration are covered in detail in the SMB-HOWTO.
5.2. Windows Environment
Samba is a suite of applications that allow most Unices (and in
particular Linux) to integrate into a Microsoft network both as a
client and a server. Acting as a server it allows Windows 95, Windows
for Workgroups, DOS and Windows NT clients to access Linux files and
printing services. It can completely replace Windows NT for file and
printing services, including the automatic downloading of printer
drivers to clients. Acting as a client allows the Linux workstation to
locally mount exported Windows file shares.
According to the SAMBA Meta-FAQ:
"Many users report that compared to other SMB implementations Samba is more stable,
faster, and compatible with more clients. Administrators of some large installations say
that Samba is the only SMB server available which will scale to many tens of thousands
of users without crashing"
 * Samba project home page <http://samba.anu.edu.au/samba/>
 * SMB HOWTO <http://metalab.unc.edu/mdw/HOWTO/SMB-HOWTO.html>
 * Printing HOWTO <http://metalab.unc.edu/mdw/HOWTO/Printing-HOWTO.html>
<glossentry>
<glossterm>
samba
</glossterm>
<glossdef>
<para>
A LanManager like file and printer server for Unix. The Samba software suite is a collection of programs that implements the SMB protocol for unix systems, allowing you to serve files and printers to Windows, NT, OS/2 and DOS clients. This protocol is sometimes also referred to as the LanManager or NetBIOS protocol. This package contains all the components necessary to turn your Debian GNU/Linux box into a powerful file and printer server. Currently, the Samba Debian packages consist of the following:
samba - A LanManager like file and printer server for Unix.
samba-common - Samba common files used by both the server and the client.
smbclient - A LanManager like simple client for Unix.
swat - Samba Web Administration Tool.
samba-doc - Samba documentation.
smbfs - Mount and umount commands for the smbfs (kernels 2.0.x and above).
libpam-smbpass - Pluggable authentication module for the SMB password database.
libsmbclient - Shared library that allows applications to talk to SMB servers.
libsmbclient-dev - libsmbclient shared libraries.
winbind - Service to resolve user and group information from Windows NT servers.
It is possible to install a subset of these packages depending on your particular needs. For example, to access other SMB servers you should only need the smbclient and samba-common packages. From Debian 3.0r0 APT
<ulink url="http://www.tldp.org/LDP/Linux-Dictionary/html/index.html">http://www.tldp.org/LDP/Linux-Dictionary/html/index.html</ulink>
</para>
</glossdef>
</glossentry>
<glossentry>
<glossterm>
Samba
</glossterm>
<glossdef>
<para>
A lot of emphasis has been placed on peaceful coexistence between UNIX and Windows. Unfortunately, the two systems come from very different cultures and they have difficulty getting along without mediation. ...and that, of course, is Samba&apos;s job. Samba &lt;http://samba.org/&gt; runs on UNIX platforms, but speaks to Windows clients like a native. It allows a UNIX system to move into a Windows ``Network Neighborhood&apos;&apos; without causing a stir. Windows users can happily access file and print services without knowing or caring that those services are being offered by a UNIX host. All of this is managed through a protocol suite which is currently known as the ``Common Internet File System,&apos;&apos; or CIFS &lt;http://www.cifs.com&gt;. This name was introduced by Microsoft, and provides some insight into their hopes for the future. At the heart of CIFS is the latest incarnation of the Server Message Block (SMB) protocol, which has a long and tedious history. Samba is an open source CIFS implementation, and is available for free from the http://samba.org/ mirror sites. Samba and Windows are not the only ones to provide CIFS networking. OS/2 supports SMB file and print sharing, and there are commercial CIFS products for Macintosh and other platforms (including several others for UNIX). Samba has been ported to a variety of non-UNIX operating systems, including VMS, AmigaOS, and NetWare. CIFS is also supported on dedicated file server platforms from a variety of vendors. In other words, this stuff is all over the place. From Rute-Users-Guide
<ulink url="http://www.tldp.org/LDP/Linux-Dictionary/html/index.html">http://www.tldp.org/LDP/Linux-Dictionary/html/index.html</ulink>
</para>
</glossdef>
</glossentry>
<glossentry>
<glossterm>
Samba
</glossterm>
<glossdef>
<para>
Samba adds Windows-networking support to UNIX. Whereas NFS is the most popular protocol for sharing files among UNIX machines, SMB is the most popular protocol for sharing files among Windows machines. The Samba package adds the ability for UNIX systems to interact with Windows systems. Key point: The Samba package comprises the following:
smbd - The Samba service allowing other machines (often Windows) to read files from a UNIX machine.
nmbd - Provides support for NetBIOS. Logically, the SMB protocol is layered on top of NetBIOS, which is in turn layered on top of TCP/IP.
smbmount - An extension to the mount program that allows a UNIX machine to connect to another machine implicitly. Files can be accessed as if they were located on the local machine.
smbclient - Allows files to be accessed through SMB in an explicit manner. This is a command-line tool much like the FTP tool that allows files to be copied. Unlike smbmount, files cannot be accessed as if they were local.
smb.conf - The configuration file for Samba. From Hacking-Lexicon
<ulink url="http://www.tldp.org/LDP/Linux-Dictionary/html/index.html">http://www.tldp.org/LDP/Linux-Dictionary/html/index.html</ulink>
</para>
</glossdef>
</glossentry>
Samba Authenticated Gateway HOWTO
Ricardo Alexandre Mattar
v1.2, 2004-05-21
</sect1 id="SAMBA">
<sect1 id="SSH">
<title>SSH</title>
<para>
The Secure Shell, or SSH, provides a way of running command line and
graphical applications, and transferring files, over an encrypted
connection. SSH uses up to 2,048-bit encryption with a variety of
cryptographic schemes to make sure that if a cracker intercepts your
connection, all they can see is useless gibberish. It is both a
protocol and a suite of small command line applications which can be
used for various functions.
</para>
<para>
SSH replaces the old Telnet application, and can be used for secure
remote administration of machines across the Internet. However, it
has more features.
</para>
<para>
SSH increases the ease of running applications remotely by setting up
permissions automatically. If you can log into a machine, it allows you
to run a graphical application on it, unlike Telnet, which requires users
to type lots of geeky xhost and xauth commands. SSH also has built-in
compression, which allows your graphical applications to run much faster
over the network.
</para>
<para>
SCP (Secure Copy) and SFTP (Secure FTP) allow transfer of files over the
remote link, either via SSH's own command line utilities or graphical tools
like Gnome's GFTP. Like Telnet, SSH is cross-platform. You can find SSH
servers and clients for Linux, Unix, all flavours of Windows, BeOS, PalmOS,
Java and Embedded OSes used in routers.
</para>
<para>
Encrypted remote shell sessions are available through SSH
(http://www.ssh.fi/sshprotocols2/index.html
<http://www.ssh.fi/sshprotocols2/index.html>) thus effectively
allowing secure remote administration.
</para>
</sect1 id="SSH">
<sect1 id="Telnet">
<title>Telnet</title>
<para>
Created in the early 1970s, Telnet provides a method of running command
line applications on a remote computer as if that person were actually at
the remote site. Telnet is one of the most powerful tools for Unix, allowing
for true remote administration. It is also an interesting program from the
point of view of users, because it allows remote access to all their files
and programs from anywhere in the Internet. Combined with an X server (as
well as some rather arcane manipulation of authentication 'cookies' and
'DISPLAY' environment variables), there is no difference (apart from the
delay) between being at the console or on the other side of the planet.
However, since the 'telnet' protocol sends data 'en clair' (unencrypted),
and there are now more efficient protocols with features such as built-in
compression and 'tunneling' (which make it easier to use graphical
applications across the network) as well as more secure connections, it is
effectively a dead protocol. Like the 'r' protocols (such as rlogin and
rsh), it is still used within internal networks, for reasons of ease of
installation and use as well as backwards compatibility, and also as a
means by which to configure networking devices such as routers
and firewalls.
</para>
<para>
Please consult RFC 854 for further details behind its implementation.
</para>
<para>
<20> Telnet related software
<http://metalab.unc.edu/pub/Linux/system/network/telnet/>
</para>
</sect1 id="Telnet">
<sect1 id="TFTP">
<title>TFTP</title>
<para>
Trivial File Transfer Protocol TFTP is a bare-bones protocol used by
devices that boot from the network. It runs on top of UDP, so it
doesn&apos;t require a real TCP/IP stack. Misunderstanding: Many people
describe TFTP as simply a trivial version of FTP without authentication.
This misses the point. The purpose of TFTP is not to reduce the complexity
of file transfer, but to reduce the complexity of the underlying TCP/IP
stack so that it can fit inside boot ROMs. Key point: TFTP is almost
always used with BOOTP. BOOTP first configures the device, then TFTP
transfers the boot image named by BOOTP which is then used to boot the
device. Key point: Many systems come with unnecessary TFTP servers. Many
TFTP servers have bugs, like the backtracking problem or buffer overflows.
As a consequence, many systems can be exploited with TFTP even though
virtually nobody really uses it. Key point: A TFTP file transfer client
is built into many operating systems (UNIX, Windows, etc....). These clients
are often used to download rootkits when being broken into. Therefore,
removing the TFTP client should be part of your hardening procedure.
For further details on the TFTP protocol please see RFC's 1350, 1782,
1783, 1784, and 1785.
</para>
<para>
Most likely, you'll interface with the TFTP protocol using the TFTP command
line client, 'tftp', which allows users to transfer files to and from a
remote machine. The remote host may be specified on the command line, in
which case tftp uses host as the default host for future transfers.
</para>
<para>
Setting up TFTP is almost as easy as DHCP.
First install from the rpm package:
<screen>
# rpm -ihv tftp-server-*.rpm
</screen>
</para>
<para>
Create a directory for the files:
<screen>
# mkdir /tftpboot
# chown nobody:nobody /tftpboot
</screen>
</para>
<para>
The directory /tftpboot is owned by user nobody, because this is the default
user id set up by tftpd to access the files. Edit the file /etc/xinetd.d/tftp
to look like the following:
</para>
<para>
<screen>
service tftp
{
socket_type = dgram
protocol = udp
wait = yes
user = root
server = /usr/sbin/in.tftpd
server_args = -c -s /tftpboot
disable = no
per_source = 11
cps = 100 2
}
</screen>
</para>
<para>
The changes from the default file are the parameter disable = no (to enable
the service) and the server argument -c. This argument allows for the
creation of files, which is necessary if you want to save boot or disk
images. You may want to make TFTP read only in normal operation.
</para>
<para>
Then reload xinetd:
<screen>
/etc/rc.d/init.d/xinetd reload
</screen>
</para>
<para>
You can use the tftp command, available from the tftp (client) rpm package,
to test the server. At the tftp prompt, you can issue the commands put and
get.
</para>
</sect1 id="TFTP">
<sect1 id="VNC">
<title>VNC</title>
8.13. Tunnelling, mobile IP and virtual private networks
The Linux kernel allows the tunnelling (encapsulation) of protocols.
It can do IPX tunnelling through IP, allowing the connection of two
IPX networks through an IP only link. It can also do IP-IP tunnelling,
which is essential for mobile IP support, multicast support and
amateur radio. (see
http://metalab.unc.edu/mdw/HOWTO/NET3-4-HOWTO-6.html#ss6.8)
Mobile IP specifies enhancements that allow transparent routing of IP
datagrams to mobile nodes in the Internet. Each mobile node is always
identified by its home address, regardless of its current point of
attachment to the Internet. While situated away from its home, a
mobile node is also associated with a care-of address, which provides
information about its current point of attachment to the Internet.
The protocol provides for registering the care-of address with a home
agent. The home agent sends datagrams destined for the mobile node
through a tunnel to the care-of address. After arriving at the end of
the tunnel, each datagram is then delivered to the mobile node.
Point-to-Point Tunneling Protocol (PPTP) is a networking technology
that allows the use of the Internet as a secure virtual private
network (VPN). PPTP is integrated with the Remote Access Services
(RAS) server which is built into Windows NT Server. With PPTP, users
can dial into a local ISP, or connect directly to the Internet, and
access their network as if they were at their desks. PPTP is a closed
protocol and its security has recently been compromised. It is highly
recommended to use other Linux-based alternatives, since they rely on
open standards which have been carefully examined and tested.
 * A client implementation of the PPTP for Linux is available here
   <http://www.pdos.lcs.mit.edu/~cananian/Projects/PPTP/>
 * More on Linux PPTP can be found here
   <http://bmrc.berkeley.edu/people/chaffee/linux_pptp.html>
Mobile IP:
 * http://www.hpl.hp.com/personal/Jean_Tourrilhes/MobileIP/mip.html
 * http://metalab.unc.edu/mdw/HOWTO/NET3-4-HOWTO-6.html#ss6.12
Virtual Private Networks related documents:
 * http://metalab.unc.edu/mdw/HOWTO/mini/VPN.html
 * http://sites.inka.de/sites/bigred/devel/cipe.html
7.4. VNC
VNC stands for Virtual Network Computing. It is, in essence, a remote
display system which allows one to view a computing 'desktop'
environment not only on the machine where it is running, but from
anywhere on the Internet and from a wide variety of machine
architectures. Both clients and servers exist for Linux as well as for
many other platforms. It is possible to execute MS-Word in a Windows
NT or 95 machine and have the output displayed in a Linux machine. The
opposite is also true; it is possible to execute an application in a
Linux machine and have the output displayed in any other Linux or
Windows machine. One of the available clients is a Java applet,
allowing the remote display to be run inside a web browser. Another
client is a port for Linux using the SVGAlib graphics library,
allowing 386s with as little as 4 MB of RAM to become fully functional
X-Terminals.
 * VNC web site <http://www.orl.co.uk/vnc/>
<para>
Virtual Network Computing (VNC) allows a user to operate a session running on another machine.
Although Linux and all other Unix-like OSes already have this functionality built in, VNC
provides further advantages because it's cross-platform, running on Linux, BSD, Unix, Win32,
MacOS, and PalmOS. This makes it far more versatile.
For example, let's assume the machine that you are attempting to connect to is running Linux.
You can use VNC to access applications running on that other Linux desktop. You can also use
VNC to provide technical support to users on Windows-based machines by taking control of
their desktops from the comfort of your server room. VNC is usually installed as separate
packages for the client and server, typically named 'vnc' and 'vnc-server'.
VNC uses screen numbers to connect clients to servers. This is because Unix machines allow
multiple graphical sessions to be started simultaneously (check this out by logging in to a
virtual terminal and typing startx -- :1).
For platforms (Windows, MacOS, Palm, etc) which don't have this capability, you'll connect
to 'screen 0' and take over the session of the existing user. For Unix systems, you'll need
to specify a higher number and receive a new desktop.
If you prefer the Windows-style approach where the VNC client takes over the currently
running display, you can use x0rfbserver - see the sidebox below.
VNC Servers and Clients
On Linux, the VNC server (which allows the machine to be used remotely) is actually
run as a replacement X server. To be able to start a VNC session to a machine, log
into it and run vncserver. You'll be prompted for a password - in future you can
change this password with the vncpasswd command. After you enter the password, you'll
be told the display number of the newly created machine.
It is possible to control a remote machine by using the vncviewer command. If it is
typed on its own it will prompt for a remote machine, or you can use:
vncviewer [host]:[screen-number]
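For example (host name illustrative), starting a server and then connecting
to the newly created desktop might look like:
   $ vncserver
   New 'X' desktop is myhost:1
   $ vncviewer myhost:1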
 * The VPN HOWTO (deprecated)
 * VPN HOWTO
 * Linux VPN Masquerade HOWTO
</para>
10. References
10.1. Web Sites
Cipe Home Page <http://sites.inka.de/~bigred/devel/cipe.html>
Masq Home Page <http://ipmasq.cjb.net>
Samba Home Page <http://samba.anu.edu.au>
Linux HQ <http://www.linuxhq.com> ---great site for lots of linux
info
10.2. Documentation
cipe.info: info file included with cipe distribution
Firewall HOWTO, by Mark Grennan, markg@netplus.net
IP Masquerade mini-HOWTO,by Ambrose Au, ambrose@writeme.com
IPChains-Howto, by Paul Russell, Paul.Russell@rustcorp.com.au
</sect1 id="VNC">
<sect1 id="Web-Serving">
<title>Web-Serving</title>
<para>
The World Wide Web provides a simple method of publishing and linking
information across the Internet, and is responsible for popularising
the Internet to its current level. In the simplest case, a Web client
(or browser), such as Netscape or Internet Explorer, connects with a
Web server using a simple request/response protocol called HTTP
(Hypertext Transfer Protocol), and requests HTML (Hypertext Markup
Language) pages, images, Flash and other objects.
</para>
<para>
In more modern situations, the Web server can also generate pages
dynamically based on information returned from the user. Either way,
setting up your own Web server is extremely simple. There are many
choices for Web serving under Linux. Some servers are very mature,
such as Apache, and are perfect for small and large sites alike.
Other servers are programmed to be light and fast, with only a
limited feature set to reduce complexity. A search on freshmeat.net
will reveal a multitude of servers.
</para>
<para>
Most Linux distributions include Apache <http://www.apache.org>.
Apache is the number one server on the internet according to
http://www.netcraft.co.uk/survey/. More than half of all Internet
sites are running Apache or one of its derivatives. Apache's advantages
include its modular design, stability and speed. Given the appropriate
hardware and configuration it can support the highest loads: Yahoo,
Altavista, GeoCities, and Hotmail are based on customized versions of
this server.
</para>
<para>
Optional support for SSL (which enables secure transactions) is also
available at:
</para>
 * http://www.apache-ssl.org/
 * http://raven.covalent.net/
 * http://www.c2.net/
Dynamic Web content generation
<para>
Web scripting languages are even more common on Linux than databases
- basically, every language is available. This includes CGI,
PHP 3 and 4, Perl, JSP, ASP (via closed source applications from
Chili!Soft and Halcyon Software) and ColdFusion.
</para>
<para>
PHP is an open source scripting language designed to churn out
dynamically produced Web content ranging from databases to browsers.
This includes not only HTML, but also graphics, Macromedia Flash and
XML-based information. The latest versions of PHP provide impressive
speed improvements, install easily from packages and can be set up
quickly. PHP is the most popular Apache module and is used by over
two million sites, including Amazon.com, US telco giant Sprint,
Xoom Networks and Lycos. And unlike most other server side scripting
languages, developers (or those that employ them) can add their own
functions into the source to improve it. Supported databases include
those in the Database serving section and most ODBC compliant
databases. The language itself borrows its structure from Perl and C.
</para>
 * http://metalab.unc.edu/mdw/HOWTO/WWW-HOWTO.html
 * http://metalab.unc.edu/mdw/HOWTO/Virtual-Services-HOWTO.html
 * http://metalab.unc.edu/mdw/HOWTO/Intranet-Server-HOWTO.html
 * Web servers for Linux
   <http://www.linuxlinks.com/Software/Internet/WebServers/>
</sect1 id="Web-Serving">
<sect1 id="X11">
<title>X11</title>
<para>
The X Window System was developed at MIT in the late 1980s, rapidly
becoming the industry standard windowing system for Unix graphics
workstations. The software is freely available, very versatile, and is
suitable for a wide range of hardware platforms. Any X environment
consists of two distinct parts, the X server and one or more X
clients. It is important to realise the distinction between the server
and the client. The server controls the display directly and is
responsible for all input/output via the keyboard, mouse or display.
The clients, on the other hand, do not access the screen directly -
they communicate with the server, which handles all input and output.
It is the clients which do the "real" computing work - running
applications or whatever. The clients communicate with the server,
causing the server to open one or more windows to handle input and
output for that client.
</para>
<para>
In short, the X Window System allows a user to log in to a remote
machine, execute a process (for example, open a web browser) and have
the output displayed on his own machine. Because the process is
actually being executed on the remote system, very little CPU power is
needed in the local one. Indeed, computers exist whose primary purpose
is to act as pure X servers. Such systems are called X terminals.
</para>
<para>
A free port of the X Window System exists for Linux and can be found
at: Xfree <http://www.xfree86.org/>. It is included in most Linux
distributions.
</para>
<para>
For further information regarding X please see:
</para>
X11, LBX, DXPC, NXServer, SSH, MAS
Related HOWTOs:
 * Remote X Apps HOWTO
 * Linux XDMCP HOWTO
 * XDM and X Terminal mini-HOWTO
 * The Linux XFree86 HOWTO
 * ATI R200 + XFree86 4.x mini-HOWTO
 * Second Mouse in X mini-HOWTO
 * Linux Touch Screen HOWTO
 * XFree86 Video Timings HOWTO
 * Linux XFree-to-Xinside mini-HOWTO
 * XFree Local Multi-User HOWTO
 * Using Xinerama to MultiHead XFree86 V. 4.0+
 * Connecting X Terminals to Linux Mini-HOWTO
 * How to change the title of an xterm
 * X Window System Architecture Overview HOWTO
 * The X Window User HOWTO
</sect1 id="X11">
<sect1 id="Email">
<title>Email</title>
<para>
Alongside the Web, mail is the top reason for the popularity of the Internet. Email is an inexpensive and fast method of time-shifted messaging which, much like the Web, is actually based around sending and receiving plain text files. The protocol used is called the Simple Mail Transfer Protocol (SMTP). The server programs that implement SMTP to move mail from one server to another are called Mail Transfer Agents (MTAs).
</para>
<para>
In times gone by, users would Telnet into the SMTP server itself and use a command line program like elm or pine to check their mail. These days, users run email clients like Netscape, Evolution, Kmail or Outlook on their desktop to check their email off a local SMTP server. Additional protocols like POP3 and IMAP4 are used between the SMTP server and desktop mail client to allow clients to manipulate files on, and download from, their local mail server. The programs that implement POP3 and IMAP4 are called Mail Delivery Agents (MDAs). They are generally separate from MTAs.
</para>
* Linux Mail-Queue mini-HOWTO
* The Linux Mail User HOWTO
</sect1 id="Email-Hosting">
<sect1 id="Proxy-Caching">
8.11. Proxy Server
The term proxy means "to do something on behalf of someone else." In
networking terms, a proxy server computer can act on the behalf of
several clients. An HTTP proxy is a machine that receives requests for
web pages from another machine (Machine A). The proxy gets the page
requested and returns the result to Machine A. The proxy may have a
cache with the requested pages, so if another machine asks for the
same page the copy in the cache will be returned instead. This allows
efficient use of bandwidth resources and less response time. As a side
effect, as client machines are not directly connected to the outside
world this is a way of securing the internal network. A well-
configured proxy can be as effective as a good firewall.
Several proxy servers exist for Linux. One popular solution is the
Apache proxy module. A more complete and robust implementation of an
HTTP proxy is SQUID.
 * Apache <http://www.apache.org>
 * Squid <http://squid.nlanr.net/>
<para>
When a web browser retrieves information from the Internet, it stores a copy of that information
in a cache on the local machine. When a user requests that information in future, the browser will
check to see if the original source has been updated; if not, the browser will simply use the cached version
rather than fetch the data again.
By doing this, there is less information that needs to be downloaded, which makes the connection seem more responsive
to users and reduces bandwidth costs.
But if there are many browsers accessing the Internet through the same connection, it makes better sense to have
a single, centralised cache so that once a single machine has requested some information, the next
machine to try and download that information can also access it more quickly. This is the
theory behind the proxy cache. Squid is by far the most popular cache used on the Web, and can also be used
to accelerate Web serving.
Although Squid is useful for an ISP, large businesses or even a small office can afford to use Squid to
speed up transfers and save money, and it can easily be used to the same effect in a home with a few
flatmates sharing a cable or ADSL connection.
</para>
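<para>
A minimal Squid configuration illustrating the idea might contain lines like
these in squid.conf (port, cache size and network range are examples only):
<screen>
http_port 3128
cache_mem 32 MB
acl localnet src 192.168.1.0/255.255.255.0
http_access allow localnet
http_access deny all
</screen>
</para>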
Traffic Control HOWTO
Version 1.0.1
Martin A. Brown
[http://www.securepipe.com/] SecurePipe, Inc.
Network Administration
<mabrown@securepipe.com>
"Nov 2003"
Revision History
Revision 1.0.1 2003-11-17 Revised by: MAB
Added link to Leonardo Balliache's documentation
Revision 1.0 2003-09-24 Revised by: MAB
reviewed and approved by TLDP
Revision 0.7 2003-09-14 Revised by: MAB
incremental revisions, proofreading, ready for TLDP
Revision 0.6 2003-09-09 Revised by: MAB
minor editing, corrections from Stef Coene
Revision 0.5 2003-09-01 Revised by: MAB
HTB section mostly complete, more diagrams, LARTC pre-release
Revision 0.4 2003-08-30 Revised by: MAB
added diagram
Revision 0.3 2003-08-29 Revised by: MAB
substantial completion of classless, software, rules, elements and components
sections
Revision 0.2 2003-08-23 Revised by: MAB
major work on overview, elements, components and software sections
Revision 0.1 2003-08-15 Revised by: MAB
initial revision (outline complete)
Traffic control encompasses the sets of mechanisms and operations by which
packets are queued for transmission/reception on a network interface. The
operations include enqueuing, policing, classifying, scheduling, shaping and
dropping. This HOWTO provides an introduction and overview of the
capabilities and implementation of traffic control under Linux.
© 2003, Martin A. Brown
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or any
later version published by the Free Software Foundation; with no
invariant sections, with no Front-Cover Texts, with no Back-Cover Text. A
copy of the license is located at [http://www.gnu.org/licenses/fdl.html]
http://www.gnu.org/licenses/fdl.html.
-----------------------------------------------------------------------------
Table of Contents
1. Introduction to Linux Traffic Control
1.1. Target audience and assumptions about the reader
1.2. Conventions
1.3. Recommended approach
1.4. Missing content, corrections and feedback
2. Overview of Concepts
2.1. What is it?
2.2. Why use it?
2.3. Advantages
2.4. Disadvantages
2.5. Queues
2.6. Flows
2.7. Tokens and buckets
2.8. Packets and frames
3. Traditional Elements of Traffic Control
3.1. Shaping
3.2. Scheduling
3.3. Classifying
3.4. Policing
3.5. Dropping
3.6. Marking
4. Components of Linux Traffic Control
4.1. qdisc
4.2. class
4.3. filter
4.4. classifier
4.5. policer
4.6. drop
4.7. handle
5. Software and Tools
5.1. Kernel requirements
5.2. iproute2 tools (tc)
5.3. tcng, Traffic Control Next Generation
5.4. IMQ, Intermediate Queuing device
6. Classless Queuing Disciplines (qdiscs)
6.1. FIFO, First-In First-Out (pfifo and bfifo)
6.2. pfifo_fast, the default Linux qdisc
6.3. SFQ, Stochastic Fair Queuing
6.4. ESFQ, Extended Stochastic Fair Queuing
6.5. GRED, Generic Random Early Drop
6.6. TBF, Token Bucket Filter
7. Classful Queuing Disciplines (qdiscs)
7.1. HTB, Hierarchical Token Bucket
7.2. PRIO, priority scheduler
7.3. CBQ, Class Based Queuing
8. Rules, Guidelines and Approaches
8.1. General Rules of Linux Traffic Control
8.2. Handling a link with a known bandwidth
8.3. Handling a link with a variable (or unknown) bandwidth
8.4. Sharing/splitting bandwidth based on flows
8.5. Sharing/splitting bandwidth based on IP
9. Scripts for use with QoS/Traffic Control
9.1. wondershaper
9.2. ADSL Bandwidth HOWTO script (myshaper)
9.3. htb.init
9.4. tcng.init
9.5. cbq.init
10. Diagram
10.1. General diagram
11. Annotated Traffic Control Links
1. Introduction to Linux Traffic Control
Linux offers a very rich set of tools for managing and manipulating the
transmission of packets. The larger Linux community is very familiar with the
tools available under Linux for packet mangling and firewalling (netfilter,
and before that, ipchains) as well as hundreds of network services which can
run on the operating system. Few inside the community and fewer outside the
Linux community are aware of the tremendous power of the traffic control
subsystem which has grown and matured under kernels 2.2 and 2.4.
This HOWTO purports to introduce the concepts of traffic control, the
traditional elements (in general), the components of the Linux traffic
control implementation and provide some guidelines. This HOWTO represents
the collection, amalgamation and synthesis of the [http://lartc.org/howto/]
LARTC HOWTO, documentation from individual projects and importantly the LARTC
mailing list over a period of study.
The impatient soul, who simply wishes to experiment right now, is
recommended to the [http://tldp.org/HOWTO/Traffic-Control-tcng-HTB-HOWTO/]
Traffic Control using tcng and HTB HOWTO and [http://lartc.org/howto/] LARTC
HOWTO for immediate satisfaction.
-----------------------------------------------------------------------------
1.1. Target audience and assumptions about the reader
The target audience for this HOWTO is the network administrator or savvy
home user who desires an introduction to the field of traffic control and an
overview of the tools available under Linux for implementing traffic control.
I assume that the reader is comfortable with UNIX concepts and the command
line and has a basic knowledge of IP networking. Users who wish to implement
traffic control may require the ability to patch, compile and install a
kernel or software package [1]. For users with newer kernels (2.4.20+, see
also Section 5.1), however, the ability to install and use software may be
all that is required.
Broadly speaking, this HOWTO was written with a sophisticated user in mind,
perhaps one who has already had experience with traffic control under Linux.
I assume that the reader may have no prior traffic control experience.
-----------------------------------------------------------------------------
1.2. Conventions
This text was written in [http://www.docbook.org/] DocBook ([http://
www.docbook.org/xml/4.2/index.html] version 4.2) with vim. All formatting has
been applied by [http://xmlsoft.org/XSLT/] xsltproc based on DocBook XSL and
LDP XSL stylesheets. Typeface formatting and display conventions are similar
to most printed and electronically distributed technical documentation.
-----------------------------------------------------------------------------
1.3. Recommended approach
I strongly recommend to the eager reader making a first foray into the
discipline of traffic control, to become only casually familiar with the tc
command line utility, before concentrating on tcng. The tcng software package
defines an entire language for describing traffic control structures. At
first, this language may seem daunting, but mastery of these basics will
quickly provide the user with a much wider ability to employ (and deploy)
traffic control configurations than the direct use of tc would afford.
Where possible, I'll try to prefer describing the behaviour of the Linux
traffic control system in an abstract manner, although in many cases I'll
need to supply the syntax of one or the other common systems for defining
these structures. I may not supply examples in both the tcng language and the
tc command line, so the wise user will have some familiarity with both.
-----------------------------------------------------------------------------
1.4. Missing content, corrections and feedback
There is content yet missing from this HOWTO. In particular, the following
items will be added at some point to this documentation.
 * A description and diagram of GRED, WRR, PRIO and CBQ.
 * A section of examples.
 * A section detailing the classifiers.
 * A section discussing the techniques for measuring traffic.
 * A section covering meters.
 * More details on tcng.
I welcome suggestions, corrections and feedback at <mabrown@securepipe.com
>. All errors and omissions are strictly my fault. Although I have made every
effort to verify the factual correctness of the content presented herein, I
cannot accept any responsibility for actions taken under the influence of
this documentation.
-----------------------------------------------------------------------------
2. Overview of Concepts
This section will introduce traffic control and examine reasons for it,
identify a few advantages and disadvantages and introduce key concepts used
in traffic control.
-----------------------------------------------------------------------------
2.1. What is it?
Traffic control is the name given to the sets of queuing systems and
mechanisms by which packets are received and transmitted on a router. This
includes deciding which (and whether) packets to accept at what rate on the
input of an interface and determining which packets to transmit in what order
at what rate on the output of an interface.
In the overwhelming majority of situations, traffic control consists of a
single queue which collects entering packets and dequeues them as quickly as
the hardware (or underlying device) can accept them. This sort of queue is a
FIFO.
Note: The default qdisc under Linux is the pfifo_fast, which is slightly more
complex than the FIFO.
There are examples of queues in all sorts of software. The queue is a way
of organizing the pending tasks or data (see also Section 2.5). Because
network links typically carry data in a serialized fashion, a queue is
required to manage the outbound data packets.
In the case of a desktop machine and an efficient webserver sharing the
same uplink to the Internet, the following contention for bandwidth may
occur. The web server may be able to fill up the output queue on the router
faster than the data can be transmitted across the link, at which point the
router starts to drop packets (its buffer is full!). Now, the desktop machine
(with an interactive application user) may be faced with packet loss and high
latency. Note that high latency sometimes leads to screaming users! By
separating the internal queues used to service these two different classes of
application, there can be better sharing of the network resource between the
two applications.
Traffic control is the set of tools which allows the user to have granular
control over these queues and the queuing mechanisms of a networked device.
The power to rearrange traffic flows and packets with these tools is
tremendous and can be complicated, but is no substitute for adequate
bandwidth.
The term Quality of Service (QoS) is often used as a synonym for traffic
control.
-----------------------------------------------------------------------------
2.2. Why use it?
Packet-switched networks differ from circuit based networks in one very
important regard. A packet-switched network itself is stateless. A
circuit-based network (such as a telephone network) must hold state within
the network. IP networks are stateless and packet-switched networks by
design; in fact, this statelessness is one of the fundamental strengths of
IP.
The weakness of this statelessness is the lack of differentiation between
types of flows. In simplest terms, traffic control allows an administrator to
queue packets differently based on attributes of the packet. It can even be
used to simulate the behaviour of a circuit-based network. This introduces
statefulness into the stateless network.
There are many practical reasons to consider traffic control, and many
scenarios in which using traffic control makes sense. Below are some examples
of common problems which can be solved or at least ameliorated with these
tools.
The list below is not an exhaustive list of the sorts of solutions
available to users of traffic control, but introduces the types of problems
that can be solved by using traffic control to maximize the usability of a
network connection.
Common traffic control solutions
 * Limit total bandwidth to a known rate; TBF, HTB with child class(es)
   (see the sketch below this list).
 * Limit the bandwidth of a particular user, service or client; HTB
   classes and classifying with a filter.
 * Maximize TCP throughput on an asymmetric link; prioritize transmission
   of ACK packets, wondershaper.
 * Reserve bandwidth for a particular application or user; HTB with
   children classes and classifying.
 * Prefer latency sensitive traffic; PRIO inside an HTB class.
 * Manage oversubscribed bandwidth; HTB with borrowing.
 * Allow equitable distribution of unreserved bandwidth; HTB with
   borrowing.
 * Ensure that a particular type of traffic is dropped; policer attached
   to a filter with a drop action.
Remember, too that sometimes, it is simply better to purchase more
bandwidth. Traffic control does not solve all problems!
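As a concrete sketch of the first two items above (interface name and rates
are examples only, and the syntax is that of the tc utility described later
in this HOWTO):
   tc qdisc add dev eth0 root handle 1: htb default 10
   tc class add dev eth0 parent 1: classid 1:1 htb rate 1mbit
   tc class add dev eth0 parent 1:1 classid 1:10 htb rate 512kbit ceil 1mbit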
-----------------------------------------------------------------------------
2.3. Advantages
When properly employed, traffic control should lead to more predictable
usage of network resources and less volatile contention for these resources.
The network then meets the goals of the traffic control configuration. Bulk
download traffic can be allocated a reasonable amount of bandwidth even as
higher priority interactive traffic is simultaneously serviced. Even low
priority data transfer such as mail can be allocated bandwidth without
tremendously affecting the other classes of traffic.
In a larger picture, if the traffic control configuration represents policy
which has been communicated to the users, then users (and, by extension,
applications) know what to expect from the network.
-----------------------------------------------------------------------------
2.4. Disadvantages
Complexity is easily one of the most significant disadvantages of using
traffic control. There are ways to become familiar with traffic control tools
which ease the learning curve about traffic control and its mechanisms, but
identifying a traffic control misconfiguration can be quite a challenge.
Traffic control when used appropriately can lead to more equitable
distribution of network resources. It can just as easily be installed in an
inappropriate manner leading to further and more divisive contention for
resources.
The computing resources required on a router to support a traffic control
scenario need to be capable of handling the increased cost of maintaining the
traffic control structures. Fortunately, this is a small incremental cost,
but can become more significant as the configuration grows in size and
complexity.
For personal use, there's no training cost associated with the use of
traffic control, but a company may find that purchasing more bandwidth is a
simpler solution than employing traffic control. Training employees and
ensuring depth of knowledge may be more costly than investing in more
bandwidth.
Although traffic control on packet-switched networks covers a larger
conceptual area, you can think of traffic control as a way to provide [some
of] the statefulness of a circuit-based network to a packet-switched network.
-----------------------------------------------------------------------------
2.5. Queues
Queues form the backdrop for all of traffic control and are the integral
concept behind scheduling. A queue is a location (or buffer) containing a
finite number of items waiting for an action or service. In networking, a
queue is the place where packets (our units) wait to be transmitted by the
hardware (the service). In the simplest model, packets are transmitted on a
first-come first-serve basis [2]. In the discipline of computer networking
(and more generally computer science), this sort of a queue is known as a
FIFO.
Without any other mechanisms, a queue doesn't offer any promise for traffic
control. There are only two interesting actions in a queue. Anything entering
a queue is enqueued into the queue. To remove an item from a queue is to
dequeue that item.
 A queue becomes much more interesting when coupled with other mechanisms
which can delay, rearrange, drop and prioritize packets in multiple
queues. A queue can also use subqueues, which allow for complexity of
behaviour in a scheduling operation.
From the perspective of the higher layer software, a packet is simply
enqueued for transmission, and the manner and order in which the enqueued
packets are transmitted is immaterial to the higher layer. So, to the higher
layer, the entire traffic control system may appear as a single queue [3]. It
is only by examining the internals of this layer that the traffic control
structures become exposed and available.
-----------------------------------------------------------------------------
2.6. Flows
A flow is a distinct connection or conversation between two hosts. Any
unique set of packets between two hosts can be regarded as a flow. Under TCP
the concept of a connection with a source IP and port and destination IP and
port represents a flow. A UDP flow can be similarly defined.
Traffic control mechanisms frequently separate traffic into classes of
flows which can be aggregated and transmitted as an aggregated flow (consider
DiffServ). Alternate mechanisms may attempt to divide bandwidth equally based
on the individual flows.
Flows become important when attempting to divide bandwidth equally among a
set of competing flows, especially when some applications deliberately build
a large number of flows.
-----------------------------------------------------------------------------
2.7. Tokens and buckets
 Two of the key underpinnings of shaping mechanisms are the interrelated
concepts of tokens and buckets.
In order to control the rate of dequeuing, an implementation can count the
number of packets or bytes dequeued as each item is dequeued, although this
requires complex usage of timers and measurements to limit accurately.
Instead of calculating the current usage and time, one method, used widely in
traffic control, is to generate tokens at a desired rate, and only dequeue
packets or bytes if a token is available.
 Consider the analogy of an amusement park ride with a queue of people
waiting to experience the ride. Let's imagine carts traversing a fixed
track. The carts arrive at the head of the queue at a fixed rate. In
order to enjoy the ride, each person must wait for an available cart. The
cart is analogous to a token and the person is analogous to a packet. Again,
this mechanism is a rate-limiting or shaping mechanism. Only a certain number
of people can experience the ride in a particular period.
To extend the analogy, imagine an empty line for the amusement park ride
and a large number of carts sitting on the track ready to carry people. If a
large number of people entered the line together many (maybe all) of them
could experience the ride because of the carts available and waiting. The
number of carts available is a concept analogous to the bucket. A bucket
contains a number of tokens, and all of the tokens in the bucket can be
used without regard for the passage of time.
And to complete the analogy, the carts on the amusement park ride (our
tokens) arrive at a fixed rate and are only kept available up to the size of
the bucket. So, the bucket is filled with tokens according to the rate, and
if the tokens are not used, the bucket can fill up. If tokens are used the
bucket will not fill up. Buckets are a key concept in supporting bursty
traffic such as HTTP.
The TBF qdisc is a classical example of a shaper (the section on TBF
includes a diagram which may help to visualize the token and bucket
concepts). The TBF generates rate tokens and only transmits packets when a
token is available. Tokens are a generic shaping concept.
In the case that a queue does not need tokens immediately, the tokens can
be collected until they are needed. To collect tokens indefinitely would
negate any benefit of shaping so tokens are collected until a certain number
of tokens has been reached. Now, the queue has tokens available for a large
number of packets or bytes which need to be dequeued. These intangible tokens
are stored in an intangible bucket, and the number of tokens that can be
stored depends on the size of the bucket.
This also means that a bucket full of tokens may be available at any
instant. Very predictable regular traffic can be handled by small buckets.
Larger buckets may be required for burstier traffic, unless one of the
desired goals is to reduce the burstiness of the flows.
In summary, tokens are generated at rate, and a maximum of a bucket's worth
of tokens may be collected. This allows bursty traffic to be handled, while
smoothing and shaping the transmitted traffic.
The concepts of tokens and buckets are closely interrelated and are used in
both TBF (one of the classless qdiscs) and HTB (one of the classful qdiscs).
Within the tcng language, the use of two- and three-color meters is
indubitably a token and bucket concept.
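 As a concrete, hedged sketch of tokens and buckets in practice, the single
tc command below attaches the TBF qdisc described in Section 6.6; the rate,
bucket size (burst) and latency values are purely illustrative.
    [root@leander]# tc qdisc add dev eth0 root tbf rate 256kbit burst 16kb latency 50ms
 Here, tokens accrue at 256kbit/s, the bucket holds up to 16kB worth of
tokens (permitting a burst of that size after an idle period), and a packet
may wait no longer than 50ms in the queue for tokens.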
-----------------------------------------------------------------------------
2.8. Packets and frames
 The term for data sent across a network changes depending on the layer the
user is examining. This document will rather impolitely (and incorrectly)
gloss over the technical distinction between packets and frames, although
they are outlined here.
The word frame is typically used to describe a layer 2 (data link) unit of
data to be forwarded to the next recipient. Ethernet interfaces, PPP
interfaces, and T1 interfaces all name their layer 2 data unit a frame. The
frame is actually the unit on which traffic control is performed.
A packet, on the other hand, is a higher layer concept, representing layer
3 (network) units. The term packet is preferred in this documentation,
although it is slightly inaccurate.
-----------------------------------------------------------------------------
3. Traditional Elements of Traffic Control
-----------------------------------------------------------------------------
3.1. Shaping
Shapers delay packets to meet a desired rate.
Shaping is the mechanism by which packets are delayed before transmission
in an output queue to meet a desired output rate. This is one of the most
common desires of users seeking bandwidth control solutions. The act of
delaying a packet as part of a traffic control solution makes every shaping
mechanism into a non-work-conserving mechanism, meaning roughly: "Work is
required in order to delay packets."
Viewed in reverse, a non-work-conserving queuing mechanism is performing a
shaping function. A work-conserving queuing mechanism (see PRIO) would not be
capable of delaying a packet.
Shapers attempt to limit or ration traffic to meet but not exceed a
configured rate (frequently measured in packets per second or bits/bytes per
second). As a side effect, shapers can smooth out bursty traffic [4]. One of
the advantages of shaping bandwidth is the ability to control latency of
packets. The underlying mechanism for shaping to a rate is typically a token
and bucket mechanism. See also Section 2.7 for further detail on tokens and
buckets.
-----------------------------------------------------------------------------
3.2. Scheduling
Schedulers arrange and/or rearrange packets for output.
Scheduling is the mechanism by which packets are arranged (or rearranged)
between input and output of a particular queue. The overwhelmingly most
common scheduler is the FIFO (first-in first-out) scheduler. From a larger
perspective, any set of traffic control mechanisms on an output queue can be
regarded as a scheduler, because packets are arranged for output.
Other generic scheduling mechanisms attempt to compensate for various
networking conditions. A fair queuing algorithm (see SFQ) attempts to prevent
any single client or flow from dominating the network usage. A round-robin
algorithm (see WRR) gives each flow or client a turn to dequeue packets.
Other sophisticated scheduling algorithms attempt to prevent backbone
overload (see GRED) or refine other scheduling mechanisms (see ESFQ).
-----------------------------------------------------------------------------
3.3. Classifying
Classifiers sort or separate traffic into queues.
Classifying is the mechanism by which packets are separated for different
treatment, possibly different output queues. During the process of accepting,
routing and transmitting a packet, a networking device can classify the
packet a number of different ways. Classification can include marking the
packet, which usually happens on the boundary of a network under a single
administrative control, or classification can occur on each hop individually.
The Linux model (see Section 4.3) allows for a packet to cascade across a
series of classifiers in a traffic control structure and to be classified in
conjunction with policers (see also Section 4.5).
-----------------------------------------------------------------------------
3.4. Policing
Policers measure and limit traffic in a particular queue.
Policing, as an element of traffic control, is simply a mechanism by which
traffic can be limited. Policing is most frequently used on the network
border to ensure that a peer is not consuming more than its allocated
bandwidth. A policer will accept traffic to a certain rate, and then perform
an action on traffic exceeding this rate. A rather harsh solution is to drop
the traffic, although the traffic could be reclassified instead of being
dropped.
A policer is a yes/no question about the rate at which traffic is entering
a queue. If the packet is about to enter a queue below a given rate, take one
action (allow the enqueuing). If the packet is about to enter a queue above a
given rate, take another action. Although the policer uses a token bucket
mechanism internally, it does not have the capability to delay a packet as a
shaping mechanism does.
-----------------------------------------------------------------------------
3.5. Dropping
Dropping discards an entire packet, flow or classification.
Dropping a packet is a mechanism by which a packet is discarded.
-----------------------------------------------------------------------------
3.6. Marking
Marking is a mechanism by which the packet is altered.
 Note This is not fwmark. The iptables target MARK and the ipchains --mark
      option are used to modify packet metadata, not the packet itself.
Traffic control marking mechanisms install a DSCP on the packet itself,
which is then used and respected by other routers inside an administrative
domain (usually for DiffServ).
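 As a hedged sketch of marking (this document covers dsmark only in
passing), the commands below attach a dsmark qdisc and rewrite the DS field;
the mask and value shown are illustrative, with 0xb8 corresponding to the
EF DSCP. The mask of 0x3 preserves the low two (ECN) bits while the value
overwrites the DSCP portion of the byte.
    [root@leander]# tc qdisc add dev eth0 handle 1:0 root dsmark indices 4 default_index 0
    [root@leander]# tc class change dev eth0 classid 1:1 dsmark mask 0x3 value 0xb8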
-----------------------------------------------------------------------------
4. Components of Linux Traffic Control
Table 1. Correlation between traffic control elements and Linux components
+-------------------+-------------------------------------------------------+
|traditional element|Linux component |
+-------------------+-------------------------------------------------------+
|shaping |The class offers shaping capabilities. |
+-------------------+-------------------------------------------------------+
|scheduling |A qdisc is a scheduler. Schedulers can be simple such |
| |as the FIFO or complex, containing classes and other |
| |qdiscs, such as HTB. |
+-------------------+-------------------------------------------------------+
|classifying |The filter object performs the classification through |
| |the agency of a classifier object. Strictly speaking, |
| |Linux classifiers cannot exist outside of a filter. |
+-------------------+-------------------------------------------------------+
|policing |A policer exists in the Linux traffic control |
| |implementation only as part of a filter. |
+-------------------+-------------------------------------------------------+
|dropping |To drop traffic requires a filter with a policer which |
| |uses "drop" as an action. |
+-------------------+-------------------------------------------------------+
|marking |The dsmark qdisc is used for marking. |
+-------------------+-------------------------------------------------------+
-----------------------------------------------------------------------------
4.1. qdisc
Simply put, a qdisc is a scheduler (Section 3.2). Every output interface
needs a scheduler of some kind, and the default scheduler is a FIFO. Other
qdiscs available under Linux will rearrange the packets entering the
scheduler's queue in accordance with that scheduler's rules.
The qdisc is the major building block on which all of Linux traffic control
is built, and is also called a queuing discipline.
The classful qdiscs can contain classes, and provide a handle to which to
attach filters. There is no prohibition on using a classful qdisc without
child classes, although this will usually consume cycles and other system
resources for no benefit.
 A classless qdisc can contain no classes, nor is it possible to attach a
filter to a classless qdisc. Because a classless qdisc contains no children
of any kind, there is no utility to classifying, which is why no filter can
be attached.
A source of terminology confusion is the usage of the terms root qdisc and
ingress qdisc. These are not really queuing disciplines, but rather locations
onto which traffic control structures can be attached for egress (outbound
traffic) and ingress (inbound traffic).
Each interface contains both. The primary and more common is the egress
qdisc, known as the root qdisc. It can contain any of the queuing disciplines
(qdiscs) with potential classes and class structures. The overwhelming
majority of documentation applies to the root qdisc and its children. Traffic
transmitted on an interface traverses the egress or root qdisc.
For traffic accepted on an interface, the ingress qdisc is traversed. With
its limited utility, it allows no child class to be created, and only exists
as an object onto which a filter can be attached. For practical purposes, the
ingress qdisc is merely a convenient object onto which to attach a policer to
limit the amount of traffic accepted on a network interface.
In short, you can do much more with an egress qdisc because it contains a
real qdisc and the full power of the traffic control system. An ingress qdisc
can only support a policer. The remainder of the documentation will concern
itself with traffic control structures attached to the root qdisc unless
otherwise specified.
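 A brief, hedged sketch of the ingress qdisc in its usual role: modelled on
the common LARTC examples, the commands below attach the ingress qdisc and
a policer which drops all inbound IP traffic exceeding an illustrative
1mbit rate.
    [root@leander]# tc qdisc add dev eth0 handle ffff: ingress
    [root@leander]# tc filter add dev eth0 parent ffff: protocol ip prio 50 \
    >   u32 match ip src 0.0.0.0/0 police rate 1mbit burst 10k drop flowid :1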
-----------------------------------------------------------------------------
4.2. class
Classes only exist inside a classful qdisc (e.g., HTB and CBQ). Classes are
immensely flexible and can always contain either multiple children classes or
a single child qdisc [5]. There is no prohibition against a class containing
a classful qdisc itself, which facilitates tremendously complex traffic
control scenarios.
Any class can also have an arbitrary number of filters attached to it,
which allows the selection of a child class or the use of a filter to
reclassify or drop traffic entering a particular class.
A leaf class is a terminal class in a qdisc. It contains a qdisc (default
FIFO) and will never contain a child class. Any class which contains a child
class is an inner class (or root class) and not a leaf class.
-----------------------------------------------------------------------------
4.3. filter
The filter is the most complex component in the Linux traffic control
system. The filter provides a convenient mechanism for gluing together
several of the key elements of traffic control. The simplest and most obvious
role of the filter is to classify (see Section 3.3) packets. Linux filters
allow the user to classify packets into an output queue with either several
different filters or a single filter.
<EFBFBD><EFBFBD>*<2A> A filter must contain a classifier phrase.
<EFBFBD><EFBFBD>*<2A> A filter may contain a policer phrase.
 Filters can be attached either to classful qdiscs or to classes; however,
the enqueued packet always enters the root qdisc first. After the filter
attached to the root qdisc has been traversed, the packet may be directed to
any subclasses (which can have their own filters) where the packet may
undergo further classification.
-----------------------------------------------------------------------------
4.4. classifier
Filter objects, which can be manipulated using tc, can use several
different classifying mechanisms, the most common of which is the u32
classifier. The u32 classifier allows the user to select packets based on
attributes of the packet.
The classifiers are tools which can be used as part of a filter to identify
characteristics of a packet or a packet's metadata. The Linux classifier
object is a direct analogue to the basic operation and elemental mechanism of
traffic control classifying.
-----------------------------------------------------------------------------
4.5. policer
This elemental mechanism is only used in Linux traffic control as part of a
filter. A policer invokes one action for traffic above the specified rate
and another for traffic below it. Clever use of policers can simulate a
three-color meter. See
also Section 10.
Although both policing and shaping are basic elements of traffic control
for limiting bandwidth usage a policer will never delay traffic. It can only
perform an action based on specified criteria. See also Example 5.
-----------------------------------------------------------------------------
4.6. drop
This basic traffic control mechanism is only used in Linux traffic control
as part of a policer. Any policer attached to any filter could have a drop
action.
Note The only place in the Linux traffic control system where a packet can be
explicitly dropped is a policer. A policer can limit packets enqueued at
a specific rate, or it can be configured to drop all traffic matching a
particular pattern [6].
There are, however, places within the traffic control system where a packet
may be dropped as a side effect. For example, a packet will be dropped if the
scheduler employed uses this method to control flows as the GRED does.
Also, a shaper or scheduler which runs out of its allocated buffer space
may have to drop a packet during a particularly bursty or overloaded period.
-----------------------------------------------------------------------------
4.7. handle
Every class and classful qdisc (see also Section 7) requires a unique
identifier within the traffic control structure. This unique identifier is
known as a handle and has two constituent members, a major number and a minor
number. These numbers can be assigned arbitrarily by the user in accordance
with the following rules [7].
The numbering of handles for classes and qdiscs
major
This parameter is completely free of meaning to the kernel. The user
may use an arbitrary numbering scheme, however all objects in the traffic
control structure with the same parent must share a major handle number.
Conventional numbering schemes start at 1 for objects attached directly
to the root qdisc.
minor
This parameter unambiguously identifies the object as a qdisc if minor
is 0. Any other value identifies the object as a class. All classes
sharing a parent must have unique minor numbers.
The special handle ffff:0 is reserved for the ingress qdisc.
The handle is used as the target in classid and flowid phrases of tc filter
statements. These handles are external identifiers for the objects, usable by
userland applications. The kernel maintains internal identifiers for each
object.
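 The handle rules can be seen in a brief, hedged sketch; the numbers below
are arbitrary but conventional. The qdisc receives handle 1: (that is, 1:0,
with a minor number of zero), and every class beneath it shares major
number 1 with a unique non-zero minor number.
    [root@leander]# tc qdisc add dev eth0 root handle 1: htb
    [root@leander]# tc class add dev eth0 parent 1: classid 1:1 htb rate 1mbit
    [root@leander]# tc class add dev eth0 parent 1:1 classid 1:10 htb rate 512kbit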
-----------------------------------------------------------------------------
5. Software and Tools
-----------------------------------------------------------------------------
5.1. Kernel requirements
Many distributions provide kernels with modular or monolithic support for
traffic control (Quality of Service). Custom kernels may not already provide
support (modular or not) for the required features. If not, this is a very
brief listing of the required kernel options.
 The user who has little or no experience compiling a kernel is referred to
the Kernel HOWTO. Experienced kernel compilers should be able to determine
which of the below options apply to the desired configuration, after reading
a bit more about traffic control and planning.
Example 1. Kernel compilation options [8]
#
# QoS and/or fair queueing
#
CONFIG_NET_SCHED=y
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_CSZ=m
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
CONFIG_NET_SCH_INGRESS=m
CONFIG_NET_QOS=y
CONFIG_NET_ESTIMATOR=y
CONFIG_NET_CLS=y
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_ROUTE4=m
CONFIG_NET_CLS_ROUTE=y
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
CONFIG_NET_CLS_RSVP=m
CONFIG_NET_CLS_RSVP6=m
CONFIG_NET_CLS_POLICE=y
 A kernel compiled with the above set of options will provide modular
support for almost everything discussed in this documentation. The user may
need to modprobe a module before using a given feature. Again, the confused
user is referred to the Kernel HOWTO, as this document cannot adequately
address questions about the use of the Linux kernel.
-----------------------------------------------------------------------------
5.2. iproute2 tools (tc)
iproute2 is a suite of command line utilities which manipulate kernel
structures for IP networking configuration on a machine. For technical
documentation on these tools, see the iproute2 documentation and for a more
expository discussion, the documentation at [http://linux-ip.net/]
linux-ip.net. Of the tools in the iproute2 package, the binary tc is the only
one used for traffic control. This HOWTO will ignore the other tools in the
suite.
Because it interacts with the kernel to direct the creation, deletion and
modification of traffic control structures, the tc binary needs to be
compiled with support for all of the qdiscs you wish to use. In particular,
the HTB qdisc is not supported yet in the upstream iproute2 package. See
Section 7.1 for more information.
The tc tool performs all of the configuration of the kernel structures
required to support traffic control. As a result of its many uses, the
command syntax can be described (at best) as arcane. The utility takes as its
first non-option argument one of three Linux traffic control components,
qdisc, class or filter.
Example 2. tc command usage
[root@leander]# tc
Usage: tc [ OPTIONS ] OBJECT { COMMAND | help }
where OBJECT := { qdisc | class | filter }
OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] }
Each object accepts further and different options, and will be incompletely
described and documented below. The hints in the examples below are designed
to introduce the vagaries of tc command line syntax. For more examples,
consult the [http://lartc.org/howto/] LARTC HOWTO. For even better
understanding, consult the kernel and iproute2 code.
Example 3. tc qdisc
[root@leander]# tc qdisc add \ (1)
> dev eth0 \ (2)
> root \ (3)
> handle 1:0 \ (4)
> htb (5)
(1) Add a queuing discipline. The verb could also be del.
(2) Specify the device onto which we are attaching the new queuing
discipline.
(3) This means "egress" to tc. The word root must be used, however. Another
qdisc with limited functionality, the ingress qdisc can be attached to
the same device.
(4) The handle is a user-specified number of the form major:minor. The
minor number for any queueing discipline handle must always be zero (0).
An acceptable shorthand for a qdisc handle is the syntax "1:", where the
minor number is assumed to be zero (0) if not specified.
(5) This is the queuing discipline to attach, HTB in this example. Queuing
discipline specific parameters will follow this. In the example here, we
add no qdisc-specific parameters.
Above was the simplest use of the tc utility for adding a queuing
discipline to a device. Here's an example of the use of tc to add a class to
an existing parent class.
Example 4. tc class
[root@leander]# tc class add \ (1)
> dev eth0 \ (2)
> parent 1:1 \ (3)
> classid 1:6 \ (4)
> htb \ (5)
> rate 256kbit \ (6)
> ceil 512kbit (7)
(1) Add a class. The verb could also be del.
(2) Specify the device onto which we are attaching the new class.
(3) Specify the parent handle to which we are attaching the new class.
 (4) This is a unique handle (major:minor) identifying this class. The minor
     number must be a non-zero number.
(5) Both of the classful qdiscs require that any children classes be
classes of the same type as the parent. Thus an HTB qdisc will contain
HTB classes.
(6) (7)
This is a class specific parameter. Consult Section 7.1 for more detail
on these parameters.
Example 5. tc filter
[root@leander]# tc filter add \ (1)
> dev eth0 \ (2)
> parent 1:0 \ (3)
> protocol ip \ (4)
> prio 5 \ (5)
> u32 \ (6)
> match ip port 22 0xffff \ (7)
> match ip tos 0x10 0xff \ (8)
> flowid 1:6 \ (9)
> police \ (10)
> rate 32000bps \ (11)
> burst 10240 \ (12)
> mpu 0 \ (13)
> action drop/continue (14)
(1) Add a filter. The verb could also be del.
(2) Specify the device onto which we are attaching the new filter.
(3) Specify the parent handle to which we are attaching the new filter.
 (4) This parameter is required. Its use should be obvious, although I
     don't know more.
(5) The prio parameter allows a given filter to be preferred above another.
The pref is a synonym.
(6) This is a classifier, and is a required phrase in every tc filter
command.
(7) (8)
These are parameters to the classifier. In this case, packets with a
type of service flag (indicating interactive usage) and matching port 22
will be selected by this statement.
(9) The flowid specifies the handle of the target class (or qdisc) to which
a matching filter should send its selected packets.
(10)
This is the policer, and is an optional phrase in every tc filter
command.
(11) The policer will perform one action above this rate, and another action
below (see action parameter).
 (12) The burst is an exact analog to burst in HTB (burst is a bucket
     concept).
(13) The minimum policed unit. To count all traffic, use an mpu of zero (0).
 (14) The action indicates what should be done depending on how the traffic
     measures against the attributes of the policer. The first word specifies
     the action to take if the policer has been exceeded. The second word
     specifies the action to take otherwise.
 As evidenced above, the tc command line utility has an arcane and complex
syntax, even for simple operations such as these examples show. It should
come as no surprise to the reader that there exists an easier way to
configure Linux traffic control. See the next section, Section 5.3.
-----------------------------------------------------------------------------
5.3. tcng, Traffic Control Next Generation
FIXME; sing the praises of tcng. See also [http://tldp.org/HOWTO/
Traffic-Control-tcng-HTB-HOWTO/] Traffic Control using tcng and HTB HOWTO
and tcng documentation.
Traffic control next generation (hereafter, tcng) provides all of the power
of traffic control under Linux with twenty percent of the headache.
-----------------------------------------------------------------------------
5.4. IMQ, Intermediate Queuing device
FIXME; must discuss IMQ. See also Patrick McHardy's website on [http://
trash.net/~kaber/imq/] IMQ.
-----------------------------------------------------------------------------
6. Classless Queuing Disciplines (qdiscs)
 Each of these queuing disciplines can be used as the primary qdisc on an
interface, or can be used inside a leaf class of a classful qdisc. These are
the fundamental schedulers used under Linux. Note that the default scheduler
is the pfifo_fast.
-----------------------------------------------------------------------------
6.1. FIFO, First-In First-Out (pfifo and bfifo)
Note This is not the default qdisc on Linux interfaces. Be certain to see
Section 6.2 for the full details on the default (pfifo_fast) qdisc.
The FIFO algorithm forms the basis for the default qdisc on all Linux
network interfaces (pfifo_fast). It performs no shaping or rearranging of
packets. It simply transmits packets as soon as it can after receiving and
queuing them. This is also the qdisc used inside all newly created classes
until another qdisc or a class replaces the FIFO.
[fifo-qdisc]
A real FIFO qdisc must, however, have a size limit (a buffer size) to
prevent it from overflowing in case it is unable to dequeue packets as
quickly as it receives them. Linux implements two basic FIFO qdiscs, one
based on bytes, and one on packets. Regardless of the type of FIFO used, the
size of the queue is defined by the parameter limit. For a pfifo the unit is
understood to be packets and for a bfifo the unit is understood to be bytes.
Example 6. Specifying a limit for a packet or byte FIFO
[root@leander]# cat bfifo.tcc
/*
* make a FIFO on eth0 with 10kbyte queue size
*
*/
dev eth0 {
egress {
fifo (limit 10kB );
}
}
[root@leander]# tcc < bfifo.tcc
# ================================ Device eth0 ================================
tc qdisc add dev eth0 handle 1:0 root dsmark indices 1 default_index 0
tc qdisc add dev eth0 handle 2:0 parent 1:0 bfifo limit 10240
[root@leander]# cat pfifo.tcc
/*
* make a FIFO on eth0 with 30 packet queue size
*
*/
dev eth0 {
egress {
fifo (limit 30p );
}
}
[root@leander]# tcc < pfifo.tcc
# ================================ Device eth0 ================================
tc qdisc add dev eth0 handle 1:0 root dsmark indices 1 default_index 0
tc qdisc add dev eth0 handle 2:0 parent 1:0 pfifo limit 30
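 For those driving tc directly rather than through tcng, a hedged equivalent
omits the enclosing dsmark qdisc that tcc generates; either command below
(they are alternatives, not a pair) attaches the FIFO as the root qdisc.
    [root@leander]# tc qdisc add dev eth0 root bfifo limit 10240
    [root@leander]# tc qdisc add dev eth0 root pfifo limit 30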
-----------------------------------------------------------------------------
6.2. pfifo_fast, the default Linux qdisc
The pfifo_fast qdisc is the default qdisc for all interfaces under Linux.
Based on a conventional FIFO qdisc, this qdisc also provides some
prioritization. It provides three different bands (individual FIFOs) for
separating traffic. The highest priority traffic (interactive flows) is
placed into band 0 and is always serviced first. Similarly, band 1 is always
emptied of pending packets before band 2 is dequeued.
[pfifo_fast-qdisc]
There is nothing configurable to the end user about the pfifo_fast qdisc.
For exact details on the priomap and use of the ToS bits, see the pfifo-fast
section of the LARTC HOWTO.
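 Although pfifo_fast cannot be configured, it can be observed. A hedged
example follows; the statistics are illustrative and the exact output format
varies between tc versions, but the three bands and the default priomap
should be visible.
    [root@leander]# tc -s qdisc show dev eth0
    qdisc pfifo_fast 0: bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
     Sent 649492 bytes 5252 pkt (dropped 0, overlimits 0)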
-----------------------------------------------------------------------------
6.3. SFQ, Stochastic Fair Queuing
The SFQ qdisc attempts to fairly distribute opportunity to transmit data to
the network among an arbitrary number of flows. It accomplishes this by using
a hash function to separate the traffic into separate (internally maintained)
FIFOs which are dequeued in a round-robin fashion. Because there is the
possibility for unfairness to manifest in the choice of hash function, this
function is altered periodically. Perturbation (the parameter perturb) sets
this periodicity.
[sfq-qdisc]
Example 7. Creating an SFQ
[root@leander]# cat sfq.tcc
/*
* make an SFQ on eth0 with a 10 second perturbation
*
*/
dev eth0 {
egress {
sfq( perturb 10s );
}
}
[root@leander]# tcc < sfq.tcc
# ================================ Device eth0 ================================
tc qdisc add dev eth0 handle 1:0 root dsmark indices 1 default_index 0
tc qdisc add dev eth0 handle 2:0 parent 1:0 sfq perturb 10
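 The same SFQ can be created directly with tc; this hedged one-liner omits
the dsmark qdisc that tcc wraps around its output.
    [root@leander]# tc qdisc add dev eth0 root sfq perturb 10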
 Unfortunately, some clever software (e.g. Kazaa and eMule among others)
obliterates the benefit of this attempt at fair queuing by opening as many
TCP sessions (flows) as can be sustained. In many networks, with well-behaved
users, SFQ can adequately distribute the network resources to the contending
flows, but other measures may be called for when obnoxious applications have
invaded the network.
See also Section 6.4 for an SFQ qdisc with more exposed parameters for the
user to manipulate.
-----------------------------------------------------------------------------
6.4. ESFQ, Extended Stochastic Fair Queuing
Conceptually, this qdisc is no different than SFQ although it allows the
user to control more parameters than its simpler cousin. This qdisc was
conceived to overcome the shortcoming of SFQ identified above. By allowing
the user to control which hashing algorithm is used for distributing access
to network bandwidth, it is possible for the user to reach a fairer real
distribution of bandwidth.
Example 8. ESFQ usage
Usage: ... esfq [ perturb SECS ] [ quantum BYTES ] [ depth FLOWS ]
[ divisor HASHBITS ] [ limit PKTS ] [ hash HASHTYPE]
Where:
HASHTYPE := { classic | src | dst }
FIXME; need practical experience and/or attestation here.
-----------------------------------------------------------------------------
6.5. GRED, Generic Random Early Drop
FIXME; I have never used this. Need practical experience or attestation.
Theory declares that a RED algorithm is useful on a backbone or core
network, but not as useful near the end-user. See the section on flows to see
a general discussion of the thirstiness of TCP.
-----------------------------------------------------------------------------
6.6. TBF, Token Bucket Filter
This qdisc is built on tokens and buckets. It simply shapes traffic
transmitted on an interface. To limit the speed at which packets will be
dequeued from a particular interface, the TBF qdisc is the perfect solution.
It simply slows down transmitted traffic to the specified rate.
Packets are only transmitted if there are sufficient tokens available.
Otherwise, packets are deferred. Delaying packets in this fashion will
introduce an artificial latency into the packet's round trip time.
[tbf-qdisc]
Example 9. Creating a 256kbit/s TBF
[root@leander]# cat tbf.tcc
/*
* make a 256kbit/s TBF on eth0
*
*/
dev eth0 {
egress {
tbf( rate 256 kbps, burst 20 kB, limit 20 kB, mtu 1514 B );
}
}
[root@leander]# tcc < tbf.tcc
# ================================ Device eth0 ================================
tc qdisc add dev eth0 handle 1:0 root dsmark indices 1 default_index 0
tc qdisc add dev eth0 handle 2:0 parent 1:0 tbf burst 20480 limit 20480 mtu 1514 rate 32000bps
-----------------------------------------------------------------------------
7. Classful Queuing Disciplines (qdiscs)
The flexibility and control of Linux traffic control can be unleashed
through the agency of the classful qdiscs. Remember that the classful queuing
disciplines can have filters attached to them, allowing packets to be
directed to particular classes and subqueues.
There are several common terms to describe classes directly attached to the
root qdisc and terminal classes. Classes attached to the root qdisc are
known as root classes, and more generically inner classes. Any terminal class
in a particular queuing discipline is known as a leaf class by analogy to the
tree structure of the classes. Besides the use of figurative language
depicting the structure as a tree, the language of family relationships is
also quite common.
-----------------------------------------------------------------------------
7.1. HTB, Hierarchical Token Bucket
HTB uses the concepts of tokens and buckets along with the class-based
system and filters to allow for complex and granular control over traffic.
With a complex borrowing model, HTB can perform a variety of sophisticated
traffic control techniques. One of the easiest ways to use HTB immediately is
that of shaping.
By understanding tokens and buckets or by grasping the function of TBF, HTB
should be merely a logical step. This queuing discipline allows the user to
define the characteristics of the tokens and bucket used and allows the user
to nest these buckets in an arbitrary fashion. When coupled with a
classifying scheme, traffic can be controlled in a very granular fashion.
Below is example output of the syntax for HTB on the command line with the
tc tool. Although the syntax for tcng is a language of its own, the rules for
HTB are the same.
Example 10. tc usage for HTB
Usage: ... qdisc add ... htb [default N] [r2q N]
default minor id of class to which unclassified packets are sent {0}
r2q DRR quantums are computed as rate in Bps/r2q {10}
debug string of 16 numbers each 0-3 {0}
... class add ... htb rate R1 burst B1 [prio P] [slot S] [pslot PS]
[ceil R2] [cburst B2] [mtu MTU] [quantum Q]
rate rate allocated to this class (class can still borrow)
burst max bytes burst which can be accumulated during idle period {computed}
ceil definite upper class rate (no borrows) {rate}
cburst burst but for ceil {computed}
mtu max packet size we create rate map for {1600}
prio priority of leaf; lower are served first {0}
quantum how much bytes to serve from leaf at once {use r2q}
TC HTB version 3.3
-----------------------------------------------------------------------------
7.1.1. Software requirements
Unlike almost all of the other software discussed, HTB is a newer queuing
discipline and your distribution may not have all of the tools and capability
you need to use HTB. The kernel must support HTB; kernel version 2.4.20 and
later support it in the stock distribution, although earlier kernel versions
require patching. To enable userland support for HTB, see [http://
luxik.cdi.cz/~devik/qos/htb/] HTB for an iproute2 patch to tc.
-----------------------------------------------------------------------------
7.1.2. Shaping
One of the most common applications of HTB involves shaping transmitted
traffic to a specific rate.
All shaping occurs in leaf classes. No shaping occurs in inner or root
classes as they only exist to suggest how the borrowing model should
distribute available tokens.
-----------------------------------------------------------------------------
7.1.3. Borrowing
A fundamental part of the HTB qdisc is the borrowing mechanism. Children
classes borrow tokens from their parents once they have exceeded rate. A
child class will continue to attempt to borrow until it reaches ceil, at
which point it will begin to queue packets for transmission until more tokens
/ctokens are available. As there are only two primary types of classes which
can be created with HTB the following table and diagram identify the various
possible states and the behaviour of the borrowing mechanisms.
Table 2. HTB class states and potential actions taken
+------+-----+--------------+-----------------------------------------------+
|type |class|HTB internal |action taken |
|of |state|state | |
|class | | | |
+------+-----+--------------+-----------------------------------------------+
|leaf |< |HTB_CAN_SEND |Leaf class will dequeue queued bytes up to |
| |rate | |available tokens (no more than burst packets) |
+------+-----+--------------+-----------------------------------------------+
|leaf |> |HTB_MAY_BORROW|Leaf class will attempt to borrow tokens/ |
| |rate,| |ctokens from parent class. If tokens are |
| |< | |available, they will be lent in quantum |
| |ceil | |increments and the leaf class will dequeue up |
| | | |to cburst bytes |
+------+-----+--------------+-----------------------------------------------+
|leaf |> |HTB_CANT_SEND |No packets will be dequeued. This will cause |
| |ceil | |packet delay and will increase latency to meet |
| | | |the desired rate. |
+------+-----+--------------+-----------------------------------------------+
|inner,|< |HTB_CAN_SEND |Inner class will lend tokens to children. |
|root |rate | | |
+------+-----+--------------+-----------------------------------------------+
|inner,|> |HTB_MAY_BORROW|Inner class will attempt to borrow tokens/ |
|root |rate,| |ctokens from parent class, lending them to |
| |< | |competing children in quantum increments per |
| |ceil | |request. |
+------+-----+--------------+-----------------------------------------------+
|inner,|> |HTB_CANT_SEND |Inner class will not attempt to borrow from its|
|root |ceil | |parent and will not lend tokens/ctokens to |
| | | |children classes. |
+------+-----+--------------+-----------------------------------------------+
This diagram identifies the flow of borrowed tokens and the manner in which
tokens are charged to parent classes. In order for the borrowing model to
work, each class must have an accurate count of the number of tokens used by
itself and all of its children. For this reason, any token used in a child or
leaf class is charged to each parent class until the root class is reached.
Any child class which wishes to borrow a token will request a token from
its parent class, which if it is also over its rate will request to borrow
from its parent class until either a token is located or the root class is
reached. So the borrowing of tokens flows toward the leaf classes and the
charging of the usage of tokens flows toward the root class.
[htb-borrow]
Note in this diagram that there are several HTB root classes. Each of these
root classes can simulate a virtual circuit.
-----------------------------------------------------------------------------
7.1.4. HTB class parameters
 default
     An optional parameter with every HTB qdisc object. The default value
     is 0, which causes any unclassified traffic to be dequeued at hardware
     speed, completely bypassing any of the classes attached to the root
     qdisc.
rate
Used to set the minimum desired speed to which to limit transmitted
traffic. This can be considered the equivalent of a committed information
rate (CIR), or the guaranteed bandwidth for a given leaf class.
ceil
Used to set the maximum desired speed to which to limit the transmitted
traffic. The borrowing model should illustrate how this parameter is
used. This can be considered the equivalent of "burstable bandwidth".
burst
This is the size of the rate bucket (see Tokens and buckets). HTB will
dequeue burst bytes before awaiting the arrival of more tokens.
cburst
This is the size of the ceil bucket (see Tokens and buckets). HTB will
dequeue cburst bytes before awaiting the arrival of more ctokens.
quantum
This is a key parameter used by HTB to control borrowing. Normally, the
correct quantum is calculated by HTB, not specified by the user. Tweaking
this parameter can have tremendous effects on borrowing and shaping under
contention, because it is used both to split traffic between children
classes over rate (but below ceil) and to transmit packets from these
same classes.
r2q
Also, usually calculated for the user, r2q is a hint to HTB to help
determine the optimal quantum for a particular class.
 mtu
     The maximum packet size for which HTB builds its internal rate tables;
     per the usage output above, the default is 1600 bytes.
 prio
     The priority of the class when dequeuing; classes with lower values are
     served first, and the default is 0.
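 A short, hedged sketch tying several of these parameters together; the
rates and handles are illustrative.
    [root@leander]# tc class add dev eth0 parent 1:1 classid 1:10 htb \
    >   rate 256kbit ceil 512kbit burst 15k prio 1
 This class is guaranteed 256kbit, may borrow up to a ceiling of 512kbit,
and may dequeue up to 15k bytes on accumulated tokens before awaiting the
arrival of more tokens.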
-----------------------------------------------------------------------------
7.1.5. Rules
Below are some general guidelines to using HTB culled from [http://
docum.org/] http://docum.org/ and the LARTC mailing list. These rules are
simply a recommendation for beginners to maximize the benefit of HTB until
gaining a better understanding of the practical application of HTB.
  * Shaping with HTB occurs only in leaf classes. See also Section 7.1.2.
  * Because HTB does not shape in any class except the leaf class, the sum
    of the rates of leaf classes should not exceed the ceil of a parent
    class. Ideally, the sum of the rates of the children classes would match
    the rate of the parent class, allowing the parent class to distribute
    leftover bandwidth (ceil - rate) among the children classes.
    This key concept in employing HTB bears repeating. Only leaf classes
    actually shape packets; packets are only delayed in these leaf classes.
    The inner classes (all the way up to the root class) exist to define how
    borrowing/lending occurs (see also Section 7.1.3).
  * The quantum is only used when a class is over rate but below ceil.
  * The quantum should be set at MTU or higher. HTB will dequeue at least a
    single packet per service opportunity even if quantum is too small. In
    such a case, it will not be able to calculate accurately the real
    bandwidth consumed [9].
  * Parent classes lend tokens to children in increments of quantum, so for
    maximum granularity and the most evenly distributed bandwidth at any
    instant, quantum should be as low as possible while still no less than
    MTU.
  * A distinction between tokens and ctokens is only meaningful in a leaf
    class, because non-leaf classes only lend tokens to child classes.
  * HTB borrowing could more accurately be described as "using".
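 A complete, hedged sketch applying these rules; the rates, handles and
matched port are illustrative. Unclassified traffic falls into class 1:20
via the default parameter, the children rates sum to the parent rate, and
each leaf carries an SFQ so that flows inside a class share fairly.
    [root@leander]# tc qdisc add dev eth0 root handle 1: htb default 20
    [root@leander]# tc class add dev eth0 parent 1: classid 1:1 htb rate 1mbit ceil 1mbit
    [root@leander]# tc class add dev eth0 parent 1:1 classid 1:10 htb rate 512kbit ceil 1mbit
    [root@leander]# tc class add dev eth0 parent 1:1 classid 1:20 htb rate 512kbit ceil 1mbit
    [root@leander]# tc qdisc add dev eth0 parent 1:10 sfq perturb 10
    [root@leander]# tc qdisc add dev eth0 parent 1:20 sfq perturb 10
    [root@leander]# tc filter add dev eth0 parent 1: protocol ip prio 1 \
    >   u32 match ip dport 22 0xffff flowid 1:10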
-----------------------------------------------------------------------------
7.2. PRIO, priority scheduler
The PRIO classful qdisc works on a very simple precept. When it is ready to
dequeue a packet, the first class is checked for a packet. If there's a
packet, it gets dequeued. If there's no packet, then the next class is
checked, until the queuing mechanism has no more classes to check.
This section will be completed at a later date.
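 In the interim, a minimal, hedged example: the first command creates a PRIO
qdisc, which automatically provides three classes (1:1, 1:2 and 1:3); the
second replaces the default FIFO in the highest priority band with an SFQ.
    [root@leander]# tc qdisc add dev eth0 root handle 1: prio
    [root@leander]# tc qdisc add dev eth0 parent 1:1 sfq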
-----------------------------------------------------------------------------
7.3. CBQ, Class Based Queuing
 CBQ is the classic (some would say venerable) implementation of a traffic
control system. This section will be completed at a later date.
-----------------------------------------------------------------------------
8. Rules, Guidelines and Approaches
-----------------------------------------------------------------------------
8.1. General Rules of Linux Traffic Control
There are a few general rules which ease the study of Linux traffic
control. Traffic control structures under Linux are the same whether the
initial configuration has been done with tcng or with tc.
  * Any router performing a shaping function should be the bottleneck on
    the link, and should be shaping slightly below the maximum available link
    bandwidth. This prevents queues from forming in other routers, affording
    maximum control of packet latency/deferral to the shaping device.
  * A device can only shape traffic it transmits [10]. Because the traffic
    has already been received on an input interface, the traffic cannot be
    shaped. A traditional solution to this problem is an ingress policer.
  * Every interface must have a qdisc. The default qdisc (the pfifo_fast
    qdisc) is used when another qdisc is not explicitly attached to the
    interface.
  * One of the classful qdiscs added to an interface with no children
    classes typically only consumes CPU for no benefit.
  * Any newly created class contains a FIFO. This qdisc can be replaced
    explicitly with any other qdisc. The FIFO qdisc will be removed
    implicitly if a child class is attached to this class.
  * Classes directly attached to the root qdisc can be used to simulate
    virtual circuits.
  * A filter can be attached to classes or one of the classful qdiscs.
-----------------------------------------------------------------------------
8.2. Handling a link with a known bandwidth
HTB is an ideal qdisc to use on a link with a known bandwidth, because the
innermost (root-most) class can be set to the maximum bandwidth available on
a given link. Flows can be further subdivided into children classes, allowing
either guaranteed bandwidth to particular classes of traffic or allowing
preference to specific kinds of traffic.
-----------------------------------------------------------------------------
8.3. Handling a link with a variable (or unknown) bandwidth
In theory, the PRIO scheduler is an ideal match for links with variable
bandwidth, because it is a work-conserving qdisc (which means that it
provides no shaping). In the case of a link with an unknown or fluctuating
bandwidth, the PRIO scheduler simply prefers to dequeue any available packet
in the highest priority band first, then falls back to the lower priority
bands.
-----------------------------------------------------------------------------
8.4. Sharing/splitting bandwidth based on flows
Of the many types of contention for network bandwidth, this is one of the
easier types of contention to address in general. By using the SFQ qdisc,
traffic in a particular queue can be separated into flows, each of which will
be serviced fairly (inside that queue). Well-behaved applications (and users)
will find that using SFQ and ESFQ are sufficient for most sharing needs.
The Achilles heel of these fair queuing algorithms is a misbehaving user or
application which opens many connections simultaneously (e.g., eMule,
eDonkey, Kazaa). By creating a large number of individual flows, the
application can dominate slots in the fair queuing algorithm. Restated, the
fair queuing algorithm has no idea that a single application is generating
the majority of the flows, and cannot penalize the user. Other methods are
called for.
-----------------------------------------------------------------------------
8.5. Sharing/splitting bandwidth based on IP
 For many administrators this is the ideal method of dividing bandwidth
amongst their users. Unfortunately, there is no easy solution, and it becomes
increasingly complex with the number of machines sharing a network link.
To divide bandwidth equitably between N IP addresses, there must be N
classes.
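 A hedged sketch of this brute-force approach for a handful of addresses;
the IP addresses, rates and handles are illustrative. Each address receives
its own leaf class and a u32 filter steering its traffic there.
    [root@leander]# tc qdisc add dev eth0 root handle 1: htb
    [root@leander]# tc class add dev eth0 parent 1: classid 1:1 htb rate 1mbit
    [root@leander]# for i in 1 2 3 4; do
    >   tc class add dev eth0 parent 1:1 classid 1:1$i htb rate 250kbit ceil 1mbit
    >   tc filter add dev eth0 parent 1: protocol ip prio 1 \
    >     u32 match ip dst 192.168.1.$i flowid 1:1$i
    > done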
-----------------------------------------------------------------------------
9. Scripts for use with QoS/Traffic Control
-----------------------------------------------------------------------------
9.1. wondershaper
More to come, see [http://lartc.org/wondershaper/] wondershaper.
-----------------------------------------------------------------------------
9.2. ADSL Bandwidth HOWTO script (myshaper)
More to come, see [http://www.tldp.org/HOWTO/
ADSL-Bandwidth-Management-HOWTO/implementation.html] myshaper.
-----------------------------------------------------------------------------
9.3. htb.init
More to come, see htb.init.
-----------------------------------------------------------------------------
9.4. tcng.init
More to come, see tcng.init.
-----------------------------------------------------------------------------
9.5. cbq.init
More to come, see cbq.init.
-----------------------------------------------------------------------------
10. Diagram
-----------------------------------------------------------------------------
10.1. General diagram
Below is a general diagram of the relationships of the components of a
classful queuing discipline (HTB pictured). A larger version of the diagram
is [http://linux-ip.net/traffic-control/htb-class.png] available.
Example 11. An example HTB tcng configuration
/*
*
* possible mock up of diagram shown at
* http://linux-ip.net/traffic-control/htb-class.png
*
*/
$m_web = trTCM (
    cir 512 kbps, /* committed information rate */
cbs 10 kB, /* burst for CIR */
pir 1024 kbps, /* peak information rate */
pbs 10 kB /* burst for PIR */
) ;
dev eth0 {
egress {
class ( <$web> ) if tcp_dport == PORT_HTTP && __trTCM_green( $m_web );
class ( <$bulk> ) if tcp_dport == PORT_HTTP && __trTCM_yellow( $m_web );
drop if __trTCM_red( $m_web );
class ( <$bulk> ) if tcp_dport == PORT_SSH ;
htb () { /* root qdisc */
class ( rate 1544kbps, ceil 1544kbps ) { /* root class */
$web = class ( rate 512kbps, ceil 512kbps ) { sfq ; } ;
$bulk = class ( rate 512kbps, ceil 1544kbps ) { sfq ; } ;
}
}
}
}
[htb-class]
-----------------------------------------------------------------------------
11. Annotated Traffic Control Links
This section identifies a number of links to documentation about traffic
control and Linux traffic control software. Each link will be listed with a
brief description of the content at that site.
  * HTB site, HTB user guide and HTB theory (Martin "devik" Devera)
Hierarchical Token Bucket, HTB, is a classful queuing discipline.
Widely used and supported it is also fairly well documented in the user
guide and at [http://www.docum.org/] Stef Coene's site (see below).
  * General Quality of Service docs (Leonardo Balliache)
    There is a good deal of understandable and introductory documentation on
    his site, including some excellent overview material. See in particular
    the detailed [http://opalsoft.net/qos/DS.htm] Linux QoS document among
    others.
  * tcng (Traffic Control Next Generation) and tcng manual (Werner
    Almesberger)
The tcng software includes a language and a set of tools for creating
and testing traffic control structures. In addition to generating tc
commands as output, it is also capable of providing output for non-Linux
applications. A key piece of the tcng suite which is ignored in this
documentation is the tcsim traffic control simulator.
The user manual provided with the tcng software has been converted to
HTML with latex2html. The distribution comes with the TeX documentation.
  * iproute2 and iproute2 manual (Alexey Kuznetsov)
    This is the source code for the iproute2 suite, which includes the
essential tc binary. Note, that as of
iproute2-2.4.7-now-ss020116-try.tar.gz, the package did not support HTB,
so a patch available from the [http://luxik.cdi.cz/~devik/qos/htb/] HTB
site will be required.
The manual documents the entire suite of tools, although the tc utility
is not adequately documented here. The ambitious reader is recommended to
the LARTC HOWTO after consuming this introduction.
  * Documentation, graphs, scripts and guidelines to traffic control under
    Linux (Stef Coene)
Stef Coene has been gathering statistics and test results, scripts and
tips for the use of QoS under Linux. There are some particularly useful
graphs and guidelines available for implementing traffic control at
Stef's site.
  * [http://lartc.org/howto/] LARTC HOWTO (bert hubert, et al.)
The Linux Advanced Routing and Traffic Control HOWTO is one of the key
sources of data about the sophisticated techniques which are available
for use under Linux. The Traffic Control Introduction HOWTO should
provide the reader with enough background in the language and concepts of
traffic control. The LARTC HOWTO is the next place the reader should look
for general traffic control information.
  * Guide to IP Networking with Linux (Martin A. Brown)
Not directly related to traffic control, this site includes articles
and general documentation on the behaviour of the Linux IP layer.
  * Werner Almesberger's Papers
Werner Almesberger is one of the main developers and champions of
traffic control under Linux (he's also the author of tcng, above). One of
the key documents describing the entire traffic control architecture of
the Linux kernel is his Linux Traffic Control - Implementation Overview
which is available in [http://www.almesberger.net/cv/papers/tcio8.pdf]
PDF or [http://www.almesberger.net/cv/papers/tcio8.ps.gz] PS format.
  * Linux DiffServ project
Mercilessly snipped from the main page of the DiffServ site...
Differentiated Services (short: Diffserv) is an architecture for
providing different types or levels of service for network traffic.
One key characteristic of Diffserv is that flows are aggregated in
the network, so that core routers only need to distinguish a
comparably small number of aggregated flows, even if those flows
contain thousands or millions of individual flows.
Notes
[1] See Section 5 for more details on the use or installation of a
particular traffic control mechanism, kernel or command line utility.
[2] This queueing model has long been used in civilized countries to
    distribute scant food or provisions equitably. William Faulkner is
    reputed to have walked to the front of the line to fetch his share
    of ice, proving that not everybody likes the FIFO model, and providing
    us a model for considering priority queuing.
[3] Similarly, the entire traffic control system appears as a queue or
scheduler to the higher layer which is enqueuing packets into this
layer.
[4] This smoothing effect is not always desirable, hence the HTB parameters
burst and cburst.
[5] A classful qdisc can only have children classes of its type. For
example, an HTB qdisc can only have HTB classes as children. A CBQ qdisc
cannot have HTB classes as children.
[6] In this case, you'll have a filter which uses a classifier to select the
    packets you wish to drop. Then you'll use a policer with a drop
    action, like this: police rate 1bps burst 1 action drop/drop.
[7] I do not know the range nor base of these numbers. I believe they are
u32 hexadecimal, but need to confirm this.
[8] The options listed in this example are taken from a 2.4.20 kernel source
tree. The exact options may differ slightly from kernel release to
kernel release depending on patches and new schedulers and classifiers.
[9] HTB will report bandwidth usage in this scenario incorrectly. It will
calculate the bandwidth used by quantum instead of the real dequeued
packet size. This can skew results quickly.
[10] In fact, the Intermediate Queuing Device (IMQ) simulates an output
device onto which traffic control structures can be attached. This
clever solution allows a networking device to shape ingress traffic in
the same fashion as egress traffic. Despite the apparent contradiction
of the rule, IMQ appears as a device to the kernel. Thus, there has been
no violation of the rule, but rather a sneaky reinterpretation of that
rule.
ProxyARP Subnetting HOWTO
Bob Edwards
Robert.Edwards@anu.edu.au
v2.0, 27 August 2000
This HOWTO discusses using Proxy Address Resolution Protocol (ARP)
with subnetting in order to make a small network of machines visible
on another Internet Protocol (IP) subnet (I call it sub-subnetting).
This makes all the machines on the local network (network 0 from now
on) appear as if they are connected to the main network (network 1).
This is only relevant if all machines are connected by Ethernet or
ether devices (ie. it won't work for SLIP/PPP/CSLIP etc.)
_________________________________________________________________
Table of Contents
1. [1]Acknowledgements
2. [2]Why use Proxy ARP with subnetting?
3. [3]How Proxy ARP with subnetting works
4. [4]Setting up Proxy ARP with subnetting
5. [5]Other alternatives to Proxy ARP with subnetting
6. [6]Other Applications of Proxy ARP with subnetting
7. [7]Copying conditions
1. Acknowledgements
This document, and my Proxy ARP implementation could not have been
made possible without the help of:
* Andrew Tridgell, who implemented the subnetting options for arp in
Linux, and who personally assisted me in getting it working
* the Proxy-ARP mini-HOWTO, by Al Longyear
* the Multiple-Ethernet mini-HOWTO, by Don Becker
* the arp(8) source code and man page by Fred N. van Kempen and
Bernd Eckenfels
_________________________________________________________________
2. Why use Proxy ARP with subnetting?
The applications for using Proxy ARP with subnetting are fairly
specific.
In my case, I had a wireless Ethernet card that plugs into an 8-bit
ISA slot. I wanted to use this card to provide connectivity for a
number of machines at once. Being an ISA card, I could use it on a
Linux machine, after I had written an appropriate device driver for it
- this is the subject of another document. From here, it was only
necessary to add a second Ethernet interface to the Linux machine and
then use some mechanism to join the two networks together.
For the purposes of discussion, let network 0 be the local Ethernet
connected to the Linux box via an NE-2000 clone Ethernet interface on
eth0. Network 1 is the main network connected via the wireless
Ethernet card on eth1. Machine A is the Linux box with both
interfaces. Machine B is any TCP/IP machine on network 0 and machine C
is likewise on network 1.
Normally, to provide the connectivity, I would have done one of the
following:
* Used the IP-Bridge software (see the Bridge mini-HOWTO) to bridge
the traffic between the two network interfaces. Unfortunately, the
wireless Ethernet interface cannot be put into "Promiscuous" mode
(ie. it can't see all packets on network 1). This is mainly due to
the lower bandwidth of the wireless Ethernet (2MBit/sec) meaning
that we don't want to carry any traffic not specifically destined
to another wireless Ethernet machine - in our case machine A - or
broadcasts. Also, bridging is rather CPU intensive!
* Alternatively, use subnets and an IP-router to pass packets
between the two networks (see the IP-Subnetworking mini-HOWTO).
This is a protocol specific solution, where the Linux kernel can
handle the Internet Protocol (IP) packets, but other protocols
(such as AppleTalk) need extra software to route. This also
requires the allocation of a new IP subnet (network) number, which
is not always an option.
In my case, getting a new subnet (network) number was not an option,
so I wanted a solution that allowed all the machines on network 0 to
appear as if they were on network 1. This is where Proxy ARP comes in.
Other solutions are used to connect other (non-IP) protocols, such as
netatalk to provide AppleTalk routing.
_________________________________________________________________
3. How Proxy ARP with subnetting works
The Proxy ARP is actually only used to get packets from network 1 to
network 0. To get packets back the other way, the normal IP routing
functionality is employed.
In my case, network 1 has an 8-bit host part (netmask 255.255.255.0). I
have chosen the netmask for network 0 to leave a 4-bit host part
(255.255.255.240), allowing 14 IP nodes on network 0 (2 ^ 4 = 16, less
two for the all zeros and all ones cases). Note that any host part
smaller than that of the other network is allowable here (eg. 2, 3, 4,
5, 6 or 7 bits in this case - for one bit, just use normal Proxy ARP!)
All the IP numbers for network 0 (16 in total) appear in network 1 as
a subset. Note that it is very important, in this case, not to allow
any machine connected directly to network 1 to have an IP number in
this range! In my case, I have "reserved" the IP numbers of network 1
ending in 64 .. 79 for network 0. In this case, the IP numbers ending
in 64 and 79 can't actually be used by nodes - 64 is the all-zeros
(network) address and 79 is the all-ones (broadcast) address for
network 0.
Machine A is allocated two IP numbers, one within the network 0 range
for its real Ethernet interface (eth0) and the other within the
network 1 range, but outside of the network 0 range, for the wireless
Ethernet interface (eth1).
Say machine C (on network 1) wants to send a packet to machine B (on
network 0). Because the IP number of machine B makes it look to
machine C as though it is on the same physical network, machine C will
use the Address Resolution Protocol (ARP) to send a broadcast message
on network 1 requesting the machine with the IP number of machine B to
respond with its hardware (Ethernet or MAC layer) address. Machine B
won't see this request, as it isn't actually on network 1, but machine
A, on both networks, will see it.
The first bit of magic now happens as the Linux kernel arp code on
machine A, with a properly configured Proxy ARP with subnetting entry,
determines that the ARP request has come in on the network 1 interface
(eth1) and that the IP number being ARP'd for is in the subnet range
for network 0. Machine A then sends its own hardware (Ethernet)
address back to machine C as an ARP response packet.
Machine C then updates its ARP cache with an entry for machine B, but
with the hardware (Ethernet) address of machine A (in this case, the
wireless Ethernet interface). Machine C can now send the packet for
machine B to this hardware (Ethernet) address, and machine A receives
it.
Machine A notices that the destination IP number in the packet is that
of machine B, not itself. Machine A's Linux kernel IP routing code
attempts to forward the packet to machine B by looking at its routing
tables to determine which interface contains the network number for
machine B. However, the IP number for machine B is valid for both the
network 0 interface (eth0), and for the network 1 interface (eth1).
At this point, something else clever happens. Because the subnet mask
for the network 0 interface has more 1 bits (it is more specific) than
the subnet mask for the network 1 interface, the Linux kernel routing
code will match the IP number for machine B to the network 0
interface, and not keep looking for the potential match with the
network 1 interface (the one the packet came in on).
Now machine A needs to find out the "real" hardware (Ethernet) address
for machine B (assuming that it doesn't already have it in the ARP
cache). Machine A uses an ARP request, but this time the Linux kernel
arp code notes that the request isn't coming from the network 1
interface (eth1), and so doesn't respond with the Proxy address of
eth1. Instead, it sends the ARP request on the network 0 interface
(eth0), where machine B will see it and respond with its own (real)
hardware (Ethernet) address. Now machine A can send the packet (from
machine C) onto machine B.
Machine B gets the packet from machine C (via machine A) and then
wants to send back a response. This time, machine B notices that
machine C is on a different subnet (machine B's subnet mask of
255.255.255.240 excludes all machines not in the network 0 IP address
range). Machine B is setup with a "default" route to machine A's
network 0 IP number and sends the packet to machine A. This time,
machine A's Linux kernel routing code determines the destination IP
number (of machine C) as being on network 1 and sends the packet onto
machine C via Ethernet interface eth1.
Similar (less complicated) things occur for packets originating from
and destined to machine A from other machines on either of the two
networks.
Similarly, it should be obvious that if another machine (D) on network
0 ARP's for machine B, machine A will receive the ARP request on its
network 0 interface (eth0) and won't respond to the request as it is
set up to only Proxy on its network 1 interface (eth1).
Note also that machines B and C (and D) are not required to do
anything unusual, IP-wise. In my case, there is a mixture of Suns,
Macs and PC/Windoze 95 machines on network 0 all connecting through
Linux machine A to the rest of the world.
Finally, note that once the hardware (Ethernet) addresses are
discovered by each of machines A, B, C (and D), they are placed in the
ARP cache and subsequent packet transfers occur without the ARP
overhead. The ARP caches normally expire entries after 5 minutes of
non-activity.
_________________________________________________________________
4. Setting up Proxy ARP with subnetting
I set up Proxy ARP with subnetting on a Linux kernel version 2.0.30
machine, but I am told that the code works right back to some kernel
version in the 1.2.x era.
The first thing to note is that the ARP code is in two parts: the part
inside the kernel that sends and receives ARP requests and responses
and updates the ARP cache etc.; the other part is the arp(8) command
that allows the super user to modify the ARP cache manually and anyone
to examine it.
The first problem I had was that the arp(8) command that came with my
Slackware 3.1 distribution was ancient (1994 era!!!) and didn't
communicate with the kernel arp code correctly at all (mainly
evidenced by the strange output that it gave for "arp -a").
The arp(8) command in "net-tools-1.33a" available from a variety of
places, including (from the README file that came with it)
[8]ftp.linux.org.uk:/pub/linux/Networking/base/ works properly and
includes new man pages that explain stuff a lot better than the older
arp(8) man page.
Armed with a decent arp(8) command, all the changes I made were in the
/etc/rc.d/rc.inet1 script (for Slackware - probably different for
other flavours). First of all, we need to change the broadcast
address, network number and netmask of eth0:
NETMASK=255.255.255.240 # for a 4-bit host part
NETWORK=x.y.z.64 # our new network number (replace x.y.z with your net)
BROADCAST=x.y.z.79 # in my case
Then a line needs to be added to configure the second Ethernet port
(after any module loading that might be required to load the driver
code):
/sbin/ifconfig eth1 (name on net 1) broadcast (x.y.z.255) netmask 255.255.255.0
Then we add a route for the new interface:
/sbin/route add -net (x.y.z.0) netmask 255.255.255.0
And you will probably need to change the default gateway to the one
for network 1.
At this point, it is appropriate to add the Proxy ARP entry:
/sbin/arp -i eth1 -Ds ${NETWORK} eth1 netmask ${NETMASK} pub
This tells ARP to add a static entry (the s) to the cache for network
${NETWORK}. The -D tells ARP to use the same hardware address as
interface eth1 (the second eth1), thus saving us from having to look
up the hardware address for eth1 and hardcoding it in. The netmask
option tells ARP that we want to use subnetting (ie. Proxy for all (IP
number) & ${NETMASK} == ${NETWORK} & ${NETMASK}). The pub option tells
ARP to publish this ARP entry, ie. it is a Proxy entry, so respond on
behalf of these IP numbers. The -i eth1 option tells ARP to only
respond to requests that come in on interface eth1.
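Putting the pieces together, the relevant fragment of
/etc/rc.d/rc.inet1 might end up looking something like this (a sketch
only - substitute your own interface names and x.y.z values; the
default gateway x.y.z.1 is just an example):
NETMASK=255.255.255.240 # for a 4-bit host part
NETWORK=x.y.z.64 # our new network number
BROADCAST=x.y.z.79 # in my case
/sbin/ifconfig eth0 (name on net 0) broadcast ${BROADCAST} netmask ${NETMASK}
/sbin/ifconfig eth1 (name on net 1) broadcast x.y.z.255 netmask 255.255.255.0
/sbin/route add -net x.y.z.0 netmask 255.255.255.0
/sbin/route add default gw x.y.z.1
/sbin/arp -i eth1 -Ds ${NETWORK} eth1 netmask ${NETMASK} pub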
Hopefully, at this point, when the machine is rebooted, all the
machines on network 0 will appear to be on network 1. You can check
that the Proxy ARP with subnetting entry has been correctly installed
on machine A. On my machine (names changed to protect the innocent) it
is:
bash$ /sbin/arp -an
Address HWtype HWaddress Flags Mask Iface
x.y.z.1 ether 00:00:0C:13:6F:17 C * eth1
x.y.z.65 ether 00:40:05:49:77:01 C * eth0
x.y.z.67 ether 08:00:20:0B:79:47 C * eth0
x.y.z.5 ether 00:00:3B:80:18:E5 C * eth1
x.y.z.64 ether 00:40:96:20:CD:D2 CMP 255.255.255.240 eth1
Alternatively, you can examine the /proc/net/arp file with eg. cat(1).
The last line is the proxy entry for the subnet. The CMP flags
indicate that it is a static (Manually entered) entry and that it is
to be Published. The entry is only going to reply to ARP requests on
eth1 where the requested IP number, once masked, matches the network
number, also masked. Note that arp(8) has automatically determined the
hardware address of eth1 and inserted this for the address to use (the
-Ds option).
Likewise, it is probably prudent to check that the routing table has
been set up correctly. Here is mine (again, the names are changed to
protect the innocent):
#/bin/netstat -rn
Kernel routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
x.y.z.64 0.0.0.0 255.255.255.240 U 0 0 71 eth0
x.y.z.0 0.0.0.0 255.255.255.0 U 0 0 389 eth1
127.0.0.0 0.0.0.0 255.0.0.0 U 0 0 7 lo
0.0.0.0 x.y.z.1 0.0.0.0 UG 1 0 573 eth1
Alternatively, you can examine the /proc/net/route file with eg.
cat(1).
Note that the first entry is a proper subset of the second, but the
routing table has ranked them in netmask order, so the eth0 entry will
be checked before the eth1 entry.
_________________________________________________________________
5. Other alternatives to Proxy ARP with subnetting
There are several other alternatives to using Proxy ARP with
subnetting in this situation, apart from the ones mentioned above
(bridging and straight routing):
* IP-Masquerading (see the IP-Masquerade mini-HOWTO), in which
network 0 is "hidden" behind machine A from the rest of the
Internet. As machines on network 0 attempt to connect outside
through machine A, it re-addresses the source address and port
number of the packets and makes them look like they are coming
from itself, rather than from the machine on the hidden network 0.
This is an elegant solution, although it prevents any machine on
network 1 from initiating a connection to any machine on network
0, as the machines on network 0 effectively don't exist outside of
network 0. This effectively increases security of the machines on
network 0, but it also means that servers on network 1 cannot
check the identity of clients on network 0 using IP numbers (eg.
NFS servers use IP hostnames for access to mountable file
systems).
* Another option is IP in IP tunneling, which isn't supported on all
platforms (such as Macs and Windoze machines) so I opted not to go
this way.
* Use Proxy ARP without subnetting. This is certainly possible, it
just means that a separate entry needs to be created for each
machine on network 0, instead of a single entry for all machines
(current and future) on network 0.
* Possibly IP Aliasing might also be useful here, but I haven't
looked into this at all.
_________________________________________________________________
6. Other Applications of Proxy ARP with subnetting
There is only one other application that I know about that uses Proxy
ARP with subnetting, also here at the Australian National University.
It is the one that Andrew Tridgell originally wrote the subnetting
extensions to Proxy ARP for. However, Andrew reliably informs me that
there are, in fact, several other sites around the world using it as
well (I don't have any details).
The other A.N.U. application involves a teaching lab set up to teach
students how to configure machines to use TCP/IP, including setting up
the gateway. The network used is a Class C network, and Andrew needed
to "subnet" it for security, traffic control and the educational
reason mentioned above. He did this using Proxy ARP, and then decided
that a single entry in the ARP cache for the whole subnet would be
faster and cleaner than one for each host on the subnet. Voila...Proxy
ARP with subnetting!
_________________________________________________________________
7. Copying conditions
Copyright 1997 by Bob Edwards <[9]Robert.Edwards@anu.edu.au>
Voice: (+61) 2 6249 4090
Unless otherwise stated, Linux HOWTO documents are copyrighted by
their respective authors. Linux HOWTO documents may be reproduced and
distributed in whole or in part, in any medium physical or electronic,
as long as this copyright notice is retained on all copies. Commercial
redistribution is allowed and encouraged; however, the author would
like to be notified of any such distributions. All translations,
derivative works, or aggregate works incorporating any Linux HOWTO
documents must be covered under this copyright notice. That is, you
may not produce a derivative work from a HOWTO and impose additional
restrictions on its distribution. Exceptions to these rules may be
granted under certain conditions; please contact the Linux HOWTO
coordinator at the address given below. In short, we wish to promote
dissemination of this information through as many channels as
possible. However, we do wish to retain copyright on the HOWTO
documents, and would like to be notified of any plans to redistribute
the HOWTOs. If you have questions, please contact the Linux HOWTO
coordinator, at <[10]linux-howto@metalab.unc.edu> via email.
References
1. Proxy-ARP-Subnet.html#INTRO
2. Proxy-ARP-Subnet.html#WHY
3. Proxy-ARP-Subnet.html#HOW
4. Proxy-ARP-Subnet.html#SETUP
5. Proxy-ARP-Subnet.html#ALTERNATIVES
6. Proxy-ARP-Subnet.html#APPLICATIONS
7. Proxy-ARP-Subnet.html#COPYING
8. ftp://ftp.linux.org.uk/pub/linux/Networking/base/
9. mailto:Robert.Edwards@anu.edu.au
10. mailto:linux-howto@metalab.unc.edu
</sect1 id="Proxy-Caching">
<sect1 id="NTP">
<title>NTP</title>
<para>
Time synchronisation is generally considered important in the computing
environment. There are a number of reasons why this is important: it makes
sure your scheduled cron tasks on your various servers run well together,
it allows better use of log files between various machines to help
troubleshoot problems, and synchronised, correct logs are also useful if
your servers are ever attacked by crackers (either to report the attempt
to organisations such as AusCERT or in court to use against the bad guys).
Users who have overclocked their machine might also use time synchronisation
techniques to bring the time on their machines back to an accurate figure
at regular intervals, say every 20 minutes or so. This section contains an
overview of time keeping under Linux and some information about NTP, a
protocol which can be used to accurately reset the time across a computer
network.
</para>
2. How Linux Keeps Track of Time
2.1. Basic Strategies
<para>
A Linux system actually has two clocks: One is the battery powered
"Real Time Clock" (also known as the "RTC", "CMOS clock", or "Hardware
clock") which keeps track of time when the system is turned off but is
not used when the system is running. The other is the "system clock"
(sometimes called the "kernel clock" or "software clock") which is a
software counter based on the timer interrupt. It does not exist when
the system is not running, so it has to be initialized from the RTC
(or some other time source) at boot time. References to "the clock" in
the ntpd documentation refer to the system clock, not the RTC.
</para>
<para>
The two clocks will drift at different rates, so they will gradually
drift apart from each other, and also away from the "real" time. The
simplest way to keep them on time is to measure their drift rates and
apply correction factors in software. Since the RTC is only used when
the system is not running, the correction factor is applied when the
clock is read at boot time, using clock(8) or hwclock(8). The system
clock is corrected by adjusting the rate at which the system time is
advanced with each timer interrupt, using adjtimex(8).
</para>
<para>
A crude alternative to adjtimex(8) is to have cron run clock(8) or
hwclock(8) periodically to sync the system time to the (corrected)
RTC. This was recommended in the clock(8) man page, and it works if
you do it often enough that you don't cause large "jumps" in the
system time, but adjtimex(8) is a more elegant solution. Some
applications may complain if the time jumps backwards.
</para>
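<para>
As a rough illustration, the boot-time RTC correction and a one-off
rate adjustment of the system clock might look like this (a sketch
only; hwclock's --adjust option uses the drift factor recorded in
/etc/adjtime, and the exact adjtimex options may vary between
versions):
</para>
<para>
<screen>
# apply the accumulated drift correction to the RTC, then set the
# system clock from the corrected RTC (typically done at boot time)
hwclock --adjust
hwclock --hctosys
# tune the rate of the system clock; "tick" is the number of
# microseconds added per timer interrupt (10000 is nominal)
adjtimex --tick 10001
</screen>
</para>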
<para>
The next step up in accuracy is to use a program like ntpd to read the
time periodically from a network time server or radio clock, and
continuously adjust the rate of the system clock so that the times
always match, without causing sudden "jumps" in the system time. If
you always have a network connection at boot time, you can ignore the
RTC completely and use ntpdate (which comes with the ntpd package) to
initialize the system clock from a time server-- either a local server
on a LAN, or a remote server on the internet. But if you sometimes
don't have a network connection, or if you need the time to be
accurate during the boot sequence before the network is active, then
you need to maintain the time in the RTC as well.
</para>
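<para>
For example, a machine with a permanent network connection could step
its clock once at boot with something like the following (the server
name is illustrative; -b forces a step rather than a gradual slew):
</para>
<para>
<screen>
ntpdate -b ntp.cs.mu.oz.au
</screen>
</para>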
2.2. Potential Conflicts
<para>
It might seem obvious that if you're using a program like ntpd, you
would want to sync the RTC to the (corrected) system clock. But this
turns out to be a bad idea if the system is going to stay shut down
longer than a few minutes, because it interferes with the programs
that apply the correction factor to the RTC at boot time.
</para>
<para>
If the system runs 24/7 and is always rebooted immediately whenever
it's shut down, then you can just set the RTC from the system clock
right before you reboot. The RTC won't drift enough to make a
difference in the time it takes to reboot, so you don't need to know
its drift rate.
</para>
<para>
Of course the system may go down unexpectedly, so some versions of the
kernel sync the RTC to the system clock every 11 minutes if the system
clock has been adjusted by another program. The RTC won't drift enough
in 11 minutes to make any difference, but if the system is down long
enough for the RTC to drift significantly, then you have a problem:
the programs that apply the drift correction to the RTC need to know
*exactly* when it was last reset, and the kernel doesn't record that
information anywhere.
</para>
<para>
Some unix "traditionalists" might wonder why anyone would run a linux
system less than 24/7, but some of us run dual-boot systems with
another OS running some of the time, or run Linux on laptops that have
to be shut down to conserve battery power when they're not being used.
Other people just don't like to leave machines running unattended for
long periods of time (even though we've heard all the arguments in
favor of it). So the "every 11 minutes" feature becomes a bug.
</para>
<para>
This "feature/bug" appears to behave differently in different versions
of the kernel (and possibly in different versions of xntpd and ntpd as
well), so if you're running both ntpd and hwclock you may need to test
your system to see what it actually does. If you can't keep the kernel
from resetting the RTC, you might have to run without a correction
factor on the RTC.
</para>
<para>
The part of the kernel that controls this can be found in
/usr/src/linux-2.0.34/arch/i386/kernel/time.c (where the version
number in the path will be the version of the kernel you're running).
If the variable time_status is set to TIME_OK then the kernel will
write the system time to the RTC every 11 minutes, otherwise it leaves
the RTC alone. Calls to adjtimex(2) (as used by ntpd and timed, for
example) may turn this on. Calls to settimeofday(2) will set
time_status to TIME_UNSYNC, which tells the kernel not to adjust the
RTC. I have not found any real documentation on this.
</para>
<para>
I've heard reports that some versions of the kernel may have problems
with "sleep modes" that shut down the CPU to save energy. The best
solution is to keep your kernel up to date, and refer any problems to
the people who maintain the kernel.
</para>
<para>
If you get bizarre results from the RTC you may have a hardware
problem. Some RTC chips include a lithium battery that can run down,
and some motherboards have an option for an external battery (be sure
the jumper is set correctly). The same battery maintains the CMOS RAM,
but the clock takes more power and is likely to fail first. Bizarre
results from the system clock may mean there is a problem with
interrupts.
</para>
2.3. Should the RTC use Local Time or UTC, and What About DST?
<para>
The Linux "system clock" actually just counts the number of seconds
past Jan 1, 1970, and is always in UTC (or GMT, which is technically
different but close enough that casual users tend to use both terms
interchangeably). UTC does not change as DST comes and goes-- what
changes is the conversion between UTC and local time. The translation
to local time is done by library functions that are linked into the
application programs.
</para>
<para>
This has two consequences: First, any application that needs to know
the local time also needs to know what time zone you're in, and
whether DST is in effect or not (see the next section for more on time
zones). Second, there is no provision in the kernel to change either
the system clock or the RTC as DST comes and goes, because UTC doesn't
change. Therefore, machines that only run Linux should have the RTC
set to UTC, not local time.
</para>
<para>
However, many people run dual-boot systems with other OS's that expect
the RTC to contain the local time, so hwclock needs to know whether
your RTC is in local time or UTC, which it then converts to seconds
past Jan 1, 1970 (UTC). This still does not provide for seasonal
changes to the RTC, so the change must be made by the other OS (this
is the one exception to the rule against letting more than one program
change the time in the RTC).
</para>
<para>
Unfortunately, there are no flags in the RTC or the CMOS RAM to
indicate standard time vs DST, so each OS stores this information
someplace where the other OS's can't find it. This means that hwclock
must assume that the RTC always contains the correct local time, even
if the other OS has not been run since the most recent seasonal time
change.
</para>
<para>
If Linux is running when the seasonal time change occurs, the system
clock is unaffected and applications will make the correct conversion.
But if linux has to be rebooted for any reason, the system clock will
be set to the time in the RTC, which will be off by one hour until the
other OS (usually Windows) has a chance to run.
</para>
<para>
There is no way around this, but Linux doesn't crash very often, so
the most likely reason to reboot on a dual-boot system is to run the
other OS anyway. But beware if you're one of those people who shuts
down Linux whenever you won't be using it for a while-- if you haven't
had a chance to run the other OS since the last time change, the RTC
will be off by an hour until you do.
</para>
<para>
Some other documents have stated that setting the RTC to UTC allows
Linux to take care of DST properly. This is not really wrong, but it
doesn't tell the whole story-- as long as you don't reboot, it does
not matter which time is in the RTC (or even if the RTC's battery
dies). Linux will maintain the correct time either way, until the next
reboot. In theory, if you only reboot once a year (which is not
unreasonable for Linux), DST could come and go and you'd never notice
that the RTC had been wrong for several months, because the system
clock would have stayed correct all along. But since you can't predict
when you'll want to reboot, it's better to have the RTC set to UTC if
you're not running another OS that requires local time.
</para>
<para>
The Dallas Semiconductor RTC chip (which is a drop-in replacement for
the Motorola chip used in the IBM AT and clones) actually has the
ability to do the DST conversion by itself, but this feature is not
used because the changeover dates are hard-wired into the chip and
can't be changed. Current versions change on the first Sunday in April
and the last Sunday in October, but earlier versions used different
dates (and obviously this doesn't work in countries that use other
dates). Also, the RTC is often integrated into the motherboard's
"chipset" (rather than being a separate chip) and I don't know if they
all have this ability.
</para>
2.4. How Linux keeps Track of Time Zones
<para>
You probably set your time zone correctly when you installed Linux.
But if you have to change it for some reason, or if the local laws
regarding DST have changed (as they do frequently in some countries),
then you'll need to know how to change it. If your system time is off
by some exact number of hours, you may have a time zone problem (or a
DST problem).
</para>
<para>
Time zone and DST information is stored in /usr/share/zoneinfo (or
/usr/lib/zoneinfo on older systems). The local time zone is
determined by a symbolic link from /etc/localtime to one of these
files. The way to change your timezone is to change the link. If
your local DST dates have changed, you'll have to edit the file.
</para>
<para>
You can also use the TZ environment variable to change the current
time zone, which is handy if you're logged in remotely to a machine in
another time zone. Also see the man pages for tzset and tzfile.
This is nicely summarized at
<http://www.linuxsa.org.au/tips/time.html>
</para>
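<para>
For example, to repoint the link for the Australia/Sydney zone, or to
run a single command in another zone (the zone names here are only
examples; check your own zoneinfo tree):
</para>
<para>
<screen>
ln -sf /usr/share/zoneinfo/Australia/Sydney /etc/localtime
TZ=America/New_York date
</screen>
</para>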
2.5. The Bottom Line
<para>
If you don't need sub-second accuracy, hwclock(8) and adjtimex(8) may
be all you need. It's easy to get enthused about time servers and
radio clocks and so on, but I ran the old clock(8) program for years
with excellent results. On the other hand, if you have several
machines on a LAN it can be handy (and sometimes essential) to have
them automatically sync their clocks to each other. And the other
stuff can be fun to play with even if you don't really need it.
</para>
<para>
On machines that only run Linux, set the RTC to UTC (or GMT). On
dual-boot systems that require local time in the RTC, be aware that if
you have to reboot Linux after the seasonal time change, the clock may
be temporarily off by one hour, until you have a chance to run the
other OS. If you run more than two OS's, be sure only one of them is
trying to adjust for DST.
</para>
<para>
NTP is a standard method of synchronising time across a network of
computers; a client sets its clock from a remote server. NTP clients
are typically installed on servers. Most business class ISPs provide
NTP servers. Otherwise, there are a number of free NTP servers in
Australia:
</para>
<para>
The University of Melbourne ntp.cs.mu.oz.au
University of Adelaide ntp.saard.net
CSIRO Marine Labs, Tasmania ntp.ml.csiro.au
CSIRO National Measurements Laboratory, Sydney ntp.syd.dms.csiro.au
</para>
<para>
Xntpd (NTPv3) has been replaced by ntpd (NTPv4); the earlier version
is no longer being maintained.
</para>
<para>
Ntpd is the standard program for synchronizing clocks across a
network, and it comes with a list of public time servers you can
connect to. It can be a little more complicated to set up, but if
you're interested in this kind of thing I highly recommend that you
take a look at it.
</para>
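<para>
A minimal /etc/ntp.conf might look something like this (a sketch only;
the servers are two of the Australian ones listed above, and the
drift file location varies between distributions):
</para>
<para>
<screen>
server ntp.cs.mu.oz.au
server ntp.saard.net
driftfile /etc/ntp.drift
</screen>
</para>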
<para>
The "home base" for information on ntpd is the NTP website at
<http://www.eecis.udel.edu/~ntp/> which also includes links to all
kinds of interesting time-related stuff (including software for other
OS's). Some linux distributions include ntpd on the CD. There is a
list of public time servers at
<http://www.eecis.udel.edu/~mills/ntp/clock2.html>.
</para>
<para>
A relatively new feature in ntpd is a "burst mode" which is designed
for machines that have only intermittent dial-up access to the
internet.
</para>
<para>
Ntpd includes drivers for quite a few radio clocks (although some
appear to be better supported than others). Most radio clocks are
designed for commercial use and cost thousands of dollars, but there
are some cheaper alternatives (discussed in later sections). In the
past most were WWV or WWVB receivers, but now most of them seem to be
GPS receivers. NIST has a PDF file that lists manufacturers of radio
clocks on their website at
<http://www.boulder.nist.gov/timefreq/links.htm> (near the bottom of
the page). The NTP website also includes many links to manufacturers
of radio clocks at <http://www.eecis.udel.edu/~ntp/hardware.htm> and
<http://www.eecis.udel.edu/~mills/ntp/refclock.htm>. Either list may
or may not be up to date at any given time :-). The list of drivers
for ntpd is at
<http://www.eecis.udel.edu/~ntp/ntp_spool/html/refclock.htm>.
</para>
<para>
Ntpd also includes drivers for several dial-up time services. These
are all long-distance (toll) calls, so be sure to calculate the effect
on your phone bill before using them.
</para>
3.4. Chrony
<para>
Xntpd was originally written for machines that have a full-time
connection to a network time server or radio clock. In theory it can
also be used with machines that are only connected intermittently, but
Richard Curnow couldn't get it to work the way he wanted it to, so he
wrote "chrony" as an alternative for those of us who only have network
access when we're dialed in to an ISP (this is the same problem that
ntpd's new "burst mode" was designed to solve). The current version
of chrony includes drift correction for the RTC, for machines that are
turned off for long periods of time.
</para>
<para>
You can get more information from Richard Curnow's website at
<http://www.rrbcurnow.freeuk.com/chrony> or <http://go.to/chrony>.
There are also two chrony mailing lists, one for announcements and one
for discussion by users. For information send email to
chrony-users-subscribe@egroups.com or chrony-announce-subscribe@egroups.com
</para>
<para>
Chrony is normally distributed as source code only, but Debian has
been including a binary in their "unstable" collection. The source
file is also available at the usual Linux archive sites.
</para>
3.5. Clockspeed
<para>
Another option is the clockspeed program by DJ Bernstein. It gets the
time from a network time server and simply resets the system clock
every three seconds. It can also be used to synchronize several
machines on a LAN.
</para>
<para>
I've sometimes had trouble reaching his website at
<http://Cr.yp.to/clockspeed.html>, so if you get a DNS error try again
on another day. I'll try to update this section if I get some better
information.
</para>
<para>
Note
You must be logged in as "root" to run any program that affects
the RTC or the system time, which includes most of the programs
described here. If you normally use a graphical interface for
everything, you may also need to learn some basic unix shell
commands.
</para>
<para>
Note
If you run more than one OS on your machine, you should only let
one of them set the RTC, so they don't confuse each other. The
exception is the twice-a-year adjustment for Daylight Saving(s)
Time.
</para>
<para>
If you run a dual-boot system that spends a lot of time running
Windows, you may want to check out some of the clock software
available for that OS instead. Follow the links on the NTP website at
<http://www.eecis.udel.edu/~ntp/software.html>.
</para>
</sect1 id="NTP">
<sect1 id="Traffic-Control">
<title>Traffic-Control</title>
<para>
The traffic shaper is a virtual network device that makes it possible
to limit the rate of outgoing data flow over another network device.
This is especially useful in scenarios such as ISPs, where it is
desirable to control and enforce policies regarding how much bandwidth
is used by each client. Another alternative (for web services only)
may be certain Apache modules which restrict the number of IP
connections by client or the bandwidth used. A configuration sketch
follows below.
</para>
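<para>
A typical shaper setup uses the shapecfg utility and then treats the
shaper like any other interface (a sketch only; the addresses and the
64kbit/s limit are illustrative - see the NET3-4 HOWTO link below):
</para>
<para>
<screen>
# attach the shaper to the real device and limit it to 64kbit/s
shapecfg attach shaper0 eth1
shapecfg speed shaper0 64000
# give the shaper an address and route the limited clients through it
ifconfig shaper0 192.168.1.1 up
route add -net 192.168.2.0 netmask 255.255.255.0 dev shaper0
</screen>
</para>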
<para>
Traffic control encompasses the sets of mechanisms and operations by which
packets are queued for transmission/reception on a network interface. The
operations include enqueuing, policing, classifying, scheduling, shaping and
dropping. This HOWTO provides an introduction and overview of the
capabilities and implementation of traffic control under Linux.
</para>
* the Linux DiffServ project
* HTB site (Martin "devik" Devera)
* Traffic Control Next Generation (tcng)
TCNG manual (Werner Almesberger)
* iproute2 (Alexey Kuznetsov)
iproute2 manual (Alexey Kuznetsov)
* Research and documentation on traffic control under Linux (Stef Coene)
* LARTC HOWTO (bert hubert, et al.)
* Guide to IP networking with Linux (Martin A. Brown)
* http://metalab.unc.edu/mdw/HOWTO/NET3-4-HOWTO-6.html#ss6.15
* Traffic Control HOWTO
</sect1 id="Traffic-Control">
<sect1 id="Load-Balancing">
<title>Load-Balancing</title>
<para>
Demand for load balancing usually arises in database/web access when
many clients make simultaneous requests to a server. It would be
desirable to have multiple identical servers and redirect requests to
the less loaded server. This can be achieved through Network Address
Translation techniques (NAT) of which IP masquerading is a subset.
Network administrators can replace a single server providing Web
services - or any other application - with a logical pool of servers
sharing a common IP address. Incoming connections are directed to a
particular server using a load-balancing algorithm. The virtual
server rewrites incoming and outgoing packets to give clients the
appearance that only one server exists.
</para>
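<para>
One well-known implementation of this approach is the Linux Virtual
Server; a minimal NAT-based sketch using its ipvsadm tool might look
like this (the addresses and the round-robin scheduler are
illustrative):
</para>
<para>
<screen>
# define a virtual HTTP service on the shared (virtual) address,
# scheduled round-robin
ipvsadm -A -t 192.168.0.1:80 -s rr
# add two real servers behind it, using masquerading (NAT)
ipvsadm -a -t 192.168.0.1:80 -r 10.0.0.2:80 -m
ipvsadm -a -t 192.168.0.1:80 -r 10.0.0.3:80 -m
</screen>
</para>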
<para>
Linux IP-NAT information may be found here
<http://www.csn.tu-chemnitz.de/HyperNews/get/linux-ip-nat.html>
</para>
</sect1 id="Load-Balancing">
<sect1 id="Bandwidth-Limiting">
<title>Bandwidth-Limiting</title>
<para>
This section describes how to set up your Linux server to limit download
bandwidth or incoming traffic and how to use your internet link more
efficiently. It is meant to provide an easy solution for limiting
incoming traffic, thus preventing your LAN users from consuming all the
bandwidth of your internet link. This is useful when your internet link
is slow or your LAN users download tons of mp3s and the newest Linux
distro's *.iso files.
</para>
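<para>
One common way to do this is with Squid's delay pools, which cap the
aggregate download rate of proxy users (a sketch only; the numbers are
illustrative - see the Squid links below):
</para>
<para>
<screen>
# squid.conf fragment: one class-1 (aggregate) delay pool
delay_pools 1
delay_class 1 1
# refill the bucket at 16000 bytes/s, bucket size 16000 bytes
delay_parameters 1 16000/16000
delay_access 1 allow all
</screen>
</para>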
* Bandwidth Limiting HOWTO
6. Miscellaneous
6.1. Useful resources
Squid Web Proxy Cache
http://www.squid-cache.org
Squid 2.4 Stable 1 Configuration manual
http://www.visolve.com/squidman/Configuration%20Guide.html
http://www.visolve.com/squidman/Delaypool%20parameters.htm
Squid FAQ
http://www.squid-cache.org/Doc/FAQ/FAQ-19.html#ss19.8
cbq-init script
ftp://ftp.equinox.gu.net/pub/linux/cbq/
Linux 2.4 Advanced Routing HOWTO
http://www.linuxdoc.org/HOWTO/Adv-Routing-HOWTO.html
Traffic control (in Polish)
http://ceti.pl/~kravietz/cbq/
Securing and Optimizing Linux Red Hat Edition - A Hands on Guide
http://www.linuxdoc.org/guides.html
IPTraf
http://cebu.mozcom.com/riker/iptraf/
IPCHAINS
http://www.linuxdoc.org/HOWTO/IPCHAINS-HOWTO.html
Nylon socks proxy server
http://mesh.eecs.umich.edu/projects/nylon/
Indonesian translation of this HOWTO by Rahmat Rafiudin (mjl_id@yahoo.com)
http://raf.unisba.ac.id/resources/BandwidthLimitingHOWTO/index.html
</sect1 id="Bandwidth-Limiting">
<sect1 id="Compressed-TCP">
<title>Compressed-TCP</title>
<para>
In the past, we used to compress files in order to save disk space.
Today, disk space is cheap - but bandwidth is limited. By compressing
data streams such as TCP/IP-Sessions using SSH-like tools, you achieve
two goals:
</para>
1) You save bandwidth/transferred volume (important if you have
to pay for traffic or if your network is loaded).
2) You speed up low-bandwidth connections (Modem, GSM, ISDN).
<para>
This HowTo explains how to save both bandwidth and connection time by
using tools like SSH1, SSH2, OpenSSH or LSH.
</para>
2. Compressing HTTP/FTP,...
<para>
My office is connected with a 64KBit ISDN line to the internet, so the
maximum transfer rate is about 7K/s. You can speed up the connection
by compressing it: when I download files, Netscape shows a transfer
rate of up to 40K/s (logfiles are compressible by a factor of 15). SSH
is a tool that is mainly designed to build up secure connections over
unsecured networks. Furthermore, SSH is able to compress connections
and to do port forwarding (like rinetd or redir). So it is the
appropriate tool to compress any simple TCP/IP connection. "Simple"
means that only one TCP connection is opened. An FTP connection or
the connection between MS-Outlook and MS-Exchange is not simple, as
several connections are established. SSH uses the Lempel-Ziv (LZ77)
compression algorithm - so you will achieve the same high compression
rate as winzip/pkzip. In order to compress all HTTP-connections from
my intranet to the internet, I just have to execute one command on my
dial-in machine:
</para>
<para>
<screen>
ssh -l <login ID> <hostname> -C -L8080:<proxy_at_ISP>:80 -f sleep 10000
</screen>
</para>
<para>
<screen>
<hostname> = host that is located at my ISP. SSH-access is required.
<login ID> = my login-ID on <hostname>
<proxy_at_ISP> = the web proxy of my ISP
</screen>
</para>
<para>
My browser is configured to use localhost:8080 as proxy. My laptop
connects to the same socket. The connection is compressed and
forwarded to the real proxy by SSH. The infrastructure looks like:
</para>
<para>
<screen>
64KBit ISDN
My PC--------------------------------A PC (Unix/Linux/Win-NT) at my ISP
SSH-Client compressed SSH-Server, Port 22
Port 8080 |
| |
| |
| |
|10MBit Ethernet |100MBit
|not compressed |not compressed
| |
| |
My second PC ISP's WWW-proxy
with Netscape,... Port 80
(Laptop)
</screen>
</para>
3. Compressing Email
3.1. Incoming Emails (POP3, IMAP4)
<para>
Most people fetch their email from the mailserver via POP3. POP3 is a
protocol with many disadvantages:
</para>
1. POP3 transfers passwords in clear text. (There are SSL
implementations of POP/IMAP and a challenge/response
authentication, defined in RFC-2095/2195).
2. POP3 causes much protocol overhead: first the client requests a
message, then the server sends the message. After that the client
requests the transferred article to be deleted. The server confirms
the deletion. After that the server is ready for the next
transaction. So 4 transactions are needed for each email.
3. POP3 transfers mail without compression, although email is
highly compressible (factor=3.5).
<para>
You could compress POP3 by forwarding localhost:110 through a
compressed connection to your ISP's POP3-socket. After that you have
to tell your mail client to connect to localhost:110 in order to
download mail. That secures and speeds up the connection -- but the
download time still suffers from the POP3-inherent protocol overhead.
</para>
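<para>
Such a forwarding can be set up in the same way as the HTTP example
above (a sketch; an unprivileged local port such as 1110 is used so
that root access is not needed, and the mail client is then pointed at
localhost port 1110):
</para>
<para>
<screen>
ssh -C -l loginid mailserver -L1110:mailserver:110 -f sleep 10000
</screen>
</para>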
<para>
It makes sense to substitute POP3 by a more efficient protocol. The
idea is to download the entire mailbox at once without generating
protocol overhead. Furthermore it makes sense to compress the
connections. The appropriate tool which offers both features is SCP.
You can download your mail-file like this:
</para>
<para>
<screen>
scp -C loginid@mailserver:/var/spool/mail/loginid /tmp/newmail
</screen>
</para>
<para>
But there is a problem: what happens if a new email arrives at the
server during the download of your mailbox? The new mail would be
lost. Therefore it makes more sense to use the following commands:
</para>
<para>
<screen>
ssh -l loginid mailserver "mv /var/spool/mail/loginid /tmp/loginid_fetchme"
scp -C loginid@mailserver:/tmp/loginid_fetchme /tmp/newmail
</screen>
</para>
<para>
A move (mv) is an atomic operation, so you won't get into trouble if
you receive new mail during the execution of the commands. But if the
mail server directories /tmp/ and /var/spool/mail are not on the same
disc you might get problems. A solution is to create a lockfile on the
server before you execute the mv: touch /var/spool/mail/loginid.lock.
You should remove it afterwards. A better solution is to move the
file loginid within the same directory:
</para>
<para>
<screen>
ssh -l loginid mailserver "mv /var/spool/mail/loginid /var/spool/mail/loginid_fetchme"
</screen>
</para>
<para>
After that you can use formail instead of procmail in order to filter
/tmp/newmail into the right folder(s):
</para>
<para>
<screen>
formail -s procmail < /tmp/newmail
</screen>
</para>
3.2. Outgoing Email (SMTP)
<para>
You can send email over compressed and encrypted SSH-connections, in order
to:
</para>
* Save network traffic
* Secure the connection (this does not make sense if the mail is
transported over untrusted networks later.)
* Authenticate the sender. Many mail servers deny mail relaying in
order to prevent abuse. If you send an email over an SSH-
connection, the remote mail server (i.e. sendmail or MS-Exchange)
thinks it is connected locally.
<para>
If you have SSH-access on the mail server, you need the following
command:
</para>
<para>
<screen>
ssh -C -l loginid mailserver -L2525:mailserver:25
</screen>
</para>
<para>
If you don't have SSH-access on the mail server but to a server that
is allowed to use your mail server as relay, the command is:
</para>
<para>
<screen>
ssh -C -l loginid other_server -L2525:mailserver:25
</screen>
</para>
<para>
After that you can configure your mail client (or mail server: see
"smarthost") to send out mails to localhost port 2525.
</para>
4. Thoughts about performance.
<para>
Of course compression/encryption takes CPU time. It turned out that an
old Pentium-133 is able to encrypt and compress about 1GB/hour --
that's quite a lot. If you compile SSH with the option "--with-none"
you can tell SSH to use no encryption. That saves a little CPU time.
Here is a comparison between several download methods
(during the test, a noncompressed 6MB file was transferred from a
133MHz Pentium-1 to a 233MHz Pentium-2 laptop over a 10MBit ethernet
without other load).
</para>
<para>
<screen>
+-------------------+--------+----------+-----------+----------------------+
| | FTP |encrypted |compressed |compressed & encrypted|
+-------------------+--------+----------+-----------+----------------------+
| Elapsed Time | 17.6s | 26s | 9s | 23s |
+-------------------+--------+----------+-----------+----------------------+
| Throughput | 790K/s | 232K/s | 320K/s | 264K/s |
+-------------------+--------+----------+-----------+----------------------+
|Compression Factor | 1 | 1 | 3.8 | 3.8 |
+-------------------+--------+----------+-----------+----------------------+
</screen>
</para>
</sect1 id="Compressed-TCP">
<sect1 id="IP-Accounting">
<title>IP-Accounting</title>
<para>
This option of the Linux kernel keeps track of IP network traffic,
performs packet logging and produces some statistics. A series of
rules may be defined so when a packet matches a given pattern, some
action is performed: a counter is increased, it is accepted/rejected,
etc.
</para>
<para>
6.3. IP Accounting (for Linux-2.0)
The IP accounting features of the Linux kernel allow you to collect
and analyze some network usage data. The data collected comprises the
number of packets and the number of bytes accumulated since the
figures were last reset. You may specify a variety of rules to
categorize the figures to suit whatever purpose you may have. This
option has been removed in kernel 2.1.102, because the old ipfwadm-
based firewalling was replaced by ``ipfwchains''.
</para>
<para>
<screen>
Kernel Compile Options:
Networking options --->
[*] IP: accounting
</screen>
</para>
<para>
After you have compiled and installed the kernel you need to use the
ipfwadm command to configure IP accounting. There are many different
ways of breaking down the accounting information that you might
choose. I've picked a simple example of what might be useful;
you should read the ipfwadm man page for more information.
Scenario: You have an ethernet network that is linked to the internet
via a PPP link. On the ethernet you have a machine that offers a
number of services, and you are interested in knowing how much
traffic is generated by ftp and by world wide web access, as
well as total tcp and udp traffic.
</para>
<para>
You might use a command set that looks like the following, which is
shown as a shell script:
</para>
<para>
<screen>
#!/bin/sh
#
# Flush the accounting rules
ipfwadm -A -f
#
# Set shortcuts
localnet=44.136.8.96/29
any=0/0
# Add rules for local ethernet segment
ipfwadm -A in -a -P tcp -D $localnet ftp-data
ipfwadm -A out -a -P tcp -S $localnet ftp-data
ipfwadm -A in -a -P tcp -D $localnet www
ipfwadm -A out -a -P tcp -S $localnet www
ipfwadm -A in -a -P tcp -D $localnet
ipfwadm -A out -a -P tcp -S $localnet
ipfwadm -A in -a -P udp -D $localnet
ipfwadm -A out -a -P udp -S $localnet
#
# Rules for default
ipfwadm -A in -a -P tcp -D $any ftp-data
ipfwadm -A out -a -P tcp -S $any ftp-data
ipfwadm -A in -a -P tcp -D $any www
ipfwadm -A out -a -P tcp -S $any www
ipfwadm -A in -a -P tcp -D $any
ipfwadm -A out -a -P tcp -S $any
ipfwadm -A in -a -P udp -D $any
ipfwadm -A out -a -P udp -S $any
#
# List the rules
ipfwadm -A -l -n
#
</screen>
</para>
<para>
The names ``ftp-data'' and ``www'' refer to lines in /etc/services.
The last command lists each of the Accounting rules and displays the
collected totals.
</para>
<para>
An important point to note when analyzing IP accounting is that the
totals for all rules that match will be incremented, so to obtain
differential figures you need to perform appropriate maths. For
example, if I wanted to know how much data was neither ftp nor www I
would subtract the individual totals from the rule that matches all
ports.
</para>
<para>
<screen>
root# ipfwadm -A -l -n
IP accounting rules
pkts bytes dir prot source destination ports
0 0 in tcp 0.0.0.0/0 44.136.8.96/29 * -> 20
0 0 out tcp 44.136.8.96/29 0.0.0.0/0 20 -> *
10 1166 in tcp 0.0.0.0/0 44.136.8.96/29 * -> 80
10 572 out tcp 44.136.8.96/29 0.0.0.0/0 80 -> *
252 10943 in tcp 0.0.0.0/0 44.136.8.96/29 * -> *
231 18831 out tcp 44.136.8.96/29 0.0.0.0/0 * -> *
0 0 in udp 0.0.0.0/0 44.136.8.96/29 * -> *
0 0 out udp 44.136.8.96/29 0.0.0.0/0 * -> *
0 0 in tcp 0.0.0.0/0 0.0.0.0/0 * -> 20
0 0 out tcp 0.0.0.0/0 0.0.0.0/0 20 -> *
10 1166 in tcp 0.0.0.0/0 0.0.0.0/0 * -> 80
10 572 out tcp 0.0.0.0/0 0.0.0.0/0 80 -> *
253 10983 in tcp 0.0.0.0/0 0.0.0.0/0 * -> *
231 18831 out tcp 0.0.0.0/0 0.0.0.0/0 * -> *
0 0 in udp 0.0.0.0/0 0.0.0.0/0 * -> *
0 0 out udp 0.0.0.0/0 0.0.0.0/0 * -> *
</screen>
</para>
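<para>
For example, using the listing above: the rule matching all incoming
tcp traffic counted 253 packets (10983 bytes), of which 10 packets
(1166 bytes) were www and none were ftp-data, so the incoming tcp
traffic that was neither www nor ftp-data amounts to 253 - 10 = 243
packets (10983 - 1166 = 9817 bytes).
</para>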
<para>
6.4. IP Accounting (for Linux-2.2)
The new accounting code is accessed via ``IP Firewall Chains''. See
the IP chains home page for more information. Among other things,
you'll now need to use ipchains instead of ipfwadm to configure your
filters. (From Documentation/Changes in the latest kernel sources).
</para>
</sect1 id="IP-Accounting">
<sect1 id="IP-Aliasing">
<title>IP-Aliasing</title>
<para>
This is a cookbook recipe on how to set up and run IP aliasing on a Linux box
and how to set up the machine to receive e-mail on the aliased IP addresses.
</para>
<para>
This feature of the Linux kernel provides the possibility of setting
multiple network addresses on the same low-level network device driver
(e.g two IP addresses in one Ethernet card). It is typically used for
services that act differently based on the address they listen on
(e.g. "multihosting" or "virtual domains" or "virtual hosting
services".
</para>
<para>
There are some applications where being able to configure multiple IP
addresses to a single network device is useful. Internet Service
Providers often use this facility to provide `customized' World Wide
Web and ftp offerings to their customers. You can refer to
the ``IP-Alias mini-HOWTO'' for more information than you find here.
</para>
<para>
Quickstart:
</para>
<para>
After compiling and installing your kernel with IP_Alias support,
configuration is very simple. The aliases are added to virtual network
devices associated with the actual network device. A simple naming
convention applies to these devices being <devname>:<virtual dev num>,
e.g. eth0:0, ppp0:10 etc. Note that the ifname:number device can
only be configured after the main interface has been set up.
</para>
<para>
For example, assume you have an ethernet network that supports two
different IP subnetworks simultaneously and you wish your machine to
have direct access to both; you could use something like:
</para>
<para>
<screen>
root# ifconfig eth0 192.168.1.1 netmask 255.255.255.0 up
root# route add -net 192.168.1.0 netmask 255.255.255.0 eth0
root# ifconfig eth0:0 192.168.10.1 netmask 255.255.255.0 up
root# route add -net 192.168.10.0 netmask 255.255.255.0 eth0:0
</screen>
</para>
-----------------------------------------------------------------------------
<para>
1. My Setup
</para>
<para>
* IP Alias is standard in kernels 2.0.x and 2.2.x, and available as a
compile-time option in 2.4.x (IP Alias has been deprecated in 2.4.x and
replaced by a more powerful firewalling mechanism.)
* IP Alias compiled as a loadable module. You would have indicated in the
"make config" command to make your kernel, that you want IP Alias to
be compiled as a (M)odule. Check the Modules HOW-TO (if that exists) or
check the info in /usr/src/linux/Documentation/modules.txt.
* I have to support 2 additional IPs over and above the IP already
allocated to me.
* A D-Link DE620 pocket adapter (not important, works with any Linux
supported network adapter).
</para>
<para>
<screen>
Kernel Compile Options:
Networking options --->
....
[*] Network aliasing
....
<*> IP: aliasing support
</screen>
</para>
-----------------------------------------------------------------------------
<para>
2. Commands
</para>
<para>
1. Load the IP Alias module (you can skip this step if you compiled the
module into the kernel):
</para>
<para>
<screen>
/sbin/insmod /lib/modules/`uname -r`/ipv4/ip_alias.o
</screen>
</para>
<para>
2. Setup the loopback, eth0, and all the IP addresses beginning with the
main IP address for the eth0 interface:
</para>
<para>
<screen>
/sbin/ifconfig lo 127.0.0.1
/sbin/ifconfig eth0 up
/sbin/ifconfig eth0 172.16.3.1
/sbin/ifconfig eth0:0 172.16.3.10
/sbin/ifconfig eth0:1 172.16.3.100
</screen>
</para>
<para>
172.16.3.1 is the main IP address, while .10 and .100 are the aliases.
The magic is the eth0:x where x=0,1,2,...n for the different IP
addresses. The main IP address does not need to be aliased.
</para>
<para>
3. Setup the routes. First route the loopback, then the net, and finally,
the various IP addresses starting with the default (originally allocated)
one:
</para>
<para>
<screen>
/sbin/route add -net 127.0.0.0
/sbin/route add -net 172.16.3.0 dev eth0
/sbin/route add -host 172.16.3.1 dev eth0
/sbin/route add -host 172.16.3.10 dev eth0:0
/sbin/route add -host 172.16.3.100 dev eth0:1
/sbin/route add default gw 172.16.3.200
</screen>
</para>
<para>
That's it.
</para>
<para>
In the example IP address above, I am using the Private IP addresses (RFC
1918) for illustrative purposes. Substitute them with your own official or
private IP addresses.
</para>
<para>
The example shows only 3 IP addresses. The max is defined to be 256 in
/usr/include/linux/net_alias.h. 256 IP addresses on ONE card is a lot :-)!
</para>
<para>
Here's what my /sbin/ifconfig looks like:
</para>
<para>
<screen>
lo Link encap:Local Loopback
inet addr:127.0.0.1 Bcast:127.255.255.255 Mask:255.0.0.0
UP BROADCAST LOOPBACK RUNNING MTU:3584 Metric:1
RX packets:5088 errors:0 dropped:0 overruns:0
TX packets:5088 errors:0 dropped:0 overruns:0
eth0 Link encap:10Mbps Ethernet HWaddr 00:8E:B8:83:19:20
inet addr:172.16.3.1 Bcast:172.16.3.255 Mask:255.255.255.0
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:334036 errors:0 dropped:0 overruns:0
TX packets:11605 errors:0 dropped:0 overruns:0
Interrupt:7 Base address:0x378
eth0:0 Link encap:10Mbps Ethernet HWaddr 00:8E:B8:83:19:20
inet addr:172.16.3.10 Bcast:172.16.3.255 Mask:255.255.255.0
UP BROADCAST RUNNING MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0
TX packets:0 errors:0 dropped:0 overruns:0
eth0:1 Link encap:10Mbps Ethernet HWaddr 00:8E:B8:83:19:20
inet addr:172.16.3.100 Bcast:172.16.3.255 Mask:255.255.255.0
UP BROADCAST RUNNING MTU:1500 Metric:1
RX packets:1 errors:0 dropped:0 overruns:0
TX packets:0 errors:0 dropped:0 overruns:0
</screen>
</para>
<para>
And /proc/net/aliases:
</para>
<para>
<screen>
device family address
eth0:0 2 172.16.3.10
eth0:1 2 172.16.3.100
</screen>
</para>
<para>
And /proc/net/alias_types:
</para>
<para>
<screen>
type name n_attach
2 ip 2
</screen>
</para>
<para>
Of course, the stuff in /proc/net was created by the ifconfig command and not
by hand!
</para>
-----------------------------------------------------------------------------
<para>
3. Troubleshooting: Questions and Answers
</para>
<para>
3.1. Question: How can I keep the settings through a reboot?
</para>
<para>
Answer: Whether you are using BSD-style or SysV-style init (Red Hat, for
example), you can always include the commands in /etc/rc.d/rc.local.
Here's what I have on my SysV init system (Red Hat 3.0.3 and 4.0):
</para>
<para>
My /etc/rc.d/rc.local: (edited to show the relevant portions)
</para>
<para>
<screen>
#setting up IP alias interfaces
echo "Setting 172.16.3.1, 172.16.3.10, 172.16.3.100 IP Aliases ..."
/sbin/ifconfig lo 127.0.0.1
/sbin/ifconfig eth0 up
/sbin/ifconfig eth0 172.16.3.1
/sbin/ifconfig eth0:0 172.16.3.10
/sbin/ifconfig eth0:1 172.16.3.100
#setting up the routes
echo "Setting IP routes ..."
/sbin/route add -net 127.0.0.0
/sbin/route add -net 172.16.3.0 dev eth0
/sbin/route add -host 172.16.3.1 dev eth0
/sbin/route add -host 172.16.3.10 dev eth0:0
/sbin/route add -host 172.16.3.100 dev eth0:1
/sbin/route add default gw 172.16.3.200
#
</screen>
</para>
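<para>
On Red Hat style systems there is also a tidier alternative to rc.local:
give each alias its own interface configuration file under
/etc/sysconfig/network-scripts, and the normal network startup will bring
it up. This is a sketch only; the exact file layout and variable names
depend on your release:
</para>
<para>
<screen>
# /etc/sysconfig/network-scripts/ifcfg-eth0:0 (illustrative example)
DEVICE=eth0:0
IPADDR=172.16.3.10
NETMASK=255.255.255.0
ONBOOT=yes
</screen>
</para>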
-----------------------------------------------------------------------------
<para>
3.2. Question: How do I set up the IP aliased machine to receive e-mail on
the various aliased IP addresses (on a machine using sendmail)?
</para>
<para>
Answer: Create (if it doesn't already exist) a file called, for example,
/etc/mynames.cw. The file does not have to have this exact name, nor does
it have to be in the /etc directory.
</para>
<para>
In that file, place the official domain names of the aliased IP addresses. If
these aliased IP addresses do not have a domain name, then you can place the
IP address itself.
</para>
<para>
The /etc/mynames.cw might look like this:
</para>
<para>
<screen>
# /etc/mynames.cw - include all aliases for your machine here; # is a comment
domain.one.net
domain.two.com
domain.three.org
4.5.6.7
</screen>
</para>
<para>
In your sendmail.cf file, where it defines a file class macro Fw, add the
following:
</para>
<para>
<screen>
##################
# local info #
##################
# file containing names of hosts for which we receive email
Fw/etc/mynames.cw
</screen>
</para>
<para>
That should do it. Test out the new setting by invoking sendmail in test
mode. The following is an example:
</para>
<para>
<screen>
ganymede$ /usr/lib/sendmail -bt
ADDRESS TEST MODE (ruleset 3 NOT automatically invoked)
Enter < ruleset> < address>
> 0 me@4.5.6.7
rewrite: ruleset 0 input: me @ 4 . 5 . 6 . 7
rewrite: ruleset 98 input: me @ 4 . 5 . 6 . 7
rewrite: ruleset 98 returns: me @ 4 . 5 . 6 . 7
rewrite: ruleset 97 input: me @ 4 . 5 . 6 . 7
rewrite: ruleset 3 input: me @ 4 . 5 . 6 . 7
rewrite: ruleset 96 input: me < @ 4 . 5 . 6 . 7 >
rewrite: ruleset 96 returns: me < @ 4 . 5 . 6 . 7 . >
rewrite: ruleset 3 returns: me < @ 4 . 5 . 6 . 7 . >
rewrite: ruleset 0 input: me < @ 4 . 5 . 6 . 7 . >
rewrite: ruleset 98 input: me < @ 4 . 5 . 6 . 7 . >
rewrite: ruleset 98 returns: me < @ 4 . 5 . 6 . 7 . >
rewrite: ruleset 0 returns: $# local $: me
rewrite: ruleset 97 returns: $# local $: me
rewrite: ruleset 0 returns: $# local $: me
> 0 me@4.5.6.8
rewrite: ruleset 0 input: me @ 4 . 5 . 6 . 8
rewrite: ruleset 98 input: me @ 4 . 5 . 6 . 8
rewrite: ruleset 98 returns: me @ 4 . 5 . 6 . 8
rewrite: ruleset 97 input: me @ 4 . 5 . 6 . 8
rewrite: ruleset 3 input: me @ 4 . 5 . 6 . 8
rewrite: ruleset 96 input: me < @ 4 . 5 . 6 . 8 >
rewrite: ruleset 96 returns: me < @ 4 . 5 . 6 . 8 >
rewrite: ruleset 3 returns: me < @ 4 . 5 . 6 . 8 >
rewrite: ruleset 0 input: me < @ 4 . 5 . 6 . 8 >
rewrite: ruleset 98 input: me < @ 4 . 5 . 6 . 8 >
rewrite: ruleset 98 returns: me < @ 4 . 5 . 6 . 8 >
rewrite: ruleset 95 input: < > me < @ 4 . 5 . 6 . 8 >
rewrite: ruleset 95 returns: me < @ 4 . 5 . 6 . 8 >
rewrite: ruleset 0 returns: $# smtp $@ 4 . 5 . 6 . 8 $: me < @ 4 . 5 . 6 . 8 >
rewrite: ruleset 97 returns: $# smtp $@ 4 . 5 . 6 . 8 $: me < @ 4 . 5 . 6 . 8 >
rewrite: ruleset 0 returns: $# smtp $@ 4 . 5 . 6 . 8 $: me < @ 4 . 5 . 6 . 8 >
>
</screen>
</para>
<para>
Notice when I tested me@4.5.6.7, it delivered the mail to the local machine,
while me@4.5.6.8 was handed off to the smtp mailer. That is the correct
response.
</para>
<para>
3.3. Question: How do I delete an alias?
</para>
<para>
Answer: To delete an alias, simply append a `-' to its name and refer to
it with ifconfig. It is as simple as:
</para>
<para>
<screen>
root# ifconfig eth0:0- 0
</screen>
</para>
<para>
All routes associated with that alias will also be deleted
automatically.
</para>
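<para>
To confirm that the alias is really gone, list all interfaces again;
eth0:0 should no longer appear in the output:
</para>
<para>
<screen>
/sbin/ifconfig -a
</screen>
</para>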
<para>
You are all set now.
</para>
</sect1 id="IP-Aliasing">
<sect1 id="Multicasting">
<title>Multicasting</title>
<para>
* Multicast HOWTO
</para>
<para>
A good page providing comparisons between reliable multicast protocols is
<http://www.tascnets.com/mist/doc/mcpCompare.html>.
</para>
<para>
A very good and up-to-date site, with lots of interesting links (Internet
drafts, RFCs, papers, links to other sites) is
<http://research.ivv.nasa.gov/RMP/links.html>.
</para>
<para>
<http://hill.lut.ac.uk/DS-Archive/MTP.html> is also a good source of
information on the subject.
</para>
<para>
Katia Obraczka's "Multicast Transport Protocols: A Survey and Taxonomy"
article gives short descriptions of each protocol and tries to classify
them according to different features. You can read it in the IEEE
Communications magazine, January 1998, vol. 36, No. 1.
</para>
10. References
10.1. RFCs
<para>
o RFC 1112 "Host Extensions for IP Multicasting". Steve Deering.
  August 1989.
o RFC 2236 "Internet Group Management Protocol, Version 2". W. Fenner.
  November 1997.
o RFC 1458 "Requirements for Multicast Protocols". R. Braudes and
  S. Zabele. May 1993.
o RFC 1469 "IP Multicast over Token-Ring Local Area Networks".
  T. Pusateri. June 1993.
o RFC 1390 "Transmission of IP and ARP over FDDI Networks". D. Katz.
  January 1993.
o RFC 1583 "OSPF Version 2". John Moy. March 1994.
o RFC 1584 "Multicast Extensions to OSPF". John Moy. March 1994.
o RFC 1585 "MOSPF: Analysis and Experience". John Moy. March 1994.
o RFC 1812 "Requirements for IP Version 4 Routers". Fred Baker, Editor.
  June 1995.
o RFC 2117 "Protocol Independent Multicast-Sparse Mode (PIM-SM):
  Protocol Specification". D. Estrin, D. Farinacci, A. Helmy, D. Thaler,
  S. Deering, M. Handley, V. Jacobson, C. Liu, P. Sharma, and L. Wei.
  July 1997.
o RFC 2189 "Core Based Trees (CBT version 2) Multicast Routing".
  A. Ballardie. September 1997.
o RFC 2201 "Core Based Trees (CBT) Multicast Routing Architecture".
  A. Ballardie. September 1997.
</para>
10.2. Internet Drafts
<para>
o "Introduction to IP Multicast Routing".
  draft-ietf-mboned-intro-multicast-03.txt. T. Maufer, C. Semeria.
  July 1997.
o "Administratively Scoped IP Multicast".
  draft-ietf-mboned-admin-ip-space-03.txt. D. Meyer. June 10, 1997.
</para>
10.3. Web pages
<para>
o Linux Multicast Homepage:
  <http://www.cs.virginia.edu/~mke2e/multicast.html>
o Linux Multicast FAQ:
  <http://andrew.triumf.ca/pub/linux/multicast-FAQ>
o Multicast and MBONE on Linux:
  <http://www.teksouth.com/linux/multicast/>
o Christian Daudt's MBONE-Linux Page:
  <http://www.microplex.com/~csd/linux/mbone.html>
o Reliable Multicast Links:
  <http://research.ivv.nasa.gov/RMP/links.html>
o Multicast Transport Protocols:
  <http://hill.lut.ac.uk/DS-Archive/MTP.html>
</para>
10.4. Books
<para>
o "TCP/IP Illustrated, Volume 1: The Protocols". W. Richard Stevens.
  Addison-Wesley, Reading, MA, 1994.
o "TCP/IP Illustrated, Volume 2: The Implementation". Gary Wright and
  W. Richard Stevens. Addison-Wesley, Reading, MA, 1995.
o "UNIX Network Programming, Volume 1: Networking APIs: Sockets and
  XTI". W. Richard Stevens. Second Edition, Prentice Hall, 1998.
o "Internetworking with TCP/IP, Volume 1: Principles, Protocols, and
  Architecture". Douglas E. Comer. Second Edition, Prentice Hall,
  Englewood Cliffs, New Jersey, 1991.
</para>
</sect1 id="Multicasting">
<sect1 id="Network-Management">
<title>Network-Management</title>
<para>
There is an impressive number of tools focused on network management
and remote administration under Linux. Some interesting remote administration
projects are linuxconf and webmin:
</para>
<para>
  * Webmin <http://www.webmin.com/webmin/>
  * Linuxconf <http://www.solucorp.qc.ca/linuxconf/>
</para>
<para>
Other tools include network traffic analysis tools, network security
tools, monitoring tools, configuration tools, etc. An archive of many
of these tools may be found at Metalab
<http://www.metalab.unc.edu/pub/Linux/system/network/>
</para>
9.2. SNMP
<para>
The Simple Network Management Protocol is a protocol for Internet
network management services. It allows for remote monitoring and
configuration of routers, bridges, network cards, switches, and so on.
There is a large number of libraries, clients, daemons and SNMP-based
monitoring programs available for Linux. A good page dealing with SNMP
and Linux software may be found at: http://linas.org/linux/NMS.html
</para>
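<para>
As a quick illustration of what an SNMP client session looks like, here
is a sketch using the command-line tools from the UCD/NET-SNMP package
(the host name and the "public" community string are placeholders;
substitute your own):
</para>
<para>
<screen>
# Read the system description of a remote device (SNMPv1)
snmpget -v1 -c public router.example.com system.sysDescr.0

# Walk the whole "system" subtree
snmpwalk -v1 -c public router.example.com system
</screen>
</para>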
10. Enterprise Linux Networking
<para>
In certain situations it is necessary for the networking
infrastructure to have proper mechanisms to guarantee network
availability nearly 100% of the time. Some related techniques are
described in the following sections. Most of the following material
can be found at the excellent Linas website:
http://linas.org/linux/index.html and in the Linux High-Availability
HOWTO
<http://metalab.unc.edu/pub/Linux/ALPHA/linux-ha/High-Availability-HOWTO.html>
</para>
10.1. High Availability
<para>
Redundancy is used to prevent the overall IT system from having single
points of failure. A server with only one network card or a single
SCSI disk has two single points of failure. The objective is to mask
unplanned outages from users in a manner that lets users continue to
work quickly. High availability software is a set of scripts and tools
that automatically monitor and detect failures, taking the appropriate
steps to restore normal operation and to notify system
administrators.
</para>
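<para>
At its simplest, failure detection is periodic polling of a service plus
a scripted response. The following is only a toy sketch of the idea (the
peer address and mail recipient are made up); real high availability
packages do considerably more:
</para>
<para>
<screen>
#!/bin/sh
# Toy availability monitor: ping a peer once a minute, mail on failure.
PEER=192.168.1.2        # hypothetical address of the monitored node
ADMIN=root@localhost    # where to send the alarm
while true; do
    if ! ping -c 3 $PEER >/dev/null 2>/dev/null; then
        echo "$PEER is not answering pings" | mail -s "node down" $ADMIN
        # a real HA tool would now begin takeover procedures
    fi
    sleep 60
done
</screen>
</para>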
</sect1 id="Networking-Management">
<sect1 id="Redundant-Networking">
<title>Redundant-Networking</title>
<para>
IP Address Takeover (IPAT): when a network adapter card fails, its IP
address should be taken over by a working network card in the same node
or in another node. MAC Address Takeover: when an IP takeover occurs, it
must be ensured that all the nodes in the network update their ARP
caches (the mapping between IP and MAC addresses).
</para>
<para>
See the High-Availability HOWTO for more details:
http://metalab.unc.edu/pub/Linux/ALPHA/linux-ha/High-Availability-HOWTO.html
</para>
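<para>
The mechanics of a simple IP address takeover can be sketched with the
same tools used in the IP aliasing section: the surviving node brings up
the failed node's address as an alias, then refreshes its neighbours'
ARP caches. This is an illustrative sketch only (the addresses are made
up, and the arping utility with its -U unsolicited-ARP option is assumed
to be available, e.g. from iputils):
</para>
<para>
<screen>
# On the surviving node: adopt the failed node's address as an alias
/sbin/ifconfig eth0:0 192.168.1.10 netmask 255.255.255.0 up

# Send unsolicited ARP replies so neighbours update their ARP caches
arping -U -I eth0 -c 3 192.168.1.10
</screen>
</para>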
</sect1 id="Redundant-Networking">