

2002-07-22 14:27:30 +00:00
<sect1 id="performance">
<title>Optimizing NFS Performance</title>
Getting network settings right can improve NFS performance many times
over -- a tenfold increase in transfer speeds is not unheard of.
The most important things to get right are the <userinput>rsize</userinput>
and <userinput>wsize</userinput> <command>mount</command> options. Other factors listed below
may affect people with particular hardware setups.
<sect2 id="blocksizes">
<title>Setting Block Size to Optimize Transfer Speeds</title>
The <userinput>rsize</userinput> and <userinput>wsize</userinput>
<command>mount</command> options specify the size of the chunks of data
that the client and server pass back and forth to each other. If no
<userinput>rsize</userinput> and <userinput>wsize</userinput> options
are specified, the default varies by which version of NFS we are using.
4096 bytes is the most common default, although for TCP-based mounts
in 2.2 kernels, and for all mounts beginning with 2.4 kernels, the
server specifies the default block size.
The defaults may be too big or too small. On the one hand, some
combinations of Linux kernels and network cards (largely on older
machines) cannot handle blocks that large. On the other hand, if they
can handle larger blocks, a bigger size might be faster.
So we'll want to experiment and find an rsize and wsize that works
and is as fast as possible. You can test the speed of your options
with some simple commands.
The first of these commands transfers 16384 blocks of 16k each from
the special file <filename>/dev/zero</filename> (which if you read it
just spits out zeros <emphasis>really</emphasis> fast) to the mounted partition. We will
time it to see how long it takes. So, from the client machine, type:
# time dd if=/dev/zero of=/mnt/home/testfile bs=16k count=16384
This creates a 256 MB file of zeroed bytes. In general, you should
create a file that's at least twice as large as the system RAM
on the server, but make sure you have enough disk space! Then read
back the file into the great black hole on the client machine
(<filename>/dev/null</filename>) by typing the following:
# time dd if=/mnt/home/testfile of=/dev/null bs=16k
Repeat this a few times and average how long it takes. Be sure to
unmount and remount the filesystem each time (both on the client and,
if you are zealous, locally on the server as well), which should clear
out any caches.
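The read test above can be wrapped in a few helper functions so that repeated runs are easier to average. This is only a sketch: it assumes the mount point <filename>/mnt/home</filename> from the commands above has an entry in <filename>/etc/fstab</filename> (so that a bare <command>mount</command> works), it measures whole seconds only, and the names <userinput>elapsed</userinput>, <userinput>average</userinput> and <userinput>read_pass</userinput> are made up for illustration.

```shell
#!/bin/sh
# Sketch only: MNT is an assumed mount point with an /etc/fstab entry.
MNT=${MNT:-/mnt/home}

# Run a command and print how many whole seconds it took.
elapsed() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Print the mean of the numbers given as arguments.
average() {
    printf '%s\n' "$@" | awk '{ s += $1 } END { if (NR) printf "%.2f\n", s / NR }'
}

# Remount to clear client caches, then time one read of the test file.
read_pass() {
    umount "$MNT" && mount "$MNT" || return 1
    elapsed dd if="$MNT/testfile" of=/dev/null bs=16k
}
```

Run as root, <userinput>average $(read_pass) $(read_pass) $(read_pass)</userinput> would then print the mean time of three read passes in seconds.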
Then unmount, and mount again with a larger and smaller block size.
They should probably be multiples of 1024, and not larger than
8192 bytes since that's the maximum size in NFS version 2. (Though
if you are using Version 3 you might want to try up to 32768.)
Wisdom has it that the block size should be a power of two since most
of the parameters that would constrain it (such as file system block
sizes and network packet size) are also powers of two. However, some
users have reported better successes with block sizes that are not
powers of two but are still multiples of the file system block size
and the network packet size.
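To step through candidate sizes systematically, something like the following sketch can generate the mount commands to try. It sticks to powers of two between 1024 and a chosen maximum. The export <userinput>server:/home</userinput> used in the usage note is a placeholder, and the function names are invented for this example.

```shell
# Print candidate block sizes: powers of two from 1024 up to a maximum
# (default 8192, the NFS version 2 limit mentioned above).
candidate_sizes() {
    max=${1:-8192}
    size=1024
    while [ "$size" -le "$max" ]; do
        echo "$size"
        size=$((size * 2))
    done
}

# Print (not run) the mount command for one size.
# Arguments: size, export (server:/path), mount point.
mount_cmd() {
    echo "mount -t nfs -o rsize=$1,wsize=$1 $2 $3"
}
```

For example, <userinput>for s in $(candidate_sizes 32768); do mount_cmd "$s" server:/home /mnt/home; done</userinput> lists the commands for an NFS version 3 mount, one per size, so each can be tried and timed in turn.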
Directly after mounting with a larger size, <command>cd</command> into the mounted
file system and explore it a bit with commands like <command>ls</command> to make
sure everything is as it should be. If the rsize/wsize is too large,
the symptoms are very odd and not 100% obvious. Typical symptoms
are incomplete file lists when doing <command>ls</command>, or reads of files
that fail mysteriously, in both cases with no error messages. After
establishing that the given rsize/wsize works, you can do the speed
tests again. Different server platforms are likely to have different
optimal sizes. SunOS and Solaris are reputedly a lot faster with 4096
byte blocks than with anything else.
<emphasis>Remember to edit <filename>/etc/fstab</filename> to reflect the rsize/wsize you found.</emphasis>
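An <filename>/etc/fstab</filename> entry carrying the chosen sizes might be produced as in the sketch below. The function name <userinput>fstab_entry</userinput> is hypothetical, and the <userinput>rw</userinput> option and the trailing <userinput>0 0</userinput> fields are assumptions for a typical NFS mount, not recommendations.

```shell
# Print a tab-separated fstab line for an NFS mount.
# Arguments: export (server:/path), mount point, block size in bytes.
fstab_entry() {
    printf '%s\t%s\tnfs\trw,rsize=%s,wsize=%s\t0 0\n' "$1" "$2" "$3" "$3"
}
```

Calling <userinput>fstab_entry server:/home /mnt/home 8192</userinput> (placeholder export, mount point and size) prints a line that can be reviewed and then pasted into <filename>/etc/fstab</filename> by hand.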
<sect2 id="packet-and-network">
<title>Packet Size and Network Drivers</title>
There are many shoddy network drivers available for Linux,
including for some fairly standard cards.
Try pinging back and forth between the two machines with large
packets using the <option>-f</option> and <option>-s</option>
options with <command>ping</command> (see <command>man ping</command>
for more details) and see if a lot of packets get dropped or if they
take a long time for a reply. If so, you may have a problem
with the performance of your network card.
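To compare runs, it can help to pull the loss figure out of the summary that <command>ping</command> prints. The sketch below assumes the usual Linux summary line containing "N% packet loss"; the function name <userinput>loss_percent</userinput> is made up.

```shell
# Read ping output on stdin and print the packet-loss percentage.
loss_percent() {
    sed -n 's/.* \([0-9.][0-9.]*\)% packet loss.*/\1/p'
}
```

A possible invocation is <userinput>ping -c 100 -s 8192 server | loss_percent</userinput>, where <userinput>server</userinput>, the count and the packet size are placeholders. Adding <option>-f</option> for a flood ping normally requires root.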
To correct such a problem, you may wish to reconfigure the packet
size that your network card uses. Very often there is a constraint
somewhere else in the network (such as a router) that causes a
smaller maximum packet size between two machines than what the
network cards on the machines are actually capable of. TCP should
autodiscover the appropriate packet size for a network, but UDP
will simply stay at a default value. So determining the appropriate
packet size is especially important if you are using NFS over UDP.
You can test for the network packet size using the tracepath command:
From the client machine, just type <command>tracepath [server] 2049</command>
and the path MTU should be reported at the bottom. You can then set the
MTU on your network card equal to the path MTU, by using the MTU option
to <command>ifconfig</command>, and see if fewer packets get dropped.
See the <command>ifconfig</command> man pages for details on how to reset the MTU.
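The path MTU can also be extracted from the <command>tracepath</command> output mechanically. This sketch assumes the final line has the form "Resume: pmtu N hops ..." as printed by the iputils version of <command>tracepath</command>; the function name <userinput>path_mtu</userinput> is invented here.

```shell
# Read tracepath output on stdin and print the reported path MTU.
path_mtu() {
    sed -n 's/.*Resume: pmtu \([0-9][0-9]*\).*/\1/p'
}
```

With the result in hand, a command such as <userinput>ifconfig eth0 mtu 1492</userinput> sets the interface MTU; both <userinput>eth0</userinput> and 1492 are placeholders for your own interface and measured value.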
<sect2 id="nfsd-instance">
<title>Number of Instances of NFSD</title>
Most startup scripts, Linux and otherwise, start 8 instances of nfsd.
In the early days of NFS, Sun decided on this number as a rule of thumb,
and everyone else copied. There are no good measures of how many
instances are optimal, but a more heavily-trafficked server may require
more. If you are using a 2.4 or higher kernel and you want to see how
heavily each nfsd thread is being used, you can look at the file
<filename>/proc/net/rpc/nfsd</filename>. The last ten numbers on the
<emphasis>th</emphasis> line in that file form a histogram: each one gives the
number of seconds that thread usage spent in the corresponding tenth
(0-10%, 10-20%, and so on up to 90-100%) of the maximum allowable.
If you have a large number in the top three deciles, you may wish to
increase the number of <command>nfsd</command> instances. This is done
upon starting <command>nfsd</command> using the number of instances as
the command line option. See the <command>nfsd</command> man page for
more information.
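Adding up the top three deciles by hand is tedious, so a small helper along these lines may be useful. It is a sketch that assumes the <emphasis>th</emphasis> line ends with the ten histogram values described above; the name <userinput>busy_seconds</userinput> is hypothetical.

```shell
# Sum the top three deciles (the last three fields) of the "th" line.
# Optional argument: file to read instead of /proc/net/rpc/nfsd.
busy_seconds() {
    awk '$1 == "th" { printf "%.3f\n", $(NF-2) + $(NF-1) + $NF }' "${1:-/proc/net/rpc/nfsd}"
}
```

If the figure keeps growing under load, starting the server with more threads, for example <userinput>rpc.nfsd 16</userinput> (16 being an arbitrary illustration), is the adjustment described above.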
<sect2 id="memlimits">
<title>Memory Limits on the Input Queue</title>
On 2.2 and 2.4 kernels, the socket input queue, where requests
sit while they are being processed, has a small default
size limit of 64k. This means that if you are running 8 instances of
<command>nfsd</command>, each will only have 8k to store requests while it processes
them. You should consider increasing this number to at least 256k for <command>nfsd</command>.
This limit is set in the proc file system using the files
<filename>/proc/sys/net/core/rmem_default</filename> and <filename>/proc/sys/net/core/rmem_max</filename>.
It can be increased in three steps; the following method is a bit of
a hack but should work and should not cause any problems:
<orderedlist Numeration="loweralpha">
<listitem><para>Increase the size listed in the files:
echo 262144 > /proc/sys/net/core/rmem_default
echo 262144 > /proc/sys/net/core/rmem_max
</para></listitem>
<listitem><para>Restart <command>nfsd</command>, e.g., type <command>/etc/rc.d/init.d/nfsd restart</command> on Red Hat.
</para></listitem>
<listitem><para>Return the size limits to their normal size in case other kernel systems depend on them:
echo 65536 > /proc/sys/net/core/rmem_default
echo 65536 > /proc/sys/net/core/rmem_max
</para></listitem>
</orderedlist>
Be sure to perform this last step because machines have been reported
to crash if these values are left changed for long periods of time.
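Because forgetting the last step is risky, the three steps can be tied together in one function so that the restore always follows the restart. This is a sketch under the same assumptions as the text (Red Hat init script path, 64k normal limit); the names <userinput>RMEM_DIR</userinput>, <userinput>set_rmem</userinput> and <userinput>bump_nfsd_queue</userinput> are made up.

```shell
# Directory holding the limits; overridable so the sketch can be tried safely.
RMEM_DIR=${RMEM_DIR:-/proc/sys/net/core}

# Write one value to both rmem_default and rmem_max.
set_rmem() {
    echo "$1" > "$RMEM_DIR/rmem_default" &&
    echo "$1" > "$RMEM_DIR/rmem_max"
}

# Raise the limits, restart nfsd so it picks them up, then restore them.
bump_nfsd_queue() {
    set_rmem 262144 || return 1
    /etc/rc.d/init.d/nfsd restart    # Red Hat path from the text above
    set_rmem 65536
}
```

Run <userinput>bump_nfsd_queue</userinput> as root. On other distributions the restart command will differ and needs to be changed by hand.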
<sect2 id="frag-overflow">
<title>Overflow of Fragmented Packets</title>
The NFS protocol uses fragmented UDP packets. The kernel has
a limit of how many fragments of incomplete packets it can
buffer before it starts throwing away packets. With 2.2 kernels
that support the <filename>/proc</filename> filesystem, you can
specify how many by editing the files
<filename>/proc/sys/net/ipv4/ipfrag_high_thresh</filename> and
<filename>/proc/sys/net/ipv4/ipfrag_low_thresh</filename>.
Once the number of unprocessed, fragmented packets reaches the
number specified by <userinput>ipfrag_high_thresh</userinput> (in bytes), the kernel
will simply start throwing away fragmented packets until the number
of incomplete packets reaches the number specified
by <userinput>ipfrag_low_thresh</userinput>. (With 2.2 kernels, the default is usually 256K.)
This will look like packet loss, and if the high threshold is
reached your server performance drops a lot.
One way to monitor this is to look at the field IP: ReasmFails in the
file <filename>/proc/net/snmp</filename>; if it goes up too quickly during heavy file
activity, you may have a problem. Good alternative values for
<userinput>ipfrag_high_thresh</userinput> and <userinput>ipfrag_low_thresh</userinput>
have not been reported; if you have a good experience with a
particular value, please let the maintainers and development team know.
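Reading the counter by eye is awkward because <filename>/proc/net/snmp</filename> puts the field names on one line and the values on the next. The sketch below looks up the column by name; it assumes that two-line "Ip:" layout, and the function name <userinput>reasm_fails</userinput> is invented.

```shell
# Print the current Ip: ReasmFails counter.
# Optional argument: file to read instead of /proc/net/snmp.
reasm_fails() {
    awk '
        # First Ip: line holds the field names; remember the column.
        $1 == "Ip:" && !seen { for (i = 2; i <= NF; i++) if ($i == "ReasmFails") col = i; seen = 1; next }
        # Second Ip: line holds the values.
        $1 == "Ip:" && col   { print $col; exit }
    ' "${1:-/proc/net/snmp}"
}
```

Calling it before and after a period of heavy file activity and comparing the two numbers shows how fast the counter is rising.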
<sect2 id="autonegotiation">
<title>Turning Off Autonegotiation of NICs and Hubs</title>
Sometimes network cards will auto-negotiate badly with
hubs and switches and this can have strange effects.
Moreover, hubs may lose packets if they have different
ports running at different speeds. Try playing around
with the network speed and duplex settings.
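One way to fix the speed and duplex settings is with <command>ethtool</command>, assuming it is installed and the card's driver supports it (older setups may need <command>mii-tool</command> or driver module options instead). The wrapper name <userinput>force_link</userinput> is hypothetical.

```shell
# Turn autonegotiation off and pin speed (Mbit/s) and duplex on one interface.
# Arguments: interface, speed, duplex (half or full).
force_link() {
    ethtool -s "$1" speed "$2" duplex "$3" autoneg off
}
```

For example, <userinput>force_link eth0 100 full</userinput> run as root pins a placeholder interface <userinput>eth0</userinput> to 100 Mbit/s full duplex. The port on the hub or switch should be set to match.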
<sect2 id="non-nfs-performance">
<title>Non-NFS-Related Means of Enhancing Server Performance</title>
Offering general guidelines for setting up a well-functioning
file server is outside the scope of this document, but a few
hints may be worth mentioning: First, RAID 5 gives you good
read speeds but lousy write speeds; consider RAID 1/0 if both
write speed and redundancy are important. Second, using a
journalling filesystem will drastically reduce your reboot
time in the event of a system crash; as of this writing, ext3
(<ulink url="ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/">ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/</ulink>) was the only
journalling filesystem that worked correctly with
NFS version 3, but no doubt that will change soon.
In particular, it looks like <ulink url="http://www.namesys.com">Reiserfs</ulink>
should work with NFS version 3 on 2.4 kernels, though not yet
on 2.2 kernels. Finally, using an automounter (such as autofs
or amd) may prevent hangs if you cross-mount files
on your machines (whether on purpose or by oversight) and one of those
machines goes down. See the
<ulink url="http://www.linuxdoc.org/HOWTO/mini/Automount.html">Automount Mini-HOWTO</ulink>
for details.