Network Tuning
Introduction to network tuning

Mosix, Beowulf, and other clustering technologies for Linux all depend on an effective network layer. Any delay in getting data from the application to the physical layer causes even greater delays in the time to complete the task. Much work has been done on the Linux network stack to reduce this delay, known as latency, and to increase the amount of data that can be sent over a period of time, known as bandwidth. Latency is important because under high load, the delay in getting data through the network layer reduces the number of requests that can be handled at once. Bandwidth is important because you want services like NFS to be able to saturate their network link with data, so that a file server can serve as many clients as request it.

This chapter starts with an explanation of the available networking technologies, then gets into how some of these technologies can be tuned for both latency and bandwidth. We will then take a look at some of the standard TCP/IP applications and some tuning hints or gotchas.
Network Technologies Overview

As is the case in every other section of this book, the faster or better the technology, the more expensive it is. But this should not stop you from cobbling together older technology to get more out of it than it offers at face value. The first example of this is Beowulf. The original design goal of Beowulf was to take existing, low-cost technology and create a supercomputer out of it. To do this, machines were networked together using Ethernet running at 10 megabit. Since this was too slow for good communication between the nodes, the team did something different: they bonded two 10 megabit cards together to create one 20 megabit connection. This can be done in software, and it doubled the network performance of a system that would have cost too much to upgrade to a 100 megabit connection per machine.

These days, 100 megabit connections are standard for most machines, with cards costing under $75US each. Just as the cards of a few years ago had faster counterparts, so do today's cards. Gigabit Ethernet over copper or fiber is available for those who want to spend a few hundred dollars. If you want to get really outrageous and spend a few thousand dollars, you can get a proprietary, high-speed (2 Gbit/s), low-latency network technology called Myrinet. Myrinet is really designed for clusters, mostly due to the high cost per server, and dropping down from it to standard Ethernet would lose many of the Myrinet advantages.

Standard Ethernet works with a card, a hub, switch, or router, and a cable to connect it all together. The card acts as an interface between the system and the Ethernet bus, and there are a variety of cards available, each with varying interface chips, memory, boot ROMs, and prices.

To explain the difference between hubs, switches, and routers, we have to take a look at Ethernet, TCP/IP, and layering. When a packet of data leaves a web server to talk to the web browser, it is a TCP/IP packet. That TCP/IP packet is encapsulated within an Ethernet packet. An Ethernet packet is destined only for the local network, and the TCP/IP packet will be re-encapsulated if it has to go on to another network. The TCP/IP packet contains the source and destination TCP/IP addresses for the packet, while the Ethernet packet contains the source and destination addresses for only the local network.

A hub is an unintelligent device: if one port of the hub gets a packet, all other ports on the hub will receive that packet. Because of this, most hubs are very inexpensive but perform poorly, especially as the load increases.

A switch, on the other hand, knows what Ethernet devices exist on each port and can quickly forward Ethernet traffic between ports. If a packet comes in on port one, and the switch knows the packet is destined for a machine on port twelve, the switch will send the packet only to port twelve. None of the other ports will see the packet. As a result, a switch costs more, but allows greater bandwidth for the devices connected to it.

Both hubs and switches work on the Ethernet layer, so they only look at the Ethernet wrapper of the packet and work only on packets that are in the local network. When you want to connect two networks together, or connect to the Internet, you need a router. The router opens the Ethernet envelope, takes a look at the source and destination of the TCP/IP packet, and sends the packet to the appropriate interface. The nice thing about routers is that their interfaces do not have to be Ethernet. Many routers combine an Ethernet port and a T1 CSU/DSU, or Ethernet and fiber, and so on. The router re-addresses the TCP/IP packet to go from one physical medium to another. Routers usually need a good bit of brains to handle packet rewriting and dynamic routing, and they also require things like SNMP management and some interface for the user to talk to them. Thus, routers will be more expensive than hubs or switches.

Linux can act as a router as well, also tying in technologies like IP Masquerading and packet filtering. In some cases, a dedicated Linux box can perform routing less expensively than a standalone router, since most of the brains of a router are already in Linux.
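The difference between a hub and a switch described above can be sketched as a toy model. This is only an illustration, not how any particular device is built: the class names, the port numbering, and the MAC addresses are made up for the example, and the switch is assumed to learn addresses from the source field of the packets it sees.

```python
"""Toy model of how a hub and a learning switch forward Ethernet packets."""


class Hub:
    """Repeats every packet to all ports except the one it arrived on."""

    def __init__(self, num_ports):
        self.num_ports = num_ports

    def forward(self, in_port, src_mac, dst_mac):
        return [p for p in range(1, self.num_ports + 1) if p != in_port]


class Switch:
    """Learns which address lives on which port and forwards selectively."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.mac_table = {}  # Ethernet address -> port number

    def forward(self, in_port, src_mac, dst_mac):
        # Remember which port the sender is attached to.
        self.mac_table[src_mac] = in_port
        if dst_mac in self.mac_table:
            out_port = self.mac_table[dst_mac]
            # A packet for a machine on the arrival port needs no forwarding.
            return [] if out_port == in_port else [out_port]
        # Unknown destination: fall back to sending everywhere, like a hub.
        return [p for p in range(1, self.num_ports + 1) if p != in_port]


if __name__ == "__main__":
    hub, switch = Hub(12), Switch(12)
    a, b = "00:00:00:00:00:0a", "00:00:00:00:00:0b"  # hypothetical addresses
    print(hub.forward(1, a, b))      # every port except port 1
    print(switch.forward(1, a, b))   # b not learned yet, so every port except 1
    print(switch.forward(12, b, a))  # a is known: port 1 only
    print(switch.forward(1, a, b))   # b is now known: port 12 only
```

Once the switch has seen traffic from both machines, each packet occupies only one outgoing port, which is why the remaining ports keep their full bandwidth.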
Networking with Ethernet

Ethernet is the standard networking environment for Linux. It was one of the first to be implemented in Linux, the other being AX.25 (Ham Radio networking). Ethernet has been a standard for over twenty years. The protocol is fairly easy to implement, interface cards are inexpensive, cabling is easy to do, and the other glue that holds the network together (hubs and switches) is also inexpensive.

Linux can run a number of protocols natively over Ethernet, including SMB, TCP/IP, AppleTalk, and DECnet. By far the most popular is TCP/IP, since that is the standard protocol for UNIX machines and the Internet.

Tuning Ethernet to work with your particular hardware depends, as always, on the interface cards, cabling options, and switching gear you use. Many of the tools that Linux uses are standard across most networking equipment.

Most cards available today for Linux are PCI-based, so configuring cards by setting IRQ and base addresses is not required: the driver will probe and use the configuration set by the motherboard. If you do need to force IRQ or I/O settings, you can do that when loading the module or booting the system by passing the arguments irq=x and io=y.

Most of tuning Ethernet for Linux is a matter of getting the best physical connection between the interface card and the switching gear. Best performance comes from a switch, and switches have a few extra tricks up their sleeve. One of these is full and half duplexing. In a half-duplex setup, the card talks to the switch, then the switch talks to the card, and so on. Only one of the two ends can be talking at any given time. In full-duplex mode, both card and switch can have data on the wire at the same time. Since Ethernet uses separate lines for transmit and receive, there is less congestion on the wire. This is a great solution for high-bandwidth devices that have a lot of data coming in and out, such as a file server, since it can be sending files while receiving requests.
Normally, the Ethernet card and switching gear negotiate a connection speed and duplex option using MII, the Media Independent Interface. Forcing a change can be used for debugging issues, or for getting higher performance. The mii-tool utility can show or set the speed and duplexing options for your Ethernet links.

    mii-tool [-v, --verbose] [-V, --version] [-R, --reset] [-r, --restart]
             [-w, --watch] [-l, --log] [-A, --advertise=media]
             [-F, --force=media] [interface]

Table: mii-tool Media Types

    media          100Mbit   10Mbit   Full Duplex   Half Duplex
    100baseTx-HD   X                                X
    100baseTx-FD   X                  X
    10baseT-HD               X                      X
    10baseT-FD               X        X
Also available is 100baseT4, which is an earlier form of 100Mbit that uses four pairs of wires whereas 100baseTx uses two pairs of wires. This protocol is not available for most modern Ethernet cards.

The most common use of mii-tool is to change the setting of the media interface from half to full duplex. Most non-intelligent hubs and switches will try to negotiate half duplex by default, and intelligent switches can be set to negotiate full duplex through some configuration options.

To set an already-existing connection to 100 Mbit, full duplex:

    # mii-tool -F 100baseTx-FD eth0
    # mii-tool eth0
    eth0: 100 Mbit, full duplex, link ok
    #

Another way of doing this is to advertise 100 Mbit, full duplex and 100 Mbit, half duplex:

    # mii-tool -A 100baseTx-FD,100baseTx-HD eth0
    restarting autonegotiation...
    #

Using autonegotiation will not work with many non-intelligent devices and will cause you to drop back to 100baseTx-HD (half duplex). Use force when the gear you are talking to is not managed, and use autonegotiation if the gear is managed.

This program only works with chipsets and drivers that support MII. The Intel eepro100 cards implement this, but others may not. If your driver does not support MII, you may need to force the setting at boot time when the driver is loaded.
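If you want to check the negotiated setting from a script, the status line printed by mii-tool can be parsed. The following is a minimal sketch that assumes the output has exactly the short form shown above ("eth0: 100 Mbit, full duplex, link ok"); other drivers or versions may print a different format, in which case the function simply returns None. The function names are made up for this example.

```python
import re

# Matches only the short status format shown in the text, for example:
#   eth0: 100 Mbit, full duplex, link ok
STATUS_RE = re.compile(
    r"^(?P<iface>\S+):\s+(?P<speed>\d+)\s*Mbit,\s+"
    r"(?P<duplex>full|half) duplex,\s+(?P<link>.+)$"
)


def parse_status(line):
    """Return a dict describing the link, or None if the line is not understood."""
    match = STATUS_RE.match(line.strip())
    if match is None:
        return None
    return {
        "interface": match.group("iface"),
        "speed_mbit": int(match.group("speed")),
        "full_duplex": match.group("duplex") == "full",
        "link_ok": match.group("link") == "link ok",
    }


def is_fast_full_duplex(status):
    """True when the link is up at 100 Mbit or better, full duplex."""
    return (
        status is not None
        and status["link_ok"]
        and status["speed_mbit"] >= 100
        and status["full_duplex"]
    )


if __name__ == "__main__":
    status = parse_status("eth0: 100 Mbit, full duplex, link ok")
    print(status)
    print(is_fast_full_duplex(status))
```

A cron job or boot script could use a check like this to warn when a server has silently dropped back to half duplex.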
Tuning TCP/IP performance

Setting the Maximum Transmission Unit (MTU) of a network interface can be used to tune performance over a TCP/IP link. The MTU sets the maximum size of a packet that goes out on the wire. If the data to be sent is larger than the MTU, it is broken up into smaller packets. Creating the extra Ethernet packets takes some processing time and decreases bandwidth. Ethernet adds a fixed number of bytes to each packet, no matter its size, so larger packets spend a smaller percentage of their size on the Ethernet header.

On the other hand, smaller packets are better for latency, since TCP/IP will wait for the MTU to be filled, or for a timeout to occur, before sending a packet of data. On an interactive TCP/IP connection (such as telnet or ssh), the user does not want to wait long for their packet to make it from their machine to the remote machine. Smaller MTUs make sure the packet size is met earlier and the packet goes out quickly.

In addition, the MTU has to fit the medium the packet is running over. Ethernet has a maximum packet size of 1518 bytes, counting the Ethernet header and checksum. Asynchronous Transfer Mode, or ATM, carries data in very small fixed-size cells of 53 bytes, 48 of which are payload. By default, Ethernet TCP/IP connections have an MTU of 1500 bytes.

The MTU can be set using ifconfig:

    # ifconfig eth0 mtu 1500

It is recommended to leave the MTU at the maximum, since almost all non-interactive TCP/IP applications will transfer more than 1500 bytes per session, and a bit of latency for interactive applications is more an annoyance than an actual performance bottleneck.

When using Domain Name Servers (DNS), you may run into cases where DNS resolution is a performance bottleneck. This is covered in more detail elsewhere, but for best performance some applications recommend logging the raw TCP/IP addresses that come in rather than trying to resolve them to names. For security reasons, you may want to change this so you can quickly find out what machine is trying to break into your web server. This decision is left to you, the administrator, as part of the never-ending balance between performance and security. A potential fix is to run a caching name server locally to store often-used TCP/IP addresses and names, and leave the real DNS serving to another machine.

Applications like ping will sometimes appear to fail if DNS is not configured properly, even if you ping a TCP/IP address instead of a name. The solution, for ping and for other TCP/IP management applications, is to find the option that prevents resolution of names or TCP/IP addresses. For ping, this is the -n option.

    # ping -n 192.168.1.50
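Returning to the MTU trade-off described earlier in this section, the arithmetic can be sketched in a few lines. The figures below are assumptions for the example: 14 bytes of Ethernet header plus a 4-byte checksum per packet, and 40 bytes of IPv4 and TCP headers with no options. The function names are made up, and the preamble and inter-packet gap on the wire are ignored.

```python
ETH_HEADER = 14      # destination address, source address, type
ETH_CHECKSUM = 4     # frame check sequence at the end of the packet
IP_TCP_HEADERS = 40  # assumption: 20 bytes IPv4 + 20 bytes TCP, no options


def payload_efficiency(mtu):
    """Fraction of each full-sized packet that is application data."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + ETH_HEADER + ETH_CHECKSUM
    return payload / on_wire


def time_on_wire(mtu, link_bits_per_second):
    """Seconds needed to clock one full-sized packet onto the link."""
    on_wire = mtu + ETH_HEADER + ETH_CHECKSUM
    return on_wire * 8 / link_bits_per_second


if __name__ == "__main__":
    for mtu in (1500, 576, 296):
        print(
            "MTU %4d: %.1f%% payload, %.3f ms per packet at 10 Mbit"
            % (mtu, 100 * payload_efficiency(mtu),
               1000 * time_on_wire(mtu, 10_000_000))
        )
```

Larger MTUs carry proportionally more data per packet, while smaller MTUs get each packet onto the wire sooner, which is the bandwidth versus latency balance the text describes.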
Tuning Linux dialup

If your connection to the rest of the world is through a dialup link, don't worry. Many of the issues that you will see for dialup are already covered elsewhere. The recommended MTU for PPP is 1500, but slower links may want an MTU of 296 for improved latency.

Modems in Linux should be full-fledged modems, not ones labeled WinModem or Soft Modem. Both of these styles of modem pass much of the processing work off to the CPU. This makes the cards very inexpensive, but increases the load on the CPU. Modems can be internal or external. Internal modems include their own port settings and may be easier to use than external modems, since the modem and serial port are on the same card.

If you use an external modem, set the data rate between the serial port and the modem to the highest the modem supports, which is usually 115,200bps or 230,400bps. This ensures that the modem can talk to the machine as fast as it needs to. You should also ensure that you are using RTS/CTS handshaking, also known as hardware or out-of-band flow control. This allows a modem to immediately tell Linux to stop sending data to the modem, preventing loss of data.

The Linux serial port is controlled with the setserial command. You can find the available serial ports at bootup, or by looking at /proc/tty/driver/serial. Entries in that file that have a UART listed exist in Linux. Remember that COM1 or Serial 1 listed on your box will be /dev/ttyS0, and COM2 is /dev/ttyS1.

Most modern applications can comprehend speeds greater than 38,400bps, but some older ones cannot. To compensate for this, Linux has made 38,400bps (or 38.4kbps) a magic speed: if an application asks for 38.4kbps, Linux will translate this to another speed. Currently, this can be as high as 460.8kbps. The translation is set with the setserial command.
    # setserial /dev/ttyS0
    /dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4
    # setserial /dev/ttyS0 spd_vhi
    # setserial /dev/ttyS0
    /dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4, Flags: spd_vhi

Speeds are given to setserial as codes, and these codes are listed in the table below. These values can be set by non-root users.

Table: Settings for setserial

    Code         Speed
    spd_normal   38.4kbps
    spd_hi       57.6kbps
    spd_vhi      115.2kbps
    spd_shi      230.4kbps
    spd_warp     460.8kbps
There is also a spd_cust entry that selects a custom speed instead of 38.4kbps. The resulting speed is the value of baud_base divided by the value of divisor.

If both sides have it available, Deflate or BSD compression can be used to increase throughput. Even though many newer modem protocols provide compression, it isn't very strong. Deflate or BSD compression achieves greater compression at little cost in CPU or memory. The down side is that both sides need to have Deflate or BSD compression built into their PPP software.

BSD compression is activated by passing the corresponding option to pppd, followed by a compression level between 9 and 15. Higher numbers indicate higher compression. Deflate compression is activated by its own option, followed by a number in the range of 8 to 15. Deflate is preferred by the pppd used by Linux.
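The spd_cust rule above (speed equals baud_base divided by divisor) is easy to work through in code. This is only a sketch: the baud_base of 115200 used in the example is an assumption that is typical for a 16550A port but should be checked against what setserial reports for your hardware, and the function names are made up.

```python
def custom_speed(baud_base, divisor):
    """Speed in bps that results from spd_cust with the given settings."""
    if divisor <= 0:
        raise ValueError("divisor must be a positive integer")
    return baud_base / divisor


def closest_divisor(baud_base, wanted_speed):
    """Integer divisor that lands nearest to the wanted speed."""
    return max(1, round(baud_base / wanted_speed))


if __name__ == "__main__":
    baud_base = 115200  # assumption: check the value reported for your port
    for wanted in (57600, 19200, 31250):
        divisor = closest_divisor(baud_base, wanted)
        print(wanted, divisor, custom_speed(baud_base, divisor))
```

Because the divisor must be a whole number, only some speeds can be hit exactly; for anything else the port runs at the nearest achievable rate.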
Wireless Ethernet

Wireless Ethernet, also known as IEEE 802.11b, is becoming more popular as the cost to implement it decreases and more products become available. The Apple AirPort and Lucent Orinoco cards have brought wireless into the home market, allowing a person to have Ethernet access anywhere in their house, and schools are deploying Wireless Ethernet across their campuses. It is now available at airports, schools, many high-tech companies, and soon upscale coffee shops.

Given that 802.11b is most popular for laptops, since they are portable, tuning for performance is not of as great importance as tuning for power usage. Using 802.11b is often very power-consuming and can quickly drain the batteries.

Some cards (such as the Lucent Orinoco card) have the ability to turn their antenna on and off at regular intervals. Instead of the antenna being on all the time, it turns on a few times a second. With the transmitter now turned off more than half the time, battery usage is decreased. The price is an increase in latency and a decrease in data rate.

To set the power management of the card, you will need the wireless tools, available with many distributions. This package contains three commands for managing your wireless card: iwconfig, iwspy, and iwpriv.

The iwconfig command is an extension of ifconfig. Run without any options, it checks all available interfaces for wireless extensions. If there are any, it reports something similar to the following:

    wvlan0    IEEE 802.11-DS  ESSID:"default"  Nickname:"HERMES I"
              Frequency:2.437GHz  Sensitivity:1/3  Mode:Managed
              Access Point: 00:90:4B:08:13:1C
              Bit Rate:2Mb/s   RTS thr:off   Fragment thr:off
              Power Management:off
              Link quality:8/92  Signal level:-88 dBm  Noise level:-96 dBm
              Rx invalid nwid:0  invalid crypt:0  invalid misc:599

As you can see, the output tells you a variety of statistics on the link. My current bit rate is 2Mb/s, probably because my link quality is so low.
The link quality is 8 out of 92, indicating that I should either move my laptop, move my base station, or throw out my 2.4GHz phone. You can also see that power management is currently off. Since my laptop is plugged into the wall, this is not a concern to me. If I did want to activate the power management, I would use:

    # iwconfig wvlan0 power 1
    # iwconfig
    lo        no wireless extensions.

    wvlan0    IEEE 802.11-DS  ESSID:"default"  Nickname:"HERMES I"
              Frequency:2.437GHz  Sensitivity:1/3  Mode:Managed
              Access Point: 00:90:4B:08:13:1C
              Bit Rate:2Mb/s   RTS thr:off   Fragment thr:off
              Encryption key:off
              Power Management period:1s   mode:All packets received
              Link quality:11/92  Signal level:-84 dBm  Noise level:-95 dBm
              Rx invalid nwid:0  invalid crypt:0  invalid misc:0
    #

The power management is now set to turn the transmitter on only once per second. By default, the time is in seconds, but appending m or u to the end of the number makes it milliseconds or microseconds.

All that being said, here are a few ways to improve the link quality of your system. You may need a combination of these, so do not expect any one method alone to work.

- Check the infrastructure and building materials. Thick wood or metal walls will cause a lot of interference. Line of sight to the base station is best.
- Some base stations and wireless cards support external antennas. They will greatly improve the range and quality of the link.
- Move the base station around. Line of sight is best, but not required.
- Turn off other devices that use 2.4GHz. Some phones and other wireless gadgets use the same frequency and, if not built properly, will cause the wireless Ethernet cards to continually scan through frequencies for the correct one, dropping performance.
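The period suffixes and the link quality figure described above can be turned into plain numbers with a short helper. This is a sketch based only on what the text states (no suffix means seconds, m means milliseconds, u means microseconds, and link quality is printed as a value out of a maximum); the function names are made up for the example.

```python
def period_to_seconds(value):
    """Convert a power period such as '1', '500m' or '250u' to seconds."""
    value = value.strip()
    scale = {"m": 1e-3, "u": 1e-6}
    if value and value[-1] in scale:
        return float(value[:-1]) * scale[value[-1]]
    return float(value)


def link_quality_fraction(text):
    """Convert a 'current/maximum' reading such as '8/92' to a fraction."""
    current, maximum = text.split("/")
    return int(current) / int(maximum)


if __name__ == "__main__":
    for period in ("1", "500m", "250u"):
        seconds = period_to_seconds(period)
        print(period, seconds, "s between wakeups")
    for reading in ("8/92", "11/92"):
        print(reading, "%.1f%%" % (100 * link_quality_fraction(reading)))
```

Both readings in the example output above work out to well under a quarter of the maximum, which fits the low bit rate being reported.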
Monitoring Network Performance

The best way to make sure your network is not the bottleneck is to monitor how much traffic is flowing. Because of collision detection and avoidance in Ethernet, once the load gets above about 50% to 60% of its maximum, you will start to see degrading performance if using hubs. This number is higher for switches, but a limit still exists since the silicon on the switch needs to analyze and move the data around.

To make best use of your networking equipment, you will want to monitor the amount of traffic that is flowing through the network. The easiest way to do this is to use SNMP, the Simple Network Management Protocol. SNMP was designed to manage and monitor machines via the network, be they servers, desktops, or other network devices such as switches or network storage. As you would guess, there are SNMP clients and servers available for Linux to monitor the statistics and usage of network interfaces.

SNMP uses a MIB, or Management Information Base, to keep track of the features of an SNMP device. While a Linux box can report things like the number of users logged in, a Cisco router has no need for such functions. So MIBs are used to identify devices and their particular features.

The SNMP daemon for Linux is net-snmp, formerly known as ucd-snmp, and based on the CMU SNMP package. Your distribution's package should come mostly configured. The only thing you need to do is set the community name, which is really just the password to access the snmpd server. In the default configuration shown below, the community names are public for read-only access and private for read-write access; both should be changed to something else. You will also want to change the security such that you have read-only access to snmpd.

    #       sec.name   source    community
    com2sec paranoid   default   public
    #com2sec readonly  default   public
    #com2sec readwrite default   private

Change the paranoid above to read readonly and restart snmpd. This setting gives the entire world read-only access to your SNMP server.
While a malicious intruder will not be able to change data on your machine, it can give them plenty of information about your box to find a weakness and exploit it. You can change the source entry to a machine name or a network address. Default means any machine can access snmpd.

You can test that snmpd is working properly by using snmpwalk to query it.

    snmpwalk host community [start point]

    $ snmpwalk 192.168.1.175 public system.sysDescr.0
    system.sysDescr.0 = Linux clint 2.2.18 #1 Mon Dec 18 11:23:05 EST 2000 i686
    $

Since this example uses system.sysDescr.0 as its start point, only one entry gets listed: the output of uname.
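To relate SNMP readings to the 50% to 60% load figure mentioned earlier, you can compare two samples of an interface's byte counter. The sketch below assumes the counter is a 32-bit value that wraps back to zero (as the standard interface octet counters such as ifInOctets do) and that it wraps at most once between samples; the function names and the sample numbers are made up for the example.

```python
COUNTER32_MODULUS = 2 ** 32  # assumption: 32-bit counter that wraps to zero


def octet_delta(previous, current):
    """Bytes transferred between two samples, allowing for one wrap."""
    return (current - previous) % COUNTER32_MODULUS


def utilization(previous, current, interval_seconds, link_bits_per_second):
    """Fraction of the link's capacity used during the interval."""
    bits = octet_delta(previous, current) * 8
    return bits / (interval_seconds * link_bits_per_second)


if __name__ == "__main__":
    # Hypothetical samples taken five minutes apart on a 10 Mbit link.
    load = utilization(1_000_000, 226_000_000, 300, 10_000_000)
    print("%.0f%% of link capacity" % (100 * load))
    if load > 0.5:
        print("above the level where hubs start to degrade")
```

On a fast link a 32-bit counter can wrap more than once between widely spaced samples, so the sampling interval has to be short enough for the single-wrap assumption to hold.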
Network Monitoring with MRTG

The most popular application for monitoring network traffic is MRTG, the Multi Router Traffic Grapher. MRTG tracks network usage and presents it as graphs covering periods from the last 24 hours to a year, all on a web page. MRTG uses SNMP to fetch information from routers. You can also track incoming and outgoing traffic on individual servers. Monitoring a server using SNMP consumes a small portion of network, memory, and CPU.

MRTG is available for Red Hat and Debian distributions. You can also download the source from the MRTG home page.

Once installed, you will need to configure MRTG to point to the servers or routers you wish to monitor. You can do this with cfgmaker. The options to cfgmaker have to include the community name and machine that you want to monitor.

    mkomarinski@clint:~$ cfgmaker public@localhost
    --base: Get Device Info on public@localhost
    --base: Vendor Id:
    --base: Populating confcache
    --base: Get Interface Info
    --base: Walking ifIndex
    --base: Walking ifType
    --base: Walking ifSpeed
    --base: Walking ifAdminStatus
    --base: Walking ifOperStatus
    # Created by
    # /usr/bin/cfgmaker public@localhost

    ### Global Config Options

    #  for Debian
    WorkDir: /var/www/mrtg

    #  or for NT
    # WorkDir: c:\mrtgdata

    ### Global Defaults

    #  to get bits instead of bytes and graphs growing to the right
    # Options[_]: growright, bits

    ######################################################################
    # System: clint
    # Description: Linux clint 2.2.18 #1 Mon Dec 18 11:23:05 EST 2000 i686
    # Contact: mkomarinski@valinux.com
    # Location: Laptop (various locations)
    ######################################################################

    ### Interface 3 >> Descr: 'wvlan0' | Name: '' | Ip: '192.168.1.175' | Eth: '00-02-2d-08-ae-c1' ###

    Target[localhost_3]: 3:public@localhost
    MaxBytes[localhost_3]: 1250000
    Title[localhost_3]: Traffic Analysis for 3 -- clint
    PageTop[localhost_3]: <H1>Traffic Analysis for 3 -- clint</H1>
     <TABLE>
       <TR><TD>System:</TD> <TD>clint in Laptop (various locations)</TD></TR>
       <TR><TD>Maintainer:</TD> <TD>mkomarinski@valinux.com</TD></TR>
       <TR><TD>Description:</TD><TD>wvlan0 </TD></TR>
       <TR><TD>ifType:</TD> <TD>ethernetCsmacd (6)</TD></TR>
       <TR><TD>ifName:<TD> <TD></TD></TR>
       <TR><TD>Max Speed:</TD> <TD>1250.0 kBytes/s</TD></TR>
       <TR><TD>Ip:</TD> <TD>192.168.1.175 ()</TD></TR>
     </TABLE>

All the configuration information has been pulled from snmpd. You can redirect the output of cfgmaker into /etc/mrtg.cfg. Most installations of mrtg include a cron job that runs mrtg every five minutes if /etc/mrtg.cfg exists. Within five minutes, you will see data on your web site.
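The MaxBytes value in the generated configuration is expressed in bytes per second, and the figure of 1250000 above is consistent with an interface speed of 10 Mbit/s divided by eight. The sketch below shows that arithmetic and how a measured rate compares against it. The function names are made up for the example, and reading the interface speed as 10,000,000 bits per second is an inference from the output rather than something cfgmaker printed.

```python
def max_bytes(interface_bits_per_second):
    """Bytes per second corresponding to an interface speed in bits per second."""
    return interface_bits_per_second // 8


def percent_of_max(bytes_per_second, max_bytes_per_second):
    """How much of the configured maximum a measured rate represents."""
    return 100.0 * bytes_per_second / max_bytes_per_second


if __name__ == "__main__":
    limit = max_bytes(10_000_000)  # assumed interface speed of 10 Mbit/s
    print(limit)
    print("%.1f kBytes/s" % (limit / 1000))
    print("%.0f%% of maximum" % percent_of_max(625_000, limit))
```

If the real link is slower than the speed the interface reports, as is common with wireless cards, lowering MaxBytes by hand keeps the percentages on the graphs meaningful.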