LDP/LDP/guide/docbook/Tuning-Linux/kernel.sgml

<chapter id="kernel">
<title>Kernel Tuning</title>
  <para>
    Unlike most other UNIX or UNIX-like operating systems, the source
    code to the kernel is available for easy tuning and upgrading.
    With kernels being packaged in distributions, the necessity for most
    users to compile their own kernels is pretty low.  There are advantages
    to compiling your own kernels, though:
  </para>
  <itemizedlist>
    <listitem>
      <para>
        Compile a version of the kernel specific to your chipset.
      </para>
    </listitem>
    <listitem>
      <para>
        Add in a new driver or security patch
      </para>
    </listitem>
    <listitem>
      <para>
        Compile a kernel with modules compiled into the kernel
      </para>
    </listitem>
  </itemizedlist>
  <para>
    This chapter covers each of these situations, and also tuning of running
    kernels using applications like <application>sysconf</application> and
    <application>powertweak</application>.
  </para>

  <section id="kernelversions">
    <title>Kernel versions</title>
    <para>
      There are a few things to consider when deciding what kernel revision
      you should be using in a production environment.
    </para>
    <para>
      Officially, Linux has two major threads of development.  The stable
      releases have an even number as its minor number.  The Linux 2.2 and 2.4
      kernels are examples of stable releases.  These are considered by Linus to
      be stable enough for production use.  Few known bugs or issues exist with
      these kernels.
    </para>
    <para>
      The other release is the development releases.  These have experimental
      drivers or changes to them, and can cause crashes if used in production
      environments.  However, these releases will be more likely to support
      newer hardware.
    </para>
    <para>
      Most Linux distributions will often make a compromise between stability
      and performance by creating their own version of the kernel and source
      code.  These kernels start with a base of a stable kernel, then add in
      some newer drivers.  The resulting kernel is then tested to make sure it
      is stable, then released.
    </para>
    <para>
      As a generalization, stable kernels are good for production use, while
      development kernels may have better performance.  For your use, you should
      monitor the changes in the two sets of kernels and see if there are any
      performance increases that will make your system faster, then test that
      against a known stable kernel.  You can then make a determination of
      stability of the system versus performance.
    </para>
  </section> <!-- kernelversions -->

  <section id="kernelcompile">
    <title>Compiling the Linux Kernel</title>
    <para>
      There is a specific order to getting your kernel compiled, and there
      is nothing preventing you from having multiple kernels and modules
      on the same system.  This allows you to boot test kernels while keeping
      a known stable kernel in case you have trouble.  Boot loaders like LILO
      will allow you to select a kernel at bootup.
    </para>
    <para>
      The easiest place to pick up the latest kernels is <ulink
      url="http://www.kernel.org">http://www.kernel.org</ulink>.  This is
      the official site for Linux kernels.  These kernels are almost always
      in tar-bzip2 format, instead of being in RPM or DEB formats.  Check with
      your distribution provider for packaged formats of the kernel.
    </para>
    <para>
      Unpack and untar the kernel.  For packages compressed with bzip2,
      the process is:
    </para>
<screen>
# tar -jxvf linux-2.2.19.tar.bz2
</screen>
    <para>
      This will create all the header files and source code for the
      kernel.  There are three methods for configuring the kernel to compile,
      and each have some prerequisites.  All methods will obviously require
      the C compiler.
    </para>
    <section id="kernelcompileconfig">
      <title>Using <command>make config</command></title>
      <indexterm>
        <primary>kernel</primary>
        <secondary>compile</secondary>
        <tertiary>config</tertiary>
      </indexterm>
      <para>
        The siplest way (and the first) is to use <command>make config</command>
	from within <filename>/usr/src/linux</filename>.  This is a very
	tedious way of compiling, since you must answer each question
	and the Linux kernel has a large number of questions now.
      </para>
      <para>
        But an advantage to <command>make config</command> is that it
	does not require any other development tools to be installed
	in order to use it.  On very lightweight systems, using this
	method will save drive space and memory, since we are not
	running very complicated or large programs.
      </para>
      <para>
        SCREENSHOT OF make config HERE
      </para>
    </section> <!-- kernelcompileconfig -->

    <section id="kernelcompilemenuconfig">
      <title>Using <command>make menuconfig</command></title>
      <indexterm>
        <primary>kernel</primary>
        <secondary>compile</secondary>
        <tertiary>menuconfig</tertiary>
      </indexterm>
      <para>
        The next level of kernel configuration involves using a character
        based menuing system.  For systems that have only a text console
        or <application>ssh</application> access, this makes an easier way
        to configure a kernel.  It requires both ncurses and its development
        libraries to be installed in order for this to work.  The program
        that runs in <command>make menuconfig</command> is part of the
        kernel source package and is compiled when needed.
      </para>
      <para>
        With an on-screen menu, it is easier to organize and select the
        options you want to compile in.  Help menus are available as well.
        This option is not good for those who want small systems, since it
        requires extra libraries.
      </para>
      <para>
        INSERT SCREENSHOT OF make menuconfig
      </para>
    </section> <!-- kernelcompilemenuconfig -->

    <section id="kernelcompilexconfig">
      <title>Using <command>make xconfig</command></title>
      <indexterm>
        <primary>kernel</primary>
	<secondary>compile</secondary>
	<tertiary>xconfig</tertiary>
      </indexterm>
      <para>
        The most user friendly, but hardest to use, is the <command>make
	xconfig</command> option for configuring the kernel.  This
	requires most of the X libraries to be installed, along with
	Tcl and Tk.
      </para>
      <para>
        Since many server systems may not have X or its associated libraries,
	Tcl, or Tk installed, you may want to step back to one of the other
	methods of configuring the kernel.  You would also need to make sure
	that you can export X to your desktop if you are configuring
	a remote system.
      </para>
      <figure id="kernelmakexconfig">
        <title><command>make xconfig</command></title>
        <mediaobject>
          <imageobject>
            <imagedata format="JPG" fileref="makexconfig.jpg">
          </imageobject>
        </mediaobject>
      </figure>
    </section> <!-- kernelcompilexconfig -->

    <section id="kernelcompileoptions">
      <title>Options for configuring the kernel</title>
      <para>
        Now that you have chosen a way to configure the kernel, how
        should you go about your configuration?  What options should you
        pick for best performance, lowest memory consumption, and
        best security options?
      </para>
      <para>
        Many of the answers to these questions depends on your particular
        situation.  Selecting the right processor family for your CPI
        is one of the best things to do.  Each processor family has its
        own way of alinging memory and some CPU-specific options and
        optimizations.
      </para>
      <para>
        If your hardware setup is going to be stable, you can
	compile in drivers for various kinds of hardware for
	slightly better performance, and to prevent trojan
	modules from being loaded.  See <xref linkend="kernelcompileoptions">
	for more on this.
      </para>
      <para>
        Selecting DMA access for IDE devices under <quote>Block devices</quote>
	will automatically turn on DMA access, reducing CPU load.  You can see
	<xref linkend="disktuningideos"> for more information.
      </para>
      <para>
        If your box will be configured as a router without firewalling,
	you can enable <quote>IP:  optimize as router not host</quote> under
	<quote>Networking options</quote>.  This reduces the latency of
	packets within the kernel by skipping a few unnecessary steps.
	But these steps are required if you will be doing firewalling
	or acting as a server.
      </para>
      <para>
        Bonding device support under <quote>Network device support</quote>
	will enable bonding multiple network devices together to create
	one large pipe to other bonded Linux boxes or some Cisco routers.
      </para>
      <para>
        If you plan on having a large number of users logging in and you would
	need a large number of pseudo terminals (PTYs), you can enter
	a number in <quote>Maximum number of Unix98 PTYs in use (0-2048)</quote>
	under <quote>Character devices</quote>.
      </para>
      <para>
        For systems that will not be onsite and require high availability,
	you can enable <quote>watchdog timer support</quote>.  Watchdogs
	can be implemented either in software or hardware.  Software watchdogs
	open and close files to make sure that everything is working.  If the
	driver cannot open the file, it tries to trigger a reboot of the system,
	since this indicates that there is some error with the system.
	Watchdog timers are usually implemented ISA cards, but some
	single board computers will have the necessary chips built in.  Hardware
	watchdogs are a bit better as it can monitor extra features such
	as temperature and fan speed and can trigger reboots on those events as
	well.  Hardware watchdogs are a bit more proactive as they continually
	talk to the kernel driver.  If the kernel driver does not respond,
	the watchdog triggers a reset.
      </para>
    </section> <!-- kernelcompileoptions -->

    <section id="kernelcompilemodules">
      <title>Kernel Modules and You</title>
      <para>
        While this book is not specifically covering security, it is
	fair to bring it up every now and then.  For those who want
	hard core, as-uncrackable-as-possible systems, you may want
	to consider removing module support from your system.
      </para>
      <para>
        The reason for this is that if a cracker were to gain access
	to your system, one of the more advanced methods of covering their
	tracks is to insert a kernel module that hides much of their activity.
      </para>
      <para>
        Another tactic is to replace an existing module with a trojan horse.
	Once the module is loaded, its payload deploys in kernel space instead
	of userspace, giving the rogue code more access to the system.
      </para>
      <para>
        Another reason to remove modules from your system is to create a smaller
	size kernel.  With modules compiled into the kernel, they get
	compressed along with the rest of the kernel using bzip2.  Normally
	module files are uncompressed, but bzip2 can compress sizes by
	more than half.  This can save crucial space if it is required.
      </para>
      <para>
        For most users, however, you will want to use modules, as it gives a
	very versitile way of installing new drivers by just loading
	the module.
      </para>
    </section> <!-- kernelcompilemodules -->
    <section id="kernelcompilebuild">
      <title>Building The Kernel</title>
      <para>
        With a configured kernel, you can now go through the build process.
	This is documented in a number of places, but it is included
	here for completeness.
      </para>
      <para>
        The first thing you should take a look at if you want to store the
	kernel configuration is to get
	<filename>/usr/src/linux/.config</filename>, as it has the ouput
	from the configuration options selected previously.  You can
	move the .config file to a new system, then build the kernel
	and skip the configuration part.
      </para>
      <para>
        The <command>make dep</command> command sets up the dependencies needed
	so the kernel can be built.  This is a crucial step, as the kernel
	will not compile without it.
      </para>
      <para>
        Most users will want to use <application>bzip2</application> for
	building the kernel, as it makes a much smaller kernel than
	<application>gzip</application> or uncompressed.  You can build a
	<application>bzip2</application> kernel by using <command>make
	bzImage</command>.  The compressed image is built and put into <filename
	class="directory">/usr/src/linux/arch/i386/</filename>.
	You may want to just use <command>make bzlilo</command> to put the
	kernel into <filename class="directory">/</filename> and run
	<command>lilo</command> to let you boot the new kernel.
        Some users may not want their kernel in <filename
	class="directory">/</filename>, since many distributions create a small
	<filename class="directory">/boot</filename> partition that contains
	the kernel and enough to boot the kernel.  In this event, you will
	want to copy the kernel by and into <filename
	class="directory">/boot</filename>.
      </para>
      <para>
        If you will be using modules, you will also need to run <command>make
	modules</command> and <command>make modules_install</command>.  These
	two commands will build and install the modules for your kernel.
      </para>
    </section> <!-- kernelcompilebuild -->
  </section> <!-- kernelcompile -->
  <section id="kernelrunning">
    <title>Tuning a Runing Linux System</title>
    <indexterm>
      <primary>/proc</primary>
    </indexterm>
    <para>
      When Linux first got started, changing options to the kernel often
      required recompiling.  Getting information out of Linux required
      applications be recompiled to look at the right memory location
      to get things like user and process lists.  After the release
      of the 1.0 kernel, the <filename class="directory">/proc</filename>
      filesystem arrived as a way to access information in a kernel-independent
      way.  Because of this, a command like <command>ps</command> will
      keep operating even if the kernel rev changes.  The <filename
      class="directory">/proc</filename> filesystem also allows you
      to change the kernel itself wile running, allowing you to add
      in features like TCP/IP forwarding, manage RAID cards, and so on.
    </para>

    <section id="kernelrunningproc">
      <title>The <filename class="directory">/proc</filename> filesystem</title>
      <para>
        This portion is not strictly about tuning a system, but getting
	information out of your urnning system.  In the event of serious
	memory or hardware issues, you can sometimes get information
	out of Linux.
      </para>
      <para>
        The <filename class="directory">/proc</filename> heirarchy may change
	depending on the kernel revision, but is generally arranged
	into different classes.
      </para>
      <variablelist>
        <varlistentry>
	  <term>
	    Processes
	  </term>
	  <listitem>
	    <para>
	      Contains information about each of the running processes,
	      organized by PID.  Each process has its own directory
	      of information, including status, list of file descriptors,
	      command line, environment variables for the application,
	      and working directory when the application was started.
	    </para>
	  </listitem>
	</varlistentry>
	<varlistentry>
	  <term>
	    bus
	  </term>
	  <listitem>
	    <para>
	      Contains information about the system busses.  At this point,
	      it works with the USB, PCI, and PCMCIA busses.  More, like
	      IEEE-1394 will list devices and controllers under this directory.
	      The information about each device is less human readable than
	      older directoried like <filename class="directory">/proc/pci
	      </filename>, but human readability is not the point of
	      <filename class="directory">/proc</filename>
	    </para>
	  </listitem>
	</varlistentry>
	<varlistentry>
	  <term>
	    cpuinfo
	  </term>
	  <listitem>
	    <para>
	      If you want to find out information about the CPU(s) installed in
	      your system, you can take a look at the cpuinfo file.  For each
	      processor, you will get things like CPU speed, vendor,
	      model type (Pentium II, Pentium III, etc.), and the
	      ever-popular bogomips rating.
	    </para>
	  </listitem>
	</varlistentry>
	<varlistentry>
	  <term>
	    filesystems
	  </term>
	  <listitem>
	    <para>
	      This file has a list of the filesystems that the kernel currently
	      knows about.  We say currently, since modules can be loaded or
	      removed to add or remove additional filesystem types.  If
	      you are having trouble mounting a filesystem, this is an easy
	      way to make sure the filesystem type is supported by the kernel.
	    </para>
	  </listitem>
	</varlistentry>
	<varlistentry>
	  <term>
	    ide
	  </term>
	  <listitem>
	    <para>
	      If your system has
	      <indexterm>
	        <primary>IDE</primary>
	      </indexterm>
	      IDE devices, this directory contains information about the IDE
	      devices that are in your system.  It is organized by IDE
	      channels that are available, and each device has informaiton about
	      the model name and number, serial number, capacity, and any
	      other information specific to that IDE type.  Tuning of IDE
	      drives should be done using <command>hdparm</command>, and you can
	      see <xref linkend="disktuningideos"> for usage.
	    </para>
	  </listitem>
	</varlistentry>
	<varlistentry>
	  <term>
	    meminfo
	  </term>
	  <listitem>
	    <para>
	      This file has the raw memory information that you would normally
	      see as output of <command>free</command>.  The first
	      part of the ouput contains the memory listing in bytes, followed
	      by most of the same information in kilobytes.
	    </para>
	  </listitem>
	</varlistentry>
	<varlistentry>
	  <term>
	    modules
	  </term>
	  <listitem>
	    <para>
	      You could easily mistake the contents of the modules file
	      for the output of <command>lsmod</command>.  There is a listing
	      of the modules, size of the module, and dependencies for
	      the module.
	    </para>
	  </listitem>
	</varlistentry>
	<varlistentry>
	  <term>
	    partitions
	  </term>
	  <listitem>
	    <para>
	      This file has a list of all the disk partitions that are known
	      to Linux, including those not mounted or listed in <filename>
	      /etc/fstab</filename>.
	    </para>
	  </listitem>
	</varlistentry>
	<varlistentry>
	  <term>
	    pci
	  </term>
	  <listitem>
	    <para>
	      This file may go away in the long term, in favor of <filename
	      class="directory">/proc/bus/pci</filename>, but <filename>
	      /proc/pci</filename> is more human readable, listing
	      the device type, vendor, and device name.  But this
	      will only happen if the Linux kernel knows of the device.
	    </para>
	  </listitem>
	</varlistentry>
	<varlistentry>
	  <term>
	    scsi
          </term>
	  <listitem>
	    <para>
	      <indexterm>
	        <primary>SCSI</primary>
	      </indexterm>
	      SCSI-based systems can look at and tune their parameters here.
	      Its setup if much the same as for IDE, listing chains, followed by
	      devices on each chain.
	    </para>
	  </listitem>
	</varlistentry>
	<varlistentry>
	  <term>
	    swaps
	  </term>
	  <listitem>
	    <para>
	      Lists all mounted swap partitions (or files) mounted.
	    </para>
	  </listitem>
	</varlistentry>
	<varlistentry>
	  <term>
	    version
	  </term>
	  <listitem>
	    <para>
	      The version file contains version information of the kernel you
	      are using.  It's very helpful in that it has the user and host
	      that the kernel was compiled on, GCC version used to compile the
	      kernel, and date and time this kernel was compiled.  This
	      makes it much more helpful than <command>uname</command>, since
	      you can quickly check a series of running machines and verify
	      they all report the same kernel revision.
	    </para>
	  </listitem>
	</varlistentry>
      </variablelist>
      <para>
        Not listed above is <filename class="directory">/proc/sys</filename>.
	This directory is where most of the runtime changes to the kernel
	take place.  Access to this used to be done manually, using
	<command>cat</command> and <command>echo</command> to view or change
	settings of a running kernel.  If you wanted runtime settings to be
	changed each time the machine booted, you had to add commands to
	the <filename>rc.local</filename> file for each setting.
      </para>
      <para>
        An easier way to manage these settings is via the use of
	<command>sysctl</command>.
	<indexterm>
	  <primary>sysctl</primary>
	</indexterm>
      </para>
      <cmdsynopsis>
        <command>sysctl</command>
	<arg>-n</arg>
	<arg><replaceable>variable</replaceable></arg>
	<arg>-w <replaceable>variable</replaceable>=<replaceable>value</replaceable></arg>
	<arg>-a</arg>
	<arg>-p <replaceable>filename</replaceable></arg>
      </cmdsynopsis>
      <para>
        If you use an argument of <replaceable>variable</replaceable>, you will
	get the current value of
	<filename>/proc/sys/<replaceable>variable</replaceable></filename>.  You
	can set it by adding an equal sign and <replaceable>value</replaceable>.
	You can get a list of what is available along with their current values
	by using -a.
      </para>
      <para>
        As part of the Linux boot sequence on more modern distributions,
	<command>sysctl</command> is called with -p, which reads
	settings from <filename>/etc/sysctl.conf</filename>.  This file can be
	edited by a text file, and contains a list of lines of the form
	<replaceable>variable</replaceable>=<replaceable>value</replaceable>.
      </para>
<screen>
#
# /etc/sysctl.conf - Configuration file for setting system variables
# See sysctl.conf (5) for information.
#
#kernel.domainname = example.com
#net/ipv4/icmp_echo_ignore_broadcasts=1
net.ipv4.conf.all.hidden = 1
net.ipv4.conf.lo.hidden=1
</screen>
      <para>
        As you can see, <replaceable>variable</replaceable> can separate out
	directories using either a slash (/), or a dot (.).  Using the dot makes
	it look more like SNMP variables.
      </para>
      <para>
        Since everything under <filename class="directory">/proc</filename>
	are regular files, security for viewing or setting are handled by
	regular file permissions.  In some cases, only root can make changes,
	in others, only root can view or set.
      </para>
    </section> <!-- kernelrunningproc -->

    <section id="kernelrunningpowertweak">
      <title>Using <command>powertweak</command></title>
      <indexterm>
        <primary>powertweak</primary>
      </indexterm>
      <para>
        Another interface to change options of a running kernel is through the
	use of <command>powertweak</command>.  Powertweak also runs on startup
	of linux, using a daemon called <application>powertweakd</application>
	to issue changes to the kernel.  Configuration files are stored in the
	kernel in XML format.
      </para>
      <para>
        An advantage of using <command>powertweak</command> over
	<command>sysctl</command> is that powertweak has a graphical
	and text interface.  There is a pretty extensive help system that
	can give you information about each setting.
      </para>
      <para>
        For those that just want to have optimized settings for a particular
	use, there are pre-made settings for laptops, web servers, routers,
	and Oracle usage.  As an example, the laptop setting changes the kernel
	disk flush so the hard drive can spin down.  These are just generic
	settings, as each different machine has different PCI devices
	that can also be set.
      </para>
      <para>
        You can get the latest version of <application>powertweak</application>
	from <ulink
	url="http://www.powertweak.org/">http://www.powertweak.org/</ulink>
      </para>
      <figure id="kernelpowertweak"
        <title>Powertweak</title>
	<mediaobject>
	  <imageobject>
	    <imagedata format="JPG" fileref="powertweak.jpg">
	  </imageobject>
	</mediaobject>
      </figure>
    </section> <!-- kernelrunningpowertweak -->

  </section> <!-- kernelrunning -->
</chapter> <!-- kernel -->