<?xml version='1.0' encoding='ISO-8859-1'?>
<!DOCTYPE article PUBLIC '-//OASIS//DTD DocBook XML V4.1.2//EN' 'http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd'>
<article>
<articleinfo>
<title>I/O Performance HOWTO</title>
<author>
<firstname>Sharon</firstname>
<surname>Snider</surname>
<authorblurb><para><ulink url="mailto:snidersd@us.ibm.com">snidersd@us.ibm.com</ulink></para></authorblurb>
</author>
<pubdate>v1.0, 2002-04-05</pubdate>
<abstract><para>This HOWTO describes patches for the Linux 2.4 kernel that can improve the I/O performance of your system.</para></abstract>
<revhistory>
<revision>
<revnumber>v1.0</revnumber>
<date>2002-04-05</date>
<authorinitials>sds</authorinitials>
<revremark>Wrote and converted to DocBook XML.</revremark>
</revision>
</revhistory>
</articleinfo>
<sect1>
<title>Distribution Policy</title>
<para>The I/O Performance-HOWTO is copyrighted &copy; 2002 by IBM Corporation.</para>
<para>The I/O Performance-HOWTO may be distributed, at your choice, under either the terms of the GNU General Public License version 2 or later, or the standard Linux Documentation Project (LDP) terms. These licenses should be available from the LDP Web site <ulink url="http://www.linuxdoc.org/docs.html"></ulink>. Please note that because the LDP terms do not allow modification (other than translation), modified versions can be assumed to be distributed under the GPL.</para>
</sect1>
<sect1 id="INTRODUCTION">
<title>Introduction</title>
<para>This HOWTO provides information on improving the input/output (I/O) performance of the Linux operating system for the 2.4 kernel. Additional patches will be added as they become available.</para>
<para>Please send any comments or contributions via e-mail to <ulink url="mailto:snidersd@us.ibm.com">Sharon Snider</ulink>.</para>
</sect1>
<sect1 id="OVERVIEW">
<title>Avoiding Bounce Buffers</title>
<para>This section provides information on applying and using the bounce buffer patch on the Linux 2.4 kernel. The bounce buffer patch, written by Jens Axboe, enables device drivers that support Direct Memory Access (DMA) I/O to high-address physical memory to avoid bounce buffers.</para>
<para>This section gives a brief overview of memory and addressing in the Linux kernel, then explains why and how to use the bounce buffer patch.</para>
<sect2>
<title>Memory and Addressing in the Linux 2.4 Kernel</title>
<para>The Linux 2.4 kernel includes configuration options for specifying the amount of physical memory in the target computer. By default, the configuration is limited to the amount of memory that can be directly mapped into the kernel's virtual address space. The mapping starts at PAGE_OFFSET (normally 0xC0000000). On i386 systems the default mapping scheme limits the kernel-mode addressability to the first gigabyte (GB) of physical memory, also known as low memory. High-address physical memory is normally the memory above 1 GB. This memory is not directly accessible or permanently mapped by the kernel. Support for high-address physical memory is an option that is enabled during <link linkend="config">configuration of the Linux kernel</link>.</para>
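<para>The direct-mapping arithmetic described above can be sketched in user-space C. This is a simplified model rather than kernel code: the names <structfield>LOWMEM_LIMIT</structfield>, <structfield>is_low_memory ( )</structfield>, and <structfield>lowmem_phys_to_virt ( )</structfield> are hypothetical, and the sketch assumes a 32-bit address space, the default PAGE_OFFSET, and the 1 GB low-memory limit stated in this section.</para>
<programlisting><![CDATA[
```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed default start of the kernel's direct mapping on i386. */
#define PAGE_OFFSET  UINT32_C(0xC0000000)

/* Virtual address space left above PAGE_OFFSET in a 32-bit system.
 * Simplification: the whole range is treated as usable for the mapping. */
#define LOWMEM_LIMIT (UINT64_C(0x100000000) - PAGE_OFFSET)

/* True when a physical address lies in permanently mapped low memory. */
static bool is_low_memory(uint64_t phys)
{
    return phys < LOWMEM_LIMIT;
}

/* Kernel virtual address of a low-memory physical address.
 * Only meaningful when is_low_memory(phys) is true. */
static uint32_t lowmem_phys_to_virt(uint32_t phys)
{
    return (uint32_t)(phys + PAGE_OFFSET);
}
```
]]></programlisting>
<para>Under these assumptions, physical address 0x00100000 is low memory and maps to kernel virtual address 0xC0100000, while physical address 0x40000000 is high memory and has no permanent mapping.</para>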
</sect2>
<sect2>
<title>The Problem with Bounce Buffers</title>
<para>When DMA I/O is performed to or from high-address physical memory, an area is allocated in memory known as a bounce buffer. When data travels between a device and high-address physical memory, it is first copied through the bounce buffer.</para>
<para>Systems with a large amount of high-address physical memory and intense I/O activity can create a large number of bounce buffer data copies. The excessive number of data copies can lead to a shortage of memory and performance degradation.</para>
<para>Peripheral Component Interconnect (PCI) devices can normally address up to 4 GB of physical memory. When a bounce buffer is used for high-address physical memory that is below 4 GB, time and memory are wasted because the peripheral can address that memory directly. Using the bounce buffer patch can decrease, and possibly eliminate, the use of bounce buffers.</para>
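<para>The following user-space sketch illustrates when a buffer would be bounced. It is a model of the decision only, not the kernel's implementation: <structfield>needs_bounce ( )</structfield>, <structfield>LOWMEM_TOP</structfield>, and <structfield>PCI_32BIT_LIMIT</structfield> are hypothetical names, and the 1 GB low-memory boundary is the one assumed throughout this section.</para>
<programlisting><![CDATA[
```c
#include <stdbool.h>
#include <stdint.h>

/* Highest low-memory address (assumed 1 GB boundary). */
#define LOWMEM_TOP       UINT64_C(0x3FFFFFFF)
/* Highest address reachable by a device with 32-bit DMA addressing. */
#define PCI_32BIT_LIMIT  UINT64_C(0xFFFFFFFF)

/* A buffer has to be bounced when any of its bytes lies above the
 * highest physical address the device is allowed to reach. */
static bool needs_bounce(uint64_t phys, uint64_t len, uint64_t bounce_limit)
{
    if (len == 0)
        return false;
    return phys + len - 1 > bounce_limit;
}
```
]]></programlisting>
<para>With the limit set to <structfield>LOWMEM_TOP</structfield>, a 4096-byte buffer at physical address 0x80000000 (2 GB) is bounced. With the limit raised to <structfield>PCI_32BIT_LIMIT</structfield>, the same buffer is transferred directly, and only buffers that reach above 4 GB are bounced.</para>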
</sect2>
<sect2 id="config">
<title>Locating the Patch</title>
<para> The latest version of the bounce buffer patch is <emphasis>block-highmem-all-&lt;version&gt;.gz </emphasis>, and it is available from Andrea Arcangeli's -aa series kernels at <ulink url="http://kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/"></ulink>.</para>
<sect3>
<title>Configuring the Linux Kernel to Avoid Bounce Buffers</title>
<para>This section includes information on configuring the Linux kernel to avoid bounce buffers. The Linux Kernel-HOWTO at <ulink url="http://www.linuxdoc.org/HOWTO/Kernel-HOWTO.html"></ulink> explains the process of re-compiling the Linux kernel.</para>
<para>The following kernel configuration options are required to enable the bounce buffer patch:</para>
<itemizedlist>
<listitem><para>Development Code - To enable the configurator to display the <guimenuitem>HIGHMEM I/O support</guimenuitem> option, select the <guimenuitem>Code Maturity Level Options</guimenuitem> category and specify "y" for <menuchoice><guibutton>prompt for development and/or incomplete code/drivers</guibutton></menuchoice>.</para></listitem>
<listitem><para>High-Address Physical Memory Support - To enable high memory support for physical memory above 1 GB, select the <guimenuitem>Processor type and features</guimenuitem> category, and enter the actual amount of physical memory under the <menuchoice><guilabel>High Memory Support</guilabel></menuchoice> option.</para></listitem>
<listitem><para>High-Address Physical Memory I/O Support - To enable DMA I/O to physical addresses above 1 GB, select the <guimenuitem>Processor type and features</guimenuitem> category, and enter "y" for the <menuchoice><guibutton>HIGHMEM I/O support</guibutton></menuchoice> option. This configuration option is introduced by the bounce buffer patch.</para></listitem>
</itemizedlist>
</sect3>
<sect3 id="enabled">
<title>Enabled Device Drivers</title>
<para>The bounce buffer patch provides the kernel infrastructure, small computer system interface (SCSI), and IDE mid-level driver modifications to support DMA I/O to high-address physical memory. Updates for several device drivers to make use of the added support are included with the patch.</para>
<para>You will need to apply the bounce buffer patch and configure the kernel to support high-address physical memory I/O. Many IDE configurations and the peripheral device drivers listed below perform DMA I/O without the use of bounce buffers:</para>
<para><simplelist columns="1" type="vert">
<member>aic7xxx_drv.o</member>
<member>aic7xxx_old.o</member>
<member>cciss.o</member>
<member>cpqarray.o</member>
<member>megaraid.o</member>
<member>qlogicfc.o</member>
<member>sym53c8xx.o</member>
</simplelist></para>
</sect3>
</sect2>
<sect2>
<title>Modifying Your Device Driver to Avoid Bounce Buffers</title>
<para>The entire process of rebuilding a Linux device driver is beyond the scope of this document. However, additional information is available at
<ulink url="http://www.xml.com/ldd/chapter/book/index.html"></ulink>.</para>
<note><para>Modifications are required for all device drivers that are not listed above in the
<link linkend="enabled">Enabled Device Drivers</link> section.</para></note>
<para>If your device supports DMA I/O to high-address physical memory, you can make its driver use the bounce buffer patch with the following modifications:</para>
<para>For SCSI device drivers: set the <structfield>highmem_io</structfield> bit in the <structname>Scsi_Host_Template</structname> structure, then call <structfield>scsi_register ( )</structfield>.</para>
<para>For IDE drivers: set the <structfield>highmem</structfield> bit in the <structname>ide_hwif_t</structname> structure, then call <structfield>ide_dmaproc ( )</structfield>.</para>
<orderedlist>
<listitem><para>Call <structfield>pci_set_dma_mask ( )</structfield> to specify the address bits that the device can successfully use for DMA operations. The function prototype is:</para>
<para><structfield>int pci_set_dma_mask (struct pci_dev *pdev, dma_addr_t mask);</structfield></para>
<para>If DMA I/O can be supported with the specified mask, <structfield>pci_set_dma_mask ( )</structfield> sets <structfield>pdev->dma_mask</structfield> and returns 0. For SCSI or IDE, the mid-level drivers also pass the mask value to <structfield>blk_queue_bounce_limit ( )</structfield> so that bounce buffers are not created for memory that the device can address directly. Drivers other than SCSI or IDE must call <structfield>blk_queue_bounce_limit ( )</structfield> directly. The function prototype is:</para>
<para><structfield>void blk_queue_bounce_limit (request_queue_t *q, u64 dma_addr);</structfield></para> </listitem>
<listitem><para>Use <structfield>pci_map_page (dev, page, offset, size, direction)</structfield> to map a memory region so that it is accessible by the peripheral device, instead of <structfield>pci_map_single (dev, address, size, direction)</structfield>.</para>
<para>The address parameter for <structfield>pci_map_single ( )</structfield> correlates to the page and offset parameters of <structfield>pci_map_page ( )</structfield>. <structfield>pci_map_page ( )</structfield> supports both the high and low physical memory.</para>
<para>Use the <structfield>virt_to_page ( )</structfield> macro to convert an address to a page/offset pair. The macro is made available by including pci.h. For example:</para>
<simplelist columns="1" type="vert">
<member><structfield> void *address;</structfield></member>
<member><structfield> struct page *page;</structfield></member>
<member><structfield> unsigned long offset;</structfield></member>
</simplelist>
<simplelist columns="1" type="vert">
<member><structfield> page = virt_to_page (address);</structfield></member>
<member><structfield> offset = (unsigned long) address &amp; ~PAGE_MASK;</structfield></member>
</simplelist>
<para>After the DMA I/O transfer is complete, remove the mapping established by <structfield>pci_map_page ( )</structfield> by calling <structfield>pci_unmap_page ( )</structfield>.</para>
<important><title>Important:</title><para><structfield>pci_map_single ( )</structfield> is implemented using <structfield>virt_to_bus ( ) </structfield>. This function call handles low memory addresses only. Drivers supporting high-address physical memory should no longer call <structfield>virt_to_bus ( )</structfield> or <structfield>bus_to_virt ( )</structfield>.</para></important></listitem>
<listitem><para>Set your driver to map a scatter-gather DMA operation using <structfield>pci_map_sg ( )</structfield>. The driver should set the page and offset fields instead of the address field of the scatterlist structure. Refer to step 2 for converting an address to a page/offset pair.</para>
<note><para>If your driver is already using the PCI DMA API, continue to use <structfield>pci_map_page ( ) </structfield> or <structfield>pci_map_sg ( )</structfield> as appropriate. However, do not use the address field of the scatterlist structure.</para></note>
</listitem>
</orderedlist>
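<para>The address-to-page/offset conversion used in step 2 can be modeled in user space as follows. This is a sketch under stated assumptions, not driver code: it assumes 4 KB pages, stands in a page frame number for the kernel's <structfield>struct page</structfield> pointer, and the names <structfield>page_ref</structfield>, <structfield>addr_to_page_ref ( )</structfield>, and <structfield>page_ref_to_addr ( )</structfield> are hypothetical.</para>
<programlisting><![CDATA[
```c
/* Assumed 4 KB pages, as on i386. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ul << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Stand-in for the (page, offset) pair passed to pci_map_page(). */
struct page_ref {
    unsigned long pfn;     /* page frame number, models struct page * */
    unsigned long offset;  /* byte offset inside the page */
};

/* Split an address into its page and in-page offset. The offset
 * expression matches the one used in the example in step 2. */
static struct page_ref addr_to_page_ref(unsigned long address)
{
    struct page_ref ref;

    ref.pfn = address >> PAGE_SHIFT;
    ref.offset = address & ~PAGE_MASK;
    return ref;
}

/* Inverse operation, useful for checking the split. */
static unsigned long page_ref_to_addr(struct page_ref ref)
{
    return (ref.pfn << PAGE_SHIFT) | ref.offset;
}
```
]]></programlisting>
<para>For example, address 0xC0101234 splits into page frame 0xC0101 and offset 0x234, and recombining the pair yields the original address.</para>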
</sect2>
</sect1>
<sect1>
<title>Raw I/O Variable-Size Optimization Patch</title>
<para>This section provides information on the raw I/O variable-size optimization patch for the Linux 2.4 kernel written by Badari Pulavarty. This patch is also known as the RAW VARY or PAGESIZE_io patch. </para>
<para>The raw I/O variable-size patch changes the block size used for raw I/O from <structfield>hardsect_size</structfield> (normally 512 bytes) to 4 kilobytes (KB). The patch improves I/O throughput and central processing unit (CPU) utilization by reducing the number of buffer heads needed for raw I/O operations.</para>
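<para>The saving in buffer heads can be estimated with simple arithmetic. The sketch below assumes one buffer head per block of the request; <structfield>buffer_heads_needed ( )</structfield> is a hypothetical helper for illustration, not a kernel function.</para>
<programlisting><![CDATA[
```c
#include <stddef.h>

/* Number of buffer heads needed to describe a request, assuming one
 * buffer head per block. Rounds up for a partial final block. */
static size_t buffer_heads_needed(size_t request_bytes, size_t block_size)
{
    if (block_size == 0)
        return 0;
    return (request_bytes + block_size - 1) / block_size;
}
```
]]></programlisting>
<para>Under this assumption, a 64 KB raw I/O request needs 128 buffer heads with 512-byte blocks but only 16 with 4 KB blocks.</para>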
<sect2>
<title>Locating the Patch</title>
<para>You can download the patch from one of the following locations:</para>
<itemizedlist>
<listitem><para>Andrea Arcangeli has made the patch available at
<ulink url="http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.18pre7aa2/"></ulink>.
The name of the file is <emphasis>10_rawio-vary-io-1</emphasis>.</para></listitem>
<listitem><para>Alan Cox has included the patch in the <emphasis>2.4.18pre9-ac2</emphasis> kernel patch. The patch is available at <ulink url="http://www.kernel.org/pub/linux/kernel/people/alan/linux-2.4/2.4.18/"></ulink>. </para></listitem>
<listitem><para>The patch can be found as part of the IO Scalability Package at <ulink url="http://sourceforge.net/projects/lse/io"></ulink>. The name of the patch is <emphasis>PAGESIZE_io-&lt;version&gt;</emphasis> listed under the <emphasis>Raw I/O Enhancements </emphasis> release.</para> </listitem>
</itemizedlist>
</sect2>
<sect2>
<title>Modifying Your Driver for the Raw I/O Variable-Size Optimization Patch</title>
<para>Modifications are required for all device drivers that use the 2.4.17 version of the patch. However, rebuilding device drivers is beyond the scope of this document.
Additional information is available at <ulink url="http://www.xml.com/ldd/chapter/book/index.html"></ulink>.</para>
<para>In previous versions of this patch, the changes were enabled for all drivers. The 2.4.17 and later versions of the patch enable the changes only for the Adaptec aic7xxx and the QLogic ISP1020 SCSI drivers. All other drivers must be modified to make use of the patch.</para>
<para>You will need to modify the code as follows:</para>
<para>Set the <structfield>can_do_varyio</structfield> bit in the <structname>Scsi_Host_Template</structname> structure before calling <structfield>scsi_register ( ).</structfield></para>
<important><para>Drivers that have the raw I/O patch enabled must support buffer heads of variable sizes (<structfield>b_size</structfield>) in a single I/O request, because <structfield>hardsect_size</structfield> is used until the data buffer is aligned to a 4 KB boundary.</para></important>
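<para>The mix of buffer head sizes that a driver may see in one request can be illustrated with a user-space sketch. It is not the patch's code: <structfield>split_request ( )</structfield> is a hypothetical helper, the 512-byte sector size is the normal value mentioned earlier, and the fallback to sector-sized pieces for a tail shorter than 4 KB is an assumption of this sketch.</para>
<programlisting><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

#define HARDSECT_SIZE 512u   /* assumed hardware sector size */
#define VARY_SIZE     4096u  /* block size used once the buffer is aligned */

/* Fill sizes[] with the b_size of each buffer head for one request.
 * Sector-sized pieces are used until the buffer address reaches a
 * 4 KB boundary, then 4 KB pieces are used while enough data remains.
 * Returns the number of buffer heads, or 0 if the buffer or length is
 * not sector-aligned or sizes[] is too small. */
static size_t split_request(uintptr_t buf_addr, size_t len,
                            size_t *sizes, size_t max_sizes)
{
    size_t count = 0;

    if (buf_addr % HARDSECT_SIZE != 0 || len % HARDSECT_SIZE != 0)
        return 0;

    while (len > 0) {
        size_t chunk = HARDSECT_SIZE;

        if (buf_addr % VARY_SIZE == 0 && len >= VARY_SIZE)
            chunk = VARY_SIZE;
        if (count == max_sizes)
            return 0;
        sizes[count++] = chunk;
        buf_addr += chunk;
        len -= chunk;
    }
    return count;
}
```
]]></programlisting>
<para>For a buffer that starts 512 bytes before a 4 KB boundary and is 8704 bytes long, the sketch yields three buffer heads of 512, 4096, and 4096 bytes, which is the kind of mixed-size request a driver must accept.</para>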
</sect2>
</sect1>
<sect1>
<title>I/O Request Lock Patch</title>
<para>This section provides information on the I/O request lock patch, also known as the SCSI concurrent queuing patch (sior1), written by Johnathan Lahr.</para>
<para>The I/O request lock patch improves SCSI I/O performance on Linux 2.4 multi-processor systems by providing concurrent I/O request queuing. Allowing multiple processors to drive multiple block devices concurrently can significantly improve I/O performance and CPU utilization.</para>
<para>Without the patch, block I/O requests are queued one at a time while holding the global spin lock, <structfield>io_request_lock</structfield>. Once the patch is applied, SCSI requests are queued while holding only the lock of the specific queue targeted by the request. Requests that are made to different devices are queued concurrently, and requests that are made to the same device are queued serially.</para>
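<para>The difference between a single global lock and per-queue locks can be modeled in user space with C11 atomics. This is an illustrative sketch, not the patch's code: <structfield>request_queue_model</structfield> and the <structfield>model_</structfield> functions are hypothetical names, and a simple test-and-set flag stands in for the kernel's spin lock.</para>
<programlisting><![CDATA[
```c
#include <stdatomic.h>
#include <stdbool.h>

/* Model of a per-device request queue carrying its own lock. */
struct request_queue_model {
    atomic_flag queue_lock;  /* stands in for the per-queue spin lock */
    unsigned int queued;     /* number of requests queued so far */
};

static void model_spin_lock(atomic_flag *lock)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;
}

static bool model_spin_trylock(atomic_flag *lock)
{
    return !atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}

static void model_spin_unlock(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

static void model_queue_init(struct request_queue_model *q)
{
    atomic_flag_clear(&q->queue_lock);
    q->queued = 0;
}

/* Queue one request while holding only the target queue's lock. */
static void model_queue_request(struct request_queue_model *q)
{
    model_spin_lock(&q->queue_lock);
    q->queued++;
    model_spin_unlock(&q->queue_lock);
}
```
]]></programlisting>
<para>In this model, holding the lock of one queue does not prevent another processor from taking the lock of a different queue, so requests to different devices can be queued at the same time. A second attempt to lock the same queue fails until the first holder releases it, which serializes requests to the same device.</para>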
<sect2>
<title>Locating the Patch</title>
<para>You can download the I/O request lock patch from SourceForge at <ulink url="http://sourceforge.net/projects/lse/io"></ulink>. The latest version is <emphasis>sior1-v1.2416</emphasis>.</para>
<para>Additional patches that enable concurrent queuing can be downloaded from SourceForge. The patch for the Emulex SCSI/FC is <emphasis>lpfc_sior1-v0.249</emphasis> and the patch for Adaptec SCSI is <emphasis>aic_sior1-v0.249</emphasis>.</para>
</sect2>
<sect2>
<title>Modifying Your Driver for the I/O Request Lock Patch</title>
<para>Modifications are required for all device drivers. However, rebuilding device drivers is beyond the scope of this document.
Additional information is available at <ulink url="http://www.xml.com/ldd/chapter/book/index.html"></ulink>.</para>
<para>The I/O request lock patch installs concurrent queuing capability into the SCSI midlayer. Concurrent queuing is
activated separately for each SCSI adapter device driver. To activate it for a driver, the <structfield>concurrent_queue</structfield> field in the <structfield>Scsi_Host_Template</structfield> must be set when the system registers the driver.</para>
<important><para>Applying the patch makes concurrent queuing available. When concurrent queuing is active, access to the driver's <structfield>request_queue</structfield> is protected by acquiring <structfield>request_queue.queue_lock</structfield>.</para></important>
</sect2>
</sect1>
<sect1>
<title>Additional Resources</title>
<para>The following list of Web sites provides additional information on modifying device drivers and configuring the Linux kernel.</para>
<itemizedlist>
<listitem><para>Information on Dynamic DMA mapping is available at
<ulink url="http://lwn.net/2001/0712/a/dma-interface.php3"></ulink>.</para></listitem>
<listitem><para>Kernel-HOWTO is available from the Linux Documentation Project at
<ulink url="http://www.linuxdoc.org/HOWTO/Kernel-HOWTO.html"></ulink>.</para></listitem>
<listitem><para>Linux Device Drivers, 2nd Edition published by O'Reilly is available online at
<ulink url="http://www.xml.com/ldd/chapter/book/index.html"></ulink>.</para></listitem>
</itemizedlist>
</sect1>
</article>