<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
<TITLE>HOWTO: Multi Disk System Tuning: Technologies </TITLE>
<LINK HREF="Multi-Disk-HOWTO-7.html" REL=next>
<LINK HREF="Multi-Disk-HOWTO-5.html" REL=previous>
<LINK HREF="Multi-Disk-HOWTO.html#toc6" REL=contents>
</HEAD>
<BODY>
<A HREF="Multi-Disk-HOWTO-7.html">Next</A>
<A HREF="Multi-Disk-HOWTO-5.html">Previous</A>
<A HREF="Multi-Disk-HOWTO.html#toc6">Contents</A>
<HR>
<H2><A NAME="technologies"></A> <A NAME="s6">6. Technologies </A></H2>

<P>
<!--
disk!technologies
-->
In order to decide how to get the most out of your devices you need to
know what technologies are available and their implications. As always
there can be tradeoffs with respect to speed, reliability, power,
flexibility, ease of use and complexity.
<P>Many of the techniques described below can be stacked in a number
of ways to maximise performance and reliability, though at the cost
of added complexity.
<P>
<P>
<H2><A NAME="RAID"></A> <A NAME="ss6.1">6.1 RAID</A>
|
|
</H2>
|
|
|
|
<P>
|
|
<!--
|
|
disk!technologies!RAID
|
|
-->
|
|
|
|
This is a method of increasing reliability, speed or both by using multiple
|
|
disks in parallel thereby decreasing access time and increasing transfer
|
|
speed. A checksum or mirroring system can be used to increase reliability.
|
|
Large servers can take advantage of such a setup but it might be overkill
|
|
for a single user system unless you already have a large number of disks
|
|
available. See other documents and FAQs for more information.
|
|
<P>For Linux one can set up a RAID system using either software
|
|
(the <CODE>md</CODE> module in the kernel), a Linux compatible
|
|
controller card (PCI-to-SCSI) or a SCSI-to-SCSI controller. Check the
|
|
documentation for what controllers can be used. A hardware solution is
|
|
usually faster, and perhaps also safer, but comes at a significant cost.
|
|
<P>A summary of available hardware RAID solutions for Linux is available
|
|
at
|
|
<A HREF="http://www.Linux-Consulting.com/Raid/Docs/raid_hw.txt">Linux Consulting</A>.
|
|
<P>
|
|
<P>
|
|
<P>
|
|
<H3><A NAME="SCSI-to-SCSI"></A> SCSI-to-SCSI</H3>
|
|
|
|
<P>
|
|
<!--
|
|
disk!technologies!RAID!SCSI-to-SCSI
|
|
-->
|
|
|
|
SCSI-to-SCSI controllers are usually implemented as complete cabinets
|
|
with drives and a controller that connects to the computer with a
|
|
second SCSI bus. This makes the entire cabinet of drives look like a
|
|
single large, fast SCSI drive and requires no special RAID driver. The
|
|
disadvantage is that the SCSI bus connecting the cabinet to the
|
|
computer becomes a bottleneck.
|
|
<P>A significant disadvantage for people with large disk farms is that there
|
|
is a limit to how many SCSI entries there can be in the
|
|
<A HREF="file:///dev/">/dev</A>
|
|
directory. In these cases using SCSI-to-SCSI will conserve entries.
|
|
<P>Usually they are configured via the front panel or with a terminal
|
|
connected to their on-board serial interface.
|
|
<P>
|
|
<P>Some manufacturers of such systems are
|
|
<A HREF="http://www.cmd.com">CMD</A>
|
|
and
|
|
<A HREF="http://www.syred.com">Syred</A>
|
|
whose web pages describe several systems.
|
|
<P>
|
|
<P>
|
|
<H3><A NAME="PCI-to-SCSI"></A> PCI-to-SCSI</H3>
|
|
|
|
<P>
|
|
<!--
|
|
disk!technologies!RAID!PCI-to-SCSI
|
|
-->
|
|
|
|
PCI-to-SCSI controllers are, as the name suggests,
|
|
connected to the high speed PCI
|
|
bus and is therefore not suffering from the same bottleneck as the
|
|
SCSI-to-SCSI controllers. These controllers require special drivers
|
|
but you also get the means of controlling the RAID configuration over
|
|
the network which simplifies management.
|
|
<P>Currently only a few families of PCI-to-SCSI host adapters
|
|
are supported under Linux.
|
|
<P>
|
|
<DL>
<P>
<DT><B>DPT</B><DD><P>The oldest and most mature is a range of controllers from
<A HREF="http://www.dpt.com">DPT</A>
including the SmartCache I/III/IV and SmartRAID I/III/IV controller families.
These controllers are supported by the EATA-DMA driver in
the standard kernel. This company also has an informative
<A HREF="http://www.dpt.com">home page</A>
which describes various general aspects
of RAID and SCSI in addition to the product related information.
<P>More information from the author of the DPT controller drivers
(EATA* drivers) can be found at his pages on
<A HREF="http://www.uni-mainz.de/~neuffer/scsi/">SCSI</A>
and
<A HREF="http://www.uni-mainz.de/~neuffer/scsi/dpt/">DPT</A>.
<P>These are not the fastest but have a good track record of
proven reliability.
<P>Note that the maintenance tools for DPT controllers currently
run under DOS/Win only so you will need a small DOS/Win partition
for some of the software. This also means you have to boot the
system into Windows in order to maintain your RAID system.
<P>
<P>
<DT><B>ICP-Vortex</B><DD><P>A very recent addition is a range of controllers from
<A HREF="http://www.icp-vortex.com">ICP-Vortex</A>
featuring up to 5 independent channels and very fast hardware
based on the i960 chip. The Linux driver was written by the
company itself, which shows they support Linux.
<P>As ICP-Vortex supplies the maintenance software for Linux there is
no need to reboot into another operating system for the
setup and maintenance of your RAID system. This also saves you
extra downtime.
<P>
<P>
<DT><B>Mylex DAC-960</B><DD><P>This is one of the latest entries and is out in early beta.
More information as well as drivers are available at
<A HREF="http://www.dandelion.com/Linux/DAC960.html">Dandelion Digital's Linux DAC960 Page</A>.
<P>
<P>
<DT><B>Compaq Smart-2 PCI Disk Array Controllers</B><DD><P>Another very recent entry and currently in beta release is the
<A HREF="http://www.insync.net/~frantzc/cpqarray.html">Smart-2</A>
driver.
<P>
<DT><B>IBM ServeRAID</B><DD><P>IBM has released their
<A HREF="http://www.developer.ibm.com/welcome/netfinity/serveraid_beta.html">driver</A>
as GPL.
<P>
<P>
</DL>
<P>
<P>
<P>
<P>
<H3><A NAME="soft-raid"></A> Software RAID</H3>
|
|
|
|
<P>
|
|
<!--
|
|
disk!technologies!RAID!Software RAID
|
|
-->
|
|
|
|
A number of operating systems offer software RAID using
|
|
ordinary disks and controllers. Cost is low and performance
|
|
for raw disk IO can be very high.
|
|
As this can be very CPU intensive it increases the load noticeably
|
|
so if the machine is CPU bound in performance rather then IO bound
|
|
you might be better off with a hardware PCI-to-RAID controller.
|
|
<P>Real cost, performance and especially reliability of software
|
|
vs. hardware RAID is a very controversial topic. Reliability
|
|
on Linux systems have been very good so far.
|
|
<P>The current software RAID project on Linux is the <CODE>md</CODE> system
|
|
(multiple devices) which offers much more than RAID so it is
|
|
described in more details later.
|
|
<P>
|
|
<P>
|
|
<P>
|
|
<H3><A NAME="raid-levels"></A> RAID Levels</H3>
|
|
|
|
<P>
|
|
<!--
|
|
disk!technologies!RAID!RAID levels
|
|
-->
|
|
|
|
RAID comes in many levels and flavours which I will give a brief
|
|
overview of this here. Much has been written about it and the
|
|
interested reader is recommended to read more about this in the
|
|
<A HREF="http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/">Software RAID HOWTO</A>.
|
|
<P>
|
|
<UL>
|
|
<LI>RAID <EM>0</EM> is not redundant at all but offers the best
throughput of all levels here. Data is striped across a number of
drives so read and write operations take place in parallel across
all drives. On the other hand if a single drive fails then
everything is lost. Did I mention backups?
</LI>
<LI>RAID <EM>1</EM> is the most primitive method of obtaining redundancy
by duplicating data across all drives. Naturally this is
massively wasteful but you get one substantial advantage which is
fast access.
The drive that accesses the data first wins. Transfers
are not any faster than for a single drive, even though you might
get somewhat faster reads by reading one track per
drive in parallel.

Also if you have only 2 drives this is the only method of achieving
redundancy.
</LI>
<LI>RAID <EM>2</EM> and <EM>4</EM> are not so common and are not covered
here.
</LI>
<LI>RAID <EM>3</EM> uses a number of disks (at least 2) to store data
in a striped RAID 0 fashion. It also uses an additional redundancy
disk to store the XOR sum of the data from the data disks (a small
sketch of this XOR scheme follows after this list). Should
the redundancy disk fail, the system can continue to operate as if
nothing happened. Should any single data disk fail the system can
compute the data on this disk from the information on the redundancy
disk and all remaining disks. Any double fault will bring the whole
RAID set off-line.

RAID 3 makes sense only with at least 2 data disks (3 disks
including the redundancy disk). Theoretically there is no limit to
the number of disks in the set, but the probability of a fault
increases with the number of disks in the RAID set. Usually the
upper limit is 5 to 7 disks in a single RAID set.

Since RAID 3 stores all redundancy information on a dedicated disk
and since this information has to be updated whenever a write to any
data disk occurs, the overall write speed of a RAID 3 set is limited
by the write speed of the redundancy disk. This, too, limits
the number of disks in a RAID set. The overall read speed of a RAID
3 set with all data disks up and running is that of a RAID 0 set
with that number of data disks. If the set has to reconstruct data
stored on a failed disk from redundant information, the performance
will be severely limited: all disks in the set have to be read and
XOR-ed to compute the missing information.
</LI>
<LI>RAID <EM>5</EM> is just like RAID 3, but the redundancy
information is spread over all disks of the RAID set. This improves
write performance, because the load is distributed more evenly between
all available disks.
</LI>
</UL>
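<P>To illustrate the XOR redundancy used by RAID 3 and 5, here is a minimal
sketch in Python (purely an illustration of the arithmetic, not part of any
RAID implementation): the parity block is the XOR of the data blocks, and any
single missing block can be recomputed from the survivors plus the parity.
<BLOCKQUOTE><CODE>
<PRE>
# XOR a list of equally sized blocks together, byte by byte.
def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"disk0 block", b"disk1 block", b"disk2 block"]  # stripes on 3 data disks
parity = xor_blocks(data)                                # stored on the redundancy disk

# Disk 1 fails: rebuild its block from the survivors and the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
</PRE>
</CODE></BLOCKQUOTE>
<P>This is also why a degraded read is so expensive: every surviving block in
the stripe has to be read and XOR-ed to recover the missing one.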
<P>There are also hybrids available based on RAID 0 or 1 and one other
level. Many combinations are possible but I have only seen a few
referred to. These are more complex than the above mentioned
RAID levels.
<P>RAID <EM>0/1</EM> combines striping with duplication which
gives very high transfers combined with fast seeks as well as
redundancy. The disadvantage is high disk consumption as well as
the above mentioned complexity.
<P>RAID <EM>1/5</EM> combines the speed and redundancy benefits of
RAID 5 with the fast seek of RAID 1. Redundancy is improved compared
to RAID 0/1 but disk consumption is still substantial. Implementing
such a system would typically involve more than 6 drives, perhaps
even several controllers or SCSI channels.
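<P>To get a feel for the disk consumption involved, the following small
Python sketch (a back-of-the-envelope helper of my own, not taken from any
RAID documentation) compares the usable fraction of the raw capacity for some
of the levels above, assuming <CODE>n</CODE> equal drives:
<BLOCKQUOTE><CODE>
<PRE>
# Usable capacity as a fraction of raw capacity for n equal drives.
def usable_fraction(level, n):
    if level == "0":          # striping only, no redundancy
        return 1.0
    if level == "1":          # every drive holds a full copy
        return 1.0 / n
    if level in ("3", "5"):   # one drive's worth of parity
        return (n - 1) / n
    if level == "0/1":        # two striped halves mirroring each other
        return 0.5
    raise ValueError(level)

for level in ("0", "1", "5", "0/1"):
    print(level, usable_fraction(level, 6))   # roughly 1.0, 0.17, 0.83 and 0.5
</PRE>
</CODE></BLOCKQUOTE>
<P>With 6 drives, plain mirroring wastes the most space while RAID 5 keeps
most of the raw capacity; the hybrids trade capacity for speed and redundancy.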
<P>
<P>
<H2><A NAME="vol-mgmnt"></A> <A NAME="ss6.2">6.2 Volume Management</A>
</H2>

<P>
<!--
disk!technologies!volume management
-->

Volume management is a way of overcoming the constraints of fixed
sized partitions and disks while still keeping control of where
the various parts of the file space reside. With such a system you can
add new disks to your system and add space from these to parts
of the file space where needed, as well as migrate data off
a disk that is developing faults to other drives before catastrophic failure
occurs.
<P>The system developed by
<A HREF="http://www.veritas.com">Veritas</A>
has become the de facto standard for logical volume management.
<P>Volume management is for the time being an area where Linux is lacking.
<P>One project is the virtual partition system
<A HREF="http://www-wsg.cso.uiuc.edu/~roth/">VPS</A>
that aims to reimplement many of the volume management functions found in
IBM's AIX system. Unfortunately this project is currently on hold.
<P>Another project is the
<A HREF="http://www.sistina.com/lvm/">Logical Volume Manager</A>
project, which is similar to a project by HP.
<P>
<P>
<H2><A NAME="ss6.3">6.3 Linux <CODE>md</CODE> Kernel Patch</A>
|
|
</H2>
|
|
|
|
<P>
|
|
<!--
|
|
disk!technologies!md
|
|
-->
|
|
|
|
<!--
|
|
disk!technologies!raid
|
|
-->
|
|
|
|
<!--
|
|
disk!technologies!striping
|
|
-->
|
|
|
|
<!--
|
|
disk!technologies!translucence
|
|
-->
|
|
|
|
The Linux Multi Disk (md) provides a number of block level features
|
|
in various stages of development.
|
|
<P>RAID 0 (striping) and concatenation are very solid and in production quality
|
|
and also RAID 4 and 5 are quite mature.
|
|
<P>It is also possible to stack some
|
|
levels, for instance mirroring (RAID 1) two pairs of drives,
|
|
each pair set up as striped disks (RAID 0),
|
|
which offers the speed of RAID 0 combined with the reliability of RAID 1.
|
|
<P>In addition to RAID this system offers (in alpha stage) block level
|
|
volume management and soon also translucent file space.
|
|
Since this is done on the block level it can be used in combination
|
|
with any file system, even for <CODE>fat</CODE> using Wine.
|
|
<P>Think very carefully what drives you combine so you can operate all drives
|
|
in parallel, which gives you better performance and less wear. Read more
|
|
about this in the documentation that comes with <CODE>md</CODE>.
|
|
<P>
<P>Unfortunately the Linux software RAID development has split into two trees:
the old stable versions 0.35 and 0.42, which are documented in the
official
<A HREF="http://linas.org/linux/Software-RAID/Software-RAID.html">Software-RAID HOWTO</A>,
and the newer, less stable 0.90 series, which is documented in the
unofficial
<A HREF="http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/">Software RAID HOWTO</A>
which is a work in progress.
<P>A
<A HREF="http://www-mddsp.enel.ucalgary.ca/People/adilger/online-ext2/">patch for online growth of ext2fs</A>
is available in early stages
and related work is taking place at
<A HREF="http://ext2resize.sourceforge.net/">the ext2fs resize project</A>
at Sourceforge.
<P>
<P>
<P>Hint: if you cannot get it to work properly you have forgotten to set
the <CODE>persistent-superblock</CODE> flag. Your best documentation is currently
the source code.
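<P>To show where that flag lives, here is a minimal <CODE>/etc/raidtab</CODE>
sketch for a two disk RAID 0 set under the 0.90 tools (the device names and
chunk size are only examples; check the raidtools documentation for your setup):
<BLOCKQUOTE><CODE>
<PRE>
# /etc/raidtab - striping two SCSI partitions into /dev/md0
raiddev /dev/md0
        raid-level              0
        nr-raid-disks           2
        persistent-superblock   1
        chunk-size              32

        device                  /dev/sda1
        raid-disk               0
        device                  /dev/sdb1
        raid-disk               1
</PRE>
</CODE></BLOCKQUOTE>
<P>After writing the file you would initialise the set with <CODE>mkraid /dev/md0</CODE>
and then make a file system on <CODE>/dev/md0</CODE> as usual.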
<P>
<P>
<P>
<H2><A NAME="ss6.4">6.4 Compression</A>
</H2>

<P>
<!--
disk!technologies!compression
-->

<!--
disk!compression!DouBle
-->

<!--
disk!compression!Zlibc
-->

<!--
disk!compression!dmsdos
-->

<!--
disk!compression!e2compr
-->

Disk compression versus file compression
is a hotly debated topic, especially regarding
the added danger of file corruption. Nevertheless there are several options
available for the adventurous administrator. These take on many forms,
from kernel modules and patches to extra libraries, but note that most
suffer various forms of limitations such as being read-only. As development
takes place at breakneck speed the specs have undoubtedly changed by the
time you read this. As always: check the latest updates yourself. Here only
a few references are given.
<P>
<UL>
<LI>DouBle features file compression with some limitations.</LI>
<LI>Zlibc adds transparent on-the-fly decompression of files as they load.</LI>
<LI>There are many modules available for reading compressed files or
partitions that are native to various other operating systems, though
currently most of these are read-only.</LI>
<LI>
<A HREF="http://bf9nt.uni-duisburg.de/mitarbeiter/gockel/software/dmsdos/">dmsdos</A>
(currently in version 0.9.2.0) offers many of the compression
options available for DOS and Windows. It is not yet complete but work is
ongoing and new features are added regularly.</LI>
<LI><CODE>e2compr</CODE> is a package that extends <CODE>ext2fs</CODE> with compression
capabilities. It is still under testing and will therefore mainly be of
interest for kernel hackers but should soon gain stability for wider use.
Check the
<A HREF="http://e2compr.memalpha.cx/e2compr/">e2compr homepage</A>
for more information. I have reports of speed and good stability
which is why it is mentioned here.</LI>
</UL>
<P>
<P>
<H2><A NAME="ss6.5">6.5 ACL</A>
|
|
</H2>
|
|
|
|
<P>
|
|
<!--
|
|
disk!technologies!ACL
|
|
-->
|
|
|
|
Access Control List (ACL) offers finer control over file access
|
|
on a user by user basis, rather than the traditional owner, group
|
|
and others, as seen in directory listings (<CODE>drwxr-xr-x</CODE>). This
|
|
is currently not available in Linux but is expected in kernel 2.3
|
|
as hooks are already in place in <CODE>ext2fs</CODE>.
|
|
<P>
|
|
<P>
|
|
<H2><A NAME="ss6.6">6.6 <CODE>cachefs</CODE></A>
|
|
</H2>
|
|
|
|
<P>
|
|
<!--
|
|
disk!technologies!cachefs
|
|
-->
|
|
|
|
This uses part of a hard disk to cache slower media such as CD-ROM.
|
|
It is available under SunOS but not yet for Linux.
|
|
<P>
|
|
<P>
|
|
<H2><A NAME="ss6.7">6.7 Translucent or Inheriting File Systems</A>
|
|
</H2>
|
|
|
|
<P>
|
|
<!--
|
|
disk!technologies!translucent
|
|
-->
|
|
|
|
<!--
|
|
disk!technologies!inheriting
|
|
-->
|
|
|
|
This is a copy-on-write system where writes go to a different system
|
|
than the original source while making it look like an ordinary file
|
|
space. Thus the file space inherits the original data and the
|
|
translucent write back buffer can be private to each user.
|
|
<P>There is a number of applications:
|
|
<UL>
|
|
<LI>updating a live file system on CD-ROM, making it flexible, fast
|
|
while also conserving space,</LI>
|
|
<LI>original skeleton files for each new user, saving space since the
|
|
original data is kept in a single space and shared out,</LI>
|
|
<LI>parallel project development prototyping where every user can
|
|
seemingly modify the system globally while not affecting other users.</LI>
|
|
</UL>
|
|
<P>SunOS offers this feature and this is under development for Linux.
|
|
There was an old project called the Inheriting File Systems (<CODE>ifs</CODE>)
|
|
but this project has stopped.
|
|
One current project is part of the <CODE>md</CODE> system and offers
|
|
block level translucence so it can be applied to any file system.
|
|
<P>Sun has an informative
|
|
<A HREF="http://www.sun.ca/white-papers/tfs.html">page</A>
|
|
on translucent file system.
|
|
<P>It should be noted that
|
|
<A HREF="http://www.rational.com">Clearcase (now owned by Rational)</A>
|
|
pioneered and popularized translucent filesystems for software
|
|
configuration management by writing their own UNIX filesystem.
|
|
<P>
|
|
<P>
|
|
<P>
|
|
<P>
|
|
<H2><A NAME="physical-track-positioning"></A> <A NAME="ss6.8">6.8 Physical Track Positioning</A>
|
|
</H2>
|
|
|
|
<P>
|
|
<!--
|
|
disk!technologies!physical track positioning
|
|
-->
|
|
|
|
<!--
|
|
disk!technologies!track positioning
|
|
-->
|
|
|
|
This trick used to be very important when drives were slow and small,
|
|
and some file systems used to take the varying characteristics into
|
|
account when placing files. Although higher overall speed, on board
|
|
drive and controller caches and intelligence
|
|
has reduced the effect of this.
|
|
<P>Nevertheless there is still a little to be gained even today.
|
|
As we know, "<I>world dominance</I>" is soon within reach
|
|
but to achieve this "<I>fast</I>" we need to employ all the tricks
|
|
we can use
|
|
<A HREF="http://www.mit.edu:8001/finger?linus@linux.cs.helsinki.fi"> </A>.
|
|
<P>To understand the strategy we need to recall this near ancient piece
|
|
of knowledge and the properties of the various track locations.
|
|
This is based on the fact that transfer speeds generally increase for tracks
|
|
further away from the spindle, as well as the fact that it is faster
|
|
to seek to or from the central tracks than to or from the inner or outer
|
|
tracks.
|
|
<P>Most drives use disks running at constant angular velocity but use
|
|
(fairly) constant data density across all tracks. This means that
|
|
you will get much higher transfer rates on the outer tracks than
|
|
on the inner tracks; a characteristics which fits the requirements
|
|
for large libraries well.
|
|
<P>Newer disks use a logical geometry mapping which differs from the actual
|
|
physical mapping which is transparently mapped by the drive itself.
|
|
This makes the estimation of the "middle" tracks a little harder.
|
|
<P>In most cases track 0 is at the outermost track and this is the general
|
|
assumption most people use. Still, it should be kept in mind that there
|
|
are no guarantees this is so.
|
|
<P>
|
|
<P>
|
|
<DL>
<DT><B>Inner</B><DD><P>tracks are usually slow in transfer and, lying at one
end of the seek range, they are also slow to seek to.
<P>This makes them more suitable for low-demand directories such as DOS, root
and print spools.
<P>
<DT><B>Middle</B><DD><P>tracks are on average faster with respect to transfers
than inner tracks and, being in the middle, also on average faster to
seek to.
<P>This characteristic makes them ideal for the most demanding parts such as
<CODE>swap</CODE>, <CODE>/tmp</CODE> and <CODE>/var/tmp</CODE>.
<P>
<DT><B>Outer</B><DD><P>tracks have on average even faster transfer characteristics
but, like the inner tracks, they are at one end of the seek range, so
statistically they are as slow to seek to as the inner tracks.
<P>Large files such as libraries would benefit from a place here.
<P>
</DL>
<P>Hence seek time reduction can be achieved by positioning frequently
accessed tracks in the middle so that the average seek distance and
therefore the seek time is short. This can be done either by using
<CODE>fdisk</CODE> or <CODE>cfdisk</CODE> to make a partition
on the middle tracks or by first
making a file (using <CODE>dd</CODE>) equal to half the size of the entire disk
before creating the files that are frequently accessed, after which
the dummy file can be deleted. Both cases assume starting from an
empty disk.
<P>The latter trick is suitable for news spools where the empty directory
structure can be placed in the middle before putting in the data files.
This also helps reduce fragmentation a little.
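<P>For those who want to script the dummy file approach, here is a rough
Python sketch (the mount point is only an example, and the whole thing assumes
a freshly made, empty file system as described above):
<BLOCKQUOTE><CODE>
<PRE>
import os

MOUNT_POINT = "/var/spool/news"      # example: an empty, freshly made file system

# Reserve roughly the first half of the file system with a dummy file so
# that files created afterwards land near the middle of the disk.
st = os.statvfs(MOUNT_POINT)
half = (st.f_blocks * st.f_frsize) // 2

dummy = os.path.join(MOUNT_POINT, "dummy")
chunk = bytes(1024 * 1024)           # write 1 MiB of zeroes at a time
with open(dummy, "wb") as f:
    for _ in range(half // len(chunk)):
        f.write(chunk)

# ... create the frequently accessed directories and files here ...

os.remove(dummy)                     # free the reserved space again
</PRE>
</CODE></BLOCKQUOTE>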
<P>This little trick can be used on ordinary drives as well as on RAID
systems. In the latter case the calculation for centring the tracks
will be different, if it is possible at all. Consult the latest RAID manual.
<P>The speed difference this makes depends on the drives, but a 50 percent
improvement is a typical value.
<P>
<H3><A NAME="disk-speed-values"></A> Disk Speed Values</H3>
|
|
|
|
<P>
|
|
<!--
|
|
disk!technologies!disk speed values
|
|
-->
|
|
|
|
The same mechanical head disk assembly (HDA) is often available
|
|
with a number of interfaces (IDE, SCSI etc) and the mechanical
|
|
parameters are therefore often comparable. The mechanics is
|
|
today often the limiting factor but development is improving
|
|
things steadily. There are two main parameters, usually quoted
|
|
in milliseconds (ms):
|
|
<P>
|
|
<UL>
<LI>Head movement - the speed at which the read-write head
is able to move from one track to the next, called access time.
If you do the mathematics and doubly integrate the seek distance,
first across all possible starting tracks and
then across all possible target tracks, you will find that
this is equivalent to a stroke across a third of all tracks
(a small numerical check of this follows below).
</LI>
<LI>Rotational speed - which determines the time taken to
get to the right sector, called latency.</LI>
</UL>
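<P>The "one third of all tracks" figure is easy to verify numerically; here is
a tiny Python sketch (purely illustrative, the number of tracks is arbitrary):
<BLOCKQUOTE><CODE>
<PRE>
# Average the seek distance over all combinations of start and target track.
N = 1000                      # number of tracks, any reasonably large value will do
total = sum(abs(a - b) for a in range(N) for b in range(N))
average = total / (N * N)
print(average / N)            # prints roughly 0.333, i.e. a third of a full stroke
</PRE>
</CODE></BLOCKQUOTE>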
<P>After voice coils replaced stepper motors for the head movement
the improvements seem to have levelled off and more energy is
now spent (literally) on improving rotational speed. This has
the secondary benefit of also improving transfer rates.
<P>Some typical values:
<BLOCKQUOTE><CODE>
<PRE>
                         Drive type
Access time (ms)  |    Fast    Typical     Old
----------------------------------------------
Track-to-track          <1        2          8
Average seek            10       15         30
End-to-end              10       30         70
</PRE>
</CODE></BLOCKQUOTE>
<P>This shows that the very high end drives offer only marginally
better access times than the average drives but that the old
drives based on stepper motors are significantly worse.
<P>
<BLOCKQUOTE><CODE>
<PRE>
Rotational speed (RPM)  |  3600  |  4500  |  4800  |  5400  |  7200  |  10000
------------------------------------------------------------------------------
Latency (ms)            |   17   |   13   |  12.5  |  11.1  |   8.3  |   6.0
</PRE>
</CODE></BLOCKQUOTE>
<P>As the latency quoted here is the time taken for a full revolution of the
platters, the formula is quite simply
<BLOCKQUOTE><CODE>
<PRE>
latency (ms) = 60000 / speed (RPM)
</PRE>
</CODE></BLOCKQUOTE>
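<P>As a worked example, the table above can be reproduced directly from this
formula (a small Python sketch; the first two table entries are rounded to
whole milliseconds):
<BLOCKQUOTE><CODE>
<PRE>
# Time for one full revolution at a given rotational speed.
for rpm in (3600, 4500, 4800, 5400, 7200, 10000):
    print("%6d RPM  ->  %4.1f ms" % (rpm, 60000.0 / rpm))
</PRE>
</CODE></BLOCKQUOTE>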
<P>Clearly this too is an example of diminishing returns for the
efforts put into development. However, what really takes off
here is the power consumption, heat and noise.
<P>
<P>
<H2><A NAME="ss6.9">6.9 Yoke</A>
</H2>

<P>
<!--
disk!technologies!yoke
-->

There is also a
<A HREF="http://www.it.uc3m.es/cgi-bin/ptb/cvs-yoke.cgi">Linux Yoke Driver</A>
available in beta which
is intended to do hot-swappable transparent binding of
one Linux block device to another. This means that if you
bind two block devices together,
say <CODE>/dev/hda</CODE> and <CODE>/dev/loop0</CODE>,
writing to one device will mean also writing to the other
and reading from either will yield the same result.
<P>
<P>
<H2><A NAME="ss6.10">6.10 Stacking</A>
|
|
</H2>
|
|
|
|
<P>
|
|
<!--
|
|
disk!technologies!stacking
|
|
-->
|
|
|
|
One of the advantages of a layered design of an operating system
|
|
is that you have the flexibility to put the pieces together
|
|
in a number of ways.
|
|
For instance you can cache a CD-ROM with <CODE>cachefs</CODE> that is
|
|
a volume striped over 2 drives. This in turn can be set up
|
|
translucently with a volume that is NFS mounted from another machine.
|
|
RAID can be stacked in several layers to offer very fast seek and
|
|
transfer in such a way that it will work if even 3 drives fail.
|
|
The choices are many, limited only by imagination and, probably
|
|
more importantly, money.
|
|
<P>
|
|
<P>
|
|
<H2><A NAME="ss6.11">6.11 Recommendations</A>
|
|
</H2>
|
|
|
|
<P>
|
|
<!--
|
|
disk!technologies!recommendations
|
|
-->
|
|
|
|
There is a near infinite number of combinations available but my
|
|
recommendation is to start off with a simple setup without any
|
|
fancy add-ons. Get a feel for what is needed, where the maximum
|
|
performance is required, if it is access time or transfer speed
|
|
that is the bottle neck, and so on. Then phase in each component
|
|
in turn. As you can stack quite freely you should be able to
|
|
retrofit most components in as time goes by with relatively few
|
|
difficulties.
|
|
<P>RAID is usually a good idea but make sure you have a thorough
|
|
grasp of the technology and a solid back up system.
|
|
<P>
|
|
<P>
|
|
<P>
|
|
<HR>
|
|
<A HREF="Multi-Disk-HOWTO-7.html">Next</A>
|
|
<A HREF="Multi-Disk-HOWTO-5.html">Previous</A>
|
|
<A HREF="Multi-Disk-HOWTO.html#toc6">Contents</A>
|
|
</BODY>
|
|
</HTML>
|