<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
<TITLE>HOWTO: Multi Disk System Tuning: Considerations and Dimensioning</TITLE>
<LINK HREF="Multi-Disk-HOWTO-11.html" REL=next>
<LINK HREF="Multi-Disk-HOWTO-9.html" REL=previous>
<LINK HREF="Multi-Disk-HOWTO.html#toc10" REL=contents>
</HEAD>
<BODY>
<A HREF="Multi-Disk-HOWTO-11.html">Next</A>
<A HREF="Multi-Disk-HOWTO-9.html">Previous</A>
<A HREF="Multi-Disk-HOWTO.html#toc10">Contents</A>
<HR>
<H2><A NAME="s10">10. Considerations and Dimensioning</A></H2>
<P>
<!--
disk!considerations and dimensioning
-->
The starting point in this will be to consider where you are and what
you want to do. The typical home system starts out with existing
hardware and the newly converted Linux user will want to get the most
out of it. Someone setting up a new system for a specific purpose
(such as an Internet provider) will instead have to consider what the
goal is and buy accordingly. Being ambitious I will try to cover the
entire range.
<P>Various purposes will also have different requirements regarding file
system placement on the drives; for example, a large multiuser machine
would probably be best off with the <CODE>/home</CODE> directory on a
separate disk.
<P>In general, for performance it is advantageous to split most things
over as many disks as possible, but there is a limit to the number of
devices that can live on a SCSI bus, and cost is naturally also a
factor. Equally important, file system maintenance becomes more
complicated as the number of partitions and physical drives increases.
<P>
<H2><A NAME="ss10.1">10.1 Home Systems</A>
</H2>
<P>
<!--
disk!considerations and dimensioning!home systems
-->
With the cheap hardware available today it is possible to have quite a
big system at home, one that rivals major servers of yesteryear. While
many started out with old, discarded disks to build a Linux server
(which is how this HOWTO came into existence), many can now afford to
buy 40 GB disks up front.
<P>Size remains important for some, and here are a few guidelines:
<P>
<DL>
<DT><B>Testing</B><DD><P>Linux is simple and you don't even need a hard disk to
try it out; if you can get the boot floppies to work you are likely to
get it to work on your hardware. If the standard kernel does not work
for you, remember that special boot disk versions are often available
for unusual hardware combinations; these can solve your initial
problems until you can compile your own kernel.
<P>
<DT><B>Learning</B><DD><P>about operating systems is something Linux excels in;
there is plenty of documentation and the source is available. A single
drive with 50 MB is enough to get you started with a shell and a few of
the most frequently used commands and utilities.
<P>
<DT><B>Hobby</B><DD><P>use or more serious learning requires more commands and
utilities, but a single drive is still all it takes; 500 MB should give
you plenty of room, including space for sources and documentation.
<P>
<DT><B>Serious</B><DD><P>software development or just serious hobby work requires
even more space. At this stage you probably have a mail and news feed
that requires spool files and plenty of space. Separate drives for
various tasks will begin to show a benefit. At this stage you have
probably already gotten hold of a few drives too. Drive requirements
get harder to estimate, but I would expect 2-4 GB to be plenty, even
for a small server.
<P>
<DT><B>Servers</B><DD><P>come in many flavours, ranging from mail servers to full
sized ISP servers. A base of 2 GB for the main system should be
sufficient; then add space, and perhaps also drives, for the separate
services you will offer. Cost is the main limiting factor here,
but be prepared to spend a bit if you wish to justify the "S"
in ISP. Admittedly, not all do it.
<P>Basically a server is dimensioned like any machine for serious use,
with added space for the services offered, and tends to be IO bound
rather than CPU bound.
<P>With cheap networking technology, both for land lines and through
radio nets, it is quite likely that home users will soon have their own
servers more or less permanently hooked onto the net.
<P>
</DL>
<P>
<H2><A NAME="ss10.2">10.2 Servers</A>
</H2>
<P>
<!--
disk!servers
-->
Big tasks require big drives and a separate section here. If possible,
keep things on separate drives. Some of the appendices detail the setup
of a small departmental server for 10-100 users. Here I will present a
few considerations for the higher end servers. In general you should
not be afraid of using RAID, not only because it is fast and safe but
also because it can make growth a little less painful. All the notes
below come as additions to the points mentioned earlier.
<P>Popular servers rarely just happen; rather, they grow over time, and this
demands both generous amounts of disk space and a good net
connection. In many of these cases it might be a good idea to reserve
entire SCSI drives, in singles or as arrays, for each task. This way you
can move the data should the computer fail. Note that transferring drives
across computers is not simple and might not always work, especially in the
case of IDE drives. Drive arrays require careful setup in order to
reconstruct the data correctly, so you might want to keep a paper copy of
your <CODE>fstab</CODE> file as well as a note of SCSI IDs.
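<P>A minimal way of capturing that information while everything still
works is sketched below; it assumes a Linux 2.x kernel with the
<CODE>/proc</CODE> SCSI support compiled in, and the
<CODE>raidtools</CODE> style of md configuration:
<PRE>
# Print the filesystem table and the kernel's view of the SCSI bus,
# and keep the paper copy with the machine.
cat /etc/fstab /proc/scsi/scsi | lpr
# If you run software RAID with raidtools, print its configuration too:
cat /etc/raidtab | lpr
</PRE>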
<P>
<H3><A NAME="server-home-dirs"></A> Home Directories </H3>
<P>
<!--
disk!servers!home directories
-->
Estimate how many drives you will need; if this is more than two I
would strongly recommend RAID. If not, you should spread users across
the drives dedicated to home directories using some kind of simple
hashing scheme. For instance you could use the first two letters in the
user name, so <CODE>jbloggs</CODE> is put on <CODE>/u/j/b/jbloggs</CODE>,
where <CODE>/u/j</CODE> is a symbolic link to a physical drive; this way
you get a balanced load across your drives.
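<P>A small sketch of how such a layout might be created; the mount
points <CODE>/disk1</CODE> and <CODE>/disk2</CODE> and the two-way split
are assumptions for illustration only:
<PRE>
#!/bin/sh
# Hashed home directories: /u/X/Y/user, keyed on the first two letters.
mkdir -p /u
for letter in a b c d e f g h i j k l m; do
    mkdir -p /disk1/$letter               # first half of the alphabet
    ln -s /disk1/$letter /u/$letter
done
for letter in n o p q r s t u v w x y z; do
    mkdir -p /disk2/$letter               # second half of the alphabet
    ln -s /disk2/$letter /u/$letter
done
# A new user jbloggs then gets a home directory /u/j/b/jbloggs,
# which physically lives on /disk1:
mkdir -p /u/j/b/jbloggs
</PRE>
Balancing on the first letter alone is crude; real user names are not
evenly distributed, so adjust the split to your user base.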
<P>
<H3>Anonymous FTP</H3>
<P>
<!--
disk!servers!FTP, anonymous
-->
<!--
disk!servers!anonymous FTP
-->
This is an essential service if you are serious about service. Good
servers are well maintained, documented, kept up to date, and
immensely popular no matter where in the world they are located. The
big server
<A HREF="ftp://ftp.funet.fi">ftp.funet.fi</A>
is an excellent example of this.
<P>In general this is not a question of CPU but of network bandwidth. Size
is hard to estimate; mainly it is a question of ambition and service
attitudes. I believe the big archive at
<A HREF="ftp://ftp.cdrom.com">ftp.cdrom.com</A>
is a *BSD machine with 50 GB of disk. Memory is also important for a
dedicated FTP server; about 256 MB RAM would be sufficient for a very
big server, whereas smaller servers can get the job done well with 64
MB RAM. The network connection would still be the most important factor.
<P>
<H3><A NAME="www"></A> WWW </H3>
<P>
<!--
disk!servers!WWW
-->
<!--
disk!servers!World Wide Web
-->
For many this is the main reason to get onto the Internet; in fact many
now seem to equate the two. In addition to being network intensive
there is also a fair bit of drive activity related to this, mainly
regarding the caches. Keeping the cache on a separate, fast drive
would be beneficial. Even better would be installing a caching proxy
server. This way you can reduce the cache size for each user and speed
up the service while at the same time cutting down on the bandwidth
requirements.
<P>With a caching proxy server you need a fast set of drives; RAID0 would
be ideal as reliability is not important here. Higher capacity is
better, but about 2 GB should be sufficient for most. Remember to match
the cache expiry period to the capacity and demand; too long a period
is a disadvantage, so if possible try to adjust it based on the URL. For
more information check up on the most used servers such as
<CODE>Harvest</CODE>,
<A HREF="http://www.squid-cache.org/">Squid</A>
and the one from
<A HREF="http://www.netscape.com">Netscape</A>.
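<P>As a sketch of what this looks like in practice, here is a
<CODE>squid.conf</CODE> fragment placing a 2 GB cache on a dedicated
drive and varying the expiry by URL pattern; the mount point
<CODE>/cache0</CODE> and the exact numbers are assumptions to adjust
to your own setup:
<PRE>
# Cache store: type, directory, size in MB, level-1 and level-2 dirs.
cache_dir ufs /cache0 2000 16 256
# Refresh rules: regex, minimum age, percent of object age, maximum age
# (ages in minutes). Images rarely change, so cache them longer.
refresh_pattern -i \.gif$  1440 50 20160
refresh_pattern .          0    20 4320
</PRE>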
<P>
<H3>Mail</H3>
<P>
<!--
disk!servers!mail
-->
Handling mail is something most machines do to some extent. The big mail
servers, however, are in a class of their own. This is a demanding task,
and a big server can be slow even when connected to fast drives and a good
net feed. In the Linux world the big server at <CODE>vger.rutgers.edu</CODE> is a
well known example. Unlike a news service, which is distributed and which
can partially reconstruct the spool using other machines as a feed,
mail servers are centralised. This makes safety much more important, so for
a major server you should consider a RAID solution with emphasis on
reliability. Size is hard to estimate; it all depends on how many lists you
run as well as how many subscribers you have.
<P>Big mail servers can be IO limited in performance, and for this
reason some use huge silicon disks connected to the SCSI bus to hold
all mail related files including temporary files.
For extra safety these are battery backed, and filesystems
like <CODE>ufs</CODE> are preferred since they always flush metadata to disk.
The added cost in performance is offset by the very fast disk.
<P>Note that these days more and more sites switch from using <CODE>POP</CODE>,
which pulls mail from the server to the local machine, to <CODE>IMAP</CODE>,
which serves mail while keeping the mail archive centralised.
This means that mail is no longer spooled in its original sense but
often builds up, requiring huge disk space. Also, more and more (ab)use
mail attachments to send all sorts of things across; even a small
word processor document can easily end up over 1 MB. Size your disks
generously and keep an eye on how much space is left.
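<P>Keeping that eye on free space is easily automated, for instance from
<CODE>cron</CODE>; the 90% threshold and the spool path below are
assumptions to adapt to your system:
<PRE>
#!/bin/sh
# Warn root by mail when the filesystem holding the mail spool runs low.
# df -P prints one line per filesystem; field 5 is the use percentage.
usage=`df -P /var/spool/mail | tail -1 | awk '{print $5}' | tr -d '%'`
if [ "$usage" -gt 90 ]; then
    echo "Mail spool is ${usage}% full" | mail -s "disk space warning" root
fi
</PRE>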
<P>
<H3>News</H3>
<P>
<!--
disk!servers!news
-->
This is definitely a high volume task, and very dependent on what
news groups you subscribe to. On Nyx there is a fairly complete feed
and the spool files consume about 17 GB. The biggest groups are no doubt
in the <CODE>alt.binaries.*</CODE> hierarchy, so if you for some reason decide not to
carry these you can run a good service with perhaps 12 GB. Still others,
that shall remain nameless, feel 2 GB is sufficient to claim ISP status.
In this case news expires so fast I feel the spelling IsP is barely
justified. A full newsfeed means traffic of a few GB every day, and this
is an ever growing number.
<P>
<H3>Others</H3>
<P>
<!--
disk!servers!other
-->
There are many services available on the net, even though many have been
put somewhat in the shadows by the web. Services like
<EM>archie</EM>, <EM>gopher</EM> and <EM>wais</EM>, just to name a few, still exist and
remain valuable tools on the net. If you are serious about running a major
server you should also consider these services. Determining the required
volumes is hard; it all depends on popularity and demand. Providing good
service inevitably has its costs, and disk space is just one of them.
<P>
<H3>Server Recommendations</H3>
<P>
<!--
disk!servers!recommendations
-->
Servers today require large numbers of large disks to function
satisfactorily in commercial settings. Since the mean time between
failures (MTBF) decreases rapidly as the number of components
increases, it is advisable to look into using RAID for protection, and
to use a number of medium sized drives rather than one single huge
disk. Also look into the High Availability (HA) project.
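<P>A rough illustration of why the drive count matters, assuming
independent and identically reliable drives: without redundancy the
MTBF of the whole set falls roughly in proportion to the number of
drives.
<PRE>
MTBF(n drives, no redundancy)  ~  MTBF(one drive) / n
Example:  500,000 hours / 8 drives  =  62,500 hours (about 7 years)
</PRE>
With a redundant RAID level the array survives a single drive failure,
which is exactly what makes RAID attractive as drive counts grow.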
More information is available at
<P>
<A HREF="http://www.ibiblio.org/pub/Linux/ALPHA/linux-ha/High-Availability-HOWTO.html">High Availability HOWTO</A>
and also at related
<A HREF="http://www.henge.com/~alanr/ha/index.html">web pages</A>.
<P>There is also an article in Byte called
<A HREF="http://www.byte.com/columns/servinglinux/1999/06/0607servinglinux.html">How Big Does Your Unix Server Have To Be?</A>
with many points that are relevant to Linux.
<P>
<H2><A NAME="pitfalls"></A> <A NAME="ss10.3">10.3 Pitfalls</A>
</H2>
<P>
<!--
disk!pitfalls
-->
The dangers of splitting up everything into separate partitions are
briefly mentioned in the section about volume management. Still, several
people have asked me to emphasize this point more strongly: when one
partition fills up it cannot grow any further, even if there is
plenty of space in other partitions.
<P>In particular look out for explosive growth in the news spool
(<CODE>/var/spool/news</CODE>). For multi user machines with quotas, keep
an eye on <CODE>/tmp</CODE> and <CODE>/var/tmp</CODE> as some people try to hide their
files there; just look out for filenames ending in gif or jpeg...
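<P>A quick sweep for such stowaways could look like the line below;
adjust the directories and patterns to taste:
<PRE>
# List GIF and JPEG files lurking in the temp areas, with owner and size.
find /tmp /var/tmp \( -name '*.gif' -o -name '*.jpg' -o -name '*.jpeg' \) -ls
</PRE>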
<P>In fact, for single physical drives this scheme offers very little gain
at all, other than making file growth monitoring easier
(using '<CODE>df</CODE>') and physical track positioning. Most importantly
there is no scope for parallel disk access. A freely available volume
management system would solve this, but that is still some time in the
future. However, when more specialised file systems become available
even a single disk could benefit from being divided into several
partitions.
<P>For more information see section
<A HREF="Multi-Disk-HOWTO-15.html#troubleshooting">Troubleshooting</A>.
<P>
<HR>
<A HREF="Multi-Disk-HOWTO-11.html">Next</A>
<A HREF="Multi-Disk-HOWTO-9.html">Previous</A>
<A HREF="Multi-Disk-HOWTO.html#toc10">Contents</A>
</BODY>
</HTML>