<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
<TITLE>HOWTO: Multi Disk System Tuning: Considerations and Dimensioning</TITLE>
<LINK HREF="Multi-Disk-HOWTO-11.html" REL=next>
<LINK HREF="Multi-Disk-HOWTO-9.html" REL=previous>
<LINK HREF="Multi-Disk-HOWTO.html#toc10" REL=contents>
</HEAD>
<BODY>
<A HREF="Multi-Disk-HOWTO-11.html">Next</A>
<A HREF="Multi-Disk-HOWTO-9.html">Previous</A>
<A HREF="Multi-Disk-HOWTO.html#toc10">Contents</A>
<HR>
<H2><A NAME="s10">10. Considerations and Dimensioning</A></H2>
<P>
<!--
disk!considerations and dimensioning
-->
The starting point in this will be to consider where you are and what
you want to do. The typical home system starts out with existing
hardware and the newly converted Linux user will want to get the most
out of existing hardware. Someone setting up a new system for a
specific purpose (such as an Internet provider) will instead have to
consider what the goal is and buy accordingly. Being ambitious I will
try to cover the entire range.
<P>Various purposes will also have different requirements regarding file
system placement on the drives, a large multiuser machine would
probably be best off with the <CODE>/home</CODE> directory on a
separate disk, just to give an example.
<P>In general, for performance it is advantageous to split most things
over as many disks as possible but there is a limited number of
devices that can live on a SCSI bus and cost is naturally also a
factor. Equally important, file system maintenance becomes more
complicated as the number of partitions and physical drives increases.
<P>
<H2><A NAME="ss10.1">10.1 Home Systems</A>
</H2>
<P>
<!--
disk!considerations and dimensioning!home systems
-->
With the cheap hardware available today it is possible to build a
home system at little cost that rivals the major servers of yesteryear.
While many started out with old, discarded disks to build a Linux
server (which is how this HOWTO came into existence), many can now
afford to buy 40 GB disks up front.
<P>Size remains important for some, and here are a few guidelines:
<P>
<DL>
<DT><B>Testing</B><DD><P>Linux is simple and you don't even need a hard disk to
try it out; if you can get the boot floppies to work you are likely to
get it to work on your hardware. If the standard kernel does not work
for you, remember that special boot disk versions are often available
for unusual hardware combinations and can solve your initial
problems until you can compile your own kernel.
<P>
<DT><B>Learning</B><DD><P>about operating systems is something Linux excels at;
there is plenty of documentation and the source is available. A single
drive with 50 MB is enough to get you started with a shell and a few of
the most frequently used commands and utilities.
<P>
<DT><B>Hobby</B><DD><P>use or more serious learning requires more commands and
utilities, but a single drive is still all it takes; 500 MB should give
you plenty of room, even for sources and documentation.
<P>
<DT><B>Serious</B><DD><P>software development or just serious hobby work requires
even more space. At this stage you probably have a mail and news feed
that requires spool files and plenty of space. Separate drives for
various tasks will begin to show a benefit. By this stage you have
probably also gotten hold of a few drives. Drive requirements
get harder to estimate, but I would expect 2-4 GB to be plenty, even
for a small server.
<P>
<DT><B>Servers</B><DD><P>come in many flavours, ranging from mail servers to full
sized ISP servers. A base of 2 GB for the main system should be
sufficient, then add space and perhaps also drives for separate
features you will offer. Cost is the main limiting factor here
but be prepared to spend a bit if you wish to justify the "S"
in ISP. Admittedly, not all do it.
<P>Basically a server is dimensioned like any machine for serious use
with added space for the services offered, and tends to be IO bound
rather than CPU bound.
<P>With cheap networking technology for land lines as well as radio
networks, it is quite likely that home users will very soon have
their own servers more or less permanently hooked onto the net.
<P>
</DL>
<P>
<H2><A NAME="ss10.2">10.2 Servers</A>
</H2>
<P>
<!--
disk!servers
-->
Big tasks require big drives and deserve a separate section here. If
possible, keep different tasks on separate drives. Some of the appendices
detail the setup of a small departmental server for 10-100
users. Here I will present a few considerations for the higher end
servers. In general you should not be afraid of using RAID, not only
because it is fast and safe but also because it can make growth a
little less painful. All the notes below come as additions to the
points mentioned earlier.
<P>Popular servers rarely just happen; rather, they grow over time, and this
demands both generous amounts of disk space and a good net
connection. In many of these cases it might be a good idea to reserve
entire SCSI drives, in singles or as arrays, for each task. This way you
can move the data should the computer fail. Note that transferring drives
across computers is not simple and might not always work, especially in the
case of IDE drives. Drive arrays require careful setup in order to
reconstruct the data correctly, so you might want to keep a paper copy of
your <CODE>fstab</CODE> file as well as a note of the SCSI IDs.
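<P>A quick sketch of how such a paper copy can be produced, assuming the
<CODE>proc</CODE> file system is mounted and the <CODE>md</CODE> driver is used
for any arrays (adjust the paths to your own setup):
<BLOCKQUOTE><CODE>
<PRE>
# Gather the information needed to reconstruct the drive setup elsewhere:
cat /etc/fstab          # file system to mount point mapping
cat /proc/scsi/scsi     # SCSI host, channel, ID and LUN of each drive
cat /proc/mdstat        # current RAID array composition (if any)

# Send the lot to the default printer for the paper archive.
cat /etc/fstab /proc/scsi/scsi /proc/mdstat | lpr
</PRE>
</CODE></BLOCKQUOTE>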
<P>
<H3><A NAME="server-home-dirs"></A> Home Directories </H3>
<P>
<!--
disk!servers!home directories
-->
Estimate how many drives you will need; if this is more than 2 I would
strongly recommend RAID. If not, you should spread users across the
drives dedicated to home directories using some kind of simple hashing
algorithm. For instance you could use the first 2 letters of the user
name, so <CODE>jbloggs</CODE> is put on <CODE>/u/j/b/jbloggs</CODE>, where
<CODE>/u/j</CODE> is a symbolic link to a physical drive; this way you
can get a balanced load on your drives.
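<P>A minimal sketch of how such a layout could be set up, assuming two
drives mounted on the hypothetical mount points <CODE>/disk1</CODE> and
<CODE>/disk2</CODE>:
<BLOCKQUOTE><CODE>
<PRE>
# Spread home directories across two drives using the first letter
# of the user name as a simple hash (only a few letters shown here).
mkdir -p /disk1/home/j /disk2/home/s
mkdir -p /u

# /u/&lt;letter&gt; is a symbolic link to whichever drive holds that letter.
ln -s /disk1/home/j /u/j
ln -s /disk2/home/s /u/s

# The user jbloggs then gets the home directory /u/j/b/jbloggs:
mkdir -p /u/j/b/jbloggs
</PRE>
</CODE></BLOCKQUOTE>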
<P>
<H3>Anonymous FTP</H3>
<P>
<!--
disk!servers!FTP, anonymous
-->
<!--
disk!servers!anonymous FTP
-->
This is an essential service if you are serious about service. Good
servers are well maintained, documented, kept up to date, and
immensely popular no matter where in the world they are located. The
big server
<A HREF="ftp://ftp.funet.fi">ftp.funet.fi</A>
is an excellent example of this.
<P>In general this is not a question of CPU but of network bandwidth. Size
is hard to estimate; mainly it is a question of ambition and service
attitude. I believe the big archive at
<A HREF="ftp://ftp.cdrom.com">ftp.cdrom.com</A>
is a *BSD machine
with 50 GB of disk. Memory is also important for a dedicated FTP server;
about 256 MB RAM would be sufficient for a very big server, whereas
smaller servers can get the job done well with 64 MB RAM.
Network connections would still be the most important factor.
<P>
<P>
<H3><A NAME="www"></A> WWW </H3>
<P>
<!--
disk!servers!WWW
-->
<!--
disk!servers!World Wide Web
-->
For many this is the main reason to get onto the Internet; in fact many
now seem to equate the two. In addition to being network intensive
there is also a fair bit of drive activity related to this, mainly
regarding the caches. Keeping the cache on a separate, fast drive
would be beneficial. Even better would be installing a caching proxy
server. This way you can reduce the cache size for each user and speed
up the service while at the same time cutting down on the bandwidth
requirements.
<P>With a caching proxy server you need a fast set of drives; RAID0 would
be ideal as reliability is not important here. Higher capacity is
better but about 2 GB should be sufficient for most. Remember to match
the cache period to the capacity and demand. Too long a cache period
can on the other hand be a disadvantage; if possible try to adjust it
based on the URL. For more information check out the most used servers such as
<CODE>Harvest</CODE>,
<A HREF="http://www.squid-cache.org/">Squid</A>
and the one from
<A HREF="http://www.netscape.com">Netscape</A>.
<P>
<P>
<H3>Mail</H3>
<P>
<!--
disk!servers!mail
-->
Handling mail is something most machines do to some extent. The big mail
servers, however, come into a class of their own. This is a demanding task
and a big server can be slow even when connected to fast drives and a good
net feed. In the Linux world the big server at <CODE>vger.rutgers.edu</CODE> is a
well known example. Unlike a news service, which is distributed and which
can partially reconstruct the spool using other machines as a feed, the
mail servers are centralised. This makes safety much more important, so for
a major server you should consider a RAID solution with emphasis on
reliability. Size is hard to estimate; it all depends on how many lists you
run as well as how many subscribers you have.
<P>Big mail servers can be IO limited in performance, and for this
reason some use huge silicon disks connected to the SCSI bus to hold
all mail related files, including temporary files.
For extra safety these are battery backed, and file systems
like <CODE>ufs</CODE> are preferred since they always flush metadata to disk.
The performance cost of this is offset by the very fast disk.
<P>Note that these days more and more people switch over from using <CODE>POP</CODE>,
which pulls mail from the mail server to the local machine, to <CODE>IMAP</CODE>,
which serves mail while keeping the mail archive centralised.
This means that mail is no longer spooled in its original sense but
often builds up, requiring huge amounts of disk space. Also, more and more
people (ab)use mail attachments to send all sorts of things across; even a
small word processor document can easily end up over 1 MB. Size your disks
generously and keep an eye on how much space is left.
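<P>A rough sketch of how to keep that eye on things, assuming the
traditional spool location <CODE>/var/spool/mail</CODE> (the exact paths vary
between distributions and mail setups):
<BLOCKQUOTE><CODE>
<PRE>
# How much space is the mail spool using ...
du -s /var/spool/mail
# ... and how much is left on the drive holding it?
df /var/spool/mail
</PRE>
</CODE></BLOCKQUOTE>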
<P>
<P>
<H3>News</H3>
<P>
<!--
disk!servers!news
-->
This is definitely a high volume task, and very dependent on what
news groups you subscribe to. On Nyx there is a fairly complete feed
and the spool files consume about 17 GB. The biggest groups are no doubt
in the <CODE>alt.binaries.*</CODE> hierarchy, so if you for some reason decide not to
carry these you can offer a good service with perhaps 12 GB. Still others,
who shall remain nameless, feel 2 GB is sufficient to claim ISP status.
In this case news expires so fast I feel the spelling IsP is barely
justified. A full news feed means traffic of a few GB every day, and this
is an ever growing number.
<P>
<P>
<H3>Others</H3>
<P>
<!--
disk!servers!other
-->
There are many services available on the net, even though many have been
put somewhat in the shadow of the web. Services like
<EM>archie</EM>, <EM>gopher</EM> and <EM>wais</EM>, just to name a few, still exist and
remain valuable tools on the net. If you are serious about running a major
server you should also consider these services. Determining the required
volumes is hard; it all depends on popularity and demand. Providing good
service inevitably has its costs, and disk space is just one of them.
<P>
<P>
<H3>Server Recommendations</H3>
<P>
<!--
disk!servers!recommendations
-->
Servers today require large numbers of large disks to function
satisfactorily in commercial settings. As the mean time between failures
(MTBF) decreases rapidly as the number of components increases, it is
advisable to look into using RAID for protection and to use a number of
medium sized drives rather than one single huge disk. Also look into
the High Availability (HA) project; more information is available in the
<A HREF="http://www.ibiblio.org/pub/Linux/ALPHA/linux-ha/High-Availability-HOWTO.html">High Availability HOWTO</A>
and at related
<A HREF="http://www.henge.com/~alanr/ha/index.html">web pages</A>.
<P>There is also an article in Byte called
<A HREF="http://www.byte.com/columns/servinglinux/1999/06/0607servinglinux.html">How Big Does Your Unix Server Have To Be?</A>
with many points that are relevant to Linux.
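<P>As an illustration of the medium sized drives approach, here is a
minimal sketch of how three such drives could be combined into a single
RAID-5 array using the <CODE>mdadm</CODE> tool (device names are examples
only; older systems use the raidtools package and <CODE>/etc/raidtab</CODE>
instead):
<BLOCKQUOTE><CODE>
<PRE>
# Combine three medium sized SCSI drives into one RAID-5 array;
# any single drive can then fail without loss of data.
mdadm --create /dev/md0 --level=5 --raid-devices=3 \
      /dev/sda1 /dev/sdb1 /dev/sdc1
mke2fs /dev/md0         # make a file system on the array
mount /dev/md0 /home    # and mount it, here as /home
</PRE>
</CODE></BLOCKQUOTE>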
<P>
<P>
<H2><A NAME="pitfalls"></A> <A NAME="ss10.3">10.3 Pitfalls</A>
</H2>
<P>
<!--
disk!pitfalls
-->
The dangers of splitting everything up into separate partitions are
briefly mentioned in the section about volume management. Still, several
people have asked me to emphasize this point more strongly: when one
partition fills up it cannot grow any further, no matter how much free
space there is in other partitions.
<P>In particular, look out for explosive growth in the news spool
(<CODE>/var/spool/news</CODE>). For multi user machines with quotas, keep
an eye on <CODE>/tmp</CODE> and <CODE>/var/tmp</CODE> as some people try to hide their
files there; just look out for filenames ending in <CODE>gif</CODE> or <CODE>jpeg</CODE>...
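<P>A rough sketch of how to check for this, assuming GNU <CODE>find</CODE>:
<BLOCKQUOTE><CODE>
<PRE>
# How full are the spool and scratch areas ...
df /var/spool/news /tmp /var/tmp
# ... and are any suspiciously large image files hiding in them?
find /tmp /var/tmp -size +1000k \
     \( -name '*.gif' -o -name '*.jpeg' -o -name '*.jpg' \) -ls
</PRE>
</CODE></BLOCKQUOTE>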
<P>In fact, for single physical drives this scheme offers very little gain
at all, other than making file growth monitoring easier
(using '<CODE>df</CODE>') and allowing physical track positioning. Most
importantly, there is no scope for parallel disk access. A freely available
volume management system would solve this, but that is still some time in
the future. However, when more specialised file systems become available,
even a single disk could benefit from being divided into several
partitions.
<P>For more information see section
<A HREF="Multi-Disk-HOWTO-15.html#troubleshooting">Troubleshooting</A>.
<P>
<P>
<P>
<HR>
<A HREF="Multi-Disk-HOWTO-11.html">Next</A>
<A HREF="Multi-Disk-HOWTO-9.html">Previous</A>
<A HREF="Multi-Disk-HOWTO.html#toc10">Contents</A>
</BODY>
</HTML>