gferg 2001-08-22 14:00:26 +00:00
parent 3136c4d10d
commit 58e9af2368
1 changed files with 126 additions and 16 deletions


@ -2,10 +2,10 @@
<article>
<title>Linux Cluster HOWTO</title>
<title> Linux Cluster HOWTO </title>
<author>Ram Samudrala <tt>(me@ram.org)</tt> </author>
<date> v0.2, June 10, 2001</date>
<date> v0.3, August 21, 2001 </date>
<abstract>
How to set up high-performance Linux computing clusters.
@ -31,10 +31,10 @@ name="http://www.ram.org/computing/linux/linux_cluster.html">. </p>
a general way, this is a specific description of how our lab is set up
and includes details not only of the compute aspects, but also of the
desktop, laptop, and public server aspects. This is done mainly for
local use, but I figure I might as well put it up on the web and
perhaps someone else will find it useful. The main use as it stands
is that it's a report on what kind of hardware works well with Linux
and what kind of hardware doesn't. </p>
local use, but I put it up on the web since I received several e-mail
messages, prompted by my newsgroup query, requesting the same information.
The main use as it stands is that it's a report on what kind of
hardware works well with Linux and what kind of hardware doesn't. </p>
</sect>
@ -45,7 +45,10 @@ and what kind of hardware doesn't. </p>
<sect> Hardware
<p> This section covers the hardware choices I've made. Unless noted,
assume that everything works <it>really</it> well. </p>
assume that everything works <it>really</it> well. </p>
<p> Hardware installation is also fairly straightforward unless
otherwise noted, with most of the details covered by the manuals. </p>
<!-- ************************************************************* -->
@ -85,6 +88,7 @@ assume that everything works <it>really</it> well. </p>
<item> ATI Expert 98 8 MB PCI video card
<item> Full-tower case
</itemize>
</p>
</sect1>
@ -174,7 +178,12 @@ at all the machines:
</itemize>
</p>
<p> Networking is important.
<p> While this is a nice solution, I think it's kind of needless. What
we need is a small hand-held monitor that can plug into the back of
the PC (operated with a stylus, like the Palm). I don't plan to use
any more monitor switches/KVM cables. </p>
<p> Networking is important:
<itemize>
<item> 1 Cisco Catalyst 3448 XL Enterprise Edition network switch.
@ -212,6 +221,8 @@ distribution. We use our own software for parallelising applications but
have experimented with PVM and MPI. In my view, the overhead for these
pre-packaged programs is too high. </p>
</sect1>
<!-- ************************************************************* -->
<sect1> Costs
@ -226,7 +237,7 @@ pre-packaged programs is too high. </p>
<!-- ************************************************************* -->
<sect> Configuration
<sect> Setup and configuration
<!-- ************************************************************* -->
@ -236,6 +247,12 @@ pre-packaged programs is too high. </p>
<p>
<tscreen><verb>
farm/cluster machines:
hda1 - swap (2 * RAM)
hda2 - / (remaining disk space)
hdb1 - /maxa (total disk)
desktops (without windows):
hda1 - swap (2 * RAM)
@ -258,12 +275,6 @@ hda1 - /win (half the total disk size)
hda2 - swap (2 * RAM)
hda3 - / (4 GB)
hda4 - /home (remaining disk space)
farm machines:
hda1 - swap (2 * RAM)
hda2 - / (remaining disk space)
hdb1 - /maxa (total disk)
</verb></tscreen>
</p>
@ -278,6 +289,106 @@ to configure desktops as they wish. </p>
</sect1>
<!-- ************************************************************* -->
<sect1> Operating system installation
<!-- ************************************************************* -->
<sect2> Cloning
<p> I believe in having a completely distributed system. This means
each machine contains a copy of the operating system. Installing the
OS on each machine manually is cumbersome. To speed this up, I first
install and configure one machine exactly the way I want it. I then
create a gzipped tar archive of the entire system and place it on a
CD-ROM, from which I clone each machine in my cluster. </p>
<p> The command I use to create the tar file is as follows:
<tscreen><verb>
tar -czvlps --same-owner --atime-preserve -f /maxa/slash.tgz /
</verb></tscreen>
</p>
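<p> For readers on a current system, the following is a minimal sketch
of the same idea, not the command used on this cluster. It assumes GNU
tar; in the tar releases of 2001 the short option <tt>-l</tt> meant
"stay on one file system", whereas newer GNU tar releases use
<tt>-l</tt> for a different purpose, so the sketch spells out
<tt>--one-file-system</tt> instead. The helper name
<tt>make_image</tt> and its two arguments are hypothetical. </p>

```shell
#!/bin/sh
# Sketch only: make_image is a hypothetical helper, not part of the HOWTO.
# It archives one directory tree into a gzipped tarball without crossing
# into other mounted file systems.
make_image() {
    src_root=$1    # tree to archive, e.g. /
    image=$2       # output file, e.g. /maxa/slash.tgz

    # -c create, -z gzip, -p keep permissions, -f output file.
    # --one-file-system stops tar from descending into other mounts
    # (for example /proc, or the disk that holds the image itself).
    # -C changes into src_root first, so member names are relative paths.
    tar -czp --one-file-system -f "$image" -C "$src_root" .
}
```

<p> A call such as <tt>make_image / /maxa/slash.tgz</tt> would mirror
the layout described here, where <tt>/maxa</tt> lives on a separate
disk and is therefore left out of the image. </p>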
<p> I have a script called <tt>go</tt> that takes a hostname and
IP address as its arguments and untars the <tt>slash.tgz</tt> file on
the CD-ROM and replaces the hostname and IP address in the appropriate
locations. A version of the <tt>go</tt> script and the input files for
it can be accessed at: <htmlurl
url="http://www.ram.org/computing/linux/cluster/"
name="http://www.ram.org/computing/linux/cluster/">. This script
will have to be edited based on your cluster design. </p>
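<p> To give a feel for what such a script does, here is a hypothetical
sketch. It is not the actual <tt>go</tt> script, which is only
available at the URL given. It assumes the image was built with the
placeholder strings <tt>CLONE_HOST</tt> and <tt>CLONE_IP</tt> in
<tt>etc/hosts</tt> and <tt>etc/HOSTNAME</tt>; those placeholders, the
file list, and the function name <tt>clone_system</tt> are all
assumptions made for illustration. </p>

```shell
#!/bin/sh
# Hypothetical sketch of a clone step; NOT the author's go script.
# Assumes config files in the image contain the placeholder strings
# CLONE_HOST and CLONE_IP (an assumption made for this example).
clone_system() {
    new_host=$1    # hostname for this machine
    new_ip=$2      # IP address for this machine
    image=$3       # path to slash.tgz, e.g. on the mounted CD-ROM
    target=$4      # mount point of the freshly formatted root partition

    # Unpack the image, keeping the recorded permissions.
    tar -xzpf "$image" -C "$target" || return 1

    # Replace the placeholders in each file that exists in the image.
    for f in etc/hosts etc/HOSTNAME; do
        [ -f "$target/$f" ] || continue
        sed -e "s/CLONE_HOST/$new_host/g" -e "s/CLONE_IP/$new_ip/g" \
            "$target/$f" > "$target/$f.new" || return 1
        # Write back through the original file so its mode and owner stay.
        cat "$target/$f.new" > "$target/$f" && rm -f "$target/$f.new"
    done
}
```

<p> It would be called along the lines of
<tt>clone_system node07 192.168.1.7 /cdrom/slash.tgz /mnt</tt>, where
the hostname, address, and mount points are example values only. </p>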
<p> To make this work, I also use Tom's Root Boot package <htmlurl
url="http://www.toms.net/rb/" name="http://www.toms.net/rb/"> to boot
the machine and clone the system. The <tt>go</tt> script can be
placed on a CD-ROM or on the floppy containing Tom's Root Boot package
(you need to delete a few programs from this package since the floppy
disk is stretched to capacity). </p>
<p> More conveniently, you could burn a bootable CD-ROM containing
Tom's Root Boot package, including the <tt>go</tt> script, and the tgz
file containing the system you wish to clone. You can also edit Tom's
Root Boot's init scripts so that they directly execute the <tt>go</tt>
script (you will still have to set IP addresses if you don't use
DHCP). </p>
<p> Thus you can develop a system where all you have to do is insert a
CD-ROM, turn on the machine, have a cup of coffee (or a can of coke)
and come back to see a full clone. You then repeat this process for as
many machines as you have. This procedure has worked extremely well
for me and if you have someone else actually doing the work (of
inserting and removing CD-ROMs) then it's ideal. </p>
</sect2>
<!-- ************************************************************* -->
<sect2> DHCP vs. hard-coded IP addresses
<p> If you have DHCP set up, then you don't need to reset the IP
address and that part of it can be removed from the <tt>go</tt>
script. </p>
<p> DHCP has the advantage that you don't muck around with IP
addresses at all provided the DHCP server is configured
appropriately. It has the disadvantage that it relies on a centralised
server (and like I said, I tend to distribute things as much as
possible). Also, linking hardware ethernet addresses to IP addresses can
make it inconvenient if you wish to replace machines or change
hostnames routinely. </p>
</sect2>
<!-- ************************************************************* -->
</sect1>
<!-- ************************************************************* -->
</sect>
<!-- ************************************************************* -->
<sect> Performing tasks on the cluster
<p> This section is still being developed as the usage on my cluster
evolves, but so far we tend to write our own sets of message passing
routines to communicate between processes on different machines. </p>
<p> Many applications, particularly in the computational genomics
areas, are massively and trivially parallelisable, meaning that
perfect distribution can be achieved by spreading tasks equally across
the machines (for example, when analysing a whole genome using a
single gene technique, each processor can work on one gene at a time
independent of all the other processors). </p>
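<p> As an illustration of spreading independent tasks equally across
machines, the sketch below deals a list of task names (for example,
one gene per line) out to hosts in round-robin order. It is not the
message-passing code used on this cluster; the function name
<tt>assign_tasks</tt> and the host names in the usage note are
hypothetical, and it only prints the plan rather than starting any
jobs. </p>

```shell
#!/bin/sh
# Sketch: round-robin assignment of independent tasks to machines.
# Usage: assign_tasks host1 host2 ... < task_list
# Prints one "host task" pair per line; launching the jobs is left out.
assign_tasks() {
    n_hosts=$#
    [ "$n_hosts" -gt 0 ] || return 1
    i=0
    while IFS= read -r task; do
        [ -n "$task" ] || continue    # skip blank lines
        # Select positional parameter number (i mod n_hosts) + 1.
        eval "host=\${$(( i % n_hosts + 1 ))}"
        printf '%s %s\n' "$host" "$task"
        i=$(( i + 1 ))
    done
}
```

<p> For instance, <tt>assign_tasks node1 node2 &lt; genes.txt</tt>
alternates the genes between the two machines. A static split like
this suits tasks of similar length; tasks with very uneven run times
may be better served by handing out work as machines become free. </p>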
<p> So far we have not found the need to use a professional queueing
system, but obviously that is highly dependent on the type of
applications you wish to run. </p>
</sect>
<!-- ************************************************************* -->
@ -318,4 +429,3 @@ Samudrala's research page (which describes the kind of research done with these
<!-- ************************************************************* -->
</article>