mirror of https://github.com/tLDP/LDP
updated
commit 58e9af2368 (parent 3136c4d10d)
@@ -2,10 +2,10 @@

<article>

<title>Linux Cluster HOWTO</title>
<title> Linux Cluster HOWTO </title>

<author>Ram Samudrala <tt>(me@ram.org)</tt> </author>
<date> v0.2, June 10, 2001</date>
<date> v0.3, August 21, 2001 </date>

<abstract>
How to set up high-performance Linux computing clusters.

@@ -31,10 +31,10 @@ name="http://www.ram.org/computing/linux/linux_cluster.html">. </p>
a general way, this is a specific description of how our lab is set up
and details not only the compute aspects, but also the
desktop, laptop, and public server aspects. This is done mainly for
local use, but I figure I might as well put it up on the web and
perhaps someone else will find it useful. The main use as it stands
is that it's a report on what kind of hardware works well with Linux
and what kind of hardware doesn't. </p>
local use, but I put it up on the web since I received several e-mail
messages based on my newsgroup query requesting the same information.
The main use as it stands is that it's a report on what kind of
hardware works well with Linux and what kind of hardware doesn't. </p>

</sect>

@@ -45,7 +45,10 @@ and what kind of hardware doesn't. </p>
<sect> Hardware

<p> This section covers the hardware choices I've made. Unless noted,
assume that everything works <it>really</it> well. </p>
assume that everything works <it>really</it> well. </p>

<p> Hardware installation is also fairly straightforward unless
otherwise noted, with most of the details covered by the manuals. </p>

<!-- ************************************************************* -->

@@ -85,6 +88,7 @@ assume that everything works <it>really</it> well. </p>
<item> ATI Expert 98 8 MB PCI video card
<item> Full-tower case
</itemize>
</p>

</sect1>

@@ -174,7 +178,12 @@ at all the machines:
</itemize>
</p>

<p> Networking is important.
<p> While this is a nice solution, I think it's kind of needless. What
we need is a small handheld monitor that can plug into the back of
the PC (operated with a stylus, like the Palm). I don't plan to use
any more monitor switches/KVM cables. </p>

<p> Networking is important:

<itemize>
<item> 1 Cisco Catalyst 3448 XL Enterprise Edition network switch.

@@ -212,6 +221,8 @@ distribution. We use our own software for parallelising applications but
have experimented with PVM and MPI. In my view, the overhead for these
pre-packaged programs is too high. </p>

</sect1>

<!-- ************************************************************* -->

<sect1> Costs

@@ -226,7 +237,7 @@ pre-packaged programs is too high. </p>

<!-- ************************************************************* -->

<sect> Configuration
<sect> Set up and configuration

<!-- ************************************************************* -->

@@ -236,6 +247,12 @@ pre-packaged programs is too high. </p>

<p>
<tscreen><verb>
farm/cluster machines:

hda1 - swap (2 * RAM)
hda2 - / (remaining disk space)
hdb1 - /maxa (total disk)

desktops (without windows):

hda1 - swap (2 * RAM)

@@ -258,12 +275,6 @@ hda1 - /win (half the total disk size)
hda2 - swap (2 * RAM)
hda3 - / (4 GB)
hda4 - /home (remaining disk space)

farm machines:

hda1 - swap (2 * RAM)
hda2 - / (remaining disk space)
hdb1 - /maxa (total disk)
</verb></tscreen>
</p>

@@ -278,6 +289,106 @@ to configure desktops as they wish. </p>

</sect1>

<!-- ************************************************************* -->

<sect1> Operating system installation

<!-- ************************************************************* -->

<sect2> Cloning

<p> I believe in having a completely distributed system. This means
each machine contains a copy of the operating system. Installing the
OS on each machine manually is cumbersome. To optimise this process,
I first set up and install one machine exactly the way I want it. I
then create a gzipped tar file of the entire system and place it on a
CD-ROM, which I then clone onto each machine in my cluster. </p>

<p> The command I use to create the tar file is as follows:

<tscreen><verb>
tar -czvlps --same-owner --atime-preserve -f /maxa/slash.tgz /
</verb></tscreen>
</p>
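
<p> A hedged aside on the flags above: in the GNU tar of that period
the short option <tt>-l</tt> kept the archive on one file system,
whereas newer GNU tar releases use <tt>-l</tt> for link checking and
spell the old behaviour <tt>--one-file-system</tt>. The sketch below
assumes GNU tar and uses long options only. The function name
<tt>make_clone_archive</tt> and its two arguments are hypothetical and
are not part of the original setup. </p>

```shell
# Sketch only, assuming GNU tar. make_clone_archive is a hypothetical helper.
# Usage: make_clone_archive SOURCE_DIR ARCHIVE
make_clone_archive() {
    src=$1     # directory tree to archive, for example /
    dest=$2    # output file, for example /maxa/slash.tgz on a separate disk

    # --one-file-system keeps tar from descending into other mounted
    # file systems (such as the disk that holds the archive itself).
    # --atime-preserve restores the access times of the files it reads.
    tar --create --gzip --one-file-system --atime-preserve \
        --file "$dest" --directory "$src" .
}
```

<p> Run as root, <tt>make_clone_archive / /maxa/slash.tgz</tt> would
approximate the intent of the original command. Entries are stored
relative to the source directory, so the archive can be unpacked under
any mount point. </p>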

<p> I have a script called <tt>go</tt> that takes a hostname and IP
address as its arguments, untars the <tt>slash.tgz</tt> file on the
CD-ROM, and replaces the hostname and IP address in the appropriate
locations. A version of the <tt>go</tt> script and the input files for
it can be accessed at: <htmlurl
url="http://www.ram.org/computing/linux/cluster/"
name="http://www.ram.org/computing/linux/cluster/">. This script
will have to be edited based on your cluster design. </p>
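
<p> To make the idea concrete, here is a minimal sketch of what a
<tt>go</tt>-style clone step might look like. It is not the author's
script. It assumes, purely for illustration, that the master image
stores the placeholders <tt>@HOSTNAME@</tt> and <tt>@IPADDR@</tt> in a
short list of configuration files; the function name
<tt>clone_host</tt>, the placeholders, and the file list are all
hypothetical. </p>

```shell
# Hypothetical sketch of a "go"-style clone step, not the author's script.
# Assumption: the master image contains the placeholders @HOSTNAME@ and
# @IPADDR@ in the files listed here (paths relative to the new root).
CONFIG_FILES="etc/hosts etc/HOSTNAME"

# Usage: clone_host HOSTNAME IP ARCHIVE TARGET_ROOT
clone_host() {
    new_host=$1    # hostname for the new machine
    new_ip=$2      # IP address for the new machine
    archive=$3     # path to slash.tgz, for example on the mounted CD-ROM
    target=$4      # mount point of the new machine's root partition

    # Unpack the master image, keeping the stored permissions.
    tar -xzpf "$archive" -C "$target" || return 1

    # Fill in the placeholders in each configuration file that exists.
    for rel in $CONFIG_FILES; do
        file="$target/$rel"
        [ -f "$file" ] || continue
        sed -e "s/@HOSTNAME@/$new_host/g" -e "s/@IPADDR@/$new_ip/g" \
            "$file" > "$file.tmp" || return 1
        # Write back through the original file so its mode is kept.
        cat "$file.tmp" > "$file" && rm -f "$file.tmp"
    done
}
```

<p> A call such as <tt>clone_host node01 10.0.0.1 /cdrom/slash.tgz
/mnt</tt> would then unpack the image under <tt>/mnt</tt> and
personalise it; the host name, address, and paths in this call are
examples only. </p>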

<p> To make this work, I also use Tom's Root Boot package <htmlurl
url="http://www.toms.net/rb/" name="http://www.toms.net/rb/"> to boot
the machine and clone the system. The <tt>go</tt> script can be
placed on a CD-ROM or on the floppy containing Tom's Root Boot package
(you need to delete a few programs from this package since the floppy
disk is stretched to capacity). </p>

<p> More conveniently, you could burn a bootable CD-ROM containing
Tom's Root Boot package, including the <tt>go</tt> script, and the tgz
file containing the system you wish to clone. You can also edit Tom's
Root Boot's init scripts so that it directly executes the <tt>go</tt>
script (you will still have to set IP addresses if you don't use
DHCP). </p>

<p> Thus you can develop a system where all you have to do is insert a
CD-ROM, turn on the machine, have a cup of coffee (or a can of coke)
and come back to see a full clone. You then repeat this process for as
many machines as you have. This procedure has worked extremely well
for me, and if you have someone else actually doing the work (of
inserting and removing CD-ROMs) then it's ideal. </p>

</sect2>

<!-- ************************************************************* -->

<sect2> DHCP vs. hard-coded IP addresses

<p> If you have DHCP set up, then you don't need to reset the IP
address and that part of it can be removed from the <tt>go</tt>
script. </p>

<p> DHCP has the advantage that you don't muck around with IP
addresses at all, provided the DHCP server is configured
appropriately. It has the disadvantage that it relies on a centralised
server (and like I said, I tend to distribute things as much as
possible). Also, linking hardware Ethernet addresses to IP addresses can
make it inconvenient if you wish to replace machines or change
hostnames routinely. </p>
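
<p> For readers who do take the DHCP route, the link between a
hardware Ethernet address and an IP address is usually one host entry
per machine. The sketch below assumes the ISC DHCP server and its
<tt>dhcpd.conf</tt> host syntax; the helper name
<tt>dhcp_host_entry</tt> and the example name, MAC address, and IP
address are invented for illustration. It also shows why replacing a
machine is inconvenient: the entry has to be regenerated whenever the
network card changes. </p>

```shell
# Sketch only: print one ISC dhcpd.conf host entry.
# dhcp_host_entry is a hypothetical helper, not part of the original setup.
# Usage: dhcp_host_entry NAME MAC_ADDRESS IP_ADDRESS
dhcp_host_entry() {
    name=$1    # host label used inside dhcpd.conf
    mac=$2     # hardware Ethernet address of the network card
    ip=$3      # address the server should always hand to that card

    printf 'host %s {\n' "$name"
    printf '    hardware ethernet %s;\n' "$mac"
    printf '    fixed-address %s;\n' "$ip"
    printf '}\n'
}

# Example (invented values):
#   dhcp_host_entry node01 00:11:22:33:44:55 10.0.0.1 >> dhcpd.conf
```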

</sect2>

<!-- ************************************************************* -->

</sect1>

<!-- ************************************************************* -->

</sect>

<!-- ************************************************************* -->

<sect> Performing tasks on the cluster

<p> This section is still being developed as the usage on my cluster
evolves, but so far we tend to write our own sets of message-passing
routines to communicate between processes on different machines. </p>

<p> Many applications, particularly in computational genomics, are
massively and trivially parallelisable, meaning that perfect
distribution can be achieved by spreading tasks equally across
the machines (for example, when analysing a whole genome using a
single gene technique, each processor can work on one gene at a time
independent of all the other processors). </p>
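
<p> As an illustration of spreading independent tasks equally across
machines, the sketch below pairs each task with a host in round-robin
order. It is not the lab's own software. The function name
<tt>assign_tasks</tt> and the two input files (one host name per line,
one task name per line, no blank lines) are assumptions made for the
example. </p>

```shell
# Sketch only: round-robin assignment of independent tasks to hosts.
# assign_tasks is a hypothetical helper.
# Usage: assign_tasks HOSTS_FILE TASKS_FILE
# Prints one "host task" pair per line.
assign_tasks() {
    awk '
        # First file: remember every host name in order.
        NR == FNR { hosts[n++] = $1; next }
        # Second file: give task k to host (k mod number of hosts).
        { print hosts[(FNR - 1) % n], $1 }
    ' "$1" "$2"
}

# Each printed pair could then be handed to whatever remote execution
# mechanism the cluster uses; that step is deliberately left out here.
```

<p> With two hosts and three genes, the first host receives the first
and third gene and the second host receives the second, which is as
even a split as the task count allows. </p>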

<p> So far we have not found the need to use a professional queuing
system, but obviously that is highly dependent on the type of
applications you wish to run. </p>

</sect>

<!-- ************************************************************* -->

@@ -318,4 +429,3 @@ Samudrala's research page (which describes the kind of research done with these
<!-- ************************************************************* -->

</article>
