gferg 2002-07-22 13:54:26 +00:00
parent 50218989e1
commit 9feedc4e2b
2 changed files with 62 additions and 23 deletions


@ -11,7 +11,7 @@
</affiliation>
</author>
<pubdate>v1.1.7d, 15 July 2002</pubdate>
<pubdate>v1.1.7, 15 July 2002</pubdate>
<abstract>
<para>


@ -5,7 +5,7 @@
<title> Linux Cluster HOWTO </title>
<author>Ram Samudrala <tt>(me@ram.org)</tt> </author>
<date> v0.92, April 8, 2002 </date>
<date> v0.96, May 30, 2002 </date>
<abstract>
How to set up high-performance Linux computing clusters.
@ -281,7 +281,7 @@ below $2000.00 (which is what our desktop machines cost). </p>
<!-- ************************************************************* -->
<sect1> Linux, of course!
<sect1> Operating system: Linux, of course!
<p> We use Linux systems with a 2.4.9-7 kernel based on the KRUD 7.2
distribution, and 2.2.17-14 kernel based on the KRUD 7.0
@ -290,9 +290,15 @@ are sent to us on CD and there's no reliance on an external network
connection for updates. They also seem "cleaner" than the regular Red
Hat distributions. </p>
<p> We use our own software for parallelising applications
but have experimented with PVM and MPI. In my view, the overhead for
these pre-packaged programs is too high. I recommend writing
</sect1>
<!-- ************************************************************* -->
<sect1> Parallel processing software
<p> We use our own software for parallelising applications but have
experimented with PVM and MPI. In my view, the overhead for these
pre-packaged programs is too high. I recommend writing
application-specific code for the tasks you perform (that's one
person's view). </p>
@ -302,7 +308,8 @@ person's view). </p>
<sect1> Costs
<p> Linux is freely copiable. </p>
<p> Linux and most software that runs on Linux are freely
copiable. </p>
</sect1>
@ -312,7 +319,7 @@ person's view). </p>
<!-- ************************************************************* -->
<sect> Set up and configuration
<sect> Set up, configuration, and maintenance
<!-- ************************************************************* -->
@ -366,7 +373,7 @@ to configure desktops as they wish. </p>
<!-- ************************************************************* -->
<sect1> Operating system installation
<sect1> Operating system installation and maintenance
<!-- ************************************************************* -->
@ -396,8 +403,8 @@ url="http://www.ram.org/computing/linux/cluster/"
name="http://www.ram.org/computing/linux/cluster/">. This script
will have to be edited based on your cluster design. </p>
<p> To make this work, I also use Tom's Root Boot package <htmlurl
url="http://www.toms.net/rb/" name="http://www.toms.net/rb/"> to boot
<p> To make this work, I also use Tom's Root Boot package (<htmlurl
url="http://www.toms.net/rb/" name="http://www.toms.net/rb/">) to boot
the machine and clone the system. The <tt>go</tt> script can be
placed on a CD-ROM or on the floppy containing Tom's Root Boot package
(you need to delete a few programs from this package since the floppy
@ -410,6 +417,14 @@ Root Boot's init scripts so that it directly executes the <tt>go</tt>
script (you will still have to set IP addresses if you don't use
DHCP). </p>
<p> Alternatively, you can create your own custom disk (like a rescue
disk) that contains the kernel and the tools you want. There are
several documents that describe how to do this,
including the Linux Bootdisk HOWTO (<htmlurl
url="http://www.linuxdoc.org/HOWTO/Bootdisk-HOWTO/"
name="http://www.linuxdoc.org/HOWTO/Bootdisk-HOWTO/">), which also
contains links to other pre-made boot/root disks. </p>
<p> Thus you can develop a system where all you have to do is insert a
CD-ROM, turn on the machine, have a cup of coffee (or a can of Coke)
and come back to see a full clone. You then repeat this process for as
@ -417,9 +432,10 @@ many machines as you have. This procedure has worked extremely well
for me and if you have someone else actually doing the work (of
inserting and removing CD-ROMs) then it's ideal. </p>
<p> <htmlurl url="mailto:rob@fantinibakery.com" name="Rob Fantini">
has contributed modifications of the scripts above that he used for
cloning a Mandrake 8.2 system accessible at <htmlurl
<p> Rob Fantini (<htmlurl url="mailto:rob@fantinibakery.com"
name="rob@fantinibakery.com">) has contributed modifications of the
scripts above, which he used for cloning a Mandrake 8.2 system. They
are available at <htmlurl
url="http://www.ram.org/computing/linux/cluster/fantini_contribution.tgz"
name="http://www.ram.org/computing/linux/cluster/fantini_contribution.tgz">.
</p>
@ -428,6 +444,16 @@ name="http://www.ram.org/computing/linux/cluster/fantini_contribution.tgz">.
<!-- ************************************************************* -->
<sect2> Cloning and maintenance packages
<p> SystemImager (<htmlurl url="http://systemimager.org"
name="http://systemimager.org">) is software that automates Linux
installs, software distribution, and production deployment. </p>
</sect2>
<!-- ************************************************************* -->
<sect2> DHCP vs. hard-coded IP addresses
<p> If you have DHCP set up, then you don't need to reset the IP
@ -459,7 +485,20 @@ issues are listed below: </p>
room increase the temperature significantly. Thus while they might be
okay as desktops, the cooling and power consumption when using them as
part of a large cluster are a consideration. The AMD Palomino
configuration described previously seems to work really well. </p>
configuration described previously seems to work really well, but I
definitely recommend getting two fans in the case--this solved all our
instability problems. </p>
</sect1>
<!-- ************************************************************* -->
<sect1> Known software issues <label id="known_software_issues">
<p> Some tar executables apparently don't create tar files correctly
(especially when it comes to referencing and dereferencing symbolic
links). The solution I've found is to use a tar executable that does,
like the one from Red Hat 7.0. </p>
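<p> One way to check whether an archive kept its symbolic links is to
list the link entries it contains. The following is only a minimal
sketch, assuming Python and its standard <tt>tarfile</tt> module are
available (neither is part of the setup described here). The helper
names <tt>symlinks_in_archive</tt> and <tt>make_test_archive</tt> and
the file names are hypothetical, chosen for illustration. </p>

```python
import os
import tarfile
import tempfile


def symlinks_in_archive(archive_path):
    """Return {member name: link target} for each symlink stored in the archive."""
    with tarfile.open(archive_path) as archive:
        return {m.name: m.linkname for m in archive.getmembers() if m.issym()}


def make_test_archive(workdir):
    """Build a tiny tree with one file and one symlink, then archive it."""
    tree = os.path.join(workdir, "tree")
    os.mkdir(tree)
    with open(os.path.join(tree, "real.txt"), "w") as handle:
        handle.write("data\n")
    # Relative symlink pointing at the file next to it.
    os.symlink("real.txt", os.path.join(tree, "link.txt"))
    archive_path = os.path.join(workdir, "tree.tar")
    # tarfile stores symlinks as links by default (dereference=False).
    with tarfile.open(archive_path, "w") as archive:
        archive.add(tree, arcname="tree")
    return archive_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = make_test_archive(workdir)
        for name, target in symlinks_in_archive(path).items():
            print(name, "->", target)
```

<p> To test a particular tar executable, create the archive with that
executable instead and pass the resulting file to
<tt>symlinks_in_archive</tt>. A link that is missing from the result
was either dropped or replaced by a copy of its target. </p>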
</sect1>
@ -481,8 +520,9 @@ routines to communicate between processes on different machines. </p>
areas, are massively and trivially parallelisable, meaning that
perfect distribution can be achieved by spreading tasks equally across
the machines (for example, when analysing a whole genome using a
single gene technique, each processor can work on one gene at a time
independent of all the other processors). </p>
technique that operates on a single gene/protein, each processor can
work on one gene at a time independently of all the other
processors). </p>
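<p> The idea of spreading independent tasks equally across machines
can be sketched as follows. This is only an illustration, not the
software we actually use. The function name <tt>distribute</tt>, the
gene labels, and the node names are hypothetical. </p>

```python
from typing import Dict, List


def distribute(tasks: List[str], machines: List[str]) -> Dict[str, List[str]]:
    """Assign tasks round-robin so batch sizes differ by at most one."""
    if not machines:
        raise ValueError("at least one machine is required")
    assignment: Dict[str, List[str]] = {machine: [] for machine in machines}
    for index, task in enumerate(tasks):
        assignment[machines[index % len(machines)]].append(task)
    return assignment


if __name__ == "__main__":
    genes = [f"gene{i:03d}" for i in range(10)]   # placeholder task labels
    nodes = ["node1", "node2", "node3"]           # placeholder machine names
    for node, batch in distribute(genes, nodes).items():
        print(node, batch)
```

<p> Because each task is independent, no communication between
machines is needed once the batches are handed out. A round-robin
split like this only balances well when the tasks take roughly the
same time to run. </p>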
<p> So far we have not found the need to use a professional queueing
system, but obviously that is highly dependent on the type of
@ -510,15 +550,13 @@ clock (IPC) whereas the Athlon executes nine IPC (you do the math!)).
<p> These machines are incredibly stable both in terms of hardware and
software once they have been debugged (usually some in a new batch of
machines have hardware problems). Reboots have generally occurred
when a circuit breaker is tripped. The first machine I installed has
been up since its birth!
machines have hardware problems), running constantly under very heavy
loads. One example is given below. Reboots have generally occurred
when a circuit breaker is tripped.
<tscreen><verb>
~ ram@fp1 % uptime
4:49am up 374 days, 2:47, 1 user, load average: 2.08, 2.02, 2.01
12:39pm up 439 days, 14:45, 1 user, load average: 1.19, 1.08, 1.02
</verb></tscreen>
</p>
</sect1>
@ -565,3 +603,4 @@ Samudrala's research page (which describes the kind of research done with these
<!-- ************************************************************* -->
</article>