mirror of https://github.com/tLDP/LDP
commit 9feedc4e2b (parent 50218989e1): updated
@@ -11,7 +11,7 @@
 </affiliation>
 </author>

-<pubdate>v1.1.7d, 15 July 2002</pubdate>
+<pubdate>v1.1.7, 15 July 2002</pubdate>

 <abstract>
 <para>
@@ -5,7 +5,7 @@
 <title> Linux Cluster HOWTO </title>

 <author>Ram Samudrala <tt>(me@ram.org)</tt> </author>
-<date> v0.92, April 8, 2002 </date>
+<date> v0.96, May 30, 2002 </date>

 <abstract>
 How to set up high-performance Linux computing clusters.
@@ -281,7 +281,7 @@ below $2000.00 (which is what our desktop machines cost). </p>

 <!-- ************************************************************* -->

-<sect1> Linux, of course!
+<sect1> Operating system: Linux, of course!

 <p> We use Linux systems with a 2.4.9-7 kernel based on the KRUD 7.2
 distribution, and 2.2.17-14 kernel based on the KRUD 7.0
@@ -290,9 +290,15 @@ are sent to us on CD and there's no reliance on an external network
 connection for updates. They also seem "cleaner" than the regular Red
 Hat distributions. </p>

-<p> We use our own software for parallelising applications
-but have experimented with PVM and MPI. In my view, the overhead for
-these pre-packaged programs is too high. I recommend writing
+</sect1>
+
+<!-- ************************************************************* -->
+
+<sect1> Parallel processing software
+
+<p> We use our own software for parallelising applications but have
+experimented with PVM and MPI. In my view, the overhead for these
+pre-packaged programs is too high. I recommend writing
 application-specific code for the tasks you perform (that's one
 person's view). </p>

@@ -302,7 +308,8 @@ person's view). </p>

 <sect1> Costs

-<p> Linux is freely copiable. </p>
+<p> Linux and most software that runs on Linux are freely
+copiable. </p>

 </sect1>

@@ -312,7 +319,7 @@ person's view). </p>

 <!-- ************************************************************* -->

-<sect> Set up and configuration
+<sect> Set up, configuration, and maintenance

 <!-- ************************************************************* -->

@@ -366,7 +373,7 @@ to configure desktops as they wish. </p>

 <!-- ************************************************************* -->

-<sect1> Operating system installation
+<sect1> Operating system installation and maintenance

 <!-- ************************************************************* -->

@@ -396,8 +403,8 @@ url="http://www.ram.org/computing/linux/cluster/"
 name="http://www.ram.org/computing/linux/cluster/">. This script
 will have to be edited based on your cluster design. </p>

-<p> To make this work, I also use Tom's Root Boot package <htmlurl
-url="http://www.toms.net/rb/" name="http://www.toms.net/rb/"> to boot
+<p> To make this work, I also use Tom's Root Boot package (<htmlurl
+url="http://www.toms.net/rb/" name="http://www.toms.net/rb/">) to boot
 the machine and clone the system. The <tt>go</tt> script can be
 placed on a CD-ROM or on the floppy containing Tom's Root Boot package
 (you need to delete a few programs from this package since the floppy
@@ -410,6 +417,14 @@ Root Boot's init scripts so that it directly executes the <tt>go</tt>
 script (you will still have to set IP addresses if you don't use
 DHCP). </p>

+<p> Alternately, you can create your own custom disk (like a rescue
+disk) that contains the kernel you want and the tools you
+want. There are several documents that describe how to do this,
+including the Linux Bootdisk HOWTO (<htmlurl
+url="http://www.linuxdoc.org/HOWTO/Bootdisk-HOWTO/"
+name="http://www.linuxdoc.org/HOWTO/Bootdisk-HOWTO/">), which also
+contains links to other pre-made boot/root disks. </p>
+
 <p> Thus you can develop a system where all you have to do is insert a
 CD-ROM, turn on the machine, have a cup of coffee (or a can of coke)
 and come back to see a full clone. You then repeat this process for as
@@ -417,9 +432,10 @@ many machines as you have. This procedure has worked extremely well
 for me and if you have someone else actually doing the work (of
 inserting and removing CD-ROMs) then it's ideal. </p>

-<p> <htmlurl url="mailto:rob@fantinibakery.com" name="Rob Fantini">
-has contributed modifications of the scripts above that he used for
-cloning a Mandrake 8.2 system accessible at <htmlurl
+<p> Rob Fantini (<htmlurl url="mailto:rob@fantinibakery.com"
+name="rob@fantinibakery.com">) has contributed modifications of the
+scripts above that he used for cloning a Mandrake 8.2 system
+accessible at <htmlurl
 url="http://www.ram.org/computing/linux/cluster/fantini_contribution.tgz"
 name="http://www.ram.org/computing/linux/cluster/fantini_contribution.tgz">.
 </p>
@@ -428,6 +444,16 @@ name="http://www.ram.org/computing/linux/cluster/fantini_contribution.tgz">.

 <!-- ************************************************************* -->

+<sect2> Cloning and maintenance packages
+
+<p> SystemImager (<htmlurl url="http://systemimager.org"
+name="http://systemimager.org">) is software that automates Linux
+installs, software distribution, and production deployment. </p>
+
+</sect2>
+
+<!-- ************************************************************* -->
+
 <sect2> DHCP vs. hard-coded IP addresses

 <p> If you have DHCP set up, then you don't need to reset the IP
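The hunk above ends in the section on DHCP versus hard-coded IP addresses. As an illustration only, here is a minimal Python sketch of deriving a fixed address for each cloned node from its position in the cluster. The network `192.168.1.0/24`, the starting offset `first_host`, and the function name `static_address` are all assumptions made for this example, not values taken from the HOWTO or its `go` script.

```python
import ipaddress


def static_address(node_index, network="192.168.1.0/24", first_host=10):
    """Return a fixed IPv4 address for node number node_index (from 0).

    The default network and first_host offset are hypothetical; adjust
    them to match the cluster's real addressing plan.
    """
    net = ipaddress.ip_network(network)
    address = net.network_address + first_host + node_index
    # Refuse addresses that fall outside the network or on its broadcast.
    if address not in net or address == net.broadcast_address:
        raise ValueError(f"node {node_index} does not fit in {network}")
    return str(address)
```

With these assumed defaults, node 0 would receive `192.168.1.10` and node 5 would receive `192.168.1.15`. A cloning script could write the result into each machine's network configuration when DHCP is not in use.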
@@ -459,7 +485,20 @@ issues are listed below: </p>
 room increase the temperature significantly. Thus while they might be
 okay as desktops, the cooling and power consumption when using them as
 part of a large cluster is a consideration. The AMD Palomino
-configuration described previously seems to work really well. </p>
+configuration described previously seems to work really well, but I
+definitely recommend getting two fans in the case--this solved all our
+instability problems. </p>
+
+</sect1>
+
+<!-- ************************************************************* -->
+
+<sect1> Known software issues <label id="known_software_issues">
+
+<p> Some tar executables apparently don't create a tar file the nice
+way they're supposed to (especially in terms of referencing and
+de-referencing symbolic links). The solution to this I've found is to
+use a tar executable that does, like the one from Red Hat 7.0. </p>

 </sect1>

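The known-issue note above concerns how an archiver treats symbolic links. The following is a minimal sketch, assuming Python's standard `tarfile` module rather than the tar executable the HOWTO uses, that archives the same tree twice: once storing links as links and once dereferencing them. The file names (`real.txt`, `alias.txt`) and the helper names are hypothetical.

```python
import os
import tarfile
import tempfile


def archive_tree(src_dir, tar_path, dereference=False):
    """Archive src_dir into tar_path.

    dereference=False stores symbolic links as links (what a system
    clone usually wants); dereference=True stores a copy of each
    link's target instead.
    """
    with tarfile.open(tar_path, "w", dereference=dereference) as tar:
        tar.add(src_dir, arcname="root")


def link_members(tar_path):
    """Return the names of archive members stored as symbolic links."""
    with tarfile.open(tar_path) as tar:
        return sorted(m.name for m in tar.getmembers() if m.issym())


def demo():
    """Build a small tree with one symlink and archive it both ways."""
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "src")
        os.mkdir(src)
        with open(os.path.join(src, "real.txt"), "w") as fh:
            fh.write("data\n")
        os.symlink("real.txt", os.path.join(src, "alias.txt"))

        kept = os.path.join(work, "kept.tar")
        copied = os.path.join(work, "copied.tar")
        archive_tree(src, kept, dereference=False)
        archive_tree(src, copied, dereference=True)
        return link_members(kept), link_members(copied)
```

Listing the link members of an archive in this way is one quick check that a cloned image kept its symbolic links instead of silently replacing them with copies.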
@@ -481,8 +520,9 @@ routines to communicate between processes on different machines. </p>
 areas, are massively and trivially parallelisable, meaning that
 perfect distribution can be achieved by spreading tasks equally across
 the machines (for example, when analysing a whole genome using a
-single gene technique, each processor can work on one gene at a time
-independent of all the other processors). </p>
+technique that operates on a single gene/protein, each processor can
+work on one gene at a time independent of all the other
+processors). </p>

 <p> So far we have not found the need to use a professional queueing
 system, but obviously that is highly dependent on the type of
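The hunk above describes trivially parallelisable work, where tasks are spread equally across machines and never communicate. A minimal Python sketch of that idea follows. It is not the author's own software, and the gene identifiers and node names are hypothetical.

```python
from itertools import cycle


def distribute(tasks, machines):
    """Assign independent tasks to machines round-robin.

    Because the tasks never communicate, an even split is all the
    scheduling this kind of workload needs.
    """
    if not machines:
        raise ValueError("need at least one machine")
    plan = {machine: [] for machine in machines}
    for task, machine in zip(tasks, cycle(machines)):
        plan[machine].append(task)
    return plan


# Hypothetical gene identifiers and node names, for illustration only.
genes = [f"gene{i:03d}" for i in range(1, 8)]
nodes = ["node1", "node2", "node3"]
plan = distribute(genes, nodes)
```

Each node can then work through its own list without consulting the others, which is why a workload like this may not need a full queueing system.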
@@ -510,15 +550,13 @@ clock (IPC) whereas the Athlon executes nine IPC (you do the math!)).

 <p> These machines are incredibly stable both in terms of hardware and
 software once they have been debugged (usually some in a new batch of
-machines have hardware problems). Reboots have generally occurred
-when a circuit breaker is tripped. The first machine I installed has
-been up since its birth!
+machines have hardware problems), running constantly under very heavy
+loads. One example is given below. Reboots have generally occurred
+when a circuit breaker is tripped.

 <tscreen><verb>
-~ ram@fp1 % uptime
- 4:49am up 374 days, 2:47, 1 user, load average: 2.08, 2.02, 2.01
+12:39pm up 439 days, 14:45, 1 user, load average: 1.19, 1.08, 1.02
 </verb></tscreen>
-
 </p>

 </sect1>
@@ -565,3 +603,4 @@ Samudrala's research page (which describes the kind of research done with these
 <!-- ************************************************************* -->

 </article>
+