gferg 2003-03-17 22:12:10 +00:00
parent 74a70a7966
commit 1bd2c2610d
3 changed files with 91 additions and 29 deletions

@@ -523,7 +523,7 @@ partition images to and from a TFTP server. </Para>
Cluster-HOWTO</ULink>,
<CiteTitle>Linux Cluster HOWTO</CiteTitle>
</Para><Para>
<CiteTitle>Updated: November 2002</CiteTitle>.
<CiteTitle>Updated: March 2003</CiteTitle>.
How to set up high-performance Linux computing clusters. </Para>
</ListItem>

@@ -100,7 +100,7 @@ various issues related to this. </Para>
Cluster-HOWTO</ULink>,
<CiteTitle>Linux Cluster HOWTO</CiteTitle>
</Para><Para>
<CiteTitle>Updated: November 2002</CiteTitle>.
<CiteTitle>Updated: March 2003</CiteTitle>.
How to set up high-performance Linux computing clusters. </Para>
</ListItem>

@@ -5,7 +5,7 @@
<title> Linux Cluster HOWTO </title>
<author>Ram Samudrala <tt>(me@ram.org)</tt> </author>
<date> v0.98, November 11, 2002 </date>
<date> v1.0, March 17, 2003 </date>
<abstract>
How to set up high-performance Linux computing clusters.
@@ -33,11 +33,13 @@ and includes not only details the compute aspects, but also the
desktop, laptop, and public server aspects. This is done mainly for
local use, but I put it up on the web since I received several e-mail
messages based on my newsgroup query requesting the same information.
Even today, as I plan another 64-node cluster, I find there is a
Even today, as I plan another 64-node cluster, I find that there is a
dearth of information about exactly how to assemble components to form
a node that works reliably under Linux. The main use of this HOWTO as
it stands is that it's a report on what kind of hardware works well
with Linux and what kind of hardware doesn't. </p>
a node that works reliably under Linux, covering not only the compute
nodes but also the hardware that needs to work well with the nodes
for productive research to happen. The main use
of this HOWTO as it stands is that it's a report on what kind of
hardware works well with Linux and what kind of hardware doesn't. </p>
</sect>
@@ -246,7 +248,7 @@ following setup:
<item> 2 Pentium III 1 GHz Intel CPUs </item>
<item> Supermicro 370 DE6 Dual PIII-FCPGA motherboard </item>
<item> 4 256 MB 168-pin PC133 Registered ECC Micron RAM </item>
<item> 3 40 GB Maxtor UDMA/100 7200 RPM HD </item>
<item> 3 40 GB Maxtor UDMA/100 7200 RPM hard disk </item>
<item> Ricoh 32x12x10 CDRW/DVD Combo EIDE </item>
<item> 1.4 MB floppy drive </item>
<item> Asus V7700 64mb GeForce2-GTS AGP video card </item>
@@ -261,6 +263,27 @@ following setup:
<!-- ************************************************************* -->
<sect1> Firewall/gateway hardware
<p> 1 firewall with the following setup:
<itemize>
<item> AMD Palomino XP 1700+ 1.47GHz CPU </item>
<item> MSI KT3 Ultra2 KT333 MS-6380E motherboard </item>
<item> 512 MB PC2100 DDR-266MHz DIMM RAM </item>
<item> 40GB Seagate 7200rpm ATA/100 hard disk </item>
<item> Asus 52X CD-A520 INT IDE cdrom </item>
<item> 1.44 MB floppy drive </item>
<item> ATI Expert 2000 Rage 128 32mb video card </item>
<item> 4 Intel Pro/1000T Gigabit Server ethernet cards </item>
<item> 4U Black Rackmount Steel case </item>
</itemize>
</p>
</sect1>
<!-- ************************************************************* -->
<sect1> Miscellaneous/accessory hardware
<p> Backup:
@@ -282,6 +305,13 @@ following setup:
</itemize>
</p>
<p> Printers:
<itemize>
<item> HP colour LaserJet 4600dn </item>
</itemize>
</p>
</sect1>
<!-- ************************************************************* -->
@@ -294,7 +324,7 @@ at all the machines:
<itemize>
<item> 15" .28dp XLN CTL Monitor </item>
<item> 3 Belkin Omniview 16-Port Pro Switches </item>
<item> 40 KVM cables </item>
<item> Belkin Omniview 2-Port Switch </item>
</itemize>
</p>
@@ -306,6 +336,8 @@ more monitor switches/KVM cables. </p>
<p> Networking is important:
<itemize>
<item> 1 Netgear FSM750S 48 port/2 gigabit network switch </item>
<item> 1 Netgear FS517TS 16 port/1 gigabit network switch </item>
<item> 1 Netgear FS750NA 48 port network switch </item>
<item> 1 Netgear FS524 24 port network switch </item>
<item> 1 Cisco Catalyst 3448 XL Enterprise Edition 48 port network switch </item>
@@ -340,12 +372,27 @@ below $2000.00 (which is what our desktop machines cost). </p>
<sect1> Operating system: Linux, of course!
<p> We use Linux systems with a 2.4.9-7 kernel based on the KRUD 7.2
distribution, and 2.2.17-14 kernel based on the KRUD 7.0
distribution. These distributions work very well for us since updates
are sent to us on CD and there's no reliance on an external network
connection for updates. They also seem "cleaner" than the regular Red
Hat distributions. </p>
<p> We use the following kernels and distributions:
<itemize>
<item> Kernel 2.2.16-22, distribution KRUD 7.0 </item>
<item> Kernel 2.4.9-7, distribution KRUD 7.2 </item>
<item> Kernel 2.4.18-10, distribution KRUD 7.3 </item>
</itemize>
These distributions work very well for us since updates are sent to us
on CD and there's no reliance on an external network connection for
updates. They also seem "cleaner" than the regular Red Hat
distributions, and the setup is extremely stable. </p>
</sect1>
<!-- ************************************************************* -->
<sect1> Networking software
<p> We use Shorewall 1.3.14a (<htmlurl url="http://www.shorewall.net"
name="http://www.shorewall.net">) for the firewall. </p>
</sect1>
@@ -433,7 +480,32 @@ to configure desktops as they wish. </p>
<!-- ************************************************************* -->
<sect2> Cloning
<sect2> Cloning and maintenance packages
<sect3> FAI
<p> FAI (<htmlurl url="http://www.informatik.uni-koeln.de/fai/"
name="http://www.informatik.uni-koeln.de/fai/">) is an automated
system to install a Debian GNU/Linux operating system on a PC
cluster. You can take one or more virgin PCs, turn on the power and
after a few minutes Linux is installed, configured and running on the
whole cluster, without any interaction necessary. </p>
</sect3>
<sect3> SystemImager
<p> SystemImager (<htmlurl url="http://systemimager.org"
name="http://systemimager.org">) is software that automates Linux
installs, software distribution, and production deployment. </p>
</sect3>
</sect2>
<!-- ************************************************************* -->
<sect2> Personal cloning strategy
<p> I believe in having a completely distributed system. This means
each machine contains a copy of the operating system. Installing the
@@ -500,16 +572,6 @@ name="http://www.ram.org/computing/linux/cluster/fantini_contribution.tgz">.
<!-- ************************************************************* -->
<sect2> Cloning and maintenance packages
<p> SystemImager (<htmlurl url="http://systemimager.org"
name="http://systemimager.org">) is software that automates Linux
installs, software distribution, and production deployment. </p>
</sect2>
<!-- ************************************************************* -->
<sect2> DHCP vs. hard-coded IP addresses
<p> If you have DHCP set up, then you don't need to reset the IP
@@ -576,8 +638,8 @@ routines to communicate between processes on different machines. </p>
areas, are massively and trivially parallelisable, meaning that
perfect distribution can be achieved by spreading tasks equally across
the machines (for example, when analysing a whole genome using a
technique that operates on a single gene/proteom, each processor can
work on one gene at a time independent of all the other
technique that operates on a single gene/protein, each processor can
work on one gene/protein at a time independent of all the other
processors). </p>
<p> So far we have not found the need to use a professional queueing
@@ -612,7 +674,7 @@ loads. One example is given below. Reboots have generally occurred
when a circuit breaker is tripped.
<tscreen><verb>
12:39pm up 439 days, 14:45, 1 user, load average: 1.19, 1.08, 1.02
2:29pm up 495 days, 1:04, 2 users, load average: 4.85, 7.15, 7.72
</verb></tscreen>
</p>