gferg 2002-04-09 14:00:22 +00:00
parent 033bc2fddc
commit ea177ceb54
1 changed files with 216 additions and 80 deletions


@ -5,7 +5,7 @@
<title> Linux Cluster HOWTO </title>
<author>Ram Samudrala <tt>(me@ram.org)</tt> </author>
<date> v0.3, August 21, 2001 </date>
<date> v0.92, April 8, 2002 </date>
<abstract>
How to set up high-performance Linux computing clusters.
@ -33,8 +33,11 @@ and includes not only details the compute aspects, but also the
desktop, laptop, and public server aspects. This is done mainly for
local use, but I put it up on the web since I received several e-mail
messages based on my newsgroup query requesting the same information.
Even today, as I plan another 64-node cluster, I find there is a
dearth of information about exactly how to assemble components to form
a node that works reliably under Linux. The main use of this HOWTO as
it stands is as a report on what kind of hardware works well with
Linux and what kind of hardware doesn't. </p>
</sect>
@ -44,11 +47,14 @@ hardware works well with Linux and what kind of hardware doesn't. </p>
<sect> Hardware
<p> This section covers the hardware choices I've made. Unless noted
in the <ref id="known_hardware_issues" name="known hardware issues">
section, assume that everything works <it>really</it> well. </p>
<p> Hardware installation is also fairly straightforward unless
otherwise noted, with most of the details covered by the manuals. For
each section, the hardware is listed in order of purchase (most
recent first). </p>
<!-- ************************************************************* -->
@ -57,15 +63,31 @@ otherwise noted, with most of the details covered by the manuals. </p>
<p> 32 machines have the following setup each:
<itemize>
<item> 2 AMD Palomino MP XP 1800+ 1.53 GHz CPUs </item>
<item> Tyan S2460 Dual Socket-A/MP motherboard </item>
<item> Kingston 512 MB PC2100 DDR-266 MHz REG ECC RAM </item>
<item> 1 20 GB Maxtor UDMA/100 7200 RPM HD </item>
<item> 1 120 GB Maxtor ATA/100 5400 RPM HD </item>
<item> Asus CD-A520 52x CDROM </item>
<item> 1.44 MB floppy drive </item>
<item> ATI Expert 98 8 MB AGP video card </item>
<item> IN-WIN P4 300ATX Mid Tower case </item>
<item> Intel PCI PRO-100 10/100 Mbps network card </item>
</itemize>
</p>
<p> 32 machines have the following setup each:
<itemize>
<item> 2 Pentium III 1 GHz Intel CPUs </item>
<item> Supermicro 370 DLE Dual PIII-FCPGA motherboard </item>
<item> 2 256 MB 168-pin PC133 Registered ECC Micron RAM </item>
<item> 1 20 GB Maxtor ATA/66 5400 RPM HD </item>
<item> 1 40 GB Maxtor UDMA/100 7200 RPM HD </item>
<item> Asus CD-S500 50x CDROM </item>
<item> 1.4 MB floppy drive </item>
<item> ATI Expert 98 8 MB PCI video card </item>
<item> IN-WIN P4 300ATX Mid Tower case </item>
</itemize>
</p>
@ -75,18 +97,19 @@ otherwise noted, with most of the details covered by the manuals. </p>
<sect1> Server hardware
<p> 1 server for external use (dissemination of information) with the
following setup:
<itemize>
<item> 2 Pentium III 1 GHz Intel CPUs </item>
<item> Supermicro 370 DLE Dual PIII-FCPGA motherboard </item>
<item> 2 256 MB 168-pin PC133 Registered ECC Micron RAM </item>
<item> 1 20 GB Maxtor ATA/66 5400 RPM HD </item>
<item> 2 40 GB Maxtor UDMA/100 7200 RPM HD </item>
<item> Asus CD-S500 50x CDROM </item>
<item> 1.4 MB floppy drive </item>
<item> ATI Expert 98 8 MB PCI video card </item>
<item> Full-tower case with 300W PS </item>
</itemize>
</p>
@ -96,69 +119,112 @@ otherwise noted, with most of the details covered by the manuals. </p>
<sect1> Desktop hardware
<p> 1 desktop with the following setup:
<itemize>
<item> 2 Intel Xeon 1.7 GHz 256K 400FS </item>
<item> Supermicro P4DCE Dual Xeon motherboard </item>
<item> 4 256 MB RAMBUS 184-Pin 800 MHz memory </item>
<item> 2 120 GB Maxtor ATA/100 5400 RPM HD </item>
<item> 1 60 GB Maxtor ATA/100 7200 RPM HD </item>
<item> 52X Asus CD-A520 INT IDE CDROM </item>
<item> 1.4 MB floppy drive </item>
<item> Leadtek 64 MB GF2 MX400 AGP </item>
<item> Creative SB LIVE Value PCI 5.1 </item>
<item> Microsoft Natural Keyboard </item>
<item> Microsoft Intellimouse Explorer </item>
<item> Supermicro SC760 full-tower case with 400W PS </item>
</itemize>
</p>
<p> 2 desktops with the following setup:
<itemize>
<item> 2 AMD K7 1.2 GHz/266 MP Socket A CPUs </item>
<item> Tyan S2462NG Dual Socket A motherboard </item>
<item> 4 256 MB PC2100 REG ECC DDR-266 MHz </item>
<item> 3 40 GB Maxtor UDMA/100 7200 RPM HD </item>
<item> 50X Asus CD-A520 INT IDE CDROM </item>
<item> 1.4 MB floppy drive </item>
<item> Chaintech GeForce2 MX200 32 MB AGP </item>
<item> Creative SB LIVE Value PCI </item>
<item> Microsoft Natural Keyboard </item>
<item> Microsoft Intellimouse Explorer </item>
<item> Full-tower case with 300W PS </item>
</itemize>
</p>
<p> 2 desktops with the following setup:
<itemize>
<item> 2 Pentium III 1 GHz Intel CPUs </item>
<item> Supermicro 370 DLE Dual PIII-FCPGA motherboard </item>
<item> 4 256 MB 168-pin PC133 Registered ECC Micron RAM </item>
<item> 3 40 GB Maxtor UDMA/100 7200 RPM HD </item>
<item> Asus CD-S500 50x CDROM </item>
<item> 1.4 MB floppy drive </item>
<item> Jaton Nvidia TNT2 32 MB PCI </item>
<item> Creative SB LIVE Value PCI </item>
<item> Microsoft Natural Keyboard </item>
<item> Microsoft Intellimouse Explorer </item>
<item> Full-tower case with 300W PS </item>
</itemize>
</p>
<p> 2 desktops with the following setup:
<itemize>
<item> 2 Pentium III 1 GHz Intel CPUs </item>
<item> Supermicro 370 DLE Dual PIII-FCPGA motherboard </item>
<item> 4 256 MB 168-pin PC133 Registered ECC Micron RAM </item>
<item> 3 40 GB Maxtor UDMA/100 7200 RPM HD </item>
<item> Mitsumi 8x/4x/32x CDRW </item>
<item> 1.4 MB floppy drive </item>
<item> Jaton Nvidia TNT2 32 MB PCI </item>
<item> Creative SB LIVE Value PCI </item>
<item> Microsoft Natural Keyboard </item>
<item> Microsoft Intellimouse Explorer </item>
<item> Full-tower case with 300W PS </item>
</itemize>
</p>
<p> 4 desktops with the following setup:
<itemize>
<item> 2 Pentium III 1 GHz Intel CPUs </item>
<item> Supermicro 370 DE6 Dual PIII-FCPGA motherboard </item>
<item> 4 256 MB 168-pin PC133 Registered ECC Micron RAM </item>
<item> 3 40 GB Maxtor UDMA/100 7200 RPM HD </item>
<item> Ricoh 32x12x10 CDRW/DVD Combo EIDE </item>
<item> 1.4 MB floppy drive </item>
<item> Asus V7700 64 MB GeForce2-GTS AGP video card </item>
<item> Creative SB Live Platinum 5.1 sound card </item>
<item> Microsoft Natural Keyboard </item>
<item> Microsoft Intellimouse Explorer </item>
<item> Full-tower case with 300W PS </item>
</itemize>
</p>
</sect1>
<!-- ************************************************************* -->
<sect1> Miscellaneous/accessory hardware
<p> Backup:
<itemize>
<item> 2 Sony 20/40 GB DSS4 SE LVD DAT </item>
</itemize>
</p>
<p> Monitors:
<itemize>
<item> 1 22" Viewsonic P220F 0.25-0.27 mm monitor </item>
<item> 4 21" Sony CPD-G500 .24mm monitor </item>
<item> 2 18" Viewsonic VP181 LCD monitor </item>
<item> 1 17" Viewsonic VE170 LCD monitor </item>
</itemize>
</p>
@ -172,9 +238,9 @@ otherwise noted, with most of the details covered by the manuals. </p>
at all the machines:
<itemize>
<item> 15" .28dp XLN CTL Monitor </item>
<item> 3 Belkin Omniview 16-Port Pro Switches </item>
<item> 40 KVM cables </item>
</itemize>
</p>
@ -186,7 +252,8 @@ more monitor switches/KVM cables. </p>
<p> Networking is important:
<itemize>
<item> 1 Cisco Catalyst 3448 XL Enterprise Edition 48 port network switch. </item>
<item> 1 Netgear FS524 24 port network switch </item>
</itemize>
</p>
@ -199,7 +266,7 @@ more monitor switches/KVM cables. </p>
<p> Our vendor is Hard Drives Northwest (<htmlurl
url="http://www.hdnw.com" name="http://www.hdnw.com">). For each
compute node in our cluster (containing two processors), we paid about
$1500-$2000, including taxes. Generally, our goal is to keep each node to
below $2000.00 (which is what our desktop machines cost). </p>
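<p> As a rough worked example of the arithmetic, the sketch below
multiplies the per-node range above by a node count. It assumes a
batch of 32 two-processor nodes (the batch size of each compute-node
configuration listed earlier). As a simplification, it leaves out
shared equipment such as switches, KVM gear and backup drives. </p>
<tscreen><verb>
# Rough cluster budget sketch: multiplies a per-node price range by a
# node count.  The default range is the per-node figure quoted in this
# HOWTO (taxes included); shared gear such as switches is not counted.

def batch_cost(nodes, low_per_node=1500, high_per_node=2000):
    """Return the (low, high) total cost in dollars for a batch of nodes."""
    return nodes * low_per_node, nodes * high_per_node


def cost_per_cpu(per_node, cpus_per_node=2):
    """Each compute node described here has two processors."""
    return per_node / cpus_per_node


if __name__ == "__main__":
    low, high = batch_cost(32)
    print("32 nodes: %d to %d dollars" % (low, high))
    print("per CPU: %.0f to %.0f dollars" % (cost_per_cpu(1500), cost_per_cpu(2000)))
</verb></tscreen>
<p> For 32 nodes, this gives 48000 to 64000 dollars, or 750 to 1000
dollars per processor. </p>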
</sect1>
@ -216,10 +283,18 @@ below $2000.00 (which is what our desktop machines cost). </p>
<sect1> Linux, of course!
<p> We use Linux systems with a 2.4.9-7 kernel based on the KRUD 7.2
distribution, and a 2.2.17-14 kernel based on the KRUD 7.0
distribution. These distributions work very well for us since updates
are sent to us on CD and there's no reliance on an external network
connection for updates. They also seem "cleaner" than the regular Red
Hat distributions. </p>
<p> We use our own software for parallelising applications
but have experimented with PVM and MPI. In my view, the overhead for
these pre-packaged programs is too high. I recommend writing
application-specific code for the tasks you perform (that's one
person's view). </p>
</sect1>
@ -342,6 +417,13 @@ many machines as you have. This procedure has worked extremely well
for me and if you have someone else actually doing the work (of
inserting and removing CD-ROMs) then it's ideal. </p>
<p> <htmlurl url="mailto:rob@fantinibakery.com" name="Rob Fantini">
has contributed modifications of the scripts above, which he used for
cloning a Mandrake 8.2 system; they are accessible at <htmlurl
url="http://www.ram.org/computing/linux/cluster/fantini_contribution.tgz"
name="http://www.ram.org/computing/linux/cluster/fantini_contribution.tgz">.
</p>
</sect2>
<!-- ************************************************************* -->
@ -368,12 +450,29 @@ hostnames routinely. </p>
<!-- ************************************************************* -->
<sect1> Known hardware issues <label id="known_hardware_issues">
<p> The hardware in general has worked really well for us. Specific
issues are listed below: </p>
<p> The AMD dual 1.2 GHz machines run really hot: two of them in a
room increase the temperature significantly. Thus, while they might be
okay as desktops, their cooling and power consumption are a
consideration when using them as part of a large cluster. The AMD
Palomino configuration described previously seems to work really
well. </p>
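<p> To put numbers on the cooling and power question, note that
practically all the electrical power a machine draws ends up as heat
in the room. One watt corresponds to about 3.412 BTU per hour. The
sketch below turns a per-machine power draw into a heat load and a
supply current. The 200 W per node and 120 V supply used in the
example are hypothetical figures, not measurements of the machines
described here, so substitute values measured on your own
hardware. </p>
<tscreen><verb>
# Heat-load sketch for a room full of machines.  The wattage passed in is
# hypothetical; measure your own machines under full load.

BTU_PER_HOUR_PER_WATT = 3.412  # 1 W dissipated is about 3.412 BTU/h


def heat_load_btu_per_hour(machines, watts_per_machine):
    """Total heat output in BTU/h of identical machines."""
    return machines * watts_per_machine * BTU_PER_HOUR_PER_WATT


def amps_at(volts, machines, watts_per_machine):
    """Current drawn at the given supply voltage (power factor ignored)."""
    return machines * watts_per_machine / volts


if __name__ == "__main__":
    # Hypothetical example: 32 nodes drawing 200 W each on a 120 V supply.
    print("heat: %.0f BTU/h" % heat_load_btu_per_hour(32, 200))
    print("current: %.1f A at 120 V" % amps_at(120, 32, 200))
</verb></tscreen>
<p> With those hypothetical figures, 32 nodes produce roughly 21800
BTU/h and draw about 53 A, which would have to be spread over several
circuits. </p>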
</sect1>
<!-- ************************************************************* -->
</sect>
<!-- ************************************************************* -->
<sect> Performing tasks on the cluster
<!-- ************************************************************* -->
<p> This section is still being developed as the usage on my cluster
evolves, but so far we tend to write our own sets of message passing
routines to communicate between processes on different machines. </p>
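<p> Our own routines are not reproduced here. Purely as a hypothetical
illustration of how small such a layer can be, the sketch below frames
each message with a 4-byte length prefix. This lets the receiver tell
where one message ends on a TCP byte stream. The helper names
<tt>send_msg</tt> and <tt>recv_msg</tt> are made up for this
example. </p>
<tscreen><verb>
import socket
import struct

# Hypothetical minimal message-passing helpers (not the author's code).
# Each message is sent as a 4-byte big-endian length followed by the
# payload, because TCP delivers a byte stream with no message boundaries.

HEADER = struct.Struct("!I")  # network byte order, unsigned 32-bit length


def send_msg(sock, payload):
    """Send one length-prefixed message (payload is bytes)."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes the connection early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock):
    """Receive one length-prefixed message and return its payload."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


if __name__ == "__main__":
    # A connected socket pair stands in for two processes on different machines.
    a, b = socket.socketpair()
    send_msg(a, b"gene 42")
    send_msg(a, b"done")
    print(recv_msg(b), recv_msg(b))
    a.close()
    b.close()
</verb></tscreen>
<p> Between machines, the two ends would be real network connections
(one side calling <tt>connect()</tt>, the other <tt>accept()</tt>)
instead of a <tt>socketpair()</tt>. </p>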
@ -385,10 +484,47 @@ the machines (for example, when analysing a whole genome using a
single gene technique, each processor can work on one gene at a time
independent of all the other processors). </p>
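<p> When the units of work are independent, as in the
one-gene-per-processor example above, a static round-robin split needs
no communication at all while the work is running. Processor <it>k</it>
of <it>P</it> takes items <it>k</it>, <it>k+P</it>, <it>k+2P</it>, and
so on. The sketch below shows one way to write this; the function name
and the gene list are hypothetical. </p>
<tscreen><verb>
# Static round-robin partitioning of independent work items (for example,
# genes) across nprocs processors.  Each processor computes its own share
# from its rank alone, so no messages are needed while the work runs.

def my_share(items, rank, nprocs):
    """Return the items that processor `rank` (0-based) of `nprocs` handles."""
    if rank not in range(nprocs):
        raise ValueError("rank must be between 0 and nprocs - 1")
    return items[rank::nprocs]


if __name__ == "__main__":
    genes = ["gene%02d" % i for i in range(10)]  # hypothetical work list
    for rank in range(4):
        print(rank, my_share(genes, rank, 4))
</verb></tscreen>
<p> A static split works best when every item takes about the same
time. If run times vary a lot, handing out items one at a time from a
master process balances the load better, at the cost of some message
passing. </p>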
<p> So far we have not found the need to use a professional queueing
system, but obviously that is highly dependent on the type of
applications you wish to run. </p>
<!-- ************************************************************* -->
<sect1> Rough benchmarks
<p> For the single most important program we run (our ab initio
protein folding simulation program), using the Pentium III 1 GHz
processor machine as a reference frame, the Athlon 1.2 GHz processor
machine is about 16% faster on average, the Xeon (Pentium 4) 1.7 GHz
machine is about 25-32% faster on average, and the Athlon 1.5 GHz
processor machine is about 80% faster on average. Yes, the Athlon 1.5
GHz is faster than the Xeon 1.7 GHz: the Xeon executes at most six
instructions per clock (IPC), whereas the Athlon executes up to nine
(you do the math!).
</p>
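<p> Taking the figures in the paragraph above at face value, the math
is: peak rate = clock rate times instructions per clock. The sketch
below works this through and compares the result with the measured
speedups. Bear in mind that the IPC values are peak figures that real
programs rarely sustain, so the agreement is only indicative. </p>
<tscreen><verb>
# Worked version of the "you do the math" comparison, using only the
# figures quoted in the text.  Peak rate = clock (GHz) times peak IPC,
# giving billions of instructions per second.

def peak_rate(clock_ghz, ipc):
    """Peak instruction rate in billions of instructions per second."""
    return clock_ghz * ipc


# Measured speedups over the Pentium III 1 GHz reference machine.
XEON_SPEEDUP = (1.25, 1.32)   # "about 25-32% faster"
ATHLON_SPEEDUP = 1.80         # "about 80% faster"

if __name__ == "__main__":
    xeon = peak_rate(1.7, 6)
    athlon = peak_rate(1.5, 9)
    print("peak: Xeon %.1f, Athlon %.1f, ratio %.2f" % (xeon, athlon, athlon / xeon))
    low = ATHLON_SPEEDUP / XEON_SPEEDUP[1]
    high = ATHLON_SPEEDUP / XEON_SPEEDUP[0]
    print("measured ratio: %.2f to %.2f" % (low, high))
</verb></tscreen>
<p> This gives a peak ratio of 1.32 for the Athlon 1.5 GHz over the
Xeon 1.7 GHz. By comparison, the measured ratio is about 1.36 to 1.44
(1.80 divided by 1.32 and by 1.25). </p>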
</sect1>
<!-- ************************************************************* -->
<sect1> Uptimes
<p> These machines are incredibly stable, both in terms of hardware and
software, once they have been debugged (usually a few machines in a new
batch have hardware problems). Reboots have generally occurred
when a circuit breaker has tripped. The first machine I installed has
been up since its birth!
<tscreen><verb>
~ ram@fp1 % uptime
4:49am up 374 days, 2:47, 1 user, load average: 2.08, 2.02, 2.01
</verb></tscreen>
</p>
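<p> On Linux, the same number can be read from <tt>/proc/uptime</tt>,
whose first field is the time since boot in seconds. The sketch below
converts it into the days and hours:minutes form that <tt>uptime</tt>
prints. It is a minimal illustration that assumes a Linux
<tt>/proc</tt> filesystem and ignores the rest of the <tt>uptime</tt>
output (users and load averages). </p>
<tscreen><verb>
# Minimal sketch: how long has this Linux machine been up?  Uses the first
# field of /proc/uptime, the seconds since boot as a decimal number.

def parse_uptime(text):
    """Return (days, hours, minutes) from the contents of /proc/uptime."""
    seconds = int(float(text.split()[0]))
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    return days, hours, rest // 60


def read_uptime(path="/proc/uptime"):
    """Read and parse /proc/uptime (Linux only)."""
    with open(path) as f:
        return parse_uptime(f.read())


if __name__ == "__main__":
    try:
        days, hours, minutes = read_uptime()
    except OSError:
        print("no /proc/uptime here (not a Linux system?)")
    else:
        print("up %d days, %d:%02d" % (days, hours, minutes))
</verb></tscreen>
<p> For example, a first field of 32323620 seconds corresponds to 374
days, 2:47, the figure shown above. </p>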
</sect1>
<!-- ************************************************************* -->
</sect>
<!-- ************************************************************* -->