<!doctype Linuxdoc system>

<article>

<title> Linux Cluster HOWTO </title>

<author>Ram Samudrala <tt>(me@ram.org)</tt> </author>

<date> v1.0, March 17, 2003 </date>

<abstract>
How to set up high-performance Linux computing clusters.
</abstract>

<!-- ************************************************************* -->

<toc>

<sect> Introduction

<p> This document describes how I set up my Linux computing clusters
for the high-performance computing I need for <htmlurl
url="http://compbio.washington.edu" name="my research">. </p>

<p> Use the information below at your own risk. I disclaim all
responsibility for anything you may do after reading this HOWTO. The
latest version of this HOWTO will always be available at <htmlurl
url="http://www.ram.org/computing/linux/linux_cluster.html"
name="http://www.ram.org/computing/linux/linux_cluster.html">. </p>

<p> Unlike other documentation that talks about setting up clusters in
a general way, this is a specific description of how our lab is set up,
and it covers not only the compute aspects but also the desktop,
laptop, and public server aspects. This was done mainly for local use,
but I put it up on the web after receiving several e-mail messages,
prompted by a newsgroup query of mine, from people requesting the same
information. Even today, as I plan another 64-node cluster, I find that
there is a dearth of information about exactly how to assemble
components into a node that works reliably under Linux, covering not
only the compute nodes but also the hardware that must work well with
them for productive research to happen. As it stands, this HOWTO is
mainly useful as a report on which hardware works well with Linux and
which doesn't. </p>

</sect>

<!-- ************************************************************* -->

<sect> Hardware

<p> This section covers the hardware choices I've made. Unless noted
in the <ref id="known_hardware_issues" name="known hardware issues">
section, assume that everything works <it>really</it> well. </p>

<p> Hardware installation is also fairly straightforward unless
otherwise noted, with most of the details covered by the manuals. For
each section, the hardware is listed in order of purchase (most
recent first). </p>

<!-- ************************************************************* -->

<sect1> Node hardware

<p> 32 machines have the following setup each:

<itemize>
<item> 2 AMD Palomino MP XP 2000+ 1.67 GHz CPUs </item>
<item> Asus A7M266-D w/LAN Dual DDR motherboard </item>
<item> 2 Kingston 512 MB PC2100 DDR-266 MHz REG ECC RAM </item>
<item> 1 41 GB Maxtor 7200 RPM ATA/100 HD </item>
<item> 1 120 GB Maxtor 5400 RPM ATA/100 HD </item>
<item> Asus CD-A520 52x CDROM </item>
<item> 1.44 MB floppy drive </item>
<item> ATI Expert 2000 Rage 128 32 MB video card </item>
<item> IN-WIN P4 300ATX Mid Tower case </item>
<item> Enermax P4-430ATX power supply </item>
</itemize>
</p>

<p> 32 machines have the following setup each:

<itemize>
<item> 2 AMD Palomino MP XP 1800+ 1.53 GHz CPUs </item>
<item> Tyan S2460 Dual Socket-A/MP motherboard </item>
<item> Kingston 512 MB PC2100 DDR-266 MHz REG ECC RAM </item>
<item> 1 20 GB Maxtor UDMA/100 7200 RPM HD </item>
<item> 1 120 GB Maxtor 5400 RPM ATA/100 HD </item>
<item> Asus CD-A520 52x CDROM </item>
<item> 1.44 MB floppy drive </item>
<item> ATI Expert 98 8 MB AGP video card </item>
<item> IN-WIN P4 300ATX Mid Tower case </item>
<item> Intel PCI PRO-100 10/100 Mbps network card </item>
<item> Enermax P4-430ATX power supply </item>
</itemize>
</p>

<p> 32 machines have the following setup each:

<itemize>
<item> 2 Pentium III 1 GHz Intel CPUs </item>
<item> Supermicro 370 DLE Dual PIII-FCPGA motherboard </item>
<item> 2 256 MB 168-pin PC133 Registered ECC Micron RAM </item>
<item> 1 20 GB Maxtor ATA/66 5400 RPM HD </item>
<item> 1 40 GB Maxtor UDMA/100 7200 RPM HD </item>
<item> Asus CD-S500 50x CDROM </item>
<item> 1.44 MB floppy drive </item>
<item> ATI Expert 98 8 MB PCI video card </item>
<item> IN-WIN P4 300ATX Mid Tower case </item>
</itemize>
</p>

</sect1>

<!-- ************************************************************* -->

<sect1> Server hardware

<p> 1 server for external use (dissemination of information) with the
following setup:

<itemize>
<item> 2 AMD Palomino MP XP 2000+ 1.67 GHz CPUs </item>
<item> Asus A7M266-D w/LAN Dual DDR motherboard </item>
<item> 4 Kingston 512 MB PC2100 DDR-266 MHz REG ECC RAM </item>
<item> Asus CD-A520 52x CDROM </item>
<item> 1 41 GB Maxtor 7200 RPM ATA/100 HD </item>
<item> 6 120 GB Maxtor 5400 RPM ATA/100 HD </item>
<item> 1.44 MB floppy drive </item>
<item> ATI Expert 2000 Rage 128 32 MB video card </item>
<item> IN-WIN P4 300ATX Mid Tower case </item>
<item> Enermax P4-430ATX power supply </item>
</itemize>
</p>

</sect1>

<!-- ************************************************************* -->

<sect1> Desktop hardware

<p> 1 desktop with the following setup:

<itemize>
<item> 2 AMD Palomino MP XP 2000+ 1.67 GHz CPUs </item>
<item> Asus A7M266-D w/LAN Dual DDR motherboard </item>
<item> 2 Kingston 512 MB PC2100 DDR-266 MHz REG ECC RAM </item>
<item> Ricoh 32x12x10 CDRW/DVD Combo EIDE </item>
<item> 1 41 GB Maxtor 7200 RPM ATA/100 HD </item>
<item> 1 120 GB Maxtor 5400 RPM ATA/100 HD </item>
<item> 1.44 MB floppy drive </item>
<item> ATI Expert 2000 Rage 128 32 MB video card </item>
<item> IN-WIN P4 300ATX Mid Tower case </item>
<item> Intel PCI PRO-100 10/100 Mbps network card </item>
<item> Enermax P4-430ATX power supply </item>
</itemize>
</p>

<p> 1 desktop with the following setup:

<itemize>
<item> 2 Intel Xeon 1.7 GHz CPUs (256 KB cache, 400 MHz FSB) </item>
<item> Supermicro P4DCE Dual Xeon motherboard </item>
<item> 4 256 MB RAMBUS 184-pin 800 MHz memory </item>
<item> 2 120 GB Maxtor ATA/100 5400 RPM HD </item>
<item> 1 60 GB Maxtor ATA/100 7200 RPM HD </item>
<item> 52x Asus CD-A520 INT IDE CDROM </item>
<item> 1.44 MB floppy drive </item>
<item> Leadtek 64 MB GeForce2 MX400 AGP video card </item>
<item> Creative SB LIVE Value PCI 5.1 sound card </item>
<item> Microsoft Natural Keyboard </item>
<item> Microsoft Intellimouse Explorer </item>
<item> Supermicro SC760 full-tower case with 400 W power supply </item>
</itemize>
</p>

<p> 2 desktops with the following setup:

<itemize>
<item> 2 AMD K7 MP 1.2 GHz/266 MHz FSB Socket A CPUs </item>
<item> Tyan S2462NG Dual Socket A motherboard </item>
<item> 4 256 MB PC2100 REG ECC DDR-266 MHz RAM </item>
<item> 3 40 GB Maxtor UDMA/100 7200 RPM HD </item>
<item> 50x Asus CD-A520 INT IDE CDROM </item>
<item> 1.44 MB floppy drive </item>
<item> Chaintech GeForce2 MX200 32 MB AGP video card </item>
<item> Creative SB LIVE Value PCI sound card </item>
<item> Microsoft Natural Keyboard </item>
<item> Microsoft Intellimouse Explorer </item>
<item> Full-tower case with 300 W power supply </item>
</itemize>
</p>

<p> 2 desktops with the following setup:

<itemize>
<item> 2 Pentium III 1 GHz Intel CPUs </item>
<item> Supermicro 370 DLE Dual PIII-FCPGA motherboard </item>
<item> 4 256 MB 168-pin PC133 Registered ECC Micron RAM </item>
<item> 3 40 GB Maxtor UDMA/100 7200 RPM HD </item>
<item> Asus CD-S500 50x CDROM </item>
<item> 1.44 MB floppy drive </item>
<item> Jaton Nvidia TNT2 32 MB PCI video card </item>
<item> Creative SB LIVE Value PCI sound card </item>
<item> Microsoft Natural Keyboard </item>
<item> Microsoft Intellimouse Explorer </item>
<item> Full-tower case with 300 W power supply </item>
</itemize>
</p>

<p> 2 desktops with the following setup:

<itemize>
<item> 2 Pentium III 1 GHz Intel CPUs </item>
<item> Supermicro 370 DLE Dual PIII-FCPGA motherboard </item>
<item> 4 256 MB 168-pin PC133 Registered ECC Micron RAM </item>
<item> 3 40 GB Maxtor UDMA/100 7200 RPM HD </item>
<item> Mitsumi 8x/4x/32x CDRW </item>
<item> 1.44 MB floppy drive </item>
<item> Jaton Nvidia TNT2 32 MB PCI video card </item>
<item> Creative SB LIVE Value PCI sound card </item>
<item> Microsoft Natural Keyboard </item>
<item> Microsoft Intellimouse Explorer </item>
<item> Full-tower case with 300 W power supply </item>
</itemize>
</p>

<p> 1 desktop with the following setup:

<itemize>
<item> 2 Pentium III 1 GHz Intel CPUs </item>
<item> Supermicro 370 DE6 Dual PIII-FCPGA motherboard </item>
<item> 4 256 MB 168-pin PC133 Registered ECC Micron RAM </item>
<item> 3 40 GB Maxtor UDMA/100 7200 RPM HD </item>
<item> Ricoh 32x12x10 CDRW/DVD Combo EIDE </item>
<item> Asus CD-A520 52x CDROM </item>
<item> 1.44 MB floppy drive </item>
<item> Asus V7700 64 MB GeForce2-GTS AGP video card </item>
<item> Creative SB Live Platinum 5.1 sound card </item>
<item> Microsoft Natural Keyboard </item>
<item> Microsoft Intellimouse Explorer </item>
<item> Full-tower case with 300 W power supply </item>
</itemize>
</p>

<p> 3 desktops with the following setup:

<itemize>
<item> 2 Pentium III 1 GHz Intel CPUs </item>
<item> Supermicro 370 DE6 Dual PIII-FCPGA motherboard </item>
<item> 4 256 MB 168-pin PC133 Registered ECC Micron RAM </item>
<item> 3 40 GB Maxtor UDMA/100 7200 RPM HD </item>
<item> Ricoh 32x12x10 CDRW/DVD Combo EIDE </item>
<item> 1.44 MB floppy drive </item>
<item> Asus V7700 64 MB GeForce2-GTS AGP video card </item>
<item> Creative SB Live Platinum 5.1 sound card </item>
<item> Microsoft Natural Keyboard </item>
<item> Microsoft Intellimouse Explorer </item>
<item> Full-tower case with 300 W power supply </item>
</itemize>
</p>

</sect1>

<!-- ************************************************************* -->

<sect1> Firewall/gateway hardware

<p> 1 firewall with the following setup:

<itemize>
<item> AMD Palomino XP 1700+ 1.47 GHz CPU </item>
<item> MSI KT3 Ultra2 KT333 MS-6380E motherboard </item>
<item> 512 MB PC2100 DDR-266 MHz DIMM RAM </item>
<item> 40 GB Seagate 7200 RPM ATA/100 HD </item>
<item> Asus CD-A520 52x INT IDE CDROM </item>
<item> 1.44 MB floppy drive </item>
<item> ATI Expert 2000 Rage 128 32 MB video card </item>
<item> 4 Intel Pro/1000T Gigabit Server ethernet cards </item>
<item> 4U Black Rackmount Steel case </item>
</itemize>
</p>

</sect1>

<!-- ************************************************************* -->

<sect1> Miscellaneous/accessory hardware

<p> Backup:

<itemize>
<item> 2 Sony 20/40 GB DDS-4 SE LVD DAT drives </item>
</itemize>
</p>

<p> Monitors:

<itemize>
<item> 1 20.1" Viewsonic VP201M LCD monitor </item>
<item> 1 22" Viewsonic P220F 0.25-0.27 mm monitor </item>
<item> 4 21" Sony CPD-G500 0.24 mm monitors </item>
<item> 2 18" Viewsonic VP181 LCD monitors </item>
<item> 1 17" Viewsonic VE170 LCD monitor </item>
<item> 2 Sun monitors </item>
</itemize>
</p>

<p> Printers:

<itemize>
<item> HP Color LaserJet 4600dn </item>
</itemize>
</p>

</sect1>

<!-- ************************************************************* -->

<sect1> Putting-it-all-together hardware

<p> We use KVM switches with a cheap monitor to connect to and "look"
at all the machines:

<itemize>
<item> 15" 0.28 mm dot pitch XLN CTL monitor </item>
<item> 3 Belkin Omniview 16-Port Pro Switches </item>
<item> Belkin Omniview 2-Port Switch </item>
</itemize>
</p>

<p> While this is a nice solution, I think it's largely unnecessary.
What we really need is a small handheld monitor, operated with a stylus
like a Palm, that can plug into the back of a PC. I don't plan to buy
any more monitor switches or KVM cables. </p>

<p> Networking is important:

<itemize>
<item> 1 Netgear FSM750S 48-port network switch with 2 gigabit ports </item>
<item> 1 Netgear FS517TS 16-port network switch with 1 gigabit port </item>
<item> 1 Netgear FS750NA 48-port network switch </item>
<item> 1 Netgear FS524 24-port network switch </item>
<item> 1 Cisco Catalyst 3448 XL Enterprise Edition 48-port network switch </item>
<item> 1 Netgear ME102NA Wireless Access Point </item>
<item> 1 Netgear MA401NA Wireless PCMCIA network card </item>
</itemize>
</p>

</sect1>

<!-- ************************************************************* -->

<sect1> Costs

<p> Our vendor is Hard Drives Northwest (<htmlurl
url="http://www.hdnw.com" name="http://www.hdnw.com">). For each
compute node in our cluster (containing two processors), we paid about
$1500-$2000, including taxes. Generally, our goal is to keep each node
below $2000 (which is what our desktop machines cost). </p>

</sect1>

<!-- ************************************************************* -->

</sect>

<!-- ************************************************************* -->

<sect> Software

<!-- ************************************************************* -->

<sect1> Operating system: Linux, of course!

<p> We use the following kernels and distributions:

<itemize>
<item> Kernel 2.2.16-22, distribution KRUD 7.0 </item>
<item> Kernel 2.4.9-7, distribution KRUD 7.2 </item>
<item> Kernel 2.4.18-10, distribution KRUD 7.3 </item>
</itemize>

These distributions work very well for us since updates are sent to us
on CD and there's no reliance on an external network connection for
updates. They also seem "cleaner" than the regular Red Hat
distributions, and the setup is extremely stable. </p>

</sect1>

<!-- ************************************************************* -->

<sect1> Networking software

<p> We use Shorewall 1.3.14a (<htmlurl url="http://www.shorewall.net"
name="http://www.shorewall.net">) for the firewall. </p>

</sect1>

<!-- ************************************************************* -->

<sect1> Parallel processing software

<p> We use our own software for parallelising applications but have
experimented with PVM and MPI. In my view, the overhead of these
pre-packaged programs is too high. I recommend writing
application-specific code for the tasks you perform (but that's one
person's view). </p>

</sect1>

<!-- ************************************************************* -->

<sect1> Costs

<p> Linux and most software that runs on Linux are freely
copiable. </p>

</sect1>

<!-- ************************************************************* -->

</sect>

<!-- ************************************************************* -->

<sect> Set up, configuration, and maintenance

<!-- ************************************************************* -->

<sect1> Disk configuration

<p> This section describes disk partitioning strategies. </p>

<p>
<tscreen><verb>
farm/cluster machines:
hda1 - swap (2 * RAM)
hda2 - / (remaining disk space)
hdb1 - /maxa (total disk)

desktops (without Windows):
hda1 - swap (2 * RAM)
hda2 - / (4 GB)
hda3 - /spare (remaining disk space)
hdb1 - /maxa (total disk)
hdd1 - /maxb (total disk)

desktops (with Windows):
hda1 - /win (total disk)
hdb1 - swap (2 * RAM)
hdb2 - / (4 GB)
hdb3 - /spare (remaining disk space)
hdd1 - /maxa (total disk)

laptops (single disk):
hda1 - /win (half the total disk size)
hda2 - swap (2 * RAM)
hda3 - / (remaining disk space)
</verb></tscreen>
</p>
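
<p> The "2 * RAM" swap rule above is easy to compute on each machine
from <tt>/proc/meminfo</tt>. The following POSIX shell sketch is not
part of the original setup: the function names are hypothetical, it
assumes the Linux <tt>MemTotal:</tt> line (which reports RAM in kB),
and, when printing <tt>/etc/fstab</tt> lines for the farm/cluster
layout, it assumes ext2 file systems unless <tt>FSTYPE</tt> is set:

<tscreen><verb>
# swap_size_mb KB: apply the "2 * RAM" rule to a RAM size given in kB
# (the unit /proc/meminfo uses) and print the swap size in MB, rounded up.
swap_size_mb() {
    echo $(( (2 * $1 + 1023) / 1024 ))
}

# ram_kb_from_meminfo [FILE]: print MemTotal (kB) from /proc/meminfo or FILE.
ram_kb_from_meminfo() {
    awk '$1 == "MemTotal:" { print $2 }' "${1:-/proc/meminfo}"
}

# print_farm_fstab: /etc/fstab lines for the farm/cluster partition layout.
# Assumption: ext2 file systems unless FSTYPE says otherwise.
print_farm_fstab() {
    fs=${FSTYPE:-ext2}
    echo "/dev/hda2  /      $fs   defaults  1 1"
    echo "/dev/hda1  swap   swap  defaults  0 0"
    echo "/dev/hdb1  /maxa  $fs   defaults  1 2"
    echo "none       /proc  proc  defaults  0 0"
}
</verb></tscreen>
</p>

<p> For example, <tt>swap_size_mb 1048576</tt> (1 GB of RAM) prints
2048. <tt>MemTotal</tt> is usually a little below the installed RAM,
because the kernel reserves some memory for itself. </p>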

</sect1>

<!-- ************************************************************* -->

<sect1> Package configuration

<p> Install a minimal set of packages for the farm. Users are allowed
to configure desktops as they wish. </p>

</sect1>

<!-- ************************************************************* -->

<sect1> Operating system installation and maintenance

<!-- ************************************************************* -->

<sect2> Cloning and maintenance packages

<sect3> FAI

<p> FAI (<htmlurl url="http://www.informatik.uni-koeln.de/fai/"
name="http://www.informatik.uni-koeln.de/fai/">) is an automated
system to install a Debian GNU/Linux operating system on a PC
cluster. You can take one or more virgin PCs, turn on the power, and
after a few minutes Linux is installed, configured, and running on the
whole cluster, without any interaction necessary. </p>

</sect3>

<sect3> SystemImager

<p> SystemImager (<htmlurl url="http://systemimager.org"
name="http://systemimager.org">) is software that automates Linux
installs, software distribution, and production deployment. </p>

</sect3>

</sect2>

<!-- ************************************************************* -->

<sect2> Personal cloning strategy

<p> I believe in having a completely distributed system. This means
each machine contains a copy of the operating system. Installing the
OS on each machine manually is cumbersome. To optimise this process, I
first set up and install one machine exactly the way I want it. I then
create a gzipped tar file of the entire system and place it on a
CD-ROM, which I then use to clone each machine in my cluster. </p>

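<p> The command I use to create the tar file is shown below. The
archive is written to <tt>/maxa</tt>, which is a separate disk in the
layouts above, and <tt>--one-file-system</tt> keeps tar on the root file
system, so the archive does not include itself, the other disks, or
<tt>/proc</tt>. (Older GNU tar releases accepted <tt>-l</tt> for this
option; newer ones use <tt>-l</tt> for <tt>--check-links</tt> instead,
so the long form is the safer spelling.)

<tscreen><verb>
tar -czvps --one-file-system --same-owner --atime-preserve -f /maxa/slash.tgz /
</verb></tscreen>
</p>
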
<p> I use a script called <tt>go</tt> that takes a hostname and an
IP address as its arguments, untars the <tt>slash.tgz</tt> file on
the CD-ROM, and replaces the hostname and IP address in the appropriate
locations. A version of the <tt>go</tt> script and the input files for
it can be accessed at <htmlurl
url="http://www.ram.org/computing/linux/cluster/"
name="http://www.ram.org/computing/linux/cluster/">. This script
will have to be edited based on your cluster design. </p>
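
<p> The author's own <tt>go</tt> script is available at the URL above.
To show the idea in code, here is a minimal, hypothetical sketch of its
two steps: unpack the image, then rewrite the node's identity. It is
not the author's script. It assumes Red Hat-style configuration files
(KRUD is based on Red Hat), in which the hostname is set by
<tt>HOSTNAME=</tt> in <tt>/etc/sysconfig/network</tt> and the address
by <tt>IPADDR=</tt> in <tt>/etc/sysconfig/network-scripts/ifcfg-eth0</tt>;
the function names, the <tt>/mnt/cdrom</tt> and <tt>/mnt/target</tt>
paths, and the use of <tt>/dev/hda2</tt> as the root partition (from
the farm/cluster layout above) are also assumptions.

<tscreen><verb>
#!/bin/sh
# Hypothetical sketch of a "go"-style clone script (not the author's).

# set_var FILE KEY VALUE: replace the KEY=... line in FILE, or append one.
set_var() {
    file=$1 key=$2 value=$3
    if grep -q "^$key=" "$file" 2>/dev/null; then
        sed "s|^$key=.*|$key=$value|" "$file" > "$file.new"
    else
        { cat "$file" 2>/dev/null; echo "$key=$value"; } > "$file.new"
    fi
    mv "$file.new" "$file"
}

# set_identity ROOT HOSTNAME IP: give the system unpacked under ROOT
# a new hostname and IP address.
set_identity() {
    mkdir -p "$1/etc/sysconfig/network-scripts"
    set_var "$1/etc/sysconfig/network" HOSTNAME "$2"
    set_var "$1/etc/sysconfig/network-scripts/ifcfg-eth0" IPADDR "$3"
}

# clone_node HOSTNAME IP: unpack slash.tgz onto the root partition and
# set the node's identity. Assumes hda2 is already partitioned and formatted.
clone_node() {
    image=${IMAGE:-/mnt/cdrom/slash.tgz}    # assumed location of the image
    target=${TARGET:-/mnt/target}           # assumed mount point for new root
    mkdir -p "$target"
    mount /dev/hda2 "$target" || return 1
    tar -xzpf "$image" -C "$target"
    set_identity "$target" "$1" "$2"
    umount "$target"
}
</verb></tscreen>
</p>

<p> Usage would be, for example, <tt>clone_node node07 10.0.0.7</tt>
(hypothetical name and address). Partitioning, formatting, and
reinstalling the boot loader are left out, and other files, such as
<tt>/etc/hosts</tt>, may also mention the old hostname. </p>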

<p> To make this work, I also use Tom's Root Boot package (<htmlurl
url="http://www.toms.net/rb/" name="http://www.toms.net/rb/">) to boot
the machine and clone the system. The <tt>go</tt> script can be
placed on a CD-ROM or on the floppy containing Tom's Root Boot package
(you need to delete a few programs from this package since the floppy
disk is stretched to capacity). </p>

<p> More conveniently, you could burn a bootable CD-ROM containing
Tom's Root Boot package, including the <tt>go</tt> script, and the tgz
file containing the system you wish to clone. You can also edit Tom's
Root Boot's init scripts so that it directly executes the <tt>go</tt>
script (you will still have to set IP addresses if you don't use
DHCP). </p>

<p> Alternatively, you can create your own custom disk (like a rescue
disk) that contains the kernel and the tools you want. There are
several documents that describe how to do this, including the Linux
Bootdisk HOWTO (<htmlurl
url="http://www.linuxdoc.org/HOWTO/Bootdisk-HOWTO/"
name="http://www.linuxdoc.org/HOWTO/Bootdisk-HOWTO/">), which also
contains links to other pre-made boot/root disks. </p>

<p> Thus you can develop a system where all you have to do is insert a
CD-ROM, turn on the machine, have a cup of coffee (or a can of Coke),
and come back to see a full clone. You then repeat this process for as
many machines as you have. This procedure has worked extremely well
for me, and if you have someone else actually doing the work (of
inserting and removing CD-ROMs) then it's ideal. </p>

<p> Rob Fantini (<htmlurl url="mailto:rob@fantinibakery.com"
name="rob@fantinibakery.com">) has contributed modified versions of the
scripts above, which he used for cloning a Mandrake 8.2 system; they
are accessible at <htmlurl
url="http://www.ram.org/computing/linux/cluster/fantini_contribution.tgz"
name="http://www.ram.org/computing/linux/cluster/fantini_contribution.tgz">.
</p>

</sect2>

<!-- ************************************************************* -->

<sect2> DHCP vs. hard-coded IP addresses

<p> If you have DHCP set up, then you don't need to reset the IP
address, and that part can be removed from the <tt>go</tt>
script. </p>
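
<p> With DHCP, the address step shrinks to telling the interface to
request an address at boot. Assuming Red Hat-style network scripts,
where <tt>BOOTPROTO=dhcp</tt> in <tt>ifcfg-eth0</tt> does this, a
hypothetical replacement for the address-rewriting step could look
like the following sketch:

<tscreen><verb>
# use_dhcp ROOT: switch eth0 of the system unpacked under ROOT to DHCP.
# Hypothetical helper; assumes Red Hat-style ifcfg files.
use_dhcp() {
    cfg=$1/etc/sysconfig/network-scripts/ifcfg-eth0
    # Drop the fixed-address settings and the old BOOTPROTO line...
    grep -v -e '^IPADDR=' -e '^NETMASK=' -e '^BOOTPROTO=' "$cfg" > "$cfg.new"
    # ...then ask for an address via DHCP instead.
    echo "BOOTPROTO=dhcp" >> "$cfg.new"
    mv "$cfg.new" "$cfg"
}
</verb></tscreen>
</p>

<p> Depending on how the master image was configured, other static
settings (such as <tt>GATEWAY</tt>) may also need to be removed. </p>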

<p> DHCP has the advantage that you don't have to muck around with IP
addresses at all, provided the DHCP server is configured
appropriately. It has the disadvantage that it relies on a centralised
server (and, as I said, I tend to distribute things as much as
possible). Also, tying hardware ethernet addresses to IP addresses can
make it inconvenient if you wish to replace machines or change
hostnames routinely. </p>
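
<p> The binding between hardware ethernet addresses and IP addresses
lives in the DHCP server's configuration. Assuming the ISC DHCP server,
whose <tt>dhcpd.conf</tt> describes each machine with a <tt>host</tt>
declaration containing <tt>hardware ethernet</tt> and
<tt>fixed-address</tt> statements, one way to soften the inconvenience
is to keep the table as a plain list (one "hostname MAC IP" line per
machine) and generate the declarations from it, so that replacing a
machine means editing one line. A hypothetical generator:

<tscreen><verb>
# make_dhcp_hosts: read "hostname MAC IP" lines on stdin and print
# ISC dhcpd.conf host declarations. Blank lines and # comments are skipped.
make_dhcp_hosts() {
    while read -r name mac ip; do
        case $name in ''|'#'*) continue ;; esac
        echo "host $name {"
        echo "    hardware ethernet $mac;"
        echo "    fixed-address $ip;"
        echo "}"
    done
}
</verb></tscreen>
</p>

<p> The output can be pasted into <tt>dhcpd.conf</tt>, or written to a
separate file that <tt>dhcpd.conf</tt> pulls in with an
<tt>include</tt> statement. </p>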

</sect2>

<!-- ************************************************************* -->

</sect1>

<!-- ************************************************************* -->

<sect1> Known hardware issues <label id="known_hardware_issues">

<p> The hardware in general has worked really well for us. Specific
issues are listed below: </p>

<p> The AMD dual 1.2 GHz machines run really hot. Two of them in a
room increase the temperature significantly. Thus, while they might be
okay as desktops, their cooling and power consumption are a
consideration when using them as part of a large cluster. The AMD
Palomino configuration described previously seems to work really well,
but I definitely recommend getting two fans in the case; this solved
all our instability problems. </p>

</sect1>

<!-- ************************************************************* -->

<sect1> Known software issues <label id="known_software_issues">

<p> Some tar executables apparently don't create tar files the way
they're supposed to, especially in how they reference and dereference
symbolic links. The solution I've found is to use a tar executable
that does, like the one from Red Hat 7.0. </p>
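
<p> One way to check whether a particular tar executable behaves here
is to round-trip a tiny tree that contains a symbolic link: the link
should come back as a link with the same target, not as a copy of the
file it points to. A minimal sketch (the function name is hypothetical;
it relies on <tt>mktemp -d</tt> and <tt>readlink</tt>, which current
Linux systems provide but very old ones may lack):

<tscreen><verb>
# tar_keeps_symlinks [TAR]: return 0 if TAR (default: tar) archives and
# restores a symbolic link as a link with its original target.
tar_keeps_symlinks() {
    tar_cmd=${1:-tar}
    src=$(mktemp -d)
    dst=$(mktemp -d)
    echo data > "$src/file"
    ln -s file "$src/link"
    (cd "$src" ; "$tar_cmd" -czf "$dst/test.tgz" file link)
    (cd "$dst" ; "$tar_cmd" -xzf test.tgz)
    if [ -L "$dst/link" ] ; then
        [ "$(readlink "$dst/link")" = file ]
        ok=$?
    else
        ok=1
    fi
    rm -rf "$src" "$dst"
    return $ok
}
</verb></tscreen>
</p>

<p> For example, <tt>tar_keeps_symlinks /bin/tar; echo $?</tt> prints
0 for a tar executable that passes. </p>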

</sect1>

<!-- ************************************************************* -->

</sect>

<!-- ************************************************************* -->

<sect> Performing tasks on the cluster

<!-- ************************************************************* -->

<p> This section is still being developed as the usage on my cluster
evolves, but so far we tend to write our own sets of message passing
routines to communicate between processes on different machines. </p>

<p> Many applications, particularly in the computational genomics
areas, are massively and trivially parallelisable, meaning that
perfect distribution can be achieved by spreading tasks equally across
the machines (for example, when analysing a whole genome using a
technique that operates on a single gene/protein, each processor can
work on one gene/protein at a time independently of all the other
processors). </p>
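
<p> For trivially parallel work like this, even a few lines of shell
can spread independent tasks across the nodes. The sketch below is an
illustration, not the lab's own software: it reads task names (for
example, gene or protein identifiers) one per line on standard input
and deals them round-robin into one task list per host; the function
name and the <tt>HOST.tasks</tt> file naming are hypothetical.

<tscreen><verb>
# assign_tasks DIR HOST...: read task names (one per line) on stdin and
# deal them round-robin into DIR/HOST.tasks, one file per host.
# Host names must not contain spaces; at least one host is required.
assign_tasks() {
    dir=$1
    shift
    mkdir -p "$dir"
    for h in "$@"; do : > "$dir/$h.tasks"; done    # start with empty lists
    awk -v dir="$dir" -v hosts="$*" '
        BEGIN { n = split(hosts, h, " ") }
        NF    { print $0 >> (dir "/" h[(i++ % n) + 1] ".tasks") }
    '
}
</verb></tscreen>
</p>

<p> Each node can then be started on its own list (for example, over
<tt>rsh</tt> or <tt>ssh</tt>) and needs no communication with the
others until its results are collected. </p>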

<p> So far we have not found the need to use a professional queueing
system, but obviously that is highly dependent on the type of
applications you wish to run. </p>

<!-- ************************************************************* -->

<sect1> Rough benchmarks

<p> For the single most important program we run (our <it>ab
initio</it> protein folding simulation program), using the Pentium III
1 GHz machine as a frame of reference, the Athlon 1.2 GHz machine is
about 16% faster on average, the Xeon 1.7 GHz machine is about 25-32%
faster, the Athlon 1.5 GHz machine is about 38% faster, and the Athlon
1.7 GHz machine is about 46% faster. Yes, the Athlon 1.5 GHz is faster
than the Xeon 1.7 GHz: the Xeon can execute at most six instructions
per clock (IPC), whereas the Athlon can execute nine (you do the
math!). </p>
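
<p> The "math" is peak instruction throughput: clock rate times the
maximum number of instructions per clock. The sketch below simply does
that multiplication with <tt>awk</tt> for the two processors compared
above (the function name is hypothetical):

<tscreen><verb>
# peak_rate GHZ IPC: peak throughput in billions of instructions per second.
peak_rate() {
    awk -v ghz="$1" -v ipc="$2" 'BEGIN { print ghz * ipc }'
}

echo "Athlon 1.5 GHz: $(peak_rate 1.5 9)"    # prints Athlon 1.5 GHz: 13.5
echo "Xeon 1.7 GHz:   $(peak_rate 1.7 6)"    # prints Xeon 1.7 GHz:   10.2
</verb></tscreen>
</p>

<p> This is an upper bound rather than a prediction. The peak ratio is
about 1.32 (13.5/10.2), while the benchmark figures above put the
Athlon 1.5 GHz only about 5-10% ahead of the Xeon 1.7 GHz (1.38 versus
1.25-1.32 relative to the Pentium III), which suggests that peak IPC
explains the ordering better than the size of the gap. </p>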

</sect1>

<!-- ************************************************************* -->

<sect1> Uptimes

<p> These machines are incredibly stable, both in terms of hardware and
software, once they have been debugged (usually some machines in a new
batch have hardware problems), running constantly under very heavy
loads. One example is given below. Reboots have generally occurred
when a circuit breaker tripped.

<tscreen><verb>
2:29pm up 495 days, 1:04, 2 users, load average: 4.85, 7.15, 7.72
</verb></tscreen>
</p>
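
<p> To gather figures like this from every node, the number of days
can be pulled out of <tt>uptime</tt> output. A minimal sketch,
assuming the traditional <tt>uptime</tt> output format shown above (the
function name and the host names in the example are hypothetical):

<tscreen><verb>
# uptime_days: read a line of `uptime` output on stdin and print the
# whole days of uptime (0 if the machine has been up for under a day).
uptime_days() {
    awk '{
        days = 0
        for (i = 1; i < NF; i++)
            if ($i == "up")
                if ($(i + 2) ~ /^day/)
                    days = $(i + 1)
        print days
    }'
}

# Example over several nodes (hypothetical host names):
#   for h in node01 node02; do echo "$h $(rsh "$h" uptime | uptime_days)"; done
</verb></tscreen>
</p>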

</sect1>

<!-- ************************************************************* -->

</sect>

<!-- ************************************************************* -->

<sect> Acknowledgements

<p> The following people have been helpful in getting this HOWTO
done:

<itemize>
<item> Michael Levitt (<htmlurl url="mailto:michael.levitt@stanford.edu" name="michael.levitt@stanford.edu">) </item>
</itemize>
</p>

</sect>

<!-- ************************************************************* -->

<sect> Bibliography <label id="references">

<p> The following documents may prove useful to you. They are links
to sources that make use of high-performance computing clusters:

<itemize>
<item> <url url="http://www.ram.org/computing/rambin/rambin.html"
name="RAMBIN web page"> </item>
<item> <url url="http://www.ram.org/computing/ramp/ramp.html"
name="RAMP web page"> </item>
<item> <url url="http://www.ram.org/research/research.html" name="Ram
Samudrala's research page (which describes the kind of research done with these clusters)"> </item>
</itemize>
</p>

</sect>

<!-- ************************************************************* -->

</article>