<!doctype Linuxdoc system>
|
|
|
|
|
|
|
|
<article>
<title> Linux Cluster HOWTO </title>
<author>Ram Samudrala <tt>(me@ram.org)</tt> </author>
<date> v0.92, April 8, 2002 </date>
<abstract>
|
|
|
|
How to set up high-performance Linux computing clusters.
|
|
|
|
</abstract>
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
<toc>
|
|
|
|
|
|
|
|
<sect> Introduction
|
|
|
|
|
|
|
|
<p> This document describes how I set up my Linux computing clusters
|
|
|
|
for high-performance computing which I need for <htmlurl
|
|
|
|
url="http://compbio.washington.edu" name="my research">. </p>
|
|
|
|
|
|
|
|
<p> Use the information below at your own risk. I disclaim all
|
|
|
|
responsibility for anything you may do after reading this HOWTO. The
|
|
|
|
latest version of this HOWTO will always be available at <htmlurl
|
|
|
|
url="http://www.ram.org/computing/linux/linux_cluster.html"
|
|
|
|
name="http://www.ram.org/computing/linux/linux_cluster.html">. </p>
|
|
|
|
|
|
|
|
<p> Unlike other documentation that talks about setting up clusters in
a general way, this is a specific description of how our lab is set up.
It covers not only the compute aspects, but also the
desktop, laptop, and public server aspects. This is done mainly for
local use, but I put it up on the web since I received several e-mail
messages based on my newsgroup query requesting the same information.
Even today, as I plan another 64-node cluster, I find there is a
dearth of information about exactly how to assemble components to form
a node that works reliably under Linux. The main use of this HOWTO as
it stands is that it's a report on what kind of hardware works well
with Linux and what kind of hardware doesn't. </p>
</sect>
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
<sect> Hardware
<p> This section covers the hardware choices I've made. Unless noted
|
|
|
|
in the <ref id="known_hardware_issues" name="known hardware issues">
|
|
|
|
section, assume that everything works <it>really</it> well. </p>
<p> Hardware installation is also fairly straightforward unless
otherwise noted, with most of the details covered by the manuals. In
each section, the hardware is listed in order of purchase (most
recent first). </p>
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
<sect1> Node hardware
|
|
|
|
|
|
|
|
<p> 32 machines have the following setup each:
|
|
|
|
|
|
|
|
<itemize>
<item> 2 AMD Palomino MP XP 1800+ 1.53 GHz CPUs </item>
<item> Tyan S2460 Dual Socket-A/MP motherboard </item>
<item> Kingston 512 MB PC2100 DDR-266 MHz REG ECC RAM </item>
<item> 1 20 GB Maxtor UDMA/100 7200 RPM HD </item>
<item> 1 120 GB Maxtor ATA/100 5400 RPM HD </item>
<item> Asus CD-A520 52x CDROM </item>
<item> 1.44 MB floppy drive </item>
<item> ATI Expert 98 8 MB AGP video card </item>
<item> IN-WIN P4 300ATX Mid Tower case </item>
<item> Intel PCI PRO-100 10/100 Mbps network card </item>
|
|
|
|
</itemize>
|
|
|
|
</p>
|
|
|
|
|
|
|
|
<p> 32 machines have the following setup each:
|
|
|
|
|
|
|
|
<itemize>
|
|
|
|
<item> 2 Pentium III 1 GHz Intel CPUs </item>
|
|
|
|
<item> Supermicro 370 DLE Dual PIII-FCPGA motherboard </item>
|
|
|
|
<item> 2 256 MB 168-pin PC133 Registered ECC Micron RAM </item>
|
|
|
|
<item> 1 20 GB Maxtor ATA/66 5400 RPM HD </item>
|
|
|
|
<item> 1 40 GB Maxtor UDMA/100 7200 RPM HD </item>
|
|
|
|
<item> Asus CD-S500 50x CDROM </item>
|
|
|
|
<item> 1.4 MB floppy drive </item>
|
|
|
|
<item> ATI Expert 98 8 MB PCI video card </item>
|
|
|
|
<item> IN-WIN P4 300ATX Mid Tower case </item>
</itemize>
|
|
|
|
</p>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
<sect1> Server hardware
<p> 1 server for external use (dissemination of information) with the
|
|
|
|
following setup:
<itemize>
<item> 2 Pentium III 1 GHz Intel CPUs </item>
|
|
|
|
<item> Supermicro 370 DLE Dual PIII-FCPGA motherboard </item>
|
|
|
|
<item> 2 256 MB 168-pin PC133 Registered ECC Micron RAM </item>
|
|
|
|
<item> 1 20 GB Maxtor ATA/66 5400 RPM HD </item>
|
|
|
|
<item> 2 40 GB Maxtor UDMA/100 7200 RPM HD </item>
|
|
|
|
<item> Asus CD-S500 50x CDROM </item>
|
|
|
|
<item> 1.4 MB floppy drive </item>
|
|
|
|
<item> ATI Expert 98 8 MB PCI video card </item>
|
|
|
|
<item> Full-tower case with 300W PS </item>
</itemize>
</p>
</sect1>
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
<sect1> Desktop hardware
<p> 1 desktop with the following setup:
<itemize>
<item> 2 Intel Xeon 1.7 GHz CPUs (256 KB cache, 400 MHz FSB) </item>
<item> Supermicro P4DCE Dual Xeon motherboard </item>
<item> 4 256 MB RAMBUS 184-pin 800 MHz memory </item>
<item> 2 120 GB Maxtor ATA/100 5400 RPM HD </item>
<item> 1 60 GB Maxtor ATA/100 7200 RPM HD </item>
<item> 52X Asus CD-A520 INT IDE CDROM </item>
<item> 1.4 MB floppy drive </item>
<item> Leadtek 64 MB GF2 MX400 AGP </item>
<item> Creative SB LIVE Value PCI 5.1 </item>
<item> Microsoft Natural Keyboard </item>
<item> Microsoft Intellimouse Explorer </item>
<item> Supermicro SC760 full-tower case with 400W PS </item>
|
|
|
|
</itemize>
</p>

<p> 2 desktops with the following setup:
|
|
|
|
|
|
|
|
<itemize>
|
|
|
|
<item> 2 AMD K7 1.2 GHz/266 MP Socket A CPUs </item>
<item> Tyan S2462NG Dual Socket A motherboard </item>
<item> 4 256 MB PC2100 REG ECC DDR-266 MHz </item>
<item> 3 40 GB Maxtor UDMA/100 7200 RPM HD </item>
<item> 50X Asus CD-A520 INT IDE CDROM </item>
<item> 1.4 MB floppy drive </item>
<item> Chaintech GeForce2 MX200 32 MB AGP </item>
<item> Creative SB LIVE Value PCI </item>
<item> Microsoft Natural Keyboard </item>
<item> Microsoft Intellimouse Explorer </item>
<item> Full-tower case with 300W PS </item>
|
|
|
|
</itemize>
</p>
|
|
|
|
|
|
|
|
<p> 2 desktops with the following setup:
|
|
|
|
|
|
|
|
<itemize>
<item> 2 Pentium III 1 GHz Intel CPUs </item>
|
|
|
|
<item> Supermicro 370 DLE Dual PIII-FCPGA motherboard </item>
|
|
|
|
<item> 4 256 MB 168-pin PC133 Registered ECC Micron RAM </item>
|
|
|
|
<item> 3 40 GB Maxtor UDMA/100 7200 RPM HD </item>
|
|
|
|
<item> Asus CD-S500 50x CDROM </item>
|
|
|
|
<item> 1.4 MB floppy drive </item>
|
|
|
|
<item> Jaton Nvidia TNT2 32 MB PCI </item>
|
|
|
|
<item> Creative SB LIVE Value PCI </item>
|
|
|
|
<item> Microsoft Natural Keyboard </item>
|
|
|
|
<item> Microsoft Intellimouse Explorer </item>
|
|
|
|
<item> Full-tower case with 300W PS </item>
</itemize>
|
|
|
|
</p>
|
|
|
|
|
|
|
|
<p> 2 desktops with the following setup:
|
|
|
|
|
|
|
|
<itemize>
<item> 2 Pentium III 1 GHz Intel CPUs </item>
|
|
|
|
<item> Supermicro 370 DLE Dual PIII-FCPGA motherboard </item>
|
|
|
|
<item> 4 256 MB 168-pin PC133 Registered ECC Micron RAM </item>
|
|
|
|
<item> 3 40 GB Maxtor UDMA/100 7200 RPM HD </item>
|
|
|
|
<item> Mitsumi 8x/4x/32x CDRW </item>
|
|
|
|
<item> 1.4 MB floppy drive </item>
|
|
|
|
<item> Jaton Nvidia TNT2 32 MB PCI </item>
|
|
|
|
<item> Creative SB LIVE Value PCI </item>
|
|
|
|
<item> Microsoft Natural Keyboard </item>
|
|
|
|
<item> Microsoft Intellimouse Explorer </item>
|
|
|
|
<item> Full-tower case with 300W PS </item>
</itemize>
|
|
|
|
</p>
<p> 4 desktops with the following setup:
|
|
|
|
|
|
|
|
<itemize>
|
|
|
|
<item> 2 Pentium III 1 GHz Intel CPUs </item>
|
|
|
|
<item> Supermicro 370 DE6 Dual PIII-FCPGA motherboard </item>
|
|
|
|
<item> 4 256 MB 168-pin PC133 Registered ECC Micron RAM </item>
|
|
|
|
<item> 3 40 GB Maxtor UDMA/100 7200 RPM HD </item>
|
|
|
|
<item> Ricoh 32x12x10 CDRW/DVD Combo EIDE </item>
|
|
|
|
<item> 1.4 MB floppy drive </item>
|
|
|
|
<item> Asus V7700 64 MB GeForce2-GTS AGP video card </item>
|
|
|
|
<item> Creative SB Live Platinum 5.1 sound card </item>
|
|
|
|
<item> Microsoft Natural Keyboard </item>
|
|
|
|
<item> Microsoft Intellimouse Explorer </item>
|
|
|
|
<item> Full-tower case with 300W PS </item>
|
|
|
|
</itemize>
|
|
|
|
|
|
|
|
</p>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
<sect1> Miscellaneous/accessory hardware
<p> Backup:
|
|
|
|
|
|
|
|
<itemize>
<item> 2 Sony 20/40 GB DSS4 SE LVD DAT </item>
</itemize>
|
|
|
|
</p>
|
|
|
|
|
|
|
|
<p> Monitors:
|
|
|
|
|
|
|
|
<itemize>
<item> 1 22" Viewsonic P220F 0.25-0.27 mm monitor </item>
<item> 4 21" Sony CPD-G500 0.24 mm monitor </item>
|
|
|
|
<item> 2 18" Viewsonic VP181 LCD monitor </item>
|
|
|
|
<item> 1 17" Viewsonic VE170 LCD monitor </item>
</itemize>
|
|
|
|
</p>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
<sect1> Putting-it-all-together hardware
|
|
|
|
|
|
|
|
<p> We use KVM switches with a cheap monitor to connect up and "look"
|
|
|
|
at all the machines:
|
|
|
|
|
|
|
|
<itemize>
<item> 15" 0.28 mm dot pitch XLN CTL monitor </item>
|
|
|
|
<item> 3 Belkin Omniview 16-Port Pro Switches </item>
|
|
|
|
<item> 40 KVM cables </item>
</itemize>
|
|
|
|
</p>
<p> While this is a nice solution, I think it's kind of needless. What
we need is a small handheld monitor that can plug into the back of
the PC (operated with a stylus, like the Palm). I don't plan to use
any more monitor switches or KVM cables. </p>
|
|
|
|
|
|
|
|
<p> Networking is important:
<itemize>
<item> 1 Cisco Catalyst 3448 XL Enterprise Edition 48-port network switch </item>
<item> 1 Netgear FS524 24-port network switch </item>
</itemize>
|
|
|
|
</p>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
<sect1> Costs
|
|
|
|
|
|
|
|
<p> Our vendor is Hard Drives Northwest (<htmlurl
url="http://www.hdnw.com" name="http://www.hdnw.com">). For each
compute node in our cluster (containing two processors), we paid about
$1500-$2000, including taxes. Generally, our goal is to keep the cost of
each node below $2000 (which is what our desktop machines cost). </p>
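<p> The per-node figures above translate into a total for the compute
nodes as follows. This is only a rough sketch: it assumes all 64
compute nodes listed in the node hardware section fall within the
quoted price range, and the function name <tt>cluster_cost</tt> is
made up for illustration.

<tscreen><verb>
```python
def cluster_cost(nodes, low_per_node, high_per_node):
    """Return the (lowest, highest) total hardware cost in dollars."""
    return nodes * low_per_node, nodes * high_per_node


# Two batches of 32 nodes, each node costing roughly $1500 to $2000
low, high = cluster_cost(nodes=64, low_per_node=1500, high_per_node=2000)
print(f"${low:,} to ${high:,}")
```
</verb></tscreen>
</p>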
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
</sect>
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
<sect> Software
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
<sect1> Linux, of course!
<p> We use Linux systems with a 2.4.9-7 kernel based on the KRUD 7.2
distribution, and with a 2.2.17-14 kernel based on the KRUD 7.0
|
|
|
|
distribution. These distributions work very well for us since updates
|
|
|
|
are sent to us on CD and there's no reliance on an external network
|
|
|
|
connection for updates. They also seem "cleaner" than the regular Red
|
|
|
|
Hat distributions. </p>
|
|
|
|
|
|
|
|
<p> We use our own software for parallelising applications
|
|
|
|
but have experimented with PVM and MPI. In my view, the overhead for
|
|
|
|
these pre-packaged programs is too high. I recommend writing
|
|
|
|
application-specific code for the tasks you perform (that's one
|
|
|
|
person's view). </p>
</sect1>
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
<sect1> Costs
|
|
|
|
|
|
|
|
<p> Linux is freely copiable. </p>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
</sect>
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
<sect> Setup and configuration
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
<sect1> Disk configuration
|
|
|
|
|
|
|
|
<p> This section describes disk partitioning strategies. </p>
|
|
|
|
|
|
|
|
<p>
|
|
|
|
<tscreen><verb>
farm/cluster machines:

hda1 - swap (2 * RAM)
hda2 - / (remaining disk space)
hdb1 - /maxa (total disk)

desktops (without Windows):

hda1 - swap (2 * RAM)
hda2 - / (4 GB)
hda3 - /home (remaining disk space)
hdb1 - /maxa (total disk)
hdd1 - /maxb (total disk)

desktops (with Windows):

hda1 - /win (total disk)
hdb1 - swap (2 * RAM)
hdb2 - / (4 GB)
hdb3 - /home (remaining disk space)
hdd1 - /maxa (total disk)

laptops (single disk):

hda1 - /win (half the total disk size)
hda2 - swap (2 * RAM)
hda3 - / (4 GB)
hda4 - /home (remaining disk space)
</verb></tscreen>
|
|
|
|
</p>
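<p> The farm/cluster layout can be worked out mechanically from the
amount of RAM and the disk sizes. The sketch below is illustrative
only: the function name <tt>farm_layout</tt> is made up, sizes are in
MB with 1 GB taken as 1000 MB, and the example numbers are those of
the Athlon nodes listed in the hardware section.

<tscreen><verb>
```python
def farm_layout(ram_mb, hda_mb, hdb_mb):
    """Return (device, mount point, size in MB) rows for a farm node."""
    swap_mb = 2 * ram_mb                  # swap is twice the installed RAM
    if swap_mb >= hda_mb:
        raise ValueError("first disk is too small for swap plus /")
    return [
        ("hda1", "swap", swap_mb),
        ("hda2", "/", hda_mb - swap_mb),  # root takes the rest of disk one
        ("hdb1", "/maxa", hdb_mb),        # second disk is a single partition
    ]


# 512 MB RAM, 20 GB system disk, 120 GB data disk
for row in farm_layout(ram_mb=512, hda_mb=20_000, hdb_mb=120_000):
    print(row)
```
</verb></tscreen>

The desktop and laptop layouts follow the same pattern with a fixed
4 GB root partition and the remainder given to <tt>/home</tt>. </p>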
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
<sect1> Package configuration
|
|
|
|
|
|
|
|
<p> Install a minimal set of packages for the farm. Users are allowed
|
|
|
|
to configure desktops as they wish. </p>
|
|
|
|
|
|
|
|
</sect1>
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
<sect1> Operating system installation
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
<sect2> Cloning
|
|
|
|
|
|
|
|
<p> I believe in having a completely distributed system. This means
each machine contains a copy of the operating system. Installing the
OS on each machine manually is cumbersome. To optimise this process,
I first set up and install one machine exactly the way I
want it. I then create a gzipped tar file of the entire system
and place it on a CD-ROM, which I then use to clone each machine in my
cluster. </p>
|
|
|
|
|
|
|
|
<p> The command I use to create the tar file is as follows:
|
|
|
|
|
|
|
|
<tscreen><verb>
|
|
|
|
tar -czvlps --same-owner --atime-preserve -f /maxa/slash.tgz /
|
|
|
|
</verb></tscreen>
|
|
|
|
</p>
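<p> For readers who would rather script the imaging step, the same
idea can be sketched with Python's standard <tt>tarfile</tt> module.
This is not the method used here, only an illustration: the function
names are made up, the demonstration packs a throwaway directory
rather than a real root filesystem, and unlike the tar command above
it makes no attempt to stay on one filesystem.

<tscreen><verb>
```python
import os
import tarfile
import tempfile


def make_image(root, archive_path):
    """Pack the tree under root into a gzipped tar file."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname="." stores paths relative to root, so the image can be
        # unpacked under any mount point on the target machine
        tar.add(root, arcname=".")


def list_image(archive_path):
    """Return the member names stored in the image."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


# Demonstration on a scratch directory standing in for "/"
with tempfile.TemporaryDirectory() as work:
    root = os.path.join(work, "root")
    os.makedirs(os.path.join(root, "etc"))
    with open(os.path.join(root, "etc", "hostname"), "w") as fh:
        fh.write("fp1\n")
    image = os.path.join(work, "slash.tgz")   # kept outside the tree
    make_image(root, image)
    print(list_image(image))
```
</verb></tscreen>

As with the tar command, the archive is written outside the tree
being packed so that it does not end up inside itself. </p>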
|
|
|
|
|
|
|
|
<p> I have a script called <tt>go</tt> that takes a hostname and
IP address as its arguments, untars the <tt>slash.tgz</tt> file on
the CD-ROM, and replaces the hostname and IP address in the appropriate
locations. A version of the <tt>go</tt> script and the input files for
it can be accessed at: <htmlurl
url="http://www.ram.org/computing/linux/cluster/"
name="http://www.ram.org/computing/linux/cluster/">. This script
will have to be edited based on your cluster design. </p>
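<p> The substitution step can be pictured as filling placeholders in a
template file. The sketch below is not the actual <tt>go</tt> script
(see the URL above for that); the placeholder tokens
<tt>@HOSTNAME@</tt> and <tt>@IPADDR@</tt>, the function name, and the
example address are all assumptions made for illustration.

<tscreen><verb>
```python
def personalise(template, hostname, ip_address):
    """Fill the hostname and IP placeholders in one config file's text."""
    return (template
            .replace("@HOSTNAME@", hostname)
            .replace("@IPADDR@", ip_address))


# Template loosely modelled on a Red Hat style network config file
template = "HOSTNAME=@HOSTNAME@\nIPADDR=@IPADDR@\n"
print(personalise(template, "fp2", "192.168.1.2"))
```
</verb></tscreen>

Keeping the per-machine values in a small number of template files
means the cloned image itself can stay identical across nodes. </p>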
|
|
|
|
|
|
|
|
<p> To make this work, I also use Tom's Root Boot package <htmlurl
|
|
|
|
url="http://www.toms.net/rb/" name="http://www.toms.net/rb/"> to boot
|
|
|
|
the machine and clone the system. The <tt>go</tt> script can be
|
|
|
|
placed on a CD-ROM or on the floppy containing Tom's Root Boot package
|
|
|
|
(you need to delete a few programs from this package since the floppy
|
|
|
|
disk is stretched to capacity). </p>
|
|
|
|
|
|
|
|
<p> More conveniently, you could burn a bootable CD-ROM containing
|
|
|
|
Tom's Root Boot package, including the <tt>go</tt> script, and the tgz
|
|
|
|
file containing the system you wish to clone. You can also edit Tom's
|
|
|
|
Root Boot's init scripts so that it directly executes the <tt>go</tt>
|
|
|
|
script (you will still have to set IP addresses if you don't use
|
|
|
|
DHCP). </p>
|
|
|
|
|
|
|
|
<p> Thus you can develop a system where all you have to do is insert a
|
|
|
|
CD-ROM, turn on the machine, have a cup of coffee (or a can of Coke)
|
|
|
|
and come back to see a full clone. You then repeat this process for as
|
|
|
|
many machines as you have. This procedure has worked extremely well
|
|
|
|
for me and if you have someone else actually doing the work (of
|
|
|
|
inserting and removing CD-ROMs) then it's ideal. </p>
<p> <htmlurl url="mailto:rob@fantinibakery.com" name="Rob Fantini">
|
|
|
|
has contributed modifications of the scripts above that he used for
|
|
|
|
cloning a Mandrake 8.2 system, accessible at <htmlurl
|
|
|
|
url="http://www.ram.org/computing/linux/cluster/fantini_contribution.tgz"
|
|
|
|
name="http://www.ram.org/computing/linux/cluster/fantini_contribution.tgz">.
|
|
|
|
</p>
</sect2>
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
<sect2> DHCP vs. hard-coded IP addresses
|
|
|
|
|
|
|
|
<p> If you have DHCP set up, then you don't need to reset the IP
|
|
|
|
address and that part of it can be removed from the <tt>go</tt>
|
|
|
|
script. </p>
|
|
|
|
|
|
|
|
<p> DHCP has the advantage that you don't muck around with IP
|
|
|
|
addresses at all provided the DHCP server is configured
|
|
|
|
appropriately. It has the disadvantage that it relies on a centralised
|
|
|
|
server (and like I said, I tend to distribute things as much as
|
|
|
|
possible). Also, linking hardware ethernet addresses to IP addresses can
|
|
|
|
make it inconvenient if you wish to replace machines or change
|
|
|
|
hostnames routinely. </p>
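<p> If you go the hard-coded route, the hostname and address pairs
that get passed to the <tt>go</tt> script can be generated rather than
typed by hand. A minimal sketch, assuming a private 192.168.1.0/24
network and an <tt>fp</tt> hostname prefix (both are example values,
and <tt>static_hosts</tt> is a made-up name):

<tscreen><verb>
```python
import ipaddress


def static_hosts(network, prefix, count):
    """Pair hostnames prefix1..prefixN with the first N usable addresses."""
    hosts = ipaddress.ip_network(network).hosts()
    return [(f"{prefix}{i}", str(addr))
            for i, addr in zip(range(1, count + 1), hosts)]


# Print lines in /etc/hosts order: address first, then hostname
for name, addr in static_hosts("192.168.1.0/24", "fp", 3):
    print(f"{addr}\t{name}")
```
</verb></tscreen>

Because the mapping depends only on the node number, replacing a
machine does not require recording its ethernet hardware address
anywhere. </p>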
|
|
|
|
|
|
|
|
</sect2>
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
<sect1> Known hardware issues <label id="known_hardware_issues">
|
|
|
|
|
|
|
|
<p> The hardware in general has worked really well for us. Specific
|
|
|
|
issues are listed below: </p>
|
|
|
|
|
|
|
|
<p> The AMD dual 1.2 GHz machines run really hot. Two of them in a
|
|
|
|
room increase the temperature significantly. Thus while they might be
|
|
|
|
okay as desktops, the cooling and power consumption when using them as
|
|
|
|
part of a large cluster is a consideration. The AMD Palomino
|
|
|
|
configuration described previously seems to work really well. </p>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
</sect>
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
<sect> Performing tasks on the cluster
<!-- ************************************************************* -->
<p> This section is still being developed as the usage on my cluster
|
|
|
|
evolves, but so far we tend to write our own sets of message passing
|
|
|
|
routines to communicate between processes on different machines. </p>
|
|
|
|
|
|
|
|
<p> Many applications, particularly in the computational genomics
|
|
|
|
areas, are massively and trivially parallelisable, meaning that
|
|
|
|
perfect distribution can be achieved by spreading tasks equally across
|
|
|
|
the machines (for example, when analysing a whole genome using a
|
|
|
|
single gene technique, each processor can work on one gene at a time
|
|
|
|
independent of all the other processors). </p>
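<p> Spreading such independent tasks across processors takes very
little code. The sketch below deals tasks out round-robin; it is an
illustration rather than our own software, and the function name and
gene identifiers are made up.

<tscreen><verb>
```python
def distribute(tasks, n_workers):
    """Deal tasks out round-robin so worker loads differ by at most one."""
    return [tasks[i::n_workers] for i in range(n_workers)]


genes = [f"gene{i}" for i in range(1, 11)]   # ten made-up gene identifiers
for worker, batch in enumerate(distribute(genes, 4)):
    print(worker, batch)
```
</verb></tscreen>

Since no task depends on another, each batch can simply be handed to
one processor and run without any communication until the results are
collected. </p>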
<p> So far we have not found the need to use a professional queueing
system, but obviously that is highly dependent on the type of
applications you wish to run. </p>
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
<sect1> Rough benchmarks
|
|
|
|
|
|
|
|
<p> For the single most important program we run (our ab initio
protein folding simulation program), using the Pentium III 1 GHz
processor machine as a reference, the Athlon 1.2 GHz processor
machine is about 16% faster on average, the Xeon 1.7 GHz machine
is about 25-32% faster on average, and the Athlon 1.5 GHz processor
machine is about 80% faster on average. Yes, the Athlon 1.5 GHz is
faster than the Xeon 1.7 GHz, since the Xeon executes only six
instructions per clock (IPC) whereas the Athlon executes nine IPC
(you do the math!).
|
|
|
|
</p>
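<p> One way to read these numbers is to divide each relative speed by
the clock rate, which gives a rough speed-per-GHz figure. The sketch
below assumes that "N% faster" means N% more work done per unit time
than the 1 GHz Pentium III, which then scores exactly 1.0; the
function name <tt>per_ghz</tt> is made up.

<tscreen><verb>
```python
def per_ghz(percent_faster, clock_ghz):
    """Speed relative to the 1 GHz Pentium III, divided by clock in GHz."""
    return (1 + percent_faster / 100) / clock_ghz


print(round(per_ghz(16, 1.2), 2))   # Athlon 1.2 GHz
print(round(per_ghz(25, 1.7), 2))   # 1.7 GHz machine, lower figure
print(round(per_ghz(32, 1.7), 2))   # 1.7 GHz machine, upper figure
print(round(per_ghz(80, 1.5), 2))   # Athlon 1.5 GHz
```
</verb></tscreen>

A value above 1.0 means the machine does more per clock cycle on this
program than the reference; a value below 1.0 means it does less. </p>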
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
<sect1> Uptimes
|
|
|
|
|
|
|
|
<p> These machines are incredibly stable both in terms of hardware and
|
|
|
|
software once they have been debugged (usually some in a new batch of
|
|
|
|
machines have hardware problems). Reboots have generally occurred
|
|
|
|
when a circuit breaker is tripped. The first machine I installed has
|
|
|
|
been up since its birth!
|
|
|
|
|
|
|
|
<tscreen><verb>
|
|
|
|
~ ram@fp1 % uptime
|
|
|
|
4:49am up 374 days, 2:47, 1 user, load average: 2.08, 2.02, 2.01
|
|
|
|
</verb></tscreen>
|
|
|
|
|
|
|
|
</p>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
</sect>
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
<sect> Acknowledgements
|
|
|
|
|
|
|
|
<p> The following people have been helpful in getting this HOWTO
|
|
|
|
done:
|
|
|
|
|
|
|
|
<itemize>
|
|
|
|
<item> Michael Levitt (<htmlurl url="mailto:michael.levitt@stanford.edu" name="Michael Levitt">)
|
|
|
|
</itemize>
|
|
|
|
</p>
|
|
|
|
|
|
|
|
</sect>
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
<sect> Bibliography <label id="references">
|
|
|
|
|
|
|
|
<p> The following documents may prove useful to you---they are links
|
|
|
|
to sources that make use of high-performance computing clusters:
|
|
|
|
|
|
|
|
<itemize>
|
|
|
|
<item> <url url="http://www.ram.org/computing/rambin/rambin.html"
|
|
|
|
name="RAMBIN web page">
|
|
|
|
|
|
|
|
<item> <url url="http://www.ram.org/computing/ramp/ramp.html"
|
|
|
|
name="RAMP web page">
|
|
|
|
|
|
|
|
<item> <url url="http://www.ram.org/research/research.html" name="Ram
|
|
|
|
Samudrala's research page (which describes the kind of research done with these clusters)">
|
|
|
|
</itemize>
|
|
|
|
</p>
|
|
|
|
|
|
|
|
</sect>
|
|
|
|
|
|
|
|
<!-- ************************************************************* -->
|
|
|
|
|
|
|
|
</article>
|