<!doctype Linuxdoc system>
<article>
<title> Linux Cluster HOWTO </title>
<author>Ram Samudrala <tt>(me@ram.org)</tt> </author>
<date> v1.0, March 17, 2003 </date>
<abstract>
How to set up high-performance Linux computing clusters.
</abstract>
<!-- ************************************************************* -->
<toc>
<sect> Introduction
<p> This document describes how I set up my Linux computing clusters
for high-performance computing which I need for <htmlurl
url="http://compbio.washington.edu" name="my research">. </p>
<p> Use the information below at your own risk. I disclaim all
responsibility for anything you may do after reading this HOWTO. The
latest version of this HOWTO will always be available at <htmlurl
url="http://www.ram.org/computing/linux/linux_cluster.html"
name="http://www.ram.org/computing/linux/linux_cluster.html">. </p>
<p> Unlike other documentation that talks about setting up clusters in
a general way, this is a specific description of how our lab is set up.
It covers not only the compute nodes, but also the desktop, laptop,
and public server aspects. This is done mainly for
local use, but I put it up on the web since I received several e-mail
messages based on my newsgroup query requesting the same information.
Even today, as I plan another 64-node cluster, I find that there is a
dearth of information about exactly how to assemble components to form
a node that works reliably under Linux, and about the other hardware
that needs to work well with the nodes for productive research to
happen. The main use of this HOWTO as it stands is as a report on what
kind of hardware works well with Linux and what kind doesn't. </p>
</sect>
<!-- ************************************************************* -->
<!-- ************************************************************* -->
<sect> Hardware
<p> This section covers the hardware choices I've made. Unless noted
in the <ref id="known_hardware_issues" name="known hardware issues">
section, assume that everything works <it>really</it> well. </p>
<p> Hardware installation is also fairly straightforward unless
otherwise noted, with most of the details covered by the manuals. For
each section, the hardware is listed in the order of purchase (most
recent is listed first). </p>
<!-- ************************************************************* -->
<sect1> Node hardware
<p> 32 machines have the following setup each:
<itemize>
<item> 2 AMD Palomino MP XP 2000+ 1.67 GHz CPUs </item>
<item> Asus A7M266-D w/LAN Dual DDR </item>
<item> 2 Kingston 512 MB PC2100 DDR-266MHz REG ECC RAM </item>
<item> 1 41 GB Maxtor 7200rpm ATA100 HD </item>
<item> 1 120 GB Maxtor 5400rpm ATA100 HD </item>
<item> Asus CD-A520 52x CDROM </item>
<item> 1.44 MB floppy drive </item>
<item> ATI Expert 2000 Rage 128 32 MB </item>
<item> IN-WIN P4 300ATX Mid Tower case </item>
<item> Enermax P4-430ATX power supply </item>
</itemize>
</p>
<p> 32 machines have the following setup each:
<itemize>
<item> 2 AMD Palomino MP XP 1800+ 1.53 GHz CPUs </item>
<item> Tyan S2460 Dual Socket-A/MP motherboard </item>
<item> Kingston 512 MB PC2100 DDR-266MHz REG ECC RAM </item>
<item> 1 20 GB Maxtor UDMA/100 7200rpm HD </item>
<item> 1 120 GB Maxtor 5400rpm ATA100 HD </item>
<item> Asus CD-A520 52x CDROM </item>
<item> 1.44 MB floppy drive </item>
<item> ATI Expert 98 8 MB AGP video card </item>
<item> IN-WIN P4 300ATX Mid Tower case </item>
<item> Intel PCI PRO-100 10/100Mbps network card </item>
<item> Enermax P4-430ATX power supply </item>
</itemize>
</p>
<p> 32 machines have the following setup each:
<itemize>
<item> 2 Pentium III 1 GHz Intel CPUs </item>
<item> Supermicro 370 DLE Dual PIII-FCPGA motherboard </item>
<item> 2 256 MB 168-pin PC133 Registered ECC Micron RAM </item>
<item> 1 20 GB Maxtor ATA/66 5400 RPM HD </item>
<item> 1 40 GB Maxtor UDMA/100 7200 RPM HD </item>
<item> Asus CD-S500 50x CDROM </item>
<item> 1.4 MB floppy drive </item>
<item> ATI Expert 98 8 MB PCI video card </item>
<item> IN-WIN P4 300ATX Mid Tower case </item>
</itemize>
</p>
</sect1>
<!-- ************************************************************* -->
<sect1> Server hardware
<p> 1 server for external use (dissemination of information) with the
following setup:
<itemize>
<item> 2 AMD Palomino MP XP 2000+ 1.67 GHz CPUs </item>
<item> Asus A7M266-D w/LAN Dual DDR </item>
<item> 4 Kingston 512 MB PC2100 DDR-266MHz REG ECC RAM </item>
<item> Asus CD-A520 52x CDROM </item>
<item> 1 41 GB Maxtor 7200rpm ATA100 HD </item>
<item> 6 120 GB Maxtor 5400rpm ATA100 HD </item>
<item> 1.44 MB floppy drive </item>
<item> ATI Expert 2000 Rage 128 32 MB </item>
<item> IN-WIN P4 300ATX Mid Tower case </item>
<item> Enermax P4-430ATX power supply </item>
</itemize>
</p>
</sect1>
<!-- ************************************************************* -->
<sect1> Desktop hardware
<p> 1 desktop with the following setup:
<itemize>
<item> 2 AMD Palomino MP XP 2000+ 1.67 GHz CPUs </item>
<item> Asus A7M266-D w/LAN Dual DDR </item>
<item> 2 Kingston 512 MB PC2100 DDR-266MHz REG ECC RAM </item>
<item> Ricoh 32x12x10 CDRW/DVD Combo EIDE </item>
<item> 1 41 GB Maxtor 7200rpm ATA100 HD </item>
<item> 1 120 GB Maxtor 5400rpm ATA100 HD </item>
<item> 1.44 MB floppy drive </item>
<item> ATI Expert 2000 Rage 128 32 MB </item>
<item> IN-WIN P4 300ATX Mid Tower case </item>
<item> Intel PCI PRO-100 10/100Mbps network card </item>
<item> Enermax P4-430ATX power supply </item>
</itemize>
</p>
<p> 1 desktop with the following setup:
<itemize>
<item> 2 Intel Xeon 1.7 GHz 256K 400FS </item>
<item> Supermicro P4DCE Dual Xeon motherboard </item>
<item> 4 256 MB RAMBUS 184-Pin 800 MHz memory </item>
<item> 2 120 GB Maxtor ATA/100 5400 RPM HD </item>
<item> 1 60 GB Maxtor ATA/100 7200 RPM HD </item>
<item> 52X Asus CD-A520 INT IDE CDROM </item>
<item> 1.4 MB floppy drive </item>
<item> Leadtek 64 MB GF2 MX400 AGP </item>
<item> Creative SB LIVE Value PCI 5.1 </item>
<item> Microsoft Natural Keyboard </item>
<item> Microsoft Intellimouse Explorer </item>
<item> Supermicro SC760 full-tower case with 400W PS </item>
</itemize>
</p>
<p> 2 desktops with the following setup:
<itemize>
<item> 2 AMD K7 1.2 GHz/266 MP Socket A CPUs </item>
<item> Tyan S2462NG Dual Socket A motherboard </item>
<item> 4 256 MB PC2100 REG ECC DDR-266MHz </item>
<item> 3 40 GB Maxtor UDMA/100 7200 RPM HD </item>
<item> 50X Asus CD-A520 INT IDE CDROM </item>
<item> 1.4 MB floppy drive </item>
<item> Chaintech Geforce2 MX200 32 MB AGP </item>
<item> Creative SB LIVE Value PCI </item>
<item> Microsoft Natural Keyboard </item>
<item> Microsoft Intellimouse Explorer </item>
<item> Full-tower case with 300W PS </item>
</itemize>
</p>
<p> 2 desktops with the following setup:
<itemize>
<item> 2 Pentium III 1 GHz Intel CPUs </item>
<item> Supermicro 370 DLE Dual PIII-FCPGA motherboard </item>
<item> 4 256 MB 168-pin PC133 Registered ECC Micron RAM </item>
<item> 3 40 GB Maxtor UDMA/100 7200 RPM HD </item>
<item> Asus CD-S500 50x CDROM </item>
<item> 1.4 MB floppy drive </item>
<item> Jaton Nvidia TNT2 32 MB PCI </item>
<item> Creative SB LIVE Value PCI </item>
<item> Microsoft Natural Keyboard </item>
<item> Microsoft Intellimouse Explorer </item>
<item> Full-tower case with 300W PS </item>
</itemize>
</p>
<p> 2 desktops with the following setup:
<itemize>
<item> 2 Pentium III 1 GHz Intel CPUs </item>
<item> Supermicro 370 DLE Dual PIII-FCPGA motherboard </item>
<item> 4 256 MB 168-pin PC133 Registered ECC Micron RAM </item>
<item> 3 40 GB Maxtor UDMA/100 7200 RPM HD </item>
<item> Mitsumi 8x/4x/32x CDRW </item>
<item> 1.4 MB floppy drive </item>
<item> Jaton Nvidia TNT2 32 MB PCI </item>
<item> Creative SB LIVE Value PCI </item>
<item> Microsoft Natural Keyboard </item>
<item> Microsoft Intellimouse Explorer </item>
<item> Full-tower case with 300W PS </item>
</itemize>
</p>
<p> 1 desktop with the following setup:
<itemize>
<item> 2 Pentium III 1 GHz Intel CPUs </item>
<item> Supermicro 370 DE6 Dual PIII-FCPGA motherboard </item>
<item> 4 256 MB 168-pin PC133 Registered ECC Micron RAM </item>
<item> 3 40 GB Maxtor UDMA/100 7200 RPM HD </item>
<item> Ricoh 32x12x10 CDRW/DVD Combo EIDE </item>
<item> Asus CD-A520 52x CDROM </item>
<item> 1.4 MB floppy drive </item>
<item> Asus V7700 64 MB GeForce2-GTS AGP video card </item>
<item> Creative SB Live Platinum 5.1 sound card </item>
<item> Microsoft Natural Keyboard </item>
<item> Microsoft Intellimouse Explorer </item>
<item> Full-tower case with 300W PS </item>
</itemize>
</p>
<p> 3 desktops with the following setup:
<itemize>
<item> 2 Pentium III 1 GHz Intel CPUs </item>
<item> Supermicro 370 DE6 Dual PIII-FCPGA motherboard </item>
<item> 4 256 MB 168-pin PC133 Registered ECC Micron RAM </item>
<item> 3 40 GB Maxtor UDMA/100 7200 RPM hard disk </item>
<item> Ricoh 32x12x10 CDRW/DVD Combo EIDE </item>
<item> 1.4 MB floppy drive </item>
<item> Asus V7700 64 MB GeForce2-GTS AGP video card </item>
<item> Creative SB Live Platinum 5.1 sound card </item>
<item> Microsoft Natural Keyboard </item>
<item> Microsoft Intellimouse Explorer </item>
<item> Full-tower case with 300W PS </item>
</itemize>
</p>
</sect1>
<!-- ************************************************************* -->
<sect1> Firewall/gateway hardware
<p> 1 firewall with the following setup:
<itemize>
<item> AMD Palomino XP 1700+ 1.47GHz CPU </item>
<item> MSI KT3 Ultra2 KT333 MS-6380E motherboard </item>
<item> 512 MB PC2100 DDR-266MHz DIMM RAM </item>
<item> 40GB Seagate 7200rpm ATA/100 hard disk </item>
<item> Asus 52X CD-A520 INT IDE cdrom </item>
<item> 1.44 MB floppy drive </item>
<item> ATI Expert 2000 Rage 128 32 MB video card </item>
<item> 4 Intel Pro/1000T Gigabit Server ethernet cards </item>
<item> 4U Black Rackmount Steel case </item>
</itemize>
</p>
</sect1>
<!-- ************************************************************* -->
<sect1> Miscellaneous/accessory hardware
<p> Backup:
<itemize>
<item> 2 Sony 20/40 GB DDS4 SE LVD DAT drives </item>
</itemize>
</p>
<p> Monitors:
<itemize>
<item> 1 20.1" Viewsonic VP201M LCD monitor </item>
<item> 1 22" Viewsonic P220F 0.25-0.27mm monitor </item>
<item> 4 21" Sony CPD-G500 .24mm monitors </item>
<item> 2 18" Viewsonic VP181 LCD monitors </item>
<item> 1 17" Viewsonic VE170 LCD monitor </item>
<item> 2 Sun monitors </item>
</itemize>
</p>
<p> Printers:
<itemize>
<item> HP colour LaserJet 4600dn </item>
</itemize>
</p>
</sect1>
<!-- ************************************************************* -->
<sect1> Putting-it-all-together hardware
<p> We use KVM switches with a cheap monitor to connect up and "look"
at all the machines:
<itemize>
<item> 15" .28dp XLN CTL Monitor </item>
<item> 3 Belkin Omniview 16-Port Pro Switches </item>
<item> Belkin Omniview 2-Port Switch </item>
</itemize>
</p>
<p> While this is a nice solution, I think it's kind of needless. What
we need is a small handheld monitor that can plug into the back of
the PC (operated with a stylus, like the Palm). I don't plan to use
more monitor switches/KVM cables. </p>
<p> Networking is important:
<itemize>
<item> 1 Netgear FSM750S 48 port/2 gigabit network switch </item>
<item> 1 Netgear FS517TS 16 port/1 gigabit network switch </item>
<item> 1 Netgear FS750NA 48 port network switch </item>
<item> 1 Netgear FS524 24 port network switch </item>
<item> 1 Cisco Catalyst 3448 XL Enterprise Edition 48 port network switch </item>
<item> 1 Netgear ME102NA Wireless Access Point </item>
<item> 1 Netgear MA401NA Wireless PCMCIA network card </item>
</itemize>
</p>
</sect1>
<!-- ************************************************************* -->
<sect1> Costs
<p> Our vendor is Hard Drives Northwest (<htmlurl
url="http://www.hdnw.com" name="http://www.hdnw.com">). For each
compute node in our cluster (containing two processors), we paid about
$1500-$2000, including taxes. Generally, our goal is to keep each node
below $2000.00 (which is what our desktop machines cost). </p>
</sect1>
<!-- ************************************************************* -->
</sect>
<!-- ************************************************************* -->
<sect> Software
<!-- ************************************************************* -->
<sect1> Operating system: Linux, of course!
<p> The following kernels and distributions are in use:
<itemize>
<item> Kernel 2.2.16-22, distribution KRUD 7.0
<item> Kernel 2.4.9-7, distribution KRUD 7.2
<item> Kernel 2.4.18-10, distribution KRUD 7.3
</itemize>
These distributions work very well for us since updates are sent to us
on CD and there's no reliance on an external network connection for
updates. They also seem "cleaner" than the regular Red Hat
distributions, and the setup is extremely stable. </p>
</sect1>
<!-- ************************************************************* -->
<sect1> Networking software
<p> We use Shorewall 1.3.14a (<htmlurl url="http://www.shorewall.net"
name="http://www.shorewall.net">) for the firewall. </p>
</sect1>
<!-- ************************************************************* -->
<sect1> Parallel processing software
<p> We use our own software for parallelising applications but have
experimented with PVM and MPI. In my view, the overhead for these
pre-packaged programs is too high. I recommend writing
application-specific code for the tasks you perform (that's one
person's view). </p>
</sect1>
<!-- ************************************************************* -->
<sect1> Costs
<p> Linux and most software that runs on Linux are freely
copyable. </p>
</sect1>
<!-- ************************************************************* -->
</sect>
<!-- ************************************************************* -->
<sect> Setup, configuration, and maintenance
<!-- ************************************************************* -->
<sect1> Disk configuration
<p> This section describes disk partitioning strategies. </p>
<p>
<tscreen><verb>
farm/cluster machines:
  hda1 - swap   (2 * RAM)
  hda2 - /      (remaining disk space)
  hdb1 - /maxa  (total disk)

desktops (without windows):
  hda1 - swap   (2 * RAM)
  hda2 - /      (4 GB)
  hda3 - /spare (remaining disk space)
  hdb1 - /maxa  (total disk)
  hdd1 - /maxb  (total disk)

desktops (with windows):
  hda1 - /win   (total disk)
  hdb1 - swap   (2 * RAM)
  hdb2 - /      (4 GB)
  hdb3 - /spare (remaining disk space)
  hdd1 - /maxa  (total disk)

laptops (single disk):
  hda1 - /win   (half the total disk size)
  hda2 - swap   (2 * RAM)
  hda3 - /      (remaining disk space)
</verb></tscreen>
</p>
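<p> The "2 * RAM" swap rule in the listing above can be computed rather
than looked up per machine. The following is only a sketch: the
function names are hypothetical, and it assumes a Linux
<tt>/proc/meminfo</tt> that reports a <tt>MemTotal</tt> line in kB. </p>

```shell
#!/bin/sh
# swap_mb_for RAM_KB: print the swap size in MB for a machine with the
# given amount of RAM in kB, using the "2 * RAM" rule.
swap_mb_for() {
    ram_kb=$1
    echo $(( ram_kb * 2 / 1024 ))
}

# ram_kb_from_meminfo [FILE]: read MemTotal (in kB) from a meminfo-style
# file. Defaults to /proc/meminfo; the optional argument exists so the
# function can be tried against a sample file.
ram_kb_from_meminfo() {
    awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}
```

<p> For example, <tt>swap_mb_for "$(ram_kb_from_meminfo)"</tt> gives the
size to use for the swap partition on the machine it is run on. </p>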
</sect1>
<!-- ************************************************************* -->
<sect1> Package configuration
<p> Install a minimal set of packages for the farm. Users are allowed
to configure desktops as they wish. </p>
</sect1>
<!-- ************************************************************* -->
<sect1> Operating system installation and maintenance
<!-- ************************************************************* -->
<sect2> Cloning and maintenance packages
<sect3> FAI
<p> FAI (<htmlurl url="http://www.informatik.uni-koeln.de/fai/"
name="http://www.informatik.uni-koeln.de/fai/">) is an automated
system to install a Debian GNU/Linux operating system on a PC
cluster. You can take one or more virgin PCs, turn on the power and
after a few minutes Linux is installed, configured and running on the
whole cluster, without any interaction necessary. </p>
</sect3>
<sect3> SystemImager
<p> SystemImager (<htmlurl url="http://systemimager.org"
name="http://systemimager.org">) is software that automates Linux
installs, software distribution, and production deployment. </p>
</sect3>
</sect2>
<!-- ************************************************************* -->
<sect2> Personal cloning strategy
<p> I believe in having a completely distributed system. This means
each machine contains a copy of the operating system. Installing the
OS on each machine manually is cumbersome. To optimise this process,
I first set up and install one machine exactly the way I
want it. I then create a gzipped tar file of the entire system
and place it on a CD-ROM, which I then use to clone each machine in my
cluster. </p>
<p> The command I use to create the tar file is as follows:
<tscreen><verb>
tar -czvlps --same-owner --atime-preserve -f /maxa/slash.tgz /
</verb></tscreen>
</p>
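<p> A note on the flags, with a hedged modern equivalent. In the GNU tar
of that period <tt>-l</tt> was short for <tt>--one-file-system</tt>,
which keeps other mounts such as <tt>/maxa</tt> (where the archive is
being written) out of the image. Later GNU tar releases reassigned
<tt>-l</tt> to <tt>--check-links</tt>, so spelling out the long option
is safer. The sketch below is not the command above but an illustration
of the same idea: the function names are hypothetical, and it archives
relative paths via <tt>-C</tt> so that the image can be unpacked under
any mount point. </p>

```shell
#!/bin/sh
# make_image SRC OUT: archive the tree rooted at SRC into the gzipped
# tarball OUT, preserving permissions and staying on one filesystem so
# that other mounts are not swept into the image.
make_image() {
    tar -czpf "$2" --one-file-system -C "$1" .
}

# restore_image IMG DEST: unpack tarball IMG into directory DEST,
# keeping the recorded permissions.
restore_image() {
    mkdir -p "$2" && tar -xzpf "$1" -C "$2"
}
```

<p> With these definitions, <tt>make_image / /maxa/slash.tgz</tt> would
build the image on the master, and <tt>restore_image</tt> would unpack
it onto a freshly made filesystem mounted on the new node (the mount
point is up to you). </p>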
<p> I have a script called <tt>go</tt> that takes a hostname and
IP address as its arguments, untars the <tt>slash.tgz</tt> file on
the CD-ROM, and replaces the hostname and IP address in the appropriate
locations. A version of the <tt>go</tt> script and the input files for
it can be accessed at: <htmlurl
url="http://www.ram.org/computing/linux/cluster/"
name="http://www.ram.org/computing/linux/cluster/">. This script
will have to be edited based on your cluster design. </p>
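<p> The real <tt>go</tt> script is at the address given above. What
follows is only a sketch of its hostname/IP step, not the script itself.
It assumes Red Hat-style network files
(<tt>/etc/sysconfig/network</tt> and <tt>ifcfg-eth0</tt>) and a single
interface named <tt>eth0</tt>; the function name <tt>personalise</tt>
is hypothetical. </p>

```shell
#!/bin/sh
# personalise ROOT HOSTNAME IP: give the freshly unpacked system under
# ROOT its own identity by rewriting the hostname and IP address.
personalise() {
    root=$1 name=$2 ip=$3
    net=$root/etc/sysconfig/network
    ifc=$root/etc/sysconfig/network-scripts/ifcfg-eth0

    # Replace only the HOSTNAME= and IPADDR= lines; other settings
    # copied from the master image are left untouched.
    sed "s/^HOSTNAME=.*/HOSTNAME=$name/" "$net" > "$net.new" &&
        mv "$net.new" "$net"
    sed "s/^IPADDR=.*/IPADDR=$ip/" "$ifc" > "$ifc.new" &&
        mv "$ifc.new" "$ifc"

    # Rebuild /etc/hosts with the loopback entry and this node's entry.
    printf '127.0.0.1\tlocalhost\n%s\t%s\n' "$ip" "$name" \
        > "$root/etc/hosts"
}
```

<p> After unpacking the image onto the new disk, a call such as
<tt>personalise /mnt node07 192.168.1.107</tt> (example values) would
finish the clone. </p>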
<p> To make this work, I also use Tom's Root Boot package (<htmlurl
url="http://www.toms.net/rb/" name="http://www.toms.net/rb/">) to boot
the machine and clone the system. The <tt>go</tt> script can be
placed on a CD-ROM or on the floppy containing Tom's Root Boot package
(you need to delete a few programs from this package since the floppy
disk is stretched to capacity). </p>
<p> More conveniently, you could burn a bootable CD-ROM containing
Tom's Root Boot package, including the <tt>go</tt> script, and the tgz
file containing the system you wish to clone. You can also edit Tom's
Root Boot's init scripts so that it directly executes the <tt>go</tt>
script (you will still have to set IP addresses if you don't use
DHCP). </p>
<p> Alternatively, you can create your own custom disk (like a rescue
disk) that contains the kernel and the tools you
want. There are several documents that describe how to do this,
including the Linux Bootdisk HOWTO (<htmlurl
url="http://www.linuxdoc.org/HOWTO/Bootdisk-HOWTO/"
name="http://www.linuxdoc.org/HOWTO/Bootdisk-HOWTO/">), which also
contains links to other pre-made boot/root disks. </p>
<p> Thus you can develop a system where all you have to do is insert a
CD-ROM, turn on the machine, have a cup of coffee (or a can of coke)
and come back to see a full clone. You then repeat this process for as
many machines as you have. This procedure has worked extremely well
for me and if you have someone else actually doing the work (of
inserting and removing CD-ROMs) then it's ideal. </p>
<p> Rob Fantini (<htmlurl url="mailto:rob@fantinibakery.com"
name="rob@fantinibakery.com">) has contributed modifications of the
scripts above that he used for cloning a Mandrake 8.2 system,
accessible at <htmlurl
url="http://www.ram.org/computing/linux/cluster/fantini_contribution.tgz"
name="http://www.ram.org/computing/linux/cluster/fantini_contribution.tgz">.
</p>
</sect2>
<!-- ************************************************************* -->
<sect2> DHCP vs. hard-coded IP addresses
<p> If you have DHCP set up, then you don't need to reset the IP
address and that part of it can be removed from the <tt>go</tt>
script. </p>
<p> DHCP has the advantage that you don't muck around with IP
addresses at all provided the DHCP server is configured
appropriately. It has the disadvantage that it relies on a centralised
server (and as I said, I tend to distribute things as much as
possible). Also, linking hardware Ethernet addresses to IP addresses can
make it inconvenient if you wish to replace machines or change
hostnames routinely. </p>
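<p> To make the trade-off concrete: with a DHCP server, the link between
hardware address and IP address lives in one per-host entry on the
server. The sketch below assumes the ISC <tt>dhcpd</tt> configuration
syntax and generates those entries from a plain list; the function name
and the sample addresses are hypothetical. </p>

```shell
#!/bin/sh
# dhcp_hosts: read "hostname MAC IP" triples on standard input and print
# one ISC dhcpd "host" declaration for each line of input.
dhcp_hosts() {
    while read -r name mac ip; do
        # Skip blank lines.
        [ -n "$name" ] || continue
        printf 'host %s {\n  hardware ethernet %s;\n  fixed-address %s;\n}\n' \
            "$name" "$mac" "$ip"
    done
}
```

<p> Feeding it a line such as
<tt>node01 00:11:22:33:44:55 192.168.1.101</tt> prints the matching
declaration. Replacing a machine then means editing its hardware
address in this list, which is the inconvenience described above. </p>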
</sect2>
<!-- ************************************************************* -->
</sect1>
<!-- ************************************************************* -->
<sect1> Known hardware issues <label id="known_hardware_issues">
<p> The hardware in general has worked really well for us. Specific
issues are listed below: </p>
<p> The AMD dual 1.2 GHz machines run really hot. Two of them in a
room increase the temperature significantly. Thus while they might be
okay as desktops, the cooling and power consumption when using them as
part of a large cluster is a consideration. The AMD Palomino
configuration described previously seems to work really well, but I
definitely recommend getting two fans in the case--this solved all our
instability problems. </p>
</sect1>
<!-- ************************************************************* -->
<sect1> Known software issues <label id="known_software_issues">
<p> Some tar executables apparently don't create a tar file the nice
way they're supposed to (especially in terms of referencing and
de-referencing symbolic links). The solution to this I've found is to
use a tar executable that does, like the one from RedHat 7.0. </p>
</sect1>
<!-- ************************************************************* -->
</sect>
<!-- ************************************************************* -->
<sect> Performing tasks on the cluster
<!-- ************************************************************* -->
<p> This section is still being developed as the usage on my cluster
evolves, but so far we tend to write our own sets of message passing
routines to communicate between processes on different machines. </p>
<p> Many applications, particularly in computational genomics,
are massively and trivially parallelisable, meaning that
perfect distribution can be achieved by spreading tasks equally across
the machines (for example, when analysing a whole genome using a
technique that operates on a single gene/protein, each processor can
work on one gene/protein at a time independent of all the other
processors). </p>
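<p> As an illustration of this kind of even split, the sketch below
deals a list of tasks out to a fixed number of nodes in round-robin
order. It is a static assignment for independent tasks only, not our
message passing code, and the function name is hypothetical. </p>

```shell
#!/bin/sh
# assign_tasks N: read one task name per line (for example, one gene or
# protein identifier) and print "node-number task", dealing the tasks
# out to N nodes in round-robin order.
assign_tasks() {
    awk -v n="$1" '{ printf "%d %s\n", (NR - 1) % n + 1, $0 }'
}
```

<p> Each node can then pick out the lines that start with its own number
and work through them without talking to any other node. </p>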
<p> So far we have not found the need to use a professional queueing
system, but obviously that is highly dependent on the type of
applications you wish to run. </p>
<!-- ************************************************************* -->
<sect1> Rough benchmarks
<p> For the single most important program we run (our <it>ab
initio</it> protein folding simulation program), using the Pentium 3 1
GHz processor machine as a frame of reference, the Athlon 1.2 GHz
processor machine is about 16% faster on average, the Xeon 1.7 GHz
machine is about 25-32% faster on average, the Athlon 1.5 GHz
processor is about 38% faster on average, and the Athlon 1.7 GHz
processor is about 46% faster on average. Yes, the Athlon 1.5 GHz is
faster than the Xeon 1.7 GHz: the Xeon executes only six
instructions per clock (IPC) whereas the Athlon executes nine IPC (you
do the math!). </p>
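<p> One way to "do the math" on the figures above is to divide each
relative speed by the clock rate, which gives throughput per GHz
relative to the Pentium III reference. This is a rough sketch using the
nominal clock rates quoted in the text; the function name is
hypothetical. </p>

```shell
#!/bin/sh
# per_clock SPEED GHZ: divide a relative speed (Pentium III 1 GHz = 1.00)
# by the clock rate in GHz and print the result to two decimal places.
per_clock() {
    awk -v s="$1" -v g="$2" 'BEGIN { printf "%.2f\n", s / g }'
}

per_clock 1.16 1.2    # Athlon 1.2 GHz, 16% faster
per_clock 1.38 1.5    # Athlon 1.5 GHz, 38% faster
per_clock 1.46 1.7    # Athlon 1.7 GHz, 46% faster
per_clock 1.25 1.7    # Xeon 1.7 GHz, lower end of 25-32%
per_clock 1.32 1.7    # Xeon 1.7 GHz, upper end of 25-32%
```

<p> A value below 1.00 means the machine gets less work done per clock
than the reference machine on this particular program. </p>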
</sect1>
<!-- ************************************************************* -->
<sect1> Uptimes
<p> These machines are incredibly stable both in terms of hardware and
software once they have been debugged (usually some in a new batch of
machines have hardware problems), running constantly under very heavy
loads. One example is given below. Reboots have generally occurred
when a circuit breaker is tripped.
<tscreen><verb>
2:29pm up 495 days, 1:04, 2 users, load average: 4.85, 7.15, 7.72
</verb></tscreen>
</p>
</sect1>
<!-- ************************************************************* -->
</sect>
<!-- ************************************************************* -->
<sect> Acknowledgements
<p> The following people have been helpful in getting this HOWTO
done:
<itemize>
<item> Michael Levitt (<htmlurl url="mailto:michael.levitt@stanford.edu" name="Michael Levitt">)
</itemize>
</p>
</sect>
<!-- ************************************************************* -->
<sect> Bibliography <label id="references">
<p> The following documents may prove useful to you---they are links
to sources that make use of high-performance computing clusters:
<itemize>
<item> <url url="http://www.ram.org/computing/rambin/rambin.html"
name="RAMBIN web page">
<item> <url url="http://www.ram.org/computing/ramp/ramp.html"
name="RAMP web page">
<item> <url url="http://www.ram.org/research/research.html" name="Ram
Samudrala's research page (which describes the kind of research done with these clusters)">
</itemize>
</p>
</sect>
<!-- ************************************************************* -->
</article>