From 1bd2c2610df52a2de4765b5b9cc3b01f275f5c6a Mon Sep 17 00:00:00 2001 From: gferg <> Date: Mon, 17 Mar 2003 22:12:10 +0000 Subject: [PATCH] updated --- LDP/howto/docbook/HOWTO-INDEX/howtoChap.sgml | 2 +- LDP/howto/docbook/HOWTO-INDEX/hwSect.sgml | 2 +- LDP/howto/linuxdoc/Cluster-HOWTO.sgml | 116 ++++++++++++++----- 3 files changed, 91 insertions(+), 29 deletions(-) diff --git a/LDP/howto/docbook/HOWTO-INDEX/howtoChap.sgml b/LDP/howto/docbook/HOWTO-INDEX/howtoChap.sgml index adc4cd53..11237b04 100644 --- a/LDP/howto/docbook/HOWTO-INDEX/howtoChap.sgml +++ b/LDP/howto/docbook/HOWTO-INDEX/howtoChap.sgml @@ -523,7 +523,7 @@ partition images to and from a TFTP server. Cluster-HOWTO, Linux Cluster HOWTO -Updated: November 2002. +Updated: March 2003. How to set up high-performance Linux computing clusters. diff --git a/LDP/howto/docbook/HOWTO-INDEX/hwSect.sgml b/LDP/howto/docbook/HOWTO-INDEX/hwSect.sgml index 6b67ffc0..bd7ad320 100644 --- a/LDP/howto/docbook/HOWTO-INDEX/hwSect.sgml +++ b/LDP/howto/docbook/HOWTO-INDEX/hwSect.sgml @@ -100,7 +100,7 @@ various issues related to this. Cluster-HOWTO, Linux Cluster HOWTO -Updated: November 2002. +Updated: March 2003. How to set up high-performance Linux computing clusters. diff --git a/LDP/howto/linuxdoc/Cluster-HOWTO.sgml b/LDP/howto/linuxdoc/Cluster-HOWTO.sgml index 286df269..b00b3492 100644 --- a/LDP/howto/linuxdoc/Cluster-HOWTO.sgml +++ b/LDP/howto/linuxdoc/Cluster-HOWTO.sgml @@ -5,7 +5,7 @@ Linux Cluster HOWTO Ram Samudrala (me@ram.org) - v0.98, November 11, 2002 + v1.0, March 17, 2003 How to set up high-performance Linux computing clusters. @@ -33,11 +33,13 @@ and includes not only details the compute aspects, but also the desktop, laptop, and public server aspects. This is done mainly for local use, but I put it up on the web since I received several e-mail messages based on my newsgroup query requesting the same information. 
-Even today, as I plan another 64-node cluster, I find there is a +Even today, as I plan another 64-node cluster, I find that there is a dearth of information about exactly how to assemble components to form -a node that works reliably under Linux. The main use of this HOWTO as -it stands is that it's a report on what kind of hardware works well -with Linux and what kind of hardware doesn't.

+a node that works reliably under Linux, covering not only the +compute nodes but also the hardware that needs to work well with the +nodes for productive research to happen. The main use of this HOWTO +as it stands is that it's a report on what kind of hardware works +well with Linux and what kind of hardware doesn't.

@@ -246,7 +248,7 @@ following setup: 2 Pentium III 1 GHz Intel CPUs Supermicro 370 DE6 Dual PIII-FCPGA motherboard 4 256 MB 168-pin PC133 Registered ECC Micron RAM - 3 40 GB Maxtor UDMA/100 7200 RPM HD + 3 40 GB Maxtor UDMA/100 7200 RPM hard disk Ricoh 32x12x10 CDRW/DVD Combo EIDE 1.4 MB floppy drive Asus V7700 64mb GeForce2-GTS AGP video card @@ -261,6 +263,27 @@ following setup: + Firewall/gateway hardware + +

1 firewall with the following setup: + + + AMD Palomino XP 1700+ 1.47 GHz CPU + MSI KT3 Ultra2 KT333 MS-6380E motherboard + 512 MB PC2100 DDR-266MHz DIMM RAM + 40 GB Seagate 7200 RPM ATA/100 hard disk + Asus 52X CD-A520 INT IDE cdrom + 1.44 MB floppy drive + ATI Expert 2000 Rage 128 32mb video card + 4 Intel Pro/1000T Gigabit Server ethernet cards + 4U Black Rackmount Steel case + +

+ +
+ + + Miscellaneous/accessory hardware

Backup: @@ -282,6 +305,13 @@ following setup:

+

Printers: + + + HP colour LaserJet 4600dn + +

+
@@ -294,7 +324,7 @@ at all the machines: 15" .28dp XLN CTL Monitor 3 Belkin Omniview 16-Port Pro Switches - 40 KVM cables + Belkin Omniview 2-Port Switch

@@ -306,6 +336,8 @@ more monitor switches/KVM cables.

Networking is important: + 1 Netgear FSM750S 48 port/2 gigabit network switch + 1 Netgear FS517TS 16 port/1 gigabit network switch 1 Netgear FS750NA 48 port network switch 1 Netgear FS524 24 port network switch 1 Cisco Catalyst 3448 XL Enterprise Edition 48 port network switch @@ -340,12 +372,27 @@ below $2000.00 (which is what our desktop machines cost).

Operating system: Linux, of course! -

We use Linux systems with a 2.4.9-7 kernel based on the KRUD 7.2 -distribution, and 2.2.17-14 kernel based on the KRUD 7.0 -distribution. These distributions work very well for us since updates -are sent to us on CD and there's no reliance on an external network -connection for updates. They also seem "cleaner" than the regular Red -Hat distributions.

+

The following kernels and distributions are in use: + + + Kernel 2.2.16-22, distribution KRUD 7.0 + Kernel 2.4.9-7, distribution KRUD 7.2 + Kernel 2.4.18-10, distribution KRUD 7.3 + + +These distributions work very well for us since updates are sent to us +on CD and there's no reliance on an external network connection for +updates. They also seem "cleaner" than the regular Red Hat +distributions, and the setup is extremely stable.

+ +
+ + + + Networking software + +

We use Shorewall 1.3.14a for the firewall.

@@ -433,7 +480,32 @@ to configure desktops as they wish.

- Cloning + Cloning and maintenance packages + + FAI + +

FAI is an automated +system to install a Debian GNU/Linux operating system on a PC +cluster. You can take one or more virgin PCs, turn on the power and +after a few minutes Linux is installed, configured and running on the +whole cluster, without any interaction necessary. + + + + SystemImager + +

SystemImager is software that automates Linux +installs, software distribution, and production deployment.

+ +
+ +
+ + + + Personal cloning strategy

I believe in having a completely distributed system. This means each machine contains a copy of the operating system. Installing the @@ -500,16 +572,6 @@ name="http://www.ram.org/computing/linux/cluster/fantini_contribution.tgz">. - Cloning and maintenance packages - -

SystemImager is software that automates Linux -installs, software distribution, and production deployment.

- -
- - - DHCP vs. hard-coded IP addresses

If you have DHCP set up, then you don't need to reset the IP @@ -576,8 +638,8 @@ routines to communicate between processes on different machines.

areas, are massively and trivially parallelisable, meaning that perfect distribution can be achieved by spreading tasks equally across the machines (for example, when analysing a whole genome using a -technique that operates on a single gene/proteom, each processor can -work on one gene at a time independent of all the other +technique that operates on a single gene/protein, each processor can +work on one gene/protein at a time independent of all the other processors).

So far we have not found the need to use a professional queueing @@ -612,7 +674,7 @@ loads. One example is given below. Reboots have generally occurred when a circuit breaker is tripped. - 12:39pm up 439 days, 14:45, 1 user, load average: 1.19, 1.08, 1.02 + 2:29pm up 495 days, 1:04, 2 users, load average: 4.85, 7.15, 7.72