updated

2001-08-22 14:00:26 +00:00 · 2001-08-22 14:00:26 +00:00 · 58e9af2368
parent 3136c4d10d
commit 58e9af2368
1 changed files with 126 additions and 16 deletions
--- a/LDP/howto/linuxdoc/Cluster-HOWTO.sgml
+++ b/LDP/howto/linuxdoc/Cluster-HOWTO.sgml
@ -2,10 +2,10 @@

 <article>

-<title>Linux Cluster HOWTO</title>
+<title> Linux Cluster HOWTO </title>

 <author>Ram Samudrala <tt>(me@ram.org)</tt> </author>
-<date> v0.2, June 10, 2001</date>
+<date> v0.3,  August 21, 2001 </date>

 <abstract>
 How to set up high-performance Linux computing clusters.
@ -31,10 +31,10 @@ name="http://www.ram.org/computing/linux/linux_cluster.html">. </p>
 a general way, this is a specific description of how our lab is set up
 and includes details not only of the compute aspects, but also of the
 desktop, laptop, and public server aspects.  This is done mainly for
-local use, but I figure I might as well put it up on the web and
-perhaps someone else will find it useful.  The main use as it stands
-is that it's a report on what kind of hardware works well with Linux
-and what kind of hardware doesn't. </p>
+local use, but I put it up on the web since I received several e-mail
+messages based on my newsgroup query requesting the same information.
+The main use as it stands is that it's a report on what kind of
+hardware works well with Linux and what kind of hardware doesn't. </p>

 </sect>

@ -45,7 +45,10 @@ and what kind of hardware doesn't. </p>
 <sect> Hardware

 <p> This section covers the hardware choices I've made. Unless noted,
-assume that everything works <it>really</it> well. </p>
+assume that everything works <it>really</it> well.  </p>
+
+<p> Hardware installation is also fairly straightforward unless
+otherwise noted, with most of the details covered by the manuals. </p>

 <!-- ************************************************************* -->

@ -85,6 +88,7 @@ assume that everything works <it>really</it> well. </p>
 <item> ATI Expert 98 8 MB PCI video card
 <item> Full-tower case
 </itemize>
+</p>

 </sect1>

@ -174,7 +178,12 @@ at all the machines:
 </itemize>
 </p>

-<p> Networking is important. 
+<p> While this is a nice solution, I think it's kind of needless. What
+we need is a small hand-held monitor that can plug into the back of
+the PC (operated with a stylus, like the Palm). I don't plan to use
+more monitor switches/KVM cables. </p>
+
+<p> Networking is important:

 <itemize>
 <item> 1 Cisco Catalyst 3448 XL Enterprise Edition network switch.
@ -212,6 +221,8 @@ distribution. We use our own software for parallelising applications but
 have experimented with PVM and MPI. In my view, the overhead for these
 pre-packaged programs is too high. </p>

+</sect1>
+
 <!-- ************************************************************* -->

 <sect1> Costs
@ -226,7 +237,7 @@ pre-packaged programs is too high. </p>

 <!-- ************************************************************* -->

-<sect> Configuration
+<sect> Set up and configuration

 <!-- ************************************************************* -->

@ -236,6 +247,12 @@ pre-packaged programs is too high. </p>

 <p>
 <tscreen><verb>
+farm/cluster machines:
+
+hda1 - swap  (2 * RAM)
+hda2 - /     (remaining disk space)
+hdb1 - /maxa (total disk)
+
 desktops (without windows):

 hda1 - swap  (2 * RAM)
@ -258,12 +275,6 @@ hda1 - /win  (half the total disk size)
 hda2 - swap  (2 * RAM)
 hda3 - /     (4 GB)
 hda4 - /home (remaining disk space)
-
-farm machines:
-
-hda1 - swap  (2 * RAM)
-hda2 - /     (remaining disk space)
-hdb1 - /maxa (total disk)
 </verb></tscreen>
 </p>

@ -278,6 +289,106 @@ to configure desktops as they wish.  </p>

 </sect1>

+<!-- ************************************************************* -->
+
+<sect1> Operating system installation
+
+<!-- ************************************************************* -->
+
+<sect2> Cloning 
+
+<p> I believe in having a completely distributed system. This means
+each machine contains a copy of the operating system.  Installing the
+OS on each machine manually is cumbersome. To optimise this process,
+what I do is first set up and install one machine exactly the way I
+want to.  I then create a tar and gzipped file of the entire system
+and place it on a CD-ROM which I then clone on each machine in my
+cluster. </p>
+
+<p> The command I use to create the tar file is as follows:
+
+<tscreen><verb>
+tar -czvlps --same-owner --atime-preserve -f /maxa/slash.tgz /
+</verb></tscreen>
+</p>
+
+<p> I have a script called <tt>go</tt> that takes a hostname and
+IP address as its arguments and untars the <tt>slash.tgz</tt> file on
+the CD-ROM and replaces the hostname and IP address in the appropriate
+locations. A version of the <tt>go</tt> script and the input files for
+it can be accessed at: <htmlurl
+url="http://www.ram.org/computing/linux/cluster/"
+name="http://www.ram.org/computing/linux/cluster/">. This script
+will have to be edited based on your cluster design. </p>
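
The clone step described above can be sketched roughly as follows. This is a minimal illustration and not the author's <tt>go</tt> script: the function name `clone_system`, the placeholder tokens `CLONE_HOST` and `CLONE_IP`, and the list of files to patch are all assumptions, and the target root filesystem is assumed to be partitioned, formatted, and mounted already.

```shell
#!/bin/sh
# Hypothetical sketch of a clone step in the spirit of the "go" script.
# Usage: clone_system HOSTNAME IPADDR TARBALL TARGET
clone_system() {
    new_host=$1
    new_ip=$2
    tarball=$3   # path to slash.tgz on the mounted CD-ROM (assumed)
    target=$4    # mount point of the new root filesystem (assumed)

    # Unpack the system image into the target, preserving permissions.
    tar -xzpf "$tarball" -C "$target" || return 1

    # Substitute the placeholder tokens in files that carry host identity.
    # The file list and the tokens are assumptions; adjust for your image.
    for f in etc/hosts etc/HOSTNAME; do
        [ -f "$target/$f" ] || continue
        sed -e "s/CLONE_HOST/$new_host/g" -e "s/CLONE_IP/$new_ip/g" \
            "$target/$f" > "$target/$f.new" || return 1
        # Write back through cat so the original file keeps its mode and owner.
        cat "$target/$f.new" > "$target/$f" && rm -f "$target/$f.new"
    done
}
```

A call might look like `clone_system node01 192.168.1.101 /cdrom/slash.tgz /mnt`, where the hostname, address, and mount points are made up for illustration. Using placeholder tokens in the master image is one possible design; editing the master machine's real hostname and address in place would work as well.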
+
+<p> To make this work, I also use Tom's Root Boot package <htmlurl
+url="http://www.toms.net/rb/" name="http://www.toms.net/rb/"> to boot
+the machine and clone the system.  The <tt>go</tt> script can be
+placed on a CD-ROM or on the floppy containing Tom's Root Boot package
+(you need to delete a few programs from this package since the floppy
+disk is stretched to capacity). </p>
+
+<p> More conveniently, you could burn a bootable CD-ROM containing
+Tom's Root Boot package, including the <tt>go</tt> script, and the tgz
+file containing the system you wish to clone.  You can also edit Tom's
+Root Boot's init scripts so that it directly executes the <tt>go</tt>
+script (you will still have to set IP addresses if you don't use
+DHCP). </p>
+
+<p> Thus you can develop a system where all you have to do is insert a
+CD-ROM, turn on the machine, have a cup of coffee (or a can of coke)
+and come back to see a full clone. You then repeat this process for as
+many machines as you have. This procedure has worked extremely well
+for me and if you have someone else actually doing the work (of
+inserting and removing CD-ROMs) then it's ideal. </p>
+
+</sect2>
+
+<!-- ************************************************************* -->
+
+<sect2> DHCP vs. hard-coded IP addresses
+
+<p> If you have DHCP set up, then you don't need to reset the IP
+address and that part of it can be removed from the <tt>go</tt>
+script. </p>
+
+<p> DHCP has the advantage that you don't muck around with IP
+addresses at all provided the DHCP server is configured
+appropriately. It has the disadvantage that it relies on a centralised
+server (and like I said, I tend to distribute things as much as
+possible). Also, linking hardware ethernet addresses to IP addresses can
+make it inconvenient if you wish to replace machines or change
+hostnames routinely. </p>
+
+</sect2>
+
+<!-- ************************************************************* -->
+
+</sect1>
+
+<!-- ************************************************************* -->
+
+</sect>
+
+<!-- ************************************************************* -->
+
+<sect> Performing tasks on the cluster
+
+<p> This section is still being developed as the usage on my cluster
+evolves, but so far we tend to write our own sets of message passing
+routines to communicate between processes on different machines.  </p>
+
+<p> Many applications, particularly in the computational genomics
+areas, are massively and trivially parallelisable, meaning that
+perfect distribution can be achieved by spreading tasks equally across
+the machines (for example, when analysing a whole genome using a
+single gene technique, each processor can work on one gene at a time
+independent of all the other processors). </p>
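
As an illustration of the trivially parallel case above, the following sketch deals tasks out to machines round-robin. It is not the lab's own software: the function name `assign_tasks` and the two input files (one hostname per line, one task name per line) are assumptions made for the example, and the hosts file is assumed to be non-empty.

```shell
#!/bin/sh
# Hypothetical sketch: pair each task with a host, round-robin.
# Usage: assign_tasks HOSTS_FILE TASKS_FILE
# Prints one "host task" pair per line.
assign_tasks() {
    awk '
        # First file: remember the hosts in order.
        NR == FNR { host[n++] = $1; next }
        # Second file: task number FNR goes to host (FNR - 1) mod n.
        { print host[(FNR - 1) % n], $1 }
    ' "$1" "$2"
}
```

With two hosts and three genes, the first and third gene go to the first host and the second gene to the second host. Each output line could then be handed to whatever remote execution mechanism the cluster uses; that part is left out here because it depends on the site.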
+
+<p> So far we have not found the need to use a professional queueing
+system, but obviously that is highly dependent on the type of
+applications you wish to run. </p>
+
 </sect>

 <!-- ************************************************************* -->
@ -318,4 +429,3 @@ Samudrala's research page (which describes the kind of research done with these
 <!-- ************************************************************* -->

 </article>
-