1342 lines
29 KiB
HTML
1342 lines
29 KiB
HTML
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||
|
<HTML
|
||
|
><HEAD
|
||
|
><TITLE
|
||
|
>The Beowulf HOWTO</TITLE
|
||
|
><META
|
||
|
NAME="GENERATOR"
|
||
|
CONTENT="Modular DocBook HTML Stylesheet Version 1.7"></HEAD
|
||
|
><BODY
|
||
|
CLASS="article"
|
||
|
BGCOLOR="#FFFFFF"
|
||
|
TEXT="#000000"
|
||
|
LINK="#0000FF"
|
||
|
VLINK="#840084"
|
||
|
ALINK="#0000FF"
|
||
|
><DIV
|
||
|
CLASS="ARTICLE"
|
||
|
><DIV
|
||
|
CLASS="TITLEPAGE"
|
||
|
><H1
|
||
|
CLASS="title"
|
||
|
><A
|
||
|
NAME="AEN2"
|
||
|
></A
|
||
|
>The Beowulf HOWTO</H1
|
||
|
><H3
|
||
|
CLASS="author"
|
||
|
><A
|
||
|
NAME="AEN4"
|
||
|
>Kurt Swendson</A
|
||
|
></H3
|
||
|
><DIV
|
||
|
CLASS="affiliation"
|
||
|
><DIV
|
||
|
CLASS="address"
|
||
|
><P
|
||
|
CLASS="address"
|
||
|
><TT
|
||
|
CLASS="email"
|
||
|
><<A
|
||
|
HREF="mailto:lam32767@lycos.com"
|
||
|
>lam32767@lycos.com</A
|
||
|
>></TT
|
||
|
></P
|
||
|
></DIV
|
||
|
></DIV
|
||
|
><P
|
||
|
CLASS="pubdate"
|
||
|
>2004-05-17<BR></P
|
||
|
><DIV
|
||
|
CLASS="revhistory"
|
||
|
><TABLE
|
||
|
WIDTH="100%"
|
||
|
BORDER="0"
|
||
|
><TR
|
||
|
><TH
|
||
|
ALIGN="LEFT"
|
||
|
VALIGN="TOP"
|
||
|
COLSPAN="3"
|
||
|
><B
|
||
|
>Revision History</B
|
||
|
></TH
|
||
|
></TR
|
||
|
><TR
|
||
|
><TD
|
||
|
ALIGN="LEFT"
|
||
|
>Revision 1.0</TD
|
||
|
><TD
|
||
|
ALIGN="LEFT"
|
||
|
>2005-01-08</TD
|
||
|
><TD
|
||
|
ALIGN="LEFT"
|
||
|
></TD
|
||
|
></TR
|
||
|
><TR
|
||
|
><TD
|
||
|
ALIGN="LEFT"
|
||
|
COLSPAN="3"
|
||
|
>first official release</TD
|
||
|
></TR
|
||
|
><TR
|
||
|
><TD
|
||
|
ALIGN="LEFT"
|
||
|
>Revision 0.9</TD
|
||
|
><TD
|
||
|
ALIGN="LEFT"
|
||
|
>2004-05-17</TD
|
||
|
><TD
|
||
|
ALIGN="LEFT"
|
||
|
>Revised by: 01</TD
|
||
|
></TR
|
||
|
><TR
|
||
|
><TD
|
||
|
ALIGN="LEFT"
|
||
|
COLSPAN="3"
|
||
|
>initial revision</TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
></DIV
|
||
|
><DIV
|
||
|
><DIV
|
||
|
CLASS="abstract"
|
||
|
><A
|
||
|
NAME="AEN21"
|
||
|
></A
|
||
|
><P
|
||
|
></P
|
||
|
><P
|
||
|
>This document describes step by step instructions on building a
|
||
|
Beowulf cluster. This is a Red Hat and LAM specific version of this
|
||
|
document.</P
|
||
|
><P
|
||
|
></P
|
||
|
></DIV
|
||
|
></DIV
|
||
|
><HR></DIV
|
||
|
><DIV
|
||
|
CLASS="TOC"
|
||
|
><DL
|
||
|
><DT
|
||
|
><B
|
||
|
>Table of Contents</B
|
||
|
></DT
|
||
|
><DT
|
||
|
>1. <A
|
||
|
HREF="#intro"
|
||
|
>Introduction</A
|
||
|
></DT
|
||
|
><DD
|
||
|
><DL
|
||
|
><DT
|
||
|
>1.1. <A
|
||
|
HREF="#copyright"
|
||
|
>Copyright and License</A
|
||
|
></DT
|
||
|
><DT
|
||
|
>1.2. <A
|
||
|
HREF="#disclaimer"
|
||
|
>Disclaimer</A
|
||
|
></DT
|
||
|
><DT
|
||
|
>1.3. <A
|
||
|
HREF="#credits"
|
||
|
>Credits / Contributors</A
|
||
|
></DT
|
||
|
><DT
|
||
|
>1.4. <A
|
||
|
HREF="#feedback"
|
||
|
>Feedback</A
|
||
|
></DT
|
||
|
></DL
|
||
|
></DD
|
||
|
><DT
|
||
|
>2. <A
|
||
|
HREF="#AEN49"
|
||
|
>Definitions</A
|
||
|
></DT
|
||
|
><DT
|
||
|
>3. <A
|
||
|
HREF="#AEN58"
|
||
|
>Requirements</A
|
||
|
></DT
|
||
|
><DT
|
||
|
>4. <A
|
||
|
HREF="#AEN70"
|
||
|
>Set Up The Head Node</A
|
||
|
></DT
|
||
|
><DD
|
||
|
><DL
|
||
|
><DT
|
||
|
>4.1. <A
|
||
|
HREF="#AEN77"
|
||
|
>Hosts</A
|
||
|
></DT
|
||
|
><DT
|
||
|
>4.2. <A
|
||
|
HREF="#AEN86"
|
||
|
>Groups</A
|
||
|
></DT
|
||
|
><DT
|
||
|
>4.3. <A
|
||
|
HREF="#AEN94"
|
||
|
>NFS</A
|
||
|
></DT
|
||
|
><DT
|
||
|
>4.4. <A
|
||
|
HREF="#AEN103"
|
||
|
>IP Addresses</A
|
||
|
></DT
|
||
|
><DT
|
||
|
>4.5. <A
|
||
|
HREF="#AEN107"
|
||
|
>Services</A
|
||
|
></DT
|
||
|
><DT
|
||
|
>4.6. <A
|
||
|
HREF="#AEN113"
|
||
|
>SSH</A
|
||
|
></DT
|
||
|
><DT
|
||
|
>4.7. <A
|
||
|
HREF="#AEN128"
|
||
|
>MPI</A
|
||
|
></DT
|
||
|
></DL
|
||
|
></DD
|
||
|
><DT
|
||
|
>5. <A
|
||
|
HREF="#AEN137"
|
||
|
>Set Up Slave Nodes</A
|
||
|
></DT
|
||
|
><DD
|
||
|
><DL
|
||
|
><DT
|
||
|
>5.1. <A
|
||
|
HREF="#AEN140"
|
||
|
>Base Linux Install</A
|
||
|
></DT
|
||
|
><DT
|
||
|
>5.2. <A
|
||
|
HREF="#AEN147"
|
||
|
>Hardware</A
|
||
|
></DT
|
||
|
><DT
|
||
|
>5.3. <A
|
||
|
HREF="#AEN152"
|
||
|
>Post Install Commands</A
|
||
|
></DT
|
||
|
><DT
|
||
|
>5.4. <A
|
||
|
HREF="#AEN173"
|
||
|
>SSH On Slave Nodes</A
|
||
|
></DT
|
||
|
><DT
|
||
|
>5.5. <A
|
||
|
HREF="#AEN179"
|
||
|
>NFS Settings On Slave Nodes</A
|
||
|
></DT
|
||
|
><DT
|
||
|
>5.6. <A
|
||
|
HREF="#AEN184"
|
||
|
>Lilo Modifications On Slave Nodes</A
|
||
|
></DT
|
||
|
></DL
|
||
|
></DD
|
||
|
><DT
|
||
|
>6. <A
|
||
|
HREF="#AEN195"
|
||
|
>Verification</A
|
||
|
></DT
|
||
|
><DT
|
||
|
>7. <A
|
||
|
HREF="#AEN202"
|
||
|
>Run A Program</A
|
||
|
></DT
|
||
|
></DL
|
||
|
></DIV
|
||
|
><DIV
|
||
|
CLASS="sect1"
|
||
|
><H1
|
||
|
CLASS="sect1"
|
||
|
><A
|
||
|
NAME="intro"
|
||
|
></A
|
||
|
>1. Introduction</H1
|
||
|
><P
|
||
|
>This document describes step by step instructions on building a
|
||
|
Beowulf cluster. After seeing all of the documentation that was available,
|
||
|
I felt there were enough gaps and omissions that my own document, which I
|
||
|
believe accurately describes how to build a Beowulf cluster, would be
|
||
|
beneficial.</P
|
||
|
><P
|
||
|
>I first saw Thomas Sterling's article in Scientific American, and
|
||
|
immediately got the book, because its title was "How to Build a Beowulf".
|
||
|
No doubt, it was a valuable reference, but it does not walk you through
|
||
|
instructions on exactly what to do.</P
|
||
|
><P
|
||
|
>What follows is a description of what I got to work. It is only one
|
||
|
example - my example. You may choose a different message passing
|
||
|
interface; you may choose a different Linux distribution. You may also
|
||
|
spend as much time as I did researching and experimenting, and learn on
|
||
|
your own.</P
|
||
|
><DIV
|
||
|
CLASS="sect2"
|
||
|
><HR><H2
|
||
|
CLASS="sect2"
|
||
|
><A
|
||
|
NAME="copyright"
|
||
|
></A
|
||
|
>1.1. Copyright and License</H2
|
||
|
><P
|
||
|
>This document, <EM
|
||
|
>The Beowulf HOWTO</EM
|
||
|
>, is
|
||
|
copyrighted (c) 2004 by <EM
|
||
|
>Kurt Swendson</EM
|
||
|
>. Permission
|
||
|
is granted to copy, distribute and/or modify this document under the
|
||
|
terms of the GNU Free Documentation License, Version 1.1 or any later
|
||
|
version published by the Free Software Foundation; with no Invariant
|
||
|
Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A
|
||
|
copy of the license is available at <A
|
||
|
HREF="http://www.gnu.org/copyleft/fdl.html"
|
||
|
TARGET="_top"
|
||
|
> http://www.gnu.org/copyleft/fdl.html</A
|
||
|
>.</P
|
||
|
><P
|
||
|
>Linux is a registered trademark of Linus Torvalds.</P
|
||
|
></DIV
|
||
|
><DIV
|
||
|
CLASS="sect2"
|
||
|
><HR><H2
|
||
|
CLASS="sect2"
|
||
|
><A
|
||
|
NAME="disclaimer"
|
||
|
></A
|
||
|
>1.2. Disclaimer</H2
|
||
|
><P
|
||
|
>No liability for the contents of this document can be accepted.
|
||
|
Use the concepts, examples and information at your own risk. There may
|
||
|
be errors and inaccuracies which could damage to your system. Though
|
||
|
this is highly unlikely, proceed with caution. The author(s) do not
|
||
|
accept responsibility for your actions.</P
|
||
|
><P
|
||
|
>All copyrights are held by their by their respective owners,
|
||
|
unless specifically noted otherwise. Use of a term in this document
|
||
|
should not be regarded as affecting the validity of any trademark or
|
||
|
service mark. Naming of particular products or brands should not be seen
|
||
|
as endorsements.</P
|
||
|
></DIV
|
||
|
><DIV
|
||
|
CLASS="sect2"
|
||
|
><HR><H2
|
||
|
CLASS="sect2"
|
||
|
><A
|
||
|
NAME="credits"
|
||
|
></A
|
||
|
>1.3. Credits / Contributors</H2
|
||
|
><P
|
||
|
>Thanks to Thomas Johnson for all of his support and encouragement
|
||
|
and, of course, for the hardware without which I would not have been
|
||
|
able to even start.</P
|
||
|
><P
|
||
|
>Thanks to my lovely wife Sharron for her understanding and
|
||
|
patience during my many hours spent with "the wolves".</P
|
||
|
><P
|
||
|
>The <A
|
||
|
HREF="http://ibiblio.org/pub/Linux/docs/HOWTO/archive/Beowulf-HOWTO.html"
|
||
|
TARGET="_top"
|
||
|
>original
|
||
|
Beowulf HOWTO</A
|
||
|
> by Jacek Radajewski and Douglas Eadline. </P
|
||
|
></DIV
|
||
|
><DIV
|
||
|
CLASS="sect2"
|
||
|
><HR><H2
|
||
|
CLASS="sect2"
|
||
|
><A
|
||
|
NAME="feedback"
|
||
|
></A
|
||
|
>1.4. Feedback</H2
|
||
|
><P
|
||
|
>Send your additions, comments and criticisms to
|
||
|
<TT
|
||
|
CLASS="email"
|
||
|
><<A
|
||
|
HREF="mailto:lam32767@lycos.com"
|
||
|
>lam32767@lycos.com</A
|
||
|
>></TT
|
||
|
>.</P
|
||
|
></DIV
|
||
|
></DIV
|
||
|
><DIV
|
||
|
CLASS="sect1"
|
||
|
><HR><H1
|
||
|
CLASS="sect1"
|
||
|
><A
|
||
|
NAME="AEN49"
|
||
|
></A
|
||
|
>2. Definitions</H1
|
||
|
><P
|
||
|
>What is a Beowulf cluster? The authors of the
|
||
|
<A
|
||
|
HREF="http://ibiblio.org/pub/Linux/docs/HOWTO/archive/Beowulf-HOWTO.html"
|
||
|
TARGET="_top"
|
||
|
>original
|
||
|
Beowulf HOWTO</A
|
||
|
>, Jacek Radajewski and Douglas Eadline,
|
||
|
provide a good definition in their document: "Beowulf is a
|
||
|
multi-computer architecture which can be used for parallel computations.
|
||
|
It is a system which usually consists of one server node, and one or more
|
||
|
client nodes connected together via Ethernet or some other network". The
|
||
|
site <A
|
||
|
HREF="http://beowulf.org"
|
||
|
TARGET="_top"
|
||
|
>beowulf.org</A
|
||
|
> lists many web
|
||
|
pages about Beowulf systems built by individuals and organizations. From
|
||
|
these two links, one can be exposed to a large number of perspectives on
|
||
|
the Beowulf architecture, and draw his / her own conclusions.</P
|
||
|
><P
|
||
|
>What's the difference between a true Beowulf cluster and a COW
|
||
|
[cluster of workstations]? Brahma gives a good definition:<A
|
||
|
HREF="http://www.phy.duke.edu/brahma/beowulf_book/node62.html"
|
||
|
TARGET="_top"
|
||
|
> http://www.phy.duke.edu/brahma/beowulf_book/node62.html</A
|
||
|
>.</P
|
||
|
><P
|
||
|
>If you are a "user" at your organization, and you have the use
|
||
|
of some nodes, you may still do the instructions shown here to create a cow.
|
||
|
But if you "own" the nodes, that is, if you have complete control of them,
|
||
|
and are able to completely erase and rebuild them, you may create a true
|
||
|
Beowulf cluster.</P
|
||
|
><P
|
||
|
>In Brahma's web page, he suggests you manually configure each box,
|
||
|
and then later on (after you get the feel of doing this whole "wolfing up"
|
||
|
procedure), you can set up new nodes automatically (which I will describe
|
||
|
in a later document).</P
|
||
|
></DIV
|
||
|
><DIV
|
||
|
CLASS="sect1"
|
||
|
><HR><H1
|
||
|
CLASS="sect1"
|
||
|
><A
|
||
|
NAME="AEN58"
|
||
|
></A
|
||
|
>3. Requirements</H1
|
||
|
><P
|
||
|
>Let's briefly outline the requirements: <P
|
||
|
></P
|
||
|
><UL
|
||
|
><LI
|
||
|
><P
|
||
|
>More than one box, each equipped with a network card.</P
|
||
|
></LI
|
||
|
><LI
|
||
|
><P
|
||
|
>A switch or hub to connect them</P
|
||
|
></LI
|
||
|
><LI
|
||
|
><P
|
||
|
>Linux</P
|
||
|
></LI
|
||
|
><LI
|
||
|
><P
|
||
|
>A message-passing interface [I used lam]</P
|
||
|
></LI
|
||
|
></UL
|
||
|
>It is not a requirement to have a kvm switch, [you know,
|
||
|
the switch to share one keyboard, video, and mouse between many boxes],
|
||
|
but it is convenient while setting up and / or debugging.</P
|
||
|
></DIV
|
||
|
><DIV
|
||
|
CLASS="sect1"
|
||
|
><HR><H1
|
||
|
CLASS="sect1"
|
||
|
><A
|
||
|
NAME="AEN70"
|
||
|
></A
|
||
|
>4. Set Up The Head Node</H1
|
||
|
><P
|
||
|
>So let's get "wolfing." Choose the most powerful box to be the head
|
||
|
node. Install Linux there and choose every package you want. The only
|
||
|
requirement is that you choose "Network Servers" [in Red Hat terminology]
|
||
|
because you need to have NFS and ssh. That's all you need. In my case, I
|
||
|
was going to do development of the Beowulf application, so I added X and C
|
||
|
development.</P
|
||
|
><P
|
||
|
>It is my experience that you do not actually need NFS, but I found
|
||
|
it invaluable for copying files between nodes, and for automating the
|
||
|
install process. Later in this document I will describe how you can run a
|
||
|
simple Beowulf application without the use of NFS, but a more complex
|
||
|
application may use NFS or actually depend upon it.</P
|
||
|
><P
|
||
|
>Those of you researching Beowulf systems will also know how you can
|
||
|
have a second network card on the head node so you can access it from the
|
||
|
outside world. This is not required for the operation of a cluster.</P
|
||
|
><P
|
||
|
>I learned the hard way: use a password that obeys the strong
|
||
|
password constraints for your Linux distribution. I used an easily typed
|
||
|
password like "a" for my user, and the whole thing did not work. When I
|
||
|
changed my password to a legal password, with mixed numbers, characters,
|
||
|
upper and lower case, it worked.</P
|
||
|
><P
|
||
|
>If you use lam as your message passing interface, you will read in
|
||
|
the manual to turn OFF the firewalls, because they use random port numbers
|
||
|
to communicate between nodes. Here is a rule: If the manual tells you to
|
||
|
do something, DO IT! The lam manual also tells you to run as a non-root
|
||
|
user. Make the same user for every box. Build every box on the cluster
|
||
|
with that same user and password. I named that non root user "wolf".
|
||
|
</P
|
||
|
><DIV
|
||
|
CLASS="sect2"
|
||
|
><HR><H2
|
||
|
CLASS="sect2"
|
||
|
><A
|
||
|
NAME="AEN77"
|
||
|
></A
|
||
|
>4.1. Hosts</H2
|
||
|
><P
|
||
|
>First we modify /etc/hosts. In it, you will see the comments
|
||
|
telling you to leave the "localhost" line alone. Ignore that advice and
|
||
|
change it to not include the name of your box in the loopback
|
||
|
address.</P
|
||
|
><P
|
||
|
>Modify the line that says: <TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>127.0.0.1 wolf00 localhost.localdomain localhost</PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
></P
|
||
|
><P
|
||
|
>...to now say: <TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>127.0.0.1 localhost.localdomain localhost </PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
></P
|
||
|
><P
|
||
|
>Then add all the boxes you want on your cluster. Note: This is not
|
||
|
required for the operation of a Beowulf cluster; only convenient, so
|
||
|
that you may type a simple "wolf01" when you refer to a box on your
|
||
|
cluster instead of the more tedious 192.168.0.101:</P
|
||
|
><TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>192.168.0.100 wolf00
|
||
|
192.168.0.101 wolf01
|
||
|
192.168.0.102 wolf02
|
||
|
192.168.0.103 wolf03
|
||
|
192.168.0.104 wolf04</PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
></DIV
|
||
|
><DIV
|
||
|
CLASS="sect2"
|
||
|
><HR><H2
|
||
|
CLASS="sect2"
|
||
|
><A
|
||
|
NAME="AEN86"
|
||
|
></A
|
||
|
>4.2. Groups</H2
|
||
|
><P
|
||
|
>In order to responsibly set up your cluster, especially if you are
|
||
|
a "user" of your boxes [see Definitions], you should have some measure
|
||
|
of security.</P
|
||
|
><P
|
||
|
>After you create your user, create a group, and add the user to
|
||
|
the group. Then, you may modify your files and directories to only be
|
||
|
accessible by the users within that group:</P
|
||
|
><TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>groupadd beowulf
|
||
|
usermod -g beowulf wolf </PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
><P
|
||
|
>...and add the following to /home/wolf/.bash_profile:</P
|
||
|
><TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>umask 007</PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
><P
|
||
|
>Now any files created by the user "wolf" [or any user within the
|
||
|
group] will be automatically only writeable by the group
|
||
|
"beowulf".</P
|
||
|
></DIV
|
||
|
><DIV
|
||
|
CLASS="sect2"
|
||
|
><HR><H2
|
||
|
CLASS="sect2"
|
||
|
><A
|
||
|
NAME="AEN94"
|
||
|
></A
|
||
|
>4.3. NFS</H2
|
||
|
><P
|
||
|
>Refer to the following web site: <A
|
||
|
HREF="http://www.ibiblio.org/mdw/HOWTO/NFS-HOWTO/server.html"
|
||
|
TARGET="_top"
|
||
|
>http://www.ibiblio.org/mdw/HOWTO/NFS-HOWTO/server.html</A
|
||
|
></P
|
||
|
><P
|
||
|
>Print that up, and have it at your side. I will be directing you
|
||
|
how to modify your system in order to create an NFS server, but I have
|
||
|
found this site invaluable, as you may also.</P
|
||
|
><P
|
||
|
>Make a directory for everybody to share:</P
|
||
|
><TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>mkdir /mnt/wolf
|
||
|
chmod 770 /mnt/wolf
|
||
|
chown wolf:beowulf /mnt/wolf -R </PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
><P
|
||
|
>Go to the /etc directory, and add your "shared" directory to the
|
||
|
exports file:</P
|
||
|
><TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>cd /etc
|
||
|
cat >> exports
|
||
|
/mnt/wolf 192.168.0.100/192.168.0.255 (rw)
|
||
|
<control d></PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
></DIV
|
||
|
><DIV
|
||
|
CLASS="sect2"
|
||
|
><HR><H2
|
||
|
CLASS="sect2"
|
||
|
><A
|
||
|
NAME="AEN103"
|
||
|
></A
|
||
|
>4.4. IP Addresses</H2
|
||
|
><P
|
||
|
>My network is 192.168.0.nnn because it is one of the "private" IP
|
||
|
ranges. Thomas Sterling talks about it on page 106 of his book. It is
|
||
|
inside my firewall, and works just fine.</P
|
||
|
><P
|
||
|
>My head node, which I call "wolf00" is 192.168.0.100, and every
|
||
|
other node is named "wolfnn", with an ip of 192.168.0.100 + nn. I am
|
||
|
following the sage advice of many of the web pages out there, and
|
||
|
setting myself up for an easier task of scaling up my cluster.</P
|
||
|
></DIV
|
||
|
><DIV
|
||
|
CLASS="sect2"
|
||
|
><HR><H2
|
||
|
CLASS="sect2"
|
||
|
><A
|
||
|
NAME="AEN107"
|
||
|
></A
|
||
|
>4.5. Services</H2
|
||
|
><P
|
||
|
>Make sure that services we want are up:</P
|
||
|
><TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>chkconfig -add sshd
|
||
|
chkconfig -add nfs
|
||
|
chkconfig -add rexec
|
||
|
chkconfig -add rlogin
|
||
|
chkconfig -level 3 rsh on
|
||
|
chkconfig -level 3 nfs on
|
||
|
chkconfig -level 3 rexec on
|
||
|
chkconfig -level 3 rlogin on</PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
><P
|
||
|
>...And, during startup, I saw some services that I know I don't
|
||
|
want, and in my opinion, could be removed. You may add or remove others
|
||
|
that suit your needs; just include the ones shown above.</P
|
||
|
><TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>chkconfig -del atd
|
||
|
chkconfig -del rsh
|
||
|
chkconfig -del sendmail</PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
></DIV
|
||
|
><DIV
|
||
|
CLASS="sect2"
|
||
|
><HR><H2
|
||
|
CLASS="sect2"
|
||
|
><A
|
||
|
NAME="AEN113"
|
||
|
></A
|
||
|
>4.6. SSH</H2
|
||
|
><P
|
||
|
>To be responsible, we make ssh work. While logged in as root, you
|
||
|
must modify the /etc/ssh/sshd_config file. The lines:</P
|
||
|
><TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>#RSAAuthentication yes
|
||
|
#AuthorizedKeysFile .ssh/authorized_keys</PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
><P
|
||
|
>...are commented out, so uncomment them [remove the #].</P
|
||
|
><P
|
||
|
>Reboot, and log back in as wolf, because the operation of your
|
||
|
cluster will always be done from the user "wolf". Also, the hosts file
|
||
|
modifications done earlier must take effect. Logging out and back in
|
||
|
will not do this. To be sure, reboot the box, and make sure your prompt
|
||
|
shows hostname "wolf00".</P
|
||
|
><P
|
||
|
>To generate your public and private SSH keys, do this:</P
|
||
|
><TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>ssh-keygen -b 1024 -f ~/.ssh/id_rsa -t rsa -N "" </PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
><P
|
||
|
>...and it will display a few messages, and tell you that it created
|
||
|
the public / private key pair. You will see these files, id_rsa and
|
||
|
id_rsa.pub, in the /home/wolf/.ssh directory.</P
|
||
|
><P
|
||
|
>Copy the id_rsa.pub file into a file called "authorized_keys"
|
||
|
right there in the .ssh directory. We will be using this file later.
|
||
|
Verify that the contents of this file show the hostname [the reason we
|
||
|
rebooted the box]. Modify the security on the files, and the
|
||
|
directory:</P
|
||
|
><TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>chmod 644 ~/.ssh/auth*
|
||
|
chmod 755 ~/.ssh </PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
><P
|
||
|
>According to the LAM user group, only the head node needs to log
|
||
|
on to the slave nodes; not the other way around. Therefore when we copy
|
||
|
the public key files, we only copy the head node's key file to each
|
||
|
slave node, and set up the agent on the head node. This is MUCH easier
|
||
|
than copying all authorized_keys files to all nodes. I will describe
|
||
|
this in more detail later.</P
|
||
|
><P
|
||
|
>Note: I only am documenting what the LAM distribution of the
|
||
|
message passing interface requires; if you chose another message passing
|
||
|
interface to build your cluster, your requirements may differ.</P
|
||
|
><P
|
||
|
>At the end of /home/wolf/.bash_profile, add the following
|
||
|
statements [again this is lam-specific; your requirements may
|
||
|
vary]:</P
|
||
|
><TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>export LAMRSH='ssh -x'
|
||
|
ssh-agent sh -c 'ssh-add && bash'</PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
></DIV
|
||
|
><DIV
|
||
|
CLASS="sect2"
|
||
|
><HR><H2
|
||
|
CLASS="sect2"
|
||
|
><A
|
||
|
NAME="AEN128"
|
||
|
></A
|
||
|
>4.7. MPI</H2
|
||
|
><P
|
||
|
>Lastly, put your message passing interface on the box. As stated
|
||
|
in 1.2 Requirements, I used lam. You can get lam from here:</P
|
||
|
><P
|
||
|
><A
|
||
|
HREF=" http://www.lam-mpi.org/"
|
||
|
TARGET="_top"
|
||
|
> http://www.lam-mpi.org/</A
|
||
|
></P
|
||
|
><P
|
||
|
>...but you can use any other message passing interface or parallel
|
||
|
virtual machine software you want. Again, I am showing you what worked
|
||
|
for me.</P
|
||
|
><P
|
||
|
>You can either build LAM from the supplied source, or use their
|
||
|
precompiled RPM package. It is not in the scope of this document to
|
||
|
describe that; I just got the source and followed the directions, and in
|
||
|
another experiment I installed their rpm. Both of them worked fine.
|
||
|
Remember the whole reason we are doing this is to learn; go forth and
|
||
|
learn.</P
|
||
|
><P
|
||
|
>You may also read more documentation regarding LAM and other
|
||
|
message passing interface software <A
|
||
|
HREF="http://www.tldp.org/HOWTO/Scientific-Computing-with-GNU-Linux/systems.html"
|
||
|
TARGET="_top"
|
||
|
>here.</A
|
||
|
></P
|
||
|
></DIV
|
||
|
></DIV
|
||
|
><DIV
|
||
|
CLASS="sect1"
|
||
|
><HR><H1
|
||
|
CLASS="sect1"
|
||
|
><A
|
||
|
NAME="AEN137"
|
||
|
></A
|
||
|
>5. Set Up Slave Nodes</H1
|
||
|
><P
|
||
|
>Get your network cables out. Install Linux on the first non-head
|
||
|
node. Follow these steps for each non-head node.</P
|
||
|
><DIV
|
||
|
CLASS="sect2"
|
||
|
><HR><H2
|
||
|
CLASS="sect2"
|
||
|
><A
|
||
|
NAME="AEN140"
|
||
|
></A
|
||
|
>5.1. Base Linux Install</H2
|
||
|
><P
|
||
|
>Going with my example node names and IP addresses, this is what I
|
||
|
chose during setup:</P
|
||
|
><TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>Workstation
|
||
|
auto partition
|
||
|
remove all partitions on system
|
||
|
use LILO as the boot loader
|
||
|
put boot loader on the MBR
|
||
|
host name wolf01
|
||
|
ip address 192.168.0.101
|
||
|
add the user "wolf"
|
||
|
same password as on all other nodes
|
||
|
NO firewall</PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
><P
|
||
|
>The ONLY package installed: network servers. Un-select all other
|
||
|
packages.</P
|
||
|
><P
|
||
|
>It doesn't matter what else you choose; this is the minimum that
|
||
|
you need. Why fill the box up with non-essential software you will never
|
||
|
use? My research has been concentrated on finding that minimal
|
||
|
configuration to get up and running.</P
|
||
|
><P
|
||
|
>Here's another very important point: when you move on to an
|
||
|
automated install and config, you really will NEVER log in to the box.
|
||
|
Only during setup and install do I type anything directly on the
|
||
|
box.</P
|
||
|
></DIV
|
||
|
><DIV
|
||
|
CLASS="sect2"
|
||
|
><HR><H2
|
||
|
CLASS="sect2"
|
||
|
><A
|
||
|
NAME="AEN147"
|
||
|
></A
|
||
|
>5.2. Hardware</H2
|
||
|
><P
|
||
|
>When the computer starts up, it will complain if it does not have
|
||
|
a keyboard connected. I was not able to modify the BIOS, because I had
|
||
|
older discarded boxes with no documentation, so I just connected a
|
||
|
"fake" keyboard.</P
|
||
|
><P
|
||
|
>I am in the computer industry, and see hundreds of keyboards come
|
||
|
and go, and some occasionally end up in the garbage. I get the old dead
|
||
|
keyboard out of the garbage, remove JUST the cord with the tiny circuit
|
||
|
board up there in the corner, where the num lock and caps lock lights
|
||
|
are. Then I plug the cord in, and the computer thinks it has a complete
|
||
|
keyboard without incident.</P
|
||
|
><P
|
||
|
>Again, you would be better off modifying your bios, if you are
|
||
|
able to. This is just a trick to use in case you don't have the bios
|
||
|
program.</P
|
||
|
></DIV
|
||
|
><DIV
|
||
|
CLASS="sect2"
|
||
|
><HR><H2
|
||
|
CLASS="sect2"
|
||
|
><A
|
||
|
NAME="AEN152"
|
||
|
></A
|
||
|
>5.3. Post Install Commands</H2
|
||
|
><P
|
||
|
>After your newly installed box reboots, log on as root again,
|
||
|
and...</P
|
||
|
><P
|
||
|
></P
|
||
|
><UL
|
||
|
><LI
|
||
|
><P
|
||
|
>do the same chkconfig commands stated above to set up the
|
||
|
right services.</P
|
||
|
></LI
|
||
|
></UL
|
||
|
><P
|
||
|
></P
|
||
|
><UL
|
||
|
><LI
|
||
|
><P
|
||
|
>modify hosts; remove "wolfnn" from localhost, and just add
|
||
|
wolfnn and wolf00.</P
|
||
|
></LI
|
||
|
></UL
|
||
|
><P
|
||
|
></P
|
||
|
><UL
|
||
|
><LI
|
||
|
><P
|
||
|
>install lam</P
|
||
|
></LI
|
||
|
></UL
|
||
|
><P
|
||
|
></P
|
||
|
><UL
|
||
|
><LI
|
||
|
><P
|
||
|
>create the /mnt/wolf directory and set up security for
|
||
|
it.</P
|
||
|
></LI
|
||
|
></UL
|
||
|
><P
|
||
|
></P
|
||
|
><UL
|
||
|
><LI
|
||
|
><P
|
||
|
>do the ssh configuration</P
|
||
|
></LI
|
||
|
></UL
|
||
|
><P
|
||
|
>Up to this point, we are pretty much the same as the head node. I
|
||
|
do NOT do the modification of the exports file.</P
|
||
|
><P
|
||
|
>Also, do NOT add this line to the .bash_profile:</P
|
||
|
><TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>sh -c 'ssh-add && bash'</PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
></DIV
|
||
|
><DIV
|
||
|
CLASS="sect2"
|
||
|
><HR><H2
|
||
|
CLASS="sect2"
|
||
|
><A
|
||
|
NAME="AEN173"
|
||
|
></A
|
||
|
>5.4. SSH On Slave Nodes</H2
|
||
|
><P
|
||
|
>Recall that on the head node, we created a file "authorized_keys".
|
||
|
Copy that file, created on your head node, to the ~/.ssh directory on
|
||
|
the slave nodes. The HEAD node will log on the all the SLAVE
|
||
|
nodes.</P
|
||
|
><P
|
||
|
>The requirement, as stated in the LAM user manual, is that there
|
||
|
should be no interaction required when logging in from the head to any
|
||
|
of the slaves. So, copying the public key from the head node into each
|
||
|
slave node, in the file "authorized_keys", tells each slave
|
||
|
that "wolf
|
||
|
user on wolf00 is allowed to log on here without any password; we know
|
||
|
it is safe."</P
|
||
|
><P
|
||
|
>However you may recall that the documentation states that the
|
||
|
first time you log on, it will ask for confirmation. So only once, after
|
||
|
doing the above configuration, go back to the head node, and type ssh
|
||
|
wolfnn where "wolfnn" is the name of your newly configured slave node.
|
||
|
It will ask you for confirmation, and you simply answer "yes" to it, and
|
||
|
that will be the last time you will have to interact.</P
|
||
|
><P
|
||
|
>Prove it by logging off, and then ssh back to that node, and it
|
||
|
should just immediately log you in, with no dialog whatsoever.</P
|
||
|
></DIV
|
||
|
><DIV
|
||
|
CLASS="sect2"
|
||
|
><HR><H2
|
||
|
CLASS="sect2"
|
||
|
><A
|
||
|
NAME="AEN179"
|
||
|
></A
|
||
|
>5.5. NFS Settings On Slave Nodes</H2
|
||
|
><P
|
||
|
>As root, enter these commands:</P
|
||
|
><TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>cat >> /etc/fstab
|
||
|
wolf00:/mnt/wolf /mnt/wolf nfs rw,hard,intr 0 0
|
||
|
<control d> </PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
><P
|
||
|
>What we did here was automatically mount the exported directory we
|
||
|
put in the /etc/exports file on the head node. More discussion regarding
|
||
|
nfs later in this document.</P
|
||
|
></DIV
|
||
|
><DIV
|
||
|
CLASS="sect2"
|
||
|
><HR><H2
|
||
|
CLASS="sect2"
|
||
|
><A
|
||
|
NAME="AEN184"
|
||
|
></A
|
||
|
>5.6. Lilo Modifications On Slave Nodes</H2
|
||
|
><P
|
||
|
>Then modify /etc/lilo.conf.</P
|
||
|
><P
|
||
|
>The 2nd line of this file says</P
|
||
|
><TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>timeout=nn</PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
><P
|
||
|
>Modify that line to say:</P
|
||
|
><TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>timeout=1200</PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
><P
|
||
|
>After it is modified, we invoke the changes. You type
|
||
|
"/sbin/lilo", and it will display back "added linux *" to confirm that
|
||
|
it took the changes you made to the lilo.conf file:</P
|
||
|
><TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>/sbin/lilo
|
||
|
Added linux * </PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
><P
|
||
|
>Why do I do this lilo modification? If you were researching
|
||
|
Beowulf on the web, and understand everything I have done so far, you
|
||
|
may wonder, "I don't remember reading anything about lilo.conf."</P
|
||
|
><P
|
||
|
>All my Beowulf nodes share a single power strip. I turn on the
|
||
|
power strip, and every box on the cluster starts up immediately. As the
|
||
|
startup procedure progresses, it mounts file systems. Seeing that the
|
||
|
non-head nodes mount the shared directory from the head node, they all
|
||
|
will have to wait a little bit until the head node is up, with NFS ready
|
||
|
to go. So I make each slave node wait 2 minutes in the lilo step.
|
||
|
Meanwhile, the head node comes up, and making the shared directory
|
||
|
available. By then, the slave nodes finally start booting up because
|
||
|
lilo has waited 2 minutes.</P
|
||
|
></DIV
|
||
|
></DIV
|
||
|
><DIV
|
||
|
CLASS="sect1"
|
||
|
><HR><H1
|
||
|
CLASS="sect1"
|
||
|
><A
|
||
|
NAME="AEN195"
|
||
|
></A
|
||
|
>6. Verification</H1
|
||
|
><P
|
||
|
>All done! You are almost ready to start wolfing.</P
|
||
|
><P
|
||
|
>Reboot your boxes. Did they all come up? Can you ping the head node
|
||
|
from each box? Can you ping each node from the head node? Can you ssh?
|
||
|
Don't worry about doing ssh as root; only as wolf. Also only worry about
|
||
|
ssh from the head to the slave, not the other way around.</P
|
||
|
><P
|
||
|
>If you are logged in as wolf, and ssh to a box, does it go
|
||
|
automatically, without prompting for password?</P
|
||
|
><P
|
||
|
>After the node boots up, log in as wolf, and say "mount". Does it
|
||
|
show wolf00:/mnt/wolf mounted? On the head node, copy a file into
|
||
|
/mnt/wolf. Can you read and write that file from the slave node?</P
|
||
|
><P
|
||
|
>This is really not required; it is merely convenient to have a
|
||
|
common directory reside on the head node. With a common shared directory,
|
||
|
you can easily use scp to copy files between boxes. Sterling states in his
|
||
|
book, on page 119, a single NFS server causes a serious obstacle to
|
||
|
scaling up to large numbers of nodes. I learned this when I went from a
|
||
|
small number of boxes up to a large number.</P
|
||
|
></DIV
|
||
|
><DIV
|
||
|
CLASS="sect1"
|
||
|
><HR><H1
|
||
|
CLASS="sect1"
|
||
|
><A
|
||
|
NAME="AEN202"
|
||
|
></A
|
||
|
>7. Run A Program</H1
|
||
|
><P
|
||
|
>Once you can do all the tests shown above, you should be able to run
|
||
|
a program. From here on in, the instructions are lam specific.</P
|
||
|
><P
|
||
|
>Go back to the head node, log in as wolf, and enter the following
|
||
|
commands:</P
|
||
|
><TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>cat > /nnt/wolf/lamhosts
|
||
|
wolf01
|
||
|
wolf02
|
||
|
wolf03
|
||
|
wolf04
|
||
|
<control d></PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
><P
|
||
|
>Go to the lam examples directory, and compile "hello.c":</P
|
||
|
><TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>mpicc -o hello hello.c
|
||
|
cp hello /mnt/wolf </PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
><P
|
||
|
>Then, as shown in the lam documentation, start up lam:</P
|
||
|
><TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>[wolf@wolf00 wolf]$ lamboot -v lamhosts
|
||
|
LAM 7.0/MPI 2 C++/ROMIO - Indiana University
|
||
|
n0<2572> ssi:boot:base:linear: booting n0 (wolf00)
|
||
|
n0<2572> ssi:boot:base:linear: booting n1 (wolf01)
|
||
|
n0<2572> ssi:boot:base:linear: booting n2 (wolf02)
|
||
|
n0<2572> ssi:boot:base:linear: booting n3 (wolf04)
|
||
|
n0<2572> ssi:boot:base:linear: finished</PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
><P
|
||
|
>So we are now finally ready to run an app. [Remember, I am using
|
||
|
lam; your message passing interface may have different syntax].</P
|
||
|
><TABLE
|
||
|
BORDER="0"
|
||
|
BGCOLOR="#E0E0E0"
|
||
|
WIDTH="100%"
|
||
|
><TR
|
||
|
><TD
|
||
|
><FONT
|
||
|
COLOR="#000000"
|
||
|
><PRE
|
||
|
CLASS="screen"
|
||
|
>[wolf@wolf00 wolf]$ mpirun n0-3 /mnt/wolf/hello
|
||
|
Hello, world! I am 0 of 4
|
||
|
Hello, world! I am 3 of 4
|
||
|
Hello, world! I am 2 of 4
|
||
|
Hello, world! I am 1 of 4
|
||
|
[wolf@wolf00 wolf]$</PRE
|
||
|
></FONT
|
||
|
></TD
|
||
|
></TR
|
||
|
></TABLE
|
||
|
><P
|
||
|
>Recall I mentioned the use of NFS above. I am telling the nodes to
|
||
|
all use the nfs shared directory, which will bottleneck when using a
|
||
|
larger number of boxes. You could easily copy the executable to each box,
|
||
|
and in the mpirun command, specify node local directories: mpirun n0-3
|
||
|
/home/wolf/hello. The prerequisite for this is to have all the files
|
||
|
available locally. In fact I have done this, and it worked better than
|
||
|
using the nfs shared executable. Of course this theory breaks down if my
|
||
|
cluster application needs to modify a file shared across the
|
||
|
cluster.</P
|
||
|
></DIV
|
||
|
></DIV
|
||
|
></BODY
|
||
|
></HTML
|
||
|
>
|