LDP/LDP/howto/docbook/Beowulf-HOWTO.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://docbook.org/xml/4.2/docbookx.dtd">
<article>
  <articleinfo>
    <title>The Beowulf HOWTO</title>

    <author>
      <firstname>Kurt</firstname>

      <surname>Swendson</surname>

      <affiliation>
        <address><email>lam32767@lycos.com</email></address>
      </affiliation>
    </author>

    <pubdate>2004-05-17</pubdate>

    <revhistory>
      <revision>
        <revnumber>1.0</revnumber>

        <date>2004-05-17</date>

        <authorinitials>01</authorinitials>

        <revremark>first official release</revremark>
      </revision>
    </revhistory>

    <abstract>
      <para>This document describes step by step instructions on building a
      Beowulf cluster. This is a Red Hat and LAM specific version of this
      document.</para>
    </abstract>
  </articleinfo>

  <sect1 id="intro">
    <title>Introduction</title>

    <para>This document describes step by step instructions on building a
    Beowulf cluster. After seeing all of the documentation that was available,
    I felt there were enough gaps and omissions that my own document that
    accurately describes how to build a Beowulf cluster would be
    beneficial.</para>

    <para>I first saw Thomas Sterling’s article in Scientific American, and
    immediately got the book, because its title was “How to Build a Beowulf”.
    No doubt - it was a valuable reference, but it really does not walk you
    through instructions on exactly what to do.</para>

    <para>What follows is a description of what I got to work. It is only one
    example - my example. You may choose a different message passing
    interface; you may choose a different Linux distribution. You may also
    spend as much time as I did researching and experimenting, and learn on
    your own.</para>

    <sect2 id="copyright">
      <title>Copyright and License</title>

      <para>This document, <emphasis>The Beowulf HOWTO</emphasis>, is
      copyrighted (c) 2002 by <emphasis>Kurt Swendson</emphasis>. Permission
      is granted to copy, distribute and/or modify this document under the
      terms of the GNU Free Documentation License, Version 1.1 or any later
      version published by the Free Software Foundation; with no Invariant
      Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A
      copy of the license is available at <ulink
      url="http://www.gnu.org/copyleft/fdl.html">
      http://www.gnu.org/copyleft/fdl.html</ulink>.</para>

      <para>Linux is a registered trademark of Linus Torvalds.</para>
    </sect2>

    <sect2 id="disclaimer">
      <title>Disclaimer</title>

      <para>No liability for the contents of this document can be accepted.
      Use the concepts, examples and information at your own risk. There may
      be errors and inaccuracies, that could be damaging to your system.
      Proceed with caution, and although this is highly unlikely, the
      author(s) do not take any responsibility.</para>

      <para>All copyrights are held by their by their respective owners,
      unless specifically noted otherwise. Use of a term in this document
      should not be regarded as affecting the validity of any trademark or
      service mark. Naming of particular products or brands should not be seen
      as endorsements.</para>
    </sect2>

    <sect2 id="credits">
      <title>Credits / Contributors</title>

      <para>Thanks to Thomas Johnson, for all of his support and
      encouragement, and of course, the hardware, without which I would not
      have been able to even start.</para>

      <para>Thanks to my lovely wife Sharron for her understanding and
      patience during my many hours spent with "the wolves".</para>
    </sect2>

    <sect2 id="feedback">
      <title>Feedback</title>

      <para>Send your additions, comments and criticisms to
      <email>lam32767@lycos.com</email>.</para>
    </sect2>
  </sect1>

  <sect1>
    <title>Definitions</title>

    <para>What is a Beowulf cluster? <ulink
    url="http://www.tldp.org/HOWTO/Beowulf-HOWTO.html">Jacek and
    Douglas</ulink> give a good definition in their article: "Beowulf is a
    multi computer architecture which can be used for parallel computations.
    It is a system which usually consists of one server node, and one or more
    client nodes connected together via Ethernet or some other network". The
    site <ulink url="http://beowulf.org">beowulf.org</ulink> lists many web
    pages about Beowulf systems built by individuals and organizations. From
    these two links, one can be exposed to a large number of perspectives on
    the Beowulf architecture, and draw his / her own conclusions.</para>

    <para>What’s the difference between a true Beowulf cluster and a COW
    [cluster of workstations]? Brahma gives a good definition:<ulink
    url="http://www.phy.duke.edu/brahma/beowulf_book/node62.html">
    http://www.phy.duke.edu/brahma/beowulf_book/node62.html</ulink>.</para>

    <para>If you are a “user” at your organization, and you have the use of
    some nodes, you may still do the instructions shown here to create a cow.
    But if you “own” the nodes, that is, if you have complete control of them,
    and are able to completely erase and rebuild them, you may create a true
    Beowulf cluster.</para>

    <para>In Brahma's web page, he suggests that you manually configure each
    box, and then, later on, after you get the feel of doing this whole
    “wolfing up” procedure, you can set up new nodes automatically, which I
    will describe in a later document.</para>
  </sect1>

  <sect1>
    <title>Requirements</title>

    <para>Let’s briefly outline your requirements: <itemizedlist>
        <listitem>
          <para>More than one box, each equipped with a network card.</para>
        </listitem>

        <listitem>
          <para>A switch or hub to connect them</para>
        </listitem>

        <listitem>
          <para>Linux</para>
        </listitem>

        <listitem>
          <para>A message-passing interface [I used lam]</para>
        </listitem>
      </itemizedlist>It is not a requirement to have a kvm switch, but it is
    convenient while setting up and / or debugging.</para>
  </sect1>

  <sect1>
    <title>Set up the head node</title>

    <para>So let’s get wolfing. Choose the most powerful box to be the head
    node. Install Linux on there, and choose every package you want. The only
    requirement is that you choose “Network Servers” [in Red Hat terminology]
    because you need to have NFS and ssh. That’s all you need. In my case, I
    was going to do development of the Beowulf application, so I added X and C
    development.</para>

    <para>It is my experience that you do not actually need NFS, but I found
    it invaluable for copying files between nodes, and for automating the
    install process. Later in this document I will describe how you can run a
    simple Beowulf application without the use of NFS, but a more complex
    application may use NFS or actually depend upon it.</para>

    <para>Those of you researching Beowulf systems will also know how you can
    have a 2nd network card on the head node so you can access it from the
    outside world. This is not required for the operation of a cluster, so you
    may do what you want regarding this.</para>

    <para>I learned the hard way: use a password that obeys the strong
    password constraints for your Linux distribution. I used an easily-typed
    password like “a” for my user, and the whole thing did not work. When I
    changed my password to a legal password, with mixed numbers, characters,
    upper and lower case, it worked.</para>

    <para>If you use lam as your message passing interface, you will read in
    the manual to turn OFF the firewalls, because they use random port numbers
    to communicate between nodes. Here is a rule: If the manual tells you to
    do something, DO IT! The lam manual also tells you to run as a non-root
    user. Make the same user for every box. Build every box on the cluster
    with the same user “wolf” with the same password.</para>

    <sect2>
      <title>Hosts</title>

      <para>First we will modify /etc/hosts. In the file, you will see the
      comments telling you to leave the “localhost” line alone. Ignore that
      advice and change it to not include the name of your box in the loopback
      address.</para>

      <para>Modify the line that says: <screen>127.0.0.1 wolf00 localhost.localdomain localhost</screen></para>

      <para>...to now say: <screen>127.0.0.1 localhost.localdomain localhost </screen></para>

      <para>Then add all the boxes you want on your cluster. Note: This is not
      required for the operation of a Beowulf cluster; only convenient, so
      that you may type a simple “wolf01” when you refer to a box on your
      cluster instead of the more tedious 192.168.0.101:</para>

      <screen>192.168.0.100 wolf00
192.168.0.101 wolf01
192.168.0.102 wolf02
192.168.0.103 wolf03
192.168.0.104 wolf04</screen>
    </sect2>

    <sect2>
      <title>Groups</title>

      <para>In order to responsibly set your cluster up, especially if you are
      a “user” of your boxes [see Definitions], you should have some measure
      of security.</para>

      <para>After you create your user, create a group, and add the user to
      the group. Then, you may modify your files and directories to only be
      accessible by the users within that group:</para>

      <screen>groupadd beowulf 
usermod -g beowulf wolf </screen>

      <para>… and add the following to /home/wolf/.bash_profile:</para>

      <screen>umask 007</screen>

      <para>Now any files created by the user “wolf” [or any user within the
      group] will automatically be only writeable by the group
      “beowulf”.</para>
    </sect2>

    <sect2>
      <title>NFS</title>

      <para>Refer to the following web site: <ulink
      url="http://www.ibiblio.org/mdw/HOWTO/NFS-HOWTO/server.html">http://www.ibiblio.org/mdw/HOWTO/NFS-HOWTO/server.html</ulink></para>

      <para>Print that up, and have it at your side. I will be directing you
      how to modify your system in order to create an NFS server, but I have
      found this site invaluable, as you may also.</para>

      <para>Make a directory for everybody to share:</para>

      <screen>mkdir /mnt/wolf 
chmod 770 /mnt/wolf 
chown wolf:beowulf /mnt/wolf -R </screen>

      <para>Go to the /etc directory, and add your “shared” directory to the
      exports file:</para>

      <screen>cd /etc 
cat &gt;&gt; exports 
/mnt/wolf 192.168.0.100/192.168.0.255 (rw) 
&lt;control d&gt;</screen>
    </sect2>

    <sect2>
      <title>IP Addresses</title>

      <para>My network is 192.168.0.nnn because it is one of the “private” IP
      ranges. Thomas Sterling talks about it on page 106 of his book. It is
      inside my firewall, and works just fine.</para>

      <para>My head node, which I call “wolf00” is 192.168.0.100, and every
      other node is named “wolfnn”, with an ip of 192.168.0.100 + nn. I am
      following the sage advice of many of the web pages out there, and
      setting myself up for an easier task of scaling up my cluster.</para>
    </sect2>

    <sect2>
      <title>Services</title>

      <para>Make sure that services we want are up:</para>

      <screen>chkconfig -add sshd 
chkconfig -add telnet 
chkconfig -add nfs 
chkconfig -add rexec 
chkconfig -add rlogin 
chkconfig -level 3 rsh on 
chkconfig -level 3 telnet on 
chkconfig -level 3 nfs on 
chkconfig -level 3 rexec on 
chkconfig -level 3 rlogin on</screen>

      <para>Telnet? I added this just as a convenience. It is not needed but
      it is nice to have while debugging nfs. How are you going to log into a
      box if you can’t ssh to the box?</para>

      <para>…And, during startup, I saw some services that I know I don’t
      want, and in my opinion, could be removed. You may add or remove others
      that suit your needs; just include the ones shown above.</para>

      <screen>chkconfig -del atd 
chkconfig -del rsh
chkconfig -del sendmail</screen>
    </sect2>

    <sect2>
      <title>SSH</title>

      <para>To be responsible, we make ssh work. While logged in as root, you
      must modify the /etc/ssh/sshd_config file. The lines:</para>

      <screen>#RSAAuthentication yes 
#AuthorizedKeysFile .ssh/authorized_keys</screen>

      <para>… are commented out, so uncomment them [remove the #].</para>

      <para>Reboot, and log back in as wolf, because the operation of your
      cluster will always be done from the user "wolf". Also, the hosts file
      modifications done earlier must take effect. Logging out and back in
      will not do this. To be sure, reboot the box, and make sure your prompt
      shows hostname "wolf00". </para>

      <para>To generate your public and private SSH keys, do this:</para>

      <screen>ssh-keygen -b 1024 -f ~/.ssh/id_rsa -t rsa -N “” </screen>

      <para>… and it will display a few messages, and tell you that it created
      the public / private key pair. You will see these files, id_rsa and
      id_rsa.pub, in the /home/wolf/.ssh directory.</para>

      <para>Copy the id_rsa.pub file into a file called “authorized_keys”
      right there in the .ssh directory. We will be using this file later.
      Verify that the contents of this file show the hostname [the reason we
      rebooted the box]. Modify the security on the files, and the
      directory:</para>

      <screen>chmod 644 ~/.ssh/auth* 
chmod 755 ~/.ssh </screen>

      <para>According to the LAM user group, only the head node needs to log
      on to the slave nodes; not the other way around. Therefore when we copy
      the public key files, we only copy the head node’s key file to each
      slave node, and set up the agent on the head node. This is MUCH easier
      than copying all authorized_keys files to all nodes. I will describe
      this in more detail later.</para>

      <para>Note: I only am documenting what the LAM distribution of the
      message passing interface requires; if you chose another MPI to build
      your cluster your requirements may differ.</para>

      <para>At the end of /home/wolf/.bash_profile, add the following
      statements [again this is lam-specific; your requirements may
      vary]:</para>

      <screen>export LAMRSH=’ssh -x’ 
ssh-agent sh -c ‘ssh-add &amp;&amp; bash’</screen>
    </sect2>

    <sect2>
      <title>MPI</title>

      <para>Lastly, put your message - passing interface on the box. As stated
      in 1.2 Requirements, I used lam. You can get lam from here:</para>

      <para><ulink url=" http://www.lam-mpi.org/">
      http://www.lam-mpi.org/</ulink></para>

      <para>...but you can use any other message passing interface or parallel
      virtual machine software you want. Again --- I am showing you what
      worked for me.</para>

      <para>You can either build LAM from the supplied source, or use their
      precompiled RPM package. It is not in the scope of this document to
      describe that - I just got the source and followed the directions, and
      in another experiment I installed their rpm; both of them worked fine.
      Remember the whole reason we are doing this is to learn - go forth and
      learn.</para>
    </sect2>
  </sect1>

  <sect1>
    <title>Set up slave nodes</title>

    <para>Get your network cables out. Install Linux on the first non-head
    node.</para>

    <sect2>
      <title>Base linux install</title>

      <para>Going with my example node names and IP addresses, this is what I
      chose during setup:</para>

      <screen>Workstation 
auto partition 
remove all partitions on system 
use LILO as the boot loader 
put boot loader on the MBR 
host name wolf01 
ip address 192.168.0.101 
add the user “wolf” 
same password as on all other nodes 
NO firewall</screen>

      <para>The ONLY package installed: network servers. UN select all other
      packages.</para>

      <para>It doesn’t matter what else you choose; this is the minimum of
      what you need. Why fill the box up with non-essential software you will
      never use? My research has been concentrated on finding that minimal
      configuration to get up and running.</para>

      <para>Here’s another very important point. When you move on to an
      automated install and config, you really will NEVER log in to the box.
      Only during setup and install do I type anything directly on the
      box.</para>
    </sect2>

    <sect2>
      <title>Hardware</title>

      <para>When the computer starts up, it will complain if it does not have
      a keyboard connected. I was not able to modify the BIOS, because I had
      older discarded boxes with no documentation, so I just connected a
      “fake” keyboard.</para>

      <para>I am in the computer industry, and see hundreds of keyboards come
      and go, and some occasionally end up in the garbage. I get the old dead
      keyboard out of the garbage, remove JUST the cord with the tiny circuit
      board up there in the corner, where the num lock and caps lock lights
      are. Then I plug the cord in, and the computer thinks it has a complete
      keyboard without incident.</para>

      <para>Again, you would be better off modifying your bios, if you are
      able to. This is just a trick to use in the case that you don’t have the
      bios program.</para>
    </sect2>

    <sect2>
      <title>Post install commands</title>

      <para>After your newly installed box reboots, log on as root again,
      and…</para>

      <itemizedlist>
        <listitem>
          <para>do the same chkconfig commands stated above to set up the
          right services.</para>
        </listitem>
      </itemizedlist>

      <itemizedlist>
        <listitem>
          <para>modify hosts; remove “wolfnn” from localhost, and just add
          wolfnn and wolf00.</para>
        </listitem>
      </itemizedlist>

      <itemizedlist>
        <listitem>
          <para>install lam</para>
        </listitem>
      </itemizedlist>

      <itemizedlist>
        <listitem>
          <para>create the /mnt/wolf directory and set up security for
          it.</para>
        </listitem>
      </itemizedlist>

      <itemizedlist>
        <listitem>
          <para>do the ssh configuration</para>
        </listitem>
      </itemizedlist>

      <para>Up to this point, we are pretty much the same as the head node. I
      do NOT do the modification of the exports file.</para>

      <para>Also, do NOT add this line to the .bash_profile:</para>

      <screen>sh -c ‘ssh-add &amp;&amp; bash’</screen>
    </sect2>

    <sect2>
      <title>SSH on slave nodes</title>

      <para>Recall that on the head node, we created a file “authorized_keys”.
      Copy that file, created on your head node, to the ~/.ssh directory on
      the slave nodes. The HEAD node will log on the all the SLAVE
      nodes.</para>

      <para>The requirement, as stated in the LAM user manual, is that there
      should be no interaction required when logging in from the head to any
      of the slaves. So, copying the public key from the head node into each
      slave node, in the file “authorized_keys”, tells each slave that “wolf
      user on wolf00 is allowed to log on here without any password; we know
      it is safe.”</para>

      <para>However you may recall that the documentation states that the
      first time you log on, it will ask for confirmation. So only once, after
      doing the above configuration, go back to the head node, and type ssh
      wolfnn where “wolfnn” is the name of your newly configured slave node.
      It will ask you for confirmation, and you simply answer “yes” to it, and
      that will be the last time you will have to interact.</para>

      <para>Prove it by logging off, and then ssh back to that node, and it
      should just immediately log you in, with no dialog whatsoever.</para>
    </sect2>

    <sect2>
      <title>NFS settings on slave nodes</title>

      <para>As root, enter these commands:</para>

      <screen>cat &gt;&gt; /etc/fstab 
wolf00:/mnt/wolf /mnt/wolf nfs rw,hard,intr 0 0 
&lt;control d&gt; </screen>

      <para>What we did here was automatically mount the exported directory we
      put in the /etc/exports file on the head node. More discussion regarding
      nfs later in this document.</para>
    </sect2>

    <sect2>
      <title>Lilo modifcations on slave nodes</title>

      <para>Then modify /etc/lilo.conf.</para>

      <para>The 2nd line of this file says</para>

      <screen>timeout=nn</screen>

      <para>Modify that line to say:</para>

      <screen>timeout=1200</screen>

      <para>After it is modified, we invoke the changes. You type
      "/sbin/lilo", and it will display back "added linux *" to confirm that
      it took the changes you made to the lilo.conf file:</para>

      <screen>/sbin/lilo
Added linux * </screen>

      <para>Why do I do this lilo modification? If you were researching
      Beowulf on the web, and understand everything I have done so far, you
      may wonder, “I don’t remember reading anything about lilo.conf.”</para>

      <para>All my Beowulf nodes share a single power strip. I turn on the
      power strip, and every box on the cluster starts up immediately. As the
      startup procedure progresses, it mounts file systems. Seeing that the
      non-head nodes mount the shared directory from the head node, they all
      will have to wait a little bit until the head node is up, with NFS ready
      to go. So I make each slave node wait 2 minutes in the lilo step.
      Meanwhile, the head node comes up, and making the shared directory
      available. By then, the slave nodes finally start booting up because
      lilo has waited 2 minutes.</para>
    </sect2>
  </sect1>

  <sect1>
    <title>Verification</title>

    <para>All done! You are almost ready to start wolfing.</para>

    <para>Reboot your boxes. Did they all come up? Can you ping the head node
    from each box? Can you ping each node from the head node? Can you telnet?
    Can you ssh? Don’t worry about doing ssh as root; only as wolf.</para>

    <para>If you are logged in as wolf, and ssh to a box, does it go
    automatically, without prompting for password?</para>

    <para>After the node boots up, log in as wolf, and say “mount”. Does it
    show wolf00:/mnt/wolf mounted? On the head node, copy a file into
    /mnt/wolf. Can you read and write that file from the node box?</para>

    <para>This is really not required; it is merely convenient to have a
    common directory reside on the head node. With a common shared directory,
    you can easily use scp to copy files between boxes. Sterling states in his
    book, on page 119, a single NFS server causes a serious obstacle to
    scaling up to large numbers of nodes. I learned this when I went from a
    small number of boxes up to a large number.</para>
  </sect1>

  <sect1>
    <title>Run a program</title>

    <para>Once you can do all the tests shown above, you should be able to run
    a program. From here on in, the instructions are lam specific.</para>

    <para>Go back to the head node, log in as wolf, and enter the following
    commands:</para>

    <screen>cat &gt; /nnt/wolf/lamhosts 
wolf01 
wolf02 
wolf03 
wolf04 
&lt;control d&gt;</screen>

    <para>Go to the lam examples directory, and compile “hello.c”:</para>

    <screen>mpicc -o hello hello.c 
cp hello /mnt/wolf </screen>

    <para>Then, as shown in the lam documentation, start up lam:</para>

    <screen>[wolf@wolf00 wolf]$ lamboot -v lamhosts 
LAM 7.0/MPI 2 C++/ROMIO - Indiana University 
n0&lt;2572&gt; ssi:boot:base:linear: booting n0 (wolf00) 
n0&lt;2572&gt; ssi:boot:base:linear: booting n1 (wolf01) 
n0&lt;2572&gt; ssi:boot:base:linear: booting n2 (wolf02) 
n0&lt;2572&gt; ssi:boot:base:linear: booting n3 (wolf04) 
n0&lt;2572&gt; ssi:boot:base:linear: finished</screen>

    <para>So we are now finally ready to run an app. [Remember, I am using
    lam; your message passing interface may have different syntax].</para>

    <screen>[wolf@wolf00 wolf]$ mpirun n0-3 /mnt/wolf/hello 
Hello, world! I am 0 of 4 
Hello, world! I am 3 of 4 
Hello, world! I am 2 of 4 
Hello, world! I am 1 of 4 
[wolf@wolf00 wolf]$</screen>

    <para>Recall I mentioned the use of NFS above. I am telling the nodes to
    all use the nfs shared directory, which will bottleneck when using a
    larger number of boxes. You could easily copy the executable to each box,
    and in the mpirun command, specify node local directories: mpirun n0-3
    /home/wolf/hello. The prerequisite for this is to have all the files
    available locally. In fact I have done this, and it worked better than
    using the nfs shared executable. Of course this theory breaks down if my
    cluster application needs to modify a file shared across the
    cluster.</para>
  </sect1>
</article>