This commit is contained in:
gferg 2003-02-19 16:26:55 +00:00
parent 3dcbe47888
commit 2b5e0e5cab
1 changed files with 1641 additions and 1233 deletions

@ -42,7 +42,7 @@ CPU Design HOW-TO
<author>Al Dev (Alavoor Vasudevan)
<htmlurl url="mailto:alavoor[AT]yahoo.com"
name="alavoor[AT]yahoo.com">
<date>v12.3, 11 Nov 2002
<date>v12.5, 17 Feb 2003
<abstract>
CPU is the "brain" of the computer and a very vital component
of the computer system; it is like a "cousin brother" of the operating system
@ -656,7 +656,8 @@ Visit the following sites for Super Computers -
<itemize>
<item> Top 500 super computers <url url="http://www.top500.org/ORSC/2000">
<item> National Computing Facilities Foundation <url url="http://www.nwo.nl/ncf/indexeng.htm">
<item> Linux Super Computer Beowulf cluster <url url="http://www.linuxdoc.org/HOWTO/Beowulf-HOWTO.html">
<item> Linux Super Computer Beowulf cluster
<url url="http://www.tldp.org/HOWTO/Beowulf-HOWTO.html">
<item> Extreme machines - beowulf cluster <url url="http://www.xtreme-machines.com">
<item> System architecture description of
the Hitachi SR2201 <url url="http://www.hitachi.co.jp/Prod/comp/hpc/eng/sr1.html">
@ -912,6 +913,413 @@ being SM-MIMD machines also because special assisting
hardware/software (such as a directory memory) has been
incorporated to establish a single system image although
the memory is physically distributed.
<sect> Linux Super Computers
<p>
Supercomputers have traditionally been expensive, highly customized designs purchased by a select group of customers, but the industry is being overhauled by comparatively mainstream technologies such as Intel processors,
<url name="InfiniBand" url="http://news.com.com/2100-1001-966777.html"> high-speed
connections (see also <url name="Myricom" url="http://www.myricom.com">), and
<url name="Fibre Channel" url="http://www.fibrechannel.com"> storage networks that have
become fast enough to accomplish many tasks.
The new breed of supercomputer usually involves numerous two-processor servers bolted into racks and joined by special high-speed networks into a
<url name="cluster" url="http://news.com.com/2110-1001-966788.html">.
<url name="Linux Networx" url="http://www.linuxnetworx.com"> customers include Los
Alamos and Lawrence Livermore national laboratories for nuclear weapons research,
Boeing for aeronautic engineering, and Sequenom for genetics research.
About <url name="Clusterworx" url="http://www.linuxnetworx.com/products/clusterworx.php"> :
Clusterworx is the most complete administration tool for monitoring and
management of Linux-based cluster systems. Clusterworx increases system uptime, improves cluster efficiency, tracks cluster performance, and removes the hassle from cluster installation and configuration.
The primary features of Clusterworx include monitoring of system properties, integrated
disk cloning using multicast technology, and event management of node properties
through a remotely accessible, easy-to-use graphical user interface (GUI). Some of
the system properties monitored include CPU Usage, Memory Usage, Disk I/O, Network
Bandwidth, and many more. Additional custom properties can easily be monitored through
the use of user-specific plug-ins. Events automate system administration tasks by
setting thresholds on these properties and then taking default or custom actions
when these values are exceeded.
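The threshold-and-action idea can be illustrated with a small shell sketch
(this only illustrates the concept and is not Clusterworx code; the load
threshold and the alert action are arbitrary assumptions):
<code>
#!/bin/bash
# Poll the 1-minute load average and act when it crosses a threshold.
THRESHOLD=4                                  # assumed threshold, tune for your nodes
while true
do
    load=$(cut -d' ' -f1 /proc/loadavg)      # current 1-minute load average
    # compare as floating point using awk
    if awk -v l="$load" -v t="$THRESHOLD" 'BEGIN { exit !(l > t) }'
    then
        echo "$(hostname): load $load exceeded $THRESHOLD" | wall
    fi
    sleep 60
done
</code>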
About <url name="Myricom" url="http://www.myricom.com">:
Myrinet clusters are used for computationally demanding scientific and
engineering applications, and for data-intensive web and database applications. All
of the major OEM computer companies today offer cluster products. In
addition to direct sales, Myricom supplies Myrinet products and software to
IBM, HP, Compaq, Sun, NEC, SGI, Cray, and many other OEM and
system-integration companies. There are thousands of Myrinet clusters in
use world-wide, including several systems with more than 1000 processors.
<sect1> Little Linux SuperComputer In Your Garage
<p>
Imagine your garage filled with dozens of computers all linked together in a super-powerful Linux cluster. You still have to supply your own hardware, but the geek equivalent of a Mustang GT will become easier to set up and maintain, thanks to new software to be demonstrated at LinuxWorld next week.
The Open Source Cluster Application Resources (OSCAR) software, being developed by the
<url name="Open Cluster Group" url="http://www.OpenClusterGroup.org">, will allow a
non-expert Linux user to set up a cluster in a matter of hours, instead of the days of work it now can take an experienced network administrator to piece one together. Developers of OSCAR are saying it'll be as easy as installing most software. Call it a "supercomputer on a CD."
"We've actually taken it to the point where a typical high school kid who has a little bit of experience with Linux and can get their hands on a couple of extra boxes could set up a cluster at home," says Stephen L. Scott, project leader at the Oak Ridge National Laboratory, one of several organizations working on OSCAR. "You can have a little supercomputer in your garage."
Supercomputing in Linux:
From a <url name="step-by-step guide" url="http://www.pcquest.com/content/Supercomputer/102051001.asp">
on how to set up a cluster of PCQLinux machines for supercomputing,
by Shekhar Govindarajan, Friday, May 10, 2002.
To keep it simple, we start with a cluster of three machines. One will
be the server and the other two will be the nodes. However, plugging in
additional nodes is easy, and we will tell you the modifications needed
to accommodate them. Instead of two nodes, you can have a single node,
so even if you have only two PCs you can build a cluster. We suggest
that, before you get started, you go through the article Understanding
Clustering, page 42, which explains what a cluster is and what server
and nodes mean in a cluster.
*Set up server hardware*
You should have at least a 2 GB hard disk on the server. It should have
a graphics card that is supported by PCQLinux 7.1 and a floppy drive.
You also need to plug in two network cards, preferably the faster PCI
cards (rather than ISA) supported by PCQLinux.
Why two network cards? Adhering to the standards for cluster setups, if
the server node needs to be connected to an outside (external) network
(the Internet or your private network), the nodes in the cluster must
be on a separate network. This is needed if you want to remotely execute
programs on the server. If not, you can do away with a second network
card for the external network. For example, at PCQ Labs, we have our
regular machines plugged into the 192.168.1.0 network. We selected the
network 172.16.0.0 for the cluster nodes. Hence, on the server, one
network card (called the external interface) will be connected to the
Labs network and the other network card (the internal interface) will be
connected to a switch. We used a 100/10 Mbps switch. A 100 Mbps switch
is recommended because the faster the network, the faster the message
passing. All cluster nodes will also be connected to this
switch.
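If you want to verify the two cards and the addressing plan from the
command line before doing the permanent setup described later with
netcfg, a quick temporary test could look like this sketch (the
interface names eth0/eth1 and the sample external address are
assumptions; the addresses follow the plan above):
<code>
# check that both cards were detected by the kernel
dmesg | grep eth
# temporarily bring up the internal interface on the cluster network
ifconfig eth0 172.16.0.1 netmask 255.255.255.0 up
# temporarily bring up the external interface on the Labs network
ifconfig eth1 192.168.1.23 netmask 255.255.255.0 up
# confirm both interfaces are up with the right addresses
ifconfig
</code>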
*PCQLinux on server*
If you already have a machine with PCQLinux 7.1 installed, including the
X Window system (KDE or GNOME), you can use it as the server machine. In
this case you may skip the following installation steps. If this machine
has a firewall (ipchains or iptables) set up, remove all strict
restrictive rules, as they will hinder communication between the server
and the nodes. The 'medium' level of firewall rules in PCQLinux is
suitable. After the cluster is set up, you may selectively re-enable the
rules, if required.
If you haven't installed PCQLinux on the machine, opt for a custom
system install and manual partitioning. Create the swap and / (root)
partitions. If you are shown the 1024-cylinder limit problem, you may
also have to create a /boot partition of about 50 MB. In the network
configuration, fill in the hostname (say, server.cluster.net), the IP
address of the gateway/router on your network, and the IP of a DNS
server (if any) running on your network. Leave the other fields at their
defaults. We will set up the IP addresses for the network cards after
the installation. Select 'Medium' for the firewall configuration. We now
come to the package-selection wizard. You don't need to install all the
packages. Besides the packages selected by default, select the
'Development' and 'Kernel Development' packages. These provide various
libraries and header files for writing programs and are useful if you
will develop applications on the cluster. You will need the X Window
system because we will use a graphical tool for cluster setup and
configuration. By default, GNOME is selected as the window manager; if
you are more comfortable using KDE, select it instead. By suggesting
that you select only a few packages, we aim for a minimal installation.
However, if you wish to install other packages, such as your favorite
text editor, network-management utilities or a Web server, you can
select them. Make sure that you set up your graphics card and monitor
correctly.
After the installation finishes, reboot into PCQLinux. Log in as root.
*Set up OSCAR*
Mount this month's CD and copy the file oscar-1.2.1.tar.gz from the
directory system/cdrom/unltdlinux/linux on the CD to /root. Uncompress
and extract the archive as:
tar -zxvf oscar-1.2.1.tar.gz
This will extract the files into a directory named oscar-1.2.1 within
the /root directory.
OSCAR installs Linux on the nodes from the server across the network.
For this, it constructs an image file from RPM packages. This image file
is in turn picked up by the nodes to install PCQLinux onto them. The
OSCAR version we've given on the CD is customized for RedHat 7.1. Though
PCQLinux 7.1 is also based on RedHat 7.1, some RPMs shipped with
PCQLinux are more recent versions than the ones required by OSCAR. OSCAR
constructs the image from a list of RPMs specified in sample.rpmlist in
the subdirectory oscarsamples in oscar-1.2.1. You have to replace this
file with one customized for the PCQLinux RPMs. We have given a file
named sample.rpmlist on this month's CD in the directory
system/cdrom/unltdlinux/linux. Overwrite the file sample.rpmlist in the
oscarsamples directory with this file.
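Assuming the month's CD is mounted at /mnt/cdrom and the directory
layout named above, the copy could be done as in this sketch (the exact
path on the CD is an assumption; adjust it to where the file actually
is, and keep a backup of the original list):
<code>
cd /root/oscar-1.2.1/oscarsamples
cp sample.rpmlist sample.rpmlist.orig        # back up the RedHat 7.1 list
cp /mnt/cdrom/unltdlinux/linux/sample.rpmlist .
</code>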
<code>
*Copy PCQLinux RPMs to /tftpboot/rpm*
For creating the image, OSCAR will look for the PCQLinux RPMs in the
directory /tftpboot/rpm. Create a directory /tftpboot and a subdirectory
named rpm within it:
mkdir /tftpboot
mkdir /tftpboot/rpm
Next, copy all the PCQLinux RPMs from both the CDs to the /tftpboot/rpm
directory. Insert CD 1 (PCQLinux CD 1, given with our July 2001 issue)
and issue the following commands:
mount /mnt/cdrom
cd /mnt/cdrom/RedHat/RPMS
cp *.rpm /tftpboot/rpm
cd
umount /mnt/cdrom
Insert CD 2 (given with the July 2001 issue) and issue the above
commands again.
Note: if you are tight on disk space, you don't need to copy all the
RPMs to /tftpboot/rpm. You can copy only the RPMs listed in the
sample.rpmlist file, as the script in the next step does.
*Copy required RPMs*
Type the following in a Linux text editor and save the file as copyrpms.sh:

#!/bin/bash
# Copy only the RPMs named in sample.rpmlist from the CD to /tftpboot/rpm
rpms_path="/mnt/cdrom/RedHat/RPMS/"
rpms_list="/root/oscar-1.2.1/oscarsamples/sample.rpmlist"
mount /mnt/cdrom
while read line
do
    # try each architecture suffix in turn
    file="$rpms_path$line.i386.rpm"
    if [ -f $file ]; then
        cp $file /tftpboot/rpm
    else
        file="$rpms_path$line.noarch.rpm"
        if [ -f $file ]; then
            cp $file /tftpboot/rpm
        else
            file="$rpms_path$line.i586.rpm"
            if [ -f $file ]; then
                cp $file /tftpboot/rpm
            else
                file="$rpms_path$line.i686.rpm"
                if [ -f $file ]; then
                    cp $file /tftpboot/rpm
                fi
            fi
        fi
    fi
done &lt; $rpms_list
eject
Give executable permissions to the file as:
chmod +x copyrpms.sh
Assuming that you have created the directory /tftpboot/rpm, insert
PCQLinux CD 1 (don't mount it; the script mounts it for you) and issue:
./copyrpms.sh
When all the RPMs from the CD are copied, the CD drive will eject. Next,
insert CD 2 and issue ./copyrpms.sh again.
*Fix glitch in PCQLinux*
On this month's CD we have carried the zlib
RPM 'zlib-1.1.3-22.i386.rpm', which you can find in the directory
system/cdrom/unltdlinux/linux on the CD. (We had given this on our July
CD as well, but the file was corrupt.) Install the RPM as:
rpm -ivh zlib-1.1.3-22.i386.rpm
Copy this file to the /tftpboot/rpm directory. This will prompt you to
overwrite the corrupted zlib RPM already in the directory. Go for it.
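# A sketch of the copy step described above (the mount point and the CD
# directory are assumptions; adjust them to where the zlib RPM is):
cp /mnt/cdrom/unltdlinux/linux/zlib-1.1.3-22.i386.rpm /tftpboot/rpm
# answer 'y' when cp asks to overwrite the corrupted copy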
*Set up networking*
Linux names network cards or interfaces eth0, eth1, eth2 and so on. In
our case eth0 is the internal interface and eth1 is the external
interface. We assign eth0 an IP address of 172.16.0.1. Since we are
running a DHCP server on the PCQ Labs network, we will set eth1 to
obtain its IP address from the DHCP server. If you are using a single
network card for the cluster network, skip setting up the second card.
Launch X Window. Open a terminal window within GNOME or KDE and issue
the command netcfg. This will pop up a graphical network configurator.
Click on the Interfaces tab. To set up the internal interface, click on
eth0 and then on Edit. For the IP address, enter 172.16.0.1 and for the
netmask enter 255.255.255.0. Click on 'Activate interface at boot time'.
For 'Interface configuration protocol' select 'none' from the drop-down
list.
To set up the external interface, select eth1 and click on Edit. If you
are running a DHCP server, select dhcp from the drop-down list.
Otherwise, enter a free IP address (say, 192.168.1.23) and the
associated netmask (say, 255.255.255.0), and select none from the
drop-down list. In either case, make sure to click on 'Activate
interface at boot time'.
Highlight eth0 and click on the 'Activate' button. Do the same for eth1.
Finally, click on Save and quit the configurator.
Issue the command ifconfig to check whether the network interfaces are
up and have been given the correct IP addresses.
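# If you prefer not to use the netcfg GUI, the same settings can go
# directly into the Red Hat-style interface files (a sketch; the values
# simply follow the addressing plan described above):
#
# /etc/sysconfig/network-scripts/ifcfg-eth0 (internal interface):
#     DEVICE=eth0
#     IPADDR=172.16.0.1
#     NETMASK=255.255.255.0
#     ONBOOT=yes
#     BOOTPROTO=none
#
# /etc/sysconfig/network-scripts/ifcfg-eth1 (external interface, DHCP):
#     DEVICE=eth1
#     ONBOOT=yes
#     BOOTPROTO=dhcp
#
# After editing the files, restart networking and verify:
/etc/rc.d/init.d/network restart
ifconfig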
You are now ready to start OSCAR.
*Run OSCAR*
In the terminal window, change to the oscar-1.2.1 directory and issue
the command:
./install_cluster eth0
Replace eth0 with the name of the internal interface in your case. You
will see text flowing in the window. After a couple of minutes, the
graphical wizard of OSCAR will pop up. The OSCAR installation refers to
cluster nodes as 'clients'.
*Build image from RPMs*
Click on 'Build Oscar Client Image'. We assume that all the node
machines have IDE hard disks. If you are using SCSI hard disks in the
nodes, you need to change the Disk Partition File; refer to the OSCAR
installation documentation on the CD. When finished, a message
'Successfully created image oscarimage' will pop up.
*Tell OSCAR about the nodes*
Click on the button 'Define OSCAR clients'. Here you should see the
domain name, starting IP and subnet mask pre-filled with cluster.net,
172.16.0.2 and 255.255.255.0. With 'Number of hosts' you specify the
number of nodes. As per the OSCAR documentation, OSCAR supports up to
100 nodes, possibly more, but it has not been tested with an arbitrarily
large number of nodes. In our case we fill in two. If you are
experimenting with two machines, one server and one node, fill in one.
In OSCAR, once you define the number of nodes you cannot change it after
the cluster is installed; you have to start again from the beginning,
i.e. from the step where we issued 'install_cluster'.
Note: if for any reason you need to start again, before issuing
./install_cluster, execute the script named start_over located in the
scripts subdirectory as:
/root/oscar-1.2.1/scripts/start_over
Clicking on the 'Add clients' button will show 'Successfully created
clients' after a couple of seconds.
*Set up the nodes*
Before carrying out the subsequent steps in the OSCAR installation,
connect the network cards of the node machines to the switch and, in
their BIOS, set them to boot from floppy.
*Set up nodes to network*
We come back to the OSCAR installation wizard running on the server
machine. Click on the button 'Set up Networking'. In the right frame you
will see a tree-like structure as shown in the screenshot. In our case,
the two nodes are given the hostnames oscarnode1.cluster.net and
oscarnode2.cluster.net. They are assigned the IP addresses 172.16.0.2
and 172.16.0.3 respectively. Next, we assign the MAC (Media Access
Control) addresses of the nodes to the listed IP addresses. This can be
done by booting the nodes using a floppy created by OSCAR or by
network-booting them. For the latter, refer to the OSCAR documentation
given on the CD.
Click on the button 'Build AutoInstall Floppy'. This will pop up a
terminal window. Insert a blank floppy in the server and press 'y' to
continue. After the terminal window disappears, click on the button
'Collect MAC addresses' in the OSCAR window. Insert the floppy in one of
the node machines and power it on. The machine will boot from the
floppy. Press Enter at the boot: prompt. After some time, the MAC
address of the node will show up in the left frame. Suppose we want to
assign the IP address 172.16.0.2 to this node. Click on the MAC address
in the left frame and on 'oscarnode1.cluster.net' in the right frame.
Then click on 'Assign MAC to node'.
*Assign IP addresses to the nodes of the cluster*
Switch off the node machine. Now boot the second node machine from the
same floppy. As before, the MAC address of the second node will appear
in the left frame. Assign it to oscarnode2.cluster.net.
If you want to plug in more node machines, repeat the above process for
them. When done, click on the button 'Stop collecting' in the OSCAR
window.
After shutting down all the node machines, click on the button
'Configure DHCP Server'. Then click on the close button in the 'MAC
address collection' window.
*PCQLinux on the nodes*
Next, boot the first node machine again from the floppy. This time the
node machine will install PCQLinux 7.1 from the network. When done, a
message like the following will be shown:
I have done for ' seconds. Reboot me already
Take out the floppy and reboot the node machine. This time it should
boot from the hard disk. If everything has gone well, you will boot into
PCQLinux 7.1. While booting, PCQLinux will detect hardware such as the
mouse, graphics card and sound card on the node and prompt you to set
it up.
*Problem: No active partition*
If you are shown an error during booting that says there is no active
partition, boot from a Windows bootable floppy or CD. Launch fdisk and
select option 2 (Set active partition). Set partition 1, of type non-DOS
and about 31 MB in size, as active. This is the /boot partition, where
the kernel boot image resides.
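# If you would rather fix this from Linux (for example after booting the
# node from a rescue disk) than from a DOS/Windows fdisk, the bootable
# flag can be toggled with Linux fdisk. A sketch; the device name
# /dev/hda assumes the node's first IDE disk:
fdisk /dev/hda
#   a        (toggle the bootable flag)
#   1        (partition number of the /boot partition)
#   w        (write the partition table and exit)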
*Test networking of nodes*
On the server, open another terminal window and issue:
/root/oscar-1.2.1/scripts/ping_clients
If there is no problem with the networking, you will be shown 'All
clients responded'. Otherwise, check whether all the nodes are powered
on and look for defects in the network cables, hub/switch ports, etc.
From now on, ideally, you don't need to work physically on the node
machines, so you can unplug the monitor, keyboard, mouse, etc from them.
If a node machine does need to be accessed and worked on, use SSH
(Secure Shell), which is similar to telnet but secure, to access it from
the server.
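# For example, to work on the first node from the server without a
# monitor or keyboard attached to it (a sketch, using the hostname
# assigned earlier and assuming sshd is running on the node):
ssh oscarnode1.cluster.net
# or run a single command remotely and return:
ssh oscarnode1.cluster.net uptime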
*All done*
Click on 'Complete Cluster Setup' and then on 'Test Cluster Setup'.
This will pop up a terminal window and prompt you to enter a non-root
username. Enter, say, 'shekhar'. If the user account does not exist on
the server machine, it will be created and you will be prompted for a
password for the new account. Click on the 'Quit' button in the OSCAR
window. Reboot the server machine.
*Test the cluster*
To test the cluster, log in as the user that you created above (shekhar
in our case) and issue:
cd OSCAR_test
./test_cluster
Enter the number of nodes when prompted (two in our case). For the
number of processors on each client, enter 1 (assuming uniprocessor
machines). The test verifies that PBS is running and runs example
programs coded with the LAM, MPICH and PVM libraries by dispatching them
through PBS to the nodes. You can see pbs_mom (see Understanding
Clustering, page 42) running on the nodes by issuing the command
'ps -e | grep pbs_mom' on the nodes.
If there are no error messages in the output, congratulations: you have
your supercomputer up and running. Our cluster setup qualifies as a
Beowulf cluster because it has been built from easily available hardware
and free, open-source software, the /home directory on the server is
exported to all the nodes via NFS (you can check this by issuing the
command 'mount' on the nodes), and the server and nodes can execute
commands and scripts remotely on each other via SSH.
Using the libraries installed on the cluster, you can start developing
or executing cluster-aware applications on the server. The compilers
(such as gcc and g++) are the same as those shipped with PCQLinux.
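# Once the cluster passes the tests, your own programs can be dispatched
# to the nodes through PBS just as the test programs were. A minimal PBS
# job script sketch (the program name 'myprog' and the resource counts
# are hypothetical; mpirun is the MPI launcher installed with LAM/MPICH):

#!/bin/bash
#PBS -N myjob                  # job name shown by qstat
#PBS -l nodes=2:ppn=1          # ask for two nodes, one processor each
cd $PBS_O_WORKDIR              # start in the directory the job was submitted from
mpirun -np 2 ./myprog          # run the (hypothetical) MPI program on 2 processors

# Save the above as myjob.sh, submit it with:
#   qsub myjob.sh
# and check its progress with:
#   qstat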
Shekhar Govindarajan
</code>
<!--
*******************************************
************ End of Section ***************