<item> QED RISC 64-bit and MIPS CPUs: <url url="http://www.qedinc.com/about.htm">
<item> Origin 2000 CPU <url url="http://techpubs.sgi.com/library/manuals/3000/007-3511-001/html/O2000Tuning.1.html">
<item> NVAX CPUs <url url="http://www.research.compaq.com/wrl/DECarchives/DTJ/DTJ700"> and at <url name="mirror-site" url="http://www.digital.com/info/DTJ700">
<item> Univ. of Michigan High-performance GaAs Microprocessor Project <url url="http://www.eecs.umich.edu/UMichMP">
<item> CPU Info Center - lists of CPUs (SPARC, ARM, etc.) <url url="http://bwrc.eecs.berkeley.edu/CIC/tech">
<item> The main CPU site is the Google directory category "Computers > Hardware > Components > Microprocessors" <url url="http://directory.google.com/Top/Computers/Hardware/Components/Microprocessors">
</itemize>
Other important CPU sites are:
<itemize>
<item> World-wide 24-hour news on CPUs <url url="http://www.newsnow.co.uk/cgi/NewsNow/NewsLink.htm?Theme=Processors">
<item> The computer architecture site is at <url url="http://www.cs.wisc.edu/~arch/www">
<item> ARM CPU <url url="http://www.arm.com/Documentation">
<item> Great CPUs <url url="http://www.cs.uregina.ca/~bayko/cpu.html">
<item> Intel - How the Microprocessors work <url url="http://www.intel.com/education/mpuworks">
<item> Simple course in Microprocessors <url url="http://www.hkrmicro.com/course/micro.html">
</itemize>
<!--
*******************************************
************ End of Section ***************
*******************************************
-->
<sect1> How Transistors work <label id="trans">
<p>
Microprocessors are essential to many of the products we use every day, such as TVs, cars, radios, home appliances and, of course, computers. Transistors are the main components of microprocessors.
At their most basic level, transistors may seem simple. But their development actually required many years of painstaking research. Before transistors, computers relied on slow, inefficient vacuum tubes and mechanical switches to process information. In 1958, engineers (one of them Intel founder Robert Noyce) managed to put two transistors onto a silicon crystal and create the first integrated circuit that led to the microprocessor.
Transistors are miniature electronic switches. They are the building blocks of the microprocessor which is the brain of the computer.
Similar to a basic light switch, transistors have two operating positions, on and off. This on/off, or binary functionality of transistors enables the processing of information in a computer.
<bf>How a simple electronic switch works: </bf>
<p>
The only information computers understand is electrical signals that are switched on and off. To comprehend transistors, it is necessary to understand how a switched electronic circuit works.
Switched electronic circuits consist of several parts. One is the circuit pathway where the electrical current flows - typically through a wire. Another is the switch, a device that starts and stops the flow of electrical current by either completing or breaking the circuit's pathway.
Transistors have no moving parts and are turned on and off by electrical signals. The on/off switching of transistors facilitates the work performed by microprocessors.
<!--
*******************************************
************ End of Section ***************
*******************************************
-->
<sect1> How a Transistor handles information<label id="transinfo">
<p>
Something that has only two states, like a transistor, can be referred to as binary. The transistor's on state is represented by a 1 and the off state by a 0. Specific sequences and patterns of 1's and 0's generated by multiple transistors can represent letters, numbers, colors and graphics. This is known as binary notation.
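As a small illustration, the shell can interpret such a pattern of 1's and 0's directly. The following one-liner is a sketch that assumes bash, whose arithmetic expansion understands the base#value notation:
<code>
```shell
# Treat a pattern of on/off transistor states as a binary number.
# The 2# prefix tells bash to read the digits in base 2.
echo $((2#01001010))   # prints 74, the ASCII code for the letter 'J'
```
</code>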
<!--
*******************************************
************ End of Section ***************
*******************************************
-->
<sect1> Displaying binary information <label id="bininfo">
<p>
<bf> Spell your name in Binary: </bf>
<p>
Each character of the alphabet has a binary equivalent. Below is the name JOHN and its equivalent in binary.
<code>
J 0100 1010
O 0100 1111
H 0100 1000
N 0100 1110
</code>
More complex information, such as graphics, audio and video, can be represented using the same binary on/off action of transistors.
Scroll down to the Binary Chart below to see the complete alphabet in binary.
<!-- Put a line space after &lowbar below to avoid space between.... -->
<table loc=p>
<tabular ca="rll">
Character <colsep>Binary <colsep>Character <colsep>Binary <rowsep><hline>
A <colsep> 0100 0001 <colsep> N <colsep> 0100 1110 <rowsep>
B <colsep> 0100 0010 <colsep> O <colsep> 0100 1111 <rowsep>
C <colsep> 0100 0011 <colsep> P <colsep> 0101 0000 <rowsep>
D <colsep> 0100 0100 <colsep> Q <colsep> 0101 0001 <rowsep>
E <colsep> 0100 0101 <colsep> R <colsep> 0101 0010 <rowsep>
F <colsep> 0100 0110 <colsep> S <colsep> 0101 0011 <rowsep>
G <colsep> 0100 0111 <colsep> T <colsep> 0101 0100 <rowsep>
H <colsep> 0100 1000 <colsep> U <colsep> 0101 0101 <rowsep>
I <colsep> 0100 1001 <colsep> V <colsep> 0101 0110 <rowsep>
J <colsep> 0100 1010 <colsep> W <colsep> 0101 0111 <rowsep>
K <colsep> 0100 1011 <colsep> X <colsep> 0101 1000 <rowsep>
L <colsep> 0100 1100 <colsep> Y <colsep> 0101 1001 <rowsep>
M <colsep> 0100 1101 <colsep> Z <colsep> 0101 1010 <rowsep>
</tabular>
<caption><bf>Binary Chart for the Alphabet</bf></caption>
</table>
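The chart above can be reproduced from the shell. The script below is a small sketch assuming bash (the ${name:$i:1} substring syntax and arithmetic expansion are bash features):
<code>
```shell
#!/bin/bash
# Print each character of a name with its 8-bit binary code.
name="JOHN"
for ((i = 0; i < ${#name}; i++)); do
    char="${name:$i:1}"
    ascii=$(printf '%d' "'$char")      # ASCII code of the character
    bits=""
    for ((b = 7; b >= 0; b--)); do     # extract bits from high to low
        bits+=$(( (ascii >> b) & 1 ))
    done
    echo "$char $bits"                 # e.g. "J 01001010"
done
```
</code>
Running it prints the same codes as the JOHN example and the chart above.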
<!--
*******************************************
************ End of Section ***************
*******************************************
-->
<sect1> What is a Semiconductor? <label id="semicon">
<p>
Conductors and insulators:
Many materials, such as most metals, allow electrical current to flow through them. These are known as conductors. Materials that do not allow electrical current to flow through them are called insulators. Pure silicon, the base material of most transistors, is considered a semiconductor because its conductivity can be modulated by the introduction of impurities.
<!--
*******************************************
************ End of Section ***************
*******************************************
-->
<sect2> Anatomy of a Transistor <label id="anatomy">
<p>
Semiconductors and the flow of electricity:
Adding certain types of impurities to the silicon in a transistor changes its crystalline structure and enhances its ability to conduct electricity. Silicon containing boron impurities is called p-type silicon - p for positive, or lacking electrons. Silicon containing phosphorus impurities is called n-type silicon - n for negative, or having a majority of free electrons.
<!--
*******************************************
************ End of Section ***************
*******************************************
-->
<sect2> A Working Transistor <label id="worktrans">
<p>
A working transistor - the on/off state of a transistor:
Transistors consist of three terminals: the source, the gate and the drain.
In the n-type transistor, both the source and the drain are negatively-charged and sit on a positively-charged well of p-silicon.
When positive voltage is applied to the gate, electrons in the p-silicon are attracted to the area under the gate forming an electron channel between the source and the drain.
When positive voltage is applied to the drain, the electrons are pulled from the source to the drain. In this state the transistor is on.
If the voltage at the gate is removed, electrons aren't attracted to the area between the source and drain. The pathway is broken and the transistor is turned off.
<!--
*******************************************
************ End of Section ***************
*******************************************
-->
<sect2> Impact of Transistors <label id="impact">
<p>
The Impact of Transistors - How microprocessors affect our lives.
The binary function of transistors gives microprocessors the ability to perform many tasks, from simple word processing to video editing. Microprocessors have evolved to a point where transistors can execute hundreds of millions of instructions per second on a single chip.
Automobiles, medical devices, televisions, computers and even the Space Shuttle use microprocessors. They all rely on the flow of binary information made possible by the transistor.
<!--
*******************************************
************ End of Section ***************
*******************************************
<chapt> CPU Design and Architecture <label id="cpudesign">
-->
<sect> CPU Design and Architecture <label id="cpudesign">
<p>
<!--
*******************************************
************ End of Section ***************
*******************************************
-->
<sect1> CPU Design <label id="cpudesign">
<p>
Visit the following links for information on CPU Design.
<itemize>
<item> Hamburg University VHDL archive <url url="http://tech-www.informatik.uni-hamburg.de/vhdl">
<item> The Hitachi SR2201 <url url="http://www.hitachi.co.jp/Prod/comp/hpc/eng/sr1.html">
<item> Personal Parallel Supercomputers <url url="http://www.checs.net/checs_98/papers/super">
</itemize>
<!--
*******************************************
************ End of Section ***************
*******************************************
-->
<sect1> Main Architectural Classes
<p>
Before going on to the descriptions of the machines themselves, it is
important to consider some mechanisms that are or have been used to
increase the performance. The hardware structure or architecture
determines to a large extent what the possibilities and impossibilities
are in speeding up a computer system beyond the performance of a single
CPU. Another important factor that is considered in combination with
the hardware is the capability of compilers to generate efficient code
to be executed on the given hardware platform. In many cases it is hard
to distinguish between hardware and software influences and one has to be
careful in the interpretation of results when ascribing certain effects
to hardware or software peculiarities, or to both. In this chapter we
will give the most emphasis to the hardware architecture of machines
that can be classified as "high-performance".
For many years the taxonomy of Flynn has proven to be useful for
the classification of high-performance computers. This classification
is based on the way instruction and data streams are manipulated and
comprises four main architectural classes. We will first briefly sketch
these classes and afterwards fill in some details when each of the
classes is described.
<!--
*******************************************
************ End of Section ***************
*******************************************
-->
<sect1> SISD machines
<p>
These are the conventional systems that contain one CPU
and hence can accommodate one instruction stream that is executed serially.
Nowadays many large mainframes may have more than one CPU, but each of
these executes an unrelated instruction stream. Therefore, such
systems should still be regarded as (a collection of) SISD machines acting
on different data spaces. Examples of SISD machines are most
workstations, like those of DEC, Hewlett-Packard, and Sun
Microsystems. The definition of SISD machines is given here for
completeness' sake; we will not discuss this type of machine
in this report.
<!--
*******************************************
************ End of Section ***************
*******************************************
-->
<sect1> SIMD machines
<p>
Such systems often have a large number of processing
units, ranging from 1,024 to 16,384, that all may execute the same
instruction on different data in lock-step. So, a single instruction
manipulates many data items in parallel. Examples of SIMD machines
in this class are the CPP DAP Gamma II and the Alenia Quadrics.
Another subclass of the SIMD systems are the vector processors.
Vector processors act on arrays of similar data rather than on single
data items, using specially structured CPUs. When data can be manipulated
by these vector units, results can be delivered at a rate of one,
two and --- in special cases --- three per clock cycle (a clock
cycle being defined as the basic internal unit of time for the system).
So, vector processors execute on their data in an almost parallel way,
but only when executing in vector mode. In this case they are several
times faster than when executing in conventional scalar mode. For
practical purposes vector processors are therefore mostly regarded
as SIMD machines. An example of such a system is
the Hitachi S3600.
<!--
*******************************************
************ End of Section ***************
*******************************************
-->
<sect1> MISD machines
<p>
Theoretically, in this type of machine multiple
instructions should act on a single stream of data. As yet no
practical machine in this class has been constructed, nor are
such systems easy to conceive. We will disregard them in the
following discussions.
<!--
*******************************************
************ End of Section ***************
*******************************************
-->
<sect1> MIMD machines
<p>
These machines execute several instruction
streams in parallel on different data. The difference with the
multi-processor SISD machines mentioned above lies in the fact that
the instructions and data are related because they represent different
parts of the same task to be executed. So, MIMD systems may run
many sub-tasks in parallel in order to shorten the time-to-solution
for the main task to be executed. There is a large variety of
MIMD systems, and especially in this class the Flynn taxonomy proves
not fully adequate for the classification of systems. Systems
that behave very differently, like a four-processor NEC SX-5 and a
thousand-processor SGI/Cray T3E, both fall into this class. In the
following we will make another important distinction between classes
of systems and treat them accordingly.
<!--
*******************************************
************ End of Section ***************
*******************************************
-->
<sect2> Shared memory systems
<p>
Shared memory systems have multiple CPUs all
of which share the same address space. This means that the knowledge
of where data is stored is of no concern to the user as there is only
one memory accessed by all CPUs on an equal basis. Shared memory
systems can be either SIMD or MIMD. Single-CPU vector processors can be
regarded as an example of the former, while the multi-CPU models of
these machines are examples of the latter. We will sometimes use the
abbreviations SM-SIMD and SM-MIMD for the two subclasses.
<!--
*******************************************
************ End of Section ***************
*******************************************
-->
<sect2> Distributed memory systems
<p>
In this case each CPU has its own
associated memory. The CPUs are connected by some network and may
exchange data between their respective memories when required. In
contrast to shared memory machines the user must be aware of the
location of the data in the local memories and will have to move
or distribute these data explicitly when needed. Again, distributed
memory systems may be either SIMD or MIMD. The first class of
SIMD systems mentioned which operate in lock step, all have distributed
memories associated to the processors. As we will see,
distributed-memory MIMD systems exhibit a large variety in the
topology of their connecting network. The details of this topology
are largely hidden from the user which is quite helpful with
respect to portability of applications. For the distributed-memory
systems we will sometimes use DM-SIMD and DM-MIMD to indicate
the two subclasses.
Although the difference between shared- and distributed-memory
machines seems clear-cut, this is not always entirely the case
from the user's point of view. For instance, the late Kendall
Square Research systems employed the idea of "virtual shared memory"
on a hardware level. Virtual shared memory can also be simulated
at the programming level: A specification of High Performance
Fortran (HPF) was published in 1993 which by means of
compiler directives distributes the data over the
available processors. Therefore, the system on which HPF is
implemented in this case will look like a shared memory machine
to the user. Other vendors of Massively Parallel Processing
systems (sometimes called MPP systems), like HP
and SGI/Cray,
also are able to support proprietary virtual shared-memory programming models due to
the fact that these physically distributed memory systems are able to address
the whole collective address space. So, for
the user such systems have one global address space spanning all of
the memory in
the system. We will say a little more about
the structure of such systems in
the ccNUMA section. In addition, packages like TreadMarks
provide a virtual shared memory environment for networks of workstations.
<!--
*******************************************
************ End of Section ***************
*******************************************
-->
<sect1> Distributed Processing Systems
<p>
Another trend that has come up in
the last few years is distributed processing. This takes
the DM-MIMD concept one step further: instead of many integrated
processors in one or several boxes, workstations, mainframes, etc.,
are connected by (Gigabit) Ethernet, FDDI, or otherwise,
and set to work concurrently on tasks in the same program.
Conceptually, this is not different from DM-MIMD computing, but
the communication between processors is often orders
of magnitude slower. Many packages to realise distributed
computing are available. Examples of these are PVM (standing
for Parallel Virtual Machine) and MPI (Message Passing Interface).
This style of programming, called the "message passing" model,
has become so widely accepted that PVM and MPI have been adopted
by virtually all major vendors of distributed-memory MIMD systems,
and are even supported on shared-memory MIMD systems for
compatibility reasons. In addition
there is a tendency to cluster shared-memory systems,
for instance by HiPPI channels, to obtain systems
with a very high computational power. E.g.,
the NEC SX-5,
and
the SGI/Cray SV1 have this structure. So, within
the clustered nodes a shared-memory programming style can be
used while between clusters message-passing should be used.
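The flavour of this model can be sketched even at the shell level: independent workers run concurrently, and the parent waits for all of them before combining the partial results. This only illustrates the concurrency idea; real PVM or MPI programs exchange messages between cooperating processes:
<code>
```shell
#!/bin/bash
# Illustration only: four concurrent "workers" each compute a partial
# result; the parent waits for all of them and combines the parts.
workdir=$(mktemp -d)
for worker in 1 2 3 4; do
    ( echo $(( worker * worker )) > "$workdir/part.$worker" ) &
done
wait                                  # like a barrier in message passing
total=0
for f in "$workdir"/part.*; do
    total=$(( total + $(cat "$f") ))  # combine the partial results
done
echo "$total"                         # prints 30 (1 + 4 + 9 + 16)
rm -r "$workdir"
```
</code>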
<!--
*******************************************
************ End of Section ***************
*******************************************
-->
<sect1> ccNUMA machines
<p>
As already mentioned in the introduction, a trend can be
observed to build systems that have a rather small (up to 16)
number of RISC processors that are tightly integrated in
a cluster, a Symmetric Multi-Processing (SMP) node. The
processors in such a node are virtually always connected
by a 1-stage crossbar while these clusters are connected by a
less costly network.
This is similar to the policy mentioned for large
vectorprocessor ensembles mentioned above but with the important
difference that all of the processors can access all of the
address space. Therefore, such systems can be considered as
SM-MIMD machines. On the other hand, because the memory is
physically distributed, it cannot be guaranteed that a
data access operation always will be satisfied within the same
time. Therefore such machines are called ccNUMA
systems where ccNUMA stands for Cache Coherent Non-Uniform Memory
Access. The term "Cache Coherent" refers
to the fact that for all CPUs any variable that is to be used
must have a consistent value. Therefore, it must be assured
that the caches that provide these variables are also consistent
in this respect. There are various ways to ensure that the
caches of the CPUs are coherent. One is the snoopy bus
protocol, in which the caches listen in on transport of variables to
any of the CPUs and update their own copies of these
variables if they have them. Another way is the directory memory,
a special part of memory which makes it possible to keep track of all
copies of variables and of their validity.
For all practical purposes we can classify these systems as
being SM-MIMD machines also because special assisting
hardware/software (such as a directory memory) has been
incorporated to establish a single system image although
the memory is physically distributed.
<sect> Linux Super Computers
<p>
Supercomputers have traditionally been expensive, highly customized designs purchased by a select group of customers, but the industry is being overhauled by comparatively mainstream technologies such as Intel processors and the Linux operating system.
<url name="Linux Networx" url="http://www.linuxnetworx.com"> customers include Los
Alamos and Lawrence Livermore national laboratories for nuclear weapons research,
Boeing for aeronautic engineering, and Sequenom for genetics research.
About <url name="Clusterworx" url="http://www.linuxnetworx.com/products/clusterworx.php">:
Clusterworx is the most complete administration tool for monitoring and
management of Linux-based cluster systems. Clusterworx increases system uptime, improves cluster efficiency, tracks cluster performance, and removes the hassle from cluster installation and configuration.
The primary features of Clusterworx include monitoring of system properties, integrated
disk cloning using multicast technology, and event management of node properties
through a remotely accessible, easy-to-use graphical user interface (GUI). Some of
the system properties monitored include CPU Usage, Memory Usage, Disk I/O, Network
Bandwidth, and many more. Additional custom properties can easily be monitored through
the use of user-specific plug-ins. Events automate system administration tasks by
setting thresholds on these properties and then taking default or custom actions
when these values are exceeded.
About <url name="Myricom" url="http://www.myricom.com">:
Myrinet clusters are used for computationally demanding scientific and
engineering applications, and for data-intensive web and database applications. All
of the major OEM computer companies today offer cluster products. In
addition to direct sales, Myricom supplies Myrinet products and software to
IBM, HP, Compaq, Sun, NEC, SGI, Cray, and many other OEM and
system-integration companies. There are thousands of Myrinet clusters in
use world-wide, including several systems with more than 1000 processors.
<sect1> Little Linux SuperComputer In Your Garage
<p>
Imagine your garage filled with dozens of computers all linked together in a super-powerful Linux cluster. You still have to supply your own hardware, but the geek equivalent of a Mustang GT will become easier to set up and maintain, thanks to new software to be demonstrated at LinuxWorld next week.
The Open Source Cluster Applications Resources (OSCAR) software, being developed by the
<url name="Open Cluster Group" url="http://www.OpenClusterGroup.org">, will allow a
non-expert Linux user to set up a cluster in a matter of hours, instead of the days of work it now can take an experienced network administrator to piece one together. Developers of OSCAR are saying it'll be as easy as installing most software. Call it a "supercomputer on a CD."
"We've actually taken it to the point where a typical high school kid who has a little bit of experience with Linux and can get their hands on a couple of extra boxes could set up a cluster at home," says Stephen L. Scott, project leader at the Oak Ridge National Laboratory, one of several organizations working on OSCAR. "You can have a little supercomputer in your garage."
Supercomputing in Linux:
From a <url name="step-by-step guide" url="http://www.pcquest.com/content/Supercomputer/102051001.asp">
on how to set up a cluster of PCQLinux machines for supercomputing,
by Shekhar Govindarajan, Friday, May 10, 2002.
To keep it simple, we start with a cluster of three machines. One will
be the server and the other two will be the nodes. However, plugging in
additional nodes is easy and we will tell you the modification to
accommodate additional nodes. Instead of two nodes, you can have a
single node. So, even if you have two PCs, you can build a cluster. We
suggest that you go through the article Understanding Clustering, page
42, which explains what a cluster is and what server and nodes mean in a
cluster before you get started.
<bf>Set up server hardware:</bf>
<p>
You should have at least a 2 GB or bigger hard disk on the server. It
should have a graphics card that is supported by PCQLinux 7.1 and a
floppy drive. You also need to plug in two network cards preferably the
faster PCI cards instead of ISA supported by PCQLinux.
Why two network cards? Adhering to the standards for cluster setups, if
the server node needs to be connected to an outside (external)
network, such as the Internet or your private network, the nodes in the cluster must
be on a separate network. This is needed if you want to remotely execute
programs on the server. If not, you can do away with a second network
card for the external network. For example, at PCQ Labs, we have our
regular machines plugged into the 192.168.1.0 network. We selected the
network 172.16.0.0 for the cluster nodes. Hence, on the server, one
network card (called external interface) will be connected to the Labs
network and the other network card (internal interface) will be
connected to a switch. We used a 100/10 Mbps switch. A 100 Mbps switch
is recommended because the faster the speed of the network, the faster
is the message passing. All cluster nodes will also be connected to this
switch.
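On a RedHat-derived distribution such as PCQLinux, static addresses for the two cards are typically set through ifcfg files. The fragment below is only a sketch: the device name eth1 and the address are assumptions matching the example networks chosen above.
<code>
```shell
# /etc/sysconfig/network-scripts/ifcfg-eth1 -- internal cluster interface
# (hypothetical example; adjust DEVICE and IPADDR for your own setup)
DEVICE=eth1
BOOTPROTO=static
IPADDR=172.16.0.1
NETMASK=255.255.0.0
ONBOOT=yes
```
</code>
The external interface (say, eth0 on the 192.168.1.0 network) gets a matching ifcfg-eth0 file.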
<bf>PCQLinux on server:</bf>
<p>
If you already have a machine with PCQLinux 7.1, including the X Window
system (KDE or GNOME), installed, you can use it as the server machine. In this
case you may skip the following steps for installation. If this machine
has a firewall (ipchains or iptables) setup, remove all strict
restrictive rules, as it will hinder communication between the server
and the nodes. The 'medium' level of firewall rules in PCQLinux is
suitable. After the cluster set up, you may selectively enable the
rules, if required.
If you haven't installed PCQLinux on the machine, opt for custom system
install and manual partitioning. Create the swap and / (ROOT)
partitions. If you are shown the 1024 cylinder limit problem, you may
also have to create a /boot partition of about 50 MB. In the network
configuration, fill in the hostname (say, server.cluster.net), the IP
address of the gateway/router on your network, and the IP of a DNS
server (if any) running on your network. Leave the other fields at their
defaults. We will set up the IP addresses for the network cards after the
installation. Select 'Medium' for the firewall configuration. We now
come to the package-selection wizard. You don't need to install all the
packages. Besides the packages selected by default, select 'Development'
and 'Kernel Development' packages. These provide various libraries and
header files for writing programs and are useful if you will develop
applications on the cluster. You will need the X Window system because
we will use a graphical tool for cluster set up and configuration. By
default, GNOME is selected as the Window Manager. If you are comfortable
using KDE, select it instead. By suggesting that you select only a few
packages for install, we aim at a minimal installation. However, if you
wish to install other packages like your favorite text editor, network
management utilities or a Web server, then you can select them. Make
sure that you set up your graphics card and monitor correctly.
After the installation finishes, reboot into PCQLinux. Log in as root.
<bf>Set up OSCAR:</bf>
<p>
Mount this month's CD and copy the file oscar-1.2.1.tar.gz from the
directory system/cdrom/unltdlinux/linux on the CD to /root. Uncompress
and extract the archive as:
<code>
tar -zxvf oscar-1.2.1.tar.gz
</code>
This will extract the files into a directory named oscar-1.2.1 within
the /root directory.
OSCAR installs Linux on the nodes from the server across the network.
For this, it constructs an image file from RPM packages. This image file
is in turn picked up by the nodes to install PCQLinux onto them. The
OSCAR version we've given on the CD is customized for RedHat 7.1. Though
PCQLinux 7.1 is also based on RedHat 7.1, some RPMs with PCQLinux are of
more recent versions than the ones required by OSCAR. OSCAR constructs
the image out of a list of RPMs specified in sample.rpmlist in the
subdirectory oscarsamples in oscar-1.2.1. You have to replace this file
with the one customized for PCQLinux RPMs. We have given a file named
sample.rpmlist on this month's CD in the directory
system/cdrom/unltdlinux/linux. Overwrite the file sample.rpmlist in the
oscarsamples directory with this file.
<bf>Copy PCQLinux RPMs to /tftpboot/rpm:</bf>
<p>
For creating the image, OSCAR will look for the PCQLinux RPMs in the
directory /tftpboot/rpm. Create the directory /tftpboot and a subdirectory
named rpm within it:
<code>
mkdir /tftpboot
mkdir /tftpboot/rpm
</code>
Next, copy all the PCQLinux RPMs from both CDs to the /tftpboot/rpm
directory. Insert CD 1 (PCQLinux CD 1, given with our July 2001 issue)
and issue the following commands:
<code>
mount /mnt/cdrom
cd /mnt/cdrom/RedHat/RPMS
cp *.rpm /tftpboot/rpm
cd
umount /mnt/cdrom
Insert CD 2 (given with the July 2001 issue) and issue the above
commands again.
Note: If you are tight on disk space, you don't need to copy all the
RPMs to /tftpboot/rpm. You can copy only the RPMs listed in the
sample.rpmlist file.
<bf>Copy required RPMs:</bf>
<p>
Type the following in a Linux text editor and save the file as copyrpms.sh: