<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
<TITLE>Linux Parallel Processing HOWTO: Linux-Hosted Attached Processors</TITLE>
<LINK HREF="Parallel-Processing-HOWTO-6.html" REL=next>
<LINK HREF="Parallel-Processing-HOWTO-4.html" REL=previous>
<LINK HREF="Parallel-Processing-HOWTO.html#toc5" REL=contents>
</HEAD>
<BODY>
<A HREF="Parallel-Processing-HOWTO-6.html">Next</A>
<A HREF="Parallel-Processing-HOWTO-4.html">Previous</A>
<A HREF="Parallel-Processing-HOWTO.html#toc5">Contents</A>
<HR>
<H2><A NAME="s5">5. Linux-Hosted Attached Processors</A></H2>

<P>
<P>Although this approach has recently fallen out of favor, it is
virtually impossible for other parallel processing methods to match
the low cost and high performance that are possible when a Linux system
hosts an attached parallel computing system. The problem is that very
little software support is available; you are pretty much on your own.
<P>
<H2><A NAME="ss5.1">5.1 A Linux PC Is A Good Host</A>
</H2>

<P>
<P>In general, attached parallel processors tend to be specialized to
perform specific types of functions.
<P>Before becoming discouraged by the fact that you are somewhat on your
own, it is useful to understand that, although it may be difficult to
get a Linux PC to appropriately host a particular system, a Linux PC
is one of the few platforms well suited to this type of use.
<P>PCs make a good host for two primary reasons. The first is the cheap
and easy expansion capability; resources such as more memory, disks,
networks, etc., are trivially added to a PC. The second is the ease
of interfacing. Not only are ISA and PCI bus prototyping cards widely
available, but the parallel port offers reasonable performance in a
completely non-invasive interface. The IA32 separate I/O space also
facilitates interfacing by providing hardware I/O address protection
at the level of individual I/O port addresses.
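<P>To make the per-port protection concrete: on IA32 the permission check
consults a bitmap with one bit per I/O port address, and under Linux a
privileged process normally requests ports with the <CODE>ioperm()</CODE>
system call. The following Python sketch only models such a bitmap in
software and touches no hardware. The class and method names are
hypothetical, and the parallel port base address 0x378 mentioned in the
note after the code is the conventional PC value, assumed here purely for
illustration.

```python
"""Toy software model of a per-process I/O permission bitmap."""

IO_PORTS = 65536  # size of the IA32 I/O address space


class IOPermissionBitmap:
    """Hypothetical model: a set bit denies a port, a clear bit allows it."""

    def __init__(self):
        # Start with every port denied, as for an ordinary user process.
        self.deny = bytearray(b"\xff" * (IO_PORTS // 8))

    def allow(self, first, count):
        """Grant access to ports first .. first + count - 1."""
        if first < 0 or count < 0 or first + count > IO_PORTS:
            raise ValueError("port range outside the 64K I/O space")
        for port in range(first, first + count):
            self.deny[port // 8] &= ~(1 << (port % 8)) & 0xFF

    def permitted(self, port):
        """Return True if an access to this single port would be allowed."""
        if not 0 <= port < IO_PORTS:
            raise ValueError("port outside the 64K I/O space")
        return not (self.deny[port // 8] & (1 << (port % 8)))
```

<P>For example, after <CODE>allow(0x378, 3)</CODE> the model permits the
three ports 0x378 through 0x37A, while the neighboring ports 0x377 and
0x37B stay protected. That is the kind of fine-grained isolation that
lets a user-level program talk to one attached device without being able
to disturb anything else.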
<P>Linux also makes a good host OS. The free availability of full source
code, and extensive "hacking" guides, are obviously a tremendous help.
Beyond that, Linux provides good near-real-time scheduling, and there
is even a true real-time version of Linux at
<A HREF="http://luz.cs.nmt.edu/~rtlinux/">http://luz.cs.nmt.edu/~rtlinux/</A>. Perhaps even more important
is the fact that, while providing a full UNIX environment, Linux can
support development tools that were written to run under Microsoft DOS
and/or Windows. MS-DOS programs can execute within a Linux process
using <CODE>dosemu</CODE> to provide a protected virtual machine that can
literally run MS-DOS. Linux support for Windows 3.xx programs is even
more direct: free software such as <CODE>wine</CODE>,
<A HREF="http://www.linpro.no/wine/">http://www.linpro.no/wine/</A>, simulates Windows 3.11 well enough
for most programs to execute correctly and efficiently within a UNIX/X
environment.
<P>The following two sections give examples of attached parallel systems
that I'd like to see supported under Linux....
<P>
<H2><A NAME="ss5.2">5.2 Did You DSP That?</A>
</H2>

<P>
<P>There is a thriving market for high-performance DSP (Digital Signal
Processing) processors. Although these chips were generally designed
to be embedded in application-specific systems, they also make great
attached parallel computers. Why?
<P>
<UL>
<LI>Many of them, such as the Texas Instruments
(<A HREF="http://www.ti.com/">http://www.ti.com/</A>) TMS320 and the Analog Devices
(<A HREF="http://www.analog.com/">http://www.analog.com/</A>) SHARC DSP families, are designed so
that parallel machines can be constructed with little or no "glue" logic.
</LI>
<LI>They are cheap, especially per MIPS or MFLOPS. Including the cost
of basic support logic, it is not unheard of for a DSP processor to be
one tenth the cost of a PC processor with comparable performance.
</LI>
<LI>They do not use much power or generate much heat. This means
that it is possible to have a bunch of these chips powered by a
conventional PC's power supply - and enclosing them in your PC's case
will not turn it into an oven.
</LI>
<LI>There are strange-looking things in most DSP instruction sets
that high-level (e.g., C) compilers are unlikely to use well - for
example, "Bit Reverse Addressing." Using an attached parallel system,
it is possible to straightforwardly compile and run most code on the
host, while running the few most time-consuming algorithms on the DSPs
as carefully hand-tuned code.
</LI>
<LI>These DSP processors are not really designed to run a UNIX-like
OS, and generally are not very good as stand-alone general-purpose
computer processors. For example, many do not have memory management
hardware. In other words, they work best when hosted by a more
general-purpose machine... such as a Linux PC.</LI>
</UL>
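<P>As an illustration of the "Bit Reverse Addressing" item in the list
above: a radix-2 FFT wants its data in bit-reversed index order, and a DSP
address generator can produce those indices at no extra cost, whereas
ordinary host code has to compute them. The following Python sketch shows
what that addressing mode computes. The function names are hypothetical,
and this is a plain software model, not code for any particular DSP.

```python
def bit_reverse(index, bits):
    """Reverse the low `bits` bits of index (e.g. 0b001 -> 0b100 for bits=3)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the lowest bit across
        index >>= 1
    return result


def bit_reversed_order(data):
    """Return the items of data reordered by bit-reversed index.

    The length must be a power of two, as in a radix-2 FFT.
    """
    n = len(data)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    return [data[bit_reverse(i, bits)] for i in range(n)]
```

<P>For eight items, the indices 0 through 7 come out in the order
0, 4, 2, 6, 1, 5, 3, 7. A compiler for a high-level language is unlikely
to recognize a loop like the one above and map it onto the hardware
addressing mode, which is why such kernels tend to be hand-tuned.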
<P>Although some audio cards and modems include DSP processors that Linux
drivers can access, the big payoff comes from using an attached
parallel system that has four or more DSP processors.
<P>Because the Texas Instruments TMS320 series,
<A HREF="http://www.ti.com/sc/docs/dsps/dsphome.htm">http://www.ti.com/sc/docs/dsps/dsphome.htm</A>, has been very
popular for a long time, and it is trivial to construct a TMS320-based
parallel processor, there are quite a few such systems available.
There are both integer-only and floating-point capable versions of the
TMS320; older designs used a somewhat unusual single-precision
floating-point format, but the new models support IEEE formats. The
older TMS320C4x (aka 'C4x) achieves up to 80 MFLOPS using the
TI-specific single-precision floating-point format; in contrast, a
single 'C67x will provide up to 1 GFLOPS single-precision or 420
MFLOPS double-precision for IEEE floating-point calculations, using a
VLIW-based chip architecture called VelociTI. Not only is it easy to
configure a group of these chips as a multiprocessor, but the 'C8x is
itself a single-chip multiprocessor, providing a 100 MFLOPS IEEE
floating-point RISC master processor along with either two or four
integer slave DSPs.
<P>The other DSP processor family that has been used in more than a few
attached parallel systems lately is the SHARC (aka ADSP-2106x) from
Analog Devices,
<A HREF="http://www.analog.com/">http://www.analog.com/</A>. These chips can be
configured as a 6-processor shared memory multiprocessor without
external glue logic, and larger systems can also be configured using
the six 4-bit links on each chip. Most of the larger systems seem targeted to
military applications, and are a bit pricey. However, Integrated
Computing Engines, Inc.,
<A HREF="http://www.iced.com/">http://www.iced.com/</A>, makes an
interesting little two-board PCI card set called GreenICE. This unit
contains an array of 16 SHARC processors, and is capable of delivering
a peak speed of about 1.9 GFLOPS using a single-precision IEEE format.
GreenICE costs less than $5,000.
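<P>To put the GreenICE figures in perspective, the quoted numbers can be
turned into a per-processor rate and a price per MFLOPS. The short Python
sketch below does only that arithmetic. The constants are the peak and
price figures from the paragraph above (not measurements), and the
function names are hypothetical.

```python
# Figures quoted in the text; peak numbers, not measured results.
PROCESSORS = 16          # SHARC chips on the card set
PEAK_MFLOPS = 1900.0     # "about 1.9 GFLOPS", single precision
PRICE_USD = 5000.0       # upper bound: "less than $5,000"


def per_processor_mflops(total_mflops, processors):
    """Peak MFLOPS contributed by each processor."""
    return total_mflops / processors


def usd_per_mflops(price_usd, total_mflops):
    """Price per peak MFLOPS (an upper bound when the price is one)."""
    return price_usd / total_mflops
```

<P>With these inputs, each SHARC contributes roughly 119 MFLOPS of peak
speed, and the whole unit comes to at most about $2.63 per peak MFLOPS.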
<P>In my opinion, attached parallel DSPs really deserve a lot more
attention from the Linux parallel processing community....
<P>
<H2><A NAME="ss5.3">5.3 FPGAs And Reconfigurable Logic Computing</A>
</H2>

<P>
<P>If parallel processing is all about getting the highest speedup, then
why not build custom hardware? Well, we all know the answers: it
costs too much, takes too long to develop, becomes useless when we
change the algorithm even slightly, etc. However, recent advances in
electrically reprogrammable FPGAs (Field Programmable Gate Arrays)
have nullified most of those objections. The gate density is now
high enough that an entire simple processor can be built within a
single FPGA, and the time to reconfigure (reprogram) an FPGA has
dropped to a level where it is reasonable to reconfigure even
when moving from one phase of an algorithm to the next.
<P>This stuff is not for the faint of heart: you'll have to work with
hardware description languages like VHDL for the FPGA configuration, and
write low-level code to interface to programs on the Linux
host system. However, the cost of FPGAs is low, and especially for
algorithms operating on low-precision integer data (actually, a small
superset of the stuff SWAR is good at), FPGAs can perform complex
operations just about as fast as you can feed them data. For example,
simple FPGA-based systems have yielded better-than-supercomputer times
for searching gene databases.
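<P>To give a feel for the kind of low-precision integer work involved,
here is a software stand-in for a very simple sequence search: each DNA
base is packed into 2 bits, and mismatches are counted with XOR, shift,
and mask operations, which are exactly the sort of narrow logic that maps
well onto FPGA gates. This Python sketch is only an illustration under
assumed conventions (the 2-bit encoding and all function names are
hypothetical); it is not taken from any of the FPGA systems mentioned
here.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding per base


def pack(seq):
    """Pack a string of bases into one integer, first base most significant."""
    word = 0
    for base in seq:
        word = (word << 2) | CODE[base]
    return word


def mismatches(a, b, length):
    """Count differing bases between two packed sequences of equal length."""
    diff = a ^ b
    low_bits = (4 ** length - 1) // 3  # binary 0101...01, one bit per base
    # A base differs if either of its two bits differs.
    per_base = (diff | (diff >> 1)) & low_bits
    return bin(per_base).count("1")


def best_match(query, target):
    """Slide query along target; return (offset, mismatches) of the best fit."""
    n, total = len(query), len(target)
    if n == 0 or n > total:
        raise ValueError("query must be non-empty and no longer than target")
    q, t = pack(query), pack(target)
    window_mask = (1 << (2 * n)) - 1
    best = None
    for offset in range(total - n + 1):
        window = (t >> (2 * (total - n - offset))) & window_mask
        score = mismatches(q, window, n)
        if best is None or score < best[1]:
            best = (offset, score)
    return best
```

<P>In hardware, every window position could be compared in parallel
instead of in a loop, which is where the speed of an FPGA implementation
would come from.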
<P>There are other companies making appropriate FPGA-based hardware, but
the following two represent a good sample.
<P>Virtual Computer Company offers a variety of products using
dynamically reconfigurable SRAM-based Xilinx FPGAs. Their 8/16 bit
"Virtual ISA Proto Board"
<A HREF="http://www.vcc.com/products/isa.html">http://www.vcc.com/products/isa.html</A> costs less than $2,000.
<P>The Altera ARC-PCI (Altera Reconfigurable Computer, PCI bus),
<A HREF="http://www.altera.com/html/new/pressrel/pr_arc-pci.html">http://www.altera.com/html/new/pressrel/pr_arc-pci.html</A>,
is a similar type of card, but uses Altera FPGAs and a PCI bus
interface rather than ISA.
<P>Many of the design tools, hardware description languages, compilers,
routers, mappers, etc., come only as object code that runs under
Windows and/or DOS. You could simply keep a disk partition with
DOS/Windows on your host PC and reboot whenever you need to use them;
however, many of these software packages may work under Linux using
<CODE>dosemu</CODE> or Windows emulators like <CODE>wine</CODE>.
<P>
<HR>
<A HREF="Parallel-Processing-HOWTO-6.html">Next</A>
<A HREF="Parallel-Processing-HOWTO-4.html">Previous</A>
<A HREF="Parallel-Processing-HOWTO.html#toc5">Contents</A>
</BODY>
</HTML>