139 lines
6.8 KiB
HTML
139 lines
6.8 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
|
|
<HTML>
|
|
<HEAD>
|
|
<META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
|
|
<TITLE>Brief Introduction to Alpha Systems and Processors: Alpha CPUs</TITLE>
|
|
<LINK HREF="Alpha-HOWTO-4.html" REL=next>
|
|
<LINK HREF="Alpha-HOWTO-2.html" REL=previous>
|
|
<LINK HREF="Alpha-HOWTO.html#toc3" REL=contents>
|
|
</HEAD>
|
|
<BODY>
|
|
<A HREF="Alpha-HOWTO-4.html">Next</A>
|
|
<A HREF="Alpha-HOWTO-2.html">Previous</A>
|
|
<A HREF="Alpha-HOWTO.html#toc3">Contents</A>
|
|
<HR>
|
|
<H2><A NAME="s3">3. Alpha CPUs</A></H2>
|
|
|
|
<P>There are currently 2 generations of CPU core that implement the Alpha
|
|
architecture:
|
|
<P>
|
|
<UL>
|
|
<LI> EV4</LI>
|
|
<LI> EV5</LI>
|
|
</UL>
|
|
<P>
|
|
<P>Opinions differ as to what "EV" stands for (Editor's note: the true
|
|
answer is of course "Electro Vlassic"
|
|
<A HREF="Alpha-HOWTO-12.html#ref1">[1]</A>), but the
|
|
number represents the first generation of Digital's CMOS technology
|
|
that the core was implemented in. So, the EV4 was originally
|
|
implemented in CMOS4. As time goes by, a CPU tends to get a mid-life
|
|
performance kick by being optically shrunk into the next generation of
|
|
CMOS process. EV45, then, is the EV4 core implemented in CMOS5
|
|
process. There is a big difference between shrinking a design into a
|
|
particular technology and implementing it from scratch in that
|
|
technology (but I don't want to go into that now). There are a few
|
|
other wildcards in here: there is also a CMOS4S (optical shrink in
|
|
CMOS4) and a CMOS5L.
|
|
<P>
|
|
<P>True technophiles will be interested to know that CMOS4 is a 0.75 micron
|
|
process, CMOS5 is a 0.5 micron process and CMOS6 is a 0.35 micron process.
|
|
<P>
|
|
<P>To map these CPU cores to <EM>chips</EM> we get:
|
|
<P>
|
|
<DL>
|
|
<DT><B>21064-150,166</B><DD><P>EV4 (originally), EV4S (now)
|
|
<DT><B>21064-200</B><DD><P>EV4S
|
|
<DT><B>21064A-233,275,300</B><DD><P>EV45
|
|
<DT><B>21066</B><DD><P>LCA4S (EV4 core, with EV4 FPU)
|
|
<DT><B>21066A-233</B><DD><P>LCA45 (EV4 core, but with EV45 FPU)
|
|
<DT><B>21164-233,300,333</B><DD><P>EV5
|
|
<DT><B>21164A-417</B><DD><P>EV56
|
|
<DT><B>21264</B><DD><P>
|
|
<A HREF="http://www.mdronline.com/report/articles/21264/21264.html">EV6</A></DL>
|
|
<P>
|
|
<P>
|
|
<P> The EV4 core is a dual-issue (it can issue 2 instructions per CPU
|
|
clock) superpipelined core with integer unit, floating point unit and
|
|
branch prediction. It is fully bypassed and has 64-bit internal data
|
|
paths and tightly coupled 8Kbyte caches, one each for Instruction and
|
|
Data. The caches are write-through (they never get dirty).
|
|
<P>
|
|
<P> The EV45 core has a couple of tweaks to the EV4 core: it has a
|
|
slightly improved floating point unit, and 16KB caches, one each for
|
|
Instruction and Data (it also has cache parity). (Editor's note: Neal
|
|
Crook indicated in a separate mail that the changes to the floating
|
|
point unit (FPU) improve the performance of the divider. The EV4 FPU
|
|
divider takes 34 cycles for a single-precision divide and 63 cycles
|
|
for a double-precision divide (non data-dependent). In constrast, the
|
|
EV45 divider takes typically 19 cycles (34 cycles max) for
|
|
single-precision and typically 29 cycles (63 cycles max) for a
|
|
double-precision division (data-dependent).)
|
|
<P>
|
|
<P> The EV5 core is a quad-issue core, also superpipelined, fully bypassed
|
|
etc etc. It has tightly-coupled 8Kbyte caches, one each for I and D. These
|
|
caches are write-through. It also has a tightly-coupled 96Kbyte on-chip
|
|
second-level cache (the Scache) which is 3-way set associative and write-back
|
|
(it can be dirty). The EV4->EV5 performance increase is better than just
|
|
the increase achieved by clock speed improvements. As well as the bigger
|
|
caches and quad issue, there are microarchitectural improvements to reduce
|
|
producer/consumer latencies in some paths.
|
|
<P>
|
|
<P> The EV56 core is fundamentally the same microarchitecture as the
|
|
EV5, but it adds some new instructions for 8 and 16-bit loads and
|
|
stores (see Section
|
|
<A HREF="Alpha-HOWTO-8.html#byte ld/st">Bytes and all that stuff</A>). These are primarily intended for use by device drivers. The
|
|
EV56 core is implemented in CMOS6, which is a 2.0V process.
|
|
<P>
|
|
<P> The 21064 was anounced in March 1992. It uses the EV4 core, with a 128-bit
|
|
bus interface. The bus interface supports the 'easy' connection of an external
|
|
second-level cache, with a block size of 256-bits (2 data beats on the
|
|
bus). The Bcache timing is completely software configurable. The 21064 can also
|
|
be configured to use a 64-bit external bus, (but I'm not sure if any shipping
|
|
system uses this mode). The 21064 does not impose any policy on the Bcache, but
|
|
it is usually configured as a write-back cache. The 21064 does contain hooks to
|
|
allow external hardware to maintain cache coherence with the Bcache and
|
|
internal caches, but this is hairy.
|
|
<P>
|
|
<P> The 21066 uses the EV4 core and integrates a memory controller and
|
|
PCI host bridge. To save pins, the memory controller has a 64-bit data
|
|
bus (but the internal caches have a block size of 256 bits, just like
|
|
the 21064, therefore a block fill takes 4 beats on the bus). The
|
|
memory controller supports an external Bcache and external DRAMs. The
|
|
timing of the Bcache and DRAMs is completely software configurable,
|
|
and can be controlled to the resolution of the CPU clock
|
|
period. Having a 4-beat process to fill a cache block isn't as bad as
|
|
it sounds because the DRAM access is done in page mode. Unfortunately,
|
|
the memory controller doesn't support any of the new esoteric DRAMs
|
|
(SDRAM, EDO or BEDO) or synchronous cache RAMs. The PCI bus interface
|
|
is fully rev2.0 compliant and runs at upto 33MHz.
|
|
<P>
|
|
<P> The 21164 has a 128-bit data bus and supports split reads, with
|
|
upto 2 reads outstanding at any time (this allows 100% data bus
|
|
utilisation under best-case dream-on conditions, i.e., you can
|
|
theoretically transfer 128-bits of data on every bus clock). The 21164
|
|
supports easy connection of an external 3-rd level cache (Bcache) and
|
|
has all the hooks to allow external systems to maintain full cache
|
|
coherence with all caches. Therefore, symmetric multiprocessor designs
|
|
are 'easy'.
|
|
<P>
|
|
<P> The 21164A was announced in October, 1995. It uses the EV56 core. It is
|
|
nominally pin-compatible with the 21164, but requires split power rails; all
|
|
of the power pins that were +3.3V power on the 21164 have now been split into
|
|
two groups; one group provided 2.0V power to the CPU core, the other group
|
|
supplies 3.3V to the I/O cells. Unlike older implementations, the 21164 pins
|
|
are not 5V-tolerant. The end result of this change is that 21164 systems are,
|
|
in general, not upgradeable to the 21164A (though note that it would be
|
|
relatively straightforward to design a 21164A system that could also
|
|
accommodate a 21164). The 21164A also has a couple of new pins to support
|
|
the new 8 and 16-bit loads and stores. It also improves the 21164 support for
|
|
using synchronus SRAMs to implement the external Bcache.
|
|
<P>
|
|
<P>
|
|
<HR>
|
|
<A HREF="Alpha-HOWTO-4.html">Next</A>
|
|
<A HREF="Alpha-HOWTO-2.html">Previous</A>
|
|
<A HREF="Alpha-HOWTO.html#toc3">Contents</A>
|
|
</BODY>
|
|
</HTML>
|