<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>Linux Benchmarking - Article III - Interpreting benchmark results: Benchmarking vs. benchmarketing</TITLE>
<META NAME="GENERATOR" CONTENT="Mozilla/3.01Gold (X11; I; Linux 2.0.29 i686) [Netscape]">
</HEAD>
<BODY>

<P><A HREF="./Article3e-1.html"><IMG SRC="./gx/balsa/prev.gif" ALT="Previous" HEIGHT=16 WIDTH=16></A>
<A HREF="./Article3e-3.html"><IMG SRC="./gx/balsa/next.gif" ALT="Next" HEIGHT=16 WIDTH=16></A>
<A HREF="./Article3e.html#toc2"><IMG SRC="./gx/balsa/toc.gif" ALT="Contents" HEIGHT=16 WIDTH=16></A>
<HR></P>
<H2><A NAME="s2"></A>2. Benchmarking vs. benchmarketing</H2>

<P>There are two basic approaches to benchmarking in the field of computing:
the "scientific" or quantitative approach and the "benchmarketing"
approach. Both approaches use exactly the same tools, albeit with slightly
different methodologies and, of course, with widely diverging objectives
and results.</P>

<H2><A NAME="ss2.1"></A>2.1 The scientific/quantitative approach</H2>

<P>The first approach is to think of benchmarking as a tool for experimentation.
Benchmarking can be treated as a branch of experimental Computer Science:
it produces numbers which can then be mathematically processed and analyzed,
and this analysis is later used to draw relevant conclusions about CPU
architectures, compiler design, etc.</P>

<P>As with any scientific activity, experiments (benchmark runs and reporting)
must follow some basic guidelines or rules: </P>

<UL>
<LI>A good dose of modesty/humility (don't be too ambitious to begin with)
and common sense.</LI>
<LI>No bias or prejudice.</LI>
<LI>A clearly stated objective related to advancing the state of the art.</LI>
<LI>Reproducibility.</LI>
<LI>Accuracy.</LI>
<LI>Relevance.</LI>
<LI>Correct logical/statistical inference.</LI>
<LI>Conciseness.</LI>
<LI>Sharing of information.</LI>
<LI>Quoting sources/references.</LI>
</UL>

<P>Of course, this is an idealized view of the scientific community, but
these are some of the basic rules of the experimental method in all branches
of Science.</P>

<P>I should stress that benchmarking results in <B>documented quantitative
data</B>.</P>

<P>The correct procedure for GNU/Linux benchmarking under this approach
is: </P>

<OL>
<LI>Decide on what is the issue that is going to be investigated. <B>It
is very important to execute this step before anything else gets started</B>.
Stating clearly what we are going to investigate is getting half the work
done.</LI>
<LI>Also note that we are not out to prove anything: we must start with
a clean, <B>Zen-like mind</B>. This is particularly difficult for us, GNU/Linux
benchmarkers, since we are all <B>utterly convinced</B> that: </LI>
<OL>
<LI>GNU/Linux is the best OS in the universe (what "best" means
in this context is not clear, however; probably the same as "coolest"),</LI>
<LI>Wide-SCSI-3 is better than plain EIDE (idem),</LI>
<LI><A HREF="http://www.digital.com/semiconductor/alpha/alpha-chips.html">Digital's
64-bit RISC Alpha</A> is the best platform around for GNU/Linux development
(idem), and</LI>
<LI>X Window is a good, modern GUI (no comments). </LI>
</OL>
<LI>After purifying our minds and souls ;-), we will have to select the
tools (i.e. the benchmarks) which will be used for our benchmarking experiments.
You can take a look at my previous article for a selection of GPLed benchmarks.
Another way to get the right tool for the job at hand is to devise and
implement your own benchmark. This approach takes a lot more time and energy,
and sometimes amounts to reinventing the wheel. Creativity being one of
the nicest features of the GNU/Linux world, writing a new benchmark is
recommended nonetheless, especially in the areas where such tools are sorely
lacking (graphics, 3D, multimedia, etc.). <B>Summarizing, selecting the appropriate
tool for the job is very important</B>.</LI>
<LI>Now comes the painstaking part: gathering the data. This takes huge
amounts of patience and attention to detail. See my two previous articles.</LI>
<LI>And finally we reach the stage of data analysis and logical inference
based on the data we gathered. This is also where one can spoil
everything by joining the Dark Side of the Force (see section 2.2 below).
Quoting Andrew Tanenbaum: "Figures don't lie, but liars figure".</LI>
<LI>If relevant conclusions can be drawn, publishing them on the appropriate
mailing lists, newsgroups or in the <B><A HREF="http://www.linuxgazette.com">Linux
Gazette</A></B> is in order. Again this is very much a Zen attitude (known
as "coming back to the village").</LI>
<LI>Just when you thought it was over and you could finally close the cabinet
of your computer after having disassembled it more times than you can count,
you get a sympathetic email that mentions a small but fundamental flaw
in your entire benchmarking procedure. And you begin to understand that
benchmarking is an iterative process, much like self-improvement...</LI>
</OL>
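<P>The data-gathering and inference steps above can be sketched as a tiny
harness: time the same workload several times and publish the spread, not
a single lucky number. This is purely an illustrative sketch of the method;
the function names, run count and toy workload below are my own, not taken
from any of the benchmarks discussed in these articles:</P>

```python
import statistics
import time

def run_benchmark(workload, runs=5):
    """Time `workload` several times and return the raw samples (seconds).

    Repeated runs are what make a result reproducible and allow honest
    statistical inference instead of a single lucky number.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return samples

def summarize(samples):
    """The figures a careful benchmarker should publish together."""
    return {
        "runs": len(samples),
        "min": min(samples),
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }

if __name__ == "__main__":
    # Toy workload: sum the first million integers.
    print(summarize(run_benchmark(lambda: sum(range(1000000)))))
```

<P>Publishing min/mean/stdev together, rather than a lone figure, is what
lets a reader judge whether your numbers are stable measurements or noise.</P>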
<H2><A NAME="ss2.2"></A>2.2 The benchmarketing approach</H2>

<P>This second approach is more popular than the first one, as it serves
commercial purposes and attracts more subsidies (i.e. grants, sponsorship,
money, cash, dinero, l'argent, $). Benchmarketing
has one basic objective: to prove that equipment/software A
is better (faster, more powerful, better performing, or with a better price/performance
ratio) than equipment/software B. The basic inspiration for this approach
is the Greek philosophical current known as Sophism. Sophistry has had
its adherents in all times and ages, but the Greeks made it into a veritable
art. Benchmarketers have continued this tradition with varying success
(also note that the first Sophists were lawyers <A HREF="Article3e-7.html#sophism">(1)</A>;
see my comment on Intel below). Of course, with this approach there is no
hope of spiritual redemption... Quoting Larry Wall (of Perl fame), as often
quoted by David Niemi:</P>

<P><I>"Down that path lies madness. On the other hand the road to
Hell is paved with melting snowballs."</I></P>

<P>Benchmarketing results cover the entire range from outright lies to
subtle fallacies. Sometimes an excessive amount of data is involved, and
in other cases no quantitative data at all is provided; in both cases the
task of proving benchmarketing wrong is made more arduous.</P>

<H3>A short history of benchmarketing/CPU developments</H3>

<P>We already saw that the first widely used benchmark, Whetstone, originated
as the result of research into computer architecture and compiler design.
So the original Whetstone benchmark can be traced to the "scientific
approach".</P>

<P>At the time Whetstone was written, computers were indeed rare and very
expensive, and the fact that they executed tasks impossible for human beings
was enough to justify their purchase by large organizations. </P>

<P>Very soon competition changed this. Foremost among the early drivers
of benchmarketing was the need to justify the purchase of very expensive
mainframes (at the time called supercomputers; these early machines would
not even match my &lt; $900 AMD K6 box). This gave rise to a good number
of now obsolete benchmarks, as of course each different architecture needed
a new algorithm to justify its existence in commercial terms.</P>

<P>This <B>supercomputer market issue </B>is still not over, but two factors
contributed to its relative decline:</P>

<OL>
<LI>Jack Dongarra's effort to standardize the LINPACK benchmark as the
basic tool for supercomputer benchmarking. This was not entirely successful,
as specific "optimizers" were created to make LINPACK run faster
on some CPU architectures (note that unless you are trying to solve large
scientific problems involving matrix operations - the usual task assigned
to most supercomputers - LINPACK is not a good measure of the CPU performance
of your GNU/Linux box; anyway, you can find a version of LINPACK ported
to C on Al Aburto's excellent <A HREF="ftp://ftp.nosc.mil/pub/aburto">FTP
site</A>).</LI>
<LI>The appearance of very fast and cheap superminis, and later microprocessors,
and the widespread use of networking technologies. These changed the idea
of a centralized computing facility and signaled the end of the supercomputer
for most applications. Also, modern supercomputers are nowadays built with
arrays of microprocessors (notably, the latest Cray machines are built
using up to 2048 Alpha processors), so there was a shift in focus.</LI>
</OL>
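<P>As an aside, a LINPACK-style rating is simply a floating-point operation
count divided by wall-clock time. The toy kernel below (a naive matrix
multiply in plain Python, <I>not</I> the real LINPACK, which solves a dense
linear system) shows the arithmetic, and also why such a figure exercises
only floating-point and memory behavior rather than overall CPU performance:</P>

```python
import time

def matmul_flops(n):
    """Multiply two n x n matrices naively and estimate MFLOPS.

    The kernel performs 2*n^3 floating-point operations (one multiply
    and one add per inner step), the same accounting convention used
    for matrix benchmarks such as LINPACK.
    """
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    c = [[0.0] * n for _ in range(n)]
    start = time.perf_counter()
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            for j in range(n):
                c[i][j] += aik * b[k][j]
    elapsed = time.perf_counter() - start
    return c, (2.0 * n ** 3) / elapsed / 1e6

if __name__ == "__main__":
    _, mflops = matmul_flops(100)
    print("%.1f MFLOPS (naive Python kernel)" % mflops)
```

<P>A machine that shines on this kernel may still be mediocre at integer
work, I/O or context switching - which is exactly the objection to using
LINPACK figures as a general performance measure.</P>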

<P>Next in line was the <B>workstation market </B>issue. A nice side-effect
of the various marketing initiatives on the part of some competitors (HP,
Sun, IBM and SGI, among others) is that they spawned the development of various
Unix benchmarks that we can now use to benchmark our GNU/Linux boxes!</P>

<P>In parallel to the workstation market development, we saw fierce competition
develop in the <B>microprocessor market</B>, with each manufacturer touting
its architecture as the "superior" design. In terms of microprocessor
architecture, an interesting development was the performance issue of CISC
against RISC designs. In market terms, the dominating architecture is Intel's
x86 CISC design (cf. Computer Architecture: A Quantitative Approach, Hennessy
and Patterson, 2nd edition; there is an excellent 25-page appendix on
the x86 architecture). </P>

<P>Recently the demonstrably better-performing Alpha RISC architecture
was almost wiped out by Intel lawyers: as a settlement of a complex legal
battle over patent infringements, Intel bought Digital's microelectronics
operation (which also produced the StrongARM <A HREF="Article3e-7.html#arm">(2)
</A>and Digital's highly successful line of Ethernet chips). Note however
that Digital kept its Alpha design team, and the settlement includes the
possibility for Digital to have present and future Alpha chips manufactured
by Intel.</P>

<P>The x86 market attracted <A HREF="http://www.intel.com">Intel </A>competitors
<A HREF="http://www.amd.com/products/cpg/cpg.html">AMD </A>and, more recently,
<A HREF="http://www.cyrix.com">Cyrix</A>, which created original x86 designs.
AMD also bought a small startup called NexGen, which designed the precursor
to the K6, and Cyrix had to grow under the umbrella of IBM and now <A HREF="http://www.national.com">National
Semiconductor</A>, but that's another story altogether. Intel is still the
market leader, holding some 90% of the microprocessor market, even though
both the AMD K6 and Cyrix 6x86MX architectures provide better Linux performance/MHz
than Intel's best effort to date, the Pentium II (except for floating-point
operations).</P>

<P>Lastly, we have the <B>OS market</B> issue. The <A HREF="http://www.microsoft.com">Microsoft
</A>Windows (R) line of OSes is the overwhelming market leader as far as
desktop applications are concerned, but in terms of performance/security/stability/flexibility
it sometimes does not compare well with other OSes. Of course, inter-OS
benchmarking is a risky business and OS designers are aware of that.</P>

<P>Besides, comparing GNU/Linux to other OSes using benchmarks is almost
always an exercise in futility: GNU/Linux is GPLed, whereas no other OS
can be said to be <I>free</I> (in the GNU/GPL sense). Can you compare something
that is <I>free</I> to something that is proprietary <A HREF="Article3e-7.html#freedom">(3)</A>?
Does benchmarketing apply to something that is <I>free</I>?</P>

<P>Comparing GNU/Linux to other OSes is also a good way to start a nice
flame war on comp.os.linux.advocacy, especially when GNU/Linux is compared
to the BSD Unices or Windows NT. Most debaters don't seem to realize that
each OS had different design objectives!</P>

<P>These debates usually reach a steady state when both sides are convinced
that they are "right" and that their opponents are "wrong".
Sometimes benchmarking data is called in to prove or disprove an argument.
But even then we see that this has more to do with benchmarketing than
with benchmarking. My $0.02 of advice: <B>avoid such debates like the plague</B>.</P>

<H3>Turning benchmarking into benchmarketing</H3>

<P>The <A HREF="http://www.specbench.org">SPEC95 </A>CPU benchmark suite
(the CPU Integer and FP tests, which SPEC calls CINT95/CFP95) is an example
of a promising Jedi that succumbed to the Dark Side of the Force ;-).</P>

<P>SPEC (<B>S</B>tandard <B>P</B>erformance <B>E</B>valuation <B>C</B>orporation)
originated as a non-profit corporation with the explicit objective of creating
a vendor-independent, objective, non-biased, industry-wide CPU benchmark
suite. Founding members were some universities and various CPU and systems
manufacturers, such as Intel, HP, Digital, IBM and Motorola.</P>

<P>However, some technical and philosophical issues have developed for
historical reasons that make SPEC95 inadequate for Linux benchmarking:
</P>

<OL>
<LI><B>Cost</B>. Strangely enough, SPEC95 benchmarks are free but you have
to pay for them: last time I checked, the CINT95/CFP95 cost was $600. The
quarterly newsletter was $550. These sums correspond to "administrative
costs", according to SPEC.</LI>
<LI><B>Licensing</B>. SPEC benchmarks are not placed under the GPL. In fact,
SPEC95 has a severely limiting license that makes it inadequate for GNU/Linux
users. The license is clearly geared to large corporations/organizations:
you almost need a full-time employee just to handle all the requisites
specified in the license, you cannot freely reproduce the sources, new
releases come every three years, etc...</LI>
<LI><B>Outright cheating</B>. Recently, a California court ordered a major
microprocessor manufacturer to pay back $50 for each processor sold of
a given speed and model, because the manufacturer had distorted SPEC results
with a modified version of gcc, and used such results in its advertisements.
Benchmarketing seems to have backfired on this occasion.</LI>
<LI><B>Comparability</B>. Hennessy and Patterson (see reference above)
clearly identify the technical limitations of SPEC92. Basically these have
to do with each vendor optimizing benchmark runs for their specific purposes.
Even though SPEC95 was released as an update that would work around these
limitations, it does not (and cannot, in practical terms) satisfactorily
address this issue. Compiler flag issues in SPEC92 prompted SPEC to release
a 10-page document entitled "SPEC Run and Reporting Rules for CPU95
Suites". It clearly shows how confident SPEC is that nobody will try
to circumvent specific CPU shortcomings with tailor-made compilers/optimizers!
Unfortunately, SPEC98 is likely to carry these problems over to the next
generation of CPU performance measurements.</LI>
<LI><B>Run time</B>. Last but not least, the SPEC95 benchmarks take about
2 days to run on the SPARC reference machine. Note that this in no way
makes them more accurate than other CPU benchmarks that run in &lt; 5 minutes
(e.g. nbench-byte, presented below)!</LI>
</OL>
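<P>The run-time point can be made quantitative: what matters is not how
long one run takes, but how tightly repeated runs agree, which the
coefficient of variation (stdev divided by mean) captures. A minimal
sketch - the timings below are made-up illustration values, not measured
results:</P>

```python
import statistics

def coefficient_of_variation(samples):
    """Relative spread of repeated benchmark runs: stdev / mean.

    A small value means the benchmark is repeatable, which is what
    'accuracy' actually requires - not a two-day run time.
    """
    return statistics.stdev(samples) / statistics.mean(samples)

# Hypothetical timings (seconds) from five runs of a ~5-minute benchmark.
samples = [301.2, 300.8, 301.5, 300.9, 301.1]
print("mean = %.1f s, CV = %.4f%%"
      % (statistics.mean(samples), 100 * coefficient_of_variation(samples)))
```

<P>A 5-minute benchmark repeated five times with a spread like this tells
you more than a single 2-day run ever could.</P>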

<P>Summarizing: if you absolutely must compare CPU performance for different
configurations running GNU/Linux, SPEC95 is definitely <B>not</B> the recommended
benchmark. On the other hand, it's a handy tool for benchmarketers.</P>

<P>
<HR><A HREF="./Article3e-1.html"><IMG SRC="./gx/balsa/prev.gif" ALT="Previous" HEIGHT=16 WIDTH=16></A>
<A HREF="./Article3e-3.html"><IMG SRC="./gx/balsa/next.gif" ALT="Next" HEIGHT=16 WIDTH=16></A>
<A HREF="./Article3e.html#toc2"><IMG SRC="./gx/balsa/toc.gif" ALT="Contents" HEIGHT=16 WIDTH=16></A>
</P>

</BODY>
</HTML>