<HTML>
<HEAD>
<TITLE>Linux Benchmarking - Article III - Interpreting benchmark results: An example of correct/incorrect interpretation of results</TITLE>
</HEAD>
<BODY>
<A HREF="./Article3e-4.html"><IMG SRC="./gx/balsa/prev.gif" ALT="Previous"></A>
<A HREF="./Article3e-6.html"><IMG SRC="./gx/balsa/next.gif" ALT="Next"></A>
<A HREF="./Article3e.html#toc5"><IMG SRC="./gx/balsa/toc.gif" ALT="Contents"></A>
<HR>
<H2><A NAME="s5">5. An example of correct/incorrect interpretation of results</A></H2>
<P>We finally get to the practical part of this article. As usual, I propose a benchmark as a practical example, only this time it is a more complex one, in fact a CPU benchmark suite: the latest version of nbench-byte (version 2.1). You can download it from Uwe Mayer's
<A HREF="http://www.tux.org/~mayer/">new Web site</A> or from the
<A HREF="http://www.tux.org/bench">Linux Benchmarking Project</A>.</P>
<P>What we are going to measure this time can be described as "general CPU performance". That is: not processor performance for matrix operations, not MMX performance, and not the ability of a processor to decode an MPEG stream. Nor is it a measure of a processor's interrupt response time, peak MIPS, etc.
<OL>
<LI>Wrong ways to benchmark.</LI>
<LI>Wrong way to analyze benchmarking data measurements.</LI>
</OL>
</P>
<P>Now let's take a look at a correct procedure, following all the steps recommended in section 1.1.</P>
<H2><A NAME="ss5.1">5.1 Example</A></H2>
<H3>Stating our objective</H3>
<P>For this short example we just want to compare the performance of two different CPUs: the AMD K6 and the Cyrix 6x86MX. This is comparative benchmarking, so we should keep all conditions fixed and vary just this single variable: the CPU.</P>
<P>This is not too ambitious, and I have no bias for or against either of these two chips. Also, since both CPUs are widely available at reasonable prices, such comparative benchmarking may be of interest to GNU/Linux users wanting to upgrade their CPU or put together their next system.</P>
<H3>Choice of a benchmark</H3>
<P>Nbench-byte is an improved, updated version of the
<A HREF="http://www.byte.com/bmark/bmark.htm"> BYTEmark benchmark suite </A>developed at BYTE magazine by Rick Grehan. Uwe F. Mayer did the port to Linux and is its present maintainer/developer. The latest version is 2.1, dated December 97.</P>
<P>Like SPEC95, this modern CPU benchmark suite uses 10 different algorithms that are representative of common CPU-intensive tasks (the file bdoc.txt included with the source has a description of each algorithm). Note that Rick has stopped development of BYTEmark (neither Uwe nor I managed to contact him), but you can see that this is not a committee-designed benchmark; in this respect its lineage fits the GNU/Linux style of development quite well.</P>
<P>Nbench-byte 2.1 also goes one step beyond SPEC95 in that it generates three index figures: an Integer Index, a Floating-Point Index and a Memory Index. The Memory Index reflects the fact that on most modern CPUs, the memory subsystem represents a major performance bottleneck. For more information on this topic, you can check the Web site for
<A HREF="http://www.cs.virginia.edu/stream/">STREAM</A>, a new benchmark specifically created to address this issue.</P>
<P>One of nbench-byte's nicest features is that it calibrates itself. For each of the tests it determines a minimum amount of work that needs to be done to be able to accurately measure the time needed. Then it runs that test five times and does a statistical analysis (using Student's t distribution) to see if the results are consistent (meaning that the probability is at least 95% that the true mean of the results is within 5% of the calculated mean of the results). If not, then nbench runs the test up to twenty-five times more and does the statistical analysis after each additional test run. If consistency cannot be achieved within a total of thirty runs, a warning will be issued when the score gets reported.</P>
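<P>The consistency check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not nbench's actual code: <CODE>run_once</CODE>, <CODE>is_consistent</CODE> and <CODE>measure</CODE> are hypothetical names, and the criterion is interpreted here as "the half-width of the 95% confidence interval is at most 5% of the sample mean".</P>
<PRE>
```python
import math
import statistics

# Two-sided 95% critical values of Student's t distribution,
# indexed by degrees of freedom (n - 1) for n = 5 .. 30 runs.
T_95 = {
    4: 2.776, 5: 2.571, 6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262,
    10: 2.228, 11: 2.201, 12: 2.179, 13: 2.160, 14: 2.145, 15: 2.131,
    16: 2.120, 17: 2.110, 18: 2.101, 19: 2.093, 20: 2.086, 21: 2.080,
    22: 2.074, 23: 2.069, 24: 2.064, 25: 2.060, 26: 2.056, 27: 2.052,
    28: 2.048, 29: 2.045,
}


def is_consistent(scores, tolerance=0.05):
    """True if the 95% confidence half-width is within tolerance of the mean."""
    n = len(scores)
    mean = statistics.mean(scores)
    std_err = statistics.stdev(scores) / math.sqrt(n)
    half_width = T_95[n - 1] * std_err
    return tolerance * abs(mean) >= half_width


def measure(run_once, first_batch=5, max_runs=30):
    """Repeat a test until its scores are consistent or max_runs is reached.

    run_once is a hypothetical callable returning one score
    (for example iterations per second) for a single run.
    Returns (mean score, number of runs, consistent flag).
    """
    scores = [run_once() for _ in range(first_batch)]
    # Add one run at a time and re-check, up to max_runs in total.
    while not is_consistent(scores) and max_runs > len(scores):
        scores.append(run_once())
    return statistics.mean(scores), len(scores), is_consistent(scores)
```
</PRE>
<P>A caller would treat a false flag the way nbench treats its warning: the score is still reported, but it should not be trusted for fine comparisons.</P>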
<P>In terms of raw data statistical processing, nbench-byte 2.1 goes beyond all the other benchmarks I have ever come across.</P>
<P>Another very interesting feature of this benchmark suite is its portability across a wide range of OSes and platforms. However, because of fundamental differences in compilers, libraries and memory management between OSes, this benchmark should <B>not</B> be carelessly used to compare results across platforms. This is not an OS benchmark, it's a CPU benchmark (see the pitfalls subsection below). You have been warned.</P>
<H3>Benchmark setup</H3>
<P>We are doing comparative benchmarking, so we will be using exactly the same hardware for our benchmark runs. All that will change between runs is:
<OL>
<LI>The processor (one run with a 6x86MX, the other run with a K6).</LI>
<LI>A small cyrix.rc file that was added to the rc.local script. This calls set6x86 to set up a few internal 6x86MX registers. The K6 does not need this file.</LI>
</OL>
</P>
<P>Also note that we are using the precompiled nbench executable, as shipped in the tar.gz package.</P>
<P>To describe our hardware setup, we resort to the Linux Benchmarking Toolkit Report Form:</P>
<P>LINUX BENCHMARKING TOOLKIT REPORT FORM
<PRE>
CPU
===
Vendor: AMD/Cyrix
Model: K6-166/6x86MX-PR200
Core clock: 166 MHz (2.5 x 66 MHz)
Motherboard vendor: ASUS
Mbd. model: P55T2P4
Mbd. chipset: Intel HX
Bus type: PCI
Bus clock: 33 MHz
Cache total: 512 KB
Cache type/speed: Pipeline burst 6 ns
SMP (number of processors): 1

RAM
===
Total: 32 MB
Type: EDO SIMMs
Speed: 60 ns

Disk
====
Vendor: IBM
Model: IBM-DCAA-34430
Size: 4.3 GB
Interface: EIDE
Driver/Settings: Bus Master DMA mode 2

Video board
===========
Vendor: Generic S3
Model: Trio64-V2
Bus: PCI
Video RAM type: 60 ns EDO DRAM
Video RAM total: 2 MB
X server vendor: XFree86
X server version: 3.3
X server chipset choice: S3 accelerated
Resolution/vert. refresh rate: 1152x864 @ 70 Hz
Color depth: 16 bits

Kernel
======
Version: 2.0.29
Swap size: 64 MB

gcc
===
Version: 2.7.2.3
Options: (default nbench)
libc version: 5.4.38

Test notes
==========
Two processors tested. The 6x86MX was configured with a special rc.cyrix file.

RESULTS
=======
Linux kernel 2.0.0 Compilation Time: N/A
Whetstone Double Precision (FPU) INDEX: N/A
UnixBench 4.10 system INDEX: N/A
Xengine: N/A
nbench-byte integer INDEX: 6x86MX - 0.686; K6 - 0.713
nbench-byte memory INDEX: 6x86MX - 0.753; K6 - 0.793
nbench-byte floating-point INDEX: 6x86MX - 0.655; K6 - 0.802

Comments
========
With the CPU case open, it took me 30 minutes to run nbench-byte on the two processors!
</PRE>
</P>
<H3>Detailed benchmark results</H3>
<P>One can get very detailed benchmark results with nbench-byte 2.1 by specifying the -v option. However, here we are only showing the normal output from a standard run, first on the 6x86MX, then on the K6:</P>
<P><B>6x86MX results:</B></P>
<PRE>
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          80.681  :       2.07  :       0.68
STRING SORT         :          11.107  :       4.96  :       0.77
BITFIELD            :      2.1997e+07  :       3.77  :       0.79
FP EMULATION        :          8.5349  :       4.10  :       0.95
FOURIER             :          881.21  :       1.00  :       0.56
ASSIGNMENT          :         0.71582  :       2.72  :       0.71
IDEA                :          147.28  :       2.25  :       0.67
HUFFMAN             :          58.095  :       1.61  :       0.51
NEURAL NET          :         0.70897  :       1.14  :       0.48
LU DECOMPOSITION    :          27.869  :       1.44  :       1.04
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 2.861
FLOATING-POINT INDEX: 1.181
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
C compiler          : gcc version 2.7.2.3
libc                : libc.so.5.4.38
MEMORY INDEX        : 0.753
INTEGER INDEX       : 0.686
FLOATING-POINT INDEX: 0.655
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
</PRE>
<P><B>K6 results:</B></P>
<PRE>
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          82.229  :       2.11  :       0.69
STRING SORT         :           10.57  :       4.72  :       0.73
BITFIELD            :      2.0672e+07  :       3.55  :       0.74
FP EMULATION        :          6.4842  :       3.11  :       0.72
FOURIER             :          1117.1  :       1.27  :       0.71
ASSIGNMENT          :         0.93388  :       3.55  :       0.92
IDEA                :          158.42  :       2.42  :       0.72
HUFFMAN             :          81.407  :       2.26  :       0.72
NEURAL NET          :          1.0764  :       1.73  :       0.73
LU DECOMPOSITION    :          26.521  :       1.37  :       0.99
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 2.990
FLOATING-POINT INDEX: 1.445
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
C compiler          : gcc version 2.7.2.3
libc                : libc.so.5.4.38
MEMORY INDEX        : 0.793
INTEGER INDEX       : 0.713
FLOATING-POINT INDEX: 0.802
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
</PRE>
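<P>The three Linux index lines can be cross-checked against the per-test "New Index" column of the two listings above. The sketch below assumes, as I understand the nbench documentation (the assumption is not stated in this article), that each index is the geometric mean of the per-test ratios in its group, and that the groups are the ones listed in <CODE>GROUPS</CODE>. Because the printed ratios are rounded to two decimals, the recomputed values can only approximate the printed indices.</P>
<PRE>
```python
import math

# Per-test "New Index" values copied from the two listings above.
NEW_INDEX = {
    "6x86MX": {
        "NUMERIC SORT": 0.68, "STRING SORT": 0.77, "BITFIELD": 0.79,
        "FP EMULATION": 0.95, "FOURIER": 0.56, "ASSIGNMENT": 0.71,
        "IDEA": 0.67, "HUFFMAN": 0.51, "NEURAL NET": 0.48,
        "LU DECOMPOSITION": 1.04,
    },
    "K6": {
        "NUMERIC SORT": 0.69, "STRING SORT": 0.73, "BITFIELD": 0.74,
        "FP EMULATION": 0.72, "FOURIER": 0.71, "ASSIGNMENT": 0.92,
        "IDEA": 0.72, "HUFFMAN": 0.72, "NEURAL NET": 0.73,
        "LU DECOMPOSITION": 0.99,
    },
}

# Assumed grouping of the ten tests into the three Linux indices.
GROUPS = {
    "MEMORY": ("STRING SORT", "BITFIELD", "ASSIGNMENT"),
    "INTEGER": ("NUMERIC SORT", "FP EMULATION", "IDEA", "HUFFMAN"),
    "FLOATING-POINT": ("FOURIER", "NEURAL NET", "LU DECOMPOSITION"),
}


def geometric_mean(values):
    """Geometric mean, computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def indices(cpu):
    """Recompute the three indices for one CPU from its per-test ratios."""
    return {
        name: geometric_mean([NEW_INDEX[cpu][test] for test in tests])
        for name, tests in GROUPS.items()
    }


for cpu in NEW_INDEX:
    for name, value in indices(cpu).items():
        print(f"{cpu:7s} {name:15s} {value:.3f}")
```
</PRE>
<P>Under these assumptions the recomputed figures land within a few thousandths of the printed MEMORY, INTEGER and FLOATING-POINT indices for both chips.</P>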
<H3>Data analysis</H3>
<P>We will concentrate on the Linux data, for obvious reasons. The 6x86MX leads the K6 on four tests: by about 5 to 6% on STRING SORT, BITFIELD and LU DECOMPOSITION, and by about 32% on FP EMULATION. The K6 leads on the other six tests, by margins ranging from about 2% on NUMERIC SORT to about 52% on NEURAL NET.</P>
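<P>A per-test comparison like this is easy to reproduce from the "Iterations/sec." column of the two listings. The following is a minimal sketch; the helper name <CODE>advantage</CODE> is my own, and the lead is expressed as a percentage of the slower chip's score.</P>
<PRE>
```python
# Iterations/sec. copied from the two listings above: (6x86MX, K6).
RAW = {
    "NUMERIC SORT": (80.681, 82.229),
    "STRING SORT": (11.107, 10.57),
    "BITFIELD": (2.1997e+07, 2.0672e+07),
    "FP EMULATION": (8.5349, 6.4842),
    "FOURIER": (881.21, 1117.1),
    "ASSIGNMENT": (0.71582, 0.93388),
    "IDEA": (147.28, 158.42),
    "HUFFMAN": (58.095, 81.407),
    "NEURAL NET": (0.70897, 1.0764),
    "LU DECOMPOSITION": (27.869, 26.521),
}


def advantage(cyrix, k6):
    """Return (winner, lead of the winner over the loser in percent)."""
    if k6 >= cyrix:
        return "K6", (k6 / cyrix - 1.0) * 100.0
    return "6x86MX", (cyrix / k6 - 1.0) * 100.0


for test, (cyrix, k6) in RAW.items():
    winner, lead = advantage(cyrix, k6)
    print(f"{test:17s} {winner:7s} +{lead:.1f}%")
```
</PRE>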
<H3>Conclusion</H3>
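<P>On our synthetic test, nbench-byte version 2.1, the K6 has shown slightly better integer and memory performance than the 6x86MX (indices about 4% and 5% higher) and clearly better floating-point performance (index about 22% higher), with both chips running at the same 166 MHz (2.5 x 66 MHz) clock rate on exactly the same hardware.</P>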
<H2><A NAME="ss5.2">5.2 Pitfalls</A></H2>
<P>The basic pitfall that one should be warned against concerning nbench-byte applies equally to all benchmarks: one should not try to use this tool for something it was not designed for. Since this is a CPU benchmark, do not use it to test OS performance, video bandwidth, or any other feature that implies I/O activity. Also, it is not an adequate tool for comparing compilers and/or C and math libraries.</P>
<P>This is less obvious than it seems at first. For an accurate, thorough, documented discussion of this particular pitfall, you are referred to
<A HREF="http://www.tux.org/~mayer/linux/compare/index.html">one of Uwe's excellent pages on benchmarking</A>.</P>
<P>Another pitfall would have been to compare the two processors running on widely different machines. Motherboard, cache and RAM timing setup can skew results by as much as 10%. Compilation options and libraries can also skew results by 25% or more.</P>
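<P>To see why this matters, the index gaps from the report form can be compared with the 10% hardware skew quoted above. This is only a rough sketch: the names <CODE>relative_gap</CODE> and <CODE>exceeds_skew</CODE> are hypothetical, and treating the skew as a simple threshold is my simplification, not a method from this article.</P>
<PRE>
```python
# Index figures from the report form above: (6x86MX, K6).
INDICES = {
    "integer": (0.686, 0.713),
    "memory": (0.753, 0.793),
    "floating-point": (0.655, 0.802),
}

# "As much as 10%" from motherboard, cache and RAM timing setup.
HARDWARE_SKEW = 0.10


def relative_gap(a, b):
    """Gap between two scores as a fraction of the smaller one."""
    return abs(a - b) / min(a, b)


def exceeds_skew(a, b, skew=HARDWARE_SKEW):
    """True if the gap is larger than what setup differences could explain."""
    return relative_gap(a, b) > skew


for name, (cyrix, k6) in INDICES.items():
    gap = relative_gap(cyrix, k6)
    verdict = "larger than" if exceeds_skew(cyrix, k6) else "within"
    print(f"{name:15s} gap {gap:.1%} is {verdict} a 10% setup skew")
```
</PRE>
<P>With these figures only the floating-point gap is larger than the possible skew, so the integer and memory differences would mean little had the two chips been measured on different machines.</P>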
<HR>
<A HREF="./Article3e-4.html"><IMG SRC="./gx/balsa/prev.gif" ALT="Previous"></A>
<A HREF="./Article3e-6.html"><IMG SRC="./gx/balsa/next.gif" ALT="Next"></A>
<A HREF="./Article3e.html#toc5"><IMG SRC="./gx/balsa/toc.gif" ALT="Contents"></A>
</BODY>
</HTML>