<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
<TITLE>Linux Benchmarking HOWTO: The Linux Benchmarking Toolkit (LBT)</TITLE>
<LINK HREF="Benchmarking-HOWTO-4.html" REL=next>
<LINK HREF="Benchmarking-HOWTO-2.html" REL=previous>
<LINK HREF="Benchmarking-HOWTO.html#toc3" REL=contents>
</HEAD>
<BODY>
<A HREF="Benchmarking-HOWTO-4.html">Next</A>
<A HREF="Benchmarking-HOWTO-2.html">Previous</A>
<A HREF="Benchmarking-HOWTO.html#toc3">Contents</A>
<HR>
<H2><A NAME="s3">3. The Linux Benchmarking Toolkit (LBT)</A></H2>

<P>I will propose a basic benchmarking toolkit for Linux. This is a preliminary version of a comprehensive Linux Benchmarking Toolkit, to be expanded and improved. Take it for what it's worth, i.e. as a proposal. If you don't think it is a valid test suite, feel free to email me your criticisms and I will gladly make the changes and improve it if I can. Before getting into an argument, however, read this HOWTO and the references it mentions: informed criticism is welcome, empty criticism is not.
<H2><A NAME="ss3.1">3.1 Rationale</A>
</H2>

<P>This is just common sense:
<OL>
<LI>It should not take a whole day to run. When it comes to comparative benchmarking (various runs), nobody wants to spend days trying to figure out the fastest setup for a given system. Ideally, the entire benchmark set should take about 15 minutes to complete on an average machine.</LI>
<LI>All source code for the software used must be freely available on the Net, for obvious reasons.</LI>
<LI>Benchmarks should provide simple figures reflecting the measured performance.</LI>
<LI>There should be a mix of synthetic benchmarks and application benchmarks (with separate results, of course).</LI>
<LI>Each <B>synthetic</B> benchmark should exercise a particular subsystem to its maximum capacity.</LI>
<LI>Results of <B>synthetic</B> benchmarks should <B>not</B> be averaged into a single figure of merit (that defeats the whole idea behind synthetic benchmarks, with considerable loss of information).</LI>
<LI>Application benchmarks should consist of tasks commonly executed on Linux systems.</LI>
</OL>
<H2><A NAME="ss3.2">3.2 Benchmark selection</A>
</H2>
<P>I have selected five different benchmark suites, trying as much as possible to avoid overlap between the tests:
<OL>
<LI>Kernel 2.0.0 (default configuration) compilation using gcc.</LI>
<LI>Whetstone version 10/03/97 (the latest version, by Roy Longbottom).</LI>
<LI>xbench-0.2 (with fast execution parameters).</LI>
<LI>UnixBench benchmarks version 4.01 (partial results).</LI>
<LI>BYTE Magazine's BYTEmark benchmarks beta release 2 (partial results).</LI>
</OL>
<P>For tests 4 and 5, "(partial results)" means that not all the results produced by these benchmarks are taken into account.
<H2><A NAME="ss3.3">3.3 Test duration</A>
</H2>
<OL>
<LI>Kernel 2.0.0 compilation: 5 - 30 minutes, depending on the <B>real</B> performance of your system.</LI>
<LI>Whetstone: 100 seconds.</LI>
<LI>xbench-0.2: &lt; 1 hour.</LI>
<LI>UnixBench benchmarks version 4.01: approx. 15 minutes.</LI>
<LI>BYTE Magazine's BYTEmark benchmarks: approx. 10 minutes.</LI>
</OL>
<H2><A NAME="ss3.4">3.4 Comments</A>
</H2>
<H3>Kernel 2.0.0 compilation:</H3>

<UL>
<LI><B>What:</B> it is the only application benchmark in the LBT.</LI>
<LI>The code is widely available (i.e. I finally found some use for my old Linux CD-ROMs).</LI>
<LI>Most Linux users recompile the kernel quite often, so compilation time is a significant measure of overall performance.</LI>
<LI>The kernel is large and gcc uses a large chunk of memory: this attenuates the L2 cache size bias seen with small tests.</LI>
<LI>It does frequent I/O to disk.</LI>
<LI>Test procedure: get a pristine 2.0.0 source tree and compile with the default options (run make config and press Enter repeatedly). The reported time should be the time spent on compilation only, i.e. after you type make zImage, <B>not</B> including make dep and make clean. Note that the default target architecture for the kernel is the i386, so if you compile on another architecture, gcc should be set up to cross-compile with i386 as the target architecture.</LI>
<LI><B>Results:</B> compilation time in minutes and seconds (please don't report fractions of seconds).</LI>
</UL>
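<P>As a rough illustration of the timing procedure above, here is a minimal helper; it is only a sketch, not part of the official LBT, and it assumes a POSIX-like shell and a date command that understands the +%s format:

```shell
# Hypothetical helper: report the elapsed wall-clock time of a command
# in the minutes-and-seconds form the LBT report form expects.
# Assumes a POSIX-like shell and a `date` that supports +%s.
time_cmd () {
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    elapsed=$((end - start))
    printf '%dm%02ds\n' $((elapsed / 60)) $((elapsed % 60))
}

# For the LBT run: do make config, make dep and make clean first (untimed), then:
# cd /usr/src/linux && time_cmd make zImage
```

One-second resolution is plenty here, since fractions of seconds should not be reported anyway.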
<H3>Whetstone:</H3>

<UL>
<LI><B>What:</B> measures pure floating-point performance with a short, tight loop. The source (in C) is quite readable, and it is very easy to see which floating-point operations are involved.</LI>
<LI>Shortest test in the LBT :-).</LI>
<LI>It's an "Old Classic" test: comparable figures are available, and its flaws and shortcomings are well known.</LI>
<LI>Test procedure: the newest C source should be obtained from Aburto's site. Compile and run in double precision mode. Specify gcc and -O2 as the compiler and compiler options, and define POSIX 1 to specify the machine type.</LI>
<LI><B>Results:</B> a floating-point performance figure in MWIPS.</LI>
</UL>
<H3>Xbench-0.2:</H3>

<UL>
<LI><B>What:</B> measures X server performance.</LI>
<LI>The xStones measure provided by xbench is a weighted average of several tests indexed to an old Sun workstation with a single-bit-depth display. Hmmm... it is questionable as a test of modern X servers, but it's still the best tool I have found.</LI>
<LI>Test procedure: compile with -O2. We specify a few options for a shorter run: <CODE>./xbench -timegoal 3 > results/name_of_your_linux_box.out</CODE>. To get the xStones rating, we must run an awk script; the simplest way is to type <CODE>make summary.ms</CODE>. Then check the summary.ms file: the xStone rating for your system is in the last column of the line carrying the machine name you specified during the test.</LI>
<LI><B>Results:</B> an X performance figure in xStones.</LI>
<LI>Note: this test, as it stands, is outdated. It should be re-coded.</LI>
</UL>
<H3>UnixBench version 4.01:</H3>

<UL>
<LI><B>What:</B> measures overall Unix performance. This test exercises file I/O and kernel multitasking performance.</LI>
<LI>I have discarded all the arithmetic test results, keeping only the system-related test results.</LI>
<LI>Test procedure: make with -O2. Execute with <CODE>./Run -1</CODE> (run each test once). You will find the results in the ./results/report file. Calculate the geometric mean of the EXECL THROUGHPUT, FILECOPY 1, 2, 3, PIPE THROUGHPUT, PIPE-BASED CONTEXT SWITCHING, PROCESS CREATION, SHELL SCRIPTS and SYSTEM CALL OVERHEAD indexes.</LI>
<LI><B>Results:</B> a system index.</LI>
</UL>
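<P>The geometric mean called for above (the n-th root of the product of the n indexes) can be computed with a one-line awk script. The eight values in this sketch are made-up placeholders, not real results; substitute the indexes from your own ./results/report file:

```shell
# Geometric mean of the eight UnixBench system indexes.
# The values below are placeholders for illustration only.
echo "43.2 38.7 41.0 39.5 44.1 40.2 37.8 42.5" | \
awk '{ p = 1; for (i = 1; i <= NF; i++) p *= $i; printf "%.1f\n", p ^ (1 / NF) }'
```

Reporting the geometric rather than arithmetic mean keeps one unusually high index from dominating the system index.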
<H3>BYTE Magazine's BYTEmark benchmarks:</H3>

<UL>
<LI><B>What:</B> provides a good measure of CPU performance. Here is an excerpt from the documentation: <EM>"These benchmarks are meant to expose the theoretical upper limit of the CPU, FPU, and memory architecture of a system. They cannot measure video, disk, or network throughput (those are the domains of a different set of benchmarks). You should, therefore, use the results of these tests as part, not all, of any evaluation of a system."</EM></LI>
<LI>I have discarded the FPU test results, since the Whetstone test is just as representative of FPU performance.</LI>
<LI>I have split the integer tests into two groups: those more representative of memory-cache-CPU performance, and the CPU integer tests.</LI>
<LI>Test procedure: make with -O2. Run the test with <CODE>./nbench > myresults.dat</CODE> or similar. Then, from myresults.dat, calculate the geometric mean of the STRING SORT, ASSIGNMENT and BITFIELD test indexes; this is the <B>memory index</B>. Calculate the geometric mean of the NUMERIC SORT, IDEA, HUFFMAN and FP EMULATION test indexes; this is the <B>integer index</B>.</LI>
<LI><B>Results:</B> a memory index and an integer index, calculated as explained above.</LI>
</UL>
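<P>The two geometric means above can likewise be computed with awk; the index values in this sketch are made-up placeholders, not real BYTEmark results:

```shell
# Hypothetical sketch: compute the LBT memory and integer indexes from
# placeholder BYTEmark figures (replace them with your myresults.dat values).
gmean () { echo "$@" | awk '{ p = 1; for (i = 1; i <= NF; i++) p *= $i; printf "%.2f\n", p ^ (1 / NF) }'; }

# STRING SORT, ASSIGNMENT, BITFIELD indexes:
echo "memory index:  $(gmean 4.1 3.9 4.4)"
# NUMERIC SORT, IDEA, HUFFMAN, FP EMULATION indexes:
echo "integer index: $(gmean 3.8 4.2 4.0 3.7)"
```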
<H2><A NAME="ss3.5">3.5 Possible improvements</A>
</H2>

<P>The ideal benchmark suite would run in a few minutes, with synthetic benchmarks testing every subsystem separately and application benchmarks providing results for different applications. It would also automatically generate a complete report, and optionally email the report to a central database on the Web.
<P>We are not really interested in portability here, but the suite should at least run on all recent (&gt; 2.0.0) versions and flavours (i386, Alpha, Sparc...) of Linux.
<P>If anybody has any idea about benchmarking network performance in a simple, easy and reliable way, with a short (less than 30 minutes to set up and run) test, please contact me.
<H2><A NAME="ss3.6">3.6 LBT Report Form</A>
</H2>

<P>Besides the tests, the benchmarking procedure would not be complete without a form describing the setup, so here it is (following the guidelines from comp.benchmarks.faq):
<HR>
<PRE>
LINUX BENCHMARKING TOOLKIT REPORT FORM
</PRE>
<HR>

<HR>
<PRE>
CPU
===
Vendor:
Model:
Core clock:
Motherboard vendor:
Mbd. model:
Mbd. chipset:
Bus type:
Bus clock:
Cache total:
Cache type/speed:
SMP (number of processors):
</PRE>
<HR>

<HR>
<PRE>
RAM
===
Total:
Type:
Speed:
</PRE>
<HR>

<HR>
<PRE>
Disk
====
Vendor:
Model:
Size:
Interface:
Driver/Settings:
</PRE>
<HR>

<HR>
<PRE>
Video board
===========
Vendor:
Model:
Bus:
Video RAM type:
Video RAM total:
X server vendor:
X server version:
X server chipset choice:
Resolution/vert. refresh rate:
Color depth:
</PRE>
<HR>

<HR>
<PRE>
Kernel
======
Version:
Swap size:
</PRE>
<HR>

<HR>
<PRE>
gcc
===
Version:
Options:
libc version:
</PRE>
<HR>

<HR>
<PRE>
Test notes
==========
</PRE>
<HR>

<HR>
<PRE>
RESULTS
=======
Linux kernel 2.0.0 Compilation Time: (minutes and seconds)
Whetstone: results are in MWIPS.
Xbench: results are in xStones.
UnixBench Benchmarks 4.01 system INDEX:
BYTEmark integer INDEX:
BYTEmark memory INDEX:
</PRE>
<HR>
<HR>
<PRE>
Comments*
=========
* This field is included for possible interpretations of the results, and as
such, it is optional. It could be the most significant part of your report,
though, especially if you are doing comparative benchmarking.
</PRE>
<HR>
<H2><A NAME="ss3.7">3.7 Network performance tests</A>
</H2>

<P>Testing network performance is a challenging task, since it involves at least two machines, a server and a client, hence twice the time to set up and many more variables to control. On an Ethernet network, I guess your best bet would be the ttcp package. (To be expanded.)
<H2><A NAME="ss3.8">3.8 SMP tests</A>
</H2>

<P>SMP tests are another challenge, and any benchmark specifically designed for SMP testing will have a hard time proving itself valid in real-life settings, since algorithms that can take advantage of SMP are hard to come by. It seems later versions of the Linux kernel (&gt; 2.1.30 or thereabouts) will do "fine-grained" multiprocessing, but I have no more information than that for the moment.
<P>According to David Niemi, <EM>"... shell8 </EM>[part of the UnixBench 4.01 benchmarks]<EM> does a good job at comparing similar hardware/OS in SMP and UP modes."</EM>
<HR>
<A HREF="Benchmarking-HOWTO-4.html">Next</A>
<A HREF="Benchmarking-HOWTO-2.html">Previous</A>
<A HREF="Benchmarking-HOWTO.html#toc3">Contents</A>
</BODY>
</HTML>