<revremark>Converted to XML 4.1.2, added gfdl, reviewed, author revisions</revremark>
</revision>
<revision>
<revnumber>1.0</revnumber>
<date>2002-08-24</date>
<authorinitials>SS</authorinitials>
<revremark>Initial release</revremark>
</revision>
</revhistory>
<abstract>
<para>
This document is a guide to Valgrind, the <function>malloc</function> debugger. Valgrind 1.0.0 is described.
</para>
</abstract>
</articleinfo>
<sect1 id="background">
<title>Background</title>
<para>
Dynamic storage allocation plays an important role in C programming;
it is also the breeding ground of numerous hard-to-track-down bugs.
Freeing an allocated block twice, running off the edge of the
malloc'ed buffer, and failing to keep track of addresses of allocated
blocks are common errors that frustrate the programmer. They are very
difficult to debug because they manifest themselves
as <quote>mysterious behavior</quote> at places far from the point where the
programmer actually committed the blunder.
</para>
</sect1>
<sect1 id="intro">
<title>Introduction</title>
<sect2 id="purpose">
<title>Purpose</title>
<para>
Valgrind is an open-source tool for finding memory-management problems
in Linux-x86 executables. It detects memory leaks and memory corruption in the program
being run. It is being developed by <ulink url="mailto:jseward@acm.org">Julian Seward</ulink>.
</para>
</sect2>
<sect2 id="acknowledgements">
<title>Acknowledgments</title>
<para>
We express our sincere appreciation to Julian Seward
for creating Valgrind. Thanks also to Mr. Pramode C.E and to our
friends at the Govt Engineering College, Trichur for their advice and cooperation.
</para>
</sect2>
<sect2 id="copyright">
<title>Copyright and Distribution Policy</title>
<para>
Copyright (C) 2002 Deepak P, Sandeep S.
</para>
<para>Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1
or any later version published by the Free Software Foundation;
with no Invariant Sections, with no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in <xref linkend="gfdl"/> entitled "GNU
Free Documentation License".</para>
</sect2>
<sect2 id="feedback">
<title>Feedback and Corrections</title>
<para>
Kindly forward feedback and criticism to <ulink url="mailto:pdeepak16@vsnl.com">Deepak.P</ulink> and/or <ulink url="mailto:sandeep_gect@yahoo.com">Sandeep.S</ulink>. We shall be indebted to anybody who points out errors and inaccuracies in this document; we will rectify them as soon as we are informed.
</para>
</sect2>
</sect1>
<sect1 id="getinstalled">
<title>Getting it Installed</title>
<sect2 id="getvalgrind">
<title>Getting Valgrind</title>
<para>
Valgrind may be obtained from the following locations:
Add the path to your <varname>path</varname> variable. Now valgrind is ready to catch
the bugs.
</para>
</sect2>
</sect1>
<sect1 id="closerview">
<title>A Closer View</title>
<sect2 id="whyvalgrind">
<title>Why Valgrind?</title>
<para>
As noted above, memory management is prone to errors that are very
hard to detect. Common errors include:
</para>
<para>
<orderedlist>
<listitem><para>Use of uninitialized memory</para></listitem>
<listitem><para>Reading/writing memory after it has been freed</para></listitem>
<listitem><para>Reading/writing off the end of malloc'd blocks</para></listitem>
<listitem><para>Reading/writing inappropriate areas on the stack</para></listitem>
<listitem><para>Memory leaks -- where pointers to malloc'd blocks are lost forever</para></listitem>
<listitem><para>Mismatched use of malloc/new/new[] vs free/delete/delete[]</para></listitem>
<listitem><para>Some misuses of the POSIX pthreads API</para></listitem>
</orderedlist>
</para>
<para>
These errors usually lead to crashes.
</para>
<para>
This is a situation where we need Valgrind. Valgrind works directly with the
executables, with no need to recompile, relink or modify the program to be
checked. Valgrind reports whether the program leaks memory,
and also points out the spots of the <quote>leak</quote>.
</para>
<para>
Valgrind simulates every single instruction your program executes.
For this reason, Valgrind finds errors not only in your application but also in
all supporting dynamically-linked (.so-format) libraries, including the GNU C
library, the X client libraries, Qt if you work with KDE, and so on. That
often includes libraries, for example the GNU C library, which may contain
memory access violations.
</para>
</sect2>
<sect2 id="usage">
<title>Usage</title>
<sect3 id="invoking">
<title>Invoking Valgrind</title>
<para>
The checking may be performed by simply placing the word <command>valgrind</command>
just before the normal command used to invoke the program. For example:
</para>
<screen>
#valgrind ps -ax
</screen>
<para>
Valgrind provides a large number of options. We deliberately leave them out
so as not to make this article boring.
</para>
<para>
The output contains the usual output of <command>ps -ax</command> along with the
detailed report by Valgrind. Any memory-related error is pointed out in the error report.
</para>
</sect3>
<sect3 id="erridentify">
<title>How to Identify the Error from the Error Report</title>
<para>
Consider the output of Valgrind for some test program:
<screen>
==1353== Invalid read of size 4
==1353== at 0x80484F6: print (valg_eg.c:7)
==1353== by 0x8048561: main (valg_eg.c:16)
==1353== by 0x4026D177: __libc_start_main
(../sysdeps/generic/libc-start.c :129)
==1353== by 0x80483F1: free@@GLIBC_2.0 (in /home/deepu/valg/a.out)
==1353== Address 0x40C9104C is 0 bytes after a block of size 40
alloc'd
==1353== at 0x40046824: malloc (vg_clientfuncs.c:100)
==1353== by 0x8048524: main (valg_eg.c:12)
==1353== by 0x4026D177: __libc_start_main
(../sysdeps/generic/libc-start.c :129)
==1353== by 0x80483F1: free@@GLIBC_2.0 (in /home/deepu/valg/a.out)
</screen>
</para>
<para>
Here, 1353 is the process ID. This part of the error report says that
a read error has occurred at line number 7, in the function
<function>print</function>. The function <function>print</function> is called by function
<function>main</function>, and both are in the file <filename>valg_eg.c</filename>.
The function <function>main</function> is called by the
function <function>__libc_start_main</function> at line
number 129, in <filename>../sysdeps/generic/libc-start.c</filename>.
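The function <function>__libc_start_main</function> is called by <function>free@@GLIBC_2.0</function> in the file <filename>/home/deepu/valg/a.out</filename>. Similarly, the second half of the report gives the call chain of the <function>malloc</function> call (line 12 of <filename>valg_eg.c</filename>) that allocated the 40-byte block; the invalid read touched the address immediately after the end of that block.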
</para>
</sect3>
<sect3 id="errortypes">
<title>Types of Errors with Examples</title>
<para>
Valgrind can only really detect two types of errors: use of illegal
address and use of undefined values. Nevertheless, this is enough to
discover all sorts of memory management problems in a program. Some common errors
are given below.
</para>
<sect4 id="uninit-mem">
<title>Use of uninitialized memory</title>
<para>
Sources of uninitialized data are:
</para>
<itemizedlist>
<listitem><para>Local variables that have not been initialized.</para></listitem>
<listitem><para>The contents of malloc'd blocks, before writing something there.</para></listitem>
</itemizedlist>
<para>
This is not a problem with <function>calloc</function>, since it initializes
every allocated byte to 0. The <function>new</function> operator in C++ is similar
to <function>malloc</function>: fields of the created object will be uninitialized.
</para>
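<para>
The difference between <function>malloc</function> and <function>calloc</function> can be
sketched as follows. This is only an illustration; the helper <function>sum_ints</function>
is a made-up name and not part of Valgrind or the C library. Because
<function>calloc</function> zeroes the block, reading it before writing to it is well defined,
whereas the same read on a <function>malloc</function>'d block would use uninitialized values.
</para>
<programlisting>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: adds up the n ints stored in block. */
static int sum_ints(const int *block, size_t n)
{
    int total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += block[i];
    return total;
}

int main(void)
{
    size_t n = 10;
    int *zeroed = calloc(n, sizeof(int)); /* every byte starts as 0 */
    if (zeroed == NULL)
        return 1;
    /* Safe to read before writing: calloc initialized the block. */
    assert(sum_ints(zeroed, n) == 0);
    printf("sum = %d\n", sum_ints(zeroed, n));
    free(zeroed);
    return 0;
}
</programlisting>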
<para>
Sample program:
</para>
<programlisting>
#include <stdlib.h>
int main()
{
int p, t;
if (p == 5) /*Error occurs here*/
t = p+1;
return 0;
}
</programlisting>
<para>
Here <literal>p</literal> is uninitialized, so it may contain
some random value (garbage), and an error may occur at the condition check.
An uninitialized variable causes an error in two situations:
</para>
<itemizedlist>
<listitem><para>When it is used to determine the outcome of a conditional branch.
E.g. <literal>if (p == 5)</literal> in the above program.</para></listitem>
<listitem><para>When it is used to generate a memory address.
E.g. if the above program contained an integer array <literal>a[10]</literal>, writing <literal>a[p] = 1</literal> would generate an error.</para></listitem>
</itemizedlist>
</sect4>
<sect4 id="illegal-rw">
<title>Illegal read/write</title>
<para>
Illegal read/write errors occur when you try to read from or write to
an address that is not in the address range of your program.
</para>
<para>
Sample program:
</para>
<programlisting>
#include <stdlib.h>
int main()
{
int *p, i, a;
p = malloc(10*sizeof(int));
p[11] = 1; /* invalid write error */
a = p[11]; /* invalid read error */
free(p);
return 0;
}
</programlisting>
<para>
Here you are trying to read from and write to the address <literal>p + 11</literal> (that is, <literal>11*sizeof(int)</literal> bytes past the start of the block),
which is not allocated to the program.
</para>
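<para>
Pointer arithmetic in C is scaled by the element size, which is why index 11 falls
outside a block of 10 integers. The following sketch checks this with plain byte
arithmetic; <function>index_in_block</function> is a hypothetical helper introduced
only for this illustration, and the program never touches memory outside the block.
</para>
<programlisting>
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if int element number `index` lies
   entirely inside a block of `block_bytes` bytes, 0 otherwise. */
static int index_in_block(size_t index, size_t block_bytes)
{
    return (index + 1) * sizeof(int) <= block_bytes;
}

int main(void)
{
    size_t count = 10;
    size_t block_bytes = count * sizeof(int);
    int *p = malloc(block_bytes);
    if (p == NULL)
        return 1;

    /* p + 3 is 3*sizeof(int) bytes past p, not 3 bytes. */
    assert((size_t)((char *)(p + 3) - (char *)p) == 3 * sizeof(int));

    assert(index_in_block(9, block_bytes));   /* last valid element */
    assert(!index_in_block(11, block_bytes)); /* p[11] is outside   */

    printf("p[11] would start %lu bytes into a %lu-byte block\n",
           (unsigned long)(11 * sizeof(int)), (unsigned long)block_bytes);
    free(p);
    return 0;
}
</programlisting>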
</sect4>
<sect4 id="invalid-free">
<title>Invalid free</title>
<para>
Valgrind keeps track of blocks allocated to your program with <function>malloc/new</function>. So it can easily check whether the argument to <function>free/delete</function> is valid or not.
</para>
<para>
Sample program:
</para>
<programlisting>
#include <stdlib.h>
int main()
{
int *p, i;
p = malloc(10*sizeof(int));
for(i = 0;i < 10;i++)
p[i] = i;
free(p);
free(p); /* Error: p has already been freed */
return 0;
}
</programlisting>
<para>
Valgrind checks the address given as the argument to <function>free</function>. If it
is an address that has already been freed, you will be told that the free is
invalid.
</para>
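<para>
One common defensive idiom, shown here only as a sketch and not as something Valgrind
requires, is to reset a pointer to <literal>NULL</literal> once it has been freed.
Calling <function>free</function> on a null pointer does nothing, so an accidental second
call through the same variable becomes harmless. The helper name
<function>free_and_clear</function> is made up for this example, and the idiom does not
protect other copies of the pointer.
</para>
<programlisting>
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: frees the block *pp points to and clears *pp. */
static void free_and_clear(int **pp)
{
    free(*pp);
    *pp = NULL;
}

int main(void)
{
    int *p = malloc(10 * sizeof(int));
    if (p == NULL)
        return 1;
    free_and_clear(&p);
    assert(p == NULL);
    free_and_clear(&p); /* free(NULL) is a no-op, so this is not a double free */
    assert(p == NULL);
    return 0;
}
</programlisting>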
</sect4>
<sect4 id="mismatcheduse">
<title>Mismatched Use of Functions</title>
<para>
In C++ you can allocate and free memory using more than one function, but the following rules must be followed:
</para>
<itemizedlist>
<listitem><para>If allocated with <function>malloc</function>, <function>calloc</function>, <function>realloc</function>, <function>valloc</function> or <function>memalign</function>, you must deallocate with <function>free</function>.</para></listitem>
<listitem><para>If allocated with <function>new[]</function>, you must deallocate with <function>delete[]</function>.</para></listitem>
<listitem><para>If allocated with <function>new</function>, you must deallocate with <function>delete</function>.</para></listitem>
</itemizedlist>
<para>
Sample program:
</para>
<programlisting>
#include <stdlib.h>
int main()
{
int *p, i;
p = ( int* ) malloc(10*sizeof(int));
for(i = 0;i < 10;i++)
p[i] = i;
delete(p); /* Error: function mismatch */
return 0;
}
</programlisting>
<para>
Output by valgrind is:
</para>
<screen>
==1066== ERROR SUMMARY: 1 errors from 1 contexts (suppressed:
0 from 0)
==1066== malloc/free: in use at exit: 0 bytes in 0 blocks.
</screen>
<para>
Here, <literal>buf = p</literal> contains the address of a 10-byte block. The <function>read</function> system call tries to read 100 bytes from standard input and place them at <literal>p</literal>. But the bytes after the first 10 are unaddressable.
</para>
</sect4>
<sect4 id="memleak-detect">
<title>Memory Leak Detection</title>
<para>
Consider the following program:
</para>
<para>
<programlisting>
#include <stdlib.h>
int main()
{
int *p, i;
p = malloc(5*sizeof(int));
for(i = 0;i < 5;i++)
p[i] = i;
return 0;
}
</programlisting>
</para>
<para>
<screen>
==1048== LEAK SUMMARY:
==1048== definitely lost: 20 bytes in 1 blocks.
==1048== possibly lost: 0 bytes in 0 blocks.
==1048== still reachable: 0 bytes in 0 blocks.
</screen>
</para>
<para>
In the above program <literal>p</literal> contains the address of a 20-byte block.
But it is not freed anywhere in the program, so the pointer to this 20-byte
block is lost forever. This is known as a memory leak. We can get
the leak summary by using the Valgrind option <option>--leak-check=yes</option>.
</para>
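<para>
A corrected variant of the program might look like the sketch below, in which the block
is released before the pointer goes out of scope. The function name
<function>fill_and_sum</function> is invented for this example. With the
<function>free</function> call in place, one would expect the leak summary to show no
bytes as definitely lost.
</para>
<programlisting>
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocates n ints, fills them with 0..n-1,
   returns their sum, and frees the block before returning.
   Returns -1 if the allocation fails. */
static int fill_and_sum(int n)
{
    int *p = malloc((size_t)n * sizeof(int));
    int i, sum = 0;
    if (p == NULL)
        return -1;
    for (i = 0; i < n; i++)
        p[i] = i;
    for (i = 0; i < n; i++)
        sum += p[i];
    free(p); /* the pointer is still known here, so nothing is leaked */
    return sum;
}

int main(void)
{
    assert(fill_and_sum(5) == 10); /* 0 + 1 + 2 + 3 + 4 */
    return 0;
}
</programlisting>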
</sect4>
</sect3>
<sect3 id="error-suppress">
<title>How to Suppress Errors</title>
<para>
Valgrind detects numerous problems in many programs which come
pre-installed on your GNU/Linux system. You can't easily fix these, but you don't want to
see these errors (and yes, there are many!). So Valgrind reads a list of errors
to suppress at startup, from a suppression file ending in <filename>.supp</filename>.
</para>
<para>
Suppression files may be modified. This is useful if part of your
project contains errors you can't or don't want to fix, yet you don't want to
continuously be reminded of them. The format of the file is as follows.
</para>
<para>
<programlisting>
{
Error name
Type
fun:function name, which contains the error to suppress
fun:function name, which calls the function specified above
}
</programlisting>
</para>
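<para>
As an illustration only, and assuming the format just described, an entry meant to
suppress the <quote>Invalid read of size 4</quote> reported earlier in
<function>print</function> (called from <function>main</function>) might look like the
following. The entry name is made up, and <literal>Addr4</literal> is chosen on the
assumption that an address error on a 4-byte read is written as <literal>AddrN</literal>
with N = 4.
</para>
<programlisting>
{
   print_invalid_read
   Addr4
   fun:print
   fun:main
}
</programlisting>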
<para>
<screen>
Error name can be any name.
type=ValueN, if the error is an uninitialized value error.
    =AddrN, if it is an address error (N = sizeof(data type)).
    =Free, if it is a free error (e.g. mismatched free).
    =Cond, if the error is due to an uninitialized CPU condition code.
    =Param, if it is an invalid system call parameter error.
</screen>
</para>
</sect3>
<orderedlist>
<listitem><para>Highly optimized code (compiled with the -O1 or -O2 options) may sometimes cheat Valgrind.</para></listitem>
<listitem><para>Valgrind relies on the dynamic linking mechanism.</para></listitem>
</orderedlist>
<para>
Valgrind is closely tied to details of the CPU, the operating system and, to
a lesser extent, the compiler and basic C libraries. Presently Valgrind works only
on the Linux platform (kernels 2.2.X or 2.4.X) on x86s. Glibc 2.1.X or
2.2.X is also required for Valgrind.
</para>
</sect2>
</sect1>
<sect1 id="deeper">
<title>Let's Go Deeper</title>
<para>
Valgrind simulates an Intel x86 processor and runs our test program in
this synthetic processor. The two processors are not exactly the same. Valgrind is
compiled into a shared object, valgrind.so. A shell script <literal>valgrind</literal> sets
the <varname>LD_PRELOAD</varname> environment variable to point to valgrind.so. This causes the .so to be loaded as an extra library into any subsequently executed
dynamically-linked ELF binary, permitting the program to be debugged.
</para>
<para>
The dynamic linker calls the initialization function of Valgrind. Then the
synthetic CPU takes control from the real CPU. There may be
some other .so files in memory. The dynamic linker calls the initialization function of
all such .so files. Now the dynamic linker calls the <function>main</function> of the loaded
program. When <function>main</function> returns, the synthetic CPU calls the finalization function of
valgrind.so. During the execution of the finalization function, a summary of
all detected errors is printed and memory leaks are checked. The finalization
function then exits, handing control back from the synthetic CPU to the real
one.
</para>
<sect2 id="val-validity">
<title>How Valgrind Tracks Validity of Each Byte</title>
<para>
For every byte processed, the synthetic processor maintains 9 bits,
8 'V' bits and 1 'A' bit. The 'V' bits indicate the validity of the 8 bits
in the byte and the 'A' bit indicates validity of the byte address. These
valid-value(V) bits are checked only in two situations:
</para>
<orderedlist>
<listitem><para>when data is used for address generation,</para></listitem>
<listitem><para>when a control flow decision is to be made.</para></listitem>
</orderedlist>
<para>
In either of these two situations, if the data is found to be undefined, an
error report is generated. But no error reports are generated while
copying or adding undefined data.
</para>
<para>
However, the case with floating-point data is different. During a
floating-point read instruction the 'V' bits corresponding to the data are
checked. Thus merely copying an uninitialized value produces an error in the case of
floating-point numbers.
</para>
<para>
<programlisting>
#include <stdlib.h>
int main()
{
int *p, *a;
p = malloc(10*sizeof(int));
a = malloc(10*sizeof(int));
a[3] = p[3];
free(a);
free(p);
return 0;
}
/* produces no errors */
</programlisting>
</para>
<para>
<programlisting>
#include <stdlib.h>
int main()
{
float *p, *a;
p = malloc(10*sizeof(float));
a = malloc(10*sizeof(float));
a[3] = p[3];
free(a);
free(p);
return 0;
}
/* produces error */
</programlisting>
</para>
<para>
Every byte that is in memory (as opposed to in the CPU) has an associated valid-address (A)
bit, which indicates whether the corresponding memory location is accessible by
the program. When a program starts, the 'A' bits corresponding to all global
variables are set. When a call to <function>malloc</function>, <function>new</function> or any other memory-allocating function is made, the 'A' bits corresponding to the allocated bytes are
set. Upon freeing the allocated block using <function>free/delete/delete[]</function>, the
corresponding 'A' bits are cleared. While doing a system call the 'A' bits
are changed appropriately.
</para>
<para>
When values are loaded from memory, Valgrind checks the 'A' bit corresponding to each
byte. If the 'A' bit corresponding to a byte is set,
its 'V' bits are checked. If the 'V' bits are not set, an error is
generated and the 'V' bits are then set to indicate validity. This avoids a long chain of
errors. If the 'A' bit corresponding to a loaded byte is 0, its 'V' bits are
forced to be set, despite the value being invalid.
</para>
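<para>
The rules in the preceding paragraph can be mimicked with a toy model. The sketch below
is not Valgrind's implementation: the structure <literal>shadow_byte</literal>, the
function <function>check_load</function> and the result names are all invented here, and
the model collapses loading and checking into a single step for brevity.
</para>
<programlisting>
#include <assert.h>
#include <stdio.h>

/* Toy shadow state for one byte of memory (hypothetical layout). */
struct shadow_byte {
    unsigned char a_bit;  /* 1 = the program may access this address */
    unsigned char v_bits; /* 0xFF = all 8 value bits are defined     */
};

enum load_result { LOAD_OK, LOAD_UNDEFINED_VALUE, LOAD_INVALID_ADDRESS };

/* Applies the load rules described in the text to one shadow byte. */
static enum load_result check_load(struct shadow_byte *s)
{
    if (!s->a_bit) {
        s->v_bits = 0xFF; /* forced valid so no follow-on value errors */
        return LOAD_INVALID_ADDRESS;
    }
    if (s->v_bits != 0xFF) {
        s->v_bits = 0xFF; /* report once, then treat as defined */
        return LOAD_UNDEFINED_VALUE;
    }
    return LOAD_OK;
}

int main(void)
{
    struct shadow_byte fresh = { 1, 0x00 };   /* malloc'd, never written */
    struct shadow_byte outside = { 0, 0x00 }; /* not allocated at all    */

    assert(check_load(&fresh) == LOAD_UNDEFINED_VALUE);
    assert(check_load(&fresh) == LOAD_OK); /* reported only once */
    assert(check_load(&outside) == LOAD_INVALID_ADDRESS);
    assert(outside.v_bits == 0xFF);        /* forced to valid */
    puts("toy shadow-memory checks passed");
    return 0;
}
</programlisting>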
<para>
Have a look at the following program and run it.
</para>
<programlisting>
#include <stdlib.h>
int main()
{
int *p, j;
p = malloc(5*sizeof(int));
j = p[5];
if (p[5] == 1)
j = p[5]+1;
free(p);
return 0;
}
</programlisting>
<para>
Here two errors occur. Both of them are due to accessing the address
<literal>p + 5</literal> (that is, <literal>5*sizeof(int)</literal> bytes past the start of the block), which is not allocated to the program.
During the execution of <literal>j = p[5]</literal>, since that address
is invalid, the 'V' bits of the 4 bytes starting at that location
are forced to be set. Therefore no uninitialized-value error is reported, either during
the execution of <literal>j = p[5]</literal> or during the execution of <literal>if(p[5]==1)</literal>.
</para>
</sect2>
<sect2 id="cacheprofiling">
<title>Cache Profiling</title>
<para>
Modern x86 machines use two levels of caching. These levels are L1 and
L2, in which L1 is a split cache that consists of an instruction cache (I1) and
a data cache (D1). L2 is a unified cache.
</para>
<para>
The configuration of a cache means its size, associativity and number
of lines. If the data requested by the processor appears in the upper level
it is called a hit. If the data is not found in the upper level, the
request is called a miss. The lower level in the hierarchy is then accessed to
retrieve the block containing requested data. In modern machines L1 is
first searched for data/instruction requested by the processor. If it is a
hit then that data/instruction is copied to some register in the processor.
Otherwise L2 is searched. If it is a hit then data/instruction is copied to
L1 and from there it is copied to a register. If the request to L2 also is
a miss then main memory has to be accessed.
</para>
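<para>
The lookup order described above can be sketched with a toy two-level simulation.
Everything in it is an assumption made for illustration: both caches are direct-mapped,
the line size is 32 bytes, L1 has 4 lines and L2 has 16, and the names
<literal>cache</literal>, <function>cache_access</function> and
<function>lookup</function> are invented. It is not how Valgrind's simulator is written.
</para>
<programlisting>
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LINE_SIZE 32u /* bytes per cache line (assumed) */
#define L1_LINES   4u /* toy sizes, far smaller than real caches */
#define L2_LINES  16u

struct cache {
    unsigned long tags[L2_LINES]; /* block number held in each line */
    int valid[L2_LINES];
    unsigned lines;               /* number of lines actually used */
    unsigned long misses;
};

static void cache_init(struct cache *c, unsigned lines)
{
    memset(c, 0, sizeof *c);
    c->lines = lines;
}

/* Direct-mapped access: returns 1 on a hit; on a miss fills the line. */
static int cache_access(struct cache *c, unsigned long addr)
{
    unsigned long block = addr / LINE_SIZE;
    unsigned slot = (unsigned)(block % c->lines);
    if (c->valid[slot] && c->tags[slot] == block)
        return 1;
    c->valid[slot] = 1;
    c->tags[slot] = block;
    c->misses++;
    return 0;
}

/* L1 is searched first; L2 only on an L1 miss. An L2 miss stands
   for an access to main memory. */
static void lookup(struct cache *l1, struct cache *l2, unsigned long addr)
{
    if (!cache_access(l1, addr))
        cache_access(l2, addr);
}

int main(void)
{
    struct cache l1, l2;
    unsigned long addr;
    int pass;

    cache_init(&l1, L1_LINES);
    cache_init(&l2, L2_LINES);
    /* Read the same 256 bytes (8 lines) twice, 4 bytes at a time. */
    for (pass = 0; pass < 2; pass++)
        for (addr = 0; addr < 256; addr += 4)
            lookup(&l1, &l2, addr);

    /* The 8 lines do not fit in the 4-line L1, so both passes miss
       there; the second pass is served entirely from L2. */
    assert(l1.misses == 16);
    assert(l2.misses == 8);
    printf("L1 misses: %lu, L2 misses: %lu\n", l1.misses, l2.misses);
    return 0;
}
</programlisting>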
<para>
Valgrind can simulate the cache, meaning it can report what
happens in the cache while a program is running. For this, first compile your program
with the <option>-g</option> option as usual. Then use the shell script <literal>cachegrind</literal> instead of <literal>valgrind</literal>.
</para>
<para>
Sample output:
</para>
<para>
<screen>
==7436== I1 refs: 12,841
==7436== I1 misses: 238
==7436== L2i misses: 237
==7436== I1 miss rate: 1.85%
==7436== L2i miss rate: 1.84%
==7436==
==7436== D refs: 5,914 (4,626 rd + 1,288 wr)
==7436== D1 misses: 357 ( 324 rd + 33 wr)
==7436== L2d misses: 352 ( 319 rd + 33 wr)
==7436== D1 miss rate: 6.0% ( 7.0% + 2.5% )
==7436== L2d miss rate: 5.9% ( 6.8% + 2.5% )
==7436==
==7436== L2 refs: 595 ( 562 rd + 33 wr)
==7436== L2 misses: 589 ( 556 rd + 33 wr)
==7436== L2 miss rate: 3.1% ( 3.1% + 2.5% )
</screen>
</para>
<para>
<screen>
L2i misses means the number of instruction misses that occur in the L2
cache.
L2d misses means the number of data misses that occur in the L2 cache.
Total number of data references = Number of reads + Number of writes.
Miss rate means the fraction of references that are not found in the
cache level in question.
</screen>
</para>
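<para>
The miss rates in the sample output can be reproduced by dividing misses by references.
The sketch below does this for a few of the figures; the helper name
<function>miss_rate</function> is made up. Note that the overall L2 miss rate only
matches the sample if L2 misses are divided by the total of instruction and data
references (589 out of 18,755); that is an inference from the numbers shown, not
something stated in the text.
</para>
<programlisting>
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: miss rate in percent = 100 * misses / references. */
static double miss_rate(unsigned long misses, unsigned long refs)
{
    return refs == 0 ? 0.0 : 100.0 * (double)misses / (double)refs;
}

int main(void)
{
    unsigned long i_refs = 12841; /* "I1 refs" in the sample output */
    unsigned long d_refs = 5914;  /* "D refs" in the sample output  */

    printf("I1 miss rate: %.2f%%\n", miss_rate(238, i_refs));
    printf("D1 miss rate: %.2f%%\n", miss_rate(357, d_refs));
    printf("L2 miss rate: %.2f%%\n", miss_rate(589, i_refs + d_refs));

    /* Sanity checks against the rounded figures in the sample. */
    assert(miss_rate(238, i_refs) > 1.85 && miss_rate(238, i_refs) < 1.86);
    assert(miss_rate(357, d_refs) > 6.0 && miss_rate(357, d_refs) < 6.1);
    return 0;
}
</programlisting>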
<para>
The shell script <literal>cachegrind</literal> also produces a file, <filename>cachegrind.out</filename>, that
contains line-by-line cache profiling information which is not
human-readable. The program <literal>vg_annotate</literal> can interpret this
information. If <literal>vg_annotate</literal> is used without any arguments, it reads the file <filename>cachegrind.out</filename> and produces human-readable output.
</para>
<para>
When C, C++ or assembly source programs are passed as input to
<literal>vg_annotate</literal> it displays the number of cache reads, writes, misses etc.
</para>
<screen>
I1 cache: 16384 B, 32 B, 4-way associative
D1 cache: 16384 B, 32 B, 4-way associative
L2 cache: 262144 B, 32 B, 8-way associative
Command: ./a.out
Events recorded: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Events shown: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Event sort order: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw