old-www/LDP/LG/issue81/kurup.html

330 lines
19 KiB
HTML

<!--startcut ==============================================-->
<!-- *** BEGIN HTML header *** -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML><HEAD>
<title>Is Your Memory Not What It Used To Be? LG #81</title>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#0000AF"
ALINK="#FF0000">
<!-- *** END HTML header *** -->
<CENTER>
<A HREF="http://www.linuxgazette.com/">
<IMG ALT="LINUX GAZETTE" SRC="../gx/lglogo.png"
WIDTH="600" HEIGHT="124" border="0"></A>
<BR>
<!-- *** BEGIN navbar *** -->
<IMG ALT="" SRC="../gx/navbar/left.jpg" WIDTH="14" HEIGHT="45" BORDER="0" ALIGN="bottom"><A HREF="durodola.html"><IMG ALT="[ Prev ]" SRC="../gx/navbar/prev.jpg" WIDTH="16" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="index.html"><IMG ALT="[ Table of Contents ]" SRC="../gx/navbar/toc.jpg" WIDTH="220" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../index.html"><IMG ALT="[ Front Page ]" SRC="../gx/navbar/frontpage.jpg" WIDTH="137" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="http://www.linuxgazette.com/cgi-bin/talkback/all.py?site=LG&article=http://www.linuxgazette.com/issue81/kurup.html"><IMG ALT="[ Talkback ]" SRC="../gx/navbar/talkback.jpg" WIDTH="121" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../lg_faq.html"><IMG ALT="[ FAQ ]" SRC="./../gx/navbar/faq.jpg"WIDTH="62" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="padala.html"><IMG ALT="[ Next ]" SRC="../gx/navbar/next.jpg" WIDTH="15" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><IMG ALT="" SRC="../gx/navbar/right.jpg" WIDTH="15" HEIGHT="45" ALIGN="bottom">
<!-- *** END navbar *** -->
<P>
</CENTER>
<!--endcut ============================================================-->
<H4 ALIGN="center">
"Linux Gazette...<I>making Linux just a little more fun!</I>"
</H4>
<P> <HR> <P>
<!--===================================================================-->
<center>
<H1><font color="maroon">Is Your Memory Not What It Used To Be?</font></H1>
<H4>By <a href="http://www.geocities.com/madhumkurup/mailme.html">Madhu M Kurup</a></H4>
</center>
<P> <HR> <P>
<!-- END header -->
<h2>Intent</h2>
The intent of this article is to provide an understanding of memory
leak detection and profiling tools currently available. It also aims at
providing you with enough information to be able to make a choice between
the different tools for your needs.<br>
<h2>Leaks and Corruption</h2>
We are talking software here, not plumbing. And yes, any fairly large,
non trivial program is bound to have a problem with memory and or leaks.<br>
<h3>Where do problems occur?</h3>
First, leaks and such memory problems do not occur in some languages.
These languages believe that memory management is <i>so important</i> that
it should never be handled by the users of that language. It is better
handled by the <i>language designers</i>. Examples of such languages are
Perl, Java &nbsp;and so on.<br>
&nbsp;&nbsp;&nbsp; However, in some other languages (notably C and
C++) the language designers have felt that memory management is <i>so important</i>
that it can only be taken care of by the <i>users</i> of the language.
A leak is said to occur when you dynamically allocate memory and then forget
to return it. In addition to leaks, other memory problems such as <a
href="http://www.tuxedo.org/%7Eesr/jargon/html/entry/buffer-overflow.html">buffer
overflows</a>,&nbsp;<a
href="http://www.tuxedo.org/%7Eesr/jargon/html/entry/dangling-pointer.html">dangling
pointers</a> &nbsp;also occur when programmers manage memory themselves.
These problems are caused where there is a&nbsp; mismatch between what
the program (and by extension the programmer) believes the state of memory
is, as opposed to what it really is.<br>
<h3>What are the problems?</h3>
In order for programs to be able to deal with data whose size is
not known at compile time, the program may need to request memory from
the runtime environment (operating system). However, having obtained a
chunk of memory, it may be possible that the program does not return to
back to the environment after use. An even more severe condition results
when the address of the block that was obtained is lost, which means that
it is no longer possible to identify that allocated memory. &nbsp;Other
problems include trying to access memory after it has been returned (dangling
pointers). Another common problem is trying to access more memory that was
originally requested and so on (buffer overflow).<br>
<h3>Why should these problems bother me?</h3>
Leaks may not be a problem for short-lived programs that finish their
work quickly. Unfortunately, many programs are designed to function
without termination for a long period. A good example would be the Apache
webserver that is currently providing you this web page. In such a situation,
a malfunctioning leaky program could keep requesting memory from the system
and not return it. Eventually this would lead to the system running out
of memory and all programs running on that machine to suffer. This is obviously
not a &nbsp;good thing. In addition to a program requiring more memory,
leaks can also make a program sluggish. The speed at which the program
is context-switched in and out can decrease if the memory load increases.
While not as severe as causing the machine to crash, an excessive memory
load on a machine could cause it to thrash, swapping data back and forth.<br>
&nbsp;&nbsp;&nbsp; Dangling pointers can result in subtle corruption
and bugs that are extremely unusual, obscure and hard to solve. Buffer overflows
are probably the most dangerous of the three forms of memory problems.
They lead to most of the security exploits that you read about[<a
href="#Secure_Programming_">SEC</a>]. &nbsp;In addition to the problems
described above, it may be possible that the same memory chunk is returned
back to the system multiple times. This obviously indicates a programming
error. A programmer may wish to see how the memory requests are made by
a program over the course of the lifetime of the program in order to find
and fix bugs.<br>
<h3>Combating these problems</h3>
There are some run time mechanisms to combat memory problems. Leaks can
be solved by periodically stopping and restarting the offending program
<cite></cite> [<a href="#OOM_killer">OOM</a>]. Dangling pointers can be
made repeatable by zeroing out all memory returned back to the operating
systems. Buffer overflows have a variety of solutions, some of which are
described in more detail <a
href="http://www.geocities.com/madhumkurup/papers/Buffer.ps">here</a>. <br>
&nbsp;&nbsp;&nbsp; Typically, the overhead of combating these problems at
runtime or late in development cycle is so high that finding them and
fixing them at the program level is often the more optimal solution.<br>
<h2>Open Source </h2>
<h3>GCC-based alternatives</h3>
The <a
href="http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/boehm-gc/">gcc</a> toolset
now includes a garbage collector which facilitates the easy detection
and elimination of many memory problems. Note that while this can be used
to detect leaks, the primary reason for creating this was to implement
a good garbage collector[<a href="#Garbage_Collectors">GC</a>]. This work
is currently being led by Hans-J. Boehm at HP.&nbsp;
<h4>Technology</h4>
The technology used here is <a
href="http://www.hpl.hp.com/personal/Hans_Boehm/gc/gcdescr.html">Boehm-Demers-Weiser</a>
technique for keeping track of allocated memory. Allocation of memory
is done using the algorithm's version of the standard memory allocation
functions. The program is then compiled with these functions and when executed,
the algorithm can analyze the behavior of the program. This algorithm is
fairly well known and well understood. It should not cause any problems
and/or interfere with programs. It can be made thread safe and can even
scale onto a multiprocessor system.<br>
<h4>Performance</h4>
Good performance with reduction in speed in line with expectations.
The code is extremely portable and is also available directly with gcc.
The version shipped with gcc is slightly older, but can be upgraded.<br>
&nbsp;&nbsp;&nbsp; There is no interface - it is difficult to use
and requires much effort for it to be useful. Existing systems may not
have this compiler configuration and may require some additional work to
get it going. In addition, in order for the calls to be trapped, all memory
calls (such as <i>malloc()</i> and <i>free()</i> ) have to be replaced with
equivalents provided by the garbage collector. One could use a macro, but
that is still not very flexible.&nbsp; Also this approach implicitly requires
source code for all pieces that require memory profiling with the ability
to shift from the real functions to those provided.<br>
<h4>Verdict</h4>
If you need a solution across multiple platforms (architectures,
operating systems) where you have control over all relevant source, this
could be it.<br>
<h3>Memprof</h3>
<a href="http://people.redhat.com/otaylor/memprof/">Memprof</a> is
an attractive easy to use package, created by Owen Talyor of Red Hat. This
tool is a nice clean GNOME front-end to the Boehm-Demers-Weiser garbage
collector.<br>
<h4>Technology</h4>
At the heart of the profiling, memprof is no different from the toolset
described above. However, how it implements this functionality is to trap
all memory requests from the program and redirect it at runtime to the
garbage collector. While not as functional as the gcc alternative on threads
and multiprocessors, the program can be asked to follow forks as they happen.<br>
<h4>Performance</h4>
The performance of this tool is pretty good. The GUI was well designed,
responsive and informative. This tools works directly with executables,
and it works without any changes needed to the source. This tool also graphically
displays the memory profile as the program executes which helps in understanding
memory requirements of the program during its lifetime.<br>
&nbsp;&nbsp;&nbsp; This tool is currently available only for the x86
and PPC architecture on Linux. If you need help on other platforms, you
will need to look elsewhere. This tool is not a GTK application, it needs
the full-blown GNOME environment. This may not be feasible everywhere.
Finally, development on this tool appears to be static (version 0.4.1.
for a while). While it is possible that it does what it is required to do
well, it does not seem that this too will do anything more than just leak
detection.<br>
<h4>Verdict</h4>
If you like GUI tools and don't mind GNOME and Linux, this is a
tool for you.<br>
<h3>Valgrind</h3>
<a href="http://developer.kde.org/%7Esewardj/">Valgrind</a> is a
program that attempts to solve a whole slew of memory problems, leaks
being just one of them. This tool is the product of Julian Seward (of
<a href="http://sources.redhat.com/bzip2/index.html">bzip2</a> and <a
href="http://www.cacheprof.org">cacheprof</a> fame). It terms itself "an open source
memory debugger for x86 linux" and it certainly fits that bill. In addition,
it can profile the usage of the CPU cache, something that is fairly unusual.
<h4>Technology</h4>
The technology used in this program is fairly complex and <a
href="http://developer.kde.org/%7Esewardj/docs/techdocs.html">well documented</a>.
Each byte of memory allocated by the program is tracked by nine status
bits, which are then used for housekeeping purposes to identify what is
going on. At the cost of tremendously increasing the memory load of an
executing program, this tool enables a much greater set of checks. As all
the reads and writes are intercepted, cache profiling of the CPU's various
L caches can also be done.<br>
<h4>Performance</h4>
The tool was the slowest of the three detailed here, for obvious
reasons. However, for the reduction in speed, this tool provides a wealth
of information is probably the most detailed of the three. In addition
to the usual suspects, this tool can identify a variety of other memory
and even some POSIX pthread issues. Cache information is probably overkill
for most applications, but it is an interesting way to look at the performance
of an application. The biggest plus for Valgrind is that it is under rapid
development with a pro-active developer and an active community. In fact
the web page of Valgrind proclaims the following from the author - &nbsp;<i>"If
you have problems with Valgrind, don't suffer in silence. Mail me."</i>.<br>
&nbsp;&nbsp;&nbsp; The tool however, is very x86 specific. Portability
is fairly limited and to x86 Linux. The interface is purely command-line
driven and while usable, sometimes the tool gives you too much information
for it to be useful. This tool also directly works with binaries, so while
recompiles are not required, it will require diligence to go through the
output of this tool to find what you are looking for. You can suppress memory
profiling for various system libraries by creating suppression files, but
writing these files is not easy. In addition, threading support is not complete,
although this tool has been used on Mozilla, OpenOffice and such other
large threaded programs. If this tool had a GUI front end, it would
win hands down.<br>
<h4>Verdict</h4>
If you are on x86 and know your code well and do not mind a CLI interface,
this program will take you another level.<br>
<h3>Other Open Source tools</h3>
Before I get sent to the stake for not having mentioned your favorite
memory tool, I must confess that few compare in completeness to these three
in terms of the data that they provide. A more&nbsp; comprehensive list
of leak detection tools is available&nbsp;<a
href="http://www.sslug.dk/emailarkiv/bog/2001_08/msg00030.html">here</a>.
<br>
<h2>Commercial</h2>
These tools are mentioned here only for completeness.
<h3>Purify</h3>
The <a href="http://www.rational.com/products/pqc/pplus_ux.jsp">big
daddy</a> of memory tools, does <i>not work</i> on Linux, so you can stop
asking that question.
<h3>Geodesic</h3>
A latecomer to this arena, <a
href="http://www.geodesic.com/solutions/solutions_linux.html">Geodesic</a>
is known most in the Linux community for their <a
href="http://www.geodesic.com/solutions/products_gc_demo.html">Mozilla</a>
demo, in which they use their tools to help find memory problems in the
Mozilla codebase. How much use this has been to the Mozilla team is yet
to be quantified, but their open-source friendliness can't hurt. Works
for Solaris/Linux with a fully functional trial. Works on Windows as well.<br>
<h3>Insure++</h3>
A C++ specific tool, but still fairly well known, Parasoft's <a
href="http://www.parasoft.com/jsp/products/home.jsp?product=Insure">Insure++</a>
is a fairly complete memory profiling / leak detection tool. In addition,
it can find some C++ specific errors as well, so that can't hurt. This tool
works with a variety of compilers and operating systems, a free trial version
is available too.
<h2>Miscellaneous Notes:</h2>
<h3><a name="Secure_Programming_"></a>Secure Programming </h3>
Secure programming involves many components, but probably the most significant
is the careful use of memory. More details are available&nbsp;<a
href="http://www.theorygroup.com/Theory/FAQ/Secure-Programs-HOWTO-1.html">here</a>.<br>
<h3><a name="OOM_killer"></a>OOM killer</h3>
Some the newer Linux kernels employ an algorithm which is known as the
Out Of Memory (OOM) killer. This code is invoked when the kernel completely
runs out of memory, at which point active programs / processes are chosen
to be executed (as in killed, end_of_the_road, happy hunting grounds, etc).
More details are available&nbsp;<a
href="http://linux-mm.org/docs/oom-killer.shtml">here</a>.<br>
<h3><a name="Garbage_Collectors"></a>Garbage Collectors </h3>
One of the other reasons why garbage collection is not always a preferred
solution is that it is really tough to implement. They have severe problems
with self-referential structures (i.e. structures that link to themselves)
as aptly described <a
href="http://www.tuxedo.org/%7Eesr/jargon/html/Some-AI-Koans.html">here</a>.<br>
<!-- *** BEGIN bio *** -->
<SPACER TYPE="vertical" SIZE="30">
<P>
<H4><IMG ALIGN=BOTTOM ALT="" SRC="../gx/note.gif">Madhu M Kurup</H4>
<EM>I'm a CS engineer from Bangalore, India and formerly of the
<a href="http://www.linux-bangalore.org">ILUG Bangalore</a>. I've
been working and playing with Linux for a while and while programming is
my first love, Linux comes a close second. I work at the Data Mining group
at <a href="http://www.yahoo.com">Yahoo!</a> Inc and work on algorithms,
scalability and APIs there. I moonlight on the Linux messenger client and
dabble in various software projects when (if ever) I can find any free time.
<P> And yes, if you want to know, I use C++, vi, mutt, Windowmaker
and Mandrake; let the flame wars begin :) </EM>
<!-- *** END bio *** -->
<!-- *** BEGIN copyright *** -->
<P> <hr> <!-- P -->
<H5 ALIGN=center>
Copyright &copy; 2002, Madhu M Kurup.<BR>
Copying license <A HREF="../copying.html">http://www.linuxgazette.com/copying.html</A><BR>
Published in Issue 81 of <i>Linux Gazette</i>, August 2002</H5>
<!-- *** END copyright *** -->
<!--startcut ==========================================================-->
<HR><P>
<CENTER>
<!-- *** BEGIN navbar *** -->
<IMG ALT="" SRC="../gx/navbar/left.jpg" WIDTH="14" HEIGHT="45" BORDER="0" ALIGN="bottom"><A HREF="durodola.html"><IMG ALT="[ Prev ]" SRC="../gx/navbar/prev.jpg" WIDTH="16" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="index.html"><IMG ALT="[ Table of Contents ]" SRC="../gx/navbar/toc.jpg" WIDTH="220" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../index.html"><IMG ALT="[ Front Page ]" SRC="../gx/navbar/frontpage.jpg" WIDTH="137" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="http://www.linuxgazette.com/cgi-bin/talkback/all.py?site=LG&article=http://www.linuxgazette.com/issue81/kurup.html"><IMG ALT="[ Talkback ]" SRC="../gx/navbar/talkback.jpg" WIDTH="121" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../lg_faq.html"><IMG ALT="[ FAQ ]" SRC="./../gx/navbar/faq.jpg"WIDTH="62" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="padala.html"><IMG ALT="[ Next ]" SRC="../gx/navbar/next.jpg" WIDTH="15" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><IMG ALT="" SRC="../gx/navbar/right.jpg" WIDTH="15" HEIGHT="45" ALIGN="bottom">
<!-- *** END navbar *** -->
</CENTER>
</BODY></HTML>
<!--endcut ============================================================-->