<!--startcut ======================================================= -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<TITLE>The Answer Guy 31: Memory Leaks and the OS that Allows Them</TITLE>
</head>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#A000A0"
ALINK="#FF0000">
<!--endcut ========================================================= -->
<H4>"Linux Gazette...<I>making Linux just a little more fun!</I>"
</H4>
<P> <hr> <P>
<!-- =============================================================== -->
<H1 align="center"><A NAME="answer">
<img src="../gx/dennis/qbubble.gif" alt="" border="0" align="middle">
<a href="./index.html">The Answer Guy</a>
<img src="../gx/dennis/bbubble.gif" alt="" border="0" align="middle">
</A></H1> <BR>
<H4 align="center">By James T. Dennis,
<a href="mailto:linux-questions-only@ssc.com">linux-questions-only@ssc.com</a><BR>
Starshine Technical Services,
<A HREF="http://www.starshine.org/">http://www.starshine.org/</A> </H4>
<p><hr><p>
<p>The original message in this thread appeared in Issue 30,
<a href="http://www.linuxgazette.com/issue30/tag_memleak.html"
>Linux Memory Usage vs. Leakage</a></p>
<p><hr width="40%"><p>
<H3><img src="../gx/dennis/qbub.gif" alt="(?)" width="50" height="28"
align="left" border="0">Memory Leaks and the OS that Allows Them</H3>
<p><strong>From Thomas L. Gossard on Fri, 03 Jul 1998
<br><br>
Answerguy,
<br><br>
Regarding the recent question you received on memory leakage under
2.0.29. I don't believe it is a memory leakage under the normal
sense where a program quits and won't give the memory back to the OS.
</strong></p>
<blockquote><img src="../gx/dennis/bbub.gif" alt="(!)" width="50" height="28"
align="left" border="0">
Once a program has <strong>quit</strong> (exited) it is the OS's
responsibility to reclaim all RAM and release all
other resources (process table entries, file descriptors
and handles, etc.) that were allocated to that process.
<br><br>
If it fails to do so, that is a bug in the OS (the kernel
and/or its drivers or core user space processes, like
'<tt>init</tt>'). Under Linux (and Unix in general) it is very
rare to see this sort of bug. (I've never heard of any
kernel memory leaks in Linux).
<br><br>
Under NT there is apparently a problem because the system is
very complex and so much of the programming doesn't respect
the intended modularity between "kernel" and "user space" ---
so DLLs and drivers (particularly video drivers) will end up
locked into memory with no references. Since I'm not an NT
programmer (and not a systems programmer of any sort), I'll have
to accept the considered opinions of others who've said that
this is why NT has a notoriously poor stability record compared
to any form of Unix. The fact that they've added some process
memory protection and imposed some modularity and process
isolation means that NT's stability is orders of magnitude
better than that of MS-DOS, Windows 3.x, or Windows '95.
However, it's reported to be very poor compared to any of the
multi-user operating systems like Unix or VMS.
</blockquote>
<p><strong><img src="../gx/dennis/qbub.gif" alt="(?)" width="50" height="28"
align="left" border="0">
I also use .29 and saw the same problem. I sent out several e-mails
and found out that what is really happening is the OS has the memory
but is not reporting it as free but has saved it for cache purposes.
Notice the guy with the question said "<tt>ls</tt>" the first time took
memory but not the second time. A memory leak will take the memory
each time. The OS is keeping the memory for itself. The real
problem is in the way the OS or <tt>top</tt> or whatever is reporting the
memory usage and the way we expect to see it.
</strong></p>
<blockquote><img src="../gx/dennis/bbub.gif" alt="(!)" width="50" height="28"
align="left" border="0">
The way that memory is used by the Linux cache is
fairly complex. Consequently the output from '<tt>top</tt>',
'<tt>free</tt>' and '<tt>vmstat</tt>' is not easy to interpret (and I
don't consider myself to be an expert in them by any
means).
<br><br>
The intended design is supposed to use all "available"
free memory for disk caching (and I guess the 2.2 kernels
will implement disk and directory entry caching ---
which should yield much better performance for several
reasons). It is certainly possible that there were bugs
in the caching and memory management code in some of the
2.0.x kernels. You could certainly go to the Linux
kernel mailing list archives and read through the various
change summaries to see. Or you could upgrade to a
newer kernel and look for symptoms.
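<br><br>
To make the point about caching concrete, here is a minimal Python sketch (not something from the original exchange) that adds buffers and page cache back onto the reported free memory. It assumes the "Key: value kB" layout that later kernels use for <tt>/proc/meminfo</tt>, which is not the layout 2.0.x prints. The function names <tt>parse_meminfo</tt> and <tt>reclaimable_kb</tt> are invented for this example, and the sample numbers are made up.
<pre>
```python
def parse_meminfo(text):
    """Parse 'Key:   value kB' lines into a dict of integers (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip blank lines, headers, and anything without a numeric value.
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def reclaimable_kb(info):
    """Rough estimate of memory available on demand: free + buffers + cache."""
    return info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from a real machine.
    sample = (
        "MemTotal:  64000 kB\n"
        "MemFree:    2000 kB\n"
        "Buffers:    6000 kB\n"
        "Cached:    30000 kB\n"
    )
    info = parse_meminfo(sample)
    print("reported free:", info["MemFree"], "kB")
    print("free + buffers + cache:", reclaimable_kb(info), "kB")
```
</pre>
On a live system you would feed it <tt>open("/proc/meminfo").read()</tt>. Treat the sum as a rough upper bound, since not every cached page can be dropped on demand.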
</blockquote>
<p><strong><img src="../gx/dennis/qbub.gif" alt="(?)" width="50" height="28"
align="left" border="0">
The only true way to check on the problem seems to be to execute
some memory hog routines, like graphics, and watch the swap usage.
In particular my mail program seemed to suck up 8 or 9 megs at a
time yet even going in and out of that and xv my swap was barely
touched. With a sufficient memory leak after a period of time the
swap should see a great deal of activity due to the lack of memory.
<br><br>
Tom G
</strong></p>
<blockquote><img src="../gx/dennis/bbub.gif" alt="(!)" width="50" height="28"
align="left" border="0">
Most memory leaks are in user space --- in long running
daemons like '<tt>named</tt>', a web server, '<tt>sendmail</tt>', X, etc.
Your test doesn't isolate the cause of the memory leak.
I think my message covered some suggestions to do that
(like booting with init=/bin/sh and running some tests from there).
<br><br>
If exiting doesn't return your memory to availability for
cache/free space --- you have a problem in your kernel.
However, it can be deceiving. For example --- I remember
a situation where BIND ('<tt>named</tt>') was leaking --- and it
looked like '<tt>sendmail</tt>' was the culprit. In actuality
'<tt>sendmail</tt>' was making DNS queries to '<tt>named</tt>', causing
<strong>it</strong> to lose its cookies. At the same time
'<tt>sendmail</tt>' was segfaulting (dying a horrible death) because
the old resolver libraries (against which it was linked)
were returning <strong>lots</strong> of MX records for sites
like <a href="http://www.compuserve.com/">Compuserve</a>
and <a href="http://aol.com/">AOL</a>, which back then had
just started deploying
dozens of mail servers each --- so that one DNS request
would return more records than the resolver could handle.
<br><br>
At first I thought someone had discovered a new remote
<tt>sendmail</tt> exploit and was hacking into my site (this was
actually on an old SunOS box). Then I realized that it
was related to DNS --- and finally I upgraded to a newer
DNS and set of resolver libraries. The newer version of
<tt>named</tt> still had a memory leak back then --- but my other
sysadmin friends said "Oh yeah! It's been doing that ---
just set up a '<tt>cron</tt>' job to kill it once a day or so"
(I'd been sure that it was my fault and that I'd built
and installed it incorrectly).
<br><br>
As for the "true way" to look for memory leaks ---
I think most programmers would disagree with your
analysis on this one. They might suggest Electric Fence
(a debugging replacement for malloc() and free() that's
designed to catch the sorts of allocation and reference
problems that '<tt>lint</tt>' won't --- and that might not be
immediately fatal). Another option might be for someone
to link the suspect program with Insure++
(<A HREF="http://www.linuxjournal.com/issue51/2951.html"
>http://www.linuxjournal.com/issue51/2951.html</A>) and do
their testing with that.
<br><br>
Certainly, we, as sysadmins, are usually constrained
to more heuristic and less "invasive" approaches ---
but we definitely want to isolate the problem to a
specific component (program, module, kernel configuration,
whatever) or combination. That's what "tech support"
is all about.
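<br><br>
One non-invasive heuristic of that sort is to sample a suspect daemon's resident memory at regular intervals and see whether it keeps climbing. The sketch below is only an assumption about how one might do it: the names and the 512 kB threshold are arbitrary, and <tt>read_rss_kb</tt> relies on the <tt>VmRSS</tt> line in the Linux <tt>/proc/PID/status</tt> file. A one-time jump that then levels off (the cache-like pattern) is not flagged; steady growth across the whole window is.
<pre>
```python
def looks_like_leak(samples_kb, min_growth_kb=512):
    """Heuristic: usage keeps climbing in both halves of the sampling window.

    samples_kb is a list of memory readings (kB) taken at regular intervals.
    """
    if not samples_kb[2:]:
        return False  # need at least three readings to call it a trend
    mid = len(samples_kb) // 2
    first_half = samples_kb[mid] - samples_kb[0]
    second_half = samples_kb[-1] - samples_kb[mid]
    return first_half >= min_growth_kb and second_half >= min_growth_kb


def read_rss_kb(pid):
    """Return the VmRSS value (kB) from /proc/PID/status, or None if absent."""
    with open("/proc/%d/status" % pid) as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return None


if __name__ == "__main__":
    # Made-up readings (kB), e.g. one per hour.
    print(looks_like_leak([4000, 5200, 6100, 7300, 8400]))  # steady climb
    print(looks_like_leak([4000, 9000, 9000, 9000, 9000]))  # jump, then flat
```
</pre>
In practice you would call <tt>read_rss_kb</tt> from a '<tt>cron</tt>' job or a sleep loop for each long running daemon and feed the collected readings to <tt>looks_like_leak</tt>. It only narrows down which process to suspect; it does not say why that process is growing.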
</blockquote>
<!--================================================================-->
<P> <hr> <P>
<H5 align="center"><a href="http://www.linuxgazette.com/copying.html"
>Copyright &copy;</a> 1998, James T. Dennis <BR>
Published in <I>Linux Gazette</I> Issue 31 August 1998</H5>
<P> <hr> <P>
<!--================================================================-->
<table width="98%"><tr valign="center" align="center">
<td rowspan="3"><A HREF="./lg_answer31.html"><IMG
SRC="../gx/dennis/answernew.gif"
ALT="[ Answer Guy Index ]"></A></td>
<td><A HREF="tag_backup.html">backup</A></td>
<td><A HREF="tag_uidgid.html">uidgid</A></td>
<td><A HREF="tag_connect.html">connect</A></td>
<td><A HREF="tag_95slow.html">95slow</A></td>
<td><A HREF="tag_badblock.html">badblock</A></td>
<td><A HREF="tag_trident.html">trident</A></td>
<td><A HREF="tag_sound.html">sound</A></td>
</tr><tr valign="center" align="center">
<td><A HREF="tag_kernel.html">kernel</A></td>
<td><A HREF="tag_solprint.html">solprint</A></td>
<td><A HREF="tag_idescsi.html">idescsi</A></td>
<td><A HREF="tag_distrib.html">distrib</A></td>
<td><A HREF="tag_modem.html">modem</A></td>
<td><A HREF="tag_NDS.html">NDS</a></td>
<td><A HREF="tag_rpm.html">rpm</A></td>
</tr><tr valign="center" align="center">
<td><A HREF="tag_guy.html">guy</A></td>
<td><A HREF="tag_maildns.html">maildns</A></td>
<td><A HREF="tag_memleak.html">memleak</a></td>
<td><A HREF="tag_multihead.html">multihead</A></td>
<td><A HREF="tag_cdr.html">cdr</A></td>
</tr></table>
<P> <hr> <P>
<!--================================================================-->
<A HREF="./index.html"><IMG SRC="../gx/indexnew.gif"
ALT="[ Table Of Contents ]"></A>
<A HREF="../index.html"><IMG SRC="../gx/homenew.gif"
ALT="[ Front Page ]"></A>
<A HREF="lg_bytes31.html"><IMG SRC="../gx/back2.gif"
ALT="[ Previous Section ]"></A>
<A HREF="./searls.html"><IMG SRC="../gx/fwd.gif"
ALT="[ Next Section ]"></A>
<!--startcut ======================================================= -->
</body>
</html>
<!--endcut ========================================================= -->