<!--startcut ======================================================= -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<META NAME="generator" CONTENT="lgazmail v1.1pre8">
<TITLE>The Answer Guy 30: Hardware Lockups due to Graphics Load</TITLE>
</head>

<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#A000A0"
ALINK="#FF0000">
<!--endcut ========================================================= -->
<H4>"Linux Gazette...<I>making Linux just a little more fun!</I>"
</H4>
<P> <hr> <P>

<!-- =============================================================== -->
<H1 align="center"><A NAME="answer">
<img src="../gx/dennis/qbubble.gif" alt="" border="0" align="middle">
<a href="./index.html">The Answer Guy</a>
<img src="../gx/dennis/bbubble.gif" alt="" border="0" align="middle">
</A></H1> <BR>
<H4 align="center">By James T. Dennis,
<a href="mailto:linux-questions-only@ssc.com">linux-questions-only@ssc.com</a><BR>
Starshine Technical Services,
<A HREF="http://www.starshine.org/">http://www.starshine.org/</A> </H4>
<p><hr><p>
<H3><img src="../gx/dennis/qbub.gif" alt="(?)" width="50" height="28"
align="left" border="0">Hardware Lockups due to Graphics Load</H3>

<p><strong>From Brad Alexander on 30 May 1998

<!-- begin body -->
<BR><BR>
Hi Jim,

<br><br>
This isn't Linux-specific, but I'm having a problem and I'm hoping you can
help me come up with a workaround that isn't going to cost a lot of money.

<br><br>
I have an Intel P-100 on an Amptron AM-7900 board with 64MB of EDO RAM (2
32MB sticks), a gob of hard drives (a 2.2GB Quantum Fireball IDE and a
FutureDomain SCSI controller with a 420MB Conner, a 1GB Seagate, 1GB
Micropolis and 1GB Quantum Empire), a Diamond Stealth 64 with 2MB DRAM, and
a SoundBlaster 16 Plug'n'Pray.

<br><br>
I'm running a heavily modified RedHat 5.0 machine with an 800MB DOS
partition on <TT>/dev/hda1</TT> and a 200MB win95 partition on
<TT>/dev/hda3</TT> (Linux's <TT>/+/usr</TT> is on <TT>/dev/hda2</TT>).

<br><br>
I have been seeing system lockups for quite a while now. I noticed them
when running xlock in random mode initially, then noticed that I was also
starting to have problems with some of my dos apps, like Jane's Longbow and
Duke Nukem locking up. Under Linux, I settled on using <TT>xlock</TT> in
galaxy mode, and the lockups dropped to every couple of weeks. (Note that
during this time, I upgraded memory from 4 8MB sticks to 2 32s.)

<br><br>
Everything went all right until I upgraded to RedHat 5.0, with XFree86
3.3.1. The lockups increased to about every 2 days. Once I upgraded to
XFree86 3.3.2, they dropped back down to about once a week.

<br><br>
I'm basically using you as a sounding board to see if I might have missed
something. I'm thinking it's hardware, but where? The Stealth? The lockups
seem to occur during graphics app use, <TT>xlock</TT>, or the <TT>gimp</TT>.
The motherboard? The chip? What can I start replacing without sinking a whole
bunch of money into it?

<br><br>
Thanks in advance,
<br>--Brad Alexander
</strong></p>

<blockquote><img src="../gx/dennis/bbub.gif" width="50" height="28" alt="(!)"
align="left" border="0">
Well, the first thought would be to try a different video card.
I don't have too much confidence that the problem is truly
related to the video card's activity --- so that's just a
starting point for diagnosis.

<br><br>
To see if this really is related to graphics, boot up the
system in text mode (don't run X; change your runlevel or
initdefault to one of the non-xdm modes if necessary). Now you
can run a couple of kernel builds on it (that's usually a
pretty good stress test). Try '<TT>make -j</TT>' to work it harder.
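<br><br>
As a rough sketch of that stress test, the loop below repeats a build command
and logs each pass, so the log shows how far the machine got before a hang.
The function name <TT>stress_loop</TT>, the log path and the pass count are
made up for illustration; they don't come from any standard tool.

```shell
#!/bin/sh
# stress_loop PASSES LOGFILE COMMAND [ARGS...]
# Hypothetical helper: run COMMAND up to PASSES times, logging the start and
# result of every pass, and stop at the first failing pass.
stress_loop() {
    passes=$1
    log=$2
    shift 2
    i=1
    while [ "$i" -le "$passes" ]; do
        echo "pass $i start $(date)" >> "$log"
        sync    # push the log line to disk before the heavy work begins
        if ! "$@" > /dev/null 2>&1; then
            echo "pass $i FAILED $(date)" >> "$log"
            return 1
        fi
        echo "pass $i ok $(date)" >> "$log"
        i=$((i + 1))
    done
    return 0
}

# Example, run from the top of a configured kernel source tree
# (the job count and build target are assumptions; adjust to your kernel):
#   stress_loop 5 /var/tmp/stress.log sh -c 'make clean && make -j 4 zImage'
```

A pass that fails after earlier passes succeeded on the same source tree
hints at flaky hardware rather than a problem with the sources.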

<br><br>
It would also be helpful to know what sort of lockup you're
getting. It may be that you could still login via a serial
port (using a null modem and a laptop or any other nearby
computer or terminal). To do this simply add a line like

<blockquote><code>t1:23:respawn:/sbin/agetty -L 38400,19200,9600,2400,1200 ttyS1 vt100
</code></blockquote>

... to your <TT>/etc/inittab</TT>. This should allow you to use one of
your serial lines to login. It is possible for the X Window
System and the console to be dead while the kernel and
other processes are still up and running. Another test is to
ping it from another system (if you have an ethernet LAN
connected to this machine). Even if telnet doesn't work you
want to ping it to see if the kernel is still responding.
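<br><br>
The ping test can be left running on the second machine so that you get a
timestamp for the moment the box stops answering. This is only a sketch: the
function names and the address in the example are hypothetical, and it assumes
a <TT>ping</TT> that accepts <TT>-c</TT> for a packet count, as the Linux one does.

```shell
#!/bin/sh
# probe_once HOST
# Send a single ping and print a timestamped "up" or "DOWN" line.
# Returns non-zero when the host did not answer.
probe_once() {
    host=$1
    if ping -c 1 "$host" > /dev/null 2>&1; then
        echo "$(date) $host up"
    else
        echo "$(date) $host DOWN"
        return 1
    fi
}

# watch_host HOST INTERVAL
# Probe every INTERVAL seconds and stop at the first missed reply, so the
# last line of output marks when the machine went quiet.
watch_host() {
    host=$1
    interval=$2
    while probe_once "$host"; do
        sleep "$interval"
    done
}

# Example (192.168.1.10 stands in for the address of the machine that hangs):
#   watch_host 192.168.1.10 30 >> /var/tmp/probe.log
```

If the pings keep coming back while X and the console are frozen, the kernel
is still alive and the serial login is worth trying next.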

<br><br>
It's also probably worth trying the software watchdog timer
code in the newer kernels. This allows you to configure a
kernel module to emulate a hardware watchdog timer card. These
WDT devices are basically a "dead man's switch" for your
system. If the timer isn't periodically reset (normally by a
small daemon writing to the watchdog device) then the WDT
triggers a system reset. In the case of the emulated WDT, that
reset has to be carried out by some <EM>other thread in the kernel</EM>.

<br><br>
Obviously a software emulation of this isn't quite as reliable
as a hardware WDT --- since a completely hung kernel will
never get around to calling on that module's thread of
execution. However, it isn't too unlikely that the hang is in
some specific kernel thread and that some other thread
continues to execute after other parts have died.
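<br><br>
To make the "dead man's switch" idea concrete, here is a minimal sketch of a
feeder process. It assumes the software watchdog is loaded as the
<TT>softdog</TT> module and appears as <TT>/dev/watchdog</TT>, and that any
write to the device resets the timer; check the watchdog documentation that
ships with your kernel before relying on it. The function name
<TT>feed_watchdog</TT> and its arguments are made up for illustration.

```shell
#!/bin/sh
# feed_watchdog DEVICE INTERVAL COUNT
# Hypothetical helper: hold DEVICE open and write one byte to it every
# INTERVAL seconds, COUNT times. If this process stops being scheduled,
# the writes stop and the watchdog is left to fire.
feed_watchdog() {
    dev=$1
    interval=$2
    count=$3
    {
        i=0
        while [ "$i" -lt "$count" ]; do
            printf '.'              # each write counts as "still alive"
            sleep "$interval"
            i=$((i + 1))
        done
    } > "$dev"                      # one open descriptor for the whole loop
}

# Example, as root (module name and device path are assumptions):
#   modprobe softdog
#   feed_watchdog /dev/watchdog 10 8640     # roughly one day of feeding
# What happens when the loop ends and the device is closed depends on the
# kernel version and how the watchdog was configured.
```

The interval has to be comfortably shorter than the watchdog's timeout,
otherwise a healthy but busy machine will reset itself.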

<br><br>
Frankly I'm not sure what the difference is between the kernel
watchdog emulation code and the boot "<TT>panic=</TT>" parameter. But
that's definitely another thing to try (just add something
like <TT>panic=60</TT> to your lilo "<TT>append=</TT>" directive, or
enter it manually when you boot up your system). I guess that the difference
would be that there may be some conditions under which the
kernel could get into a comatose or unresponsive state without
panic'ing (if it got tricked into some really long timeout wait
or something). The <TT>panic=</TT> option forces the Linux kernel to
reboot after a "panic" (a critical error condition detected by
the kernel, usually a corrupted table that fails its consistency and
integrity checks).
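<br><br>
After adding something like <TT>append="panic=60"</TT> to
<TT>/etc/lilo.conf</TT> and re-running <TT>lilo</TT>, it's worth confirming
that the option actually reached the kernel on the next boot. The sketch
below pulls the value out of a command-line string; the function name
<TT>panic_timeout</TT> is hypothetical, and reading <TT>/proc/cmdline</TT>
assumes a kernel that provides that file.

```shell
#!/bin/sh
# panic_timeout CMDLINE
# Print the value of the last panic=N word found in a kernel command line.
# Prints nothing and returns non-zero if no panic= word is present.
panic_timeout() {
    result=
    # Unquoted on purpose: split the command line into words on whitespace.
    for word in $1; do
        case $word in
            panic=*) result=${word#panic=} ;;
        esac
    done
    [ -n "$result" ] && echo "$result"
}

# Example (assumes /proc/cmdline exists on your kernel):
#   panic_timeout "$(cat /proc/cmdline)" || echo "panic= is not set"
```

If nothing is printed, the kernel is still using its default behaviour of
sitting at the panic message until someone resets it.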

<br><br>
Normally the kernel would just display a "panic" message and
sit there waiting for human intervention. These are very
rare, other than the old "VFS kernel panic, unable to mount
root" that occurs when you have your kernel misconfigured for
your arrangement of hard drives --- or when you change the
hardware setting of your disk drives without updating your
kernel (with the '<TT>rdev</TT>' command to set the root device flags)
and/or without updating your <TT>LILO</TT> or <TT>LOADLIN</TT> commands
(which are usually used to pass these flags to your kernel to
over-ride the compiled in defaults).

<br><br>
Other than that common case I think I've only seen one or two
Linux kernel panics in the last 6 years. I've only had about a
half dozen unexplained system lockups over that period --- and
that's on about fifty Linux machines that I've managed during
various portions of that time. (These lockups might have been
panics in situations that were so bad the kernel couldn't even
display an error message; there's no way to know.)

<br><br>
I've only had to reboot unresponsive Linux boxes about a dozen
or so times in all the years I've used it. This was only a
problem in the late .99 and early 1.0x kernels when I was
running a very busy FTP/Web server that was simply overloaded
--- the TCP/IP stack would get so congested that the system
would timeout between my login name and password --- at the
console (I'd've loved a working SAK, secure attention key,
back then). I was glad to see the major TCP/IP re-write
between 1.2 and 2.x.

<br><br>
I'm not trying to toot Linux's horn here --- (well, maybe a
little). The point is that I don't get panics and lockups
often enough to see how the <TT>panic=</TT> parameter and the
softdog/watchdog code would work in those situations.

<br><br>
However, if you enable the <TT>panic=</TT> and/or the softdog
kernel option, you may see that the machine reboots within a minute
or two after your lockup (wait for ten or fifteen). This tells
you that some part of the kernel was still running (and that
the hardware isn't completely wigged out).
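<br><br>
To tell how long the machine was really gone, it helps to leave a heartbeat
running: a process that appends a timestamp to a file every so often. After
the next lockup and reboot, the last timestamp shows roughly when things
stopped. This is only a sketch; the function names and file path are
hypothetical, and <TT>date +%s</TT> assumes a <TT>date</TT> that supports
that format (GNU date does).

```shell
#!/bin/sh
# heartbeat FILE INTERVAL COUNT
# Append the current time (seconds since the epoch) to FILE every INTERVAL
# seconds, COUNT times, syncing so the entry survives a hard lockup.
heartbeat() {
    file=$1
    interval=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        date +%s >> "$file"
        sync
        sleep "$interval"
        i=$((i + 1))
    done
}

# seconds_since_last_beat FILE NOW
# Print how many seconds lie between the last recorded beat and NOW.
seconds_since_last_beat() {
    last=$(tail -n 1 "$1")
    [ -n "$last" ] || return 1
    echo $(( $2 - last ))
}

# Example (path is an assumption):
#   heartbeat /var/tmp/heartbeat.log 60 100000 &
# and after the machine comes back up:
#   seconds_since_last_beat /var/tmp/heartbeat.log "$(date +%s)"
```

A gap of only a minute or two between the last beat and the reboot would
suggest that <TT>panic=</TT> or the watchdog did the rebooting for you.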

<br><br>
Beyond that the things to do are to take out all non-essential
hardware (the sound card would be a great choice --- and the
SCSI card, since you mention that your Linux partitions are on
the IDE drive). As with most technical computing issues, it
eventually boils down to a matter of cost. You mentioned a
couple of times how you don't want to spend money on solving
this problem. Ultimately the time you spend fighting with it
translates to money --- and you'll have to eventually ask what
your time is worth.

<br><br>
(The deeper part of this question is that you may find that
your home machine isn't worth the time <EM>or the money</EM> and you
may content yourself to just use any machines that you
encounter at work, or whatever. Strange as that sounds I've
had friends who refuse to keep a computer around the house
specifically because they "spend enough time with them at work"
and feel that "home is for family time").

<br><br>
At the same time I don't recommend throwing replacement
components at the problem without understanding the nature of
the problem. However, it may be that the best solution is to
replace the motherboard and/or the video card and/or the RAM.

<br><br>
Troubleshooting computers is difficult work. Whole books have
been devoted to the subject (I like the Winn L. Rosch Hardware
Bible personally --- read it years ago and should probably get
an updated copy). There are also parts of the process that
can't be gained from any book --- that you must learn by
experience and figure out through some combination of analysis
and intuition. As our computers become more sophisticated the
balance seems to lean more toward intuition.
</blockquote>

<!-- end body -->
<!--================================================================-->
<P> <hr> <P>
<H5 align="center"><a href="http://www.linuxgazette.com/copying.html"
>Copyright ©</a> 1998, James T. Dennis <BR>
Published in <I>Linux Gazette</I> Issue 30 July 1998</H5>
<P> <hr> <P>
<!--================================================================-->
<table width="98%"><tr valign="center" align="center">
<td rowspan="3"><A HREF="./lg_answer30.html"><IMG
SRC="../gx/dennis/answernew.gif"
ALT="[ Answer Guy Index ]"></A></td>
<td><A HREF="tag_SCOkeys.html">SCOkeys</A></td>
<td><A HREF="tag_chroot.html">chroot</A></td>
<td><A HREF="tag_dosemu-db.html">dosemu-db</A></td>
<td><A HREF="tag_NTauth.html">NTauth</A></td>
<td><A HREF="tag_cdr.html">cdr</A></td>
<td><A HREF="tag_3270.html">3270</A></td>
<td><A HREF="tag_comport.html">comport</A></td>
</tr><tr valign="center" align="center">
<td><A HREF="tag_lilostop.html">lilostop</A></td>
<td><A HREF="tag_emulate.html">emulate</A></td>
<td><A HREF="tag_ppadrivers.html">ppadrivers</A></td>
<td><A HREF="tag_database.html">database</A></td>
<td><A HREF="tag_vacation.html">vacation</A></td>
<td><A HREF="tag_nullmodem.html">nullmodem</A></td>
<td><A HREF="tag_lockups.html">lockups</A></td>
</tr><tr valign="center" align="center">
<td><A HREF="tag_gzipC.html">gzipC</A></td>
<td><A HREF="tag_newlook.html">newlook</A></td>
<td><A HREF="tag_c500.html">c500</A></td>
<td><A HREF="tag_solprint.html">solprint</A></td>
<td><A HREF="tag_vc1shell.html">vc1shell</A></td>
<td><A HREF="tag_memleak.html">memleak</A></td>
<td><A HREF="tag_tvcard.html">tvcard</A></td>
</tr></table>
<P> <hr> <P>
<!--================================================================-->
<A HREF="./index.html"><IMG SRC="../gx/indexnew.gif"
ALT="[ Table Of Contents ]"></A>
<A HREF="../index.html"><IMG SRC="../gx/homenew.gif"
ALT="[ Front Page ]"></A>
<A HREF="lg_bytes30.html"><IMG SRC="../gx/back2.gif"
ALT="[ Previous Section ]"></A>
<A HREF="./vrenios.html"><IMG SRC="../gx/fwd.gif"
ALT="[ Next Section ]"></A>
<!--startcut ======================================================= -->
</body>
</html>
<!--endcut ========================================================= -->