<!--startcut ======================================================= -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<META NAME="generator" CONTENT="lgazmail v1.4F.r">
<TITLE>The Answer Gang 78: Watchdog daemon</TITLE>
</HEAD><BODY BGCOLOR="#FFFFFF" TEXT="#000000"
LINK="#3366FF" VLINK="#A000A0">
<!--endcut ========================================================= -->
<P> <hr>
<!--startcut ======================================================= -->
<CENTER>
<!-- *** BEGIN navbar *** -->
<!-- *** END navbar *** -->
</CENTER>
</p>
<!--endcut ========================================================= -->
<!--startcut ======================================================= -->
<P> <hr>
<!-- begin tagnav ::::::::::::::::::::::::::::::::::::::::::::::::::-->
<p align="center">
<table width="100%" border="0"><tr>
<td align="right" valign="center"
><IMG ALT="" SRC="../../gx/navbar/left.jpg"
WIDTH="14" HEIGHT="45" BORDER="0" ALIGN="middle" border="0"
><A HREF="../index.html"
><IMG SRC="../../gx/navbar/toc.jpg" align="middle"
ALT="[ Table Of Contents ]" border="0"></A
><A HREF="../lg_answer.html"
><IMG SRC="../../gx/dennis/answertoc.jpg" align="middle"
ALT="[ Answer Guy Current Index ]" border="0"></A></td>
<td align="center" valign="center"><A HREF="../lg_answer.html#greeting"><img align="middle"
src="../../gx/dennis/smily.gif" alt="greetings" border="0"></A>
<A HREF="../tag/bios.html">Meet the Gang</A>
<A HREF="1.html">1</A>
<A HREF="2.html">2</A>
<A HREF="3.html">3</A>
<A HREF="4.html">4</A>
<A HREF="5.html">5</A>
<A HREF="6.html">6</A>
<A HREF="7.html">7</A>
</td>
<td align="left" valign="center"><A HREF="../../tag/kb.html"
><IMG SRC="../../gx/dennis/answerpast.jpg" align="middle"
ALT="[ Index of Past Answers ]" border="0"></A
><IMG ALT="" SRC="../../gx/navbar/right.jpg" align="middle"
WIDTH="14" HEIGHT="45" BORDER="0"></td></tr></table>
</p>
<!-- end tagnav ::::::::::::::::::::::::::::::::::::::::::::::::::::-->
<!--endcut ========================================================= -->
<P> <hr> <P>
<!-- ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: -->
<center>
<H1><A NAME="answer">
<img src="../../gx/dennis/qbubble.gif" alt="(?)"
border="0" align="middle">
<font color="#B03060">The Answer Gang</font>
<img src="../../gx/dennis/bbubble.gif" alt="(!)"
border="0" align="middle">
</A></H1>
<BR>
<H4>By Jim Dennis, Ben Okopnik, Dan Wilder, Breen, Chris, and...
(<a href="tag/bios.html">meet the Gang</a>) ...
the Editors of Linux Gazette...

and You!
<br>Send questions (or interesting answers) to
The Answer Gang
for possible publication
(but read the <a href="../tag/ask-the-gang.html">guidelines</a> first)
</H4>
</center>
<!-- ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: -->
<p><hr><p>
<!-- begin 2 -->
<H3 align="left"><img src="../../gx/dennis/bbubble.gif"
height="50" width="60" alt="(!) " border="0"
>Watchdog daemon</H3>

<p align="right"><strong>Answer By James T. Dennis</strong></p>
<blockQuote>
The Linux kernel supports a class of devices called "watchdog"
drivers. These are programmable timers which are wired to a system's
reset or power lines. They are common on non-PC servers and workstations
and in embedded devices, and are increasingly included in PC PCI chipsets.
There are also PC adapter cards that can function as watchdog timers.
Some of them are included in adapters with other functions (such as the
PC Weasel 2000, or some high precision real-time clocks?) and some of
them have electronics to monitor CPU or case temperature, power supply
voltages, etc.
</blockQuote>
<blockQuote>
These all have one function in common: they can be set to some time
interval (60 seconds by default, under Linux) and will count down
towards zero. If they ever reach zero they'll strobe the reset line
and force the hardware to reboot. Thus they require periodic "petting"
or they'll "bite" you.
</blockQuote>
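<blockQuote>
The countdown-and-reset behaviour described above can be sketched as a toy
model. This is only an illustration in Python, not how any driver is
implemented; the class name <TT>WatchdogTimer</TT> and its methods are
made up for this example.
</blockQuote>
<pre>
```python
class WatchdogTimer:
    """Toy model of a watchdog: counts down and 'bites' at zero."""

    def __init__(self, timeout_s=60):
        self.timeout_s = timeout_s      # interval the timer is set to
        self.remaining_s = timeout_s    # seconds left before the reset fires
        self.bitten = False             # True once the reset has fired

    def pet(self):
        """Restart the countdown from the full interval."""
        if not self.bitten:
            self.remaining_s = self.timeout_s

    def tick(self, seconds=1):
        """Advance time; when the count reaches zero the reset fires."""
        self.remaining_s = max(0, self.remaining_s - seconds)
        if self.remaining_s == 0:
            # Real hardware would strobe the reset line here.
            self.bitten = True
```
</pre>
<blockQuote>
As long as something calls <TT>pet()</TT> more often than once per
interval, <TT>bitten</TT> stays false. If the petting stops for a full
interval, the model "reboots."
</blockQuote>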
<blockQuote>
The Linux kernel supports a variety of watchdog hardware, and also
includes one driver (softdog) which is a software emulation of what a
watchdog timer does. (It is a bit less robust, since some forms of kernel panic
or failure <EM>might</EM> leave the system wedged and unable to execute the
softdog code.) (The Linux kernel can be set to reset after a time delay
in case of panic; the default is to dump a message and registers to
the console and wait for a human to read them and reboot. Read the
bootparam(7) man page and search for panic= for details on how to
override that.)
</blockQuote>
<blockQuote>
All of this is of no use unless you also have a daemon or utility that
can set the watchdog, monitor the system, and periodically "pet the
dog." (Some texts on this topic use the more abusive "kicking" analogy,
but I find that distasteful.)
</blockQuote>
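<blockQuote>
A minimal sketch of such a daemon's main loop, in Python, might look like
the following. It assumes that any write to the watchdog device counts as
a "pet." The function name <TT>run_watchdog_loop</TT>, the
<TT>checks</TT> list and the <TT>max_rounds</TT> limit are all
hypothetical and exist only for this example.
</blockQuote>
<pre>
```python
import time


def run_watchdog_loop(device, checks, interval_s, max_rounds, sleep=time.sleep):
    """Pet `device` once per round while every health check passes.

    `device` is any object with a write() method (assumed here to be an
    open watchdog device). `checks` is a list of zero-argument callables
    that return True when the system looks healthy. Returns the number
    of times the dog was petted.
    """
    rounds = 0
    while rounds != max_rounds:
        if not all(check() for check in checks):
            # Stop petting: the timer is left to run out and reset the box.
            break
        device.write(b"\0")   # the content is irrelevant, the write is the pet
        rounds += 1
        sleep(interval_s)
    return rounds
```
</pre>
<blockQuote>
To drive a real timer one would pass in a device opened with something
like <TT>open("/dev/watchdog", "wb", buffering=0)</TT>, as root and with a
watchdog driver loaded. Be careful when experimenting: with most drivers,
simply opening the device arms the timer.
</blockQuote>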
<blockQuote>
Of course one can write one's own daemon, or even a cron job (if one
overrides the default 60 second value to be a bit longer, to account for
possible cron delays). However, it's best to start with one that's
already written and reasonably well proven. The <A HREF="http://www.debian.org/">Debian</A> project has one
that's simply called "watchdog." Although it is a Debian package it can
be adapted for use on any Linux distribution.
</blockQuote>
<blockQuote>
This particular daemon performs up to 10 internal system tests
(most are optional) and it can be configured to execute a custom suite
of tests: your own script or binary, which must return a zero exit
value on success (and should run in under some liberal time limit).
In other words, it's extensible. On failure it can attempt to execute
a custom "repair" script or binary, then it can try a soft reboot
(with statically compiled code, NOT by calling the normal 'shutdown'
or 'reboot' binaries). Failing all of that, it will simply stop
writing to <TT>/dev/watchdog</TT>. The timer then runs out without being
"petted," and either the hardware resets the machine or (with softdog)
the kernel reboots it.
</blockQuote>
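<blockQuote>
The exit-status convention and the escalation order described above can be
sketched in Python as follows. This is not the Debian daemon's code: the
names <TT>command_succeeds</TT> and <TT>next_action</TT> are invented for
the example, and it assumes that a successful repair lets the daemon carry
on petting.
</blockQuote>
<pre>
```python
import subprocess
import sys


def command_succeeds(argv, timeout_s):
    """Return True only if the command exits with status zero in time."""
    try:
        return subprocess.run(argv, timeout=timeout_s).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        # A missing binary or an overrun of the time limit counts as failure.
        return False


def next_action(test_argv, repair_argv, timeout_s=60):
    """Decide what to do this round: 'pet' the dog or 'reboot'."""
    if command_succeeds(test_argv, timeout_s):
        return "pet"
    if repair_argv and command_succeeds(repair_argv, timeout_s):
        return "pet"      # assumption: a successful repair clears the failure
    return "reboot"       # soft reboot first; else stop petting the dog


# Stand-ins for a custom test or repair program: exit status 0 or 1.
OK = [sys.executable, "-c", "raise SystemExit(0)"]
BAD = [sys.executable, "-c", "raise SystemExit(1)"]

print(next_action(OK, None))   # pet
print(next_action(BAD, OK))    # pet
print(next_action(BAD, BAD))   # reboot
```
</pre>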
<blockQuote>
In (almost) any event a system failure should result in a reboot
instead of a hang. That can be good for systems that are remotely
located and hard to reach. Of course Linux is pretty robust and
reliable, so it's rare that the watchdog will be needed; and of course
it <EM>could</EM> be that the watchdog will cause some spurious reboots,
sometimes, especially when initially configuring and tuning it.
But there are cases where it's worth the risk and effort.
</blockQuote>

<!-- end 2 -->
<P> <hr> </p>
<!-- *** BEGIN copyright *** -->
<H5 align="center">This page edited and maintained by the Editors
of <I>Linux Gazette</I>
<a href=""
>Copyright ©</a> 2002
<BR>Published in issue 78 of <I>Linux Gazette</I> May 2002</H5>
<H6 ALIGN="center">HTML script maintained by
<A HREF="mailto:star@starshine.org">Heather Stern</a> of
Starshine Technical Services,
<A HREF="http://www.starshine.org/">http://www.starshine.org/</A>
</H6>
<!-- *** END copyright *** -->
<P> <hr>
<!--startcut ======================================================= -->
<CENTER>
<!-- *** BEGIN navbar *** -->
<!-- *** END navbar *** -->
</CENTER>
</p>
<!--endcut ========================================================= -->
<!--startcut ======================================================= -->
</BODY></HTML>
<!--endcut ========================================================= -->