old-www/HOWTO/Usenet-News-HOWTO/component.html

1067 lines
21 KiB
HTML

<HTML
><HEAD
><TITLE
>Components of a running system</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.76b+
"><LINK
REL="HOME"
TITLE="Usenet News HOWTO "
HREF="index.html"><LINK
REL="PREVIOUS"
TITLE="Access control in NNTPd"
HREF="x818.html"><LINK
REL="NEXT"
TITLE="Monitoring and administration"
HREF="x1094.html"></HEAD
><BODY
CLASS="SECTION"
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000FF"
VLINK="#840084"
ALINK="#0000FF"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
>Usenet News HOWTO</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="x818.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
></TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="x1094.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="SECTION"
><H1
CLASS="SECTION"
><A
NAME="COMPONENT">9. Components of a running system</H1
><P
>This chapter reviews the components of a running CNews+NNTPd server.
Analogous components will be found in an INN-based system too. We invite
additions from readers familiar with INN to add their pieces to this
chapter.</P
><DIV
CLASS="SECTION"
><H2
CLASS="SECTION"
><A
NAME="AEN844">9.1. <TT
CLASS="LITERAL"
>/var/lib/news</TT
>: the CNews control area</H2
><P
>This directory is more popularly known as <TT
CLASS="LITERAL"
>$NEWSCTL</TT
>. It
contains configuration, log and status files. There are no
articles or binaries kept here. Let's see what some of the
files are meant for. Control files are dealt in slightly greater detail in
<SPAN
CLASS="QUOTE"
>"<A
HREF="settingup.html#CONFIGURESYSTEM"
>Section 4.3</A
>&#62;"</SPAN
></P
><P
></P
><UL
><LI
><P
><TT
CLASS="LITERAL"
>sys</TT
>:
One line per system/NDN listing all the newsgroup
hierarchies each system subscribes to. Each line is prefixed with the system
name and the one beginning with ME: indicates what we are going to receive.
Look up manpage of <TT
CLASS="LITERAL"
>newssys</TT
>.</P
></LI
><LI
><P
><TT
CLASS="LITERAL"
>explist</TT
>:
This file has entries indicating articles of which
newsgroup expire and when and if they have to be archived. The order in
which the newsgroups are listed is important. See manpage of
<TT
CLASS="LITERAL"
>expire</TT
> for file format.</P
></LI
><LI
><P
><TT
CLASS="LITERAL"
>batchparms</TT
>:
Details of how to feed other sites/NDN, like the size of
batches, the mode of transmission (UUCP/NNTP) are specified here.
manpage to refer: <TT
CLASS="LITERAL"
>newsbatch</TT
>.</P
></LI
><LI
><P
><TT
CLASS="LITERAL"
>controlperm</TT
>:
If you wish to authenticate a control message before any
action is taken on it, you must enter authentication-related information
here. The <TT
CLASS="LITERAL"
>controlperm</TT
> manpage will list all the fields
in detail.</P
></LI
><LI
><P
><TT
CLASS="LITERAL"
>mailpaths</TT
>:
It features the e-mail address of the moderator for each
newsgroup who is responsible for approving/disapproving
articles posted to moderated newsgroups. The sample
<TT
CLASS="LITERAL"
>mailpaths</TT
> file in the <TT
CLASS="LITERAL"
>tar</TT
> will
give you an idea of how entries are made.</P
></LI
><LI
><P
><TT
CLASS="LITERAL"
>nntp_access/user_access</TT
>:
These files contain entries of servernames
and usernames on whom restrictions will apply when accessing newsgroups.
Again, the sample file in the tarball shall explain the format of the file.</P
></LI
><LI
><P
><TT
CLASS="LITERAL"
>log, errlog</TT
>:
These are log files that keep growing large with each batch
that is received. The <TT
CLASS="LITERAL"
>log</TT
> file has one entry per
article telling you if it
has been accepted by your news server or rejected. To understand the
format of this file, refer to Chapter 2.2 of the <TT
CLASS="LITERAL"
>CNews</TT
>
guide. Errors, if any, while digesting the articles are
logged in <TT
CLASS="LITERAL"
>errlog</TT
>. These
log files have to be rolled as the files hog a lot of disk space. </P
></LI
><LI
><P
><TT
CLASS="LITERAL"
>nntplog</TT
>:
This file logs information of the <TT
CLASS="LITERAL"
>nntpd</TT
> giving
details of when a connection was established/broken and what commands were
issued. This file needs to be configured in <TT
CLASS="LITERAL"
>syslog</TT
>
<TT
CLASS="LITERAL"
>syslogd</TT
> should be running.</P
></LI
><LI
><P
><TT
CLASS="LITERAL"
>active</TT
>:
This file has one line per newsgroup to be found in your news
server. Besides other things, it tells you how many articles are
currently present in each newsgroup. It is updated when each batch is
digested or when articles are expired. The <TT
CLASS="LITERAL"
>active</TT
>
manpage will furnish more details about other paramaters.</P
></LI
><LI
><P
><TT
CLASS="LITERAL"
>history</TT
>:
This file, again, contains one line per article, mapping
<TT
CLASS="LITERAL"
>message-id</TT
> to newsgroup name and also giving its
associated article number in that newsgroup. It is updated
each time a feed is digested
and when <TT
CLASS="LITERAL"
>doexpire</TT
> is run. Plays a key role in
loop-detection and serves as an article database. Read manpage of
<TT
CLASS="LITERAL"
>newsdb</TT
>, <TT
CLASS="LITERAL"
>doexpire</TT
> for the file format </P
></LI
><LI
><P
><TT
CLASS="LITERAL"
>newsgroups</TT
>:
It has a one-line description for each newsgroup explaining
what kind of posts go into each of them. Ideally speaking, it should cover
all the newsgroups found in the <TT
CLASS="LITERAL"
>active</TT
> file.</P
></LI
><LI
><P
><EM
>Miscellaneous files</EM
>:
Files like <TT
CLASS="LITERAL"
>mailname</TT
>, <TT
CLASS="LITERAL"
>organisation</TT
>,
<TT
CLASS="LITERAL"
>whoami</TT
> contain information required for forming some of
the headers of an article. The contents of
<TT
CLASS="LITERAL"
>mailname</TT
> form the <TT
CLASS="LITERAL"
>From:</TT
> header and
that of <TT
CLASS="LITERAL"
>organisation</TT
> form the
<TT
CLASS="LITERAL"
>Organisation:</TT
> header. <TT
CLASS="LITERAL"
>whoami</TT
> contains
the name of the news system. Refer to chapter 2.1 of
<TT
CLASS="LITERAL"
>guide.ps</TT
> for a detailed list of files in the
<TT
CLASS="LITERAL"
>$NEWSCTL</TT
> area. Read <TT
CLASS="LITERAL"
>RFC 1036</TT
> for
description of article headers .</P
></LI
></UL
></DIV
><DIV
CLASS="SECTION"
><H2
CLASS="SECTION"
><A
NAME="AEN917">9.2. <TT
CLASS="LITERAL"
>/var/spool/news</TT
>: the article repository</H2
><P
>This is also known as the <TT
CLASS="LITERAL"
>$NEWSARTS</TT
> or
<TT
CLASS="LITERAL"
>$NEWSSPOOL</TT
> directory. This is where the
articles reside on your disk. No binaries or control files
should belong here. Enough space should be allocated to this
directory as the number of articles keep increasing with each
batch that is digested. An explanation of the following sub-directories will
give you an overview of this directory:
<P
></P
><UL
><LI
><P
><TT
CLASS="LITERAL"
>in.coming</TT
>:
Feeds/batches/articles from NDNs on their arrival and
before being processed reside in this directory. After processing, they
appear in
<TT
CLASS="LITERAL"
>$NEWSARTS</TT
> or in its <TT
CLASS="LITERAL"
>bad</TT
> sub-directory
if there were errors. </P
></LI
><LI
><P
><TT
CLASS="LITERAL"
>out.going</TT
>:
This directory contains batches/feeds to be sent to your
NDNs <EM
>i.e.</EM
> feeds to be pushed to your neighbouring sites
reside here before they are transmitted. It contains one sub-directory per
NDN mentioned in the <TT
CLASS="LITERAL"
>sys</TT
> file. These sub-directories
contain files called <TT
CLASS="LITERAL"
>togo</TT
> which contain information about
the article like the <TT
CLASS="LITERAL"
>message-id</TT
> or the article number
that is queued for transmission. </P
></LI
><LI
><P
><A
NAME="NEWSGROUPDIR"
></A
>&#62;<EM
>newsgroup directories</EM
>:
For each newsgroup hierarchy that the news server
has subscribed to, a directory is created under
<TT
CLASS="LITERAL"
>$NEWSARTS</TT
>.
Further sub-directories are created under the parent to hold
articles of specific newsgroups. For instance, for a
newsgroup like <TT
CLASS="LITERAL"
>comp.music.compose</TT
>, the parent directory
<TT
CLASS="LITERAL"
>comp</TT
> will appear in <TT
CLASS="LITERAL"
>$NEWSARTS</TT
> and a
sub-directory called <TT
CLASS="LITERAL"
>music</TT
> will be created under
<TT
CLASS="LITERAL"
>comp</TT
>. The <TT
CLASS="LITERAL"
>music</TT
> sub-directory
shall contain a further sub-directory called <TT
CLASS="LITERAL"
>compose</TT
> and
all articles of <TT
CLASS="LITERAL"
>comp.music.compose</TT
>
shall reside here. In effect, article 242 of newsgroup
<TT
CLASS="LITERAL"
>comp.music.compose</TT
> shall map to file
<TT
CLASS="LITERAL"
>$NEWSARTS/comp/music/compose/242</TT
>.</P
></LI
><LI
><P
><TT
CLASS="LITERAL"
>control</TT
>:
The control directory houses only the control messages that
have been received by this site. The control messages could be any of the
following: <TT
CLASS="LITERAL"
>newgroup, rmgroup, checkgroup</TT
> and
<TT
CLASS="LITERAL"
>cancel</TT
> appearing in the subject line of the article.
More information to be found in <SPAN
CLASS="QUOTE"
>"<A
HREF="x64.html#CONTROLMSG"
>Section 2.4</A
>&#62;"</SPAN
></P
></LI
><LI
><P
><TT
CLASS="LITERAL"
>junk</TT
>:
The <TT
CLASS="LITERAL"
>junk</TT
> directory contains all
articles that the news
server has received and has decided, after processing, that it does not
belong to any of the hierarchies it has subscribed to. The news server
transfers/passes all articles in this directory to NDNs
that have subscribed to the <TT
CLASS="LITERAL"
>junk</TT
> hierarchy.</P
></LI
></UL
></P
></DIV
><DIV
CLASS="SECTION"
><H2
CLASS="SECTION"
><A
NAME="AEN963">9.3. <TT
CLASS="LITERAL"
>/usr/lib/newsbin</TT
>: the executables</H2
><P
>TO BE ADDED LATER</P
></DIV
><DIV
CLASS="SECTION"
><H2
CLASS="SECTION"
><A
NAME="CRONJOBS">9.4. <TT
CLASS="LITERAL"
>crontab and cron jobs </TT
></H2
><P
>The heart of the Usenet news server is the various scripts that run at regular
intervals processing articles, digesting/rejecting them and
transmitting them to NDNs. I shall try to enumerate the ones that are important
enough to be cronned. :)</P
><P
></P
><UL
><LI
><P
><TT
CLASS="LITERAL"
>newsrun</TT
>:
The key script. This script picks the batches in the
<TT
CLASS="LITERAL"
>in.coming</TT
> directory, uncompresses them if necessary and
feeds it to <TT
CLASS="LITERAL"
>relaynews</TT
> which then processes each
article digesting and batching them and logging any errors. This script
needs to run through <TT
CLASS="LITERAL"
>cron</TT
>
as frequently as you want the feeds to be digested. Every half hour should
suffice for a non-critical requirement.</P
></LI
><LI
><P
><TT
CLASS="LITERAL"
>sendbatches</TT
>:
This script is run to transmit the <TT
CLASS="LITERAL"
>togo</TT
> files formed in
the <TT
CLASS="LITERAL"
>out.going</TT
> directory to your NDNs. It reads the
<TT
CLASS="LITERAL"
>batchparms</TT
> file to know
exactly how and to whom the batches need to be transmitted. The frequency,
again, can be set according to your requirements. Once an hour should be
sufficient.</P
></LI
><LI
><P
><TT
CLASS="LITERAL"
>newsdaily</TT
>:
This script does maintenance chores like rolling logs and
saving them, reporting errors/anomalies and doing cleanup jobs.
It should typically run once a day.</P
></LI
><LI
><P
><TT
CLASS="LITERAL"
>newswatch</TT
>:
This looks for news problems at a more detailed level than
newsdaily like looking for persistent lock files or unattended batches,
determining space shortage issues, and the likes. This should typically run
once every hour. For more on this and the above, read the
<TT
CLASS="LITERAL"
>newsmaint</TT
> manpage.</P
></LI
><LI
><P
><TT
CLASS="LITERAL"
>doexpire</TT
>:
This script expires old articles as determined by the
control file <TT
CLASS="LITERAL"
>explist</TT
> and updates the
<TT
CLASS="LITERAL"
>active</TT
> file. This is necessary if you do not
want unnecessary/unwanted articels hogging up your disk space. Run it once
a day. Manpage: <TT
CLASS="LITERAL"
>expire</TT
></P
></LI
><LI
><P
><TT
CLASS="LITERAL"
>newsrunning off/on</TT
>:
This script shuts/starts off the news server for you.
You could choose to add this in your cron job if you think the news server
takes up lots of CPU time during peak hours and you wish to keep a check on
it. </P
></LI
></UL
></DIV
><DIV
CLASS="SECTION"
><H2
CLASS="SECTION"
><A
NAME="AEN1000">9.5. <TT
CLASS="LITERAL"
>newsrun</TT
> and <TT
CLASS="LITERAL"
>relaynews</TT
>: digesting received articles</H2
><P
>The heart and soul of the Usenet News system, <TT
CLASS="LITERAL"
>newsrun</TT
> just picks up the batches/
articles in the <TT
CLASS="LITERAL"
>in.coming</TT
> directory of
<TT
CLASS="LITERAL"
>$NEWSARTS</TT
> and uncompresses them (if required) and calls
<TT
CLASS="LITERAL"
>relaynews</TT
>. It should run from cron.</P
><P
><TT
CLASS="LITERAL"
>relaynews</TT
> picks up each article one by one through
<TT
CLASS="LITERAL"
>stdin</TT
>, determines if it belongs to a subscribed group
by looking up
<TT
CLASS="LITERAL"
>sys</TT
> file, looks in the <TT
CLASS="LITERAL"
>history</TT
> file
to determine that it does not already exist locally, digests it updating the
<TT
CLASS="LITERAL"
>active</TT
> and <TT
CLASS="LITERAL"
>history</TT
> file and batches it
for neighbouring sites. Logs errors on encountering problems while processing
the article and takes appropriate action if it happens to be
a control message. More info in manpage of <TT
CLASS="LITERAL"
>relaynews</TT
>.</P
></DIV
><DIV
CLASS="SECTION"
><H2
CLASS="SECTION"
><A
NAME="AEN1017">9.6. <TT
CLASS="LITERAL"
>doexpire</TT
> and <TT
CLASS="LITERAL"
>expire</TT
>: removing old articles</H2
><P
>A good way to get rid of unwanted/old articles from the
<TT
CLASS="LITERAL"
>$NEWSARTS</TT
> area is to run <TT
CLASS="LITERAL"
>doexpire</TT
> once a
day. It reads the
<TT
CLASS="LITERAL"
>explist</TT
> file from the <TT
CLASS="LITERAL"
>$NEWSCTL</TT
> directory
to determine what articles expire today. It can archive the
said article if so configured. It then updates the
<TT
CLASS="LITERAL"
>active</TT
> and the <TT
CLASS="LITERAL"
>history</TT
> file accordingly.
If you wish to retain the article entry in the
<TT
CLASS="LITERAL"
>history</TT
> file to avoid re-digesting it as a new
article after having expired it, add a special <EM
>/expired/;</EM
>
line in the control file. More on the options and functioning in the
<TT
CLASS="LITERAL"
>expire </TT
> manpage.</P
></DIV
><DIV
CLASS="SECTION"
><H2
CLASS="SECTION"
><A
NAME="AEN1031">9.7. <TT
CLASS="LITERAL"
>nntpd</TT
> and <TT
CLASS="LITERAL"
>msgidd</TT
>: managing the NNTP interface</H2
><P
>As has already been discussed in the chapter on setting up the software,
<TT
CLASS="LITERAL"
>nntpd</TT
> is a TCP-based server daemon which runs under
<TT
CLASS="LITERAL"
>inetd</TT
>. It is fired by <TT
CLASS="LITERAL"
>inetd</TT
>
whenever there's an incoming connection on the NNTP port, and it takes
over the dialogue from there. It reads the C-News configuration and data
files in <TT
CLASS="LITERAL"
>$NEWSCTL</TT
>, article files from
<TT
CLASS="LITERAL"
>$NEWSARTS&#62;</TT
>, and receives incoming posts and
transfers. These it dutifully queues in
<TT
CLASS="LITERAL"
>$NEWSARTS/in.coming</TT
>, either as batch files or single
article files.</P
><P
>It is important that <TT
CLASS="LITERAL"
>inetd</TT
> be configured to
fire <TT
CLASS="LITERAL"
>nntpd</TT
> as user <TT
CLASS="LITERAL"
>news</TT
>, not as
<TT
CLASS="LITERAL"
>root</TT
> like it does for other daemons like
<TT
CLASS="LITERAL"
>telnetd</TT
> or <TT
CLASS="LITERAL"
>ftpd</TT
>. If this is not
done correctly, a lot of problems can be caused in the functioning of
the C-News system later.</P
><P
><TT
CLASS="LITERAL"
>nntpd</TT
> is fired each time a new NNTP connection
is received, and dies once the NNTP client closes its connection. Thus,
if one <TT
CLASS="LITERAL"
>nntpd</TT
> receives a few articles by an incoming
batch feed (not a <TT
CLASS="LITERAL"
>POST</TT
> but an <TT
CLASS="LITERAL"
>XFER</TT
>),
then another <TT
CLASS="LITERAL"
>nntpd</TT
> will not know about the receipt of
these articles till the batches are digested. This will hamper
duplicate newsfeed detection if there are multiple upstream NDNs feeding
our server with the same set of articles over NNTP. To fix this,
<TT
CLASS="LITERAL"
>nntpd</TT
> uses an ally: <TT
CLASS="LITERAL"
>msgidd</TT
>, the
message ID daemon. This
daemon is fired once at server bootup time through
<TT
CLASS="LITERAL"
>newsboot</TT
>, and keeps running quietly in the
background, listening on a named Unix socket in the
<TT
CLASS="LITERAL"
>$NEWSCTL</TT
> area. It keeps in its memory a list of all
message IDs which various incarnations of <TT
CLASS="LITERAL"
>nntpd</TT
> have
asked it to remember.</P
><P
>Thus, when one copy of <TT
CLASS="LITERAL"
>nntpd</TT
> receives an
incoming feed of news articles, it updates <TT
CLASS="LITERAL"
>msgidd</TT
>
with the message IDs of these messages through the Unix socket. When
another copy of <TT
CLASS="LITERAL"
>nntpd</TT
> is fired later and the NNTP
client tries to feed it some more articles, the <TT
CLASS="LITERAL"
>nntpd</TT
>
checks each message ID against <TT
CLASS="LITERAL"
>msgidd</TT
>. Since
<TT
CLASS="LITERAL"
>msgidd</TT
> stores all these IDs in memory, the lookup is
very fast, and duplicate articles are blocked at the NNTP interface
itself.</P
><P
>On a running system, expect to see one instance of
<TT
CLASS="LITERAL"
>nntpd</TT
> for each active NNTP connection, and just one
instance of <TT
CLASS="LITERAL"
>msgidd</TT
> running quietly in the background,
hardly consuming any CPU resources. Our <TT
CLASS="LITERAL"
>nntpd</TT
> is
configured to die if the NNTP connection is more than a few minutes
idle, thus conserving server resources. This does not inconvenience the
user because modern NNTP clients simply re-connect. If an
<TT
CLASS="LITERAL"
>nntpd</TT
> instance is found to be running for days, it is
either hung due to a network error, or is receiving a very long incoming
NNTP feed from your upstream server. We used to receive our primary
incoming feed from our service provider through NNTP sessions lasting 18
to 20 hours without a break, every day.</P
></DIV
><DIV
CLASS="SECTION"
><H2
CLASS="SECTION"
><A
NAME="AEN1072">9.8. <TT
CLASS="LITERAL"
>nov</TT
>, the News Overview system</H2
><P
>NOV, the News Overview System is a recent augmentation to the
C-News and NNTP systems and to the NNTP protocol. This subsystem
maintains a file for each active newsgroup, in which it maintains one
line per current article. This line of text contains some key meta-data
about the article, <EM
>e.g.</EM
> the contents of the
<TT
CLASS="LITERAL"
>From</TT
>, <TT
CLASS="LITERAL"
>Subject</TT
>,
<TT
CLASS="LITERAL"
>Date</TT
> and the article size and message ID. This speeds
up NNTP response enormously. The <TT
CLASS="LITERAL"
>nov</TT
> library has been
integrated into the <TT
CLASS="LITERAL"
>nntpd</TT
> code, and into key binaries
of C-News, thus providing seamless maintenance of the News Overview
database when articles are added or deleted from the repository.</P
><P
>When <TT
CLASS="LITERAL"
>newsrun</TT
> adds an article into
<TT
CLASS="LITERAL"
>starcom.test</TT
>, it also updates
<TT
CLASS="LITERAL"
>$NEWSARTS/starcom/test/.overview</TT
> and adds a line with
the relevant data, tab-separated, into it. When <TT
CLASS="LITERAL"
>nntpd</TT
>
comes to life with an NNTP client, and it sees the
<TT
CLASS="LITERAL"
>XOVER</TT
> NNTP command, it reads this
<TT
CLASS="LITERAL"
>.overview</TT
> file, and returns the relevant lines to the
NNTP client. When <TT
CLASS="LITERAL"
>expire</TT
> deletes an article, it also
removes the corresponding line from the <TT
CLASS="LITERAL"
>.overview</TT
>
file. Thus, the maintenance of the NOV database is seamless.</P
></DIV
><DIV
CLASS="SECTION"
><H2
CLASS="SECTION"
><A
NAME="AEN1091">9.9. Batching feeds with UUCP and NNTP</H2
><P
>Some information about batching feeds has been provided in earlier
sections. More will be added later here in this document.</P
></DIV
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="x818.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="x1094.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Access control in NNTPd</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
>&nbsp;</TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Monitoring and administration</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>