mirror of https://github.com/tLDP/LDP
374 lines
18 KiB
Plaintext
374 lines
18 KiB
Plaintext
<section><title>Components of a running system</title>
|
|
<para>
|
|
This chapter reviews the components of a running CNews+NNTPd server.
|
|
Analogous components will be found in an INN-based system too. We invite
|
|
additions from readers familiar with INN to add their pieces to this
|
|
chapter.
|
|
</para>
|
|
|
|
<section><title><literal>/var/lib/news</literal>: the CNews control area</title>
|
|
<para>
|
|
This directory is more popularly known as <literal>$NEWSCTL</literal>. It
|
|
contains configuration, log and status files. There are no
|
|
articles or binaries kept here. Let's see what some of the
|
|
files are meant for.
|
|
</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para><literal>sys</literal>:
|
|
One line per system/NDN listing all the newsgroup
|
|
hierarchies each system subscribes to. Each line is prefixed with the system
|
|
name and the one beginning with ME: indicates what we are going to receive.
|
|
Look up manpage of <literal>newssys</literal>.
|
|
</para></listitem>
|
|
|
|
<listitem><para><literal>explist</literal>:
|
|
This file has entries indicating articles of which
|
|
newsgroup expire and when and if they have to be archived. The order in
|
|
which the newsgroups are listed is important. See manpage of
|
|
<literal>expire</literal> for file format.
|
|
</para></listitem>
|
|
|
|
<listitem><para><literal>batchparms</literal>:
|
|
Details of how to feed other sites/NDN, like the size of
|
|
batches, the mode of transmission (UUCP/NNTP) are specified here.
|
|
manpage to refer: <literal>newsbatch</literal>.
|
|
</para></listitem>
|
|
|
|
<listitem><para><literal>controlperm</literal>:
|
|
If you wish to authenticate a control message before any
|
|
action is taken on it, you must enter authentication-related information
|
|
here. The <literal>controlperm</literal> manpage will list all the fields
|
|
in detail.
|
|
</para></listitem>
|
|
|
|
<listitem><para><literal>mailpaths</literal>:
|
|
It features the e-mail address of the moderator for each
|
|
newsgroup who is responsible for approving/disapproving
|
|
articles posted to moderated newsgroups. The sample
|
|
<literal>mailpaths</literal> file in the <literal>tar</literal> will
|
|
you give an idea of how entries are made.
|
|
</para></listitem>
|
|
|
|
<listitem><para><literal>nntp_access/user_access</literal>:
|
|
These files contain entries of servernames
|
|
and usernames on whom restrictions will apply when accessing newsgroups.
|
|
Again, the sample file in the tarball shall explain the format of the file.
|
|
</para></listitem>
|
|
|
|
<listitem><para><literal>log, errlog</literal>:
|
|
These are log files that keep growing large with each batch
|
|
that is received. The <literal>log</literal> file has one entry per
|
|
<literal>article</literal> telling you if it
|
|
has been accepted by your news server or rejected. To understand the
|
|
format of this file, refer to Chapter 2.2 of the <literal>CNews</literal>
|
|
guide. Errors, if any, while digesting the articles are
|
|
logged in <literal>errlog</literal>. These
|
|
log files have to be rolled as the files hog a lot of disk space.
|
|
</para></listitem>
|
|
|
|
<listitem><para><literal>nntplog</literal>:
|
|
This file logs information of the <literal>nntp daemon</literal> giving
|
|
details of when a connection was established/broken and what commands were
|
|
issued. This file needs to be configured in syslog and syslog
|
|
<literal>daemon</literal> should be running.
|
|
</para></listitem>
|
|
|
|
<listitem><para><literal>active</literal>:
|
|
This file has one line per newsgroup to be found in your news
|
|
server. Besides other things, it tells you how many articles are
|
|
currently present in each newsgroup. It is updated when each batch is
|
|
digested or when articles are expired. The <literal>active</literal>
|
|
manpage will furnish more details about other paramaters.
|
|
</para></listitem>
|
|
|
|
<listitem><para><literal>history</literal>:
|
|
This file, again, contains one line per <literal>article</literal>, mapping
|
|
<literal>message-id</literal> to newsgroup name and also giving its
|
|
associated <literal>article</literal> no. in that newsgroup. It is updated
|
|
each time a feed is digested
|
|
and when <literal>doexpire</literal> is run. Plays a key role in
|
|
loop-detection and serves as an article database. Read manpage of
|
|
<literal>newsdb</literal>, <literal>doexpire</literal> for the file format
|
|
</para></listitem>
|
|
|
|
<listitem><para>newsgroups:
|
|
It has a one line description for each newsgroup explaining
|
|
what kind of posts go into each of them. Ideally speaking, it should cover
|
|
all the newsgroups found in the <literal>active</literal> file.
|
|
</para></listitem>
|
|
|
|
<listitem><para>Miscellaneous files:
|
|
Files like <literal>mailname</literal>, <literal>organisation</literal>,
|
|
<literal>whoami</literal> contain information required for forming some of
|
|
the headers of an <literal>article</literal>. The contents of
|
|
<literal>mailname</literal> form the <literal>From:</literal> header and
|
|
that of <literal>organisation</literal> form the
|
|
<literal>Organisation:</literal> header. <literal>whoami</literal> contains
|
|
the name of the news system. Refer to chapter 2.1 of
|
|
<literal>guide.ps</literal> for a detailed list of files in the
|
|
<literal>$NEWSCTL</literal> area. Read <literal>RFC 1036</literal> for
|
|
description of article headers .
|
|
</para></listitem>
|
|
</itemizedlist>
|
|
</section>
|
|
|
|
<section><title><literal>/var/spool/news</literal>: the article repository</title>
|
|
<para>
|
|
This is also known as the <literal>$NEWSARTS</literal> or
|
|
<literal>$NEWSSPOOL</literal> directory. This is where the
|
|
articles reside on your disk. No binaries or control files
|
|
should belong here. Enough space should be allocated to this
|
|
directory as the number of articles keep increasing with each
|
|
batch that is digested. An explanation of the following sub-directories will
|
|
give you an overview of this directory:
|
|
<itemizedlist>
|
|
<listitem><para><literal>in.coming</literal>:
|
|
Feeds/batches/articles from NDNs on their arrival and
|
|
before being processed reside in this directory. After processing, they
|
|
appear in
|
|
<literal>$NEWSARTS</literal> or in its <literal>bad</literal> sub-directory
|
|
if there were errors.
|
|
</para></listitem>
|
|
|
|
<listitem><para><literal>out.going</literal>:
|
|
This directory contains batches/feeds to be sent to your
|
|
NDNs i.e. feeds to be pushed to your neighbouring sites reside here
|
|
before they are transmitted. It contains one sub-directory per NDN mentioned
|
|
in the <literal>sys</literal> file. These sub-directories contain files
|
|
called <literal>togo</literal>
|
|
which contain information about the <literal>article</literal> like the
|
|
<literal>message-id</literal>
|
|
or the article no. that is queued for transmission.
|
|
</para></listitem>
|
|
|
|
<listitem><para><anchor id="newsgroupdir"/>newsgroup directories:
|
|
For each newsgroup hierarchy that the news server
|
|
has subscribed to, a directory is created under
|
|
<literal>$NEWSARTS</literal>.
|
|
Further sub-directories are created under the parent to hold
|
|
articles of specific newsgroups. For instance, for a
|
|
newsgroup like <literal>comp.music.compose</literal>, the parent directory
|
|
<literal>comp</literal> will appear in <literal>$NEWSARTS</literal> and a
|
|
sub-directory called <literal>music</literal> will be created under
|
|
<literal>comp</literal>. The <literal>music</literal> sub-directory
|
|
shall contain a further sub-directory called <literal>compose</literal> and
|
|
all articles of <literal>comp.music.compose</literal>
|
|
shall reside here. In effect,
|
|
<literal>article</literal> 242 of newsgroup
|
|
<literal>comp.music.compose</literal> shall map to file
|
|
<literal>$NEWSARTS/comp/music/compose/242</literal>.
|
|
</para></listitem>
|
|
|
|
<listitem><para>control:
|
|
The control directory houses only the control messages that
|
|
have been received by this site. The control messages could be any of the
|
|
following: <literal>newgroup, rmgroup, checkgroup</literal> and
|
|
<literal>cancel</literal>
|
|
appearing in the subject line of the <literal>article</literal>.
|
|
</para></listitem>
|
|
|
|
<listitem><para><literal>junk</literal>:
|
|
The <literal>junk</literal> directory contains all
|
|
articles that the news
|
|
server has received and has decided, after processing, that it does not
|
|
belong to any of the hierarchies it has subscribed to. The news server
|
|
transfers/passes all <literal>articles</literal> in this directory to NDNs
|
|
that have subscribed to the <literal>junk</literal> hierarchy.
|
|
</para></listitem>
|
|
</itemizedlist>
|
|
</section>
|
|
|
|
<section><title><literal>/usr/lib/newsbin</literal>: the executables</title>
|
|
<para></para>
|
|
</section>
|
|
|
|
<section id="cronjobs"><title><literal>crontab and cron jobs </literal></title>
|
|
<para>
|
|
The heart of the Usenet news server is the various scripts that run at regular
|
|
intervals processing articles, digesting/rejecting them and
|
|
transmitting them to NDNs. I shall try to enumerate the ones that are important
|
|
enough to be cronned. :)
|
|
</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para><literal>newsrun</literal>:
|
|
The key script. This script picks the batches in the
|
|
<literal>in.coming</literal> directory, uncompresses them if necessary and
|
|
feeds it to <literal>relaynews</literal> which then processes each
|
|
<literal>article</literal> digesting and
|
|
batching them and logging any errors. This script needs to run through cron
|
|
as frequently as you want the feeds to be digested. Every half hour should
|
|
suffice for a non-critical requirement.
|
|
</para></listitem>
|
|
|
|
<listitem><para><literal>sendbatches</literal>:
|
|
This script is run to transmit the togo files formed in
|
|
the <literal>out.going</literal> directory to your NDNs. It reads the
|
|
<literal>batchparms</literal> file to know
|
|
exactly how and to whom the batches need to be transmitted. The frequency,
|
|
again, can be set according to your requirements. Once an hour should be
|
|
sufficient.
|
|
</para></listitem>
|
|
|
|
<listitem><para><literal>newsdaily</literal>:
|
|
This script does maintenance chores like rolling logs and
|
|
saving them, reporting errors/anomalies and doing cleanup jobs.
|
|
It should typically run once a day.
|
|
</para></listitem>
|
|
|
|
<listitem><para><literal>newswatch</literal>:
|
|
This looks for news problems at a more detailed level than
|
|
newsdaily like looking for persistent lock files, determining if there is
|
|
enough space for a minimum no. of files, if there is a huge queue of
|
|
unattended batches and the likes. This should typically run once every hour.
|
|
For more on this and the above, read the <literal>newsmaint</literal>
|
|
manpage.
|
|
</para></listitem>
|
|
|
|
<listitem><para><literal>doexpire</literal>:
|
|
This script expires old articles as determined by the
|
|
control file <literal>explist</literal> and updates the
|
|
<literal>active</literal> file. This is necessary if you do not
|
|
want unnecessary/unwanted articels hogging up your disk space. Run it once
|
|
a day. Manpage: <literal>expire</literal>
|
|
</para></listitem>
|
|
|
|
<listitem><para><literal>newsrunning off/on</literal>:
|
|
This script shuts/starts off the news server for you.
|
|
You could choose to add this in your cron job if you think the news server
|
|
takes up lots of CPU time during peak hours and you wish to keep a check on
|
|
it.
|
|
</para></listitem>
|
|
</itemizedlist>
|
|
</section>
|
|
|
|
<section><title><literal>newsrun</literal> and <literal>relaynews</literal>: digesting received articles </title>
|
|
<para>
|
|
The heart and soul of the Usenet News system, <literal>newsrun</literal> just picks up the batches/
|
|
articles in the <literal>in.coming</literal> directory of
|
|
<literal>$NEWSARTS</literal> and uncompresses them (if required) and calls
|
|
<literal>relaynews</literal>. It should run from cron.
|
|
</para>
|
|
|
|
<para>
|
|
<literal>relaynews</literal> picks up each <literal>article</literal> one by one
|
|
through stdin, determines if it belongs to a subscribed group by looking up
|
|
<literal>sys</literal> file, looks in the <literal>history</literal> file
|
|
to determine that it does not already exist locally, digests it updating the
|
|
<literal>active</literal> and <literal>history</literal> file and batches it
|
|
for neighbouring sites. Logs errors on encountering problems while processing
|
|
the <literal>article</literal> and takes appropriate action if it happens to be
|
|
a control message. More info in manpage of <literal>relaynews</literal>.
|
|
</para>
|
|
</section>
|
|
|
|
<section><title><literal>doexpire</literal> and <literal>expire</literal>: removing old articles </title>
|
|
<para>
|
|
A good way to get rid of unwanted/old articles from the
|
|
<literal>$NEWSARTS</literal> area is to run doexpire once a day. It reads the
|
|
<literal>explist</literal> file from the <literal>$NEWSCTL</literal> directory
|
|
to determine what articles expire today. It can archive the
|
|
said <literal>article</literal> if so configured. It then updates the
|
|
<literal>active</literal> and the <literal>history</literal> file accordingly.
|
|
If you wish to retain the <literal>article</literal> entry in the
|
|
<literal>history</literal> file to avoid re-digesting it as a new
|
|
article after having expired it add a special /expired/; line
|
|
in the control file. More on the options and functioning in the expire manpage.
|
|
</para>
|
|
</section>
|
|
|
|
<section><title><literal>nntpd</literal> and <literal>msgidd</literal>: managing the NNTP interface </title>
|
|
<para>
|
|
As has already been discussed in the chapter on setting up the software,
|
|
<literal>nntpd</literal> is a TCP-based server daemon which runs under
|
|
<literal>inetd</literal>. It is fired by <literal>inetd</literal>
|
|
whenever there's an incoming connection on the NNTP port, and it takes
|
|
over the dialogue from there. It reads the C-News configuration and data
|
|
files in <literal>$NEWSCTL</literal>, article files from
|
|
<literal>$NEWSARTS></literal>, and receives incoming posts and
|
|
transfers. These it dutifully queues in
|
|
<literal>$NEWSARTS/in.coming</literal>, either as batch files or single
|
|
article files.</para>
|
|
|
|
<para>It is important that <literal>inetd</literal> be configured to
|
|
fire <literal>nntpd</literal> as user <literal>news</literal>, not as
|
|
<literal>root</literal> like it does for other daemons like
|
|
<literal>telnetd</literal> or <literal>ftpd</literal>. If this is not
|
|
done correctly, a lot of problems can be caused in the functioning of
|
|
the C-News system later.</para>
|
|
|
|
<para><literal>nntpd</literal> is fired each time a new NNTP connection
|
|
is received, and dies once the NNTP client closes its connection. Thus,
|
|
if one <literal>nntpd</literal> receives a few articles by an incoming
|
|
batch feed (not a <literal>POST</literal> but an <literal>XFER</literal>),
|
|
then another <literal>nntpd</literal> will not know about the receipt of
|
|
these articles till the batches are digested. This will hamper
|
|
duplicate newsfeed detection if there are multiple upstream NDNs feeding
|
|
our server with the same set of articles over NNTP. To fix this,
|
|
<literal>nntpd</literal> uses an ally: <literal>msgidd</literal>, the
|
|
message ID daemon. This
|
|
daemon is fired once at server bootup time through
|
|
<literal>newsboot</literal>, and keeps running quietly in the
|
|
background, listening on a named Unix socket in the
|
|
<literal>$NEWSCTL</literal> area. It keeps in its memory a list of all
|
|
message IDs which various incarnations of <literal>nntpd</literal> have
|
|
asked it to remember.</para>
|
|
|
|
<para>Thus, when one copy of <literal>nntpd</literal> receives an
|
|
incoming feed of news articles, it updates <literal>msgidd</literal>
|
|
with the message IDs of these messages through the Unix socket. When
|
|
another copy of <literal>nntpd</literal> is fired later and the NNTP
|
|
client tries to feed it some more articles, the <literal>nntpd</literal>
|
|
checks each message ID against <literal>msgidd</literal>. Since
|
|
<literal>msgidd</literal> stores all these IDs in memory, the lookup is
|
|
very fast, and duplicate articles are blocked at the NNTP interface
|
|
itself.</para>
|
|
|
|
<para>On a running system, expect to see one instance of
|
|
<literal>nntpd</literal> for each active NNTP connection, and just one
|
|
instance of <literal>msgidd</literal> running quietly in the background,
|
|
hardly consuming any CPU resources. Our <literal>nntpd</literal> is
|
|
configured to die if the NNTP connection is more than a few minutes
|
|
idle, thus conserving server resources. This does not inconvenience the
|
|
user because modern NNTP clients simply re-connect. If an
|
|
<literal>nntpd</literal> instance is found to be running for days, it is
|
|
either hung due to a network error, or is receiving a very long incoming
|
|
NNTP feed from your upstream server. We used to receive our primary
|
|
incoming feed from our service provider through NNTP sessions lasting 18
|
|
to 20 hours without a break, every day.</para>
|
|
|
|
</section>
|
|
|
|
<section><title><literal>nov</literal>, the News Overview system</title>
|
|
<para>NOV, the News Overview System is a recent augmentation to the
|
|
C-News and NNTP systems and to the NNTP protocol. This subsystem
|
|
maintains a file for each active newsgroup, in which it maintains one
|
|
line per current article. This line of text contains some key meta-data
|
|
about the article, <emphasis>e.g.</emphasis> the contents of the
|
|
<literal>From</literal>, <literal>Subject</literal>,
|
|
<literal>Date</literal> and the article size and message ID. This speeds
|
|
up NNTP response enormously. The <literal>nov</literal> library has been
|
|
integrated into the <literal>nntpd</literal> code, and into key binaries
|
|
of C-News, thus providing seamless maintenance of the News Overview
|
|
database when articles are added or deleted from the repository.</para>
|
|
|
|
<para>When <literal>newsrun</literal> adds an article into
|
|
<literal>starcom.test</literal>, it also updates
|
|
<literal>$NEWSARTS/starcom/test/.overview</literal> and adds a line with
|
|
the relevant data, tab-separated, into it. When <literal>nntpd</literal>
|
|
comes to life with an NNTP client, and it sees the
|
|
<literal>XOVER</literal> NNTP command, it reads this
|
|
<literal>.overview</literal> file, and returns the relevant lines to the
|
|
NNTP client. When <literal>expire</literal> deletes an article, it also
|
|
removes the corresponding line from the <literal>.overview</literal>
|
|
file. Thus, the maintenance of the NOV database is seamless.</para>
|
|
</section>
|
|
|
|
<section><title>Batching feeds with UUCP and NNTP</title>
|
|
<para>Some information about batching feeds has been provided in earlier
|
|
sections. More will be added later here in this document.</para>
|
|
</section>
|
|
|
|
</section>
|