mirror of https://github.com/tLDP/LDP
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN" [
<!ENTITY what system "what.sgml">
<!ENTITY pop system "pop.sgml">
<!ENTITY software system "software.sgml">
<!ENTITY settingup system "settingup.sgml">
<!ENTITY inn system "inn.sgml">
<!ENTITY mail2news system "mail2news.sgml">
<!ENTITY accesscontrol system "accesscontrol.sgml">
<!ENTITY components system "components.sgml">
<!ENTITY monitoring system "monitoring.sgml">
<!ENTITY clients system "clients.sgml">
<!ENTITY perspective system "perspective.sgml">
<!ENTITY doc system "doc.sgml">
<!ENTITY conclusion system "conclusion.sgml">
]>

<book>
<bookinfo>
<title>Usenet News HOWTO</title>
<authorgroup>
<author>
<firstname>Shuvam Misra</firstname>
<othername>
</othername>
</author>
<author>
<firstname>Hema Kariyappa</firstname>
<othername>
<emphasis>(usenet@starcomsoftware.com)</emphasis>
</othername>
</author>
</authorgroup>
<address>Starcom Software Private Limited.
starcomsoftware.com
<country>Mumbai, India</country>
</address>
<revhistory>
<revision>
<revnumber>2.0</revnumber>
<date>2002-07-30</date>
<authorinitials>sm</authorinitials>
<revremark>Major update by new authors.</revremark>
</revision>
<revision>
<revnumber>1.4</revnumber>
<date>1995-11-29</date>
<authorinitials>vs</authorinitials>
<revremark>Original document; authored by Vince Skahan.</revremark>
</revision>
</revhistory>
</bookinfo>
<toc></toc>
&what;
&pop;
&software;
&settingup;
&inn;
&mail2news;
&accesscontrol;
&components;
&monitoring;
&clients;
&perspective;
&doc;
&conclusion;
</book>
<chapter><title>Access control in NNTPd</title>
<para>
The original NNTPd had host-based authentication, which allowed clients
connecting from a particular IP address to read only certain newsgroups.
This was clearly inadequate for enterprise deployment on an Intranet,
where each desktop computer has a different IP address, often
DHCP-assigned, and the mapping between person and desktop is not static.
</para>

<para>
What was needed was user-based authentication, where a username and
password could be used to authenticate the user. Even this was provided
as an extension to NNTPd, but more was needed. The corporate IS manager
needs to ensure that certain Usenet discussion groups remain visible only
to certain people. This authorisation layer was not available in NNTPd:
once authenticated, all users could read all newsgroups.
</para>

<para>
We have extended the user-based authentication facility in NNTPd in some
(we hope!) useful ways, and we have added an entire authorisation layer
which lets the administrator specify which newsgroups each user can
read. With this infrastructure, we feel NNTPd is fit for enterprise
deployment and can be used to handle corporate document repositories,
messages, and discussion archives. Details are given below.
</para>
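<para>To make the idea of an authentication layer plus an authorisation
layer concrete, here is a small illustrative sketch. It is
<emphasis>not</emphasis> NNTPd's actual code or file format: the hashing
scheme and both file layouts below are assumptions for illustration
only.</para>

```python
# Illustrative only: a username:hash password file for authentication,
# and a user:newsgroup-patterns file for authorisation. The real NNTPd
# password file and access-control formats differ; see its documentation.
import hashlib
from fnmatch import fnmatch

def check_password(passwd_lines, user, password):
    """Authenticate: passwd_lines is an iterable of 'user:sha256hex' lines."""
    digest = hashlib.sha256(password.encode()).hexdigest()
    for line in passwd_lines:
        u, h = line.strip().split(":", 1)
        if u == user:
            return h == digest
    return False

def may_read(acl_lines, user, group):
    """Authorise: acl_lines is an iterable of 'user:pattern1,pattern2' lines."""
    for line in acl_lines:
        u, patterns = line.strip().split(":", 1)
        if u == user:
            return any(fnmatch(group, p) for p in patterns.split(","))
    return False  # default deny: unknown users see nothing
```

<para>The important design point this sketch shows is the separation of
the two questions: "who are you?" (authentication) and "what may you
read?" (authorisation), with a default-deny policy for the
latter.</para>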

<section><title>Host-based access control</title>
<para></para>
</section>

<section><title>User authentication and authorisation</title>
<para></para>

<section><title>The NNTPd password file</title>
<para></para>
</section>

<section><title>Mapping users to newsgroups</title>
<para></para>
</section>

<section><title>The <literal>X-Authenticated-Author</literal> article header</title>
<para></para>
</section>

<section><title>Other article header additions</title>
<para></para>
</section>

</section>
</chapter>
<chapter><title>Usenet news clients</title>
<para>
This HOWTO was written to allow a Linux system administrator to provide
the Usenet news service to readers of news articles. The rest of this
HOWTO focuses on the server-end software and systems, but one chapter
dedicated to the clients does not seem disproportionate, considering
that the <emphasis>raison d'etre</emphasis> of Usenet news servers is to serve
these clients.
</para>

<para>
The overwhelming majority of clients are software programs which access
the article database, either by reading <literal>/var/spool/news</literal> on a
Unix system or over NNTP, and allow their human users to read and post
articles. We can therefore probably term this class of programs UUA, for
Usenet User Agents, along the lines of MUA for Mail User Agents.
</para>

<para>
There are other special-purpose clients, which either pull out articles
to copy or transfer somewhere else, or for analysis; <emphasis>e.g.</emphasis> a
search engine which allows you to search a Usenet article archive, as Google
(<literal>www.google.com</literal>) does.
</para>

<para>
This chapter will discuss issues in UUA software design, and bring out
essential features as well as efficiency and management issues. What this
chapter will certainly <emphasis>never</emphasis> attempt to do is catalogue all
the different UUA programs available in the world --- that is best left to
specialised catalogues on the Internet.
</para>

<para>
This chapter will also briefly cover special-purpose clients which
transfer articles or do other special-purpose things with them.
</para>

<section><title>Usenet User Agents</title>

<section><title>Accessing articles: NNTP or spool area?</title>
<para></para>
</section>

<section><title>Threading</title>
<para></para>
</section>

<section><title>Quick reading features</title>
<para></para>
</section>

</section>

<section><title>Clients that transfer articles</title>

<para>
We will discuss Suck and <literal>nntpxfer</literal> from the NNTP server
distribution here. Suck has already been discussed earlier. We will be
happy to take contributed additions that discuss other client software.
</para>
</section>

<section><title>Special clients</title>
<para></para>
</section>
</chapter>
<chapter><title>Components of a running system</title>
<para>
This chapter reviews the components of a running CNews+NNTPd server.
Analogous components will be found in an INN-based system too. We invite
readers familiar with INN to contribute their pieces to this chapter.
</para>

<section><title><literal>/var/lib/news</literal>: the CNews control area</title>
<para>
This directory is more popularly known as <literal>$NEWSCTL</literal>. It
contains configuration, log and status files. No articles or binaries
are kept here. Let's see what some of the files are meant for.
</para>

<itemizedlist>
<listitem><para><literal>sys</literal>:
One line per system/NDN, listing all the newsgroup hierarchies each
system subscribes to. Each line is prefixed with the system name, and
the one beginning with <literal>ME:</literal> indicates what we are
going to receive. Look up the <literal>newssys</literal> manpage.
</para></listitem>

<listitem><para><literal>explist</literal>:
This file has entries indicating when the articles of each newsgroup
expire, and whether they have to be archived. The order in which the
newsgroups are listed is important. See the <literal>expire</literal>
manpage for the file format.
</para></listitem>

<listitem><para><literal>batchparms</literal>:
Details of how to feed other sites/NDNs, like the size of batches and
the mode of transmission (UUCP/NNTP), are specified here. Manpage to
refer to: <literal>newsbatch</literal>.
</para></listitem>

<listitem><para><literal>controlperm</literal>:
If you wish to authenticate a control message before any action is
taken on it, you must enter authentication-related information here.
The <literal>controlperm</literal> manpage lists all the fields in
detail.
</para></listitem>

<listitem><para><literal>mailpaths</literal>:
This file holds the e-mail address of the moderator of each moderated
newsgroup, who is responsible for approving or rejecting articles
posted to it. The sample <literal>mailpaths</literal> file in the
<literal>tar</literal> will give you an idea of how entries are made.
</para></listitem>

<listitem><para><literal>nntp_access/user_access</literal>:
These files contain entries of server names and usernames to whom
restrictions apply when accessing newsgroups. Again, the sample file in
the tarball explains the format of the file.
</para></listitem>

<listitem><para><literal>log, errlog</literal>:
These are log files that keep growing with each batch that is
received. The <literal>log</literal> file has one entry per article,
telling you whether it has been accepted or rejected by your news
server. To understand the format of this file, refer to Chapter 2.2 of
the CNews guide. Errors, if any, while digesting the articles are
logged in <literal>errlog</literal>. These log files have to be rolled,
as they hog a lot of disk space.
</para></listitem>

<listitem><para><literal>nntplog</literal>:
This file logs information from the NNTP daemon, giving details of when
a connection was established or broken and what commands were issued.
This file needs to be configured in syslog, and the syslog daemon
should be running.
</para></listitem>

<listitem><para><literal>active</literal>:
This file has one line per newsgroup found on your news server. Among
other things, it tells you how many articles are currently present in
each newsgroup. It is updated when each batch is digested and when
articles are expired. The <literal>active</literal> manpage furnishes
more details about the other parameters.
</para></listitem>

<listitem><para><literal>history</literal>:
This file, again, contains one line per article, mapping message-ID to
newsgroup name and giving the article's number in that newsgroup. It is
updated each time a feed is digested and when
<literal>doexpire</literal> is run. It plays a key role in
loop-detection and serves as an article database. Read the
<literal>newsdb</literal> and <literal>doexpire</literal> manpages for
the file format.
</para></listitem>

<listitem><para><literal>newsgroups</literal>:
This file has a one-line description of each newsgroup, explaining what
kind of posts go into it. Ideally, it should cover all the newsgroups
found in the <literal>active</literal> file.
</para></listitem>

<listitem><para>Miscellaneous files:
Files like <literal>mailname</literal>, <literal>organisation</literal>
and <literal>whoami</literal> contain information required for forming
some of the headers of an article. The contents of
<literal>mailname</literal> form the <literal>From:</literal> header,
those of <literal>organisation</literal> form the
<literal>Organisation:</literal> header, and
<literal>whoami</literal> contains the name of the news system. Refer
to chapter 2.1 of <literal>guide.ps</literal> for a detailed list of
files in the <literal>$NEWSCTL</literal> area, and to RFC 1036 for a
description of article headers.
</para></listitem>
</itemizedlist>
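<para>As an aside, the one-line-per-newsgroup files above are easy to
work with programmatically. A minimal sketch of reading one
<literal>active</literal> line in Python follows; the field order shown
(group, highest article number, lowest article number, flags) is the
conventional one, but check the <literal>active</literal> manpage on
your own system, as details may vary.</para>

```python
# Illustrative parser for one line of the "active" file.
# Assumed format: "<newsgroup> <highest-art-no> <lowest-art-no> <flags>"
# (verify against your active manpage before relying on this).

def parse_active_line(line):
    group, high, low, flags = line.split()
    return {
        "group": group,
        "high": int(high),   # highest article number seen in the group
        "low": int(low),     # lowest article number still present
        "flags": flags,      # e.g. 'y' = postings allowed, 'm' = moderated
    }

entry = parse_active_line("comp.music.compose 0000000242 0000000201 y")
```
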
</section>

<section><title><literal>/var/spool/news</literal>: the article repository</title>
<para>
This is also known as the <literal>$NEWSARTS</literal> or
<literal>$NEWSSPOOL</literal> directory. This is where the articles
reside on your disk. No binaries or control files should belong here.
Enough space should be allocated to this directory, as the number of
articles keeps increasing with each batch that is digested. An
explanation of the following sub-directories will give you an overview
of this directory:
<itemizedlist>
<listitem><para><literal>in.coming</literal>:
Feeds/batches/articles from NDNs reside in this directory on arrival,
before being processed. After processing, they appear in
<literal>$NEWSARTS</literal>, or in its <literal>bad</literal>
sub-directory if there were errors.
</para></listitem>

<listitem><para><literal>out.going</literal>:
This directory contains batches/feeds to be sent to your NDNs,
i.e. feeds to be pushed to your neighbouring sites reside here before
they are transmitted. It contains one sub-directory per NDN mentioned
in the <literal>sys</literal> file. These sub-directories contain files
called <literal>togo</literal>, which hold information about each
article queued for transmission, like its message-ID or article number.
</para></listitem>

<listitem><para><anchor id="newsgroupdir"/>newsgroup directories:
For each newsgroup hierarchy that the news server has subscribed to, a
directory is created under <literal>$NEWSARTS</literal>. Further
sub-directories are created under the parent to hold articles of
specific newsgroups. For instance, for a newsgroup like
<literal>comp.music.compose</literal>, the parent directory
<literal>comp</literal> will appear in <literal>$NEWSARTS</literal>,
with a sub-directory called <literal>music</literal> under it, which in
turn will contain a sub-directory called <literal>compose</literal>;
all articles of <literal>comp.music.compose</literal> reside there. In
effect, article 242 of newsgroup
<literal>comp.music.compose</literal> maps to the file
<literal>$NEWSARTS/comp/music/compose/242</literal>.
</para></listitem>

<listitem><para>control:
The control directory houses only the control messages that have been
received by this site. A control message carries one of
<literal>newgroup</literal>, <literal>rmgroup</literal>,
<literal>checkgroups</literal> or <literal>cancel</literal> in the
subject line of the article.
</para></listitem>

<listitem><para><literal>junk</literal>:
The <literal>junk</literal> directory contains all articles that the
news server has received and has decided, after processing, do not
belong to any of the hierarchies it has subscribed to. The news server
passes all articles in this directory to NDNs that have subscribed to
the <literal>junk</literal> hierarchy.
</para></listitem>
</itemizedlist>
</para>
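<para>The newsgroup-to-file mapping described above is mechanical
enough to sketch in a couple of lines: dots in the group name become
directory separators under the spool, and the article number is the
file name. The spool path below is the conventional location and is
only an illustration.</para>

```python
# Sketch of the newsgroup-to-spool-path mapping described above.
NEWSARTS = "/var/spool/news"  # conventional $NEWSARTS; may differ per site

def article_path(group, artnum):
    # comp.music.compose, 242 -> /var/spool/news/comp/music/compose/242
    return "/".join([NEWSARTS] + group.split(".") + [str(artnum)])

path = article_path("comp.music.compose", 242)
```
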
</section>

<section><title><literal>/usr/lib/newsbin</literal>: the executables</title>
<para></para>
</section>

<section id="cronjobs"><title><literal>crontab and cron jobs</literal></title>
<para>
The heart of the Usenet news server is the set of scripts that run at
regular intervals, processing articles, digesting or rejecting them,
and transmitting them to NDNs. I shall try to enumerate the ones that
are important enough to be run from cron. :)
</para>

<itemizedlist>
<listitem><para><literal>newsrun</literal>:
The key script. It picks up the batches in the
<literal>in.coming</literal> directory, uncompresses them if necessary,
and feeds them to <literal>relaynews</literal>, which then processes
each article, digesting and batching it and logging any errors. This
script needs to run through cron as frequently as you want the feeds to
be digested. Every half hour should suffice for a non-critical
requirement.
</para></listitem>

<listitem><para><literal>sendbatches</literal>:
This script is run to transmit the <literal>togo</literal> files formed
in the <literal>out.going</literal> directory to your NDNs. It reads
the <literal>batchparms</literal> file to know exactly how, and to
whom, the batches need to be transmitted. The frequency, again, can be
set according to your requirements. Once an hour should be sufficient.
</para></listitem>

<listitem><para><literal>newsdaily</literal>:
This script does maintenance chores like rolling logs and saving them,
reporting errors and anomalies, and doing cleanup jobs. It should
typically run once a day.
</para></listitem>

<listitem><para><literal>newswatch</literal>:
This looks for news problems at a more detailed level than
<literal>newsdaily</literal>, such as persistent lock files, whether
there is enough space for a minimum number of files, or a huge queue of
unattended batches. This should typically run once every hour. For more
on this and the above, read the <literal>newsmaint</literal> manpage.
</para></listitem>

<listitem><para><literal>doexpire</literal>:
This script expires old articles as determined by the control file
<literal>explist</literal>, and updates the <literal>active</literal>
file. This is necessary if you do not want unwanted articles hogging
your disk space. Run it once a day. Manpage:
<literal>expire</literal>.
</para></listitem>

<listitem><para><literal>newsrunning off/on</literal>:
This script shuts down or starts up the news server for you. You could
choose to add it to your cron jobs if you think the news server takes
up a lot of CPU time during peak hours and you wish to keep a check on
it.
</para></listitem>
</itemizedlist>
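<para>As an illustration, the schedule suggested above might translate
into a crontab for the <literal>news</literal> user along the following
lines. The script paths and exact timings here are assumptions: check
where your distribution actually installs these scripts under
<literal>/usr/lib/newsbin</literal> before copying anything.</para>

```shell
# Illustrative crontab for the "news" user; paths are hypothetical.
0,30 * * * *  /usr/lib/newsbin/input/newsrun      # digest incoming batches
15 * * * *    /usr/lib/newsbin/batch/sendbatches  # push feeds to NDNs
45 * * * *    /usr/lib/newsbin/maint/newswatch    # hourly sanity checks
10 0 * * *    /usr/lib/newsbin/maint/newsdaily    # daily log rolling, cleanup
30 0 * * *    /usr/lib/newsbin/expire/doexpire    # expire old articles
```
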
</section>

<section><title><literal>newsrun</literal> and <literal>relaynews</literal>: digesting received articles</title>
<para>
The heart and soul of the Usenet news system, <literal>newsrun</literal>
picks up the batches and articles in the <literal>in.coming</literal>
directory of <literal>$NEWSARTS</literal>, uncompresses them if
required, and calls <literal>relaynews</literal>. It should run from
cron.
</para>

<para>
<literal>relaynews</literal> picks up each article, one by one, through
stdin; determines whether it belongs to a subscribed group by looking
up the <literal>sys</literal> file; looks in the
<literal>history</literal> file to confirm that it does not already
exist locally; digests it, updating the <literal>active</literal> and
<literal>history</literal> files; and batches it for neighbouring
sites. It logs errors on encountering problems while processing an
article, and takes appropriate action if the article happens to be a
control message. More info in the <literal>relaynews</literal> manpage.
</para>
</section>

<section><title><literal>doexpire</literal> and <literal>expire</literal>: removing old articles</title>
<para>
A good way to get rid of unwanted or old articles in the
<literal>$NEWSARTS</literal> area is to run
<literal>doexpire</literal> once a day. It reads the
<literal>explist</literal> file from the <literal>$NEWSCTL</literal>
directory to determine which articles expire today, and can archive an
expiring article if so configured. It then updates the
<literal>active</literal> and <literal>history</literal> files
accordingly. If you wish to retain an article's entry in the
<literal>history</literal> file, to avoid re-digesting it as a new
article after it has expired, add a special
<literal>/expired/;</literal> line in the control file. More on the
options and functioning in the <literal>expire</literal> manpage.
</para>
</section>

<section><title><literal>nntpd</literal> and <literal>msgidd</literal>: managing the NNTP interface</title>
<para>
As has already been discussed in the chapter on setting up the software,
<literal>nntpd</literal> is a TCP-based server daemon which runs under
<literal>inetd</literal>. It is fired by <literal>inetd</literal>
whenever there's an incoming connection on the NNTP port, and it takes
over the dialogue from there. It reads the C-News configuration and data
files in <literal>$NEWSCTL</literal>, article files from
<literal>$NEWSARTS</literal>, and receives incoming posts and
transfers. These it dutifully queues in
<literal>$NEWSARTS/in.coming</literal>, either as batch files or single
article files.</para>

<para>It is important that <literal>inetd</literal> be configured to
fire <literal>nntpd</literal> as user <literal>news</literal>, not as
<literal>root</literal> as it does for other daemons like
<literal>telnetd</literal> or <literal>ftpd</literal>. If this is not
done correctly, it can cause a lot of problems in the functioning of
the C-News system later.</para>

<para><literal>nntpd</literal> is fired each time a new NNTP connection
is received, and dies once the NNTP client closes its connection. Thus,
if one <literal>nntpd</literal> receives a few articles by an incoming
batch feed (not a <literal>POST</literal> but an <literal>XFER</literal>),
then another <literal>nntpd</literal> will not know about the receipt of
these articles till the batches are digested. This will hamper
duplicate newsfeed detection if there are multiple upstream NDNs feeding
our server with the same set of articles over NNTP. To fix this,
<literal>nntpd</literal> uses an ally: <literal>msgidd</literal>, the
message-ID daemon. This daemon is fired once at server bootup through
<literal>newsboot</literal>, and keeps running quietly in the
background, listening on a named Unix socket in the
<literal>$NEWSCTL</literal> area. It keeps in its memory a list of all
message IDs which various incarnations of <literal>nntpd</literal> have
asked it to remember.</para>

<para>Thus, when one copy of <literal>nntpd</literal> receives an
incoming feed of news articles, it updates <literal>msgidd</literal>
with the message IDs of these messages through the Unix socket. When
another copy of <literal>nntpd</literal> is fired later and the NNTP
client tries to feed it some more articles, that
<literal>nntpd</literal> checks each message ID against
<literal>msgidd</literal>. Since <literal>msgidd</literal> stores all
these IDs in memory, the lookup is very fast, and duplicate articles
are blocked at the NNTP interface itself.</para>
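<para>The essence of <literal>msgidd</literal> is a single shared set
of message IDs consulted before an article is accepted. A minimal
in-process sketch of that dedup logic follows; the real daemon, of
course, lives in a separate process behind a Unix socket so that many
short-lived <literal>nntpd</literal> instances can share it.</para>

```python
# Sketch of msgidd's core idea: a shared registry of message-IDs that
# connections consult before accepting an article. This in-process
# version only illustrates the logic, not the daemon's socket protocol.

class MsgIdRegistry:
    def __init__(self):
        self._seen = set()

    def offer(self, message_id):
        """Return True if the article is new and should be accepted,
        False if some other connection has already brought it in."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True

registry = MsgIdRegistry()
first = registry.offer("<abc123@example.com>")   # True: new article
second = registry.offer("<abc123@example.com>")  # False: duplicate blocked
```
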

<para>On a running system, expect to see one instance of
<literal>nntpd</literal> for each active NNTP connection, and just one
instance of <literal>msgidd</literal> running quietly in the background,
hardly consuming any CPU resources. Our <literal>nntpd</literal> is
configured to die if the NNTP connection is idle for more than a few
minutes, thus conserving server resources. This does not inconvenience
the user, because modern NNTP clients simply re-connect. If an
<literal>nntpd</literal> instance is found to be running for days, it is
either hung due to a network error, or is receiving a very long incoming
NNTP feed from your upstream server. We used to receive our primary
incoming feed from our service provider through NNTP sessions lasting 18
to 20 hours without a break, every day.</para>

</section>

<section><title><literal>nov</literal>, the News Overview system</title>
<para>NOV, the News Overview System, is a recent augmentation to the
C-News and NNTP systems and to the NNTP protocol. This subsystem
maintains a file for each active newsgroup, in which it keeps one line
per current article. This line of text contains some key metadata about
the article, <emphasis>e.g.</emphasis> the contents of the
<literal>From</literal>, <literal>Subject</literal> and
<literal>Date</literal> headers, the article size, and the message ID.
This speeds up NNTP response enormously. The <literal>nov</literal>
library has been integrated into the <literal>nntpd</literal> code, and
into key binaries of C-News, thus providing seamless maintenance of the
News Overview database when articles are added to or deleted from the
repository.</para>

<para>When <literal>newsrun</literal> adds an article into
<literal>starcom.test</literal>, it also updates
<literal>$NEWSARTS/starcom/test/.overview</literal> and adds a line with
the relevant data, tab-separated, into it. When <literal>nntpd</literal>
comes to life with an NNTP client, and it sees the
<literal>XOVER</literal> NNTP command, it reads this
<literal>.overview</literal> file and returns the relevant lines to the
NNTP client. When <literal>expire</literal> deletes an article, it also
removes the corresponding line from the <literal>.overview</literal>
file. Thus, the maintenance of the NOV database is seamless.</para>
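<para>An <literal>.overview</literal> line is plain tab-separated
text. The field order assumed in the sketch below (article number,
Subject, From, Date, Message-ID, References, byte count, line count) is
the common NOV default, but the set of fields can vary per installation,
so treat it as illustrative and verify against your own overview
configuration.</para>

```python
# Parse one tab-separated NOV (.overview) line into named fields.
# The field order here is the commonly used default; verify it
# against your own installation's overview configuration.

NOV_FIELDS = ["number", "subject", "from", "date",
              "message-id", "references", "bytes", "lines"]

def parse_overview_line(line):
    values = line.rstrip("\n").split("\t")
    return dict(zip(NOV_FIELDS, values))

rec = parse_overview_line(
    "242\tRe: chord voicings\talice@example.com\t"
    "30 Jul 2002 10:00:00 GMT\t<abc123@example.com>\t\t1742\t37"
)
```
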
</section>

<section><title>Batching feeds with UUCP and NNTP</title>
<para>Some information about batching feeds has been provided in earlier
sections. More will be added here later in this document.</para>
</section>

</chapter>
<chapter><title>Wrapping up</title>

<section><title>Acknowledgements</title>
<para>
This HOWTO is a by-product of many years of experience setting up and
managing Usenet news servers. We have learned a lot from those who trod
the path ahead of us. Among them is the team of the ERNET Project,
which brought Internet technology to India's academic institutions in
the early nineties. We especially remember what we learned from the
SIGSys Group of the Department of Computer Science of the Indian
Institute of Technology, Mumbai. We have also benefited enormously from
the guidance we received from the Networking Group at the NCST in
Mumbai, especially from Geetanjali Sampemane.
</para>

<para>On a wider scale, our learning along the path of systems and
networks started with Unix, without which our appreciation of computer
systems would have remained very fragmented and superficial. Our
insight into Unix came from our village elders at the Department
of Computer Science of the IIT at Mumbai, especially from ``Hattu,''
``Sathe,'' and ``Sapre,'' none of whom are with the IIT today, and from
Professor D. B. Phatak and others, many of whom, luckily, are still
with the IIT.</para>

<para>Coming down to specifics, all the members of Starcom Software who
have worked on various problems with networking, Linux, and Usenet news
installations have helped the authors understand what works and what
doesn't. Without their work, this HOWTO would have been a dry
textbook.</para>
</section>

<section><title>Comments invited</title>
<para>Your comments and contributions are invited. We cannot possibly
write all sections of this HOWTO based on our knowledge alone. Please
contribute all you can, starting with minor corrections and bug fixes
and going on to entire sections and chapters. Your contributions will be
acknowledged in the HOWTO.</para>
</section>

<section><title>Copyright</title>
<para>
Copyright (c) 2002 by Starcom Software Private Limited, India
</para>

<para>Please freely copy and distribute (sell or give away) this
document in any format. It is requested that corrections and/or
comments be forwarded to the document maintainer, reachable at
<literal>usenet@starcomsoftware.com</literal>. When these comments
and contributions are incorporated into this document and released
for distribution in future versions of this HOWTO, the content of the
incorporated text will become the copyright of Starcom Software Private
Limited. By submitting your contributions to us, you implicitly agree
to these terms.</para>

<para>You may create a derivative work and distribute it provided that
you:</para>

<orderedlist>
<listitem><para>
Send your derivative work (in the most suitable format, such as SGML) to
the LDP (Linux Documentation Project) or the like for posting on the
Internet. If not the LDP, then let the LDP know where it is available.
</para></listitem>

<listitem><para>
License the derivative work under this same license, or use the GPL.
Include a copyright notice and at least a pointer to the license used.
</para></listitem>

<listitem><para>
Give due credit to previous authors and major contributors.
If you're considering making a derived work other than a
translation, it is requested that you discuss your plans with the
current maintainer.
</para></listitem>
</orderedlist>
</section>

<section><title>About Starcom Software Private Limited</title>
<para>
<emphasis role="bold">starcom</emphasis> (Starcom Software Private
Limited, <literal>www.starcomsoftware.com</literal>) has been building
products and solutions using Linux and Web technology since 1996. Our
entire office runs on Linux, and we have built mission-critical
solutions for some of the top corporate entities in India and abroad.
Our client list includes arguably the world's largest securities
depository (The National Securities Depository of India Limited), one of
the world's top five stock exchanges in terms of trading volumes (The
National Stock Exchange of India Limited), and one of India's premier
financial institutions, which is listed on the NYSE. In all these cases,
we have introduced them to Linux, and in many cases we have built their
first mission-critical business applications on Linux. Contact the
authors or check the Website for more information about the work we
have done.
</para>
</section>
</chapter>
<chapter><title>Documentation and information</title>

<section><title>The manpages</title>

<para>The following manpages are installed automatically when our
integrated software distribution is compiled and installed. They are
listed here in no particular order:</para>

<itemizedlist>
<listitem><para><literal>badexpiry:</literal>
utility to look for articles with bad explicit expiry headers
</para></listitem>

<listitem><para><literal>checkactive:</literal>
utility to perform some sanity checks on the <literal>active</literal>
file
</para></listitem>

<listitem><para><literal>cnewsdo:</literal>
utility to perform some checks and then run C-News maintenance commands
</para></listitem>

<listitem><para><literal>controlperm:</literal>
configuration file for controlling responses to Usenet control messages
</para></listitem>

<listitem><para><literal>expire:</literal>
utility to expire old articles
</para></listitem>

<listitem><para><literal>explode:</literal>
internal utility to convert a master batch file to ordinary batch files
</para></listitem>

<listitem><para><literal>inews:</literal>
the program which forms the entry point for fresh postings to be
injected into the Usenet system
</para></listitem>

<listitem><para><literal>mergeactive:</literal>
utility to merge one site's newsgroups into another site's
<literal>active</literal> file
</para></listitem>

<listitem><para><literal>mkhistory:</literal>
utility to rebuild the news <literal>history</literal> file
</para></listitem>

<listitem><para><literal>news(5):</literal>
description of Usenet news article file and batch file formats
</para></listitem>

<listitem><para><literal>newsaux:</literal>
a collection of C-News utilities used by its own scripts and by the
Usenet news administrator for various maintenance purposes
</para></listitem>

<listitem><para><literal>newsbatch:</literal>
covers all the utilities and programs which are part of the news
batching system of C-News
</para></listitem>

<listitem><para><literal>newsctl:</literal>
describes the file formats and uses of all the files in
<literal>$NEWSCTL</literal> other than the two key files,
<literal>sys</literal> and <literal>active</literal>
</para></listitem>

<listitem><para><literal>newsdb:</literal>
describes the key files and directories for news articles, including the
structure of <literal>$NEWSARTS</literal>, the <literal>active</literal>
file, the <literal>active.times</literal> file, and the
<literal>history</literal> file
</para></listitem>

<listitem><para><literal>newsflag:</literal>
utility to change the flag or type column of a newsgroup in the
<literal>active</literal> file
</para></listitem>

<listitem><para><literal>newsmail:</literal>
utility scripts used to send and receive newsfeeds by email; this is
different from a mail-to-news gateway, since it is for communication
between two Usenet news servers
</para></listitem>

<listitem><para><literal>newsmaint:</literal>
utility scripts used by the Usenet administrator to manage and maintain
the C-News system
</para></listitem>

<listitem><para><literal>newsoverview(5):</literal>
file formats for the NOV database
</para></listitem>

<listitem><para><literal>newsoverview(8):</literal>
library functions of the NOV library and the utilities which use them
</para></listitem>

<listitem><para><literal>newssys:</literal>
the important <literal>sys</literal> file of C-News
</para></listitem>

<listitem><para><literal>relaynews:</literal>
the <literal>relaynews</literal> program of C-News
</para></listitem>

<listitem><para><literal>report:</literal>
utility to generate and send email reports of errors and events from
C-News scripts
</para></listitem>

<listitem><para><literal>rnews:</literal>
utility to receive news batches and queue them for processing
</para></listitem>

<listitem><para><literal>nntpd:</literal>
the NNTP daemon
</para></listitem>

<listitem><para><literal>nntpxmit:</literal>
the NNTP batch transmit program for outgoing push feeds
</para></listitem>
</itemizedlist>

</section>

<section><title>The C-News guide</title>

<para>This document is part of the C-News source, and is available in
the <literal>c-news/doc</literal> directory of the source tree. The
<literal>makefile</literal> there uses <literal>troff</literal> and the
source files to generate <literal>guide.ps</literal>. The C-News Guide
is a well-written introduction to the functioning of C-News.</para>

</section>

<section><title>O'Reilly's books on Usenet news</title>

<para>O'Reilly and Associates published an excellent book that can form
the foundation for understanding C-News and Usenet news in general,
titled ``Managing UUCP and Usenet,'' dated 1992. It came to be
considered a bit dated because it did not cover INN or the Internet
protocols.</para>

<para>They have since published a more recent book, titled ``Managing
Usenet,'' written by Henry Spencer, the co-author of C-News, and David
Lawrence, one of the most respected Usenet veterans and administrators
today. The book was published in 1998 and covers both C-News and
INN.</para>

<para>We have a distinct preference for books published by O'Reilly; we
usually find them the best books on their subjects, and we make no
attempt to hide this bias. We recommend both books. We believe that
there is very little in this HOWTO of value to someone who studies one
of these books and then peruses information on the Internet.</para>

</section>

<section><title>Usenet-related RFCs</title>

<para>TO BE ADDED</para>

</section>

<section><title>The source code</title>

<para>TO BE ADDED</para>

</section>

<section><title>Usenet newsgroups</title>

<para>There are many discussion groups on the Usenet dedicated to the
technical and non-technical issues of managing a Usenet server and
service. These are:</para>

<itemizedlist>
<listitem><para><literal>news.admin.technical</literal>:
discusses technical issues of administering Usenet news
</para></listitem>
<listitem><para><literal>news.admin.policy</literal>:
discusses policy issues of Usenet news
</para></listitem>
<listitem><para><literal>news.software.b</literal>:
discusses the source, configuration, and bugs (if any) of C-News (no
separate newsgroup was created when B-News gave way to C-News)
</para></listitem>
</itemizedlist>

<para>MORE WILL BE ADDED LATER</para>
</section>

<section><title>We</title>

<para>We at Starcom Software offer the services of our Usenet news team
to assist you by email, as a service to the Linux and Usenet
administrator community, on a best-effort basis.</para>

<para>We will endeavour to answer all queries sent to
<literal>usenet@starcomsoftware.com</literal> pertaining to the source
distribution we have put together, its configuration and maintenance,
and general technical issues related to running a Usenet news service
off a Unix or Linux server.</para>

<para>We may not be in a position to assist with software components we
are not familiar with, <emphasis>e.g.</emphasis> Leafnode, or platforms
we do not have access to, <emphasis>e.g.</emphasis> SGI IRIX. Intel
Linux will be supported as long as our group is alive; our entire office
runs on Linux servers and diskless Linux desktops.</para>

<para>You do not need to be dependent on us, because we have neither
proprietary knowledge nor proprietary closed-source software. All the
extensions we are currently working on for C-News and NNTPd will
immediately be made available on the Internet.</para>

</section>
</chapter>
<chapter><title>Setting up INN</title>

<section><title>Getting the source</title>
<para>INN has been maintained and archived by the ISC (Internet
Software Consortium, <literal>www.isc.org</literal>) since 1996, and
the INN homepage is at
<literal>http://www.isc.org/products/INN/</literal>. The latest release
of INN as of the time of this writing is INN v2.3.3, released 7 May
2002. The full sources can be downloaded from
<literal>ftp://ftp.isc.org/isc/inn/inn-2.3.3.tar.gz</literal>.</para>

</section>
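<para>The commands below sketch one way of fetching and unpacking this
release. The URL is the one given above; they need network access, and
the FTP client you use to fetch the tarball does not matter.</para>

```
wget ftp://ftp.isc.org/isc/inn/inn-2.3.3.tar.gz
tar xzf inn-2.3.3.tar.gz
cd inn-2.3.3
less README        # start with the bundled installation notes
```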

<section><title>Compiling and installing</title>

<para>TO BE ADDED LATER.</para>

</section>

<section><title>Configuring the system</title>

<para>TO BE ADDED LATER.</para>

</section>

<section><title>Setting up <literal>pgpverify</literal></title>

<para>TO BE ADDED LATER.</para>

</section>

<section><title>Feeding off an upstream neighbour</title>

<para>TO BE ADDED LATER.</para>

</section>

<section><title>Setting up outgoing feeds</title>

<para>TO BE ADDED LATER.</para>

</section>

<section id="innefficiency">
<title>Efficiency issues and advantages</title>

<para>TO BE ADDED LATER.</para>

</section>

</chapter>
<chapter><title>Connecting email with Usenet news</title>
<para>
Usenet news and mailing lists constantly remind us of each other. The
parallels are so strong that many mailing lists are gatewayed two-way
with corresponding Usenet newsgroups, in the <literal>bit</literal>
hierarchy which maps onto the old BITNET, and elsewhere.
</para>

<para>
There are probably ten different situations where a mailing list is
better, and ten others where the newsgroup approach works better. The
point to recognise is that the system administrator needs the option of
gatewaying one to the other whenever the tradeoffs justify it. Instead
of getting into the tradeoffs themselves, this chapter focuses on the
mechanics of gatewaying the two worlds.
</para>

<para>
One clear and recurring use we find for this gatewaying is for mailing
lists which are of general use to many employees in a corporate network.
For instance, in a stockbroking company, many employees may like to
subscribe to a business news mailing list. If each employee subscribed
to the mailing list independently, it would waste mail spool area and
perhaps bandwidth. In such situations, we receive the mailing list into
an internal newsgroup, so that individual mailboxes are not overloaded.
Everyone can then read the newsgroup, and messages are also archived
until they expire.
</para>

<section><title>Feeding Usenet news to email</title>

<para>
In C-News, this is trivially done by adding one line to the
<literal>sys</literal> file, defining a new outgoing feed listing all
the relevant groups and distributions, and specifying the command line
to be executed, which is supposed to send the outgoing message to that
``feed.'' This command, in our case, should be a mail-sending program,
<emphasis>e.g.</emphasis>
<literal>/bin/mail user@somewhere.com</literal>. This is often adequate
to get the job done. We are sure almost every Usenet news software
system will have an equally easy way of piping the feed of a newsgroup
to an email address.
</para>
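<para>As an illustration, a hypothetical <literal>sys</literal> entry of
this kind might look like the line below. The feed name, newsgroup, and
address are all invented, and the exact field and flag semantics should
be checked against the <literal>newssys</literal> manpage before
use.</para>

```
# Hypothetical entry: pipe articles posted to comp.os.linux.announce
# (distribution "world") to a mail command. Verify the field and flag
# semantics against newssys(5); this is a sketch, not a tested config.
maillist:comp.os.linux.announce/world::/bin/mail user@somewhere.com
```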
</section>

<section><title>Feeding email to news: the <literal>mail2news</literal> gateway</title>

<para>Integrated with our Usenet software sources is a set of scripts
which we have been using internally for at least five years. This set
of scripts is called <literal>mail2news</literal>. It contains one
shell script, also called <literal>mail2news</literal>, which takes an
email message from <literal>stdin</literal>, processes it, and feeds
the processed version to <literal>inews</literal>, the
<literal>stdin</literal>-based news article injection utility of
C-News. The <literal>inews</literal> utility accepts a new article post
on its <literal>stdin</literal> and queues it for digestion by
<literal>newsrun</literal> whenever it runs next.</para>

<para>To use <literal>mail2news</literal>, we assume you are using
Sendmail to process incoming email. Our instructions can easily be
adapted to any Mail Transport Agent (MTA) of your choice. You will have
to configure Sendmail or any other MTA to redirect incoming mail for
the gateway to a program called <literal>m2nmailer</literal>, a Perl
script which accepts the incoming message on its standard input and a
space-separated list of newsgroup names on its command line. Sendmail
can easily be configured to trigger <literal>m2nmailer</literal> this
way by defining a new mailer in <literal>sendmail.cf</literal> and
directing all incoming email meant for the Usenet news system to this
mailer. Once you set up the appropriate rulesets for Sendmail, it
automatically triggers <literal>m2nmailer</literal> each time an
incoming email arrives for the <literal>mail2news</literal>
gateway.</para>
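<para>For concreteness, a mailer definition in
<literal>sendmail.cf</literal> could look roughly like the sketch
below. The mailer name, path, flags, and ruleset numbers here are
illustrative assumptions, not our exact configuration; the precise
definitions and rulesets are the ones given in the chapter titled
``Setting up C-News + NNTPd.''</para>

```
# Hypothetical mailer definition for the mail2news gateway.
# P= is the path to m2nmailer; A= is the argv template, where $u is
# arranged by the rulesets to carry the target newsgroup name(s).
Mm2n, P=/usr/lib/newsbin/mail2news/m2nmailer, F=lsDFMu, S=10, R=20, A=m2nmailer $u
```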

<para>The precise configuration changes to Sendmail have already been
specified in the chapter titled ``Setting up C-News + NNTPd.''</para>

</section>

<section><title>Using GNU Mailman as an email-NNTP gateway</title>

<para>TO BE ADDED LATER</para>

<section><title>GNU's all-singing all-dancing MLM</title>

<para>TO BE ADDED LATER</para>

</section>

<section><title>Features of GNU Mailman</title>

<para>TO BE ADDED LATER</para>

</section>

<section><title>Gateway features connecting NNTP and email</title>

<para>TO BE ADDED LATER</para>

</section>

</section>
</chapter>
<chapter><title>Monitoring and administration</title>
<para>
Once the Usenet news system is in place and running, the news
administrator is aided in monitoring it by the various reports the
system generates. He also needs to make regular checks in specific
directories and files to verify that the system is working smoothly.
</para>

<section><title>The <literal>newsdaily</literal> report</title>
<para>
This report is generated by <literal>newsdaily</literal>, which is
typically run through <literal>cron</literal>. We enumerate below some
of the problems reported, based on what we have seen.
</para>
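<para>For instance, a crontab entry for the news user along the
following lines runs the report nightly. The path is an assumption;
<literal>newsdaily</literal> lives under your
<literal>$NEWSBIN</literal> area, wherever that is installed.</para>

```
# Hypothetical crontab entry for the news user: run newsdaily at 05:30.
30 5 * * * /usr/lib/newsbin/maint/newsdaily
```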

<itemizedlist>
<listitem><para>bad input batches:
This reports articles that have been processed, declared bad, and hence
not digested. The reason is not mentioned; you are expected to check
the article and determine the cause.
</para></listitem>

<listitem><para>leading unknown newsgroups by articles:
This gives a list of newsgroups whose hierarchy has been subscribed to,
but where the specific newsgroup does not appear in the
<literal>active</literal> file. You could add the newsgroup to the
<literal>active</literal> file if you think it is important enough.
</para></listitem>

<listitem><para>leading unsubscribed newsgroups:
This gives a list of the newsgroups that have not been subscribed to
for which the news server receives the largest number of articles. You
really cannot do much about this, except subscribe to them if they are
required.
</para></listitem>

<listitem><para>leading sites sending bad headers:
This lists those of your NDNs (next-door neighbours) which are sending
articles with malformed or insufficient headers.
</para></listitem>

<listitem><para>leading sites sending stale/future/misdated news:
This lists those of your NDNs which are sending you articles older than
the date you have specified for accepting feeds.
</para></listitem>

<listitem><para>Some of the reports added by us:
We have modified the <literal>newsdaily</literal> script to include
some more statistics.
<itemizedlist>
<listitem><para>disk usage:
This reports the size in bytes of the <literal>$NEWSARTS</literal>
area. If you are receiving feeds regularly, you should see this figure
increasing.
</para></listitem>

<listitem><para>incoming feed statistics:
This reports the number of articles and total bytes received from each
of your NDNs.
</para></listitem>

<listitem><para>NNTP traffic report:
The output of <literal>nestor</literal> has also been included in this
report; it gives details of each NNTP connection and the overall
performance of the network connection, read from the news log file. To
understand the format, read the manpage of <literal>nestor</literal>.
</para></listitem>
</itemizedlist>
</para></listitem>

<listitem><para>Error reporting from the <literal>errorlog</literal> file:
This reports errors logged in the <literal>errorlog</literal> file.
Usually these are file-ownership or missing-file problems, which can be
easily handled.
</para></listitem>
</itemizedlist>
</section>

<section><title>Crisis reports from <literal>newswatch</literal></title>
<para>
Most of the problems reported to us are ones of either space shortage
or persistent locks. There are instances when the scripts have created
lock files and then aborted or terminated without removing them.
Sometimes these are innocuous enough to be deleted, but this should be
determined only after careful analysis: they could be an indication
that some part of the system is not working correctly. For example, we
would receive this error message when <literal>sendbatches</literal>
terminated abnormally trying to transmit huge <literal>togo</literal>
files; we then had to determine why <literal>sendbatches</literal> was
failing so often.
</para>

<para>
A space shortage has to be addressed immediately. You could delete
unwanted articles by running <literal>doexpire</literal>, or add more
disk space at the OS level. The latter seems the better option.
</para>
</section>

<section><title>Disk space</title>
<para>
The <literal>$NEWSBIN</literal> area occupies a fixed amount of space.
Since the binaries do not grow once installed, you do not have to worry
about disk shortage here. The areas that take up more space as feeds
come in are <literal>$NEWSCTL</literal> and
<literal>$NEWSARTS</literal>. <literal>$NEWSCTL</literal> has log files
that keep growing with each feed, and as articles are digested in huge
numbers, <literal>$NEWSARTS</literal> continues to grow. You will also
need space if articles are being archived on expiry. Allocate a few GB
of disk space for <literal>$NEWSARTS</literal>, depending on the number
of hierarchies you subscribe to and the feeds that come in every day.
<literal>$NEWSCTL</literal> grows much less than
<literal>$NEWSARTS</literal>; allocate space for it accordingly.
</para>
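<para>A quick way of keeping an eye on these areas is a periodic
<literal>du</literal>. The default paths below are assumptions about a
typical layout; point the variables at your real
<literal>$NEWSARTS</literal> and <literal>$NEWSCTL</literal>
areas.</para>

```shell
# Sketch: report news area usage in kilobytes (0 if a path is absent).
# The default paths are assumptions; adjust them to your installation.
NEWSARTS=${NEWSARTS:-/var/spool/news}
NEWSCTL=${NEWSCTL:-/usr/lib/news}
arts_kb=$(du -sk "$NEWSARTS" 2>/dev/null | awk '{print $1}' || true)
ctl_kb=$(du -sk "$NEWSCTL" 2>/dev/null | awk '{print $1}' || true)
echo "NEWSARTS: ${arts_kb:-0} KB   NEWSCTL: ${ctl_kb:-0} KB"
```

<para>Run from <literal>cron</literal>, and compared against yesterday's
figures, this gives the same growth indication as the disk-usage line of
our modified <literal>newsdaily</literal> report.</para>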
</section>

<section><title>CPU load and RAM usage</title>
<para>With modern C-News and NNTPd, processing the news article flow
uses very little of these system resources. Key components like
<literal>newsrun</literal> or <literal>sendbatches</literal> do not
load the system much, except in cases where you have a very heavy flow
of compressed outgoing batches and the compression utility is run by
<literal>sendbatches</literal> frequently. <literal>newsrun</literal>
is amazingly efficient in the current C-News release. Even when it
takes half an hour to digest a large consignment of batches, it hardly
loads the CPU of a slow 200 MHz Pentium or consumes much RAM in a 64 MB
system.</para>

<para>One thing which does slow down a system is a large number of
users connecting over NNTP to browse newsgroups. We do not have
empirical figures at hand to offer guidance on resource consumption
here, but we have found that the load on CPU and RAM for a certain
number of active users invoking <literal>nntpd</literal> is higher than
for an equal number of users connecting to the POP3 port of the same
system to pull out their mailboxes. A few hundred active NNTP users can
really slow down a dual-P-III Intel Linux server, for instance. This
loading has no bearing on whether you are using INN or
<literal>nntpd</literal>; both have practically identical
implementations of NNTP <emphasis>reading</emphasis> and differ only in
their handling of feeds.</para>

<para>Another situation which will slow down your Usenet news server is
when downstream servers connect to you to pull out NNTP feeds using the
pull method, as mentioned before. This can really load your server's
I/O system and CPU.</para>

</section>

<section><title>The <literal>in.coming/bad</literal> directory</title>
<para>
The <literal>in.coming</literal> directory is where batches and
articles reside after you have received feeds from your NDN and before
processing happens. Checking this directory regularly to see whether
there are batches present is a good way of determining that feeds are
coming in. Batches and articles are named differently: names like
<literal>nntp.GxhsDj</literal> indicate batches, while individual
articles have names beginning with digits, like
<literal>0.10022643380.t</literal>.
</para>
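<para>The naming convention can be captured in a small shell helper;
this is only an illustration of the two name shapes described above,
not a utility shipped with C-News.</para>

```shell
# Classify an in.coming entry by name: "nntp.*" names are batches,
# and names beginning with a digit are individual articles.
classify() {
    case "$1" in
        nntp.*) echo batch ;;
        [0-9]*) echo article ;;
        *)      echo unknown ;;
    esac
}
classify nntp.GxhsDj        # prints "batch"
classify 0.10022643380.t    # prints "article"
```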

<para>
The <literal>bad</literal> sub-directory under
<literal>in.coming</literal> holds batches and articles that
encountered errors while being processed by
<literal>relaynews</literal>. You will have to look into the directory
to find the cause. Ideally, this directory should be empty.
</para>
</section>

<section><title>Long pending queues in <literal>out.going</literal></title>

<para>TO BE ADDED.</para>

</section>

<section><title>Problems with <literal>nntpxmit</literal> and <literal>nntpsend</literal></title>

<para>TO BE ADDED.</para>

</section>

<section><title>The <literal>junk</literal> and <literal>control</literal> groups</title>
<para>
Control messages are those that carry a
newgroup/rmgroup/cancel/checkgroups command. Such messages result in
<literal>relaynews</literal> calling the appropriate script, and on
execution a message is mailed to the administrator about the action
taken. These control messages are stored in the
<literal>control</literal> directory of <literal>$NEWSARTS</literal>.
For the propagation of such messages, one must subscribe to the
<literal>control</literal> hierarchy.
</para>

<para>
When your news system determines that a certain article belongs to no
newsgroup you have subscribed to, it is ``junked,''
<emphasis>i.e.</emphasis> such articles appear in the
<literal>junk</literal> directory. This directory plays a key role in
transferring articles to your NDNs, as they would subscribe to the
<literal>junk</literal> hierarchy to receive feeds. If you are a leaf
node, there is no reason why articles should pile up here; keep
deleting them on a daily basis.
</para>

</section>
</chapter>
<chapter><title>Our perspective</title>
<para>
This chapter has been added to allow us to share our perspective on
certain technical choices. Issues which are more a matter of opinion
than of detail are discussed here.
</para>

<section id="feedefficiency"><title>Efficiency issues of NNTP</title>
<para>
To understand why NNTP is often an inappropriate choice for newsfeeds,
we need to understand TCP's sliding window protocol and the nature of
NNTP. NNTP is an appalling waste of bandwidth for most bulk article
transfer situations, for the following simple reasons:
</para>

<itemizedlist>
<listitem><para>
<emphasis>No compression</emphasis>: articles are transferred in plain
text.
</para></listitem>

<listitem><para>
<emphasis>No article transmission restart</emphasis>: if a connection
breaks halfway through an article, the next round has to start from the
beginning of the article.
</para></listitem>

<listitem><para>
<emphasis>Ping-pong protocol</emphasis>: NNTP is unsuitable for bulk
streaming data transfer.
</para></listitem>
</itemizedlist>

<para>
A word of explanation on the ping-pong issue is perhaps needed here.
TCP uses a sliding window mechanism to pump out data in one direction
very rapidly, and can achieve near wire speeds under most
circumstances. However, this only works if the application layer
protocol can aggregate a large amount of data and pump it out without
having to stop every so often, waiting for an ack or a response from
the other end's application layer. This is precisely why sending one
file of 100 Mbytes by FTP takes so much less clock time than 10,000
files of 10 Kbytes each, all other parameters remaining unchanged. The
trick is to keep the sliding window sliding smoothly over the outgoing
data, blasting packets out as fast as the wire will carry them, without
ever allowing the window to empty out while you wait for an ack.
Protocols which require short bursts of data from either end
constantly, <emphasis>e.g.</emphasis> in the case of remote procedure
calls, are called ``ping-pong protocols'' because they remind you of a
table-tennis ball.
</para>

<para>
With NNTP, this is precisely the problem. The average size of Usenet
news messages, including header and body, is 3 Kbytes. When thousands
of such articles are sent out by NNTP, the sending server has to send
the message ID of the first article, then wait for the receiving server
to respond with a ``yes'' or ``no.'' Once the sending server gets the
``yes,'' it sends out that article, and waits for an ``ok'' from the
receiving server. Then it sends out the message ID of the second
article, and waits for another ``yes'' or ``no.'' And so on. The TCP
sliding window never gets to do its job.
</para>
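<para>Back-of-the-envelope arithmetic makes the cost concrete. Taking
the exchange just described (roughly two waits per article: one after
the offered message ID, one after the article body) and an assumed
300 ms round-trip time:</para>

```shell
# Illustrative numbers only: 10,000 articles, about two round-trip
# waits each, on an assumed 300 ms WAN round-trip time.
articles=10000
rtt_ms=300
waits=$((articles * 2))
idle_s=$((waits * rtt_ms / 1000))
echo "idle time spent waiting for responses: ${idle_s} seconds"
```

<para>Those idle seconds are pure protocol overhead, independent of
link bandwidth; a batched protocol would spend that time streaming
data instead.</para>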

<para>
This sub-optimal use of TCP's data pumping ability, coupled with the
absence of compression, makes for a protocol which is great for
synchronous connectivity, <emphasis>e.g.</emphasis> for news reading or
real-time updates, but very poor for batched transfer of data which can
be delayed and pumped out. All of this is precisely reversed in the
case of UUCP over TCP.
</para>

<para>
To decide which protocol, UUCP over TCP or NNTP, is appropriate for
your server, you must address two questions:
</para>

<orderedlist>
<listitem><para>
How much time can your server afford to wait from the time your
upstream server receives an article to the time it passes the article
on to you?
</para></listitem>

<listitem><para>
Are you receiving the same set of hierarchies from multiple next-door
neighbour servers, <emphasis>i.e.</emphasis> is your newsfeed flow
pattern a mesh instead of a tree?
</para></listitem>
</orderedlist>

<para>
If your answers to the two questions above are ``messages cannot wait''
and ``we operate in a mesh,'' then NNTP is the correct protocol for
your server to receive its primary feed(s).
</para>

<para>
In most cases, carrier-class servers operated by major service
providers do not want to accept even a minute's delay from the time
they receive an article to the time they retransmit it. They also
operate in a mesh with other servers run by their own organisation
(<emphasis>e.g.</emphasis> for redundancy) or by others. They usually
sit very close to the Internet backbone, <emphasis>i.e.</emphasis> with
Tier 1 ISPs, and have extremely fast Internet links, usually more than
10 Mbits/sec. The amount of data that flows out of such servers in
outgoing feeds is more than the amount that comes in, because each
incoming article is retained not for local consumption but for
retransmission to others lower down in the flow. And these servers
boast a retransmission latency of less than 30 seconds,
<emphasis>i.e.</emphasis> ``I will retransmit an article to you within
30 seconds of having received it.''
</para>
|
||||
|
||||
<para>
However, if your server is used by a company to make Usenet
news available to its employees, or by an institute to make the
service available to its students and teachers, then you are
not operating your server in a mesh pattern, nor do you mind
if messages take a few hours to reach you from your upstream
neighbour.
</para>

<para>
In that case, you have enormous bandwidth to conserve by moving
to UUCP. Even if, in this Internet-dominated era, you have no
one to supply you with a newsfeed over dialup point-to-point
links, you can pick up a compressed, batched newsfeed using UUCP
over TCP, over the Internet.
</para>
<para>
In this context, we want to mention Taylor UUCP, an excellent
UUCP implementation available under the GNU GPL. We use this
implementation in preference to the bundled UUCP systems offered
by commercial Unix vendors, even for dialup connections, because
it is far more stable, performs better, and always supports
file-transfer restart. Over TCP/IP, Taylor is the only one we
have tried, and we have no wish to try any others.
</para>

<para>
Apart from its robustness, Taylor UUCP has one invaluable
feature critical to large Usenet batch transfers: file-transfer
restart. If it is transferring a 10 MB batch and the connection
breaks after 8 MB, it will restart precisely where it left off
last time. Therefore, no bandwidth is wasted, and queues never
get stuck forever.
</para>
<para>
Over NNTP, since there is no batching, transfers happen one
article at a time. Considering the (relatively) small size of an
article compared to multi-megabyte UUCP batches, one would
expect that an article would never pose a major problem while
being transported; if it can't be pushed across in one attempt,
it'll surely be copied the next time. However, we have
experienced entire NNTP feeds getting stuck for days on end
because of one article, with logs showing the same article
breaking the connection over and over again while being
transferred.<footnote><para>
This lack of a restart facility is something NNTP shares with
its older cousin, SMTP, and we have often seen email messages
getting stuck in a similar fashion over flaky data links. In
many such networks which we manage for our clients, we have
moved the inter-server mail transfer to Taylor UUCP, using UUCP
over TCP.</para></footnote> Some rare articles can be
more than a megabyte in size, particularly in
<literal>comp.binaries</literal>. In each such incident, we have
had to manually edit the queue file on the transmitting server
and remove the offending article from the head of the queue.
Taylor UUCP, on the other hand, has never given us a single
hiccup with blocked queues.
</para>
<para>
We feel that the overwhelming majority of servers offering the
Usenet news service are at the leaf nodes of the Usenet news
flow, not at the heart. These servers are usually connected in a
tree, with each server having one upstream ``parent node'' and
multiple downstream ``child nodes.'' They receive their
bulk incoming feed from their upstream server, and their users
can tolerate a delay of a few hours for articles to move in and
out. If your server is in this class, we feel you should
consider using UUCP over TCP and transferring compressed batches.
This will minimise bandwidth usage, and if you operate over
dialup Internet connections, it will directly reduce your
expenses.
</para>
<para>
A word about the link between mesh-patterned newsfeed flow and
the need to use NNTP. If your server receives primary ---
as against trickle --- feeds from multiple next-door neighbours,
then you have to use NNTP to receive these feeds. The reason
lies in the way UUCP batches are accepted. UUCP batches are
received in their entirety by your server, and only then
uncompressed and processed. While the sending server is giving
you the batch, it gets no chance to go through the batch
article by article and ask your server whether you already have
each one. Thus, if multiple servers give you
large feeds for the same hierarchies, you are bound to
receive multiple copies of each article if you go the UUCP way,
and all the gains of compressed batches are neutralised.
NNTP's <literal>IHAVE</literal> and <literal>SENDME</literal>
dialogue in effect
permits precisely this double-check for each article, so
you never receive an article twice.
</para>
<para>
For Usenet servers which connect to the Internet periodically
over dialup connections to fetch news, the UUCP option is
especially important. Their primary incoming newsfeed cannot be
pushed into them using queued NNTP feeds, for reasons described
in the <link linkend="dialupnonntp">paragraph</link> above.
These hapless servers are usually forced to pull out their
articles using a pull NNTP feed, which is often very slow. This
may lead to long connect times, repeat attempts after every line
break, and high Internet connection charges.
</para>
<para>
On the other hand, we have been using UUCP over TCP and
<literal>gzip</literal>'d batches for more than five years now
in a variety of sites. Even today, a full feed of all eight
standard hierarchies, plus the full
<literal>microsoft</literal>, <literal>gnu</literal>
and <literal>netscape</literal> hierarchies, minus
<literal>alt</literal> and <literal>comp.binaries</literal>, can
comfortably be handled in just a few hours of connect time every
night, dialing up to the
Internet at 33.6 or 56 Kbits/sec. We believe that the proverbial
`full feed' with all hierarchies including
<literal>alt</literal> can be handled comfortably with a 24-hour
link at 56 Kbits/sec, provided you forget about NNTP feeds. We
usually get compression ratios of 4:1 using
<literal>gzip -9</literal> on our news batches, incidentally.
</para>
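<para>
As a sanity check on such ratios, here is a small sketch using
Python's <literal>gzip</literal> module at the same compression
level as <literal>gzip -9</literal>; the toy batch text is
invented, and real ratios depend on the articles themselves.
</para>

```python
import gzip

# A toy "news batch": repetitive, header-heavy text, which is
# roughly why real Usenet batches compress so well.
batch = ("Path: news.example.com!purva!shuvam\n"
         "Newsgroups: starcom.tech.misc\n"
         "body text body text body text\n") * 200

# compresslevel=9 corresponds to the command-line gzip -9
compressed = gzip.compress(batch.encode(), compresslevel=9)
ratio = len(batch) / len(compressed)
print(f"compression ratio {ratio:.1f}:1")
```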
</section>

<section><title>C-News+NNTPd or INN?</title>
<para>
INN and CNews are the two most popular free-software implementations
of Usenet news. Of the two, we prefer CNews, primarily because
we have been using it across a very large range of Unixen for more
than a decade, starting from its earliest release --- the so-called
``Shellscript release'' --- and we have yet to see a need to
change.<footnote><para>One of us did his first installation with BNews,
actually, at IIT Mumbai. We then moved rapidly from there to the CNews
Shellscript Release, then the CNews Performance Release, then the CNews
Cleanup Release; our current release fixes some bugs in the latest
Cleanup Release.</para></footnote>
</para>
<para>
We have seen INN, and we are not comfortable with a software
implementation which puts so much functionality inside one
executable. It reminds us of Windows NT, Netscape Communicator,
and other complex, monolithic systems, which make us uncomfortable
with their opaqueness. We feel that CNews' architecture, which comprises
many small programs, intuitively fits the Unix approach to building
large and complex systems, where each piece can be understood, debugged,
and if needed, replaced individually.
</para>
<para>
Secondly, we seem to see the move towards INN accompanied by a move
towards NNTP as a primary newsfeed mechanism. This is no fault of INN;
we suspect it is a sort of cultural difference between INN users and
CNews users. We find the issue of UUCP versus NNTP for batched newsfeeds
a far more serious one than the choice of CNews versus INN. We simply
cannot agree with the idea that NNTP is an appropriate protocol for bulk
Usenet feeds for most sites. Unfortunately, most sites which are more
comfortable using INN also seem to prefer NNTP over UUCP, for reasons
not clear to us.
</para>
<para>
Our comments should not be taken as expressing any reservation about
INN's quality or robustness. Its popularity is testimony to its
quality; it most certainly ``gets the job done'' as well as anything
else. In addition, a large number of commercial Usenet news
server implementations have started with the INN code; we do not
know of any which have started with the CNews code. The Netwinsite DNews
system and the Cyclone Typhoon, we suspect, are both INN-spired.
</para>
<para>
We recommend CNews and NNTPd over INN, because we are more
comfortable with the CNews architecture for the reasons given above, and
we do not run carrier-class sites. We will continue to support, maintain
and extend this software base, at least for Linux. And we see no reason
for the overwhelming majority of Usenet sites to be forced to use
anything else. Your viewpoints are welcome.
</para>
<para>
Had we been setting up and managing carrier-class sites, with their
near-real-time throughput requirements, we would probably not have
chosen CNews. And for those situations, our opinion of NNTP versus
compressed UUCP has been discussed in <xref linkend="feedefficiency"/>.
</para>
<para>
Suck and Leafnode have their place in the range of options; they
appear to be attractive to novices who are intimidated by the ``full
blown'' appearance of CNews+NNTPd or INN. However, we run CNews+NNTPd
even on Linux laptops. We suspect INN can be used this way too. We do
not find these ``full blown'' implementations any more resource
hungry than their simpler cousins. Therefore, other than administration
and configuration familiarity, we see no reason why even a
solitary end-user would choose Leafnode or Suck over CNews+NNTPd. As
always, contrary opinions are invited.
</para>
</section>

</chapter>
<chapter><title>Principles of Operation</title>
<para>Here we discuss the basic concepts behind the operation of a Usenet news
system.</para>

<section><title>Newsgroups and articles</title>

<para>A Usenet news article sits in a file or in some other on-disk
data structure on the disks of a Usenet server, and its contents look
like this:</para>
<programlisting>
<![CDATA[
Xref: news.starcomsoftware.com starcom.tech.misc:211 starcom.tech.security:452
Newsgroups: starcom.tech.misc,starcom.tech.security
Path: news.starcomsoftware.com!purva!shuvam
From: Shuvam <shuvam@starcomsoftware.com>
Subject: "You just throw up your hands and reboot" (fwd)
Content-Type: TEXT/PLAIN; charset=US-ASCII
Distribution: starcom
Organization: Starcom Software Pvt Ltd, India
Message-ID: <Pine.LNX.4.31.0107022153490.30462-100000@starcomsoftware.com>
Mime-Version: 1.0
Date: Mon, 2 Jul 2001 16:27:57 GMT

Interesting quote, and interesting article.

Incidentally, comp.risks may be an interesting newsgroup to follow. We
must be receiving the feed for this group on our server, since we
receive all groups under comp.*, unless specifically cancelled. Check it
out sometime.

comp.risks tracks risks in the use of computer technology, including
issues in protecting ourselves from failures of such stuff.

Shuvam

> Date: Thu, 14 Jun 2001 08:11:00 -0400
> From: "Chris Norloff" <cnorloff@norloff.com>
> Subject: NYSE: "Throw up your hands and reboot"
>
> When the New York Stock Exchange computer systems crashed for 85
> minutes (8 Jun 2001), Andrew Brooks, chief of equity trading at
> Baltimore mutual fund giant T. Rowe Price, was quoted as saying "Hey,
> we're all subject to the vagaries of technology. It happens on your
> own PC at home. You just throw up your hands and reboot."
>
> http://www.washingtonpost.com/ac3/ContentServer?articleid=A42885-2001Jun8&pagename=article
>
> Chris Norloff
>
>
> This is from --
>
> From: risko@csl.sri.com (RISKS List Owner)
> Newsgroups: comp.risks
> Subject: Risks Digest 21.48
> Date: Mon, 18 Jun 2001 19:14:57 +0000 (UTC)
> Organization: University of California, Berkeley
>
> RISKS-LIST: Risks-Forum Digest Monday 19 June 2001
> Volume 21 : Issue 48
>
> FORUM ON RISKS TO THE PUBLIC IN COMPUTERS AND RELATED SYSTEMS (comp.risks)
> ACM Committee on Computers and Public Policy,
> Peter G. Neumann, moderator
>
> This issue is archived at <URL:http://catless.ncl.ac.uk/Risks/21.48.html>
> and by anonymous ftp at ftp.sri.com, cd risks .
>
]]>
</programlisting>
<para>A Usenet article's header is very interesting if you want to learn
about the functioning of the Usenet. The <literal>From</literal>,
<literal>Subject</literal>, and <literal>Date</literal> headers are
familiar to anyone who has used email. The <literal>Message-ID</literal>
header contains a unique ID for each message, and is present in every
email message too, though not many non-technical email users know about it.
The <literal>Content-Type</literal> and <literal>Mime-Version</literal>
headers are used for MIME encoding of articles, attaching files and
other attachments, and so on, just as in email messages.</para>
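<para>
Since a Usenet article uses the same RFC 822 header syntax as an
email message, the headers can be pulled apart with Python's
standard email parser. A minimal sketch, using a shortened copy
of the sample article above (not a complete news article):
</para>

```python
from email.parser import Parser

# Abridged from the sample article shown earlier.
raw = """\
Newsgroups: starcom.tech.misc,starcom.tech.security
Path: news.starcomsoftware.com!purva!shuvam
From: Shuvam <shuvam@starcomsoftware.com>
Subject: "You just throw up your hands and reboot" (fwd)
Message-ID: <Pine.LNX.4.31.0107022153490.30462-100000@starcomsoftware.com>
Date: Mon, 2 Jul 2001 16:27:57 GMT

Interesting quote, and interesting article.
"""

article = Parser().parsestr(raw)
# The Newsgroups header is a comma-separated list of groups.
groups = [g.strip() for g in article["Newsgroups"].split(",")]
print(groups)
print(article["Message-ID"])
```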
<para>The <literal>Organization</literal> header is an informational header
which is supposed to carry some information identifying the organisation
to which the author of the article belongs. That leaves the
<literal>Newsgroups</literal>, <literal>Xref</literal>,
<literal>Path</literal> and <literal>Distribution</literal> headers.
These are special to Usenet articles and are very important.</para>
<para>The <literal>Newsgroups</literal> header specifies which newsgroups
this article belongs to. The <literal>Distribution</literal>
header, sadly under-utilised in today's globalised Internet world,
allows the author of an article to specify how far the article will be
re-transmitted. The author, working in conjunction with
well-configured networks of Usenet servers, can control the ``radius'' of
replication of his article: he can post an article of local significance
into a newsgroup but set the <literal>Distribution</literal> header to
some suitable value, <emphasis>e.g.</emphasis> <literal>local</literal>
or <literal>starcom</literal>, to prevent the article from being relayed
to servers outside the specified domain.</para>
<para>The <literal>Xref</literal> header specifies the precise
<emphasis role="strong">article number</emphasis> of this article in each of the
newsgroups in which it is inserted, for the current server. When an
article is copied from one server to another as part of a newsfeed,
the receiving server throws away the old <literal>Xref</literal> header
and inserts its own, with its own article numbers. This illustrates an
interesting feature of the Usenet system: each article on a Usenet server
has a unique number (an integer) for each newsgroup it is a part of.
Our sample above has been added to two newsgroups on our server, and has
the article numbers 211 and 452 in those groups. Therefore, any Usenet
client software can query our server and ask for article number 211 in
the newsgroup <literal>starcom.tech.misc</literal> and get this article.
Asking for article number 452 in <literal>starcom.tech.security</literal>
will fetch the article too. On another server, the numbers may be very
different.</para>
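<para>
The <literal>Xref</literal> header's server name and
group:number pairs split apart mechanically; a small illustrative
parser (the function name is ours, not part of any news software):
</para>

```python
def parse_xref(xref):
    """Split an Xref header value into (server, {group: article_number})."""
    server, *pairs = xref.split()
    numbers = {}
    for pair in pairs:
        # rpartition guards against any ':' earlier in the group name
        group, _, num = pair.rpartition(":")
        numbers[group] = int(num)
    return server, numbers

# The Xref line from the sample article above:
server, numbers = parse_xref(
    "news.starcomsoftware.com starcom.tech.misc:211 starcom.tech.security:452")
```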
<para>The <literal>Path</literal> header specifies the list of machines through
which this article has travelled before reaching the current
server. UUCP-style syntax is used for this string. The current
example indicates that a user called <literal>shuvam</literal> first
wrote this article and posted it on a computer which calls itself
<literal>purva</literal>, and this computer then transferred the article
by a newsfeed to <literal>news.starcomsoftware.com</literal>. The
<literal>Path</literal> header is critical for breaking loops in
newsfeeds, and will be discussed in detail later.</para>
<para>Our sample article will sit in the two newsgroups listed above
forever, unless expired. The Usenet software on a server is usually
configured to expire articles based on certain conditions,
<emphasis>e.g.</emphasis> once they are older than a certain number of
days. The C-News software we use allows expiry control based on the
newsgroup hierarchy and the type of newsgroup, <emphasis>i.e.</emphasis>
moderated or unmoderated. For each class of newsgroups, it allows
the administrator to specify a number of days after which articles
will be expired. It is possible for an article to control its own
expiry by carrying an <literal>Expires</literal> header specifying a
date and time. Unless overridden in the Usenet server software, the
article will then be expired only after its explicit expiry time is
reached.</para>
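<para>
The expiry rule just described --- an explicit
<literal>Expires</literal> header wins, otherwise a per-class
default age applies --- can be sketched as a predicate. The
default age below is invented for illustration, not a C-News
shipped value.
</para>

```python
from datetime import datetime, timedelta, timezone

def should_expire(article_date, now, default_age_days, expires=None):
    """Expire at the explicit Expires time if the article carries one,
    otherwise once the article is older than its class's default age."""
    if expires is not None:
        return now >= expires
    return now - article_date > timedelta(days=default_age_days)

now = datetime(2001, 7, 20, tzinfo=timezone.utc)
posted = datetime(2001, 7, 2, tzinfo=timezone.utc)

# 18 days old with a hypothetical 14-day default: expired ...
print(should_expire(posted, now, default_age_days=14))
# ... unless an Expires header pushes the date out further.
print(should_expire(posted, now, 14,
                    expires=datetime(2001, 8, 1, tzinfo=timezone.utc)))
```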
</section>

<section><title>Of readers and servers</title>
<para>Computers which access Usenet articles are broadly of two classes:
readers and servers. A Usenet server carries a repository of
articles, manages them, handles newsfeeds, and offers its repository to
authorised readers. A Usenet reader is merely a computer with
the appropriate software to let a user access a server, fetch
articles, post new articles, and keep track of which articles he has
read in each newsgroup. In terms of functionality, Usenet reading
software is less interesting to a Usenet administrator than Usenet
server software. However, in terms of lines of code, reader
software can often be much larger than server software, primarily
because of the complexities of modern GUI code.</para>
<para>Modern Usenet readers almost exclusively access Usenet servers using
NNTP (the Network News Transfer Protocol) for reading and posting. This
protocol can also be used for inter-server communication, but those
aspects will be discussed later. The NNTP protocol, like any other
well-designed TCP-based Internet protocol, carries ASCII commands and
responses terminated with <literal>CR-LF</literal>, and comprises a
sequence of commands, somewhat reminiscent of the POP3 protocol for
email. Using NNTP, a Usenet reader program connects to a Usenet server,
asks for a list of active newsgroups, and receives this (often huge)
list. It then sets the ``current newsgroup'' to one of these, depending
on what the user wants to browse through. Having done this, it gets the
meta-data of all current articles in the group, including the author,
subject line, date, and size of each article, and displays an index of
articles to the user.</para>
<para>The user then scans through this list, selects an article, and
asks the reader to fetch it. The reader gives the article number of
this article to the server, and fetches the full article for the user
to read. Once the user finishes his NNTP session, he exits,
and the reader program closes the NNTP socket. It then (usually)
updates a local file in the user's home area, keeping track of which
news articles the user has read. These articles are typically not shown
to the user next time, thus allowing the user to progress rapidly to new
articles in each session. The reader software is helped along in this
endeavour by the <literal>Xref</literal> header, from which it knows
all the different identities by which a single article is known
on the server. Thus, if you read the sample article given above by
accessing <literal>starcom.tech.misc</literal>, you'll never be shown
this article again when you access <literal>starcom.tech.misc</literal>
or <literal>starcom.tech.security</literal>; your reader software
does this by tracking the <literal>Xref</literal> header and mapping
article numbers.</para>
<para>When a user posts an article, he first composes his message using
the user interface of his reader software. When he finally gives the
command to send the article, the reader software contacts the Usenet
server using the pre-existing NNTP connection and sends the article to
it. The article carries a <literal>Newsgroups</literal> header with the
list of newsgroups to post to, often a <literal>Distribution</literal>
header with a distribution specification, and other headers
like <literal>From</literal>, <literal>Subject</literal>,
<emphasis>etc.</emphasis> These headers are used by the server
software to do the right thing. Special and rare headers like
<literal>Expires</literal> and <literal>Approved</literal> are acted upon
when present. The server assigns a new article number to the article for
each newsgroup it is posted to, and creates a new <literal>Xref</literal>
header for the article.</para>
<para>Transfer of articles between servers is done in various ways, and
is discussed in some detail in the section titled
``Newsfeeds'' below.</para>

</section>

<section><title>Newsfeeds</title>
<section><title>Fundamental concepts</title>
<para>When we analyse newsfeeds in real life, we begin to see
that, for most sites, traffic flow is not symmetrical in both
directions. We usually find that one server will feed the bulk
of the world's articles to one or more secondary servers every
day, and receive a few articles written by the users of those
secondary servers in exchange. Thus, articles usually flow down
from the stem to the branches to the leaves
of the worldwide Usenet server network, rather than in a totally
balanced mesh pattern. Therefore, we use the term
``upstream server'' to refer to the server from which we receive
the bulk of our daily dose of articles, and ``downstream
server'' for those servers which receive the bulk dose
of articles from us.</para>
<para>Newsfeeds relay articles from one server to its ``next door
neighbour'' servers, metaphorically speaking. Therefore, articles
move around the globe, not by a massive number of single-hop
transfers from the originating server to every other server in
the world, but in a sequence of hops, like passing the baton in
a relay race. This increases the latency before an article
reaches a remote tertiary server after, say, ten hops, but
it allows tighter control of what gets relayed at every hop,
and helps in redundancy, decentralisation of server loads,
and conservation of network bandwidth. In this respect, Usenet
newsfeeds are more complex than HTTP data flows, which
typically use single-hop techniques.</para>
<para>Each Usenet news server therefore has to worry about
newsfeeds each time it receives an article, whether from a fresh post
or from an incoming newsfeed. When the Usenet server digests the
article and files it away in its repository, it simultaneously
looks through its database to see which other servers it should
feed the article to. In order to do this, it carries out the
sequence of checks described below.</para>
<para>Each server knows which other servers are its ``next door
neighbours;'' this information is kept in its newsfeed
configuration. Against each ``next door
neighbour'' there will be a list of newsgroups which it
wants, and a list of distributions. The new article's list of
newsgroups is matched against the newsgroup list of the
``next door neighbour'' to see whether there is even a single
common newsgroup which makes it necessary to feed the article to
it. If there is a matching newsgroup, and the neighbour's distribution
list matches the article's distribution, then the article is
marked for feeding to this neighbour.</para>
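<para>
The matching step can be sketched as a predicate. Real servers
use wildmat patterns in the subscription list; the sketch below
substitutes simple prefix matching for illustration.
</para>

```python
def wants_article(neighbour_groups, neighbour_dists, article_groups, article_dist):
    """Queue the article for a neighbour if at least one of its newsgroups
    matches the neighbour's subscription list and the article's distribution
    is acceptable. neighbour_groups holds plain prefixes such as "comp."
    (a stand-in for real wildmat patterns)."""
    group_ok = any(g.startswith(p)
                   for g in article_groups for p in neighbour_groups)
    dist_ok = article_dist in neighbour_dists
    return group_ok and dist_ok

# A neighbour that takes comp.* and misc.* with world-wide distribution:
print(wants_article(["comp.", "misc."], ["world"], ["comp.risks"], "world"))
# An article restricted to the starcom distribution is not fed to it:
print(wants_article(["comp.", "misc."], ["world"],
                    ["starcom.tech.misc"], "starcom"))
```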
<para>When the neighbour receives the article as part of the
feed, it performs some sanity checks of its own. The first check
is on the <literal>Newsgroups</literal> header of
the new article. If none of the newsgroups listed there is part
of this server's active newsgroups list, then the article
can be rejected. An article rejected thus may even be queued for
outgoing feeds to other servers, but will not be digested for
incorporation into the local article repository.</para>
<para>The next check performed is against the
<literal>Path</literal> header of the incoming article. If this
header lists the name of the current Usenet server anywhere,
it indicates that the article has already passed through this server at
least once before, and is now re-appearing erroneously because
of a newsfeed loop. Such loops are quite often configured into
newsfeed topologies for redundancy: ``I'll get the articles from
Server X if not Server Y, and may the first one in win.'' The
Usenet server software automatically detects a duplicate feed
of an article and rejects it.</para>
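<para>
The <literal>Path</literal> check reduces to splitting the
bang-separated list; a sketch using the sample
<literal>Path</literal> shown earlier:
</para>

```python
def seen_before(path_header, my_name):
    """True if this server already appears in the article's Path,
    i.e. the article has looped back to us."""
    return my_name in path_header.split("!")

# The Path header from the sample article:
path = "news.starcomsoftware.com!purva!shuvam"
print(seen_before(path, "purva"))             # already visited purva
print(seen_before(path, "news.example.org"))  # a new server: accept
```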
<para>The next check is against what is called the server's
<emphasis>history database</emphasis>. Every Usenet server has
a history database, which is a list of the message IDs of all
current articles in the local repository. Often the history
database also carries the message IDs of all recently expired
messages. If the incoming article's message ID matches any of the
entries in the database, then again it is rejected without being
filed in the local repository. This is a second loop-detection
method. Sometimes, merely checking the article's
<literal>Path</literal> header does not detect all
potential problems, because the problem may be a re-insertion
instead of a loop. A re-insertion happens when the same incoming
batch of news articles is re-fed into the local server, perhaps
after recovering the system's data from tapes after a system
crash. In such cases, there is no newsfeed loop, but there is
still the risk that one article may be digested into the local
server twice. The history database prevents this.</para>
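<para>
In effect, the history database is a set of message IDs
consulted before an article is digested. A minimal in-memory
sketch (real servers keep the database on disk, typically in a
dbm file):
</para>

```python
class History:
    """Remember message IDs of current and recently expired articles;
    reject any article whose ID has been seen before."""
    def __init__(self):
        self.seen = set()

    def offer(self, message_id):
        if message_id in self.seen:
            return False          # duplicate: a loop or a re-inserted batch
        self.seen.add(message_id)
        return True               # new article: digest it

h = History()
mid = "<Pine.LNX.4.31.0107022153490.30462-100000@starcomsoftware.com>"
print(h.offer(mid))   # first arrival is digested
print(h.offer(mid))   # a re-fed copy is rejected
```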
<para>All these simple checks are very effective, and work
across server and software types, as per the Internet standards.
Together, they allow robust and fail-safe Usenet article flow
across the world.</para>

</section>

<section><title>Types of newsfeeds</title>
<para>This section explains the basics of newsfeeds, without getting
into details of software and configuration files.</para>
<section><title>Queued feeds</title>
<para>
This is the commonest method of sending articles from one server
to another, and is used whenever large volumes of articles
are to be transferred per day. This approach needs a one-time
modification to the upstream server's configuration for each
outgoing feed, to define a new <emphasis>queue</emphasis>.
</para>
<para>
In essence, all queued feeds work in the following way. When the
sending server receives an article, it processes it for
inclusion into its local repository, and also checks all
its outgoing feed definitions to see whether the article needs
to be queued for any of the feeds. If so, it is added to a
<emphasis>queue file</emphasis> for each such outgoing feed. The
precise details of the queue file vary with the software
implementation, but the basic process remains the same. A queue
file is a list of queued articles, but does not contain the
article contents. Typical queue files are ASCII text files with
one line per article giving the path to a copy of the article in
the local spool area.
</para>
<para>
Later, a separate process picks up each queue file and creates
one or more <emphasis>batches</emphasis> for each outgoing feed.
A <emphasis>batch</emphasis> is a large file containing multiple
Usenet news articles. Once the batches are created, various transport
mechanisms can be used to move them from sending server to
receiving server. You can even use scripted FTP. You only need
to ensure that the batch is picked up from the upstream server
and somehow copied into a designated incoming-batch directory on
the downstream server.
</para>
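<para>
A batch, then, is just the queued articles concatenated with
separators; C News-style batches put a
<literal>#! rnews &lt;byte-count&gt;</literal> line before each
article. A sketch that builds one in memory from made-up
articles:
</para>

```python
def build_batch(articles):
    """Concatenate articles into one batch, each preceded by a
    '#! rnews <byte-count>' separator line (C News batch format)."""
    parts = []
    for art in articles:
        data = art.encode()
        parts.append(b"#! rnews %d\n" % len(data) + data)
    return b"".join(parts)

# Two invented miniature articles:
batch = build_batch(["Subject: one\n\nfirst article\n",
                     "Subject: two\n\nsecond article\n"])
print(batch.decode())
```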
<para>
UUCP has traditionally been the mechanism of choice for batch
movement, because it predates the Internet and the wide availability
of fast packet-switched data networks. Today, with TCP/IP
everywhere, UUCP once again emerges as the most logical choice
for batch movement, because it too has moved with the times: it
can work over TCP.
</para>
<para>
NNTP is the <emphasis>de facto</emphasis> mechanism of choice
for moving queued newsfeeds for carrier-class Usenet servers on the
Internet, and unfortunately, for a lot of other Usenet servers
as well. The reason we find this choice unfortunate is
discussed in <xref linkend="feedefficiency"/> below. But in NNTP
feeds, the intermediate step of building batches out of queue
files can be eliminated --- this is both its strength and its
weakness.
</para>
<para>
In the case of queued NNTP feeds, articles get added to queue
files as described above. An NNTP transmit process periodically
wakes up, picks up a queue file, and makes an NNTP connection to
the downstream server. It then begins a processing loop where,
for each queued article, it uses the NNTP
<literal>IHAVE</literal>
command to inform the downstream server of the article's
message ID. The downstream server checks its local repository to
see whether it already has the message. If not, it responds with
a <literal>SENDME</literal> response. The transmitting server
then pumps out the article contents in plaintext form. When all articles
in the queue have been thus processed, the sending server closes
the connection. If the NNTP connection breaks in between for
any reason, the sending server truncates the queue file and
retains only those articles which are yet to be transmitted,
thus minimising repeat transmissions.
</para>
|
||||
|
||||
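<para>
A sketch of such an exchange, using the response codes from RFC
977 (the message IDs here are illustrative), looks like this:
</para>
<screen>
sender:   IHAVE &lt;abc123@news.example.com&gt;
receiver: 335 send article to be transferred
sender:   (article text, ending with "." on a line by itself)
receiver: 235 article transferred ok
sender:   IHAVE &lt;def456@news.example.com&gt;
receiver: 435 article not wanted - do not send it
</screen>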
<para><anchor id="dialupnonntp"/>
A queued NNTP feed works with the sending server making an NNTP
connection to the receiving server. This implies that the
receiving server must have an IP address which is known to the
sending server or can be looked up in the DNS. If the receiving
server connects to the Internet periodically using a dialup
connection and works with a dynamically assigned IP address,
this can get tricky. UUCP feeds suffer no such problems, because
the sending server for the newsfeed can be the UUCP server,
<emphasis>i.e.</emphasis> the passive party, while the receiving
server for the feed can be the UUCP master,
<emphasis>i.e.</emphasis> the active party. The receiving server
can then initiate the UUCP connection and connect to the sending
server. Thus, if even one of the two parties has a static IP
address, UUCP queued feeds can work fine.
</para>

<para>
Thus, NNTP feeds can be sent out a little faster than the
batched transmission processes used for UUCP and other older
methods, because no batches need to be constructed. However,
NNTP is often used in newsfeeds where it is not necessary, and
this results in a colossal waste of bandwidth. Before we study
the efficiency issues of NNTP versus batched feeds, we will
cover another way feeds can be organised using NNTP: the pull
feed.
</para>
</section>

<section><title>Pull feeds</title>
<para>
This method of transferring a set of articles works only over
NNTP, and requires absolutely no configuration on the
transmitting, or upstream, server. In fact, the upstream server
cannot even easily detect that the downstream server is pulling
out a feed --- it appears to be just a heavy and thorough
newsreader, that's all.
</para>

<para>
A pull feed works by the downstream server pulling out articles
one by one, just like any NNTP newsreader, using the NNTP
<literal>ARTICLE</literal> command with the message ID as
parameter. The interesting detail is how it gets the message IDs
to begin with. For this, it uses an NNTP command specially
designed for pull feeds, called <literal>NEWNEWS</literal>. This
command takes a newsgroup pattern, a date and a time:
<screen>NEWNEWS comp.* 970815 000000</screen>
</para>
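<para>
A sketch of the start of such a pull session (response codes
from RFC 977; message IDs illustrative):
</para>
<screen>
downstream: NEWNEWS comp.* 970815 000000
upstream:   230 list of new articles by message-id follows
upstream:   &lt;abc123@news.example.com&gt;
upstream:   &lt;def456@news.example.com&gt;
upstream:   .
downstream: ARTICLE &lt;abc123@news.example.com&gt;
upstream:   220 0 &lt;abc123@news.example.com&gt; article retrieved
</screen>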

<para>
This command is sent by the downstream server over NNTP to the
upstream server, and in effect asks the upstream server to list
all news articles in the <literal>comp</literal> hierarchy which
are newer than 15 August 1997. The upstream server responds with
an (often huge) list of message IDs, one per line, ending with a
period on a line by itself.
</para>

<para>
The pulling server then compares each newly received message ID
with its own article database and makes a (possibly shorter)
list of all the articles it does not have, thus eliminating
duplicate fetches. That done, it begins fetching articles one by
one, using the NNTP <literal>ARTICLE</literal> command as
mentioned above.
</para>

<para>
In addition, there is another NNTP command,
<literal>NEWGROUPS</literal>, which allows the NNTP client ---
<emphasis>i.e.</emphasis> the downstream server in this case ---
to ask its upstream server which new newsgroups have been
created since a given date. This allows the downstream server to
add the new groups to its <literal>active</literal> file.
</para>

<para>
The <literal>NEWNEWS</literal>-based approach is usually one of
the most inefficient methods of pulling out a large Usenet feed.
By inefficiency, we here refer to the CPU load and RAM
utilisation on the upstream server, not to bandwidth usage. The
inefficiency arises because most Usenet news servers do not keep
their article databases indexed by hierarchy and date; C News
certainly does not. This means that a <literal>NEWNEWS</literal>
command issued to an upstream server will put that server into a
sequential search of its article database, to see which articles
fall into the given hierarchy and are newer than the given date.
</para>

<para>
If pull feeds were to become the most common way of sending out
articles, then all upstream servers would badly need an
efficient way of indexing their article databases, to allow each
<literal>NEWNEWS</literal> command to rapidly generate its list
of matching articles. A slow upstream server today may take
minutes to begin responding to a <literal>NEWNEWS</literal>
command, and the downstream server may time out and close its
NNTP connection in the meanwhile. We have often seen this
happen, till we tweaked the timeouts.
</para>

<para>
There are basic efficiency issues of bandwidth utilisation
involved in using NNTP for newsfeeds, which apply to both queued
and pull feeds. But the problem with
<literal>NEWNEWS</literal> is unique to pull feeds, and relates
to server load, not bandwidth wastage.
</para>
</section>

</section>
</section>

<section id="controlmsg"> <title>Control messages</title>
<para>
(Discuss control messages. Show examples of actual control
messages if possible. Discuss security issues in the form of
control message storms, and how digital signatures are being
used to tackle them. This sets the ground for
<literal>pgpverify</literal> later on.)
</para>
</section>
</chapter>

<chapter><title>Setting up CNews + NNTPd</title>

<section><title>Getting the sources and stuff</title>

<section><title>The sources</title>

<para>C-News software can be obtained from
<literal>ftp://ftp.uu.net/networking/news/transport/cnews/cnews.tar.Z</literal>
and will need to be uncompressed using the BSD
<literal>uncompress</literal> utility or a compatible program. The
tarball is about 650 KBytes in size. It has its own highly intelligent
configuration and installation processes, which are very well
documented. The version that is available is Cleanup Release revision G,
on which our own version is based.</para>
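<para>
For example, to fetch and unpack the tarball (the FTP session below is
illustrative; any FTP client will do):
</para>
<screen>
$ ftp ftp.uu.net
ftp&gt; cd networking/news/transport/cnews
ftp&gt; binary
ftp&gt; get cnews.tar.Z
ftp&gt; quit
$ uncompress cnews.tar.Z
$ tar xvf cnews.tar
</screen>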

<para>NNTPd is available from
<literal>ftp://ftp.uu.net/networking/news/nntp/nntp.1.5.12.1.tar.Z</literal>.
It has no automatic scripts and processes to configure itself. After
fetching the sources, you will have to follow a set of directions given
in the documentation and configure some C header files. These
configuration settings must be made keeping in mind what you specified
when building the C-News sources, because NNTPd and C-News must work
together. Therefore, some key file formats, directory paths,
<emphasis>etc.</emphasis>, will have to be specified identically in both
software systems.</para>

<para>The third software system we use is Nestor. This too is to be
found in the same place as the NNTPd software, at
<literal>ftp://ftp.uu.net/networking/news/nntp/nestor.tar.Z</literal>.
This software compiles to one binary program, which must be run
periodically to process the logs of <literal>nntpd</literal>, the NNTP
server which is part of NNTPd, and report usage statistics to the
administrator. We have integrated Nestor into our source base.</para>

<para>The fourth piece of the puzzle, without which no Usenet server
administrator dares venture out into the wild world of public Internet
newsfeeds, is <literal>pgpverify</literal>.</para>

<para>We have been working with C-News and NNTPd for many years now, and
have fixed a few bugs in both packages. We have also integrated the four
software systems listed above, and added a few features here and there to
make things work more smoothly. We offer our entire source base to
anyone for free download from
<literal>http://www.starcomsoftware.com/proj/news/src/news.tar.gz</literal>.
There are no licensing restrictions on our sources; they are as freely
redistributable as the original components we started with.</para>

<para>When you download our software distribution, you will extract it
to find a directory tree with the following subdirectories and files:</para>

<itemizedlist>
<listitem><para><literal>c-news</literal>: the source tree of the CR.G
software release, with our additions like
<literal>pgpverify</literal> integration, our scripts like
<literal>mail2news</literal>, and pre-created configuration
files.
</para></listitem>

<listitem><para><literal>nntp-1.5.12.1</literal>: the source tree of the
original NNTPd release, with header files pre-configured to fit in
with our configuration of C-News, and our addition of bits and
pieces like Nestor, the log analysis program.
</para></listitem>

<listitem><para><literal>howto</literal>: this document, and its SGML
sources and Makefile.
</para></listitem>

<listitem><para><literal>archives</literal>: a directory containing the
tarballs of the original C-News, NNTPd, Nestor and
<literal>pgpverify</literal> source distributions, in case you want
them. Strictly speaking, the <literal>archives</literal> directory is
not necessary unless you want to study what changes we have made, and
what files we have added, to the original sources.
</para></listitem>

<listitem><para><literal>build.sh</literal>: a shellscript you can run
to compile the entire combined source tree and install binaries in the
right places, if you are lucky and all goes well.
</para></listitem>
</itemizedlist>

<para>Needless to say, we believe that our source tree is a better
place to start with than the original components, especially if you
are installing a Usenet server on a Linux box for the first time.
We will be available on email to provide technical assistance should
you run into trouble.</para>

</section>

<section><title>The key configuration files</title>

<para>Once you get the sources, you will need some key configuration
files to seed your C-News system. These configuration files are
actually database tables, and they change frequently, whenever
newsgroups are created, modified or deleted. These files specify
the list of active newsgroups in the ``public'' Usenet. You can,
and should, add your organisation's internal newsgroups to this
list when you set up your own server, but you will need to know
the list of public standard newsgroups to begin with. This list
can be obtained from the same FTP server by downloading the files
<literal>active.gz</literal> and <literal>newsgroups.gz</literal> from
<literal>ftp://ftp.uu.net/networking/news/config/</literal>. You
can create your own <literal>active</literal> and
<literal>newsgroups</literal> files by retaining a subset of the entries
in these two files. Both these are ASCII text files.</para>

<para>Getting the sources from our server will not obviate the need to
get the latest versions of these files from
<literal>ftp.uu.net</literal>. We do not (yet) maintain an up-to-date
copy of these files on our server, and we would add no value to the
originals by just mirroring them.</para>

</section>

</section>

<section><title>Compiling and installing</title>
<para>
For installation, first make sure you have an entry for a user called
<literal>news</literal> in your <literal>/etc/passwd</literal> file, and
add one if it is not present. This makes <literal>news</literal> the
owner of the news database. Now download the source from us and untar it
in the home directory of <literal>news</literal>. This creates two main
directories, <emphasis>viz.</emphasis> <literal>c-news</literal> and
<literal>nntp</literal>. To compile and install, run the script
<literal>build.sh</literal> as <literal>root</literal> in the directory
that contains the script. It is important that the script runs as
<literal>root</literal>, because it sets ownerships, and compiles and
installs as user <literal>news</literal>, for which it needs adequate
permissions. This is a one-step process that puts in place both the
C-News and the NNTP software, setting correct permissions and paths.
Following is a brief description of what <literal>build.sh</literal>
does:
</para>
<itemizedlist>
<listitem><para>
Checks the OS platform, and exits if it is not Linux.
</para></listitem>

<listitem><para>
Exits if you are not running it as <literal>root</literal>.
</para></listitem>

<listitem><para>
Looks for the above two directories, and exits if it cannot find them.
</para></listitem>

<listitem><para>
Compiles C-News, and exits on error. This builds all the software,
writing errors into a file called <literal>make.out</literal>; read it
to determine the cause of any failure. If the compilation was
successful, it also performs regression tests; it does not exit on test
errors, but warns you to read the error file
<literal>make.out.r</literal> and fix the problems.
</para></listitem>

<listitem><para>
Performs the above operations in the <literal>nntp</literal> directory,
too.
</para></listitem>

<listitem><para>
Checks for the presence of the three key directories:
<literal>$NEWSARTS</literal> (<literal>/var/spool/news</literal>),
which houses the articles; <literal>$NEWSCTL</literal>
(<literal>/var/lib/news</literal>), which contains the configuration,
log and status files; and <literal>$NEWSBIN</literal>
(<literal>/usr/lib/newsbin</literal>), which contains the binaries and
executables for the working of the Usenet News system. Tries to create
them if non-existent, and exits on failure.
</para></listitem>

<listitem><para>
Changes the ownership of these directories to
<literal>news.news</literal>. This is important, since the entire
Usenet News system runs as user <literal>news</literal> and will not
function properly as any other user.
</para></listitem>

<listitem><para>
Then starts the installation process of C News. It runs <literal>make
install</literal> to install binaries at the right locations;
<literal>make setup</literal> to set the correct paths and umask,
create directories for newsgroups, and determine who will receive
reports; <literal>make ui</literal> to set up <literal>inews</literal>
and <literal>injnews</literal>; and <literal>make
readpostcheck</literal> to use the <literal>readnews</literal>,
<literal>postnews</literal> and <literal>checknews</literal> programs
provided by C News. The errors, if any, are to be found in the
respective <literal>make.out</literal> files;
<emphasis>e.g.</emphasis> <literal>make setup</literal> writes its
errors to <literal>make.out.setup</literal>.
</para></listitem>

<listitem><para>
<literal>newsspool</literal>, which queues incoming batches in the
<literal>$NEWSARTS/in.coming</literal> directory, must run set-user-id
and set-group-id; the script sets these bits.
</para></listitem>

<listitem><para>
A softlink is made to <literal>/var/lib/news</literal> from
<literal>/usr/lib/news</literal>.
</para></listitem>

<listitem><para>
The NNTP software is installed.
</para></listitem>

<listitem><para>
Sets up the manpages for C News, and makes them world-readable. The
NNTP manpages get installed when the software is installed. Compiles
the C News documentation, <literal>guide.ps</literal>, and makes it
readable and available in <literal>/usr/doc/packages/news</literal> or
<literal>/usr/doc/news</literal>.
</para></listitem>

<listitem><para>
Checks for the PGP binary, and asks the administrator to get it if it
is not found.
</para></listitem>
</itemizedlist>
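<para>
A typical installation session might look like this, assuming the
tarball has been downloaded to <literal>/tmp</literal> and the
<literal>news</literal> user's home directory is
<literal>/home/news</literal> (both paths are illustrative):
</para>
<screen>
# su - news -c "tar xzf /tmp/news.tar.gz"
# cd /home/news
# sh build.sh
</screen>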
</section>

<section><title>Configuring the system: what to configure and how</title>
<para>Once installed, you now have to configure the system to accept
feeds and batch them for neighbours. You will have to do the
following:</para>
<itemizedlist>
<listitem><para><literal>nntpd</literal>:
Copy the compiled <literal>nntpd</literal> into a directory where
executables are kept, and activate it. It runs on port 119 as a daemon
through <literal>inetd</literal>, unless you have compiled it as
stand-alone.

An entry in the <literal>services</literal> file for nntp would look
like this:
<programlisting>nntp 119/tcp # Network News Transfer Protocol</programlisting>

An entry in the <literal>inetd.conf</literal> file would be:
<programlisting>nntp stream tcp nowait news path-to-tcpd path-to-nntpd</programlisting>

The last two fields in the <literal>inetd.conf</literal> entry are the
paths to the binaries of the TCP wrapper daemon and the nntpd daemon
respectively.
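For example, if <literal>tcpd</literal> lives in
<literal>/usr/sbin</literal> and <literal>nntpd</literal> has been
copied to <literal>/usr/lib/newsbin</literal> (both paths are
illustrative; substitute your own), the entry might read:
<programlisting>nntp stream tcp nowait news /usr/sbin/tcpd /usr/lib/newsbin/nntpd</programlisting>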
</para></listitem>

<listitem><para><emphasis role=bold>Configuring control files:</emphasis>
There are plenty of control files in <literal>$NEWSCTL</literal> that
will need to be configured before you can start using the news system.
The files mentioned here are explained in some detail in chapter 8,
section 8.1, and the ones to be configured are dealt with in detail
below.
</para>

<itemizedlist>
<listitem><para><literal>sys</literal>:
One line per system/NDN, listing all the newsgroup hierarchies each
system subscribes to. Each line is prefixed with the system name, and
the one beginning with <literal>ME:</literal> indicates what we are
going to receive. The following are typical entries that go into this
file:

<programlisting>ME:comp,news,misc,netscape</programlisting>

This line indicates which newsgroups your server, as named by the
<literal>whoami</literal> file, has subscribed to and will receive.

<programlisting>server/server.starcomsoftware.com:all,!general/all:f</programlisting>

This is a list of newsgroups this site will pass on to its NDN. The
newsgroups specified should be a comma-separated list, and no spaces
should be inserted anywhere in the line. The <literal>f</literal> flag
indicates that the newsgroup name and article number, along with the
article's size, will form one entry in the <literal>togo</literal>
file in the <literal>$NEWSARTS/out.going</literal> directory.
</para></listitem>

<listitem><para><literal>explist</literal>:
This file has entries indicating which articles expire and when, and
whether they have to be archived. The order in which the newsgroups
are listed is important. An example follows:

<programlisting>comp.lang.java.3d x 60 /var/spool/news/Archive</programlisting>

This means that articles in comp.lang.java.3d expire after 60 days and
shall be archived in the directory mentioned in the fourth field.
Archiving is optional. The second field indicates that this line
applies to both moderated and unmoderated newsgroups;
<emphasis>m</emphasis> would specify moderated and
<emphasis>u</emphasis> would specify unmoderated groups. If you want
to specify an extremely large number as the expiry period, you can use
the word <literal>never</literal>.
</para></listitem>

<listitem><para><literal>batchparms</literal>:
<literal>sendbatches</literal> is a program that administers batched
transmission of news to other sites. To do this, it consults the
<literal>batchparms</literal> file. Each line in the file specifies
the behaviour for one site, using five fields.</para>

<screen>server u 100000 100 batcher | gzip -9 | viauux -d gunzip</screen>
<para>
The first field is the site name, which matches the entry in the
<literal>sys</literal> file and has a corresponding directory by that
name in <literal>$NEWSARTS/out.going</literal>.
</para>
<para>
The second field is the class of the site: <literal>u</literal> for
UUCP and <literal>n</literal> for NNTP feeds. A <literal>!</literal>
in this field means that batching for this site has been disabled.
</para>
<para>
The third field is the size of the batches to be prepared, in bytes.
</para>
<para>
The fourth field is the maximum length of the output queue for
transmission to that site.
</para>
<para>
The fifth field is the command line to be used to build, compress and
transmit batches to that site. It receives the contents of the
<literal>togo</literal> file on standard input.
</para>
</listitem>

<listitem><para><literal>controlperm</literal>:
This file controls how the news system responds to control messages.
Each line consists of 4-5 fields separated by white space.</para>

<programlisting>comp,sci tale@uunet.uu.net nrc pv news.announce.newsgroups</programlisting>

<para>
The first field is a newsgroup pattern to which the line applies.
</para>
<para>
The second field is either <literal>any</literal> or an e-mail
address. The latter specifies that the line applies only to control
messages from that author.
</para>
<para>
The third field is a set of opcode letters indicating which control
operations from that author the line covers: <literal>n</literal>
stands for creating a newsgroup, <literal>r</literal> stands for
deleting a newsgroup and <literal>c</literal> stands for checkgroups.
</para>
<para>
The fourth field is a set of flag letters indicating how to respond to
a control message that meets all the applicability tests:
<screen>
y    Do it.
n    Don't do it.
v    Report it and include the entire control
     message in the report.
q    Don't report it.
p    Do it only if the control message carries a valid PGP signature.
</screen>
Exactly one of <literal>y</literal>, <literal>n</literal> or
<literal>p</literal> must be present.
</para>
<para>
The fifth field, which is optional, is used if the fourth field
contains a <literal>p</literal>. It must contain the PGP key ID of the
public key to be used for signature verification.
</para>
</listitem>

<listitem><para><literal>mailpaths</literal>:
This file describes how to reach the moderators of various hierarchies
of newsgroups by mail. Each line consists of two fields: a newsgroup
pattern and an e-mail address. The first line whose group pattern
matches the newsgroup is used. As an example:

<screen>
comp.lang.java.3d    somebody@mydomain.com
all                  %s@moderators.uu.net
</screen>

In the second example, the <literal>%s</literal> gets replaced with the
newsgroup name, with all dots appearing in the name substituted with
dashes.
</para></listitem>

<listitem><para><emphasis role=bold>Miscellaneous files:</emphasis>
The other files to be modified are:
<itemizedlist>
<listitem><para><literal>mailname</literal>:
Contains the Internet domain name of the news system. Consider getting
one if you don't have it.
</para></listitem>

<listitem><para><literal>organization</literal>:
Contains the default value for the Organization: header for postings
originating locally.
</para></listitem>

<listitem><para><literal>whoami</literal>:
Contains the name of the news system. This is the site name used in
the Path: headers, and hence should concur with the names your
neighbours use in their <literal>sys</literal> files.
</para></listitem>
</itemizedlist>
</para></listitem>

<listitem><para>The <literal>active</literal> file:
This file contains one line for each newsgroup (not just each
hierarchy) to be found on your news system. You will have to get the
most recent copy of the active file from
<literal>ftp://ftp.isc.org/usenet/CONFIG/active</literal> and prune it
to delete newsgroups that you have not subscribed to. Run the script
<literal>addgroup</literal> for each newsgroup in this file, which
will create the relevant directories in the
<literal>$NEWSARTS</literal> area. The <literal>addgroup</literal>
script takes two parameters: the name of the newsgroup being created,
and a flag. The flag can be any one of the following:
<screen>
y         local postings are allowed
n         no local postings, only remote ones
m         postings to this group must be approved
          by the moderator
j         articles in this group are only passed on and not kept
x         posting to this newsgroup is disallowed
=foo.bar  articles are locally filed in the "foo.bar" group
</screen>

An entry in this file looks like this:

<programlisting>comp.lang.java.3d 0000003716 01346 m</programlisting>

The first field is the name of the newsgroup. The second field is the
highest article number that has been used in that newsgroup. The third
field is the lowest article number in the group. The fourth field is
the flag, as explained above.
</para></listitem>

<listitem><para>The <literal>newsgroups</literal> file:
This contains a one-line description of each newsgroup to be found in
the active file. You will have to get the most recent file from
<literal>ftp://ftp.isc.org/usenet/CONFIG/newsgroups</literal>
and prune it to remove unwanted entries. As an example:

<programlisting>comp.lang.java.3d 3D Graphics APIs for the Java language</programlisting>
</para></listitem>

<listitem><para><emphasis role=bold>Create aliases:</emphasis>
These aliases are required for trouble reporting. Once the system is
in place and the scripts are running, anomalies and problems are
reported to addresses in the <literal>/etc/aliases</literal> file.
These entries include email addresses for
<literal>newsmaster</literal>, <literal>newscrisis</literal>,
<literal>news</literal>, <literal>usenet</literal> and
<literal>newsmap</literal>. They should ideally point to an email
address that is read regularly. Arrange for emails to
<literal>newsmap</literal> to be discarded, to minimise the effect of
``sendsys bombing'' by practical jokers.
</para></listitem>

<listitem><para><emphasis role=bold>Cron jobs:</emphasis>
Certain scripts, like <literal>newsrun</literal>, which picks up
incoming batches, and various maintenance scripts, should run through
the news-database owner's cron. A more detailed description can be
found in <xref linkend="cronjobs"/>. The cron entries will ideally be
for the following:
<orderedlist>
<listitem><para><literal>newsrun</literal>:
This script processes incoming batches of articles. Run it as
frequently as you want incoming articles to get digested.
</para></listitem>

<listitem><para><literal>sendbatches</literal>:
This script transmits batches to the NDNs. Set the frequency
according to your requirements.
</para></listitem>

<listitem><para><literal>newsdaily</literal>:
This should ideally be run once a day, since it reports errors and
anomalies in the news system.
</para></listitem>

<listitem><para><literal>newswatch</literal>:
This looks for errors and anomalies at a more detailed level, and
hence should be run at least once every hour.
</para></listitem>

<listitem><para><literal>doexpire</literal>:
This script expires old articles, as determined by the
<literal>explist</literal> file. Run it once a day.
</para></listitem>
</orderedlist>
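As an illustration, the crontab of the <literal>news</literal> user
might contain entries like the following (the frequencies and the
<literal>$NEWSBIN</literal> subdirectory paths are indicative; check
where your installation placed each script):
<programlisting>
15 * * * *  /usr/lib/newsbin/input/newsrun
30 * * * *  /usr/lib/newsbin/batch/sendbatches
45 * * * *  /usr/lib/newsbin/maint/newswatch
10 0 * * *  /usr/lib/newsbin/maint/newsdaily
30 1 * * *  /usr/lib/newsbin/expire/doexpire
</programlisting>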
</para></listitem>

<listitem><para><literal>newslog</literal>:
Make an entry in the system's <literal>syslog.conf</literal> file for
logging messages from the NNTP daemon to the file
<literal>newslog</literal>, which should be located in
<literal>$NEWSCTL</literal>. The entry will look like this:

<programlisting>news.debug -/var/lib/news/newslog</programlisting>
</para></listitem>

<listitem><para><literal>newsboot</literal>:
Have <literal>newsboot</literal> run (as <literal>news</literal>, the
news-database owner) when the system boots, to clear out debris left
around by crashes.
</para></listitem>

<listitem><para>Add a Usenet mailer in sendmail:
The <literal>mail2news</literal> program provided as part of the
source code is a handy tool to send an e-mail to a newsgroup, where it
gets digested as an article. You will have to add the following
ruleset and mailer definition to your <literal>sendmail.cf</literal>
file:</para>

<itemizedlist>
<listitem><para>Under SParse1, add the following:
<programlisting>
R$+ . USENET < @ $=w . > $#usenet $: $1
</programlisting>
</para></listitem>

<listitem><para>Under mailer definitions, define the mailer usenet as:
<screen>
MUsenet P=/usr/lib/newsbin/mail2news/m2nmailer, F=lsDFMmn,
        S=10, R=0, M=2000000, T=X-Usenet/X-Usenet/X-Unix, A=m2nmailer $u
</screen>
</para></listitem>
</itemizedlist>

<para>In order to send mail to a newsgroup, you will now have to
suffix the newsgroup name with <literal>usenet</literal>,
<emphasis>i.e.</emphasis> your To: header will look like this:
<screen>To: misc.test.usenet@yourdomain</screen>
The usenet mailer definition will intercept this mail and post it to
the corresponding newsgroup, in this case misc.test.</para>
</listitem>

</itemizedlist>

<para>
This, more or less, completes the configuration part.
</para>

</listitem>
</itemizedlist>
</section>

<section><title>Testing the system</title>
|
||||
<para>
|
||||
To locally test the system, follow the steps given below:
|
||||
</para>
|
||||
<itemizedlist>
|
||||
<listitem><para>post an article:
|
||||
Create a local newsgroup
|
||||
<screen>
|
||||
cnewsdo addgroup mysite.test y
|
||||
</screen>
|
||||
and using <literal>postnews </literal>post an article to it.
|
||||
</para></listitem>
|
||||
|
||||
<listitem><para>Has it arrived in <literal>$NEWSARTS</literal>/in.coming?:
|
||||
The article should show up in the directory mentioned. Note the nomenclature
|
||||
of the article.
|
||||
</para></listitem>
|
||||
|
||||
<listitem><para>When newsrun runs:
|
||||
When newsrun runs through cron, the article disappears from in.coming
|
||||
directory and appears in <literal>$NEWSARTS</literal>/mysite/test. Look how
|
||||
the newsgroup, active, log and history (not the errorlog) files and
|
||||
<literal> .overview </literal>file in
|
||||
<literal>$NEWSARTS/mysite/test</literal> reflect the digestion of the file
|
||||
into the news system.
|
||||
</para></listitem>
|
||||
|
||||
<listitem><para>reading the article:
|
||||
Try to read the article through readnews or any
|
||||
news client. If you are able to, then you have set most everything right.
|
||||
</para></listitem>
|
||||
</itemizedlist>
</section>

<section><title><literal>pgpverify</literal> and <literal>controlperms</literal></title>
<para>
As mentioned in <xref linkend="controlmsg"/>, it becomes necessary to
authenticate control messages to protect yourself from attacks by
pranksters. For this, you will have to configure the
<literal>$NEWSCTL</literal>/controlperm file to declare whose control
messages you are willing to honour, for which newsgroups, along with their
public key IDs. The controlperm manpage will give you the details of the format.
</para>

<para>
This works only in association with <literal>pgpverify</literal>, which
verifies Usenet control messages that have been signed using the
<literal>signcontrol</literal> process. The script can be found at
<literal>ftp://ftp.isc.org/pub/pgpcontrol/pgpverify</literal>.
<literal>pgpverify</literal> internally uses the PGP binary, which
will have to be made available in the default executables directory. If you
wish to send control messages for your local news system, you will have to
digitally sign them using the above-mentioned <literal>signcontrol</literal>
program, which is available at
<literal>ftp://ftp.isc.org/pub/pgpcontrol/signcontrol</literal>. You will
also have to configure the signcontrol program accordingly.
</para>
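<para>As a purely hypothetical illustration (the field layout, action
letters and flags below are assumptions on our part; take the
authoritative syntax from the controlperm manpage and the public key you
actually import), an entry honouring PGP-verified newgroup, rmgroup and
checkgroups messages from a well-known control-message sender might look
something like this:</para>

```
comp,misc,news,rec,sci,soc,talk group-admin@isc.org nrc pv
```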
</section>

<section><title>Feeding off an upstream neighbour</title>
<para>
For external feeds, commercial customers will have to buy them
from a regular news provider like <literal>dejanews.com</literal>
or <literal>newsfeeds.com</literal>. You will have to specify
to them which hierarchies you want and decide on the mode of
transmission, <emphasis>i.e.</emphasis> UUCP or NNTP, based on
your requirements. Once that is done, you will have to ask them to
initiate feeds, and check the <literal>$NEWSARTS/in.coming</literal>
directory to see if feeds are coming in.
</para>

<para>
If your organisation belongs to the academic community or is
otherwise lucky enough to have an NDN server somewhere which is
willing to provide you a free newsfeed, then the payment issue goes
out of the picture, but the rest of the technical requirements
remain the same.
</para>

<para>
One problem with incoming NNTP feeds is that the (relatively)
efficient push mode of NNTP inflow requires a server with a
permanent Internet connection and a fixed IP address. If you are a
small office with a dialup Internet connection, this may not be
possible. In that case, the only way to get incoming newsfeeds by
NNTP may be a highly inefficient pull feed.
</para>
</section>

<section><title>Configuring outgoing feeds</title>
<para>
If you are a leaf node, you will only have to send feeds back to your
news provider for your postings in public newsgroups to propagate
to the outside world. To enable this, you need one line each in the
<literal>sys</literal> and <literal>batchparms</literal> files
and one directory in <literal>$NEWSARTS/out.going</literal>. If
you are willing to transmit articles to your neighbouring
sites, you will have to configure <literal>sys</literal> and
<literal>batchparms</literal> with more entries. The number of directories
in <literal>$NEWSARTS/out.going</literal> will increase, too. Refer
to chapter 8, sections 8.1 and 8.2 for a better understanding of
outgoing feeds. Again, you will have to determine how you wish to
transmit the feed: UUCP or NNTP.
</para>
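<para>As a sketch of what the leaf-node case might look like, the
fragment below shows a <literal>sys</literal> file with one line for the
local subscription and one for a hypothetical upstream site called
<literal>mynewsprovider</literal>. The site name, group patterns and
flags here are assumed for illustration; your feed agreement and the sys
manpage determine the real entries.</para>

```
# local subscription: what this host accepts and keeps
ME:all/all::
# batch outgoing articles (except local groups) for the upstream site
mynewsprovider:all,!mysite/all:f:
```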

<section><title>By UUCP</title>
<para>For outgoing feeds by UUCP, we recommend that you start with
Taylor UUCP. In fact, this is the UUCP version which forms part
of the GNU Project and is the default UUCP on Linux
systems.</para>

<para>A full treatment of UUCP configuration is beyond the scope of
this document. However, the basic steps will be as follows. First,
you will have to define a ``system'' in your Usenet server for the
NDN (next door neighbour) host. This definition will include various
parameters, including the manner in which your server will call the
remote server, the protocol it will use, <emphasis>etc.</emphasis>
Then an identical process will have to be followed on the NDN
server's UUCP configuration, for your server, so that
<emphasis>that</emphasis> server can recognize
<emphasis>your</emphasis> Usenet server.</para>

<para>Finally, you will need to set up appropriate
<literal>cron</literal> jobs for the user <literal>uucp</literal>
to run <literal>uucico</literal> periodically. Taylor UUCP comes with
a script called <literal>uusched</literal> which may be modified to
your requirements; this script calls <literal>uucico</literal>. One
<literal>uucico</literal> connection will both upload and download
news batches. Smaller sites can run <literal>uusched</literal> just
once or twice a day.</para>
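<para>A minimal crontab for the <literal>uucp</literal> user might look
like the fragment below. The path to <literal>uusched</literal> is an
assumption; Taylor UUCP installations vary, so locate the script on your
system before copying this.</para>

```
# run uusched twice a day, at 06:00 and 18:00 (assumed install path)
0 6,18 * * * /usr/lib/uucp/uusched
```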

<para>Later versions of this document will include the
<literal>uusched</literal> scripts that we use in Starcom. We use
UUCP over TCP/IP, and we run the <literal>uucico</literal>
connection through an SSH tunnel, to prevent transmission of
UUCP passwords in plain text over the Internet; our SSH tunnel
is established using public-key cryptography, without passwords
being used anywhere.</para>
</section>

<section><title>By NNTP</title>
<para>For NNTP feeds, you will have to decide whether your server
will be the connection initiator or connection recipient. If you are
the connection initiator, you can send outgoing NNTP feeds more
easily. If you are the connection recipient, then outgoing feeds
will have to be pulled out of your server using the NNTP
<literal>NEWNEWS</literal> command, which will place heavy loads on
your server. This is not recommended.</para>

<para>Connecting to your NDN server to push out outgoing feeds
will require the use of the <literal>nntpsend.sh</literal> script,
which is part of the NNTPd source tree. This script will perform
some housekeeping, and internally call the
<literal>nntpxmit</literal> binary to actually send the queued set
of articles out. You may have to provide authentication information
like usernames and passwords to <literal>nntpxmit</literal> to allow
it to connect to your NDN server, in case that server insists on
checking the identity of incoming connections. (You can't be too
careful in today's world.) <literal>nntpsend.sh</literal> will clean
up after an <literal>nntpxmit</literal> connection finishes, and
will requeue any unsent articles for the next session. Thus, even if
there is a network problem, typically nothing is lost and all
pending articles are transmitted next time.</para>

<para>Thus, pushing feeds out <emphasis>via</emphasis> NNTP may mean
setting up <literal>nntpsend.sh</literal> properly, and then
invoking it periodically from <literal>cron</literal>. If your
Usenet server connects to the Internet only intermittently, then the
process which sets up the Internet connection should be extended or
modified to fire <literal>nntpsend.sh</literal> whenever the Internet
link is established. For instance, if you are using the Linux
<literal>pppd</literal>, you can add statements to the
<literal>/etc/ppp/ip-up</literal> script to change user to
<literal>news</literal> and run <literal>nntpsend.sh</literal>.</para>
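<para>Such an <literal>/etc/ppp/ip-up</literal> addition might look like
the line below. The install path of <literal>nntpsend.sh</literal> is an
assumption; use wherever your NNTPd build placed it.</para>

```
# fire off queued outgoing feeds as user news once the PPP link is up
su news -c /usr/lib/newsbin/nntpsend.sh &
```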
</section>
</section>
</chapter>
<chapter><title>Usenet news software</title>

<section><title>CNews and NNTPd</title>
<para>
Once upon a time, when Usenet news was a term not yet invented, the
first recorded attempt to use a UUCP-based email backbone to maintain a
replicated message repository was called A-News. It connected four
servers in four universities, and was written as Unix shell
scripts.</para>

<para>The designers of A-News had not anticipated how much load users
would put on their simplistic system. A far superior, more sophisticated,
and faster implementation of Usenet news was written later, called
B-News. This was a mix of C and shell scripts, and was designed
much better than A-News, to allow handling of much larger volumes of
messages. B-News v2.x was the current version in around 1990. By 1992 or
so, it had been surpassed by C-News.</para>

<para>C-News was written by Henry Spencer and Geoff Collyer of the
Department of Zoology, University of Toronto, almost entirely in shell
and <literal>awk</literal>, as a replacement for B-News. Once again, the
focus was on adding some extra features and a lot of performance. The
first release was called the Shellscript Release, and was deployed by a very
large number of servers worldwide, as a natural upgrade to B-News. This
version of C-News was even upward compatible with B-News meta-data,
<emphasis>e.g.</emphasis> history files. This was the version of C-News
which was initially rolled out in 1992 or so at the National Centre for
Software Technology (NCST, <literal>http://www.ncst.ernet.in</literal>)
and the Indian Institutes of Technology in India as part of the Indian
ERNET network.</para>

<para>The Shellscript Release was soon followed by a re-write with a lot
more C code, called the Performance Release, and then a set of cleanup and
component integration steps leading to the last release, called the
Cleanup Release. This Cleanup Release was revised many times, and the
last revision was CR.G (Cleanup Release revision G). The version of C-News
discussed in this HOWTO is a set of bug fixes on CR.G.</para>

<para>Since C-News came from shellscript-based antecedents, its
architecture followed the set-of-programs style so typical of Unix,
rather than the large monolithic software systems traditional to some other
OSs. All pieces had well-defined roles, and therefore could be easily
replaced with other pieces as needed. This allowed easy adaptation and
upgrades. It did not hurt performance, because the key components
which did a lot of work at high speed, <emphasis>e.g.</emphasis>
<literal>newsrun</literal>, had been rewritten in C by that time. Even
within the shellscripts, crucial components which handled binary data,
<emphasis>e.g.</emphasis> a component called <literal>dbz</literal>
to manipulate efficient on-disk hash arrays, were C programs with
command-line interfaces, called from scripts.</para>

<para>C-News was born in a world with widely varying network line speeds,
where bandwidth utilisation was a big issue and dialup links with UUCP
file transfers were common. Therefore, it has very strong support for
batched feeds, especially with a variety of compression techniques and
over a variety of fast and slow transport channels. C-News is virtually
unaware of the existence of TCP/IP, other than one or two tiny batch
transport programs like <literal>viarsh</literal>. However, its design
was so modular that there was absolutely no problem in plugging in NNTP
functionality using a separate set of C programs without modifying
a single line of C-News. This was done by a program suite called
NNTPd.</para>

<para>This software suite could work with B-News and C-News article
repositories, and provided the full NNTP functionality. Since B-News
died a gradual death, the combination of C-News and NNTPd became a freely
redistributable, portable, modern, extensible, and high-performance
software suite for Unix Usenet servers. Further refinements were
added later, <emphasis>e.g.</emphasis> <literal>nov</literal>, the News
Overview package, and <literal>pgpverify</literal>, a public-key-based
digital signature module to protect Usenet news servers against
fraudulent control messages.</para>

</section>

<section><title>INN</title>
<para>
INN is one of the two most widely used Usenet news server solutions. It
was written by Rich Salz for Unix systems which have a socket API ---
probably all Unix systems do, today.
</para>

<para>
INN has an architecture diametrically opposite to that of C-News. It is a
monolithic program, which is started at bootup time, and keeps running
till your server OS is shut down. This is the way high-performance
HTTP servers are run in most cases, and it allows INN to cache a lot of
things in its memory, including message-IDs of recently posted messages,
<emphasis>etc.</emphasis> This architecture has been discussed
in an interesting paper by the author, where he explains the problems
of the older B-News and C-News systems that he tried to address. Anyone
interested in Usenet software in general and INN in particular should
study this paper.</para>

<para>
INN addresses a Usenet news world which revolves around NNTP, though it
has support for UUCP batches --- a fact that not many INN administrators
seem to talk about. The primary situation where INN works at higher
efficiency than the C-News-NNTPd combination is in processing multiple
incoming NNTP feeds. For multiple
readers reading and posting news over NNTP, there is no difference
between the efficiency of INN and NNTPd. <xref linkend="innefficiency"/>
discusses the efficiency issues of INN over the earlier C-News
architecture, based on Rich Salz' paper and our analyses of usage
patterns.
</para>

<para>
INN's architecture has inspired a lot of high-performance Usenet news
software, including a lot of commercial systems which address the
``carrier class'' market. That is the market for which the INN
architecture has clear advantages over C-News.
</para>
</section>

<section><title>Leafnode</title>
<para>
This is an interesting software system, to set up a ``small'' Usenet
news server on one computer which only receives newsfeeds but does not
have the headache of sending out bulk feeds to other sites,
<emphasis>i.e.</emphasis> it is a ``leaf node'' in the newsfeed flow
diagram.</para>

<para>This software is a sort of combination of article repository and
NNTP news server; it receives articles, digests and stores them on the
local hard disks, expires them periodically, and serves them to an NNTP
reader. It is claimed to be simple to manage and ideal for
installation on a desktop-class Unix or Linux box, since it does not
take up many resources.</para>

<para>Leafnode is based on an appealing idea, but we find no problem
using C-News and NNTPd on a desktop-class box. Their resource consumption is
somewhat proportional to the volume of articles you want them to process,
and the number of groups you'll want to retain for a small team of users
will be easily handled by C-News on a desktop-class computer. An office
of a hundred users can easily use C-News and NNTPd on a desktop computer
running Linux, with 64 MBytes of RAM, IDE drives, and sufficient disk
space. Of course, ease of configuration and management is dependent on
familiarity, and we are more familiar with C-News than with Leafnode. We
hope this HOWTO will help you in that direction.</para>

<para>TO BE EXTENDED AND CORRECTED.</para>

</section>

<section><title>Suck</title>
<para>Suck is a program which lets you pull an NNTP feed from an NNTP
server and file it locally. It does not contain any article repository
management software, expecting you to do that using some other
software system, <emphasis>e.g.</emphasis> C-News or INN. It can
create batchfiles which can be fed to C-News, for instance. (Well,
to be fair, Suck <emphasis>does</emphasis> have an option to store the
fetched articles in a spool directory tree very much like what is used
by C-News or INN in their article area, with one file per article. You
can later read this raw message spool area using a mail client which
supports the <literal>msgdir</literal> file layout for mail folders,
like MH, perhaps. We don't find this option useful if you're running
Suck on a Usenet server.) Suck finally boils down to a single
command-line program which is invoked periodically, typically from
<literal>cron</literal>. It has a zillion command-line options which
are confusing at first, but later show how mature and finely tunable
the software is.</para>

<para>If you need an NNTP pull feed, then we know of no better program
than Suck for the job. The <literal>nntpxfer</literal> program which
forms part of the NNTPd package also implements an NNTP pull feed, for
instance, but does not have one-tenth of the flexibility and fine-tuning
of Suck. One of the banes of the NNTP pull feed is connection timeouts;
Suck allows a lot of special tuning to handle this problem. If we had
to set up a Usenet server with an NNTP pull feed, we'd use Suck right
away.</para>

<para>TO BE EXTENDED AND CORRECTED.</para>

</section>

<section><title>Carrier class software</title>

<para>We have touched upon the characteristics of carrier-class Usenet
software in the section where we discuss NNTP efficiency issues. As that
discussion shows, the requirements of carrier-class Usenet servers are very
different from those of servers run within organisations and institutes to
provide internal service to their members.</para>

<para>Carrier-class servers are expected to handle a complete feed of all
articles in all newsgroups, including a lot of groups which have what we
call a ``high noise-to-signal ratio.'' They do not have the luxury of
choosing a ``useful'' subset the way administrators of internal corporate
Usenet servers do. Secondly, carrier-class servers are expected to turn
articles around very fast, <emphasis>i.e.</emphasis> they are expected to
have very low latency from the moment they receive an article to the
time they retransmit it by NNTP to downstream servers. Third, they are
supposed to provide very high availability, <emphasis>i.e.</emphasis>
they are supposed to be like other carrier-class services. This usually
means that they have parallel arrays of computers in load-sharing
configurations. And fourth, they usually do not cater to retail
connections for reading and posting articles by human users. Usenet news
carriers usually reserve separate computers to handle retail
connections.</para>

<para>Thus, carrier-class servers do not need to maintain a repository
of articles with the usual residence times of days or weeks, expiring
articles after they age. They only need to focus on super-efficient
re-transmission. These highly specialised servers have software
which receives an article over NNTP, parses it, and immediately re-queues
it for outward transmission to dozens or hundreds of other servers. And
since they work at these high throughputs, their downstream servers
are also expected to be live on the Internet round the clock to receive
incoming NNTP connections from the carrier servers. Therefore, there's
no batching or long queueing needed, and batching cannot be used. In
fact, some carrier-class servers state that if you wish to receive feeds
from them, then your servers need to be available round the clock and
connected with lines fast enough to take the blast of a full feed. If
you do not fulfil these conditions, your servers will lose articles,
and the carrier is not responsible for the loss.</para>

<para>Therefore, one can almost say that carrier-class servers have
neither article repositories nor queues other than the current message(s)
being re-transmitted. If they fail to connect to five of their fifty
downstream neighbours, or fail to push an article through due to
a transmission error, those five neighbours will never receive that
article later from this server; the article will be dropped from their
queues. Retries are not part of the game. Carrier-class
Usenet servers are thus more like packet routers than servers with
repositories.</para>

<para>It can be seen why carrier-class software cannot hope to do its
job using batch-oriented repository management software like C-News, and
why it needs a totally NNTP-oriented implementation. Therefore, the INN
antecedents of some of these systems are to be expected. We would
<emphasis>love</emphasis> to hear from any Linux HOWTO reader whose
Usenet server needs include carrier-class behaviour.</para>

<para>As far as we know, there is no freely redistributable software
implementation of a carrier-class Usenet news server. There is no reason
why such services cannot be offered on Linux, even Intel Linux, provided
you have fast network links and arrays of servers. Linux as an OS platform
is not an issue here, but free software has not yet been made available
for this niche. Presumably this is because the users of such software are
service providers who earn money using it, and are therefore expected
to be willing to pay for it.</para>

<para>TO BE EXTENDED AND CORRECTED.</para>

</section>

</chapter>
<chapter> <title>What is the Usenet?</title>

<section> <title>Discussion groups</title>
<para>The Usenet is a huge worldwide collection of discussion
groups. Each discussion group has a name, <emphasis>e.g.</emphasis>
<literal>comp.os.linux.announce</literal>, and a collection of messages.
These messages, usually called <emphasis>articles</emphasis>, are posted
by readers like you and me who have access to Usenet servers, and are
then stored on the Usenet servers.</para>

<para>This ability to both read and write into a Usenet newsgroup makes
the Usenet very different from the bulk of what people today call ``the
Internet.'' The Internet has become a colloquial term to refer to the
World Wide Web, and the Web is (largely) read-only. There are online
discussion groups with Web interfaces, and there are mailing lists, but
Usenet is probably more convenient than either of these for most large
discussion communities. This is because the articles get replicated to
your local Usenet server, thus allowing you to read and post articles
without accessing the global Internet, something which is of great value
to those with slow Internet links. Usenet articles also conserve
bandwidth because they do not come and sit in each member's mailbox,
unlike email-based mailing lists. With a mailing list, twenty members
in one office will have twenty copies of each message delivered to their
mailboxes. With a Usenet discussion group and a local Usenet
server, there is just one copy of each article, and it does not fill up
anyone's mailbox.</para>

<para>Another nice feature of having your own local Usenet server is
that articles stay on the server even after you've read them. You can't
accidentally delete a Usenet article the way you can delete a message
from your mailbox. A Usenet server is thus an
<emphasis>excellent</emphasis> way to archive articles of a group
discussion on a local server without placing the onus of archiving on
any group member. This makes local Usenet servers very valuable as
archives of internal discussion messages within corporate Intranets,
provided the article expiry configuration of the Usenet server software
has been set up for sufficiently long expiry periods.</para>
</section>

<section> <title>How it works, loosely speaking</title>
<para>Usenet news works by the reader first firing up a Usenet news
program, which in today's GUI world will very likely be something like
Netscape Messenger or Microsoft's Outlook Express. There are a lot of
proven, well-designed character-based Usenet news readers, but a proper
review of user agent software is outside the scope of this HOWTO, so
we will just assume that you are using whatever software you like. The
reader then selects a Usenet newsgroup from the hundreds or thousands of
newsgroups which are hosted by her local server, and accesses all unread
articles. These articles are displayed to her. She can then decide to
respond to some of them.</para>

<para>When the reader writes an article, either in response to an
existing one or as the start of a brand-new thread of discussion, her
software <emphasis>posts</emphasis> this article to the Usenet server.
The article contains a list of newsgroups into which it is to be posted.
Once it is accepted by the server, it becomes available for other users
to read and respond to. The article is automatically
<emphasis>expired</emphasis>, or deleted, by the server from its internal
archives based on expiry policies set in its software; the author of the
article usually can do little or nothing to control the expiry of her
articles.</para>

<para>A Usenet server rarely works on its own. It forms a part of
a collection of servers, which automatically exchange articles with
each other. The flow of articles from one server to another is called a
<emphasis>newsfeed</emphasis>. In a simple case, one can imagine a
worldwide network of servers, all configured to replicate articles with
each other, busily passing along copies across the network as soon as one
of them receives a new article posted by a human reader. This replication
is done by powerful and fault-tolerant processes, and gives the Usenet
network its power. Your local Usenet server literally has a copy of all
current articles in all relevant newsgroups.</para>

</section>

<section> <title>About sizes, volumes, and so on</title>
<para>Any would-be Usenet server administrator or creator
<emphasis>must</emphasis> read the <quote>Periodic Posting about the basic steps
involved in configuring a machine to store Usenet news,</quote> also known as
the Site Setup FAQ, available from
<literal>ftp://rtfm.mit.edu/pub/usenet/news.answers/usenet/site-setup</literal>
or
<literal>ftp://ftp.uu.net/usenet/news.answers/news/site-setup.Z</literal>.
It was last updated in 1997, but trends haven't changed much since
then, though absolute volume figures have.</para>

<para>If you want your Usenet server to be a repository for all articles
in all newsgroups, you will probably not be reading this HOWTO, or even
if you do, you will rapidly realise that anyone who needs to read this
HOWTO may not be ready to set up such a server. This is because the
volume of articles on the Usenet has reached a point where very
specialised networks, very high-end servers, and large disk arrays
are required for handling such Usenet volumes. Those setups are called
``carrier-class'' Usenet servers, and will be discussed a bit later in
this HOWTO. Administering such an array of hardware may not be the job
of the new Usenet administrator, for whom this HOWTO (and most Linux
HOWTOs) are written.</para>

<para>Nevertheless, it may be interesting to understand what volumes we
are talking about. Usenet news article volumes have been doubling every
fourteen months or so, going by what we hear in comments from
carrier-class Usenet administrators. At the beginning of 1997, this
volume was 1.2 GBytes of articles a day. Thus, the volumes should have
undergone roughly five doublings, or grown 32 times, by the time we reach
mid-2002, at the time of this writing. This gives us a volume of 38.4
GBytes per day. Assume that this transfer happens using uncompressed
NNTP (the norm), and add 50% extra for the overheads of NNTP, TCP,
and IP. This gives you a raw data transfer volume of 57.6 GBytes/day, or
about 460 Gbits/day. If you have to transfer such volumes of data in 24
hours (86,400 seconds), you'll need raw bandwidth of about 5.3 Mbits per
second just to <emphasis>receive all these articles</emphasis>. You'll
need more bandwidth to send out feeds to other neighbouring Usenet
servers, and then you'll need bandwidth to allow your readers to access
your servers and read and post articles in retail quantities. Clearly,
these volume figures are beyond the network bandwidths of most
corporate organisations or educational institutions, and therefore only
those who are in the business of offering Usenet news can afford
it.</para>
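<para>The doubling arithmetic above is easy to sanity-check. The short
Python sketch below reproduces the figures used in the text: 1.2
GBytes/day at the start of 1997, five fourteen-month doublings to
mid-2002, and a 50% allowance for protocol overheads.</para>

```python
# Back-of-envelope Usenet feed sizing, following the figures in the text.
daily_1997_gb = 1.2                  # GBytes of articles per day, early 1997
doublings = 5                        # ~14-month doubling, 1997 to mid-2002

daily_2002_gb = daily_1997_gb * 2 ** doublings   # volume after 5 doublings
raw_gb = daily_2002_gb * 1.5                     # +50% for NNTP/TCP/IP overhead
mbits_per_sec = raw_gb * 8 * 1000 / 86400        # sustained over 24 hours

print(round(daily_2002_gb, 1), round(raw_gb, 1), round(mbits_per_sec, 1))
# -> 38.4 57.6 5.3
```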

<para>At the other end of the scale, it is perfectly feasible for a
small office to subscribe to a well-trimmed subset of Usenet newsgroups,
and exclude most of the high-volume newsgroups. Starcom Software, where
the authors of this HOWTO work, has worked with a fairly large subset of
600 newsgroups, which is still a tiny fraction of the 15,000+ newsgroups
that the carrier-class services offer. Your office or college may not
even need 600 groups. And our company had excluded specific high-volume
but low-usefulness newsgroups like the <literal>talk</literal>,
<literal>comp.binaries</literal>, and <literal>alt</literal>
hierarchies. With the pruned subset, the total volume of articles
may amount to barely a hundred MBytes a day or so, and can be easily
handled by most small offices and educational institutions. In such
situations, a single Intel Linux server can deliver excellent performance
as a Usenet server.</para>

<para>Then there's the <emphasis>internal</emphasis> Usenet service. By
internal here, we mean a private set of Usenet newsgroups, not a private
computer network. Every company or university which runs a Usenet news
service creates its own hierarchy of internal newsgroups, whose articles
never leave the campus or office, and which therefore do not consume
Internet bandwidth. These newsgroups are often the ones most hotly
accessed, and within your organisation will carry more
<emphasis>internally generated</emphasis> traffic than all the ``public''
newsgroups you may subscribe to. After all, how often does a guy have
something to say which is relevant to the world at large, unless he's
discussing a globally relevant topic like ``Unix rules!''? If such
internal newsgroups are the focus of your Usenet servers, then you may
find that fairly modest hardware and Internet bandwidth will suffice,
depending on the size of your organisation.</para>

<para>The new Usenet server administrator has to undertake a sizing
exercise to ensure that he does not bite off more than he, or his
network resources, can chew. We hope we have provided sufficient
information for him to get started with the right questions.</para>

</section>

</chapter>