This commit is contained in:
gferg 2002-07-30 14:54:25 +00:00
parent f5992572cc
commit a244760233
14 changed files with 3180 additions and 0 deletions


@ -0,0 +1,66 @@
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN" [
<!ENTITY what SYSTEM "what.sgml">
<!ENTITY pop SYSTEM "pop.sgml">
<!ENTITY software SYSTEM "software.sgml">
<!ENTITY settingup SYSTEM "settingup.sgml">
<!ENTITY inn SYSTEM "inn.sgml">
<!ENTITY mail2news SYSTEM "mail2news.sgml">
<!ENTITY accesscontrol SYSTEM "accesscontrol.sgml">
<!ENTITY components SYSTEM "components.sgml">
<!ENTITY monitoring SYSTEM "monitoring.sgml">
<!ENTITY clients SYSTEM "clients.sgml">
<!ENTITY perspective SYSTEM "perspective.sgml">
<!ENTITY doc SYSTEM "doc.sgml">
<!ENTITY conclusion SYSTEM "conclusion.sgml">
]>
<book>
<bookinfo>
<title>Usenet News HOWTO</title>
<authorgroup>
<author>
<firstname>Shuvam</firstname>
<surname>Misra</surname>
</author>
<author>
<firstname>Hema</firstname>
<surname>Kariyappa</surname>
<othername>
<emphasis>(usenet@starcomsoftware.com)</emphasis>
</othername>
</author>
</authorgroup>
<address>Starcom Software Private Limited.
starcomsoftware.com
<city>Mumbai</city>, <country>India</country>
</address>
<revhistory>
<revision>
<revnumber>2.0</revnumber>
<date>2002-07-30</date>
<authorinitials>sm</authorinitials>
<revremark>Major update by new authors.</revremark>
</revision>
<revision>
<revnumber>1.4</revnumber>
<date>1995-11-29</date>
<authorinitials>vs</authorinitials>
<revremark>Original document; authored by Vince Skahan.</revremark>
</revision>
</revhistory>
</bookinfo>
<toc></toc>
&what;
&pop;
&software;
&settingup;
&inn;
&mail2news;
&accesscontrol;
&components;
&monitoring;
&clients;
&perspective;
&doc;
&conclusion;
</book>


@ -0,0 +1,52 @@
<chapter><title>Access control in NNTPd</title>
<para>
The original NNTPd had host-based authentication which allowed clients
connecting from a particular IP address to read only certain newsgroups.
This was very clearly inadequate for enterprise deployment on an
Intranet, where each desktop computer has a different IP address, often
DHCP-assigned, and the mapping between person and desktop is not static.
</para>
<para>
What was needed was user-based authentication, where a username and
password could be used to authenticate the user. Even this was provided
as an extension to NNTPd, but more was needed. The corporate IS manager
needs to ensure that certain Usenet discussion groups remain visible only
to certain people. This authorisation layer was not available in NNTPd.
Once authenticated, all users could read all newsgroups.
</para>
<para>
We have extended the user-based authentication facility in NNTPd in some
(we hope!) useful ways, and we have added an entire authorisation layer
which lets the administrator specify which newsgroups each user can
read. With this infrastructure, we feel NNTPd is fit for enterprise
deployment and can be used to handle corporate document repositories,
messages, and discussion archives. Details are given below.
</para>
<section><title>Host-based access control</title>
<para></para>
</section>
<section><title>User authentication and authorisation</title>
<para></para>
<section><title>The NNTPd password file</title>
<para></para>
</section>
<section><title>Mapping users to newsgroups</title>
<para></para>
</section>
<section><title>The <literal>X-Authenticated-Author</literal> article header</title>
<para></para>
</section>
<section><title>Other article header additions</title>
<para></para>
</section>
</section>
</chapter>


@ -0,0 +1,67 @@
<chapter><title>Usenet news clients</title>
<para>
This HOWTO was written to help a Linux system administrator provide the
Usenet news service to its readers. The rest of this HOWTO
focuses on the server-end software and systems, but one chapter
dedicated to the clients does not seem disproportionate, considering
that the <emphasis>raison d'&ecirc;tre</emphasis> of Usenet news servers is to serve
these clients.
</para>
<para>
The overwhelming majority of clients are software programs which access
the article database, either by reading <literal>/var/spool/news</literal> on a
Unix system or over NNTP, and allow their human users to read and post
articles. We can therefore probably term this class of programs UUA, for
Usenet User Agents, along the lines of MUA for Mail User Agents.
</para>
<para>
There are other special-purpose clients, which either pull out articles
to copy or transfer somewhere else, or analyse them, <emphasis>e.g.</emphasis> a
search engine which allows you to search a Usenet article archive, as Google
(<literal>www.google.com</literal>) does.
</para>
<para>
This chapter will discuss issues in UUA software design, bringing out
essential features along with efficiency and management concerns. What this
chapter will certainly <emphasis>never</emphasis> attempt to do is catalogue all
the different UUA programs available in the world --- that is best left to
specialised catalogues on the Internet.
</para>
<para>
This chapter will also briefly cover special-purpose clients which
transfer articles or do other special-purpose things with them.
</para>
<section><title>Usenet User Agents</title>
<section><title>Accessing articles: NNTP or spool area?</title>
<para></para>
</section>
<section><title>Threading</title>
<para></para>
</section>
<section><title>Quick reading features</title>
<para></para>
</section>
</section>
<section><title>Clients that transfer articles</title>
<para>
We will discuss Suck and <literal>nntpxfer</literal> from the NNTP server
distribution here. Suck has already been discussed earlier. We will be happy
to take contributed additions that discuss other client software.
</para>
</section>
<section><title>Special clients</title>
<para></para>
</section>
</chapter>


@ -0,0 +1,373 @@
<chapter><title>Components of a running system</title>
<para>
This chapter reviews the components of a running CNews+NNTPd server.
Analogous components will be found in an INN-based system too. We invite
additions from readers familiar with INN to add their pieces to this
chapter.
</para>
<section><title><literal>/var/lib/news</literal>: the CNews control area</title>
<para>
This directory is more popularly known as <literal>$NEWSCTL</literal>. It
contains configuration, log and status files. There are no
articles or binaries kept here. Let's see what some of the
files are meant for.
</para>
<itemizedlist>
<listitem><para><literal>sys</literal>:
One line per system/NDN, listing all the newsgroup
hierarchies each system subscribes to. Each line is prefixed with the system
name; the one beginning with ME: indicates what we ourselves receive.
Look up the manpage of <literal>newssys</literal>.
</para></listitem>
<listitem><para><literal>explist</literal>:
This file has entries indicating which newsgroups' articles
expire and when, and whether they have to be archived. The order in
which the newsgroups are listed is important. See the manpage of
<literal>expire</literal> for the file format.
</para></listitem>
<listitem><para><literal>batchparms</literal>:
Details of how to feed other sites/NDNs, like the size of
batches and the mode of transmission (UUCP/NNTP), are specified here.
Manpage to refer to: <literal>newsbatch</literal>.
</para></listitem>
<listitem><para><literal>controlperm</literal>:
If you wish to authenticate a control message before any
action is taken on it, you must enter authentication-related information
here. The <literal>controlperm</literal> manpage will list all the fields
in detail.
</para></listitem>
<listitem><para><literal>mailpaths</literal>:
It lists the e-mail address of the moderator for each
moderated newsgroup, who is responsible for approving or rejecting
articles posted to it. The sample
<literal>mailpaths</literal> file in the <literal>tar</literal> will
give you an idea of how entries are made.
</para></listitem>
<listitem><para><literal>nntp_access/user_access</literal>:
These files contain entries for server names
and usernames to whom restrictions apply when accessing newsgroups.
Again, the sample file in the tarball explains the format.
</para></listitem>
<listitem><para><literal>log, errlog</literal>:
These are log files that keep growing large with each batch
that is received. The <literal>log</literal> file has one entry per
<literal>article</literal> telling you if it
has been accepted by your news server or rejected. To understand the
format of this file, refer to Chapter 2.2 of the <literal>CNews</literal>
guide. Errors, if any, while digesting the articles are
logged in <literal>errlog</literal>. These
log files have to be rolled as the files hog a lot of disk space.
</para></listitem>
<listitem><para><literal>nntplog</literal>:
This file logs information from the <literal>nntp daemon</literal>, giving
details of when a connection was established/broken and what commands were
issued. This file needs to be configured in syslog, and the syslog
<literal>daemon</literal> should be running.
</para></listitem>
<listitem><para><literal>active</literal>:
This file has one line per newsgroup to be found in your news
server. Besides other things, it tells you how many articles are
currently present in each newsgroup. It is updated when each batch is
digested or when articles are expired. The <literal>active</literal>
manpage will furnish more details about other parameters.
</para></listitem>
<listitem><para><literal>history</literal>:
This file, again, contains one line per <literal>article</literal>, mapping
<literal>message-id</literal> to newsgroup name and also giving its
associated <literal>article</literal> no. in that newsgroup. It is updated
each time a feed is digested
and when <literal>doexpire</literal> is run. It plays a key role in
loop-detection and serves as an article database. Read the manpages of
<literal>newsdb</literal> and <literal>doexpire</literal> for the file format.
<listitem><para><literal>newsgroups</literal>:
It has a one-line description for each newsgroup, explaining
what kind of posts go into each of them. Ideally speaking, it should cover
all the newsgroups found in the <literal>active</literal> file.
</para></listitem>
<listitem><para>Miscellaneous files:
Files like <literal>mailname</literal>, <literal>organisation</literal>,
<literal>whoami</literal> contain information required for forming some of
the headers of an <literal>article</literal>. The contents of
<literal>mailname</literal> form the <literal>From:</literal> header and
that of <literal>organisation</literal> form the
<literal>Organisation:</literal> header. <literal>whoami</literal> contains
the name of the news system. Refer to chapter 2.1 of
<literal>guide.ps</literal> for a detailed list of files in the
<literal>$NEWSCTL</literal> area. Read <literal>RFC 1036</literal> for a
description of article headers.
</para></listitem>
</itemizedlist>
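To make the most important of these control files concrete, here is a hypothetical two-line <literal>sys</literal> file. The site name <literal>ndn1</literal> and its flag are invented for illustration; the <literal>newssys</literal> manpage remains the authoritative reference for the field layout:

```
# what we ourselves are willing to receive
ME:all/all::
# a hypothetical downstream NDN fed comp and sci, batched via files (f flag)
ndn1:comp,sci/all:f:
```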
</section>
<section><title><literal>/var/spool/news</literal>: the article repository</title>
<para>
This is also known as the <literal>$NEWSARTS</literal> or
<literal>$NEWSSPOOL</literal> directory. This is where the
articles reside on your disk. No binaries or control files
should belong here. Enough space should be allocated to this
directory as the number of articles keeps increasing with each
batch that is digested. An explanation of the following sub-directories will
give you an overview of this directory:
<itemizedlist>
<listitem><para><literal>in.coming</literal>:
Feeds/batches/articles from NDNs reside here on their arrival,
before being processed. After processing, they
appear in
<literal>$NEWSARTS</literal> or in its <literal>bad</literal> sub-directory
if there were errors.
</para></listitem>
<listitem><para><literal>out.going</literal>:
This directory contains batches/feeds to be sent to your
NDNs, i.e. feeds to be pushed to your neighbouring sites reside here
before they are transmitted. It contains one sub-directory per NDN mentioned
in the <literal>sys</literal> file. These sub-directories contain files
called <literal>togo</literal>,
which hold information about each <literal>article</literal> queued for
transmission, like its <literal>message-id</literal> or article no.
</para></listitem>
<listitem><para><anchor id="newsgroupdir"/>newsgroup directories:
For each newsgroup hierarchy that the news server
has subscribed to, a directory is created under
<literal>$NEWSARTS</literal>.
Further sub-directories are created under the parent to hold
articles of specific newsgroups. For instance, for a
newsgroup like <literal>comp.music.compose</literal>, the parent directory
<literal>comp</literal> will appear in <literal>$NEWSARTS</literal> and a
sub-directory called <literal>music</literal> will be created under
<literal>comp</literal>. The <literal>music</literal> sub-directory
shall contain a further sub-directory called <literal>compose</literal> and
all articles of <literal>comp.music.compose</literal>
shall reside here. In effect,
<literal>article</literal> 242 of newsgroup
<literal>comp.music.compose</literal> shall map to file
<literal>$NEWSARTS/comp/music/compose/242</literal>.
</para></listitem>
<listitem><para>control:
The control directory houses only the control messages that
have been received by this site. The control messages could be any of the
following: <literal>newgroup, rmgroup, checkgroup</literal> and
<literal>cancel</literal>
appearing in the subject line of the <literal>article</literal>.
</para></listitem>
<listitem><para><literal>junk</literal>:
The <literal>junk</literal> directory contains all
articles that the news
server has received and has decided, after processing, do not
belong to any of the hierarchies it has subscribed to. The news server
transfers/passes all <literal>articles</literal> in this directory to NDNs
that have subscribed to the <literal>junk</literal> hierarchy.
</para></listitem>
</itemizedlist>
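The newsgroup-to-file mapping described above is mechanical enough to sketch in shell. This is a minimal illustration; the <literal>$NEWSARTS</literal> location is the conventional one assumed throughout this chapter:

```shell
#!/bin/sh
# Sketch of the newsgroup-to-file mapping described above.
# The $NEWSARTS location is the conventional one used in this chapter.
NEWSARTS=/var/spool/news

# group_to_path GROUP ARTICLE-NUMBER: dots become directory separators
group_to_path() {
    echo "$NEWSARTS/$(echo "$1" | tr '.' '/')/$2"
}

group_to_path comp.music.compose 242
# prints: /var/spool/news/comp/music/compose/242
```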
</section>
<section><title><literal>/usr/lib/newsbin</literal>: the executables</title>
<para></para>
</section>
<section id="cronjobs"><title><literal>crontab and cron jobs </literal></title>
<para>
The heart of the Usenet news server is the set of scripts that run at regular
intervals, processing articles, digesting or rejecting them, and
transmitting them to NDNs. We shall try to enumerate the ones that are
important enough to be cronned. :)
</para>
<itemizedlist>
<listitem><para><literal>newsrun</literal>:
The key script. It picks up the batches in the
<literal>in.coming</literal> directory, uncompresses them if necessary and
feeds them to <literal>relaynews</literal>, which then processes each
<literal>article</literal>, digesting and
batching it and logging any errors. This script needs to run through cron
as frequently as you want the feeds to be digested. Every half hour should
suffice for a non-critical requirement.
</para></listitem>
<listitem><para><literal>sendbatches</literal>:
This script is run to transmit the togo files formed in
the <literal>out.going</literal> directory to your NDNs. It reads the
<literal>batchparms</literal> file to know
exactly how and to whom the batches need to be transmitted. The frequency,
again, can be set according to your requirements. Once an hour should be
sufficient.
</para></listitem>
<listitem><para><literal>newsdaily</literal>:
This script does maintenance chores like rolling logs and
saving them, reporting errors/anomalies and doing cleanup jobs.
It should typically run once a day.
</para></listitem>
<listitem><para><literal>newswatch</literal>:
This looks for news problems at a more detailed level than
<literal>newsdaily</literal>: persistent lock files, whether there is
enough space for a minimum no. of files, whether there is a huge queue of
unattended batches, and the like. This should typically run once every hour.
For more on this and the above, read the <literal>newsmaint</literal>
manpage.
</para></listitem>
<listitem><para><literal>doexpire</literal>:
This script expires old articles as determined by the
control file <literal>explist</literal> and updates the
<literal>active</literal> file. This is necessary if you do not
want unnecessary/unwanted articles hogging your disk space. Run it once
a day. Manpage: <literal>expire</literal>.
</para></listitem>
<listitem><para><literal>newsrunning off/on</literal>:
This script stops or starts the news server for you.
You could choose to add it to your cron jobs if you think the news server
takes up a lot of CPU time during peak hours and you wish to keep a check on
it.
</para></listitem>
</itemizedlist>
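The scripts above might be tied together in the news user's crontab roughly as follows. The paths and timings here are illustrative assumptions matching the suggested frequencies, not a prescribed setup:

```
# illustrative crontab for user news; adjust paths to your installation
0,30 * * * *  /usr/lib/newsbin/input/newsrun
15   * * * *  /usr/lib/newsbin/batch/sendbatches
45   * * * *  /usr/lib/newsbin/maint/newswatch
10   1 * * *  /usr/lib/newsbin/maint/newsdaily
20   2 * * *  /usr/lib/newsbin/expire/doexpire
```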
</section>
<section><title><literal>newsrun</literal> and <literal>relaynews</literal>: digesting received articles </title>
<para>
The heart and soul of the Usenet News system, <literal>newsrun</literal> just picks up the batches/
articles in the <literal>in.coming</literal> directory of
<literal>$NEWSARTS</literal> and uncompresses them (if required) and calls
<literal>relaynews</literal>. It should run from cron.
</para>
<para>
<literal>relaynews</literal> picks up each <literal>article</literal> one by one
through stdin, determines whether it belongs to a subscribed group by looking up the
<literal>sys</literal> file, looks in the <literal>history</literal> file
to determine that it does not already exist locally, digests it, updating the
<literal>active</literal> and <literal>history</literal> files, and batches it
for neighbouring sites. It logs errors on encountering problems while processing
the <literal>article</literal> and takes appropriate action if it happens to be
a control message. More info in the manpage of <literal>relaynews</literal>.
</para>
</section>
<section><title><literal>doexpire</literal> and <literal>expire</literal>: removing old articles </title>
<para>
A good way to get rid of unwanted/old articles from the
<literal>$NEWSARTS</literal> area is to run doexpire once a day. It reads the
<literal>explist</literal> file from the <literal>$NEWSCTL</literal> directory
to determine what articles expire today. It can archive the
said <literal>article</literal> if so configured. It then updates the
<literal>active</literal> and the <literal>history</literal> file accordingly.
If you wish to retain the <literal>article</literal> entry in the
<literal>history</literal> file, to avoid re-digesting it as a new
article after it has expired, add a special /expired/; line
in the control file. More on the options and functioning in the expire manpage.
</para>
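As an illustration, an <literal>explist</literal> might contain entries like the following. The field layout shown (pattern, category, retention days, archive directory) is a sketch from memory, so verify it against the <literal>expire</literal> manpage before use:

```
# pattern            cat  days  archive
comp.sources.unix     x    60   /var/spool/news/archive
/expired/             x    14   -
all                   x     7   -
```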
</section>
<section><title><literal>nntpd</literal> and <literal>msgidd</literal>: managing the NNTP interface </title>
<para>
As has already been discussed in the chapter on setting up the software,
<literal>nntpd</literal> is a TCP-based server daemon which runs under
<literal>inetd</literal>. It is fired by <literal>inetd</literal>
whenever there's an incoming connection on the NNTP port, and it takes
over the dialogue from there. It reads the C-News configuration and data
files in <literal>$NEWSCTL</literal>, article files from
<literal>$NEWSARTS</literal>, and receives incoming posts and
transfers. These it dutifully queues in
<literal>$NEWSARTS/in.coming</literal>, either as batch files or single
article files.</para>
<para>It is important that <literal>inetd</literal> be configured to
fire <literal>nntpd</literal> as user <literal>news</literal>, not as
<literal>root</literal> like it does for other daemons like
<literal>telnetd</literal> or <literal>ftpd</literal>. If this is not
done correctly, a lot of problems can be caused in the functioning of
the C-News system later.</para>
<para><literal>nntpd</literal> is fired each time a new NNTP connection
is received, and dies once the NNTP client closes its connection. Thus,
if one <literal>nntpd</literal> receives a few articles by an incoming
batch feed (not a <literal>POST</literal> but an <literal>XFER</literal>),
then another <literal>nntpd</literal> will not know about the receipt of
these articles till the batches are digested. This will hamper
duplicate newsfeed detection if there are multiple upstream NDNs feeding
our server with the same set of articles over NNTP. To fix this,
<literal>nntpd</literal> uses an ally: <literal>msgidd</literal>, the
message ID daemon. This
daemon is fired once at server bootup time through
<literal>newsboot</literal>, and keeps running quietly in the
background, listening on a named Unix socket in the
<literal>$NEWSCTL</literal> area. It keeps in its memory a list of all
message IDs which various incarnations of <literal>nntpd</literal> have
asked it to remember.</para>
<para>Thus, when one copy of <literal>nntpd</literal> receives an
incoming feed of news articles, it updates <literal>msgidd</literal>
with the message IDs of these messages through the Unix socket. When
another copy of <literal>nntpd</literal> is fired later and the NNTP
client tries to feed it some more articles, the <literal>nntpd</literal>
checks each message ID against <literal>msgidd</literal>. Since
<literal>msgidd</literal> stores all these IDs in memory, the lookup is
very fast, and duplicate articles are blocked at the NNTP interface
itself.</para>
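The duplicate check can be sketched in shell, with a flat file standing in for the daemon's in-memory table. This is a conceptual illustration of the idea only, not the real msgidd protocol:

```shell
#!/bin/sh
# Conceptual sketch of the duplicate check msgidd performs for nntpd,
# with a flat file standing in for the daemon's in-memory table.
# This illustrates the idea only; it is NOT the real msgidd protocol.
SEEN=$(mktemp)

# offer_msgid ID: returns 0 (accept) if new, 1 (reject) if already seen
offer_msgid() {
    if grep -qxF "$1" "$SEEN"; then
        return 1                      # duplicate: block at the interface
    fi
    printf '%s\n' "$1" >> "$SEEN"     # new: remember it
    return 0
}

offer_msgid '<abc@example.com>' && echo accepted
offer_msgid '<abc@example.com>' || echo duplicate
# prints: accepted, then duplicate
```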
<para>On a running system, expect to see one instance of
<literal>nntpd</literal> for each active NNTP connection, and just one
instance of <literal>msgidd</literal> running quietly in the background,
hardly consuming any CPU resources. Our <literal>nntpd</literal> is
configured to die if the NNTP connection is more than a few minutes
idle, thus conserving server resources. This does not inconvenience the
user because modern NNTP clients simply re-connect. If an
<literal>nntpd</literal> instance is found to be running for days, it is
either hung due to a network error, or is receiving a very long incoming
NNTP feed from your upstream server. We used to receive our primary
incoming feed from our service provider through NNTP sessions lasting 18
to 20 hours without a break, every day.</para>
</section>
<section><title><literal>nov</literal>, the News Overview system</title>
<para>NOV, the News Overview System, is a recent augmentation to the
C-News and NNTP systems and to the NNTP protocol. This subsystem
maintains a file for each active newsgroup, in which it maintains one
line per current article. This line of text contains some key meta-data
about the article, <emphasis>e.g.</emphasis> the contents of the
<literal>From</literal>, <literal>Subject</literal> and
<literal>Date</literal> headers, and the article size and message ID. This speeds
up NNTP response enormously. The <literal>nov</literal> library has been
integrated into the <literal>nntpd</literal> code, and into key binaries
of C-News, thus providing seamless maintenance of the News Overview
database when articles are added or deleted from the repository.</para>
<para>When <literal>newsrun</literal> adds an article into
<literal>starcom.test</literal>, it also updates
<literal>$NEWSARTS/starcom/test/.overview</literal> and adds a line with
the relevant data, tab-separated, into it. When <literal>nntpd</literal>
comes to life with an NNTP client, and it sees the
<literal>XOVER</literal> NNTP command, it reads this
<literal>.overview</literal> file, and returns the relevant lines to the
NNTP client. When <literal>expire</literal> deletes an article, it also
removes the corresponding line from the <literal>.overview</literal>
file. Thus, the maintenance of the NOV database is seamless.</para>
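Reading a <literal>.overview</literal> file can be sketched with awk. The tab-separated field order shown (number, Subject, From, Date, Message-ID, References, bytes, lines) is the conventional NOV layout; check the newsoverview(5) manpage to confirm it for your installation:

```shell
#!/bin/sh
# Sketch: reading a .overview file. The tab-separated field order
# assumed here (number, Subject, From, Date, Message-ID, References,
# bytes, lines) is the conventional NOV layout; see newsoverview(5).
OV=$(mktemp)
printf '241\tRe: chord voicings\ta@b.c\t30 Jul 2002\t<1@b.c>\t\t1024\t20\n' > "$OV"
printf '242\tNew compositions\td@e.f\t30 Jul 2002\t<2@e.f>\t\t2048\t41\n' >> "$OV"

# an XOVER-style listing of article number and subject
awk -F'\t' '{ print $1, $2 }' "$OV"
# prints: 241 Re: chord voicings
#         242 New compositions
```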
</section>
<section><title>Batching feeds with UUCP and NNTP</title>
<para>Some information about batching feeds has been provided in earlier
sections. More will be added later here in this document.</para>
</section>
</chapter>


@ -0,0 +1,96 @@
<chapter><title>Wrapping up</title>
<section><title>Acknowledgements</title>
<para>
This HOWTO is a by-product of many years of experience setting up and
managing Usenet news servers. We have learned a lot from those who have
trod the path ahead of us. Some of them include the team of the ERNET
Project, which brought the Internet technology to India's academic
institutions in the early nineties. We especially remember what we have
learned from the SIGSys Group of the Department of Computer Science of
the Indian Institute of Technology, Mumbai. We have also benefited
enormously from the guidance we received from the Networking Group at
the NCST in Mumbai, especially from Geetanjali Sampemane.
</para>
<para>On a wider scale, our learning along the path of systems and
networks started with Unix, without which our appreciation of computer
systems would have remained very fragmented and superficial. Our
insight into Unix came from our village elders at the Department
of Computer Science of the IIT at Mumbai, especially from ``Hattu,''
``Sathe,'' and ``Sapre,'' none of whom are with the IIT today, and from
Professor D. B. Phatak and others, many of whom, luckily, are still with
the IIT.</para>
<para>Coming down to specifics, all the members of Starcom Software who
have worked on various problems with networking, Linux, and Usenet news
installations, have helped the authors in understanding what works and
what doesn't. Without their work, this HOWTO would have been a dry text
book.</para>
</section>
<section><title>Comments invited</title>
<para>Your comments and contributions are invited. We cannot possibly
write all sections of this HOWTO based on our knowledge alone. Please
contribute all you can, starting with minor corrections and bug fixes
and going on to entire sections and chapters. Your contributions will be
acknowledged in the HOWTO.</para>
</section>
<section><title>Copyright</title>
<para>
Copyright (c) 2002 by Starcom Software Private Limited, India
</para>
<para>Please freely copy and distribute (sell or give away) this
document in any format. It is requested that corrections and/or
comments be forwarded to the document maintainer, reachable at
<literal>usenet@starcomsoftware.com</literal>. When these comments
and contributions are incorporated into this document and released
for distribution in future versions of this HOWTO, the content of the
incorporated text will become the copyright of Starcom Software Private
Limited. By submitting your contributions to us, you implicitly agree
to these terms.</para>
<para>You may create a derivative work and distribute it provided that
you:</para>
<orderedlist>
<listitem><para>
Send your derivative work (in the most suitable format such as SGML) to the
LDP (Linux Documentation Project) or the like for posting on the Internet.
If not the LDP, then let the LDP know where it is available.
</para></listitem>
<listitem><para>
License the derivative work with this same license or use GPL. Include a
copyright notice and at least a pointer to the license used.
</para></listitem>
<listitem><para>
Give due credit to previous authors and major contributors.
If you're considering making a derived work other than a
translation, it is requested that you discuss your plans with the
current maintainer.
</para></listitem>
</orderedlist>
</section>
<section><title>About Starcom Software Private Limited</title>
<para>
<emphasis role="bold">starcom</emphasis> (Starcom Software Private
Limited, <literal>www.starcomsoftware.com</literal>) has been building
products and solutions using Linux and Web technology since 1996. Our
entire office runs on Linux, and we have built mission-critical
solutions for some of the top corporate entities in India and abroad.
Our client list includes arguably the world's largest securities
depository (The National Securities Depository of India Limited), one of
the world's top five stock exchanges in terms of trading volumes (The
National Stock Exchange of India Limited), and one of India's premier
financial institutions, which is listed on the NYSE. In all these cases,
we have introduced them to Linux, and in many cases, we have built them
their first mission-critical business applications on Linux. Contact the
authors or check the Website for more information about the work we have done.
</para>
</section>
</chapter>


@ -0,0 +1,218 @@
<chapter><title>Documentation and information</title>
<section><title>The manpages</title>
<para>The following manpages are installed automatically when our
integrated software distribution is compiled and installed, listed here
in no particular order:</para>
<itemizedlist>
<listitem><para><literal>badexpiry:</literal>
utility to look for articles with bad explicit Expiry headers
</para></listitem>
<listitem><para><literal>checkactive:</literal>
utility to perform some sanity checks on the <literal>active</literal>
file
</para></listitem>
<listitem><para><literal>cnewsdo:</literal>
utility to perform some checks and then run C-News maintenance commands
</para></listitem>
<listitem><para><literal>controlperm:</literal>
configuration file for controlling responses to Usenet control messages
</para></listitem>
<listitem><para><literal>expire:</literal>
utility to expire old articles
</para></listitem>
<listitem><para><literal>explode:</literal>
internal utility to convert a master batch file to ordinary batch files
</para></listitem>
<listitem><para><literal>inews:</literal>
the program which forms the entry point for fresh postings to be
injected into the Usenet system
</para></listitem>
<listitem><para><literal>mergeactive:</literal>
utility to merge one site's newsgroups to another site's
<literal>active</literal> file
</para></listitem>
<listitem><para><literal>mkhistory:</literal>
utility to rebuild news <literal>history</literal> file
</para></listitem>
<listitem><para><literal>news(5):</literal>
description of Usenet news article file and batch file formats
</para></listitem>
<listitem><para><literal>newsaux:</literal>
a collection of C-News utilities used by its own scripts and by the
Usenet news administrator for various maintenance purposes
</para></listitem>
<listitem><para><literal>newsbatch:</literal>
covers all the utilities and programs which are part of the news
batching system of C-News
</para></listitem>
<listitem><para><literal>newsctl:</literal>
describes the file formats and uses of all the files in
<literal>$NEWSCTL</literal> other than the two key files,
<literal>sys</literal> and <literal>active</literal>
</para></listitem>
<listitem><para><literal>newsdb:</literal>
describes the key files and directories for news articles, including the
structure of <literal>$NEWSARTS</literal>, the <literal>active</literal>
file, the <literal>active.times</literal> file, and the
<literal>history</literal> file.
</para></listitem>
<listitem><para><literal>newsflag:</literal>
utility to change the flag or type column of a newsgroup in the
<literal>active</literal> file
</para></listitem>
<listitem><para><literal>newsmail:</literal>
utility scripts used to send and receive newsfeeds by email. This is
different from a mail-to-news gateway, since this is for communication
between two Usenet news servers.
</para></listitem>
<listitem><para><literal>newsmaint:</literal>
utility scripts used by Usenet administrator to manage and maintain
C-News system
</para></listitem>
<listitem><para><literal>newsoverview(5):</literal>
file formats for the NOV database
</para></listitem>
<listitem><para><literal>newsoverview(8):</literal>
library functions of the NOV library and the utilities which use them
</para></listitem>
<listitem><para><literal>newssys:</literal>
the important <literal>sys</literal> file of C-News
</para></listitem>
<listitem><para><literal>relaynews:</literal>
the <literal>relaynews</literal> program of C-News
</para></listitem>
<listitem><para><literal>report:</literal>
utility to generate and send email reports of errors and events from
C-News scripts
</para></listitem>
<listitem><para><literal>rnews:</literal>
receive news batches and queue them for processing
</para></listitem>
<listitem><para><literal>nntpd:</literal>
The NNTP daemon
</para></listitem>
<listitem><para><literal>nntpxmit:</literal>
The NNTP batch transmit program for outgoing push feeds
</para></listitem>
</itemizedlist>
</section>
<section><title>The C-News guide</title>
<para>This document is part of the C-News source, and is available in
the <literal>c-news/doc</literal> directory of the source tree. The
<literal>makefile</literal> here uses <literal>troff</literal> to
generate <literal>guide.ps</literal> from the source files. The C-News
Guide is a very well-written introduction to the functioning of
C-News.</para>
</section>
<section><title>O'Reilly's books on Usenet news</title>
<para>O'Reilly and Associates published an excellent book that can lay
the foundations for understanding C-News and Usenet news in general,
titled ``Managing UUCP and Usenet,'' dated 1992. It came to be
considered a bit dated because it did not cover INN or the Internet
protocols.</para>
<para>They have subsequently published a more recent book, titled
``Managing Usenet,'' written by Henry Spencer, the co-author of C-News,
and David Lawrence, one of the most respected Usenet veterans and
administrators today. The book was published in 1998 and includes both
C-News and INN.</para>
<para>We have a distinct preference for books published by O'Reilly; we
usually find them the best books on their subjects. We make no attempts
to hide this bias. We recommend both books. We believe that there is
very little in this HOWTO of value to someone who studies one of these
books and then peruses information on the Internet.</para>
</section>
<section><title>Usenet-related RFCs</title>
<para>TO BE ADDED</para>
</section>
<section><title>The source code</title>
<para>TO BE ADDED</para>
</section>
<section><title>Usenet newsgroups</title>
<para>There are many discussion groups on the Usenet dedicated to the
technical and non-technical issues in managing a Usenet server and
service. These are:</para>
<itemizedlist>
<listitem><para><literal>news.admin.technical:</literal>
Discusses technical issues in administering Usenet news
</para></listitem>
<listitem><para><literal>news.admin.policy:</literal>
Discusses policy issues of Usenet news
</para></listitem>
<listitem><para><literal>news.software.b:</literal>
Discusses the source, configuration and bugs (if any) of C-News; no
separate newsgroup was created when B-News gave way to C-News
</para></listitem>
</itemizedlist>
<para>MORE WILL BE ADDED LATER</para>
</section>
<section><title>We</title>
<para>We, at Starcom Software, offer the services of our Usenet news
team, providing assistance by email to the Linux and Usenet
administrator community, on a best-effort basis.</para>
<para>We will endeavour to answer all queries sent to
<literal>usenet@starcomsoftware.com</literal>, pertaining to the source
distribution we have put together and its configuration and maintenance,
and also pertaining to general technical issues related to running a
Usenet news service off a Unix or Linux server.</para>
<para>We may not be in a position to assist with software components we
are not familiar with, <emphasis>e.g.</emphasis> Leafnode, or platforms
we do not have access to, <emphasis>e.g.</emphasis> SGI IRIX. Intel
Linux will be supported as long as our group is alive; our entire office
runs on Linux servers and diskless Linux desktops.</para>
<para>You need not be dependent on us, because we have neither
proprietary knowledge nor proprietary closed-source software. All the
extensions we are currently making to C-News and NNTPd will
immediately be made available to the Internet.</para>
</section>
</chapter>

<chapter><title>Setting up INN</title>
<section><title>Getting the source</title>
<para>INN has been maintained and archived by the ISC (Internet Software
Consortium, <literal>www.isc.org</literal>) since 1996, and the INN
homepage is at <literal>http://www.isc.org/products/INN/</literal>. The
latest release of INN as of this writing is INN v2.3.3,
released 7 May 2002. The full sources can be downloaded from
<literal>ftp://ftp.isc.org/isc/inn/inn-2.3.3.tar.gz</literal>.</para>
</section>
<section><title>Compiling and installing</title>
<para>TO BE ADDED LATER.</para>
</section>
<section><title>Configuring the system</title>
<para>TO BE ADDED LATER.</para>
</section>
<section><title>Setting up <literal>pgpverify</literal></title>
<para>TO BE ADDED LATER.</para>
</section>
<section><title>Feeding off an upstream neighbour</title>
<para>TO BE ADDED LATER.</para>
</section>
<section><title>Setting up outgoing feeds</title>
<para>TO BE ADDED LATER.</para>
</section>
<section id=innefficiency>
<title>Efficiency issues and advantages</title>
<para>TO BE ADDED LATER.</para>
</section>
</chapter>

<chapter><title>Connecting email with Usenet news</title>
<para>
Usenet news and mailing lists constantly remind us of each other. And the
parallels are so strong that many mailing lists are gatewayed two-way
with corresponding Usenet newsgroups, in the <literal>bit</literal> hierarchy
which maps onto the old BITNET, and elsewhere.
</para>
<para>
There are probably ten different situations where a mailing list is
better, and ten others where the newsgroup approach works better. The
point to recognise is that the system administrator needs the choice of
gatewaying one with the other whenever the tradeoffs justify it. Rather
than getting into the tradeoffs themselves, this chapter focuses on the
mechanisms of gatewaying between the two worlds.
</para>
<para>
One clear and recurring use we find for this gatewaying is for mailing
lists which are of general interest to many employees in a corporate
network. For instance, in a stockbroking company, many employees may
like to subscribe to a business news mailing list. If each employee
subscribed to the mailing list independently, it would waste mail spool
area and perhaps bandwidth. In such situations, we receive the mailing
list into an internal newsgroup, so that individual mailboxes are not
overloaded. Everyone can then read the newsgroup, and messages are also
archived till they expire.
</para>
<section><title>Feeding Usenet news to email</title>
<para>
In C-News, this is trivially done by adding one line to the
<literal>sys</literal> file, defining a new outgoing feed that lists all
the relevant groups and distributions, and specifying the command line
to be executed to send each outgoing message to that ``feed.'' This
command, in our case, should be a mail-sending program,
<emphasis>e.g.</emphasis>
<literal>/bin/mail user@somewhere.com</literal>. This is often adequate to get
the job done. We are sure almost every Usenet news software system will have
an equally easy way of piping the feed of a newsgroup to an email address.
</para>
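<para>As an illustrative sketch (the exact field layout and flag
semantics are documented in <literal>newssys</literal>; the feed name,
group and address below are our inventions), such a feed entry might
look like this:</para>

```
# hypothetical sys entry: pipe each article in starcom.lists.biznews,
# distribution starcom, into a mail command
bizmail:starcom.lists.biznews/starcom::/bin/mail user@somewhere.com
```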
</section>
<section><title>Feeding email to news: the <literal>mail2news gateway</literal></title>
<para>Integrated with our Usenet software sources is a set of
scripts which we have been using internally for at least five years.
This set of scripts is called <literal>mail2news</literal>. It contains
one shell script, also called <literal>mail2news</literal>, which takes
an email message from <literal>stdin</literal>, processes it, and feeds
the processed version to <literal>inews</literal>, the
<literal>stdin</literal>-based news article injection utility of C-News.
The <literal>inews</literal> utility accepts a new article post on its
<literal>stdin</literal> and queues it for digestion by
<literal>newsrun</literal> whenever it runs next.</para>
<para>To use <literal>mail2news</literal>, we assume you are using
Sendmail to process incoming email. Our instructions can easily be
modified to adapt to any Mail Transport Agent (MTA) of your choice. You
will have to configure Sendmail or any other MTA to redirect incoming
mails for the gateway to a program called <literal>m2nmailer</literal>,
a Perl script which accepts the incoming message on its standard input
and a space-separated list of newsgroup names on its command line.
Sendmail can be easily configured to trigger <literal>m2nmailer</literal>
this way by defining a new mailer in <literal>sendmail.cf</literal>,
and directing all incoming emails meant for the Usenet news system to
this mailer. Once you set up the appropriate rulesets for Sendmail,
it automatically triggers <literal>m2nmailer</literal> each time an
incoming email comes for the <literal>mail2news</literal>
gateway.</para>
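<para>For illustration only, the mailer definition in
<literal>sendmail.cf</literal> might look like the sketch below. The
path and flags are assumptions on our part; treat the chapter titled
``Setting up C-News + NNTPd'' as authoritative.</para>

```
# hypothetical mailer definition; adjust P= to where m2nmailer lives
Mm2n,  P=/usr/lib/newsbin/m2nmailer, F=lsDFMu, S=10, R=20,
       A=m2nmailer $u
```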
<para>The precise configuration changes to Sendmail have already been
specified in the chapter titled ``Setting up C-News + NNTPd.''</para>
</section>
<section><title>Using GNU Mailman as an email-NNTP gateway</title>
<para>TO BE ADDED LATER</para>
<section><title>GNU's all-singing all-dancing MLM</title>
<para>TO BE ADDED LATER</para>
</section>
<section><title>Features of GNU Mailman</title>
<para>TO BE ADDED LATER</para>
</section>
<section><title>Gateway features connecting NNTP and email</title>
<para>TO BE ADDED LATER</para>
</section>
</section>
</chapter>

<chapter><title>Monitoring and administration</title>
<para>
Once the Usenet news system is in place and running, the news
administrator is aided in monitoring it by various reports generated by
the system. He also needs to make regular checks in specific directories
and files to ascertain that the system is working smoothly.
</para>
<section><title>The <literal>newsdaily</literal> report</title>
<para>
This report is generated by <literal>newsdaily</literal>, which is
typically run through <literal>cron</literal>. We enumerate below some
of the problems reported, based on what we have seen.
</para>
<itemizedlist>
<listitem><para>bad input batches:
This reports articles that have been processed, declared bad, and hence
not digested. The reason is not mentioned; you are expected to check
the article and determine the cause.
</para></listitem>
<listitem><para>leading unknown newsgroups by articles:
This gives a list of newsgroups whose hierarchies have been subscribed
to, but which do not themselves appear in the active file. You could
add such a newsgroup to the active file if you think it important
enough.
</para></listitem>
<listitem><para>leading unsubscribed newsgroups:
This lists the unsubscribed newsgroups for which the news server
receives the largest number of articles. You cannot really do much
about this, except to subscribe to them if they are required.
</para></listitem>
<listitem><para>leading sites sending bad headers:
This lists the NDNs which are sending you articles with malformed or
insufficient headers.
</para></listitem>
<listitem><para>leading sites sending stale/future/misdated news:
This lists the NDNs which are sending you articles whose dates fall
outside the window you have specified for accepting feeds.
</para></listitem>
<listitem><para>Some of the reports generated by us:
We have modified the newsdaily script to include some more statistics.
<itemizedlist>
<listitem><para>disk usage:
This reports the size in bytes of the <literal>$NEWSARTS</literal>
area. If you are receiving feeds regularly, you should see this figure
increasing.
</para></listitem>
<listitem><para>incoming feed statistics:
This reports the number of articles and total bytes received from each
of your NDNs.
</para></listitem>
<listitem><para>NNTP traffic report:
The output of <literal>nestor</literal> has also been included in this
report; it gives details of each NNTP connection and of the overall
performance of the network connection, read from the newslog file. To
understand the format, read the manpage of <literal>nestor</literal>.
</para></listitem>
</itemizedlist>
</para></listitem>
<listitem><para>Error reporting from the errorlog file:
Reports errors logged in the errorlog file. Usually these are
file-ownership or missing-file problems, which can be easily handled.
</para></listitem>
</itemizedlist>
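<para>For illustration, assuming <literal>newsdaily</literal> lives in
<literal>$NEWSBIN/maint</literal> (the path below is an assumption), a
crontab entry for the news user might read:</para>

```
# run newsdaily once every night
59 23 * * *    /usr/lib/newsbin/maint/newsdaily
```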
</section>
<section><title>Crisis reports from <literal>newswatch</literal></title>
<para>
Most of the problems reported to us are ones of either space shortage
or persistent locks. There are instances when the scripts have created
lock files and have aborted or terminated without removing them.
Sometimes these are innocuous enough to be deleted, but this should be
determined after careful analysis; they could be an indication of some
part of the system not working correctly. For example, we would receive
this error message whenever <literal>sendbatches</literal> terminated
abnormally while trying to transmit huge <literal>togo</literal> files,
and we had to determine why <literal>sendbatches</literal> was failing
so often.
</para>
<para>
The space shortage issue has to be addressed immediately. You could
delete unwanted articles by running <literal>doexpire</literal>, or add
more disk space at the OS level. The latter seems the better option.
</para>
</section>
<section><title>Disk space</title>
<para>
The <literal>$NEWSBIN</literal> area occupies a fixed amount of space.
Since the binaries do not grow once installed, you do not have to worry
about disk shortage here. The areas that take up more space as feeds
come in are <literal>$NEWSCTL</literal> and
<literal>$NEWSARTS</literal>. <literal>$NEWSCTL</literal> has log files
that keep growing with each feed, and as articles are digested in huge
numbers, <literal>$NEWSARTS</literal> continues to grow. Also, if
articles are being archived on expiry, you will need space for that.
Allocate a few GB of disk space for <literal>$NEWSARTS</literal>,
depending on the number of hierarchies you subscribe to and the feeds
that come in every day. <literal>$NEWSCTL</literal> grows in lesser
proportion than <literal>$NEWSARTS</literal>; allocate space for it
accordingly.
</para>
</section>
<section><title>CPU load and RAM usage</title>
<para>With modern C-News and NNTPd, there is very little usage of these
system resources for processing news article flow. Key components like
<literal>newsrun</literal> or <literal>sendbatches</literal> do not load
the system much, except for cases where you have a very heavy flow of
compressed outgoing batches and the compression utility is run by
<literal>sendbatches</literal> frequently. <literal>newsrun</literal> is
amazingly efficient in the current C-News release. Even when it takes
half an hour to digest a large consignment of batches, it hardly loads
a slow 200 MHz Pentium CPU or consumes much RAM in a 64 MB
system.</para>
<para>One thing which does slow down a system is a large number of
users connecting over NNTP to browse newsgroups. We do not have
empirical figures off-hand to provide guidance on
resource consumption for this, but we have found that the load on the
CPU and RAM for a certain number of active users invoking
<literal>nntpd</literal> is more than with an equal number of
users connecting to the POP3 port of the same system for pulling
out mailboxes. A few hundred active NNTP users can really slow down
a dual-P-III Intel Linux server, for instance. This loading has no
bearing on whether you are using INN or <literal>nntpd</literal>;
both have practically identical implementations for NNTP
<emphasis>reading</emphasis> and differ only in their handling of
feeds.</para>
<para>Another situation which will slow down your Usenet news server is
when downstream servers connect to you for pulling out NNTP feeds using
the pull method. This has been mentioned before. This can really load
your server's I/O system and CPU.</para>
</section>
<section><title>The <literal>in.coming/bad</literal> directory</title>
<para>
The <literal>in.coming</literal> directory is where batches and
articles reside after you have received feeds from your NDNs and before
they are processed. Checking this directory regularly to see if there
are batches is a good way of determining that feeds are coming in.
Batches and individual articles follow different nomenclature: names
like <literal>nntp.GxhsDj</literal> indicate batches, while individual
articles have names beginning with digits, like
<literal>0.10022643380.t</literal>.
</para>
<para>
The <literal>bad</literal> sub-directory under
<literal>in.coming</literal> holds batches and articles that
encountered errors while being processed by
<literal>relaynews</literal>. You will have to look into this directory
to determine the cause. Ideally, this directory should be empty.
</para>
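<para>A quick check can be scripted. This is only a sketch, assuming
the conventional spool location; substitute your own
<literal>$NEWSARTS</literal>:</para>

```shell
#!/bin/sh
# Count pending and bad batches in the C-News spool.
# The default spool path is an assumption; override with $NEWSARTS.
SPOOL=${NEWSARTS:-/var/spool/news}
pending=$(ls "$SPOOL/in.coming" 2>/dev/null | grep -v '^bad$' | grep -c '')
bad=$(ls "$SPOOL/in.coming/bad" 2>/dev/null | grep -c '')
echo "pending batches: $pending, bad batches: $bad"
```

<para>A steadily non-zero pending count means feeds are coming in; a
non-zero bad count calls for investigation.</para>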
</section>
<section><title>Long pending queues in <literal>out.going</literal></title>
<para>TO BE ADDED.</para>
</section>
<section><title>Problems with <literal>nntpxmit</literal> and <literal>nntpsend</literal></title>
<para>TO BE ADDED.</para>
</section>
<section><title>The <literal>junk</literal> and <literal>control</literal> groups</title>
<para>
Control messages are those that carry a newgroup, rmgroup, cancel or
checkgroup request. Such messages result in
<literal>relaynews</literal> calling the appropriate script, and on
execution a message is mailed to the administrator about the action
taken. These control messages are stored in the control directory of
<literal>$NEWSARTS</literal>. For the propagation of such messages, one
must subscribe to the control hierarchy.
</para>
<para>
When your news system determines that a certain article belongs to no
newsgroup you have subscribed to, it is ``junked,''
<emphasis>i.e.</emphasis> the article appears in the junk directory.
This directory plays a key role in transferring articles to your NDNs,
as they would subscribe to the junk hierarchy to receive feeds. If you
are a leaf node, there is no reason why articles should pile up here;
keep deleting them on a daily basis.
</para>
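<para>A daily sweep is easy to script. This is a sketch, with the spool
path assumed; it can be run from <literal>cron</literal> as the news
user:</para>

```shell
#!/bin/sh
# Remove junked articles more than a day old.
# The spool path is an assumption; override with $NEWSARTS.
JUNK=${NEWSARTS:-/var/spool/news}/junk
if [ -d "$JUNK" ]; then
    find "$JUNK" -type f -mtime +1 -exec rm -f {} \;
fi
echo "junk sweep of $JUNK done"
```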
</section>
</chapter>

<chapter><title>Our perspective</title>
<para>
This chapter has been added to allow us to share our perspective on
certain technical choices. Issues which are more a matter of opinion
than of detail are discussed here.
</para>
<section id=feedefficiency><title>Efficiency issues of NNTP</title>
<para>
To understand why NNTP is often an inappropriate choice for
newsfeeds, we need to understand TCP's sliding window protocol
and the nature of NNTP. NNTP is an appalling waste of bandwidth
for most bulk article transfer situations, for the
following simple reasons:
</para>
<itemizedlist>
<listitem><para>
<emphasis>No compression</emphasis>: articles are transferred in plain text.
</para></listitem>
<listitem><para>
<emphasis>No article transmission restart</emphasis>: if a
connection breaks halfway through an article, the next round
will have to start with the beginning of the article.
</para></listitem>
<listitem><para>
<emphasis>Ping-pong protocol</emphasis>: NNTP is unsuitable for
bulk streaming data transfer.
</para></listitem>
</itemizedlist>
<para>
A word of explanation on the ping-pong issue is perhaps
needed here. TCP uses a sliding window mechanism to pump out
data in one direction very rapidly, and can achieve near
wire speeds under most circumstances. However, this only
works if the application layer protocol can aggregate a
large amount of data and pump it out without having to stop
every so often, waiting for an ack or a response from the
other end's application layer. This is precisely why sending
one file of 100 Mbytes by FTP takes so much less clock time
than 10,000 files of 10 Kbytes each, all other parameters
remaining unchanged. The trick is to keep the sliding window
sliding smoothly over the outgoing data, blasting packets
out as fast as the wire will carry them, without ever
allowing the window to empty out while you wait for an ack.
Protocols which require short bursts of data from either end
constantly, <emphasis>e.g.</emphasis> in the case of remote
procedure calls, are called ``ping pong protocols'' because they
remind you of a table-tennis ball.
</para>
<para>
With NNTP, this is precisely the problem. The average size
of Usenet news messages, including header and body, is
3 Kbytes. When thousands of such articles are sent out by
NNTP, the sending server has to send the message ID of the
first article, then wait for the receiving server to respond
with a ``yes'' or ``no.'' Once the sending server gets the
``yes,'' it sends out that article, and waits for an ``ok''
from the receiving server. Then it sends out the message ID
of the second article, and waits for another ``yes'' or
``no.'' And so on. The TCP sliding window never gets to do
its job.
</para>
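<para>A back-of-the-envelope calculation shows how low the ceiling is.
The 3 Kbyte average article size is from the discussion above; the
300 ms round-trip time and the two round trips per article are our
illustrative assumptions:</para>

```shell
#!/bin/sh
# Throughput ceiling of a strict offer/response article dialogue.
awk 'BEGIN {
    art_kb = 3      # average article size, Kbytes (from the text)
    rtt    = 0.3    # assumed round-trip time, seconds
    trips  = 2      # assumed: one RTT for the offer, one for the transfer
    rate = 1 / (trips * rtt)                  # articles per second
    printf "ceiling: %.1f articles/s, about %.1f Kbytes/s\n",
           rate, rate * art_kb
}'
```

<para>About 5 Kbytes/s, no matter how fast the wire is: the link sits
idle while each end waits for the other, which is exactly the sliding
window running dry.</para>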
<para>
This sub-optimal use of TCP's data pumping ability, coupled with
the absence of compression, make for a protocol which is great
for synchronous connectivity, <emphasis>e.g.</emphasis> for news
reading or real-time
updates, but very poor for batched transfer of data which can be
delayed and pumped out. All these are precisely reversed in the
case of UUCP over TCP.
</para>
<para>
To decide which protocol, UUCP over TCP or NNTP, is appropriate
for your server, you must address two questions:
</para>
<orderedlist>
<listitem><para>
How much time can your server afford to wait from the time
your upstream server receives an article to the time it
passes it on to you?
</para></listitem>
<listitem><para>
Are you receiving the same set of hierarchies from multiple
next-door neighbour servers, <emphasis>i.e.</emphasis> is your
newsfeed flow pattern a mesh instead of a tree?
</para></listitem>
</orderedlist>
<para>
If your answers to the two questions above are ``messages cannot
wait'' and ``we operate in a mesh'', then NNTP is the correct
protocol for your server to receive its primary feed(s).
</para>
<para>
In most cases, carrier-class servers operated by major service
providers do not want to accept even a minute's delay from the
time they receive an article to the time they retransmit it out.
They also operate in a mesh with other servers operated by their
own organisations (<emphasis>e.g.</emphasis> for redundancy) or
others. They usually
sit very close to the Internet backbone,
<emphasis>i.e.</emphasis> with Tier 1 ISPs,
and have extremely fast Internet links, usually more than
10 Mbits/sec. The amount of data that flows out of such servers
in outgoing feeds is more than the amount that comes in, because
each incoming article is retained, not for local consumption,
but for retransmission to others lower down in the flow. And
these servers boast of a retransmission latency of less than 30
seconds, <emphasis>i.e.</emphasis> I will retransmit an article
to you within 30 seconds of my having received it.
</para>
<para>
However, if your server is used by a company for making Usenet
news available for its employees, or by an institute to make the
service available for its students and teachers, then you are
not operating your server in a mesh pattern, nor do you mind it
if messages take a few hours to reach you from your upstream
neighbour.
</para>
<para>
In that case, you have enormous bandwidth to conserve by moving
to UUCP. Even if, in this Internet-dominated era, you have no
one to supply you with a newsfeed using dialup point-to-point
links, you can pick up a compressed batched newsfeed using UUCP
over TCP, over the Internet.
</para>
<para>
In this context, we want to mention Taylor UUCP, an excellent
UUCP implementation available under GNU GPL. We use this UUCP
implementation in preference to the bundled UUCP systems offered
by commercial Unix vendors, even for dialup connections, because
it is far more stable, offers higher performance, and always supports
file transfer restart. Over TCP/IP, Taylor is the only one we
have tried, and we have no wish to try any others.
</para>
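<para>For illustration, a Taylor UUCP <literal>sys</literal> entry for
picking up batches over TCP might look like the sketch below. The
system name, host name and login details are all invented; consult the
Taylor UUCP documentation for the authoritative syntax.</para>

```
system upstreamnews
time any
port type tcp
address news.upstream.example.com
chat "" \r ogin: Unews word: feedsecret
```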
<para>
Apart from its robustness, Taylor UUCP has one invaluable
feature critical to large Usenet batch transfers: file transfer
restart. If it is transferring a 10 MB batch, and the connection
breaks after 8 MB, it will restart precisely where it left off
last time. Therefore, no bytes of bandwidth are wasted, and
queues never get stuck forever. </para>
<para>
Over NNTP, since there is no batching, transfers happen one
article at a time. Considering the (relatively) small size of an
article compared to multi-megabyte UUCP batches, one would
expect that an article would never pose a major problem while
being transported; if it can't be pushed across in one attempt,
it'll surely be copied the next time. However, we have
experienced entire NNTP feeds getting stuck for days on end
because of one article, with logs showing the same article
breaking the connection over and over again while being
transferred <footnote><para>
This lack of a restart facility is something NNTP shares with
its older cousin, SMTP, and we have often seen email messages
getting stuck in a similar fashion over flaky data links. In
many such networks which we manage for our clients, we have
moved the inter-server mail transfer to Taylor UUCP, using UUCP
over TCP.</para></footnote>. Some rare articles can be
more than a megabyte in size, particularly in
<literal>comp.binaries</literal>. In each such incident, we have
had to manually edit the queue file on the transmitting server
and remove the offending article from the head of the queue.
Taylor UUCP, on the other hand, has never given us a single
hiccup with blocked queues.
</para>
<para>
We feel that the overwhelming majority of servers offering the
Usenet news service are at the leaf nodes of the Usenet news
flow, not at the heart. These servers are usually connected in a
tree, with each server having one upstream ``parent node'', and
multiple downstream ``child nodes.'' These servers receive their
bulk incoming feed from their upstream server, and their users
can tolerate a delay of a few hours for articles to move in and
out. If your server is in this class, we feel you should
consider using UUCP over TCP and transfer compressed batches.
This will minimise bandwidth usage, and if you operate using
dialup Internet connections, it will directly reduce your
expenses.
</para>
<para>
A word about the link between mesh-patterned newsfeed flow and
the need to use NNTP. If your server is receiving primary ---
as against trickle --- feeds from multiple next-door neighbours,
then you have to use NNTP to receive these feeds. The reason
lies in the way UUCP batches are accepted. UUCP batches are
received in their entirety into your server, and then they are
uncompressed and processed. When the sending server is giving
you the batch, it is not getting a chance to go through the
batch article by article and ask your server whether you have or
don't have each article. This way, if multiple servers give you
large feeds for the same hierarchies, then you will be bound to
receive multiple copies of each article if you go the UUCP way.
All the gains of compressed batches will then be neutralised.
NNTP's <literal>IHAVE</literal> and <literal>SENDME</literal>
dialogue in effect
permits precisely this double-check for each article, and thus
you don't receive even a single article twice.
</para>
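<para>Schematically, the NNTP side of this dialogue (response codes are
as defined in RFC 977; the message-IDs are invented) runs like
this:</para>

```
sender:   IHAVE <first.article@example.com>
receiver: 335 send article to be transferred
sender:   (article text, ending with a line containing a single dot)
receiver: 235 article transferred ok
sender:   IHAVE <second.article@example.com>
receiver: 435 article not wanted - do not send it
```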
<para>
For Usenet servers which connect to the Internet periodically
using dialup connections to fetch news, the UUCP option is
especially important. Their primary incoming newsfeed cannot be
pushed into them using queued NNTP feeds, for reasons described
in the <link linkend="dialupnonntp">paragraph</link> above. These
hapless servers are usually forced to pull out their articles
using a pull NNTP feed, which is often very slow. This may lead
to long connect times, repeat attempts after every line break,
and high Internet connection charges.
</para>
<para>
On the other hand, we have been using UUCP over TCP and
<literal>gzip</literal>'d batches for more than five years now
in a variety of sites. Even today, a full feed of all eight
standard hierarchies, plus the full
<literal>microsoft</literal>, <literal>gnu</literal>
and <literal>netscape</literal> hierarchies, minus
<literal>alt</literal> and <literal>comp.binaries</literal>, can
comfortably be handled in just a few hours of connect time every
night, dialing up to the
Internet at 33.6 or 56 Kbits/sec. We believe that the proverbial
`full feed' with all hierarchies including
<literal>alt</literal> can be handled comfortably with a 24-hour
link at 56 Kbits/sec, provided you forget about NNTP feeds. We
usually get compression ratios of 4:1 using
<literal>gzip -9</literal> on our news batches, incidentally.
</para>
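<para>The arithmetic behind the ``few hours'' claim is easy to redo for
your own volumes. The 4:1 compression ratio is from our experience
above; the 200 MB/day raw volume and the 3.5 Kbytes/s of effective
modem payload are illustrative assumptions:</para>

```shell
#!/bin/sh
# Nightly connect time for a compressed batched feed.
awk 'BEGIN {
    raw_mb    = 200    # assumed raw article volume per day, MB
    ratio     = 4      # gzip -9 compression ratio (from the text)
    link_kb_s = 3.5    # assumed effective payload of a 33.6K modem
    compressed_kb = raw_mb * 1024 / ratio
    hours = compressed_kb / link_kb_s / 3600
    printf "%.0f MB compressed, about %.1f hours of connect time\n",
           compressed_kb / 1024, hours
}'
```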
</section>
<section><title>C-News+NNTPd or INN?</title>
<para>
INN and CNews are the two most popular free software implementations
of Usenet news. Of these two, we prefer CNews, primarily because
we have been using it across a very large range of Unixen for more
than one decade, starting from its earliest release --- the so-called
``Shellscript release'' --- and we have yet to see a need to
change.<footnote><para>One of us did his first installation with
B-News, actually, at IIT Mumbai. Then we rapidly moved from there to
the C-News Shellscript Release, then the C-News Performance Release and
the C-News Cleanup Release, and our current release has fixed some bugs
in the latest Cleanup Release.</para></footnote>
</para>
<para>
We have seen INN, and we are not comfortable with a software
implementation which puts so much functionality inside one
executable. This reminds us of Windows NT, Netscape Communicator,
and other complex and monolithic systems, which make us uncomfortable
with their opaqueness. We feel that CNews' architecture, which comprises
many small programs, intuitively fits the Unix approach of building
large and complex systems, where each piece can be understood, debugged,
and if needed, replaced, individually.
</para>
<para>
Secondly, we seem to see the move towards INN accompanied by a move
towards NNTP as a primary newsfeed mechanism. This is no fault of INN;
we suspect it is a sort of cultural difference between INN users and
CNews users. We find the issue of UUCP versus NNTP for batched newsfeeds
a far more serious issue than the choice of CNews versus INN. We simply
cannot agree with the idea that NNTP is an appropriate protocol for bulk
Usenet feeds for most sites. Unfortunately, we seem to find that most
sites which are more comfortable using INN seem to also prefer NNTP over
UUCP, for reasons not clear to us.
</para>
<para>
Our comments should not be taken as expressing any reservation about
INN's quality or robustness. Its popularity is testimony to its
quality; it most certainly ``gets the job done'' as well as anything
else. In addition, there are a large number of commercial Usenet news
server implementations which have started with the INN code; we do not
know of any which have started with the CNews code. The Netwinsite DNews
system and the Cyclone Typhoon, we suspect, both are INN-spired.
</para>
<para>
We will recommend CNews and NNTPd over INN, because we are more
comfortable with the CNews architecture for reasons given above, and we
do not run carrier-class sites. We will continue to support, maintain and
extend this software base, at least for Linux. And we see no reason for
the overwhelming majority of Usenet sites to be forced to use anything
else. Your viewpoints are welcome.
</para>
<para>
Had we been setting up and managing carrier-class sites with their
near-real-time throughput requirements, we would probably not have
chosen CNews. And for those situations, our opinion of NNTP versus
compressed UUCP has been discussed in <xref linkend="feedefficiency"/>
</para>
<para>
Suck and Leafnode have their place in the range of options, where they
appear to be attractive for novices who are intimidated by the ``full
blown'' appearance of CNews+NNTPd or INN. However, we run CNews + NNTPd
even on Linux laptops. We suspect INN can be used this way too. We do
not find these ``full blown'' implementations any more resource
hungry than their simpler cousins. Therefore, other than administration
and configuration familiarity, we don't see any reason why even a
solitary end-user would choose Leafnode or Suck over CNews+NNTPd. As
always, contrary opinions are invited.
</para>
</section>
</chapter>

<chapter><title>Principles of Operation</title>
<para>Here we discuss the basic concepts behind the operation of a Usenet news
system.</para>
<section><title>Newsgroups and articles </title>
<para>A Usenet news article sits in a file or in some other on-disk
data structure on the disks of a Usenet server, and its contents look
like this:</para>
<programlisting>
<![CDATA[
Xref: news.starcomsoftware.com starcom.tech.misc:211 starcom.tech.security:452
Newsgroups: starcom.tech.misc,starcom.tech.security
Path: news.starcomsoftware.com!purva!shuvam
From: Shuvam <shuvam@starcomsoftware.com>
Subject: "You just throw up your hands and reboot" (fwd)
Content-Type: TEXT/PLAIN; charset=US-ASCII
Distribution: starcom
Organization: Starcom Software Pvt Ltd, India
Message-ID: <Pine.LNX.4.31.0107022153490.30462-100000@starcomsoftware.com>
Mime-Version: 1.0
Date: Mon, 2 Jul 2001 16:27:57 GMT

Interesting quote, and interesting article.

Incidentally, comp.risks may be an interesting newsgroup to follow. We
must be receiving the feed for this group on our server, since we
receive all groups under comp.*, unless specifically cancelled. Check it
out sometime.

comp.risks tracks risks in the use of computer technology, including
issues in protecting ourselves from failures of such stuff.

Shuvam

> Date: Thu, 14 Jun 2001 08:11:00 -0400
> From: "Chris Norloff" <cnorloff@norloff.com>
> Subject: NYSE: "Throw up your hands and reboot"
>
> When the New York Stock Exchange computer systems crashed for 85
> minutes (8 Jun 2001), Andrew Brooks, chief of equity trading at
> Baltimore mutual fund giant T. Rowe Price, was quoted as saying "Hey,
> we're all subject to the vagaries of technology. It happens on your
> own PC at home. You just throw up your hands and reboot."
>
> http://www.washingtonpost.com/ac3/ContentServer?articleid=A42885-2001Jun8&pagename=article
>
> Chris Norloff
>
>
> This is from --
>
> From: risko@csl.sri.com (RISKS List Owner)
> Newsgroups: comp.risks
> Subject: Risks Digest 21.48
> Date: Mon, 18 Jun 2001 19:14:57 +0000 (UTC)
> Organization: University of California, Berkeley
>
> RISKS-LIST: Risks-Forum Digest Monday 19 June 2001
> Volume 21 : Issue 48
>
> FORUM ON RISKS TO THE PUBLIC IN COMPUTERS AND RELATED SYSTEMS (comp.risks)
> ACM Committee on Computers and Public Policy,
> Peter G. Neumann, moderator
>
> This issue is archived at <URL:http://catless.ncl.ac.uk/Risks/21.48.html>
> and by anonymous ftp at ftp.sri.com, cd risks .
>
]]>
</programlisting>
<para>A Usenet article's header is very interesting if you want to learn
about the functioning of the Usenet. The <literal>From</literal>,
<literal>Subject</literal>, and <literal>Date</literal> headers are
familiar to anyone who has used email. The <literal>Message-ID</literal>
header contains a unique ID for each message, and is present in each
email message, though not many non-technical email users know about it.
The <literal>Content-Type</literal> and <literal>Mime-Version</literal>
headers are used for MIME encoding of articles, attaching files, and
so on, just like in email messages.</para>
<para>The <literal>Organization</literal> header is an informational header
which is supposed to carry some information identifying the organisation
to which the author of the article belongs. What remains now are the
<literal>Newsgroups</literal>, <literal>Xref</literal>,
<literal>Path</literal> and <literal>Distribution</literal> headers.
These are special to Usenet articles and are very important.</para>
<para>The <literal>Newsgroups</literal> header specifies which newsgroups
this article should belong to. The <literal>Distribution</literal>
header, sadly under-utilised in today's globalised Internet world,
allows the author of an article to specify how far the article will be
re-transmitted. The author of an article, working in conjunction with
well-configured networks of Usenet servers, can control the ``radius'' of
replication of his article, thus posting an article of local significance
into a newsgroup and setting the <literal>Distribution</literal> header to
some suitable value, <emphasis>e.g.</emphasis> <literal>local</literal>
or <literal>starcom</literal>, to prevent the article from being relayed
to servers outside the specified domain.</para>
<para>The <literal>Xref</literal> header specifies the precise
<emphasis role="strong">article number</emphasis> of this article in each of the
newsgroups in which it is inserted, for the current server. When an
article is copied from one server to another as part of a newsfeed,
the receiving server throws away the old <literal>Xref</literal> header
and inserts its own, with its own article numbers. This indicates an
interesting feature of the Usenet system: each article in a Usenet server
has a unique number (an integer) for each newsgroup it is a part of.
Our sample above has been added to two newsgroups on our server, and has
the article numbers 211 and 452 in those groups. Therefore, any Usenet
client software can query our server and ask for article number 211 in
the newsgroup <literal>starcom.tech.misc</literal> and get this article.
Asking for article number 452 in <literal>starcom.tech.security</literal>
will fetch the article too. On another server, the numbers may be very
different.</para>
<para>The <literal>Path</literal> header specifies the list of machines through
which this article has travelled before it has reached the current
server. UUCP-style syntax is used for this string. The current
example indicates that a user called <literal>shuvam</literal> first
wrote this article and posted it onto a computer which calls itself
<literal>purva</literal>, and this computer then transferred this article
by a newsfeed to <literal>news.starcomsoftware.com</literal>. The
<literal>Path</literal> header is critical for breaking loops in
newsfeeds, and will be discussed in detail later.</para>
<para>Our sample article will sit in the two newsgroups listed above
forever, unless expired. The Usenet software on a server is usually
configured to expire articles based on certain conditions,
<emphasis>e.g.</emphasis> after it's older than a certain number of
days. The C-News software we use allows expiry control based on the
newsgroup hierarchy and the type of newsgroup, <emphasis>i.e.</emphasis>
moderated or unmoderated. Against each class of newsgroups, it allows
the administrator to specify a number of days after which the article
will be expired. It is possible for an article to control its own
expiry, by carrying an <literal>Expires</literal> header specifying a
date and time. Unless overridden in the Usenet server software, the
article will be expired only after its explicit expiry time is
reached.</para>
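<para>Since a Usenet article follows the same RFC-822-style layout as an
email message, its headers can be examined with entirely ordinary tools.
The following Python sketch (purely illustrative, using an abbreviated
copy of the sample article above; it is not part of any news software)
pulls out the headers discussed here:</para>

```python
# Minimal sketch: parse the headers of a Usenet article with
# Python's standard email parser (the format is RFC-822-like).
from email.parser import Parser

raw_article = """\
Newsgroups: starcom.tech.misc,starcom.tech.security
From: Shuvam <shuvam@starcomsoftware.com>
Subject: "You just throw up your hands and reboot" (fwd)
Message-ID: <Pine.LNX.4.31.0107022153490.30462-100000@starcomsoftware.com>
Distribution: starcom
Date: Mon, 2 Jul 2001 16:27:57 GMT

Interesting quote, and interesting article.
"""

msg = Parser().parsestr(raw_article)
groups = [g.strip() for g in msg["Newsgroups"].split(",")]
print(groups)             # the newsgroups this article is filed under
print(msg["Message-ID"])  # the globally unique article identifier
```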
</section>
<section><title>Of readers and servers</title>
<para>Computers which access Usenet articles are broadly of two classes:
the readers and the servers. A Usenet server carries a repository of
articles, manages them, handles newsfeeds, and offers its repository to
authorised readers to read. A Usenet reader is merely a computer with
the appropriate software to allow a user to access a server, fetch
articles, post new articles, and keep track of which articles it has
read in each newsgroup. In terms of functionality, Usenet reading
software is less interesting to a Usenet administrator than a Usenet
server software. However, in terms of lines of code, the Usenet reader
software can often be much larger than Usenet server software, primarily
because of the complexities of modern GUI code.</para>
<para>Most modern computers almost exclusively access Usenet servers using
the NNTP (Network News Transfer Protocol) for reading and posting. This
protocol can also be used for inter-server communication, but those
aspects will be discussed later. The NNTP protocol, like any other
well-designed TCP-based Internet protocol, carries ASCII commands and
responses terminated with <literal>CR-LF</literal>, and comprises a
sequence of commands, somewhat reminiscent of the POP3 protocol for
email. Using NNTP, a Usenet reader program connects to a Usenet server,
asks for a list of active newsgroups, and receives this (often huge)
list. It then sets the ``current newsgroup'' to one of these, depending
on what the user wants to browse through. Having done this, it gets the
meta-data of all current articles in the group, including the author,
subject line, date, and size of each article, and displays an index of
articles to the user.</para>
<para>The user then scans through this list, selects an article, and
asks the reader to fetch it. The reader gives the article number of
this article to the server, and fetches the full article for the user
to read through. Once the user finishes his NNTP session, he exits,
and the reader program closes the NNTP socket. It then (usually)
updates a local file in the user's home area, keeping track of which
news articles the user has read. These articles are typically not shown
to the user next time, thus allowing the user to progress rapidly to new
articles in each session. The reader software is helped along in this
endeavour by the <literal>Xref</literal> header, using which it knows
all the different identities by which a single article is identified
in the server. Thus, if you read the sample article given above by
accessing <literal>starcom.tech.misc</literal>, you'll never be shown
this article again when you access <literal>starcom.tech.misc</literal>
or <literal>starcom.tech.security</literal>; your reader software will
do this by tracking the <literal>Xref</literal> header and mapping
article numbers.</para>
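<para>The bookkeeping just described can be sketched in a few lines of
Python. This is illustrative only, not actual newsreader code: it splits
an <literal>Xref</literal> header into (newsgroup, article number) pairs
so that every identity of an article can be marked read at once:</para>

```python
# Sketch: map an Xref header to (newsgroup, article-number) pairs.
# Xref format: "<servername> group1:num1 group2:num2 ..."
def parse_xref(xref):
    parts = xref.split()
    server, entries = parts[0], parts[1:]
    pairs = []
    for entry in entries:
        group, number = entry.rsplit(":", 1)
        pairs.append((group, int(number)))
    return server, pairs

server, pairs = parse_xref(
    "news.starcomsoftware.com "
    "starcom.tech.misc:211 starcom.tech.security:452")
# Marking both pairs as read ensures the article is never shown
# again, whichever of the two groups the user browses next.
```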
<para>When a user posts an article, he first composes his message using
the user interface of his reader software. When he finally gives the
command to send the article, the reader software contacts the Usenet
server using the pre-existing NNTP connection and sends the article to
it. The article carries a <literal>Newsgroups</literal> header with the
list of newsgroups to post to, often a <literal>Distribution</literal>
header with a distribution specification, and other headers
like <literal>From</literal>, <literal>Subject</literal>
<emphasis>etc.</emphasis> These headers are used by the server
software to do the right thing. Special and rare headers like
<literal>Expires</literal> and <literal>Approved</literal> are acted upon
when present. The server assigns a new article number to the article for
each newsgroup it is posted to, and creates a new <literal>Xref</literal>
header for the article.</para>
<para>Transfer of articles between servers is done in various ways, and
is discussed in quite a bit of detail in the section titled
``Newsfeeds'' below.</para>
</section>
<section ><title>Newsfeeds </title>
<section><title> Fundamental concepts</title>
<para>When we try to analyse newsfeeds in real life, we begin to see
that, for most sites, traffic flow is not symmetrical in both
directions. We usually find that one server will feed the bulk
of the world's articles to one or more secondary servers every
day, and receive a few articles written by the users of those
secondary servers in exchange. Thus, we usually find that
articles flow down from the stem to the branches to the leaves
of the worldwide Usenet server network, and not exactly in a totally
balanced mesh flow pattern. Therefore, we use the term
``upstream server'' to refer to the server from which we receive
the bulk of our daily dose of articles, and ``downstream
server'' to refer to those servers which receive the bulk dose
of articles from us.</para>
<para>Newsfeeds relay articles from one server to their ``next door
neighbour'' servers, metaphorically speaking. Therefore, articles
move around the globe, not by a massive number of single-hop
transfers from the originating server to every other server in
the world, but in a sequence of hops, like passing the baton in
a relay race. This increases the latency time for an article
to reach a remote tertiary server after, say, ten hops, but
it allows tighter control of what gets relayed at every hop,
and helps in redundancy, decentralisation of server loads,
and conservation of network bandwidth. In this respect, Usenet
newsfeeds are more complex than HTTP data flows, which
typically use single-hop techniques.</para>
<para>Each Usenet news server therefore has to worry about
newsfeeds each time it receives an article, either by a fresh post
or from an incoming newsfeed. When the Usenet server digests this
article and files it away in its repository, it simultaneously
looks through its database to see which other server it should
feed the article to. In order to do this, it carries out a
sequence of checks, described below.</para>
<para>Each server knows which other servers are its ``next door
neighbours;'' this information is kept in its newsfeed
configuration information. Against each of its ``next door
neighbours,'' there will be a list of newsgroups which it
wants, and a list of distributions. The new article's list of
newsgroups will be matched against the newsgroup list of the
``next door neighbour'' to see whether there's even a single
common newsgroup which makes it necessary to feed the article to
it. If there's a matching newsgroup, and the server's distribution
list matches the article's distribution, then the article is
marked for feeding to this neighbour.</para>
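<para>The matching step above can be sketched as follows. This is a
simplification for illustration: real news software matches against
wildcard pattern lists (<emphasis>e.g.</emphasis>
<literal>comp.*</literal>), whereas this sketch uses exact names:</para>

```python
# Sketch of the outgoing-feed check: an article is queued for a
# neighbour if at least one of its newsgroups is wanted by that
# neighbour AND its distribution is acceptable to it.
def should_feed(article_groups, article_dists, wanted_groups, wanted_dists):
    group_ok = any(g in wanted_groups for g in article_groups)
    dist_ok = any(d in wanted_dists for d in article_dists)
    return group_ok and dist_ok

queue_it = should_feed(
    ["starcom.tech.misc", "starcom.tech.security"],  # Newsgroups header
    ["starcom"],                                     # Distribution header
    wanted_groups={"starcom.tech.misc"},             # neighbour's wants
    wanted_dists={"starcom", "local"})
```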
<para>When the neighbour receives the article as part of the
feed, it performs some sanity checks of its own. The first check
it performs is on the <literal>Newsgroups</literal> header of
the new article. If none of the newsgroups listed there are part
of the active newsgroups list of this server, then the article
can be rejected. An article rejected thus may even be queued for
outgoing feeds to other servers, but will not be digested for
incorporation into the local article repository.</para>
<para>The next check performed is against the
<literal>Path</literal> header of the incoming article. If this
header lists the name of the current Usenet server anywhere,
it indicates that it has already passed through this server at
least once before, and is now re-appearing here erroneously because
of a newsfeed loop. Such loops are quite often configured into
newsfeed topologies for redundancy: ``I'll get the articles from
Server X if not Server Y, and may the first one in win.'' The
Usenet server software automatically detects a duplicate feed
of an article and rejects it.</para>
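<para>The <literal>Path</literal> check amounts to a single membership
test, sketched below for illustration (the real servers do the
equivalent in C):</para>

```python
# Sketch of the Path-based loop check: reject the article if our own
# host name already appears as an element of the Path header, which
# uses UUCP bang syntax ("hostA!hostB!user").
def seen_here_before(path_header, my_name):
    return my_name in path_header.split("!")

rejected = seen_here_before(
    "news.starcomsoftware.com!purva!shuvam",
    "news.starcomsoftware.com")
```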
<para>The next check is against what is called the server's
<emphasis>history database</emphasis>. Every Usenet server has
a history database, which is a list of the message IDs of all
current articles in the local repository. Oftentimes the history
database also carries the message IDs of all messages recently
expired. If the incoming article's message ID matches any of the
entries in the database, then again it is rejected without being
filed in the local repository. This is a second loop detection
method. Sometimes, the mere checking of the article's
<literal>Path</literal> header does not detect all
potential problems, because the problem may be a re-insertion
instead of a loop. A re-insertion happens when the same incoming
batch of news articles is re-fed into the local server, perhaps
after recovering the system's data from tapes after a system
crash. In such cases, there's no newsfeed loop, but there's
still the risk that one article may be digested into the local
server twice. The history database prevents this.</para>
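<para>At its core, the history check is a lookup in a set of message
IDs, as this illustrative sketch shows (real history databases are
on-disk files with expiry timestamps, not in-memory sets):</para>

```python
# Sketch of the history-database check: a set of message IDs of all
# articles currently held (and recently expired) catches duplicates
# that the Path check cannot, such as a re-fed batch.
history = {
    "<Pine.LNX.4.31.0107022153490.30462-100000@starcomsoftware.com>",
}

def accept(message_id, history):
    if message_id in history:
        return False          # duplicate: already digested once
    history.add(message_id)   # remember it for future checks
    return True

first = accept("<abc123@example.org>", history)
again = accept("<abc123@example.org>", history)  # re-insertion attempt
```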
<para>All these simple checks are very effective, and work
across server and software types, as per the Internet standards.
Together, they allow robust and fail-safe Usenet article flow
across the world.</para>
</section>
<section><title>Types of newsfeeds</title>
<para>This section explains the basics of newsfeeds, without getting
into details of software and configuration files.</para>
<section><title>Queued feeds</title>
<para>
This is the commonest method of sending articles from one server
to another, and is followed whenever large volumes of articles
are to be transferred per day. This approach needs a one-time
modification to the upstream server's configuration for each
outgoing feed, to define a new <emphasis>queue.</emphasis>
</para>
<para>
In essence all queued feeds work in the following way. When the
sending server receives an article, it processes it for
inclusion into its local repository, and also checks through all
its outgoing feed definitions to see whether the article needs
to be queued for any of the feeds. If yes, it is added to a
<emphasis>queue file</emphasis> for each outgoing feed. The
precise details
of the queue file can change depending on the software
implementation, but the basic processes remain the same. A queue
file is a list of queued articles, but does not contain the
article contents. Typical queue files are ASCII text files with
one line per article giving the path to a copy of the article in
the local spool area.
</para>
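<para>A queue file of this kind is trivial to build and consume, as the
sketch below illustrates (the exact line format varies between
implementations; some variants also carry the article size or message
ID on each line):</para>

```python
# Sketch: a queue file is one line per queued article, giving the
# article's relative path in the local spool area. The contents of
# the articles themselves are NOT copied into the queue file.
queued = [
    "starcom/tech/misc/211",
    "starcom/tech/security/452",
]
queue_file = "\n".join(queued) + "\n"

# A batcher process would later read the paths back, open each
# article in the spool, and concatenate them into outgoing batches.
paths = queue_file.splitlines()
```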
<para>
Later, a separate process picks up each queue file and creates
one or more <emphasis>batches</emphasis> for each outgoing feed.
A <emphasis>batch</emphasis> is a large file containing multiple
Usenet news
articles. Once the batches are created, various transport
mechanisms can be used to move the files from sending server to
receiving server. You can even use scripted FTP. You only need
to ensure that the batch is picked up from the upstream server
and somehow copied into a designated incoming batch directory in
the downstream server.
</para>
<para>
UUCP has traditionally been the mechanism of choice for batch
movement, because it predates the Internet and wide availability
of fast packet-switched data networks. Today, with TCP/IP
everywhere, UUCP once again emerges as the most logical choice
of batch movement, because it too has moved with the times: it
can work over TCP.
</para>
<para>
NNTP is the <emphasis>de facto</emphasis> mechanism of choice
for moving
queued newsfeeds for carrier-class Usenet servers on the
Internet, and unfortunately, for a lot of other Usenet servers
as well. The reason why we find this choice unfortunate is
discussed in <xref linkend="feedefficiency"/> below. But in NNTP
feeds, an intermediate step of building batches out of queue
files can be eliminated --- this is both its strength and its
weakness.
</para>
<para>
In the case of queued NNTP feeds, articles get added to queue
files as described above. An NNTP transmit process periodically
wakes up, picks up a queue file, and makes an NNTP connection to
the downstream server. It then begins a processing loop where,
for each queued article, it uses the NNTP
<literal>IHAVE</literal>
command to inform the downstream server of the article's
message ID. The downstream server checks its local repository to
see whether it already has the message. If not, it responds with
a <literal>SENDME</literal> response. The transmitting server
then pumps
out the article contents in plaintext form. When all articles
in the queue have been thus processed, the sending server closes
the connection. If the NNTP connection breaks in between due to
any reason, the sending server truncates the queue file and
retains only those articles which are yet to be transmitted,
thus minimising repeat transmissions.
</para>
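<para>The decision logic of this exchange can be sketched without any
real sockets. In the sketch below, <literal>downstream_has</literal>
stands in for the remote server's history lookup; it is an illustrative
model of the protocol, not actual transmit code:</para>

```python
# Pure-logic sketch of the IHAVE exchange: only articles the
# downstream server lacks are actually transmitted.
def transmit(queue, downstream_has):
    sent = []
    for message_id, article in queue:
        # "IHAVE <message_id>" -> downstream checks its repository.
        if message_id in downstream_has:
            continue          # downstream refuses: it already has it
        # Downstream asks for it, so pump out the article text.
        sent.append(article)
        downstream_has.add(message_id)
    return sent

sent = transmit(
    [("<a@x>", "article A"), ("<b@x>", "article B")],
    downstream_has={"<a@x>"})
```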
<para><anchor id="dialupnonntp"/>
A queued NNTP feed works with the sending server making an NNTP
connection to the receiving server. This implies that the
receiving server must have an IP address which is known to the
sending server or can be looked up in the DNS. If the receiving
server connects to the Internet periodically using a dialup
connection and works with a dynamically assigned IP address,
this can get tricky. UUCP feeds suffer no such problems because
the sending server for the newsfeed can be the UUCP server,
<emphasis>i.e.</emphasis>
passive. The receiving server for the feed can be the UUCP
master, <emphasis>i.e.</emphasis> the active party. So the
receiving server can then
initiate the UUCP connection and connect to the sending server.
Thus, if even one of the two parties has a static IP address,
UUCP queued feeds can work fine.
</para>
<para>
Thus, NNTP feeds can be sent out a little faster than the
batched transmission processes used for UUCP and other older
methods, because no batches need to be constructed. However,
NNTP is often used in newsfeeds where it is not necessary and it
results in colossal waste of bandwidth. Before we study
efficiency issues of NNTP versus batched feeds, we will cover
another way feeds can be organised using NNTP: the pull feeds.
</para>
</section>
<section><title>Pull feeds</title>
<para>
This method of transferring a set of articles works only over
NNTP, and requires absolutely no configuration on the
transmitting, or upstream, server. In fact, the upstream server
cannot even easily detect that the downstream server is pulling
out a feed --- it appears to be just a heavy and thorough
newsreader, that's all.
</para>
<para>
This pull feed works by the downstream server pulling out
articles one by one, just like any NNTP newsreader, using the
NNTP <literal>ARTICLE</literal> command with the Message-ID as
parameter.
The interesting detail is how it gets the message IDs to begin
with. For this, it uses an NNTP command, specially designed for
pull feeds, called <literal>NEWNEWS</literal>. This command
takes a newsgroup pattern, a date and a time, for example:
<screen> NEWNEWS comp.* 970815 000000 GMT </screen>
</para>
<para>
This command is sent by the downstream server over NNTP to the
upstream server, and in effect asks the upstream server to list
out all news articles which are newer than 15 August 1997 in the
<literal>comp</literal> hierarchy. The upstream server responds
with a
(often huge) list of message IDs, one per line, ending with a
period on a line by itself.
</para>
<para>
The pulling server then compares each newly received message ID
with its own article database and makes a (possibly shorter)
list of all articles which it does not have, thus eliminating
duplicate fetches. That done, it begins fetching articles one
by one, using the NNTP <literal>ARTICLE</literal> command as
mentioned above.
</para>
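<para>Parsing the response and eliminating duplicate fetches is
straightforward, as this illustrative sketch shows (a real pulling
server would consult its on-disk history database rather than an
in-memory set):</para>

```python
# Sketch of the pull-feed bookkeeping: parse the NEWNEWS response
# (one message ID per line, terminated by a lone ".") and keep only
# the IDs we do not already hold.
response = "<a@x>\r\n<b@x>\r\n<c@x>\r\n.\r\n"

remote_ids = []
for line in response.splitlines():
    if line == ".":
        break                 # end of the multi-line NNTP response
    remote_ids.append(line)

already_have = {"<a@x>"}
to_fetch = [mid for mid in remote_ids if mid not in already_have]
# Each ID in to_fetch would now be retrieved with "ARTICLE <id>".
```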
<para>
In addition, there is another NNTP command,
<literal>NEWGROUPS</literal>,
which allows the NNTP client --- <emphasis>i.e.</emphasis> the
downstream server in
this case --- to ask its upstream server what were the new
newsgroups created since a given date. This allows the
downstream server to add the new groups to its
<literal>active</literal> file.
</para>
<para>
The <literal>NEWNEWS</literal> based approach is usually one of
the most inefficient methods of pulling out a large Usenet feed.
By inefficiency, here we refer to the CPU loads and RAM
utilisation on the upstream server, not on bandwidth usage. This
inefficiency is because most Usenet news servers do not keep
their article databases indexed by hierarchy and date; CNews
certainly does not. This means that a <literal>NEWNEWS</literal>
command issued to an upstream server will put that server into a
sequential search of its article database, to see which articles
fit into the hierarchy given and are newer than the given date.
</para>
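<para>To make the point concrete, here is one conceivable shape for such
an index, keyed by arrival date. This is purely illustrative design on
our part; C-News implements nothing of the sort, and the names are
invented for the sketch:</para>

```python
# Illustrative only: a per-day index of message IDs would let an
# upstream server answer NEWNEWS without scanning its whole
# article database.
from collections import defaultdict

index = defaultdict(list)           # arrival date -> message IDs

def record(date, message_id):
    index[date].append(message_id)

def newnews_since(since):
    # Gather IDs for every day on or after the cutoff date.
    return [mid for day in sorted(index) if day >= since
            for mid in index[day]]

record("1997-08-14", "<old@x>")
record("1997-08-16", "<new@x>")
recent = newnews_since("1997-08-15")
```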
<para>
If pull feeds were to become the most common way of sending out
articles, then all upstream servers would badly need an
efficient way of sorting their article databases to allow each
<literal>NEWNEWS</literal> command to rapidly generate its list
of matching articles. A slow upstream server today might take
minutes to begin responding to a <literal>NEWNEWS</literal>
command, and
the downstream server may time out and close its NNTP connection
in the meanwhile. We have often seen this happening, till we
tweak timeouts.
</para>
<para>
There are basic efficiency issues of bandwidth utilisation
involved in NNTP for news feeds, which are applicable for both
queued and pull feeds. But the problem with
<literal>NEWNEWS</literal> is unique to pull feeds, and relates
to server loads, not bandwidth wastage.
</para>
</section>
</section>
</section>
<section id="controlmsg"> <title>Control messages</title>
<para>
(Discuss control messages. Show examples of actual control messages
if possible. Discuss security issues in the form of control message
storms, and how digital signatures are being used to tackle it. This
sets the ground for <literal>pgpverify</literal> later on.)
</para>
</section>
</chapter>

<chapter><title>Setting up CNews + NNTPd</title>
<section><title>Getting the sources and stuff</title>
<section><title>The sources</title>
<para>C-News software can be obtained from
<literal>ftp://ftp.uu.net/networking/news/transport/cnews/cnews.tar.Z</literal>
and will need to be uncompressed using the BSD
<literal>uncompress</literal> utility or a compatible program. The
tarball is about 650 KBytes in size. It has its own highly intelligent
configuration and installation processes, which are very well
documented. The version that is available is Cleanup Release revision G,
on which our own version is based.</para>
<para>NNTPd is available from
<literal>ftp://ftp.uu.net/networking/news/nntp/nntp.1.5.12.1.tar.Z</literal>.
It has no automatic scripts and processes to configure itself. After
fetching the sources, you will have to follow a set of directions given
in the documentation and configure some C header files. These
configuration settings must be done keeping in mind what you have
specified when you built the C-News sources, because NNTPd and C-News
must work together. Therefore, some key file formats, directory paths,
<emphasis>etc.</emphasis>, will have to be specified identically in both
software systems.</para>
<para>The third software system we use is Nestor. This too is to be
found in the same place where the NNTPd software is kept, at
<literal>ftp://ftp.uu.net/networking/news/nntp/nestor.tar.Z</literal>.
This software compiles to one binary program, which must be run
periodically to process the logs of <literal>nntpd</literal>, the NNTP
server which is part of NNTPd, and report usage statistics to the
administrator. We have integrated Nestor into our source base.</para>
<para>The fourth piece of the puzzle, without which no Usenet server
administrator dares venture out into the wild world of public Internet
newsfeeds, is <literal>pgpverify</literal>.</para>
<para>We have been working with C-News and NNTPd for many years now, and
have fixed a few bugs in both packages. We have also integrated the four
software systems listed above, and added a few features here and there to
make things work more smoothly. We offer our entire source base to
anyone for free download from
<literal>http://www.starcomsoftware.com/proj/news/src/news.tar.gz</literal>.
There are no licensing restrictions on our sources; they are as freely
redistributable as the original components we started with.</para>
<para>When you download our software distribution, you will extract it
to find a directory tree with the following subdirectories and files:</para>
<itemizedlist>
<listitem><para><literal>c-news</literal>: the source tree of the CR.G
software release, with our additions like
<literal>pgpverify</literal> integration, our scripts like
<literal>mail2news</literal>, and pre-created configuration
files.
</para></listitem>
<listitem><para><literal>nntp-1.5.12.1</literal>: the source tree of the
original NNTPd release, with header files pre-configured to fit in
with our configuration of C-News, and our addition of bits and
pieces like Nestor, the log analysis program.
</para></listitem>
<listitem><para><literal>howto</literal>: this document, and its SGML
sources and Makefile.
</para></listitem>
<listitem><para><literal>archives</literal>: a directory containing the
tarballs of the original C-News, NNTPd, Nestor and
<literal>pgpverify</literal> source distributions, in case you want
them. Strictly speaking, the <literal>archives</literal> directory is
not necessary unless you want to study what changes we have made,
what files we have added, to the original sources.
</para></listitem>
<listitem><para><literal>build.sh</literal>: a shellscript you can run
to compile the entire combined source tree and install binaries in the
right places, if you are lucky and all goes well.
</para></listitem>
</itemizedlist>
<para>Needless to say, we believe that our source tree is a better
place to start with than the original components, especially if you
are installing a Usenet server on a Linux box for the first time.
We will be available on email to provide technical assistance should
you run into trouble.</para>
</section>
<section><title>The key configuration files</title>
<para>Once you get the sources, you will need some key configuration
files to seed your C-News system. These configuration files are
actually database tables, and are changing frequently, whenever
newsgroups are created, modified or deleted. These files specify
the list of active newsgroups in the ``public'' Usenet. You can,
and should, add your organisation's internal newsgroups to this
list when you set up your own server, but you will need to know
the list of public standard newsgroups to begin with. This list
can be obtained from the same FTP server by downloading the files
<literal>active.gz</literal> and <literal>newsgroups.gz</literal> from
<literal>ftp://ftp.uu.net/networking/news/config/</literal>. You
can create your own <literal>active</literal> and
<literal>newsgroups</literal> files by retaining a subset of the entries
in these two files. Both these are ASCII text files.</para>
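<para>Since each <literal>active</literal> line has the form
<literal>group himark lomark flag</literal>, trimming the file down to
the hierarchies you want to carry is a simple filter. The sketch below
is illustrative (the group names and counters are made up), not a
replacement for C-News's own tools:</para>

```python
# Sketch: derive a trimmed "active" file by keeping only chosen
# hierarchies. Each line is: <group> <highest-art> <lowest-art> <flag>
active_lines = [
    "comp.risks 0000000452 0000000001 m",
    "comp.os.linux.misc 0000009999 0000000001 y",
    "alt.flame 0000000120 0000000001 y",
]

keep_prefixes = ("comp.",)   # hierarchies we want to carry

subset = [line for line in active_lines
          if line.split()[0].startswith(keep_prefixes)]
```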
<para>Getting the sources from our server will not obviate the need to
get the latest versions of these files from
<literal>ftp.uu.net</literal>. We do not (yet) maintain an up-to-date
copy of these files on our server, and we will add no value to the
original by just mirroring them.</para>
</section>
</section>
<section><title>Compiling and installing</title>
<para>
To install, first make sure you have an entry for a user called
<literal>news</literal> in your <literal>/etc/passwd</literal> file, and
add one if not present. This makes <literal>news</literal> the owner of
the news database. Now download the source from us and untar it in the
home directory of <literal>news</literal>. This creates two main
directories, <emphasis>viz.</emphasis> <literal>c-news</literal> and
<literal>nntp</literal>. To compile and install, run the script
<literal>build.sh</literal> as <literal>root</literal> from the
directory that contains it. It is important that the script run as
<literal>root</literal> because it sets ownerships, and installs and
compiles as <literal>news</literal>, so it needs adequate permissions to
do this. This is a one-step process that puts in place both the C-News
and the NNTP software, setting correct permissions and paths. Following
is a brief description of what <literal>build.sh</literal> does:
</para>
<itemizedlist>
<listitem><para>
Checks for the <literal>OS</literal> platform and exits if
it is not <literal>Linux</literal>.
</para></listitem>
<listitem><para>
Again, exits if you are not running as
<literal>root</literal>.
</para></listitem>
<listitem><para>
Exits if it cannot find the above two directories.
</para></listitem>
<listitem><para>
Compiles <literal>C-News</literal> and exits on error. This builds
all the software. Errors are written to a file called
<literal>make.out</literal>; read it to determine the cause. If the
compilation was successful, it also performs regression tests; these do
not cause an exit on error, but a warning is printed asking you to read
the error file <literal>make.out.r</literal> and fix the problems.
</para></listitem>
<listitem><para>
Performs the above operation in the <literal>nntp</literal> directory, too.
</para></listitem>
<listitem><para>
Checks for the presence of the three key directories:
<literal>$NEWSARTS</literal> (<literal>/var/spool/news</literal>),
which houses the articles; <literal>$NEWSCTL</literal>
(<literal>/var/lib/news</literal>), which contains the configuration,
log and status files; and <literal>$NEWSBIN</literal>
(<literal>/usr/lib/newsbin</literal>), which contains the binaries and
executables needed for the working of the Usenet News system. Tries to
create them if non-existent, and exits on failure.
</para></listitem>
<listitem><para>
Changes the ownership of these directories to
<literal>news.news</literal>. This is important since the entire Usenet
News system runs as user <literal>news</literal>; it will not function
properly as any other user.
</para></listitem>
<listitem><para>
Then starts the installation of C News. It runs
<literal>make install</literal> to install binaries at the right locations;
<literal>make setup</literal> to set the correct paths and umask, create
directories for newsgroups, and determine who will receive reports;
<literal>make ui</literal> to set up inews and injnews; and
<literal>make readpostcheck</literal> to use the readnews, postnews and
checknews utilities provided by C News. The errors, if any, are to be found
in the respective make.out files; <emphasis>e.g.</emphasis>
<literal>make setup</literal> writes errors to <literal>make.out.setup</literal>.
</para></listitem>
<listitem><para>
<literal>newsspool</literal>, which queues incoming
batches in the <literal>$NEWSARTS/in.coming</literal> directory, should run
set-user-id and set-group-id; the script arranges this.
</para></listitem>
<listitem><para>
A symbolic link is made to <literal>/var/lib/news</literal> from
<literal>/usr/lib/news</literal>.
</para></listitem>
<listitem><para>
The NNTP software is installed.
</para></listitem>
<listitem><para>
Sets up the manpages for C News and makes them world-readable.
(The NNTP manpages get installed when that software is installed.)
Compiles the C News documentation, <literal>guide.ps</literal>, and makes
it readable and available in <literal>/usr/doc/packages/news</literal> or
<literal>/usr/doc/news</literal>.
</para></listitem>
<listitem><para>
Checks for the PGP binary and asks the administrator to get
it if not found.
</para></listitem>
</itemizedlist>
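The first two sanity checks above can be sketched in shell as follows. This is a hypothetical rewrite for illustration, not the actual <literal>build.sh</literal> code:

```shell
#!/bin/sh
# Hypothetical sketch of build.sh's first two sanity checks;
# not the actual build.sh code.
check_env() {
    # $1 = OS name (from uname -s), $2 = numeric user id (from id -u)
    if [ "$1" != "Linux" ]; then
        echo "error: this script supports only Linux" >&2
        return 1
    fi
    if [ "$2" -ne 0 ]; then
        echo "error: please run as root (needed to chown to news.news)" >&2
        return 1
    fi
    return 0
}

if check_env "$(uname -s)" "$(id -u)"; then
    echo "OK to proceed with the build"
else
    echo "pre-flight checks failed"
fi
```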
</section>
<section><title>Configuring the system: what to configure and how</title>
<para>Once installed, you now have to configure the system to accept feeds
and batch them for neighbours. You will have to do the following:</para>
<itemizedlist>
<listitem><para><literal>nntpd</literal>:
Copy the compiled nntpd into a directory where
executables are kept and activate it. It runs on port 119 as a daemon
through inetd, unless you have compiled it as stand-alone.
An entry in the services file for nntp would look like this:
<programlisting>nntp 119/tcp # Network News Transfer Protocol</programlisting>
An entry in the inetd.conf file will be:
<programlisting>nntp stream tcp nowait news path-to-tcpd path-to-nntpd</programlisting>
The last two fields in the inetd.conf entry are the paths to the binaries
of the TCP wrapper daemon and the nntp daemon respectively.
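For instance, assuming tcpd lives in /usr/sbin and the nntp daemon was installed as /usr/lib/newsbin/nntpd (both paths are assumptions; substitute your own), the /etc/services line and the /etc/inetd.conf line might read:

```
# /etc/services
nntp    119/tcp         # Network News Transfer Protocol

# /etc/inetd.conf
nntp  stream  tcp  nowait  news  /usr/sbin/tcpd  /usr/lib/newsbin/nntpd
```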
</para></listitem>
<listitem><para><emphasis role=bold>Configuring control files:</emphasis>
There are plenty of control files in <literal>$NEWSCTL</literal> that will
need to be configured before you can
start using the news system. The files mentioned here are explained in some
detail in chapter 8, section 8.1. The files to be configured are dealt with
in detail below.
</para>
<itemizedlist>
<listitem><para><literal>sys</literal>:
One line per system/NDN listing all the
newsgroup hierarchies each system subscribes to. Each line is prefixed
with the system name, and the one beginning with <literal>ME:</literal>
indicates what we are going to receive. The following are typical entries
for this file:
<programlisting>ME:comp,news,misc,netscape</programlisting>
This line indicates which newsgroups your server (as named in the
<literal>whoami</literal> file) has subscribed to and will receive.
<programlisting>server/server.starcomsoftware.com:all,!general/all:f</programlisting>
This is a list of newsgroups this site will pass on to its NDN.
The newsgroups should be specified as a comma-separated list, with no
spaces anywhere in the line. The <literal>f</literal> flag indicates that
the newsgroup name and article number, along with the article's size, will
form one entry in the togo file in the
<literal>$NEWSARTS/out.going</literal> directory.
</para></listitem>
<listitem><para><literal>explist</literal>:
This file has entries indicating
which articles expire, when they expire, and whether they have to be
archived. The order in which the newsgroups are listed is important. An
example follows:
<programlisting>comp.lang.java.3d x 60 /var/spool/news/Archive</programlisting>
This means that articles in comp.lang.java.3d expire after 60 days and
shall be archived in the directory given as the fourth field.
Archiving is optional. The second field, <emphasis>x</emphasis>, indicates
that this line applies to both moderated and unmoderated newsgroups;
<emphasis>m</emphasis> would specify moderated and <emphasis>u</emphasis>
unmoderated groups. If you want to specify an extremely large number as the
expiry period, you can use the word 'never'.
</para></listitem>
<listitem><para><literal>batchparms</literal>:
<literal>sendbatches</literal> is a program that
administers batched transmission of news to other sites. To do this, it
consults the batchparms file. Each line in the file specifies the
behaviour for one site, using five fields.</para>
<screen>server u 100000 100 batcher | gzip -9 | viauux -d gunzip</screen>
<para>
The first field is the site name which matches the entry in the sys
file and has a corresponding directory in $NEWSARTS/out.going by that
name.
</para>
<para>
The second field is the class of the site, 'u' for UUCP and 'n' for
NNTP feeds. A '!' in this field means that batching for this site has
been disabled.
</para>
<para>
The third field is the size of batches to be prepared in bytes.
</para>
<para>
The fourth field is the maximum length of the output queue for
transmission to that site.
</para>
<para>
The fifth field is the command line to be used to build, compress and
transmit batches to that site. It receives the contents of the togo file
on standard input.
</para>
</listitem>
<listitem><para><literal>controlperm</literal>:
This file controls how the news
system responds to control messages. Each line consists of 4-5 fields
separated by white space.</para>
<programlisting>comp,sci tale@uunet.uu.net nrc pv news.announce.newsgroups</programlisting>
<para>
The first field is a newsgroup pattern to which the line applies.
</para>
<para>
The second field is either 'any' or an e-mail address. The latter
specifies that the line applies only to control messages from that
author.
</para>
<para>
The third field is a set of opcode letters indicating which control
operations this line applies to, when they are requested in messages from
the e-mail address mentioned in the second field: 'n' stands for creating
a newsgroup, 'r' for deleting a newsgroup, and 'c' for checkgroup.
</para>
<para>
The fourth field is a set of flag letters indicating how to respond to
a control message that meets all the applicability tests:
<screen>
y Do it.
n Don't do it.
v Report it and include the entire control
message in the report.
q Don't report it.
p Do it iff the control message carries a valid PGP signature.
</screen>
Exactly one of y, n or p must be present.
</para>
<para>
The fifth field, which is optional, will be used if the fourth field
contains a 'p'. It must contain the PGP key ID of the public key to be
used for signature verification.
</para>
</listitem>
<listitem><para><literal>mailpaths</literal>:
This file describes how to reach
the moderators of various hierarchies of newsgroups by mail. Each line
consists of two fields: a news group pattern and an e-mail address. The
first line whose group pattern matches the newsgroup is used. As an
example:
<screen>
comp.lang.java.3d somebody@mydomain.com
all %s@moderators.uu.net
</screen>
In the second line, the %s gets replaced with the group name, with all
dots appearing in the name substituted with dashes.
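This substitution can be sketched in shell as follows (a toy illustration; the address pattern is taken from the example above):

```shell
# Toy sketch of the mailpaths "%s" substitution described above:
# dots in the group name become dashes, and the result replaces %s.
group="comp.lang.java.3d"
moderator_alias="$(printf '%s' "$group" | tr '.' '-')@moderators.uu.net"
echo "$moderator_alias"
```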
</para></listitem>
<listitem><para><emphasis role=bold>Miscellaneous files:</emphasis>
The other files to be modified are:
<itemizedlist>
<listitem><para><literal>mailname:</literal>
Contains the Internet domain name of the
news system. Consider getting one if you don't have it.
</para></listitem>
<listitem><para><literal>organization:</literal>
Contains the default value for the
Organization: header for postings originating locally.
</para></listitem>
<listitem><para><literal>whoami:</literal>
Contains the name of the news system. This
is the site name used in the Path: headers and hence should concur
with the names your neighbours use in their sys files.
</para></listitem>
</itemizedlist>
</para></listitem>
<listitem><para><literal>active </literal>file:
This file specifies one line for each
newsgroup (not just the hierarchy) to be found on your news system. You
will have to get the most recent copy of the active file from
<literal>ftp://ftp.isc.org/usenet/CONFIG/active</literal> and prune it
to delete
newsgroups that you have not subscribed to. Run the script "addgroup"
for each newsgroup in this file; it will create the relevant directories
in the <literal>$NEWSARTS</literal> area. The "addgroup" script takes
two parameters: the name of the newsgroup being created and a flag. The
flag can be any one of the following:
<screen>
y local postings are allowed
n no local postings, only remote ones
m postings to this group must be approved
by the moderator
j articles in this group are only passed and not kept
x posting to this newsgroup is disallowed
=foo.bar articles are locally filed in
"foo.bar" group
</screen>
An entry in this file looks like this:
<programlisting>comp.lang.java.3d 0000003716 01346 m </programlisting>
The first field is the name of the newsgroup. The second field is the
highest article number that has been used in that newsgroup. The
third field is the lowest article number in the group. The fourth
field is the flag, as explained above.
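The per-group "addgroup" runs can be generated from the pruned active file with a short script like the following (a hypothetical helper, shown here on two sample lines; inspect its output before piping it to sh):

```shell
# Hypothetical helper: emit one "cnewsdo addgroup" command per line of a
# pruned active file (fields: group, high article, low article, flag).
cmds=$(awk '{ printf "cnewsdo addgroup %s %s\n", $1, $4 }' <<'EOF'
comp.lang.java.3d 0000003716 01346 m
misc.test 0000000001 00001 y
EOF
)
echo "$cmds"
```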
</para></listitem>
<listitem><para><literal>newsgroups </literal>file:
This contains a one line description
of each newsgroup to be found in the active file. You will have to
get the most recent file from
<literal>ftp://ftp.isc.org/usenet/CONFIG/newsgroups</literal>
and prune it to remove unwanted information. As an example:
<programlisting>comp.lang.java.3d 3D Graphics APIs for the Java language</programlisting>
</para></listitem>
<listitem><para><emphasis role=bold>Create aliases: </emphasis>
These aliases are required for trouble reporting.
Once the system is in place and scripts are run, anomalies and problems
are reported to addresses in the /etc/aliases file. These entries
include email addresses for <literal>newsmaster</literal>,
<literal>newscrisis</literal>, <literal>news</literal>,
<literal>usenet</literal> and <literal>newsmap</literal>.
They should ideally point to an email address that will be
looked at regularly. Arrange for emails to "newsmap" to be
discarded, to minimize the effect of "sendsys bombing" by practical
jokers.
</para></listitem>
<listitem><para><emphasis role=bold>Cron jobs:</emphasis>
Certain scripts, like newsrun, which picks up incoming
batches, and various maintenance scripts, should run through the
news-database owner's cron. A more detailed treatment can be found in
<xref linkend="cronjobs"/>. The cron entries will ideally be for the
following:
<orderedlist>
<listitem><para><literal>newsrun: </literal>
This script processes incoming batches of
articles. Run it as frequently as you want articles to be digested.
</para></listitem>
<listitem><para><literal>sendbatches:</literal>
This script transmits batches to the
NDNs. Set the frequency according to your requirements.
</para></listitem>
<listitem><para><literal>newsdaily:</literal>
This should ideally be run once a day,
since it reports errors and anomalies in the news system.
</para></listitem>
<listitem><para><literal>newswatch:</literal>
This looks for errors and anomalies at a more detailed level, and hence
should be run at least once every hour.
</para></listitem>
<listitem><para><literal>doexpire:</literal>
This script expires old articles as
determined by the explist file. Run this once a day.
</para></listitem>
</orderedlist>
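Put together, the crontab for user news might look like the following. The schedules and the $NEWSBIN subdirectory paths are illustrative assumptions; adjust both to your installation:

```
# crontab for user "news" (vixie-cron syntax;
# paths assume $NEWSBIN = /usr/lib/newsbin)
*/15 *  * * *   /usr/lib/newsbin/input/newsrun
0    *  * * *   /usr/lib/newsbin/batch/sendbatches
30   *  * * *   /usr/lib/newsbin/maint/newswatch
15   0  * * *   /usr/lib/newsbin/maint/newsdaily
45   4  * * *   /usr/lib/newsbin/expire/doexpire
```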
</para></listitem>
<listitem><para>newslog:
Make an entry in the system's syslog.conf
file to log the messages emitted by the nntp daemon into "newslog".
This file should be located in <literal>$NEWSCTL</literal>. The entry will
look like this:
<programlisting>news.debug -/var/lib/news/newslog</programlisting>
</para></listitem>
<listitem><para>Newsboot:
Have newsboot run (as "news", the
news-database owner) when the system boots to clear out debris left
around by crashes.
</para></listitem>
<listitem><para>Add a Usenet mailer in sendmail:
The mail2news program provided as
part of the source code is a handy tool to send an e-mail to a newsgroup
which gets digested as an article. You will have to add the following
ruleset and mailer definition in your sendmail.cf file:</para>
<itemizedlist>
<listitem><para>Under SParse1, add the following:
<programlisting>
R$+ . USENET < @ $=w . > $#usenet $: $1
</programlisting>
</para></listitem>
<listitem><para>Under mailer definitions, define the mailer Usenet as:
<screen>
MUsenet P=/usr/lib/newsbin/mail2news/m2nmailer, F=lsDFMmn,
S=10, R=0, M=2000000, T=X-Usenet/X-Usenet/X-Unix, A=m2nmailer $u
</screen>
</para></listitem>
</itemizedlist>
<para>In order to send a mail to a newsgroup you will now have to suffix
the
newsgroup name with usenet <emphasis>i.e.</emphasis> your To: header
will look like this:
<screen>To: misc.test.usenet@yourdomain.</screen>
The mailer definition of usenet will intercept this mail and post it to
the respective newsgroup, in this case, misc.test</para>
</listitem>
</itemizedlist>
<para>
This, more or less, completes the configuration part.
</para>
</listitem>
</itemizedlist>
</section>
<section><title>Testing the system</title>
<para>
To locally test the system, follow the steps given below:
</para>
<itemizedlist>
<listitem><para>post an article:
Create a local newsgroup
<screen>
cnewsdo addgroup mysite.test y
</screen>
and using <literal>postnews </literal>post an article to it.
</para></listitem>
<listitem><para>Has it arrived in <literal>$NEWSARTS</literal>/in.coming?:
The article should show up in the directory mentioned. Note the nomenclature
of the article.
</para></listitem>
<listitem><para>When newsrun runs:
When newsrun runs through cron, the article disappears from in.coming
directory and appears in <literal>$NEWSARTS</literal>/mysite/test. Look at
how the newsgroups, active, log and history files (but not the errorlog),
and the <literal>.overview</literal> file in
<literal>$NEWSARTS/mysite/test</literal>, reflect the digestion of the
article into the news system.
</para></listitem>
<listitem><para>reading the article:
Try to read the article through readnews or any
news client. If you can read it, you have set up almost everything right.
</para></listitem>
</itemizedlist>
</section>
<section><title><literal>pgpverify</literal> and <literal>controlperms</literal></title>
<para>
As mentioned in <xref linkend="controlmsg"/>, it becomes necessary to
authenticate control messages to protect yourself from being attacked by
pranksters. For this, you will have to configure the
<literal>$NEWSCTL</literal>/controlperm file to declare whose control
messages you are willing to honour, and for which newsgroups, along with
their public key IDs. The controlperm manpage will give you details of the
format.
</para>
<para>
This will work only in association with <literal> pgpverify </literal> which
verifies the Usenet control messages that have been signed using the
<literal>signcontrol</literal> process. The script can be found at
<literal>ftp://ftp.isc.org/pub/pgpcontrol/pgpverify</literal>.
<literal>pgpverify</literal> internally uses the PGP binary, which
will have to be made available in the default executables directory. If you
wish to send control messages for your local news system, you will have to
digitally sign them using the above mentioned "signcontrol" program which is
available at
<literal>ftp://ftp.isc.org/pub/pgpcontrol/signcontrol</literal>. You will
also have to configure the signcontrol program accordingly.
</para>
</section>
<section><title>Feeding off an upstream neighbour</title>
<para>
For external feeds, commercial customers will have to buy them
from a regular News Provider like <literal>dejanews.com</literal>
or <literal>newsfeeds.com</literal>. You will have to specify
to them what hierarchies you want and decide on the mode of
transmission, <emphasis>i.e.</emphasis> UUCP or NNTP, based on
your requirements. Once that is done, you will have to ask them to
initiate feeds, and check <literal>$NEWSARTS/in.coming</literal>
directory to see if feeds are coming in.
</para>
<para>
If your organisation belongs to the academic community or is
otherwise lucky enough to have an NDN server somewhere which is
willing to provide you a free newsfeed, then the payment issue goes
out of the picture, but the rest of the technical requirements
remain the same.
</para>
<para>
One problem with incoming NNTP feeds is that it is far easier to use
(relatively) efficient NNTP inflows if you have a server with a
permanent Internet connection and a fixed IP address. If you are a
small office with a dialup Internet connection, this may not be
possible. In that case, the only way to get incoming newsfeeds by
NNTP may be by using a highly inefficient pull feed.
</para>
</section>
<section><title>Configuring outgoing feeds</title>
<para>
If you are a leaf node, you will only have to send feeds back to your
news provider for your postings in public newsgroups to propagate
to the outside world. To enable this, you need one line in the
<literal>sys</literal> and <literal>batchparms</literal> files
and one directory in <literal>$NEWSARTS/out.going</literal>. If
you are willing to transmit articles to your neighbouring
sites, you will have to configure <literal>sys</literal> and
<literal>batchparms</literal> with more entries. The number of directories
in <literal>$NEWSARTS/out.going</literal> shall increase, too. Refer
to chapter 8, sections 8.1 and 8.2 for a better understanding of
outgoing feeds. Again, you will have to determine how you wish to
transmit the feed: UUCP or NNTP.
</para>
<section><title>By UUCP</title>
<para>For outgoing feeds by UUCP, we recommend that you start with
Taylor UUCP. In fact, this is the UUCP version which forms part
of the GNU Project and is the default UUCP on Linux
systems.</para>
<para>A full treatment of UUCP configuration is beyond the scope of
this document. However, the basic steps will be as follows. First,
you will have to define a ``system'' in your Usenet server for the
NDN (next door neighbour) host. This definition will include various
parameters, including the manner in which your server will call the
remote server, the protocol it will use, <emphasis>etc.</emphasis>
Then an identical process will have to be followed on the NDN
server's UUCP configuration, for your server, so that
<emphasis>that</emphasis> server can recognize
<emphasis>your</emphasis> Usenet server.</para>
<para>Finally, you will need to set up appropriate
<literal>cron</literal> jobs for the user <literal>uucp</literal>
to run <literal>uucico</literal> periodically. Taylor UUCP comes with
a script called <literal>uusched</literal> which may be modified to
your requirements; this script calls <literal>uucico</literal>. One
<literal>uucico</literal> connection will both upload and download
news batches. Smaller sites can run <literal>uusched</literal> even
once or twice a day.</para>
<para>Later versions of this document will include the
<literal>uusched</literal> scripts that we use in Starcom. We use
UUCP over TCP/IP, and we run the <literal>uucico</literal>
connection through an SSH tunnel, to prevent transmission of
UUCP passwords in plain text over the Internet, and our SSH tunnel
is established using public-key cryptography, without passwords
being used anywhere.</para>
</section>
<section><title>By NNTP</title>
<para>For NNTP feeds, you will have to decide whether your server
will be the connection initiator or connection recipient. If you are
the connection initiator, you can send outgoing NNTP feeds more
easily. If you are the connection recipient, then outgoing feeds
will have to be pulled out of your server using the NNTP
<literal>NEWNEWS</literal> command, which will place heavy loads on
your server. This is not recommended.</para>
<para>Connecting to your NDN server for pushing out outgoing feeds
will require the use of the <literal>nntpsend.sh</literal> script,
which is part of the NNTPd source tree. This script will perform
some housekeeping, and internally call the
<literal>nntpxmit</literal> binary to actually send the queued set
of articles out. You may have to provide authentication information
like usernames and passwords to <literal>nntpxmit</literal> to allow
it to connect to your NDN server, in case that server insists on
checking the identity of incoming connections. (You can't be too
careful in today's world.) <literal>nntpsend.sh</literal> will clean
up after an <literal>nntpxmit</literal> connection finishes, and
will requeue any unsent articles for the next session. Thus, even if
there is a network problem, typically nothing is lost and all
pending articles are transmitted next time.</para>
<para>Thus, pushing feeds out <emphasis>via</emphasis> NNTP may mean
setting up <literal>nntpsend.sh</literal> properly, and then
invoking it periodically from <literal>cron</literal>. If your
Usenet server connects to the Internet only intermittently, then the
process which sets up the Internet connection should be extended or
modified to fire <literal>nntpsend.sh</literal> whenever the Internet
link is established. For instance, if you are using the Linux
<literal>pppd</literal>, you can add statements to the
<literal>/etc/ppp/ip-up</literal> script to change user to
<literal>news</literal> and run <literal>nntpsend.sh</literal>.</para>
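For example, a line like the following could be appended to the /etc/ppp/ip-up script. The path to nntpsend.sh is an assumption; use the location where you installed it:

```
# in /etc/ppp/ip-up: push out queued articles when the PPP link comes up
su news -c /usr/lib/newsbin/nntpsend.sh &
```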
</section>
</section>
</chapter>
<chapter><title>Usenet news software</title>
<section><title>CNews and NNTPd</title>
<para>
Once upon a time, when Usenet news was a term not yet invented, the
first recorded attempt to use a UUCP-based email backbone to maintain a
replicated message repository, was called A-News. It connected four
servers in four universities, and was written as Unix shell
scripts.</para>
<para>The designers of A-News had not anticipated how much load users
would put on their simplistic system. A far superior, more sophisticated,
and faster implementation of Usenet news was written later, called
B-News. This was a mix of C and shell scripts, and was designed
much better than A-News, to allow handling of much larger volumes of
messages. B-News v2.x was the current version in around 1990. By 1992 or
so, it had been surpassed by C-News.</para>
<para>C-News was written by Henry Spencer and Geoff Collyer of the
Department of Zoology, University of Toronto, almost entirely in shell
and <literal>awk</literal>, as a replacement for B-News. Once again, the
focus was on adding some extra features and a lot of performance. The
first release was called Shellscript Release, which was deployed by a very
large number of servers worldwide, as a natural upgrade to B-News. This
version of C-News even had upward compatibility with B-News meta-data,
<emphasis>e.g.</emphasis> history files. This was the version of C-News
which was initially rolled out in 1992 or so at the National Centre for
Software Technology (NCST, <literal>http://www.ncst.ernet.in</literal>)
and the Indian Institute of Technologies in India as part of the Indian
ERNET network.</para>
<para>The Shellscript Release was soon followed by a re-write with a lot
more C code, called Performance Release, and then a set of cleanup and
component integration steps leading to the last release called the
Cleanup Release. This Cleanup Release was revised many times, and the
last one was CR.G (Cleanup Release revision G). The version of C-News
discussed in this HOWTO is a set of bug fixes on CR.G.</para>
<para>Since C-News came from shellscript-based antecedents, its
architecture followed the set-of-programs style so typical of Unix,
rather than large monolithic software systems traditional to some other
OSs. All pieces had well-defined roles, and therefore could be easily
replaced with other pieces as needed. This allowed easy adaptation and
upgrading, and never hurt performance, because key components
which did a lot of work at high speed, <emphasis>e.g.</emphasis>
<literal>newsrun</literal>, had been rewritten in C by that time. Even
within the shellscripts, crucial components which handled binary data,
<emphasis>e.g.</emphasis> a component called <literal>dbz</literal>
to manipulate efficient on-disk hash arrays, were C programs with
command-line interfaces, called from scripts.</para>
<para>C-News was born in a world with widely varying network line speeds,
where bandwidth utilisation was a big issue and dialup links with UUCP
file transfers were common. Therefore, it has very strong support for
batched feeds, especially with a variety of compression techniques and
over a variety of fast and slow transport channels. C-News is virtually
unaware of the existence of TCP/IP, other than one or two tiny batch
transport programs like <literal>viarsh</literal>. However, its design
was so modular that there was absolutely no problem in plugging in NNTP
functionality using a separate set of C programs without modifying
a single line of C-News. This was done by a program suite called
NNTPd.</para>
<para>This software suite could work with B-News and C-News article
repositories, and provided the full NNTP functionality. Since B-News
died a gradual death, the combination of C-News and NNTPd became a freely
redistributable, portable, modern, extensible, and high-performance
software suite for Unix Usenet servers. Further refinements were
added later, <emphasis>e.g.</emphasis> <literal>nov</literal>, the News
Overview package and <literal>pgpverify</literal>, a public-key-based
digital signature module to protect Usenet news servers against
fraudulent control messages.</para>
</section>
<section><title>INN</title>
<para>
INN is one of the two most widely used Usenet news server solutions. It
was written by Rich Salz for Unix systems which have a socket API ---
probably all Unix systems do, today.
</para>
<para>
INN has an architecture diametrically opposite to CNews. It is a
monolithic program, which is started at bootup time, and keeps running
till your server OS is shut down. This is like the way high performance
HTTP servers are run in most cases, and allows INN to cache a lot of
things in its memory, including message-IDs of recently posted messages,
<emphasis>etc.</emphasis> This architecture has been discussed
in an interesting paper by the author, where he explains the problems
of the older BNews and CNews systems that he tried to address. Anyone
interested in Usenet software in general and INN in particular should
study this paper.</para>
<para>
INN addresses a Usenet news world which revolves around NNTP, though it
has support for UUCP batches --- a fact that not many INN administrators
seem to talk about. The primary situation where INN works more
efficiently than the CNews-NNTPd combination is in processing
multiple incoming NNTP feeds. For multiple
readers reading and posting news over NNTP, there is no difference
between the efficiency of INN and NNTPd. <xref linkend="innefficiency"/>
discusses the efficiency issues of INN over the earlier CNews
architecture, based on Rich Salz' paper and our analyses of usage
patterns.
</para>
<para>
INN's architecture has inspired a lot of high-performance Usenet news
software, including a lot of commercial systems which address the
``carrier class'' market. That is the market for which the INN
architecture has clear advantages over C-News.
</para>
</section>
<section><title>Leafnode</title>
<para>
This is an interesting software system, to set up a ``small'' Usenet
news server on one computer which only receives newsfeeds but does not
have the headache of sending out bulk feeds to other sites,
<emphasis>i.e.</emphasis> it is a ``leaf node'' in the newsfeed flow
diagram.</para>
<para>This software is a sort of combination of article repository and
NNTP news server, and receives articles, digests and stores them on the
local hard disks, expires them periodically, and serves them to an NNTP
reader. It is claimed that it is simple to manage and is ideal for
installation on a desktop-class Unix or Linux box, since it does not
take up many resources.</para>
<para>Leafnode is based on an appealing idea, but we find no problem
using C-News and NNTPd on a desktop-class box. Its resource consumption is
somewhat proportional to the volume of articles you want it to process,
and the number of groups you'll want to retain for a small team of users
will be easily handled by C-News on a desktop-class computer. An office
of a hundred users can easily use C-News and NNTPd on a desktop computer
running Linux, with 64 MBytes of RAM, IDE drives, and sufficient disk
space. Of course, ease of configuration and management is dependent on
familiarity, and we are more familiar with C-News than with Leafnode. We
hope this HOWTO will help you in that direction.</para>
<para>TO BE EXTENDED AND CORRECTED.</para>
</section>
<section><title>Suck</title>
<para>Suck is a program which lets you pull out an NNTP feed from an NNTP
server and file it locally. It does not contain any article repository
management software, expecting you to do it using some other
software system, <emphasis>e.g.</emphasis> C-News or INN. It can
create batchfiles which can be fed to C-News, for instance. (Well,
to be fair, Suck <emphasis>does</emphasis> have an option to store the
fetched articles in a spool directory tree very much like what is used
by C-News or INN in their article area, with one file per article. You
can later read this raw message spool area using a mail client which
supports the <literal>msgdir</literal> file layout for mail folders,
like MH, perhaps. We don't find this option useful if you're running
Suck on a Usenet server.) Suck finally boils down to a single
command-line program which is invoked periodically, typically from
<literal>cron</literal>. It has a zillion command-line options which
are confusing at first, but later show how mature and finely tunable
the software is.</para>
<para>If you need an NNTP pull feed, then we know of no better programs
than Suck for the job. The <literal>nntpxfer</literal> program which
forms part of the NNTPd package also implements an NNTP pull feed, for
instance, but does not have one-tenth of the flexibility and fine-tuning
of Suck. One of the banes of the NNTP pull feed is connection timeouts;
Suck allows a lot of special tuning to handle this problem. If we had
to set up a Usenet server with an NNTP pull feed, we'd use Suck right
away.</para>
<para>TO BE EXTENDED AND CORRECTED.</para>
</section>
<section><title>Carrier class software</title>
<para>We have touched upon the characteristics of carrier-class Usenet
software in the section where we discuss NNTP efficiency issues. As that
bit shows, the requirements of carrier-class Usenet servers are very
different from those of servers run within organisations and institutes
to provide internal service to their members.</para>
<para>Carrier-class servers are expected to handle a complete feed of all
articles in all newsgroups, including a lot of groups which have what we
call a ``high noise-to-signal ratio.'' They do not have the luxury of
choosing a ``useful'' subset like administrators of internal corporate
Usenet servers do. Secondly, carrier-class servers are expected to turn
articles around very fast, <emphasis>i.e.</emphasis> they are expected to
have very low latency from the moment they receive an article to the
time they retransmit it by NNTP to downstream servers. Third, they are
supposed to provide very high availability, <emphasis>i.e.</emphasis>
they are supposed to be like other carrier class services. This usually
means that they have parallel arrays of computers in load sharing
configurations. And fourth, they usually do not cater to retail
connections for reading and posting articles by human users. Usenet news
carriers usually reserve separate computers to handle retail
connections.</para>
<para>Thus, carrier-class servers do not need to maintain a repository
of articles with the usual residence times of days or weeks, and expire
articles after they age. They only need to focus on super-efficient
re-transmission. These highly specialised servers run software
which receives an article over NNTP, parses it, and immediately re-queues
it for outward transmission to dozens or hundreds of other servers. And
since they work at these high throughputs, their downstream servers
are also expected to be live on the Internet round the clock to receive
incoming NNTP connections from the carrier servers. Therefore, no
batching or long queueing is needed; indeed, batching cannot be used. In
fact, some carrier class servers state that if you wish to receive feeds
from them, then your servers need to be available round the clock and
connected with lines fast enough to take the blast of a full feed. If
you do not fulfil these conditions, your servers will lose articles,
and the carrier is not responsible for the loss.</para>
<para>Therefore, one can almost say that carrier-class servers have
neither article repositories nor queues other than the current message(s)
being re-transmitted. If they fail to connect to five of their fifty
downstream neighbours, or fail to push an article through due to
a transmit error, those five neighbours will never receive that
article later from this server; the article is simply dropped from its
queues. Retries are not part of the game. Therefore, carrier-class
Usenet servers are more like packet routers than servers with
repositories.</para>
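<para>The fire-and-forget behaviour described above can be sketched in a
few lines of Python. This is purely illustrative: the peer names and the
<literal>send</literal> transport below are invented for the example,
and real carrier-class software is vastly more elaborate.</para>

```python
def relay_article(article, peers, send):
    """Offer 'article' to each downstream peer exactly once via
    send(peer, article).  Peers whose transmission fails are simply
    skipped: there is no retry queue, mirroring the router-like
    behaviour of carrier-class servers."""
    delivered = []
    for peer in peers:
        try:
            send(peer, article)
            delivered.append(peer)
        except OSError:
            pass  # transmit error: this peer never sees the article
    return delivered

# Demonstration with a fake transport that refuses one peer.
def flaky_send(peer, article):
    if peer == "down.example.net":
        raise OSError("connection refused")

peers = ["news-a.example.net", "down.example.net", "news-b.example.net"]
print(relay_article("<article@example>", peers, flaky_send))
```

<para>Note that the failed peer is forgotten, not queued; a repository-based
server like C-News or INN would instead batch the article for a later
retry.</para>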
<para>It can be seen why carrier-class software cannot hope to do its
job using batch-oriented repository management software like C-News and
why it needs a totally NNTP-oriented implementation. Therefore, the INN
antecedents of some of these systems are to be expected. We would
<emphasis>love</emphasis> to hear from any Linux HOWTO reader whose
Usenet server needs include carrier-class behaviour.</para>
<para>As far as we know, there is no freely redistributable software
implementation of carrier-class Usenet news servers. There is no reason
why such services cannot be offered on Linux, even Intel Linux, provided
you have fast network links and arrays of servers. Linux as an OS platform
is not an issue here, but free software has not yet been made available
for this niche. Presumably it is because the users of such software are
service providers who earn money using it, and therefore are expected
to be willing to pay for it.</para>
<para>TO BE EXTENDED AND CORRECTED.</para>
</section>
</chapter>

<chapter> <title>What is the Usenet?</title>
<section> <title>Discussion groups </title>
<para>The Usenet is a huge worldwide collection of discussion
groups. Each discussion group has a name, <emphasis>e.g.</emphasis>
<literal>comp.os.linux.announce</literal>, and a collection of messages.
These messages, usually called <emphasis>articles</emphasis>, are posted
by readers like you and me who have access to Usenet servers, and are
then stored on the Usenet servers.</para>
<para>This ability to both read and write into a Usenet newsgroup makes
the Usenet very different from the bulk of what people today call ``the
Internet.'' The Internet has become a colloquial term to refer to the
World Wide Web, and the Web is (largely) read-only. There are online
discussion groups with Web interfaces, and there are mailing lists, but
Usenet is probably more convenient than either of these for most large
discussion communities. This is because the articles get replicated to
your local Usenet server, thus allowing you to read and post articles
without accessing the global Internet, something which is of great value
for those with slow Internet links. Usenet articles also conserve
bandwidth because they do not come and sit in each member's mailbox,
unlike email-based mailing lists. With a mailing list, twenty members
in one office will have twenty copies of each message delivered to
their mailboxes; with a Usenet discussion group and a local Usenet
server, there's just one copy of each article, and it does not fill up
anyone's mailbox.</para>
<para>Another nice feature of having your own local Usenet server is
that articles stay on the server even after you've read them. You can't
accidentally delete a Usenet article the way you can delete a message
from your mailbox. This way, a Usenet server is an
<emphasis>excellent</emphasis> way to archive articles of a group
discussion on a local server without placing the onus of archiving on
any group member. This makes local Usenet servers very valuable as
archives of internal discussion messages within corporate Intranets,
provided the article expiry configuration of the Usenet server software
has been set up for sufficiently long expiry periods.</para>
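<para>For instance, if your local server happens to run INN, this
long-retention policy is expressed in its <literal>expire.ctl</literal>
file. The group names and retention figures below are only a sketch to
be adapted to your own hierarchy:</para>

```
## Sketch of an INN expire.ctl for archival of internal groups.
## Remember seen Message-IDs for 30 days.
/remember/:30
## Default: keep articles in all groups for 30 days.
*:A:1:30:30
## Never expire the internal discussion hierarchy.
local.*:A:never:never:never
```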
</section>
<section> <title>How it works, loosely speaking</title>
<para> Usenet news works by the reader first firing up a Usenet news
program, which in today's GUI world will very likely be something like
Netscape Messenger or Microsoft's Outlook Express. There are a lot of
proven, well-designed character-based Usenet news readers, but a proper
review of the user agent software is outside the scope of this HOWTO, so
we will just assume that you are using whatever software you like. The
reader then selects a Usenet newsgroup from the hundreds or thousands of
newsgroups which are hosted by her local server, and accesses all unread
articles. These articles are displayed to her. She can then decide to
respond to some of them.</para>
<para>When the reader writes an article, either in response to an
existing one or as a start of a brand-new thread of discussion, her
software <emphasis>posts</emphasis> this article to the Usenet server.
The article contains a list of newsgroups into which it is to be posted.
Once it is accepted by the server, it becomes available for other users
to read and respond to. The article is automatically
<emphasis>expired</emphasis> or deleted by the server from its internal
archives based on expiry policies set in its software; the author of the
article usually can do little or nothing to control the expiry of her
articles.</para>
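<para>Concretely, a posted article is just text: a few header lines,
a blank line, then the body, with the <literal>Newsgroups</literal>
header carrying the list of groups it is to be posted into. The short
Python sketch below builds such an article; the author, group names and
helper function are invented for illustration, and real news readers
add further headers (<literal>Message-ID</literal>,
<literal>Date</literal>, and so on).</para>

```python
def make_article(author, groups, subject, body):
    """Build a minimal Usenet article in the classic RFC 1036 shape:
    header lines, a blank line, then the body.  The Newsgroups header
    lists every group the article is to be posted into."""
    headers = [
        "From: " + author,
        "Newsgroups: " + ",".join(groups),
        "Subject: " + subject,
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + body + "\r\n"

print(make_article("reader@example.com",
                   ["comp.os.linux.misc", "comp.os.linux.help"],
                   "Test posting", "Hello, world."))
```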
<para>A Usenet server rarely works on its own. It forms a part of
a collection of servers, which automatically exchange articles with
each other. The flow of articles from one server to another is called a
<emphasis>newsfeed</emphasis>. In a simplistic case, one can imagine a
worldwide network of servers, all configured to replicate articles with
each other, busily passing along copies across the network as soon as one
of them receives a new article posted by a human reader. This replication
is done by powerful and fault-tolerant processes, and gives the Usenet
network its power. Your local Usenet server literally has a copy of all
current articles in all relevant newsgroups.</para>
</section>
<section> <title>About sizes, volumes, and so on </title>
<para>Any would-be Usenet server administrator or creator
<emphasis>must</emphasis> read the <quote>Periodic Posting about the basic steps
involved in configuring a machine to store Usenet news,</quote> also known as
the Site Setup FAQ, available from
<literal>ftp://rtfm.mit.edu/pub/usenet/news.answers/usenet/site-setup</literal>
or
<literal>ftp://ftp.uu.net/usenet/news.answers/news/site-setup.Z</literal>.
It was last updated in 1997, but trends haven't changed much since
then, though absolute volume figures have.</para>
<para>If you want your Usenet server to be a repository for all articles
in all newsgroups, you will probably not be reading this HOWTO, or even
if you do, you will rapidly realise that anyone who needs to read this
HOWTO may not be ready to set up such a server. This is because the
volumes of articles on the Usenet have reached a point where very
specialised networks, very high end servers, and large disk arrays
are required for handling such Usenet volumes. Those setups are called
``carrier-class'' Usenet servers, and will be discussed a bit later on in
this HOWTO. Administering such an array of hardware may not be the job
of the new Usenet administrator, for whom this HOWTO (and most Linux
HOWTOs) are written.</para>
<para>Nevertheless, it may be interesting to understand what volumes we
are talking about. Usenet news article volumes have been doubling every
fourteen months or so, going by what we hear in comments from
carrier class Usenet administrators. In the beginning of 1997, this
volume was 1.2 GBytes of articles a day. Thus, the volumes should have
undergone roughly five doublings, or grown 32 times, by the time we reach
mid-2002, at the time of this writing. This gives us a volume of 38.4
GBytes per day. Assume that this transfer happens using uncompressed
NNTP (the norm), and add 50% extra for the overheads of NNTP, TCP,
and IP. This gives you a raw data transfer volume of 57.6 GBytes/day or
about 460 Gbits/day. If you have to transfer such volumes of data in 24
hours (86400 seconds), you'll need raw bandwidth of about 5.3 Mbits per
second just to <emphasis>receive all these articles</emphasis>. You'll
need more bandwidth to send out feeds to other neighbouring Usenet
servers, and then you'll need bandwidth to allow your readers to access
your servers and read and post articles in retail quantities. Clearly,
these volume figures are outside the network bandwidths of most
corporate organisations or educational institutions, and therefore only
those who are in the business of offering Usenet news can afford
it.</para>
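<para>The arithmetic above can be checked in a few lines of Python; the
starting volume and doubling count are the figures quoted in the
text.</para>

```python
# Back-of-the-envelope check of the feed-volume figures above.
start_gbytes_per_day = 1.2        # volume at the beginning of 1997
doublings = 5                     # ~14-month doubling, 1997 to mid-2002
volume = start_gbytes_per_day * 2 ** doublings   # ~38.4 GBytes/day
with_overheads = volume * 1.5                    # +50% NNTP/TCP/IP overhead
gbits_per_day = with_overheads * 8               # ~460 Gbits/day
mbits_per_second = gbits_per_day * 1000 / 86400  # bandwidth just to receive
print(round(mbits_per_second, 1))                # about 5.3 Mbit/s
```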
<para>At the other end of the scale, it is perfectly feasible for a
small office to subscribe to a well-trimmed subset of Usenet newsgroups,
and exclude most of the high-volume newsgroups. Starcom Software, where
the authors of this HOWTO work, has run a fairly large subset of
600 newsgroups, which is still a tiny fraction of the 15,000+ newsgroups
that the carrier-class services offer. Your office or college may not
even need 600 groups. And our company has excluded specific high-volume
but low-usefulness newsgroups like the <literal>talk</literal>,
<literal>comp.binaries</literal>, and <literal>alt</literal>
hierarchies. With the pruned subset, the total volume of articles per
day may amount to barely a hundred MBytes a day or so, and can be easily
handled by most small offices and educational institutions. And in such
situations, a single Intel Linux server can deliver excellent performance
as a Usenet server.</para>
<para>Then there's the <emphasis>internal</emphasis> Usenet service. By
internal here, we mean a private set of Usenet newsgroups, not a private
computer network. Every company or university which runs a Usenet news
service creates its own hierarchy of internal newsgroups, whose articles
never leave the campus or office, and which therefore do not consume
Internet bandwidth. These newsgroups are often the ones most hotly
accessed, and will carry more <emphasis>internally generated</emphasis>
traffic within your organisation than all the ``public'' newsgroups you
may subscribe to. After all, how often does a guy have something to say
which is relevant to the world at large, unless he's discussing a globally
relevant topic like ``Unix rules!''? If such internal newsgroups are the
focus of your Usenet servers, then you may find that fairly modest
hardware and Internet bandwidth will suffice, depending on the size of
your organisation.</para>
<para>The new Usenet server administrator has to undertake a sizing
exercise to ensure that he does not bite off more than he, or his
network resources, can chew. We hope we have provided sufficient
information for him to get started with the right questions.</para>
</section>
</chapter>