<section><title>Usenet news software</title>
<section><title>C-News and NNTPd</title>
<para>
Once upon a time, before the term Usenet news had been invented, the
first recorded attempt to use a UUCP-based email backbone to maintain a
replicated message repository was called A-News. It connected four
servers in four universities, and was written as Unix shell
scripts.</para>
<para>The designers of A-News had not anticipated how much load users
would put on their simplistic system. A far superior, more sophisticated,
and faster implementation of Usenet news was written later, called
B-News. This was a mix of C and shell scripts, and was designed
much better than A-News, to allow handling of much larger volumes of
messages. B-News v2.x was the current version in around 1990. By 1992 or
so, it had been surpassed by C-News.</para>
<para>C-News was written by Henry Spencer and Geoff Collyer of the
Department of Zoology, University of Toronto, almost entirely in shell
and <literal>awk</literal>, as a replacement for B-News. Once again, the
focus was on adding some extra features and a lot of performance. The
first release was called Shellscript Release, which was deployed by a very
large number of servers worldwide, as a natural upgrade to B-News. This
version of C-News even had upward compatibility with B-News meta-data,
<emphasis>e.g.</emphasis> history files. This was the version of C-News
which was initially rolled out in 1992 or so at the National Centre for
Software Technology (NCST, <literal>http://www.ncst.ernet.in</literal>)
and the Indian Institutes of Technology in India as part of the Indian
ERNET network.</para>
<para>The Shellscript Release was soon followed by a re-write with a lot
more C code, called Performance Release, and then a set of cleanup and
component integration steps leading to the last release called the
Cleanup Release. This Cleanup Release was revised many times, and the
last one was CR.G (Cleanup Release revision G). The version of C-News
discussed in this HOWTO is a set of bug fixes on CR.G.</para>
<para>Since C-News came from shellscript-based antecedents, its
architecture followed the set-of-programs style so typical of Unix,
rather than large monolithic software systems traditional to some other
OSs. All pieces had well-defined roles, and therefore could be easily
replaced with other pieces as needed. This allowed easy adaptations and
upgrades. This modularity did not hurt performance, because key components
which did a lot of work at high speed, <emphasis>e.g.</emphasis>
<literal>newsrun</literal>, had been rewritten in C by that time. Even
within the shellscripts, crucial components which handled binary data,
<emphasis>e.g.</emphasis> a component called <literal>dbz</literal>
to manipulate efficient on-disk hash arrays, were C programs with
command-line interfaces, called from scripts.</para>
<para>C-News was born in a world with widely varying network line speeds,
where bandwidth utilisation was a big issue and dialup links with UUCP
file transfers were common. Therefore, it has very strong support for
batched feeds, especially with a variety of compression techniques and
over a variety of fast and slow transport channels. C-News is also
virtually unaware of TCP/IP, apart from one or two tiny batch
transport programs like <literal>viarsh</literal>. However, its design
was so modular that there was absolutely no problem in plugging in NNTP
functionality using a separate set of C programs without modifying
a single line of C-News. This was done by a program suite called
NNTPd.</para>
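<para>To make the idea of a batched feed concrete, the sketch below
builds a batch in the conventional <literal>rnews</literal> layout, where
each article is preceded by a <literal>#! rnews</literal> line giving its
size in bytes, and then compresses the result. This is only an
illustration written in Python, not code taken from C-News: the function
names are hypothetical, and the use of <literal>gzip</literal> is an
assumption made for the sketch. The real C-News batchers are shell
scripts and C programs.</para>
<programlisting><![CDATA[
```python
import gzip


def make_batch(articles):
    """Join articles into one rnews-style batch.

    Each article (bytes) is preceded by a "#! rnews N" line, where N is
    the length of that article in bytes.
    """
    parts = []
    for article in articles:
        parts.append(b"#! rnews %d\n" % len(article))
        parts.append(article)
    return b"".join(parts)


def split_batch(batch):
    """Recover the list of articles from a batch built by make_batch."""
    articles = []
    pos = 0
    while pos < len(batch):
        end_of_header = batch.index(b"\n", pos)
        header = batch[pos:end_of_header]
        if not header.startswith(b"#! rnews "):
            raise ValueError("not an rnews batch header: %r" % header)
        size = int(header.split()[2])
        start = end_of_header + 1
        articles.append(batch[start:start + size])
        pos = start + size
    return articles


def compress_batch(batch):
    """Compress a batch for a slow link (gzip is this sketch's choice)."""
    return gzip.compress(batch)
```
]]></programlisting>
<para>Because the byte count travels with each article, the receiving
side can split a batch without scanning article bodies for a separator,
which is part of what makes batches cheap to unpack.</para>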
<para>This software suite could work with B-News and C-News article
repositories, and provided the full NNTP functionality. Since B-News
died a gradual death, the combination of C-News and NNTPd became a freely
redistributable, portable, modern, extensible, and high-performance
software suite for Unix Usenet servers. Further refinements were
added later, <emphasis>e.g.</emphasis> <literal>nov</literal>, the News
Overview package and <literal>pgpverify</literal>, a public-key-based
digital signature module to protect Usenet news servers against
fraudulent control messages.</para>
</section>
<section><title>INN</title>
<para>
INN is one of the two most widely used Usenet news server solutions. It
was written by Rich Salz for Unix systems which have a socket API ---
probably all Unix systems do, today.
</para>
<para>
INN has an architecture diametrically opposite to that of C-News. It is a
monolithic program, which is started at bootup time, and keeps running
till your server OS is shut down. This is like the way high performance
HTTP servers are run in most cases, and allows INN to cache a lot of
things in its memory, including message-IDs of recently posted messages,
<emphasis>etc.</emphasis> The author has described this architecture
in a paper, where he explains the problems
of the older B-News and C-News systems that he tried to address. Anyone
interested in Usenet software in general and INN in particular should
study this paper.</para>
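<para>The benefit of keeping recently seen message-IDs in memory can be
illustrated with a small sketch. The Python class below is hypothetical
and is not INN's actual data structure; it only shows how a long-running
process can reject a duplicate article offer without touching the disk.
The class name, method name, and capacity parameter are all assumptions
made for the example.</para>
<programlisting><![CDATA[
```python
from collections import OrderedDict


class RecentMessageIds:
    """Bounded in-memory record of recently seen message-IDs."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._seen = OrderedDict()

    def offer(self, message_id):
        """Return True if message_id is new, False if it is a duplicate."""
        if message_id in self._seen:
            # Seen recently: refresh its position and refuse the article.
            self._seen.move_to_end(message_id)
            return False
        self._seen[message_id] = True
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # forget the oldest entry
        return True
```
]]></programlisting>
<para>In this sketch an ID that has been pushed out of the cache looks
new again. A real server would presumably fall back to its on-disk
history database in that case, so the cache only saves work and is
never the sole record.</para>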
<para>
INN addresses a Usenet news world which revolves around NNTP, though it
has support for UUCP batches --- a fact that not many INN administrators
seem to talk about. The primary situation where INN is more
efficient than the C-News and NNTPd combination is the processing of
several incoming NNTP feeds at the same time. For multiple
readers reading and posting news over NNTP, there is no difference
between the efficiency of INN and NNTPd. <xref linkend="innefficiency"/>
discusses the efficiency of INN relative to the earlier C-News
architecture, based on Rich Salz's paper and our analyses of usage
patterns.
</para>
<para>
INN's architecture has inspired a lot of high-performance Usenet news
software, including a lot of commercial systems which address the
``carrier class'' market. That is the market for which the INN
architecture has clear advantages over C-News.
</para>
</section>
<section><title>Leafnode</title>
<para>
Leafnode is a software system for setting up a ``small'' Usenet
news server on one computer, which only receives newsfeeds but does not
have the headache of sending out bulk feeds to other sites,
<emphasis>i.e.</emphasis> it is a ``leaf node'' in the newsfeed flow
diagram.</para>
<para>This software is a sort of combination of article repository and
NNTP news server, and receives articles, digests and stores them on the
local hard disks, expires them periodically, and serves them to an NNTP
reader. It is claimed that it is simple to manage and is ideal for
installation on a desktop-class Unix or Linux box, since it does not
take up many resources.</para>
<para>Leafnode is based on an appealing idea, but we find no problem
using C-News and NNTPd on a desktop-class box. The resource consumption
of C-News is roughly proportional to the volume of articles you want it to process,
and the number of groups you'll want to retain for a small team of users
will be easily handled by C-News on a desktop-class computer. An office
of a hundred users can easily use C-News and NNTPd on a desktop computer
running Linux, with 64 MBytes of RAM, IDE drives, and sufficient disk
space. Of course, ease of configuration and management is dependent on
familiarity, and we are more familiar with C-News than with Leafnode. We
hope this HOWTO will help you in that direction.</para>
<para>TO BE EXTENDED AND CORRECTED.</para>
</section>
<section><title>Suck</title>
<para>Suck is a program which lets you pull out an NNTP feed from an NNTP
server and file it locally. It does not contain any article repository
management software, expecting you to do it using some other
software system, <emphasis>e.g.</emphasis> C-News or INN. It can
create batchfiles which can be fed to C-News, for instance. (Well,
to be fair, Suck <emphasis>does</emphasis> have an option to store the
fetched articles in a spool directory tree very much like what is used
by C-News or INN in their article area, with one file per article. You
can later read this raw message spool area using a mail client which
supports the <literal>msgdir</literal> file layout for mail folders,
like MH, perhaps. We don't find this option useful if you're running
Suck on a Usenet server.) Suck finally boils down to a single
command-line program which is invoked periodically, typically from
<literal>cron</literal>. It has a zillion command-line options which
are confusing at first, but later show how mature and finely tunable
the software is.</para>
<para>If you need an NNTP pull feed, then we know of no better program
than Suck for the job. The <literal>nntpxfer</literal> program which
forms part of the NNTPd package also implements an NNTP pull feed, for
instance, but does not have one-tenth of the flexibility and fine-tuning
of Suck. One of the banes of the NNTP pull feed is connection timeouts;
Suck allows a lot of special tuning to handle this problem. If we had
to set up a Usenet server with an NNTP pull feed, we'd use Suck right
away.</para>
<para>TO BE EXTENDED AND CORRECTED.</para>
</section>
<section><title>Carrier class software</title>
<para>We have touched upon the characteristics of carrier-class Usenet
software in the section where we discuss NNTP efficiency issues. As that
bit shows, the requirements of carrier-class Usenet servers are very
different from those run within organisations and institutes for
providing internal service to their members.</para>
<para>Carrier-class servers are expected to handle a complete feed of all
articles in all newsgroups, including a lot of groups which have what we
call a ``high noise-to-signal ratio.'' They do not have the luxury of
choosing a ``useful'' subset like administrators of internal corporate
Usenet servers do. Second, carrier-class servers are expected to turn
articles around very fast, <emphasis>i.e.</emphasis> they are expected to
have very low latency from the moment they receive an article to the
time they retransmit it by NNTP to downstream servers. Third, they are
supposed to provide very high availability, <emphasis>i.e.</emphasis>
they are supposed to be like other carrier class services. This usually
means that they have parallel arrays of computers in load sharing
configurations. And fourth, they usually do not cater to retail
connections for reading and posting articles by human users. Usenet news
carriers usually reserve separate computers to handle retail
connections.</para>
<para>Thus, carrier-class servers do not need to maintain a repository
of articles with the usual residence times of days or weeks, and expire
articles after they age. They only need to focus on super-efficient
re-transmission. These highly specialised servers have software
which receives an article over NNTP, parses it, and immediately re-queues
it for outward transmission to dozens or hundreds of other servers. And
since they work at these high throughputs, their downstream servers
are also expected to be live on the Internet round the clock to receive
incoming NNTP connections from the carrier servers. Therefore, batching
and long queueing are neither needed nor practical. In
fact, some carrier class servers state that if you wish to receive feeds
from them, then your servers need to be available round the clock and
connected with lines fast enough to take the blast of a full feed. If
you do not fulfil these conditions, your servers will lose articles,
and the carrier is not responsible for the loss.</para>
<para>Therefore, one can almost say that carrier-class servers have
neither article repositories nor queues other than the current message(s)
being re-transmitted. If they fail to connect to five of their fifty
downstream neighbours, or fail to push an article through due to
a transmit error, those five neighbours will never receive that
article later from this server; the article will be dropped from their
queues. Retries are not part of the game. Therefore, carrier-class
Usenet servers are more like packet routers than servers with
repositories.</para>
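<para>The relay behaviour described above, where each downstream
neighbour is offered an article once and failures are simply dropped,
can be sketched as follows. This is a hypothetical illustration in
Python, not code from any carrier-class product: the function name
<literal>relay_once</literal> and the idea of passing one send function
per peer are assumptions made for the example.</para>
<programlisting><![CDATA[
```python
def relay_once(article, peers):
    """Offer an article to every peer exactly once.

    peers maps a peer name to a send function that raises OSError
    (for example ConnectionError) on failure. Failed peers are
    reported but never retried.
    """
    delivered, dropped = [], []
    for name, send in peers.items():
        try:
            send(article)
        except OSError:
            dropped.append(name)  # no queue, no retry: the article is lost
        else:
            delivered.append(name)
    return delivered, dropped
```
]]></programlisting>
<para>Nothing is kept once the loop ends, which is what makes such a
server resemble a packet router: a peer listed in
<literal>dropped</literal> has to obtain the article from some other
upstream server, if it gets it at all.</para>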
<para>It can be seen why carrier-class software cannot hope to do its
job using batch-oriented repository management software like C-News and
why it needs a totally NNTP-oriented implementation. Therefore, the INN
antecedents of some of these systems are to be expected. We would
<emphasis>love</emphasis> to hear from any Linux HOWTO reader whose
Usenet server needs include carrier-class behaviour.</para>
<para>As far as we know, there is no freely redistributable software
implementation of carrier-class Usenet news servers. There is no reason
why such services cannot be offered on Linux, even Intel Linux, provided
you have fast network links and arrays of servers. Linux as an OS platform
is not an issue here, but free software has not yet been made available
for this niche. Presumably it is because the users of such software are
service providers who earn money using it, and therefore are expected
to be willing to pay for it.</para>
<para>TO BE EXTENDED AND CORRECTED.</para>
</section>
</section>