mirror of https://github.com/tLDP/LDP
248 lines
13 KiB
Plaintext
248 lines
13 KiB
Plaintext
<section><title>Usenet news software</title>
|
|
|
|
<section><title>CNews and NNTPd</title>
|
|
<para>
|
|
Once upon a time, when Usenet news was a term not yet invented, the
|
|
first recorded attempt to use a UUCP-based email backbone to maintain a
|
|
replicated message repository, was called A-News. It connected four
|
|
servers in four universities, and was written as Unix shell
|
|
scripts.</para>
|
|
|
|
<para>The designers of A-News had not anticipated how much load users
|
|
would put on their simplistic system. A far superior, more sophisticated,
|
|
and faster implementation of Usenet news was written later, called
|
|
B-News. This was a mix of C and shell scripts, and was designed
|
|
much better than A-News, to allow handling of much larger volumes of
|
|
messages. B-News v2.x was the current version in around 1990. By 1992 or
|
|
so, it had been surpassed by C-News.</para>
|
|
|
|
<para>C-News was written by Henry Spencer and Geoff Collyer of the
|
|
Department of Zoology, University of Toronto, almost entirely in shell
|
|
and <literal>awk</literal>, as a replacement for B-News. Once again, the
|
|
focus was on adding some extra features and a lot of performance. The
|
|
first release was called Shellscript Release, which was deployed by a very
|
|
large number of servers worldwide, as a natural upgrade to B-News. This
|
|
version of C-News even had upward compatibility with B-News meta-data,
|
|
<emphasis>e.g.</emphasis> history files. This was the version of C-News
|
|
which was initially rolled out in 1992 or so at the National Centre for
|
|
Software Technology (NCST, <literal>http://www.ncst.ernet.in</literal>)
|
|
and the Indian Institute of Technologies in India as part of the Indian
|
|
ERNET network.</para>
|
|
|
|
<para>The Shellscript Release was soon followed by a re-write with a lot
|
|
more C code, called Performance Release, and then a set of cleanup and
|
|
component integration steps leading to the last release called the
|
|
Cleanup Release. This Cleanup Release was revised many times, and the
|
|
last one was CR.G (Cleanup Release revision G). The version of C-News
|
|
discussed in this HOWTO is a set of bug fixes on CR.G.</para>
|
|
|
|
<para>Since C-News came from shellscript-based antecedents, its
|
|
architecture followed the set-of-programs style so typical of Unix,
|
|
rather than large monolothic software systems traditional to some other
|
|
OSs. All pieces had well-defined roles, and therefore could be easily
|
|
replaced with other pieces as needed. This allowed easy adaptations and
|
|
upgradations. This never affected performance, because key components
|
|
which did a lot of work at high speed, <emphasis>e.g.</emphasis>
|
|
<literal>newsrun</literal>, had been rewritten in C by that time. Even
|
|
within the shellscripts, crucial components which handled binary data,
|
|
<emphasis>e.g.</emphasis> a component called <literal>dbz</literal>
|
|
to manipulate efficient on-disk hash arrays, were C programs with
|
|
command-line interfaces, called from scripts.</para>
|
|
|
|
<para>C-News was born in a world with widely varying network line speeds,
|
|
where bandwidth utilisation was a big issue and dialup links with UUCP
|
|
file transfers was common. Therefore, it has very strong support for
|
|
batched feeds, specially with a variety of compression techniques and
|
|
over a variety of fast and slow transport channels. And C-News virtually
|
|
does not know the existence of TCP/IP, other than one or two tiny batch
|
|
transport programs like <literal>viarsh</literal>. However, its design
|
|
was so modular that there was absolutely no problem in plugging in NNTP
|
|
functionality using a separate set of C programs without modifying
|
|
a single line of C-News. This was done by a program suite called
|
|
NNTPd.</para>
|
|
|
|
<para>This software suite could work with B-News and C-News article
|
|
repositories, and provided the full NNTP functionality. Since B-News
|
|
died a gradual death, the combination of C-News and NNTPd became a freely
|
|
redistributable, portable, modern, extensible, and high-performance
|
|
software suite for Unix Usenet servers. Further refinements were
|
|
added later, <emphasis>e.g.</emphasis> <literal>nov</literal>, the News
|
|
Overview package and <literal>pgpverify</literal>, a public-key-based
|
|
digital signature module to protect Usenet news servers against
|
|
fraudulent control messages.</para>
|
|
|
|
</section>
|
|
|
|
<section><title>INN</title>
|
|
<para>
|
|
INN is one of the two most widely used Usenet news server solutions. It
|
|
was written by Rich Salz for Unix systems which have a socket API ---
|
|
probably all Unix systems do, today.
|
|
</para>
|
|
|
|
<para>
|
|
INN has an architecture diametrically opposite to CNews. It is a
|
|
monolithic program, which is started at bootup time, and keeps running
|
|
till your server OS is shut down. This is like the way high performance
|
|
HTTP servers are run in most cases, and allows INN to cache a lot of
|
|
things in its memory, including message-IDs of recently posted messages,
|
|
<emphasis>etc.</emphasis> This interesting architecture has been discussed
|
|
in an interesting paper by the author, where he explains the problems
|
|
of the older BNews and CNews systems that he tried to address. Anyone
|
|
interested in Usenet software in general and INN in particular should
|
|
study this paper.</para>
|
|
|
|
<para>
|
|
INN addresses a Usenet news world which revolves around NNTP, though it
|
|
has support for UUCP batches --- a fact that not many INN administrators
|
|
seem to talk about. The primary situations where INN works at higher
|
|
efficiency over the CNews-NNTPd combination are in processing incoming
|
|
NNTP feeds when there are multiple incoming NNTP feeds. For multiple
|
|
readers reading and posting news over NNTP, there is no difference
|
|
between the efficiency of INN and NNTPd. <xref linkend="innefficiency"/>
|
|
discusses the efficiency issues of INN over the earlier CNews
|
|
architecture, based on Rich Salz' paper and our analyses of usage
|
|
patterns.
|
|
</para>
|
|
|
|
<para>
|
|
INN's architecture has inspired a lot of high-performance Usenet news
|
|
software, including a lot of commercial systems which address the
|
|
``carrier class'' market. That is the market for which the INN
|
|
architecture has clear advantages over C-News.
|
|
</para>
|
|
</section>
|
|
|
|
<section><title>Leafnode</title>
|
|
<para>
|
|
This is an interesting software system, to set up a ``small'' Usenet
|
|
news server on one computer which only receives newsfeeds but does not
|
|
have the headache of sending out bulk feeds to other sites,
|
|
<emphasis>i.e.</emphasis> it is a ``leaf node'' in the newsfeed flow
|
|
diagram.</para>
|
|
|
|
<para>This software is a sort of combination of article repository and
|
|
NNTP news server, and receives articles, digests and stores them on the
|
|
local hard disks, expires them periodically, and serves them to an NNTP
|
|
reader. It is claimed that it is simple to manage and is ideal for
|
|
installation on a desktop-class Unix or Linux box, since it does not
|
|
take up much resources.</para>
|
|
|
|
<para>Leafnode is based on an appealing idea, but we find no problem
|
|
using C-News and NNTPd on a desktop-class box. Its resource consumption is
|
|
somewhat proportional to the volume of articles you want it to process,
|
|
and the number of groups you'll want to retain for a small team of users
|
|
will be easily handled by C-News on a desktop-class computer. An office
|
|
of a hundred users can easily use C-News and NNTPd on a desktop computer
|
|
running Linux, with 64 MBytes of RAM, IDE drives, and sufficient disk
|
|
space. Of course, ease of configuration and management is dependent on
|
|
familiarity, and we are more familiar with C-News than with Leafnode. We
|
|
hope this HOWTO will help you in that direction.</para>
|
|
|
|
<para>TO BE EXTENDED AND CORRECTED.</para>
|
|
|
|
</section>
|
|
|
|
<section><title>Suck</title>
|
|
<para>Suck is a program which lets you pull out an NNTP feed from an NNTP
|
|
server and file it locally. It does not contain any article repository
|
|
management software, expecting you to do it using some other
|
|
software system, <emphasis>e.g.</emphasis> C-News or INN. It can
|
|
create batchfiles which can be fed to C-News, for instance. (Well,
|
|
to be fair, Suck <emphasis>does</emphasis> have an option to store the
|
|
fetched articles in a spool directory tree very much like what is used
|
|
by C-News or INN in their article area, with one file per article. You
|
|
can later read this raw message spool area using a mail client which
|
|
supports the <literal>msgdir</literal> file layout for mail folders,
|
|
like MH, perhaps. We don't find this option useful if you're running
|
|
Suck on a Usenet server.) Suck finally boils down to a single
|
|
command-line program which is invoked periodically, typically from
|
|
<literal>cron</literal>. It has a zillion command-line options which
|
|
are confusing at first, but later show how mature and finely tunable
|
|
the software is.</para>
|
|
|
|
<para>If you need an NNTP pull feed, then we know of no better programs
|
|
than Suck for the job. The <literal>nntpxfer</literal> program which
|
|
forms part of the NNTPd package also implements an NNTP pull feed, for
|
|
instance, but does not have one-tenth of the flexibility and fine-tuning
|
|
of Suck. One of the banes of the NNTP pull feed is connection timeouts;
|
|
Suck allows a lot of special tuning to handle this problem. If we had
|
|
to set up a Usenet server with an NNTP pull feed, we'd use Suck right
|
|
away.</para>
|
|
|
|
<para>TO BE EXTENDED AND CORRECTED.</para>
|
|
|
|
</section>
|
|
|
|
<section><title>Carrier class software</title>
|
|
|
|
<para>We have touched upon the characteristics of carrier-class Usenet
|
|
software in the section where we discuss NNTP efficiency issues. As that
|
|
bit shows, the requirements of carrier-class Usenet servers is very
|
|
different from those run within organisations and institutes for
|
|
providing internal service to their members.</para>
|
|
|
|
<para>Carrier-class servers are expected to handle a complete feed of all
|
|
articles in all newsgroups, including a lot of groups which have what we
|
|
call a ``high noise-to-signal ratio.'' They do not have the luxury of
|
|
choosing a ``useful'' subset like administrators of internal corporate
|
|
Usenet servers do. Secondly, carrier-class servers are expected to turn
|
|
articles around very fast, <emphasis>i.e.</emphasis> they are expected to
|
|
have very low latency from the moment they receive an article to the
|
|
time they retransmit it by NNTP to downstream servers. Third, they are
|
|
supposed to provide very high availability, <emphasis>i.e.</emphasis>
|
|
they are supposed to be like other carrier class services. This usually
|
|
means that they have parallel arrays of computers in load sharing
|
|
configurations. And fourth, they usually do not cater to retail
|
|
connections for reading and posting articles by human users. Usenet news
|
|
carriers usually reserve separate computers to handle retail
|
|
connections.</para>
|
|
|
|
<para>Thus, carrier-class servers do not need to maintain a repository
|
|
of articles with the usual residence times of days or weeks, and expire
|
|
articles after they age. They only need to focus on super-efficient
|
|
re-transmission. These highly specialised servers have software
|
|
which receive an article over NNTP, parse it, and immediately re-queue
|
|
it for outward transmission to dozens or hundreds of other servers. And
|
|
since they work at these high throughputs, their downstream servers
|
|
are also expected to be live on the Internet round the clock to receive
|
|
incoming NNTP connections from the carrier servers. Therefore, there's
|
|
no batching or long queueing needed, and batching cannot be used. In
|
|
fact, some carrier class servers state that if you wish to receive feeds
|
|
from them, then your servers need to be available round the clock and
|
|
connected with lines fast enough to take the blast of a full feed. If
|
|
you do not fulfil these conditions, your servers will lose articles,
|
|
and the carrier is not responsible for the loss.</para>
|
|
|
|
<para>Therefore, one can almost say that carrier-class servers have
|
|
neither article repositories nor queues other than the current message(s)
|
|
being re-transmitted. If they fail to connect to five of their fifty
|
|
downstream neighbours, or fail to push an article through due to
|
|
a transmit error, those five neighbours will never receive that
|
|
article later from this server; the article will be dropped from their
|
|
queues. Retries are not part of the game. Therefore, carrier-class
|
|
Usenet servers are more like packet routers than servers with
|
|
repositories.</para>
|
|
|
|
<para>It can be seen why carrier-class software cannot hope to do its
|
|
job using batch-oriented repository management software like C-News and
|
|
why it needs a totally NNTP-oriented implementation. Therefore, the INN
|
|
antecedents of some of these systems is to be expected. We would
|
|
<emphasis>love</emphasis> to hear from any Linux HOWTO reader whose
|
|
Usenet server needs include carrier-class behaviour.</para>
|
|
|
|
<para>As far as we know, there is no freely redistributable software
|
|
implementation of carrier-class Usenet news servers. There is no reason
|
|
why such services cannot be offered on Linux, even Intel Linux, provided
|
|
you have fast network links and arrays of servers. Linux as an OS platform
|
|
is not an issue here, but free software has not yet been made available
|
|
for this niche. Presumably it is because the users of such software are
|
|
service providers who earn money using it, and therefore are expected
|
|
to be willing to pay for it.</para>
|
|
|
|
<para>TO BE EXTENDED AND CORRECTED.</para>
|
|
|
|
</section>
|
|
|
|
</section>
|