mirror of https://github.com/tLDP/LDP
243 lines
12 KiB
Plaintext
243 lines
12 KiB
Plaintext
<section><title>Usenet news software</title>
|
|
|
|
<section><title>A brief history of Usenet systems</title>
|
|
|
|
<para>Towards the end of this HOWTO, we have added some information
|
|
about the history of Usenet server software by quoting sections from an
|
|
earlier Usenet Periodic Posting. We consider this historical
|
|
perspective, and the Usenix papers and other documents referred to in
|
|
it, essential reading for any Usenet server administrator. Please see
|
|
the section titled <quote><xref linkend="softwarehistory"/></quote>.
|
|
</para>
|
|
|
|
</section>
|
|
|
|
<section><title>C-News and NNTPd</title>
|
|
<para>C-News was written by Henry Spencer and Geoff Collyer of the
|
|
Department of Zoology, University of Toronto, almost entirely in shell
|
|
and <literal>awk</literal>, as a replacement for an earlier system called
|
|
B-News. The focus was on adding some extra features and a
|
|
lot of performance. The first release was called Shellscript Release,
|
|
which was deployed by a very large number of servers worldwide, as
|
|
a natural upgrade to B-News. This version of C-News had upward
|
|
compatibility with B-News meta-data, <emphasis>e.g.</emphasis> history
|
|
files. This was the version of C-News which was initially rolled out
|
|
in 1991 or so at the National Centre for Software Technology (NCST,
|
|
<literal>http://www.ncst.ernet.in</literal>) and the Indian Institutes
|
|
of Technology in India as part of the Indian educational and research
|
|
network (ERNET). We received guidance from the NCST about Usenet news
|
|
installation and management.</para>
|
|
|
|
<para>The Shellscript Release was soon followed by a re-write with a lot
|
|
more C code, called Performance Release, and then a set of cleanup and
|
|
component integration steps leading to the last release called the Cleanup
|
|
Release. This Cleanup Release was patched many times by the authors,
|
|
and the last one was CR.G (Cleanup Release revision G). The version of
|
|
C-News discussed in this HOWTO is a set of small bug fixes on CR.G.</para>
|
|
|
|
<para>Since C-News came from shellscript-based antecedents, its
|
|
architecture followed the set-of-programs style so typical of Unix,
|
|
rather than large monolothic software systems traditional to some other
|
|
OSs. All pieces had well-defined roles, and therefore could be easily
|
|
replaced with other pieces as needed. This allowed easy adaptations and
|
|
upgradations. This never affected performance, because key components
|
|
which did a lot of work at high speed, <emphasis>e.g.</emphasis>
|
|
<literal>newsrun</literal>, had been rewritten in C by that time. Even
|
|
within the shellscripts, crucial components which handled binary data,
|
|
<emphasis>e.g.</emphasis> a component called <literal>dbz</literal>
|
|
to manipulate efficient on-disk hash arrays, were C programs with
|
|
command-line interfaces, called from scripts.</para>
|
|
|
|
<para>C-News was born in a world with widely varying network line speeds,
|
|
where bandwidth utilisation was a big issue and dialup links with UUCP
|
|
file transfers was common. Therefore, it has strong support for
|
|
batched feeds, specially with a variety of compression techniques and
|
|
over a variety of fast and slow transport channels. And C-News virtually
|
|
does not know the existence of TCP/IP, other than one or two tiny batch
|
|
transport programs like <literal>viarsh</literal>. However, its design
|
|
was so modular that there was absolutely no problem in plugging in NNTP
|
|
functionality using a separate set of C programs without modifying
|
|
a single line of C-News. This was done by a program suite called
|
|
NNTP Reference Implementation, which we call NNTPd.</para>
|
|
|
|
<para>This software suite could work with B-News and C-News article
|
|
repositories, and provided the full NNTP functionality. Since B-News
|
|
died a gradual death, the combination of C-News and NNTPd became a freely
|
|
redistributable, portable, modern, extensible, and high-performance
|
|
software suite for Unix Usenet servers. Further refinements were
|
|
added later, <emphasis>e.g.</emphasis> <literal>nov</literal>, the News
|
|
Overview package and <literal>pgpverify</literal>, a public-key-based
|
|
digital signature module to protect Usenet news servers against
|
|
fraudulent control messages.</para>
|
|
|
|
</section>
|
|
|
|
<section><title>INN</title>
|
|
<para>
|
|
INN is one of the two most widely used Usenet news server solutions. It
|
|
was written by Rich Salz for Unix systems which have a socket API ---
|
|
probably all Unix systems do, today.
|
|
</para>
|
|
|
|
<para>
|
|
INN has an architecture diametrically opposite to CNews. It is a
|
|
monolithic program, which is started at bootup time, and keeps running
|
|
till your server OS is shut down. This is like the way high performance
|
|
HTTP servers are run in most cases, and allows INN to cache a lot of
|
|
things in its memory, including message-IDs of recently posted messages,
|
|
<emphasis>etc.</emphasis> This interesting architecture has been discussed
|
|
in an interesting paper by the author, where he explains the problems
|
|
of the older B-News and C-News systems that he tried to address. Anyone
|
|
interested in Usenet software in general and INN in particular should
|
|
study this paper.
|
|
</para>
|
|
|
|
<para>
|
|
INN addresses a Usenet news world which revolves around NNTP, though it
|
|
has support for UUCP batches --- a fact that not many INN administrators
|
|
seem to talk about. INN works faster than the CNews-NNTPd combination when
|
|
processing multiple parallel incoming NNTP feeds. For multiple readers
|
|
reading and posting news over NNTP, there is no difference between the
|
|
efficiency of INN and NNTPd. <xref linkend="innefficiency"/> discusses
|
|
the efficiency issues of INN over the earlier C-News architecture, based
|
|
on Rich Salz' paper and our analyses of usage patterns. </para>
|
|
|
|
<para>
|
|
INN's architecture has inspired a lot of high-performance Usenet news
|
|
software, including a lot of commercial systems which address the
|
|
``carrier class'' market. That is the market for which the INN
|
|
architecture has clear advantages over C-News.
|
|
</para>
|
|
</section>
|
|
|
|
<section><title>Leafnode</title>
|
|
<para>
|
|
This is an interesting software system, to set up a ``small'' Usenet
|
|
news server on one computer which only receives newsfeeds but does not
|
|
have the headache of sending out bulk feeds to other sites,
|
|
<emphasis>i.e.</emphasis> it is a ``leaf node'' in the newsfeed flow
|
|
diagram. According to its homepage (<literal>www.leafnode.org</literal>),
|
|
``Leafnode is a USENET software package designed for small sites running
|
|
any flavour of Unix, with a few tens of readers and only a slow link
|
|
to the net. [...] The current version is 1.9.24.''</para>
|
|
|
|
<para>This software is a sort of combination of article repository and
|
|
NNTP news server, and receives articles, digests and stores them on the
|
|
local hard disks, expires them periodically, and serves them to an NNTP
|
|
reader. It is claimed that it is simple to manage and is ideal for
|
|
installation on a desktop-class Unix or Linux box, since it does not
|
|
take up much resources.</para>
|
|
|
|
<para>Leafnode is based on an appealing idea, but we find no problem
|
|
using C-News and NNTPd on a desktop-class box. Its resource consumption is
|
|
somewhat proportional to the volume of articles you want it to process,
|
|
and the number of groups you'll want to retain for a small team of users
|
|
will be easily handled by C-News on a desktop-class computer. An office
|
|
of a hundred users can easily use C-News and NNTPd on a desktop computer
|
|
running Linux, with 64 MBytes of RAM, IDE drives, and sufficient disk
|
|
space. Of course, ease of configuration and management is dependent on
|
|
familiarity, and we are more familiar with C-News than with Leafnode. We
|
|
hope this HOWTO will help you in that direction.</para>
|
|
|
|
<para>There <emphasis>is</emphasis>, however, one area in which Leafnode
|
|
is far easier to administer than INN or C-News. Leafnode constantly
|
|
monitors the actual usage of the newsgroups it carries, based on
|
|
readership statistics of its NNTP readers. If a particular newsgroup
|
|
is not read at all by any user for a week, then Leafnode will delete
|
|
all articles in that newsgroup, free up disk space, and stop fetching
|
|
new articles for it. If it finds that a previously abandoned newsgroup
|
|
is now again receiving attention, even from one user, then it'll fetch
|
|
all articles for that group from its upstream server the next time it
|
|
connects. This self-tuning feature of Leafnode is really an excellent
|
|
advantage which makes a Leafnode site easier to manage, specially for
|
|
small setups with bandwidth and disk space constraints.</para>
|
|
|
|
<para>The Leafnode Website gives a lot of details in an easily
|
|
understood format.</para>
|
|
|
|
<para>TO BE EXTENDED AND CORRECTED.</para>
|
|
|
|
</section>
|
|
|
|
<section><title>Suck</title>
|
|
<para>Suck is a program which lets you pull out an NNTP feed from an NNTP
|
|
server and file it locally. It does not contain any article repository
|
|
management software, expecting you to do it using some other
|
|
software system, <emphasis>e.g.</emphasis> C-News or INN. It can
|
|
create batchfiles which can be fed to C-News, for instance. (Well,
|
|
to be fair, Suck <emphasis>does</emphasis> have an option to store the
|
|
fetched articles in a spool directory tree very much like what is used
|
|
by C-News or INN in their article area, with one file per article. You
|
|
can later read this raw message spool area using a mail client which
|
|
supports the <literal>msgdir</literal> file layout for mail folders,
|
|
like MH, perhaps. We don't find this option useful if you're running
|
|
Suck on a Usenet server.) Suck finally boils down to a single
|
|
command-line program which is invoked periodically, typically from
|
|
<literal>cron</literal>. It has a zillion command-line options which
|
|
are confusing at first, but later show how mature and finely tunable
|
|
the software is.</para>
|
|
|
|
<para>If you need an NNTP pull feed, then we know of no better programs
|
|
than Suck for the job. The <literal>nntpxfer</literal> program which
|
|
forms part of the NNTPd package also implements an NNTP pull feed, for
|
|
instance, but does not have one-tenth of the flexibility and fine-tuning
|
|
of Suck. One of the banes of the NNTP pull feed is connection timeouts;
|
|
Suck allows a lot of special tuning to handle this problem. If we had
|
|
to set up a Usenet server with an NNTP pull feed, we'd use Suck right
|
|
away.</para>
|
|
|
|
<para>TO BE EXTENDED AND CORRECTED.</para>
|
|
|
|
</section>
|
|
|
|
<section><title>Carrier class software</title>
|
|
|
|
<para>Carrier-class servers are expected to handle a complete feed of all
|
|
articles in all newsgroups, including a lot of groups which have what we
|
|
call a ``high noise-to-signal ratio.'' They do not have the luxury of
|
|
choosing a ``useful'' subset like administrators of internal corporate
|
|
Usenet servers do. Secondly, carrier-class servers are expected to turn
|
|
articles around very fast, <emphasis>i.e.</emphasis> they are expected to
|
|
have very low latency from the moment they receive an article to the
|
|
time they retransmit it by NNTP to downstream servers. Third, they
|
|
are supposed to provide very high availability, like other ``carrier
|
|
class'' services. This usually means that they have parallel arrays of
|
|
computers in load sharing configurations. And fourth, they usually do
|
|
not cater to retail connections for reading and posting articles by human
|
|
users. Usenet news carriers usually reserve separate computers to handle
|
|
retail connections.</para>
|
|
|
|
<para>Thus, carrier-class servers do not need to maintain a repository
|
|
of articles; they only need to focus on super-efficient real-time
|
|
re-transmission. These highly specialised servers have software which
|
|
receive an article over NNTP, parse it, and immediately re-queue it for
|
|
outward transmission to dozens or hundreds of other servers. And since
|
|
they work at these high throughputs, their downstream servers are also
|
|
expected to be live on the Internet round the clock to receive incoming
|
|
NNTP connections, or be prepared to lose articles. Therefore, there's
|
|
no batching or long queueing needed, and C-News-style batching in fact
|
|
is totally inapplicable.</para>
|
|
|
|
<para>Therefore, these carrier-class Usenet servers are more like packet
|
|
routers than servers with repositories. They are referred to nowadays as
|
|
NNTP routers or news routers.</para>
|
|
|
|
<para>It can be seen why batch-oriented repository management
|
|
software like C-News is a total anachronism here, and why they need an
|
|
NNTP-oriented, online, real-time design. The INN antecedents of some
|
|
of these systems is therefore natural. We would love to hear from any
|
|
Linux HOWTO reader whose Usenet server requirements include carrier-class
|
|
behaviour.</para>
|
|
|
|
<para>We are aware of only one freely redistributable NNTP router:
|
|
NNTPRelay (see <literal>http://nntprelay.maxwell.syr.edu/</literal>); this
|
|
software runs on NT. There is no reason why such services cannot run off
|
|
Linux servers, even Intel Linux, provided you have fast network links and
|
|
arrays of servers. Linux as an OS platform is not an issue here.</para>
|
|
|
|
<para>TO BE EXTENDED AND CORRECTED.</para>
|
|
|
|
</section>
|
|
|
|
</section>
|