old-www/HOWTO/Usenet-News-HOWTO/x248.html

453 lines
14 KiB
HTML

<HTML
><HEAD
><TITLE
>Usenet news software</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.76b+
"><LINK
REL="HOME"
TITLE="Usenet News HOWTO "
HREF="index.html"><LINK
REL="PREVIOUS"
TITLE="Principles of Operation"
HREF="x64.html"><LINK
REL="NEXT"
TITLE="Setting up CNews + NNTPd"
HREF="settingup.html"></HEAD
><BODY
CLASS="SECTION"
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000FF"
VLINK="#840084"
ALINK="#0000FF"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
>Usenet News HOWTO</TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="x64.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
></TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="settingup.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="SECTION"
><H1
CLASS="SECTION"
><A
NAME="AEN248">3. Usenet news software</H1
><DIV
CLASS="SECTION"
><H2
CLASS="SECTION"
><A
NAME="AEN250">3.1. A brief history of Usenet systems</H2
><P
>Towards the end of this HOWTO, we have added some information
about the history of Usenet server software by quoting sections from an
earlier Usenet Periodic Posting. We consider this historical
perspective, and the Usenix papers and other documents referred to in
it, essential reading for any Usenet server administrator. Please see
the section titled <SPAN
CLASS="QUOTE"
>"<A
HREF="softwarehistory.html"
>Usenet software: a historical perspective</A
>&#62;"</SPAN
>.</P
></DIV
><DIV
CLASS="SECTION"
><H2
CLASS="SECTION"
><A
NAME="AEN255">3.2. C-News and NNTPd</H2
><P
>C-News was written by Henry Spencer and Geoff Collyer of the
Department of Zoology, University of Toronto, almost entirely in shell
and <TT
CLASS="LITERAL"
>awk</TT
>, as a replacement for an earlier system called
B-News. The focus was on adding some extra features and a
lot of performance. The first release was called Shellscript Release,
which was deployed by a very large number of servers worldwide, as
a natural upgrade to B-News. This version of C-News had upward
compatibility with B-News meta-data, <EM
>e.g.</EM
> history
files. This was the version of C-News which was initially rolled out
in 1991 or so at the National Centre for Software Technology (NCST,
<TT
CLASS="LITERAL"
>http://www.ncst.ernet.in</TT
>) and the Indian Institutes
of Technology in India as part of the Indian educational and research
network (ERNET). We received guidance from the NCST about Usenet news
installation and management.</P
><P
>The Shellscript Release was soon followed by a re-write with a lot
more C code, called Performance Release, and then a set of cleanup and
component integration steps leading to the last release called the Cleanup
Release. This Cleanup Release was patched many times by the authors,
and the last one was CR.G (Cleanup Release revision G). The version of
C-News discussed in this HOWTO is a set of small bug fixes on CR.G.</P
><P
>Since C-News came from shellscript-based antecedents, its
architecture followed the set-of-programs style so typical of Unix,
rather than large monolothic software systems traditional to some other
OSs. All pieces had well-defined roles, and therefore could be easily
replaced with other pieces as needed. This allowed easy adaptations and
upgradations. This never affected performance, because key components
which did a lot of work at high speed, <EM
>e.g.</EM
>
<TT
CLASS="LITERAL"
>newsrun</TT
>, had been rewritten in C by that time. Even
within the shellscripts, crucial components which handled binary data,
<EM
>e.g.</EM
> a component called <TT
CLASS="LITERAL"
>dbz</TT
>
to manipulate efficient on-disk hash arrays, were C programs with
command-line interfaces, called from scripts.</P
><P
>C-News was born in a world with widely varying network line speeds,
where bandwidth utilisation was a big issue and dialup links with UUCP
file transfers was common. Therefore, it has strong support for
batched feeds, specially with a variety of compression techniques and
over a variety of fast and slow transport channels. And C-News virtually
does not know the existence of TCP/IP, other than one or two tiny batch
transport programs like <TT
CLASS="LITERAL"
>viarsh</TT
>. However, its design
was so modular that there was absolutely no problem in plugging in NNTP
functionality using a separate set of C programs without modifying
a single line of C-News. This was done by a program suite called
NNTP Reference Implementation, which we call NNTPd.</P
><P
>This software suite could work with B-News and C-News article
repositories, and provided the full NNTP functionality. Since B-News
died a gradual death, the combination of C-News and NNTPd became a freely
redistributable, portable, modern, extensible, and high-performance
software suite for Unix Usenet servers. Further refinements were
added later, <EM
>e.g.</EM
> <TT
CLASS="LITERAL"
>nov</TT
>, the News
Overview package and <TT
CLASS="LITERAL"
>pgpverify</TT
>, a public-key-based
digital signature module to protect Usenet news servers against
fraudulent control messages.</P
></DIV
><DIV
CLASS="SECTION"
><H2
CLASS="SECTION"
><A
NAME="AEN273">3.3. INN</H2
><P
>INN is one of the two most widely used Usenet news server solutions. It
was written by Rich Salz for Unix systems which have a socket API ---
probably all Unix systems do, today.</P
><P
>INN has an architecture diametrically opposite to CNews. It is a
monolithic program, which is started at bootup time, and keeps running
till your server OS is shut down. This is like the way high performance
HTTP servers are run in most cases, and allows INN to cache a lot of
things in its memory, including message-IDs of recently posted messages,
<EM
>etc.</EM
> This interesting architecture has been discussed
in an interesting paper by the author, where he explains the problems
of the older B-News and C-News systems that he tried to address. Anyone
interested in Usenet software in general and INN in particular should
study this paper.</P
><P
>INN addresses a Usenet news world which revolves around NNTP, though it
has support for UUCP batches --- a fact that not many INN administrators
seem to talk about. INN works faster than the CNews-NNTPd combination when
processing multiple parallel incoming NNTP feeds. For multiple readers
reading and posting news over NNTP, there is no difference between the
efficiency of INN and NNTPd. <A
HREF="x687.html#INNEFFICIENCY"
>Section 5.7</A
>&#62; discusses
the efficiency issues of INN over the earlier C-News architecture, based
on Rich Salz' paper and our analyses of usage patterns. </P
><P
>INN's architecture has inspired a lot of high-performance Usenet news
software, including a lot of commercial systems which address the
``carrier class'' market. That is the market for which the INN
architecture has clear advantages over C-News.</P
></DIV
><DIV
CLASS="SECTION"
><H2
CLASS="SECTION"
><A
NAME="AEN281">3.4. Leafnode</H2
><P
>This is an interesting software system, to set up a ``small'' Usenet
news server on one computer which only receives newsfeeds but does not
have the headache of sending out bulk feeds to other sites,
<EM
>i.e.</EM
> it is a ``leaf node'' in the newsfeed flow
diagram. According to its homepage (<TT
CLASS="LITERAL"
>www.leafnode.org</TT
>),
``Leafnode is a USENET software package designed for small sites running
any flavour of Unix, with a few tens of readers and only a slow link
to the net. [...] The current version is 1.9.24.''</P
><P
>This software is a sort of combination of article repository and
NNTP news server, and receives articles, digests and stores them on the
local hard disks, expires them periodically, and serves them to an NNTP
reader. It is claimed that it is simple to manage and is ideal for
installation on a desktop-class Unix or Linux box, since it does not
take up much resources.</P
><P
>Leafnode is based on an appealing idea, but we find no problem
using C-News and NNTPd on a desktop-class box. Its resource consumption is
somewhat proportional to the volume of articles you want it to process,
and the number of groups you'll want to retain for a small team of users
will be easily handled by C-News on a desktop-class computer. An office
of a hundred users can easily use C-News and NNTPd on a desktop computer
running Linux, with 64 MBytes of RAM, IDE drives, and sufficient disk
space. Of course, ease of configuration and management is dependent on
familiarity, and we are more familiar with C-News than with Leafnode. We
hope this HOWTO will help you in that direction.</P
><P
>There <EM
>is</EM
>, however, one area in which Leafnode
is far easier to administer than INN or C-News. Leafnode constantly
monitors the actual usage of the newsgroups it carries, based on
readership statistics of its NNTP readers. If a particular newsgroup
is not read at all by any user for a week, then Leafnode will delete
all articles in that newsgroup, free up disk space, and stop fetching
new articles for it. If it finds that a previously abandoned newsgroup
is now again receiving attention, even from one user, then it'll fetch
all articles for that group from its upstream server the next time it
connects. This self-tuning feature of Leafnode is really an excellent
advantage which makes a Leafnode site easier to manage, specially for
small setups with bandwidth and disk space constraints.</P
><P
>The Leafnode Website gives a lot of details in an easily
understood format.</P
><P
>TO BE EXTENDED AND CORRECTED.</P
></DIV
><DIV
CLASS="SECTION"
><H2
CLASS="SECTION"
><A
NAME="AEN292">3.5. Suck</H2
><P
>Suck is a program which lets you pull out an NNTP feed from an NNTP
server and file it locally. It does not contain any article repository
management software, expecting you to do it using some other
software system, <EM
>e.g.</EM
> C-News or INN. It can
create batchfiles which can be fed to C-News, for instance. (Well,
to be fair, Suck <EM
>does</EM
> have an option to store the
fetched articles in a spool directory tree very much like what is used
by C-News or INN in their article area, with one file per article. You
can later read this raw message spool area using a mail client which
supports the <TT
CLASS="LITERAL"
>msgdir</TT
> file layout for mail folders,
like MH, perhaps. We don't find this option useful if you're running
Suck on a Usenet server.) Suck finally boils down to a single
command-line program which is invoked periodically, typically from
<TT
CLASS="LITERAL"
>cron</TT
>. It has a zillion command-line options which
are confusing at first, but later show how mature and finely tunable
the software is.</P
><P
>If you need an NNTP pull feed, then we know of no better programs
than Suck for the job. The <TT
CLASS="LITERAL"
>nntpxfer</TT
> program which
forms part of the NNTPd package also implements an NNTP pull feed, for
instance, but does not have one-tenth of the flexibility and fine-tuning
of Suck. One of the banes of the NNTP pull feed is connection timeouts;
Suck allows a lot of special tuning to handle this problem. If we had
to set up a Usenet server with an NNTP pull feed, we'd use Suck right
away.</P
><P
>TO BE EXTENDED AND CORRECTED.</P
></DIV
><DIV
CLASS="SECTION"
><H2
CLASS="SECTION"
><A
NAME="AEN302">3.6. Carrier class software</H2
><P
>Carrier-class servers are expected to handle a complete feed of all
articles in all newsgroups, including a lot of groups which have what we
call a ``high noise-to-signal ratio.'' They do not have the luxury of
choosing a ``useful'' subset like administrators of internal corporate
Usenet servers do. Secondly, carrier-class servers are expected to turn
articles around very fast, <EM
>i.e.</EM
> they are expected to
have very low latency from the moment they receive an article to the
time they retransmit it by NNTP to downstream servers. Third, they
are supposed to provide very high availability, like other ``carrier
class'' services. This usually means that they have parallel arrays of
computers in load sharing configurations. And fourth, they usually do
not cater to retail connections for reading and posting articles by human
users. Usenet news carriers usually reserve separate computers to handle
retail connections.</P
><P
>Thus, carrier-class servers do not need to maintain a repository
of articles; they only need to focus on super-efficient real-time
re-transmission. These highly specialised servers have software which
receive an article over NNTP, parse it, and immediately re-queue it for
outward transmission to dozens or hundreds of other servers. And since
they work at these high throughputs, their downstream servers are also
expected to be live on the Internet round the clock to receive incoming
NNTP connections, or be prepared to lose articles. Therefore, there's
no batching or long queueing needed, and C-News-style batching in fact
is totally inapplicable.</P
><P
>Therefore, these carrier-class Usenet servers are more like packet
routers than servers with repositories. They are referred to nowadays as
NNTP routers or news routers.</P
><P
>It can be seen why batch-oriented repository management
software like C-News is a total anachronism here, and why they need an
NNTP-oriented, online, real-time design. The INN antecedents of some
of these systems is therefore natural. We would love to hear from any
Linux HOWTO reader whose Usenet server requirements include carrier-class
behaviour.</P
><P
>We are aware of only one freely redistributable NNTP router:
NNTPRelay (see <TT
CLASS="LITERAL"
>http://nntprelay.maxwell.syr.edu/</TT
>); this
software runs on NT. There is no reason why such services cannot run off
Linux servers, even Intel Linux, provided you have fast network links and
arrays of servers. Linux as an OS platform is not an issue here.</P
><P
>TO BE EXTENDED AND CORRECTED.</P
></DIV
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="x64.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="settingup.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Principles of Operation</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
>&nbsp;</TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Setting up CNews + NNTPd</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>