Usenet News HOWTO
|
||
|
||
Shuvam Misra (usenet at starcomsoftware dot com)
|
||
|
||
Revision History
|
||
Revision 2.1 2002-08-20 Revised by: sm
|
||
New sections on Security and Software History, lots of other small additions
|
||
and cleanup
|
||
Revision 2.0 2002-07-30 Revised by: sm
|
||
Rewritten by new authors at Starcom Software
|
||
Revision 1.4 1995-11-29 Revised by: vs
|
||
Original document; authored by Vince Skahan.
|
||
-----------------------------------------------------------------------------
|
||
|
||
Table of Contents
|
||
1. What is the Usenet?
|
||
1.1. Discussion groups
|
||
1.2. How it works, loosely speaking
|
||
1.3. About sizes, volumes, and so on
|
||
|
||
|
||
2. Principles of Operation
|
||
2.1. Newsgroups and articles
|
||
2.2. Of readers and servers
|
||
2.3. Newsfeeds
|
||
2.4. Control messages
|
||
|
||
|
||
3. Usenet news software
|
||
3.1. A brief history of Usenet systems
|
||
3.2. C-News and NNTPd
|
||
3.3. INN
|
||
3.4. Leafnode
|
||
3.5. Suck
|
||
3.6. Carrier class software
|
||
|
||
|
||
4. Setting up CNews + NNTPd
|
||
4.1. Getting the sources and stuff
|
||
4.2. Compiling and installing
|
||
4.3. Configuring the system: What and how to configure files?
|
||
4.4. Testing the system
|
||
4.5. pgpverify and controlperms
|
||
4.6. Feeding off an upstream neighbour
|
||
4.7. Configuring outgoing feeds
|
||
|
||
|
||
5. Setting up INN
|
||
5.1. Getting the source
|
||
5.2. Compiling and installing
|
||
5.3. Configuring the system
|
||
5.4. Setting up pgpverify
|
||
5.5. Feeding off an upstream neighbour
|
||
5.6. Setting up outgoing feeds
|
||
5.7. Efficiency issues and advantages
|
||
|
||
|
||
6. Connecting email with Usenet news
|
||
6.1. Feeding Usenet news to email
|
||
6.2. Feeding email to news: the mail2news gateway
|
||
6.3. Using GNU Mailman as an email-NNTP gateway
|
||
|
||
|
||
7. Security issues
|
||
7.1. Intrusion threats
|
||
7.2. Vulnerabilities unique to the Usenet service
|
||
|
||
|
||
8. Access control in NNTPd
|
||
8.1. Host-based access control
|
||
8.2. User authentication and authorisation
|
||
|
||
|
||
9. Components of a running system
|
||
9.1. /var/lib/news: the CNews control area
|
||
9.2. /var/spool/news: the article repository
|
||
9.3. /usr/lib/newsbin: the executables
|
||
9.4. crontab and cron jobs
|
||
9.5. newsrun and relaynews: digesting received articles
|
||
9.6. doexpire and expire: removing old articles
|
||
9.7. nntpd and msgidd: managing the NNTP interface
|
||
9.8. nov, the News Overview system
|
||
9.9. Batching feeds with UUCP and NNTP
|
||
|
||
|
||
10. Monitoring and administration
|
||
10.1. The newsdaily report
|
||
10.2. Crisis reports from newswatch
|
||
10.3. Disk space
|
||
10.4. CPU load and RAM usage
|
||
10.5. The in.coming/bad directory
|
||
10.6. Long pending queues in out.going
|
||
10.7. Problems with nntpxmit and nntpsend
|
||
10.8. The junk and control groups
|
||
|
||
|
||
11. Usenet news clients
|
||
11.1. Usenet User Agents
|
||
11.2. Clients that transfer articles
|
||
11.3. Special clients
|
||
|
||
|
||
12. Our perspective
|
||
12.1. Efficiency issues of NNTP
|
||
12.2. C-News+NNTPd or INN?
|
||
|
||
|
||
13. Usenet software: a historical perspective
|
||
13.1. The quoted excerpts
|
||
|
||
|
||
14. Documentation, information and further reading
|
||
14.1. The manpages
|
||
14.2. Papers, documents, articles
|
||
14.3. O'Reilly's books on Usenet news
|
||
14.4. Usenet-related RFCs
|
||
14.5. The source code
|
||
14.6. Usenet newsgroups
|
||
14.7. We
|
||
|
||
|
||
15. Wrapping up
|
||
15.1. Acknowledgements
|
||
15.2. Comments invited
|
||
15.3. Copyright
|
||
15.4. About Starcom Software Private Limited
|
||
|
||
|
||
|
||
-----------------------------------------------------------------------------
|
||
|
||
1. What is the Usenet?
|
||
|
||
1.1. Discussion groups
|
||
|
||
The Usenet is a huge worldwide collection of discussion groups. Each
|
||
discussion group has a name, e.g. comp.os.linux.announce, and a collection of
|
||
messages. These messages, usually called articles, are posted by readers like
|
||
you and me who have access to Usenet servers, and are then stored on the
|
||
Usenet servers.
|
||
|
||
This ability to both read and write into a Usenet newsgroup makes the Usenet
|
||
very different from the bulk of what people today call ``the Internet.'' The
|
||
Internet has become a colloquial term to refer to the World Wide Web, and the
|
||
Web is (largely) read-only. There are online discussion groups with Web
|
||
interfaces, and there are mailing lists, but Usenet is probably more
|
||
convenient than either of these for most large discussion communities. This
|
||
is because the articles get replicated to your local Usenet server, thus
|
||
allowing you to read and post articles without accessing the global Internet,
|
||
something which is of great value for those with slow Internet links. Usenet
|
||
articles also conserve bandwidth because they do not come and sit in each
|
||
member's mailbox, unlike email-based mailing lists. For example, twenty members
|
||
of a mailing list in one office will have twenty copies of each message
|
||
copied to their mailboxes. However, with a Usenet discussion group and a
|
||
local Usenet server, there's just one copy of each article, and it does not
|
||
fill up anyone's mailbox.
|
||
|
||
Another nice feature of having your own local Usenet server is that articles
|
||
stay on the server even after you've read them. You can't accidentally delete
|
||
a Usenet article the way you can delete a message from your mailbox. This
|
||
way, a Usenet server is an excellent way to archive articles of a group
|
||
discussion on a local server without placing the onus of archiving on any
|
||
group member. This makes local Usenet servers very valuable as archives of
|
||
internal discussion messages within corporate Intranets, provided the article
|
||
expiry configuration of the Usenet server software has been set up for
|
||
sufficiently long expiry periods.
|
||
-----------------------------------------------------------------------------
|
||
|
||
1.2. How it works, loosely speaking
|
||
|
||
Usenet news works by the reader first firing up a Usenet news program, which
|
||
in today's GUI world will most likely be something like Netscape Messenger
|
||
or Microsoft's Outlook Express. There are a lot of proven, well-designed
|
||
character-based Usenet news readers, but a proper review of the user agent
|
||
software is outside the scope of this HOWTO, so we will just assume that you
|
||
are using whatever software you like. The reader then selects a Usenet
|
||
newsgroup from the hundreds or thousands of newsgroups which are hosted by
|
||
her local server, and accesses all unread articles. These articles are
|
||
displayed to her. She can then decide to respond to some of them.
|
||
|
||
When the reader writes an article, either in response to an existing one or
|
||
as a start of a brand-new thread of discussion, her software posts this
|
||
article to the Usenet server. The article contains a list of newsgroups into
|
||
which it is to be posted. Once it is accepted by the server, it becomes
|
||
available for other users to read and respond to. The article is
|
||
automatically expired or deleted by the server from its internal archives
|
||
based on expiry policies set in its software; the author of the article
|
||
usually can do little or nothing to control the expiry of her articles.
|
||
|
||
A Usenet server rarely works on its own. It forms a part of a collection of
|
||
servers, which automatically exchange articles with each other. The flow of
|
||
articles from one server to another is called a newsfeed. In a simplistic
|
||
case, one can imagine a worldwide network of servers, all configured to
|
||
replicate articles with each other, busily passing along copies across the
|
||
network as soon as one of them receives a new article posted by a human
|
||
reader. This replication is done by powerful and fault-tolerant processes,
|
||
and gives the Usenet network its power. Your local Usenet server literally
|
||
has a copy of all current articles in all relevant newsgroups.
|
||
-----------------------------------------------------------------------------
|
||
|
||
1.3. About sizes, volumes, and so on
|
||
|
||
Any would-be Usenet server administrator or creator must read the "Periodic
|
||
Posting about the basic steps involved in configuring a machine to store
|
||
Usenet news," also known as the Site Setup FAQ, available from
ftp://rtfm.mit.edu/pub/usenet/news.answers/usenet/site-setup or
ftp://ftp.uu.net/usenet/news.answers/news/site-setup.Z. It was last updated
in 1997, but trends haven't changed much since then, though absolute volume
figures have.
|
||
|
||
If you want your Usenet server to be a repository for all articles in all
|
||
newsgroups, you will probably not be reading this HOWTO, or even if you do,
|
||
you will rapidly realise that anyone who needs to read this HOWTO may not be
|
||
ready to set up such a server. This is because the volumes of articles on the
|
||
Usenet have reached a point where very specialised networks, very high end
|
||
servers, and large disk arrays are required for handling such Usenet volumes.
|
||
Those setups are called ``carrier-class'' Usenet servers, and will be
|
||
discussed a bit later on in this HOWTO. Administering such an array of
|
||
hardware may not be the job of the new Usenet administrator, for whom this
HOWTO (and most Linux HOWTOs) is written.
|
||
|
||
Nevertheless, it may be interesting to understand what volumes we are talking
|
||
about. Usenet news article volumes have been doubling every fourteen months
|
||
or so, going by what we hear in comments from carrier class Usenet
|
||
administrators. In the beginning of 1997, this volume was 1.2 GBytes of
|
||
articles a day. Thus, the volumes should have roughly done five doublings, or
|
||
grown 32 times, by the time we reach mid-2002, at the time of this writing.
|
||
This gives us a volume of 38.4 GBytes per day. Assume that this transfer
|
||
happens using uncompressed NNTP (the norm), and add 50% extra for the
|
||
overheads of NNTP, TCP, and IP. This gives you a raw data transfer volume of
|
||
57.6 GBytes/day or about 460 Gbits/day. If you have to transfer such volumes
|
||
of data in 24 hours (86400 seconds), you'll need raw bandwidth of about 5.3
|
||
Mbits per second just to receive all these articles. You'll need more
|
||
bandwidth to send out feeds to other neighbouring Usenet servers, and then
|
||
you'll need bandwidth to allow your readers to access your servers and read
|
||
and post articles in retail quantities. Clearly, these volume figures are
|
||
outside the network bandwidths of most corporate organisations or educational
|
||
institutions, and therefore only those who are in the business of offering
|
||
Usenet news can afford it.
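
The arithmetic above can be reproduced with a few lines of Python. This is
only a sketch of the estimate, using the figures quoted in the text (1.2
GBytes/day in 1997, five doublings, 50% protocol overhead); the function and
constant names are ours, and you should substitute your own figures when
sizing a real feed.

```python
# Back-of-the-envelope sizing, using only the figures quoted in the text.
DAILY_VOLUME_1997_GB = 1.2   # GBytes/day at the start of 1997
DOUBLINGS = 5                # roughly 1997 to mid-2002, one per ~14 months
PROTOCOL_OVERHEAD = 0.5      # 50% extra for NNTP, TCP and IP
SECONDS_PER_DAY = 86400


def full_feed_estimate(base_gb=DAILY_VOLUME_1997_GB, doublings=DOUBLINGS,
                       overhead=PROTOCOL_OVERHEAD):
    """Return (article GB/day, raw GB/day, Mbits/s) for a full feed."""
    articles_gb = base_gb * 2 ** doublings
    raw_gb = articles_gb * (1 + overhead)
    # 1 GByte = 8 Gbits = 8000 Mbits (decimal units, as in the text).
    mbits_per_second = raw_gb * 8 * 1000 / SECONDS_PER_DAY
    return articles_gb, raw_gb, mbits_per_second


if __name__ == "__main__":
    articles, raw, rate = full_feed_estimate()
    print(f"{articles:.1f} GB/day of articles, {raw:.1f} GB/day raw, "
          f"{rate:.1f} Mbit/s sustained")
```

Note that the result is an average over 24 hours; it says nothing about
peaks, outgoing feeds, or reader traffic, all of which come on top.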
|
||
|
||
At the other end of the scale, it is perfectly feasible for a small office to
|
||
subscribe to a well-trimmed subset of Usenet newsgroups, and exclude most of
|
||
the high-volume newsgroups. Starcom Software, where the authors of this HOWTO
|
||
work, has worked with a fairly large subset of 600 newsgroups, which is still
|
||
a tiny fraction of the 15,000+ newsgroups that the carrier class services
|
||
offer. Your office or college may not even need 600 groups. And our company
|
||
had excluded specific high-volume but low-usefulness newsgroups like the
|
||
talk, comp.binaries, and alt hierarchies. With the pruned subset, the total
|
||
volume of articles per day may amount to barely a hundred MBytes a day or so,
|
||
and can be easily handled by most small offices and educational institutions.
|
||
And in such situations, a single Intel Linux server can deliver excellent
|
||
performance as a Usenet server.
|
||
|
||
Then there's the internal Usenet service. By internal here, we mean a private
|
||
set of Usenet newsgroups, not a private computer network. Every company or
|
||
university which runs a Usenet news service creates its own hierarchy of
|
||
internal newsgroups, whose articles never leave the campus or office, and
|
||
which therefore do not consume Internet bandwidth. These newsgroups are often
|
||
the ones most hotly accessed, and will carry more internally generated
|
||
traffic than all the ``public'' newsgroups you may subscribe to, within your
|
||
organisation. After all, how often does a guy have something to say which is
|
||
relevant to the world at large, unless he's discussing a globally relevant
|
||
topic like ``Unix rules!''? If such internal newsgroups are the focus of your
|
||
Usenet servers, then you may find that fairly modest hardware and Internet
|
||
bandwidth will suffice, depending on the size of your organisation.
|
||
|
||
The new Usenet server administrator has to undertake a sizing exercise to
|
||
ensure that he does not bite off more than he, or his network resources, can
|
||
chew. We hope we have provided sufficient information for him to get started
|
||
with the right questions.
|
||
-----------------------------------------------------------------------------
|
||
|
||
2. Principles of Operation
|
||
|
||
Here we discuss the basic concepts behind the operation of a Usenet news
|
||
system.
|
||
-----------------------------------------------------------------------------
|
||
|
||
2.1. Newsgroups and articles
|
||
|
||
A Usenet news article sits in a file or in some other on-disk data structure
|
||
on the disks of a Usenet server, and its contents look like this:
|
||
Xref: news.starcomsoftware.com starcom.tech.misc:211 starcom.tech.security:452
|
||
Newsgroups: starcom.tech.misc,starcom.tech.security
|
||
Path: news.starcomsoftware.com!purva!shuvam
|
||
From: Shuvam <shuvam@starcomsoftware.com>
|
||
Subject: "You just throw up your hands and reboot" (fwd)
|
||
Content-Type: TEXT/PLAIN; charset=US-ASCII
|
||
Distribution: starcom
|
||
Organization: Starcom Software Pvt Ltd, India
|
||
Message-ID: <Pine.LNX.4.31.0107022153490.30462-100000@starcomsoftware.com>
|
||
Mime-Version: 1.0
|
||
Date: Mon, 2 Jul 2001 16:27:57 GMT
|
||
|
||
Interesting quote, and interesting article.
|
||
|
||
Incidentally, comp.risks may be an interesting newsgroup to follow. We
|
||
must be receiving the feed for this group on our server, since we
|
||
receive all groups under comp.*, unless specifically cancelled. Check it
|
||
out sometime.
|
||
|
||
comp.risks tracks risks in the use of computer technology, including
|
||
issues in protecting ourselves from failures of such stuff.
|
||
|
||
Shuvam
|
||
|
||
> Date: Thu, 14 Jun 2001 08:11:00 -0400
|
||
> From: "Chris Norloff" <cnorloff@norloff.com>
|
||
> Subject: NYSE: "Throw up your hands and reboot"
|
||
>
|
||
> When the New York Stock Exchange computer systems crashed for 85
|
||
> minutes (8 Jun 2001), Andrew Brooks, chief of equity trading at
|
||
> Baltimore mutual fund giant T. Rowe Price, was quoted as saying "Hey,
|
||
> we're all subject to the vagaries of technology. It happens on your
|
||
> own PC at home. You just throw up your hands and reboot."
|
||
>
|
||
> http://www.washingtonpost.com/ac3/ContentServer?articleid=A42885-2001Jun8&pagename=article
|
||
>
|
||
> Chris Norloff
|
||
>
|
||
>
|
||
> This is from --
|
||
>
|
||
> From: risko@csl.sri.com (RISKS List Owner)
|
||
> Newsgroups: comp.risks
|
||
> Subject: Risks Digest 21.48
|
||
> Date: Mon, 18 Jun 2001 19:14:57 +0000 (UTC)
|
||
> Organization: University of California, Berkeley
|
||
>
|
||
> RISKS-LIST: Risks-Forum Digest Monday 19 June 2001
|
||
> Volume 21 : Issue 48
|
||
>
|
||
> FORUM ON RISKS TO THE PUBLIC IN COMPUTERS AND RELATED SYSTEMS (comp.risks)
|
||
> ACM Committee on Computers and Public Policy,
|
||
> Peter G. Neumann, moderator
|
||
>
|
||
> This issue is archived at <URL:http://catless.ncl.ac.uk/Risks/21.48.html>
|
||
> and by anonymous ftp at ftp.sri.com, cd risks .
|
||
>
|
||
|
||
A Usenet article's header is very interesting if you want to learn about the
|
||
functioning of the Usenet. The From, Subject, and Date headers are familiar
|
||
to anyone who has used email. The Message-ID header contains a unique ID for
|
||
each message, and is present in each email message, though not many
|
||
non-technical email users know about it. The Content-Type and Mime-Version
|
||
headers are used for MIME encoding of articles, for attaching files, and so
on, just like in email messages.
|
||
|
||
The Organization header is an informational header which is supposed to carry
|
||
some information identifying the organisation to which the author of the
|
||
article belongs. What remains now are the Newsgroups, Xref, Path and
|
||
Distribution headers. These are special to Usenet articles and are very
|
||
important.
|
||
|
||
The Newsgroups header specifies which newsgroups this article should belong
|
||
to. The Distribution header, sadly under-utilised in today's globalised
|
||
Internet world, allows the author of an article to specify how far the
|
||
article will be re-transmitted. The author of an article, working in
|
||
conjunction with well-configured networks of Usenet servers, can control the
|
||
``radius'' of replication of his article, thus posting an article of local
|
||
significance into a newsgroup but setting the Distribution header to some
|
||
suitable setting, e.g. local or starcom, to prevent the article from being
|
||
relayed to servers outside the specified domain.
|
||
|
||
The Xref header specifies the precise article number of this article in each
|
||
of the newsgroups in which it is inserted, for the current server. When an
|
||
article is copied from one server to another as part of a newsfeed, the
|
||
receiving server throws away the old Xref header and inserts its own, with
|
||
its own article numbers. This indicates an interesting feature of the Usenet
|
||
system: each article in a Usenet server has a unique number (an integer) for
|
||
each newsgroup it is a part of. Our sample above has been added to two
|
||
newsgroups on our server, and has the article numbers 211 and 452 in those
|
||
groups. Therefore, any Usenet client software can query our server and ask
|
||
for article number 211 in the newsgroup starcom.tech.misc and get this
|
||
article. Asking for article number 452 in starcom.tech.security will fetch
|
||
the article too. On another server, the numbers may be very different.
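
To make the structure of the Xref header concrete, here is a minimal sketch
that splits the header value from our sample article into the server name and
a table of per-newsgroup article numbers. The function name parse_xref is
ours and is not part of any Usenet software.

```python
def parse_xref(value):
    """Split an Xref header value into (server, {newsgroup: article_number})."""
    server, *locations = value.split()
    numbers = {}
    for location in locations:
        # Each location looks like "newsgroup:number".
        group, _, number = location.rpartition(":")
        numbers[group] = int(number)
    return server, numbers


server, numbers = parse_xref(
    "news.starcomsoftware.com starcom.tech.misc:211 starcom.tech.security:452")
print(server)
for group, number in numbers.items():
    print(f"  article {number} in {group}")
```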
|
||
|
||
The Path specifies the list of machines through which this article has
|
||
travelled before it has reached the current server. UUCP-style syntax is used
|
||
for this string. The current example indicates that a user called shuvam
|
||
first wrote this article and posted it onto a computer which calls itself
|
||
purva, and this computer then transferred this article by a newsfeed to
|
||
news.starcomsoftware.com. The Path header is critical for breaking loops in
|
||
newsfeeds, and will be discussed in detail later.
|
||
|
||
Our sample article will sit in the two newsgroups listed above forever,
|
||
unless expired. The Usenet software on a server is usually configured to
|
||
expire articles based on certain conditions, e.g. after it's older than a
|
||
certain number of days. The C-News software we use allows expiry control
|
||
based on the newsgroup hierarchy and the type of newsgroup, i.e. moderated or
|
||
unmoderated. Against each class of newsgroups, it allows the administrator to
|
||
specify a number of days after which the article will be expired. It is
|
||
possible for an article to control its own expiry, by carrying an Expires
|
||
header specifying a date and time. Unless overridden in the Usenet server
|
||
software, the article will be expired only after its explicit expiry time is
|
||
reached.
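
The expiry decision described above can be sketched as follows. This is an
illustration only: the policy table, its retention periods, and the function
names are hypothetical, the moderated/unmoderated distinction is left out,
and it does not reproduce the configuration format of C-News or any other
package.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy table: days to keep articles, by hierarchy prefix.
# The empty prefix is the fallback for groups that match nothing else.
RETENTION_DAYS = {"starcom": 365, "comp": 14, "": 7}


def retention_days(group):
    """Return the retention period of the longest matching hierarchy prefix."""
    parts = group.split(".")
    for length in range(len(parts), 0, -1):
        prefix = ".".join(parts[:length])
        if prefix in RETENTION_DAYS:
            return RETENTION_DAYS[prefix]
    return RETENTION_DAYS[""]


def is_expired(group, arrived, now, expires=None):
    """Decide whether an article is due for removal.

    An explicit Expires time, when present, takes precedence over the
    per-hierarchy retention period.
    """
    if expires is not None:
        return now >= expires
    return now - arrived >= timedelta(days=retention_days(group))


arrived = datetime(2001, 7, 2, 16, 27, 57, tzinfo=timezone.utc)
now = datetime(2001, 8, 1, tzinfo=timezone.utc)
print(is_expired("comp.risks", arrived, now))         # 30 days old, limit 14
print(is_expired("starcom.tech.misc", arrived, now))  # 30 days old, limit 365
```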
|
||
-----------------------------------------------------------------------------
|
||
|
||
2.2. Of readers and servers
|
||
|
||
Computers which access Usenet articles are broadly of two classes: the
|
||
readers and the servers. A Usenet server carries a repository of articles,
|
||
manages them, handles newsfeeds, and offers its repository to authorised
|
||
readers to read. A Usenet reader is merely a computer with the appropriate
|
||
software to allow a user to access a server, fetch articles, post new
|
||
articles, and keep track of which articles it has read in each newsgroup. In
|
||
terms of functionality, Usenet reading software is less interesting to a
|
||
Usenet administrator than Usenet server software. However, in terms of
|
||
lines of code, the Usenet reader software can often be much larger than
|
||
Usenet server software, primarily because of the complexities of modern GUI
|
||
code.
|
||
|
||
Modern computers almost exclusively access Usenet servers using the NNTP
|
||
(Network News Transfer Protocol) for reading and posting. This protocol can
|
||
also be used for inter-server communication, but those aspects will be
|
||
discussed later. The NNTP protocol, like any other well-designed TCP-based
|
||
Internet protocol, carries ASCII commands and responses terminated with
|
||
CR-LF, and comprises a sequence of commands, somewhat reminiscent of the POP3
|
||
protocol for email. Using NNTP, a Usenet reader program connects to a Usenet
|
||
server, asks for a list of active newsgroups, and receives this (often huge)
|
||
list. It then sets the ``current newsgroup'' to one of these, depending on
|
||
what the user wants to browse through. Having done this, it gets the
|
||
meta-data of all current articles in the group, including the author, subject
|
||
line, date, and size of each article, and displays an index of articles to
|
||
the user.
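
As a small illustration of the line-oriented nature of NNTP, the sketch below
formats a command and parses the server's reply to the GROUP command, which a
reader uses to set the current newsgroup. As we understand the NNTP
specification (RFC 977), a successful reply has the form "211 count first
last group". The helper names are ours, and no network connection is made;
the reply line in the usage example is made up.

```python
def command(verb, *arguments):
    """Format one NNTP command line: ASCII text terminated by CR-LF."""
    return (" ".join((verb,) + arguments) + "\r\n").encode("ascii")


def parse_group_reply(line):
    """Parse the reply to an NNTP GROUP command.

    A successful reply has the form "211 count first last group"; any
    other reply code means the group was not selected.
    """
    fields = line.strip().split()
    if len(fields) < 5 or fields[0] != "211":
        raise ValueError(f"group not selected: {line.strip()!r}")
    count, first, last = (int(field) for field in fields[1:4])
    return {"group": fields[4], "count": count, "first": first, "last": last}


print(command("GROUP", "comp.os.linux.announce"))
print(parse_group_reply("211 104 10011 10125 comp.os.linux.announce\r\n"))
```

A real reader program would send the formatted command over its TCP
connection to the server and read the reply line back before parsing it.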
|
||
|
||
The user then scans through this list, selects an article, and asks the
|
||
reader to fetch it. The reader gives the article number of this article to
|
||
the server, and fetches the full article for the user to read through. Once
|
||
the user finishes his NNTP session, he exits, and the reader program closes
|
||
the NNTP socket. It then (usually) updates a local file in the user's home
|
||
area, keeping track of which news articles the user has read. These articles
|
||
are typically not shown to the user next time, thus allowing the user to
|
||
progress rapidly to new articles in each session. The reader software is
|
||
helped along in this endeavour by the Xref header, using which it knows all
|
||
the different identities by which a single article is identified in the
|
||
server. Thus, if you read the sample article given above by accessing
|
||
starcom.tech.misc, you'll never be shown this article again when you access
|
||
starcom.tech.misc or starcom.tech.security; your reader software will do this
|
||
by tracking the Xref header and mapping article numbers.
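
The bookkeeping a reader program does with the Xref header might look like
the sketch below. It is a simplification under our own assumptions: the class
ReadTracker is hypothetical, it keeps its state in memory, and real reader
programs store this information in a file in the user's home area, in their
own format.

```python
from collections import defaultdict


class ReadTracker:
    """Remember which article numbers have been read in each newsgroup."""

    def __init__(self):
        self._read = defaultdict(set)

    def mark_read(self, xref_value):
        """Mark an article as read in every group named in its Xref header."""
        # The first field is the server name; the rest are group:number pairs.
        for location in xref_value.split()[1:]:
            group, _, number = location.rpartition(":")
            self._read[group].add(int(number))

    def is_read(self, group, number):
        return number in self._read.get(group, set())


tracker = ReadTracker()
tracker.mark_read(
    "news.starcomsoftware.com starcom.tech.misc:211 starcom.tech.security:452")
print(tracker.is_read("starcom.tech.misc", 211))
print(tracker.is_read("starcom.tech.security", 452))
print(tracker.is_read("starcom.tech.security", 453))
```

Reading the article once, in either group, marks it as read in both.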
|
||
|
||
When a user posts an article, he first composes his message using the user
|
||
interface of his reader software. When he finally gives the command to send
|
||
the article, the reader software contacts the Usenet server using the
|
||
pre-existing NNTP connection and sends the article to it. The article carries
|
||
a Newsgroups header with the list of newsgroups to post to, often a
|
||
Distribution header with a distribution specification, and other headers like
|
||
From, Subject etc. These headers are used by the server software to do the
|
||
right thing. Special and rare headers like Expires and Approved are acted
|
||
upon when present. The server assigns a new article number to the article for
|
||
each newsgroup it is posted to, and creates a new Xref header for the
|
||
article.
|
||
|
||
Transfer of articles between servers is done in various ways, and is
|
||
discussed in quite a bit of detail in Section XXX titled ``Newsfeeds'' below.
|
||
-----------------------------------------------------------------------------
|
||
|
||
2.3. Newsfeeds
|
||
|
||
2.3.1. Fundamental concepts
|
||
|
||
When we try to analyse newsfeeds in real life, we begin to see that, for most
|
||
sites, traffic flow is not symmetrical in both directions. We usually find
|
||
that one server will feed the bulk of the world's articles to one or more
|
||
secondary servers every day, and receive a few articles written by the users
|
||
of those secondary servers in exchange. Thus, we usually find that articles
|
||
flow down from the stem to the branches to the leaves of the worldwide Usenet
|
||
server network, and not exactly in a totally balanced mesh flow pattern.
|
||
Therefore, we use the term ``upstream server'' to refer to the server from
|
||
which we receive the bulk of our daily dose of articles, and ``downstream
|
||
server'' to refer to those servers which receive the bulk dose of articles
|
||
from us.
|
||
|
||
Newsfeeds relay articles from one server to their ``next door neighbour''
|
||
servers, metaphorically speaking. Therefore, articles move around the globe,
|
||
not by a massive number of single-hop transfers from the originating server
|
||
to every other server in the world, but in a sequence of hops, like passing
|
||
the baton in a relay race. This increases the latency time for an article to
|
||
reach a remote tertiary server after, say, ten hops, but it allows tighter
|
||
control of what gets relayed at every hop, and helps in redundancy,
|
||
decentralisation of server loads, and conservation of network bandwidth. In
|
||
this respect, Usenet newsfeeds are more complex than HTTP data flows, which
|
||
typically use single-hop techniques.
|
||
|
||
Each Usenet news server therefore has to worry about newsfeeds each time it
|
||
receives an article, either by a fresh post or from an incoming newsfeed.
|
||
When the Usenet server digests this article and files it away in its
|
||
repository, it simultaneously looks through its database to see which other
|
||
server it should feed the article to. In order to do this, it carries out a
|
||
sequence of checks, described below.
|
||
|
||
Each server knows which other servers are its ``next door neighbours;'' this
|
||
information is kept in its newsfeed configuration information. Against each
|
||
of its ``next door neighbours,'' there will be a list of newsgroups which it
|
||
wants, and a list of distributions. The new article's list of newsgroups will
|
||
be matched against the newsgroup list of the ``next door neighbour'' to see
|
||
whether there's even a single common newsgroup which makes it necessary to
|
||
feed the article to it. If there's a matching newsgroup, and the server's
|
||
distribution list matches the article's distribution, then the article is
|
||
marked for feeding to this neighbour.
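
The matching of an article against one neighbour's feed definition can be
sketched as below. The dictionary layout, the feed name, and the simple
hierarchy-prefix matching are our own simplifications and do not reflect the
configuration syntax of any particular news software. We also assume, for
illustration, a single distribution per article, and that an article without
a Distribution header is acceptable to every neighbour.

```python
def wants(feed, article_groups, article_distribution=None):
    """Return True if the article should be queued for this neighbour."""
    group_match = any(
        group == wanted or group.startswith(wanted + ".")
        for group in article_groups
        for wanted in feed["groups"]
    )
    if not group_match:
        return False
    if article_distribution is None:
        # Assumption: no Distribution header means no restriction.
        return True
    return article_distribution in feed["distributions"]


# Hypothetical neighbour which takes our internal groups and comp.os.linux.*
branch = {
    "name": "branch-office",
    "groups": ["starcom", "comp.os.linux"],
    "distributions": ["starcom", "world"],
}

sample_groups = ["starcom.tech.misc", "starcom.tech.security"]
print(wants(branch, sample_groups, "starcom"))
print(wants(branch, ["alt.test"], "starcom"))
print(wants(branch, sample_groups, "local"))
```

A single matching newsgroup is enough, as described above, but the
distribution must also be acceptable to the neighbour.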
|
||
|
||
When the neighbour receives the article as part of the feed, it performs some
|
||
sanity checks of its own. The first check it performs is on the Newsgroups
|
||
header of the new article. If none of the newsgroups listed there are part of
|
||
the active newsgroups list of this server, then the article can be rejected.
|
||
An article rejected thus may even be queued for outgoing feeds to other
|
||
servers, but will not be digested for incorporation into the local article
|
||
repository.
|
||
|
||
The next check performed is against the Path header of the incoming article.
|
||
If this header lists the name of the current Usenet server anywhere, it
|
||
indicates that it has already passed through this server at least once
|
||
before, and is now re-appearing here erroneously because of a newsfeed loop.
|
||
Such loops are quite often configured into newsfeed topologies for
|
||
redundancy: ``I'll get the articles from Server X if not Server Y, and may
|
||
the first one in win.'' The Usenet server software automatically detects a
|
||
duplicate feed of an article and rejects it.
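
A minimal sketch of the Path check, using the Path header of our sample
article. The function names are ours. Note that the last component of the
Path is the poster's user name rather than a machine, so it is excluded from
the comparison.

```python
def already_seen(path_header, server_name):
    """Return True if server_name appears among the hosts in a Path header."""
    hosts = path_header.split("!")[:-1]   # last component is the user name
    return server_name in hosts


def add_hop(path_header, server_name):
    """Prepend our own name before passing the article on to a neighbour."""
    return f"{server_name}!{path_header}"


path = "news.starcomsoftware.com!purva!shuvam"
print(already_seen(path, "purva"))              # purva relayed this article
print(already_seen(path, "news.example.org"))   # hypothetical new neighbour
print(add_hop(path, "news.example.org"))
```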
|
||
|
||
The next check is against what is called the server's history database. Every
|
||
Usenet server has a history database, which is a list of the message IDs of
|
||
all current articles in the local repository. Oftentimes the history database
|
||
also carries the message IDs of all messages recently expired. If the
|
||
incoming article's message ID matches any of the entries in the database,
|
||
then again it is rejected without being filed in the local repository. This
|
||
is a second loop detection method. Sometimes, the mere checking of the
|
||
article's Path header does not detect all potential problems, because
|
||
the problem may be a re-insertion instead of a loop. A re-insertion happens
|
||
when the same incoming batch of news articles is re-fed into the local
|
||
server, perhaps after recovering the system's data from tapes after a system
|
||
crash. In such cases, there's no newsfeed loop, but there's still the risk
|
||
that one article may be digested into the local server twice. The history
|
||
database prevents this.
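
In essence, the history check is a membership test on message IDs. The sketch
below keeps the IDs in an in-memory set purely for illustration; the class
and method names are ours, and real servers keep this database on disk in
their own formats.

```python
class History:
    """Remember the message IDs of current and recently expired articles."""

    def __init__(self):
        self._seen = set()

    def offer(self, message_id):
        """Record a new message ID; return False if it is a duplicate."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True


history = History()
message_id = "<Pine.LNX.4.31.0107022153490.30462-100000@starcomsoftware.com>"
print(history.offer(message_id))   # first arrival: file the article
print(history.offer(message_id))   # re-fed batch or loop: reject it
```

Because the ID stays in the set even after the article itself has been
expired, a late duplicate is still rejected.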
|
||
|
||
All these simple checks are very effective, and work across server and
|
||
software types, as per the Internet standards. Together, they allow robust
|
||
and fail-safe Usenet article flow across the world.
|
||
-----------------------------------------------------------------------------
|
||
|
||
2.3.2. Types of newsfeeds
|
||
|
||
This section explains the basics of newsfeeds, without getting into details
|
||
of software and configuration files.
|
||
-----------------------------------------------------------------------------
|
||
|
||
2.3.2.1. Queued feeds
|
||
|
||
This is the commonest method of sending articles from one server to another,
|
||
and is followed whenever large volumes of articles are to be transferred per
|
||
day. This approach needs a one-time modification to the upstream server's
|
||
configuration for each outgoing feed, to define a new queue.
|
||
|
||
In essence all queued feeds work in the following way. When the sending
|
||
server receives an article, it processes it for inclusion into its local
|
||
repository, and also checks through all its outgoing feed definitions to see
|
||
whether the article needs to be queued for any of the feeds. If yes, it is
|
||
added to a queue file for each outgoing feed. The precise details of the
|
||
queue file can change depending on the software implementation, but the basic
|
||
processes remain the same. A queue file is a list of queued articles, but
|
||
does not contain the article contents. Typical queue files are ASCII text
|
||
files with one line per article giving the path to a copy of the article in
|
||
the local spool area.
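
A queue file of the kind described above could be maintained as in this
sketch. The directory layout, the feed name "branch", and the spool paths are
made up for illustration, and the function names are ours; actual queue file
names and locations depend on the software in use.

```python
import tempfile
from pathlib import Path


def queue_article(queue_dir, feed_name, spool_path):
    """Append one spool path to the queue file of an outgoing feed."""
    queue_file = Path(queue_dir) / feed_name
    with queue_file.open("a", encoding="ascii") as handle:
        handle.write(f"{spool_path}\n")
    return queue_file


def read_queue(queue_file):
    """Return the queued spool paths, oldest first."""
    return Path(queue_file).read_text(encoding="ascii").splitlines()


# Usage: queue two articles for a hypothetical feed called "branch".
with tempfile.TemporaryDirectory() as queue_dir:
    queue_article(queue_dir, "branch", "starcom/tech/misc/211")
    queue_file = queue_article(queue_dir, "branch", "starcom/tech/misc/212")
    queued = read_queue(queue_file)
print(queued)
```

Only the paths are queued; the article contents stay in the spool area until
the feed is actually sent.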
|
||
|
||
Later, a separate process picks up each queue file and creates one or more
|
||
batches for each outgoing feed. A batch is a large file containing multiple
|
||
Usenet news articles. Once the batches are created, various transport
|
||
mechanisms can be used to move the files from sending server to receiving
|
||
server. You can even use scripted FTP. You only need to ensure that the batch
|
||
is picked up from the upstream server and somehow copied into a designated
|
||
incoming batch directory in the downstream server.
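
To show what a batch might look like, here is a minimal sketch which assumes
the traditional rnews batch convention, in which each article is preceded by
a line of the form "#! rnews <size in bytes>". The text above does not
prescribe a format, so treat this as one possible layout; compression, which
is commonly applied to batches before transfer, is left out, and the function
names are ours.

```python
def build_batch(articles):
    """Join articles (bytes) into one batch with "#! rnews <size>" headers."""
    parts = []
    for article in articles:
        parts.append(b"#! rnews %d\n" % len(article))
        parts.append(article)
    return b"".join(parts)


def split_batch(batch):
    """Recover the individual articles from a batch built as above."""
    articles = []
    position = 0
    while position < len(batch):
        end_of_line = batch.index(b"\n", position)
        header = batch[position:end_of_line].split()
        if len(header) != 3 or header[:2] != [b"#!", b"rnews"]:
            raise ValueError("not an rnews batch header")
        size = int(header[2])
        start = end_of_line + 1
        articles.append(batch[start:start + size])
        position = start + size
    return articles


batch = build_batch([b"Subject: one\n\nfirst\n", b"Subject: two\n\nsecond\n"])
print(batch.decode("ascii"))
print(len(split_batch(batch)), "articles recovered")
```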
|
||
|
||
UUCP has traditionally been the mechanism of choice for batch movement,
|
||
because it predates the Internet and wide availability of fast
|
||
packet-switched data networks. Today, with TCP/IP everywhere, UUCP once again
|
||
emerges as the most logical choice of batch movement, because it too has
|
||
moved with the times: it can work over TCP.
|
||
|
||
NNTP is the de facto mechanism of choice for moving queued newsfeeds for
|
||
carrier-class Usenet servers on the Internet, and unfortunately, for a lot of
|
||
other Usenet servers as well. The reason why we find this choice unfortunate
|
||
is discussed in Section 12.1 below. But in NNTP feeds, an intermediate step
|
||
of building batches out of queue files can be eliminated --- this is both its
|
||
strength and its weakness.
|
||
|
||
In the case of queued NNTP feeds, articles get added to queue files as
|
||
described above. An NNTP transmit process periodically wakes up, picks up a
|
||
queue file, and makes an NNTP connection to the downstream server. It then
|
||
begins a processing loop where, for each queued article, it uses the NNTP
|
||
IHAVE command to inform the downstream server of the article's message-ID.
The downstream server checks its local repository to see whether it already
has the message. If not, it responds with a 335 response, asking for the
article to be sent. The transmitting
|
||
server then pumps out the article contents in plaintext form. When all
|
||
articles in the queue have been thus processed, the sending server closes the
|
||
connection. If the NNTP connection breaks midway for any reason, the
|
||
sending server truncates the queue file and retains only those articles which
|
||
are yet to be transmitted, thus minimising repeat transmissions.
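
The transmit loop can be simulated without a network. In the sketch below, FakeDownstream is a hypothetical stand-in for the receiving server, and the numeric codes (335 to ask for the article, 435 to refuse it, 235 after a successful transfer) are the IHAVE responses from the NNTP specification. The queue is a plain list of message-IDs, which is our simplification.

```python
class FakeDownstream:
    """Stands in for the receiving server (hypothetical test double)."""

    def __init__(self, have, fail_after=None):
        self.have = set(have)          # message-IDs already in its repository
        self.fail_after = fail_after   # drop the link after this many offers
        self.offers = 0

    def ihave(self, msgid):
        """Answer an IHAVE offer: 335 = send it, 435 = not wanted."""
        if self.fail_after is not None and self.offers >= self.fail_after:
            raise ConnectionError("link dropped")
        self.offers += 1
        return 435 if msgid in self.have else 335

    def take(self, msgid, text):
        """Receive the article text; 235 = transferred OK."""
        self.have.add(msgid)
        return 235


def transmit(queue, articles, peer):
    """Offer each queued message-ID; return the entries still to be sent."""
    for i, msgid in enumerate(queue):
        try:
            if peer.ihave(msgid) == 335:
                peer.take(msgid, articles[msgid])
        except ConnectionError:
            # Truncate the queue: keep only what has not been processed.
            return queue[i:]
    return []
```

Whatever transmit() returns is what would be written back as the new queue file, so a broken connection costs at most one repeated offer.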
|
||

A queued NNTP feed works with the sending server making an NNTP connection
|
||
to the receiving server. This implies that the receiving server must have an
|
||
IP address which is known to the sending server or can be looked up in the
|
||
DNS. If the receiving server connects to the Internet periodically using a
|
||
dialup connection and works with a dynamically assigned IP address, this can
|
||
get tricky. UUCP feeds suffer no such problems because the sending server for
|
||
the newsfeed can be the UUCP server, i.e. passive. The receiving server for
|
||
the feed can be the UUCP master, i.e. the active party. So the receiving
|
||
server can then initiate the UUCP connection and connect to the sending
|
||
server. Thus, if even one of the two parties has a static IP address, UUCP
|
||
queued feeds can work fine.
|
||
|
||
Thus, NNTP feeds can be sent out a little faster than the batched
|
||
transmission processes used for UUCP and other older methods, because no
|
||
batches need to be constructed. However, NNTP is often used in newsfeeds
|
||
where it is not necessary and it results in colossal waste of bandwidth.
|
||
Before we study efficiency issues of NNTP versus batched feeds, we will cover
|
||
another way feeds can be organised using NNTP: the pull feeds.
|
||
-----------------------------------------------------------------------------
|
||
|
||
2.3.2.2. Pull feeds
|
||
|
||
This method of transferring a set of articles works only over NNTP, and
|
||
requires absolutely no configuration on the transmitting, or upstream,
|
||
server. In fact, the upstream server cannot even easily detect that the
|
||
downstream server is pulling out a feed --- it appears to be just a heavy and
|
||
thorough newsreader, that's all.
|
||
|
||
This pull feed works by the downstream server pulling out articles one by
|
||
one, just like any NNTP newsreader, using the NNTP ARTICLE command with the
|
||
Message-ID as parameter. The interesting detail is how it gets the
|
||
message-IDs to begin with. For this, it uses an NNTP command, specially
designed for pull feeds, called NEWNEWS. This command takes a newsgroup
pattern, a date (YYMMDD) and a time (HHMMSS):
+---------------------------------------------------------------------------+
| NEWNEWS comp.* 970815 000000 GMT                                          |
+---------------------------------------------------------------------------+
|
||
|
||
This command is sent by the downstream server over NNTP to the upstream
|
||
server, and in effect asks the upstream server to list out all news articles
|
||
which are newer than 15 August 1997 in the comp hierarchy. The upstream
|
||
server responds with an (often huge) list of message-IDs, one per line, ending
with a period on a line by itself.

The pulling server then compares each newly received message-ID with its own
|
||
article database and makes a (possibly shorter) list of all articles which it
|
||
does not have, thus eliminating duplicate fetches. That done, it begins
|
||
fetching articles one by one, using the NNTP ARTICLE command as mentioned
|
||
above.
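
The three steps of a pull feed (ask, compare, fetch) can be sketched as follows. The function names are ours, and the local article database is reduced to a Python set of message-IDs. The NEWNEWS argument layout (newsgroup pattern, YYMMDD, HHMMSS, GMT) follows the NNTP specification.

```python
from datetime import datetime


def newnews_command(pattern, since):
    """Format a NEWNEWS request: pattern, YYMMDD, HHMMSS, GMT."""
    return "NEWNEWS %s %s GMT" % (pattern, since.strftime("%y%m%d %H%M%S"))


def parse_id_list(lines):
    """Collect message-IDs from a response ending with a lone '.' line.

    'lines' is the response body, without the leading status line.
    """
    ids = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == ".":
            break
        ids.append(line)
    return ids


def to_fetch(offered, history):
    """Keep only the offered message-IDs absent from the local history."""
    return [m for m in offered if m not in history]


def article_commands(msgids):
    """One ARTICLE command per message-ID still to be fetched."""
    return ["ARTICLE %s" % m for m in msgids]
```

A real puller would send these strings over a socket and read the replies; here we only show how the commands and the duplicate check fit together.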
|
||
|
||
In addition, there is another NNTP command, NEWGROUPS, which allows the NNTP
|
||
client --- i.e. the downstream server in this case --- to ask its upstream
|
||
server what were the new newsgroups created since a given date. This allows
|
||
the downstream server to add the new groups to its active file.
|
||
|
||
The NEWNEWS based approach is usually one of the most inefficient methods of
|
||
pulling out a large Usenet feed. By inefficiency, here we refer to the CPU
|
||
loads and RAM utilisation on the upstream server, not on bandwidth usage.
|
||
This inefficiency is because most Usenet news servers do not keep their
|
||
article databases indexed by hierarchy and date; CNews certainly does not.
|
||
This means that a NEWNEWS command issued to an upstream server will put that
|
||
server into a sequential search of its article database, to see which
|
||
articles fit into the hierarchy given and are newer than the given date.
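
To make the cost difference concrete, here is a small sketch comparing the sequential search with a lookup in an index kept per hierarchy and sorted by arrival time. The record layout (message-ID, newsgroup, arrival time as a number) and all names are assumptions made for the illustration; no real news server stores its history this way.

```python
import bisect
from collections import defaultdict


def scan(history, hierarchy, since):
    """Sequential search: every record is examined for every request."""
    prefix = hierarchy + "."
    return [m for (m, group, t) in history
            if group.startswith(prefix) and t > since]


def build_index(history):
    """Per hierarchy, keep arrival times and message-IDs sorted by time."""
    buckets = defaultdict(list)
    for msgid, group, t in history:
        buckets[group.split(".", 1)[0]].append((t, msgid))
    index = {}
    for hier, entries in buckets.items():
        entries.sort()
        index[hier] = ([t for t, _ in entries], [m for _, m in entries])
    return index


def lookup(index, hierarchy, since):
    """Binary search for the first article newer than 'since'."""
    times, ids = index.get(hierarchy, ([], []))
    return ids[bisect.bisect_right(times, since):]
```

scan() touches every record on each NEWNEWS request, while lookup() does a binary search and then returns only the matching tail. The price is that the index must be maintained as articles arrive and expire.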
|
||
|
||
If pull feeds were to become the most common way of sending out articles,
|
||
then all upstream servers would badly need an efficient way of sorting their
|
||
article databases to allow each NEWNEWS command to rapidly generate its list
|
||
of matching articles. A slow upstream server today might take minutes to
|
||
begin responding to a NEWNEWS command, and the downstream server may time out
|
||
and close its NNTP connection in the meanwhile. We have often seen this
happen, until we tweaked the timeouts.
|
||
|
||
There are basic efficiency issues of bandwidth utilisation involved in NNTP
|
||
for news feeds, which are applicable for both queued and pull feeds. But the
|
||
problem with NEWNEWS is unique to pull feeds, and relates to server loads,
|
||
not bandwidth wastage.
|
||
-----------------------------------------------------------------------------
|
||
|
||
2.4. Control messages
|
||
|
||
The Usenet is a massive dispersed collection of servers which operate almost
|
||
without any supervision, provided they have adequate disk space and do not
|
||
suffer disk corruption due to power failures, etc. (It is indeed surprising
|
||
how self-managing a good Usenet server is, provided these two pre-requisites
|
||
are met.) These servers are each under the control of human administrators,
|
||
but it is preferable that certain routine actions be performed across all
|
||
these servers remotely from one location, without the manual intervention of
|
||
these humans.
|
||
|
||
One common need for centralised operations is the creation of new groups in
|
||
the standard eight hierarchies. The Usenet follows a fairly formal process
|
||
which asks for votes from readers worldwide before deciding on the
|
||
restructuring of its newsgroups list, including merging of low-volume groups,
|
||
splitting of high-volume groups into many specialised groups, creating new
|
||
groups, and even deleting groups. Once the voting process for a change
|
||
concludes and the change action is to be carried out, it would be extremely
|
||
tedious to send email to the hundreds of thousands of Usenet administrators
|
||
and hope that they make the changes right, and answer their doubts if they
|
||
get confused. It would be much better to have an automatic way to make the
|
||
changes across all servers, of course with proper authorisation.
|
||
|
||
The solution to this does not lie in giving some central authority the
|
||
ability to run an OS-level command of his choice on all the world's Usenet
|
||
servers, because OS commands differ from OS to OS, and because few Usenet
|
||
administrators would trust a stranger from another part of the world with OS
|
||
level access. Therefore, the solution lies in defining a small set of common
|
||
Usenet maintenance actions, and permitting only these actions to be triggered
|
||
on all servers through the passing of special command messages, called
|
||
control messages.
|
||
|
||
Control messages look like ordinary Usenet articles, more or less. They have
|
||
an extra header line, with its value in a specific format, but they usually
|
||
carry body text which looks like a normal human-written article. Here is a
|
||
control message (a spurious one at that, but it'll do for now):
|
||
Xref: news.starcomsoftware.com control:814217
|
||
Path: news.starcomsoftware.com!linux594.dn.net!news.dn.hoopoo.com!
|
||
feed-out.newsfeeds.com!newsfeeds.com!feed.newsfeeds.com!
|
||
newsfeeds.com!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!
|
||
newsfeed.icl.net!newsfeed.skycache.com!Cidera!newsfeed.gamma.ru!
|
||
Gamma.RU!carrier.kiev.ua!goblin.nadrabank.kiev.ua!not-for-mail
|
||
From: tale@uunet.uu.net (David C Lawrence)
|
||
Newsgroups: news.groups,humanities.hipcrime
|
||
Subject: cmsg newgroup humanities.hipcrime
|
||
Control: newgroup humanities.hipcrime
|
||
Date: Sun, 18 Feb 2001 11:50:28 GMT
|
||
Organization: The Cabal
|
||
Lines: 20
|
||
Approved: tale@uunet.uu.net
|
||
Message-ID: <3afWYZTIR.G5YOC2@uunet.uu.net>
|
||
NNTP-Posting-Host: 203.145.147.67
|
||
X-Trace: goblin.nadrabank.kiev.ua 982528840 21455 203.145.147.67
|
||
(18 Feb 2001 20:40:40 GMT)
|
||
X-Complaints-To: usenet@nadrabank.kiev.ua
|
||
NNTP-Posting-Date: 18 Feb 2001 20:40:40 GMT
|
||
X-No-Archive: Yes
|
||
|
||
humanities.hipcrime is an unmoderated newsgroup which passed its
|
||
vote for creation by 326:10 as reported in news.announce.newgroups
|
||
on 18 Feb 2001.
|
||
|
||
For your newsgroups file:
|
||
humanities.hipcrime HipCrime for Humanity - you committed one now!
|
||
|
||
Anyone can create a newsgroup in the alt, biz, comp, earth,
|
||
humanities, misc, news, meow, rec, sci, soc, talk, us, or
|
||
any other Usenet hierarchy. New newsgroup proposals may be
|
||
optionally discussed in news.groups. Please be sure that your
|
||
/usr/lib/news/control.ctl is configured correctly:
|
||
|
||
## NEWGROUP MESSAGES
|
||
## honor them all and log in \${LOG}/newgroup.log
|
||
newgroup:*:alt.*|biz.*|comp.*|earth.*|humanities.*|misc.*|news.*|\
|
||
meaw.*|rec.*|sci.*|soc.*|talk.*|us.*:doit=newgroup
|
||
|
||
## RMGROUP MESSAGES
|
||
## drop them all and don't log
|
||
rmgroup:*:*:drop
|
||
|
||
Meow!
|
||
David C Lawrence
|
||
|
||
A control message must have a Control header. Besides, all control messages
|
||
will have an Approved header, like messages posted to moderated newsgroups.
|
||
The Control header actually specifies a command to run on the local server,
|
||
and the parameter(s) to supply to it. The local Usenet server software is
|
||
supposed to figure out its own way to get the task done. In this example, the
|
||
command in the Control header is newgroup, which creates a new newsgroup. And
|
||
its parameter is humanities.hipcrime, which gives the name of the newsgroup
|
||
to create.
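
Extracting the command and its parameters from the Control header is simple. The sketch below uses the header parser from the Python standard library; the function name control_command is ours, and real news software does its own parsing.

```python
from email.parser import Parser


def control_command(raw_article):
    """Return (command, [parameters]) from the Control header, or None."""
    headers = Parser().parsestr(raw_article, headersonly=True)
    value = headers.get("Control")
    if value is None:
        return None            # an ordinary article, not a control message
    words = value.split()
    if not words:
        return None
    return words[0].lower(), words[1:]
```

For the message quoted above, this would yield the command newgroup with the single parameter humanities.hipcrime.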
|
||
|
||
In C-News, the control message implementation works through separate
|
||
shellscripts kept in a fixed directory, $NEWSBIN/ctl/, as a security measure;
|
||
if the executable script isn't present there, the control message command
|
||
will be ignored. The control message types supported are:
|
||

  * checkgroups: control message to check whether the list of newsgroups in
    your active file is correct as per a master list of newsgroups sent in
    the control message

  * newgroup: control message to create a new newsgroup

  * rmgroup: control message to delete a newsgroup and all articles in it

  * sendsys: control message to cause an email response to be sent to the
    author with the sys file of your server in it. This results in a response
    storm of emails from all the Usenet servers in the world to the author.
    These responses allow the sender of the control message to analyse all
    the sys files of the world's Usenet servers and create the directed graph
    of Usenet newsfeeds. Why someone would want to do this is hard to guess,
    but the result is surely an awesome picture of one facet of networked
    human civilisation, like looking at a giant world map.

    Incidentally, there is no invasion of privacy here, because your server's
    sys file is supposed to be public information, if you take feeds from the
    public Usenet.

  * version: control message which results in your Usenet software sending an
    email to the author of the message, containing the type and version of
    the Usenet news software you are using. This too is not an invasion of
    privacy, because this information is supposed to be public knowledge.

  * cancel: the most frequently occurring type of control message. It
    specifies the message-ID of an article, and results in the cancellation
    (deletion) of that article. If you post an article and regret it a moment
    later, your Usenet newsreader software usually allows you to ``cancel''
    it by generating a cancel message.
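
The rule that a control command is honoured only if a matching script is installed can be sketched as below. This is our own Python rendering of the idea, not the C-News code: find_handler is a hypothetical name, and the check on the command name is an extra precaution we add so that a hostile Control header cannot point outside the directory.

```python
import os
import re

# Only plain lowercase words are accepted as command names (our assumption).
SAFE_NAME = re.compile(r"[a-z]+")


def find_handler(ctl_dir, command):
    """Return the path of the handler script for a command, or None."""
    if not SAFE_NAME.fullmatch(command):
        return None               # refuse names that could escape ctl_dir
    path = os.path.join(ctl_dir, command)
    if os.path.isfile(path) and os.access(path, os.X_OK):
        return path
    return None                   # no script installed: ignore the message
```

An administrator who does not wish to honour, say, sendsys would simply not have a script of that name in the directory.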
|
||
|
||
|
||
The Usenet news software maintains a pseudo-newsgroup called control, where
|
||
it files all control messages it receives. If you have an incoming newsfeed
|
||
from the public Usenet, your server's control group will usually be full of
thousands of cancel messages from trigger-happy fingers all over the world.
Usenet news server software like C-News allows you to filter the incoming
feed based on newsgroups, and will discard articles for groups it does not
|
||
subscribe to. But since all servers have to receive and process control
|
||
messages, they will all accept these cancel messages, though many of them may
|
||
apply to articles which are not part of your highly-pruned subset of groups.
|
||
C'est la vie.
|
||
|
||
Remember to set expiry for the control group to one day or even shorter, so
|
||
that the junk can be cleaned out as rapidly as possible, just like the junk
|
||
newsgroup.
|
||
|
||
The beauty of the control message architecture is that it integrates
|
||
seamlessly into the newsfeed mechanism for automatic control of the network
|
||
of servers. No separate channel of connection is needed for the control
|
||
actions. And article replication automatically propagates control messages
|
||
with human-readable articles, thus guaranteeing reach across heterogeneous
network technologies.
|
||
|
||
What your Usenet server does on receiving a control message is governed by an
|
||
authorisation file: $NEWSCTL/controlperms in the case of C-News and
|
||
control.ctl in the case of INN, for instance. The security measures
|
||
implemented by this module are further enhanced by the pgpcontrol package
|
||
with its pgpverify script. Using pgpverify, your server can check that all
|
||
control messages (except for article cancellation messages) are digitally
|
||
signed by a trusted party using PGP public key cryptography. Our
|
||
integrated Usenet news software distribution includes integration with
|
||
pgpverify.
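
As an illustration of how such an authorisation file can drive decisions, the sketch below evaluates rules written like the control.ctl lines in the control message quoted earlier (type:sender:newsgroups:action). It is a simplification and rests on several assumptions of ours: the last matching rule wins, a message with no matching rule is dropped, and shell-style wildcards are good enough for the patterns. The controlperms file of C-News uses a different layout and is not modelled here.

```python
from fnmatch import fnmatchcase


def parse_rules(text):
    """Parse 'type:sender:groups:action' lines; '#' starts a comment."""
    rules = []
    joined = text.replace("\\\n", "")        # join backslash-continued lines
    for line in joined.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        kind, sender, groups, action = line.split(":", 3)
        patterns = [g.strip() for g in groups.split("|")]
        rules.append((kind, sender, patterns, action))
    return rules


def decide(rules, kind, sender, group, default="drop"):
    """Return the action of the last rule matching this control message."""
    action = default
    for r_kind, r_sender, r_groups, r_action in rules:
        if (r_kind == kind
                and fnmatchcase(sender, r_sender)
                and any(fnmatchcase(group, g) for g in r_groups)):
            action = r_action
    return action
```

In practice the action for newgroup and rmgroup would also depend on whether pgpverify accepted the signature; that step is left out of this sketch.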
|
||
-----------------------------------------------------------------------------
|
||
|
||
3. Usenet news software
|
||
|
||
3.1. A brief history of Usenet systems
|
||
|
||
Towards the end of this HOWTO, we have added some information about the
|
||
history of Usenet server software by quoting sections from an earlier Usenet
|
||
Periodic Posting. We consider this historical perspective, and the Usenix
|
||
papers and other documents referred to in it, essential reading for any
|
||
Usenet server administrator. Please see the section titled "Usenet software:
|
||
a historical perspective".
|
||
-----------------------------------------------------------------------------
|
||
|
||
3.2. C-News and NNTPd
|
||
|
||
C-News was written by Henry Spencer and Geoff Collyer of the Department of
|
||
Zoology, University of Toronto, almost entirely in shell and awk, as a
|
||
replacement for an earlier system called B-News. The focus was on adding some
|
||
extra features and a lot of performance. The first release was called
|
||
Shellscript Release, which was deployed by a very large number of servers
|
||
worldwide, as a natural upgrade to B-News. This version of C-News had upward
|
||
compatibility with B-News meta-data, e.g. history files. This was the version
|
||
of C-News which was initially rolled out in 1991 or so at the National Centre
|
||
for Software Technology (NCST, http://www.ncst.ernet.in) and the Indian
|
||
Institutes of Technology in India as part of the Indian educational and
|
||
research network (ERNET). We received guidance from the NCST about Usenet
|
||
news installation and management.
|
||
|
||
The Shellscript Release was soon followed by a re-write with a lot more C
|
||
code, called Performance Release, and then a set of cleanup and component
|
||
integration steps leading to the last release called the Cleanup Release.
|
||
This Cleanup Release was patched many times by the authors, and the last one
|
||
was CR.G (Cleanup Release revision G). The version of C-News discussed in
|
||
this HOWTO is a set of small bug fixes on CR.G.
|
||
|
||
Since C-News came from shellscript-based antecedents, its architecture
|
||
followed the set-of-programs style so typical of Unix, rather than large
|
||
monolithic software systems traditional to some other OSs. All pieces had
well-defined roles, and therefore could be easily replaced with other pieces
as needed. This allowed easy adaptation and upgrades. This never
|
||
affected performance, because key components which did a lot of work at high
|
||
speed, e.g. newsrun, had been rewritten in C by that time. Even within the
|
||
shellscripts, crucial components which handled binary data, e.g. a component
|
||
called dbz to manipulate efficient on-disk hash arrays, were C programs with
|
||
command-line interfaces, called from scripts.
|
||
|
||
C-News was born in a world with widely varying network line speeds, where
|
||
bandwidth utilisation was a big issue and dialup links with UUCP file
|
||
transfers were common. Therefore, it has strong support for batched feeds,
|
||
specially with a variety of compression techniques and over a variety of fast
|
||
and slow transport channels. And C-News virtually does not know the existence
|
||
of TCP/IP, other than one or two tiny batch transport programs like viarsh.
|
||
However, its design was so modular that there was absolutely no problem in
|
||
plugging in NNTP functionality using a separate set of C programs without
|
||
modifying a single line of C-News. This was done by a program suite called
|
||
NNTP Reference Implementation, which we call NNTPd.
|
||
|
||
This software suite could work with B-News and C-News article repositories,
|
||
and provided the full NNTP functionality. Since B-News died a gradual death,
|
||
the combination of C-News and NNTPd became a freely redistributable,
|
||
portable, modern, extensible, and high-performance software suite for Unix
|
||
Usenet servers. Further refinements were added later, e.g. nov, the News
|
||
Overview package and pgpverify, a public-key-based digital signature module
|
||
to protect Usenet news servers against fraudulent control messages.
|
||
-----------------------------------------------------------------------------
|
||
|
||
3.3. INN
|
||
|
||
INN is one of the two most widely used Usenet news server solutions. It was
|
||
written by Rich Salz for Unix systems which have a socket API --- probably
|
||
all Unix systems do, today.
|
||
|
||
INN has an architecture diametrically opposite to CNews. It is a monolithic
|
||
program, which is started at bootup time, and keeps running till your server
|
||
OS is shut down. This is like the way high performance HTTP servers are run
|
||
in most cases, and allows INN to cache a lot of things in its memory,
|
||
including message-IDs of recently posted messages, etc. This interesting
|
||
architecture has been discussed in an interesting paper by the author, where
|
||
he explains the problems of the older B-News and C-News systems that he tried
|
||
to address. Anyone interested in Usenet software in general and INN in
|
||
particular should study this paper.
|
||
|
||
INN addresses a Usenet news world which revolves around NNTP, though it has
|
||
support for UUCP batches --- a fact that not many INN administrators seem to
|
||
talk about. INN works faster than the CNews-NNTPd combination when processing
|
||
multiple parallel incoming NNTP feeds. For multiple readers reading and
|
||
posting news over NNTP, there is no difference between the efficiency of INN
|
||
and NNTPd. Section 5.7 discusses the efficiency issues of INN over the
earlier C-News architecture, based on Rich Salz's paper and our analyses of
|
||
usage patterns.
|
||
|
||
INN's architecture has inspired a lot of high-performance Usenet news
|
||
software, including a lot of commercial systems which address the ``carrier
|
||
class'' market. That is the market for which the INN architecture has clear
|
||
advantages over C-News.
|
||
-----------------------------------------------------------------------------
|
||
|
||
3.4. Leafnode
|
||
|
||
This is an interesting software system, to set up a ``small'' Usenet news
|
||
server on one computer which only receives newsfeeds but does not have the
|
||
headache of sending out bulk feeds to other sites, i.e. it is a ``leaf node''
|
||
in the newsfeed flow diagram. According to its homepage (www.leafnode.org),
|
||
``Leafnode is a USENET software package designed for small sites running any
|
||
flavour of Unix, with a few tens of readers and only a slow link to the net.
|
||
[...] The current version is 1.9.24.''
|
||
|
||
This software is a sort of combination of article repository and NNTP news
|
||
server, and receives articles, digests and stores them on the local hard
|
||
disks, expires them periodically, and serves them to an NNTP reader. It is
|
||
claimed that it is simple to manage and is ideal for installation on a
|
||
desktop-class Unix or Linux box, since it does not take up many resources.
|
||
|
||
Leafnode is based on an appealing idea, but we find no problem using C-News
|
||
and NNTPd on a desktop-class box. Their resource consumption is roughly
|
||
proportional to the volume of articles you want it to process, and the number
|
||
of groups you'll want to retain for a small team of users will be easily
|
||
handled by C-News on a desktop-class computer. An office of a hundred users
|
||
can easily use C-News and NNTPd on a desktop computer running Linux, with 64
|
||
MBytes of RAM, IDE drives, and sufficient disk space. Of course, ease of
|
||
configuration and management is dependent on familiarity, and we are more
|
||
familiar with C-News than with Leafnode. We hope this HOWTO will help you in
|
||
that direction.
|
||
|
||
There is, however, one area in which Leafnode is far easier to administer
|
||
than INN or C-News. Leafnode constantly monitors the actual usage of the
|
||
newsgroups it carries, based on readership statistics of its NNTP readers. If
|
||
a particular newsgroup is not read at all by any user for a week, then
|
||
Leafnode will delete all articles in that newsgroup, free up disk space, and
|
||
stop fetching new articles for it. If it finds that a previously abandoned
|
||
newsgroup is now again receiving attention, even from one user, then it'll
|
||
fetch all articles for that group from its upstream server the next time it
|
||
connects. This self-tuning feature of Leafnode is really an excellent
|
||
advantage which makes a Leafnode site easier to manage, specially for small
|
||
setups with bandwidth and disk space constraints.
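
The self-tuning policy can be expressed in a few lines. The sketch below is not Leafnode's code; it only models the behaviour described above, with names of our own choosing and a one-week idle limit taken from the text. Times are plain seconds.

```python
WEEK = 7 * 24 * 3600   # idle limit in seconds, as described above


def stale_groups(last_read, now, max_idle=WEEK):
    """Return the groups nobody has read within max_idle seconds."""
    return sorted(g for g, t in last_read.items() if now - t > max_idle)


def record_read(last_read, group, now):
    """Note a reader's visit; True means the group must be fetched afresh."""
    revived = group not in last_read
    last_read[group] = now
    return revived


def drop_stale(last_read, now, max_idle=WEEK):
    """Forget stale groups, so that they are no longer fetched."""
    dropped = stale_groups(last_read, now, max_idle)
    for g in dropped:
        del last_read[g]
    return dropped
```

The whole policy needs only one timestamp per newsgroup, which is why it costs almost nothing to run on a small machine.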
|
||
|
||
The Leafnode Website gives a lot of details in an easily understood format.
|
||
|
||
TO BE EXTENDED AND CORRECTED.
|
||
-----------------------------------------------------------------------------
|
||
|
||
3.5. Suck
|
||
|
||
Suck is a program which lets you pull out an NNTP feed from an NNTP server
|
||
and file it locally. It does not contain any article repository management
|
||
software, expecting you to do it using some other software system, e.g.
|
||
C-News or INN. It can create batchfiles which can be fed to C-News, for
|
||
instance. (Well, to be fair, Suck does have an option to store the fetched
|
||
articles in a spool directory tree very much like what is used by C-News or
|
||
INN in their article area, with one file per article. You can later read this
|
||
raw message spool area using a mail client which supports the msgdir file
|
||
layout for mail folders, like MH, perhaps. We don't find this option useful
|
||
if you're running Suck on a Usenet server.) Suck finally boils down to a
|
||
single command-line program which is invoked periodically, typically from
|
||
cron. It has a zillion command-line options which are confusing at first, but
|
||
later show how mature and finely tunable the software is.
|
||
|
||
If you need an NNTP pull feed, then we know of no better programs than Suck
|
||
for the job. The nntpxfer program which forms part of the NNTPd package also
|
||
implements an NNTP pull feed, for instance, but does not have one-tenth of
|
||
the flexibility and fine-tuning of Suck. One of the banes of the NNTP pull
|
||
feed is connection timeouts; Suck allows a lot of special tuning to handle
|
||
this problem. If we had to set up a Usenet server with an NNTP pull feed,
|
||
we'd use Suck right away.
|
||
|
||
TO BE EXTENDED AND CORRECTED.
|
||
-----------------------------------------------------------------------------
|
||
|
||
3.6. Carrier class software
|
||
|
||
Carrier-class servers are expected to handle a complete feed of all articles
|
||
in all newsgroups, including a lot of groups which have what we call a ``high
|
||
noise-to-signal ratio.'' They do not have the luxury of choosing a ``useful''
|
||
subset like administrators of internal corporate Usenet servers do. Secondly,
|
||
carrier-class servers are expected to turn articles around very fast, i.e.
|
||
they are expected to have very low latency from the moment they receive an
|
||
article to the time they retransmit it by NNTP to downstream servers. Third,
|
||
they are supposed to provide very high availability, like other ``carrier
|
||
class'' services. This usually means that they have parallel arrays of
|
||
computers in load sharing configurations. And fourth, they usually do not
|
||
cater to retail connections for reading and posting articles by human users.
|
||
Usenet news carriers usually reserve separate computers to handle retail
|
||
connections.
|
||
|
||
Thus, carrier-class servers do not need to maintain a repository of articles;
|
||
they only need to focus on super-efficient real-time re-transmission. These
|
||
highly specialised servers have software which receives an article over NNTP,
|
||
parse it, and immediately re-queue it for outward transmission to dozens or
|
||
hundreds of other servers. And since they work at these high throughputs,
|
||
their downstream servers are also expected to be live on the Internet round
|
||
the clock to receive incoming NNTP connections, or be prepared to lose
|
||
articles. Therefore, there's no batching or long queueing needed, and
|
||
C-News-style batching in fact is totally inapplicable.
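
A toy model of such a relay is given below. It keeps no article repository, only a set of message-IDs already seen and one in-memory queue per downstream peer. The class and its names are hypothetical, and skipping peers which already appear in the Path header is the usual Usenet way of avoiding loops, which we assume here although this section does not discuss it.

```python
from collections import deque


class NewsRouter:
    """Relay articles to peers without storing them (illustrative only)."""

    def __init__(self, name, peers):
        self.name = name
        self.queues = {p: deque() for p in peers}   # one outbound queue per peer
        self.seen = set()                           # message-IDs already relayed

    def receive(self, msgid, path, body):
        """Accept an article once; queue it for peers not yet on its Path."""
        if msgid in self.seen:
            return []                               # duplicate: relay nothing
        self.seen.add(msgid)
        hops = path.split("!")
        new_path = self.name + "!" + path           # prepend our own name
        targets = [p for p in self.queues if p not in hops]
        for p in targets:
            self.queues[p].append((msgid, new_path, body))
        return targets
```

A production system would drain each queue over a live NNTP connection immediately; the queues here only stand in for that step.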
|
||
|
||
Therefore, these carrier-class Usenet servers are more like packet routers
|
||
than servers with repositories. They are referred to nowadays as NNTP routers
|
||
or news routers.
|
||
|
||
It can be seen why batch-oriented repository management software like C-News
|
||
is a total anachronism here, and why they need an NNTP-oriented, online,
|
||
real-time design. The INN antecedents of some of these systems are therefore
|
||
natural. We would love to hear from any Linux HOWTO reader whose Usenet
|
||
server requirements include carrier-class behaviour.
|
||
|
||
We are aware of only one freely redistributable NNTP router: NNTPRelay (see
|
||
http://nntprelay.maxwell.syr.edu/); this software runs on NT. There is no
|
||
reason why such services cannot run off Linux servers, even Intel Linux,
|
||
provided you have fast network links and arrays of servers. Linux as an OS
|
||
platform is not an issue here.
|
||
|
||
TO BE EXTENDED AND CORRECTED.
|
||
-----------------------------------------------------------------------------
|
||
|
||
4. Setting up CNews + NNTPd
|
||
|
||
4.1. Getting the sources and stuff
|
||
|
||
4.1.1. The sources
|
||
|
||
C-News software can be obtained from ftp://ftp.uu.net/networking/news/
|
||
transport/cnews/cnews.tar.Z and will need to be uncompressed using the BSD
|
||
uncompress utility or a compatible program. The tarball is about 650 KBytes
|
||
in size. It has its own highly intelligent configuration and installation
|
||
processes, which are very well documented. The version that is available is
|
||
Cleanup Release revision G, on which our own version is based.
|
||
|
||
NNTPd (the NNTP Reference Implementation) is available from ftp://ftp.uu.net/
|
||
networking/news/nntp/nntp.1.5.12.1.tar.Z. It has no automatic scripts and
|
||
processes to configure itself. After fetching the sources, you will have to
|
||
follow a set of directions given in the documentation and configure some C
|
||
header files. These configuration settings must be done keeping in mind what
|
||
you have specified when you build the C-News sources, because NNTPd and
|
||
C-News must work together. Therefore, some key file formats, directory paths,
|
||
etc., will have to be specified identically in both software systems.
|
||
|
||
The third software system we use is Nestor. This too is to be found in the
|
||
same place where the NNTPd software is kept, at ftp://ftp.uu.net/networking/
|
||
news/nntp/nestor.tar.Z. This software compiles to one binary program, which
|
||
must be run periodically to process the logs of nntpd, the NNTP server which
|
||
is part of NNTPd, and report usage statistics to the administrator. We have
|
||
integrated Nestor into our source base.
|
||
|
||
The fourth piece of the system, without which no Usenet server administrator
|
||
dares venture out into the wild world of public Internet newsfeeds, is
|
||
pgpverify.
|
||
|
||
We have been working with C-News and NNTPd for many years now, and have fixed
|
||
a few bugs in both packages. We have also integrated the four software
|
||
systems listed above, and added a few features here and there to make things
|
||
work more smoothly. We offer our entire source base to anyone for free
|
||
download from http://www.starcomsoftware.com/proj/usenet/src/news.tar.gz.
|
||
There are no licensing restrictions on our sources; they are as freely
|
||
redistributable as the original components we started with.
|
||
|
||
When you download our software distribution, you will extract it to find a
|
||
directory tree with the following subdirectories and files:
|
||

  * c-news: the source tree of the CR.G software release, with our additions
    like pgpverify integration, our scripts like mail2news, and pre-created
    configuration files.

  * nntp-1.5.12.1: the source tree of the original NNTPd release, with header
    files pre-configured to fit in with our configuration of C-News, and our
    addition of bits and pieces like Nestor, the log analysis program.

  * howto: this document, and its SGML sources and Makefile.

  * build.sh: a shellscript you can run to compile the entire combined source
    tree and install binaries in the right places, if you are lucky and all
    goes well.
|
||
|
||
|
||
Needless to say, we believe that our source tree is a better place to start
|
||
with than the original components, specially if you are installing a Usenet
|
||
server on a Linux box and for the first time. We will be available on email
|
||
to provide technical assistance should you run into trouble.
|
||
-----------------------------------------------------------------------------
|
||
|
||
4.1.2. The key configuration files
|
||
|
||
Once you get the sources, you will need some key configuration files to seed
your C-News system. These configuration files are actually database tables,
and they change frequently, whenever newsgroups are created, modified or
deleted. They specify the list of active newsgroups in the ``public'' Usenet.
You can, and should, add your organisation's internal newsgroups to this list
when you set up your own server, but you will need the list of standard
public newsgroups to begin with. This list can be obtained from the same FTP
server by downloading the files active.gz and newsgroups.gz from
ftp://ftp.uu.net/networking/news/config/. You can create your own active and
newsgroups files by retaining a subset of the entries in these two files.
Both are ASCII text files.
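
As a minimal sketch of retaining a subset, assuming the downloaded list has
been uncompressed into lines of the form "group high low flag", the following
keeps only the groups in the hierarchies you intend to carry. The function
name prune_active and the sample data are invented for this illustration; they
are not part of C-News.

```shell
#!/bin/sh
# prune_active HIERARCHY...: read an active-style list on stdin and print only
# the lines whose group lies in one of the named top-level hierarchies.
# Illustrative helper, not part of C-News.
prune_active() {
    awk -v keep="$*" '
        BEGIN { n = split(keep, h, " ") }
        {
            for (i = 1; i <= n; i++)
                if ($1 == h[i] || index($1, h[i] ".") == 1) { print; next }
        }'
}

# Made-up sample standing in for the uncompressed active file.
sample='comp.lang.java.3d 0000003716 01346 m
misc.test 0000000042 00001 y
rec.arts.books 0000001200 00950 y'

pruned=$(printf '%s\n' "$sample" | prune_active comp misc)
printf '%s\n' "$pruned"
```

Since the newsgroups file also starts each line with the group name, the same
filter should work on it unchanged.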
|
||
|
||
Getting the sources from our server does not remove the need to get the
latest versions of these files from ftp.uu.net. We do not (yet) maintain an
up-to-date copy of these files on our server, and we would add no value to
the originals by just mirroring them.
|
||
-----------------------------------------------------------------------------
|
||
|
||
4.2. Compiling and installing
|
||
|
||
Before installing, make sure you have an entry for a user called news in
your /etc/passwd file. This user owns the news database. Now download the
source from us and untar it in the home directory of news. This creates two
main directories, c-news and nntp. To compile and install, run the script
build.sh as root in the directory that contains the script. The script must
run as root because it sets ownerships, and then compiles and installs the
source as user news. This is a one-step process that puts in place both the
C-News and the NNTP software, setting correct permissions and paths. The
following is a brief description of what build.sh does:
|
||
|
  * Checks the OS platform and exits if it is not Linux.

  * Exits if you are not running it as root.

  * Looks for the two directories mentioned above and exits if it cannot find
    them.

  * Compiles C-News and, if the compilation succeeds, runs the regression
    tests. It warns you to read the error file make.out.r and to fix any
    problems reported there. Compilation errors are written to a file called
    make.out.

  * Performs the same operations in the nntp directory.

  * Checks for the presence of the three key directories: $NEWSARTS
    (/var/spool/news), which houses the articles; $NEWSCTL (/var/lib/news),
    which contains configuration, log and status files; and $NEWSBIN
    (/usr/lib/newsbin), which contains the binaries and executables needed
    by the Usenet News system. It tries to create them if they do not exist,
    and exits if that fails.

  * Changes the ownership of these directories to news.news. This is
    important since the entire Usenet News system runs as user news. It will
    not function properly as any other user.

  * Starts the installation of C News. It runs make install to install
    binaries at the right locations; make setup to set the correct paths and
    umask, create directories for newsgroups, and determine who will receive
    reports; make ui to set up inews and injnews; and make readpostcheck to
    use the readnews, postnews and checknews scripts provided by C News.
    Errors, if any, are to be found in the respective make.out files; for
    example, make setup writes its errors to make.out.setup.

  * Makes newsspool, which queues incoming batches in the $NEWSARTS/in.coming
    directory, set-userid and set-groupid, as it needs to be.

  * Creates a soft link from /usr/lib/news to /var/lib/news.

  * Installs the NNTP software.

  * Sets up the manpages for C News and makes them world-readable. The NNTP
    manpages get installed when the software is installed. It also compiles
    the C News documentation, guide.ps, and makes it readable and available
    in /usr/doc/packages/news or /usr/doc/news.

  * Checks for the PGP binary and asks the administrator to get it if it is
    not found.
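
The first three checks in this list can be sketched as below. This is not the
actual build.sh, only an illustration of the kind of pre-flight tests it is
described as making; all the function names are invented here.

```shell
#!/bin/sh
# Illustrative pre-flight checks in the spirit of build.sh (not the real script).

# is_linux NAME: succeed if the given kernel name (default: uname -s) is Linux.
is_linux() {
    [ "${1:-$(uname -s)}" = "Linux" ]
}

# is_root UID: succeed if the given user ID (default: id -u) is 0.
is_root() {
    [ "${1:-$(id -u)}" -eq 0 ]
}

# have_dirs BASE: succeed if BASE contains both source directories.
have_dirs() {
    [ -d "$1/c-news" ] && [ -d "$1/nntp" ]
}

# preflight BASE: report the first failed check and return non-zero.
preflight() {
    is_linux       || { echo "this script supports Linux only" >&2; return 1; }
    is_root        || { echo "please run this script as root" >&2; return 1; }
    have_dirs "$1" || { echo "c-news and nntp not found in $1" >&2; return 1; }
}
```

A script would typically call this as: preflight "$(pwd)" || exit 1. Each
check accepts an optional argument so that it can be tried out without
actually being root.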
|
||
|
||
|
||
-----------------------------------------------------------------------------
|
||
4.3. Configuring the system: What and how to configure files?
|
||
|
||
Once the software is installed, you have to configure the system to accept
feeds and batch them for your neighbours. You will have to do the following:
|
||
|
  * nntpd: Copy the compiled nntpd into a directory where executables are
    kept and activate it. It listens on port 119 and is started through inetd
    unless you have compiled it as a stand-alone daemon. An entry in the
    /etc/services file for nntp would look like this:
    +------------------------------------------------------------+
    |nntp 119/tcp # Network News Transfer Protocol               |
    +------------------------------------------------------------+
    An entry in the inetd.conf file will be:
    +------------------------------------------------------------+
    |nntp stream tcp nowait news path-to-tcpd path-to-nntpd      |
    +------------------------------------------------------------+
    The last two fields in the inetd.conf entry are the paths to the tcpd
    (TCP wrapper) and nntpd binaries respectively.
|
||
|
  * Configuring control files: There are plenty of control files in $NEWSCTL
    that will need to be configured before you can start using the news
    system. The files mentioned here are also discussed in the first section
    of the chapter titled "Components of a running system". These control
    files are dealt with in detail below.
|
||
|
      + sys: One line per system/NDN, listing all the newsgroup hierarchies
        each system subscribes to. Each line is prefixed with the system
        name, and the one beginning with
        +------------------------------------------------------------+
        |ME:                                                         |
        +------------------------------------------------------------+
        indicates what your server is willing to receive. The following are
        typical entries that go into this file:
        +------------------------------------------------------------+
        |ME:comp,news,misc,netscape                                  |
        +------------------------------------------------------------+
        This line indicates which newsgroups your server has subscribed to.
        +------------------------------------------------------------+
        |server/server.starcomsoftware.com:all,!general/all:f        |
        +------------------------------------------------------------+
        This is the list of newsgroups your server will pass on to your NDN.
        The newsgroups should be specified as a comma-separated list, and
        the entire line should contain no spaces. The f flag indicates that
        the newsgroup name and the article number, along with the article's
        size, will make up one entry in the togo file in the
        $NEWSARTS/out.going directory.
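
Since a stray space is an easy mistake to make in this file, a quick check
along the following lines may help. It is a sketch only: check_sys is a name
invented here, lines starting with # are assumed to be comments, and
backslash-continued lines are not handled.

```shell
#!/bin/sh
# check_sys FILE: print (with line numbers) every non-comment line of a
# sys-style file that contains a space or tab; succeed only if none is found.
# Illustrative helper; it does not understand backslash-continued lines.
check_sys() {
    bad=$(grep -n '[[:space:]]' "$1" | grep -v '^[0-9]*:#' || true)
    if [ -n "$bad" ]; then
        printf '%s\n' "$bad"
        return 1
    fi
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

cat > "$tmp/sys.good" <<'EOF'
# what this server accepts
ME:comp,news,misc,netscape
server/server.starcomsoftware.com:all,!general/all:f
EOF

cat > "$tmp/sys.bad" <<'EOF'
ME:comp, news,misc
EOF

check_sys "$tmp/sys.good" && echo "sys.good: ok"
check_sys "$tmp/sys.bad"  || echo "sys.bad: whitespace found"
```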
|
||
|
      + explist: This file has entries indicating which articles expire and
        when, and whether they have to be archived. The order in which the
        newsgroups are listed is important. An example follows:
        +------------------------------------------------------------+
        |comp.lang.java.3d x 60 /var/spool/news/Archive              |
        +------------------------------------------------------------+
        This means that the articles of comp.lang.java.3d expire after 60
        days and will be archived in the directory mentioned in the fourth
        field. Archiving is optional. The second field, x, indicates that
        this line applies to both moderated and unmoderated newsgroups; m
        would specify moderated groups and u would specify unmoderated
        groups. If you want to specify an extremely large number as the
        expiry period, you can use the keyword "never".
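
The order matters because the first line that applies to a group is the one
that is used, so specific entries must come before general ones. The sketch
below illustrates this with a simplified matcher. The function explist_match
is invented here, the sample file is made up, the "-" in the archive field is
assumed to mean "do not archive", and real explist patterns are richer than
the prefix matching shown.

```shell
#!/bin/sh
# explist_match FILE GROUP MOD: print the first explist-style line that
# applies to GROUP, where MOD is m (moderated) or u (unmoderated).
# Simplified matching: a pattern matches if it is "all", equals the group,
# or is a leading part of the group name ending at a dot.
explist_match() {
    awk -v g="$2" -v mod="$3" '
        /^#/ || NF < 3 { next }
        $2 != "x" && $2 != mod { next }
        {
            n = split($1, pats, ",")
            for (i = 1; i <= n; i++)
                if (pats[i] == "all" || pats[i] == g ||
                    index(g, pats[i] ".") == 1) { print; exit }
        }' "$1"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

cat > "$tmp/explist" <<'EOF'
# most specific patterns first, catch-all last
comp.lang.java.3d  x  60  /var/spool/news/Archive
comp               u  14  -
all                x  7   -
EOF

explist_match "$tmp/explist" comp.lang.java.3d m   # the 60-day line
explist_match "$tmp/explist" comp.os.linux u       # the comp line
explist_match "$tmp/explist" comp.os.linux m       # falls through to "all"
```

If the "all" line were moved to the top of this sample file, it would match
every group and the two lines below it would never be reached.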
|
||
|
      + batchparms: sendbatches is a program that administers batched
        transmission of news articles to other sites. To do this, it consults
        the batchparms file. Each line in the file specifies the behaviour
        for one of the NDNs mentioned in the sys file. Five fields have to be
        specified for each site.
        +------------------------------------------------------------+
        |server u 100000 100 batcher | gzip -9 | viauux -d gunzip    |
        +------------------------------------------------------------+

        The first field is the site name, which matches the entry in the sys
        file and has a corresponding directory of that name in
        $NEWSARTS/out.going.

        The second field is the class of the site: u for UUCP feeds and n for
        NNTP feeds. A "!" in this field means that batching for this site has
        been disabled.

        The third field is the size, in bytes, of the batches to be prepared.

        The fourth field is the maximum length of the output queue for
        transmission to that site.

        The fifth field is the command line to be used to build, compress and
        transmit batches to that site. The contents of the togo file are made
        available on its standard input.
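
Because every site in batchparms needs a directory of the same name under
$NEWSARTS/out.going, a small cross-check can catch a forgotten directory.
The following is a sketch under stated assumptions: missing_outgoing is a
name invented here, the site "leaf" is hypothetical, and lines starting with
# are assumed to be comments.

```shell
#!/bin/sh
# missing_outgoing BATCHPARMS OUTGOING_DIR: print each site named in a
# batchparms-style file that has no directory of the same name under
# OUTGOING_DIR. Comment and blank lines are skipped. Illustrative helper.
missing_outgoing() {
    awk '!/^#/ && NF > 0 { print $1 }' "$1" |
    while read -r site; do
        [ -d "$2/$site" ] || printf '%s\n' "$site"
    done
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Scratch copy of the layout: only "server" has an out.going directory.
mkdir -p "$tmp/out.going/server"
cat > "$tmp/batchparms" <<'EOF'
server u 100000 100 batcher | gzip -9 | viauux -d gunzip
leaf   u 100000 100 batcher | gzip -9 | viauux -d gunzip
EOF

missing_outgoing "$tmp/batchparms" "$tmp/out.going"
```

On a live system the two arguments would be $NEWSCTL/batchparms and
$NEWSARTS/out.going.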
|
||
|
      + controlperm: This file controls how the news system responds to
        control messages. Each line consists of four or five fields separated
        by white space. Control messages are discussed in Section 2.4.
        +------------------------------------------------------------+
        |comp,sci tale@uunet.uu.net nrc pv news.announce.newsgroups  |
        +------------------------------------------------------------+

        The first field is a newsgroup pattern to which the line applies.

        The second field is either the keyword "any" or an e-mail address.
        The latter specifies that the line applies to control messages from
        only that author.

        The third field is a set of opcode letters indicating which control
        operations the line applies to, for messages coming from the e-mail
        address mentioned in the second field. n stands for newgroup
        (creating a newsgroup), r stands for rmgroup (deleting a newsgroup)
        and c stands for checkgroups.

        The fourth field is a set of flag letters indicating how to respond
        to a control message that meets all the applicability tests:
        +------------------------------------------------------------+
        |y  Do it.                                                   |
        |n  Don't do it.                                             |
        |v  Report it and include the entire control message         |
        |   in the report.                                           |
        |q  Don't report it.                                         |
        |p  Do it only if the control message carries a valid PGP    |
        |   signature.                                               |
        +------------------------------------------------------------+
        Exactly one of y, n or p must be present.

        The fifth field, which is optional, is used if the fourth field
        contains a p. It must contain the PGP key ID of the public key to be
        used for signature verification.
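
The two rules about the fourth and fifth fields are easy to check
mechanically. The sketch below is an illustration only: check_controlperm is
a name invented here, the sample lines other than the one shown above are
made up, and lines starting with # are assumed to be comments.

```shell
#!/bin/sh
# check_controlperm FILE: report controlperm-style lines whose fourth field
# does not contain exactly one of y, n, p, or that use p without a fifth
# (PGP key ID) field. Succeeds if no problem is found. Illustrative helper.
check_controlperm() {
    awk '
        /^#/ || NF == 0 { next }
        NF < 4 { printf "line %d: fewer than four fields\n", NR; bad = 1; next }
        {
            flags = $4
            count = gsub(/[ynp]/, "&", flags)   # count the y/n/p letters
            if (count != 1) {
                printf "line %d: need exactly one of y, n, p\n", NR; bad = 1
            } else if ($4 ~ /p/ && NF < 5) {
                printf "line %d: p flag without a PGP key ID\n", NR; bad = 1
            }
        }
        END { exit bad }' "$1"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

cat > "$tmp/controlperm.good" <<'EOF'
comp,sci tale@uunet.uu.net nrc pv news.announce.newsgroups
all any nrc nq
EOF

cat > "$tmp/controlperm.bad" <<'EOF'
comp any nrc yn
sci tale@uunet.uu.net nrc pv
EOF

check_controlperm "$tmp/controlperm.good" && echo "good file: ok"
check_controlperm "$tmp/controlperm.bad"  || echo "bad file: problems found"
```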
|
||
|
      + mailpaths: This file describes how to reach the moderators of various
        hierarchies of newsgroups by mail. Each line consists of two fields:
        a newsgroup pattern and an e-mail address. The first line whose
        group pattern matches the newsgroup is used. As an example:
        +------------------------------------------------------------+
        |comp.lang.java.3d  somebody@mydomain.com                    |
        |all                %s@moderators.uu.net                     |
        +------------------------------------------------------------+
        In the second line, the %s gets replaced with the group name, and
        all dots appearing in the newsgroup name are replaced with dashes.
|
||
|
      + Miscellaneous files: The other files to be modified are:

          o mailname: Contains the Internet domain name of the news system.
            Consider getting one if you don't have it.

          o organization: Contains the default value for the Organization:
            header for postings originating locally.

          o whoami: Contains the name of the news system. This is the site
            name used in the Path: headers and hence should agree with the
            names your neighbours use in their sys files.
|
||
|
||
|
      + active file: This file has one line for each newsgroup (not just
        each hierarchy) to be found on your news system. You will have to
        get the most recent copy of the active file from
        ftp://ftp.isc.org/usenet/CONFIG/active and prune it to delete
        newsgroups that you have not subscribed to. Run the script addgroup
        for each newsgroup in this file; it will create the relevant
        directories in the $NEWSARTS area. The addgroup script takes two
        parameters: the name of the newsgroup being created and a flag. The
        flag can be any one of the following:
        +------------------------------------------------------------+
        |y         local postings are allowed                        |
        |n         no local postings, only remote ones               |
        |m         postings to this group must be approved           |
        |          by the moderator                                  |
        |j         articles in this group are passed on, not kept    |
        |x         posting to this newsgroup is disallowed           |
        |=foo.bar  articles are locally filed in the foo.bar group   |
        +------------------------------------------------------------+
        An entry in this file looks like this:
        +------------------------------------------------------------+
        |comp.lang.java.3d 0000003716 01346 m                        |
        +------------------------------------------------------------+
        The first field is the name of the newsgroup. The second field is the
        highest article number that has been used in that newsgroup. The
        third field is the lowest article number in the group. The fourth
        field is a flag as explained above.
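
Running addgroup by hand for every group quickly becomes tedious. As a
sketch, the loop below turns a pruned copy of the list into the corresponding
addgroup commands, using the cnewsdo wrapper that appears later in this
chapter. It only prints the commands (a dry run); the function name and the
sample data are invented for this illustration.

```shell
#!/bin/sh
# addgroup_commands FILE: for each line of a pruned active-style file,
# print the addgroup command that would create the group with its flag.
# Printing instead of executing makes this a dry run.
addgroup_commands() {
    while read -r group high low flag; do
        case "$group" in ''|'#'*) continue ;; esac   # skip blanks, comments
        printf 'cnewsdo addgroup %s %s\n' "$group" "$flag"
    done < "$1"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Made-up sample standing in for your pruned list.
cat > "$tmp/active.pruned" <<'EOF'
comp.lang.java.3d 0000003716 01346 m
misc.test 0000000042 00001 y
EOF

addgroup_commands "$tmp/active.pruned"
```

Once the printed commands look right, the output can be piped to sh while
logged in as user news.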
|
||
|
      + newsgroups file: This contains a one-line description of each
        newsgroup to be found in the active file. You will have to get the
        most recent file from ftp://ftp.isc.org/usenet/CONFIG/newsgroups and
        prune it to remove unwanted information. As an example:
        +------------------------------------------------------------+
        |comp.lang.java.3d 3D Graphics APIs for the Java language    |
        +------------------------------------------------------------+
|
||
|
      + Aliases: These aliases are required for trouble reporting. Once the
        system is in place and the scripts are running, anomalies and
        problems are reported to addresses in the /etc/aliases file. These
        entries include email addresses for newsmaster, newscrisis, news,
        usenet and newsmap. They should ideally point to an email address
        that is read regularly. Arrange for the emails for newsmap to be
        discarded, to minimize the effect of sendsys bombing by practical
        jokers.
|
||
|
      + Cron jobs: Certain scripts, like newsrun (which picks up incoming
        batches) and the maintenance scripts, should run from the crontab of
        the news-database owner, which is news. Ideally, there should be
        cron entries for the following scripts; a more detailed account can
        be found in Section 9.4.

         1. newsrun: This script processes incoming batches of articles. Run
            it as frequently as you want them to get digested.

         2. sendbatches: This script transmits batches to the NDNs. Set the
            frequency according to your requirements.

         3. newsdaily: This should ideally be run once a day, since it
            reports errors and anomalies in the news system.

         4. newswatch: This looks for errors and anomalies at a more detailed
            level, and hence should be run at least once every hour.

         5. doexpire: This script expires old articles as determined by the
            explist file. Run this once a day.
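
A crontab for user news covering these five scripts might look like the one
generated below. This is a sketch under stated assumptions: the
subdirectories of $NEWSBIN in which each script lives, and all the times
chosen, are guesses that you should check against your own installation.

```shell
#!/bin/sh
# Write a sample crontab for user news to a scratch file for review.
# The script locations under $NEWSBIN and all the schedules below are
# assumptions; check where your installation put each script.
NEWSBIN=${NEWSBIN:-/usr/lib/newsbin}
crontab_file=$(mktemp)
trap 'rm -f "$crontab_file"' EXIT

cat > "$crontab_file" <<EOF
# min        hour dom mon dow  command
0,15,30,45   *    *   *   *    $NEWSBIN/input/newsrun
5,35         *    *   *   *    $NEWSBIN/batch/sendbatches
10           *    *   *   *    $NEWSBIN/maint/newswatch
30           6    *   *   *    $NEWSBIN/maint/newsdaily
45           1    *   *   *    $NEWSBIN/expire/doexpire
EOF

cat "$crontab_file"
```

After reviewing the file, root can install it with crontab -u news followed
by the file name, on systems whose crontab command supports the -u option.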
|
||
|
||
|
      + newslog: Make an entry in the system's syslog.conf file so that the
        messages logged by nntpd are written to the file newslog, which
        should be located in $NEWSCTL. The entry will look like this:
        +------------------------------------------------------------+
        |news.debug -/var/lib/news/newslog                           |
        +------------------------------------------------------------+
|
||
|
      + Newsboot: Have this run (as news, the news-database owner) when the
        system boots, to clear out debris left around by crashes.
|
||
|
      + Add a Usenet mailer in sendmail: The mail2news program provided as
        part of the source code is a handy tool for sending an e-mail to a
        newsgroup, where it gets digested as an article. You will have to add
        the following ruleset and mailer definition to your sendmail.cf file:

          o Under SParse1, add the following (in sendmail.cf, the left-hand
            and right-hand sides of a rule must be separated by tabs):
            +------------------------------------------------------------+
            |R$+ . USENET < @ $=w . > $#usenet $: $1                     |
            +------------------------------------------------------------+

          o Under mailer definitions, define the mailer Usenet as:
            +----------------------------------------------------------------------+
            |MUsenet P=/usr/lib/newsbin/mail2news/m2nmailer, F=lsDFMmn,            |
            |  S=10, R=0, M=2000000, T=X-Usenet/X-Usenet/X-Unix, A=m2nmailer $u    |
            +----------------------------------------------------------------------+

        In order to send a mail to a newsgroup, you will now have to suffix
        the newsgroup name with usenet, i.e. your To: header will look like
        this:
        +------------------------------------------------------------+
        |To: misc.test.usenet@yourdomain.                            |
        +------------------------------------------------------------+
        The usenet mailer will intercept this mail and post it to the
        respective newsgroup, in this case misc.test.
|
||
|
||
|
||
This, more or less, completes the configuration part.
|
||
|
||
|
||
-----------------------------------------------------------------------------
|
||
4.4. Testing the system
|
||
|
||
To locally test the system, follow the steps given below:
|
||
|
  * Post an article: Create a local newsgroup
    +------------------------------------------------------------+
    |cnewsdo addgroup mysite.test y                              |
    +------------------------------------------------------------+
    and post an article to it using postnews.

  * Has it arrived in $NEWSARTS/in.coming?: The article should show up in the
    directory mentioned. Note how the article file is named.

  * When newsrun runs: When newsrun runs from cron, the article disappears
    from the in.coming directory and appears in $NEWSARTS/mysite/test. Look
    at how the newsgroup, active, log and history files (not the errorlog)
    and the .overview file in $NEWSARTS/mysite/test reflect the digestion of
    the article into the news system.

  * Reading the article: Try to read the article through readnews or any news
    client. If you are able to, then you have set almost everything up
    correctly.
|
||
|
||
|
||
-----------------------------------------------------------------------------
|
||
4.5. pgpverify and controlperms
|
||
|
||
As mentioned in Section 2.4, it becomes necessary to authenticate control
messages to protect yourself from attacks by pranksters. For this, you will
have to configure the $NEWSCTL/controlperm file to declare whose control
messages you are willing to honour, and for which newsgroups, along with
their public key IDs. The controlperm manpage gives details of the format.
|
||
|
||
This works only in association with pgpverify, which verifies Usenet control
messages that have been signed using the signcontrol process. The script can
be found at ftp://ftp.isc.org/pub/pgpcontrol/pgpverify. pgpverify internally
uses the PGP binary, which will have to be made available in the default
executables directory. If you wish to send control messages for your local
news system, you will have to digitally sign them using the signcontrol
program mentioned above, which is available at
ftp://ftp.isc.org/pub/pgpcontrol/signcontrol. You will also have to configure
the signcontrol program accordingly.
|
||
-----------------------------------------------------------------------------
|
||
|
||
4.6. Feeding off an upstream neighbour
|
||
|
||
For external feeds, commercial customers will have to buy them from a regular
|
||
News Provider like dejanews.com or newsfeeds.com. You will have to specify to
|
||
them what hierarchies you want and decide on the mode of transmission, i.e.
|
||
UUCP or NNTP, based on your requirements. Once that is done, you will have to
|
||
ask them to initiate feeds, and check $NEWSARTS/in.coming directory to see if
|
||
feeds are coming in.
|
||
|
||
If your organisation belongs to the academic community or is otherwise lucky
|
||
enough to have an NDN server somewhere which is willing to provide you a free
|
||
newsfeed, then the payment issue goes out of the picture, but the rest of the
|
||
technical requirements remain the same.
|
||
|
||
One problem with incoming NNTP feeds is that (relatively) efficient NNTP
inflows are far easier to set up if you have a server with a permanent
Internet connection and a fixed IP address. If you are a small office with a
dialup Internet connection, this may not be possible. In that case, the only
way to get incoming newsfeeds by NNTP may be to use a highly inefficient pull
feed.
|
||
-----------------------------------------------------------------------------
|
||
|
||
4.7. Configuring outgoing feeds
|
||
|
||
If you are a leaf node, you will only have to send feeds back to your news
provider, so that your postings in public newsgroups propagate to the outside
world. To enable this, you need one line each in the sys and batchparms
files, and one directory in $NEWSARTS/out.going. If you are willing to
transmit articles to your neighbouring sites, you will have to configure sys
and batchparms with more entries. The number of directories in
$NEWSARTS/out.going will increase, too. Refer to the first two sections of
the chapter titled "Components of a running system" for a better
understanding of outgoing feeds. Again, you will have to determine how you
wish to transmit the feed: UUCP or NNTP.
|
||
-----------------------------------------------------------------------------
|
||
|
||
4.7.1. By UUCP
|
||
|
||
For outgoing feeds by UUCP, we recommend that you start with Taylor UUCP. In
|
||
fact, this is the UUCP version which forms part of the GNU Project and is the
|
||
default UUCP on Linux systems.
|
||
|
||
A full treatment of UUCP configuration is beyond the scope of this document.
|
||
However, the basic steps will be as follows. First, you will have to define a
|
||
"system" in your Usenet server for the NDN (next door neighbour) host. This
|
||
definition will include various parameters, including the manner in which
|
||
your server will call the remote server, the protocol it will use, etc. Then
|
||
an identical process will have to be followed on the NDN server's UUCP
|
||
configuration, for your server, so that that server can recognize your Usenet
|
||
server.
|
||
|
||
Finally, you will need to set up appropriate cron jobs for the user uucp to
|
||
run uucico periodically. Taylor UUCP comes with a script called uusched which
|
||
may be modified to your requirements; this script calls uucico. One uucico
|
||
connection will both upload and download news batches. Smaller sites can run
|
||
uusched even once or twice a day.
|
||
|
||
Later versions of this document will include the uusched scripts that we use
|
||
in Starcom. We use UUCP over TCP/IP, and we run the uucico connection through
|
||
an SSH tunnel, to prevent transmission of UUCP passwords in plain text over
|
||
the Internet, and our SSH tunnel is established using public-key
|
||
cryptography, without passwords being used anywhere.
|
||
-----------------------------------------------------------------------------
|
||
|
||
4.7.2. By NNTP
|
||
|
||
For NNTP feeds, you will have to decide whether your server will be the
|
||
connection initiator or connection recipient. If you are the connection
|
||
initiator, you can send outgoing NNTP feeds more easily. If you are the
|
||
connection recipient, then outgoing feeds will have to be pulled out of your
|
||
server using the NNTP NEWNEWS command, which will place heavy loads on your
|
||
server. This is not recommended.
|
||
|
||
Connecting to your NDN server for pushing out outgoing feeds will require the
|
||
use of the nntpsend.sh script, which is part of the NNTPd source tree. This
|
||
script will perform some housekeeping, and internally call the nntpxmit
|
||
binary to actually send the queued set of articles out. You may have to
|
||
provide authentication information like usernames and passwords to nntpxmit
|
||
to allow it to connect to your NDN server, in case that server insists on
|
||
checking the identity of incoming connections. (You can't be too careful in
|
||
today's world.) nntpsend.sh will clean up after an nntpxmit connection
|
||
finishes, and will requeue any unsent articles for the next session. Thus,
|
||
even if there is a network problem, typically nothing is lost and all pending
|
||
articles are transmitted next time.
|
||
|
||
Thus, pushing feeds out via NNTP may mean setting up nntpsend.sh properly,
and then invoking it periodically from cron. If your Usenet server connects
to the Internet only intermittently, then the process which sets up the
Internet connection should be extended or modified to fire nntpsend.sh
whenever the Internet link is established. For instance, if you are using the
Linux pppd, you can add statements to the /etc/ppp/ip-up script to change
user to news and run nntpsend.sh.
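
The statements added to /etc/ppp/ip-up could look roughly like the sketch
below. The location of nntpsend.sh and both function names are assumptions
made for this illustration, and nntpsend.sh may need arguments or
configuration that are not shown here.

```shell
#!/bin/sh
# Sketch of a hook for /etc/ppp/ip-up: push queued news once the link is up.
# NNTPSEND is an assumed location; adjust it to where nntpsend.sh is installed.
NNTPSEND=${NNTPSEND:-/usr/lib/newsbin/nntpsend.sh}

# feed_command UID: print the command that runs nntpsend.sh as user news.
# Root has to switch user with su; any other caller is assumed to be news.
feed_command() {
    if [ "$1" -eq 0 ]; then
        printf 'su news -c %s\n' "$NNTPSEND"
    else
        printf '%s\n' "$NNTPSEND"
    fi
}

# push_feed: start the transfer in the background so that pppd is not held up.
push_feed() {
    [ -x "$NNTPSEND" ] || { echo "cannot run $NNTPSEND" >&2; return 1; }
    sh -c "$(feed_command "$(id -u)")" &
}
```

With these definitions in place, a single call to push_feed at the end of
ip-up starts the transfer. If the news account has no usable login shell, the
su invocation will need adjusting.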
|
||
-----------------------------------------------------------------------------
|
||
|
||
5. Setting up INN
|
||
|
||
5.1. Getting the source
|
||
|
||
INN has been maintained and archived by the ISC (Internet Software
Consortium, www.isc.org) since 1996, and the INN homepage is at
http://www.isc.org/products/INN/. The latest release of INN at the time of
this writing is INN v2.3.3, released 7 May 2002. The full sources can be
downloaded from ftp://ftp.isc.org/isc/inn/inn-2.3.3.tar.gz.
|
||
-----------------------------------------------------------------------------
|
||
|
||
5.2. Compiling and installing
|
||
|
||
TO BE EXTENDED LATER.
|
||
-----------------------------------------------------------------------------
|
||
|
||
5.3. Configuring the system
|
||
|
||
TO BE ADDED LATER.
|
||
-----------------------------------------------------------------------------
|
||
|
||
5.4. Setting up pgpverify
|
||
|
||
TO BE ADDED LATER.
|
||
-----------------------------------------------------------------------------
|
||
|
||
5.5. Feeding off an upstream neighbour
|
||
|
||
TO BE ADDED LATER.
|
||
-----------------------------------------------------------------------------
|
||
|
||
5.6. Setting up outgoing feeds
|
||
|
||
TO BE ADDED LATER.
|
||
-----------------------------------------------------------------------------
|
||
|
||
5.7. Efficiency issues and advantages
|
||
|
||
TO BE ADDED LATER.
|
||
-----------------------------------------------------------------------------
|
||
|
||
6. Connecting email with Usenet news
|
||
|
||
Usenet news and mailing lists constantly remind us of each other. And the
|
||
parallels are so strong that many mailing lists are gatewayed two-way with
|
||
corresponding Usenet newsgroups, in the bit hierarchy which maps onto the old
|
||
BITNET, and elsewhere.
|
||
|
||
There are probably ten different situations where a mailing list is better,
|
||
and ten others where the newsgroup approach works better. The point to
|
||
recognise is that the system administrator needs a choice of gatewaying one
|
||
with the other, whenever tradeoffs justify it. Instead of getting into the
|
||
tradeoffs themselves, this chapter will then focus on the mechanisms of
|
||
gatewaying the two worlds.
|
||
|
||
One clear and recurring use we find for this gatewaying is for mailing lists
which are of general use to many employees in a corporate network. For
instance, in a stockbroking company, many employees may like to subscribe to
a business news mailing list. If each employee had to subscribe to the
mailing list independently, it would waste mail spool area and perhaps
bandwidth. In such situations, we receive the mailing list into an internal
newsgroup, so that individual mailboxes are not overloaded. Everyone can then
read the newsgroup, and messages are also archived until they expire.
|
||
-----------------------------------------------------------------------------
|
||
|
||
6.1. Feeding Usenet news to email
|
||
|
||
In CNews, this is trivially done by adding one line to the sys file that
defines a new outgoing feed. The line lists all the relevant groups and
distributions, and specifies the command line to be executed to send each
outgoing message to that ``feed.'' This command, in our case, should be a
mail-sending program, e.g. /bin/mail user@somewhere.com. This is often
adequate to get the job done. We are sure almost every Usenet news software
system will have an equally easy way of piping the feed of a newsgroup to an
email address.
|
||
-----------------------------------------------------------------------------
|
||
|
||
6.2. Feeding email to news: the mail2news gateway
|
||
|
||
We have integrated into our Usenet software sources a set of scripts which
we have been using internally for at least five years. This set of scripts is
called mail2news. It contains one shell script, also called mail2news, which
takes an email message from stdin, processes it, and feeds the processed
version to inews, the stdin-based news article injection utility of C-News.
The inews utility accepts a new article on its stdin and queues it for
digestion by newsrun whenever it runs next.
|
||
|
||
To use mail2news, we assume you are using Sendmail to process incoming email.
|
||
Our instructions can easily be modified to adapt to any Mail Transport Agent
|
||
(MTA) of your choice. You will have to configure Sendmail or any other MTA to
|
||
redirect incoming mails for the gateway to a program called m2nmailer, a
|
||
Perlscript which accepts the incoming message in its standard input and a
|
||
list of newsgroup names, space separated, on its command line. Sendmail can
|
||
be easily configured to trigger m2nmailer this way by defining a new mailer
|
||
in sendmail.cf, and directing all incoming emails meant for the Usenet news
|
||
system to this mailer. Once you set up the appropriate rulesets for Sendmail,
|
||
it automatically triggers m2nmailer each time an incoming email comes for the
|
||
mail2news gateway.
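
To give a feel for the kind of processing such a gateway has to do, here is a
much-simplified stand-in. It is not the real mail2news or m2nmailer: the
function name, the choice of headers to drop and the sample message are all
assumptions made for this illustration.

```shell
#!/bin/sh
# mail_to_article GROUP...: read an email message on stdin and write a news
# article on stdout. A much-simplified stand-in for illustration only: it
# adds a Newsgroups: header and drops the mbox "From " line and a few
# transport headers, together with their continuation lines.
mail_to_article() {
    groups=$(printf '%s,' "$@")
    groups=${groups%,}                      # comma-separated group list
    awk -v groups="$groups" '
        BEGIN { print "Newsgroups: " groups; inhdr = 1 }
        inhdr && /^$/ { inhdr = 0 }         # blank line ends the headers
        inhdr {
            if (/^[ \t]/) { if (skip) next }
            else skip = /^(From |Received:|Return-Path:|Delivered-To:)/
            if (skip) next
        }
        { print }'
}

# Made-up sample message.
mail='From alice@example.com Mon Jul 29 10:00:00 2002
Received: from mail.example.com
  by news.example.com; Mon, 29 Jul 2002 10:00:00 +0530
From: alice@example.com
Subject: test posting

Hello, world.'

article=$(printf '%s\n' "$mail" | mail_to_article misc.test)
printf '%s\n' "$article"
```

In a real gateway the output would be piped to inews rather than printed, and
a good deal more header clean-up would be needed.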
|
||
|
||
The precise configuration changes to Sendmail have already been specified in
|
||
the chapter titled ``Setting up C-News + NNTPd.''
|
||
-----------------------------------------------------------------------------
|
||
|
||
6.3. Using GNU Mailman as an email-NNTP gateway
|
||
|
||
TO BE ADDED LATER
|
||
-----------------------------------------------------------------------------
|
||
|
||
6.3.1. GNU's all-singing all-dancing MLM
|
||
|
||
TO BE ADDED LATER
|
||
-----------------------------------------------------------------------------
|
||
|
||
6.3.2. Features of GNU Mailman
|
||
|
||
TO BE ADDED LATER
|
||
-----------------------------------------------------------------------------
|
||
|
||
6.3.3. Gateway features connecting NNTP and email
|
||
|
||
TO BE ADDED LATER
|
||
-----------------------------------------------------------------------------
|
||
|
||
7. Security issues
|
||
|
||
It almost seems strange that we are discussing security issues in the context
|
||
of Usenet news servers. Usenet news has been one of the most open and
|
||
free-for-all network services traditionally. However, with the exponential
|
||
growth of the Internet, all services are becoming aware of potential threats.
|
||
The community of Internet intruders too has acquired new profiles: a lot of
|
||
Internet intrusion attempts are program-driven, and exploit a set of ``well
|
||
known'' vulnerabilities, i.e. vulnerabilities which have been identified by
|
||
the computer security and intrusion community and published in their reports
|
||
and advisories. Thus, the question of ``Why will someone attack my harmless
|
||
Usenet server?'' is no longer valid. It will be attacked if it can be
|
||
attacked, merely because its IP address falls in a range of addresses being
|
||
targeted, perhaps.
|
||
|
||
Security issues for Usenet news servers fall into two categories. First come
|
||
vulnerabilities which will allow an attacker to bring down your server or run
|
||
code of his choice on it. Second come vulnerabilities which can distort or
|
||
corrupt your Usenet article hierarchy, either by junk postings, unsolicited
|
||
commercial messages, or forged control messages. The second category of
|
||
threats is specific to Usenet news and needs Usenet-specific protection
|
||
mechanisms, some of which require tapping into defence mechanisms designed by
|
||
the Usenet administrator community.
|
||
-----------------------------------------------------------------------------
|
||
|
||
7.1. Intrusion threats
|
||
|
||
Here we discuss the vulnerabilities which will allow an intruder to ``gain
control'' of your Usenet server, or ``bring it down,'' either of which may be
irritating, embarrassing, or downright disastrous for your business or
occupation.
|
||
-----------------------------------------------------------------------------
|
||
|
||
7.1.1. Generic server vulnerabilities
|
||
|
||
Foremost among these vulnerabilities are those which render any server
|
||
vulnerable to intrusion attempts. Most of these vulnerabilities are unrelated
|
||
to Usenet news itself. For instance, if you have the Telnet service active on
|
||
a server exposed to the Internet, then it is likely that systematic attempts
|
||
by intruders to acquire usernames and passwords will bear fruit, using
|
||
methods we will best leave to specialised texts on the subject. Once this is
|
||
done, the intruder will merely ``walk into'' your server by Telnetting into
|
||
it.
|
||
|
||
We will not discuss this class of vulnerabilities here any further; they
|
||
belong in documents dedicated to general security issues. For further
|
||
reading, check the ``Security HOWTO'', the ``Security Quickstart HOWTO'', the
|
||
``User Authentication HOWTO'', the ``VPN HOWTO'', and the ``VPN Masquerade
|
||
HOWTO'' ... and that's just from the Linux HOWTO collection. As one can see,
|
||
there is, if anything, a surfeit of material on this and related subjects.
|
||
|
||
There are also vulnerabilities which allow an intruder to mount
denial-of-service (DoS) attacks, which make your service inaccessible to
legitimate users even though they do not let the intruder in. The most
publicised of these attacks were the SYN flood and the Ping of Death, both
quite old and well understood by now. A properly configured Linux server
running a recent version of the kernel should be immune to both these attack
methods. But network protocols being what they are, new DoS methods are always
being thought up, which can temporarily overload or slow down a server. Once
again, the texts discussing generic security issues are the best place to
study these vulnerabilities.
|
||
-----------------------------------------------------------------------------
|
||
|
||
7.1.2. Vulnerabilities in Usenet software
|
||
|
||
Then come server vulnerabilities, if any, which are caused specifically by
Usenet news software. For instance, if it were possible for an intruder to
send some string of bytes to your NNTP server and cause it to execute a
command of the intruder's choice, that vulnerability would fall in this
category.
|
||
|
||
Any server which accepts a text string as input from a client is open to the
|
||
buffer overrun class of attacks, if the gets() C library function has been
|
||
used in its code instead of the fgets() with a buffer size limit. This was a
|
||
vulnerability made famous by the 1988 Morris Internet Worm, discussions on
|
||
which can be found elsewhere. (Go Google for it if you're keen.) As far as we
|
||
know, the INN NNTP server and the nntpd which forms part of the NNTP
|
||
Reference Implementation both have no known buffer overrun vulnerabilities.
|
||
This class of vulnerabilities is less significant in the case of NNTPd or INN
|
||
because these daemons do not run as root. In fact, they would begin to cause
|
||
malfunctioning of the underlying Usenet software if they ran as root.
|
||
Therefore, even if an intrepid intruder could find some way of gaining
|
||
control of these daemons, she would only be able to get into the server as
|
||
user news, which means that she can play havoc with the Usenet installation,
|
||
but no further. A daemon which runs as root, if compromised, can allow an
|
||
intruder to take control of the operating system itself.
|
||
|
||
UUCP is generally believed to be insecure. We believe a careful configuration
of Taylor UUCP plugs a lot of these vulnerabilities. One vulnerability with
UUCP over TCP is that the username and password travel in plaintext in the
TCP data stream, much as with Telnet or FTP. We therefore do not advise using
UUCP over TCP in this manner if security is a concern at all. We recommend
running UUCP through an SSH tunnel, with the SSH setup working only with a
pre-installed public key. This way, no usernames and passwords are needed to
set up the SSH tunnel, and passwords cannot be leaked even intentionally. The
UUCP username and password then pass through this encrypted tunnel and are
therefore superfluous for security; the SSH tunnel provides much stronger
connection authentication than the UUCP username and password. And since we
set up our SSH tunnels to demand key-based authentication only, they reject
any attempt to connect using usernames and passwords when the tunnel is being
set up.
|
||
|
||
A third possible vulnerability is related to the back-end software which
|
||
processes incoming Usenet articles. It is conceivable that an NNTP server
|
||
will receive an incoming POST command, receive an article, and queue it for
|
||
processing on the local spool; the NNTP server often does not perform any
|
||
real-time processing on the incoming post. The post-processing software which
|
||
periodically processes the incoming spool (the in.coming directory in C-News)
|
||
will read this article and somehow be forced to run a command of the
|
||
intruder's choice, either by buffer overrun vulnerabilities or any other
|
||
means.
|
||
|
||
While this possibility exists, it appears that neither the C-News newsrun and
family nor INN is vulnerable to this class of attempts. We base our comment on
the solid evidence that both these systems have been around in an
intrusion-prone world of public Usenet servers for more than a decade. INN,
the newer of the two, completed one decade of life on 20 August 2002. And both
these software systems have had their source freely available to all,
including intruders. We can be fairly certain that if vulnerabilities of this
class have not been seen, it is not for want of intrusion attempts.
|
||
-----------------------------------------------------------------------------
|
||
|
||
7.2. Vulnerabilities unique to the Usenet service
|
||
|
||
There are certain security precautions that a Usenet server administrator has
to take to ensure that her servers are not swamped by irritating junk or
configured out of shape by spurious control messages. These vulnerabilities do
not allow an intruder to run her software on your servers, but they do allow
her to mess up your server, causing you to lose a precious weekend (or week)
straightening out the mess.
|
||
-----------------------------------------------------------------------------
|
||
|
||
7.2.1. Unsolicited commercial messages
|
||
|
||
Unsolicited commercial messages are called SPAM. There is a war against SPAM
|
||
being fought in the Internet community. The biggest battlefront is in the
|
||
world of email. Second to that is Usenet newsgroups.
|
||
|
||
There are many tools that Usenet administrators use in their battle against
SPAM. The most important of these is the NoCeM suite. See http://www.cm.org/
for details of NoCeM, and the newsgroup alt.nocem.misc for the SPAM cancel
messages which NoCeM reads to identify which articles to discard. Your server
will need a feed of alt.nocem.misc to use the NoCeM facility. These special
messages are issued by NoCeM volunteers, whose job is to identify SPAM
articles, list their message-IDs, and then issue deletion instructions,
digitally signed with their private keys, which tell all Usenet servers to
delete the SPAM messages. Your server's NoCeM software will need public key
software (typically PGP) and a keyring with the public key of each NoCeM
volunteer you want to accept instructions from.
|
||
|
||
Other anti-spam tools for Usenet services are listed in the Anti-SPAM
|
||
Software Web page (http://www.exit109.com/~jeremy/news/antispam.html). The
|
||
Cleanfeed software will clean out articles identified as SPAM. There are many
|
||
others.
|
||
|
||
SPAM is such a nuisance and such a drain on organisational budgets (by wasting
bandwidth you pay for) that it is almost imperative today that every Usenet
server protects itself against it. We will integrate some selected anti-SPAM
measures into our integrated source distribution soon.
|
||
-----------------------------------------------------------------------------
|
||
|
||
7.2.2. Spurious control messages
|
||
|
||
Control messages, discussed in detail earlier in Section 2.4, instruct a
Usenet server to take certain actions, like deleting a message or creating a
newsgroup. If this facility is ``open to the public'', anyone with half a
brain can forge control messages to create twenty new newsgroups, and then
post thousands of articles into those groups. In the mid-nineties, we were hit
by a storm of over 2,000 (two thousand) newgroup control messages, which
rapidly taught us both the danger of unprotected control messages and the
protection against them.
|
||
|
||
The standard protection mechanism against this vulnerability is pgpverify,
|
||
which can be downloaded from multiple Websites and FTP mirror sites by
|
||
searching for pgpverify (the program) or pgpcontrol (the total software
|
||
package). We have integrated this into our source distribution, so that our
|
||
C-News works in a tightly coupled manner with pgpverify.
|
||
|
||
pgpverify works using public key cryptography, much like NoCeM. The official
maintainers of the respective Usenet group hierarchies sign their control
messages using their private keys. Your server will carry their public keys,
and pgpverify will check the signature on each control message to ensure that
it is from the official maintainer of the hierarchy. The news system then acts
upon legitimate control messages and discards the spurious ones.
|
||
|
||
In today's nuisance-ridden Usenet environment, no sane Usenet server
|
||
administrator receiving a feed of ``public'' hierarchies and control messages
|
||
will even dream of running her server without pgpverify protection.
|
||
-----------------------------------------------------------------------------
|
||
|
||
8. Access control in NNTPd
|
||
|
||
The original NNTPd had host-based authentication which allowed clients
|
||
connecting from a particular IP address to read only certain newsgroups. This
|
||
was very clearly inadequate for enterprise deployment on an Intranet, where
|
||
each desktop computer has a different IP address, often DHCP-assigned, and
|
||
the mapping between person and desktop is not static.
|
||
|
||
What was needed was user-based authentication, where a username and password
are used to authenticate the user. This too was provided as an extension to
NNTPd, but more was needed. The corporate IS manager needs to ensure that
certain Usenet discussion groups remain visible only to certain people. This
authorisation layer was not available in NNTPd: once authenticated, all users
could read all newsgroups.
|
||
|
||
We have extended the user-based authentication facility in NNTPd in some (we
|
||
hope!) useful ways, and we have added an entire authorisation layer which
|
||
lets the administrator specify which newsgroups each user can read. With this
|
||
infrastructure, we feel NNTPd is fit for enterprise deployment and can be
|
||
used to handle corporate document repositories, messages, and discussion
|
||
archives. Details are given below.
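
The idea of an authorisation layer on top of authentication can be sketched as
follows. This Python fragment is purely illustrative: the table, the user
names and the use of shell-style wildcard patterns are our assumptions, and it
does not describe the file formats or the code actually used by our NNTPd.

```python
from fnmatch import fnmatchcase

# Hypothetical table: user name -> newsgroup patterns that user may read.
USER_GROUPS = {
    "alice": ["starcom.*", "comp.lang.*"],
    "bob": ["starcom.test"],
}


def may_read(user, group, table=USER_GROUPS):
    """Return True if an already authenticated user may read the newsgroup."""
    patterns = table.get(user, [])  # unknown users can read nothing
    return any(fnmatchcase(group, pattern) for pattern in patterns)
```

The point of the sketch is the separation of concerns: authentication decides
who the user is, and a lookup like may_read() then decides what that user can
see.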
|
||
-----------------------------------------------------------------------------
|
||
|
||
8.1. Host-based access control
|
||
|
||
TO BE ADDED LATER
|
||
-----------------------------------------------------------------------------
|
||
|
||
8.2. User authentication and authorisation
|
||
|
||
8.2.1. The NNTPd password file
|
||
|
||
TO BE ADDED LATER
|
||
-----------------------------------------------------------------------------
|
||
|
||
8.2.2. Mapping users to newsgroups
|
||
|
||
TO BE ADDED LATER
|
||
-----------------------------------------------------------------------------
|
||
|
||
8.2.3. The X-Authenticated-Author article header
|
||
|
||
TO BE ADDED LATER
|
||
-----------------------------------------------------------------------------
|
||
|
||
8.2.4. Other article header additions
|
||
|
||
TO BE ADDED LATER
|
||
-----------------------------------------------------------------------------
|
||
|
||
9. Components of a running system
|
||
|
||
This chapter reviews the components of a running CNews+NNTPd server.
|
||
Analogous components will be found in an INN-based system too. We invite
|
||
additions from readers familiar with INN to add their pieces to this chapter.
|
||
-----------------------------------------------------------------------------
|
||
|
||
9.1. /var/lib/news: the CNews control area
|
||
|
||
This directory is more popularly known as $NEWSCTL. It contains
configuration, log and status files. No articles or binaries are kept here.
Let's see what some of the files are meant for. Control files are dealt with
in slightly greater detail in Section 4.3.
|
||
  * sys: One line per system/NDN, listing all the newsgroup hierarchies each
    system subscribes to. Each line is prefixed with the system name, and the
    one beginning with ME: indicates what we are going to receive. See the
    newssys manpage.

  * explist: Entries in this file specify when the articles of each newsgroup
    expire, and whether they have to be archived. The order in which the
    newsgroups are listed is important. See the expire manpage for the file
    format.

  * batchparms: Details of how to feed other sites/NDNs, like the size of
    batches and the mode of transmission (UUCP/NNTP), are specified here. See
    the newsbatch manpage.

  * controlperm: If you wish to authenticate a control message before any
    action is taken on it, you must enter authentication-related information
    here. The controlperm manpage lists all the fields in detail.

  * mailpaths: This file gives the e-mail address of the moderator of each
    moderated newsgroup, who is responsible for approving or rejecting
    articles posted to it. The sample mailpaths file in the tarball will give
    you an idea of how entries are made.

  * nntp_access/user_access: These files list the server names and usernames
    to which restrictions apply when accessing newsgroups. Again, the sample
    files in the tarball explain the format.

  * log, errlog: These are log files that keep growing with each batch that is
    received. The log file has one entry per article, telling you whether it
    has been accepted or rejected by your news server. To understand the
    format of this file, refer to Chapter 2.2 of the CNews guide. Errors, if
    any, encountered while digesting the articles are logged in errlog. These
    log files have to be rolled, as they hog a lot of disk space.

  * nntplog: This file logs the activity of nntpd, giving details of when a
    connection was established or broken and what commands were issued. This
    file needs to be configured in syslog, and syslogd should be running.

  * active: This file has one line for each newsgroup to be found on your
    news server. Among other things, it tells you how many articles are
    currently present in each newsgroup. It is updated when each batch is
    digested and when articles are expired. The active manpage furnishes more
    details about the other parameters.

  * history: This file contains one line per article, mapping its message-id
    to the newsgroup name and the associated article number in that
    newsgroup. It is updated each time a feed is digested and when doexpire
    is run. It plays a key role in loop detection and serves as an article
    database. Read the newsdb and doexpire manpages for the file format.

  * newsgroups: This file has a one-line description of each newsgroup,
    explaining what kind of posts go into it. Ideally, it should cover all
    the newsgroups found in the active file.

  * Miscellaneous files: Files like mailname, organisation and whoami contain
    information required for forming some of the headers of an article. The
    contents of mailname are used in the From: header and those of
    organisation in the Organization: header. whoami contains the name of the
    news system. Refer to chapter 2.1 of guide.ps for a detailed list of
    files in the $NEWSCTL area. Read RFC 1036 for a description of article
    headers.
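
As an illustration of how the active file can be read by a script, here is a
minimal Python sketch. It assumes the usual four-field layout of an active
file line (newsgroup name, highest article number, lowest article number, and
a flag); the function and class names are ours, and the sample line in the
usage note is made up. Check the active manpage of your installation before
relying on it.

```python
from typing import NamedTuple


class ActiveEntry(NamedTuple):
    group: str  # newsgroup name
    high: int   # highest article number assigned so far
    low: int    # lowest article number still present
    flag: str   # posting flag, e.g. "y", "n" or "m"


def parse_active_line(line):
    """Split one line of the active file into its four fields."""
    group, high, low, flag = line.split()
    return ActiveEntry(group, int(high), int(low), flag)


def estimated_count(entry):
    """Upper bound on the number of articles currently in the newsgroup."""
    # Articles between low and high may have been cancelled or expired,
    # so this is an estimate and not an exact count.
    return max(0, entry.high - entry.low + 1)
```

For example, the line "comp.music.compose 0000000242 0000000200 y" parses to a
high mark of 242 and a low mark of 200, that is, at most 43 articles.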
|
||
|
||
|
||
-----------------------------------------------------------------------------
|
||
9.2. /var/spool/news: the article repository
|
||
|
||
This is also known as the $NEWSARTS or $NEWSSPOOL directory. This is where
the articles reside on your disk. No binaries or control files belong here.
Allocate enough space to this directory, as the number of articles keeps
increasing with each batch that is digested. An explanation of the following
sub-directories will give you an overview of this directory:
|
||
  * in.coming: Feeds, batches and articles from NDNs reside in this directory
    from their arrival until they are processed. After processing, they
    appear in $NEWSARTS or in its bad sub-directory if there were errors.

  * out.going: This directory contains the batches/feeds to be sent to your
    NDNs, i.e. feeds to be pushed to your neighbouring sites reside here
    before they are transmitted. It contains one sub-directory per NDN
    mentioned in the sys file. These sub-directories contain files called
    togo, which hold information, like the message-id or the article number,
    about each article queued for transmission.

  * newsgroup directories: For each newsgroup hierarchy that the news server
    has subscribed to, a directory is created under $NEWSARTS. Further
    sub-directories are created under the parent to hold articles of specific
    newsgroups. For instance, for a newsgroup like comp.music.compose, the
    parent directory comp will appear in $NEWSARTS and a sub-directory called
    music will be created under comp. The music sub-directory will contain a
    further sub-directory called compose, and all articles of
    comp.music.compose will reside there. In effect, article 242 of newsgroup
    comp.music.compose maps to the file $NEWSARTS/comp/music/compose/242.

  * control: The control directory houses only the control messages that have
    been received by this site. A control message is identified by one of the
    following appearing in the subject line of the article: newgroup,
    rmgroup, checkgroups or cancel. More information can be found in Section
    2.4.

  * junk: The junk directory contains all articles that the news server has
    received and has decided, after processing, do not belong to any of the
    hierarchies it has subscribed to. The news server passes all articles in
    this directory on to NDNs that have subscribed to the junk hierarchy.
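
The mapping from a newsgroup name and article number to a file in the spool,
described above for comp.music.compose, can be written down in a couple of
lines. The Python function below is a sketch; its name is ours, and
/var/spool/news is simply the $NEWSARTS location used in this chapter.

```python
import posixpath


def article_path(newsarts, group, number):
    """Map a newsgroup name and an article number to the article's file."""
    # Each dot-separated component of the newsgroup name is one directory
    # level, and the article number is the file name.
    return posixpath.join(newsarts, *group.split("."), str(number))
```

With these assumptions, article_path("/var/spool/news", "comp.music.compose",
242) gives /var/spool/news/comp/music/compose/242.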
|
||
|
||
|
||
-----------------------------------------------------------------------------
|
||
9.3. /usr/lib/newsbin: the executables
|
||
|
||
TO BE ADDED LATER
|
||
-----------------------------------------------------------------------------
|
||
|
||
9.4. crontab and cron jobs
|
||
|
||
The heart of the Usenet news server is the various scripts that run at
|
||
regular intervals processing articles, digesting/rejecting them and
|
||
transmitting them to NDNs. I shall try to enumerate the ones that are
|
||
important enough to be cronned. :)
|
||
  * newsrun: The key script. It picks up the batches in the in.coming
    directory, uncompresses them if necessary, and feeds them to relaynews,
    which then processes each article, digesting and batching it and logging
    any errors. This script needs to run through cron as frequently as you
    want the feeds to be digested. Every half hour should suffice for a
    non-critical requirement.

  * sendbatches: This script transmits what has been queued in the togo files
    of the out.going directory to your NDNs. It reads the batchparms file to
    know exactly how and to whom the batches need to be transmitted. The
    frequency, again, can be set according to your requirements. Once an hour
    should be sufficient.

  * newsdaily: This script does maintenance chores like rolling logs and
    saving them, reporting errors and anomalies, and doing cleanup jobs. It
    should typically run once a day.

  * newswatch: This looks for news problems at a more detailed level than
    newsdaily, such as persistent lock files, unattended batches, space
    shortages, and the like. It should typically run once every hour. For
    more on this and the above, read the newsmaint manpage.

  * doexpire: This script expires old articles as determined by the control
    file explist, and updates the active file. It is necessary if you do not
    want unwanted articles hogging your disk space. Run it once a day. See
    the expire manpage.

  * newsrunning off/on: This script shuts down or starts up the news server
    for you. You could add it to your cron jobs if you think the news server
    takes up too much CPU time during peak hours and you wish to keep a check
    on it.
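
The schedule suggested above can be turned into crontab lines for user news.
The Python sketch below only generates such lines. The sub-directory names
under /usr/lib/newsbin and the exact minutes and hours are our assumptions,
chosen for illustration; check where these scripts live on your installation
and pick times that suit your site.

```python
# Location of the executables; the sub-directories used below are assumed.
NEWSBIN = "/usr/lib/newsbin"

# (minute, hour, script) following the frequencies suggested in the text.
JOBS = [
    ("10,40", "*", "input/newsrun"),      # every half hour
    ("20", "*", "batch/sendbatches"),     # once an hour
    ("50", "*", "maint/newswatch"),       # once an hour
    ("30", "3", "maint/newsdaily"),       # once a day
    ("0", "4", "expire/doexpire"),        # once a day
]


def crontab_lines(jobs=JOBS, newsbin=NEWSBIN):
    """Return one crontab line (five time fields and a command) per job."""
    return ["%s %s * * * %s/%s" % (minute, hour, newsbin, script)
            for minute, hour, script in jobs]
```

Printing the result of crontab_lines() gives text that can be reviewed and
then installed with crontab while logged in as user news.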
|
||
|
||
|
||
-----------------------------------------------------------------------------
|
||
9.5. newsrun and relaynews: digesting received articles
|
||
|
||
The heart and soul of the Usenet News system, newsrun picks up the batches and
articles in the in.coming directory of $NEWSARTS, uncompresses them if
required, and calls relaynews. It should run from cron.

relaynews reads the articles one by one on stdin. For each article, it
determines whether it belongs to a subscribed group by looking up the sys
file, looks in the history file to make sure that it does not already exist
locally, digests it, updating the active and history files, and batches it
for neighbouring sites. It logs errors when it encounters problems while
processing an article, and takes appropriate action if the article happens to
be a control message. More information is in the relaynews manpage.
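
The two checks that relaynews makes before digesting an article can be
sketched as follows. This Python fragment is a much simplified illustration
and not how relaynews is implemented: a list of hierarchy names stands in for
the sys file, a set of message IDs stands in for the history file, and the
function name and return values are ours.

```python
def relay_decision(message_id, newsgroups, subscribed, history):
    """Classify an article as 'unsubscribed', 'duplicate' or 'accept'.

    subscribed: hierarchy names we receive, e.g. ["comp", "starcom"].
    history:    set of message IDs that have already been digested.
    """
    def wanted(group):
        # A group is wanted if it is a subscribed hierarchy or lies below one.
        return any(group == top or group.startswith(top + ".")
                   for top in subscribed)

    if not any(wanted(group) for group in newsgroups):
        return "unsubscribed"
    if message_id in history:
        return "duplicate"
    history.add(message_id)  # remember it, so a second copy is rejected
    return "accept"
```

Because the message ID is recorded at the moment of acceptance, a second copy
of the same article arriving from another neighbour is classified as a
duplicate.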
|
||
-----------------------------------------------------------------------------
|
||
|
||
9.6. doexpire and expire: removing old articles
|
||
|
||
A good way to get rid of unwanted or old articles from the $NEWSARTS area is
to run doexpire once a day. It reads the explist file from the $NEWSCTL
directory to determine which articles expire today. It can archive those
articles if so configured. It then updates the active and history files
accordingly. If you wish to retain an article's entry in the history file
after it has expired, so that it is not re-digested as a new article, add a
special /expired/ line to the explist control file. More on the options and
functioning can be found in the expire manpage.
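
To see why the order of entries in an expiry control file matters, consider
the following Python sketch. It does not parse the real explist syntax: the
rule table, the prefix matching and the function names are simplifying
assumptions of ours. The one property it shares with explist is that the
first matching rule wins.

```python
# Hypothetical, simplified rules: (newsgroup prefix, days to keep).
# The prefix "all" matches every newsgroup.
RULES = [
    ("comp.lang", 30),
    ("comp", 14),
    ("all", 7),
]


def retention_days(group, rules=RULES):
    """Return the retention period of the first rule matching the group."""
    for prefix, days in rules:
        if prefix == "all" or group == prefix or group.startswith(prefix + "."):
            return days
    return None  # no rule matched


def is_expired(group, age_days, rules=RULES):
    """Return True if an article of this age should be removed."""
    days = retention_days(group, rules)
    return days is not None and age_days > days
```

If the catch-all rule were listed first, it would match every newsgroup and
the more specific rules below it would never be consulted. That is why
specific entries belong before general ones.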
|
||
-----------------------------------------------------------------------------
|
||
|
||
9.7. nntpd and msgidd: managing the NNTP interface
|
||
|
||
As has already been discussed in the chapter on setting up the software,
nntpd is a TCP-based server daemon which runs under inetd. It is fired by
inetd whenever there is an incoming connection on the NNTP port, and it takes
over the dialogue from there. It reads the C-News configuration and data
files in $NEWSCTL and the article files in $NEWSARTS, and receives incoming
posts and transfers. These it dutifully queues in $NEWSARTS/in.coming, either
as batch files or as single article files.
|
||
|
||
It is important that inetd be configured to fire nntpd as user news, not as
|
||
root like it does for other daemons like telnetd or ftpd. If this is not done
|
||
correctly, a lot of problems can be caused in the functioning of the C-News
|
||
system later.
|
||
|
||
nntpd is fired each time a new NNTP connection is received, and dies once the
NNTP client closes its connection. Thus, if one nntpd receives a few articles
through an incoming batch feed (a server-to-server transfer such as IHAVE,
rather than a POST), another nntpd will not know about the receipt of these
articles till the batches are digested. This hampers duplicate newsfeed
detection if there are multiple upstream NDNs feeding our server with the same
set of articles over NNTP. To fix this, nntpd uses an ally: msgidd, the
message ID daemon. This daemon is fired once at server bootup time through
newsboot, and keeps running quietly in the background, listening on a named
Unix socket in the $NEWSCTL area. It keeps in its memory a list of all message
IDs which various incarnations of nntpd have asked it to remember.
|
||
|
||
Thus, when one copy of nntpd receives an incoming feed of news articles, it
|
||
updates msgidd with the message IDs of these messages through the Unix
|
||
socket. When another copy of nntpd is fired later and the NNTP client tries
|
||
to feed it some more articles, the nntpd checks each message ID against
|
||
msgidd. Since msgidd stores all these IDs in memory, the lookup is very fast,
|
||
and duplicate articles are blocked at the NNTP interface itself.
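
The bookkeeping that msgidd does for nntpd can be pictured with a small Python
class. This is only a sketch of the in-memory lookup. The real msgidd is a
separate daemon which talks to each nntpd over a Unix socket, and the class
and method names here are our own invention.

```python
class MessageIdCache:
    """In-memory record of the message IDs that have been offered so far."""

    def __init__(self):
        self._seen = set()

    def offer(self, message_id):
        """Return True if the ID is new (and remember it), else False."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True
```

An nntpd instance would refuse an article whenever offer() returns False,
which is how a second upstream neighbour offering the same article gets turned
away before the article body is transferred.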
|
||
|
||
On a running system, expect to see one instance of nntpd for each active NNTP
|
||
connection, and just one instance of msgidd running quietly in the
|
||
background, hardly consuming any CPU resources. Our nntpd is configured to
|
||
die if the NNTP connection is more than a few minutes idle, thus conserving
|
||
server resources. This does not inconvenience the user because modern NNTP
|
||
clients simply re-connect. If an nntpd instance is found to be running for
|
||
days, it is either hung due to a network error, or is receiving a very long
|
||
incoming NNTP feed from your upstream server. We used to receive our primary
|
||
incoming feed from our service provider through NNTP sessions lasting 18 to
|
||
20 hours without a break, every day.
|
||
-----------------------------------------------------------------------------
|
||
|
||
9.8. nov, the News Overview system
|
||
|
||
NOV, the News Overview System, is a recent augmentation to the C-News and NNTP
systems and to the NNTP protocol. This subsystem maintains a file for each
active newsgroup, with one line per current article. This line of text
contains some key meta-data about the article, e.g. the contents of the From,
Subject and Date headers, the article size, and the message ID. This speeds
up NNTP response enormously. The nov library has been integrated into the
nntpd code and into key binaries of C-News, thus providing seamless
maintenance of the News Overview database when articles are added to or
deleted from the repository.
|
||
|
||
When newsrun adds an article to starcom.test, it also updates
$NEWSARTS/starcom/test/.overview, adding a line with the relevant data,
tab-separated. When nntpd comes to life with an NNTP client and sees the XOVER
NNTP command, it reads this .overview file and returns the relevant lines to
the NNTP client. When expire deletes an article, it also removes the
corresponding line from the .overview file. Thus, the maintenance of the NOV
database is seamless.
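
A line of an overview file can be built as shown in the Python sketch below.
It assumes the commonly used field order (article number, Subject, From, Date,
Message-ID, References, byte count, line count); the function name is ours,
and your installation's documentation remains the authority on the exact
layout.

```python
# Header fields stored in each overview line, in this order, between the
# article number and the byte and line counts.
OVERVIEW_HEADERS = ("Subject", "From", "Date", "Message-ID", "References")


def overview_line(number, headers, nbytes, nlines):
    """Build one tab-separated overview line for an article."""
    def clean(value):
        # Tabs and newlines inside a header would break the format, so
        # every run of whitespace is collapsed into a single space.
        return " ".join(value.split())

    fields = [str(number)]
    fields += [clean(headers.get(name, "")) for name in OVERVIEW_HEADERS]
    fields += [str(nbytes), str(nlines)]
    return "\t".join(fields)
```

A missing header, such as References on the first article of a thread, simply
produces an empty field, so every line still has the same number of
tab-separated columns.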
|
||
-----------------------------------------------------------------------------
|
||
|
||
9.9. Batching feeds with UUCP and NNTP
|
||
|
||
Some information about batching feeds has been provided in earlier sections.
|
||
More will be added later here in this document.
|
||
-----------------------------------------------------------------------------
|
||
|
||
10. Monitoring and administration
|
||
|
||
Once the Usenet News system is in place and running, the news administrator
|
||
is then aided in monitoring the system by various reports generated by it.
|
||
Also, he needs to make regular checks in specific directories and files to
|
||
ascertain the smooth working of the system.
|
||
-----------------------------------------------------------------------------
|
||
|
||
10.1. The newsdaily report
|
||
|
||
This report is generated by the script newsdaily, which is typically run
through cron. I shall enumerate some of the problems that are reported by it,
based on my observations.
|
||
  * bad input batches: This gives a list of articles that have been declared
    bad after processing and hence have not been digested. The reason is not
    given; you are expected to check each article and determine the cause.

  * leading unknown newsgroups by articles: Newsgroups whose hierarchy has
    been subscribed to, but whose names do not appear in the active file, are
    listed under this heading. Add the name to the active file if you think
    the newsgroup is important. For example, you would see this happen if you
    have subscribed to the hierarchy comp but the active file does not
    contain the newsgroup name comp.lang.java.3d. You could instead deny
    subscription to this particular newsgroup by specifying so in the sys
    file.

  * leading unsubscribed newsgroups: The newsgroup hierarchies to which you
    have not subscribed, but for which the news server receives the most
    articles, appear under this heading. You really cannot do much about
    this, except subscribe to them if they are required.

  * leading sites sending bad headers: This lists the NDNs which are sending
    articles with malformed or insufficient headers.

  * leading sites sending stale/future/misdated news: This lists the NDNs
    which are sending you articles that are older than the date you have
    specified for accepting feeds.

  * Some of the reports generated by us: We have modified the newsdaily
    script to include some more statistics.

      + disk usage: This reports the size in bytes of the $NEWSARTS area. If
        you are receiving feeds regularly, you should see this figure
        increasing.

      + incoming feed statistics: This reports the number of articles and
        total bytes received from each of your NDNs.

      + NNTP traffic report: The output of nestor has also been included in
        this report. It gives details of each NNTP connection and the overall
        performance of the network connection, read from the newslog file. To
        understand the format, read the nestor manpage.

  * Error reporting from the errorlog file: Reports errors logged in the
    errorlog file. Usually these are file ownership or missing file problems,
    which can be easily handled.
|
||
|
||
|
||
-----------------------------------------------------------------------------
|
||
10.2. Crisis reports from newswatch
|
||
|
||
Most of the problems reported to me are either space shortages or persistent
locks. There are instances when the scripts have created lock files and then
aborted or terminated without removing them. Sometimes these locks are
innocuous enough to be deleted, but this should be determined only after a
careful analysis, because they could indicate that some part of the system is
not working correctly. For example, I would receive this error message when
sendbatches terminated abnormally while trying to transmit huge togo files. I
had to determine why sendbatches was failing this often.
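
Before deleting a lock by hand, it helps to know how long it has been lying
around. The Python sketch below only lists old lock files; it never removes
anything. It assumes that the lock files live in one directory (typically
$NEWSCTL) and have names beginning with LOCK. Both are assumptions which you
should check against your own installation, and the function name
find_stale_locks is ours.

```python
import os
import time


def find_stale_locks(directory, max_age_seconds, now=None):
    """List (name, age in seconds) for lock files older than max_age_seconds."""
    now = time.time() if now is None else now
    stale = []
    for name in sorted(os.listdir(directory)):
        # Assumption: lock files are the entries whose names start with LOCK.
        if not name.startswith("LOCK"):
            continue
        try:
            modified = os.stat(os.path.join(directory, name)).st_mtime
        except FileNotFoundError:
            # The owning script removed its lock while we were looking.
            continue
        age = now - modified
        if age > max_age_seconds:
            stale.append((name, int(age)))
    return stale
```

Running find_stale_locks on the control directory with a limit of, say, 3600
seconds lists every lock older than an hour. A lock that keeps reappearing in
this list is a hint to look at the script which owns it, not merely to delete
the file.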
|
||
|
||
The space shortage issue has to be addressed immediately. You could delete
|
||
unwanted articles by running doexpire or add more disk space at the OS level.
|
||
-----------------------------------------------------------------------------
|
||
|
||
10.3. Disk space
|
||
|
||
The $NEWSBIN area occupies a fixed amount of space. Since the binaries do not
grow once installed, you do not have to worry about disk shortage here. The
areas that take up more space as feeds come in are $NEWSCTL and $NEWSARTS.
$NEWSCTL has log files that keep growing with each feed. As articles are
digested in huge numbers, the $NEWSARTS area continues to grow. You will also
need space if you have chosen to archive articles on expiry. Allocate a few
GB of disk space for $NEWSARTS, depending on the number of hierarchies you
subscribe to and the feeds that come in every day. $NEWSCTL grows more slowly
than $NEWSARTS; allocate space for it accordingly.
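
To watch how these areas grow, you can total the file sizes under each of
them from a daily cron job. The Python sketch below does this for one
directory tree. The function name tree_size is ours. It adds up apparent file
sizes, so its result will differ somewhat from the block counts printed by
du.

```python
import os


def tree_size(top):
    """Return the total size in bytes of all files below the directory top."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            try:
                # lstat does not follow symbolic links, so a link is counted
                # as the link itself and not as the file it points to.
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except FileNotFoundError:
                # An article may be expired while we walk the spool.
                pass
    return total
```

Assuming your environment sets NEWSARTS and NEWSCTL, calling tree_size on
each of the two directories and logging the results every night gives you a
growth trend for both areas.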
|
||
-----------------------------------------------------------------------------
|
||
|
||
10.4. CPU load and RAM usage
|
||
|
||
With modern C-News and NNTPd, there is very little usage of these system
|
||
resources for processing news article flow. Key components like newsrun or
|
||
sendbatches do not load the system much, except for cases where you have a
|
||
very heavy flow of compressed outgoing batches and the compression utility is
|
||
run by sendbatches frequently. newsrun is amazingly efficient in the current
|
||
C-News release. Even when it takes half an hour to digest a large consignment
of batches, it hardly loads a slow 200 MHz Pentium CPU or consumes much RAM
in a 64 MB system.
|
||
|
||
One thing which does slow down a system is a large number of users connecting
using NNTP to browse newsgroups. We do not have measured figures at hand to
offer as guidance on resource consumption here, but we have found that the
load on the CPU and RAM for a certain number of active users invoking nntpd
is more than with an equal number of users connecting to the POP3 port of the
same system to pull out their mailboxes. A few hundred active NNTP users can
really slow down a dual-P-III Intel Linux server, for instance. This loading
does not depend on whether you are using INN or nntpd; both have practically
identical implementations for NNTP reading and differ only in their handling
of feeds.
|
||
|
||
Another situation which will slow down your Usenet news server is when
downstream servers connect to you to pull out NNTP feeds using the pull
method. This has been mentioned before. This can really load your server's
I/O system and CPU.
|
||
-----------------------------------------------------------------------------
|
||
|
||
10.5. The in.coming/bad directory
|
||
|
||
The in.coming directory is where batches and articles reside after you have
received feeds from your NDN and before processing happens. Checking this
directory regularly to see if there are batches is a good way of determining
that feeds are coming in. Batches and articles are named differently. Batches
typically have names like nntp.GxhsDj, and individual articles have names
beginning with digits, like 0.10022643380.t.
|
||
|
||
The bad sub-directory under in.coming holds batches and articles that
encountered errors while they were being processed by relaynews. You will
have to look at the individual files in this directory to determine the
cause. Ideally speaking, this directory should be empty.
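
The checks described in this section are easy to automate. The Python sketch
below counts the batches and articles waiting in in.coming and the files
sitting in its bad sub-directory. It relies only on the naming convention
described above, i.e. that article names begin with a digit and batch names
do not. The function name summarise_incoming is ours.

```python
import os


def summarise_incoming(incoming_dir):
    """Count waiting batches, waiting articles and files in the bad directory."""
    counts = {"batches": 0, "articles": 0, "bad": 0}
    with os.scandir(incoming_dir) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            # Assumption: names beginning with a digit are single articles,
            # everything else (e.g. nntp.GxhsDj) is a batch.
            if entry.name[0].isdigit():
                counts["articles"] += 1
            else:
                counts["batches"] += 1
    bad_dir = os.path.join(incoming_dir, "bad")
    if os.path.isdir(bad_dir):
        with os.scandir(bad_dir) as entries:
            counts["bad"] = sum(1 for entry in entries if entry.is_file())
    return counts
```

A non-zero bad count is the signal to go and inspect the individual files. A
batch count that stays at zero for long suggests that feeds have stopped
coming in.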
|
||
-----------------------------------------------------------------------------
|
||
|
||
10.6. Long pending queues in out.going
|
||
|
||
TO BE ADDED.
|
||
-----------------------------------------------------------------------------
|
||
|
||
10.7. Problems with nntpxmit and nntpsend
|
||
|
||
TO BE ADDED.
|
||
-----------------------------------------------------------------------------
|
||
|
||
10.8. The junk and control groups
|
||
|
||
Control messages are those that have a newgroup/rmgroup/cancel/checkgroup in
their subject line. Such messages result in relaynews calling the appropriate
script, and on execution a message is mailed to the admin about the action
taken. These control messages are stored in the control directory of
$NEWSARTS. For the propagation of such messages, one must subscribe to the
control hierarchy.
|
||
|
||
When your news system determines that an article belongs to a newsgroup you
have not subscribed to, the article is "junked", i.e. it appears in the junk
directory. This directory plays a key role in transferring articles to your
NDNs, as they would subscribe to the junk hierarchy to receive feeds. If you
are a leaf node, there is no reason why articles should pile up here. Keep
deleting them on a daily basis.
|
||
-----------------------------------------------------------------------------
|
||
|
||
11. Usenet news clients
|
||
|
||
This HOWTO was written to allow a Linux system administrator to provide the
Usenet news service to readers of Usenet articles. The rest of this HOWTO
|
||
focuses on the server-end software and systems, but one chapter dedicated to
|
||
the clients does not seem disproportionate, considering that the raison
|
||
d'etre of Usenet news servers is to serve these clients.
|
||
|
||
The overwhelming majority of clients are software programs which access the
|
||
article database, either by reading /var/spool/news on a Unix system or over
|
||
NNTP, and allow their human users to read and post articles. We can therefore
|
||
probably term this class of programs UUA, for Usenet User Agents, along the
|
||
lines of MUA for Mail User Agents.
|
||
|
||
There are other special-purpose clients, which pull out articles either to
copy or transfer them somewhere else, or to analyse them, e.g. a search
engine which allows you to search a Usenet article archive, like Google
(www.google.com) does.
|
||
|
||
This chapter will discuss issues in UUA software design, and bring out
|
||
essential features and efficiency and management issues. What this chapter
|
||
will certainly never attempt to do is catalogue all the different UUA
|
||
programs available in the world --- that is best left to specialised
|
||
catalogues on the Internet.
|
||
|
||
This chapter will also briefly cover special-purpose clients which transfer
|
||
articles or do other special-purpose things with them.
|
||
-----------------------------------------------------------------------------
|
||
|
||
11.1. Usenet User Agents
|
||
|
||
11.1.1. Accessing articles: NNTP or spool area?
|
||
|
||
TO BE ADDED LATER
|
||
-----------------------------------------------------------------------------
|
||
|
||
11.1.2. Threading
|
||
|
||
TO BE ADDED LATER
|
||
-----------------------------------------------------------------------------
|
||
|
||
11.1.3. Quick reading features
|
||
|
||
TO BE ADDED LATER
|
||
-----------------------------------------------------------------------------
|
||
|
||
11.2. Clients that transfer articles
|
||
|
||
We will discuss Suck and nntpxfer from the NNTP server distribution here.
|
||
Suck has already been discussed earlier. We will be happy to take contributed
|
||
additions that discuss other client software.
|
||
-----------------------------------------------------------------------------
|
||
|
||
11.3. Special clients
|
||
|
||
11.3.1. NNTPCache
|
||
|
||
NNTPCache is an interesting transparent caching proxy for news articles.
|
||
News articles are read-only by definition, i.e. they do not change once they
|
||
are posted; they can only be deleted. NNTPCache uses this feature to build a
|
||
local cache of news articles.
|
||
|
||
You set up NNTPCache to listen on the NNTP port of your local Unix server,
|
||
and act like an NNTP daemon. You configure it to connect back-to-back to
|
||
another NNTP daemon, further away, which has all the interesting stuff the
|
||
users want to read. When a user connects to the local NNTPCache, it connects
|
||
to the remote NNTP server and acts as a relay for the NNTP connection,
|
||
ferrying commands and responses back and forth. What the user sees therefore
|
||
comes from the remote server, the first time. However, all news articles
|
||
fetched by NNTPCache are also stored in a local cache, thus allowing the next
|
||
user to browse the same set of articles faster. Like all demand-driven
|
||
caches, the advantage here is that the local NNTPCache does not need (much)
|
||
administering, and will automatically delete all articles from its cache once
|
||
they've been lying unread long enough.
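
The demand-driven behaviour described above can be sketched in a few lines of
Python. This is only an illustration of the idea and not NNTPCache's actual
implementation: the class name ArticleCache, the fetch_remote callable which
stands in for the connection to the remote server, and the idle limit are all
our assumptions.

```python
import time


class ArticleCache:
    """Keep local copies of fetched articles; drop those left unread too long."""

    def __init__(self, fetch_remote, max_idle_seconds, clock=time.time):
        self._fetch_remote = fetch_remote  # callable: message ID -> article text
        self._max_idle = max_idle_seconds
        self._clock = clock
        self._articles = {}                # message ID -> (text, last read time)

    def get(self, message_id):
        """Return an article, asking the remote server only on a cache miss."""
        cached = self._articles.get(message_id)
        if cached is None:
            text = self._fetch_remote(message_id)
        else:
            text = cached[0]
        # Every read, hit or miss, refreshes the last read time.
        self._articles[message_id] = (text, self._clock())
        return text

    def expire(self):
        """Delete articles unread for longer than the limit; return the count."""
        now = self._clock()
        stale = [message_id
                 for message_id, (_, last_read) in self._articles.items()
                 if now - last_read > self._max_idle]
        for message_id in stale:
            del self._articles[message_id]
        return len(stale)
```

The first get for a message ID goes to the remote server, and later ones are
served locally. Since articles never change once posted, the cache needs no
invalidation logic apart from the periodic expire.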
|
||
|
||
We list it here as an NNTP client because every proxy server is a server on
|
||
one side and a client on the other.
|
||
-----------------------------------------------------------------------------
|
||
|
||
12. Our perspective
|
||
|
||
This chapter has been added to allow us to share our perspective on certain
|
||
technical choices. Certain issues which are more a matter of opinion than
|
||
detail, are discussed here.
|
||
-----------------------------------------------------------------------------
|
||
|
||
12.1. Efficiency issues of NNTP
|
||
|
||
To understand why NNTP is often an inappropriate choice for newsfeeds, we
|
||
need to understand TCP's sliding window protocol and the nature of NNTP. NNTP
|
||
is an appalling waste of bandwidth for most bulk article transfer situations,
for the following simple reasons:
|
||
|
  * No compression: articles are transferred in plain text.

  * No article transmission restart: if a connection breaks halfway through
    an article, the next round will have to start with the beginning of the
    article.

  * Ping-pong protocol: NNTP is unsuitable for bulk streaming data transfer
    because the TCP sliding window feature is unusable with NNTP.
|
||
|
||
|
||
What is a ping-pong protocol? TCP uses a sliding window mechanism to pump out
|
||
data in one direction very rapidly, and can achieve near wire speeds under
|
||
most circumstances. However, this only works if the application layer
|
||
protocol can aggregate a large amount of data and pump it out without having
|
||
to stop every so often, waiting for an ack or a response from the other end's
|
||
application layer. This is precisely why sending one file of 100 Mbytes by
|
||
FTP takes so much less clock time than 10,000 files of 10 Kbytes each, all
|
||
other parameters remaining unchanged. The trick is to keep the sliding window
|
||
sliding smoothly over the outgoing data, blasting packets out as fast as the
wire will carry them, without ever allowing the window to empty out while you
|
||
wait for an ack. Protocols which require short bursts of data from either end
|
||
constantly, e.g. in the case of remote procedure calls, are called ``ping
|
||
pong protocols'' because they remind you of a table-tennis ball.
|
||
|
||
With NNTP, this is precisely the problem. The average size of Usenet news
|
||
messages, including header and body, is 3 Kbytes. When thousands of such
|
||
articles are sent out by NNTP, the sending server has to send the message ID
|
||
of the first article, then wait for the receiving server to respond with a
|
||
``yes'' or ``no.'' Once the sending server gets the ``yes'', it sends out
|
||
that article, and waits for an ``ok'' from the receiving server. Then it
|
||
sends out the message ID of the second article, and waits for another ``yes''
|
||
or ``no.'' And so on. The TCP sliding window never gets to do its job.
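
A back-of-the-envelope model shows how much the waiting costs. The Python
sketch below charges two network round trips per article for the dialogue
described above (one for the offer and its answer, one for the article and
its ``ok''), and compares this with sending the same bytes as one continuous
stream. The model ignores TCP slow start, header overhead and compression,
and the link speed and round-trip time in the example are our assumptions,
not measurements.

```python
def pingpong_seconds(articles, article_bytes, rtt_seconds, link_bits_per_second):
    """Estimated time when each article costs two round trips of waiting."""
    transmit = articles * article_bytes * 8 / link_bits_per_second
    waiting = articles * 2 * rtt_seconds
    return transmit + waiting


def streamed_seconds(articles, article_bytes, rtt_seconds, link_bits_per_second):
    """Estimated time when all articles go out as one uninterrupted stream."""
    transmit = articles * article_bytes * 8 / link_bits_per_second
    # A single round trip to set up the transfer, then the window keeps sliding.
    return transmit + rtt_seconds
```

For 10,000 articles of 3,000 bytes each over a 1 Mbit/sec link with a
round-trip time of 0.2 seconds, the model gives about 4,240 seconds for the
ping-pong dialogue against about 240 seconds for the stream. Nearly all of
the difference is time during which the wire sits idle.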
|
||
|
||
This sub-optimal use of TCP's data pumping ability, coupled with the absence
of compression, makes for a protocol which is great for synchronous
|
||
connectivity, e.g. for news reading or real-time updates, but very poor for
|
||
batched transfer of data which can be delayed and pumped out. All these are
|
||
precisely reversed in the case of UUCP over TCP.
|
||
|
||
To decide which protocol, UUCP over TCP or NNTP, is appropriate for your
|
||
server, you must address two questions:
|
||
|
||
1. How much time can your server afford to wait from the time your upstream
|
||
server receives an article to the time it passes it on to you?
|
||
|
||
2. Are you receiving the same set of hierarchies from multiple next-door
|
||
neighbour servers, i.e. is your newsfeed flow pattern a mesh instead of a
|
||
tree?
|
||
|
||
|
||
If your answers to the two questions above are ``messages cannot wait'' and
|
||
``we operate in a mesh'', then NNTP is the correct protocol for your server
|
||
to receive its primary feed(s).
|
||
|
||
In most cases, carrier-class servers operated by major service providers do
|
||
not want to accept even a minute's delay from the time they receive an
|
||
article to the time they retransmit it out. They also operate in a mesh with
|
||
other servers operated by their own organisations (e.g. for redundancy) or
|
||
others. They usually sit very close to the Internet backbone, i.e. with
Tier 1 ISPs, and have extremely fast Internet links, usually more than
10 Mbits/sec. The amount of data that flows out of such servers in outgoing
feeds is more than the amount that comes in, because each incoming article is
|
||
retained, not for local consumption, but for retransmission to others lower
|
||
down in the flow. And these servers boast of a retransmission latency of less
|
||
than 30 seconds, i.e. I will retransmit an article to you within 30 seconds
|
||
of my having received it.
|
||
|
||
However, if your server is used by a company for making Usenet news available
|
||
for its employees, or by an institute to make the service available for its
|
||
students and teachers, then you are not operating your server in a mesh
|
||
pattern, nor do you mind it if messages take a few hours to reach you from
|
||
your upstream neighbour.
|
||
|
||
In that case, you have enormous bandwidth to conserve by moving to UUCP. Even
|
||
if, in this Internet-dominated era, you have no one to supply you with a
|
||
newsfeed using dialup point-to-point links, you can pick up a compressed
|
||
batched newsfeed using UUCP over TCP, over the Internet.
|
||
|
||
In this context, we want to mention Taylor UUCP, an excellent UUCP
|
||
implementation available under GNU GPL. We use this UUCP implementation in
|
||
preference to the bundled UUCP systems offered by commercial Unix vendors
|
||
even for dialup connections, because it is far more stable, performs better,
and always supports file transfer restart. Over TCP/IP, Taylor is the only
|
||
one we have tried, and we have no wish to try any others.
|
||
|
||
Apart from its robustness, Taylor UUCP has one invaluable feature critical to
|
||
large Usenet batch transfers: file transfer restart. If it is transferring a
|
||
10 MB batch, and the connection breaks after 8 MB, it will restart precisely
|
||
where it left off last time. Therefore, no bytes of bandwidth are wasted, and
|
||
queues never get stuck forever.
|
||
|
||
Over NNTP, since there is no batching, transfers happen one article at a
|
||
time. Considering the (relatively) small size of an article compared to
|
||
multi-megabyte UUCP batches, one would expect that an article would never
|
||
pose a major problem while being transported; if it can't be pushed across in
|
||
one attempt, it'll surely be copied the next time. However, we have
|
||
experienced entire NNTP feeds getting stuck for days on end because of one
|
||
article, with logs showing the same article breaking the connection over and
|
||
over again while being transferred [1]. Some rare articles can be more than a
|
||
megabyte in size, particularly in comp.binaries. In each such incident, we
|
||
have had to manually edit the queue file on the transmitting server and
|
||
remove the offending article from the head of the queue. Taylor UUCP, on the
|
||
other hand, has never given us a single hiccup with blocked queues.
|
||
|
||
We feel that the overwhelming majority of servers offering the Usenet news
|
||
service are at the leaf nodes of the Usenet news flow, not at the heart.
|
||
These servers are usually connected in a tree, with each server having one
|
||
upstream ``parent node'', and multiple downstream ``child nodes.'' These
|
||
servers receive their bulk incoming feed from their upstream server, and
|
||
their users can tolerate a delay of a few hours for articles to move in and
|
||
out. If your server is in this class, we feel you should consider using UUCP
|
||
over TCP and transfer compressed batches. This will minimise bandwidth usage,
|
||
and if you operate using dialup Internet connections, it will directly reduce
|
||
your expenses.
|
||
|
||
A word about the link between mesh-patterned newsfeed flow and the need to
|
||
use NNTP. If your server is receiving primary --- as against trickle ---
|
||
feeds from multiple next-door neighbours, then you have to use NNTP to
|
||
receive these feeds. The reason lies in the way UUCP batches are accepted.
|
||
UUCP batches are received in their entirety into your server, and then they
|
||
are uncompressed and processed. When the sending server is giving you the
|
||
batch, it is not getting a chance to go through the batch article by article
|
||
and ask your server whether you have or don't have each article. This way, if
|
||
multiple servers give you large feeds for the same hierarchies, then you will
|
||
be bound to receive multiple copies of each article if you go the UUCP way.
|
||
All the gains of compressed batches will then be neutralised. NNTP's IHAVE
|
||
and SENDME dialogue in effect permits precisely this double-check for each
|
||
article, and thus you don't receive even a single article twice.
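
The difference can be made concrete with a small simulation. In the Python
sketch below, each feed is a mapping from message ID to article size. The
first function models batches, which arrive whole before anyone can check for
duplicates. The second models a per-article offer, where an article the
server already holds is refused before it is sent. The function names and the
sample feeds are ours, and the model ignores compression and the small cost
of the offers themselves.

```python
def batch_transfer(feeds):
    """Bytes received when every feed arrives whole, duplicates included."""
    received = 0
    duplicates = 0
    seen = set()
    for feed in feeds:
        for message_id, size in feed.items():
            received += size  # the bytes cross the link before any check
            if message_id in seen:
                duplicates += 1
            else:
                seen.add(message_id)
    return received, duplicates


def offered_transfer(feeds):
    """Bytes received when each article is offered by message ID first."""
    received = 0
    refused = 0
    seen = set()
    for feed in feeds:
        for message_id, size in feed.items():
            if message_id in seen:
                refused += 1  # we already have it, so it is never sent
                continue
            seen.add(message_id)
            received += size
    return received, refused
```

With two neighbours whose feeds overlap completely, batch_transfer reports
twice the bytes of offered_transfer. In that case compression would need to
do better than 2:1 just to break even, and the margin shrinks further with
every additional neighbour.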
|
||
|
||
For Usenet servers which connect to the Internet periodically using dialup
|
||
connections to fetch news, the UUCP option is especially important. Their
|
||
primary incoming newsfeed cannot be pushed into them using queued NNTP feeds
|
||
for reasons described in the above paragraph. These hapless servers are
|
||
usually forced to pull out their articles using a pull NNTP feed, which is
|
||
often very slow. This may lead to long connect times, repeat attempts after
|
||
every line break, and high Internet connection charges.
|
||
|
||
On the other hand, we have been using UUCP over TCP and gzip'd batches for
|
||
more than five years now in a variety of sites. Even today, a full feed of
|
||
all eight standard hierarchies, plus the full microsoft, gnu and netscape
|
||
hierarchies, minus alt and comp.binaries, can comfortably be handled in just
|
||
a few hours of connect time every night, dialing up to the Internet at 33.6
|
||
or 56 Kbits/sec. We believe that the proverbial `full feed' with all
|
||
hierarchies including alt can be handled comfortably with a 24-hour link at
|
||
56 Kbits/sec, provided you forget about NNTP feeds. We usually get
|
||
compression ratios of 4:1 using gzip -9 on our news batches, incidentally.
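
The arithmetic behind these figures is simple enough to check. The Python
sketch below computes an upper bound on the amount of uncompressed news that
fits through a link in one session, given the link speed, the connect time
and the compression ratio. It ignores protocol overhead and the gap between a
modem's rated and real throughput, so treat the result as a ceiling and not a
prediction. The function name is ours.

```python
def news_bytes_per_session(link_bits_per_second, hours, compression_ratio):
    """Upper bound on uncompressed news bytes moved in one session."""
    compressed_bytes = link_bits_per_second / 8 * hours * 3600
    return compressed_bytes * compression_ratio
```

With the 4:1 ratio mentioned above, a 24-hour link at 56 Kbits/sec has a
ceiling of about 2.4 GB of news text per day, and four hours of connect time
at 33.6 Kbits/sec has a ceiling of about 240 MB per night.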
|
||
-----------------------------------------------------------------------------
|
||
|
||
12.2. C-News+NNTPd or INN?
|
||
|
||
INN and CNews are the two most popular free software implementations of
|
||
Usenet news. Of these two, we prefer CNews, primarily because we have been
|
||
using it across a very large range of Unixen for more than one decade,
|
||
starting from its earliest release --- the so-called ``Shellscript release''
|
||
--- and we have yet to see a need to change.[2]
|
||
|
||
We have seen INN, and we are not comfortable with a software implementation
which puts so much functionality inside one executable. This reminds us
|
||
of Windows NT, Netscape Communicator, and other complex and monolithic
|
||
systems, which make us uncomfortable with their opaqueness. We feel that
|
||
CNews' architecture, which comprises many small programs, intuitively fits
|
||
into the Unix approach of building large and complex systems, where each
|
||
piece can be understood, debugged, and if needed, replaced, individually.
|
||
|
||
Secondly, we seem to see the move towards INN accompanied by a move towards
|
||
NNTP as a primary newsfeed mechanism. This is no fault of INN; we suspect it
|
||
is a sort of cultural difference between INN users and CNews users. We find
|
||
the issue of UUCP versus NNTP for batched newsfeeds a far more serious issue
|
||
than the choice of CNews versus INN. We simply cannot agree with the idea
|
||
that NNTP is an appropriate protocol for bulk Usenet feeds for most sites.
|
||
Unfortunately, we seem to find that most sites which are more comfortable
|
||
using INN seem to also prefer NNTP over UUCP, for reasons not clear to us.
|
||
|
||
Our comments should not be taken as expressing any reservation about INN's
|
||
quality or robustness. Its popularity is testimony to its quality; it most
|
||
certainly ``gets the job done'' as well as anything else. In addition, there
|
||
are a large number of commercial Usenet news server implementations which
|
||
have started with the INN code; we do not know of any which have started with
|
||
the CNews code. The Netwinsite DNews system and the Cyclone Typhoon, we
|
||
suspect, both are INN-spired.
|
||
|
||
We will recommend CNews and NNTPd over INN, because we are more comfortable
|
||
with the CNews architecture for reasons given above, and we do not run
|
||
carrier-class sites. We will continue to support, maintain and extend this
|
||
software base, at least for Linux. And we see no reason for the overwhelming
|
||
majority of Usenet sites to be forced to use anything else. Your viewpoints
are welcome.
|
||
|
||
Had we been setting up and managing carrier-class sites with their
|
||
near-real-time throughput requirements, we would probably not have chosen
|
||
CNews. And for those situations, our opinion of NNTP versus compressed UUCP
|
||
has been discussed in Section 12.1.
|
||
|
||
Suck and Leafnode have their place in the range of options, where they appear
|
||
to be attractive for novices who are intimidated by the ``full blown''
|
||
appearance of CNews+NNTPd or INN. However, we run CNews + NNTPd even on Linux
|
||
laptops. We suspect INN can be used this way too. We do not find these ``full
|
||
blown'' implementations any more resource hungry than their simpler cousins.
|
||
Therefore, other than administration and configuration familiarity, we don't
|
||
see any other reason why even a solitary end-user will choose Leafnode or
|
||
Suck over CNews+NNTPd. As always, contrary opinions invited.
|
||
-----------------------------------------------------------------------------
|
||
|
||
13. Usenet software: a historical perspective
|
||
|
||
This section comprises excerpts from a well-known Usenet Periodic Posting
|
||
document which was last changed in Feb 1998. Our copy of that old document
|
||
was picked up from
|
||
|
||
ftp://rtfm.mit.edu/pub/usenet-by-hierarchy/news/software/b/Usenet_Software:_History_and_Sources
|
||
|
||
We suspect other copies will also be found elsewhere. The physical file on
|
||
the FTP server appears to have been touched last on 29 Dec 1999. The first
|
||
few lines of the archived file provide information about the origin of this
|
||
document and its authors:
|
||
Date: Tue, 28 Dec 1999 09:00:19 GMT
|
||
Supersedes: <FMMECL.58s@tac.nyc.ny.us>
|
||
Expires: Fri, 28 Jan 2000 09:00:19 GMT
|
||
Message-ID: <FnG10J.HAo@tac.nyc.ny.us>
|
||
From: netannounce@deshaw.com (Mark Moraes)
|
||
Subject: Usenet Software: History and Sources
|
||
Newsgroups: news.admin.misc,news.announce.newusers,news.software.readers,news.software.b,news.answers
|
||
Followup-To: news.newusers.questions
|
||
Approved: netannounce@deshaw.com (Mark Moraes)
|
||
|
||
Archive-name: usenet/software/part1
|
||
Original-from: spaf@cs.purdue.edu (Gene Spafford)
|
||
Comment: edited until 5/93 by spaf@cs.purdue.edu (Gene Spafford)
|
||
Last-change: 9 Feb 1998 by netannounce@deshaw.com (Mark Moraes)
|
||
Changes-posted-to: news.admin.misc,news.misc,news.software.readers,news.software.b,news.answers
|
||
|
||
We have been seeing this document as a periodic posting in
|
||
news.announce.newusers since the early nineties, and it has always been our
|
||
final reference on the history of Usenet server software. We reproduce
|
||
excerpts below, retaining the portions which discuss server software, and
|
||
removing discussions of client software, newsreaders, software for non-Unix
|
||
operating systems, etc. All quoted portions are reproduced unedited other
|
||
than changing FTP file paths to the modern URL format. We have added our
|
||
comments emphasised, in separate paragraphs. We feel the information captured
|
||
here is essential reading for anyone interested in Usenet server software.
|
||
|
||
If anyone can point us to a fresher version of this document, in case it is
|
||
still maintained, we will be happy to refer to that version instead of this
|
||
one, though we suspect the reader will not suffer due to the four-year gap;
|
||
most of the information reproduced below is historical anyway.
|
||
-----------------------------------------------------------------------------
|
||
|
||
13.1. The quoted excerpts
|
||
|
||
Currently, Usenet readers interact with the news using a number of software
|
||
packages and programs. This article mentions the important ones and a little
|
||
of their history, gives pointers where you can look for more information and
|
||
ends with some special notes about ``foreign'' and ``obsolete'' software. At
|
||
the very end is a list of sites from which current versions of the Usenet
|
||
software may be obtained.
|
||
|
||
...
|
||
-----------------------------------------------------------------------------
|
||
|
||
13.1.1. History
|
||
|
||
Usenet came into being in late 1979, shortly after the release of V7 Unix
|
||
with UUCP. Two Duke University grad students in North Carolina, Tom Truscott
|
||
and Jim Ellis, thought of hooking computers together to exchange information
|
||
with the Unix community. Steve Bellovin, a grad student at the University of
|
||
North Carolina, put together the first version of the news software using
|
||
shell scripts and installed it on the first two sites: unc and duke. At the
|
||
beginning of 1980 the network consisted of those two sites and phs (another
|
||
machine at Duke), and was described at the January Usenix conference. Steve
|
||
Bellovin later rewrote the scripts into C programs, but they were never
|
||
released beyond unc and duke. Shortly thereafter, Steve Daniel did another
|
||
implementation in C for public distribution. Tom Truscott made further
|
||
modifications, and this became the ``A'' news release.
|
||
|
||
In 1981 at U. C. Berkeley, grad student Mark Horton and high school student
|
||
Matt Glickman rewrote the news software to add functionality and to cope with
|
||
the ever increasing volume of news -- ``A'' News was intended for only a few
|
||
articles per group per day. This rewrite was the ``B'' News version. The
|
||
first public release was version 2.1 in 1982; the 1.* versions were all beta
|
||
test. As the net grew, the news software was expanded and modified. The last
|
||
version maintained and released primarily by Mark was 2.10.1.
|
||
|
||
Rick Adams, at the Center for Seismic Studies, took over coordination of the
|
||
maintenance and enhancement of the B News software with the 2.10.2 release in
|
||
1984. By this time, the increasing volume of news was becoming a concern, and
|
||
the mechanism for moderated groups was added to the software at 2.10.2.
|
||
Moderated groups were inspired by ARPA mailing lists and experience with
|
||
other bulletin board systems. In late 1986, version 2.11 of B News was
|
||
released, including a number of changes to support a new naming structure for
|
||
newsgroups, enhanced batching and compression, enhanced ihave/sendme control
|
||
messages, and other features.
|
||
|
||
The final release of B News was 2.11, patchlevel 19. B News has been declared
|
||
``dead'' by a number of people, including Rick Adams, and is unlikely to be
|
||
upgraded further; most Usenet sites are using C News or INN (see next
|
||
paragraphs).
|
||
|
||
In March 1986 a package was released implementing news transmission, posting,
|
||
and reading using the Network News Transfer Protocol (NNTP) (as specified in
|
||
RFC 977). This protocol allows hosts to exchange articles via TCP/IP
|
||
connections rather than using the traditional UUCP. It also permits users to
|
||
read and post news (using a modified news user agent) from machines which
|
||
cannot or choose not to install the Usenet news software. Reading and posting
|
||
are done using TCP/IP messages to a server host which does run the Usenet
|
||
software. Sites which have many workstations like the Sun and SGI, and HP
|
||
products find this a convenient way to allow workstation users to read news
|
||
without having to store articles on each system. Many of the Usenet hosts
|
||
that are also on the Internet exchange news articles using NNTP because the
|
||
load impact of NNTP is much lower than UUCP (and NNTP ensures much faster
|
||
propagation).
Our comments: This remark about relative loadings of UUCP and NNTP is no
longer applicable with faster machines and networks, and with hugely
increased traffic volumes. Today's desktop computers, let alone servers, can
all handle both NNTP and UUCP loads effortlessly, if traffic volumes can be
restricted. This is partly due to performance enhancements to UUCP as
embodied in Taylor UUCP, and partly due to vastly faster processors.

NNTP grew out of independent work in 1984-1985 by Brian Kantor at U. C. San
Diego and Phil Lapsley at U. C. Berkeley. Primary development was done at U.
C. Berkeley by Phil Lapsley with help from Erik Fair, Steven Grady, and Mike
Meyer, among others. The NNTP package (now called the reference
implementation) was distributed on the 4.3BSD release tape (although that was
version 1.2a and out-of-date) and is also available on many major hosts by
anonymous FTP. The current version is 1.5.12.2. It includes NOV (News
Overview -- see below) support and runs on a wide variety of systems. It is
available from ftp.academ.com:/pub/nntp1.5/nntp.1.5.12.2.tar.gz. For those
with access to the World-Wide Web on the Internet, the WWW page
http://www.academ.com/academ/nntp.html contains a description and news about
NNTP. A different variant, called nntp-t5, implements many of the extensions
provided by INN (including NOV support). It is available from
ftp.uu.net:/networking/news/nntp/nntp-t5.tar.gz.

One widely-used version of news, known as C News, was developed at the
University of Toronto by Geoff Collyer and Henry Spencer. This version is a
rewrite of the lowest levels of news to increase article processing speed,
reduce the cost of article expiration processing, and improve the reliability
of the news system through better locking, etc. The package was released to
the net in the autumn of 1987. For more information, see the paper ``News
Need Not Be Slow,'' published in the Winter 1987 Usenix Technical Conference
proceedings. This paper is also available from
ftp://ftp.cs.toronto.edu/doc/programming/c-news.*, and is recommended reading
for all news software programmers. The most recent version of C News is the
Sept 1994 ``Cleanup Release.'' C News can be obtained by anonymous ftp from
its official archive site, ftp.cs.toronto.edu:pub/c-news/c-news.tar.Z.

Our comments: C News is no longer maintained by anyone that we know of, other
than ourselves. However, after fixing the remaining bugs in the source, we
have not found the need for further maintenance. NNTPd from Brian Kantor and
Phil Lapsley is in the same state, but we are working on enhancements to the
source for access control and other functionality.

Another Usenet system, known as InterNetNews, or INN, was written by Rich
Salz (rsalz@uunet.uu.net). INN is designed to run on Unix hosts that have a
socket interface. It is optimized for larger hosts where most traffic uses
NNTP, but it does provide full UUCP support. INN is very fast, and since it
integrates NNTP, many people find it easier to administer only one package.
The package was publicly released on August 20, 1992. For more information,
see the paper ``InterNetNews: Usenet Transport for Internet Sites'' published
in the June 1992 Usenix Technical Conference Proceedings. INN can be obtained
from many places, including the 4.4BSD tape; its official archive site is
ftp.uu.net in the directory /networking/news/nntp/inn. Rich's last official
release was 1.4sec in Dec 1993.

Our comments: The original paper by Rich Salz about INN, where he proposed
the design of an alternative Usenet server, is a must-read for readers
interested in Usenet server software. So is the paper by the C News authors,
cited before it. Most of the issues that Rich Salz had with C News, as stated
in his paper, were very relevant at that time. Today, with the current
version of NNTPd and the incorporation of the message ID daemon and NOV,
these issues are no longer relevant, and the choice of C News+NNTPd versus
INN is now based more on the level of maintenance of source code, familiarity
and personal preferences than on core design factors.

In June 1995, David Barr began a series of unofficial releases of INN based
on 1.4sec, integrating various bug-fixes, enhancements and security patches.
His last release was 1.4unoff4, found in ftp://ftp.math.psu.edu:/pub/INN.
This site is also the home of contributed software for INN and other news
administration tools.

INN is now maintained by the Internet Software Consortium (inn@isc.org). The
official INN home is now http://www.isc.org/isc/ and the latest version
(1.7.2) can be obtained from ftp://ftp.isc.org/isc/inn/.

Our comments: The URL for the INN home page above is probably incorrect. Try
http://www.isc.org/products/INN/.

Towards the end of 1992, Geoff Collyer implemented NOV (News Overview): a
database that stores the important headers of all news articles as they
arrive. This is intended for use by the implementors of news readers to
provide fast article presentation by sorting and ``threading'' the article
headers. (Before NOV, newsreaders like trn, tin and nn came with their own
daemons and databases that used a nontrivial amount of system resources.) NOV
is fully supported by C News, INN and NNTP-t5. Most modern news readers use
NOV to get information for their threading and article menu presentation; use
of NOV by a newsreader is fairly easy, since NOV comes with sample
client-side threading code.

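To give a feel for what a news reader does with this data, here is a minimal
sketch that reads overview records and groups articles into threads. It
relies on the standard overview layout of one tab-separated line per article
(article number, Subject, From, Date, Message-ID, References, byte count,
line count). It is not the sample threading code shipped with NOV: the
function names, the rule of grouping by the first entry in References, and
the sample records are all our own assumptions for illustration.

```python
from collections import defaultdict

# Standard leading fields of an overview line, in order, tab-separated.
FIELDS = ("number", "subject", "sender", "date",
          "message_id", "references", "bytes", "lines")


def parse_overview_line(line):
    """Split one overview record into a dict of the eight standard fields."""
    values = line.rstrip("\r\n").split("\t")
    if len(values) < len(FIELDS):
        raise ValueError(f"short overview line: {line!r}")
    return dict(zip(FIELDS, values))


def group_by_thread(records):
    """Group article numbers under the first message-ID in References.

    An article with an empty References field starts its own thread.
    """
    threads = defaultdict(list)
    for rec in records:
        refs = rec["references"].split()
        root = refs[0] if refs else rec["message_id"]
        threads[root].append(int(rec["number"]))
    return dict(threads)


# Two made-up records: an original posting and one follow-up to it.
sample = [
    "101\tC News expiry\ta@example.com\t20 Aug 2002 10:00:00 GMT\t"
    "<1@example.com>\t\t1200\t30",
    "102\tRe: C News expiry\tb@example.com\t20 Aug 2002 11:00:00 GMT\t"
    "<2@example.com>\t<1@example.com>\t900\t12",
]
print(group_by_thread(parse_overview_line(line) for line in sample))
```

Because every header needed for the article menu sits on one line, the
reader never has to open the article files themselves, which is where the
speed-up over the older per-reader databases comes from.
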
...

Details on many other mail and news readers for MSDOS, Windows and OS/2
systems can be found in the FAQ posted to comp.os.msdos.mail-news.
<ftp://rtfm.mit.edu/pub/usenet/comp.os.msdos.mail-news/intro>
<ftp://rtfm.mit.edu/pub/usenet/comp.os.msdos.mail-news/software>
-----------------------------------------------------------------------------

13.1.2. Newsfeed management software

Gup, the Group Update Program, is a Unix mail-server program that lets a
remote site change its newsgroup subscription on its news feed without
requiring the intervention of the news administrator at the feed site. Gup
operates with the INN (and likely the C News) batching mechanisms. The news
administrators at the remote sites simply mail commands to gup to make
changes to their own site's subscription list. The mail interface is password
protected. Gup checks the requests for valid newsgroup names, patterns that
have no effect and so on. Gup's authors are Mark Delany (markd@mira.net.au)
and Andrew Herbert (andrew@mira.net.au). Its official FTP location is
ftp.mira.net.au:/unix/news/gup-0.4.tar.gz, but since that's not as well
connected as UUNET, people are strongly advised to obtain it from a mirror
site, e.g. ftp.uu.net:/networking/news/misc/gup-0.4.tar.gz.

dynafeed is a package from Looking Glass Software Limited that maintains a
.newsrc for every remote site and generates the batches for them. Remote
sites can use UUCP or run a program to change their .newsrc dynamically. It
comes with a program that the remote site can run to monitor readership in
newsgroups and dynamically update the feed list to match reader interest. The
goal of this is to get a feed that sends only exactly the groups currently
being read. dynafeed can be obtained from
ftp://ftp.clarinet.com/sources/dynafeed.tar.Z.
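
To make the idea of a per-site .newsrc concrete: such a file has one
newsgroup per line, with ':' after the group name marking a subscribed group
and '!' an unsubscribed one, followed by the article numbers already read. A
minimal sketch of deriving a feed list from such a file might look like the
following. This is not dynafeed's own code; the function name and the sample
data are assumptions made for illustration.

```python
def subscribed_groups(newsrc_text):
    """Return the newsgroups marked as subscribed (':') in .newsrc text."""
    groups = []
    for raw in newsrc_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # The first ':' or '!' ends the group name and gives the status.
        for i, ch in enumerate(line):
            if ch in ":!":
                if ch == ":":
                    groups.append(line[:i])
                break
    return groups


# Made-up .newsrc contents for one remote site.
sample_newsrc = """\
news.software.b: 1-1520,1533
news.admin.policy! 1-300
comp.os.linux.misc: 1-98
"""
print(subscribed_groups(sample_newsrc))
```

A feed site could turn the resulting list into the group patterns it batches
for that neighbour, so that unsubscribed groups are never sent.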
-----------------------------------------------------------------------------

13.1.3. News processing software

Software also exists to automatically archive Usenet newsgroups. The package
rkive, written by Kent Landfield (kent@sterling.com), can be configured to
archive news automatically based on different headers -- Archive-Name,
Volume-Issue, Chronological, Subject and External-Command to name a few. It
can be run in batch mode from the command line or from cron. It can also be
installed in the sys or newsfeeds file to process articles as they are
received. rkive supports local spool directories as well as NNTP based
access. rkive is available via FTP from ftp://ftp.sterling.com/rkive.

Newsclip is a programming language for writing news filtering programs, from
Looking Glass Software Limited, marketed by ClariNet Communications Corp. It
is C-like, and translates to C, so a C compiler is required. It has
data-types to represent the kinds of things found in article headers and
bodies. It can maintain databases of users, message-ids, patterns, subjects,
etc. These can be used to decide whether to ignore or select an article.
Newsclip can either operate as a standalone program or as part of rn. It is
free for non-commercial use and is available from
ftp://ftp.clarinet.com/sources/nc.tar.Z. Contact clari-info@clarinet.com with
a subject line of ``newsclip'' for more info.
-----------------------------------------------------------------------------

13.1.4. Commercial software

DNEWS is a commercial product from NetWin. DNEWS licenses are provided free
to educational institutions for non-profit use. With DNEWS, the news is
stored in a database so as not to overload the raw file system. DNEWS
supports 'sucking', where only groups which users read are pulled over from
the feeder site. DNEWS is currently known to run on VMS, Windows NT, Solaris,
SunOS, Unixware and HP/UX. DNEWS binaries are available by anonymous FTP from
ftp://ftp.std.com/ftp/vendors/netwin/dnews or from
http://world.std.com/~netwin/. DNEWS sources can be obtained on request; see
the file source.txt in the FTP area for more information.

Our comments: The information on DNEWS may be dated. We have seen DNEWS on
NetWin's own Website for quite a few years now. Check www.netwinsite.com.
Moreover, there are other commercial Usenet server software systems
available, including the one bundled with the Internet Information Server of
Microsoft Windows NT and the ones from iPlanet. And for carrier class
systems, there are many commercial Usenet routers available.
-----------------------------------------------------------------------------

13.1.5. Special note on ``notes'' and old versions of news

...

``B'' news software is currently considered obsolete. Unix sites joining the
Usenet should install C news or INN to ensure proper behavior and good
performance. Most old B news software had compiled-in limits on the number of
newsgroups and the number of articles per newsgroup; the increasing volume of
news means that B news software cannot reliably cope with a moderately-full
newsfeed.
-----------------------------------------------------------------------------

14. Documentation, information and further reading

This section fills in gaps which were hard to classify under any of the
previous chapters.
-----------------------------------------------------------------------------

14.1. The manpages

The following manpages are installed automatically when our integrated
software distribution is compiled and installed, listed here in no particular
order:

  * badexpiry: utility to look for articles with bad explicit Expiry headers

  * checkactive: utility to perform some sanity checks on the active file

  * cnewsdo: utility to perform some checks and then run C-News maintenance
    commands

  * controlperm: configuration file for controlling responses to Usenet
    control messages

  * expire: utility to expire old articles

  * explode: internal utility to convert a master batch file to ordinary
    batch files

  * inews: the program which forms the entry point for fresh postings to be
    injected into the Usenet system

  * mergeactive: utility to merge one site's newsgroups into another site's
    active file

  * mkhistory: utility to rebuild the news history file

  * news(5): description of Usenet news article file and batch file formats

  * newsaux: a collection of C-News utilities used by its own scripts and by
    the Usenet news administrator for various maintenance purposes

  * newsbatch: covers all the utilities and programs which are part of the
    news batching system of C-News

  * newsctl: describes the file formats and uses of all the files in $NEWSCTL
    other than the two key files, sys and active

  * newsdb: describes the key files and directories for news articles,
    including the structure of $NEWSARTS, the active file, the active.times
    file, and the history file

  * newsflag: utility to change the flag or type column of a newsgroup in the
    active file

  * newsmail: utility scripts used to send and receive newsfeeds by email.
    This is different from a mail-to-news gateway, since this is for
    communication between two Usenet news servers.

  * newsmaint: utility scripts used by the Usenet administrator to manage and
    maintain the C-News system

  * newsoverview(5): file formats for the NOV database

  * newsoverview(8): library functions of the NOV library and the utilities
    which use them

  * newssys: the important sys file of C-News

  * relaynews: the relaynews program of C-News

  * report: utility to generate and send email reports of errors and events
    from C-News scripts

  * rnews: receive news batches and queue them for processing

  * nntpd: the NNTP daemon

  * nntpxmit: the NNTP batch transmit program for outgoing push feeds
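
Several of the manpages above revolve around the active file, which holds
one line per newsgroup: the group name, the highest and lowest article
numbers, and a flag. To illustrate the kind of sanity check that can be run
on it, here is a small sketch. It does not reproduce what checkactive
actually tests; the specific checks, the set of accepted flags (y, n, m, x
and "=group" aliases) and the sample lines are our assumptions, and the
newsdb manpage remains the reference for the real format.

```python
VALID_FLAGS = {"y", "n", "m", "x"}


def check_active_line(line):
    """Return a list of problems found in one active-file line.

    Expected layout: name, highest article number, lowest article number
    and a flag, separated by whitespace.
    """
    fields = line.split()
    if len(fields) != 4:
        return [f"expected 4 fields, found {len(fields)}"]
    name, high, low, flag = fields
    problems = []
    if not (high.isdigit() and low.isdigit()):
        problems.append("article numbers must be decimal digits")
    elif int(low) > int(high) + 1:
        # low == high + 1 is assumed here to be how an empty group appears.
        problems.append("low mark is more than one past the high mark")
    if flag not in VALID_FLAGS and not flag.startswith("="):
        problems.append(f"unknown flag {flag!r}")
    return problems


# Made-up entries: one clean, one with bad marks, one with a bad flag.
sample_active = [
    "news.software.b 0000001520 0000001400 y",
    "news.admin.policy 0000000300 0000000450 m",
    "comp.test 0000000010 0000000011 q",
]
for entry in sample_active:
    print(entry.split()[0], check_active_line(entry))
```

An empty list means the line passed these particular checks; anything else
is worth a look before the next expire or newsflag run.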
-----------------------------------------------------------------------------

14.2. Papers, documents, articles

There are certain documents and published conference papers which are a
must-read for Usenet server administrators, both for their historical value
and for the insight they give into Usenet server architecture in general. We
list our chart-toppers here.
-----------------------------------------------------------------------------

14.2.1. The Usenix paper on C News

This very interesting paper has been mentioned in the section titled "Usenet
software: a historical perspective". It is titled ``News Need Not Be Slow'',
and is available from ftp://ftp.cs.toronto.edu/doc/programming/c-news.* or
from our Website
(http://www.starcomsoftware.com/proj/usenet/doc/c-news.{ps,pdf}).

It focuses on B News, analyses it for performance, and demonstrates how
specific changes in design and implementation can speed things up. It is
well-written, and is instructive in many areas independent of Usenet news.
-----------------------------------------------------------------------------

14.2.2. The Usenix paper on INN

This paper talks about the things that C News didn't address, and takes
Usenet news processing into the world of pure Internet connectivity. Its
author is Rich Salz, the author of INN, and the paper is titled
``InterNetNews: Usenet Transport for Internet Sites.'' This can be picked up
from ftp://ftp.uu.net/networking/news/nntp/inn/inn.usenix.ps.Z or from our
Website (http://www.starcomsoftware.com/proj/usenet/doc/inn.usenix.{ps,pdf}),
uncompressed. Be warned: this PostScript file is probably missing some
mandatory first-line tag like %!PS-Adobe-1.0, and some PostScript processors
can have problems with it. For instance, on our Linux boxes, ghostview can
display it, but kghostview can't, which is very strange.

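If a viewer rejects the file, one thing worth checking is whether it begins
with the "%!" marker that PostScript files conventionally start with. The
sketch below tests for that and, if it is absent, prepends the tag mentioned
above. We have not confirmed that this is what actually trips up any
particular viewer, so treat it as an experiment to run on a copy of the
file; the function name and the throwaway demonstration file are our own.

```python
import os
import tempfile


def ensure_ps_header(path, header=b"%!PS-Adobe-1.0\n"):
    """Prepend a PostScript header line if the file does not start with '%!'.

    Returns True if the file was rewritten, False if it was left alone.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(b"%!"):
        return False
    with open(path, "wb") as f:
        f.write(header + data)
    return True


# Demonstration on a throwaway file standing in for the real paper.
with tempfile.NamedTemporaryFile(suffix=".ps", delete=False) as tmp:
    tmp.write(b"/Times-Roman findfont 12 scalefont setfont\n")
demo_path = tmp.name

print(ensure_ps_header(demo_path))  # first call adds the header
print(ensure_ps_header(demo_path))  # second call finds it already there
os.remove(demo_path)
```
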
This paper analyses the world of Usenet servers with C News and NNTPd, in the
presence of multiple parallel feeds, and proceeds to build a case for a
powerful NNTP-optimised software architecture which will handle multiple
parallel incoming NNTP feeds efficiently. What later INN users sometimes
appear to miss when comparing C-News+NNTPd with INN is that INN's strengths
lie only in the situations which its author had specifically targeted, i.e.
multiple parallel incoming NNTP feeds. There is no clear superiority of one
system over the other in any other situation.
-----------------------------------------------------------------------------

14.2.3. The C News guide

This document is part of the C-News source, and is available in the
c-news/doc directory of the source tree. The makefile here uses troff and the
source files to generate guide.ps. This C News Guide is a very well-written
document and provides an introduction to the functioning of C News.
-----------------------------------------------------------------------------

14.3. O'Reilly's books on Usenet news

O'Reilly and Associates had an excellent book that can form the foundations
for understanding C-News and Usenet news in general, titled ``Managing UUCP
and Usenet,'' dated 1992. This was considered a bit dated because it did not
cover INN or the Internet protocols.

They have subsequently published a more recent book, titled ``Managing
Usenet,'' written by Henry Spencer, the co-author of C-News, and David
Lawrence, one of the most respected Usenet veterans and administrators today.
The book was published in 1998 and covers both C-News and INN.

We have a distinct preference for books published by O'Reilly; we usually
find them the best books on their subjects. We make no attempt to hide this
bias. We recommend both books. In fact, we believe that there is very little
of value in this HOWTO for someone who studies one of these books and then
peruses information on the Internet.
-----------------------------------------------------------------------------

14.4. Usenet-related RFCs

TO BE ADDED
-----------------------------------------------------------------------------

14.5. The source code

TO BE ADDED
-----------------------------------------------------------------------------

14.6. Usenet newsgroups

There are many discussion groups on the Usenet dedicated to the technical and
non-technical issues in managing a Usenet server and service. These include:

  * news.admin.technical: discusses technical issues about administering
    Usenet news

  * news.admin.policy: discusses policy issues about Usenet news

  * news.software.b: discusses C-News (no separate newsgroup was created
    after B-News gave way to C-News) source, configuration and bugs (if any)

MORE WILL BE ADDED LATER
-----------------------------------------------------------------------------

14.7. We

We, at Starcom Software, offer the services of our Usenet news team to
provide assistance to you by email, as a service to the Linux and Usenet
administrator community, on a best effort basis.

We also offer you an integrated source distribution of C News and NNTPd, as
discussed earlier in the section titled "Setting up C News + NNTPd". This
integrated source distribution fixes some bugs in the component packages it
includes, and it comes pre-configured with ready made configuration files
which allow all components to be compiled and installed on a Linux server in
a manner by which they can work together (e.g. key directory paths are
specified consistently across all components). This is available at
http://www.starcomsoftware.com/proj/usenet/src/

The URL http://www.starcomsoftware.com/proj/usenet/src/archives/ holds the
original sources of some of the software components we base our distribution
on. These include C News (c-news.tar.Z), NNTPd (nntp.1.5.12.1.tar.Z), and
Nestor (nestor.tar.Z). Other components, like pgpverify, are maintained by
their current maintainers and can be obtained from their respective sites.
Therefore, they are not included in our archives.

The URL http://www.starcomsoftware.com/proj/usenet/doc/ carries copies of
some of the important technical articles and Usenix papers on the subject of
the Usenet.

We will endeavour to answer all queries sent to usenet@starcomsoftware.com,
pertaining to the source distribution we have put together and its
configuration and maintenance, and also pertaining to general technical
issues related to running a Usenet news service off a Unix or Linux server.

We may not be in a position to assist with software components we are not
familiar with, e.g. Leafnode, or platforms we do not have access to, e.g. SGI
IRIX. Intel Linux will be supported as long as our group is alive; our entire
office runs on Linux servers and diskless Linux desktops.

You are not forced to be dependent on us, because we have neither proprietary
knowledge nor proprietary closed-source software. All the extensions we are
currently involved in with C-News and NNTPd will immediately be made
available to the Internet in freely redistributable source form.
-----------------------------------------------------------------------------

15. Wrapping up

15.1. Acknowledgements

This HOWTO is a by-product of many years of experience setting up and
managing Usenet news servers. We have learned a lot from those who have trod
the path ahead of us. Among them is the team of the ERNET Project
(Educational and Research Network), which brought Internet technology to
India's academic institutions in the early nineties. We especially remember
what we have learned from the SIGSys Group of the Department of Computer
Science of the Indian Institute of Technology, Mumbai. We have also benefited
enormously from the guidance we received from the Networking Group at the
NCST (National Centre for Software Technology) in Mumbai, especially from
Geetanjali Sampemane.

On a wider scale, our learning along the path of systems and networks started
with Unix, without which our appreciation of computer systems would have
remained very fragmented and superficial. Our insight into Unix came from our
``Village Elders'' in the Department of Computer Science of the IIT (Indian
Institute of Technology) at Mumbai, especially from ``Hattu,'' ``Sathe,'' and
``Sapre,'' none of whom are with the IIT today, and from Professor D. B.
Phatak and others, many of whom, luckily, are still with the Institute.

Coming to Starcom, all the members of Starcom Software who have worked on
various problems with networking, Linux, and Usenet news installations have
helped the authors in understanding what works and what doesn't. Without
their work, this HOWTO would have been a dry text book.

Hema Kariyappa co-authored the first couple of versions of this HOWTO,
starting with v2.0.
-----------------------------------------------------------------------------

15.2. Comments invited

Your comments and contributions are invited. We cannot possibly write all
sections of this HOWTO based on our knowledge alone. Please contribute all
you can, starting with minor corrections and bug fixes and going on to entire
sections and chapters. Your contributions will be acknowledged in the HOWTO.
-----------------------------------------------------------------------------

15.3. Copyright

Copyright (c) 2002 Starcom Software Private Limited, India

Please freely copy and distribute (sell or give away) this document in any
format. It is requested that corrections and/or comments be forwarded to the
document maintainer, reachable at usenet@starcomsoftware.com. When these
comments and contributions are incorporated into this document and released
for distribution in future versions of this HOWTO, the content of the
incorporated text will become the copyright of Starcom Software Private
Limited. By submitting your contributions to us, you implicitly agree to
these terms.

You may create a derivative work and distribute it provided that you:

 1. Send your derivative work (in the most suitable format such as SGML) to
    the LDP (Linux Documentation Project) or the like for posting on the
    Internet. If not the LDP, then let the LDP know where it is available.

 2. License the derivative work with this same license or use GPL. Include a
    copyright notice and at least a pointer to the license used.

 3. Give due credit to previous authors and major contributors. If you are
    considering making a derived work other than a translation, it is
    requested that you discuss your plans with the current maintainer.
-----------------------------------------------------------------------------

15.4. About Starcom Software Private Limited

Starcom (Starcom Software Private Limited, www.starcomsoftware.com) has been
building products and solutions using Linux and Web technology since 1996.
Our entire office runs on Linux, and we have built mission-critical solutions
for some of the top corporate entities in India and abroad. Our client list
includes arguably the world's largest securities depository (The National
Securities Depository Limited, India, www.nsdl.com), one of the world's top
five stock exchanges in terms of trading volumes (The National Stock Exchange
of India Limited, www.nseindia.com), and one of India's premier financial
institutions listed on the NYSE. In all these cases, we have introduced them
to Linux, and in many cases, we have built them their first mission-critical
business applications on Linux. Contact the authors or check the Starcom
Website for more information about the work we have done.

Notes

[1] This lack of a restart facility is something NNTP shares with its older
    cousin, SMTP, and we have often seen email messages getting stuck in a
    similar fashion over flaky data links. In many such networks which we
    manage for our clients, we have moved the inter-server mail transfer to
    Taylor UUCP, using UUCP over TCP.
[2] One of us did his first installation with BNews, actually, at the IIT
    Mumbai. Then we rapidly moved from there to CNews Shellscript Release,
    then CNews Performance Release, CNews Cleanup Release, and our current
    release has fixed some bugs in the latest Cleanup Release.