LDP/LDP/howto/docbook/Spam-Filtering-for-MX/Spam-Filtering-for-MX.xml

784 lines
22 KiB
XML

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
<!ENTITY background SYSTEM "chapter-background.xml">
<!ENTITY techniques SYSTEM "chapter-techniques.xml">
<!ENTITY consider SYSTEM "chapter-considerations.xml">
<!ENTITY qanda SYSTEM "chapter-qanda.xml">
<!ENTITY exim SYSTEM "impl-exim.xml">
<!ENTITY glossary SYSTEM "glossary.xml">
<!ENTITY gpl SYSTEM "gpl.xml">
]>
<!-- "/usr/share/sgml/docbook/dtd/4.2/docbookx.dtd" -->
<!-- "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" -->
<book>
<bookinfo>
<title>Spam Filtering for Mail Exchangers</title>
<subtitle>
How to reject junk mail in incoming SMTP transactions.
</subtitle>
<authorgroup>
<author>
<firstname>Tor</firstname>
<surname>Slettnes</surname>
<affiliation>
<address><email>tor@slett.net</email></address>
</affiliation>
</author>
<editor>
<firstname>Joost</firstname>
<surname>De Cock</surname>
<affiliation>
<address><email>joost.decock (at) astrid.be</email></address>
</affiliation>
<contrib>Technical Review</contrib>
</editor>
<editor>
<firstname>Devdas</firstname>
<surname>Bhagat</surname>
<affiliation>
<address><email>devdas (at) dvb.homelinux.org</email></address>
</affiliation>
<contrib>Technical Review</contrib>
</editor>
<editor>
<firstname>Tom</firstname>
<surname>Wright</surname>
<affiliation>
<address><email>tom (at) maladmin.com</email></address>
</affiliation>
<contrib>Language Review</contrib>
</editor>
</authorgroup>
<edition>Version 1.0 -- Release</edition>
<keywordset>
<keyword>anti-spam</keyword>
<keyword>anti-virus</keyword>
<keyword>bogus virus warnings</keyword>
<keyword>collateral spam</keyword>
<keyword>delivery status notification</keyword>
<keyword>dsn</keyword>
<keyword>exim</keyword>
<keyword>exim4</keyword>
<keyword>exiscan</keyword>
<keyword>exiscan-acl</keyword>
<keyword>greylisting</keyword>
<keyword>junk mail</keyword>
<keyword>sa-exim</keyword>
<keyword>smtp</keyword>
<keyword>spam</keyword>
<keyword>spamassassin</keyword>
<keyword>teergrubing</keyword>
<keyword>transaction delay</keyword>
</keywordset>
</bookinfo>
<preface>
<?dbhtml filename="introduction.html"?>
<title>Introduction</title>
<section id="purpose">
<?dbhtml filename="intro-purpose.html"?>
<title>Purpose of this Document</title>
<para>
This document discusses various highly effective and low
impact ways to weed out spam and malware during incoming SMTP
transactions in a mail exchanger (MX host), with an added
emphasis on eliminating so-called <xref linkend="colspam"/>.
</para>
<para>
The discussions are conceptual in nature, but a sample
implementation is provided using the Exim MTA and other
specific software tools. Miscellaneous other bigotry is
expressed throughout.
</para>
</section>
<section id="audience">
<?dbhtml filename="intro-audience.html"?>
<title>Audience</title>
<para>
The intended audience is mail system administrators, who are
already familiar with such acronyms as SMTP, MTA/MDA/MUA,
DNS/rDNS, and MX records. If you are an end user who is
looking for a spam filtering solution for your mail reader
(such as Evolution, Thunderbird, Mail.app or Outlook Express),
this document is <emphasis>not</emphasis> for you; but you may
wish to point the mail system administrator for your domain
(company, school, ISP...) to its existence.
</para>
</section>
<section id="updates">
<?dbhtml filename="intro-updates.html"?>
<title>New versions of this document</title>
<para>
The newest version of this document can be found at
<ulink url="http://slett.net/spam-filtering-for-mx/" />.
Please check back periodically for corrections and additions.
</para>
</section>
<section id="history" xreflabel="Revision History">
<?dbhtml filename="intro-history.html"?>
<title>Revision History</title>
<para>
<revhistory id="revhistory">
<revision>
<revnumber>1.0</revnumber>
<date>2004-09-08</date>
<authorinitials>TS</authorinitials>
<revremark>
First public release.
</revremark>
</revision>
<revision>
<revnumber>0.18</revnumber>
<date>2004-09-07</date>
<authorinitials>TS</authorinitials>
<revremark>
Incorporated second language review from Tom Wright.
</revremark>
</revision>
<revision>
<revnumber>0.17</revnumber>
<date>2004-09-06</date>
<authorinitials>TS</authorinitials>
<revremark>
Incorporated language review from Tom Wright.
</revremark>
</revision>
<revision>
<revnumber>0.16</revnumber>
<date>2004-08-13</date>
<authorinitials>TS</authorinitials>
<revremark>
Incorporated third round of changes from Devdas Bhagat.
</revremark>
</revision>
<revision>
<revnumber>0.15</revnumber>
<date>2004-08-04</date>
<authorinitials>TS</authorinitials>
<revremark>
Incorporated second round of changes from technical
review by Devdas Bhagat.
</revremark>
</revision>
<revision>
<revnumber>0.14</revnumber>
<date>2004-08-01</date>
<authorinitials>TS</authorinitials>
<revremark>
Incorporated technical review comments/corrections from
Devdas Bhagat.
</revremark>
</revision>
<revision>
<revnumber>0.13</revnumber>
<date>2004-08-01</date>
<authorinitials>TS</authorinitials>
<revremark>
Incorporated technical review from Joost De Cock.
</revremark>
</revision>
<revision>
<revnumber>0.12</revnumber>
<date>2004-07-27</date>
<authorinitials>TS</authorinitials>
<revremark>
Replaced "A Note on Controversies" with a more
opinionated "The Good, The Bad, the Ugly" section. Also
rewrote text on DNS blocklists. Some corrections from
Seymour J. Metz.
</revremark>
</revision>
<revision>
<revnumber>0.11</revnumber>
<date>2004-07-19</date>
<authorinitials>TS</authorinitials>
<revremark>
Incorporated comments from Rick Stewart on RMX++.
Swapped order of "Techniques" and "Considerations".
Minor typographic fixes in Exim implementation.
</revremark>
</revision>
<revision>
<revnumber>0.10</revnumber>
<date>2004-07-16</date>
<authorinitials>TS</authorinitials>
<revremark>
Added &lt;?dbhtml..?&gt; tags to control generated HTML
filenames - should prevent broken links from google etc.
Swapped order of "Forwarded Mail" and "User Settings".
Correction from Tony Finch on Bayesian filters;
commented out check for Subject:, Date:, and Message-ID:
headers per Johannes Berg; processing time subtracted
from SMTP delays per suggestion from Alan Flavell.
</revremark>
</revision>
<revision>
<revnumber>0.09</revnumber>
<date>2004-07-13</date>
<authorinitials>TS</authorinitials>
<revremark>
Elaborated on problems with envelope sender signatures
and mailing list servers, and a scheme to make such
signatures optional per host/domain for each user.
Moved "Considerations" section out as a separate
chapter; added subsections "Blocking Access to other
SMTP Server", "User Settings" and "Forwarded Mail".
Incorporated Matthew Byng-Maddick's comments on the
mechanism used to generate these signatures, Chris
Edwards' comments on sender callout verification, and
Hadmut Danisch's comments on RMX++ and other topics.
Changed license terms (GPL instead of GFDL).
</revremark>
</revision>
<revision>
<revnumber>0.08</revnumber>
<date>2004-07-09</date>
<authorinitials>TS</authorinitials>
<revremark>
Additional work on Exim implementation: Added section on
per-user settings and data for SpamAssassin per
suggestion from Tollef Fog Heen. Added SPF checks via
Exiscan-ACL. Corrections from Sam Michaels.
</revremark>
</revision>
<revision>
<revnumber>0.07</revnumber>
<date>2004-07-08</date>
<authorinitials>TS</authorinitials>
<revremark>
Made corrections to the Exim Envelope Sender Signatures
examples, and added support for users to "opt in" to
this feature, per suggestion from Christian Balzer.
</revremark>
</revision>
<revision>
<revnumber>0.06</revnumber>
<date>2004-07-08</date>
<authorinitials>TS</authorinitials>
<revremark>
Incorporated Exim/MySQL greylisting implementation and
various corrections from Johannes Berg. Moved "Sender
Authorization Schemes" up two levels to become a
top-level section in the Techniques chapter. Added
greylisting for NULL empty envelope senders after DATA.
Added SpamAssassin configuration to match Exim examples.
Incorporated corrections from Dominik Ruff, Mark
Valites, "Andrew" at Supernews.
</revremark>
</revision>
<revision>
<revnumber>0.05</revnumber>
<date>2004-07-07</date>
<authorinitials>TS</authorinitials>
<revremark>
Eliminated the (empty) Sendmail implementation for now,
to move ahead with the final review process.
</revremark>
</revision>
<revision>
<revnumber>0.04</revnumber>
<date>2004-07-06</date>
<authorinitials>TS</authorinitials>
<revremark>
Reorganized layout a little: Combined "SMTP-Time
Filtering", "Introduction to SMTP", and "Considerations"
into a single "Background" chapter. Split the previous
"Building ACLs" section in the Exim implementation into
top-level sections. Added alternate sender
authorization schemes to SPF: Microsoft Caller-ID for
E-Mail and RMX++. Incorporated comments from Ken
Raeburn.
</revremark>
</revision>
<revision>
<revnumber>0.03</revnumber>
<date>2004-07-02</date>
<authorinitials>TS</authorinitials>
<revremark>
Added discussion on Multiple Incoming Mail Exchangers;
minor corrections related to Sender Callout Verification.
</revremark>
</revision>
<revision>
<revnumber>0.02</revnumber>
<date>2004-06-30</date>
<authorinitials>TS</authorinitials>
<revremark>Added Exim implementation as an appendix</revremark>
</revision>
<revision>
<revnumber>0.01</revnumber>
<date>2004-06-16</date>
<authorinitials>TS</authorinitials>
<revremark>Initial draft.</revremark>
</revision>
</revhistory>
</para>
</section>
<!-- Give credit where credit is due...very important -->
<section id="credits">
<?dbhtml filename="intro-credits.html"?>
<title>Credits</title>
<para>
A number of people have provided feedback, corrections, and
contributions, as indicated in the <xref linkend="history"/>.
Thank you!
</para>
<para>
The following are <emphasis>some</emphasis> of the people and
groups that have provided tools and ideas to this document, in
no particular order:
</para>
<!-- Please scramble addresses; help prevent spam/email harvesting -->
<itemizedlist>
<listitem>
<para>
Evan Harris <email>eharris (at) puremagic.com</email>, who
conceived and wrote a white paper on <link
linkend="greylisting">greylisting</link>.
</para>
</listitem>
<listitem>
<para>
Axel Zinser <email>fifi (at) hiss.org</email>, who
apparently conceived of
<link linkend="smtpdelays">teergrubing</link>.
</para>
</listitem>
<listitem>
<para>
The developers of
<ulink url="http://spf.pobox.com/">SPF</ulink>,
<ulink url="http://www.danisch.de/work/security/antispam.html">RMX++</ulink>,
and other
<xref linkend="senderauth"/>.
</para>
</listitem>
<listitem>
<para>
The creators and maintainers of distributed, collaborative
junk mail signature repositories, such as
<ulink url="http://rhyolite.com/anti-spam/dcc/">DCC</ulink>,
<ulink url="http://razor.sf.net/">Razor</ulink>, and
<ulink url="http://pyzor.sf.net/">Pyzor</ulink>.
</para>
</listitem>
<listitem>
<para>
The creators and maintainers of various DNS blocklists and
whitelists, such as
<ulink url="http://www.spamcop.net/">SpamCop</ulink>,
<ulink url="http://www.spamhaus.org/">SpamHaus</ulink>,
<ulink url="http://www.sorbs.net/">SORBS</ulink>,
<ulink url="http://cbl.abuseat.org/">CBL</ulink>, and
<ulink url="http://moensted.dk/spam/">many others</ulink>.
</para>
</listitem>
<listitem>
<para>
The
<ulink url="http://www.spamassassin.org/full/3.0.x/dist/CREDITS">developers</ulink>
of
<ulink url="http://www.spamassassin.org/">SpamAssassin</ulink>,
who have taken giant leaps forward in developing and
integrating various spam filtering techniques into a
sophisticated heuristics-based tool.
</para>
</listitem>
<listitem>
<para>
Tim Jackson <email>tim (at) timj.co.uk</email> collated
and maintains a list of
<link linkend="bogusviruswarning">bogus virus warnings</link>
for use with SpamAssassin.
</para>
</listitem>
<listitem>
<para>
A lot of smart people who developed the excellent <link
linkend="exim">Exim</link> MTA, including: Philip
Hazel <email>ph10 (at) cus.cam.ac.uk</email>, the
maintainer; Tom Kistner <email>tom (at)
duncanthrax.net</email>, who wrote the Exiscan-ACL patch
for SMTP-time content checks; Andreas Metzler
<email>ametzler (at) debian.org</email>, who did a really
good job of building the Exim 4 Debian packages.
</para>
</listitem>
<listitem>
<para>
Many, many others who contributed ideas, software, and
other techniques to counter the spam epidemic.
</para>
</listitem>
<listitem>
<para>
You, for reading this document and your interest in
reclaiming e-mail as a useful communication tool
</para>
</listitem>
</itemizedlist>
</section>
<!-- Feedback -->
<section id="feedback">
<?dbhtml filename="intro-feedback.html"?>
<title>Feedback</title>
<para>
I would love to hear of your experiences with the techniques
outlined in this document, and of any other comments,
questions, suggestions, and/or contributions you may have.
Please send me an e-mail at: <email>tor@slett.net</email>.
</para>
<para>
If you are able to provide implementations for other <xref
linkend="mta"/>s, such as Sendmail or Postfix, please let me
know.
</para>
</section>
<!-- Translations -->
<section id="translations">
<?dbhtml filename="intro-translations.html"?>
<title>Translations</title>
<para>
No translations exist yet. If you would like to create one,
please let me know.
</para>
</section>
<section id="copyright">
<?dbhtml filename="intro-copyright.html"?>
<title>Copyright information</title>
<para>
Copyright &copy; 2004 Tor Slettnes.
</para>
<para>
This document is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
</para>
<para>
This document is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE. See the GNU General Public License for more details.
A copy of the license is included in <xref linkend="gpl"/>.
</para>
<para>
Read <ulink url="http://www.fsf.org/gnu/manifesto.html">The
GNU Manifesto </ulink> if you want to know why this license
was chosen for this book.
</para>
<para>
The logos, trademarks and symbols used in this book are the
properties of their respective owners.
</para>
</section>
<section id="prerequisites">
<?dbhtml filename="intro-prerequisites.html"?>
<title>What do you need?</title>
<para>
The techniques described in this document predicate system
access to the inbound <xref linkend="mx"/>(s) for the internet
domain where you receive e-mail. Essentially, you need to be
able to install software and/or modify the configuration files
for the <xref linkend="mta"/> on that system.
</para>
<para>
Although the discussions in this document are conceptual in
nature and can be incorporated into a number of different
MTAs, a sample Exim 4 implementation is provided. This
implementation, in turn, incorporates other software tools,
such as <ulink
url="http://www.spamassassin.org/">SpamAssassin</ulink>.
See <xref linkend="exim"/> for details.
</para>
</section>
<section id="conventions">
<?dbhtml filename="intro-conventions.html"?>
<title>Conventions used in this document</title>
<para>
The following typographic and usage conventions occur in this text:
</para>
<table id="conventiontable" frame="all">
<title>Typographic and usage conventions</title>
<tgroup cols="2" align="left" colsep="1" rowsep="1">
<thead>
<row>
<entry>Text type</entry>
<entry>Meaning</entry>
</row>
</thead>
<tbody>
<row>
<entry><quote>Quoted text</quote></entry>
<entry>Quotes from people, quoted computer output.</entry>
</row>
<row>
<entry><screen>terminal view</screen></entry>
<entry>
Literal computer input and output captured from
the terminal, usually rendered with a light grey
background.
</entry>
</row>
<row>
<entry><command>command</command></entry>
<entry>
Name of a command that can be entered on the command
line.
</entry>
</row>
<row>
<entry><varname>VARIABLE</varname></entry>
<entry>
Name of a variable or pointer to content of a
variable, as in <varname>$VARNAME</varname>.
</entry>
</row>
<row>
<entry><option>option</option></entry>
<entry>
Option to a command, as in <quote>the
<option>-a</option> option to the
<command>ls</command> command</quote>.
</entry>
</row>
<row>
<entry><parameter>argument</parameter></entry>
<entry>
Argument to a command, as in
<quote>read <command>man
<parameter>ls</parameter></command>
</quote>.
</entry>
</row>
<row>
<entry>
<cmdsynopsis>
<command>command
<option>options</option>
<parameter>arguments</parameter>
</command>
</cmdsynopsis>
</entry>
<entry>
Command synopsis or general usage, on a separated
line.
</entry>
</row>
<row>
<entry><filename>filename</filename></entry>
<entry>
Name of a file or directory, for example <quote>Change
to the <filename>/usr/bin</filename>
directory.</quote>
</entry>
</row>
<row>
<entry><keycap>Key</keycap></entry>
<entry>
Keys to hit on the keyboard, such as <quote>type
<keycap>Q</keycap> to quit</quote>.
</entry>
</row>
<row>
<entry><guibutton>Button</guibutton></entry>
<entry>
Graphical button to click, like the
<guibutton>OK</guibutton> button.
</entry>
</row>
<row>
<entry>
<menuchoice>
<guimenu>Menu</guimenu>
<guimenuitem>Choice</guimenuitem>
</menuchoice>
</entry>
<entry>
Choice to select from a graphical menu, for instance:
<quote>Select
<menuchoice><guimenu>Help</guimenu><guimenuitem>About
Mozilla</guimenuitem> </menuchoice> in your
browser.</quote></entry>
</row>
<row>
<entry>
<emphasis>Terminology</emphasis></entry>
<entry>
Important term or concept:
<quote>The Linux <emphasis>kernel</emphasis>
is the heart of the system.</quote>
</entry>
</row>
<row>
<entry>
See <xref linkend="glossary" />
</entry>
<entry>
link to related subject within this guide.
</entry>
</row>
<row>
<entry>
<ulink
url="http://slett.net/gallery/2003-05/IMG_1655">The author</ulink>
</entry>
<entry>Clickable link to an external web resource.</entry>
</row>
</tbody>
</tgroup>
</table>
</section>
<section id="organization">
<?dbhtml filename="intro-organization.html"?>
<title>Organization of this document</title>
<para>
This document is organized into the following chapters:
</para>
<variablelist>
<varlistentry>
<term><xref linkend="background"/></term>
<listitem>
<para>
General introduction to SMTP time filtering, and to SMTP.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><xref linkend="techniques"/></term>
<listitem>
<para>
Various ways to block junk mail in an SMTP transaction.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><xref linkend="considerations"/></term>
<listitem>
<para>
Issues that pertain to transaction time filtering.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><xref linkend="qanda"/></term>
<listitem>
<para>
My attempt at anticipating your questions, and then
answering them.
</para>
</listitem>
</varlistentry>
</variablelist>
<para>
A sample Exim implementation is provided in <xref
linkend="exim"/>.
</para>
</section>
</preface>
<toc></toc>
&background;
&techniques;
&consider;
&qanda;
&exim;
&glossary;
&gpl;
</book>