<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
|
<HTML
|
|
><HEAD
|
|
><TITLE
|
|
>Message data checks</TITLE
|
|
><META
|
|
NAME="GENERATOR"
|
|
CONTENT="Modular DocBook HTML Stylesheet Version 1.7"><LINK
|
|
REL="HOME"
|
|
TITLE="Spam Filtering for Mail Exchangers"
|
|
HREF="index.html"><LINK
|
|
REL="UP"
|
|
TITLE="Techniques"
|
|
HREF="techniques.html"><LINK
|
|
REL="PREVIOUS"
|
|
TITLE="Sender Authorization Schemes"
|
|
HREF="senderauth.html"><LINK
|
|
REL="NEXT"
|
|
TITLE="Blocking Collateral Spam"
|
|
HREF="collateral.html"></HEAD
|
|
><BODY
|
|
CLASS="section"
|
|
BGCOLOR="#FFFFFF"
|
|
TEXT="#000000"
|
|
LINK="#0000FF"
|
|
VLINK="#840084"
|
|
ALINK="#0000FF"
|
|
><DIV
|
|
CLASS="NAVHEADER"
|
|
><TABLE
|
|
SUMMARY="Header navigation table"
|
|
WIDTH="100%"
|
|
BORDER="0"
|
|
CELLPADDING="0"
|
|
CELLSPACING="0"
|
|
><TR
|
|
><TH
|
|
COLSPAN="3"
|
|
ALIGN="center"
|
|
>Spam Filtering for Mail Exchangers: </TH
|
|
></TR
|
|
><TR
|
|
><TD
|
|
WIDTH="10%"
|
|
ALIGN="left"
|
|
VALIGN="bottom"
|
|
><A
|
|
HREF="senderauth.html"
|
|
ACCESSKEY="P"
|
|
>Prev</A
|
|
></TD
|
|
><TD
|
|
WIDTH="80%"
|
|
ALIGN="center"
|
|
VALIGN="bottom"
|
|
>Chapter 2. Techniques</TD
|
|
><TD
|
|
WIDTH="10%"
|
|
ALIGN="right"
|
|
VALIGN="bottom"
|
|
><A
|
|
HREF="collateral.html"
|
|
ACCESSKEY="N"
|
|
>Next</A
|
|
></TD
|
|
></TR
|
|
></TABLE
|
|
><HR
|
|
ALIGN="LEFT"
|
|
WIDTH="100%"></DIV
|
|
><DIV
|
|
CLASS="section"
|
|
><H1
|
|
CLASS="section"
|
|
><A
|
|
NAME="datachecks"
|
|
></A
|
|
>2.6. Message data checks</H1
|
|
><P
|
|
> The time has come to look at the content of the message itself.
|
|
This is what conventional spam and virus scanners do, as they
|
|
normally operate on the message after it has been accepted.
|
|
However, in our case, we perform these checks
|
|
<EM
|
|
>before</EM
|
|
> issuing the final
|
|
<B
|
|
CLASS="command"
|
|
>250</B
|
|
> response, so that we have a chance to
|
|
reject the mail on the spot rather than later generating <A
|
|
HREF="gloss.html#colspam"
|
|
><I
|
|
CLASS="glossterm"
|
|
>Collateral Spam</I
|
|
></A
|
|
>.
|
|
</P
|
|
><P
|
|
> If your incoming mail exchangers are very busy (i.e. large site,
|
|
few machines), you may find that performing some or all of these
|
|
checks directly in the mail exchanger is too costly. In
|
|
particular, running <A
|
|
HREF="datachecks.html#virusscanners"
|
|
>Virus Scanners</A
|
|
> and <A
|
|
HREF="datachecks.html#spamscanners"
|
|
>Spam Scanners</A
|
|
> takes up a fair amount of CPU
time.
|
|
</P
|
|
><P
|
|
> If so, you will want to set up dedicated machines for these
|
|
scanning operations. Most server-side anti-spam and anti-virus
|
|
software can be invoked over the network, i.e. from your mail
|
|
exchanger. More on this in the following chapters, where we
|
|
discuss implementation for the various MTAs.
|
|
</P
|
|
><DIV
|
|
CLASS="section"
|
|
><H2
|
|
CLASS="section"
|
|
><A
|
|
NAME="headerchecks"
|
|
></A
|
|
>2.6.1. Header checks</H2
|
|
><DIV
|
|
CLASS="section"
|
|
><H3
|
|
CLASS="section"
|
|
><A
|
|
NAME="headersmissing"
|
|
></A
|
|
>2.6.1.1. Missing Header Lines</H3
|
|
><P
|
|
> <A
|
|
HREF="http://www.ietf.org/rfc/rfc2822.txt"
|
|
TARGET="_top"
|
|
>RFC
|
|
2822</A
|
|
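> strictly requires only the Date: and From: header lines, and
says that a message
<EM
>should</EM
> carry a Message-ID:. In practice, mail sent by legitimate
software contains at least the following header lines: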
|
|
|
|
<TABLE
|
|
BORDER="0"
|
|
BGCOLOR="#E0E0E0"
|
|
WIDTH="100%"
|
|
><TR
|
|
><TD
|
|
><FONT
|
|
COLOR="#000000"
|
|
><PRE
|
|
CLASS="screen"
|
|
> From: ...
|
|
To: ...
|
|
Subject: ...
|
|
Message-ID: ...
|
|
Date: ...
|
|
</PRE
|
|
></FONT
|
|
></TD
|
|
></TR
|
|
></TABLE
|
|
>
|
|
</P
|
|
><P
|
|
> The absence of any of these lines suggests that the message
was not generated by a mainstream <A
|
|
HREF="gloss.html#mua"
|
|
><I
|
|
CLASS="glossterm"
|
|
>Mail User Agent</I
|
|
></A
|
|
>, and
|
|
that it is probably junk
|
|
<A
|
|
NAME="AEN1045"
|
|
HREF="#FTN.AEN1045"
|
|
><SPAN
|
|
CLASS="footnote"
|
|
>[1]</SPAN
|
|
></A
|
|
>.
|
|
</P
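><P
> A minimal sketch of this check, written in Python purely for
illustration. The function name and the way the envelope sender is
passed in are assumptions, not part of any particular MTA. The
exception for bounces follows the footnote above.
</P
><PRE
CLASS="programlisting"
>
```python
from email import message_from_string

# Header lines that the text above expects in every message.
EXPECTED_HEADERS = ("From", "To", "Subject", "Message-ID", "Date")


def missing_headers(raw_message, envelope_sender):
    """Return the expected header names that are absent from raw_message."""
    msg = message_from_string(raw_message)
    missing = [name for name in EXPECTED_HEADERS if msg.get(name) is None]
    # Bounces (empty envelope sender) from some specialized MTAs
    # legitimately lack Message-ID:, so do not count it against them.
    if envelope_sender == "" and "Message-ID" in missing:
        missing.remove("Message-ID")
    return missing


sample = (
    "From: alice@example.com\r\n"
    "To: bob@example.net\r\n"
    "Subject: hello\r\n"
    "\r\n"
    "body\r\n"
)
print(missing_headers(sample, "alice@example.com"))
```
</PRE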
|
|
></DIV
|
|
><DIV
|
|
CLASS="section"
|
|
><H3
|
|
CLASS="section"
|
|
><A
|
|
NAME="headersyntax"
|
|
></A
|
|
>2.6.1.2. Header Address Syntax Check</H3
|
|
><P
|
|
> Addresses presented in the message header (i.e. the
|
|
<B
|
|
CLASS="command"
|
|
>To:</B
|
|
>, <B
|
|
CLASS="command"
|
|
>Cc:</B
|
|
>,
|
|
<B
|
|
CLASS="command"
|
|
>From:</B
|
|
> ... fields) should be syntactically
|
|
valid. Enough said.
|
|
</P
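><P
> For the record, a rough sketch of what such a check might look
like in Python. The pattern below is a deliberately simplified
assumption, not the full address grammar from RFC 2822, and the
function name is made up for this example.
</P
><PRE
CLASS="programlisting"
>
```python
import re
from email.utils import getaddresses

# Simplified addr-spec: local-part@domain with at least one dot in the
# domain. An assumption for illustration, not the complete RFC grammar.
ADDR_SPEC = re.compile(
    r"^[A-Za-z0-9.!#$%'*+/=?^_`{|}~-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$"
)


def invalid_header_addresses(header_values):
    """Return the addresses in header_values that fail the syntax check.

    header_values is a list of raw To:, Cc: or From: field bodies.
    """
    bad = []
    for _display_name, address in getaddresses(header_values):
        if not ADDR_SPEC.match(address):
            bad.append(address)
    return bad


print(invalid_header_addresses(["alice@example.com", "bob-at-example.net"]))
```
</PRE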
|
|
></DIV
|
|
><DIV
|
|
CLASS="section"
|
|
><H3
|
|
CLASS="section"
|
|
><A
|
|
NAME="headeraddress"
|
|
></A
|
|
>2.6.1.3. Simple Header Address Validation</H3
|
|
><P
|
|
> For each address in the message header:
|
|
</P
|
|
><P
|
|
></P
|
|
><UL
|
|
><LI
|
|
><P
|
|
> If the address is local, is the <EM
|
|
>local
|
|
part</EM
|
|
> (before the @ sign) a valid mailbox?
|
|
</P
|
|
></LI
|
|
><LI
|
|
><P
|
|
> If the address is remote, does the <EM
|
|
>domain
|
|
part</EM
|
|
> (after the @ sign) exist?
|
|
</P
|
|
></LI
|
|
></UL
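><P
> The two questions above could be sketched as follows. The
domain and mailbox sets are hypothetical placeholders, and the
default DNS test is a simplification: it only asks whether the
name resolves at all, whereas a real MTA would normally look for
MX records first.
</P
><PRE
CLASS="programlisting"
>
```python
import socket

LOCAL_DOMAINS = {"example.com"}             # hypothetical local domains
LOCAL_MAILBOXES = {"alice", "postmaster"}   # hypothetical mailbox names


def resolves(domain):
    """Rough existence test: does the name resolve to any address?"""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except socket.gaierror:
        return False


def header_address_ok(address, domain_exists=resolves):
    """Apply the local-mailbox or remote-domain test to one address."""
    local_part, _, domain = address.rpartition("@")
    if domain.lower() in LOCAL_DOMAINS:
        # Local address: the local part must name a valid mailbox.
        return local_part.lower() in LOCAL_MAILBOXES
    # Remote address: the domain part must exist.
    return domain_exists(domain)
```
</PRE
><P
> The lookup function is passed in as a parameter so that the
check can be exercised without touching the network.
</P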
|
|
></DIV
|
|
><DIV
|
|
CLASS="section"
|
|
><H3
|
|
CLASS="section"
|
|
><A
|
|
NAME="headercallout"
|
|
></A
|
|
>2.6.1.4. Header Address Callout Verification</H3
|
|
><P
|
|
> This works similarly to <A
|
|
HREF="smtpchecks.html#callback"
|
|
>Sender Callout Verification</A
|
|
> and <A
|
|
HREF="smtpchecks.html#callforward"
|
|
>Recipient Callout Verification</A
|
|
>. Each remote header address is
|
|
verified by calling the primary MX for the corresponding
|
|
domain to determine if a <A
|
|
HREF="gloss.html#dsn"
|
|
><I
|
|
CLASS="glossterm"
|
|
>Delivery Status Notification</I
|
|
></A
|
|
> would be
|
|
accepted.
|
|
</P
|
|
></DIV
|
|
></DIV
|
|
><DIV
|
|
CLASS="section"
|
|
><H2
|
|
CLASS="section"
|
|
><A
|
|
NAME="jmsr"
|
|
></A
|
|
>2.6.2. Junk Mail Signature Repositories</H2
|
|
><P
|
|
> One trait of junk mail is that it is sent to a large number of
|
|
addresses. If 50 other recipients have already flagged a
|
|
particular message as spam, why couldn't you use this fact to
|
|
decide whether or not to accept the message when it is
|
|
delivered to you? Better yet, why not set up <A
|
|
HREF="gloss.html#spamtrap"
|
|
><I
|
|
CLASS="glossterm"
|
|
>Spam Trap</I
|
|
></A
|
|
>s that feed a public pool of known spam?
|
|
</P
|
|
><P
|
|
> I am glad you asked. As it turns out, such pools do exist:
|
|
</P
|
|
><P
|
|
></P
|
|
><UL
|
|
><LI
|
|
><P
|
|
> <A
|
|
HREF="http://razor.sf.net/"
|
|
TARGET="_top"
|
|
>Razor</A
|
|
>
|
|
</P
|
|
></LI
|
|
><LI
|
|
><P
|
|
> <A
|
|
HREF="http://pyzor.sf.net/"
|
|
TARGET="_top"
|
|
>Pyzor</A
|
|
>
|
|
</P
|
|
></LI
|
|
><LI
|
|
><P
|
|
> <A
|
|
HREF="http://rhyolite.com/anti-spam/dcc/"
|
|
TARGET="_top"
|
|
>Distributed
|
|
Checksum Clearinghouse (DCC)</A
|
|
>
|
|
</P
|
|
></LI
|
|
></UL
|
|
><P
|
|
> These tools have progressed beyond simple signature checks
|
|
that only trigger if you receive an identical copy of a
|
|
message that is known to be junk mail. Rather, they evaluate
|
|
common patterns, to account for slight variations in the
|
|
message header and body.
|
|
</P
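><P
> To illustrate the general idea of a signature that survives
slight variations, here is a toy sketch. It is emphatically
<EM
>not</EM
> how Razor, Pyzor or DCC compute their signatures; the
normalization steps and the function name are assumptions chosen
for this example only.
</P
><PRE
CLASS="programlisting"
>
```python
import hashlib
import re


def fuzzy_signature(body):
    """Digest of the body after masking the parts spammers tend to vary."""
    text = body.lower()
    text = re.sub(r"[0-9]+", "#", text)     # mask numbers such as tracking IDs
    text = re.sub(r"[^a-z#]+", " ", text)   # collapse punctuation and whitespace
    return hashlib.sha1(text.strip().encode("ascii")).hexdigest()


# Two variants of the same message yield the same signature.
first = fuzzy_signature("Buy now!  Order 12345")
second = fuzzy_signature("buy NOW order 987")
print(first == second)
```
</PRE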
|
|
></DIV
|
|
><DIV
|
|
CLASS="section"
|
|
><H2
|
|
CLASS="section"
|
|
><A
|
|
NAME="garbagechars"
|
|
></A
|
|
>2.6.3. Binary garbage checks</H2
|
|
><P
|
|
> Messages containing non-printable characters are rare. When
|
|
they do show up, the message is nearly always a virus, or in
|
|
some cases spam written in a non-western language, without the
|
|
appropriate MIME encoding.
|
|
</P
|
|
><P
|
|
> One particular case is where the message contains NUL
|
|
characters (ordinal zero). Even if you decide that figuring
|
|
out what a <EM
|
|
>non-printable</EM
|
|
> character means
|
|
is more complex than beneficial, you might consider checking
|
|
for this character. That is because some <A
|
|
HREF="gloss.html#mda"
|
|
><I
|
|
CLASS="glossterm"
|
|
>Mail Delivery Agent</I
|
|
></A
|
|
>s, such as the <A
|
|
HREF="http://asg.web.cmu.edu/cyrus/"
|
|
TARGET="_top"
|
|
>Cyrus Mail Suite</A
|
|
>,
|
|
will ultimately reject mails that contain it
|
|
<A
|
|
NAME="AEN1096"
|
|
HREF="#FTN.AEN1096"
|
|
><SPAN
|
|
CLASS="footnote"
|
|
>[2]</SPAN
|
|
></A
|
|
>.
|
|
|
|
If you use such software, you should definitely consider
|
|
getting rid of NUL characters.
|
|
</P
|
|
><P
|
|
> On the other hand, the (now obsolete) RFC 822 specification
|
|
did not explicitly prohibit NUL characters in the message.
|
|
For this reason, as an alternative to rejecting mails
|
|
containing it, you may choose to strip these characters from
|
|
the message before delivering it to Cyrus.
|
|
</P
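><P
> Both options are simple to express. A minimal sketch in
Python, operating on the raw message as bytes (the function names
are made up for this example):
</P
><PRE
CLASS="programlisting"
>
```python
def contains_nul(raw):
    """True if the raw message (bytes) contains a NUL character."""
    return b"\x00" in raw


def strip_nul(raw):
    """Return the raw message with every NUL character removed."""
    return raw.replace(b"\x00", b"")


message = b"Subject: test\r\n\r\nbo\x00dy\r\n"
print(contains_nul(message))
print(strip_nul(message))
```
</PRE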
|
|
></DIV
|
|
><DIV
|
|
CLASS="section"
|
|
><H2
|
|
CLASS="section"
|
|
><A
|
|
NAME="mimeerrors"
|
|
></A
|
|
>2.6.4. MIME checks</H2
|
|
><P
|
|
> Similarly, it might be worthwhile to validate the MIME
structure of incoming messages. MIME decoding errors or
inconsistencies do not happen very often; but when they do,
the message is almost certainly junk. Moreover, such errors may
|
|
indicate potential problems in subsequent checks, such as
|
|
<A
|
|
HREF="datachecks.html#fileext"
|
|
>File Attachment Check</A
|
|
>s, <A
|
|
HREF="datachecks.html#virusscanners"
|
|
>Virus Scanners</A
|
|
>,
|
|
or <A
|
|
HREF="datachecks.html#spamscanners"
|
|
>Spam Scanners</A
|
|
>.
|
|
</P
|
|
><P
|
|
> In other words, if the MIME encoding is illegal, reject the
|
|
message.
|
|
</P
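><P
> As a sketch of what such validation might look like, the
parser in Python's standard email package records the structural
problems it finds as a list of defects on each message part. The
function name below is an assumption, and which problems count as
fatal is a policy decision left to you.
</P
><PRE
CLASS="programlisting"
>
```python
from email import message_from_bytes


def mime_defects(raw):
    """Collect the parser defects from every MIME part of raw (bytes)."""
    msg = message_from_bytes(raw)
    found = []
    for part in msg.walk():
        found.extend(part.defects)
    return found


# Declares a multipart body but never uses the boundary.
broken = b"Content-Type: multipart/mixed; boundary=XYZ\n\nno boundary here\n"
for defect in mime_defects(broken):
    print(type(defect).__name__)
```
</PRE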
|
|
></DIV
|
|
><DIV
|
|
CLASS="section"
|
|
><H2
|
|
CLASS="section"
|
|
><A
|
|
NAME="fileext"
|
|
></A
|
|
>2.6.5. File Attachment Check</H2
|
|
><P
|
|
> When was the last time someone sent you a Windows screensaver
|
|
(<SPAN
|
|
CLASS="QUOTE"
|
|
>".scr"</SPAN
|
|
> file) or Windows Program Information File
|
|
(<SPAN
|
|
CLASS="QUOTE"
|
|
>".pif"</SPAN
|
|
>) that you actually wanted?
|
|
</P
|
|
><P
|
|
> Consider blocking messages with <SPAN
|
|
CLASS="QUOTE"
|
|
>"Windows
|
|
executable"</SPAN
|
|
> file attachment(s) - i.e. file names that
|
|
end with a period followed by any of a number of three-letter
|
|
combinations such as the above. This check consumes
significantly fewer resources on your server than <A
|
|
HREF="datachecks.html#virusscanners"
|
|
>Virus Scanners</A
|
|
>, and may also catch new viruses for
|
|
which a signature does not yet exist in your anti-virus
|
|
scanner.
|
|
</P
|
|
><P
|
|
> For a more-or-less comprehensive list of such <SPAN
|
|
CLASS="QUOTE"
|
|
>"file name
|
|
extensions"</SPAN
|
|
>, please visit: <A
|
|
HREF="http://support.microsoft.com/default.aspx?scid=kb;EN-US;290497"
|
|
TARGET="_top"
|
|
>http://support.microsoft.com/default.aspx?scid=kb;EN-US;290497</A
|
|
>.
|
|
</P
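><P
> A minimal sketch of such a check in Python. Only
".scr" and ".pif" come from the text above; the other extensions
in the set are an illustrative assumption, and the function name
is made up for this example.
</P
><PRE
CLASS="programlisting"
>
```python
import os
from email import message_from_bytes

# Illustrative subset only; extend it from the list referenced above.
BLOCKED_EXTENSIONS = {".scr", ".pif", ".exe", ".bat", ".com"}


def blocked_attachments(raw):
    """Return the attachment file names in raw (bytes) that should be blocked."""
    msg = message_from_bytes(raw)
    hits = []
    for part in msg.walk():
        name = part.get_filename()
        if name and os.path.splitext(name)[1].lower() in BLOCKED_EXTENSIONS:
            hits.append(name)
    return hits
```
</PRE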
|
|
></DIV
|
|
><DIV
|
|
CLASS="section"
|
|
><H2
|
|
CLASS="section"
|
|
><A
|
|
NAME="virusscanners"
|
|
></A
|
|
>2.6.6. Virus Scanners</H2
|
|
><P
|
|
> A number of different server-side virus scanners are
|
|
available. To name a few:
|
|
</P
|
|
><P
|
|
></P
|
|
><UL
|
|
><LI
|
|
><P
|
|
> <A
|
|
HREF="http://www.vanja.com/tools/sophie/"
|
|
TARGET="_top"
|
|
>Sophie</A
|
|
>
|
|
</P
|
|
></LI
|
|
><LI
|
|
><P
|
|
> <A
|
|
HREF="http://www.kaspersky.com/"
|
|
TARGET="_top"
|
|
>KAVDaemon</A
|
|
>
|
|
</P
|
|
></LI
|
|
><LI
|
|
><P
|
|
> <A
|
|
HREF="http://clamav.elektrapro.com/"
|
|
TARGET="_top"
|
|
>ClamAV</A
|
|
>
|
|
</P
|
|
></LI
|
|
><LI
|
|
><P
|
|
> <A
|
|
HREF="http://www.sald.com/"
|
|
TARGET="_top"
|
|
>DrWeb</A
|
|
>
|
|
</P
|
|
></LI
|
|
></UL
|
|
><P
|
|
> In situations where you are not willing to block all
|
|
potentially dangerous files based on their file names alone
|
|
(consider <SPAN
|
|
CLASS="QUOTE"
|
|
>".zip"</SPAN
|
|
> files), such scanners are
|
|
helpful. Also, they will be able to catch viruses that are
|
|
not transmitted as file attachments, such as the
|
|
<SPAN
|
|
CLASS="QUOTE"
|
|
>"Bagle.R"</SPAN
|
|
> virus that appeared in March 2004.
|
|
</P
|
|
><P
|
|
> In most cases, the machine performing the virus scan does not
|
|
need to be your mail exchanger. Most of these anti-virus
|
|
scanners can be invoked on a different host over a network
|
|
connection.
|
|
</P
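><P
> As an example of such a network invocation, here is a sketch
that hands a message to a ClamAV daemon (clamd) over TCP using
its INSTREAM command. It assumes that clamd is configured to
listen on a TCP socket; the host name you pass in and the
function names are placeholders, and error handling is omitted.
</P
><PRE
CLASS="programlisting"
>
```python
import socket
import struct


def instream_frames(data, chunk_size=8192):
    """Yield the byte strings that make up one INSTREAM request."""
    yield b"zINSTREAM\0"
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # Each chunk is preceded by its length as a 4-byte big-endian integer.
        yield struct.pack("!I", len(chunk)) + chunk
    # A zero-length chunk ends the stream.
    yield struct.pack("!I", 0)


def scan_with_clamd(data, host, port=3310, timeout=30.0):
    """Send data (bytes) to clamd and return its one-line verdict."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for frame in instream_frames(data):
            sock.sendall(frame)
        # The verdict is short, so a single read is assumed to suffice.
        reply = sock.recv(4096)
    return reply.rstrip(b"\0").decode("ascii", "replace")
```
</PRE
><P
> The reply is a single line that either ends in "OK" or names
the signature that was found.
</P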
|
|
><P
|
|
> Anti-virus software mainly detects viruses based on a set of
signatures for known viruses, or <EM
>virus
definitions</EM
>. These need to be updated regularly,
as new viruses are developed. Also, the software itself
should be kept up to date at all times for maximum accuracy.
|
|
</P
|
|
></DIV
|
|
><DIV
|
|
CLASS="section"
|
|
><H2
|
|
CLASS="section"
|
|
><A
|
|
NAME="spamscanners"
|
|
></A
|
|
>2.6.7. Spam Scanners</H2
|
|
><P
|
|
> Similarly, anti-spam software can be used to classify messages
|
|
based on a large set of heuristics, including their content,
|
|
standards compliance, and various network checks such as <A
|
|
HREF="dnschecks.html#dnsbl"
|
|
>DNS Blacklists</A
|
|
> and <A
|
|
HREF="datachecks.html#jmsr"
|
|
>Junk Mail Signature Repositories</A
|
|
>. In the end,
|
|
such software typically assigns a composite
|
|
<SPAN
|
|
CLASS="QUOTE"
|
|
>"score"</SPAN
|
|
> to each message, indicating the
likelihood that the message is spam, and classifies the message
as spam if the score exceeds a certain threshold.
|
|
</P
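><P
> The scoring principle can be sketched in a few lines. The
three rules, their weights and the threshold below are invented
for this example; real filters ship with hundreds of rules and
carefully tuned weights.
</P
><PRE
CLASS="programlisting"
>
```python
import re

# Hypothetical rules: (name, weight, test on the raw message text).
RULES = (
    ("MISSING_DATE", 1.5,
     lambda text: re.search(r"(?mi)^Date:", text) is None),
    ("SHOUTING_SUBJECT", 1.0,
     lambda text: re.search(r"(?m)^Subject: [A-Z0-9 !]{10,}$", text) is not None),
    ("OBFUSCATED_WORD", 2.5,
     lambda text: re.search(r"(?i)\bGR0W\b", text) is not None),
)
THRESHOLD = 3.0  # assumed cut-off for this example


def score(text):
    """Return the composite score and the names of the rules that fired."""
    hits = [(name, weight) for name, weight, test in RULES if test(text)]
    return sum(weight for _name, weight in hits), [name for name, _w in hits]


def is_spam(text):
    """Classify text as spam when its score reaches the threshold."""
    total, _names = score(text)
    return total >= THRESHOLD
```
</PRE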
|
|
><P
|
|
> Two of the most popular server-side heuristic anti-spam
|
|
filters are:
|
|
|
|
<P
|
|
></P
|
|
><UL
|
|
><LI
|
|
><A
|
|
NAME="spamassassin"
|
|
></A
|
|
><P
|
|
> <A
|
|
HREF="http://www.spamassassin.org/"
|
|
TARGET="_top"
|
|
>SpamAssassin</A
|
|
>
|
|
</P
|
|
></LI
|
|
><LI
|
|
><A
|
|
NAME="brightmail"
|
|
></A
|
|
><P
|
|
> <A
|
|
HREF="http://www.brightmail.com/"
|
|
TARGET="_top"
|
|
>BrightMail</A
|
|
>
|
|
</P
|
|
></LI
|
|
></UL
|
|
>
|
|
</P
|
|
><P
|
|
> These tools undergo a constant evolution as spammers find ways
|
|
to circumvent their various checks. For instance, consider
|
|
<SPAN
|
|
CLASS="QUOTE"
|
|
>"creative"</SPAN
|
|
> spelling, such as <SPAN
|
|
CLASS="QUOTE"
|
|
>"GR0W lO
|
|
1NCH35"</SPAN
|
|
>. So, just like anti-virus software, if you use
|
|
anti-spam software, you should update it frequently for the
|
|
highest level of accuracy.
|
|
</P
|
|
><P
|
|
> I use SpamAssassin, although to minimize impact on machine
|
|
resources, it is no longer my first line of defense. Out of
|
|
approximately 500 junk mail delivery attempts to my personal
|
|
address per day, about 50 reach the point where they are
|
|
checked by SpamAssassin (mainly because they are forwarded
|
|
from one of my other accounts, so the checks described above
|
|
are not effective). Out of these 50 messages, one message
|
|
ends up in my inbox approximately every 2 or 3 days.
|
|
</P
|
|
></DIV
|
|
></DIV
|
|
><H3
|
|
CLASS="FOOTNOTES"
|
|
>Notes</H3
|
|
><TABLE
|
|
BORDER="0"
|
|
CLASS="FOOTNOTES"
|
|
WIDTH="100%"
|
|
><TR
|
|
><TD
|
|
ALIGN="LEFT"
|
|
VALIGN="TOP"
|
|
WIDTH="5%"
|
|
><A
|
|
NAME="FTN.AEN1045"
|
|
HREF="datachecks.html#AEN1045"
|
|
><SPAN
|
|
CLASS="footnote"
|
|
>[1]</SPAN
|
|
></A
|
|
></TD
|
|
><TD
|
|
ALIGN="LEFT"
|
|
VALIGN="TOP"
|
|
WIDTH="95%"
|
|
><P
|
|
> Some specialized MTAs, such as certain mailing list
|
|
servers, do not automatically generate a
|
|
<TT
|
|
CLASS="option"
|
|
>Message-ID:</TT
|
|
> header for
|
|
<SPAN
|
|
CLASS="QUOTE"
|
|
>"bounced"</SPAN
|
|
> messages (<A
|
|
HREF="gloss.html#dsn"
|
|
><I
|
|
CLASS="glossterm"
|
|
>Delivery Status Notification</I
|
|
></A
|
|
>s). These messages are identified by an
|
|
empty <A
|
|
HREF="gloss.html#envfrom"
|
|
><I
|
|
CLASS="glossterm"
|
|
>Envelope Sender</I
|
|
></A
|
|
>.
|
|
</P
|
|
></TD
|
|
></TR
|
|
><TR
|
|
><TD
|
|
ALIGN="LEFT"
|
|
VALIGN="TOP"
|
|
WIDTH="5%"
|
|
><A
|
|
NAME="FTN.AEN1096"
|
|
HREF="datachecks.html#AEN1096"
|
|
><SPAN
|
|
CLASS="footnote"
|
|
>[2]</SPAN
|
|
></A
|
|
></TD
|
|
><TD
|
|
ALIGN="LEFT"
|
|
VALIGN="TOP"
|
|
WIDTH="95%"
|
|
><P
|
|
> The IMAP protocol does not allow for NUL characters to be
|
|
transmitted to the mail user agent, so the Cyrus
|
|
developers decided that the easiest way to deal with mails
containing them was to reject them.
|
|
</P
|
|
></TD
|
|
></TR
|
|
></TABLE
|
|
><DIV
|
|
CLASS="NAVFOOTER"
|
|
><HR
|
|
ALIGN="LEFT"
|
|
WIDTH="100%"><TABLE
|
|
SUMMARY="Footer navigation table"
|
|
WIDTH="100%"
|
|
BORDER="0"
|
|
CELLPADDING="0"
|
|
CELLSPACING="0"
|
|
><TR
|
|
><TD
|
|
WIDTH="33%"
|
|
ALIGN="left"
|
|
VALIGN="top"
|
|
><A
|
|
HREF="senderauth.html"
|
|
ACCESSKEY="P"
|
|
>Prev</A
|
|
></TD
|
|
><TD
|
|
WIDTH="34%"
|
|
ALIGN="center"
|
|
VALIGN="top"
|
|
><A
|
|
HREF="index.html"
|
|
ACCESSKEY="H"
|
|
>Home</A
|
|
></TD
|
|
><TD
|
|
WIDTH="33%"
|
|
ALIGN="right"
|
|
VALIGN="top"
|
|
><A
|
|
HREF="collateral.html"
|
|
ACCESSKEY="N"
|
|
>Next</A
|
|
></TD
|
|
></TR
|
|
><TR
|
|
><TD
|
|
WIDTH="33%"
|
|
ALIGN="left"
|
|
VALIGN="top"
|
|
>Sender Authorization Schemes</TD
|
|
><TD
|
|
WIDTH="34%"
|
|
ALIGN="center"
|
|
VALIGN="top"
|
|
><A
|
|
HREF="techniques.html"
|
|
ACCESSKEY="U"
|
|
>Up</A
|
|
></TD
|
|
><TD
|
|
WIDTH="33%"
|
|
ALIGN="right"
|
|
VALIGN="top"
|
|
>Blocking Collateral Spam</TD
|
|
></TR
|
|
></TABLE
|
|
></DIV
|
|
></BODY
|
|
></HTML
|
|
> |