<?xml version='1.0' encoding='ISO-8859-1'?>
<chapter id="considerations" xreflabel="Considerations">
  <?dbhtml filename="considerations.html"?>
  <title>Considerations</title>

  <abstract>
    <para>
      Some specific considerations come into play as a result of
      system-wide SMTP-time filtering.  Here we cover some of those.
    </para>
  </abstract>

  <section id="multimx" xreflabel="Multiple Incoming Mail Exchangers">
    <?dbhtml filename="multimx.html"?>
    <title>Multiple Incoming Mail Exchangers</title>

    <para>
      Most domains list more than one incoming <xref linkend="mx"/>s
      (a.k.a. <quote>MX hosts</quote>).  If you do so, then bear in
      mind that in order to have any effect, any SMTP-time filtering
      you incorporate on the primary MX has to be incorporated on all
      the others as well.  Otherwise, the sending host would simply
      sidestep filtering by retrying the mail delivery through your
      backup server(s).
    </para>

    <para>
      If the backup server(s) are not under your control, ask
      yourself whether you need multiple MXs in the first place.  In
      this situation, chances are that they serve only as
      <emphasis>redundant</emphasis> mail servers, and that they in
      turn forward the mail to your primary MX.  If so, you probably
      don't need them.  If your host happens to be down for a little
      while, that's OK -- well-behaved sender hosts will retry
      deliveries for several days before giving up
      <footnoteref linkend="noretrysenders"/>.
    </para>

    <para>
      A situation where you <emphasis>may</emphasis> need multiple
      MXs is load balancing across several servers, i.e. when you
      receive so much mail that one machine alone cannot handle it.
      In this case, see if you can offload some tasks (such as
      <link linkend="virusscanners">virus</link> and
      <link linkend="spamscanners">spam</link> scanners) to other
      machines, in order to reduce or eliminate this need.
    </para>

    <para>
      Again, if you do decide to keep using several MXs, your backup
      servers need to be at least as restrictive as the primary
      server; otherwise, filtering on the primary MX is useless.
    </para>

    <para>
      See also the section on <xref linkend="greylisting"/> for
      additional concerns related to multiple MX hosts.
    </para>
  </section>


  <section id="otherservers" xreflabel="Blocking Access to Other SMTP Servers">
    <?dbhtml filename="otherservers.html"?>
    <title>Blocking Access to Other SMTP Servers</title>

    <para>
      Any SMTP server that is not listed as a public <xref
      linkend="mx"/> in the DNS zone of your domain(s) should not
      accept incoming connections from the internet.  All incoming
      mail traffic should go through your incoming mail exchanger(s).
    </para>

    <para>
      This consideration is not unique to SMTP servers.  If you have
      machines that only serve an internal purpose within your site,
      use a firewall to restrict access to these.
    </para>

    <para>
      This is a rule, so there must be exceptions.  However, if you
      don't know what they are, then the above applies to you.
    </para>
  </section>


  <section id="forwardedmail" xreflabel="Forwarded Mail">
    <?dbhtml filename="forwardedmail.html"?>
    <title>Forwarded Mail</title>

    <para>
      You should take care not to reject mail as a result of spam
      filtering if it is forwarded from <quote>friendly</quote>
      sources, such as:
    </para>

    <itemizedlist>
      <listitem>
	<para>
	  Your backup MX hosts, if any.  Supposedly, these have
	  already filtered out most of the junk (see <xref
	  linkend="multimx"/>).
	</para>
      </listitem>

      <listitem>
	<para>
	  Mailing lists to which you or your users subscribe.  You may
	  still filter such mail (it may not be as critical if it
	  ends up in a black hole).  However, if you reject the mail,
	  you may end up causing the list server to automatically
	  unsubscribe the recipient.
	</para>
      </listitem>

      <listitem>
	<para>
	  Other accounts belonging to the recipient.  Again,
	  rejections will generate collateral spam, and/or create
	  problems for the host that forwards the mail.
	</para>
      </listitem>
    </itemizedlist>

    <para>
      You may see a logistical issue with the last two of these
      sources: they are specific to each recipient.  How do you allow
      each user to specify which hosts they want to whitelist, and
      then use such individual whitelists in a system-wide SMTP-time
      filtering setup?  If the message is forwarded to several
      recipients at your site (as may often be true in the case of
      a mailing list), how do you decide whose whitelist to use?
    </para>

    <para>
      There is no magic bullet here.  This is one of those situations
      where you just have to do a bit of work.  You can decide to
      accept all mail, regardless of spam classification, so long as
      it is sent from a host in the whitelist of any one of the
      recipients.  For instance, in response to each <command>RCPT
      TO:</command> command, you can match the sending host against
      the corresponding user's whitelist.  If it is found, set a flag
      that will prevent a subsequent rejection.  Effectively, you are
      using an <emphasis>aggregate</emphasis> of the recipients'
      whitelists.
    </para>
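
    <para>
      As a rough illustration of the flag-per-transaction idea, here
      is a minimal Python sketch.  It is not taken from any particular
      MTA: the <literal>SmtpSession</literal> class, the
      <literal>USER_WHITELISTS</literal> table, the function names and
      the host names are all hypothetical, and a real setup would
      express the same logic in the configuration language of the mail
      server in use.
    </para>

<programlisting language="python">
```python
from dataclasses import dataclass, field


@dataclass
class SmtpSession:
    """State kept for one incoming SMTP transaction (hypothetical model)."""
    sending_host: str
    recipients: list = field(default_factory=list)
    whitelisted: bool = False  # set once any recipient whitelists the host


# Hypothetical per-user whitelists; a real site might keep these in
# per-user files or in a database.
USER_WHITELISTS = {
    "alice@example.org": {"lists.example.net"},
    "bob@example.org": {"mail.example.com"},
}


def on_rcpt_to(session, recipient):
    """Handle one RCPT TO: command by checking that user's whitelist."""
    session.recipients.append(recipient)
    if session.sending_host in USER_WHITELISTS.get(recipient, set()):
        session.whitelisted = True


def should_reject(session, classified_as_spam):
    """Reject only spam that no recipient has whitelisted."""
    return classified_as_spam and not session.whitelisted
```
</programlisting>

    <para>
      In this sketch the flag is only ever set, never cleared, so the
      order of the recipients does not matter: one matching whitelist
      is enough to protect the whole message from rejection.
    </para>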

    <para>
      The implementation appendices cover this in more detail.
    </para>
  </section>


  <section id="usersettings" xreflabel="User Settings and Data">
    <?dbhtml filename="usersettings.html"?>
    <title>User Settings and Data</title>

    <para>
      There are other situations where you may want to support
      settings and data for each user at your site.  For instance, if
      you scan incoming mail with SpamAssassin (see <xref
      linkend="spamscanners"/>), you may want to allow for individual
      spam thresholds, acceptable languages and character sets, and
      Bayesian training/data.
    </para>

    <para>
      A sticking point is that SMTP-time filtering of incoming mail is
      done at the system level, before mail is delivered to a
      particular user, and as such, does not lend itself too well to
      individual preferences.  A single message may have several
      recipients; and unlike the case with <xref
      linkend="forwardedmail"/>, using an aggregate of each
      recipient's preferences is not a good option.  Consider a
      scenario where you have users from different linguistic
      backgrounds: a message in a language that only one recipient
      accepts would then be let through to all of them.
    </para>

    <para>
      As it turns out, though, there is a way around this limitation.
      The trick is to limit the number of recipients in incoming
      messages to one, so that the message can be analyzed in
      accordance with the settings and data that belong to the
      corresponding user.
    </para>

    <para>
      To do this, you would accept the first <command>RCPT
      TO:</command>, then issue an SMTP <command>451</command> (defer)
      response to each subsequent <command>RCPT TO:</command> in the
      same transaction.  If the caller is a well-behaved MTA, it will
      know how to interpret this response, and retry the deferred
      recipients later.  (If it is confused, then, well, it is
      probably a sender from which you don't want to receive mail in
      the first place.)
    </para>
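
    <para>
      The one-recipient-per-message approach could be sketched in
      Python roughly as follows.  This is only an illustration under
      stated assumptions: the <literal>Transaction</literal> class,
      the <literal>USER_THRESHOLDS</literal> table, the default
      threshold and the reply texts are hypothetical, and the spam
      score is assumed to come from whatever scanner you use.
    </para>

<programlisting language="python">
```python
# Hypothetical per-user spam thresholds; a real site might look these up
# in a database or directory instead.
USER_THRESHOLDS = {"alice@example.org": 5.0, "bob@example.org": 8.0}
DEFAULT_THRESHOLD = 5.0  # assumed fallback for users without a setting


class Transaction:
    """State for one incoming SMTP transaction (illustrative only)."""

    def __init__(self):
        self.recipient = None  # the single recipient accepted so far


def on_rcpt_to(txn, recipient):
    """Accept the first RCPT TO:, defer every later one with a 451."""
    if txn.recipient is None:
        txn.recipient = recipient
        return "250 Accepted"
    return "451 One recipient per message, please try again later"


def accept_message(txn, spam_score):
    """Judge the message by the single accepted recipient's threshold."""
    threshold = USER_THRESHOLDS.get(txn.recipient, DEFAULT_THRESHOLD)
    return threshold > spam_score
```
</programlisting>

    <para>
      Because at most one recipient is ever accepted per transaction,
      the message data can be checked against exactly one user's
      settings; the deferred recipients are handled when the sender
      retries.
    </para>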

    <para>
      Obviously, this is a hack.  Every mail sent to several users at
      your site will be delayed by one retry interval of the sending
      host, typically 30 minutes or more, for each additional
      recipient.  Especially in corporate environments, where it is
      common to see e-mail discussions involving several people on the
      inside and several others on the outside, and where timeliness
      of mail deliveries is essential, this is probably not a good
      solution at all.
    </para>

    <para>
      Another issue that mainly pertains to corporate enterprises and
      other large sites is that incoming mail is often forwarded to
      internal machines for delivery, and that recipients don't
      normally have accounts on the mail exchanger.  It may still be
      possible to support user-specific settings and data in these
      situations (e.g. via database lookups or LDAP queries), but you
      may also want to consider whether it's worth the effort.
    </para>

    <para>
      That said, if you run a small site where delayed deliveries are
      not a concern, this may be an acceptable way to allow each user
      to fine-tune their filtering criteria.
    </para>
  </section>
</chapter>