or your local metalab mirror. transproxy will not be discussed further in this
document.
</p>
<p>
A cleaner solution is to use a web proxy that is itself aware of transparent
proxying. The one we are going to focus on here is squid, an Open Source
caching proxy server for Unix systems. It is available from <htmlurl url="http://www.squid-cache.org" name="www.squid-cache.org">.
</p>
<p>Alternatively, instead of redirecting the connections to local ports, we could redirect the connections to remote ports. This is discussed in the <ref id="twoboxes" name="Transparent Proxy to a Remote Box"> section. Readers interested in this approach should skip down to that section. Readers interested in doing everything on one box can safely ignore that section.
<sect1>
Scope of this document
<p>
This document will focus on squid version 2.4 and Linux kernel version
2.4, the most current stable releases as of this writing (August 2002). It
should also work with most of the later 2.3 kernels. If you need information
about earlier releases of squid or Linux, you can find some earlier
documents at <htmlurl url="http://users.gurulink.com/transproxy/" name="http://users.gurulink.com/transproxy/">. Note that this site has moved from its previous location.
</p>
<p>
If you are using a development kernel or a development version of squid, you are on your own. This document may help you, but YMMV.
</p>
<p>
Note that this document focuses only on HTTP proxying. I get many emails asking
about transparent FTP proxying. Squid can't do it. Now, allegedly a program
called Frox can. I have not tried this myself, so I cannot say how well it
works. You can find it at <htmlurl url="http://www.hollo32.fsnet.co.uk/frox/" name ="http://www.hollo32.fsnet.co.uk/frox/">.
</p>
<p>
I only focus on squid here, but Apache can also function as a caching proxy
server. (If you are not sure which to use, I recommend squid, since it was
built from the ground up to be
a caching proxy server; Apache's caching proxy features are more of an
afterthought, added to an already existing system.)
If you want to use Apache instead of squid, follow all the instructions in this
document that pertain to the kernel and iptables rules. Ignore the
squid-specific sections, and instead look at
<htmlurl url="http://lupo.campus.uniroma2.it/progetti/mod_tproxy/" name="http://lupo.campus.uniroma2.it/progetti/mod_tproxy/"> for source code and
instructions for a transparent proxy module for Apache (thanks to Cristiano Paris (c.paris@libero.it) for contributing this).
<sect1>
HTTPS
<p>Finally, as far as transparently proxying HTTPS (e.g. secure web pages using
SSL, TLS, etc.) is concerned, you can't do it. Don't even ask. For the
explanation, do a search for 'man-in-the-middle attack'. Note that you probably
don't really need to transparently proxy HTTPS anyway, since squid cannot
cache secure pages.
<sect1>
Proxy Authentication
<p>
You cannot use Proxy Authentication transparently. See the
First, we need to make sure all the proper options are set in your kernel.
If you are using a stock kernel from your distribution, transparent proxying
may or may not be enabled.
If you are unsure, the best way to tell is to simply skip this section, and
if the commands in the next section give you weird errors, it's probably because
the kernel wasn't configured properly.
</p>
<p>
If your kernel is not configured for transparent proxying, you will need
to recompile. Recompiling a kernel is a complex process (at least at first),
and it is beyond the scope of this document. If you need help compiling a kernel,
please see <htmlurl url="http://metalab.unc.edu/pub/Linux/docs/HOWTO/Kernel-HOWTO" name="The Kernel HOWTO">.
</p>
<p>
The options you need to set in your configuration are as follows. (Note:
if you prefer modules, some, but not all, of these can be built as modules. Luckily, everything that cannot be built as a module is probably already in your kernel anyway.)
</p>
<p>
<itemize>
<item> Under General Setup
<itemize>
<item>
Networking support
<item>
Sysctl support
</itemize>
<item> Under Networking Options
<itemize>
<item>Network packet filtering
<item>TCP/IP networking
</itemize>
<item> Under Networking Options -> IP: Netfilter Configuration
<itemize>
<item>Connection tracking
<item>IP tables support
<item>Full NAT
<item>REDIRECT target support
</itemize>
<item>Under File Systems
<itemize>
<item>/proc filesystem support
</itemize>
</itemize>
You must say NO to ``Fast switching'' under Networking Options.
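If you already have a configured kernel source tree, the options above can be checked mechanically. The following is a minimal shell sketch, not an official tool: it assumes the 2.4-era CONFIG_* symbol names for the menu entries listed (for example CONFIG_IP_NF_TARGET_REDIRECT for ``REDIRECT target support'' and CONFIG_NET_FASTROUTE for ``Fast switching''), and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: report kernel options from the list above that are missing.
# CONFIG_* names are assumed 2.4-era symbols for those menu entries.
REQUIRED_OPTS="CONFIG_NET CONFIG_SYSCTL CONFIG_NETFILTER CONFIG_INET
CONFIG_IP_NF_CONNTRACK CONFIG_IP_NF_IPTABLES CONFIG_IP_NF_NAT
CONFIG_IP_NF_TARGET_REDIRECT CONFIG_PROC_FS"

# Usage: check_kernel_config /path/to/.config
# Prints one line per problem; returns 1 if anything needs fixing.
check_kernel_config() {
    cfg="$1"
    status=0
    for opt in $REQUIRED_OPTS; do
        # Built in (=y) or as a module (=m) are both accepted here.
        if ! grep -Eq "^${opt}=(y|m)\$" "$cfg"; then
            echo "missing: $opt"
            status=1
        fi
    done
    # "Fast switching" must be off.
    if grep -q '^CONFIG_NET_FASTROUTE=y$' "$cfg"; then
        echo "must be disabled: CONFIG_NET_FASTROUTE"
        status=1
    fi
    return "$status"
}
```

For example, ``check_kernel_config /usr/src/linux/.config'' (that path is only the usual location, an assumption) prints nothing and returns zero when everything is in place.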
</p><p>
Once you have your new kernel up and running, you may need to enable IP
forwarding. IP forwarding allows your computer to act as a router. Since this
is not what the average user wants to do, it is off by default and must be
explicitly enabled at run-time. However, your distribution might do this for
you already. To check, do ``cat /proc/sys/net/ipv4/ip_forward''. If you see
``1'' you're good. Otherwise, do ``echo '1' > /proc/sys/net/ipv4/ip_forward''.
You will then want to add that command to your appropriate bootup scripts (depending on your distribution, these may live in /etc/rc.d, /etc/init.d, or maybe somewhere else entirely).
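The check-then-enable step can be wrapped in a small shell function for a boot script. This is a sketch: the function name is hypothetical, and the optional file argument exists only so the logic can be tried against an ordinary file instead of /proc (writing the real file requires root).

```shell
#!/bin/sh
# Sketch: turn on IP forwarding unless it is already on.
# $1 defaults to the real /proc entry; pass another file only for testing.
enable_ip_forward() {
    f="${1:-/proc/sys/net/ipv4/ip_forward}"
    if [ "$(cat "$f")" = "1" ]; then
        echo "IP forwarding already enabled"
    else
        echo 1 > "$f" || return 1
        echo "IP forwarding enabled"
    fi
}
```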
</p>
<sect>
Setting up squid
<p>
Now, we need to get squid up and running. Download the latest source tarball
from <htmlurl url="http://www.squid-cache.org" name="www.squid-cache.org">.
Make sure you get a STABLE version, not a DEVEL version.
The latest as of this writing was squid-2.4.STABLE4.tar.gz. Note that, AFAIK,
you must have squid-2.4 for Linux kernel 2.4. The reason is that the
mechanism by which the process determines the original destination address
has changed since Linux 2.2, and only squid-2.4 has this new code in it. (For those of you who are interested: previously the getsockname() call was hacked to provide the original destination address, but now the call is getsockopt() with a level of SOL_IP and an option of SO_ORIGINAL_DST.)
</p>
<p>
Now, untar and gunzip the archive (use ``tar -xzf &lt;filename&gt;'').
Run the autoconfiguration script and tell it to include netfilter code
(``./configure --enable-linux-netfilter''), compile (``make'') and
then install (``make install'').
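Put together, the build steps look roughly like the shell sketch below. The tarball name is just the example release mentioned above (set TARBALL to the file you actually downloaded), the sketch assumes the tarball unpacks into a directory of the same name, and the run helper, DRY_RUN switch and function name are hypothetical conveniences for previewing the commands. ``make install'' into the default /usr/local/squid prefix will usually need root.

```shell
#!/bin/sh
# Sketch: unpack, configure with netfilter support, build and install squid.
# The tarball name is an example; override TARBALL with your download.
TARBALL="${TARBALL:-squid-2.4.STABLE4.tar.gz}"

# Run a command, or only print it when DRY_RUN is non-empty.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_squid() {
    srcdir="${TARBALL%.tar.gz}"   # e.g. squid-2.4.STABLE4
    run tar -xzf "$TARBALL" || return 1
    run cd "$srcdir" || return 1
    run ./configure --enable-linux-netfilter || return 1
    run make || return 1
    run make install
}
```

In a shell where these functions are defined, setting DRY_RUN=1 before calling build_squid lists the five commands without running them; leave DRY_RUN empty to do the actual build.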
</p>
<p>
Now, we need to edit the default squid.conf file (installed to /usr/local/squid/etc/squid.conf, unless you changed the defaults). The squid.conf file is heavily
commented. In fact, some of the best documentation available for squid is in
the squid.conf file. After you get it all up and running, you should go back
and reread the whole thing. But for now, let's just get the minimum required.
Find the following directives, uncomment them, and change them to the
appropriate values:
</p>
<p>
<itemize>
<item>
httpd_accel_host virtual
<item>
httpd_accel_port 80
<item>
httpd_accel_with_proxy on
<item>
httpd_accel_uses_host_header on
</itemize>
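If you would rather not hunt for each directive, one alternative (a sketch, not the only way) is to append the four settings to the end of squid.conf, since the stock entries are commented out. The function below assumes the default path given above, does nothing if ``httpd_accel_host virtual'' is already present, and its name is hypothetical.

```shell
#!/bin/sh
# Sketch: append the four transparent-proxy directives to squid.conf.
# Default path as installed above; pass another file as $1 if needed.
SQUID_CONF="${SQUID_CONF:-/usr/local/squid/etc/squid.conf}"

add_accel_directives() {
    conf="${1:-$SQUID_CONF}"
    # Skip if the settings were already added.
    if grep -q '^httpd_accel_host virtual' "$conf"; then
        return 0
    fi
    cat >> "$conf" <<'EOF'
# Transparent proxying (squid 2.4)
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
EOF
}
```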
</p><p>
Next, look at the cache_effective_user and cache_effective_group directives.
Unless the default nobody/nogroup user and group exist on your system (AFAIK,
they are not created out of the box on many popular distributions, including
RH7.1), you'll need to either create them or create another user/group for
squid to run under. I strongly recommend that you create a user/group of
squid/squid and run under that, but you could use any existing user/group
if you want.
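Creating the recommended squid/squid account might look like the sketch below. It assumes the common getent, groupadd and useradd tools (their options vary between distributions) and the default /usr/local/squid/var directory for the cache and logs, which must be writable by the new user; run it as root. The run helper, DRY_RUN switch and function name are hypothetical.

```shell
#!/bin/sh
# Sketch: create a squid user and group if missing, then hand over the
# cache/log directory (default /usr/local/squid/var assumed).
run() {
    if [ -n "$DRY_RUN" ]; then echo "+ $*"; else "$@"; fi
}

create_squid_user() {
    getent group squid >/dev/null 2>&1 || run groupadd squid || return 1
    id squid >/dev/null 2>&1 || \
        run useradd -g squid -s /bin/false squid || return 1
    # squid must be able to write its cache and log files.
    run chown -R squid:squid /usr/local/squid/var
}
```

Afterwards, set ``cache_effective_user squid'' and ``cache_effective_group squid'' in squid.conf.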
</p><p>
Finally, look at the http_access directive. The default is usually ``http_access
deny all'', which prevents anyone from accessing squid. For now, you can
change this to ``http_access allow all'', but once it is working, you will
probably want to read the directions on ACLs (Access Control Lists) and set up
the cache so that only people on your local network (or whatever) can access
it. This may seem silly, but you should put some kind of restriction
on access to your cache. People behind filtering firewalls (such as porn
filters, or filters in nations where speech is not very free) often ``hijack''
wide-open proxies and eat up your bandwidth.
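As a starting point for such a restriction, the sketch below replaces the ``http_access allow all'' line with an ACL for a single local network followed by a final deny. The network 192.168.1.0/255.255.255.0 is a hypothetical example, and the sketch assumes the ``all'' ACL from the stock squid.conf is still defined. Because squid checks http_access lines in order and uses the first match, the new lines are written where the old one was rather than appended to the end of the file.

```shell
#!/bin/sh
# Sketch: swap "http_access allow all" for a local-network-only rule set.
# The default network is a hypothetical example; substitute your own.
restrict_to_lan() {
    conf="$1"
    net="${2:-192.168.1.0/255.255.255.0}"
    # First match wins in squid, so rewrite the old line in place.
    awk -v net="$net" '
        $0 == "http_access allow all" {
            print "acl localnet src " net
            print "http_access allow localnet"
            print "http_access deny all"
            next
        }
        { print }
    ' "$conf" > "$conf.new" && mv "$conf.new" "$conf"
}
```

For example: ``restrict_to_lan /usr/local/squid/etc/squid.conf 10.0.0.0/255.0.0.0''.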
</p>
<p>
Initialize the cache directories with ``squid -z'' (if this is not a
new installation of squid, you should skip this step).
</p>
<p>
Now, run squid using the RunCache script in the /usr/local/squid/bin/ directory.
If it works, you should be able to set your web browser's proxy settings to
the IP of the box and port 3128 (unless you changed the default port number)
and access squid as a normal proxy.
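The initialize-then-start sequence can be sketched as follows. It assumes the default /usr/local/squid prefix and cache_dir, and it uses the presence of the ``00'' cache subdirectory as a rough heuristic for whether ``squid -z'' has already been run; either assumption may not match your setup. The run helper, DRY_RUN switch and function name are hypothetical.

```shell
#!/bin/sh
# Sketch: run "squid -z" only on a fresh cache, then start squid via RunCache.
# Assumes the default /usr/local/squid prefix and cache_dir location.
SQUID_PREFIX="${SQUID_PREFIX:-/usr/local/squid}"

# Run a command, or only print it when DRY_RUN is non-empty.
run() {
    if [ -n "$DRY_RUN" ]; then echo "+ $*"; else "$@"; fi
}

start_squid() {
    cache_dir="${1:-$SQUID_PREFIX/var/cache}"
    # squid -z creates subdirectories 00, 01, ...; an existing "00" is
    # taken as a sign the cache was already initialized (a heuristic).
    if [ ! -d "$cache_dir/00" ]; then
        run "$SQUID_PREFIX/bin/squid" -z || return 1
    fi
    # RunCache restarts squid if it exits, so put it in the background.
    run "$SQUID_PREFIX/bin/RunCache" &
}
```

Once squid is up, something like ``curl -I -x http://127.0.0.1:3128 http://www.squid-cache.org/'' (assuming curl is installed) is a quick way to confirm it answers as an ordinary proxy before you add any iptables rules.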
</p>
<p>
For additional help configuring squid, see the squid FAQ at <htmlurl url="http://www.squid-cache.org" name="www.squid-cache.org">.
</p>
<sect>
Setting up iptables (Netfilter)
<p>
iptables is a new thing for Linux kernel 2.4 that replaces ipchains.
If your distribution came with a 2.4 kernel, it probably has iptables
already installed. If not, you'll have to download it (and possibly
compile it). The homepage is <htmlurl url="http://netfilter.samba.org/" name="netfilter.samba.org">.
You may be able to find binary RPMs elsewhere; I haven't looked. For the
curious, there is plenty of documentation on the netfilter site.
</p>
<p>
To set up the rules, you will need to know two things: the interface that
the to-be-proxied requests are coming in on (I'll use eth0 as an example)
and the port squid is running on (I'll use the default of 3128 as an example).
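With eth0 and 3128 as the example values, the rule uses the REDIRECT target in the PREROUTING chain of the nat table and matches only TCP traffic to port 80. The sketch below wraps it in a hypothetical shell function; the DRY_RUN switch only prints the command instead of running it, and a real run needs root.

```shell
#!/bin/sh
# Sketch: send TCP port-80 traffic arriving on an interface to local squid.
# eth0 and 3128 are the example values from the text.
run() {
    if [ -n "$DRY_RUN" ]; then echo "+ $*"; else "$@"; fi
}

redirect_to_squid() {
    iface="${1:-eth0}"
    port="${2:-3128}"
    run iptables -t nat -A PREROUTING -i "$iface" -p tcp --dport 80 \
        -j REDIRECT --to-ports "$port"
}
```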
You will want to add the above command to your appropriate bootup script
under /etc/rc.d/. Readers upgrading from 2.2 kernels should note that
this is the only command needed: 2.2 kernels required two extra commands
in order to prevent forwarding loops, but the netfilter infrastructure is
much nicer.
</p>
<sect>
Transparent Proxy to a Remote Box <label id="twoboxes">
<p>
Now, the question naturally arises: if we can do all this nifty stuff
redirecting HTTP connections to local ports, could we do the same thing
but to a remote box (i.e., where the machine running squid is not the same
machine that iptables is running on)? The answer is yes, but it takes slightly different magic words. If you only want to redirect to the local box
(the normal case), skip this section.
</p><p>
For the purposes of example commands, let's assume we have two boxes called squid-box and iptables-box, and that they are on the network local-network. In the commands below, replace these strings with the actual IP addresses or
names of your machines and network.
</p><p>
I will present two different approaches here.
<sect1>
First method (simpler, but does not work for some esoteric cases)
<p>
First, we need to set up the machine that squid will be running on, squid-box.
You do not need iptables
or any special kernel options on this machine, just squid. You *will*,
however, need the 'http_accel' options as described above. (Previous versions
of this HOWTO suggested that you did not need those options. That was a
mistake. Sorry to have confused people...)
</p><p>
Now, for the machine that iptables will be running on, iptables-box: you will need to configure the kernel as described in section 3 above (except that you don't need the REDIRECT target support). Now, for the iptables commands. You need three:
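A sketch of those three rules follows, with squid-box, iptables-box, local-network and eth0 as placeholders to substitute. The first rewrites the destination of port-80 traffic (other than squid-box's own) to squid-box port 3128; the second rewrites the source so that squid-box replies through iptables-box rather than directly to the client; the third lets the redirected connections through the FORWARD chain. It uses the ``! -s'' negation form that current iptables releases expect; older releases wrote it as ``-s ! squid-box'', so check your version. The function and variable names are hypothetical.

```shell
#!/bin/sh
# Sketch of method one's three rules. All values are placeholders from the
# text; use real addresses (e.g. a network such as 10.0.0.0/24 for LOCAL_NET).
SQUID_BOX="${SQUID_BOX:-squid-box}"
IPTABLES_BOX="${IPTABLES_BOX:-iptables-box}"
LOCAL_NET="${LOCAL_NET:-local-network}"
IFACE="${IFACE:-eth0}"
SQUID_PORT="${SQUID_PORT:-3128}"

run() {
    if [ -n "$DRY_RUN" ]; then echo "+ $*"; else "$@"; fi
}

remote_redirect() {
    # 1. Send port-80 traffic (except squid-box's own) to squid on squid-box.
    run iptables -t nat -A PREROUTING -i "$IFACE" ! -s "$SQUID_BOX" \
        -p tcp --dport 80 -j DNAT --to-destination "$SQUID_BOX:$SQUID_PORT" || return 1
    # 2. Make squid-box reply via iptables-box instead of straight to clients.
    run iptables -t nat -A POSTROUTING -o "$IFACE" -s "$LOCAL_NET" \
        -d "$SQUID_BOX" -j SNAT --to-source "$IPTABLES_BOX" || return 1
    # 3. Let the redirected connections through the FORWARD chain.
    run iptables -A FORWARD -s "$LOCAL_NET" -d "$SQUID_BOX" -i "$IFACE" \
        -o "$IFACE" -p tcp --dport "$SQUID_PORT" -j ACCEPT
}
```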
As before, add all of these commands to the appropriate startup scripts.
<p>
Here is a brief explanation of how this works. In method one, we used Network
Address Translation to get the packets to the other box. As a result, the
packet gets altered, and this alteration is what causes some kinds
of clients mentioned above to fail. In method two,
we use a magic thing called policy routing. The first thing we do is to
select the packets we want: all packets destined for port 80, except those
coming from squid-box itself, are MARKed. Then, when the kernel goes to
make a routing decision, the MARKed packets aren't routed using the normal
routing table that you access with the ``route'' command, but with a special
table. This special table has only one entry, a default gateway to squid-box.
Thus, the packet is sent merrily on its way without ever having been altered.
So, even HTTP/1.0 connections can be handled perfectly.
(Thanks to Michal Svoboda for suggesting and helping to write this section)
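The policy routing described above can be sketched as follows. The mark value (3), the routing table number (2) and LAN_IF, the iptables-box interface facing squid-box, are arbitrary placeholders, and ``ip route'' needs squid-box's numeric address rather than a name. Because the packets reach squid-box unaltered, still addressed to port 80 of the origin server, squid-box presumably also needs the single-box REDIRECT rule described earlier. The function name and DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Sketch of method two (policy routing). Mark 3 and table 2 are arbitrary;
# SQUID_BOX must be a numeric address for "ip route"; LAN_IF is assumed.
SQUID_BOX="${SQUID_BOX:-squid-box}"
LAN_IF="${LAN_IF:-eth0}"
MARK=3
TABLE=2

run() {
    if [ -n "$DRY_RUN" ]; then echo "+ $*"; else "$@"; fi
}

policy_route_to_squid() {
    # squid-box's own web requests must not loop back to it: ACCEPT ends
    # the mangle PREROUTING chain before the MARK rule is reached.
    run iptables -t mangle -A PREROUTING -p tcp --dport 80 \
        -s "$SQUID_BOX" -j ACCEPT || return 1
    # Mark every other packet headed for port 80.
    run iptables -t mangle -A PREROUTING -p tcp --dport 80 \
        -j MARK --set-mark "$MARK" || return 1
    # Marked packets are looked up in table $TABLE, not the main table...
    run ip rule add fwmark "$MARK" table "$TABLE" || return 1
    # ...whose only entry is a default route via squid-box.
    run ip route add default via "$SQUID_BOX" dev "$LAN_IF" table "$TABLE"
}
```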
<p>
<sect1>Method One: What if iptables-box is on a dynamic IP?
<p>
If the iptables-box is on a dynamic IP address (e.g. a dialup PPP connection, or a DHCP assigned IP address from a cable modem, etc.), then you will want to
make a slight change to the above commands. Replace the second command with this one: