<!--startcut ======================================================= -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>The Answer Guy Issue 27</title>
</head>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#A000A0"
ALINK="#FF0000">
<!--endcut ========================================================= -->
<H4>"Linux Gazette...<I>making Linux just a little more fun!</I>"
</H4>
<P> <hr> <P>
<!-- =============================================================== -->
<center>
<H1><A NAME="answer">
<img src="../gx/ans.gif" alt="" border=0 align=middle>
The Answer Guy
<img src="../gx/ans.gif" alt="" border=0 align=middle>
</A></H1> <BR>
<H4>By James T. Dennis,
<a href="mailto:linux-questions-only@ssc.com">linux-questions-only@ssc.com</a><BR>
Starshine Technical Services, <A HREF="http://www.starshine.org/">
http://www.starshine.org/</A> </H4>
</center>
<p><hr><p>
<H3>Contents:</H3>
<ul>
<li><a HREF="./lg_answer27.html#mccluan">Regarding Compile Errors with
Tripwire 1.2</a>
<li><a HREF="./lg_answer27.html#karsh">Applix Spreadsheet ELF Macro
Language</a>
<li><a HREF="./lg_answer27.html#geene">Answer Guy Issue 18 -- Procmail Spam
Filter</a>
<li><a HREF="./lg_answer27.html#geene2">Great Procmail Article</a>
<li><a HREF="./lg_answer27.html#sindona">Linux Cluster Configuration</a>
<li><a HREF="./lg_answer27.html#holloway">IP Masquerading/Proxy?</a>
</ul>
<p><hr><p>
<!--================================================================-->
<a name="mccluan"></a>
<h3><img align=bottom alt=" " src="../gx/ques.gif">
Regarding Compile Errors with Tripwire 1.2
</h3>
<P> <B>
From: Tc McCluan, <A HREF="mailto:tc@4dcomm.com">tc@4dcomm.com</A>
</B> <P><B>
<img align=bottom alt=" " src="../gx/ans2.gif">
I was on http://www.starshine.org/linux/ and since I am unable to
compile Tripwire 1.2 on my system (redhat 4.2 with 2.0.33 kernel)
I am trying all avenues of help.
</B> <P><B>
I have tried the recommendation in the /contrib/README.linux but
I still get the same error message. I have tried many combinations,
but still no luck.
</B> <P><B>
Following are the list of errors I am getting, hopefully you can spot where
this compile is failing.
Thanks in advance,
</B> <P>
<img align=bottom alt=" " src="../gx/ans2.gif">
You could look for my Tripwire patch at
<P>
http://www.starshine.org/linux/
<P>
... or you could grab the RPM file from any Red Hat "contrib"
mirror like:
<P>
ftp://ftp.redhat.com/pub/contrib/i386/tripwire-1.2-1.i386.rpm
<P>
... for a precompiled binary or:
<P>
ftp://ftp.redhat.com/pub/contrib/SRPMS/tripwire-1.2-1.src.rpm
<P>
... for sources that you should be able to build cleanly.
<P>
So far I really haven't found a tripwire configuration that I really
like. I can never quite get the balance between what aspects to ignore
(permission and ownership changes on /dev/tty*, /dev/pty*, etc) and
which ones I need to watch.
<P>
So, if anyone out there has a really good tw.config file that
minimizes the superfluous alerts and maximizes the intrusion detection,
I'd like to hear about it.
<P>
Also if anyone has a YARD or other rescue disk builder that is customized
for creating write-protected tripwire boot/root diskette sets (for
periodic integrity auditing of Linux systems) I'd like to see a step-by-step
Mini-HOWTO or tutorial (maybe as a submission to Linux Gazette).
<P>
-- Jim
<p><hr><p>
<!--================================================================-->
<a name="karsh"></a>
<h3><img align=bottom alt=" " src="../gx/ques.gif">
Applix Spreadsheet ELF Macro Language
</h3>
<P> <B>
From: Paul T. Karsh ITTC-237B 8-286-xxxx, <A
HREF="mailto:karchpte@acm.org">karchpte@acm.org</A>
</B> <P><B>
I happened on the Linux Gazette in the process of searching
for some information on "scripting" macros in the Applixware
spreadsheet. Although this is not strictly a Linux question,
I hope you can help me with some "pointers" (links ?) on how to learn
this language. The Applixware help is no help and the company
at which I consult does not have the on-line Applixware books nor
the hardcopy "macro" manual.
</B> <P>
<img align=bottom alt=" " src="../gx/ans2.gif">
I played with Applixware a little bit -- but was highly
discouraged to find that its file conversion package
couldn't handle more recent versions of MS Word and Excel.
That was my main interest in the product since I occasionally
get file attachments in these proprietary formats -- and
sometimes they are potential customers.
<P>
As for learning this macro language without the appropriate
documentation: I would ask your client where their manuals
and/or installation CD are -- if they can't produce them
and are unwilling to order replacements then I would
question their decision to use the product.
<P>
Applixware is a commercial product. Assuming this is
on a Linux system you'd probably want to contact Red Hat
Corporation to order replacement manuals (I think RH is
the sole Linux distributor for Applixware -- just as
Caldera is the sole distributor for the Linux version
of WordPerfect).
<P>
If they have the installation CD -- borrow it and install
its online documentation on some system somewhere (long
enough to get the information you need). Be sure to remove
that installation unless the appropriate licensing arrangements
are made, of course.
<P> <B>
<img align=bottom alt=" " src="../gx/ques.gif">
Is there somewhere on the net (FTP or anything) where I can get an intro
to this? I tried the Applixware site; it just seems to be page after
page of PR.
</B> <P>
<img align=bottom alt=" " src="../gx/ans2.gif">
I would like to see far more technical content on their
web site as well. (The same desire applies to other hardware
and software company sites).
<P>
-- Jim
<p><hr><p>
<!--================================================================-->
<a name="geene"></a>
<h3><img align=bottom alt=" " src="../gx/ques.gif">
Answer Guy Issue 18 -- Procmail Spam Filter
</h3>
<P> <B>
From: Anthony E. Geene, <A HREF="mailto:agreene@pobox.com">
agreene@pobox.com</A>
</B> <P><B>
I'm not a procmail user, but I've found that most spam is sent using
envelope addresses, the standard recipient headers are not addressed to
the actual recipient. So I set up filters to catch my mailing list mail
and any mail that is addressed to a list of my valid addresses. Other
mail is put elsewhere for later review.
</B> <P><B>
Such a method is relatively simple and would catch all but the
more sophisticated spammers.
</B> <P>
<img align=bottom alt=" " src="../gx/ans2.gif">
It is a good suggestion. It doesn't work if you have
some people that prefer to Bcc: you (use "blind carbon
copies"). Naturally many people's mail user agents
(MUA's) like elm, pine, etc don't have obvious options for
Bcc:'s -- others do (and most Unix/Linux MUA's allow
some way to do it -- even if it isn't *obvious*).
<P>
There are probably a number of other "false positive"
situations. As you say most automated mailing lists
have headers that would trigger on your criteria. The
obvious response to these problems is to make a list of
all the exceptional cases (of which you are aware) and
add appropriate rules to precede your anti-spam filter.
<P>
In addition it is important to ensure that your disposition
of apparently bogus messages is a refile to a specific
mail folder. You don't want to file it to /dev/null!
<P>
As you check your "probably junk" folder you can manually
refile the exceptions -- and optionally add new rules to
"pre-approve" lists of your favorite correspondents.
<P>
Note: if you keep a list of correspondents and a list of
known spammers, and you write a recipe to check the list
you may be concerned about the amount of time spent in
'grep'. Here's a hint: keep the list sorted and use the
'look' command.
<P>
(The advantage of 'look' is that it does a "binary" search
on a sorted file (think about successive approximation to
"zero in on" the desired lines) and returns the lines
that match. While the overhead of 'grep' grows in a linear
fashion (the search doubles in time as the file doubles
in size) that of 'look' grows much more slowly (it's
proportional to the logarithm of the number of records/lines
in the file, so doubling the file adds roughly one more
comparison). Similar results would be attained if one
used 'dbm' hashes (indexes) -- but there is greater overhead
in programming. Perl offers modules to support dbm, gdbm,
ndbm and other hashing libraries -- but it also has much higher
load time overhead as a result of its generality.)
<P>
The point is that even on a small file (100 lines) I
can see about a 10% difference in overhead. After a
few thousand lines the difference is substantial
(grep takes twice as long to run).
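<P>
To make the idea concrete, here is a minimal Python sketch of the
kind of prefix search that 'look' performs. It is only an
illustration: the function name, the sample addresses and the use
of an in-memory list (rather than a file on disk) are assumptions
made for the example, and the real 'look' has ordering options
that are not modelled here.

```python
import bisect

def look(prefix, sorted_lines):
    """Return every line in sorted_lines that starts with prefix.

    sorted_lines must already be sorted; that is what allows a
    binary search instead of reading every line as grep would.
    """
    # Binary search for the first line that sorts at or after the prefix.
    index = bisect.bisect_left(sorted_lines, prefix)
    matches = []
    # In a sorted list all matching lines sit next to each other,
    # so stop at the first line that no longer begins with the prefix.
    while index < len(sorted_lines) and sorted_lines[index].startswith(prefix):
        matches.append(sorted_lines[index])
        index += 1
    return matches

# Hypothetical list of known spammers, kept sorted as suggested above.
spammers = sorted([
    "bulk@example.net",
    "offers@example.com",
    "promo@example.org",
    "promo2@example.org",
])
```

A procmail recipe would call the real 'look' on a sorted file; the
sketch only shows why the search cost stays small as the list grows.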
<P>
None of this matters much on your personal workstation
which has only one active user and receives a couple
hundred e-mail items per day. However -- if you're
filtering on the company mailhub, or at your ISP's
location -- it's worth it to reduce your impact.
<P>
-- Jim
<p><hr><p>
<!--================================================================-->
<a name="geene2"></a>
<h3><img align=bottom alt=" " src="../gx/ques.gif">
Great Procmail Article!
</h3>
<P> <B>
From: Anthony E. Geene, <A HREF="mailto:agreene@pobox.com">
agreene@pobox.com</A>
</B> <P><B>
I read your procmail article in issue 14 of the Linux Gazette. It was
the best explanation of how procmail works that I've seen yet.
</B> <P><B>
I just wanted to say Thanks,<BR>
Anthony,
</B> <P>
<img align=bottom alt=" " src="../gx/ans2.gif">
Thanks for the feedback. BTW there is a new article on
using TDG (The Dotfile Generator) as a GUI front end for
creating procmail scripts. I haven't finished reading
it yet -- but it looks pretty good to me.
<P>
In your earlier mail you mentioned that you aren't using
procmail yet. This article on TDG and my explanation of
what's going on "under the hood" may yet change that.
(Also, somewhere on that morass of half-baked pages that
I keep as a "website" are some links to other procmail
and mail filtering resources).
<P>
-- Jim
<p><hr><p>
<!--================================================================-->
<a name="sindona"></a>
<h3><img align=bottom alt=" " src="../gx/ques.gif">
Linux Cluster configuration
</h3>
<P> <B>
From: Antonio Sindona, <A HREF="mailto:Antonio.Sindona@trinacria.it">
Antonio.Sindona@trinacria.it</A>
</B> <P><B>
I'd like to create a *Linux cluster configuration* to have some degree of
fault-tolerance (Linux normally works ... hardware not always ! ;-) ). Do
You know if somebody tried to develop something to solve this problem ?
</B> <P>
<img align=bottom alt=" " src="../gx/ans2.gif">
The first place I'd look for info on fault tolerance for
Linux would be:
<P>
Linux High Availability HOWTO<BR>
http://sunsite.unc.edu/pub/Linux/ALPHA/linux-ha/High-Availability-HOWTO.html
<P>
Then take a look at:
<P>
Linux Parallel Processing HOWTO<BR>
http://yara.ecn.purdue.edu/~pplinux/pphowto.html
<P>
... and:
<P>
MP and Clustering for Linux<BR>
http://linas.org/linux/mp.html
<P>
One of the most famous Linux parallel computing projects (which has
been written up in the _Linux_Journal_ among other places) is the
Beowulf Project:
<P>
http://sdcd.gsfc.nasa.gov/ESS/linux.html
<P>
After you've been overwhelmed by reading all of that you can
slog through all of the links at:
<P>
Linux Parallel Processing Using Clusters<BR>
http://yara.ecn.purdue.edu/~pplinux/ppcluster.html
<P>
... which include links to some classic Unix projects like
"Condor," PVM, and MPI.
<P>
After reading all of those you'll undoubtedly decide that Linux
is years ahead of Microsoft in the field of clustering. (MS'
"wolfpack" project is still vaporware last I heard). However,
lest we grow complacent we should consider some features that Linux
needs to compete with mainframe and mini clustering technologies
(like those in VMS, and the ones that HP managed to eke out of
their acquisition of Apollo -- when they gutted DomainOS, from what
I hear).
<P>
The two features Linux needs in order to attain the next level
of clustering capacity are "transparent checkpointing" and
"process migration."
<P>
"Transparent checkpointing" allows the kernel to periodically
take a comprehensive snapshot of a process' state (to disk or
to some network filesystem) and allows the OS to restart a
process "where it left off" in the event of a system failure.
<P>
(System failures that damage the checkpoint files notwithstanding,
of course).
<P>
"Process Migration" allows a node's kernel to push a process
onto another (presumably less heavily loaded) system. The process
continues to run on the new system without any knowledge of the
transition.
<P>
At first it seems like "checkpointing" would cost way too much
in performance. However, it turns out that relatively little
of your system's RAM has been modified from the disk images
(binaries and libraries) in any given time frame. I've heard
reliable reports that this has almost trivial overhead on a
Unix/Linux like system.
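<P>
The reason the cost stays low can be sketched in a few lines of
Python. This is a toy model under stated assumptions, not how any
kernel implements it: the class name, the "page" dictionary and the
in-memory store standing in for a disk are all invented for the
illustration. The point is only that after the first full snapshot,
each checkpoint needs to write just the pages that changed.

```python
import pickle

class CheckpointedMemory:
    """Toy model of a process image split into numbered 'pages'."""

    def __init__(self, pages):
        self.pages = dict(pages)      # page number -> contents
        self.dirty = set(self.pages)  # nothing has been saved yet

    def write(self, page, data):
        # Remember which pages changed since the last checkpoint.
        self.pages[page] = data
        self.dirty.add(page)

    def checkpoint(self, store):
        """Save only the pages modified since the previous checkpoint.

        Returns the number of pages actually written.
        """
        written = 0
        for page in sorted(self.dirty):
            store[page] = pickle.dumps(self.pages[page])
            written += 1
        self.dirty.clear()
        return written

def restore(store):
    """Rebuild the image from the saved pages after a 'crash'."""
    mem = CheckpointedMemory({p: pickle.loads(b) for p, b in store.items()})
    mem.dirty.clear()                 # the restored image matches the store
    return mem

store = {}                            # stands in for a disk or network filesystem
mem = CheckpointedMemory({n: b"\x00" * 16 for n in range(100)})
first = mem.checkpoint(store)         # full snapshot of all 100 pages
mem.write(7, b"hello")
mem.write(42, b"world")
second = mem.checkpoint(store)        # only the two modified pages
recovered = restore(store)
```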
<P>
It's easy to see how "checkpointing" is a necessary feature
to support process migration. However, it's not enough. You
also need mechanisms to allow the target kernel to give the
incoming process access to all of the resources that it had
allocated (open file descriptors, other interprocess channels,
etc). For Unix like systems you also have to account for
the process structure (the PID of the process can't change)
-- and there has to be some implicit inter-node communication
to maintain the process groups (to get a process' exit
status to its parent and to allow members of a process group
to get status and send signals to it).
<P>
There have been a number of operating systems that have implemented
checkpointing and process migration features. Chorus Mi/X, Berkeley
Sprite and Amoeba (a project that the father of Minix, Andrew S.
Tanenbaum, collaborated on) come to mind.
<blockquote>
(see http://www.am.cs.vu.nl/ for info on Amoeba,
http://HTTP.CS.Berkeley.EDU/projects/sprite/ for
Sprite, and http://www.chorus.com for Chorus Mi/X
info).
</blockquote>
One Unix package that is supposed to offer these features is Softway Ltd's
Hibernator II. Only SGI and Fujitsu mainframe versions are supported.
This is probably an expensive commercial package and we shouldn't
hold our breath for a Linux port.
<P>
* http://softway.com.au/softway/products/hibernator.html
<P>
The MOSIX project also supports transparent process migration (imagine
that copy of emacs being moved from one overloaded CPU to an idle
machine while you were using it). It is currently available on
BSD/OS. However we're in luck! As I was typing this and checking
my URL's and references I noticed the following statement on their
pages:
<P>
``MOSIX for Linux (RedHat) is now under development''
<P>
(Yay!).
<P>
You can read more about MOSIX (and see this note yourself) at:
<P>
http://www.cs.huji.ac.il/mosix/
(Hebrew University, Israel)<BR>
http://www.cnds.jhu.edu/mirrors/mosix/txt_main.html
<P>
One OS project that I've been keeping my eye on for awhile has been
EROS (http://www.cis.upenn.edu/~eros/). This isn't widely
available yet -- but I have high hopes for it. It will use a
"persistence" model that implicitly checkpoints the state of the
entire system (all processes and threads).
<P>
EROS is not "Unix" though it should eventually support a Unix/Linux
compatible subsystem (called Dionysix). The major difference is that
EROS is a pure "capabilities" system. ``Capabilities'' are the key to a
security model that is much different than the traditional identity/group
(Unix), process privileges (VMS and Posix.6), and ACL (NT, Netware,
etc) that are common in other operating systems. Read Mr. Shapiro's
web pages for more info on that.
<P>
I personally think we (in the Linux community) have quite a bit
to learn from other operating systems -- their strengths and
their weaknesses. To any of us who would say "But those are
just obscure systems. Nobody is running those!" I would point
out that millions of PC users still have that same reaction to
Linux.
<P>
So, to learn *far* more than you ever wanted to know about
operating systems *other* than DOS, MacOS, and Unix take a
look at the links on my short page about OS':
<P>
http://www.starshine.org/jim/os/
<P>
-- Jim
<p><hr><p>
<!--================================================================-->
<a name="holloway"></a>
<h3><img align=bottom alt=" " src="../gx/ques.gif">
IP Masquerading/Proxy?
</h3>
<P> <B>
From: Jack Holloway
</B> <P><B>
Ok... I'm a little foggy on the terminology... if I have a machine on an
ethernet network that is hooked to the internet, and I want all of the other
machines on the network to connect to the internet THROUGH the machine
connected to the internet, I need to use IP masquerading or proxy server
stuff?
</B> <P>
<img align=bottom alt=" " src="../gx/ans2.gif">
You can use IP Masquerading and/or any sort of proxy systems.
<P>
IP Masquerading is a particular form of NAT (network
address translation).
<P>
The one machine (your Linux box) that is connected to
your LAN and to the Internet is the "router" or "gateway."
("routers" work at the "network" layer, while "gateways"
work at the "applications" layer of the OSI reference model).
(More on that later).
<P>
One "real" (IANA issued) IP address is assigned to the
"outer" interface and attached to the Internet (through
your ISP). This will typically be a PPP link through your
router/gateway's modem -- though it might be any network
interface that you can get Linux to use.
<P>
On the other interface (typically an ethernet card) you
assign one out of any of the "private" or "reserved for
disconnected networks" IP address ranges as defined in
RFC1918 (previously in RFC1597 and 16??). These RFC1918
addresses are guaranteed to never be issued to any
Internet host (so those of us using them on our networks
will never create an ambiguity with *our* router by attempting
to access a machine *outside* our network that has an IP
address that duplicates one *inside* of our network).
<P>
The RFC1918 address blocks are:
<PRE>
10.*.*.* (one class A net)
172.16.*.* through 172.31.*.* (16 class B's)
192.168.0.* through 192.168.255.* (256 class C's)
</PRE>
You can pick any of those RFC1918 address blocks and
you can subnet them any way that's convenient. I use
192.168.64.0 for my home LAN.
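<P>
If you want to check whether an address falls in one of these
blocks, Python's standard ipaddress module can do the arithmetic.
The sketch below is only an illustration; the function and
variable names are made up for the example, and the blocks are
written in CIDR notation rather than as classful ranges.

```python
import ipaddress

# The three RFC 1918 blocks, written in CIDR notation.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(address):
    """Return True if address falls inside one of the private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in RFC1918_BLOCKS)

# Count the classful networks that the second and third blocks contain.
class_b_count = len(list(RFC1918_BLOCKS[1].subnets(new_prefix=16)))
class_c_count = len(list(RFC1918_BLOCKS[2].subnets(new_prefix=24)))
```

Here is_rfc1918("192.168.64.1") is true, while an address such as
172.32.0.1 falls just outside the second block.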
<P>
Within my LAN I use the .1 address (192.168.64.1) for my
Linux gateway/router's ethernet -- it gets its other (real)
IP address dynamically from my ISP when 'diald' establishes
a connection (diald is a daemon that automatically invokes my
ppp connection whenever traffic routing to the network is
required -- I actually have another RFC1918 address assigned
to the SLIP connection that diald uses for internal purposes).
I run a caching nameserver on this box (which we'll call "gw").
<P>
All systems on my LAN execute a line like the following:
<PRE>
route add -net 192.168.64.0 eth0
</PRE>
... in their rc scripts at some point. This configures
them to all agree where packets for this network go.
This is called a "static" route.
<P>
I then point the /etc/resolv.conf on all of the "client" machines
on my LAN to "gw" and add a default route to each of them that
looks like:
<PRE>
route add default gw 192.168.64.1
# other traffic goes to host named "gw"
</PRE>
(the "client" machines don't have to be Linux and don't
have to have any special support for IP Masquerading --
you just assign each of them an IP address like
192.168.64.2, etc.).
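<P>
How a client chooses between its two routes can be sketched in
Python. This is a simplified model, not the kernel's routing code:
the table layout and the function name are assumptions for the
example, and the /24 netmask is assumed for the 192.168.64.0
network. The most specific matching route wins, so local traffic
stays on the wire and everything else goes to the gateway.

```python
import ipaddress

# A client's routing table after the two 'route add' commands:
# (destination network, gateway or None for "directly reachable", device)
ROUTES = [
    (ipaddress.ip_network("192.168.64.0/24"), None, "eth0"),
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.64.1", "eth0"),
]

def next_hop(destination):
    """Pick the most specific matching route (longest prefix wins)."""
    ip = ipaddress.ip_address(destination)
    candidates = [route for route in ROUTES if ip in route[0]]
    network, gateway, device = max(candidates, key=lambda r: r[0].prefixlen)
    # With no gateway the packet is delivered straight to the destination.
    return (gateway or destination, device)
```

A packet for 192.168.64.7 is handed directly to that host, while a
packet for an outside address such as 198.51.100.9 is sent to
192.168.64.1.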
<P>
In the "gw" server I have the kernel compiled with masquerading
and "forwarding" support enabled (of course). I don't put in
the default static route -- that would be a loop. "gw" also
has a different /etc/resolv.conf file -- one that points to
a couple of my ISP nameservers.
<P>
Note: One trick I've learned about resolv.conf files --
You only get three nameserver entries (in most versions of
the bind libraries) -- so I make the last entry a repeat of
the first one. When a query times out (for a client) it moves to the
second nameserver. Meanwhile the first nameserver still
has a good chance of getting a response (DNS over today's
busy Internet times out more often than nameservers fail).
So, a timeout on the second nameserver leads to a repeat
request on the first one -- which has probably received
and cached a response by this time. I could explain that
in more detail -- but the real gist is: try it. It helps.
<P>
Now, back to masquerading:
<P>
All it takes for masquerading to work is to run the command
<PRE>
LAN="192.168.64.0/24"
ipfwadm -F -a accept -m -S $LAN -D 0.0.0.0/0
</PRE>
... which means:
<P>
use the "IP firewall administrative" program to
make the following change to the "forwarding" (-F)
table:
<blockquote>
add/append (-a) a rule to accept for
masquerading (-m) any packet from (-S
or "source address") my LAN (which is a
shell variable I defined in the preceding
line) that is going to (whose "destination"
-D) is anywhere (0.0.0.0/0).
</blockquote>
Here's how that works. When the kernel receives a packet
that's not destined for the localhost (the gateway itself)
it checks to see if forwarding is enabled, then it looks in
the routing table to see where the packet should go. My
gateway's default route is pointing to the sl0 interface
(the SLIP interface that diald maintains to detect outgoing
traffic) -- when diald detects traffic on sl0 -- it
runs my PPP connection script which changes the default
route to point to my ISP's routers (which is part of the
information that's negotiated via PPP along with my
dynamic IP address). Now the packet is "forwarded" from
one interface to the other. Assuming that the packet came
from my LAN (via the ethernet card in "gw") the kernel's
packet filtering ("firewall") code takes over.
<P>
ipfw inspects the packet to see if it was part of an
existing TCP session (part of a connection that it has
already been working with). If it is then ipfw notes
the TCP "port" that this session is assigned to, otherwise
ipfw just picks another port. If it picks a new port it
adds an entry to its masquerading table that records the
packet's original source address and source port. The
"client" machine on my LAN is expecting any reply packets to
come back to the appropriate source port (which is how it
knows which process' "socket" to write the reply packets to)
-- ipfw then re-writes the packet headers, changing the source
address to match the one on ppp0 (the "real" IP address for
which my ISP knows a route), and changing the source port to
the one it selected.
<P>
When ipfw receives reply packets the kernel routes them to
sockets which ipfw owns (the source port on my outgoing
packets becomes the destination port on the reply packets).
ipfw then looks that socket up in its table, retrieves the
*original* source addr and port (for the outgoing packet that
generated this reply) and rewrites the destination fields (on the
*reply* packet). Finally the (now re-written) packet is
routed to the LAN.
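<P>
The table bookkeeping described above can be modelled in a few
lines of Python. Like the description itself this is purely
conceptual and is not taken from the kernel source: the class
name, the starting port number and the sample addresses are
assumptions, and a real table would also track the protocol, the
remote end and a timeout for each entry.

```python
class Masquerader:
    """Toy model of the address and port rewriting done by the gateway."""

    def __init__(self, public_ip, first_port=61000):
        self.public_ip = public_ip
        self.next_port = first_port   # hypothetical starting port
        self.by_client = {}           # (client ip, client port) -> gateway port
        self.by_port = {}             # gateway port -> (client ip, client port)

    def outgoing(self, src_ip, src_port, dst_ip, dst_port):
        """Rewrite the source of a packet leaving the LAN."""
        key = (src_ip, src_port)
        if key not in self.by_client:
            # New session: pick a port and remember the original source.
            self.by_client[key] = self.next_port
            self.by_port[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.by_client[key], dst_ip, dst_port)

    def incoming(self, src_ip, src_port, dst_ip, dst_port):
        """Rewrite the destination of a reply, or return None if unknown."""
        original = self.by_port.get(dst_port)
        if original is None:
            return None
        return (src_ip, src_port, original[0], original[1])

gw = Masquerader("203.0.113.5")
out = gw.outgoing("192.168.64.2", 1025, "198.51.100.9", 80)
back = gw.incoming("198.51.100.9", 80, "203.0.113.5", out[1])
```

The reply in "back" carries the client's original address and port
again, which is all the client ever sees.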
<P>
Effectively IP Masquerading makes a whole LAN full of machines
look like one really busy one to the rest of the Internet.
While a typical workstation might only have a few dozen
active network connections available, a masquerading gateway
might have hundreds or thousands. As a practical matter
the TCP/IP protocol provides a 16 bit field for "ports" and
most Unix systems can't handle more than a few thousand
concurrent open connections (sockets) and file descriptors.
(This has to do with the tables that the kernel allocates
for the data structures that manage all this -- regardless
of whether masquerading is active or not). Luckily you're
unlikely to have enough bandwidth to approach Linux'
capacity.
<P>
I'm sorry for the length of that description. Note that
it is purely conceptual (I've never read the code, I've
just deduced what it must be doing from what I know of
how TCP works).
<P> <B>
<img align=bottom alt=" " src="../gx/ques.gif">
Ouch! That's a big question there! Ok, firstly, do you own IPs for every
machine on your network? (That is, do you have an internet unique IP for
each machine) If so, all you want is routed. If you don't, then to
</B> <P>
<img align=bottom alt=" " src="../gx/ans2.gif">
'routed' is deprecated. In addition he doesn't need routed
or gated to talk to his ISP (and almost certainly can't
use it with them -- they won't listen to his routes unless
he goes out and gets an AS number and negotiates a contract
for "peering" with them which would be absurd unless he were
becoming a multi-homed ISP or something like that).
<P>
The case where routed or gated makes sense is with his
own internetwork of LAN's. If he has several ethernet
segments and is moving systems around them frequently
(or adding new IP devices to them) then it would be
useful. For simpler and for more structured LANs
(each ether segment gets a subnet -- a global, static
routing table is distributed to all routers) you don't
need or want 'routed' or 'gated'.
<P>
If he had a block of ISP (or IANA) issued IP addresses,
his ISP would have to include routing to them (they
don't make sense otherwise). Usually this amounts to
some static routes that they maintain in their systems
-- specifically some entries that are invoked whenever
your system authenticates on one of their terminal servers
or routers.
<P>
You don't have to run any software on your end to make
use of this routing. (That's a confusing statement --
you have to run PPP or SLIP to connect to them -- but
once you're connected they will route packets to you
even if your routes back to them are completely missing).
<P>
As I've described above -- you just have to have your
own LAN routing set up properly. That means that each
system on your LAN has "-net" routes onto your ethernet
and a "default gw" route to your router/gateway (masquerading
host).
<P> <B>
<img align=bottom alt=" " src="../gx/ques.gif">
browse the web you can use a proxy server(which looks to the outside world
as if only the proxy is actually on the net.). If you want to telnet etc.
out, you will need IP-Masquerading, which isn't the most reliable way of
doing things. ask me further in email if you need more detail!
</B> <P>
<img align=bottom alt=" " src="../gx/ans2.gif">
I disagree with several points here. Both masquerading *and*
proxying look like "only the proxy is actually on the net."
-- because only the router/gateway has an IP address with
valid Internet routes. The rest of your LAN is "hidden"
(behind your "gw") because those IP addresses don't have
valid Internet routes. They are IP addresses but they are
not *Internet* addresses!
<P>
Proxying is an applications layer solution. Masquerading
and NAT work at the network and transport layers. The
difference is what data structures the software is dealing with.
<P>
At the data link layer we're working with "data frames." This
is what an ethernet bridge or switch uses -- the MAC (BIA)
addresses. That's also the layer at which ARP (address
resolution protocol) works. It's how one host
finds the ethernet card address of another system that's
on the same LAN (how our client machines "find" our router/gw).
<P>
At the network layer we deal with packets. These have IP
addresses (as opposed to the MAC -- media access control --
addresses in the ethernet "frame" header). This is where
the masquerading happens (the port numbers it also rewrites
belong to the transport layer, one step up). As I've described masquerading
involves a relatively "dumb" (mechanical) bit of packet
patching with some table reference and maintenance.
Technically there are some details I left out -- like
recomputing the packet checksums.
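<P>
To show why the checksum has to be recomputed, here is a small
Python sketch of the 16 bit one's-complement checksum that IP
headers carry. The helper names and the sample addresses are
assumptions for the illustration, and the header it builds leaves
most fields at zero, so it is not a complete packet.

```python
def internet_checksum(data):
    """16-bit one's-complement checksum used by IP, TCP and UDP."""
    if len(data) % 2:
        data += b"\x00"                     # pad to a whole 16-bit word
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                      # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def ip_header(src, dst):
    """Build a bare 20-byte IPv4 header with a valid checksum."""
    header = bytearray(20)
    header[0] = 0x45                        # version 4, header of 5 words
    header[8] = 64                          # time to live
    header[9] = 6                           # protocol: TCP
    header[12:16] = bytes(int(part) for part in src.split("."))
    header[16:20] = bytes(int(part) for part in dst.split("."))
    header[10:12] = internet_checksum(bytes(header)).to_bytes(2, "big")
    return bytes(header)

before = ip_header("192.168.64.2", "198.51.100.9")
after = ip_header("203.0.113.5", "198.51.100.9")    # source rewritten
```

The checksum bytes in "before" and "after" differ, so a gateway
that patched only the address would send out a header that the
next router rejects.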
<P>
The problem is that the transport layer conveys no information
about the applications protocol for which it is a carrier.
For "normal" TCP protocols (like HTTP and telnet) this is
no problem. However, FTP and a few other protocols do "bad"
things. In particular an FTP session consists of *two* TCP
sessions: a control session which is initiated from the client
to the server, and a data session which is initiated from the
server back to the client! The IP address and port to which
this "return connection" goes is passed to the server via
the control connection. This last detail has caused more
firewall designers and admins to rip out their hair than
all the cheap combs from China. In the context of masquerading
it means that the masquerading server must monitor the
*data* (the stuff in the payloads of the packets) and make
some selective patches therein. In the other cases we only
touched the headers of each packet -- never the contents of
their payloads.
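<P>
The payload patch in question can be sketched in Python. In FTP
the client announces its address and port with a PORT command
whose argument is six comma separated decimal bytes. The function
names and sample addresses below are assumptions for the example,
and the sketch ignores a further complication: the rewritten line
may be a different length, which a real implementation has to
account for in the TCP sequence numbers.

```python
def parse_port_command(line):
    """Return the (ip, port) that a PORT command asks the server to call."""
    numbers = line.strip().partition(" ")[2].split(",")
    ip = ".".join(numbers[:4])
    port = int(numbers[4]) * 256 + int(numbers[5])
    return ip, port

def rewrite_port_command(line, new_ip, new_port):
    """Replace the address and port inside an FTP PORT command.

    The argument is four bytes of IP address followed by two bytes
    of port number (high byte first), separated by commas.
    """
    verb, _, argument = line.strip().partition(" ")
    if verb.upper() != "PORT" or len(argument.split(",")) != 6:
        return line                      # not something we should touch
    fields = new_ip.split(".") + [str(new_port // 256), str(new_port % 256)]
    return "PORT " + ",".join(fields) + "\r\n"

original = "PORT 192,168,64,2,4,1\r\n"
patched = rewrite_port_command(original, "203.0.113.5", 61000)
```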
<P>
So, this is the part of Masquerading that is unreliable.
Linux IP Masquerading is by no means the only flavor --
though it's probably the most widely used by now. Linux
has several modules for dealing with unruly protocols --
so they usually work.
<P>
However, I've found it more reliable to use the TIS FWTK
ftp-gw (Trusted Information Systems http://www.tis.com,
Firewall Toolkit). This is a proxy.
<P>
Proxy packages work at the applications layer. You have
to have support for each applications protocol (http, ftp,
telnet, rlogin, smtp, etc) that you want to allow "through"
your firewall. They come in two forms: SOCKS and FWTK
(There are many of them besides these -- but all of them
follow one *model* or the other).
<P>
In the FWTK model the user opens his or her initial
connection to the firewall (I 'ftp' to gw.starshine.org).
The firewall (gateway) is running the FWTK proxy *instead
of* (or *in addition to*) the normal server (ftpd). If
it is "in addition to" then one or the other must be on a
different port or using a different IP Alias on the machine
(more on that later). Now my FTP server (ftp-gw) prompts
me to "login."
<P>
For a normal FTP server I'd type my name (or "ftp" or "anonymous").
For ftp-gw I'm trying to go *through* this machine and onto
one that's on the other side (on the Internet). So I have to
provide more information. So I type:
<P>
ftp@sunsite.unc.edu
<P>
... or
<P>
webauthor@www.svlug.org
<P>
... or whatever. The gateway ftp server then opens a connection
to my target (everything *after* the @ sign) and passes my
name (everything before the @ sign) to *its* login prompt.
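<P>
The split that the gateway performs on that login string can be
sketched in Python. This is a guess at the logic rather than the
FWTK source: the function name is made up, and splitting at the
*last* @ sign is an assumption.

```python
def split_gateway_login(login):
    """Split 'user@host' into the name to pass on and the target host."""
    user, separator, host = login.rpartition("@")
    if not separator or not user or not host:
        raise ValueError("expected user@host, got %r" % login)
    return user, host

user, host = split_gateway_login("ftp@sunsite.unc.edu")
```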
<P>
The TIS FWTK comes with a number of other small proxies --
and most of them work in a similar fashion. (There are
also options to limit *who* can access *what* and *when*
via administrator edited access control lists).
<P>
The key point here is that FWTK doesn't require any special
client software support. The users have to be trained how
to traverse the firewall and they have to remember how to do it.
<P>
FWTK is only appropriate for relatively small groups of technically
savvy users (who are easy to train in this and won't make the
sysadmin's life a constant hell of walking everyone through this
extra connectivity step).
<P>
SOCKS has a model that works for larger groups of less savvy
users. However, it requires that you install SOCKS aware
versions of your client applications. So you have to replace
your normal telnet, ftp, rlogin, etc with a "socksified"
version. In many cases it is possible to effectively
"socksify" all of your client utilities by replacing a shared
library (Unix/Linux) or a DLL (Windows). Many commercial
TCP clients and utilities are built with SOCKS support
(Netscape Navigator and Communicator are prime examples).
I think the Trumpet shareware utilities for Windows are
another.
<P>
The hassle is installing and configuring this software on every
client system. However, the advantage is that none of the users
has to remember, or even know, about the firewall. The SOCKS
applications will automatically negotiate sessions through the
firewall.
<P>
There are some protocols that are inherently easy or even
unnecessary to proxy. For example DNS doesn't need to
be proxied. You run your caching copy of named and let
all of the client machines talk to and trust it. This
gives a great performance boost to most of the clients
and saves quite a bit of bandwidth on the critical link
to the ISP. There is no reason that I can think of not
to run a caching nameserver somewhere on your Internet
connected LAN.
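<P>
As a minimal sketch of the client side, assuming the caching
named runs on an internal address of 192.168.64.1 and that
your domain is example.com (both are placeholders for
illustration), each client's /etc/resolv.conf would simply
point at it:
<PRE>
search example.com
nameserver 192.168.64.1
</PRE>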
<P>
HTTP is a protocol that benefits quite a bit from proxying.
It is trivial to add caching features to a web proxy -- and
I think just about all of them do so.
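<P>
For instance, assuming a caching proxy such as squid is
listening on its default port (3128) on a gateway at
192.168.64.1 (a placeholder address), command line clients
that honor the http_proxy environment variable, such as
lynx and wget, could be pointed at it like this:
<PRE>
http_proxy=http://192.168.64.1:3128/
export http_proxy
</PRE>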
<P>
SMTP is a protocol that doesn't need proxying (from the
standpoint of the clients on your LAN). You configure
an internally accessible system to accept mail and
it will relay it to your gateway via whatever means
you configure. A typical model would be that outgoing
mail is collected on an internal hub, which is configured
to relay it to the external gateway, which, in turn,
relays it to the ISP and on to the world. To see what
this looks like read the "Received" headers in some of your
mail.
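<P>
As a purely hypothetical illustration (the host names,
addresses, queue IDs and dates below are invented, and the
exact layout varies with the mail software), the headers on
a message that followed that path might look something like
this, with the most recent hop at the top:
<PRE>
Received: from gw.example.com (gw.example.com [192.0.2.1])
        by mail.isp.example.net (8.8.8/8.8.8) with ESMTP id KAA04321;
        Mon, 6 Apr 1998 10:15:07 -0700
Received: from hub.example.com (hub.example.com [192.168.64.2])
        by gw.example.com (8.8.8/8.8.8) with ESMTP id KAA01234;
        Mon, 6 Apr 1998 10:15:02 -0700
</PRE>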
<P>
The externally visible mail gateway can route mail
back to the internal hub -- which can run POP and/or
IMAP servers for the clients to use to actually get
their mail. (You could have the internal hub route
all of the mail directly to people's desktops via
SMTP too.)
<P>
The reason you generally don't need proxying for
SMTP is that most sites use some form of masquerading
(mail appears to come from the "domain" rather than from
a particular host within the domain). FWTK includes
smapd -- and there is an independent and free smtpd --
which act as proxies for sendmail. Here the intent is
to have a small simple program receive mail and
pass it along to the larger, vastly more complicated
'sendmail' itself. (I don't want to get into the
raging debates about sendmail vs. qmail etc -- suffice
it to say there are many alternatives).
<P>
Note that masquerading and proxying are not mutually
exclusive. I use masquerading and I have ftp-gw and
squid (caching web service) installed. I could
also install SOCKS on the same gateway.
<P>
Incidentally I mentioned that it's possible to run ftpd
and ftp-gw on the same machine without putting them on
different ports. Here are two ways of doing that:
<P>
IP Aliasing method:
<ol>
<li> You install ftpd and ftp-gw
<li> You create an IP alias: add an extra address
to your gateway system's internal interface with
a command like:
<PRE>
ifconfig eth0:1 192.168.64.129
</PRE>
<li> You configure your TCP Wrappers to virtual host
a service by adding a line like this to your
/etc/hosts.allow file:
<PRE>
in.ftpd@192.168.64.129: 192.168.64. : twist /usr/local/fwtk/ftp-gw
</PRE>
This will "twist" any ftp request *to that IP alias*
into an ftp-gw session. FTP requests to any other
interface address will be handled in the usual way
(tcpd will launch the ftp daemon that's listed in
inetd.conf).
</ol>
That's all there is to that method. Note that you can
do other interesting things with this sort of virtual
hosting, if you're clever.
<P>
Loopback Twist method:
<ol>
<li> Install ftpd and ftp-gw (as you would for
the other method).
<P>
<li> Configure tcp wrappers to allow normal ftp
access *from* the localhost address (127.0.0.1)
<P>
<li> Configure tcp wrappers to twist any other
ftp requests into ftp-gw
</ol>
That looks like this (in the /etc/hosts.allow file):
<PRE>
in.ftpd: 127.0.0.1 : ALLOW
in.ftpd : ALL : twist /usr/local/fwtk/ftp-gw
</PRE>
WARNING! This second line would allow *anyone*
(from inside or outside of your LAN) to access the
proxy. However, ftp-gw reads a file --
/usr/local/etc/netperm-table according to the way I
compiled mine -- to determine who is allowed to access
each of its proxy services.
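<P>
A minimal sketch of the sort of netperm-table entries
involved, assuming the 192.168.64.* LAN from the earlier
example (check the FWTK documentation for the exact options
that your version supports):
<PRE>
ftp-gw: timeout 3600
ftp-gw: permit-hosts 192.168.64.*
</PRE>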
<P>
So, this line is neither as dangerous as it looks
nor as safe as it should be. Changing it to:
<PRE>
in.ftpd : LOCAL : twist /usr/local/fwtk/ftp-gw
</PRE>
... is safer and more appropriate.
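<P>
Since tcpd's LOCAL wildcard matches host names that contain
no dot, it depends on how your clients' names resolve. An
alternative sketch, assuming the 192.168.64. network used in
the IP aliasing example, matches by address prefix and
refuses everyone else explicitly:
<PRE>
in.ftpd : 127.0.0.1 : ALLOW
in.ftpd : 192.168.64. : twist /usr/local/fwtk/ftp-gw
in.ftpd : ALL : DENY
</PRE>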
<P>
One key point here is that you can use proxies on
your masquerading router/gateway to allow access from
the "outside" back *into* services inside your LAN.
Usually you want to prevent this (the whole point of
a firewall). However you can use tcpd and netperm to
allow specific 'friendly' networks to get to servers
on one of your LANs, despite the fact that there are
no routes directly to those machines.
<P>
This brings us back to other forms of NAT. I mentioned
at the get-go that masquerading is one form of NAT. It
specifically involves a "many to one" arrangement.
(The "many" clients on your LAN appearing as "one" connection
to the Internet).
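<P>
To make the "many to one" idea concrete, here is a sketch
of the kind of translation table a masquerading gateway
keeps. All of the addresses are placeholders, and the
rewritten port numbers are only illustrative:
<PRE>
inside client          appears on the Internet as
192.168.64.10:1025     203.0.113.5:61001
192.168.64.11:1025     203.0.113.5:61002
192.168.64.12:3412     203.0.113.5:61003
</PRE>
Replies that arrive at a given port on the gateway's one
public address are rewritten and handed back to the
matching inside client.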
<P>
Another form of NAT is "many to many" -- where you have a
table of translations. Thus each of your systems might be
configured to use one address, and be translated to appear as
if it came from another. I personally don't see much use for
this arrangement. The one case I could see for it might be
if you had a net of devices that you couldn't renumber, which
had "illegal" or "invalid" addresses.
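<P>
A sketch of such a translation table, with one fixed
outside address per inside device (every address here is a
placeholder chosen for illustration):
<PRE>
inside address      translated address
10.1.1.5            203.0.113.21
10.1.1.6            203.0.113.22
10.1.1.7            203.0.113.23
</PRE>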
<P>
One other form of NAT involves a different "many to many"
translation -- it's not currently available for Linux but
it's used in the Cisco Local Director product. This is a
trick for doing IP level load balancing. You have a
"reverse masquerade" host accept requests to "a" busy
server (one service on one IP address) and you have it
masquerade the session to any of multiple "inside" machines
that have the same service and content available.
<P>
For load balancing it's trivially easy to use DNS "round
robin records" -- so I don't see much application for this
form of NAT either.
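<P>
As a sketch of what that looks like in a zone file (the
name and addresses here are placeholders), you simply list
several A records for the same name:
<PRE>
www     IN  A   192.0.2.10
www     IN  A   192.0.2.11
www     IN  A   192.0.2.12
</PRE>
The name server then rotates the order of the addresses in
successive answers, which spreads clients across the
machines.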
<P>
Anyway -- that's all I have the energy to type for now.
<P>
I hope this explains the terms and concepts and gives you
enough examples to set up what you want. For the most part
you can just use the one magic ipfwadm command to "turn on"
masquerading. The rest is just the configuration of your
network and of your ISP connection -- which you've presumably
already done.
<P>
-- Jim
<!--================================================================-->
<P> <hr> <P>
<center><H4>Previous "Answer Guy" Columns</H4></center>
<P>
<A HREF="../issue13/answer.html">Answer Guy #1, January 1997</A><BR>
<A HREF="../issue14/answer.html">Answer Guy #2, February 1997</A><br>
<A HREF="../issue15/answer.html">Answer Guy #3, March 1997</A><br>
<A HREF="../issue16/answer.html">Answer Guy #4, April 1997</A><br>
<A HREF="../issue17/answer.html">Answer Guy #5, May 1997</A><br>
<A HREF="../issue18/lg_answer18.html">Answer Guy #6, June 1997</A><br>
<A HREF="../issue19/lg_answer19.html">Answer Guy #7, July 1997</A><br>
<A HREF="../issue20/lg_answer20.html">Answer Guy #8, August 1997</A><br>
<A HREF="../issue21/lg_answer21.html">Answer Guy #9, September 1997</A><br>
<A HREF="../issue22/lg_answer22.html">Answer Guy #10, October 1997</A><br>
<A HREF="../issue23/lg_answer23.html">Answer Guy #11, December 1997</A><br>
<A HREF="../issue24/lg_answer24.html">Answer Guy #12, January 1998</A><br>
<A HREF="../issue25/lg_answer25.html">Answer Guy #13, February 1998</A><br>
<A HREF="../issue26/lg_answer26.html">Answer Guy #14, March 1998</A>
<P><HR><P>
<center><H5>Copyright &copy; 1998, James T. Dennis <BR>
Published in <I>Linux Gazette</I> Issue 27 April 1998</H5></center>
<P> <hr> <P>
<!--================================================================-->
<A HREF="./index.html"><IMG SRC="../gx/indexnew.gif" ALT="[ TABLE OF
CONTENTS ]"></A> <A HREF="../index.html"><IMG SRC="../gx/homenew.gif"
ALT="[ FRONT PAGE ]"></A>
<A HREF="lg_bytes27.html"><IMG SRC="../gx/back2.gif" ALT=" Back "></A>
<A HREF="./kodis.html"><IMG SRC="../gx/fwd.gif" ALT=" Next "></A>
<!--startcut ======================================================= -->
</body>
</html>
<!--endcut ========================================================= -->