365 lines
14 KiB
HTML
365 lines
14 KiB
HTML
<!--startcut ======================================================= -->
|
||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
|
||
<html><head>
|
||
<META NAME="generator" CONTENT="lgazmail v1.1pre9c">
|
||
<TITLE>The Answer Guy 32:
|
||
Web server clustering project
|
||
</TITLE>
|
||
<!-- ORIGINAL SUBJECT:
|
||
Web server clustering project
|
||
JTD SUBTITLE:
|
||
|
||
-->
|
||
</head>
|
||
|
||
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#A000A0"
|
||
ALINK="#FF0000">
|
||
<H4>"Linux Gazette...<I>making Linux just a little more fun!</I>"
|
||
</H4>
|
||
<P> <hr> <P>
|
||
<!-- ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: -->
|
||
<H1 align="center"><A NAME="answer">
|
||
<img src="../gx/dennis/qbubble.gif" alt="" border="0" align="middle">
|
||
<a href="./index.html">The Answer Guy</a>
|
||
<img src="../gx/dennis/bbubble.gif" alt="" border="0" align="middle">
|
||
</A></H1>
|
||
<BR>
|
||
<H4 align="center">By James T. Dennis,
|
||
<a href="mailto:linux-questions-only@ssc.com">linux-questions-only@ssc.com</a>
|
||
<BR>Starshine Technical Services, <A HREF="http://www.starshine.org/">http://www.starshine.org/</A>
|
||
</H4>
|
||
<p><hr><p>
|
||
<!--endcut ========================================================= -->
|
||
<H3><img src="../gx/dennis/qbub.gif" alt="(?)"width="50" height="28"
|
||
align="left" border="0">Web server clustering project</H3>
|
||
|
||
|
||
<p><strong>From Jim Kinney
|
||
in the</strong>
|
||
<a href="news:comp.unix.questions">comp.unix.questions</a>
|
||
<strong>newsgroup on 22 Jul 1998 </strong></p>
|
||
|
||
<!-- begin body -->
|
||
|
||
|
||
<STRONG><P>
|
||
I am starting the research into the design and implementation of a 3
|
||
node cluster to provide high availability web, database, and support
|
||
services to a computer based physics lab. As envisioned, the primary
|
||
interface machine will be the web server. The database that provides
|
||
the dynamic web pages will be on a separate machine. Some other
|
||
processes that accept input from the web process and output to the
|
||
database will be on the third machine.
|
||
</P></STRONG>
|
||
|
||
|
||
<BLOCKQUOTE><IMG SRC="../gx/dennis/bbub.gif" ALT="(!)" width="50" height="28" border="0" lign="bottom"
|
||
>Have you looked at the "High Availability HOWTO"?
|
||
</blockquote>
|
||
|
||
|
||
<code><blockquote><blockquote
|
||
><A HREF="http://sunsite.unc.edu/pub/Linux/ALPHA/linux-ha/High-Availability-HOWTO.html"
|
||
>http://sunsite.unc.edu/pub/Linux/ALPHA/linux-ha/High-Availability-HOWTO.html</A>
|
||
</blockquote></blockquote></code>
|
||
|
||
<blockquote>There's also the common "round robin DNS" model ---
|
||
which is already used by many service providers. It
|
||
has its limitations --- but it's the first thing
|
||
to try if the clients can be configured to gracefully
|
||
retry transactions on failure.
|
||
</blockquote>
|
||
|
||
<blockquote>There's also the MOSIX project which was developed
|
||
under BSD/OS and is allegedly being ported to Linux.
|
||
This provides for process migration (again, more of
|
||
a performance clustering and load balancing feature set).
|
||
</blockquote>
|
||
|
||
<code><blockquote><blockquote
|
||
><A HREF="http://www.cs.huji.ac.il/mosix/">http://www.cs.huji.ac.il/mosix/</A>
|
||
</blockquote></blockquote></code>
|
||
|
||
<blockquote>However, there is another concept called "checkpointing."
|
||
You can think of this as having regular, transparent,
|
||
non-terminal "core dumps" (snapshots) taken of each
|
||
process (or process group). These are written to
|
||
disk and can be reloaded and restarted at the point where
|
||
they left off. I'm not aware of any projects to
|
||
provide checkpointing to Linux (or checkpointing subsystems).
|
||
(Obviously any application can do its own checkpointing
|
||
in a non-transparent fashion --- roughly equivalent to the
|
||
periodic automatic saves performed by '<TT>emacs</TT>' and other editors).
|
||
</blockquote>
|
||
|
||
<blockquote>I have a pointer to some miscellaneous notes on checkpointing:
|
||
</blockquote>
|
||
|
||
<code><blockquote><blockquote
|
||
><A HREF="http://warp.dcs.st-and.ac.uk/warp/systems/checkpoint/"
|
||
>http://warp.dcs.st-and.ac.uk/warp/systems/checkpoint/</A>
|
||
</blockquote></blockquote></code>
|
||
|
||
<blockquote>The implication here is that you could create a hybrid
|
||
checkpointing <EM>and</EM> process migration model that would
|
||
provide high availability. In a client/server context
|
||
this would probably only be suitable for situations where
|
||
the communications protocols were very robust --- and
|
||
it might still require some IP and/or MAC address assumption
|
||
or some specialized routing tricks.
|
||
</blockquote>
|
||
|
||
<blockquote>One such routing trick might be the IP NAT project.
|
||
IP masquerading is one form of NAT (allowing many
|
||
clients to masquerade as a single proxy system).
|
||
</blockquote>
|
||
|
||
|
||
<code><blockquote><blockquote
|
||
><A HREF="http://www.csn.tu-chemnitz.de/~mha/"
|
||
>http://www.csn.tu-chemnitz.de/~mha/</A>
|
||
</blockquote></blockquote></code>
|
||
|
||
<blockquote>Another form of NAT is many-to-many. Let's say you
|
||
connected two disconnected sites that both chose
|
||
<tt>10.1.*.*</tt> addresses for their use --- you could put
|
||
a NAT router between them that would bidirectionally
|
||
translate the <tt>10.1.*.*</tt> to corresponding <tt>172.16.*.*</tt>
|
||
addresses. Thus the two sites would be able to
|
||
interoperate over a broader range of protocols than
|
||
would be the case for IP masqurading --- since the
|
||
TCP/UDP ports would not be re-written --- each <tt>10.1.*.*</tt>
|
||
address corresponds on a one-to-one basis with a <tt>172.16.*.*</tt>
|
||
counterpart.
|
||
</blockquote>
|
||
|
||
<blockquote>The one other form of NAT is one-to-many (or "load
|
||
balancing"). This makes one simple router look like a server.
|
||
In actuality that "server" is just dispatching the packets it
|
||
receives to any one of the backend servers it chooses
|
||
(statistically or based on metrics that they communicate
|
||
amongst themselves, privately). Cisco has a product called
|
||
"Local Director" that does exactly this. One of the experimental
|
||
versions of the Linux IP NAT code also appeared to do this
|
||
with some success. I don't know if any further work as
|
||
progressed on these lines.
|
||
</blockquote>
|
||
|
||
<blockquote>Yet another approach that might make sense is to provide
|
||
for replication of the data (files) across servers and
|
||
to use protocols that transparent select among available
|
||
servers (mirrors). This sounds just like CODA.
|
||
</blockquote>
|
||
|
||
<code><blockquote><blockquote>
|
||
<A HREF="http://www.coda.cs.cmu.edu/top.html"
|
||
>http://www.coda.cs.cmu.edu/top.html</A>
|
||
</blockquote></blockquote></code>
|
||
|
||
|
||
<BLOCKQUOTE>A less sophisticated approach to replication is to
|
||
use the <TT>rsync</TT> package to maintain some failover servers
|
||
(mirrors) --- and require that writes all go to one
|
||
active server.
|
||
</blockquote>
|
||
|
||
|
||
<STRONG><P><IMG SRC="../gx/dennis/qbub.gif" ALT="(?)" width="50" height="28" border="0" lign="bottom"
|
||
>So, I am open to suggestions, comments, info, links to sites, book
|
||
titles, etc. I have proposed a one year development time for the whole
|
||
cluster, with a single machine application prototype of the user
|
||
visible/used portion by around the Jan 1999. I love my job!
|
||
</p></strong>
|
||
|
||
<P><STRONG>Jim Kinney M.S.
|
||
<BR>Educational Technology Specialist
|
||
<BR>Department of Physics
|
||
<BR>Emory University
|
||
</strong></p>
|
||
|
||
|
||
<BLOCKQUOTE><IMG SRC="../gx/dennis/bbub.gif" ALT="(!)" width="50" height="28" border="0" lign="bottom"
|
||
>Web, mail, DNS and a number of other Internet services
|
||
are naturally robust. With DNS you normally list up to
|
||
three servers per host (in the <TT>/etc/resolv.conf</TT>) and all
|
||
of these will be checked before a name lookup will fail.
|
||
With SMTP the client will try each of the hosts listed
|
||
in the results of an MX query. Round robin DNS will force
|
||
most clients to try multiple different IP addresses on
|
||
failure most of the time.
|
||
</blockquote>
|
||
|
||
|
||
<BLOCKQUOTE>However, the applications that really need HA (fail over)
|
||
and clustering for performance are things like db servers.
|
||
</blockquote>
|
||
|
||
|
||
<BLOCKQUOTE>Having two systems monitor and process something like
|
||
a set of db transactions in parallel (one active the
|
||
other "mimic'ing" the first but not returning results)
|
||
would be very interesting. The "mimic" would attempt
|
||
to maintain the same applications state as the server
|
||
--- and would assume the server's IP and MAC (ethernet
|
||
media access control) addresses on failure --- to then
|
||
transparently continue the transaction processing
|
||
that was going on.
|
||
</blockquote>
|
||
|
||
|
||
<BLOCKQUOTE>You might prototype such a system using web and ftp
|
||
(the FTP application is a more dramatic demonstration
|
||
--- since a web server involves many short transactions
|
||
and mostly operates in a "disconnected" fashion).
|
||
</blockquote>
|
||
|
||
|
||
<BLOCKQUOTE>One approach might be to have a custom ethernet
|
||
driver that can be instructed to throw all of its
|
||
output into the bit bucket. Thus the mimic is
|
||
normally silent, but following a failure on the
|
||
server it does the address assumption and rips off
|
||
the muzzle. I suspect you'd have to have another
|
||
interface between the two servers, one which is
|
||
dedicated to maintaining the same state between
|
||
the server and the mimic.
|
||
</blockquote>
|
||
|
||
|
||
<BLOCKQUOTE>(For example if the server get a collision or
|
||
an error that wasn't sensed by the mimic -- or
|
||
vice versa -- the two might get horribly out
|
||
of sync when the upper layer protocols require
|
||
a resend. With special drivers the two systems
|
||
might resolve these discrepancies at the kernel/driver
|
||
layer --- so that the applications will always get
|
||
the same data on their sockets).
|
||
</blockquote>
|
||
|
||
|
||
<BLOCKQUOTE>I really have no idea how much tweaking this would
|
||
take and whether or not it's even feasible.
|
||
</blockquote>
|
||
|
||
|
||
<BLOCKQUOTE>However, it seems that your intent is to provide
|
||
failover that is transparent to the applications
|
||
layer. So, the work obviously has to happen below that.
|
||
</blockquote>
|
||
|
||
|
||
<BLOCKQUOTE>It is unclear whether you are primarily interested
|
||
in deploying a set of servers for use by your
|
||
Physics team or whether you are interested in
|
||
doing research and development in the computer science.
|
||
</blockquote>
|
||
|
||
<blockquote>In any event your project will probably involve a
|
||
hybrid of several of these approaches:
|
||
</blockquote>
|
||
|
||
<ul>
|
||
<li>Round robin DNS
|
||
<li>Failover with IP and MAC address assumption.
|
||
(and/or load balancing NAT).
|
||
<li>Replication and or "mirroring" (more failover).
|
||
<li>Multi-initiator SCSI (where a single SCSI
|
||
bus has multiple computers active on it, such
|
||
that these computers have shared access to
|
||
the attached peripheral devices).
|
||
</ul>
|
||
|
||
<blockquote>It would be very interesting to see someone develop
|
||
process migration and checkpointing features for Linux
|
||
though there doesn't seem to be any active work going
|
||
on now.
|
||
</blockquote>
|
||
|
||
|
||
<BLOCKQUOTE>I'd also love to see an
|
||
"<a href="http://google.stanford.edu/linux?query=doc:cesdis.gsfc.nasa.gov/linux-web/beowulf/beowulf.html">Beowulf</a> enabled" SQL dbserver
|
||
(where a couple of failover capable "dispatchers" could
|
||
farm out transactions to multiple clustered Linux boxes
|
||
in some sensible manner). I'm not even sure if that's
|
||
feasible --- but it sure would knock down the scaleability
|
||
walls that I hear about from those dbadmins.
|
||
</blockquote>
|
||
|
||
<!-- end body -->
|
||
|
||
<!--startcut ======================================================= -->
|
||
<P> <hr> <P>
|
||
<H5 align="center"><a href="http://www.linuxgazette.com/copying.html"
|
||
>Copyright ©</a> 1998, James T. Dennis <BR>
|
||
Published in <I>Linux Gazette</I> Issue 32 September 1998</H5>
|
||
<P> <hr> <P>
|
||
|
||
<!--::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::-->
|
||
<table width="98%"><tr valign="center" align="center">
|
||
<td rowspan="3"><A HREF="./lg_answer32.html"><IMG
|
||
SRC="../gx/dennis/answernew.gif"
|
||
ALT="[ Answer Guy Index ]"></A></td>
|
||
<td><A HREF="tag_phreak.html">phreak</A>
|
||
<td><A HREF="tag_abandon.html">abandon</A>
|
||
<td><A HREF="tag_javaterm.html">javaterm</A>
|
||
<td><A HREF="tag_BBS.html">BBS</A>
|
||
<td><A HREF="tag_flaws.html">flaws</A>
|
||
<td><A HREF="tag_doslinux.html">doslinux</A>
|
||
<td><A HREF="tag_resume.html">resume</A>
|
||
|
||
</tr><tr valign="center" align="center">
|
||
<td><A HREF="tag_softwindows.html">softwindows</A>
|
||
<td><A HREF="tag_convert.html">convert</A>
|
||
<td><A HREF="tag_apache.html">apache</A>
|
||
<td><A HREF="tag_emulate.html">emulate</A>
|
||
<td><A HREF="tag_database.html">database</A>
|
||
<td><A HREF="tag_distrib.html">distrib</A>
|
||
<td><A HREF="tag_proxy.html">proxy</A>
|
||
|
||
</tr><tr valign="center" align="center">
|
||
<td><A HREF="tag_disable.html">disable</A>
|
||
<td><A HREF="tag_DVI.html">DVI</A>
|
||
<td><A HREF="tag_superblock.html">superblock</A>
|
||
<td><A HREF="tag_serial.html">serial</A>
|
||
<td><A HREF="tag_permission.html">permission</A>
|
||
<td><A HREF="tag_detach.html">detach</A>
|
||
<td><A HREF="tag_cdr.html">cdr</A>
|
||
|
||
</tr><tr valign="center" align="center">
|
||
<td><A HREF="tag_rs422.html">rs422</A>
|
||
<td><A HREF="tag_modem.html">modem</A>
|
||
<td><A HREF="tag_notfound.html">notfound</A>
|
||
<td><A HREF="tag_tuning.html">tuning</A>
|
||
<td><A HREF="tag_libc5.html">libc5</A>
|
||
<td><A HREF="tag_startup.html">startup</A>
|
||
<td><A HREF="tag_clock.html">clock</A>
|
||
<td><A HREF="tag_ping.html">ping</A>
|
||
|
||
</tr><tr valign="center" align="center">
|
||
<td><A HREF="tag_accounts.html">accounts</A>
|
||
<td><A HREF="tag_lilo.html">lilo</A>
|
||
<td><A HREF="tag_NDS.html">NDS</A>
|
||
<td><A HREF="tag_95slow.html">95slow</A>
|
||
<td><A HREF="tag_nonlinux.html">nonlinux</A>
|
||
<td><A HREF="tag_progenv.html">progenv</A>
|
||
<td><A HREF="tag_cluster.html">cluster</A>
|
||
<td><A HREF="tag_ftpd.html">ftpd</A>
|
||
|
||
</tr></table>
|
||
<P> <hr> <P>
|
||
<!--::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::-->
|
||
<A HREF="./index.html"><IMG SRC="../gx/indexnew.gif"
|
||
ALT="[ Table Of Contents ]"></A>
|
||
<A HREF="../index.html"><IMG SRC="../gx/homenew.gif"
|
||
ALT="[ Front Page ]"></A>
|
||
<A HREF="lg_bytes32.html"><IMG SRC="../gx/back2.gif"
|
||
ALT="[ Previous Section ]"></A>
|
||
<A HREF="./stemen.html"><IMG SRC="../gx/fwd.gif"
|
||
ALT="[ Next Section ]"></A>
|
||
<!--::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::-->
|
||
</body>
|
||
</html>
|
||
<!--endcut ========================================================= -->
|
||
|
||
|