<!--startcut ======================================================= -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<META NAME="generator" CONTENT="lgazmail v1.2J.h">
<TITLE>The Linux Gazette 42: The Answer Guy</TITLE>
</HEAD><BODY BGCOLOR="#FFFFFF" TEXT="#000000"
LINK="#3366FF" VLINK="#A000A0">
<!-- ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: -->
<H4>"The Linux Gazette...<I>making Linux just a little more fun!</I>"</H4>
<P> <hr> <P>
<!-- ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: -->
<center>
<H1><A NAME="answer">
<img src="./../gx/dennis/qbubble.gif" alt="(?)"
border="0" align="middle">
<font color="#B03060">The Answer Guy</font>
<img src="./../gx/dennis/bbubble.gif" alt="(!)"
border="0" align="middle">
</A></H1>
<BR>
<H4>By James T. Dennis,
<a href="mailto:linux-questions-only@ssc.com">linux-questions-only@ssc.com</a><BR>
LinuxCare,
<A HREF="http://www.linuxcare.com/">http://www.linuxcare.com/</A>
</H4>
</center>
<p><hr><p>
<!-- endcut ======================================================= -->
<H3>Contents:</H3>
<p><a href="#tag/greeting"
><img src="./../gx/dennis/bbub.gif" alt="(!)" border="0"
align="middle"><strong>Greetings From Jim Dennis</strong></A></p>
<DL>
<!-- index_text begins -->
<dt><A HREF="tag/1.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>Setting up a Loopback Mount --or--
<dd><A HREF="tag/1.html"
><strong>
Loopback (localhost) NFS Mounting for FTP
</strong></a>
<dt><A HREF="tag/2.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>sites for general disk info? --or--
<dd><A HREF="tag/2.html"
><strong>
General HD Info and Boot Code
</strong></a>
<dt><A HREF="tag/3.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>TCP Sockets --or--
<dd><A HREF="tag/3.html"
><strong>
SYN, SYN/ACK, ACK, ACK, ACK: TCP Handshaking
</strong></a>
"Pleased to meet you!"
<dt><A HREF="tag/4.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>cvs tree for pam --or--
<dd><A HREF="tag/4.html"
><strong>
PAM chroot
</strong></a>
Wherein Jim rants about PAM
<dt><A HREF="tag/5.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>Resizing partitions --or--
<dd><A HREF="tag/5.html"
><strong>
Filesystem Management: What must be "resident" at all times?
</strong></a>
<dt><A HREF="tag/6.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>Hubs --or--
<dd><A HREF="tag/6.html"
><strong>
Ethernet Switches vs. Hubs
</strong></a>
<dt><A HREF="tag/7.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>procmail and saved variables. --or--
<dd><A HREF="tag/7.html"
><strong>
MATCH and Replaceable Parameters in procmail
</strong></a>
<dt><A HREF="tag/8.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
><strong>RMA for Video Card</strong></a>
<dt><A HREF="tag/9.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>Unix Internal --or--
<dd><A HREF="tag/9.html"
><strong>
Inodes Numbering: An Academic Question
</strong></a>
<dt><A HREF="tag/10.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>One Bad Sector thats gettin on my nerves! --or--
<dd><A HREF="tag/10.html"
><strong>
One Bad Sector
</strong></a>
It Doesn't Ruin the Whole Disk
<dt><A HREF="tag/11.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>Server shutdown/restart: 2-key keyboard --or--
<dd><A HREF="tag/11.html"
><strong>
Server Shutdown Button
</strong></a>
<dt><A HREF="tag/12.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>hal91 --or--
<dd><A HREF="tag/12.html"
><strong>
HAL91 (Floppy Based Linux Distribution)
</strong></a>
<dt><A HREF="tag/13.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>ping at a differnt port --or--
<dd><A HREF="tag/13.html"
><strong>
Ping a Port: NOT
</strong></a>
<dt><A HREF="tag/14.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>Hey answer guy!!! --or--
<dd><A HREF="tag/14.html"
><strong>
Linux as a Job!
</strong></a>
Hobbies become fun and profit
<dt><A HREF="tag/15.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
><strong>New Kernel Loses Ether Driver;
Dial on Demand and Masquerading</strong></a>
<br>A grabbag of user questions.
<dt><A HREF="tag/16.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
><strong>pcmcia install on debian</strong></a>
<dt><A HREF="tag/17.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>work-around for gdi printer? --or--
<dd><A HREF="tag/17.html"
><strong>
WinPrinter Work-around
</strong></a>
<dt><A HREF="tag/18.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>Question about 2 GB max? --or--
<dd><A HREF="tag/18.html"
><strong>
Maximum Filesize vs. Maximum Filesystem Size
</strong></a>
<dt><A HREF="tag/19.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>Advanced ipfwadm question. icmp forwarding. --or--
<dd><A HREF="tag/19.html"
><strong>
ICMP Masquerading
</strong></a>
<dt><A HREF="tag/20.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>RedHat 5.2 Kernel 2.0.36 --or--
<dd><A HREF="tag/20.html"
><strong>
Upgrade Breaks Several Programs, <TT>/proc</TT> Problems, BogoMIPS Discrepancies
</strong></a>
<br>A visit to "Library Hell"
<dt><A HREF="tag/21.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>Pls spare a minute: --or--
<dd><A HREF="tag/21.html"
><strong>
Spare a Minute to Provide "Some Info"
</strong></a>
<dt><A HREF="tag/22.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>HELP!!!!!!!!!! --or--
<dd><A HREF="tag/22.html"
><strong>
Data "Losted" (sic)
</strong></a>
<dt><A HREF="tag/23.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>"Network Neighborhood" --or--
<dd><A HREF="tag/23.html"
><strong>
Network Neighborhood: Heterogenous File Sharing
</strong></a>
<dt><A HREF="tag/24.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
><strong>AOL</strong></a>
<!-- index_text ends -->
</DL>
<!-- .~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~. -->
<A NAME="tag/greeting"><HR WIDTH="75%" ALIGN="center"></A>
<H3 align="left"><img src="./../gx/dennis/bbubble.gif"
height="50" width="60" alt="(!) " border="0"
>Greetings from Jim Dennis</H3>
<!-- begin greeting -->
<h4 align="center">Lies, Damn Lies and Benchmarks</h4>
<p>Those of you who read slashdot (<a href="http://www.slashdot.org"
>http://www.slashdot.org</a>), the Linux Weekly News
(<a href="http://www.lwn.net">http://www.lwn.net</a>), or other common
Linux webazines and forums have undoubtedly tired of reading about
the Mindcraft fiasco. If so, maybe you'll skip this and go on to the
usual collection of "Answer Guy" questions.</p>
<p>The Mindcraft story has been interesting. As some of my colleagues
have pointed out, their "attack" on Linux serves more to legitimize
Linux as a choice for business servers than to undermine it. In
addition it appears that the methodology they used has uncovered
some legitimate opportunities for improvement in the Linux process
scheduling facilities.</p>
<p>I'm referring to the "thundering herd" issue that results from a
large number of processes all doing a <tt>select()</tt> call on a given
socket or file resource -- such as having 150 Apache server processes
listening on port 80. However that is not a new issue; Richard
Gooch (a significant contributor to the Linux kernel mailing list
and code base) discussed similar issues and possible patches almost
a year ago:</p>
<dl><dt>I/O Event Handling Under Linux
<dd><tt><a
href="http://wwwatnf.atnf.csiro.au/people/rgooch/linux/docs/io-events.html"
>http://wwwatnf.atnf.csiro.au/people/rgooch/linux/docs/io-events.html</a></tt>
</dl>
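<p>To make the "thundering herd" concrete, here is a minimal sketch
in Python (not taken from Apache or the kernel). It forks a few
children that all <tt>select()</tt> on one listening socket, then
makes a single connection. The function name <tt>run_herd</tt>, the
worker count and the timeout are assumptions chosen for illustration.
How many children actually wake up depends on the kernel version and
on scheduling, so no particular count should be expected; the only
certainty is that one connection can be accepted once.</p>

```python
import os
import select
import socket
import time


def run_herd(workers=4, timeout=2.0):
    """Fork `workers` children that all select() on one listening socket.

    Returns (woken, accepted): how many children saw the socket become
    readable, and how many actually won the accept() race.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the kernel pick a free port
    listener.listen(16)
    listener.setblocking(False)          # losers of the race must not block in accept()
    port = listener.getsockname()[1]

    read_fd, write_fd = os.pipe()        # children report back over this pipe
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:                     # child: wait like an idle server process
            os.close(read_fd)
            ready, _, _ = select.select([listener], [], [], timeout)
            if ready:
                os.write(write_fd, b"W")         # this child was woken
                try:
                    conn, _ = listener.accept()
                    conn.close()
                    os.write(write_fd, b"A")     # this child got the connection
                except BlockingIOError:
                    pass                         # another child got there first
            os._exit(0)
        pids.append(pid)

    os.close(write_fd)
    time.sleep(0.2)                      # give the children time to block in select()
    # One client connection: only one child can usefully serve it.
    client = socket.create_connection(("127.0.0.1", port))
    for pid in pids:
        os.waitpid(pid, 0)
    client.close()
    listener.close()

    report = b""
    while True:
        chunk = os.read(read_fd, 64)
        if not chunk:                    # EOF: every child has exited
            break
        report += chunk
    os.close(read_fd)
    return report.count(b"W"), report.count(b"A")
```

<p>Calling <tt>run_herd(4)</tt> returns a pair such as the number of
children woken and the number that accepted. Every woken child beyond
the first did scheduling work for nothing, which is the cost that
grows with 150 server processes.</p>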
<p>It looks like some work will go into the Linux kernel and into
Apache to resolve some of those issues. In addition I know that
Andrew Tridgell and Jeremy Allison (a couple of the principal
members of the Samba development team) have been continuing
their work on Samba.</p>
<p>So the Linux/Apache/Samba combination will show improvement for the
general case. Samba 2.0.4 just shipped and already has some of
these enhancements. Some of the interesting changes to the Linux
kernel might already be present in the 2.3.3 developmental kernel
(and might be easily back ported as a set of 2.2.9 patches). So we
could see some of the improvements within a couple of weeks.</p>
<p>Some of these improvements may give Linux a better showing in any
"Mindcraft III" or similar benchmark. Maybe they won't. The
<em>improvements</em> will be for the general case --- and I don't see
much chance that open source developers will sneak in special case
code that will only improve "benchmark" performance without being
of real benefit.</p>
<p>That's one of the problems with closed source vendors. There's
great temptation to put in code that isn't of real value to real
customers but will be great for benchmarks and magazine reviewers.
This has been caught on several occasions, with several vendors;
but it would be completely blatant in any open source project.</p>
<p>Frankly, I don't care if we improve our Mindcraft results. I
prefer to question the very premises on which the whole discussion
is based.</p>
<p>There are three I'd like to mention:</p>
<ul>
<li>Big Server for Little Jobs
<li>Apache for simple HTTP of static HTML
<li>SMB as a File Service
</ul>
<p>The fallacy of the whole Mindcraft mindset is that we should have
"big servers" to provide file and web services. Let's ask about that.</p>
<p>Why?</p>
<p>The reason Microsoft wants to push big servers should be relatively
obvious. Microsoft's customers are the hardware vendors and VARs.
Most end customers, even the IT departments at large corporations,
don't install their own OS. They order a system with the OS and
major services pre-installed (or order systems and pay contractors
and/or consultants to perform the installation and initial
configurations).</p>
<p>So, it is in Microsoft's vested interest to encourage the sale of
high end and expensive systems. The cost of NT itself is then a
tinier fraction of the overall outlay. One or two grand for the OS
seems less outrageous when expressed as a percentage of 10 to 20
thousand dollars.</p>
<p>So, how many customers really need 4-way SMP systems? Are 4-way
SMP systems <em>EVER</em> really a better choice for web and file services
than a set of four or more similar quality separate systems?</p>
<p>Big 4 or 8 CPU SMP servers are probably the best choice for some
applications. It's even possible that such systems are optimal for
SOME web and file servers. What's really important, however, is
whether such systems are appropriate to YOUR situation.</p>
<p>Back when NT was first starting to emerge as a real threat to
Netware it was interesting that the press harped on the lack of
"scalable SMP" support in Netware 3.x and 4.x. I'm sure there are
analysts today who would continue to argue that this was the
primary reason for Netware's loss of marketshare during the early
to mid '90s.</p>
<p>Personally I suspect that the bigger factors in Netware's woes were
from three other causes:</p>
<dl><dt>Client support: <dd>MS shipped Win '95 and WfW with
support for SMB. Novell never adapted their
servers to work with the support that was shipped
with the clients. By all accounts SMB is a
vastly inferior suite of protocols to Netware's
NCP. However, IT managers are often eager to
save a penny on every client by not having their
sysadmins and help desk people visit every new
system to install network client drivers.
<dt>TCP/IP: <dd>Novell provided TCP/IP early on --- in the
form of expensive addons to their main servers,
and a relatively expensive suite of client tools
for MS-DOS. They didn't adapt to the emergence
of the Internet in corporate circles by including
TCP/IP as standard features in their base
packages. Meanwhile IPX's SAP (Service
Advertising Protocol) broadcasts were sucking up a
noticeable portion of the available bandwidth as
more companies put MANY more devices on their
LANs and WANs. Novell had the technology, but
they failed to rethink their pricing model,
probably in a doomed effort to protect some of
their revenue streams.
<dt>Pricing: <dd>Microsoft had a huge advantage over Novell.
They could afford to practically give away NT
server for a few years (and perhaps turn a blind
eye to some amount of piracy, temporarily) so
long as that would cost Novell some server licenses.
</dl>
<p>Of course, I could be wrong. I'm not an industry analyst.
However, I do know that the considered opinion of the Netware
specialists I knew back around '93 was that Netware didn't need SMP
support. It was plenty fast enough without additional processors.
NT, on the other hand, has so much overhead that it needs about 4
CPUs to get going.</p>
<p>So, if we're not going to use "big servers" how do we "scale?"</p>
<p>Replication and Distribution.</p>
<p>Look at how the whole Internet scales. We have the DNS system
which distributes (and delegates) the management of a huge database
over millions of domains. We don't even bat an eye that an average
DNS lookup takes less than a second. The SMTP mail system also has
proven scalability. It handles untold millions of messages a day
(some of which aren't even spam).</p>
<p>Of course some people are already chomping at the bit to write to
me and explain what an idiot I am. There are problems with
replicating files and HTML across multiple servers. Some
applications are very sensitive to concurrency issues and race
conditions. There are cases where the accessor of a file must have
the absolute latest version and must be able to retain a lock on
it. There are cases where we want to lock just portions of files, etc.</p>
<p>However, these are not the most common cases. Going for the "big
server" approach is often a sign of laziness. Rather than identify
the specific sets of applications that require centralized control
and access, people toss everything onto the "one size stomps
all" server.</p>
<p>In the degenerate case of the Mindcraft benchmarks it would be
amusing to pit four low cost PCs running Linux against one "big
server" running NT. I say "degenerate case" since the benchmarks
used there don't seem to have any concurrency or locking issues (at
least not for the HTTP portions of the test).</p>
<p>Needless to say we'd also see some advantages beyond the
scalability of our "horde of cheap servers" approach. For example
we could use dynamic DNS and failover scripts to ensure that
transparent availability was maintained even through the loss of
three of the four servers. There's certainly some robustness to
this approach. In addition we can perform tests and upgrades to
one or more systems in these loose clusters without any service
down time.</p>
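<p>The probing half of such a failover script might look like the
sketch below. It only decides which nodes are currently answering;
pushing that list into DNS is left out, since it depends on the name
server in use. The function names <tt>is_alive</tt> and
<tt>live_servers</tt> and the one second timeout are assumptions for
illustration, not part of any existing tool.</p>

```python
import socket


def is_alive(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:                      # refused, unreachable, or timed out
        return False


def live_servers(candidates, timeout=1.0):
    """Filter a list of (host, port) pairs down to those answering right now."""
    return [addr for addr in candidates if is_alive(addr[0], addr[1], timeout)]
```

<p>A job run every minute or so could call <tt>live_servers()</tt>
with the cluster's addresses and republish only the survivors. A bare
TCP connect proves that something is listening, not that it is
serving correct pages, so a real check would probably fetch a known
URL as well.</p>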
<p>Because these use commodity components it's also possible to keep
shelf spares in an on site depot, reducing the downtime for
individual nodes and providing the flexibility to rapidly increase
the cluster's capacity in the face of exceptional demands.</p>
<p>All that --- and it's usually CHEAPER, too.</p>
<p>Naturally there are some challenges to this approach. As I
mentioned, we have to configure these systems with some sort of
replication software (<tt>rdist</tt>, <tt>rsync</tt>) and test
regularly to ensure that the replication process isn't introducing
errors and/or corruption. There are also the problems with writable
access and the needs for the nodes in a cluster to communicate about
file locking and application (i.e. CGI) state.</p>
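<p>One way to do that regular testing is to compare checksums of the
master tree against each replica after a sync. The following is a
rough sketch under the assumption that both trees are visible as local
directories (for instance over a mount); the function names
<tt>tree_digests</tt> and <tt>replication_errors</tt> are made up for
this example. It reads each file whole, which is fine for HTML but
would need chunked reads for very large files.</p>

```python
import hashlib
import os


def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as handle:
                content = handle.read()
            digests[os.path.relpath(full, root)] = hashlib.sha256(content).hexdigest()
    return digests


def replication_errors(master_root, replica_root):
    """Return sorted relative paths that are missing, extra, or different."""
    master = tree_digests(master_root)
    replica = tree_digests(replica_root)
    return sorted(path for path in master.keys() | replica.keys()
                  if master.get(path) != replica.get(path))
```

<p>An empty list means the replica matches the master byte for byte.
Anything else names the files to investigate before the node goes
back into rotation.</p>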
<p>The point is not so much to promote the "horde of thin servers"
approach as to question the premise. Do we really need a "big
server" for OUR task?</p>
<p>I've talked about the fundamental disconnect between mass marketing
and customer requirements before. "Mass marketing" sells features
in the hopes that masses will buy them. Customers must
consider the "benefits" of each "feature" before accepting any
arguments about the superiority of one product's implementation of
a given "feature" over another.</p>
<p>As an example let's consider Linux' much vaunted "multi-user"
feature. To many people this is not a benefit. Many people will
never have anyone else "logged into" their system. To people like
my mom "multi-user" is just an inconvenience that requires her to
"login" and means that she sometimes needs to 'su' to get at
something she wants. (Granted, there are ways around those.) In
some ways Linux' "multi-user" features (and those of NT, for that
matter) are actually a detriment to some people. They represent a
cost (albeit a small and easily surmounted one) to some users.</p>
<p>This leads us to the other two issues that I would question.</p>
<p>Apache is not necessarily the best package for providing
high speed, low-latency, HTTP of simple, static HTML files.</p>
<p>There are lightweight micro web servers that can do this
better. I've also heard of people who use a small cluster
of Squid proxy servers interposed between their Apache servers
and their routers. Thus the end users transparently
access an organization's Squid caches rather than directly accessing
its web servers. This is a strange twist on the usual case
where the Squid caches are located on the client's network.</p>
<p>By all accounts SMB is a horrid filesharing protocol. The authors
of Samba take a certain amount of wretched glee in describing all
of the misfeatures of this protocol. Its sole "advantage" is that
it's included, preconfigured, with 98% of the client systems
that are shipped by hardware vendors today.</p>
<p>Note: I'm NOT saying that NFS is any better. Its main advantage
is that almost all UNIX systems support it.</p>
<p>Personally I have high hopes for CODA. It's about time we deployed
better filesystems for the more common requirements of a new millennium.</p>
<p>I'm not the first to say it:</p>
<blockquote>
"There are lies, damned lies, and benchmarks"
</blockquote>
<p>However, the important thing about any statistic or benchmark is
to understand the presenter. Look behind the numbers and
even the methodology and ask: "Who says?" "What do they want
from this?"</p>
<p>Alternatively you can just reject statistics and benchmarks
from others, and make your decisions based on your own criteria and
as a result of your own tests.</p>
<p>The scientific method should not be used solely by scientists. It
has application for each of us.</p>
<p>-- Jim Dennis</p>
<!-- end greeting -->
<!--startcut ======================================================= -->
<P> <hr> <P>
<H5 align="center"><a href="http://www.linuxgazette.com/copying.html"
>Copyright &copy;</a> 1999, James T. Dennis
<BR>Published in <I>The Linux Gazette</I> Issue 42 June 1999</H5>
<H6 ALIGN="center">HTML transformation by
<A HREF="mailto:star@starshine.org">Heather Stern</a> of
Starshine Technical Services,
<A HREF="http://www.starshine.org/">http://www.starshine.org/</A>
</H6>
<P> <hr> <P>
<!-- begin lgnav ::::::::::::::::::::::::::::::::::::::::::::::::::: -->
<A HREF="./index.html"
><IMG SRC="./../gx/indexnew.gif" ALT="[ Table Of Contents ]"></A>
<A HREF="/index.html"
><IMG SRC="./../gx/homenew.gif" ALT="[ Front Page ]"></A>
<A HREF="./lg_bytes42.html"
><IMG SRC="./../gx/back2.gif" ALT="[ Previous Section ]"></A>
<A HREF="./lg_tips42.html"
><IMG SRC="./../gx/fwd.gif" ALT="[ Next Section ]"></A>
<!-- end lgnav ::::::::::::::::::::::::::::::::::::::::::::::::::::: -->
<!-- ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: -->
</BODY></HTML>
<!--endcut ========================================================= -->