mirror of https://github.com/tLDP/LDP
344 lines
15 KiB
Plaintext
344 lines
15 KiB
Plaintext
<chapter id="apps">
|
|
<title>Application Tuning</title>
|
|
<para>
|
|
While one can tune hardware and the OS and get great measurements from it,
|
|
most applications for Linux have their own rules for improving performance.
|
|
Tuning a hard drive for an application that uses a lot of memory will not
|
|
improve the speed of that application, whereas investing in more memory will
|
|
create a vast improvement in speed. Let us examine some of the applications
|
|
for Linux and how you can tune a box specifically for it.
|
|
</para>
|
|
|
|
<section id="appssquid">
|
|
<title>Squid Proxy Server</title>
|
|
<para>
|
|
Squid proxy server has two common uses. The first is as a proxy
|
|
cache for internal clients access the external web. The second is
|
|
as a reverse proxy, caching content from internal web servers.
|
|
</para>
|
|
<section id="appssquidcache">
|
|
<title>Squid as a proxy cache</title>
|
|
<para>
|
|
Squid as a proxy cache stores large numbers of web pages, enabling
|
|
them to be sent to clients upon request instead of requiring
|
|
another internet fetch. Squid uses large amounts of disk space
|
|
for the cache, but by nature doesn't access any one that much more
|
|
frequently than any other.
|
|
</para>
|
|
<para>
|
|
The first thing to avoid is a disk bottleneck. In a high-usage
|
|
environment, you will definitely want to distribute the cache file
|
|
across as many spindles as is practical. Since the cache data is
|
|
relatively worthless, in that it is flushed out as a matter of
|
|
course, the additional overhead of a RAID 5 or other relatively
|
|
fail-safe multi-spindle system is not necessary. RAID 0 striping
|
|
will provide the fastest access with no fail-safe requirements.
|
|
</para>
|
|
<para>
|
|
The next thing to avoid is a memory bottleneck. There should
|
|
ABSOLUTELY be no OS paging with Squid, Squid will page enough on
|
|
its own. This is a case where Squid has in effect its own virtual
|
|
memory system, with thousands of pages on disk and tens or
|
|
hundreds in memory. If the OS is paging to provide the tens or
|
|
hundreds in memory, your performance will be horrible.
|
|
</para>
|
|
<para>
|
|
Network performance will probably not be a big deal. If your
|
|
byte hit rate is 25%, and you are maxing out a T1, you are still
|
|
only putting around 2Mbps out your Ethernet connection.
|
|
</para>
|
|
</section> <!-- appssquidcache -->
|
|
<section> <!-- appssquidproxy -->
|
|
<title>Squid as a reverse proxy</title>
|
|
<para>
|
|
As a reverse proxy, the total set of web pages that Squid is
|
|
proxying is reduced greatly. It must store only the limited
|
|
number of pages behind it on your web servers, as opposed to
|
|
the entire internet. Since disk storage requirements are down
|
|
by an order of magnitude, so are memory requirements.
|
|
</para>
|
|
<para>
|
|
However, one approach to this situation would be to build up
|
|
real memory until the entire working set of pages could be
|
|
stored, with no paging required.
|
|
</para>
|
|
<para>
|
|
In this case we're trading large store/infrequent access for a
|
|
smaller store/frequent access. We want to optimize the byte
|
|
hit rate in Squid with real memory to reduce disk operations.
|
|
Again, there should be NO OS PAGING. If any DNS lookups are
|
|
required, it might be a good idea to run a DNS cache locally.
|
|
Otherwise, this shouldn't be too difficult to tune to get
|
|
good performance.
|
|
</para>
|
|
</section> <!-- appssquidproxy -->
|
|
</section> <!-- appssquid -->
|
|
|
|
<section id="appsmail">
|
|
<title>SMTP and Mail Delivery</title>
|
|
<para>
|
|
While this section addresses sendmail directly, most of the
|
|
concepts should apply to any mail delivery system.
|
|
</para>
|
|
|
|
<section id="appsmailsendmail">
|
|
<title>Sendmail</title>
|
|
<para>
|
|
<indexterm>
|
|
<primary>
|
|
sendmail
|
|
</primary>
|
|
</indexterm>
|
|
A sendmail system will probably be the most likely to be
|
|
network bottlenecked, but that bottleneck will also
|
|
most likely be outside the system itself. There are no
|
|
special tricks since a typical sendmail system will be
|
|
fairly balanced against typical hardware. Depending
|
|
on the size of the mail store, some type of RAID
|
|
system may be implemented, but the mail store data
|
|
is very valuable so use a RAID 5 or other fault-tolerant
|
|
option. If the store does get very large, you might
|
|
also investigate the benefits of multiple caching
|
|
layers down to the card.
|
|
</para>
|
|
<para>
|
|
Sendmail itself is very dependent on DNS lookups for its
|
|
operation, and may be doing ident lookups as well. As such,
|
|
some tricks on the network can really help. Linux 2.4 kernels
|
|
offer a lot of QoS features, these can be used to prioritize
|
|
DNS and ident lookups over bulk traffic, preventing blocking
|
|
of POP, IMAP, and SMTP waiting on a DNS lookup. I would
|
|
also strongly suggest implementing a DNS cache locally,
|
|
maybe even on the sendmail server itself.
|
|
</para>
|
|
</section> <!-- appsmailsendmail -->
|
|
<section id="appsmaildelivery">
|
|
<title>Mail Delivery (POP and IMAP)</title>
|
|
<para>
|
|
<indexterm>
|
|
<primary>
|
|
Post Office Protocol (POP)
|
|
</primary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>
|
|
IMAP
|
|
</primary>
|
|
</indexterm>
|
|
Memory and processor requirements will depend on the number
|
|
of usersand whether you are providing POP or IMAP services,
|
|
or only SMTP. POP and IMAP require logins, and each login
|
|
will spawn another POP or IMAP process. If concurrent usage
|
|
of these services is high, watch for OS paging.
|
|
</para>
|
|
<para>
|
|
Using POP for delivery has a very low memory and hard drive
|
|
footprint. In POP, e-mail is typically downloaded from the
|
|
server to the client. The POP server merely handles authentication
|
|
and moving the data to the client. When a mail message is sent
|
|
to the client, it is typically removed from the server.
|
|
</para>
|
|
<para>
|
|
Using IMAP puts much more stress on the system. IMAP will copy
|
|
an entire mailbox into memory, and perform sorting and other
|
|
functions on the mailbox. When a client requests a message, the
|
|
message is sent to the client, but is not removed from the server.
|
|
So all mail messages remain on the server until the client explicitly
|
|
deletes it. The advantage to this is a user can have IMAP clients
|
|
on multiple machines and still be able to access all the messages.
|
|
Since most of the processing is offloaded to the server, this makes
|
|
IMAP a great mechanism for light clients or wireless devices
|
|
that do not have much memory or processing power.
|
|
</para>
|
|
<para>
|
|
Since the data on a POP or IMAP server are critical, you should
|
|
be using RAID-5 or RAID-1 to protect the data and keep
|
|
the server running. POP has low memory requirements, while
|
|
IMAP has very high memory and disk requirements, and both
|
|
will likely increase as more users are introduced to the system
|
|
and they start working with mailboxes in the thousands of messages.
|
|
</para>
|
|
<para>
|
|
In either case, using SSL on top of POP or IMAP will also
|
|
increase the CPU load of the protocols, but will have minimal
|
|
affect on disk or memory usage.
|
|
</para>
|
|
<para>
|
|
It is not advisable
|
|
to put a shell account on the same machine as one running IMAP
|
|
or POP, due to the CPU and memory used by these protocols.
|
|
Nor is it advisable to use NFS to export a mailbox to a shell
|
|
machine, as NFS locks can cause corruption of a mailbox
|
|
if sendmail tries to deliver a message and the user is deleting
|
|
another message from the same mailbox.
|
|
</para>
|
|
</section> <!-- appsmaildelivery -->
|
|
</section> <!-- appsmail -->
|
|
|
|
<section id="appsdatabase">
|
|
<title>MySQL, PostgresSQL, other databases</title>
|
|
<para></para>
|
|
</section> <!-- appsdatabase -->
|
|
|
|
<section id="appsfirewall">
|
|
<title>Routers and firewalls</title>
|
|
<para></para>
|
|
</section> <!-- appsfirewall -->
|
|
|
|
<section id="appsapache">
|
|
<title>Apache web server</title>
|
|
<para>
|
|
Apache works best when it is delivering static content. But if you're
|
|
only delivering static content, you should consider using
|
|
<application>Tux</application>, the Linux kernel HTTP server.
|
|
</para>
|
|
<para>
|
|
There are a number of small quick-tips you can use to increase the
|
|
throughput and responsiveness of Apache.
|
|
</para>
|
|
<para>
|
|
One such tip is to turn off DNS resolving in Apache. When Apache receives
|
|
a request to send data to a remote server, Apache will request a reverse
|
|
DNS lookup on that IP address. While the requests winds its way through
|
|
the DNS heirarchy, Apache is stuck waiting for the answer before it can
|
|
send the page back to the requesting client. By turning this feature off,
|
|
Apache will merely log the IP address without name in its log file, and
|
|
then deliver the page. A problem with this is if the machine is cracked
|
|
through the web server, you will have to do your own reverse DNS lookup to
|
|
find out what country and domain the machine is from. A way to have the
|
|
best of both worlds is to run a caching-only name server run on the local
|
|
network or on the web server box itself. This allows Apache to quickly
|
|
get the DNS information, and store a hostname and domain instead of just
|
|
an IP address.
|
|
</para>
|
|
<para>
|
|
Cutting the size of graphic images is also a way to improve performance.
|
|
Using algorithms such as JPG or PNM will cut the size of the image file
|
|
without reducing the quality of the image by much. This allows more image
|
|
files to go out per second.
|
|
</para>
|
|
<para>
|
|
Instead of reinventing the wheel, see if Apache has a module for the
|
|
feature you need. Modules for authentication against SMB, NIS, and LDAP
|
|
servers all exist for Apache, and you will not have to add the
|
|
functionality to your server side scripts.
|
|
</para>
|
|
<para>
|
|
Make sure your database is tuned properly. This has been covered in
|
|
<xref linkend="appsdatabase">, but a poorly tuned database can slow
|
|
everything down. If you expect heavy usage, do not put the database and
|
|
web server on the same machine. One machine running the database, and
|
|
another running the web server connected by high speed (gigabit Ethernet)
|
|
will allow both machines to operate quickly.
|
|
</para>
|
|
<para>
|
|
Since so many web sites are using server-based applications and CGI, it
|
|
makes sense to take a look at the server interpreters being used. If you
|
|
use a language like Java that is known to be pretty slow, you can expect
|
|
lower performance as compared to using applications compiled in C.
|
|
</para>
|
|
<para>
|
|
But Java has its own advantages over C. The biggest is that Java is
|
|
designed to be much more portable than C is. Since Java is an object
|
|
oriented language, its code reuse is much better, allowing delopers to
|
|
create applications quicky.
|
|
</para>
|
|
<para>
|
|
There are three other major server languages used by Apache. These are
|
|
Perl, Python, and PHP. All three are included in most Linux
|
|
distributions, and if they're not on your system, each is easy to install
|
|
and get working.
|
|
</para>
|
|
<section id="appsapacheperl">
|
|
<title>Perl</title>
|
|
<para>
|
|
Perl is pretty much the granddaddy of CGI and server programs. It has
|
|
excellent string manipulation abilities, plenty of libraries to let you
|
|
create web applications quickly, and has drivers for just about anything
|
|
you want to do. It is known as the programmer's swiss army knife, and
|
|
for good reason.
|
|
</para>
|
|
<para>
|
|
The downside to perl is that it is an interpreted language, meaning each
|
|
time a perl application is run, the perl interpreter has to be loaded,
|
|
the application has to be loaded into memory, and the interpreter has to
|
|
compile the application. This can create a heavy load on a system.
|
|
</para>
|
|
<para>
|
|
<indexterm>
|
|
<primary>
|
|
mod_perl
|
|
</primary>
|
|
</indexterm>
|
|
The best way around this is to use the mod_perl library with
|
|
<application>apache</application>. This library first loads the perl
|
|
interpreter so it is always in memory and ready to run. It also caches
|
|
frequently-used perl CGI scripts in memory already in a precompiled
|
|
state. When a perl application starts up, many of the bottlenecks
|
|
(loading and interpreting) have already been done.
|
|
The cost to the sysadmin and programmer is that mod_perl takes up a lot
|
|
of memory, and its memory use will increase as more perl scripts are
|
|
added to a web server.
|
|
</para>
|
|
<note>
|
|
<para>
|
|
Adding more memory and CPU speed to a machine running mod_perl will
|
|
improve its performance.
|
|
</para>
|
|
</note>
|
|
</section> <!-- appsapacheperl -->
|
|
<section id="appsapachepython">
|
|
<title>Python</title>
|
|
<para>
|
|
Python was one of the first of the interpreted languages to implement
|
|
object oriented programming, allowing programmers to reuse and share
|
|
code. It also learned a bit from perl and it's interpreted nature by
|
|
comiling python code into a pre-compiled format. This format is not
|
|
portable like Java, but does much of the grunt work of compiling a
|
|
python application. The python interpreter then spends less time
|
|
compiling the application each time it's run.
|
|
</para>
|
|
<para>
|
|
Programmers may cringe a bit when they start writing in python. Syntax
|
|
is much more strict than C or Perl, using tabs and newlines to delineate
|
|
parts of code. And, like perl, python requires the interpreter to load
|
|
into memory and run.
|
|
</para>
|
|
<para>
|
|
There is also a mod_python for apache that operates simiar to mod_perl.
|
|
More memory and CPU speed will be a great improvement.
|
|
</para>
|
|
</section> <!-- appsapachepython -->
|
|
<section id="appsapachephp">
|
|
<title>PHP</title>
|
|
<para>
|
|
PHP is a relative newcomer to the languages listed here. But PHP was
|
|
designed for server side applications. The interpreter is a module in
|
|
<application>apache</application>, and PHP code can exist within a HTML
|
|
page, allowing authors to combine the web page and application into one
|
|
file.
|
|
</para>
|
|
<para>
|
|
PHP has two downsides to it from a programer's perspective. First, PHP
|
|
is still relatively new. It is not quite as seasoned as Perl or Python,
|
|
each of which have over ten years worth of development behind them.
|
|
Second, PHP does not have modules that can be easily loaded and used
|
|
like Perl and Python have. If your implementation of PHP does not have
|
|
the graphics libraries installes, you will have to recompile PHP and add
|
|
them in. Neither of these should prevent you from giving a serious look
|
|
at using PHP for your server side application.
|
|
</para>
|
|
</section> <!-- appsapachephp -->
|
|
</section> <!-- appsapache -->
|
|
|
|
<section id="appssamba">
|
|
<title>Samba File Sharing</title>
|
|
<para></para>
|
|
</section> <!-- appssamba -->
|
|
|
|
<section id="appsnfs">
|
|
<title>Network File System</title>
|
|
<para></para>
|
|
</section> <!-- appsnfs -->
|
|
</chapter> <!-- apps -->
|