<!doctype linuxdoc system>
<article>
<!-- Title Information -->
<title>Apache Overview How-to
<author>Daniel Lopez Ridruejo, <tt/ridruejo@apache.org/
<date>v0.4, 11 September 2000
<!-- Abstract -->
<abstract>
The purpose of this document is to give an overview of the Apache webserver
and related projects. It provides pointers for further information and
implementation details.
</abstract>
<!-- Table of Contents -->
<toc>
<sect>Introduction
<p>
The purpose of this document is to give an overview of the Apache web server
and related projects. Apache is the <url
name="most popular server on the Internet"
url="http://www.netcraft.com/survey/">. New Apache
users, especially those coming from a Windows background, are often unaware
of the possibilities of Apache, useful add-ons and how everything works
together. This document aims to show a general picture of such possibilities
with a brief description of each one and pointers for further information.
The information has been gathered from many sources, including projects' web
pages, conference talks, mailing lists, Apache websites and my own hands-on
experience. Full credit is given to these authors. Without them and their work
this document would not have been possible or necessary.
<p>Disclaimer: I work for <url name="Covalent" url="http://www.covalent.net">. We
provide products and support services for the Apache webserver, and I mention some
of them here, as I do for our competitors and similar open source projects.
<sect>Apache
<p>Apache is the leading internet web server, with over 60% market share, according
to the <url name="Netcraft survey" url="http://www.netcraft.com/survey">.
Several key factors have contributed to Apache's success:
<itemize>
<item>The <url name="Apache license" url="http://www.apache.org/LICENSE.txt">. It is
an open source, BSD-like license that allows for both commercial and non-commercial
usage of Apache.
<item>Talented community of <url name="developers" url="http://www.apache.org/contributors/index.html">
with a variety of backgrounds and an open development process based on technical merits.
<item>Modular architecture.
<item>Portable: Apache runs on nearly all flavors of Unix (and Linux), Windows, BeOS, mainframes...
</itemize>
Many commercial vendors have adopted Apache-based solutions for their products, including <url name="Oracle"
url="http://www.oracle.com">, <url name="Red Hat" url="http://www.redhat.com"> and <url name="IBM" url="http://www.ibm.com">.
In addition, <url name="Covalent" url="http://www.covalent.net"> provides add-on modules and 24x7 support for Apache.
<p>The following websites use Apache or derivatives. Chances are that if Apache
is good enough for them, it is also good enough for you :)
<itemize>
<item><url name="Amazon.com" url="http://www.amazon.com">
<item><url name="Yahoo!" url="http://www.yahoo.com">
<item><url name="W3 Consortium" url="http://www.w3c.org">
<item><url name="Financial Times" url="http://www.ft.com">
<item><url name="Network solutions" url="http://www.networksolutions.com">
<item><url name="MP3.com" url="http://www.mp3.com">
<item><url name="Stanford" url="http://www.stanford.edu">
</itemize>
<p>
From the <url name="Apache website" url="http://www.apache.org">:
<p>
<em>The Apache Project is a collaborative software development effort aimed at creating a robust, commercial-grade, featureful, and freely-available source code implementation of an HTTP (Web) server. </em>
<p>
The Apache project has grown beyond building just a web server into other critical server side technologies
like Java or XML. The Apache Software Foundation, described in the next section, serves as an umbrella for these projects.
<sect>Apache Software Foundation
<p>
<em>The Apache Software Foundation exists to provide organizational, legal, and financial support for the Apache open-source
software projects. Formerly known as the Apache Group, the Foundation has been incorporated as a membership-based, not-for-profit
corporation in order to ensure that the Apache projects continue to exist beyond the participation of individual volunteers, to
enable contributions of intellectual property and funds on a sound basis, and to provide a vehicle for limiting legal exposure
while participating in open-source software projects. </em>
<p>Or, as Roy T. Fielding, the chairman of the ASF, describes it:
<em>The mission of the Apache Software Foundation is to facilitate and support
collaborative software development projects that use the Apache methods of
collaboration over the Internet to create, maintain, and extend the infrastructure
of the Web and enforce the standards that define it.</em>
<p>You can learn more about the foundation <url name="here" url="http://www.apache.org/foundation/">.
<!-- members mail describing role -->
<!-- <sect>What is important in a web server -->
<sect>Developing web applications with Apache
<p>
There are several ways of providing content with Apache.
<sect1>Static
<p>
Apache can serve static content, like HTML files, images, etc.
If this is all you need, Apache is probably right for you.
A low end Pentium running Linux and Apache can easily saturate a 10Mbps
line serving static content. If that is your primary use of Apache, make
sure you also check the <ref id="performance" name="performance"> section.
<!-- References to mmap, bandwidth management, 2.0, alternative webservers -->
<sect1>Dynamic content
<p>For many websites, the information changes constantly and pages need to
be updated periodically or generated on the fly. This is what server side
programming is all about: programming languages, tools and frameworks that
help developers query and modify information from different sources (databases,
directory services, customer records, other websites) and deliver the content
to the user.
<sect1>CGI scripts
<p>
CGI stands for Common Gateway Interface. CGI scripts are external programs
that are called when a user requests a certain page. The CGI receives information
from the web server (form variable values, type of browser, IP
address of the client, etc.) and uses that information to output a web page for the client.<p>
<em>Pros</em>: Since it is an external program, it can be coded in any
language and the same script will also be portable among different web
servers. The CGI protocol is simple, and the return result consists of writing
the response to the standard output. It is a mature technology, and there are
plenty of online and book references and examples.
<p>
<em>Cons</em>: Spawning and initializing a process takes time. Since a CGI is
external to the server and an instance has to be launched and destroyed for every
request, there is a performance hit. If the process has to load external
libraries or connect to an external database, the delay can be
significant, and the cost grows with the number of hits per second. CGIs are
stateless, so session management has to be achieved by external means.
<p>
Since CGI usually involves heavy text manipulation, scripting languages are
the natural choice. Part of the popularity of <url name="Perl" url="http://www.perl.com/">
stems from its being the CGI programming language of choice. This is due
to its extensive support for string handling and text processing. There are plenty
of freely available CGI scripts and libraries. A good starting point is
<url name="the Open Directory CGI section" url="http://dmoz.org/Computers/Programming/Internet/CGI/">.
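<p>As a rough illustration of the CGI contract described above, the following
Python sketch reads a few of the standard CGI environment variables
(<tt/QUERY_STRING/, <tt/REMOTE_ADDR/, <tt/HTTP_USER_AGENT/) and writes a response to
standard output. The <tt/name/ form field and the <tt/build_response/ function are
hypothetical names chosen for this example. It is a minimal sketch, not a
production script.
<tscreen><verb>
```python
import os
import sys
from urllib.parse import parse_qs


def build_response(environ):
    """Build a complete CGI response (header, blank line, body) as a string."""
    # The web server hands request data to the script in environment variables.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["stranger"])[0]  # "name" is a hypothetical field
    client = environ.get("REMOTE_ADDR", "unknown")
    browser = environ.get("HTTP_USER_AGENT", "unknown")

    body = "Hello, {}!\nYour address: {}\nYour browser: {}\n".format(
        name, client, browser)
    # A CGI response is a header block, an empty line, then the body.
    return "Content-Type: text/plain\r\n\r\n" + body


if __name__ == "__main__":
    # The result of a CGI is whatever it writes to standard output.
    sys.stdout.write(build_response(os.environ))
```
</verb></tscreen>
Because the script only relies on environment variables and standard output,
the same file could in principle be moved between web servers that support CGI.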
<sect1>Site generators
<p>If your site is high volume, you may run into performance problems when
generating content dynamically. Offline content generators are an alternative.
These solutions separate content from presentation. The HTML
generator reads both sources and outputs the static files that build the
website. The generator can be run periodically or triggered by content
changes.<p>
Future versions of <ref name="Cocoon" id="cocoon">
plan on having a batch mode to accomplish this. Another option is the
<url name="Web site meta language" url="http://www.engelschall.com/sw/wml/">.
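<p>The idea of an offline generator can be sketched in a few lines of Python.
This is not how Cocoon or the Web site meta language work internally. The page
data, the layout and the function names below are all assumptions made up for
the example. The script keeps content and presentation apart and writes one
static file per page.
<tscreen><verb>
```python
import tempfile
from pathlib import Path
from string import Template
import xml.etree.ElementTree as ET

# Content source: in a real site this could be files or a database.
# These two pages are hypothetical sample data.
PAGES = {
    "index": {"title": "Home", "text": "Welcome to the site."},
    "about": {"title": "About", "text": "Generated offline, served statically."},
}

# Presentation source: one shared heading layout, kept apart from the content.
HEADING = Template("$title - Example Site")


def render(page):
    """Combine one content record with the layout and return an HTML string."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = HEADING.substitute(title=page["title"])
    ET.SubElement(body, "p").text = page["text"]
    return ET.tostring(html, encoding="unicode", method="html")


def generate(out_dir):
    """Write one static .html file per page and return the sorted file names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, page in PAGES.items():
        (out / (name + ".html")).write_text(render(page), encoding="utf-8")
    return sorted(p.name for p in out.glob("*.html"))


if __name__ == "__main__":
    print(generate(tempfile.mkdtemp()))
```
</verb></tscreen>
A script like this could be run periodically or whenever the content changes.
Apache then only has to serve the resulting static files.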
<sect1>Out of process servers
<p>
The web server can pass dynamic requests to another program. This program
sits idle until a request arrives. The request is processed and the result is returned to the
webserver, which in turn returns it to the client. This eliminates the per-request startup overhead
associated with CGI scripts. Examples of this approach are <ref name="FastCGI"
id="fastcgi">, <ref name="Java servlets" id="javaservlets">, etc.
<sect1>FastCGI<label id="fastcgi">
<p>This standard was developed to address some shortcomings of the CGI
protocol. The main improvement is that a single spawned process can process
more than one request. There is an Apache module that implements the FastCGI
protocol, and there are libraries for Tcl, Perl, etc. More information is available at <htmlurl
url="http://www.fastcgi.com">.
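<p>The difference between launching a program per request and keeping one
process alive can be sketched as follows. This Python example does not
implement the wire protocol at all. The <tt/Worker/ class and both functions are
hypothetical and only count how often the expensive startup step runs.
<tscreen><verb>
```python
class Worker:
    """A request handler whose setup cost we want to pay as rarely as possible."""

    startups = 0  # counts how many times the expensive setup has run

    def __init__(self):
        # Expensive one-time setup (loading libraries, opening a database
        # connection) would happen here.
        Worker.startups += 1
        self.handled = 0

    def handle(self, path):
        """Process one request and return the response body."""
        self.handled += 1
        return "request {} for {}".format(self.handled, path)


def serve_cgi_style(paths):
    """CGI model: a fresh worker is created and discarded for every request."""
    return [Worker().handle(p) for p in paths]


def serve_persistent(paths):
    """Persistent model: one worker stays alive and serves every request."""
    worker = Worker()
    return [worker.handle(p) for p in paths]
```
</verb></tscreen>
With three requests, the first function pays the startup cost three times and
the second pays it once, which is the saving a persistent process is meant to
provide.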
<sect1>Java servlets<label id="javaservlets">
<p>An external Java virtual machine processes requests. The JVM can reside in
the same computer or in a different one. This is how a lot of application
servers work. Usually standard libraries are included for server side
processing. You may want to check <ref name="JServ" id="jserv"> and
<ref name="Tomcat" id="tomcat">. Related Java application server projects
can be found <ref name="here" id="applicationservers">.
<sect1>Embedded interpreters
<p>An alternative to out-of-process servers is to embed the interpreter in
the server itself. There are roughly two categories of such modules:
modules that answer or modify requests directly, and modules that process
commands embedded in HTML pages before serving them to the client. The most
representative approaches are <ref name="mod_perl" id="modperl"> and <ref
name="PHP" id="php">.
<sect>Performance and bandwidth management<label id="performance">
<p>Raw performance is only one of the factors to consider in a web server
(flexibility and stability usually come first).<p>
Having said that, there are solutions to improve performance on heavily loaded
webservers serving static content. If you are in the hosting business,
Apache also provides ways in which you can measure and control bandwidth usage.
Throttling in this context usually means slowing down the delivery of content
based on the file requested, a specific client IP address, etc. This is done
to prevent abuse.
<itemize>
<item><bf>mod_mmap_static</bf>: Included in current Apache releases, it maps into
memory a statically configured list of frequently requested files that do not
change.
<item><bf><url name="Mod_bandwidth" url="http://www.cohprog.com/mod_bandwidth.html"></bf>: <em>Enables the setting of server-wide or per connection bandwidth limits, based on the specific directory,
size of files and remote IP/domain</em>.
<item><bf><url name="Bandwidth share module" url="http://www.topology.org/src/bwshare/README.html"></bf>: provides bandwidth throttling and balancing by client IP address. It is actively maintained.
<item><bf><url name="Mod_throttle" url="http://www.snert.com/Software/Throttle/index.shtml"></bf>: Throttles bandwidth per virtual host or user.
<item><bf><url name="Mod_throttle_access" url="http://www.fremen.org/apache/"></bf>: useful if you are <url name="slashdotted" url="http://everything2.com/index.pl?node_id=13464">. Allows throttling based on resources
(file, directory, etc.)
</itemize>
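<p>To make the idea of throttling by client IP address more concrete, here is a
small Python sketch of a token bucket, one common way to limit a data rate. It
is an assumption for illustration only and does not describe how any of the
modules listed above are implemented. The class names and the explicit
<tt/now/ argument (used instead of a real clock) are hypothetical.
<tscreen><verb>
```python
class TokenBucket:
    """Allow `rate` bytes per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.last = now

    def allow(self, nbytes, now):
        """Return True and spend tokens if nbytes may be sent at time `now`."""
        elapsed = now - self.last
        self.last = now
        # Refill according to the time that has passed, up to the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


class Throttle:
    """Keep one token bucket per client IP address."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}

    def allow(self, client_ip, nbytes, now):
        bucket = self.buckets.get(client_ip)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self.buckets[client_ip] = bucket
        return bucket.allow(nbytes, now)
```
</verb></tscreen>
A server using such a scheme would delay or refuse a response whenever
<tt/allow/ returns false, so that one greedy client cannot use up the whole line.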
<sect>Load balancing
<p>Apache has several modules that allow distribution of requests among servers, for redundancy, increased availability, etc.
<p>
<itemize>
<item><bf>Reverse proxying + mod_rewrite</bf>: There is nothing in Apache that you cannot do with <url name="mod_rewrite"
url="http://www.apache.org/docs/mod/mod_rewrite.html"> ... :) This technique consists of having an Apache front-end server acting
as a proxy for the backend servers. You can find more information <url name="here"
url="http://www.apache.org/docs/misc/rewriteguide.html">.
<item><bf>Mod_redundancy</bf>: Takes over the web service and IP address of a failed server. You can find more information
<url name="here" url="http://www.ask-the-guru.com">.
<item><bf>Mod_backhand</bf>: <em>Allows seamless redirection of HTTP requests from one web server to another. This redirection
can be used to target machines with under-utilized resources, thus providing fine-grained, per-request load balancing of web
requests</em>. More information at <htmlurl url="http://www.backhand.org/">.
</itemize>
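<p>The kind of decision a load balancing front end has to make for each request
can be sketched in Python. This is an illustration under simple assumptions
(load is the number of requests in progress, health is a boolean flag), not the
algorithm used by any of the modules above. All names are hypothetical.
<tscreen><verb>
```python
class Backend:
    """One backend web server as seen by the front end."""

    def __init__(self, name, load=0, alive=True):
        self.name = name
        self.load = load    # requests currently being served
        self.alive = alive  # set to False when a health check fails


def pick_backend(backends):
    """Return the live backend with the lowest load, or None if all are down."""
    live = [b for b in backends if b.alive]
    if not live:
        return None
    return min(live, key=lambda b: b.load)


def dispatch(backends, path):
    """Choose a backend for one request and record the extra load on it."""
    target = pick_backend(backends)
    if target is None:
        return "503 no backend available"
    target.load += 1
    return "{} -> {}".format(path, target.name)
```
</verb></tscreen>
Skipping backends that are marked as down gives redundancy, and preferring the
least loaded one spreads requests towards under-utilized machines.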
<sect>Secure transactions
<p>There are several solutions that provide secure transactions for Apache servers.
This enables Apache servers to be used for ecommerce or other scenarios where
sensitive information is exchanged (like credit card numbers).
<itemize>
<item><url name="Mod_ssl" url="http://www.modssl.org"> and <url name="Apache-SSL" url="http://www.apache-ssl.org"> are open source
implementations. They are European-based and unencumbered by RSA patents.
<item><url name="Red Hat" url="http://www.redhat.com"> offers a secure server derived from Apache. Red Hat acquired C2Net, makers
of Stronghold, another secure server derived from Apache.
<item><url name="Covalent" url="http://www.covalent.net"> sells secure versions of Apache as well as the RavenSSL module that
plugs into existing Apache installations.
</itemize>
<p><bf>Credit card transactions</bf>
<p>Apache specific solutions exist for credit card transactions:
<itemize>
<item><url name="Cypay" url="http://www.cypay.com/"> credit card module for
Apache. Template based, tax calculations.
<item><url name="Covalent credator" url="http://www.covalent.net">, multiple
clearinghouses support, failover operation, PHP, Perl, Java support.
</itemize>
<sect>SNMP
<p>SNMP stands for Simple Network Management Protocol. It allows monitoring
and management of network servers, equipment, etc. SNMP modules for Apache
help manage large deployments of web servers, measure the quality of service
offered and integrate Apache into existing management frameworks.
<itemize>
<item>Open source <url name="Mod SNMP"
url="http://www.simpleweb.org/software/packages/mod-snmp/"> for Apache 1.3.
<item><url name="Raven SNMP" url="http://www.covalent.net"> is
a commercial SNMP module with support for the latest SNMPv3 standard and integration
with HP OpenView, Tivoli, etc.
</itemize>
<sect>Authentication modules
<p>In many situations (subscription services, sensitive information,
private areas), user authentication is required. Apache includes basic authentication
support. Additional authentication modules exist that connect Apache to
existing security frameworks or databases, including: NT Domain
controller, Oracle, MySQL, PostgreSQL, etc.<p>
The LDAP modules are especially interesting, as they allow integration with
existing company- and enterprise-wide directory services.<p>
You can find these modules at <htmlurl url="http://modules.apache.org">.
<sect>GUIs for Apache
<p>Apache is configured through text configuration files. This has advantages and
disadvantages. Management can be done from any computer that has internet
access via <url name="ssh" url="http://www.openssh.com">. On the other hand, editing a
configuration file by hand implies a learning curve. There are open source
graphical tools that make this task easier:
<itemize>
<item><url name="Comanche" url="http://www.comanche.org">: It is cross-platform
and runs on Unix/Linux, Windows and Mac. Check the website for screenshots and
in-depth information. Disclaimer: I am the main author of Comanche, so remember,
there are no bugs, only undocumented features :)
<item><url name="gui.apache.org" url="http://gui.apache.org">: The GUI interfaces
for Apache project. It hosts programs at various stages of development.
<item><url name="Webmin" url="http://www.webmin.com/webmin/">: It is a nice
web-based interface.
</itemize>
<sect>Writing Apache modules
<p>
Apache, like many other successful open source projects, has a modular architecture.
This means that to add or modify functionality you do not need to know the whole
code base.
Source code access for Apache means that you can custom build the server with only the
modules that you need and include your own.
<p>
Extending Apache can be done in C or in a variety of other languages using appropriate modules.
These modules expose Apache's internal functionality to different programming languages like Perl or Tcl.
<p><bf>Writing modules in C</bf>: Apache is written in C, and so are the modules distributed with Apache.
The best way to get started writing Apache modules is to read Doug MacEachern and Lincoln Stein's
<url name="Writing Apache modules with Perl and C" url="http://www.modperl.com">. It is a well-written, easy to read book by two
Apache and Perl gurus. The above link will lead you to the book website, which has some of its chapters online.
If you do not have the money to buy the book and cannot borrow it from a friend, there are other options.
You can read some of the online tutorials on writing Apache modules: Ken Coar, an Apache Group member, has a nice
<url name="tutorial and slides online" url="http://web.golux.com/coar/slides/">.
An overview of the Apache architecture can be found <url name="here" url="http://www.grad.math.uwaterloo.ca/~oadragoi/CS746G/a1/apache_conceptual_arch.html">.
The Apache website has some <url name="API notes" url="http://www.apache.org/docs/misc/API.html"> that can help you get started. You are also encouraged to browse the
source code of the modules included with Apache. Apache includes a simple one
(mod_example.c) for that purpose.
<p><bf>Writing Apache modules in other languages</bf>: There is a variety of Apache
modules that enable third party languages to access the internal Apache API.
The most popular is <ref name="mod_perl" id="modperl">.
<p>If you have any questions about the development of an Apache module, you
should join the Apache modules mailing list at <htmlurl url="http://modules.apache.org">.
Remember to do your homework first: research past messages and check
all the documentation previously described. Chances are somebody had the
same problem that you are experiencing and got a useful response.
<p>If you are interested in the development of core Apache itself, you should
check out the <url name="Apache development site" url="http://dev.apache.org">.
<!-- <sect>Configuring Apache -->
<sect>Apache books
<p>
A comprehensive list of Apache books can be found <url
name="here" url="http://www.apache.org/info/apache_books.html">.
<p>A couple of books
that I personally recommend are:
<itemize>
<item><url url="http://www.modperl.com"
name="Writing Apache Modules with Perl and C"> if you are interested in Apache
internals.
<item><url name="Apache server for dummies" url="http://apache-server.com/">
if you want to get started with Apache. Do not get fooled by the name. This is
a comprehensive book packed with useful information.
</itemize>
<sect>Java projects
<p>
For historical reasons, Java projects can be found both under the
java.apache.org and jakarta.apache.org umbrellas. The final goal is that over
time all Java projects will move under the Jakarta umbrella.
<p><em>The goal of the Jakarta Project is to provide commercial-quality
server solutions based on the Java Platform that are developed in an
open and cooperative fashion.</em>
<p>The Java on Apache community is a very dynamic and active one, as shown
by the quantity and quality of its subprojects, which are described below.
<sect1>Ant
<p>You can think of Ant as the Java equivalent of make. It is a big success
with Java related projects. Developers can write Java instead of shell
commands, which means increased portability and extensibility. Instead of
Makefiles, Ant uses XML files.
<!-- Example Ant makefile -->
You can learn more about Ant <url name="here"
url="http://jakarta.apache.org/ant/index.html">.
<sect1>ORO and Regexp
<p>ORO is a complete package that provides regular expression support for
Java. It includes Perl5 regular expression support, glob expressions, etc.,
all under the Apache license.
You can learn more about ORO <url name="here" url="http://jakarta.apache.org/oro/index.html">. Another lightweight regular expression package is
<url name="Regexp" url="http://jakarta.apache.org/regexp/">.
<sect1>Slide
<p><em>Slide is a high-level content management framework.
Conceptually, it provides a hierarchical organization of binary
content which can be stored into arbitrary, heterogenous, distributed
data stores. In addition, Slide integrates security, locking and versioning
services.</em>
<p>If you are familiar with <url name="WebDAV" url="http://www.webdav.org">,
Slide uses it extensively. In simple words, what Slide provides is a unified,
simple way to access resources and information. These resources can be stored
in a database, the filesystem, etc. and accessed either through a WebDAV interface
or Slide's own API.
<p>
You can learn more at the <url name="Slide home page"
url="http://jakarta.apache.org/slide/index.html">.
<sect1>Struts
<p>Struts is an Apache project that tries to bring the Model-View-Controller
(MVC) design paradigm to web development. It builds on <url name="Servlet"
url="http://java.sun.com/products/servlet"> and <url name="JavaServer Pages"
url="http://java.sun.com/products/jsp"> technologies. The model consists of the
Java server objects, which represent the internal state of the application.
Enterprise Java Beans are commonly used here. The view is constructed
via JavaServer Pages (JSP), which are a combination of static HTML/XML and
Java. JSPs also allow developers to define their own tags.
The controller consists of servlets, which take requests (GET/POST) from the
client, perform actions on the model and update the view by providing the
appropriate JSP.
You can learn more at the <url name="Struts project pages" url="http://jakarta.apache.org/struts/index.html">.
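<p>The division of labour between model, view and controller can be sketched
independently of Struts. The Python example below is only an analogy. In Struts
the three roles are played by Java objects, JSPs and servlets, and none of the
names used here come from the Struts API.
<tscreen><verb>
```python
class GuestbookModel:
    """Model: holds the internal state of the application."""

    def __init__(self):
        self.entries = []

    def add(self, author, message):
        self.entries.append((author, message))


def list_view(model):
    """View: turns the current model state into output for the client."""
    lines = ["{}: {}".format(a, m) for a, m in model.entries]
    return "\n".join(lines) if lines else "(no entries yet)"


def controller(model, method, params):
    """Controller: maps a request onto a model action, then selects a view."""
    if method == "POST":
        model.add(params.get("author", "anonymous"), params.get("message", ""))
    return list_view(model)
```
</verb></tscreen>
The view never changes the model and the model knows nothing about output, so
either side can be replaced without touching the other.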
<sect1>Taglibs
<p>The JavaServer Pages technology allows developers to provide functionality
by adding custom tags. The Taglibs project intends to be a common repository
for these extensions. It includes tags for common utilities (for, date),
SQL database access, etc.
<p>
You can learn about Taglibs
<url name="here" url="http://jakarta.apache.org/taglibs/index.html">.
More documentation is included in the package.
<sect1>Tomcat<label id="tomcat">
<p>Tomcat is the flagship product of the Jakarta project.
It is the official reference implementation for the Java
Servlet 2.2 and JavaServer Pages 1.1 technologies.
<p>
You can learn more in the <url name="Tomcat homepage"
url="http://jakarta.apache.org/tomcat/index.html">. The Tomcat project
was started with a code donation from Sun Microsystems.
<sect1>Velocity
<p><em>Velocity is a Java based template engine. It can be used as a
stand-alone utility for generating source code, HTML, reports, or
it can be combined with other systems to provide template services.</em>
Velocity has a Model View Controller paradigm that enforces separation of
Java code and the HTML template.<p>
You can learn more about Velocity <url name="here"
url="http://jakarta.apache.org/velocity/index.html">. Velocity is part of
other projects like <ref name="Turbine" id="turbine">.
<sect1>Watchdog
<p>The Watchdog project provides the validation tests for the Servlet and
JavaServer Pages specifications. You can find more information <url name="here"
url="http://jakarta.apache.org/watchdog/index.html">.
<sect1>JServ<label id="jserv">
<p><em>Apache JServ is a 100% pure Java servlet engine fully compliant with the
JavaSoft Java Servlet APIs 2.0 specification. (...) The result is a pure servlet
engine that works on any "version 1.1" Java Virtual Machine.</em>
<p>
JServ is one of the original Java Apache projects. <ref name="Tomcat"
id="tomcat"> will be the successor of JServ once it is finished.
You can learn more at the <url name="JServ home page"
url="http://java.apache.org/jserv/index.html">.
<sect1>JSSI
<p>JSSI is an implementation of server side includes in the Java language.
Server side includes are tags included in files that get processed before
the page is served to the client (for example, to include the current date).
You can find more information <url name="here" url="http://java.apache.org/jservssi/index.html">.
<sect1>Apache JMeter
<p><em>The Apache JMeter is a 100% pure Java desktop application designed to
load test functional behavior and measure performance. It was originally
designed for testing Web Applications but has since expanded to other test
functions.</em><p>It can be used to test static and dynamic resources and get
immediate visual feedback.
<p>You can see some screenshots and learn more <url name="here"
url="http://java.apache.org/jmeter/index.html">.
<sect1>Server Pages Foundation Classes
<p>The Server Pages Foundation Classes (SPFC) are a set of libraries that help solve common problems in server side
application development. They focus on two of them:
<itemize>
<item><bf>Mixing HTML and Java</bf>: Provides a library of classes that takes
care of the HTML generation and that can be integrated with the rest of the
Java code.
<item><bf>HTTP is a stateless protocol</bf>: SPFC provides session support,
so applications can keep track of users as they navigate the website. The
application developer does not need to worry about the specific details of
page generation and can think in more general, traditional application terms.
You can learn more about SPFC <url name="here"
url="http://java.apache.org/spfc/index.html">.
</itemize>
<sect1>mod_java
<p>mod_java is the Java equivalent of mod_perl. It allows access to the Apache internals
from inside a JVM. This allows for increased flexibility and the possibility
of writing Apache Modules directly in Java. Unfortunately, no code seems
to be present at the moment. You can find more information <url
name="here" url="http://java.apache.org/mod_java/index.html">.
<sect1>Element Construction Set
<p><em>Element Construction Set (ECS) is a Java API for generating elements for
various markup languages. It directly supports HTML 4.0 and XML, but can
easily be extended to create tags for any markup language.</em>
<p>It allows the generation of markup tags using Java function calls,
leading to a much cleaner solution than mixing HTML and Java code.
You can learn more at the <url name="ECS project page"
url="http://java.apache.org/ecs/index.html">.
<sect1>Avalon
<p>If you are familiar with Perl or BSD systems, Avalon is roughly the
equivalent of <url name="CPAN" url="http://www.cpan.org"> or the Ports
collection for Java Apache technologies. It not only provides guidelines
for a common repository of code, it goes one step further: it <em>is an effort to
create, design, develop and maintain a common framework for server
applications written using the Java language.</em> It provides the means for
server side Java projects to be easily integrated and to build on each other.
<sect1>JAMES (Java Apache Mail Enterprise Server)
<p>Complementary to the other Apache server side technologies, JAMES provides
<em>a 100% pure Java server designed to be a complete and portable enterprise
mail engine solution based on currently available open protocols (SMTP, POP3,
IMAP, HTTP)</em>.
<p>More information can be found <url name="here"
url="http://java.apache.org/james/index.html">.
<sect1>PicoServer
<p>A lightweight HTTP/1.0 server in pure Java. The project seems to be stalled
and no code is available. The website can be found <url name="here"
url="http://java.apache.org/picoserver/index.html">.
<sect1>Jetspeed
<p><url name="Jetspeed" url="http://java.apache.org/jetspeed/site/overview.html">
is a web based portal written in Java. It has a modular API that
allows aggregation of different data sources (XML, SMTP, iCalendar).
<sect1>Turbine<label id="turbine">
<p><em>Turbine is a servlet based framework that allows experienced Java developers
to quickly build secure web applications</em>. Turbine brings together a platform
for running Java code <em>and</em> reusable components, everything under the Apache
license. Some of its features:
<itemize>
<item>Integration with template systems
<item>MVC style development
<item>Access Control Lists
<item>Localization support
<item>etc.
</itemize>
If you are interested, you can visit the <url name="Turbine web site"
url="http://java.apache.org/turbine/features.html">.
<sect1>Jyve
<p>The <url name="Jyve project" url="http://java.apache.org/jyve/index.html"> is
built on top of the Turbine framework. It is an application that provides a web
based FAQ system.
<sect1>Alexandria
<p>Alexandria is an integrated documentation management system. It brings
together technologies common to many open source projects like CVS and JavaDoc.
The goal is to integrate source code and documentation to encourage code
documentation and sharing. More information is available <url name="here" url="http://java.apache.org/alexandria/index.html">.
<sect>XML projects
<p>Directly from the Apache XML project website, its goals are:
<itemize>
<item><em>To provide commercial-quality standards-based XML solutions that
are developed in an open and cooperative fashion.</em>
<item><em>To provide feedback to standards bodies (such as IETF and W3C) from
an implementation perspective.</em>
<item><em>To be a focus for XML-related activities within Apache projects</em>
</itemize>
The project homepage is located at <htmlurl url="http://xml.apache.org">.
It is an umbrella for a variety of subprojects.
<sect1>Introduction to XML
<p>
This is a quick introduction to XML. To learn more about XML, a good starting
point is <htmlurl url="http://www.xml.com">. XML is a markup language (think
HTML) for describing structured content using tags and attributes. Once
content is separated from presentation, you can choose how to display it
(cell phone, HTML, text) or exchange it. The XML standard only describes how
the tags and attributes can be arranged, not their names or what they mean.
Apache provides the tools described in the following sections.
<sect1>Xerces
<p>
The Xerces project provides XML parsers for a variety of languages, including
Java, C++ and Perl. The Perl bindings are based on the C++ sources.
There are Tcl bindings for Xerces in the 2.0 version of <url name="TclXML" url="http://www.zveno.com/">, by Steve Ball. This 2.0 version is currently available only
through the <url name="Ajuba CVS repository" url="http://dev.ajubasolutions.com/software/tcltk/netcvs.html">.
An XML parser is a tool used for programmatic access to XML documents.
The standards supported by Xerces are:
<itemize>
<item><url name="DOM" url="http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/level-one-core.html">: DOM stands for Document Object Model. XML documents
are hierarchical by nature (nested tags), so they can be accessed through
a tree-like interface. The process is as follows:
<itemize>
<item>Parse document
<item>Build tree
<item>Add, delete or modify nodes
<item>Serialize tree
</itemize>
<item><url name="SAX" url="http://www.megginson.com/SAX/index.html">: Simple API for XML. This is a stream based API, which means
that the application receives callbacks as elements are encountered. These callbacks
can be used, for example, to construct a DOM tree.
<item><url name="XML Namespaces" url="http://www.w3.org/TR/REC-xml-names/">
<item>XML Schema: The XML standard provides the syntax for writing documents. XML
Schema provides the tools for defining the <em>contents</em> of the XML
document (semantics). For example, it lets you specify that a certain element in the
document must be an integer between 10 and 20.
</itemize>
The initial code base of the Xerces project was donated by IBM. You can find more
information on the <url name="Xerces Java" url="http://xml.apache.org/xerces-j/index.html">, <url name="Xerces C" url="http://xml.apache.org/xerces-c/index.html"> and <url name="Xerces Perl" url="http://xml.apache.org/xerces-p/index.html"> homepages.
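<p>The two programming models, DOM and SAX, can be tried out without Xerces.
The sketch below uses the <tt/xml.dom.minidom/ and <tt/xml.sax/ modules from the
Python standard library as stand-ins. It does not use the Xerces API, and the
element names, the <tt/EventLog/ class and the function names are made up for
the example.
<tscreen><verb>
```python
import xml.sax
from xml.dom import minidom


def dom_round_trip(source):
    """Parse, modify and serialize a document through the DOM tree interface."""
    doc = minidom.parseString(source)   # parse the document and build the tree
    root = doc.documentElement
    item = doc.createElement("item")    # add a node ...
    item.setAttribute("qty", "15")
    item.appendChild(doc.createTextNode("widget"))
    root.appendChild(item)
    root.setAttribute("count", "1")     # ... and modify an existing one
    return root.toxml()                 # serialize the tree back to text


class EventLog(xml.sax.ContentHandler):
    """SAX handler: the parser calls these methods as it meets each element."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


def sax_events(source):
    """Stream through the document and return the callbacks that fired."""
    handler = EventLog()
    xml.sax.parseString(source.encode("utf-8"), handler)
    return handler.events


if __name__ == "__main__":
    xml_text = dom_round_trip("<order/>")
    print(xml_text)
    print(sax_events(xml_text))
```
</verb></tscreen>
The DOM function holds the whole document in memory and can change it freely.
The SAX handler only sees a stream of events, which needs less memory but
leaves any bookkeeping to the application.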
<sect1>Xalan
<p>
Xalan is an XSLT processor available for Java and C++.
XSL is a style sheet language for XML. The T is for Transformation. XML
is good at storing structured data (information). We sometimes need to
display this data to the user or apply some other transformation.
Xalan takes the original XML document, reads the transformation rules
(the stylesheet) and outputs HTML, plain text or another XML document.
You can learn more about Xalan at the <url name="Xalan Java" url="http://xml.apache.org/xalan/index.html"> and <url name="Xalan C" url="http://xml.apache.org/xalan-c/index.html"> project homepages.
<sect1>FOP
<p>From the website: <em>FOP is a Java application that reads a formatting
object tree and then turns it into a PDF document</em>. FOP thus takes an
XML document and outputs PDF, much as Xalan outputs HTML
or text. You can learn more about FOP <url name="here"
url="http://xml.apache.org/fop">.
<sect1>Cocoon<label id="cocoon">
<p>Cocoon leverages other Apache XML technologies like Xerces, Xalan and FOP
to provide a comprehensive publishing framework. Cocoon is based on
XML and XSL and is targeted at sites of medium to high complexity.
It separates content, logic and presentation, as described on the website:
<itemize>
<item><bf>XML creation</bf>: <em>the XML file is created by the content owners.
They do not require specific knowledge on how the XML content is further
processed rather than the particular chosen DTD/namespace.
This layer is always performed by humans directly through normal text editors
or XML-aware tools/editors.</em>
<item><bf>XML process generators</bf>:<em> the logic is separated from the content
file.</em>
<item><bf>XSL rendering</bf>:<em> The created document is then rendered by applying an
XSL stylesheet to it and formatting it to the specified resource type (HTML,
PDF, XML, WML, XHTML)</em>
</itemize>
You can learn more about Cocoon at the <url name="project homepage" url="http://xml.apache.org/cocoon/index.html">.
<sect1>Xang
<p>The goal of the Xang project is to <em>make it easy for developers to build
commercial quality XML aware applications for the Web.</em> The application
logic is defined in a hierarchical XML file which can be scripted via
JavaScript. This file defines how to access the data (which can be other XML
files, Java plug-ins, etc.). The Xang engine takes care of mapping HTTP
requests to the appropriate handlers.
You can learn more about Xang at the <url name="project homepage" url="http://xml.apache.org/xang/samples.html">.
<!--include source code for hello world?-->
<sect1>SOAP
<p><em>Apache SOAP ("Simple Object Access Protocol") is an implementation of
the <url name="SOAP submission" url="http://www.w3.org/TR/SOAP"> to W3C.
It is based on, and supersedes, the IBM SOAP4J implementation</em>.
<p><em>From the draft W3C specification: SOAP is a lightweight protocol for
exchange of information in a decentralized, distributed environment. It is an
XML based protocol that consists of three parts</em>:
<itemize>
<item><em>An envelope that defines a framework for describing what is in a
message and how to process it</em>,
<item><em>a set of encoding rules for expressing instances of
application-defined datatypes</em>,
<item><em>and a convention for representing remote procedure calls and
responses</em>.
</itemize>
Think of SOAP as an XML-based remote procedure call system, similar in purpose to CORBA. It is
based on HTTP and XML. On the one hand, this means it is verbose and slow compared
to other systems. On the other hand, it eases interoperability, debugging and
development of clients and servers for a variety of languages (C, Java,
Perl, Python, Tcl, etc.), since most modern languages have HTTP and XML
modules. You can learn more at the <url name="Apache SOAP homepage" url="http://xml.apache.org/soap/">.
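<p>To make the envelope and the remote procedure call convention more concrete,
here is a minimal sketch that builds and reads a SOAP 1.1 style request with
Python's standard <tt/xml.etree.ElementTree/ module. It does not use Apache
SOAP, it ignores the encoding rules, and the method name, namespace and
parameter passed to it are hypothetical.
<tscreen><verb>
```python
import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.1 envelope.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def qualified(namespace, local_name):
    """Return a tag name in the form ElementTree uses for namespaces."""
    return "{%s}%s" % (namespace, local_name)


def build_request(method, namespace, params):
    """Return the text of an envelope that calls `method` with `params`."""
    envelope = ET.Element(qualified(SOAP_ENV, "Envelope"))
    body = ET.SubElement(envelope, qualified(SOAP_ENV, "Body"))
    call = ET.SubElement(body, qualified(namespace, method))
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def read_request(text):
    """Extract the method name and its parameters from an envelope."""
    envelope = ET.fromstring(text)
    body = envelope.find(qualified(SOAP_ENV, "Body"))
    call = body[0]                        # first child of Body is the call
    method = call.tag.split("}", 1)[1]    # strip the namespace part
    return method, {child.tag: child.text for child in call}
```
</verb></tscreen>
A client would send the text returned by <tt/build_request/ in the body of an
HTTP POST; the server side would do the equivalent of <tt/read_request/ before
dispatching to the real implementation.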
<sect1>Other XML projects
<p>There are other projects based on Apache and XML that do not live under the
Apache XML umbrella:
<itemize>
<item><url name="mod_xslt" url="http://modxslt.userworld.com/">
is a C-based module for delivering XML/XSL-based content. It is licensed under the GPL.
<item><url name="AxKit" url="http://axkit.org"><label id="axkit"> is an XML based
Application Server for mod_perl and Apache. It allows separation of content and
presentation.
</itemize>
<sect>Perl
<p>Perl and Apache are a powerful and popular combination. There are several projects
that use these two technologies.
<sect1>Embperl
<p>Embperl allows embedding Perl in HTML pages. These pages are processed on the server
before they are delivered to the client. It is similar to <ref name="PHP" id="php">.
You can learn more <url name="here" url="http://perl.apache.org/embperl/index.html">.
<sect1>Mason
<p>The <url name="Mason project" url="http://www.masonhq.com/"> embeds Perl in HTML
with a reusable component model approach. It allows caching, templating, etc.
<sect1>Mod_Perl<label id="modperl">
<p>
Mod_perl is one of the oldest and most successful Apache projects. It embeds a Perl interpreter
in Apache and allows access to the web server internals from Perl. This allows entire modules
to be written in Perl or in a mixture of Perl and C code.
In the 1.3 Apache versions, one interpreter has to be embedded in each child, since the server is multiprocess based.
On heavy-traffic dynamic sites, the resulting increase in process size can make a difference.
Apache 2.0 is multithreaded, as are recent versions of Perl. The next generation of mod_perl takes advantage of this
and allows code, data and session state to be shared among interpreters. This results in a faster, leaner solution.
<p>
Make sure you also check out <ref id="axkit" name="AxKit">.
<sect>PHP<label id="php">
<p>From the <url name="PHP website" url="http://www.php.net">:
<em>PHP is a server-side, cross-platform, HTML embedded scripting
language.</em> PHP is a scripting language like Perl, Python or Tcl. It is
the <url name="most popular module for Apache" url="http://www.securityspace.com/s_survey/data/man.200008/apachemods.html"> and this is due to a variety
of reasons:
<itemize>
<item>Gentle learning curve
<item>Great documentation
<item>Extensive database support
<item>Modularity
</itemize>
PHP has a modular design. There are modules that provide support for:
<itemize>
<item>Database connectivity for Oracle, ODBC, MySQL, mSQL, PostgreSQL,
MS-SQL Server and many more; check the <url name="PHP website"
url="http://www.php.net">.
<item>XML support
<item>File transfer: FTP
<item>HTTP
<item>Directory support: LDAP
<item>Mail support: IMAP, POP3, NNTP
<item>PDF document generation
<item>CORBA
</itemize>
and many more. You only need to compile/use the modules you need.
<p>PHP can be used with Apache, as an external CGI or with other webservers.
It is cross-platform and runs on most varieties of Unix and on Windows.
<p>If you come from a Windows background, you probably have used Internet
Information Server with Active Server Pages and MS-SQL Server. A common
replacement in the Unix world for this trio is Apache with PHP and MySQL.
Since PHP works:
<itemize>
<item>with Apache and with Microsoft IIS
<item>with MySQL and with MS-SQL Server
<item>on Unix and on Windows
</itemize>
you have a nice migration path from a Microsoft-centric solution to more
secure, stable, high-performance Unix-based solutions (like <url name="FreeBSD"
url="http://www.freebsd.org">, <url name="Solaris" url="http://www.sun.com">,
<url name="Linux" url="http://www.linux.com"> or <url name="OpenBSD"
url="http://www.openbsd.com">).
<!-- examples of PHP code -->
<sect>Python
<p>
Python is a scripting language similar to Perl or Tcl.
Several modules embed Python in the Apache web server:
<itemize>
<item><url name="Mod Python" url="http://www.modpython.org">
<item><url name="Mod Snake" url="http://modsnake.sourceforge.net">:
runs on both Apache 1.3.x and the upcoming 2.0
</itemize>
Both modules are useful if you plan to write Apache modules in Python
or to run existing Python CGIs faster. Mod Snake also allows embedding Python in HTML,
much like <ref name="PHP" id="php"> does.
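<p>As a rough idea of what an Apache module written in Python looks like, here
is a minimal content handler in the style of Mod Python. It is a sketch, not
taken from the Mod Python documentation: it assumes the handler interface of
later Mod Python releases, and the fallback value for <tt/OK/ is a stand-in
added only so the file can also be loaded outside Apache.
<tscreen><verb>
```python
try:
    # Only importable when the code runs inside Apache with Mod Python.
    from mod_python import apache
    OK = apache.OK
except ImportError:
    # Assumption: stand-in status value for running outside Apache.
    OK = 0


def handler(req):
    """Content handler: called by the server once for each request."""
    req.content_type = "text/plain"
    req.write("Hello from Python inside Apache\n")
    return OK
```
</verb></tscreen>
The interpreter stays loaded in the server between requests, which is where
the speed advantage over starting a CGI process per request comes from.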
<sect>Tcl
<p>The <url name="Tcl Apache project" url="http://tcl.apache.org"> integrates
Tcl with the Apache web server through modules such as Mod_dtcl.
Mod_dtcl allows embedding Tcl in HTML pages, as
<ref name="PHP" id="php"> does. Tcl is a lightweight, extensible
scripting language. You can learn more about Tcl <url
url="http://dev.ajubasolutions.com/" name="here">.
Other Tcl-based Apache solutions are <url name="Neo Web Script" url="http://www.neosoft.com/neowebscript/"> and <url name="WebSH" url="http://websh.com/">.
<sect>Modules for other languages
<p>This document has described modules for popular server-side languages
such as Perl, Python and PHP. You can find additional language modules (JavaScript, Haskell, etc.)
at the <url name="Apache modules directory" url="http://modules.apache.org">.
<sect>Apache 2.0
<p>The current version of Apache (the 1.3 series) is process based:
the server forks itself a number of times to answer simultaneous requests.
The children are isolated from each other.
This is reliable: if a module misbehaves, the parent process kills that child, so the failure
affects only the request being served, not the server as a whole.
Threads are similar to lightweight processes, but they can share common data.
If a thread misbehaves, it can corrupt other threads and bring down the server as a whole.
On the other hand, the thread model allows for faster, leaner
web servers. Apache 2.0 brings the best of both worlds, allowing the user to define
the number of processes and the number of threads per process. Apache 2.0 also introduces
APR, the Apache Portable Runtime, which further increases Apache's portability.
Finally, layered I/O brings a new level of modularity to Apache development.
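<p>The trade-off between the two models can be illustrated with a loose analogy
in Python. This is not how Apache is implemented; the function names are
invented for the example. Threads update one shared counter and therefore need
a lock, while work run in a separate process can fail without taking the
parent down.
<tscreen><verb>
```python
import subprocess
import sys
import threading


def serve_with_threads(workers, requests_per_worker):
    """Thread model: all workers share the same counter in memory."""
    served = {"count": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(requests_per_worker):
            with lock:                   # shared data must be protected
                served["count"] += 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return served["count"]


def serve_in_child_process(code):
    """Process model: run `code` in an isolated child interpreter.

    A failure in the child shows up only as a non-zero exit status;
    the parent keeps running.
    """
    result = subprocess.run([sys.executable, "-c", code])
    return result.returncode
```
</verb></tscreen>
Here <tt/serve_with_threads(4, 100)/ counts 400 requests in one process, and
<tt/serve_in_child_process/ returns whatever exit status the child ended with,
even when the child aborts.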
<sect>Migrating from Netscape (iPlanet) web servers
<p>The bulk of the work is likely to be converting custom modules from NSAPI to the Apache API.
Nearly all the other server-side technologies (Java, Perl, CGIs) should be portable with little
or no change.
Netscape is tightly integrated with LDAP servers, so you may also be interested in the LDAP modules at <htmlurl url="http://modules.apache.org">.
Netscape includes server-side JavaScript support; the Apache equivalent is <url name="mod_javascript"
url="http://www.geocities.com/TimesSquare/Fortress/9743/binjs.html">.
<sect>Migrating from Microsoft IIS
<p>Common reasons why people migrate from IIS to Apache (and not the other way around) include
stability, performance and security. This is partly because most people running Apache do so on
a Unix variant (like Solaris, FreeBSD or Linux). Fortunately, Apache is multiplatform and runs
on both Unix and Windows, offering a sensible migration path.
<p>Common Windows based web development environments like Coldfusion or Active Server Pages
have Unix ports or compatible environments (some are commercial, some are freely available):
<itemize>
<item><url name="Coldfusion for Linux" url="http://www.allaire.com/Products/coldfusion/">
<item><url name="Perl ASP module" url="http://www.apache-asp.org/"><label id="perlasp">
<item><url name="Halcyon ASP" url="http://www.halcyonsoft.com/">
<item><url name="OpenASP" url="http://www.activescripting.org/">
</itemize>
Apache for Windows also supports the ISAPI interface.
<p>If you want a complete open source solution and you come from a Windows background
(IIS + ASP + MS-SQL Server), the roughly equivalent (and highly popular) combination is Apache + PHP
+ <url name="MySQL" url="http://www.mysql.com"> or <url name="PostgreSQL" url="http://www.postgresql.org">.
You can learn more about PHP <ref name="here" id="php">.
<p>Support for Windows is greatly improved in the new 2.0 Apache version, still in alpha stage at the time of this writing.
<sect>Links
<p>Additional Apache-related resources.
<sect1>Websites
<p>
<itemize>
<item><url name="Apache" url="http://www.apache.org">
<item><url name="Apache modules directory" url="http://modules.apache.org">
<item><url name="Apache today" url="http://www.apachetoday.com">
<item><url name="Slashdot Apache section" url="http://slashdot.org/index.pl?section=apache">
</itemize>
<sect1>Java application servers<label id="applicationservers">
<p>
These are open source application servers that build on or are known to play
well with Apache.
<itemize>
<item><url name="Resin" url="http://www.caucho.com/">: Servlets, JSP, XSL
<item><url name="Enhydra" url="http://www.enhydra.com">: Java/XML application
server.
<item><url name="Locomotive" url="http://www.locomotive.or">: Servlets,
load balancing, failover.
</itemize>
<sect>Contacting the author
<p>You can contact me at <htmlurl url="ridruejo@apache.org">. I welcome suggestions
and corrections, but please, please, do not send me messages asking me to
troubleshoot your Apache installation. I just do not have the bandwidth, and your
mail will most likely be ignored. If you need support, try the following:
<itemize>
<item>Check the error logs and read the docs, especially the <url name="FAQ" url="http://www.apache.org/docs/misc/FAQ.html">.
<item>If you still do not find the solution, go for a walk. Afterwards, read the docs
again.
<item>Try comp.infosystems.www.servers.unix at <htmlurl url="http://www.deja.com">.
Search for a similar problem.
<item>If you are still stuck, post to that newsgroup. Provide as much information as you can,
including relevant error_log entries and the steps you have taken so far.
This will increase the chances that someone will answer your question.
</itemize>
<p>If you want commercial support, consider contacting <url name="Covalent"
url="http://www.covalent.net">, which provides expert support for Apache (for a fee,
of course). If you are using Apache on Linux, your Linux vendor may have support
plans that include Apache too.
<sect1>Translations
<p>I encourage translations of this document. You probably should use
the SGML source. Check <htmlurl url="http://www.linuxdoc.org"> for info.
Please drop me a note so I can make sure you get the most recent version.
<!--<sect>Other web servers-->
</article>