old-www/LDP/LG/issue63/washington.html

<!--startcut  ==============================================-->
<!-- *** BEGIN HTML header *** -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML><HEAD>
<title>XML parsing in AOLserver LG #63</title>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#0000AF"
ALINK="#FF0000">
<!-- *** END HTML header *** -->

<CENTER>
<A HREF="http://www.linuxgazette.com/">
<H1><IMG ALT="LINUX GAZETTE" SRC="../gx/lglogo.png" 
	WIDTH="600" HEIGHT="124" border="0"></H1></A> 

<!-- *** BEGIN navbar *** -->
<IMG ALT="" SRC="../gx/navbar/left.jpg" WIDTH="14" HEIGHT="45" BORDER="0" ALIGN="bottom"><A HREF="sharma.html"><IMG ALT="[ Prev ]" SRC="../gx/navbar/prev.jpg" WIDTH="16" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="index.html"><IMG ALT="[ Table of Contents ]" SRC="../gx/navbar/toc.jpg" WIDTH="220" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../index.html"><IMG ALT="[ Front Page ]" SRC="../gx/navbar/frontpage.jpg" WIDTH="137" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="http://www.linuxgazette.com/cgi-bin/talkback/all.py?site=LG&article=http://www.linuxgazette.com/issue63/washington.html"><IMG ALT="[ Talkback ]" SRC="../gx/navbar/talkback.jpg" WIDTH="121" HEIGHT="45" BORDER="0" ALIGN="bottom"  ></A><A HREF="../faq/index.html"><IMG ALT="[ FAQ ]" SRC="./../gx/navbar/faq.jpg"WIDTH="62" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="lg_backpage63.html"><IMG ALT="[ Next ]" SRC="../gx/navbar/next.jpg" WIDTH="15" HEIGHT="45" BORDER="0" ALIGN="bottom"  ></A><IMG ALT="" SRC="../gx/navbar/right.jpg" WIDTH="15" HEIGHT="45" ALIGN="bottom">
<!-- *** END navbar *** -->
<P>
</CENTER>

<!--endcut ============================================================-->

<H4 ALIGN="center">
"Linux Gazette...<I>making Linux just a little more fun!</I>"
</H4>

<P> <HR> <P> 
<!--===================================================================-->

<center>
<H1><font color="maroon">XML parsing in AOLserver</font></H1>
<H4>By <a href="mailto:irvingw@pobox.com">Irving Washington</a></H4>
</center>
<P> <HR> <P>  

<!-- END header -->


<h3>AOLserver</h3>

<a href="http://www.aolserver.com">AOLserver</a> is an open-source,
multi-threaded, high-performance web server. AOLserver is less known
than Apache but it has a few features that put it ahead of
Apache: rich and well-thought extension API, superior database
connectivity API, embedded and tightly integrated Tcl interpreter.
Read my <a href="../issue58/washington.html">previous LG article </a> to
learn more about AOLserver.

<h3>XML</h3>

If you're going to do serious work with XML you'll have to learn about
it and you'll have to do it somewhere else. The best summary of XML
I've seen is: XML is an (inefficient) way to to represent data in
tree form as text (ASCII) files. Text is good because it's simple.
Tree is good because a lot can be represented as trees
(e.g., a non-circular list is just a degenerated tree and a circular
list can be described with multiple trees). Inefficient is bad but it
usually makes an engineering sense to trade inefficiency for
extensibility and wide adoption that XML enjoys (lots of tools, 
lots of information).

<h3>XML support in AOLserver</h3>

XML processing (parsing and modification of XML documents) in
AOLserver is possible thanks to an <b>ns_xml</b> module written
by <a href="http://www.arsdigita.com">ArsDigita</a>. This module is a
wrapper around version 2.x (&gt;2.2.5) of <a
href="http://www.xmlsoft.org/">libxml</a> library and adds
<code>ns_xml</code> command to the embedded Tcl interpreter.
You can <a
href="http://www.aolserver.com/download/index.adp?dir=%2fmodules%2fnsxml">
download the source</a> or get it directly from the CVS repository doing:
<pre>
cvs -d:pserver:anonymous@cvs.aolserver.sourceforge.net:/cvsroot/aolserver login
cvs -z3 -d:pserver:anonymous@cvs.aolserver.sourceforge.net:/cvsroot/aolserver co nsxml
</pre>
You need to press <i>Enter</i> after first command since CVS is
waiting for a password (which is empty).
<p>
As of Dec. 2000 Linux distributions usually come with
version 1.x of libxml library so chances are that you'll need to
install 2.x by yourself (this will change in the future since
everyone is migrating to 2.x). To install <code>nsxml</code> module go
into <tt>nsxml</tt> directory, optionally edit a path in
<code>Makefile</code> to point into AOLserver source directory. Then
run <code>make</code>. You should get <code>nsxml.so</code> module
that should be placed in AOLserver bin directory (the same that has
main <code>nsd</code> executable). Add the following to your 
<code>nsd.tcl</code> config file:
<pre>
ns_section "ns/server/${servername}/modules"
ns_param   nsxml           ${bindir}/ns_xml.so
</pre>
and restart AOLserver. You can verify that the module gets loaded by
watching server.log, I usually use a shell window with:
<pre>
tail -f $AOLSERVERDIR/log/server.log
</pre>
This is also a great way to debug Tcl scripts since AOLserver will
dump detailed debug information every time there is an error in the
script.

<h3>XML Quick reference</h3>

Here's a quick reference of all commands available through ns_xml.

<p>

<table bgcolor=#ffffff cellspacing=1>
<tr> <td bgcolor=wheat> <b><code>
 set doc_id [<font color=gray>ns_xml parse</font> <font color=red>?-persist? $string</font>] 
 </code> </b> </td> </tr>
<tr> <td>
 Parse the XML document in a <font color=red>$string</font> and return document id
(handle to in-memory parsed tree). If you don't provide
<font color=red>?-persist?</font> flag the memory will be automatically freed when the
script exits. Otherwise you'll have to free the memory by calling 
<font color=gray>ns_xml doc free</font>. You need to use <font color=red>-persist</font> flag if you want
to share parsed XML docs between scripts.

</td></tr>
<tr> <td bgcolor=wheat> <b><code>
 set doc_stats [<font color=gray>ns_xml doc stats</font> <font color=red>$doc_id</font>] 
 </code> </b> </td> </tr>
<tr> <td>
 Return document's statistics.

</td></tr>
<tr> <td bgcolor=wheat> <b><code>
 <font color=gray>ns_xml doc free</font> <font color=red>$doc_id</font> 
 </code> </b> </td> </tr>
<tr> <td>
 Free a document. Should only be called on a document if
<font color=red>?-persistent?</font> flag has been passed to either
<font color=gray>ns_xml parse</font> or <font color=gray>ns_xml doc create</font>

</td></tr>
<tr> <td bgcolor=wheat> <b><code>
 set node_id [<font color=gray>ns_xml doc root</font> <font color=red>$doc_id</font>] 
 </code> </b> </td> </tr>
<tr> <td>
 Return the node id of the document root (you start traversal of the
document tree from here.)

</td></tr>
<tr> <td bgcolor=wheat> <b><code>
 set children_list [<font color=gray>ns_xml node children</font> <font color=red>$node_id</font>] 
 </code> </b> </td> </tr>
<tr> <td>
 Return a list of children nodes of a given node.

</td></tr>
<tr> <td bgcolor=wheat> <b><code>
 set node_name [<font color=gray>ns_xml node name</font> <font color=red>$node_id</font>] 
 </code> </b> </td> </tr>
<tr> <td>
 Return the name of a node.

</td></tr>
<tr> <td bgcolor=wheat> <b><code>
 set node_type [<font color=gray>ns_xml node type</font> <font color=red>$node_id</font>] 
 </code> </b> </td> </tr>
<tr> <td>
 Return the type of a node. Possible types: <i>element, attribute,
text, cdata_section, entity_ref, entity, pi, comment, document,
document_type, document_frag, notation, html_document</i>

</td></tr>
<tr> <td bgcolor=wheat> <b><code>
 set content [<font color=gray>ns_xml node getcontent</font> <font color=red>$node_id</font>] 
 </code> </b> </td> </tr>
<tr> <td>
 Get a content (text) of a given node.

</td></tr>
<tr> <td bgcolor=wheat> <b><code>
 set attr [<font color=gray>ns_xml node getattr</font> <font color=red>$node_id $attr_name</font>] 
 </code> </b> </td> </tr>
<tr> <td>
 Return the value of an attribute of a given node.

</td></tr>
<tr> <td bgcolor=wheat> <b><code>
 set doc_id [<font color=gray>ns_xml doc create</font> <font color=red>?-persist? $doc-version</font>] 
 </code> </b> </td> </tr>
<tr> <td>
 Create a new document in memory. If <font color=red>-persist</font> flag is given you'll
have to explicitely free the memory taken by the document with 
<font color=gray>ns_xml doc free</font>, otherwise it'll be freed automatically after
execution of the script. <font color=red>$doc_version</font> is a version of an XML
doc, if not specified it'll be "1.0".

</td></tr>
<tr> <td bgcolor=wheat> <b><code>
 set xml_string [<font color=gray>ns_xml doc render</font> <font color=red>$doc_id</font>] 
 </code> </b> </td> </tr>
<tr> <td>
 Generate XML from the in-memory representation of the document.

</td></tr>
<tr> <td bgcolor=wheat> <b><code>
 set node_id [<font color=gray>ns_xml doc new_root</font> <font color=red>$doc_id $node_name $node_content</font>] 
 </code> </b> </td> </tr>
<tr> <td>
 Create a root node for a document.

</td></tr>
<tr> <td bgcolor=wheat> <b><code>
 set node_id [<font color=gray>ns_xml node new_sibling</font> <font color=red>$node_id $name $content</font>] 
 </code> </b> </td> </tr>
<tr> <td>
 Create a new sibling of a given node.

</td></tr>
<tr> <td bgcolor=wheat> <b><code>
 set node_id [<font color=gray>ns_xml node new_child</font> <font color=red>$node_id $name $content</font>] 
 </code> </b> </td> </tr>
<tr> <td>
 Create a child of a given node.

</td></tr>
<tr> <td bgcolor=wheat> <b><code>
 <font color=gray>ns_xml node setcontent</font> <font color=red>$node_id $content</font> 
 </code> </b> </td> </tr>
<tr> <td>
 Set a content for a given node.

</td></tr>
<tr> <td bgcolor=wheat> <b><code>
 <font color=gray>ns_xml node setattr</font> <font color=red>$node_id $attr_name $value</font> 
 </code> </b> </td> </tr>
<tr> <td>
 Set the value of an attribute in a given node.
</td></tr>
</table>

<h3>A simple example</h3>

An educational and simple thing to do is to parse a document and print
out its tree structure. Stripped to bare bones the process is:
<ul>
<li> use <font color=gray> <code>ns_xml parse $xml_doc</code></font>
to parse XML document in string <font color=gray>$xml_doc</font> and get
its document id
<li> use <font color=gray> <code>ns_xml doc root $doc_id</code>
</font> to get the id of a root node
<li> use <font color=gray> <code>ns_xml node children
$node_id</code> </font> to traverse document tree and <font
color=gray> <code>ns_xml node ...</code> </font>commands to get
node content and attributes
</ul>

If you provide <font color=gray> <code>-persist</code> </font> flag to
<font color=gray><code>ns_xml parse</code> </font>
you'll have to explicitly call <font color=gray> <code>ns_xml doc
free $doc_id </code> </font> to free memory associated with this
document, otherwise it will get automatically freed after execution of
a script.
<p>
In code it could look like this:

<pre>
proc dump_node {node_id level} {
    set name [ns_xml node name $node_id]
    set type [ns_xml node type $node_id]
    set content [ns_xml node getcontent $node_id]
    ns_write "&lt;li&gt;"
    ns_write "node id=$node_id name=$name type=$type"
    if { [string compare $type "attribute"] != 0 } {
	ns_write " content=$content\n"
    }
}

proc dump_tree_rec {children} {
    ns_write "&lt;ul&gt;\n"
    foreach child_id $children {
	dump_node $child_id
	set new_children [ns_xml node children $child_id]
	if { [llength $new_children] &gt; 0 } {
	    dump_tree_rec $new_children
	}
    }
}

proc dump_tree {node_id} {
    dump_tree_rec [list $node_id] 0
}

proc dump_doc {doc_id} {
    ns_write "doc id=$doc_id&lt;br&gt;\n"
    set root_id [ns_xml doc root $doc_id]
    dump_tree $root_id
}

set xml_doc "&lt;test version="1.0"&gt;this is a
&lt;blind&gt;test&lt;/blind&gt; of xml&lt;/test&gt;"
set doc_id [ns_xml parse $xml_doc]
dump_doc $doc_id    
</pre>

<font color=gray> <code>ns_xml parse</code> </font> command will throw
an error if XML document is not valid (e.g., not well formed) so in
production code we should catch it and display a meaningful error
message, e.g.:

<pre>
if { [catch {set doc_id [ns_xml parse $xml_doc]} err] } {
    ns_write "There was an error parsing the following XML document: "
    ns_write [ns_quotehtml $xml_doc]
    ns_write "Error message is:"
    ns_write [ns_quotehtml $err]
    ns_write "</body></html>\n"
    return
}
</pre>

Code like this takes more time to write but some day it may save a lot of
debugging time (and a day like this always comes).

<p>
<a href="http://www.fifthgate.org/articles/aolserver/xml/test_xml.tcl">See how the code works</a> in practice
[external site running AOLserver]
and <a href="misc/washington/test_xml.tcl.txt">get the full
source</a> [included in <I>Linux Gazette</I>]. It's a bit more complex than the
above snippet. You can see the structure of an arbitrary XML document by typing
it in the provided text area. The script also shows how to parse form data and
has more robust error handling.

<h3> Real life example</h3>

XML is better than other similar formats because it is a standard, it
has  gained wide acceptance and its usage is growing rapidly. 
One of the possible usages of XML is as a way of communication between
web sites (web services). The simplest scenario is that of one web server
grabbing information in XML format from another web server. A popular
example of such communication is a congregation of headlines, e.g., if
you go to <a
href="http://www.freshmeat.net">freshmeat.net</a> you'll see that they
provide current headlines from
<a href="http://www.linuxtoday.com">linuxtoday.com</a>. We'll do the
same thing (vive l'originalite!). <p>
In the past it could've been done in a rather distasteful way by
grabbing the whole HTML page and trying to extract relevant
information. It would be hard to program and fragile (a change in the
way HTML page is generated would most likely break such parsing).
<p>
Today the site that wants to provide headlines for others can
publish this data in an easily to parse XML format under some URL.
In our case the data are provided at
<a href="http://www.linuxtoday.com/backend/linuxtoday.xml">
http://www.linuxtoday.com/backend/linuxtoday.xml</a>.
<a href="misc/washington/test_xml.tcl.txt">See the format of this
file</a> (using previously developed script).  <!-- ?show_linuxtoday_p=1 -->
<p>
As you can see XML document represent headlines on LinuxToday site. It
is a set of stories, each story having 
title, url, author etc. We know that after parsing the XML document we
would like to have a way to easily extract the information. 
Let's use a "wishful-thinking" (in other words top-down) method of
writing the code advocated in a <a
href="http://sicp.arsdigita.org">Structure and interpretation of
computer programs</a> (a truly great CS book). Let's assume that we've
converted XML representation into an object. To build an
HTML table showing the data we need the following procedures:
<ul>
<li> get total number of stories: <font color=gray><code>headlines_get_stories_count $headlines</code> </font>
<li> get n-th story: <font color=gray><code>headlines_get_story $headline $story_no</code></font>
<li> get URL of a given story: <font color=gray><code>story_get_url $story</code></font>
<li> get title of a given story: <font color=gray><code>story_get_title $story</code></font>
</ul>
For simplicity I only use URL and title but extending this to more
attributes should be trivial.
<p>
Having those procedures we can generate the simplest (but rather ugly)
table:
<pre>
proc story_to_html_table_row { story } {
    set url [story_get_url $story]
    set title [story_get_title $story]
    return "- &lt;a href=\"$url\"&gt;&lt;font color=#000000&gt;$title&lt;/font&gt;&lt;/a&gt;&lt;br&gt;\n"
}

# given headlines generate HTML code of the table with this data
proc headlines_to_html_table { headlines } {
    set to_return "&lt;table border=0 cellspacing=1 cellpadding=3&gt;"
    append to_return "&lt;tr&gt;&lt;td&gt;&lt;small&gt;"

    set stories_count [headlines_get_stories_count $headlines]
    for {set i 0} {$i < $stories_count} {incr i} {
	set story [headlines_get_story $headlines $i]
	append to_return [story_to_html_table_row $story]
    }

    append to_return "&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;\n"
    return $to_return
}
</pre>

Tcl doesn't give us much choice for representing this object; we'll
use lists.
<pre>
proc headlines_get_stories_count { headlines } {
    return [llength $headlines]
}

proc headlines_get_story { headlines story_no } {
    return [lindex $headlines $story_no]
}

proc story_get_url { story } {
    return [lindex $story 0]
}

proc story_get_title { story } {
    return [lindex $story 1]
}
</pre>

Note that if we forget about purity (just for a while) we can rewrite
the following part of <code>headlines_to_html_table</code>: 
<pre>
set stories_count [headlines_get_stories_count $headlines]
for {set i 0} {$i < $stories_count} {incr i} {
    set story [headlines_get_story $headlines $i]
    append to_return [story_to_html_table_row $story]
}
</pre>
in a bit more terse way:
<pre>
foreach story $headlines {
    append to_return [story_to_html_table_row $story]
}
</pre>

Now the most important part: converting XML doc into the
representation we've chosen.
<pre>
# does a name of the node identified by $node_id equals $name
proc is_node_name_p { node_id name } {
    set node_name [ns_xml node name $node_id]
    if { [string_equal_p $name $node_name] } {
	return 1
    } else {
	return 0
    }
}

# does a type of the node identified by $node_id equals $type
proc is_node_type_p { node_id type } {
    set node_type [ns_xml node type $node_id]
    if { [string_equal_p $type $node_type] } {
	return 1
    } else {
	return 0
    }
}

# is this an node of type "attribute"?
proc is_attribute_node_p { node_id } {
    return [is_node_type_p $node_id "attribute"]
}

# raise an error if node name is different than $name
proc error_if_node_name_not {node_id name} {
    if { ![is_node_name_p $node_id $name] } {
	set node_name [ns_xml node name $node_id]
	error "node name should be $name and not $node_name"
    }
}

# raise an error if node type is different than $type
proc error_if_node_type_not {node_id type} {
    if { ![is_node_type_p $node_id $type] } {
	set node_type [ns_xml node type $node_id]
	error "node type should be $type and not $node_type"
    }
}

# given url and title construct a story object with
# those attributes
proc define_story { url title } {
    return [list $url $title]
}

# convert a node of name "story" into an object
# that represents story
proc story_node_to_story {node_id} {
    set url ""
    set title ""
    # go through all children and extract content of url and title nodes
    set children [ns_xml node children $node_id]
    foreach node_id $children {
	# we're only interested in nodes whose name is "url" or "title"
	if { [is_attribute_node_p $node_id]} {
	    if { [is_node_name_p $node_id "url"] || [is_node_name_p $node_id "title"]} {
		set node_children [ns_xml node children $node_id]
		# those should only have one children node with
		# the name "text" and type "cdata_section"
		if { [llength $node_children] != 1 } {
		    set name [ns_xml node name $node_id]
		    error "$name node should only have 1 child"
		}
		set one_node_id [lindex $node_children 0]
		error_if_node_type_not $one_node_id "cdata_section"
		error_if_node_name_not $one_node_id "text"
		set txt [ns_xml node getcontent $one_node_id]
		if { [is_node_name_p $node_id "url"] } {
		    set url $txt
		}
		if { [is_node_name_p $node_id "title"]} {
		    set title $txt
		}
	    }
	}
    }
    return [define_story $url $title]
}

# convert XML doc to headlines object
proc xml_to_headlines { doc_id } {
    set headlines [list]
    set root_id [ns_xml doc root $doc_id]
    # root node should be named "linuxtoday" and of type "attribute"
    error_if_node_name_not $root_id "linuxtoday"
    error_if_node_type_not $root_id "attribute"
    set children [ns_xml node children $root_id]
    foreach node_id $children {
	# only interested in attribute type nodes whose name is "story"
	if { [is_node_name_p $node_id "story"] && [is_attribute_node_p $node_id]} {
	    set story [story_node_to_story $node_id]
	    lappend headlines $story
	}
    }
    return $headlines
}
</pre>

The code is rather straightforward. We use the knowledge about the
structure of XML file. In this case we know that root node is named
<tt>linuxtoday</tt> and should have a child named
<tt>story</tt>. Each <tt>story</tt> node should have children named
<tt>url</tt> and <tt>title</tt> etc. The previous script that dumps
general structure of the tree helped me a lot in writing this
function. Note the usage of <font color=gray> <tt>error</tt> </font>
command to abort the script if XML doesn't look good to us.
<p>
Having an intermediate representation of the data might look like an
excess given that it costs us more code and some performance but there
are very good reasons to have it. We could have written a proc
<code>xml_to_html_table</code> that would create HTML table directly
from XML document but such code would be more complex, more buggy and
harder to modify. Separation that we've made provides an abstraction
that reduces complexity, which is always good. It also gives us more
flexibility: we can easily imagine writing another
<code>headlines_to_html_table</code> procedure that gives us slightly
different table.

<p>
<a
href="http://www.fifthgate.org/articles/aolserver/xml/test_linuxtoday_xml.tcl">
See how it works in practice</a> 
[external site running AOLserver]
and
<a href="misc/washington/test_linuxtoday_xml.tcl.txt">get the source</a>
[included in <I>Linux Gazette</I>]. It should
produce something like this:
<p>
<center>
<TABLE WIDTH="40%" BORDER="0" CELLSPACING="1" CELLPADDING="3">
<TR>
  <TD ALIGN="center" BGCOLOR="#cccccc">
  <B>
  <FONT FACE="Lucida,Verdana,Helvetica,Arial">
    <A href="http://linuxtoday.com">
      <FONT color="#000000">linuxtoday</FONT>
    </A>
  </FONT>
  </B>
  </TD>
</TR>

<TR>
<TD BGCOLOR="#eeeeee">
  <SMALL><FONT FACE="Lucida,Verdana,Helvetica,Arial">

- <A HREF="http://linuxtoday.com/news_story.php3?ltsn=2000-12-28-001-04-OS-DB"><FONT COLOR="#000000">Kernel Cousin Debian Hurd #73 By Paul Emsley And Zack Brown</FONT></A><BR>
- <A HREF="http://linuxtoday.com/news_story.php3?ltsn=2000-12-27-006-04-OS-SW"><FONT COLOR="#000000">Zope 2.2.5 b1 released</FONT></A><BR>
- <A HREF="http://linuxtoday.com/news_story.php3?ltsn=2000-12-27-014-06-SC"><FONT COLOR="#000000">O#39;Reilly Network: Insecurities in a Nutshell: SAMBA, pine, ircd, and More</FONT></A><BR>
- <A HREF="http://linuxtoday.com/news_story.php3?ltsn=2000-12-27-005-04-OP-HW"><FONT COLOR="#000000">ZDNet: Linux Laptop SuperGuide</FONT></A><BR>
- <A HREF="http://linuxtoday.com/news_story.php3?ltsn=2000-12-27-004-04-OP-MS"><FONT COLOR="#000000">ComputerWorld: Think tank warns that Microsoft hack could pose national security risk</FONT></A><BR>
<EFBFBD></FONT></SMALL>
</TD>
</TR>
</TABLE>
</center>

<p>
One thing missing in this code is caching. As it is, it
will grab the XML file from other people's server everytime it is
invoked. This is not nice. It would be fairly easy to add a logic to
cache XML file (or its in-memory representation) and only
fetch a new version if, say, 1 hour passed since it was last retrieved.


<h3>Conclusion about XML as a data exchange language</h3>

Is this data exchange thing between web servers a novel idea? No. You
could do everything described here with the first generation of web
servers. You would probably use different technologies (C code running
inside a web server or a CGI script instead of an embedded scripting
language; some ad-hoc text or binary format instead of XML) but the
idea would be the same: one web server acts as a client, grabs the
data from the other server using HTTP protocol and does something
useful with the data. The other web server acts as a server providing
data for others. It's just another implementation of
a client-server paradigm. It's nothing new. It is just a sign that web
programming is maturing. After 5+ years we've finally solved most of the
problems with presenting static html pages or generating dynamic web
pages from the data kept on the server (e.g., in a database). Now we
enter the times of providing services and data for other web
sites. Today state-of-the-art is pretty much limited to exchanging
headlines and similar trivia but possibilities are bigger, ranging
from simple things like providing stock quotes or dictionary
definitions to executing complex (e.g., financial) transactions
following an agreed upon protocol.
<p>

<h3>Conclusion about XML parsing in AOLserver</h3>

Beside parsing you can also create and manipulate XML documents in
memory and convert them to XML ASCII representation. It is not
covered in this article but it's so straightforward that you should
be able to do it just by looking at the API.
<p>
ns_xml module provides basics of XML processing. Although you can do
quite a bit with it one could wish to do more. Things that are
obviously missing:
<ul>
<li> SAX API (it's already present in libxml so this would only
require extending ns_xml) 
<li> support for XSLT (support for XSLT, although planned, is not yet
present in libxml)
</ul>
An alternative approach to ns_xml module would be to:
<ul>
<li> use <a href="http://pywx.idyll.org">PyWx</a>, a Python
interpreter embedded inside AOLserver and standard
<a href="http://www.python.org/sigs/xml-sig/">PyXML</a> Python module
<li> write another module wrapping some other XML parsing library
<li> use pure Tcl parser
</ul>

<h3>Links</h3>
<ul>
<li> to find out more about AOLserver read
<a href="../issue58/washington.html">
intro in December 2000 issue of LG</a> or
<a
href="http://www.arsdigita.com/asj/aolserver/introduction-1">part
one</a> and <a
href="http://www.arsdigita.com/asj/aolserver/introduction-2">part
two</a> of another intro
<li> <a href="http://www.aolserver.com">AOLserver</a> home page
<li> <a href="http://www.arsdigita.com/books/panda">Philip and Alex's
Guide to Web Publishing</a>, a book
that will make you a better web programmer
<li> <a href="http://sicp.arsdigita.org/">Structure and Interpretation
of Computer Programs</a>, a book that will make you a better
programmer
<li> <a href="http://www.arsdigita.com/books/tcl">Tcl for Web
Nerds</a>, a handy book on Tcl
<li> everybody has a web page and <a
href="http://www.fifthgate.org"> this one is mine </a>
</ul>

<address> If you have comments or suggestions,
<a href="mailto:irvingw@pobox.com">send them in</a>. </address>


<!-- *** BEGIN copyright *** -->
<P> <hr> <!-- P --> 
<H5 ALIGN=center>

Copyright &copy; 2001, Irving Washington.<BR>
Copying license <A HREF="../copying.html">http://www.linuxgazette.com/copying.html</A><BR> 
Published in Issue 63 of <i>Linux Gazette</i>, Mid-February (EXTRA) 2001</H5>
<!-- *** END copyright *** -->

<!--startcut ==========================================================-->
<HR><P>
<CENTER>
<!-- *** BEGIN navbar *** -->
<IMG ALT="" SRC="../gx/navbar/left.jpg" WIDTH="14" HEIGHT="45" BORDER="0" ALIGN="bottom"><A HREF="sharma.html"><IMG ALT="[ Prev ]" SRC="../gx/navbar/prev.jpg" WIDTH="16" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="index.html"><IMG ALT="[ Table of Contents ]" SRC="../gx/navbar/toc.jpg" WIDTH="220" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../index.html"><IMG ALT="[ Front Page ]" SRC="../gx/navbar/frontpage.jpg" WIDTH="137" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="http://www.linuxgazette.com/cgi-bin/talkback/all.py?site=LG&article=http://www.linuxgazette.com/issue63/washington.html"><IMG ALT="[ Talkback ]" SRC="../gx/navbar/talkback.jpg" WIDTH="121" HEIGHT="45" BORDER="0" ALIGN="bottom"  ></A><A HREF="../faq/index.html"><IMG ALT="[ FAQ ]" SRC="./../gx/navbar/faq.jpg"WIDTH="62" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="lg_backpage63.html"><IMG ALT="[ Next ]" SRC="../gx/navbar/next.jpg" WIDTH="15" HEIGHT="45" BORDER="0" ALIGN="bottom"  ></A><IMG ALT="" SRC="../gx/navbar/right.jpg" WIDTH="15" HEIGHT="45" ALIGN="bottom">
<!-- *** END navbar *** -->
</CENTER>
</BODY></HTML>
<!--endcut ============================================================-->