old-www/LDP/LG/issue89/danguer.html

617 lines
21 KiB
HTML

<!--startcut ==============================================-->
<!-- *** BEGIN HTML header *** -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML><HEAD>
<title>Working with XSLT LG #89</title>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#0000AF"
ALINK="#FF0000">
<!-- *** END HTML header *** -->
<!-- *** BEGIN navbar *** -->
<!-- *** END navbar *** -->
<!--endcut ============================================================-->
<TABLE BORDER><TR><TD WIDTH="200">
<A HREF="http://www.linuxgazette.com/">
<IMG ALT="LINUX GAZETTE" SRC="../gx/2002/lglogo_200x41.png"
WIDTH="200" HEIGHT="41" border="0"></A>
<BR CLEAR="all">
<SMALL>...<I>making Linux just a little more fun!</I></SMALL>
</TD><TD WIDTH="380">
<CENTER>
<BIG><BIG><STRONG><FONT COLOR="maroon">Working with XSLT</FONT></STRONG></BIG></BIG>
<BR>
<STRONG>By <A HREF="../authors/danguer.html">Daniel Guerrero</A></STRONG>
</CENTER>
</TD></TR>
</TABLE>
<P>
<!-- END header -->
The <b>eXtensible Stylesheet Language Transformations (XSLT)</b> is used mostly to transform the <b>XML</b> data to <b>HTML</b> data, but
with XSLT we could transform from XML (or anything which uses the xml namespaces, like RDF) to whatever thing we need, from xml
to plain text.
</p>
<p>
The <a href="http://www.w3.org">w3</a> defines that XSL (eXtensible Stylesheet Language) consists of three parts:
<b>XSLT</b>, <a href="http://www.w3.org/TR/xpath">XPath</a> (a expression language used by XSLT to access or refer to parts of an XML document), and the third part is
<b>XSL Formatting Objects</b>, an XML vocabulary for specifying formatting semantics
</p>
<h3>Meeting XSLT</h3>
<p>
First of all, we need to specify that our XML document will be an XSL stylesheet, and import the XML NameSpace:
</p>
<pre>
&lt;xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt
...
&lt;/xsl:stylesheet&gt;
</pre>
<p>
After that, the principal element which we will use, will be the <code>xsl:template match</code>, which is called when
the name of a xml node matchs with the value of the <code>xsl:template match</code>:
</p>
<pre>
&lt;xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
&lt;xsl:template match="/"&gt; &lt;!-- '/' is taken from XPath and will match with the root element --&gt;
&lt;!-- do something with the attributes of the node --&gt;
&lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;
</pre>
<p>
Inside of the <code>xsl:template match</code>, we could get an attribute of the node with the element:
<code>xsl:value-of select</code>, and the name of the attribute, lets first make an xml of example with
some information:
</p>
<pre>
&lt;!-- hello.xml --&gt;
&lt;hello&gt;
&lt;text&gt;Hello World!&lt;/text&gt;
&lt;/hello&gt;
</pre>
<p>
And this is the xslt which will extract the <code>text</code> of the root element (<code>hello</code>):
</p>
<pre>
&lt;!-- hello.xsl --&gt;
&lt;xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
&lt;xsl:template match="/"&gt;
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;Extracting &lt;xsl:value-of select="//text"/&gt; &lt;/title&gt;
&lt;!-- in this case '//text' is: 'hello/text' but because I'm a lazy person... I will short it with XPath --&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;p&gt;
The &lt;b&gt;text&lt;/b&gt; of the root element is: &lt;b&gt;&lt;xsl:value-of select="//text"/&gt;&lt;/b&gt;
&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;
&lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;
</pre>
<p>
The HTML output is:
</p>
<pre>
&lt;!-- hello.html --&gt;
&lt;html&gt;
&lt;head&gt;
&lt;meta http-equiv="Content-Type" content="text/html; charset=utf-8"&gt;
&lt;title&gt;Extracting Hello World! &lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;p&gt;
The &lt;b&gt;text&lt;/b&gt; of the root element is: &lt;b&gt;Hello World!&lt;/b&gt;
&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;
</pre>
<h4>Selecting Attributes</h4>
<p>
<code>@att</code> will match with the attribute <code>att</code>. For example:
</p>
<pre>
&lt;!-- hello_style.xml --&gt;
&lt;hello&gt;
&lt;text color="red"&gt;Hello World!&lt;/text&gt;
&lt;/hello&gt;
</pre>
<p>
And the XSLT:
</p>
<pre>
&lt;!-- hello_style.xsl --&gt;
&lt;xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
&lt;xsl:template match="/"&gt;
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;Extracting &lt;xsl:value-of select="//text"/&gt; &lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;p&gt;
The &lt;b&gt;text&lt;/b&gt; of the root element is: &lt;b&gt;&lt;xsl:value-of select="//text"/&gt;&lt;/b&gt;
and his &lt;b&gt;color&lt;/b&gt; attribute is: &lt;xsl:value-of select="//text/@color"/&gt;
&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;
&lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;
</pre>
<p>
The HTML output will be:
</p>
<pre>
&lt;html&gt;
&lt;head&gt;
&lt;meta http-equiv="Content-Type" content="text/html; charset=utf-8"&gt;
&lt;title&gt;Extracting Hello World! &lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;p&gt;
The &lt;b&gt;text&lt;/b&gt; of the root element is: &lt;b&gt;Hello World!&lt;/b&gt;
and his &lt;b&gt;color&lt;/b&gt; attribute is: red
&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;
</pre>
<p>
If you are thinking in use this information to, in this case, put in red color the text <i>Hello World!</i>,
yes it's possible, in two forms, making variables and using they in the attributes of the font, for
example, or using the <code>xsl:attribute</code> element.
</p>
<h3>Variables</h3>
<p>
Variables could be used to contain constants or the value of an element.
</p>
<p>
Assigning constants are simple:
</p>
<pre>
&lt;!-- variables.xsl --&gt;
&lt;xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
&lt;xsl:template match="/"&gt;
&lt;!-- definition of the variable --&gt;
&lt;xsl:variable name="path"&gt;http://somedomain/tmp/xslt&lt;/xsl:variable&gt;
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;Examples of Variables&lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;p&gt;
&lt;a href="{$path}/photo.jpg"&gt;Photo of my latest travel&lt;/a&gt;
&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;
&lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;
</pre>
<p>
The html output:
</p>
<pre>
&lt;html&gt;
&lt;head&gt;
&lt;meta http-equiv="Content-Type" content="text/html; charset=utf-8"&gt;
&lt;title&gt;Examples of Variables&lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;p&gt;&lt;a href="http://somedomain/xslt/photo.jpg"&gt;Photo of my latest travel&lt;/a&gt;&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;
</pre>
<p>
You can also get the value of the variable selecting it from the values or attributes of the nodes:
</p>
<pre>
&lt;!-- variables_select.xsl --&gt;
&lt;xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
&lt;xsl:template match="/"&gt;
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;Examples of Variables&lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;xsl:apply-templates select="//photo"/&gt;
&lt;/body&gt;
&lt;/html&gt;
&lt;/xsl:template&gt;
&lt;xsl:template match="photo"&gt;
&lt;!-- definition of the variables --&gt;
&lt;xsl:variable name="path"&gt;http://somedomain/tmp/xslt&lt;/xsl:variable&gt;
&lt;xsl:variable name="photo" select="file"/&gt;
&lt;p&gt;
&lt;a href="{$path}/{$photo}"&gt;&lt;xsl:value-of select="description"/&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;
</pre>
<p>
And the xml source (I don't put images of myself, because I don't want to scare you :-) )
</p>
<pre>
&lt;!-- variables_select.xml --&gt;
&lt;album&gt;
&lt;photo&gt;
&lt;file&gt;mountains.jpg&lt;/file&gt;
&lt;description&gt;me at the mountains&lt;/description&gt;
&lt;/photo&gt;
&lt;photo&gt;
&lt;file&gt;congress.jpg&lt;/file&gt;
&lt;description&gt;me at the congress&lt;/description&gt;
&lt;/photo&gt;
&lt;photo&gt;
&lt;file&gt;school.jpg&lt;/file&gt;
&lt;description&gt;me at the school&lt;/description&gt;
&lt;/photo&gt;
&lt;/album&gt;
</pre>
<p>
And the html output:
</p>
<pre>
&lt;html&gt;
&lt;head&gt;
&lt;meta http-equiv="Content-Type" content="text/html; charset=utf-8"&gt;
&lt;title&gt;Examples of Variables&lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;p&gt;&lt;a href="http://somedomain/tmp/xslt/mountains.jpg"&gt;me at the mountains&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://somedomain/tmp/xslt/congress.jpg"&gt;me at the congress&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://somedomain/tmp/xslt/school.jpg"&gt;me at the school&lt;/a&gt;&lt;/p&gt;
&lt;/body&gt;
&lt;/html&gt;
</pre>
<p>
If you note, you will see that the <code>photo</code> element-match is called three times because of
the <code>xsl:apply-templates</code>, every time xslt finds an element that match it,
is called the <code>xsl:template match</code> that matches it.
</p>
<p>
Ok, so you are impatient to try to make the text in red of the <code>hello_style.xml</code>?, try to do this with
variables, if you can't do it, open this page <a href="misc/danguer/hello_style_variables.xsl">misc/danguer/hello_style_variables.xsl</a>
</p>
<h3></h3>
<h3>Sorting</h3>
<p>
XSLT could sort the processing of xml tags with <code>&lt;xsl:sort select="<i>sort_by_this_attibute</i>"&gt;</code>, this
element must be placed into <code>xsl:apply-templates</code> element, you could sort by an xml element or attribute,
in ascending or descending order, you could also specify the order of the case (if the lower case
is before than a upper case, or vice versa).
</p>
<p>
I will use the example of the album, and I will add only the sort element:
</p>
<pre>
&lt;xsl:apply-templates select="//photo"&gt;
&lt;xsl:sort select="file" order="descending"&gt;
&lt;/xsl:apply-templates&gt;
</pre>
<p>
This will alter only the order of photos is put in the html, in fact, xslt will order
first all the elements <code>photo</code> of our xml, and it will send to the
<code>template-match</code> element in that order, that's why the <code>xsl:sort</code>
element must go inside the <code>xsl:apply-templates</code>.
</p>
<p>
The xsl's and html's files are in the examples, you can get it with these links:
</p>
<ul>
<li><a href="misc/danguer/sort.xsl">misc/danguer/sort.xsl</a>. Sorting StyleSheet</li>
<li><a href="misc/danguer/sort.html">misc/danguer/sort.html</a>. Sorting HTML Output</li>
</ul>
<h3>if statement</h3>
<p>
There will some cases when you need to put some text if some xml element (or attribute) appears,
or other if doesn't appears, the <code>xsl:if</code> element will do this for you, I will show you
what can do, let's image you have a page with documents (this example is taken from my 'tests' at
TLDP-ES project) and from these documents, you know if the sources were converted to PDF, PS or
HTML format, this information is in you xml, so you can test if the PDF file was generated, and
put a link to it:
</p>
<pre>
&lt;xsl:if test="format/@pdf = 'yes'"&gt;
&lt;a href="{$doc_path}/{$doc_subpath}/{$doc_subpath}.pdf"&gt;PDF&lt;/a&gt;
&lt;/xsl:if&gt;
</pre>
<p>
If the pdf attibute of the document is yes, like this example:
</p>
<pre>
&lt;document&gt;
&lt;title&gt;Bellatrix Library and Semantic Web&lt;/title&gt;
&lt;author&gt;Daniel Guerrero&lt;/author&gt;
&lt;module&gt;bellatrix&lt;/module&gt;
&lt;format pdf="yes" ps="yes" html="yes"/&gt;
&lt;/document&gt;
</pre>
<p>
Then it will put a link to the document in the PDF format, if the attribute is 'no' or
whatever value the xml's DTD allow you, then no link will put, if you want to check all
the xsl and xml documents they are in:
</p>
<ul>
<li><a href="misc/danguer/documents.xml">misc/danguer/documents.xml</a>. Documents Info</li>
<li><a href="misc/danguer/documents.xsl">misc/danguer/documents.xsl</a>. Documents StyleSheet</li>
<li><a href="misc/danguer/documents.html">misc/danguer/documents.html</a>. Documents HTML Output</li>
</ul>
<h3>for-each statement</h3>
<p>
If you check the xml document of the below example, you will see, in the first document we have
three authors separated by a comma, obviously a better way to separate the authors will put it
in separated <code>&lt;author&gt;</code> tagas:
</p>
<pre>
&lt;document&gt;
&lt;title&gt;Donantonio: bibliographic system for automatic distribuited publication. Specifications of Software Requeriments&lt;/title&gt;
&lt;author&gt;Ismael Olea&lt;/author&gt;
&lt;author&gt;Juan Jose Amor&lt;/author&gt;
&lt;author&gt;David Escorial&lt;/author&gt;
&lt;module&gt;donantonio&lt;/module&gt;
&lt;format pdf="yes" ps="no" html="yes"/&gt;
&lt;/document&gt;
</pre>
<p>
And you could think to make an <code>xsl:apply-templates</code> and a
<code>xsl:template match</code> to put every name in a separate row, for example, this could
be done, but if you also could utilice the <code>xsl:for-each</code> statement.
</p>
<pre>
&lt;xsl:for-each select="author"&gt;
&lt;tr&gt;
&lt;td&gt;
Author: &lt;xsl:apply-templates /&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/xsl:for-each&gt;
</pre>
<p>
In this case, the processor will go through all the authors that the document had, and
if you are wondering what template I made to process the authors, I will say there is no
template, the processor will take the <code>apply-templates</code> element like a 'print'
the text of the element selected by the <code>for-each</code> element.
</p>
<h3>choose statement</h3>
<p>
The last xslt element I will show you is the choose element, this works like the popoular
<code>switch</code> of popular languages like C.
</p>
<p>
First you must declare a <code>xsl:choose</code> element, and after, put all the options
in <code>xsl:when</code> elements, if element couldn't satisfy any when, then you could
put an <code>xsl:otherwise</code> element:
</p>
<pre>
&lt;xsl:variable name="even" select="position() mod 2"/&gt;
&lt;xsl:choose&gt;
&lt;xsl:when test="$even = 1"&gt;
&lt;![CDATA[&lt;table width="100%" bgcolor="#cccccc"&gt;]]&gt;
&lt;/xsl:when&gt;
&lt;xsl:when test="$even = 0"&gt;
&lt;![CDATA[&lt;table width="100%" bgcolor="#99b0bf"&gt;]]&gt;
&lt;/xsl:when&gt;
&lt;xsl:otherwise&gt;
&lt;![CDATA[&lt;table width="100%" bgcolor="#ffffff"&gt;]]&gt;
&lt;/xsl:otherwise&gt;
&lt;/xsl:choose&gt;
</pre>
<p>
The <code>position()</code> returns the number of element processed, in the case of the
documents, the number will increment as many documents you had, in this case, we only want
to know which document is even or odd, so we can put a table of a color for the even
numbers and other for the odd numbers; I put the <code>xsl:otherwise</code> only to illustrate
its use, but actually I think it will never be a table with blank background in our library.
</p>
<p>
If you ask me why I put a <code>CDATA</code> section?, I will answer you, because if I don't
put it, then the processor will ask for his termation tag (<code>&lt;/table&gt;</code>) but
its termination is bottom, so, the termination tag will need also the <code>CDATA</code> section.
</p>
<p>
Once again, I have to short the code, if you want to see all the code, you must see these documents:
</p>
<ul>
<li><a href="misc/danguer/documents_choose.xsl">misc/danguer/documents_choose.xsl</a>. Documents StyleSheet</li>
<li><a href="misc/danguer/documents_choose.html">misc/danguer/documents_choose.html</a>. Documents HTML Output</li>
</ul>
<h2>XSLT Processors</h2>
<h3>Saxon</h3>
<p>
<a href="http://saxon.sourceforge.net">Saxon</a> is a XSLT Processor written in Java, I'm using the version 6.5.2, the following instructions will be for this version,
in others versions you have to check the properly information for running Saxon.
</p>
<h4>Installation</h4>
<p>
After you have downloaded the saxon zip, you must unzip it:
</p>
<pre>
[danguer@perseo xslt]$ unzip saxon6_5_2.zip
</pre>
<p>
After this, you must include the saxon.jar file in you class path, you can pass the path of the jar to java with the <code>-cp path</code> option.
I will put saxon.jar under the dir xslt, you must write to Java the Class you will use; in the case of my saxon version (6.5.2) the Class is:
<code>com.icl.saxon.StyleSheet</code> and also pass as argument the document in xml and the XSLT StyleSheet that you will use. For example:
</p>
<pre>
[danguer@perseo xslt]$ java -cp saxon.jar com.icl.saxon.StyleSheet document.xml tranformation.xsl
</pre>
<p>
This will send the output of the transformation to the standard output, you can send to a file with:
</p>
<pre>
[danguer@perseo xslt]$ java -cp saxon.jar com.icl.saxon.StyleSheet document.xml tranformation.xsl > file_processed.html
</pre>
<p>
For example, we will transform our first example of XSLT with saxon:
</p>
<pre>
[danguer@perseo xslt]$ java -cp saxon.jar com.icl.saxon.StyleSheet cards.xml cards.xsl > cards.html
</pre>
<p>
And as I said, the result of the processing with xslt is:
</p>
<pre>
[danguer@perseo xslt]$ java -cp saxon.jar com.icl.saxon.StyleSheet hello.xml hello.xsl > hello.html
</pre>
<h3>xsltproc</h3>
<p>
xsltproc comes with all the major distributions, the sintaxis it's like the saxon's one:
</p>
<pre>
[danguer@perseo xslt]$ xsltproc hello.xsl hello.xml > hello.html
</pre>
<p>
I know there are others xslt processors, like sablotron, but I haven't used, so, I can't suggest
you ;-).
</p>
<h2>References</h2>
<ul>
<li><a href="http://w3.org/TR/xslt">Technical Report for XSLT from the w3 consortium</a></li>
<li><a href="http://w3.org/TR/xsl">XSL homepage</a></li>
<li><a href="http://saxon.sourceforge.net">Saxon homepage</a></li>
<li>Examples (the 'card' examples, are a example I was written before start this document, and I hope could help you):
<ul>
<li><a href="misc/danguer/cards.html">misc/danguer/cards.html</a></li>
<li><a href="misc/danguer/cards.xml">misc/danguer/cards.xml</a></li>
<li><a href="misc/danguer/cards.xsl">misc/danguer/cards.xsl</a></li>
<li><a href="misc/danguer/hello.html">misc/danguer/hello.html</a></li>
<li><a href="misc/danguer/hello.xml">misc/danguer/hello.xml</a></li>
<li><a href="misc/danguer/hello.xsl">misc/danguer/hello.xsl</a></li>
<li><a href="misc/danguer/hello_style.html">misc/danguer/hello_style.html</a></li>
<li><a href="misc/danguer/hello_style.xml">misc/danguer/hello_style.xml</a></li>
<li><a href="misc/danguer/hello_style.xsl">misc/danguer/hello_style.xsl</a></li>
<li><a href="misc/danguer/hello_style_variables.html">misc/danguer/hello_style_variables.html</a></li>
<li><a href="misc/danguer/hello_style_variables.xsl">misc/danguer/hello_style_variables.xsl</a></li>
<li><a href="misc/danguer/variables.html">misc/danguer/variables.html</a></li>
<li><a href="misc/danguer/variables.xsl">misc/danguer/variables.xsl</a></li>
<li><a href="misc/danguer/variables_select.html">misc/danguer/variables_select.html</a></li>
<li><a href="misc/danguer/variables_select.xml">misc/danguer/variables_select.xml</a></li>
<li><a href="misc/danguer/variables_select.xsl">misc/danguer/variables_select.xsl</a></li>
<li><a href="misc/danguer/sort.xsl">misc/danguer/sort.xsl</a></li>
<li><a href="misc/danguer/sort.html">misc/danguer/sort.html</a></li>
<li><a href="misc/danguer/documents.xml">misc/danguer/documents.xml</a></li>
<li><a href="misc/danguer/documents.xsl">misc/danguer/documents.xsl</a></li>
<li><a href="misc/danguer/documents.html">misc/danguer/documents.html</a></li>
<li><a href="misc/danguer/documents_for.html">misc/danguer/documents_for.html</a></li>
<li><a href="misc/danguer/documents_for.xml">misc/danguer/documents_for.xml</a></li>
<li><a href="misc/danguer/documents_for.xsl">misc/danguer/documents_for.xsl</a></li>
<li><a href="misc/danguer/documents_choose.xsl">misc/danguer/documents_choose.xsl</a></li>
<li><a href="misc/danguer/documents_choose.html">misc/danguer/documents_choose.html</a></li>
</ul>
</li>
</ul>
<!-- *** BEGIN author bio *** -->
<P>&nbsp;
<P>
<!-- *** BEGIN bio *** -->
<P>
<img ALIGN="LEFT" ALT="[BIO]" SRC="../gx/2002/note.png">
<em>
I'm trying to finish my Bachelor Degree at BUAP in Puebla, Mexico. <!-- I started to
give conferences in my country at the end of 2002.--> I'm involved with TLPD-ES
project, and they make I learn all about this technologies, now I'm learning
about Semantic Web.
</em>
<br CLEAR="all">
<!-- *** END bio *** -->
<!-- *** END author bio *** -->
<!-- *** BEGIN copyright *** -->
<hr>
<CENTER><SMALL><STRONG>
Copyright &copy; 2003, Daniel Guerrero.
Copying license <A HREF="../copying.html">http://www.linuxgazette.com/copying.html</A><BR>
Published in Issue 89 of <i>Linux Gazette</i>, April 2003
</STRONG></SMALL></CENTER>
<!-- *** END copyright *** -->
<HR>
<!--startcut ==========================================================-->
<CENTER>
<!-- *** BEGIN navbar *** -->
<!-- *** END navbar *** -->
</CENTER>
</BODY></HTML>
<!--endcut ============================================================-->