LDP/LDP/guide/docbook/LDP-Author-Guide/tools-transform.xml

644 lines
23 KiB
XML

<!--
<!DOCTYPE book PUBLIC '-//OASIS//DTD DocBook XML V4.2//EN'>
-->
<section id="transformations">
<title>Transformations</title>
<warning><para>
This section is about how to transform documents
from DocBook to other formats. If you do not want
to transform documents on your own computer you may
<emphasis>skip this section</emphasis>.
</para>
<para>
If you would like to transform your documents for
proofreading purposes, please use the <ulink
url="http://www.xml-dev.com/blog/xml2html.php">XML to HTML on-line
converter</ulink>. You will need to upload your XML file(s)
to a web site. Then simply drop the URL into the form and
click the submit button. Your document will be magically
transformed into a beautiful (and legible)
HTML document. External files are supported. You may use
either absolute or relative URIs.
</para>
<para>
Another fairly straight-forward package is
<application>xmlto</application>. It is a front-end for
<application>xsltproc</application>. It is available as a
RedHat, Debian (etc) package or can be downloaded from <ulink
url="http://cyberelk.net/tim/xmlto/" />. You can use it to convert
documents with:
</para>
<programlisting>
<prompt>bash$ </prompt><userinput>xmlto html <filename>mydoc.xml</filename></userinput>
<prompt>bash$ </prompt><userinput>xmlto text <filename>mydoc.xml</filename></userinput>
</programlisting>
<para>
You do not ever need to transform documents
before submitting to the LDP. The LDP
volunteers have a system which transforms
your DocBook file into HTML, PDF and plain
text formats. There, you've been warned.
</para></warning>
<para>
Still here? Great! Transformations are
a pretty basic requirement to get what
you've written from a messy tag-soup into
something that can be read. This section
will help you get your system setup and
ready to transform your latest document
into other formats. This is very useful
is you want to <emphasis>see</emphasis>
your document before you release it to
the world.
</para>
<para>
There are currently two ways to transform
your document: Document Style Semantics and
Specification Language (DSSSL); and XML
Style sheets (XSLT). Although the LDP web site uses
DSSSL to convert documents you may use XSLT
if you want. You only need to choose <emphasis>one</emphasis> of these
methods. For more information about DSSSL read: <xref linkend="dsssl"
/>, for more information about XSLT read: <xref linkend="xsl" />.
</para>
<section id="dsssl">
<title>DSSSL</title>
<para>
There are three basic requirements to transform a document using DSSSL:
</para>
<itemizedlist>
<listitem><para>
The Document Style and Semantics Specification Language files
(these are plain text files). <xref linkend="dsssl-files" />
</para>
</listitem>
<listitem><para>
The Document Type Definition file
which matches the DOCTYPE of your
document (this is a plain text file).
<xref linkend="dtd" />
</para></listitem>
<listitem><para>
A processor to do the actual work. <xref linkend="dsssl-processors" />
</para>
</listitem>
</itemizedlist>
<section id="dsssl-files">
<title>The Style Sheets</title>
<para>There are two versions of the
Document Style Semantics and Specification Language
used by the LDP to transform documents from your
raw DocBook files into other formats (which are then
published on the Web). The LDP version of the style
sheets requires the Norman Walsh version--which
basically means if you're using DSSSL the Norman
Walsh version can be considered a requirement for
system setup.</para>
<formalpara>
<title>Normal Walsh DSSSL <ulink
url="http://nwalsh.com/docbook/dsssl/">http://nwalsh.com/docbook/dsssl/</ulink></title>
<para>The
<indexterm><primary>DSSSL</primary></indexterm>
Document Style Semantics and Specification Language tells
Jade (see <xref linkend="dsssl-processors"/>) how to render
a DocBook document into print or on-line
form. The DSSSL is what converts a
<sgmltag>title</sgmltag> tag into an
&lt;h1&gt; HTML tag, or to 14 point bold Times Roman for
RTF, for example. Documentation for DSSSL is located at
the same site. Note that modifying the DSSSL doesn't modify
DocBook itself. It merely changes the way the rendered text
looks. The LDP uses a modified DSSSL (see below).</para>
</formalpara>
<formalpara>
<title>LDP DSSSL <ulink
url="http://www.tldp.org/authors/tools/ldp.dsl">http://www.tldp.org/authors/tools/ldp.dsl</ulink></title>
<para>The LDP DSSSL requires the Norman Walsh version (see
above) but is a slightly modified DSSSL to provide things
like a table of contents.</para>
</formalpara>
<example id="ex-dsssl">
<title><quote>Installing</quote> DSSSL style sheets</title>
<para>Create a base directory to store everything such as <filename moreinfo="none"
class="directory">/usr/share/sgml/</filename>. Copy the DSSSL
style sheets into a sub-directory named
<filename class="directory">dsssl</filename>.
</para>
</example>
</section>
<section id="dsssl-processors">
<title>DSSSL Processors</title>
<para>
<indexterm><primary>jade</primary></indexterm>
There are two versions of the Jade processor: the
original version by James Clark; and an open-source
version of approximately the same program, <application>OpenJade</application>.
You only need <emphasis>one</emphasis> of these programs. It should
be installed <emphasis>after</emphasis> the DTD
and DSSSL have been <quote>installed.</quote>
</para>
<variablelist>
<title>DSSSL Transformation Tools</title>
<varlistentry>
<term>Jade</term>
<listitem>
<para>
<ulink url="ftp://ftp.jclark.com/pub/jade/jade-1.2.1.tar.gz">ftp://ftp.jclark.com/pub/jade/jade-1.2.1.tar.gz</ulink></para>
<para>Jade is the front-end processor for SGML and
XML. It uses the DSSSL and DocBook DTD to perform the
verification and rendering from SGML and XML into the target
format.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>OpenJade</term>
<listitem>
<para><ulink
url="http://openjade.sourceforge.net/">http://openjade.sourceforge.net/</ulink></para>
<para>An extension of Jade written by the DSSSL
community. Some applications require jade, but are being
updated to support either software package.</para>
</listitem>
</varlistentry>
</variablelist>
</section>
<section id="dsssl-setup">
<title>System Setup for DSSSL Transformations</title>
<orderedlist>
<listitem>
<para>
Tell your system where to find the SGML_CATALOG_FILES (yes, even if you
are using XML). You can find an example of how to do this in <xref
linkend="ex-catalog-files" />.
<!--
export
SGML_CATALOG_FILES=$_toolroot/dtd/docbook.cat:\
$_toolroot/dsssl/docbook/catalog:$_toolroot/jade-1.2.1/dsssl/catalog</command>
-->
</para>
</listitem>
<listitem>
<para>
Download the DSSSL and DTD files and copy them into your working
directory. You can find an example of how to do this in <xref
linkend="ex-dsssl" /> and <xref linkend="ex-dtd" />.
</para>
</listitem>
</orderedlist>
</section>
<section id="usingjade">
<title>Transformations with DSSSL</title>
<para>
Once your system is configured (see the previous section), you should
be able to start using <application>jade</application> to transform
your files from XML to XHTML.
</para>
<!--
<para>To do this you must add the following lines to the file
<filename>/etc/sgml/catalog</filename>, by bringing it up under
your favorite text editor:</para>
<informalexample>
<screen format="linespecific">
PUBLIC "-//Norman Walsh//DOCUMENT DocBook HTML Stylesheet//EN"
/usr/lib/sgml/stylesheets/nwalsh-modular/html/docbook.dsl<co id="html"/>
PUBLIC "-//Norman Walsh//DOCUMENT DocBook Print Stylesheet//EN"
/usr/lib/sgml/stylesheets/nwalsh-modular/print/docbook.dsl<co
id="print"/>
PUBLIC "-//Norman Walsh//DOCUMENT DSSSL Library//EN"
/usr/lib/sgml/stylesheets/nwalsh-modular/lib/dblib.dsl
PUBLIC "-//Norman Walsh//DOCUMENT DSSSL Library V2//EN"
/usr/lib/sgml/stylesheets/nwalsh-modular/lib/dblib.dsl
</screen>
</informalexample>
-->
<para>
To create individual HTML files, point <application>jade</application>
at the correct DSL (style sheet). The following example uses the LDP
style sheet.
</para>
<screen format="linespecific">
<prompt>bash$ </prompt><command>jade -t xml -i html \
-d /usr/local/sgml/dsssl/docbook/html/ldp.dsl#html \
<replaceable>HOWTO.xml</replaceable></command>
</screen>
<para>
If you would like to produce a single-file HTML page, add the
<parameter>-V nochunks</parameter> parameter. You can specify the
name of the final HTML file by appending the command with <parameter>
&gt; <replaceable>output.html</replaceable></parameter>.
</para>
<screen format="linespecific">
<prompt>bash$ </prompt><command>jade -t xml -i html -V nochunks \
-d /usr/local/sgml/dsssl/stylesheets/ldp.dsl#html \
<replaceable>HOWTO.sgml</replaceable> &gt; output.html</command>
</screen>
<note>
<para>
If you get an error about <quote>is not a function name</quote>, you
will need to add a pointer to <filename>xml.dcl</filename>. It has
to be listed immediately before the pointer to your XML
document. Try one of the following locations:
<filename>/usr/lib/sgml/declaration/xml.dcl</filename>, or
<filename>/usr/share/sgml/declaration/xml.dcl</filename>. Use
<application>locate</application> to find the file if it is not in
either of those two places. The modified command would be as
follows:
</para>
<screen>
<prompt>bash$ </prompt><command>jade -t xml -i html \
-d /usr/local/sgml/dsssl/docbook/html/ldp.dsl#html \
/usr/lib/sgml/declaration/xml.dcl <replaceable>HOWTO.xml</replaceable></command>
</screen>
</note>
<para>
If you would like to create print-friendly files instead of HTML files,
simply change the style sheet that you are using. In the file name
above, note <quote>html/ldp.dsl</quote> at the end. Change this to
<quote>print/docbook.dsl</quote>, or if you want XHTML output, instead
of HTML, change the file name to <quote>xhtml/docbook.dsl</quote>.
</para>
<section id="dsssl-css">
<title>Changing CSS Files</title>
<para>If you want your HTML files to use a specific
<acronym>CSS</acronym> stylesheet, you will need to edit
<filename>ldp.dsl</filename>. Immediately after
<literal>;; End of $verbatim-display$ redefinition</literal>
add the following lines:</para>
<screen format="linespecific">
(define %stylesheet-type%
;; The type of the stylesheet to use
"text/css")
(define %stylesheet%
;; Name of the css stylesheet to use, use value #f if you don't want to
;; use css stylesheets
"base.css")
</screen>
<para>
Replace <filename>base.css</filename> with the name of the
<acronym>CSS</acronym> file you would like to use.
</para>
</section>
</section>
</section>
<section id="xsl">
<title>XSL</title>
<para>
There are alternatives to DSSSL and Jade/OpenJade.</para>
<para>When working with DocBook XML, the LDP uses a series of
XSL<footnote><para>In truth, "XSL" is actually comprised of three
components: the <emphasis>XSLT</emphasis> transformation language,
the <emphasis>XPath</emphasis> expression language (used by XSLT),
and XSL Formatting Objects (FO) that are used for describing a page.
The style sheets are actually written in XSLT and generate either HTML
or (for print output) FO. The FO file is then run through a FO processor
to create the actual print (PDF or PostScript) output. See the
<ulink url="http://www.w3.org/Style/XSL/WhatIsXSL.html">W3C web
site</ulink> for more information.</para></footnote>
style sheets to process documents into HTML. These style
sheets create
output files using the XML tool set that are similar to those produced by
the SGML tools using <link linkend="dsssl">ldp.dsl</link>.
</para>
<para>The major difference between using <filename>ldp.dsl</filename>
and the XSL style sheets is the way that the generation of multiple
files is handled, i.e. the creation of a separate file for each chapter,
section and appendix. With the SGML tools, such as
<application>jade</application> or
<application>openjade</application>, the tool
itself was responsible for generating the separate files. Because of
this, only a single file, <filename>ldp.dsl</filename> was necessary as
a customization layer for the standard DocBook DSSSL style sheets.
</para>
<para>With the DocBook XSL style sheets, generation of multiple files is
controlled <emphasis>by the style sheet</emphasis>. If you want to
generate a single file, you call one style sheet. If you want to generate
multiple files, you call a different style sheet. For that reason the
LDP XSL style sheet distribution is comprised of four files:
</para>
<orderedlist>
<listitem>
<para><filename>tldp-html.xsl</filename> - style sheet called to generate
a <emphasis>single</emphasis> file.
</para>
</listitem>
<listitem>
<para><filename>tldp-html-chunk.xsl</filename><footnote><para>In XSL
terminology, the process of generating multiple files is referred to
as "chunking".</para></footnote> - style sheet called to generate
multiple files based on chapter, section and appendix elements.
</para>
</listitem>
<listitem>
<para><filename>tldp-html-common.xsl</filename> - style sheet containing
the actual XSLT transformations. It is called by the other two HTML
style sheets and is <emphasis>never</emphasis> directly called.
</para>
</listitem>
<listitem>
<para><filename>tldp-print.xsl</filename> - style sheet for generation of
XSL Formatting Objects for print output.
</para>
</listitem>
</orderedlist>
<para>
You can find the latest copy of the files at <ulink
url="http://my.core.com/~dhorton/docbook/tldp-xsl/">http://my.core.com/~dhorton/docbook/tldp-xsl/</ulink>.
The package includes installation instructions which are duplicated
at <ulink
url="http://my.core.com/~dhorton/docbook/tldp-xsl/doc/tldp-xsl-howto.html"
/>. The short version of the install instructions is as follows:
Download and unzip the latest package from the web site. Take the
files from the <filename class="directory">html</filename>
directory of TLDP-XSL and put them in the
<filename class="directory">html</filename> directory of Norman Walsh's
stylesheets. Take the file from the TLDP-XSL <filename
class="directory">fo</filename> directory and put it in the Norman Walsh
<filename class="directory">fo</filename> directory.
</para>
<para>
Once you have installed these files you can use
<application>xsltproc</application> to generate HTML files from your
XML documents. To transform your XML file(s) into a single-page HTML
document use the following command:
</para>
<screen format="linespecific">
<prompt>bash$</prompt> <command> xsltproc -o <replaceable>HOWTO.html</replaceable> /usr/local/sgml/stylesheets/tldp-html.xsl <replaceable>HOWTO.xml</replaceable></command>
</screen>
<para>To generate a set of linked HTML pages, with a separate page for each
<sgmltag>chapter</sgmltag>, <sgmltag>sect1</sgmltag> or
<sgmltag>appendix</sgmltag>, use the following
command:
</para>
<screen format="linespecific">
<prompt>bash$ </prompt><command>xsltproc /usr/share/sgml/stylesheets/tldp-html-chunk.xsl <replaceable>HOWTO.xml</replaceable></command>
</screen>
<para>Note that you never directly call the style sheet
<filename>tldp-html-common.xsl</filename>. It is called by both of the
other two style sheets.
</para>
<section id="xsl-css">
<title>Changing CSS Files</title>
<para>If you want your HTML files to use a specific
<acronym>CSS</acronym> stylesheet, you will need to edit
<filename>tldp-html-common.xsl</filename>. Look for a line that
ressembles <literal>&lt;xsl:param name="html.stylesheet"
select="'style.css'"/&gt;</literal>.</para>
<para>
Replace <filename>style.css</filename> with the name of the
<acronym>CSS</acronym> file you would like to use.
</para>
</section>
</section>
</section>
<section id="dtd">
<!-- not sure if this is in the right place.... -->
<title>DocBook DTD</title>
<para>
<indexterm><primary>DocBook DTD</primary></indexterm>
The DocBook DTD defines the structure of a DocBook
document. It contains rules about how the elements
may be used; and what the elements ought to be
describing. For example: it is the DocBook DTD
which states all <sgmltag>warning</sgmltag>s are to
<emphasis>warn</emphasis> the reader (this is the
definition of the element); and may not contain
plain text (this is the content model--and the
bit which forces you to wrap your warning text
in a <sgmltag>para</sgmltag> or perhaps a list).
</para>
<caution><para>
It is important that you download the
version(s) that match your document. You
may want to configure your system now to
deal with <quote>all</quote> DocBook DTDs
if you are going to be editing older LDP
documents. If you are a new author, you
only need the first one listed: XML DTD
for DocBook version 4.2.
</para></caution>
<para>The XML DTD is available from <ulink
url="http://www.oasis-open.org/docbook/xml/4.2">http://www.oasis-open.org/xml/4.2/</ulink>.
The LDP prefers this version of the DocBook DTD.
</para>
<para>
If you are going to be working with SGML
versions of DocBook you will need one
(or both) of:
<ulink
url="http://www.oasis-open.org/docbook/sgml/4.1/docbk41.zip">http://www.oasis-open.org/docbook/sgml/4.1/docbk41.zip</ulink>
or <ulink
url="http://www.oasis-open.org/docbook/sgml/3.1/docbk31.zip">http://www.oasis-open.org/docbook/sgml/3.1/docbk31.zip</ulink>
</para>
<example id="ex-dtd">
<title><quote>Installing</quote> DocBook Document Type Definitions</title>
<para>Create a base directory to store everything such as <filename moreinfo="none"
class="directory">/usr/local/sgml/</filename>. Copy the DTDs
into a sub-directory named <filename class="directory">dtd</filename>.
</para>
</example>
<warning><para>
Do not edit the DTD files.
</para></warning>
</section>
<section id="doc-components">
<title>Document Components</title>
<section id="tools-compile">
<title>Compiling the sources</title>
<indexterm>
<primary>tools</primary>
<secondary>compiling the sources</secondary>
</indexterm>
<para>The <command>compiles-sgml</command> script is a set
of grouped
commands. As parameters, use the
<glossterm><acronym>SGML</acronym></glossterm> file and
the output format you want.
</para>
<para>The script included below supports the following
formats: <glossterm><acronym>XML</acronym></glossterm>,
<glossterm><acronym>HTML</acronym></glossterm>,
TeX, <glossterm><acronym>RTF</acronym></glossterm>,
<glossterm>PS</glossterm>, <glossterm>DVI</glossterm>
and mirrored PS, ideal for the creation of photo
lithographs.</para>
<para>The generation of the index is made automatically by
the script <filename>collateindex.pl</filename><footnote>
<para>More information about indexes are available
at <ulink
url="http://nwalsh.com/docbook/dsssl/doc/indexing.html">the
page about index of Norman Walsh</ulink>.</para> </footnote>,
which should be installed in your system. </para>
<para>Besides the commands below, which generate the
outputs in different formats, there are other tools from
<trademark>Cygnus</trademark> for making the direct
conversion. The tools can be obtained from <ulink
url="http://sourceware.cygnus.com/docbook-tools/">Cygnus</ulink>.</para>
<para>You can <ulink
url="compiles-sgml">download the compile script</ulink> as well
as a version of <ulink
url="collateindex.pl"><filename>collateindex.pl</filename></ulink>.</para>
<indexterm>
<primary>tools</primary> <secondary>compiling
sources</secondary> <tertiary>compile-sgml</tertiary>
</indexterm>
<para>A similar script can be modified by creating
a <filename>Makefile</filename> and optimizing some
functions.</para>
</section>
<section id="toc-articles">
<title>Inserting a summary on the initial articles page</title>
<indexterm>
<primary>tools</primary> <secondary>articles</secondary>
<tertiary>summary</tertiary>
</indexterm>
<para>A feature that might be valuable in some cases is
the insertion of the summary on the initial page of an
article. DocBook articles do not include it as a standard
feature.</para>
<para>To enable this, it is necessary to modify the style
sheet file.</para>
<example>
<title>Style sheet to insert summaries in articles</title>
<programlisting>
&dsl-example;
</programlisting>
</example>
</section>
<section id="automatic-indexes">
<title>Inserting indexes automatically</title>
<indexterm>
<primary>tools</primary> <secondary>indexes</secondary>
<tertiary>automatic generation</tertiary> <seealso>edition,
index</seealso>
</indexterm>
<para>
Although DocBook has markups for the composition of
them, indexes are not generated automatically. The
<command>collateindex.pl</command> command allows indexes
to be generated automatically.
</para>
<orderedlist>
<listitem>
<para>Process the document with
<application>jade</application> using the style to
<glossterm><acronym>HTML</acronym></glossterm> with
the option <option>-V html-index</option>.</para>
<informalexample>
<screen>
<prompt>bash$</prompt> <command>jade</command> <option>-t sgml \
-d html/docbook.dsl -V html-index document.sgml</option>
</screen>
</informalexample>
</listitem>
<listitem>
<para>
Generate the <filename>index.sgml</filename> file with
<command>collateindex.pl</command>.
</para> <informalexample>
<screen>
<prompt>bash$ </prompt> <command>perl</command> <option>collateindex.pl \
-o index.sgml HTML.index</option>
</screen>
</informalexample>
</listitem>
</orderedlist>
<para>For the above example to work, it's necessary to define
an <glossterm>external entity</glossterm> by calling the
file <filename>index.sgml</filename>.</para>
<example id="ex-entity-external-index">
<title>Configuring an external entity to include the
index</title>
<programlisting>
<sgmltag class="starttag">!DOCTYPE article PUBLIC "-//OASIS//DTD
DocBook V3.1//EN" [
&lt;!-- Insertion of the index --&gt; &lt;!entity index SYSTEM
"index.sgml"&gt; ]</sgmltag>
</programlisting>
</example>
<para>See also <xref linkend="encoding-index"/> for information
on how to
insert necessary information on the text.</para>
<note>
<para>Remember that if you're trying to get Tables of
Contents or Indexes on PS or PDF output you'll need to
run <application moreinfo="none">jadetex</application>
or <application moreinfo="none">pdfjadetex</application>
at least three times. This is required by <application
moreinfo="none">TeX</application> but not by DocBook or
related applications.</para>
</note>
</section>
</section>