2010-10-14 13:01:57 +00:00
|
|
|
<?xml version="1.0"?>
|
|
|
|
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
|
|
|
|
"http://docbook.org/xml/4.1.2/docbookx.dtd" [
|
|
|
|
<!ENTITY howto "http://tldp.org/HOWTO/">
|
|
|
|
<!ENTITY mini-howto "http://tldp.org/HOWTO/mini/">
|
|
|
|
<!ENTITY home "http://www.catb.org/~esr/">
|
|
|
|
]>
|
|
|
|
|
2016-03-29 17:06:54 +00:00
|
|
|
<article id="DocBook-Demystification-HOWTO">
|
2010-10-14 13:01:57 +00:00
|
|
|
<articleinfo>
|
|
|
|
<title>DocBook Demystification HOWTO</title>
|
|
|
|
|
|
|
|
<author>
|
|
|
|
<firstname>Eric</firstname>
|
|
|
|
<surname>Raymond</surname>
|
|
|
|
<affiliation>
|
|
|
|
<address>
|
|
|
|
<email>esr@thyrsus.com</email>
|
|
|
|
</address>
|
|
|
|
</affiliation>
|
|
|
|
</author>
|
|
|
|
|
2016-03-29 17:58:09 +00:00
|
|
|
<revhistory id="revhistory">
|
2010-10-14 13:01:57 +00:00
|
|
|
<revision>
|
|
|
|
<revnumber>v1.6</revnumber>
|
|
|
|
<date>2010-09-14</date>
|
|
|
|
<authorinitials>esr</authorinitials>
|
|
|
|
<revremark>
|
|
|
|
Major update. dblatex actually works for PDF production.
|
|
|
|
Describe asciidoc.
|
|
|
|
</revremark>
|
|
|
|
</revision>
|
|
|
|
<revision>
|
|
|
|
<revnumber>v1.5</revnumber>
|
|
|
|
<date>2006-10-13</date>
|
|
|
|
<authorinitials>esr</authorinitials>
|
|
|
|
<revremark>
|
|
|
|
Major update. Getox seems to be dead, FOP a bit further along.
|
|
|
|
</revremark>
|
|
|
|
</revision>
|
|
|
|
<revision>
|
|
|
|
<revnumber>v1.4</revnumber>
|
|
|
|
<date>2004-10-28</date>
|
|
|
|
<authorinitials>esr</authorinitials>
|
|
|
|
<revremark>
|
|
|
|
Minor update and license change.
|
|
|
|
</revremark>
|
|
|
|
</revision>
|
|
|
|
<revision>
|
|
|
|
<revnumber>v1.3</revnumber>
|
|
|
|
<date>2004-02-27</date>
|
|
|
|
<authorinitials>esr</authorinitials>
|
|
|
|
<revremark>
|
|
|
|
Add pointers to two editors.
|
|
|
|
</revremark>
|
|
|
|
</revision>
|
|
|
|
<revision>
|
|
|
|
<revnumber>v1.2</revnumber>
|
|
|
|
<date>2003-02-17</date>
|
|
|
|
<authorinitials>esr</authorinitials>
|
|
|
|
<revremark>
|
|
|
|
Reorder to defer references to SGML until after it has been
|
|
|
|
introduced.
|
|
|
|
</revremark>
|
|
|
|
</revision>
|
|
|
|
<revision>
|
|
|
|
<revnumber>v1.1</revnumber>
|
|
|
|
<date>2002-10-01</date>
|
|
|
|
<authorinitials>esr</authorinitials>
|
|
|
|
<revremark>
|
|
|
|
Correct inadvertent misrepresentation of FSF's position.
|
|
|
|
Added pointer to the DocBook FAQ.
|
|
|
|
</revremark>
|
|
|
|
</revision>
|
|
|
|
<revision>
|
|
|
|
<revnumber>v1.0</revnumber>
|
|
|
|
<date>2002-09-20</date>
|
|
|
|
<authorinitials>esr</authorinitials>
|
|
|
|
<revremark>
|
|
|
|
Initial version.
|
|
|
|
</revremark>
|
|
|
|
</revision>
|
|
|
|
</revhistory>
|
|
|
|
|
|
|
|
<legalnotice>
|
|
|
|
<title>Copyright</title>
|
|
|
|
<para>Permission is granted to copy, distribute and/or modify
|
|
|
|
this document under the terms of the
|
|
|
|
<ulink url='http://creativecommons.org/licenses/by/2.0/'>Creative Commons Attribution License,</ulink>
|
|
|
|
version 2.0.</para>
|
|
|
|
</legalnotice>
|
|
|
|
|
|
|
|
<abstract><para>
|
|
|
|
This HOWTO attempts to clear the fog and mystery surrounding the
|
|
|
|
DocBook markup system and the tools that go with it. It is aimed at
|
|
|
|
authors of technical documentation for open-source projects hosted
|
|
|
|
on Linux, but should be useful for people composing other kinds on
|
|
|
|
other Unixes as well.
|
|
|
|
</para></abstract>
|
|
|
|
|
|
|
|
</articleinfo>
|
|
|
|
|
|
|
|
<sect1 id="intro"><title>Introduction</title>
|
|
|
|
|
|
|
|
<para>A great many major open-source projects are converging on
|
|
|
|
DocBook as a standard format for their documentation — projects
|
|
|
|
including the Linux kernel, GNOME, KDE, Samba, and the Linux
|
|
|
|
Documentation Project. The advocates of XML-based "structural markup"
|
|
|
|
(as opposed to the older style of "presentation markup" exemplified by
|
|
|
|
troff, Tex, and Texinfo) seem to have won the theoretical
|
|
|
|
battle. You can generate presentation markup from structural markup,
|
|
|
|
but going in the other direction is very difficult.</para>
|
|
|
|
|
|
|
|
<para>Nevertheless, a lot of confusion surrounds DocBook and the
|
|
|
|
programs that support it. Its devotees speak an argot that is dense
|
|
|
|
and forbidding even by computer-science standards, slinging around
|
|
|
|
acronyms that have no obvious relationship to the things you need to
|
|
|
|
do to write markup and make HTML or Postscript from it. XML standards
|
|
|
|
and technical papers are notoriously obscure.</para>
|
|
|
|
|
|
|
|
<para>This HOWTO will attempt to clear up the major mysteries
|
|
|
|
surrounding DocBook and its application to open-source documentation
|
|
|
|
— both the technical and political ones. Our objective is to equip
|
|
|
|
you to understand not just what you need to do to make documents, but
|
|
|
|
why the process is as complex as it is — and how it can be
|
|
|
|
expected to change as newer DocBook-related tools become
|
|
|
|
available.</para>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
<sect1><title>Why care about DocBook at all?</title>
|
|
|
|
|
|
|
|
<para>There are two possibilities that make DocBook really
|
|
|
|
interesting. One is <emphasis>multi-mode rendering</emphasis> and the
|
|
|
|
other is <emphasis>searchable documentation
|
|
|
|
databases</emphasis>.</para>
|
|
|
|
|
|
|
|
<para>Multi-mode rendering is the easier, nearer-term possibility; it's
|
|
|
|
the ability to write a document in a single master format that can be
|
|
|
|
rendered in many different display modes (in particular, as both HTML
|
|
|
|
for on-line viewing and as Postscript for high-quality printed
|
|
|
|
output). This capability is pretty well implemented now.</para>
|
|
|
|
|
|
|
|
<para><emphasis>Searchable documentation databases</emphasis> is
|
|
|
|
shorthand for the possibility that DocBook might help get us to a
|
|
|
|
world in which all the documentation on your open-source operating
|
|
|
|
system is one rich, searchable, cross-indexed and hyperlinked
|
|
|
|
database (rather than being scattered across several different formats
|
|
|
|
in multiple locations as it is now).</para>
|
|
|
|
|
|
|
|
<para>Ideally, whenever you install a software package on your machine
|
|
|
|
it would register its DocBook documentation into your system's
|
|
|
|
catalog. HTML, properly indexed and cross-linked to the HTML in the
|
|
|
|
rest of your catalog, would be generated. The new package's
|
|
|
|
documentation would then be available through your browser. All your
|
|
|
|
documentation would be searchable through an interface resembling a
|
|
|
|
good Web search engine.</para>
|
|
|
|
|
|
|
|
<para>HTML itself is not quite rich enough a format to get us to that
|
|
|
|
world. To name just one lack, you can't explicitly declare index
|
|
|
|
entries in HTML. DocBook <emphasis>does</emphasis> have the semantic
|
|
|
|
richness to support structured documentation databases. Fundamentally
|
|
|
|
that's why so many projects are adopting it.</para>
|
|
|
|
|
|
|
|
<para>DocBook has the vices that go with its virtues. Some people
|
|
|
|
find it unpleasantly heavyweight, and too verbose to be really
|
|
|
|
comfortable as a composition format. That's OK; as long as the markup
|
|
|
|
tools they like (things like asciidoc or Perl POD or GNU Texinfo) can
|
|
|
|
generate DocBook out their back ends, we can all still get what we
|
|
|
|
want. It doesn't matter whether or not everybody writes in DocBook
|
|
|
|
— as long as it becomes the common document interchange format
|
|
|
|
that everyone uses, we'll still get unified searchable documentation
|
|
|
|
databases.</para>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
<sect1><title>Structural markup: a primer</title>
|
|
|
|
|
|
|
|
<para>Older formatting languages like Tex, Texinfo, and Troff
|
|
|
|
supported <firstterm>presentation
|
|
|
|
markup</firstterm><indexterm><primary>presentation
|
|
|
|
markup</primary></indexterm>. In these systems, the instructions you
|
|
|
|
gave were about the appearance and physical layout of the text (font
|
|
|
|
changes, indentation changes, that sort of thing).</para>
|
|
|
|
|
|
|
|
<para>Presentation markup was adequate as long as your objective was
|
|
|
|
to print to a single medium or type of display device. You run into
|
|
|
|
its limits, however, when you want to mark up a document so that (a)
|
|
|
|
it can be formatted for very different display media (such as printing
|
|
|
|
vs. Web display), or (b) you want to support searching and indexing the
|
|
|
|
document by its logical structure (as you are likely to want to do,
|
|
|
|
for example, if you are incorporating it into a hypertext system).</para>
|
|
|
|
|
|
|
|
<para>To support these capabilities properly, you need a system of
|
|
|
|
<firstterm>structural markup</firstterm><indexterm><primary>structural
|
|
|
|
markup</primary></indexterm>. In structural markup, you describe not
|
|
|
|
the physical appearance of the document but the logical properties of
|
|
|
|
its parts.</para>
|
|
|
|
|
|
|
|
<para>As an example: In a presentation-markup language, if you want to
|
|
|
|
emphasize a word, you might instruct the formatter to set it in
|
|
|
|
boldface. In
|
|
|
|
<citerefentry><refentrytitle>troff</refentrytitle><manvolnum>1</manvolnum></citerefentry>
|
|
|
|
this would look like so:</para>
|
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
All your base
|
|
|
|
.B are
|
|
|
|
belong to us!
|
|
|
|
</programlisting>
|
|
|
|
|
|
|
|
<para>In a structural-markup language, you would tell the formatter to
|
|
|
|
emphasize the word:</para>
|
|
|
|
|
|
|
|
<programlisting>
|
|
|
|
All your base <emphasis>are</emphasis> belong to us!
|
|
|
|
</programlisting>
|
|
|
|
|
|
|
|
<para> The "<emphasis>" and </emphasis>in the line above
|
|
|
|
are called <firstterm>markup
|
|
|
|
tags</firstterm><indexterm><primary>markup tags</primary></indexterm>,
|
|
|
|
or just <firstterm>tags</firstterm> for short. They are the
|
|
|
|
instructions to your formatter.</para>
|
|
|
|
|
|
|
|
<para>In a structural-markup language, the physical appearance of the
|
|
|
|
final document would be controlled by a <firstterm>stylesheet</firstterm>
|
|
|
|
<indexterm><primary>stylesheet</primary></indexterm>. It is the
|
|
|
|
stylesheet that would tell the formatter "render emphasis as a font
|
|
|
|
change to boldface". One advantage of structural-markup languages
|
|
|
|
is that by changing a stylesheet you can globally change the
|
|
|
|
presentation of the document (to use different fonts, for example)
|
2016-10-24 09:00:39 +00:00
|
|
|
without having to hack all the individual instances of (say)
|
2010-10-14 13:01:57 +00:00
|
|
|
<markup>.B</markup> in the document itself.</para>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
<sect1><title>Document Type Definitions</title>
|
|
|
|
|
|
|
|
<para>(Note: to keep the explanation simple, most of this
|
|
|
|
section is going to tell some lies, mainly by omitting a lot of
|
|
|
|
history. Truthfulness will be fully restored in a
|
|
|
|
<link linkend="sgml">following section</link>.)</para>
|
|
|
|
|
|
|
|
<para>DocBook is a structural-level markup language. Specifically, it
|
|
|
|
is a dialect of XML. A DocBook document is a hunk of XML that uses
|
|
|
|
XML tags for structural markup.</para>
|
|
|
|
|
|
|
|
<para>In order for a document formatter to apply a stylesheet to your
|
|
|
|
document and make it look good, it needs to know things about the
|
|
|
|
overall structure of your document. For example, it needs to know
|
|
|
|
that a book manuscript normally consists of front matter, a sequence
|
|
|
|
of chapters, and back matter in order to physically format chapter
|
|
|
|
headers properly. In order for it to know this sort of thing, you
|
|
|
|
need to give it a <firstterm>Document Type
|
|
|
|
Definition</firstterm><indexterm><primary>Document Type
|
|
|
|
Definition</primary><secondary>DTD</secondary></indexterm> or DTD. The
|
|
|
|
DTD tells your formatter what sorts of elements can be in the document
|
|
|
|
structure, and in what orders they can appear.</para>
|
|
|
|
|
|
|
|
<para>What we mean by calling DocBook an `application' of XML is
|
|
|
|
actually that DocBook is a DTD — a rather large DTD, with
|
|
|
|
somewhere around 400 tags in it.</para>
|
|
|
|
|
|
|
|
<para>Lurking behind DocBook is a kind of program called a
|
|
|
|
<firstterm>validating parser</firstterm><indexterm><primary>validating
|
|
|
|
parser</primary></indexterm>.When you format a DocBook document, the
|
|
|
|
first step is to pass it through a validating parser (the front end of
|
|
|
|
the DocBook formatter). This program checks your document against the
|
|
|
|
DocBook DTD to make sure you aren't breaking any of the DTD's
|
|
|
|
structural rules (otherwise the back end of the formatter, the part
|
|
|
|
that applies your style sheet, might become quite confused).</para>
|
|
|
|
|
|
|
|
<para>The validating parser will either bomb out, giving you error
|
|
|
|
messages about places where the document structure is broken, or translate
|
|
|
|
the document into a stream of <firstterm>formatting events</firstterm>
|
|
|
|
which the parser back end combines with the information in your stylesheet
|
|
|
|
to produce formatted output</para>
|
|
|
|
|
|
|
|
<para>Here is a diagram of the whole process:</para>
|
|
|
|
|
|
|
|
<mediaobject>
|
|
|
|
<imageobject> <imagedata fileref="figure1.png" format="PNG"/></imageobject>
|
|
|
|
</mediaobject>
|
|
|
|
|
|
|
|
<para>The part of the diagram inside the dotted box is your formatting
|
|
|
|
software, or <firstterm>toolchain</firstterm>. Besides the obvious and
|
|
|
|
visible input to the formatter (the document source) you'll need to
|
|
|
|
keep the two `hidden' inputs of the formatter (DTD and stylesheet) in
|
|
|
|
mind to understand what follows.</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1><title>Other DTDs</title>
|
|
|
|
|
|
|
|
<para>A brief digression into other DTDs may help make clear what parts of
|
|
|
|
the previous section were specific to DocBook and what parts are general to
|
|
|
|
all structural-markup languages.</para>
|
|
|
|
|
|
|
|
<para><ulink url="http://www.tei-c.org/">TEI</ulink> (Text Encoding
|
|
|
|
Initiative) is a large, elaborate DTD used primarily in academia for
|
|
|
|
computer transcription of literary texts. TEI's Unix-based toolchains
|
|
|
|
use many of the same tools that are involved with DocBook, but with
|
|
|
|
different stylesheets and (of course) a different DTD.</para>
|
|
|
|
|
|
|
|
<para>XHTML, the latest version of HTML, is also an XML application
|
|
|
|
described by a DTD, which explains the family resemblance between
|
|
|
|
XHTML and DocBook tags. The XHTML toolchain consists of web browsers
|
|
|
|
and a number of ad-hoc HTML-to-print utilities.</para>
|
|
|
|
|
|
|
|
<para>Many other XML DTDs are maintained to help people exchange
|
|
|
|
structured information in fields as diverse as bioinformatics and
|
|
|
|
banking. You can look at a <ulink
|
|
|
|
url="http://www.xml.com/pub/rg/DTD_Repositories"> list of
|
|
|
|
repositories</ulink> to get some idea of the variety out
|
|
|
|
there.</para>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
<sect1><title>The DocBook toolchain</title>
|
|
|
|
|
|
|
|
<para>The easiest way to format and render XML-DocBook documents is to
|
|
|
|
use the <application>xmlto</application> toolchain. This ships with
|
|
|
|
Red Hat; Debian users can get it with the command <command>apt-get
|
|
|
|
install xmlto</command>.</para>
|
|
|
|
|
|
|
|
<para>Normally, what you'll do to make XHTML from your
|
|
|
|
DocBook sources will look like this:</para>
|
|
|
|
|
|
|
|
<screen>
|
|
|
|
bash$ xmlto xhtml foo.xml
|
|
|
|
bash$ ls *.html
|
|
|
|
ar01s02.html ar01s03.html ar01s04.html index.html
|
|
|
|
</screen>
|
|
|
|
|
|
|
|
<para>In this example, you converted an XML-Docbook document named
|
|
|
|
<filename>foo.xml</filename> with three top-level sections into an
|
|
|
|
index page and two parts. Making one big page is just as easy:</para>
|
|
|
|
|
|
|
|
<screen>
|
|
|
|
bash$ xmlto xhtml-nochunks foo.xml
|
|
|
|
bash$ ls *.html
|
|
|
|
foo.html
|
|
|
|
</screen>
|
|
|
|
|
|
|
|
<para>Finally, here is how you make PDF for printing:</para>
|
|
|
|
|
|
|
|
<screen>
|
|
|
|
bash$ dblatex foo.xml # To make PDF
|
|
|
|
bash$ ls *.pdf
|
|
|
|
foo.pdf
|
|
|
|
</screen>
|
|
|
|
|
|
|
|
<para>Some older versions of <command>xmlto</command> may be
|
|
|
|
more verbose, emitting noise like "Converting to XHTML" and so forth.</para>
|
|
|
|
|
|
|
|
<para>To turn your documents into HTML or PDF, you need an
|
|
|
|
engine that can apply the combination of DocBook DTD and
|
|
|
|
a suitable stylesheet to your document. Here is how the
|
|
|
|
open-source tools for doing this fit together:</para>
|
|
|
|
|
|
|
|
<mediaobject>
|
|
|
|
<imageobject> <imagedata fileref="figure2.png" format="PNG"/></imageobject>
|
|
|
|
<caption>
|
|
|
|
<para>Present-day XML-DocBook toolchain</para>
|
|
|
|
</caption>
|
|
|
|
</mediaobject>
|
|
|
|
|
|
|
|
<para>Parsing your document and applying the stylesheet transformation
|
|
|
|
will be handled by one of three programs. The most likely one is
|
|
|
|
<application>xsltproc</application><indexterm><primary>xsltproc</primary></indexterm>. The other
|
|
|
|
possibilities are two Java programs,
|
|
|
|
<application>Saxon</application><indexterm><primary>Saxon</primary></indexterm>
|
|
|
|
and
|
|
|
|
<application>Xalan</application><indexterm><primary>Xalan</primary></indexterm>,</para>
|
|
|
|
|
|
|
|
<para>It is relatively easy to generate high-quality XHTML from
|
|
|
|
DocBook; the fact that XHTML is simply another XML DTD helps a lot.
|
|
|
|
Translation to HTML is done by applying a rather simple stylesheet,
|
|
|
|
and that's the end of the story. RTF is also simple to generate in
|
|
|
|
this way, and from XHTML or RTF it's easy to generate a flat ASCII
|
|
|
|
text approximation in a pinch.</para>
|
|
|
|
|
|
|
|
<para>The awkward case is print. Generating high-quality printed
|
|
|
|
output (which means, in practice, Adobe's
|
|
|
|
PDF<indexterm><primary>PDF</primary></indexterm> or Portable Document
|
|
|
|
Format, a packaged form of PostScript) is difficult. Doing it right
|
|
|
|
requires algorithmically duplicating the delicate judgments of a human
|
|
|
|
typesetter moving from content to presentation level.</para>
|
|
|
|
|
|
|
|
<para>So, first, a stylesheet translates Docbook's structural markup
|
|
|
|
into another dialect of XML —
|
|
|
|
FO<indexterm><primary>FO</primary></indexterm>
|
|
|
|
(Formatting Objects). FO markup is very much presentation-level; you
|
|
|
|
can think of it as a sort of XML functional equivalent of troff. It
|
|
|
|
has to be translated to Postscript for packaging in a PDF.</para>
|
|
|
|
|
|
|
|
<para>In the toolchain shipped with most present-day Linux
|
|
|
|
distributions, this job is best handled by a program called
|
|
|
|
<application>dblatex</application><indexterm><primary>dblatex</primary></indexterm>
|
|
|
|
(this obsoletes the older passivetex package that previous versions of
|
|
|
|
tis HOWTO described).</para>
|
|
|
|
|
|
|
|
<para><command>dblatex</command> translates the formatting objects
|
|
|
|
generated by <command>xsltproc</command> into Donald Knuth's TeX
|
|
|
|
language. TeX was one of the earliest open-source projects, an old
|
|
|
|
but powerful presentation-level formatting language much beloved of
|
|
|
|
mathematicians (to whom it provides particulaly elaborate facilities
|
|
|
|
for describing mathematical notation). TeX is also famously good at
|
|
|
|
basic typesetting tasks like kerning, line filling, and hyphenating.
|
|
|
|
TeX's output is then massaged into PDF.</para>
|
|
|
|
|
|
|
|
<para>If you think this bucket chain of XML to Tex macros to
|
|
|
|
PDF sounds like an awkward kludge, you're right. It clanks, it
|
|
|
|
wheezes, and it has ugly warts. Fonts are a significant problem,
|
|
|
|
since XML and TeX and PDF have very different models of how fonts
|
|
|
|
work; also, handling internationalization and localization is a
|
|
|
|
nightmare. About the only thing this code path has going for it is
|
|
|
|
that it works.</para>
|
|
|
|
|
|
|
|
<para>The elegant way will be <ulink
|
|
|
|
url="http://xmlgraphics.apache.org/fop/">
|
|
|
|
FOP</ulink><indexterm><primary>FOP</primary></indexterm>, a direct
|
|
|
|
FO-to-Postscript translator being developed by the Apache project.
|
|
|
|
With FOP, the internationalization problem is, if not solved, at least
|
|
|
|
well confined; XML tools handle Unicode all the way through to FOP.
|
|
|
|
Glyph to font mapping is also strictly FOP's problem. The only
|
|
|
|
trouble with this approach is that it entirely doesn't work yet. As
|
|
|
|
of October 2010 FOP is at 1.0 and usable, but with rough edges and
|
|
|
|
missing features. I recommed dblatex for production use.</para>
|
|
|
|
|
|
|
|
<para>Here is what the FOP toolchain looks like:</para>
|
|
|
|
|
|
|
|
<mediaobject>
|
|
|
|
<imageobject> <imagedata fileref="figure3.png" format="PNG"/></imageobject>
|
|
|
|
<caption>
|
|
|
|
<para>Future XML-DocBook toolchain with FOP.</para>
|
|
|
|
</caption>
|
|
|
|
</mediaobject>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
<sect1><title>asciidoc</title>
|
|
|
|
|
|
|
|
<para>There is a relatively new tool called <ulink
|
|
|
|
url="http://www.methods.co.nz/asciidoc/">asciidoc</ulink> that tackles
|
|
|
|
several of the problems associated with DocBook rather effectively.</para>
|
|
|
|
|
|
|
|
<para>The <command>asciidoc</command> tool accepts a simple,
|
|
|
|
lightweight syntax resembling wiki markups and turns it into various
|
|
|
|
output formats using DocBook as an intermediate stage. The asciidoc
|
|
|
|
markup is easier to compose in than DocBook itself, and serves
|
|
|
|
as its own best rendering in flat ASCII.</para>
|
|
|
|
|
|
|
|
<para>Printing support in <command>asciidoc</command> is through an
|
|
|
|
experimental LaTeX back end. It is most useful for writing short
|
|
|
|
to medium-length documents for World Wide Web distribution.</para>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
<sect1><title>Who are the projects and the players?</title>
|
|
|
|
|
|
|
|
<para>The DocBook DTD itself is maintained by the DocBook Technical
|
|
|
|
Committee, headed by Norman Walsh. Norm is the principal author of
|
|
|
|
the DocBook stylesheets, a man who has focused remarkable energy and
|
|
|
|
talent over many years on the extremely complex problems DocBook
|
|
|
|
addresses. He is as universally respected in the DocBook
|
|
|
|
community as Linus Torvalds is in the Linux world.</para>
|
|
|
|
|
|
|
|
<para><ulink url="http://xmlsoft.org/XSLT/">libxslt</ulink> is a C
|
|
|
|
library that interprets XSLT, applying stylesheets to XML documents.
|
|
|
|
It includes a wrapper program, <command>xsltproc</command>, that can be
|
|
|
|
used as an XML formatter. The code was written by Daniel Veillard
|
|
|
|
under the auspices of the GNOME project, but does not require any
|
|
|
|
GNOME code to run. I hear it's blazingly fast compared to the
|
|
|
|
Java alternatives, not a surprising claim.</para>
|
|
|
|
|
|
|
|
<para><ulink url="http://cyberelk.net/tim/xmlto/">xmlto</ulink> is the
|
|
|
|
user interface of the XML toolchain that most Linuxes. It's written
|
|
|
|
and maintained by Tim Waugh.</para>
|
|
|
|
|
|
|
|
<para><ulink url="http://users.iclway.co.uk/mhkay/saxon/">Saxon</ulink>
|
|
|
|
and <ulink url="http://xml.apache.org/xalan-j/">Xalan</ulink> are Java
|
|
|
|
programs that interpret XSLT. Saxon seems to be designed to work
|
|
|
|
under Windows. Xalan is part of the XML Apache project and native to
|
|
|
|
Linux and BSD; it's designed to work with FOP.</para>
|
|
|
|
|
|
|
|
<para><ulink url="http://xml.apache.org/fop/">FOP</ulink> translates
|
|
|
|
XML Formatting Objects to PDF. It is part of the Apache XML project
|
|
|
|
and is designed to work with Xalan.</para>
|
|
|
|
|
|
|
|
<para><ulink url="http://www.methods.co.nz/asciidoc/">asciidoc</ulink>
|
|
|
|
translates its own lightweight markup to DocBook, and thence to various
|
|
|
|
output formats.</para>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
<sect1><title>Migration tools</title>
|
|
|
|
|
|
|
|
<para>The second biggest problem with DocBook is the effort needed to
|
|
|
|
convert old-style presentation markup to DocBook markup. Human beings
|
|
|
|
can usually parse the presentation of a document into logical
|
|
|
|
structure automatically, because (for example) they can tell from
|
|
|
|
context when an italic font means `emphasis' and when it means
|
|
|
|
something else such as `this is a foreign phrase'.</para>
|
|
|
|
|
|
|
|
<para>Somehow, in converting documents to DocBook, those
|
|
|
|
sorts of distinctions need to be made explicit. Sometimes
|
|
|
|
they're present in the old markup; often they are not, and the
|
|
|
|
missing structural information has to be either deduced by
|
|
|
|
clever heuristics or added by a human.</para>
|
|
|
|
|
|
|
|
<para>Here is a summary of the state of conversion tools from
|
|
|
|
various other formats:</para>
|
|
|
|
|
|
|
|
<variablelist>
|
|
|
|
<varlistentry>
|
|
|
|
<term>GNU Texinfo</term>
|
|
|
|
<listitem>
|
|
|
|
<para>The Free Software Foundation has made a policy decision to
|
|
|
|
support DocBook as an interchange format. Texinfo has enough
|
|
|
|
structure to make reasonably good automatic conversion possible, and
|
|
|
|
the 4.x versions of <command>makeinfo</command> feature a
|
|
|
|
<option>--docbook</option> switch that generates DocBook.
|
|
|
|
More at the <ulink
|
|
|
|
url="http://www.gnu.org/directory/texinfo.html">makeinfo project
|
|
|
|
page</ulink>.</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>POD</term>
|
|
|
|
<listitem>
|
|
|
|
<para>There is a <ulink
|
|
|
|
url="http://www.cpan.org/modules/by-module/Pod/">POD::DocBook</ulink>
|
|
|
|
module that translates Plain Old Documentation markup to DocBook. It
|
|
|
|
claims to translate every POD tag except the L<> italic tag.
|
|
|
|
The man page also says "Nested =over/=back lists are not supported
|
|
|
|
within DocBook." but notes that the module has been heavily
|
|
|
|
tested.</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>LaTeX</term>
|
|
|
|
<listitem>
|
|
|
|
<para>LaTeX is a (mostly) structural markup macro language built on
|
|
|
|
top of the TeX formatter. There is a project called <ulink
|
|
|
|
url="http://www.lrz-muenchen.de/services/software/sonstiges/tex4ht/mn.html">
|
|
|
|
TeX4ht</ulink> that (according to the author of PassiveTeX) can
|
|
|
|
generate DocBook from LaTeX.</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term>man pages and other troff-based markups</term>
|
|
|
|
<listitem>
|
|
|
|
<para>This is generally considered the biggest and nastiest conversion
|
|
|
|
problem. And indeed, the basic
|
|
|
|
<citerefentry><refentrytitle>troff</refentrytitle>
|
|
|
|
<manvolnum>1</manvolnum></citerefentry> markup is at too low a presentation
|
|
|
|
level for automatic conversion tools to do much of any good. However,
|
|
|
|
the gloom in the picture lightens significantly if we consider
|
|
|
|
translation from sources of documents written in macro packages like
|
|
|
|
<citerefentry><refentrytitle>man</refentrytitle>
|
|
|
|
<manvolnum>7</manvolnum></citerefentry>. These have enough structural
|
|
|
|
features for automatic translation to get some traction.</para>
|
|
|
|
|
|
|
|
<para>I wrote a tool to do this myself, because I couldn't find
|
|
|
|
anything else that did a half-decent job of it (and the problem is
|
|
|
|
interesting). It's called <ulink
|
|
|
|
url="&home;/doclifter/">doclifter</ulink>. It will
|
|
|
|
translate to either SGML or XML DocBook from
|
|
|
|
<citerefentry><refentrytitle>man</refentrytitle>
|
|
|
|
<manvolnum>7</manvolnum></citerefentry>,
|
|
|
|
<citerefentry><refentrytitle>mdoc</refentrytitle>
|
|
|
|
<manvolnum>7</manvolnum></citerefentry>,
|
|
|
|
<citerefentry><refentrytitle>ms</refentrytitle>
|
|
|
|
<manvolnum>7</manvolnum></citerefentry>, or
|
|
|
|
<citerefentry><refentrytitle>me</refentrytitle>
|
|
|
|
<manvolnum>7</manvolnum></citerefentry> macros. See the documentation
|
|
|
|
for details.</para>
|
|
|
|
</listitem>
|
|
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
<sect1><title>Editing tools</title>
|
|
|
|
|
|
|
|
<para>Most people still hack DocBook tags by hand using either vi or
|
|
|
|
emacs. There's an Nxml mode that ships with Emacs and is automatically
|
|
|
|
invoked when the editor recognizes an XMl document. It has become
|
|
|
|
pretty good; while it doesn't give GUI presentation, it does use its
|
|
|
|
knowledge of XML to highlight out-of-balance tags. Some alternative
|
|
|
|
are summarized at the <ulink
|
|
|
|
url="http://www.emacswiki.org/emacs/CategoryXML">Emacs CategoryXML
|
|
|
|
page</ulink>.</para>
|
|
|
|
|
|
|
|
<para>There have been a number of attempts at GUI editors for DocBook,
|
|
|
|
often with the aim of being general editors for any markup with an XML or
|
|
|
|
SGML schema. EuroMath, MLView, Conglomerate, ThotBook are among them.
|
|
|
|
Such projects tent to stall out in alpha stage; designing a decent
|
|
|
|
UI for this task is extemely difficult.</para>
|
|
|
|
|
|
|
|
<para>Some attempts that have made it to production stage (if only
|
|
|
|
barely, in many cases) can be found at the <ulink
|
|
|
|
url="http://wiki.docbook.org/topic/DocBookAuthoringTools">DocBook
|
|
|
|
Authoring Tools page</ulink>. I have not tried using any of these.</para>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
<sect1><title>Hints and tricks</title>
|
|
|
|
|
|
|
|
<para>It is possible to generate an index by including an empty
|
|
|
|
<index/> tag at the point in your document where you wish
|
|
|
|
it to appear. Be warned that, as of early 2004, this facility is
|
|
|
|
still somewhat primitive. It won't merge ranges, and the output
|
|
|
|
generated for PostScript is not yet production-quality.</para>
|
|
|
|
|
|
|
|
<para>This space is reserved for more hints and tricks.</para>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
<sect1><title>Related standards and practices</title>
|
|
|
|
|
|
|
|
<para>The tools are coming together, if slowly, to edit and format
|
|
|
|
DocBook markup. But DocBook itself is a means, not an end. We'll need
|
|
|
|
other standards besides DocBook itself to accomplish the
|
|
|
|
searchable-documentation-database objective I laid out at the
|
|
|
|
beginning of this document. There are two big issues: document
|
|
|
|
cataloguing and metadata.</para>
|
|
|
|
|
|
|
|
<para>The <ulink
|
|
|
|
url="http://scrollkeeper.sourceforge.net/">Scrollkeeper</ulink>
|
|
|
|
project aims directly to meet this need. It provides a simple set of
|
|
|
|
script hooks that can be used by package install and uninstall
|
|
|
|
productions to register and unregister their documentation into and
|
|
|
|
out of a shared, searchable system-wide database.</para>
|
|
|
|
|
|
|
|
<para>Scrollkeeper uses the <ulink
|
|
|
|
url="http://www.ibiblio.org/osrt/omf/">Open Metadata Format</ulink>.
|
|
|
|
This is a standard for indexing open-source documentation analogous to
|
|
|
|
a library card-catalog system. The idea is to support rich search
|
|
|
|
facilities that use the card-catalog metadata as well as the source
|
|
|
|
text of the documentation itself.</para>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="sgml"><title>SGML and SGML-Tools</title>
|
|
|
|
|
|
|
|
<para>In previous sections, I have thrown away a lot of DocBook's
|
|
|
|
history. XML has an older brother,
|
|
|
|
SGML<indexterm><primary>SGML</primary></indexterm> or Standard Generalized
|
|
|
|
Markup Language.</para>
|
|
|
|
|
|
|
|
<para>Until mid-2002, no discussion of DocBook would have been
|
|
|
|
complete without a long excursion into SGML, the differences between
|
|
|
|
SGML and XML, and detailed descriptions of the SGML DocBook toolchain.
|
|
|
|
Life can be simpler now; an XML DocBook toolchain is available in open
|
|
|
|
source, works as well as the SGML toolchain ever did, and is much
|
|
|
|
easier to use. If you don't think you'll ever have to deal with old
|
|
|
|
SGML-Docbook documents, you can skip the remainder of this
|
|
|
|
section.</para>
|
|
|
|
|
|
|
|
<sect2><title>DocBook SGML</title>
|
|
|
|
|
|
|
|
<para>DocBook was originally an SGML application, and there was an
|
|
|
|
SGML-based DocBook toolchain that is now moribund. There are minor
|
|
|
|
differences between the DocBook SGML DTD and the DocBook XML DTD, but
|
|
|
|
for an introductory discussion we can ignore them. The only one that's
|
|
|
|
normally user-visible is that in SGML contentless tags did not need to
|
|
|
|
have a trailing slash added to them before the closing >.
|
|
|
|
(Requiring the trailing / means XML parsers can be a lot simpler,
|
|
|
|
because they don't have to know about the DTD to know which opening
|
|
|
|
tags need closers.)</para>
|
|
|
|
|
|
|
|
<para>Versions of HTML up to 4.01 (before XHTML) were SGML
|
|
|
|
applications. TEI was originally an SGML application, too. The
|
|
|
|
groups managing all three DTDs jumped to XML for the same reason
|
|
|
|
DocBook's developers did — it's drastically simpler. SGML was
|
|
|
|
extremely complex; unmanageably so, as it turns out. The
|
|
|
|
specification was a dense 150 pages and it is not reliably reported
|
|
|
|
that any software ever fully implemented it.</para>
|
|
|
|
|
|
|
|
<para>The toolchain diagram I gave earlier was simplified; it
|
|
|
|
only showed the XML toolchain. Here is the historically
|
|
|
|
correct version:</para>
|
|
|
|
|
|
|
|
<mediaobject>
|
|
|
|
<imageobject><imagedata fileref="figure4.png" format="PNG"/></imageobject>
|
|
|
|
</mediaobject>
|
|
|
|
|
|
|
|
<para>The DSSSL toolchain is what processed DocBook SGML.
|
|
|
|
Under it, a document goes from DocBook format through one of two
|
|
|
|
closely-related stylesheet engines called Jade and OpenJade. These
|
|
|
|
turn it into a TeX-macro markup, which is processed by a package called
|
|
|
|
JadeTeX, into DVIs, which then get turned into Postscript.</para>
|
|
|
|
|
|
|
|
</sect2>
|
|
|
|
<sect2><title>SGML tools</title>
|
|
|
|
|
|
|
|
<para>The <ulink url="http://sources.redhat.com/docbook-tools/">
|
|
|
|
docbook-tools</ulink> project provides open-source tools for
|
|
|
|
converting SGML DocBook to HTML, Postscript, and other formats. This
|
|
|
|
package is shipped with Red Hat and other Linux distributions. It is
|
|
|
|
maintained by Mark Galassi.</para>
|
|
|
|
|
|
|
|
<para><ulink url="http://www.jclark.com/jade/">Jade</ulink> is an
|
|
|
|
engine used to apply DSSSL stylesheets to SGML documents. It is
|
|
|
|
maintained by James Clark.</para>
|
|
|
|
|
|
|
|
<para><ulink url="http://openjade.sourceforge.net/">OpenJade</ulink>
|
|
|
|
is a community project undertaken because the founders thought James
|
|
|
|
Clark's maintainance of Jade was spotty. The docbook-tools programs
|
|
|
|
use OpenJade.</para>
|
|
|
|
|
|
|
|
<para><ulink
|
|
|
|
url="http://users.ox.ac.uk/~rahtz/passivetex/">PassiveTeX</ulink> the
|
|
|
|
package of LaTeX macros that <application>xmlto</application> uses for
|
|
|
|
producing DVI from XML-DocBook. <ulink
|
|
|
|
url="http://jadetex.sourceforge.net/">JadeTex</ulink> is the package
|
|
|
|
of LaTeX macros that OpenJade uses for producing DVI from
|
|
|
|
SGML-DocBook.</para>
|
|
|
|
|
|
|
|
</sect2>
|
|
|
|
<sect2><title>Why SGML DocBook is dead</title>
|
|
|
|
|
|
|
|
<para>The DSSSL toolchain is, as far as new development goes,
|
|
|
|
effectively dead. The XSLT toolchain has reached production status in
|
|
|
|
mid-2002; a working version shipped in Red Hat 7.3. It's where
|
|
|
|
DocBook developers are putting almost all of their effort.</para>
|
|
|
|
|
|
|
|
<para>The reason for the change to XML was threefold. First,
|
|
|
|
SGML turned out to be too complicated to use; then, DSSSL turned out
|
|
|
|
to be too complicated to live with; then, significant parts of the
|
|
|
|
DSSSL toolchain turned out to be weak and irredeemably messy.</para>
|
|
|
|
|
|
|
|
<para>Relative to SGML, XML has a reduced feature set that is
|
|
|
|
sufficient for almost all purposes but much easier to understand and
|
|
|
|
build parsers for. SGML-processing tools (such as validating parsers) have
|
|
|
|
to carry around support for a lot of features that DocBook and other
|
|
|
|
text markup systems never actually used. Removing these features
|
|
|
|
made XML simpler and XML-processing tools faster.</para>
|
|
|
|
|
|
|
|
<para>The language used to describe SGML DTDs is sufficiently spiky
|
|
|
|
and forbidding that composing SGML DTDs was something of a black art.
|
|
|
|
XML DTDs, on the other hand, can be described in a dialect of XML
|
|
|
|
itself; there does not need to be a separate DTD language. An XML
|
|
|
|
description of an XML DTD is called a
|
|
|
|
<firstterm>schema</firstterm><indexterm><primary>schema</primary></indexterm>;
|
|
|
|
the term DTD itself will probably pass out of use as the standards for
|
|
|
|
schemas firm up.</para>
|
|
|
|
|
|
|
|
<para>But mostly the DSSSL toolchain is dead because DSSSL itself, the
|
|
|
|
SGML stylesheet description language in that toolchain, proved just too
|
|
|
|
arcane for most human beings, and made stylesheets too difficult to
|
|
|
|
write and modify. (It was a dialect of Scheme. Your humble editor, a
|
|
|
|
LISP-head from way back, shakes his head in sad bemusement that
|
|
|
|
this should drive people away.)</para>
|
|
|
|
|
|
|
|
<para>XML fans like to sum up all these changes with <quote>XML:
|
|
|
|
tastes great, less filling.</quote></para>
|
|
|
|
|
|
|
|
</sect2>
|
|
|
|
<sect2><title>SGML-Tools</title>
|
|
|
|
|
|
|
|
<para>SGML-Tools was the name of a DTD used by the <ulink
|
|
|
|
url="http://www.linuxdoc.org">Linux Documentation Project</ulink>,
|
|
|
|
developed a few years ago when today's DocBook toolchains didn't exist.
|
|
|
|
SGML-Tools markup was simpler, but also much less flexible than
|
|
|
|
DocBook. The original SGML-Tools formatter/DTD/stylesheet(s)
|
|
|
|
toolchain has been dead for some time now, but a successor called <ulink
|
|
|
|
url="http://sourceforge.net/projects/sgmltools-lite/">SGML-tools
|
|
|
|
Lite</ulink> is still maintained.</para>
|
|
|
|
|
|
|
|
<para>The LDP has been phasing out SGML-Tools in favor of DocBook, but
|
|
|
|
it is still possible you might take over an old HOWTO. These can be
|
|
|
|
recognized by the identifying header "<!doctype linuxdoc
|
|
|
|
system>". If this happens to you, convert the thing to XML DocBook
|
|
|
|
and give the old version a quick burial.</para>
|
|
|
|
</sect2>
|
|
|
|
</sect1>
|
|
|
|
|
|
|
|
<sect1><title>References</title>
|
|
|
|
|
|
|
|
<para>One of the things that makes learning DocBook difficult is that
|
|
|
|
the sites related to it tend to overwhelm the newbie with long lists
|
|
|
|
of W3C standards, massive exercises in markup theology, and dense
|
|
|
|
thickets of abstract terminology. We're going to try to avoid that
|
|
|
|
here by giving you just a few selected references to look at.</para>
|
|
|
|
|
|
|
|
<para>Michael Smith's <ulink
|
|
|
|
url="http://xml.oreilly.com/news/dontlearn_0701.html">
|
|
|
|
Take My Advice: Don't Learn XML</ulink> surveys the XML world from
|
|
|
|
an angle similar to this document.</para>
|
|
|
|
|
|
|
|
<para>Norman Walsh's <citetitle>DocBook: The Definitive
|
|
|
|
Guide</citetitle> is available <ulink
|
|
|
|
url="http://www.oreilly.com/catalog/docbook/">in print</ulink> and
|
|
|
|
<ulink url="http://www.docbook.org/tdg/en/html/docbook.html">on the
|
|
|
|
web</ulink>. This is indeed the definitive reference, but as an
|
|
|
|
introduction or tutorial it's a disaster. Instead, read this:</para>
|
|
|
|
|
|
|
|
<para><ulink url="http://opensource.bureau-cornavin.com/crash-course">Writing
|
|
|
|
Documentation Using DocBook: A Crash Course</ulink>. This is an excellent
|
|
|
|
tutorial.</para>
|
|
|
|
|
|
|
|
<para>There is an excellent <ulink
|
|
|
|
url="http://www.dpawson.co.uk/docbook/">DocBook FAQ</ulink> with a lot
|
|
|
|
of material on styling HTML output. There is also a DocBook <ulink
|
|
|
|
url="http://docbook.org/wiki/moin.cgi">wiki</ulink>.</para>
|
|
|
|
|
|
|
|
<para>If you're writing for the Linux Documentation Project, read the
|
|
|
|
<ulink url="http://www.linuxdoc.org/LDP/LDP-Author-Guide/index.html">
|
|
|
|
LDP Author Guide</ulink>.</para>
|
|
|
|
|
|
|
|
<para>The best general introduction to SGML and XML that I've
|
|
|
|
personally read all the way through is David Megginson's <ulink
|
|
|
|
url="http://vig.pearsoned.com/store/product/0,,store-562_banner-0_isbn-0136422993,00.html">Structuring
|
|
|
|
XML Documents</ulink> (Prentice-Hall, ISBN: 0-13-642299-3).</para>
|
|
|
|
|
|
|
|
<para>For XML only, <ulink
|
|
|
|
url="http://www.oreilly.com/catalog/xmlnut2/">XML In A
|
|
|
|
Nutshell</ulink> by W. Scott Means and Elliotte <quote>Rusty</quote>
|
|
|
|
Harold is very good.</para>
|
|
|
|
|
|
|
|
<para><ulink url="http://www.ibiblio.org/xml/books/bible/">The XML
|
|
|
|
Bible</ulink> looks like a pretty comprehensive reference on XML and
|
|
|
|
related standards (including Formatting Objects).</para>
|
|
|
|
|
|
|
|
<para>Finally, the <ulink url="http://xml.coverpages.org/">The XML
|
|
|
|
Cover Pages</ulink> will take you into the jungle of XML standards
|
|
|
|
if you really want to go there.</para>
|
|
|
|
|
|
|
|
</sect1>
|
|
|
|
</article>
|
|
|
|
|
|
|
|
<!-- Keep this comment at the end of the file
|
|
|
|
Local variables:
|
|
|
|
mode: sgml
|
|
|
|
sgml-omittag:t
|
|
|
|
sgml-shorttag:t
|
|
|
|
sgml-namecase-general:t
|
|
|
|
sgml-general-insert-case:lower
|
|
|
|
sgml-minimize-attributes:nil
|
|
|
|
sgml-always-quote-attributes:t
|
|
|
|
sgml-indent-step:1
|
|
|
|
sgml-indent-data:nil
|
|
|
|
sgml-parent-document:nil
|
|
|
|
sgml-exposed-tags:nil
|
|
|
|
sgml-local-catalogs:nil
|
|
|
|
sgml-local-ecat-files:nil
|
|
|
|
End:
|
|
|
|
-->
|