DocBook Demystification HOWTO

Eric Raymond

<esr@thyrsus.com>

Revision History
Revision v1.6 2010-09-14 Revised by: esr
Major update. dblatex actually works for PDF production. Describe
asciidoc.
Revision v1.5 2006-10-13 Revised by: esr
Major update. Getox seems to be dead, FOP a bit further along.
Revision v1.4 2004-10-28 Revised by: esr
Minor update and license change.
Revision v1.3 2004-02-27 Revised by: esr
Add pointers to two editors.
Revision v1.2 2003-02-17 Revised by: esr
Reorder to defer references to SGML until after it has been
introduced.
Revision v1.1 2002-10-01 Revised by: esr
Correct inadvertent misrepresentation of FSF's position. Added
pointer to the DocBook FAQ.
Revision v1.0 2002-09-20 Revised by: esr
Initial version.

This HOWTO attempts to clear the fog and mystery surrounding the
DocBook markup system and the tools that go with it. It is aimed at
authors of technical documentation for open-source projects hosted on
Linux, but should be useful for people composing other kinds on other
Unixes as well.

Copyright

Permission is granted to copy, distribute and/or modify this document
under the terms of the [http://creativecommons.org/licenses/by/2.0/]
Creative Commons Attribution License, version 2.0.
________________________________________________________________

Table of Contents
1. Introduction
2. Why care about DocBook at all?
3. Structural markup: a primer
4. Document Type Definitions
5. Other DTDs
6. The DocBook toolchain
7. asciidoc
8. Who are the projects and the players?
9. Migration tools
10. Editing tools
11. Hints and tricks
12. Related standards and practices
13. SGML and SGML-Tools

     13.1. DocBook SGML
     13.2. SGML tools
     13.3. Why SGML DocBook is dead
     13.4. SGML-Tools

14. References

1. Introduction

A great many major open-source projects are converging on DocBook as
a standard format for their documentation -- projects including the
Linux kernel, GNOME, KDE, Samba, and the Linux Documentation Project.
The advocates of XML-based "structural markup" (as opposed to the
older style of "presentation markup" exemplified by troff, TeX, and
Texinfo) seem to have won the theoretical battle. You can generate
presentation markup from structural markup, but going in the other
direction is very difficult.

Nevertheless, a lot of confusion surrounds DocBook and the programs
that support it. Its devotees speak an argot that is dense and
forbidding even by computer-science standards, slinging around
acronyms that have no obvious relationship to the things you need to
do to write markup and make HTML or PostScript from it. XML standards
and technical papers are notoriously obscure.

This HOWTO will attempt to clear up the major mysteries surrounding
DocBook and its application to open-source documentation -- both the
technical and political ones. Our objective is to equip you to
understand not just what you need to do to make documents, but why
the process is as complex as it is -- and how it can be expected to
change as newer DocBook-related tools become available.
________________________________________________________________

2. Why care about DocBook at all?

There are two possibilities that make DocBook really interesting. One
is multi-mode rendering and the other is searchable documentation
databases.

Multi-mode rendering is the easier, nearer-term possibility; it's the
ability to write a document in a single master format that can be
rendered in many different display modes (in particular, as both HTML
for on-line viewing and as PostScript for high-quality printed
output). This capability is pretty well implemented now.

Searchable documentation databases is shorthand for the possibility
that DocBook might help get us to a world in which all the
documentation on your open-source operating system is one rich,
searchable, cross-indexed and hyperlinked database (rather than being
scattered across several different formats in multiple locations as
it is now).

Ideally, whenever you install a software package on your machine it
would register its DocBook documentation into your system's catalog.
HTML, properly indexed and cross-linked to the HTML in the rest of
your catalog, would be generated. The new package's documentation
would then be available through your browser. All your documentation
would be searchable through an interface resembling a good Web search
engine.

HTML itself is not quite rich enough a format to get us to that
world. To name just one lack, you can't explicitly declare index
entries in HTML. DocBook does have the semantic richness to support
structured documentation databases. Fundamentally that's why so many
projects are adopting it.

DocBook has the vices that go with its virtues. Some people find it
unpleasantly heavyweight, and too verbose to be really comfortable as
a composition format. That's OK; as long as the markup tools they
like (things like asciidoc or Perl POD or GNU Texinfo) can generate
DocBook from their back ends, we can all still get what we want. It
doesn't matter whether or not everybody writes in DocBook -- as long
as it becomes the common document interchange format that everyone
uses, we'll still get unified searchable documentation databases.
________________________________________________________________

3. Structural markup: a primer

Older formatting languages like TeX, Texinfo, and troff supported
presentation markup. In these systems, the instructions you gave were
about the appearance and physical layout of the text (font changes,
indentation changes, that sort of thing).

Presentation markup was adequate as long as your objective was to
print to a single medium or type of display device. You run into its
limits, however, when you want to mark up a document so that (a) it
can be formatted for very different display media (such as printing
vs. Web display), or (b) you want to support searching and indexing
the document by its logical structure (as you are likely to want to
do, for example, if you are incorporating it into a hypertext
system).

To support these capabilities properly, you need a system of
structural markup. In structural markup, you describe not the
physical appearance of the document but the logical properties of its
parts.

As an example: In a presentation-markup language, if you want to
emphasize a word, you might instruct the formatter to set it in
boldface. In troff(1) this would look like so:

    All your base
    .B are
    belong to us!

In a structural-markup language, you would tell the formatter to
emphasize the word:

    All your base <emphasis>are</emphasis> belong to us!

The "<emphasis>" and "</emphasis>" in the line above are called
markup tags, or just tags for short. They are the instructions to
your formatter.

In a structural-markup language, the physical appearance of the final
document would be controlled by a stylesheet. It is the stylesheet
that would tell the formatter "render emphasis as a font change to
boldface". One advantage of structural-markup languages is that by
changing a stylesheet you can globally change the presentation of the
document (to use different fonts, for example) without having to hack
all the individual instances of (say) .B in the document itself.
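
The division of labor between markup and stylesheet can be sketched in
a few lines of Python. This is only an illustration, not how any real
DocBook formatter works: the two "stylesheets" are plain dictionaries
invented for this example, the <para> wrapper is added so the fragment
parses as XML, and only one level of nesting is handled. The point is
that the same source renders differently when the stylesheet changes.

```python
import xml.etree.ElementTree as ET

# Two hypothetical "stylesheets": each maps a structural tag to the
# (opening, closing) presentation strings for one output medium.
HTML_STYLE = {"emphasis": ("<b>", "</b>")}
TROFF_STYLE = {"emphasis": ("\\fB", "\\fP")}  # troff inline font escapes


def render(source, stylesheet):
    """Render a structurally marked-up fragment using the given stylesheet."""
    root = ET.fromstring(source)
    out = [root.text or ""]
    for child in root:
        # Tags the stylesheet does not mention are rendered without decoration.
        opener, closer = stylesheet.get(child.tag, ("", ""))
        out.append(opener + (child.text or "") + closer)
        out.append(child.tail or "")
    return "".join(out)


SOURCE = "<para>All your base <emphasis>are</emphasis> belong to us!</para>"

print(render(SOURCE, HTML_STYLE))
print(render(SOURCE, TROFF_STYLE))
```

Swapping HTML_STYLE for TROFF_STYLE changes every emphasized word at
once, without touching SOURCE.
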
________________________________________________________________

4. Document Type Definitions

(Note: to keep the explanation simple, most of this section is going
to tell some lies, mainly by omitting a lot of history. Truthfulness
will be fully restored in a following section.)

DocBook is a structural-level markup language. Specifically, it is a
dialect of XML. A DocBook document is a hunk of XML that uses XML
tags for structural markup.

In order for a document formatter to apply a stylesheet to your
document and make it look good, it needs to know things about the
overall structure of your document. For example, it needs to know
that a book manuscript normally consists of front matter, a sequence
of chapters, and back matter in order to physically format chapter
headers properly. In order for it to know this sort of thing, you
need to give it a Document Type Definition or DTD. The DTD tells your
formatter what sorts of elements can be in the document structure,
and in what orders they can appear.

What we mean by calling DocBook an `application' of XML is actually
that DocBook is a DTD -- a rather large DTD, with somewhere around
400 tags in it.

Lurking behind DocBook is a kind of program called a validating
parser. When you format a DocBook document, the first step is to pass
it through a validating parser (the front end of the DocBook
formatter). This program checks your document against the DocBook DTD
to make sure you aren't breaking any of the DTD's structural rules
(otherwise the back end of the formatter, the part that applies your
stylesheet, might become quite confused).

The validating parser will either bomb out, giving you error messages
about places where the document structure is broken, or translate the
document into a stream of formatting events which the formatter's
back end combines with the information in your stylesheet to produce
formatted output.

Here is a diagram of the whole process:

[figure1.png]

The part of the diagram inside the dotted box is your formatting
software, or toolchain. Besides the obvious and visible input to the
formatter (the document source) you'll need to keep the two `hidden'
inputs of the formatter (DTD and stylesheet) in mind to understand
what follows.
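
To make the validation step less abstract, here is a toy sketch in
Python. Everything in it is an assumption made for illustration:
TOY_MODEL is an invented five-element content model, not the DocBook
DTD, and it ignores the ordering rules that a real DTD also enforces.
It shows only the basic idea of checking a document tree against a
table of which elements may appear inside which.

```python
import xml.etree.ElementTree as ET

# A toy content model, NOT the real DocBook DTD: each element maps to the
# set of child elements it may contain. Ordering rules are ignored.
TOY_MODEL = {
    "book": {"title", "chapter"},
    "chapter": {"title", "para"},
    "title": set(),
    "para": {"emphasis"},
    "emphasis": set(),
}


def validate(element, model, path=""):
    """Return a list of error strings; an empty list means the tree conforms."""
    here = path + "/" + element.tag
    if element.tag not in model:
        return [here + ": unknown element"]
    errors = []
    for child in element:
        # Known elements in the wrong place are reported here; unknown
        # elements are reported by the recursive call below.
        if child.tag in model and child.tag not in model[element.tag]:
            errors.append(here + ": <" + child.tag + "> not allowed here")
        errors.extend(validate(child, model, here))
    return errors


GOOD = "<book><title>T</title><chapter><title>C</title><para>x</para></chapter></book>"
BAD = "<book><para>stray text</para></book>"

print(validate(ET.fromstring(GOOD), TOY_MODEL))
print(validate(ET.fromstring(BAD), TOY_MODEL))
```

A real validating parser reads the content model from the DTD itself
rather than from a hand-written table, but the outcome is the same
kind of thing: either a clean pass or a list of structural errors.
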
________________________________________________________________

5. Other DTDs

A brief digression into other DTDs may help make clear what parts of
the previous section were specific to DocBook and what parts are
general to all structural-markup languages.

[http://www.tei-c.org/] TEI (Text Encoding Initiative) is a large,
elaborate DTD used primarily in academia for computer transcription
of literary texts. TEI's Unix-based toolchains use many of the same
tools that are involved with DocBook, but with different stylesheets
and (of course) a different DTD.

XHTML, the latest version of HTML, is also an XML application
described by a DTD, which explains the family resemblance between
XHTML and DocBook tags. The XHTML toolchain consists of web browsers
and a number of ad-hoc HTML-to-print utilities.

Many other XML DTDs are maintained to help people exchange structured
information in fields as diverse as bioinformatics and banking. You
can look at a list of repositories to get some idea of the variety
out there.
________________________________________________________________

6. The DocBook toolchain

The easiest way to format and render XML-DocBook documents is to use
the xmlto toolchain. This ships with Red Hat; Debian users can get it
with the command apt-get install xmlto.

Normally, what you'll do to make XHTML from your DocBook sources will
look like this:

    bash$ xmlto xhtml foo.xml
    bash$ ls *.html
    ar01s02.html ar01s03.html ar01s04.html index.html

In this example, you converted an XML-DocBook document named foo.xml
into an index page and three section pages. Making one big page is
just as easy:

    bash$ xmlto xhtml-nochunks foo.xml
    bash$ ls *.html
    foo.html

Finally, here is how you make PDF for printing:

    bash$ dblatex foo.xml # To make PDF
    bash$ ls *.pdf
    foo.pdf

Some older versions of xmlto may be more verbose, emitting noise like
"Converting to XHTML" and so forth.
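
If you render the same document repeatedly, the three invocations
above are easy to wrap in a small script. The following Python sketch
is one possible shape for such a wrapper, not part of xmlto or
dblatex: the mode names used as dictionary keys are made up here, and
only the command lines already shown above are used.

```python
import shutil
import subprocess

# Command lines taken from the examples above. The mode names used as
# keys ("chunked-xhtml", "single-xhtml", "pdf") are invented for this sketch.
COMMANDS = {
    "chunked-xhtml": ["xmlto", "xhtml"],
    "single-xhtml": ["xmlto", "xhtml-nochunks"],
    "pdf": ["dblatex"],
}


def build_command(mode, source):
    """Return the argument list for rendering `source` in the given mode."""
    return COMMANDS[mode] + [source]


def render(mode, source):
    """Run the formatter if it is installed; return True on success."""
    argv = build_command(mode, source)
    if shutil.which(argv[0]) is None:
        print("not installed:", argv[0])
        return False
    return subprocess.run(argv).returncode == 0


print(build_command("pdf", "foo.xml"))
```

Calling render("single-xhtml", "foo.xml") would then run the second
command above, provided xmlto is on your PATH.
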
To turn your documents into HTML or PDF, you need an engine that can
apply the combination of DocBook DTD and a suitable stylesheet to
your document. Here is how the open-source tools for doing this fit
together:

[figure2.png]

Present-day XML-DocBook toolchain

Parsing your document and applying the stylesheet transformation will
be handled by one of three programs. The most likely one is xsltproc.
The other possibilities are two Java programs, Saxon and Xalan.

It is relatively easy to generate high-quality XHTML from DocBook;
the fact that XHTML is simply another XML DTD helps a lot.
Translation to HTML is done by applying a rather simple stylesheet,
and that's the end of the story. RTF is also simple to generate in
this way, and from XHTML or RTF it's easy to generate a flat ASCII
text approximation in a pinch.

The awkward case is print. Generating high-quality printed output
(which means, in practice, Adobe's PDF or Portable Document Format, a
packaged form of PostScript) is difficult. Doing it right requires
algorithmically duplicating the delicate judgments of a human
typesetter moving from content to presentation level.

So, first, a stylesheet translates DocBook's structural markup into
another dialect of XML -- FO (Formatting Objects). FO markup is very
much presentation-level; you can think of it as a sort of XML
functional equivalent of troff. It has to be translated to PostScript
for packaging in a PDF.

In the toolchain shipped with most present-day Linux distributions,
this job is best handled by a program called dblatex (this obsoletes
the older passivetex package that previous versions of this HOWTO
described).

dblatex translates the formatting objects generated by xsltproc into
Donald Knuth's TeX language. TeX was one of the earliest open-source
projects, an old but powerful presentation-level formatting language
much beloved of mathematicians (to whom it provides particularly
elaborate facilities for describing mathematical notation). TeX is
also famously good at basic typesetting tasks like kerning, line
filling, and hyphenating. TeX's output is then massaged into PDF.

If you think this bucket chain of XML to TeX macros to PDF sounds
like an awkward kludge, you're right. It clanks, it wheezes, and it
has ugly warts. Fonts are a significant problem, since XML and TeX
and PDF have very different models of how fonts work; also, handling
internationalization and localization is a nightmare. About the only
thing this code path has going for it is that it works.

The elegant way will be [http://xmlgraphics.apache.org/fop/] FOP, a
direct FO-to-PostScript translator being developed by the Apache
project. With FOP, the internationalization problem is, if not
solved, at least well confined; XML tools handle Unicode all the way
through to FOP. Glyph to font mapping is also strictly FOP's problem.
The only trouble with this approach is that it doesn't entirely work
yet. As of October 2010 FOP is at 1.0 and usable, but with rough
edges and missing features. I recommend dblatex for production use.

Here is what the FOP toolchain looks like:

[figure3.png]

Future XML-DocBook toolchain with FOP.
________________________________________________________________

7. asciidoc

There is a relatively new tool called
[http://www.methods.co.nz/asciidoc/] asciidoc that tackles several of
the problems associated with DocBook rather effectively.

The asciidoc tool accepts a simple, lightweight syntax resembling
wiki markups and turns it into various output formats using DocBook
as an intermediate stage. The asciidoc markup is easier to compose in
than DocBook itself, and serves as its own best rendering in flat
ASCII.

Printing support in asciidoc is through an experimental LaTeX back
end. The tool is most useful for writing short to medium-length
documents for World Wide Web distribution.
________________________________________________________________

8. Who are the projects and the players?

The DocBook DTD itself is maintained by the DocBook Technical
Committee, headed by Norman Walsh. Norm is the principal author of
the DocBook stylesheets, a man who has focused remarkable energy and
talent over many years on the extremely complex problems DocBook
addresses. He is as universally respected in the DocBook community as
Linus Torvalds is in the Linux world.

[http://xmlsoft.org/XSLT/] libxslt is a C library that interprets
XSLT, applying stylesheets to XML documents. It includes a wrapper
program, xsltproc, that can be used as an XML formatter. The code was
written by Daniel Veillard under the auspices of the GNOME project,
but does not require any GNOME code to run. I hear it's blazingly
fast compared to the Java alternatives, not a surprising claim.

[http://cyberelk.net/tim/xmlto/] xmlto is the user interface of the
XML toolchain that most Linuxes ship. It's written and maintained by
Tim Waugh.

[http://users.iclway.co.uk/mhkay/saxon/] Saxon and
[http://xml.apache.org/xalan-j/] Xalan are Java programs that
interpret XSLT. Saxon seems to be designed to work under Windows.
Xalan is part of the XML Apache project and native to Linux and BSD;
it's designed to work with FOP.

[http://xml.apache.org/fop/] FOP translates XML Formatting Objects to
PDF. It is part of the Apache XML project and is designed to work
with Xalan.

[http://www.methods.co.nz/asciidoc/] asciidoc translates its own
lightweight markup to DocBook, and thence to various output formats.
________________________________________________________________

9. Migration tools

The second biggest problem with DocBook is the effort needed to
convert old-style presentation markup to DocBook markup. Human beings
can usually parse the presentation of a document into logical
structure automatically, because (for example) they can tell from
context when an italic font means `emphasis' and when it means
something else such as `this is a foreign phrase'.

Somehow, in converting documents to DocBook, those sorts of
distinctions need to be made explicit. Sometimes they're present in
the old markup; often they are not, and the missing structural
information has to be either deduced by clever heuristics or added by
a human.

Here is a summary of the state of conversion tools from various other
formats:

GNU Texinfo
    The Free Software Foundation has made a policy decision to
    support DocBook as an interchange format. Texinfo has enough
    structure to make reasonably good automatic conversion
    possible, and the 4.x versions of makeinfo feature a --docbook
    switch that generates DocBook. More at the makeinfo project
    page.

POD
    There is a [http://www.cpan.org/modules/by-module/Pod/]
    POD::DocBook module that translates Plain Old Documentation
    markup to DocBook. It claims to translate every POD tag except
    the L<> link tag. The man page also says "Nested =over/=back
    lists are not supported within DocBook." but notes that the
    module has been heavily tested.

LaTeX
    LaTeX is a (mostly) structural markup macro language built on
    top of the TeX formatter. There is a project called
    [http://www.lrz-muenchen.de/services/software/sonstiges/tex4ht/mn.html]
    TeX4ht that (according to the author of PassiveTeX) can
    generate DocBook from LaTeX.

man pages and other troff-based markups
    This is generally considered the biggest and nastiest
    conversion problem. And indeed, the basic troff(1) markup is
    at too low a presentation level for automatic conversion tools
    to do much of any good. However, the gloom in the picture
    lightens significantly if we consider translation from sources
    of documents written in macro packages like man(7). These have
    enough structural features for automatic translation to get
    some traction.

    I wrote a tool to do this myself, because I couldn't find
    anything else that did a half-decent job of it (and the
    problem is interesting). It's called
    [http://www.catb.org/~esr//doclifter/] doclifter. It will
    translate to either SGML or XML DocBook from man(7), mdoc(7),
    ms(7), or me(7) macros. See the documentation for details.
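
To give a feel for why conversion needs either heuristics or a human,
here is a deliberately tiny Python sketch. It is not how doclifter or
any other tool named above works; the function and its rules are
invented for this example. It maps the .B request to <emphasis>, as in
the primer earlier in this HOWTO, and refuses to guess about any other
request: the text is passed through unmarked and the line number is
reported so that a person can decide what was meant.

```python
from xml.sax.saxutils import escape


def lift(lines):
    """Convert troff-style lines to a DocBook <para>.

    Returns (xml, unresolved), where unresolved lists the 1-based numbers
    of request lines this toy could not classify.
    """
    pieces, unresolved = [], []
    for number, line in enumerate(lines, start=1):
        if line.startswith(".B "):
            # Bold is treated as emphasis, following the primer's example.
            pieces.append("<emphasis>" + escape(line[3:].strip()) + "</emphasis>")
        elif line.startswith("."):
            # Any other request (.I, for instance) could mean several
            # things, so keep the text and flag the line for a human.
            unresolved.append(number)
            pieces.append(escape(line.partition(" ")[2].strip()))
        else:
            pieces.append(escape(line.strip()))
    return "<para>" + " ".join(pieces) + "</para>", unresolved


print(lift(["All your base", ".B are", "belong to us!"]))
print(lift(["See the", ".I Zeitgeist", "of markup & tools"]))
```

The second call shows the ambiguity described above: the italic word
might be emphasis or a foreign phrase, and nothing in the markup says
which.
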
________________________________________________________________

10. Editing tools

Most people still hack DocBook tags by hand using either vi or emacs.
There's an nXML mode that ships with Emacs and is automatically
invoked when the editor recognizes an XML document. It has become
pretty good; while it doesn't give GUI presentation, it does use its
knowledge of XML to highlight out-of-balance tags. Some alternatives
are summarized at the Emacs CategoryXML page.

There have been a number of attempts at GUI editors for DocBook,
often with the aim of being general editors for any markup with an
XML or SGML schema. EuroMath, MLView, Conglomerate, and ThotBook are
among them. Such projects tend to stall out in alpha stage; designing
a decent UI for this task is extremely difficult.

Some attempts that have made it to production stage (if only barely,
in many cases) can be found at the DocBook Authoring Tools page. I
have not tried using any of these.
________________________________________________________________

11. Hints and tricks

It is possible to generate an index by including an empty <index/>
tag at the point in your document where you wish it to appear. Be
warned that, as of early 2004, this facility is still somewhat
primitive. It won't merge ranges, and the output generated for
PostScript is not yet production-quality.

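
An index is only as good as the entries declared in the document. In
DocBook those are <indexterm> elements with a <primary> child, placed
inline where the term is discussed. The Python sketch below is an
illustration only: the sample article is invented, and the script is
not the mechanism the stylesheets use to build the index. It just
lists the primary terms so you can check what an <index/> tag would
have to work with.

```python
import xml.etree.ElementTree as ET

# A made-up article with two inline index entries and an empty <index/>.
DOC = """<article>
  <title>Example</title>
  <para>Stylesheets<indexterm><primary>stylesheet</primary></indexterm>
  control presentation; the DTD<indexterm><primary>DTD</primary></indexterm>
  controls structure.</para>
  <index/>
</article>"""


def index_entries(source):
    """Return the primary index terms declared in the document, sorted."""
    root = ET.fromstring(source)
    terms = {p.text for p in root.iter("primary") if p.text}
    return sorted(terms, key=str.lower)


def has_index_marker(source):
    """Return True if the document contains an <index/> element anywhere."""
    return ET.fromstring(source).find(".//index") is not None


print(index_entries(DOC))
print(has_index_marker(DOC))
```
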
This space is reserved for more hints and tricks.
________________________________________________________________

12. Related standards and practices

The tools are coming together, if slowly, to edit and format DocBook
markup. But DocBook itself is a means, not an end. We'll need other
standards besides DocBook itself to accomplish the
searchable-documentation-database objective I laid out at the
beginning of this document. There are two big issues: document
cataloguing and metadata.

The [http://scrollkeeper.sourceforge.net/] Scrollkeeper project aims
directly to meet this need. It provides a simple set of script hooks
that can be used by package install and uninstall productions to
register and unregister their documentation into and out of a shared,
searchable system-wide database.

Scrollkeeper uses the [http://www.ibiblio.org/osrt/omf/] Open
Metadata Format. This is a standard for indexing open-source
documentation analogous to a library card-catalog system. The idea is
to support rich search facilities that use the card-catalog metadata
as well as the source text of the documentation itself.
________________________________________________________________

13. SGML and SGML-Tools

In previous sections, I have thrown away a lot of DocBook's history.
XML has an older brother, SGML or Standard Generalized Markup
Language.

Until mid-2002, no discussion of DocBook would have been complete
without a long excursion into SGML, the differences between SGML and
XML, and detailed descriptions of the SGML DocBook toolchain. Life
can be simpler now; an XML DocBook toolchain is available in open
source, works as well as the SGML toolchain ever did, and is much
easier to use. If you don't think you'll ever have to deal with old
SGML-DocBook documents, you can skip the remainder of this section.
________________________________________________________________

13.1. DocBook SGML

DocBook was originally an SGML application, and there was an
SGML-based DocBook toolchain that is now moribund. There are minor
differences between the DocBook SGML DTD and the DocBook XML DTD, but
for an introductory discussion we can ignore them. The only one
that's normally user-visible is that in SGML contentless tags did not
need to have a trailing slash added to them before the closing >.
(Requiring the trailing / means XML parsers can be a lot simpler,
because they don't have to know about the DTD to know which opening
tags need closers.)

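
The parenthetical point about the trailing slash is easy to check
with any XML parser. The following Python sketch uses the standard
library's ElementTree, which consults no DTD at all; the two sample
strings are invented for the demonstration. The XML-style empty tag
parses, while the SGML-style one is rejected because the parser has
no way of knowing that <index> was meant to be contentless.

```python
import xml.etree.ElementTree as ET


def is_well_formed(source):
    """Return True if `source` parses as XML; no DTD is consulted."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


XML_STYLE = "<article><para>text</para><index/></article>"
SGML_STYLE = "<article><para>text</para><index></article>"

print(is_well_formed(XML_STYLE))
print(is_well_formed(SGML_STYLE))
```

An SGML parser could accept the second form only by looking up
<index> in the DTD, which is exactly the dependency XML removed.
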
Versions of HTML up to 4.01 (before XHTML) were SGML applications.
TEI was originally an SGML application, too. The groups managing all
three DTDs jumped to XML for the same reason DocBook's developers did
-- it's drastically simpler. SGML was extremely complex; unmanageably
so, as it turns out. The specification was a dense 150 pages and
there is no reliable report that any software ever fully implemented
it.

The toolchain diagram I gave earlier was simplified; it only showed
the XML toolchain. Here is the historically correct version:

[figure4.png]

The DSSSL toolchain is what processed DocBook SGML. Under it, a
document goes from DocBook format through one of two closely-related
stylesheet engines called Jade and OpenJade. These turn it into a
TeX-macro markup, which is processed by a package called JadeTeX
into DVIs, which then get turned into PostScript.
________________________________________________________________

13.2. SGML tools

The [http://sources.redhat.com/docbook-tools/] docbook-tools project
provides open-source tools for converting SGML DocBook to HTML,
PostScript, and other formats. This package is shipped with Red Hat
and other Linux distributions. It is maintained by Mark Galassi.

[http://www.jclark.com/jade/] Jade is an engine used to apply DSSSL
stylesheets to SGML documents. It is maintained by James Clark.

[http://openjade.sourceforge.net/] OpenJade is a community project
undertaken because the founders thought James Clark's maintenance of
Jade was spotty. The docbook-tools programs use OpenJade.

[http://users.ox.ac.uk/~rahtz/passivetex/] PassiveTeX is the package
of LaTeX macros that xmlto uses for producing DVI from XML-DocBook.

[http://jadetex.sourceforge.net/] JadeTeX is the package of LaTeX
macros that OpenJade uses for producing DVI from SGML-DocBook.
________________________________________________________________

13.3. Why SGML DocBook is dead

The DSSSL toolchain is, as far as new development goes, effectively
dead. The XSLT toolchain reached production status in mid-2002; a
working version shipped in Red Hat 7.3. It's where DocBook developers
are putting almost all of their effort.

The reason for the change to XML was threefold. First, SGML turned
out to be too complicated to use; then, DSSSL turned out to be too
complicated to live with; then, significant parts of the DSSSL
toolchain turned out to be weak and irredeemably messy.

Relative to SGML, XML has a reduced feature set that is sufficient
for almost all purposes but much easier to understand and build
parsers for. SGML-processing tools (such as validating parsers) have
to carry around support for a lot of features that DocBook and other
text markup systems never actually used. Removing these features made
XML simpler and XML-processing tools faster.

The language used to describe SGML DTDs is sufficiently spiky and
forbidding that composing SGML DTDs was something of a black art. XML
DTDs, on the other hand, can be described in a dialect of XML itself;
there does not need to be a separate DTD language. An XML description
of an XML DTD is called a schema; the term DTD itself will probably
pass out of use as the standards for schemas firm up.

But mostly the DSSSL toolchain is dead because DSSSL itself, the SGML
stylesheet description language in that toolchain, proved just too
arcane for most human beings, and made stylesheets too difficult to
write and modify. (It was a dialect of Scheme. Your humble editor, a
LISP-head from way back, shakes his head in sad bemusement that this
should drive people away.)

XML fans like to sum up all these changes with "XML: tastes great,
less filling."
________________________________________________________________

13.4. SGML-Tools

SGML-Tools was the name of a DTD used by the
[http://www.linuxdoc.org] Linux Documentation Project, developed a
few years ago when today's DocBook toolchains didn't exist.
SGML-Tools markup was simpler, but also much less flexible than
DocBook. The original SGML-Tools formatter/DTD/stylesheet(s)
toolchain has been dead for some time now, but a successor called
SGML-tools Lite is still maintained.

The LDP has been phasing out SGML-Tools in favor of DocBook, but it
is still possible you might take over an old HOWTO. These can be
recognized by the identifying header "<!doctype linuxdoc system>". If
this happens to you, convert the thing to XML DocBook and give the
old version a quick burial.
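
If you inherit a pile of old HOWTOs, the header check is simple to
automate. The Python sketch below is one way to do it; the function
name, the choice to ignore case and extra whitespace, and the
1024-character window are all assumptions of this example rather than
anything the LDP specifies.

```python
import re

# Matches the identifying header quoted above, tolerating differences in
# letter case and spacing.
LINUXDOC = re.compile(r"<!doctype\s+linuxdoc\s+system\s*>", re.IGNORECASE)


def is_linuxdoc(text, head_chars=1024):
    """Return True if the start of `text` carries the linuxdoc doctype."""
    return LINUXDOC.search(text[:head_chars]) is not None


print(is_linuxdoc("<!doctype linuxdoc system>\n<article>..."))
print(is_linuxdoc('<?xml version="1.0"?>\n<article>...'))
```

Feeding it the first kilobyte of each file is enough to sort old
linuxdoc sources from DocBook ones before you start converting.
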
________________________________________________________________

14. References

One of the things that makes learning DocBook difficult is that the
sites related to it tend to overwhelm the newbie with long lists of
W3C standards, massive exercises in markup theology, and dense
thickets of abstract terminology. We're going to try to avoid that
here by giving you just a few selected references to look at.

Michael Smith's [http://xml.oreilly.com/news/dontlearn_0701.html]
Take My Advice: Don't Learn XML surveys the XML world from an angle
similar to this document.

Norman Walsh's DocBook: The Definitive Guide is available
[http://www.oreilly.com/catalog/docbook/] in print and on the web.
This is indeed the definitive reference, but as an introduction or
tutorial it's a disaster. Instead, read this:

Writing Documentation Using DocBook: A Crash Course. This is an
excellent tutorial.

There is an excellent [http://www.dpawson.co.uk/docbook/] DocBook FAQ
with a lot of material on styling HTML output. There is also a
DocBook [http://docbook.org/wiki/moin.cgi] wiki.

If you're writing for the Linux Documentation Project, read the
[http://www.linuxdoc.org/LDP/LDP-Author-Guide/index.html] LDP Author
Guide.

The best general introduction to SGML and XML that I've personally
read all the way through is David Megginson's Structuring XML
Documents (Prentice-Hall, ISBN: 0-13-642299-3).

For XML only, XML In A Nutshell by W. Scott Means and Elliotte
"Rusty" Harold is very good.

The XML Bible looks like a pretty comprehensive reference on XML and
related standards (including Formatting Objects).

Finally, the XML Cover Pages will take you into the jungle of XML
standards if you really want to go there.