<?xml version="1.0"?>
|
||||
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
|
||||
"http://docbook.org/xml/4.1.2/docbookx.dtd" [
|
||||
<!ENTITY howto "http://tldp.org/HOWTO/">
|
||||
<!ENTITY mini-howto "http://tldp.org/HOWTO/mini/">
|
||||
]>
|
||||
|
||||
<article>
|
||||
<articleinfo>
|
||||
<title>DocBook Demystification HOWTO</title>
|
||||
|
||||
<author>
|
||||
<firstname>Eric</firstname>
|
||||
<surname>Raymond</surname>
|
||||
<affiliation>
|
||||
<address>
|
||||
<email>esr@thyrsus.com</email>
|
||||
</address>
|
||||
</affiliation>
|
||||
</author>
|
||||
|
||||
<revhistory>
|
||||
<revision>
|
||||
<revnumber>v1.0</revnumber>
|
||||
<date>2001-09-20</date>
|
||||
<authorinitials>esr</authorinitials>
|
||||
<revremark>
|
||||
Initial version.
|
||||
</revremark>
|
||||
</revision>
|
||||
</revhistory>
|
||||
|
||||
<abstract><para>
|
||||
This HOWTO attempts to clear the fog and mystery surrounding the
|
||||
DocBook markup system and the tools that go with it. It is aimed at
|
||||
authors of technical documentation for open-source projects hosted
|
||||
on Linux, but should be useful for people composing other kinds on
|
||||
other Unixes as well.
|
||||
</para></abstract>
|
||||
|
||||
</articleinfo>
|
||||
|
||||
<sect1 id="intro"><title>Introduction</title>
|
||||
|
||||
<para>A great many major open-source projects are converging on
|
||||
DocBook as a standard format for their documentation — projects
|
||||
including the Linux kernel, GNOME, KDE, Samba, and the Linux
|
||||
Documentation Project. The advocates of XML-based "structural markup"
|
||||
(as opposed to the older style of "presentation markup" exemplified by
|
||||
troff, TeX, and Texinfo) seem to have won the theoretical
|
||||
battle.</para>
|
||||
|
||||
<para>Nevertheless, a lot of confusion surrounds DocBook and the
|
||||
programs that support it. Its devotees speak an argot that is dense
|
||||
and forbidding even by computer-science standards, slinging around
|
||||
acronyms that have no obvious relationship to the things you need to
|
||||
do to write markup and make HTML or Postscript from it. XML standards
|
||||
and technical papers are notoriously obscure. Most DocBook-related
|
||||
tools are very poorly documented, and their documentation is
|
||||
especially prone to assume way too much prior knowledge on the
|
||||
reader's part.</para>
|
||||
|
||||
<para>This HOWTO will attempt to clear up the major mysteries
|
||||
surrounding DocBook and its application to open-source documentation
|
||||
— both the technical and political ones. Our objective is to equip
|
||||
you to understand not just what you need to do to make documents, but
|
||||
why the process is as complex as it is — and how it can be
|
||||
expected to change as newer DocBook-related tools become
|
||||
available.</para>
|
||||
|
||||
</sect1>
|
||||
<sect1><title>Why care about DocBook at all?</title>
|
||||
|
||||
<para>There are two possibilities that make DocBook really
|
||||
interesting. One is <emphasis>multi-mode rendering</emphasis> and the
|
||||
other is <emphasis>searchable documentation
|
||||
databases</emphasis>.</para>
|
||||
|
||||
<para>Multi-mode rendering is the easier, nearer-term possibility; it's
|
||||
the ability to write a document in a single master format that can be
|
||||
rendered in many different display modes (in particular, as both HTML
|
||||
for on-line viewing and as Postscript for high-quality printed
|
||||
output). This capability is pretty well implemented now.</para>
|
||||
|
||||
<para><emphasis>Searchable documentation databases</emphasis> is
|
||||
shorthand for the possibility that DocBook might help get us to a
|
||||
world in which all the documentation on your open-source operating
|
||||
system is one rich, searchable, cross-indexed and hyperlinked
|
||||
database (rather than being scattered across several different formats
|
||||
in multiple locations as it is now).</para>
|
||||
|
||||
<para>Ideally, whenever you install a software package on your machine
|
||||
it would register its DocBook documentation into your system's
|
||||
catalog. HTML, properly indexed and cross-linked to the HTML in the
|
||||
rest of your catalog, would be generated. The new package's
|
||||
documentation would then be available through your browser. All
|
||||
your documentation would be searchable through an interface
|
||||
resembling a good Web search engine.</para>
|
||||
|
||||
<para>HTML itself is not quite rich enough a format to get us to that
|
||||
world. To name just one lack, you can't explicitly declare index
|
||||
entries in HTML. DocBook <emphasis>does</emphasis> have the semantic
|
||||
richness to support structured documentation databases. Fundamentally
|
||||
that's why so many projects are adopting it.</para>
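
<para>As a minimal sketch of what an explicitly declared index entry
looks like in DocBook source (the sentence and the index terms here are
invented for illustration):</para>

<programlisting><![CDATA[
<para>The validating parser<indexterm>
  <primary>parser</primary>
  <secondary>validating</secondary>
</indexterm> checks your document against the DTD.</para>
]]></programlisting>

<para>The <sgmltag>indexterm</sgmltag> element produces no text where
it sits; it only records that this spot should be listed in the index
under <quote>parser, validating</quote>. A processor that builds an
index can collect such entries from many documents at once.</para>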
|
||||
|
||||
<para>DocBook has the vices that go with its virtues. Some people
|
||||
find it unpleasantly heavyweight, and too verbose to be really
|
||||
comfortable as a composition format. That's OK; as long as the markup
|
||||
tools they like (things like Perl POD or GNU Texinfo) can generate
|
||||
DocBook out of their back ends, we can all still get what we want. It doesn't
|
||||
matter whether or not everybody writes in DocBook — as long as
|
||||
it becomes the common document interchange format that everyone uses,
|
||||
we'll still get unified searchable documentation databases.</para>
|
||||
|
||||
</sect1>
|
||||
<sect1><title>Structural markup: a primer</title>
|
||||
|
||||
<para>Older formatting languages like TeX, Texinfo, and troff
|
||||
supported <indexterm><primary>presentation
|
||||
markup</primary></indexterm>. In these systems, the instructions you
|
||||
gave were about the appearance and physical layout of the text (font
|
||||
changes, indentation changes, that sort of thing).</para>
|
||||
|
||||
<para>Presentation markup was adequate as long as your objective was
|
||||
to print to a single medium or type of display device. You run into
|
||||
its limits, however, when you want to mark up a document so that (a)
|
||||
it can be formatted for very different display media (such as printing
|
||||
vs. Web display), or (b) you want to support searching and indexing the
|
||||
document by its logical structure (as you are likely to want to do,
|
||||
for example, if you are incorporating it into a hypertext system).</para>
|
||||
|
||||
<para>To support these capabilities properly, you need a system of
|
||||
<indexterm><primary>structural markup</primary></indexterm>. In structural
|
||||
markup, you describe not the physical appearance of the document but
|
||||
the logical properties of its parts.</para>
|
||||
|
||||
<para>As an example: In a presentation-markup language, if you want to
|
||||
emphasize a word, you might instruct the formatter to set it in
|
||||
boldface. In
|
||||
<citerefentry><refentrytitle>troff</refentrytitle><manvolnum>1</manvolnum></citerefentry>
|
||||
this would look like so:</para>
|
||||
|
||||
<programlisting>
|
||||
All your base
|
||||
.B are
|
||||
belong to us!
|
||||
</programlisting>
|
||||
|
||||
<para>In a structural-markup language, you would tell the formatter to
|
||||
emphasize the word:</para>
|
||||
|
||||
<programlisting>
|
||||
All your base &lt;emphasis&gt;are&lt;/emphasis&gt; belong to us!
|
||||
</programlisting>
|
||||
|
||||
<para>The "&lt;emphasis&gt;" and "&lt;/emphasis&gt;" in the line above
|
||||
are called <indexterm><primary>markup tags</primary></indexterm>, or
|
||||
just <firstterm>tags</firstterm> for short. They are the instructions
|
||||
to your formatter.</para>
|
||||
|
||||
<para>In a structural-markup language, the physical appearance of the
|
||||
final document would be controlled by a
|
||||
<indexterm><primary>stylesheet</primary></indexterm>. It is the
|
||||
stylesheet that would tell the formatter "render emphasis as a font
|
||||
change to boldface". One advantage of structural-markup languages
|
||||
is that by changing a stylesheet you can globally change the
|
||||
presentation of the document (to use different fonts, for example)
|
||||
without having to hack all the individual instances of (say)
|
||||
<markup>.B</markup> in the document itself.</para>
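
<para>To make this concrete, here is a minimal sketch of a single
stylesheet rule, written in XSLT. It assumes HTML output, and it is an
illustration rather than a rule copied from any shipped DocBook
stylesheet:</para>

<programlisting><![CDATA[
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Render every <emphasis> element as boldface in the HTML output -->
  <xsl:template match="emphasis">
    <b><xsl:apply-templates/></b>
  </xsl:template>

</xsl:stylesheet>
]]></programlisting>

<para>Changing <markup>b</markup> to <markup>i</markup> in that one
rule would switch every emphasized phrase from boldface to italics,
with no change to the document source.</para>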
|
||||
|
||||
</sect1>
|
||||
<sect1><title>Document Type Definitions</title>
|
||||
|
||||
<para>(Note: to keep the explanation simple, most of this
|
||||
section is going to tell some lies, mainly by omitting a lot of
|
||||
history. Truthfulness will be fully restored in a following
|
||||
section.)</para>
|
||||
|
||||
<para>DocBook is a structural-level markup language. Specifically, it
|
||||
is a dialect (in the jargon, an `application') of XML. A DocBook document is a hunk of XML that uses
|
||||
XML tags for structural markup.</para>
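
<para>For illustration, a complete (if tiny) DocBook article might look
like the following sketch. The header lines are modeled on the ones at
the top of this HOWTO's own source; the title and text are made
up:</para>

<programlisting><![CDATA[
<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
    "http://docbook.org/xml/4.1.2/docbookx.dtd">
<article>
  <title>A Very Short Article</title>
  <sect1><title>First section</title>
    <para>Every piece of text lives inside a tag that says what it
    <emphasis>is</emphasis>, not what it should look like.</para>
  </sect1>
</article>
]]></programlisting>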
|
||||
|
||||
<para>In order for a document formatter to apply a stylesheet to your
|
||||
document and make it look good, it needs to know things about the
|
||||
overall structure of your document. For example, it needs to know
|
||||
that a book manuscript normally consists of front matter, a sequence
|
||||
of chapters, and back matter in order to physically format chapter
|
||||
headers properly. In order for it to know this sort of thing, you
|
||||
need to give it a <indexterm><primary>Document Type
|
||||
Definition</primary></indexterm> or DTD. The DTD tells your
|
||||
formatter what sorts of elements can be in the document structure, and
|
||||
in what orders they can appear.</para>
|
||||
|
||||
<para>What we mean by calling DocBook an `application' of XML is
|
||||
actually that DocBook is a DTD — a rather large DTD, with
|
||||
somewhere around 400 tags in it.</para>
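
<para>To give the flavor of what a DTD says, here is a toy one,
embedded in the document it describes. The <sgmltag>memo</sgmltag>
vocabulary is hypothetical and has nothing to do with DocBook's actual
declarations, which are far larger:</para>

<programlisting><![CDATA[
<?xml version="1.0"?>
<!DOCTYPE memo [
  <!-- A memo is exactly one title followed by one or more paragraphs -->
  <!ELEMENT memo     (title, para+)>
  <!ELEMENT title    (#PCDATA)>
  <!-- A paragraph is text, optionally mixed with emphasized phrases -->
  <!ELEMENT para     (#PCDATA | emphasis)*>
  <!ELEMENT emphasis (#PCDATA)>
]>
<memo>
  <title>Lunch</title>
  <para>Meet at <emphasis>noon</emphasis>.</para>
</memo>
]]></programlisting>

<para>Each <markup>ELEMENT</markup> line names a tag and states what
may appear inside it, and in what order.</para>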
|
||||
|
||||
<para>Lurking behind DocBook is a kind of program called a
|
||||
<indexterm><primary>validating parser</primary></indexterm>. When you
|
||||
format a DocBook document, the first step is to pass it through a
|
||||
validating parser (the front end of the DocBook formatter). This
|
||||
program checks your document against the DocBook DTD to make sure you
|
||||
aren't breaking any of the DTD's structural rules (otherwise the back
|
||||
end of the formatter, the part that applies your style sheet, might
|
||||
become quite confused).</para>
|
||||
|
||||
<para>The validating parser will either bomb out, giving you error
|
||||
messages about places where the document structure is broken, or translate
|
||||
the document into a stream of <firstterm>formatting events</firstterm>
|
||||
which the formatter's back end combines with the information in your stylesheet
|
||||
to produce formatted output.</para>
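
<para>As a sketch of the kind of thing a validating parser catches,
consider this fragment (invented for illustration). It is well-formed
XML, in that every tag is properly closed and nested, but it breaks a
DocBook structural rule: an article's title must come before its body
text, not after it.</para>

<programlisting><![CDATA[
<article>
  <para>Some body text.</para>
  <title>A Title in the Wrong Place</title>
</article>
]]></programlisting>

<para>A validating parser should reject this with a complaint about the
<sgmltag>title</sgmltag> element; the exact wording of the message
depends on the parser.</para>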
|
||||
|
||||
<para>Here is a diagram of the whole process:</para>
|
||||
|
||||
<mediaobject>
|
||||
<imageobject> <imagedata fileref="figure1.png" format="PNG"/></imageobject>
|
||||
</mediaobject>
|
||||
|
||||
<para>The part of the diagram inside the dotted box is your formatting
|
||||
software, or <firstterm>toolchain</firstterm>. Besides the obvious and
|
||||
visible input to the formatter (the document source) you'll need to
|
||||
keep the two `hidden' inputs of the formatter (DTD and stylesheet) in
|
||||
mind to understand what follows.</para>
|
||||
</sect1>
|
||||
<sect1><title>Other DTDs</title>
|
||||
|
||||
<para>A brief digression into other DTDs may help make clear what parts of
|
||||
the previous section were specific to DocBook and what parts are general to
|
||||
all structural-markup languages.</para>
|
||||
|
||||
<para><ulink url="http://www.tei-c.org/">TEI</ulink> (Text Encoding
|
||||
Initiative) is a large, elaborate DTD used primarily in academia for
|
||||
computer transcription of literary texts. TEI's Unix-based toolchains
|
||||
use many of the same tools that are involved with DocBook, but with
|
||||
different stylesheets and (of course) a different DTD.</para>
|
||||
|
||||
<para>XHTML, the latest version of HTML, is also an XML application
|
||||
described by a DTD, which explains the family resemblance between
|
||||
XHTML and DocBook tags. The XHTML toolchain consists of web browsers
|
||||
and a number of ad-hoc HTML-to-print utilities.</para>
|
||||
|
||||
<para>Many other XML DTDs are maintained to help people exchange
|
||||
structured information in fields as diverse as bioinformatics and
|
||||
banking. You can look at a <ulink
|
||||
url="http://www.xml.com/pub/rg/DTD_Repositories"> list of
|
||||
repositories</ulink> to get some idea of the variety out
|
||||
there.</para>
|
||||
|
||||
</sect1>
|
||||
<sect1><title>The DocBook toolchain</title>
|
||||
|
||||
<para>Normally, what you'll do to make XHTML from your
|
||||
DocBook sources will look like this:</para>
|
||||
|
||||
<screen>
|
||||
bash$ xmlto xhtml foo.xml
|
||||
Convert to XHTML
|
||||
bash$ ls *.html
|
||||
ar01s02.html ar01s03.html ar01s04.html index.html
|
||||
</screen>
|
||||
|
||||
<para>In this example, you converted an XML-DocBook document named
|
||||
<filename>foo.xml</filename> with three top-level sections into an
|
||||
index page and two parts. Making one big page is just as easy:</para>
|
||||
|
||||
<screen>
|
||||
bash$ xmlto xhtml-nochunks foo.xml
|
||||
Convert to XHTML
|
||||
bash$ ls *.html
|
||||
foo.html
|
||||
</screen>
|
||||
|
||||
<para>Finally, here is how you make Postscript for printing:</para>
|
||||
|
||||
<screen>
|
||||
bash$ xmlto ps foo.xml # To make Postscript
|
||||
Convert to XSL-FO
|
||||
Making portrait pages on A4 paper (210mmx297mm)
|
||||
Post-process XSL-FO to DVI
|
||||
Post-process DVI to PS
|
||||
bash$ ls *.ps
|
||||
foo.ps
|
||||
</screen>
|
||||
|
||||
<para>To turn your documents into HTML or Postscript, you need an
|
||||
engine that can apply the combination of DocBook DTD and
|
||||
a suitable stylesheet to your document. Here is how the
|
||||
open-source tools for doing this fit together:</para>
|
||||
|
||||
<mediaobject>
|
||||
<imageobject> <imagedata fileref="figure2.png" format="PNG"/></imageobject>
|
||||
</mediaobject>
|
||||
|
||||
<para>Parsing your document and applying the stylesheet transformation
|
||||
will be handled by one of three programs. The most likely one is
|
||||
<indexterm><primary>xsltproc</primary></indexterm>, the parser
|
||||
that ships with Red Hat 7.3. The other possibilities are two Java
|
||||
programs, <indexterm><primary>Saxon</primary></indexterm> and
|
||||
<indexterm><primary>Xalan</primary></indexterm>.</para>
|
||||
|
||||
<para>It is relatively easy to generate high-quality XHTML from
|
||||
DocBook; the fact that XHTML is simply another XML DTD helps a lot.
|
||||
Translation to HTML is done by applying a rather simple stylesheet,
|
||||
and that's the end of the story. RTF is also simple to generate in
|
||||
this way, and from XHTML or RTF it's easy to generate a flat ASCII
|
||||
text approximation in a pinch.</para>
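
<para>The stylesheets in question are themselves XML documents. As a
hedged sketch, a small customization layer over the stock DocBook XSL
stylesheets might look like this. The import URL is the conventional
location of the XHTML stylesheet in Norman Walsh's DocBook XSL
distribution, and <parameter>section.autolabel</parameter> is one of
its parameters; neither is named elsewhere in this HOWTO, so check both
against the stylesheets installed on your system:</para>

<programlisting><![CDATA[
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Pull in the stock DocBook-to-XHTML rules -->
  <xsl:import
    href="http://docbook.sourceforge.net/release/xsl/current/xhtml/docbook.xsl"/>

  <!-- Override one setting: number the sections automatically -->
  <xsl:param name="section.autolabel" select="1"/>

</xsl:stylesheet>
]]></programlisting>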
|
||||
|
||||
<para>The awkward case is print. Generating high-quality printed
|
||||
output (which means, in practice, Adobe's
|
||||
<indexterm><primary>PDF</primary></indexterm> (Portable Document
|
||||
Format)) is difficult. Doing it right requires algorithmically
|
||||
duplicating the delicate judgments of a human typesetter moving from
|
||||
content to presentation level.</para>
|
||||
|
||||
<para>So, first, a stylesheet translates DocBook's structural markup
|
||||
into another dialect of XML —
|
||||
<indexterm><primary>FO</primary></indexterm> (Formatting Objects). FO
|
||||
markup is very much presentation-level; you can think of it as a sort
|
||||
of XML functional equivalent of troff. It has to be translated to
|
||||
Postscript for packaging in a PDF.</para>
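
<para>To show what presentation-level means here, this is roughly what
one emphasized sentence might look like as formatting objects. It is a
hand-written sketch, not the output of any particular stylesheet; the
page dimensions and font settings are arbitrary choices:</para>

<programlisting><![CDATA[
<?xml version="1.0"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Page geometry: a single A4 page layout named "page" -->
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page"
                           page-width="210mm" page-height="297mm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <!-- Content: one block of text with an explicit font change -->
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-family="serif" font-size="10pt">
        All your base
        <fo:inline font-weight="bold">are</fo:inline>
        belong to us!
      </fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
]]></programlisting>

<para>Notice that nothing here says <quote>emphasis</quote> any more;
the structural information has been replaced by explicit page sizes and
font changes.</para>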
|
||||
|
||||
<para>In the toolchain shipped with Red Hat, this job is handled by a
|
||||
TeX macro package called
|
||||
<indexterm><primary>PassiveTeX</primary></indexterm>. It translates the
|
||||
formatting objects generated by <command>xsltproc</command> into
|
||||
Donald Knuth's TeX language. TeX was one of the earliest open-source
|
||||
projects, an old but powerful presentation-level formatting language
|
||||
much beloved of mathematicians (to whom it provides particularly
|
||||
elaborate facilities for describing mathematical notation). TeX is
|
||||
also famously good at basic typesetting tasks like kerning, line
|
||||
filling, and hyphenating. TeX's output, in what's called
|
||||
<indexterm><primary>DVI</primary></indexterm> (DeVice Independent)
|
||||
format, is then massaged into PDF.</para>
|
||||
|
||||
<para>If you think this bucket chain of XML to TeX macros to DVI to
|
||||
PDF sounds like an awkward kludge, you're right. It clanks, it
|
||||
wheezes, and it has ugly warts. Fonts are a significant problem,
|
||||
since XML and TeX and PDF have very different models of how fonts
|
||||
work; also, handling internationalization and localization is a
|
||||
nightmare. About the only thing this code path has going for it is
|
||||
that it works.</para>
|
||||
|
||||
<para>The elegant way will be
|
||||
<indexterm><primary>FOP</primary></indexterm>, a direct
|
||||
FO-to-Postscript translator being developed by the Apache project.
|
||||
With FOP, the internationalization problem is, if not solved, at least
|
||||
well confined; XML tools handle Unicode all the way through to FOP.
|
||||
Glyph to font mapping is also strictly FOP's problem. The only
|
||||
trouble with this approach is that it doesn't work — yet. As of
|
||||
August 2002 FOP is in an unfinished alpha state — usable, but
|
||||
with rough edges and missing features.</para>
|
||||
|
||||
<para>Here is what the FOP toolchain looks like:</para>
|
||||
|
||||
<mediaobject>
|
||||
<imageobject> <imagedata fileref="figure3.png" format="PNG"/></imageobject>
|
||||
</mediaobject>
|
||||
|
||||
<para>FOP has competition. There is another project called
|
||||
<indexterm><primary>xsl-fo-proc</primary></indexterm> which aims to do
|
||||
the same things as FOP, but in C++ (and therefore both faster than
|
||||
Java and not relying on the Java environment). As of August 2002 xsl-fo-proc
|
||||
is in an unfinished alpha state, not as far along as FOP.</para>
|
||||
|
||||
</sect1>
|
||||
<sect1><title>Who are the projects and the players?</title>
|
||||
|
||||
<para>The DocBook DTD itself is maintained by the DocBook Technical
|
||||
Committee, headed by Norman Walsh. Norm is DocBook's principal maintainer, a
|
||||
man who has focused remarkable energy and talent over many years on
|
||||
the extremely complex problems it addresses. He is as universally
|
||||
respected in the DocBook/SGML/XML community as Linus Torvalds is in
|
||||
the Linux world.</para>
|
||||
|
||||
<para>The <ulink url="http://sources.redhat.com/docbook-tools/">
|
||||
docbook-tools</ulink> project provides open-source tools for
|
||||
converting SGML DocBook to HTML, Postscript, and other formats. This
|
||||
package is shipped with Red Hat and other Linux distributions. It is
|
||||
maintained by Mark Galassi.</para>
|
||||
|
||||
<para><ulink url="http://www.jclark.com/jade/">Jade</ulink> is an
|
||||
engine used to apply DSSSL stylesheets to SGML documents. It is
|
||||
maintained by James Clark.</para>
|
||||
|
||||
<para><ulink url="http://openjade.sourceforge.net/">OpenJade</ulink>
|
||||
is a community project undertaken because the founders thought James
|
||||
Clark's maintenance of Jade was spotty. The docbook-tools programs
|
||||
use OpenJade.</para>
|
||||
|
||||
<para><ulink url="http://xmlsoft.org/XSLT/">libxslt</ulink> is a C
|
||||
library that interprets XSLT, applying stylesheets to XML documents.
|
||||
It includes a wrapper program, <command>xsltproc</command>, that can be
|
||||
used as an XML formatter. The code was written by Daniel Veillard
|
||||
under the auspices of the GNOME project, but does not require any
|
||||
GNOME code to run. I hear it's blazingly fast compared to the
|
||||
Java alternatives, not a surprising claim.</para>
|
||||
|
||||
<para><ulink url="http://cyberelk.net/tim/xmlto/">xmlto</ulink> is the
|
||||
user interface of the XML toolchain that Red Hat ships. It's written
|
||||
and maintained by Tim Waugh.</para>
|
||||
|
||||
<para><ulink url="http://users.iclway.co.uk/mhkay/saxon/">Saxon</ulink>
|
||||
and <ulink url="http://xml.apache.org/xalan-j/">Xalan</ulink> are Java
|
||||
programs that interpret XSLT. Saxon seems to be designed to work
|
||||
under Windows. Xalan is part of the XML Apache project and native to
|
||||
Linux and BSD; it's designed to work with FOP.</para>
|
||||
|
||||
<para><ulink url="http://jadetex.sourceforge.net/">JadeTeX</ulink> is
|
||||
the package of LaTeX macros that OpenJade uses for producing DVI.
|
||||
<ulink url="http://users.ox.ac.uk/~rahtz/passivetex/">PassiveTeX</ulink>
|
||||
performs a similar function on the XML side.</para>
|
||||
|
||||
<para><ulink url="http://xml.apache.org/fop/">FOP</ulink> translates
|
||||
XML Formatting Objects to PDF. It is part of the Apache XML project
|
||||
and is designed to work with Xalan.</para>
|
||||
|
||||
</sect1>
|
||||
<sect1><title>Migration tools</title>
|
||||
|
||||
<para>The second biggest problem with DocBook is the effort needed to
|
||||
convert old-style presentation markup to DocBook markup. Human beings
|
||||
can usually parse the presentation of a document into logical
|
||||
structure automatically, because (for example) they can tell from
|
||||
context when an italic font means `emphasis' and when it means
|
||||
something else such as `this is a foreign phrase'.</para>
|
||||
|
||||
<para>Somehow, in converting documents to DocBook, those
|
||||
sorts of distinctions need to be made explicit. Sometimes
|
||||
they're present in the old markup; often they are not, and the
|
||||
missing structural information has to be either deduced by
|
||||
clever heuristics or added by a human.</para>
|
||||
|
||||
<para>Here is a summary of the state of conversion tools from
|
||||
various other formats:</para>
|
||||
|
||||
<variablelist>
|
||||
<varlistentry>
|
||||
<term>GNU Texinfo</term>
|
||||
<listitem>
|
||||
<para>The Free Software Foundation has made a policy decision to move
|
||||
towards DocBook and away from Texinfo, its traditional format.
|
||||
Texinfo has enough structure to make reasonably good automatic
|
||||
conversion possible, and the 4.x versions of <command>makeinfo</command>
|
||||
feature a <option>--docbook</option> switch that generates DocBook.
|
||||
More at the <ulink url="http://www.gnu.org/directory/texinfo.html">makeinfo
|
||||
project page</ulink>.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>POD</term>
|
||||
<listitem>
|
||||
<para>There is a <ulink
|
||||
url="http://www.cpan.org/modules/by-module/Pod/">POD::DocBook</ulink>
|
||||
module that translates Plain Old Documentation markup to DocBook. It
|
||||
claims to support every POD tag except the L&lt;&gt; link tag.
|
||||
The man page also says "Nested =over/=back lists are not supported
|
||||
within DocBook." but notes that the module has been heavily
|
||||
tested.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>LaTeX</term>
|
||||
<listitem>
|
||||
<para>LaTeX is a (mostly) structural markup macro language built on
|
||||
top of the TeX formatter. There is a project called <ulink
|
||||
url="http://www.lrz-muenchen.de/services/software/sonstiges/tex4ht/mn.html">
|
||||
TeX4ht</ulink> that (according to the author of PassiveTeX) can
|
||||
generate DocBook from LaTeX.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>man pages and other troff-based markups</term>
|
||||
<listitem>
|
||||
<para>This is generally considered the biggest and nastiest conversion
|
||||
problem. And indeed, the basic
|
||||
<citerefentry><refentrytitle>troff</refentrytitle>
|
||||
<manvolnum>1</manvolnum></citerefentry> markup is at too low a presentation
|
||||
level for automatic conversion tools to do much of any good. However,
|
||||
the gloom in the picture lightens significantly if we consider
|
||||
translation from sources of documents written in macro packages like
|
||||
<citerefentry><refentrytitle>man</refentrytitle>
|
||||
<manvolnum>7</manvolnum></citerefentry>. These have enough structural
|
||||
features for automatic translation to get some traction.</para>
|
||||
|
||||
<para>I wrote a tool to do this myself, because I couldn't find
|
||||
anything else that did a half-decent job of it (and the problem is
|
||||
interesting). It's called <ulink
|
||||
url="http://www.tuxedo.org/~esr/doclifter/">doclifter</ulink>. It will
|
||||
translate to either SGML or XML DocBook from
|
||||
<citerefentry><refentrytitle>man</refentrytitle>
|
||||
<manvolnum>7</manvolnum></citerefentry>,
|
||||
<citerefentry><refentrytitle>mdoc</refentrytitle>
|
||||
<manvolnum>7</manvolnum></citerefentry>,
|
||||
<citerefentry><refentrytitle>ms</refentrytitle>
|
||||
<manvolnum>7</manvolnum></citerefentry>, or
|
||||
<citerefentry><refentrytitle>me</refentrytitle>
|
||||
<manvolnum>7</manvolnum></citerefentry> macros. See the documentation
|
||||
for details.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
|
||||
|
||||
</sect1>
|
||||
<sect1><title>Editing tools</title>
|
||||
|
||||
<para>One thing we presently do not have is a good open-source
|
||||
structure editor for SGML/XML documents.</para>
|
||||
|
||||
<para><ulink url="http://www.lyx.org/">LyX</ulink> is a GUI word processor
|
||||
that uses LaTeX for printing and supports structural editing of LaTeX
|
||||
markup. There is a LaTeX package that generates DocBook, and a
|
||||
<ulink url="http://bgu.chez.tiscali.fr/doc/db4lyx/">how-to document</ulink>
|
||||
describing how to write SGML and XML in the LyX GUI.</para>
|
||||
|
||||
<para><ulink url="http://idx-getox.idealx.org/">GeTox</ulink>, the
|
||||
GNOME XML Editor, aims at nontechnical users. But the software is
|
||||
still (as of August 2002) alpha, more a proof of concept than anything
|
||||
useful, and the project group seems not to be very active; there have
|
||||
been no updates of the website between May 2001 and August 2002 (time of
|
||||
writing).</para>
|
||||
|
||||
<para><ulink
|
||||
url="http://www.math.u-psud.fr/~anh/TeXmacs/TeXmacs.html"> GNU
|
||||
TeXmacs</ulink> is a project aimed at producing an editor that is good
|
||||
for technical and mathematical material, including displayed formulas.
|
||||
1.0 was released in April 2002. The developers plan XML support in
|
||||
the future, but it's not there yet.</para>
|
||||
|
||||
<para><ulink url="http://www.freesoftware.fsf.org/thotbook/">ThotBook</ulink>
|
||||
is a project to put together a GUI editor for DocBook based on
|
||||
the Thot toolkit. It may be moribund; the web page was not updated
|
||||
from November 2001 to August 2002 (time of writing).</para>
|
||||
|
||||
<para>Most people still hack the tags by hand using either vi or Emacs, using
|
||||
psgml to validate the results.</para>
|
||||
|
||||
</sect1>
|
||||
<sect1><title>Related standards and practices</title>
|
||||
|
||||
<para>The tools are coming together, if slowly, to edit and format
|
||||
DocBook markup. But DocBook itself is a means, not an end. We'll need
|
||||
other standards besides DocBook itself to accomplish the
|
||||
searchable-documentation-database objective I laid out at the
|
||||
beginning of this document. There are two big issues: document
|
||||
cataloguing and metadata.</para>
|
||||
|
||||
<para>The <ulink
|
||||
url="http://scrollkeeper.sourceforge.net/">Scrollkeeper</ulink>
|
||||
project aims directly to meet this need. It provides a simple set of
|
||||
script hooks that can be used by package install and uninstall
|
||||
productions to register and unregister their documentation.</para>
|
||||
|
||||
<para>Scrollkeeper uses the <ulink
|
||||
url="http://www.ibiblio.org/osrt/omf/"> Open Metadata Format</ulink>.
|
||||
This is a standard for indexing open-source documentation analogous to
|
||||
a library card-catalog system. The idea is to support rich search
|
||||
facilities that use the card-catalog metadata as well as the source
|
||||
text of the documentation itself.</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1><title>SGML and SGML-Tools</title>
|
||||
|
||||
<para>In previous sections, I have thrown away a lot of DocBook's
|
||||
history. XML has an older brother,
|
||||
<indexterm><primary>SGML</primary></indexterm> or Standard Generalized
|
||||
Markup Language.</para>
|
||||
|
||||
<para>Until mid-2002, no discussion of DocBook would have been
|
||||
complete without a long excursion into SGML, the differences between
|
||||
SGML and XML, and detailed descriptions of the SGML DocBook toolchain.
|
||||
Life can be simpler now; an XML DocBook toolchain is available in open
|
||||
source, works as well as the SGML toolchain ever did, and is easier to
|
||||
use. If you don't think you'll ever have to deal with old SGML DocBook
|
||||
documents, you can skip the remainder of this section.</para>
|
||||
|
||||
<sect2><title>DocBook SGML</title>
|
||||
|
||||
<para>DocBook was originally an SGML application, and there was an
|
||||
SGML-based DocBook toolchain that is now moribund. There are minor
|
||||
differences between the DocBook SGML DTD and the DocBook XML DTD, but
|
||||
for an introductory discussion we can ignore them. The only one that's
|
||||
normally user-visible is that in SGML contentless tags did not need to
|
||||
have a trailing slash added to them before the closing >.
|
||||
(Requiring the trailing / means XML parsers can be a lot simpler,
|
||||
because they don't have to know about the DTD to know which opening
|
||||
tags need closers.)</para>
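
<para>For example, here is the image tag from this document's own
source in its XML form, with the SGML spelling shown in a comment for
comparison:</para>

<programlisting><![CDATA[
<!-- SGML DocBook: the parser needs the DTD to know that imagedata
     never has content or a closing tag:
         <imagedata fileref="figure1.png" format="PNG">
-->

<!-- XML DocBook: the trailing slash says so directly -->
<imagedata fileref="figure1.png" format="PNG"/>
]]></programlisting>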
|
||||
|
||||
<para>Versions of HTML up to 4.01 (before XHTML) were SGML
|
||||
applications. TEI was originally an SGML application, too. The
|
||||
groups managing all three DTDs jumped to XML for the same reason
|
||||
DocBook's developers did — it's drastically simpler. SGML was
|
||||
extremely complex; unmanageably so, as it turns out. The
|
||||
specification was a dense 150 pages and it is not reliably reported
|
||||
that any software ever fully implemented it.</para>
|
||||
|
||||
<para>The toolchain diagram I gave earlier was simplified; it
|
||||
only showed the XML toolchain. Here is the historically
|
||||
correct version:</para>
|
||||
|
||||
<mediaobject>
|
||||
<imageobject><imagedata fileref="figure4.png" format="PNG"/></imageobject>
|
||||
</mediaobject>
|
||||
|
||||
<para>The DSSSL toolchain is what processed DocBook SGML.
|
||||
Under it, a document goes from DocBook format through one of two
|
||||
closely-related stylesheet engines called Jade and OpenJade. These
|
||||
turn it into a TeX-macro markup, which is processed by a package called
|
||||
JadeTeX, into DVIs, which then get turned into Postscript.</para>
|
||||
</sect2>
|
||||
|
||||
<sect2><title>Why SGML DocBook is dead</title>
|
||||
|
||||
<para>The DSSSL toolchain is, as far as new development goes,
|
||||
effectively dead. The XSLT toolchain has just reached production
|
||||
status as I write in August 2002; a working version shipped in Red Hat
|
||||
7.3. It's where DocBook developers are putting almost all of their
|
||||
effort.</para>
|
||||
|
||||
<para>The reason for the change to XML was threefold. First,
|
||||
SGML turned out to be too complicated to use; then, DSSSL turned out
|
||||
to be too complicated to live with; then, significant parts of the
|
||||
DSSSL toolchain turned out to be weak and irredeemably messy.</para>
|
||||
|
||||
<para>Relative to SGML, XML has a reduced feature set that is
sufficient for almost all purposes but much easier to understand and
build parsers for. SGML-processing tools (such as validating parsers) have
to carry around support for a lot of features that DocBook and other
text markup systems never actually used. Removing these features
made XML simpler and XML-processing tools faster.</para>
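
<para>As a rough illustration of the kind of feature XML dropped,
consider SGML tag minimization. The fragment below is a sketch only:
the element names <literal>list</literal> and <literal>item</literal>
and the <literal>type</literal> attribute are hypothetical, not taken
from DocBook, and the SGML form assumes a DTD and SGML declaration
that permit omitted end tags and minimized attributes.</para>

<programlisting><![CDATA[
<!-- SGML with tag omission and short attributes enabled -->
<list compact>
<item>First entry
<item>Second entry
</list>

<!-- The same structure in XML: every element closed,
     every attribute named and quoted -->
<list type="compact">
<item>First entry</item>
<item>Second entry</item>
</list>
]]></programlisting>

<para>An XML parser never has to consult the DTD to work out where an
element ends, which is a large part of why such parsers are simpler
to write.</para>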

<para>The language used to describe SGML DTDs is sufficiently spiky
and forbidding that composing SGML DTDs was something of a black art.
XML DTDs, on the other hand, can be described in a dialect of XML
itself; there does not need to be a separate DTD language. An XML
description of an XML DTD is called a
schema<indexterm><primary>schema</primary></indexterm>; the term DTD itself
will probably pass out of use as the standards for schemas firm
up.</para>
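
<para>To make the contrast concrete, here is a minimal sketch of one
content rule written both ways. The element names
<literal>note</literal> and <literal>line</literal> are hypothetical,
and W3C XML Schema is used only as one example of a schema
language.</para>

<programlisting><![CDATA[
<!-- DTD syntax: a note contains one or more lines of text -->
<!ELEMENT note (line+)>
<!ELEMENT line (#PCDATA)>

<!-- The same rule as a W3C XML Schema, which is itself an XML document -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="line" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
]]></programlisting>

<para>The schema form is longer, but it can be read, checked, and
transformed with the same tools used for any other XML
document.</para>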

<para>But mostly the DSSSL toolchain is dead because DSSSL itself, the
SGML stylesheet description language in that toolchain, proved just too
arcane for most human beings, and made stylesheets too difficult to
write and modify. (It was a dialect of Scheme. Your humble editor, a
LISP-head from way back, shakes his head in sad bemusement that
this should drive people away.)</para>

<para>XML fans like to sum up all these changes with "XML: tastes great, less
filling."</para>
</sect2>

<sect2><title>SGML-Tools</title>

<para>SGML-Tools was the name of a DTD used by the <ulink
url="http://www.linuxdoc.org">Linux Documentation Project</ulink>,
developed a few years ago when today's DocBook toolchains didn't exist.
SGML-Tools markup was simpler, but also much less flexible than
DocBook. The original SGML-Tools formatter/DTD/stylesheet(s)
toolchain has been dead for some time now, but a successor called <ulink
url="http://sourceforge.net/projects/sgmltools-lite/">SGML-tools
Lite</ulink> is still maintained.</para>

<para>The LDP has been phasing out SGML-Tools in favor of DocBook, but
it is still possible you might take over an old HOWTO. These can be
recognized by the identifying header "&lt;!doctype linuxdoc
system&gt;". If this happens to you, convert the thing to XML DocBook
and give the old version a quick burial.</para>
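
<para>The difference shows up in the first lines of the file. As a
sketch, an old LinuxDoc HOWTO opens with a declaration like this
one:</para>

<programlisting><![CDATA[
<!doctype linuxdoc system>
]]></programlisting>

<para>A converted HOWTO opens with an XML declaration and a DocBook
document type declaration instead. The DTD version and the
<literal>article</literal> root element shown here are simply the ones
this HOWTO itself uses; your document may need different ones.</para>

<programlisting><![CDATA[
<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
    "http://docbook.org/xml/4.1.2/docbookx.dtd">
]]></programlisting>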
</sect2>
</sect1>

<sect1><title>References</title>

<para>One of the things that makes learning DocBook difficult is that
the sites related to it tend to overwhelm the newbie with long lists
of W3C standards, massive exercises in SGML theology, and dense
thickets of abstract terminology. We're going to try to avoid that
here by giving you just a few selected references to look at.</para>

<para>Michael Smith's <ulink
url="http://xml.oreilly.com/news/dontlearn_0701.html">
Take My Advice: Don't Learn XML</ulink> surveys the XML world from
an angle similar to this document.</para>

<para>Norman Walsh's <citetitle>DocBook: The Definitive
Guide</citetitle> is available <ulink
url="http://www.oreilly.com/catalog/docbook/">in print</ulink> and
<ulink url="http://www.docbook.org/tdg/en/html/docbook.html">on the
web</ulink>. This is indeed the definitive reference, but as an
introduction or tutorial it's a disaster. Instead, read this:</para>

<para><ulink url="http://www.bureau-cornavin.com/opensource/crash-course/index.html">Writing
Documentation Using DocBook: A Crash Course</ulink>. This is an excellent
tutorial.</para>

<para>If you're writing for the Linux Documentation Project, read the
<ulink url="http://www.linuxdoc.org/LDP/LDP-Author-Guide/index.html">
LDP Author Guide</ulink>.</para>

<para>The best general introduction to SGML and XML that I've
personally read all the way through is David Megginson's <ulink
url="http://vig.pearsoned.com/store/product/0,,store-562_banner-0_isbn-0136422993,00.html">Structuring
XML Documents</ulink> (Prentice-Hall, ISBN: 0-13-642299-3).</para>

<para>For XML only, <ulink
url="http://www.oreilly.com/catalog/xmlnut2/">XML In A Nutshell</ulink>
by W. Scott Means and Elliotte "Rusty" Harold is very good.</para>

<para><ulink url="http://www.ibiblio.org/xml/books/bible/">The XML
Bible</ulink> looks like a pretty comprehensive reference on XML and
related standards (including Formatting Objects).</para>

<para>Finally, <ulink url="http://xml.coverpages.org/">The XML
Cover Pages</ulink> will take you into the jungle of XML standards
if you really want to go there.</para>

</sect1>
</article>

<!-- Keep this comment at the end of the file
Local variables:
mode: sgml
sgml-omittag:t
sgml-shorttag:t
sgml-namecase-general:t
sgml-general-insert-case:lower
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:nil
sgml-parent-document:nil
sgml-exposed-tags:nil
sgml-local-catalogs:nil
sgml-local-ecat-files:nil
End:
-->