new

2002-09-23 20:53:14 +00:00 · 2002-09-23 20:53:14 +00:00 · 5de5d9e086
parent efcdecf7cf
commit 5de5d9e086
5 changed files with 738 additions and 0 deletions
--- a/LDP/howto/docbook/DocBook-Demystification-HOWTO/DocBook-Demystification-HOWTO.xml
+++ b/LDP/howto/docbook/DocBook-Demystification-HOWTO/DocBook-Demystification-HOWTO.xml
@ -0,0 +1,738 @@
+<?xml version="1.0"?>
+<!DOCTYPE article PUBLIC  "-//OASIS//DTD DocBook XML V4.1.2//EN"
+    "http://docbook.org/xml/4.1.2/docbookx.dtd" [
+<!ENTITY howto         "http://tldp.org/HOWTO/">
+<!ENTITY mini-howto    "http://tldp.org/HOWTO/mini/">
+]>
+
+<article>
+<articleinfo>
+  <title>DocBook Demystification HOWTO</title>
+
+  <author>
+     <firstname>Eric</firstname>
+     <surname>Raymond</surname>
+     <affiliation>
+        <address>
+           <email>esr@thyrsus.com</email>
+        </address>
+     </affiliation>
+  </author>
+
+  <revhistory>
+     <revision>
+	<revnumber>v1.0</revnumber>
+	<date>2001-09-20</date>
+	<authorinitials>esr</authorinitials>
+	 <revremark>
+	   Initial version.
+	</revremark>
+     </revision>
+  </revhistory>
+
+  <abstract><para>
+  This HOWTO attempts to clear the fog and mystery surrounding the
+  DocBook markup system and the tools that go with it.  It is aimed at
+  authors of technical documentation for open-source projects hosted
+  on Linux, but should be useful for people composing other kinds of
+  documentation on other Unixes as well.  
+  </para></abstract>
+
+</articleinfo>
+
+<sect1 id="intro"><title>Introduction</title>
+
+<para>A great many major open-source projects are converging on
+DocBook as a standard format for their documentation &mdash; projects
+including the Linux kernel, GNOME, KDE, Samba, and the Linux
+Documentation Project.  The advocates of XML-based "structural markup"
+(as opposed to the older style of "presentation markup" exemplified by
+troff, TeX, and Texinfo) seem to have won the theoretical
+battle.</para>
+
+<para>Nevertheless, a lot of confusion surrounds DocBook and the
+programs that support it.  Its devotees speak an argot that is dense
+and forbidding even by computer-science standards, slinging around
+acronyms that have no obvious relationship to the things you need to
+do to write markup and make HTML or Postscript from it.  XML standards
+and technical papers are notoriously obscure.  Most DocBook-related
+tools are very poorly documented, and their documentation is
+especially prone to assume way too much prior knowledge on the
+reader's part.</para>
+
+<para>This HOWTO will attempt to clear up the major mysteries
+surrounding DocBook and its application to open-source documentation
+&mdash; both the technical and political ones.  Our objective is to equip
+you to understand not just what you need to do to make documents, but
+why the process is as complex as it is &mdash; and how it can be
+expected to change as newer DocBook-related tools become
+available.</para>
+
+</sect1>
+<sect1><title>Why care about DocBook at all?</title>
+
+<para>There are two possibilities that make DocBook really
+interesting.  One is <emphasis>multi-mode rendering</emphasis> and the
+other is <emphasis>searchable documentation
+databases</emphasis>.</para>
+
+<para>Multi-mode rendering is the easier, nearer-term possibility; it's
+the ability to write a document in a single master format that can be
+rendered in many different display modes (in particular, as both HTML
+for on-line viewing and as Postscript for high-quality printed
+output).  This capability is pretty well implemented now.</para>
+
+<para><emphasis>Searchable documentation databases</emphasis> is
+shorthand for the possibility that DocBook might help get us to a
+world in which all the documentation on your open-source operating
+system is one rich, searchable, cross-indexed and hyperlinked
+database (rather than being scattered across several different formats
+in multiple locations as it is now).</para>
+
+<para>Ideally, whenever you install a software package on your machine 
+it would register its DocBook documentation into your system's
+catalog.  HTML, properly indexed and cross-linked to the HTML in the 
+rest of your catalog, would be generated.  The new package's
+documentation would then be available through your browser.  All
+your documentation would be searchable through an interface
+resembling a good Web search engine.</para>
+
+<para>HTML itself is not quite rich enough a format to get us to that
+world.  To name just one lack, you can't explicitly declare index
+entries in HTML.  DocBook <emphasis>does</emphasis> have the semantic
+richness to support structured documentation databases.  Fundamentally
+that's why so many projects are adopting it.</para>
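+
+<para>As a small sketch of what an explicit index entry looks like
+(the sentence and the index term are invented for illustration), a
+DocBook author can attach an entry to the exact spot in the text that
+it should point to:</para>
+
+<programlisting>
+&lt;para&gt;The stylesheet&lt;indexterm&gt;&lt;primary&gt;stylesheet&lt;/primary&gt;&lt;/indexterm&gt;
+controls the appearance of the final document.&lt;/para&gt;
+</programlisting>
+
+<para>The entry itself is not part of the running text; it only tells
+indexing tools that this location is about stylesheets.</para>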
+
+<para>DocBook has the vices that go with its virtues.  Some people
+find it unpleasantly heavyweight, and too verbose to be really
+comfortable as a composition format.  That's OK; as long as the markup
+tools they like (things like Perl POD or GNU Texinfo) can generate
+DocBook out of their back ends, we can all still get what we want.  It doesn't
+matter whether or not everybody writes in DocBook &mdash; as long as
+it becomes the common document interchange format that everyone uses,
+we'll still get unified searchable documentation databases.</para>
+
+</sect1>
+<sect1><title>Structural markup: a primer</title>
+
+<para>Older formatting languages like TeX, Texinfo, and troff
+supported <indexterm><primary>presentation
+markup</primary></indexterm>.  In these systems, the instructions you
+gave were about the appearance and physical layout of the text (font
+changes, indentation changes, that sort of thing).</para>
+
+<para>Presentation markup was adequate as long as your objective was
+to print to a single medium or type of display device.  You run into
+its limits, however, when you want to mark up a document so that (a)
+it can be formatted for very different display media (such as printing
+vs. Web display), or (b) you want to support searching and indexing the
+document by its logical structure (as you are likely to want to do,
+for example, if you are incorporating it into a hypertext system).</para>
+
+<para>To support these capabilities properly, you need a system of
+<indexterm><primary>structural markup</primary></indexterm>.  In structural
+markup, you describe not the physical appearance of the document but
+the logical properties of its parts.</para>
+
+<para>As an example: In a presentation-markup language, if you want to
+emphasize a word, you might instruct the formatter to set it in
+boldface.  In
+<citerefentry><refentrytitle>troff</refentrytitle><manvolnum>1</manvolnum></citerefentry>
+this would look like so:</para>
+
+<programlisting>
+All your base
+.B are
+belong to us!
+</programlisting>
+
+<para>In a structural-markup language, you would tell the formatter to
+emphasize the word:</para>
+
+<programlisting>
+All your base &lt;emphasis&gt;are&lt;/emphasis&gt; belong to us!
+</programlisting>
+
+<para>The "&lt;emphasis&gt;" and "&lt;/emphasis&gt;" in the line above
+are called <indexterm><primary>markup tags</primary></indexterm>, or
+just <firstterm>tags</firstterm> for short.  They are the instructions
+to your formatter.</para>
+
+<para>In a structural-markup language, the physical appearance of the
+final document would be controlled by a
+<indexterm><primary>stylesheet</primary></indexterm>.  It is the
+stylesheet that would tell the formatter "render emphasis as a font
+change to boldface".  One advantage of structural-markup languages
+is that by changing a stylesheet you can globally change the
+presentation of the document (to use different fonts, for example)
+without having to hack all the individual instances of (say)
+<markup>.B</markup> in the document itself.</para>
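+
+<para>To make this concrete, here is a minimal sketch of such a rule
+written in XSLT, one of the stylesheet languages discussed later in
+this document.  It is an illustration only, not an excerpt from the
+real DocBook stylesheets, and it assumes the output format is
+HTML:</para>
+
+<programlisting>
+&lt;?xml version="1.0"?&gt;
+&lt;xsl:stylesheet version="1.0"
+    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
+
+  &lt;!-- Render every emphasis element as boldface HTML --&gt;
+  &lt;xsl:template match="emphasis"&gt;
+    &lt;b&gt;&lt;xsl:apply-templates/&gt;&lt;/b&gt;
+  &lt;/xsl:template&gt;
+
+&lt;/xsl:stylesheet&gt;
+</programlisting>
+
+<para>Changing the one template (say, to emit italics instead) would
+change every emphasized word in the rendered document without
+touching the source.</para>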
+
+</sect1>
+<sect1><title>Document Type Definitions</title>
+
+<para>(Note: to keep the explanation simple, most of this
+section is going to tell some lies, mainly by omitting a lot of 
+history.  Truthfulness will be fully restored in a following
+section.)</para>
+
+<para>DocBook is a structural-level markup language.  Specifically, it
+is a dialect of XML.  A DocBook document is a hunk of XML that uses
+XML tags for structural markup.</para>
+
+<para>In order for a document formatter to apply a stylesheet to your
+document and make it look good, it needs to know things about the
+overall structure of your document.  For example, it needs to know
+that a book manuscript normally consists of front matter, a sequence
+of chapters, and back matter in order to physically format chapter
+headers properly.  In order for it to know this sort of thing, you
+need to give it a <indexterm><primary>Document Type
+Definition</primary></indexterm> or DTD. The DTD tells your
+formatter what sorts of elements can be in the document structure, and
+in what orders they can appear.</para>
+
+<para>What we mean by calling DocBook a dialect (or, more formally, an
+`application') of XML is actually that DocBook is a DTD &mdash; a
+rather large DTD, with somewhere around 400 tags in it.</para>
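+
+<para>A document names the DTD it follows in a document type
+declaration at its top.  As a rough sketch, a complete XML DocBook
+document can be as small as the following; the declaration is the one
+this HOWTO itself uses, while the title and text are made up for
+illustration:</para>
+
+<programlisting>
+&lt;?xml version="1.0"?&gt;
+&lt;!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
+    "http://docbook.org/xml/4.1.2/docbookx.dtd"&gt;
+&lt;article&gt;
+  &lt;title&gt;A Sample Article&lt;/title&gt;
+  &lt;sect1&gt;&lt;title&gt;Introduction&lt;/title&gt;
+    &lt;para&gt;All your base &lt;emphasis&gt;are&lt;/emphasis&gt; belong to us!&lt;/para&gt;
+  &lt;/sect1&gt;
+&lt;/article&gt;
+</programlisting>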
+
+<para>Lurking behind DocBook is a kind of program called a
+<indexterm><primary>validating parser</primary></indexterm>.  When you
+format a DocBook document, the first step is to pass it through a
+validating parser (the front end of the DocBook formatter).  This
+program checks your document against the DocBook DTD to make sure you
+aren't breaking any of the DTD's structural rules (otherwise the back
+end of the formatter, the part that applies your style sheet, might
+become quite confused).</para>
+
+<para>The validating parser will either bomb out, giving you error
+messages about places where the document structure is broken, or translate
+the document into a stream of <firstterm>formatting events</firstterm>
+which the formatter's back end combines with the information in your
+stylesheet to produce formatted output.</para>
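+
+<para>As an illustration of the kind of structural error a validating
+parser catches, consider the invented fragment below.  It is
+well-formed XML, but the DocBook DTD requires a section's title to
+come before its body, so validation against the DTD would fail:</para>
+
+<programlisting>
+&lt;sect1&gt;
+  &lt;para&gt;This paragraph comes too early.&lt;/para&gt;
+  &lt;title&gt;Introduction&lt;/title&gt;
+&lt;/sect1&gt;
+</programlisting>
+
+<para>Moving the title element up to directly follow the opening
+sect1 tag makes the fragment valid again.</para>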
+
+<para>Here is a diagram of the whole process:</para>
+
+<mediaobject>
+<imageobject> <imagedata fileref="figure1.png" format="PNG"/></imageobject>
+</mediaobject>
+
+<para>The part of the diagram inside the dotted box is your formatting
+software, or <firstterm>toolchain</firstterm>. Besides the obvious and
+visible input to the formatter (the document source) you'll need to
+keep the two `hidden' inputs of the formatter (DTD and stylesheet) in
+mind to understand what follows.</para>
+</sect1>
+<sect1><title>Other DTDs</title>
+
+<para>A brief digression into other DTDs may help make clear what parts of
+the previous section were specific to DocBook and what parts are general to
+all structural-markup languages.</para>
+
+<para><ulink url="http://www.tei-c.org/">TEI</ulink> (Text Encoding
+Initiative) is a large, elaborate DTD used primarily in academia for
+computer transcription of literary texts.  TEI's Unix-based toolchains
+use many of the same tools that are involved with DocBook, but with
+different stylesheets and (of course) a different DTD.</para>
+
+<para>XHTML, the latest version of HTML, is also an XML application
+described by a DTD, which explains the family resemblance between
+XHTML and DocBook tags. The XHTML toolchain consists of web browsers
+and a number of ad-hoc HTML-to-print utilities.</para>
+
+<para>Many other XML DTDs are maintained to help people exchange
+structured information in fields as diverse as bioinformatics and
+banking.  You can look at a <ulink
+url="http://www.xml.com/pub/rg/DTD_Repositories"> list of
+repositories</ulink> to get some idea of the variety out
+there.</para>
+
+</sect1>
+<sect1><title>The DocBook toolchain</title>
+
+<para>Normally, what you'll do to make XHTML from your
+DocBook sources will look like this:</para>
+
+<screen>
+bash$ xmlto xhtml foo.xml
+Convert to XHTML
+bash$ ls *.html
+ar01s02.html ar01s03.html ar01s04.html index.html
+</screen>
+
+<para>In this example, you converted an XML DocBook document named 
+<filename>foo.xml</filename> into an index page and the three section
+pages shown in the listing.  Making one big page is just as easy:</para>
+
+<screen>
+bash$ xmlto xhtml-nochunks foo.xml
+Convert to XHTML
+bash$ ls *.html
+foo.html
+</screen>
+
+<para>Finally, here is how you make Postscript for printing:</para>
+
+<screen>
+bash$ xmlto ps foo.xml       # To make Postscript
+Convert to XSL-FO
+Making portrait pages on A4 paper (210mmx297mm)
+Post-process XSL-FO to DVI
+Post-process DVI to PS
+bash$ ls *.ps
+foo.ps
+</screen>
+
+<para>To turn your documents into HTML or Postscript, you need an
+engine that can apply the combination of DocBook DTD and 
+a suitable stylesheet to your document.  Here is how the 
+open-source tools for doing this fit together:</para>
+
+<mediaobject>
+<imageobject> <imagedata fileref="figure2.png" format="PNG"/></imageobject>
+</mediaobject>
+
+<para>Parsing your document and applying the stylesheet transformation
+will be handled by one of three programs.  The most likely one is
+<indexterm><primary>xsltproc</primary></indexterm>, the parser
+that ships with Red Hat 7.3.  The other possibilities are two Java
+programs, <indexterm><primary>Saxon</primary></indexterm> and
+<indexterm><primary>Xalan</primary></indexterm>.</para>
+
+<para>It is relatively easy to generate high-quality XHTML from
+DocBook; the fact that XHTML is simply another XML DTD helps a lot.
+Translation to XHTML is done by applying a rather simple stylesheet,
+and that's the end of the story.  RTF is also simple to generate in
+this way, and from XHTML or RTF it's easy to generate a flat ASCII
+text approximation in a pinch.</para>
+
+<para>The awkward case is print.  Generating high-quality printed
+output (which means, in practice, Adobe's
+<indexterm><primary>PDF</primary></indexterm> (Portable Document
+Format)) is difficult.  Doing it right requires algorithmically
+duplicating the delicate judgments of a human typesetter moving from
+content to presentation level.</para>
+
+<para>So, first, a stylesheet translates DocBook's structural markup
+into another dialect of XML &mdash;
+<indexterm><primary>FO</primary></indexterm> (Formatting Objects).  FO
+markup is very much presentation-level; you can think of it as a sort
+of XML functional equivalent of troff.  It has to be translated to
+Postscript for packaging in a PDF.</para>
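+
+<para>To give a feel for how presentation-oriented FO is, here is a
+hand-written sketch of what the emphasized sentence from the earlier
+example might look like after translation.  The particular font
+properties are assumptions chosen for illustration, not the output of
+any real stylesheet:</para>
+
+<programlisting>
+&lt;fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
+          font-family="serif" font-size="10pt"&gt;
+  All your base &lt;fo:inline font-weight="bold"&gt;are&lt;/fo:inline&gt; belong to us!
+&lt;/fo:block&gt;
+</programlisting>
+
+<para>Notice that nothing here says "emphasis" any more; only the
+physical instruction to use a bold font survives.</para>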
+
+<para>In the toolchain shipped with Red Hat, this job is handled by a
+TeX macro package called
+<indexterm><primary>PassiveTeX</primary></indexterm>. It translates the
+formatting objects generated by <command>xsltproc</command> into
+Donald Knuth's TeX language.  TeX was one of the earliest open-source
+projects, an old but powerful presentation-level formatting language
+much beloved of mathematicians (to whom it provides particularly
+elaborate facilities for describing mathematical notation).  TeX is
+also famously good at basic typesetting tasks like kerning, line
+filling, and hyphenating.  TeX's output, in what's called
+<indexterm><primary>DVI</primary></indexterm> (DeVice Independent)
+format, is then massaged into PDF.</para>
+
+<para>If you think this bucket chain of XML to TeX macros to DVI to
+PDF sounds like an awkward kludge, you're right.  It clanks, it
+wheezes, and it has ugly warts.  Fonts are a significant problem,
+since XML and TeX and PDF have very different models of how fonts
+work; also, handling internationalization and localization is a
+nightmare. About the only thing this code path has going for it is
+that it works.</para>
+
+<para>The elegant way will be
+<indexterm><primary>FOP</primary></indexterm>, a direct
+FO-to-Postscript translator being developed by the Apache project.
+With FOP, the internationalization problem is, if not solved, at least
+well confined; XML tools handle Unicode all the way through to FOP.
+Glyph to font mapping is also strictly FOP's problem.  The only
+trouble with this approach is that it doesn't work &mdash; yet.  As of
+August 2002 FOP is in an unfinished alpha state &mdash; usable, but
+with rough edges and missing features.</para>
+
+<para>Here is what the FOP toolchain looks like:</para>
+
+<mediaobject>
+<imageobject> <imagedata fileref="figure3.png" format="PNG"/></imageobject>
+</mediaobject>
+
+<para>FOP has competition.  There is another project called
+<indexterm><primary>xsl-fo-proc</primary></indexterm> which aims to do
+the same things as FOP, but in C++ (and therefore both faster than
+Java and not relying on the Java environment).  As of August 2002
+xsl-fo-proc is in an unfinished alpha state, not as far along as FOP.</para>
+
+</sect1>
+<sect1><title>Who are the projects and the players?</title>
+
+<para>The DocBook DTD itself is maintained by the DocBook Technical
+Committee, headed by Norman Walsh.  Norm is the inventor of DocBook, a
+man who has focused remarkable energy and talent over many years on
+the extremely complex problems it addresses.  He is as universally
+respected in the DocBook/SGML/XML community as Linus Torvalds is in
+the Linux world.</para>
+
+<para>The <ulink url="http://sources.redhat.com/docbook-tools/">
+docbook-tools</ulink> project provides open-source tools for
+converting SGML DocBook to HTML, Postscript, and other formats.  This
+package is shipped with Red Hat and other Linux distributions.  It is
+maintained by Mark Galassi.</para>
+
+<para><ulink url="http://www.jclark.com/jade/">Jade</ulink> is an
+engine used to apply DSSSL stylesheets to SGML documents.  It is
+maintained by James Clark.</para>
+
+<para><ulink url="http://openjade.sourceforge.net/">OpenJade</ulink>
+is a community project undertaken because the founders thought James
+Clark's maintenance of Jade was spotty. The docbook-tools programs
+use OpenJade.</para>
+
+<para><ulink url="http://xmlsoft.org/XSLT/">libxslt</ulink> is a C
+library that interprets XSLT, applying stylesheets to XML documents.
+It includes a wrapper program, <command>xsltproc</command>, that can be
+used as an XML formatter.  The code was written by Daniel Veillard
+under the auspices of the GNOME project, but does not require any
+GNOME code to run.  I hear it's blazingly fast compared to the 
+Java alternatives, not a surprising claim.</para>
+
+<para><ulink url="http://cyberelk.net/tim/xmlto/">xmlto</ulink> is the
+user interface of the XML toolchain that Red Hat ships.  It's written
+and maintained by Tim Waugh.</para>
+
+<para><ulink url="http://users.iclway.co.uk/mhkay/saxon/">Saxon</ulink>
+and <ulink url="http://xml.apache.org/xalan-j/">Xalan</ulink> are Java
+programs that interpret XSLT.  Saxon seems to be designed to work
+under Windows.  Xalan is part of the XML Apache project and native to
+Linux and BSD; it's designed to work with FOP.</para>
+
+<para><ulink url="http://jadetex.sourceforge.net/">JadeTeX</ulink> is
+the package of LaTeX macros that OpenJade uses for producing DVI.  
+<ulink url="http://users.ox.ac.uk/~rahtz/passivetex/">PassiveTeX</ulink>
+performs a similar function on the XML side.</para>
+
+<para><ulink url="http://xml.apache.org/fop/">FOP</ulink> translates
+XML Formatting Objects to PDF.  It is part of the Apache XML project
+and is designed to work with Xalan.</para>
+
+</sect1>
+<sect1><title>Migration tools</title>
+
+<para>The second biggest problem with DocBook is the effort needed to
+convert old-style presentation markup to DocBook markup.  Human beings
+can usually parse the presentation of a document into logical
+structure automatically, because (for example) they can tell from 
+context when an italic font means `emphasis' and when it means
+something else such as `this is a foreign phrase'.</para>
+
+<para>Somehow, in converting documents to DocBook, those
+sorts of distinctions need to be made explicit.  Sometimes
+they're present in the old markup; often they are not, and the
+missing  structural information has to be either deduced by 
+clever heuristics or added by a human.</para>
+
+<para>Here is a summary of the state of conversion tools from
+various other formats:</para>
+
+<variablelist>
+<varlistentry>
+<term>GNU Texinfo</term>
+<listitem>
+<para>The Free Software Foundation has made a policy decision to move 
+towards DocBook and away from Texinfo, its traditional format.
+Texinfo has enough structure to make reasonably good automatic
+conversion possible, and the 4.x versions of <command>makeinfo</command>
+feature a <option>--docbook</option> switch that generates DocBook.
+More at the <ulink url="http://www.gnu.org/directory/texinfo.html">makeinfo
+project page</ulink>.</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>POD</term>
+<listitem>
+<para>There is a <ulink
+url="http://www.cpan.org/modules/by-module/Pod/">Pod::DocBook</ulink>
+module that translates Plain Old Documentation markup to DocBook.  It
+claims to support every POD tag except the L&lt;&gt; link tag.
+The man page also says "Nested =over/=back lists are not supported
+within DocBook." but notes that the module has been heavily
+tested.</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>LaTeX</term>
+<listitem>
+<para>LaTeX is a (mostly) structural markup macro language built on
+top of the TeX formatter.  There is a project called <ulink
+url="http://www.lrz-muenchen.de/services/software/sonstiges/tex4ht/mn.html">
+TeX4ht</ulink> that (according to the author of PassiveTeX) can
+generate DocBook from LaTeX.</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>man pages and other troff-based markups</term>
+<listitem>
+<para>This is generally considered the biggest and nastiest conversion
+problem.  And indeed, the basic
+<citerefentry><refentrytitle>troff</refentrytitle>
+<manvolnum>1</manvolnum></citerefentry> markup is at too low a presentation
+level for automatic conversion tools to do much of any good.  However,
+the gloom in the picture lightens significantly if we consider
+translation from sources of documents written in macro packages like
+<citerefentry><refentrytitle>man</refentrytitle>
+<manvolnum>7</manvolnum></citerefentry>.  These have enough structural
+features for automatic translation to get some traction.</para>
+
+<para>I wrote a tool to do this myself, because I couldn't find
+anything else that did a half-decent job of it (and the problem is
+interesting).  It's called <ulink
+url="http://www.tuxedo.org/~esr/doclifter/">doclifter</ulink>.  It will
+translate to either SGML or XML DocBook from
+<citerefentry><refentrytitle>man</refentrytitle>
+<manvolnum>7</manvolnum></citerefentry>,
+<citerefentry><refentrytitle>mdoc</refentrytitle>
+<manvolnum>7</manvolnum></citerefentry>,
+<citerefentry><refentrytitle>ms</refentrytitle>
+<manvolnum>7</manvolnum></citerefentry>, or
+<citerefentry><refentrytitle>me</refentrytitle>
+<manvolnum>7</manvolnum></citerefentry> macros.  See the documentation
+for details.</para>
+</listitem>
+</varlistentry>
+</variablelist>
+
+</sect1>
+<sect1><title>Editing tools</title>
+
+<para>One thing we presently do not have is a good open-source
+structure editor for SGML/XML documents.</para>
+
+<para><ulink url="http://www.lyx.org/">LyX</ulink> is a GUI word processor
+that uses LaTeX for printing and supports structural editing of LaTeX
+markup.  There is a LaTeX package that generates DocBook, and a
+<ulink url="http://bgu.chez.tiscali.fr/doc/db4lyx/">how-to document</ulink>
+describing how to write SGML and XML in the LyX GUI.</para>
+
+<para><ulink url="http://idx-getox.idealx.org/">GeTox</ulink>, the
+GNOME XML Editor, aims at nontechnical users.  But the software is
+still (as of August 2002) alpha, more a proof of concept than anything
+useful, and the project group seems not to be very active; there have
+been no updates of the website between May 2001 and August 2002 (time of
+writing).</para>
+
+<para><ulink
+url="http://www.math.u-psud.fr/~anh/TeXmacs/TeXmacs.html"> GNU
+TeXMacs</ulink> is a project aimed at producing an editor that is good
+for technical and mathematical material, including displayed formulas.
+1.0 was released in April 2002.  The developers plan XML support in
+the future, but it's not there yet.</para>
+
+<para><ulink url="http://www.freesoftware.fsf.org/thotbook/">ThotBook</ulink>
+is a project to put together a GUI editor for DocBook based on
+the Thot toolkit.  It may be moribund; the web page was not updated
+from November 2001 to August 2002 (time of writing).</para>
+
+<para>Most people still hack the tags by hand using either vi or Emacs, using
+psgml to validate the results.</para>
+
+</sect1>
+<sect1><title>Related standards and practices</title>
+
+<para>The tools are coming together, if slowly, to edit and format
+DocBook markup. But DocBook itself is a means, not an end.  We'll need
+other standards besides DocBook itself to accomplish the
+searchable-documentation-database objective I laid out at the
+beginning of this document. There are two big issues: document
+cataloguing and metadata.</para>
+
+<para>The <ulink
+url="http://scrollkeeper.sourceforge.net/">Scrollkeeper</ulink>
+project aims directly to meet this need. It provides a simple set of
+script hooks that can be used by package install and uninstall
+productions to register and unregister their documentation.</para>
+
+<para>Scrollkeeper uses the <ulink
+url="http://www.ibiblio.org/osrt/omf/"> Open Metadata Format</ulink>.
+This is a standard for indexing open-source documentation analogous to
+a library card-catalog system.  The idea is to support rich search
+facilities that use the card-catalog metadata as well as the source 
+text of the documentation itself.</para>
+
+</sect1>
+
+<sect1><title>SGML and SGML-Tools</title>
+
+<para>In previous sections, I have thrown away a lot of DocBook's
+history.  XML has an older brother,
+<indexterm><primary>SGML</primary></indexterm> or Standard Generalized
+Markup Language.</para>
+
+<para>Until mid-2002, no discussion of DocBook would have been
+complete without a long excursion into SGML, the differences between
+SGML and XML, and detailed descriptions of the SGML DocBook toolchain.
+Life can be simpler now; an XML DocBook toolchain is available in open
+source, works as well as the SGML toolchain ever did, and is easier to
+use.  If you don't think you'll ever have to deal with old SGML DocBook
+documents, you can skip the remainder of this section.</para>
+
+<sect2><title>DocBook SGML</title>
+
+<para>DocBook was originally an SGML application, and there was an
+SGML-based DocBook toolchain that is now moribund.  There are minor
+differences between the DocBook SGML DTD and the DocBook XML DTD, but
+for an introductory discussion we can ignore them. The only one that's
+normally user-visible is that in SGML contentless tags did not need to
+have a trailing slash added to them before the closing &gt;.
+(Requiring the trailing / means XML parsers can be a lot simpler,
+because they don't have to know about the DTD to know which opening
+tags need closers.)</para>
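+
+<para>As a sketch of that difference, here is the same contentless
+tag (borrowed from the figure markup in this HOWTO) written both
+ways:</para>
+
+<programlisting>
+&lt;!-- SGML DocBook: only the DTD tells the parser imagedata is empty --&gt;
+&lt;imagedata fileref="figure1.png" format="PNG"&gt;
+
+&lt;!-- XML DocBook: the trailing slash marks the tag as empty --&gt;
+&lt;imagedata fileref="figure1.png" format="PNG"/&gt;
+</programlisting>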
+
+<para>Versions of HTML up to 4.01 (before XHTML) were SGML
+applications.  TEI was originally an SGML application, too.  The
+groups managing all three DTDs jumped to XML for the same reason
+DocBook's developers did &mdash; it's drastically simpler.  SGML was
+extremely complex; unmanageably so, as it turns out.  The
+specification was a dense 150 pages, and there is no reliable report
+that any software ever fully implemented it.</para>
+
+<para>The toolchain diagram I gave earlier was simplified; it
+only showed the XML toolchain.  Here is the historically
+correct version:</para>
+
+<mediaobject>
+<imageobject><imagedata fileref="figure4.png" format="PNG"/></imageobject>
+</mediaobject>
+
+<para>The DSSSL toolchain is what processed DocBook SGML.
+Under it, a document goes from DocBook format through one of two
+closely-related stylesheet engines called Jade and OpenJade.  These
+turn it into a TeX-macro markup, which is processed by a package called
+JadeTeX, into DVIs, which then get turned into Postscript.</para>
+</sect2>
+
+<sect2><title>Why SGML DocBook is dead</title>
+
+<para>The DSSSL toolchain is, as far as new development goes,
+effectively dead.  The XSLT toolchain has just reached production
+status as I write in August 2002; a working version shipped in Red Hat
+7.3.  It's where DocBook developers are putting almost all of their
+effort.</para>
+
+<para>The reason for the change to XML was threefold.  First,
+SGML turned out to be too complicated to use; then, DSSSL turned out
+to be too complicated to live with; then, significant parts of the
+DSSSL toolchain turned out to be weak and irredeemably messy.</para>
+
+<para>Relative to SGML, XML has a reduced feature set that is
+sufficient for almost all purposes but much easier to understand and
+build parsers for.  SGML-processing tools (such as validating parsers) have
+to carry around support for a lot of features that DocBook and other
+text markup systems never actually used.  Removing these features
+made XML simpler and XML-processing tools faster.</para>
+
+<para>The language used to describe SGML DTDs is sufficiently spiky
+and forbidding that composing SGML DTDs was something of a black art.
+XML DTDs, on the other hand, can be described in a dialect of XML
+itself; there does not need to be a separate DTD language. An XML
+description of an XML DTD is called a
+<indexterm><primary>schema</primary></indexterm>; the term DTD itself
+will probably pass out of use as the standards for schemas firm
+up.</para>
+
+<para>But mostly the DSSSL toolchain is dead because DSSSL itself, the
+SGML stylesheet description language in that toolchain, proved just too
+arcane for most human beings, and made stylesheets too difficult to
+write and modify. (It was a dialect of Scheme.  Your humble editor, a
+LISP-head from way back, shakes his head in sad bemusement that
+this should drive people away.)</para>
+
+<para>XML fans like to sum up all these changes with "XML: tastes great, less
+filling."</para>
+</sect2>
+
+<sect2><title>SGML-Tools</title>
+
+<para>SGML-Tools was the name of a DTD used by the <ulink
+url="http://www.linuxdoc.org">Linux Documentation Project</ulink>,
+developed a few years ago when today's DocBook toolchains didn't exist.
+SGML-Tools markup was simpler, but also much less flexible than
+DocBook.  The original SGML-Tools formatter/DTD/stylesheet(s)
+toolchain has been dead for some time now, but a successor called <ulink
+url="http://sourceforge.net/projects/sgmltools-lite/">SGML-tools
+Lite</ulink> is still maintained.</para>
+
+<para>The LDP has been phasing out SGML-Tools in favor of DocBook, but
+it is still possible you might take over an old HOWTO.  These can be
+recognized by the identifying header "&lt;!doctype linuxdoc
+system&gt;". If this happens to you, convert the thing to XML DocBook
+and give the old version a quick burial.</para>
+</sect2>
+</sect1>
+
+<sect1><title>References</title>
+
+<para>One of the things that makes learning DocBook difficult is that
+the sites related to it tend to overwhelm the newbie with long lists
+of W3C standards, massive exercises in SGML theology, and dense
+thickets of abstract terminology.  We're going to try to avoid that
+here by giving you just a few selected references to look at.</para>
+
+<para>Michael Smith's <ulink
+url="http://xml.oreilly.com/news/dontlearn_0701.html">
+Take My Advice: Don't Learn XML</ulink> surveys the XML world from
+an angle similar to this document.</para>
+
+<para>Norman Walsh's <citetitle>DocBook: The Definitive
+Guide</citetitle> is available <ulink
+url="http://www.oreilly.com/catalog/docbook/">in print</ulink> and
+<ulink url="http://www.docbook.org/tdg/en/html/docbook.html">on the
+web</ulink>.  This is indeed the definitive reference, but as an
+introduction or tutorial it's a disaster.  Instead, read this:</para>
+
+<para><ulink url="http://www.bureau-cornavin.com/opensource/crash-course/index.html">Writing 
+Documentation Using DocBook: A Crash Course</ulink>.  This is an excellent
+tutorial.</para>
+
+<para>If you're writing for the Linux Documentation Project, read the
+<ulink url="http://www.linuxdoc.org/LDP/LDP-Author-Guide/index.html">
+LDP Author Guide</ulink>.</para>
+
+<para>The best general introduction to SGML and XML that I've
+personally read all the way through is David Megginson's <ulink
+url="http://vig.pearsoned.com/store/product/0,,store-562_banner-0_isbn-0136422993,00.html">Structuring
+XML Documents</ulink> (Prentice-Hall, ISBN: 0-13-642299-3).</para>
+
+<para>For XML only, <ulink
+url="http://www.oreilly.com/catalog/xmlnut2/">XML In A Nutshell</ulink>
+by W. Scott Means and Elliotte "Rusty" Harold is very good.</para>
+
+<para><ulink url="http://www.ibiblio.org/xml/books/bible/">The XML
+Bible</ulink> looks like a pretty comprehensive reference on XML and
+related standards (including Formatting Objects).</para>
+
+<para>Finally, <ulink url="http://xml.coverpages.org/">The XML
+Cover Pages</ulink> will take you into the jungle of XML standards
+if you really want to go there.</para>
+
+</sect1>
+</article>
+
+<!-- Keep this comment at the end of the file
+Local variables:
+mode: sgml
+sgml-omittag:t
+sgml-shorttag:t
+sgml-namecase-general:t
+sgml-general-insert-case:lower
+sgml-minimize-attributes:nil
+sgml-always-quote-attributes:t
+sgml-indent-step:1
+sgml-indent-data:nil
+sgml-parent-document:nil
+sgml-exposed-tags:nil
+sgml-local-catalogs:nil
+sgml-local-ecat-files:nil
+End:
+-->
--- a/LDP/howto/docbook/DocBook-Demystification-HOWTO/figure1.png
+++ b/LDP/howto/docbook/DocBook-Demystification-HOWTO/figure1.png
--- a/LDP/howto/docbook/DocBook-Demystification-HOWTO/figure2.png
+++ b/LDP/howto/docbook/DocBook-Demystification-HOWTO/figure2.png
--- a/LDP/howto/docbook/DocBook-Demystification-HOWTO/figure3.png
+++ b/LDP/howto/docbook/DocBook-Demystification-HOWTO/figure3.png
--- a/LDP/howto/docbook/DocBook-Demystification-HOWTO/figure4.png
+++ b/LDP/howto/docbook/DocBook-Demystification-HOWTO/figure4.png