The LDP output tree has an inconsistent structure for handling documents of
different types. With the new processing tools, we will be able to produce
a single output directory for each source document. This script allows
the creation of links in the historical LDP output directory to the new
automatically created (and updated) directory.
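The linking step described above might look roughly like the following. This is a minimal sketch, not the script itself: the function name `link_legacy_path` and the directory layout in the usage note are hypothetical, and it assumes a POSIX filesystem where symbolic links are permitted.

```python
import os
from pathlib import Path


def link_legacy_path(legacy_path: Path, new_dir: Path) -> Path:
    """Create a relative symlink at legacy_path that points to new_dir."""
    legacy_path.parent.mkdir(parents=True, exist_ok=True)
    # A relative target keeps the link valid if the whole tree is moved.
    target = os.path.relpath(new_dir, start=legacy_path.parent)
    if legacy_path.is_symlink():
        legacy_path.unlink()  # refresh a stale link from an earlier run
    os.symlink(target, legacy_path, target_is_directory=True)
    return legacy_path
```

For example, `link_legacy_path(Path("out/HOWTO/Example-HOWTO"), Path("out/Example-HOWTO"))` would leave a link at the old location. If a real directory already exists at the legacy path, `os.symlink` raises `FileExistsError` rather than overwriting it.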
This document used an endterm in the wrong place and its presence was
confusing the heck out of the FO processor:
<xref endterm="partitiontable" linkend="partitiontable" />
The endterm="" attribute tells the DocBook processor to copy the content found
at the linkend inline. The problem is that the content at this particular
linkend was an entire table. This meant that the FO processor was receiving
an entire (DocBook) table (as FO, of course) and futilely trying to render it
inline.
Removing the endterm="partitiontable" allows the document to be processed by
FOP into a PDF.
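A check for this kind of misplaced endterm could be sketched as below. It assumes the document is well-formed XML that the standard-library parser can read without a DTD. The set `BLOCK_TARGETS` and the function name are illustrative choices, not part of any DocBook tool.

```python
import xml.etree.ElementTree as ET

# Assumption: block-level elements that cannot sensibly be rendered inline.
BLOCK_TARGETS = {"table", "informaltable", "figure"}


def drop_block_endterms(root: ET.Element) -> int:
    """Remove endterm from <xref> elements whose endterm names a block element."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    removed = 0
    for xref in root.iter("xref"):
        target = by_id.get(xref.get("endterm"))
        if target is not None and target.tag in BLOCK_TARGETS:
            del xref.attrib["endterm"]  # linkend is left in place
            removed += 1
    return removed
```

The function returns the number of attributes removed, so a caller can tell whether the document needed the fix at all.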
Author contacted, responded quickly, provided the missing old files
and confirmed that it was acceptable to comment out the reference to
../openMosix-2.6-HOWTO/openMosix-2.6-HOWTO-content.sgml
Source available: https://github.com/KrisBuytaert/openmosix-howto
Generating PDF outputs today requires using <mediaobject> and supplying a
file that the TeX engine can convert into a print-consumable format.
I added .eps files (thank you, ImageMagick) to allow regeneration and also
added a file called 'image-missing' for the peculiar absence of a file
called OREILLY.BIND.DIAGRAM.
The comment below is reproduced in doc-index.sgml, which
was generated using collateindex.pl and then hand-tuned into
valid DocBook SGML (version 3.1).
The stock collateindex.pl (md5=2e36626ed6709e5ba0e6af0999bc3102, in
dsssl-stylesheets-1.79) program makes a few mistakes in generating
the complex index below.
It creates a nesting problem for <secondaryie/> elements which contain
both a <ulink/> and have a subsequent <seeie/>. In that case, it
orphans the <ulink/> elements beneath the <seeie/> AFTER closing the
<secondaryie/>. I can't figure out a patch to collateindex.pl, so
hand-fixed the 5 entries and am committing this as is for TLDP.
The desired output formats can be tweaked by setting the value of the
parameter entities %output.print.png% (and friends) to "INCLUDE".
The document still needed a few corrections, namely the removal of two
extraneous </listitem> elements and a few <application/> and <acronym/>
elements that were in illegal locations (for example as a child of the
<contrib> element).
The qandaset had a defaultlabel="none" attribute. This is DocBook legal, but
the XSLT layer was producing FO output that included an empty
fo:list-item-label, with the following error message:
"fo:list-item-label" is missing child elements. Required content model:
marker* (%block;)+
By omitting the defaultlabel="none", the entire problem disappears.
Also adjusted the paths that refer to ./Annimals/, which is now at
./resources/Annimals/.
The images were supplied here as a tarball, which meant that the DocBook
processor could not read them directly out of the version control system;
added directories for ./images/ and ./resources/ (which contains the
Annimals subdirectory and one chap4sec26 file).
Using the HTTP variant of the system identifier; let the local DocBook
installation map that system identifier to the local filesystem for us.
Replacing two literal < with &lt;.
The Template-Big-HOWTO.sgml contained references to images that were not
present in the VCS. I located green.gif and red.gif in the ancient Linux
Gazette materials and added them here, along with a few .eps files for print
outputs.
The markup in this document made plenty of references to elements that
post-date the DocBook 3.0 specification (e.g. <mediaobject/>).
Fortunately, with one or two minor corrections to the nesting of elements, the
newer revision of DocBook can validate the document.
first, xsltproc (and friends) did not like the duplication of id="A" in both
the gloss.xml and index-gloss.xml; so renaming the IDs solved that problem
second, fop complained that empty gloss entries existed; no problem after
commenting them out
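Duplicate IDs of the kind described above can be found before xsltproc complains. The sketch below assumes each file parses as standalone XML. The function name is hypothetical and the sample strings in the usage note are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def duplicate_ids(documents: dict[str, str]) -> dict[str, list[str]]:
    """Map each id used more than once to the names of the documents that use it."""
    seen = defaultdict(list)
    for name, text in documents.items():
        for element in ET.fromstring(text).iter():
            if "id" in element.attrib:
                seen[element.attrib["id"]].append(name)
    return {key: names for key, names in seen.items() if len(names) > 1}
```

Called with the text of gloss.xml and index-gloss.xml, a result such as `{"A": ["gloss.xml", "index-gloss.xml"]}` would point at the IDs that need renaming.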
find the Makefile in the same directory as this script itself
run entire script under "set -e" to stop on errors
(and invert usage of "test" logic so as not to trigger an error)
compare the stems of all documents in the source tree and output tree
and report on which documents are in source, but not output, as
well as orphaned documents in output tree
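The stem comparison can be sketched as a pair of set differences. This is an illustration under assumptions, not the script's actual logic: it assumes source documents end in .sgml or .xml, that the output tree holds one directory per document, and the name `compare_stems` is hypothetical.

```python
from pathlib import Path

SOURCE_SUFFIXES = {".sgml", ".xml"}  # assumption: source formats of interest


def compare_stems(source_root: Path, output_root: Path) -> tuple[set[str], set[str]]:
    """Return (stems in source but not output, stems orphaned in output)."""
    source = {
        path.stem
        for path in source_root.rglob("*")
        if path.is_file() and path.suffix in SOURCE_SUFFIXES
    }
    output = {path.name for path in output_root.iterdir() if path.is_dir()}
    return source - output, output - source
```

The first set lists documents that still need to be built, and the second lists output directories with no matching source.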
I noticed when watching logs go scrolling by, that the Makefile
appeared to include several lines with #011 and some without; bad
editing on my part left some lines indented with spaces instead of
tabs. Repaired.
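The #011 in the logs is the octal escape for a tab character, so lines without it were the space-indented ones. A quick way to spot such lines is sketched below. The function name is hypothetical, and the check is deliberately crude: it flags every non-blank line that starts with a space, including the few places where a Makefile legitimately allows that.

```python
def space_indented_lines(makefile_text: str) -> list[int]:
    """Return 1-based numbers of non-blank lines that begin with a space."""
    return [
        number
        for number, line in enumerate(makefile_text.splitlines(), start=1)
        if line.startswith(" ") and line.strip()
    ]
```

Each reported line number is a candidate for re-indenting with a tab.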
There was a bare ampersand in a <title/>, replaced with &amp;
Quite a few <programlisting/> elements also contained bare ampersands, so
the content of each <programlisting/> was wrapped in a <![CDATA[ ]]> section
One <parameter/> element lacked quotations on the value for its
class attribute. Corrected.
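For the bare ampersands mentioned above, a regular expression can do the escaping where a CDATA section is not wanted. The sketch below is an assumption-laden illustration: the pattern and function name are not from any existing tool, and the pattern does not skip text that is already inside a CDATA section.

```python
import re

# An ampersand that does not start a named, decimal, or hexadecimal reference.
BARE_AMPERSAND = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text: str) -> str:
    """Replace each bare '&' with '&amp;', leaving existing references untouched."""
    return BARE_AMPERSAND.sub("&amp;", text)
```

Existing references such as `&lt;` or `&#38;` pass through unchanged, so running the function twice gives the same result as running it once.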
xsltproc and friends will happily generate the index in one
pass, obviating the need for the separate index.xml file, which
must be generated using the jade toolchain; thus commenting out
the (missing) external index and generating it on the fly;
N.B. there were only two </indexterm> entries in the document, but
at least they were present
The additional programs and content were stored in a directory called
ncurses_programs and were referred to in the document--the problem is that
only ./images/ were copied to the output tree, so HTML versions of the
document would fail in building AND the files would not be viewable.
Adjusted that by creating a ./resources/ directory along the same lines
as the ./images/ directory. This can be changed, if desired, but this
allows for automated publication of the document. (Side benefit: this is
generalizable to all other TLDP documents.)
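The publication step for these asset directories could be sketched as follows. The directory names come from the note above. The function name `copy_assets` is hypothetical, and the sketch assumes Python 3.8 or later for `dirs_exist_ok`.

```python
import shutil
from pathlib import Path

ASSET_DIRS = ("images", "resources")  # directory names taken from the note above


def copy_assets(source_dir: Path, output_dir: Path) -> list[str]:
    """Copy each asset directory that exists in source_dir into output_dir."""
    copied = []
    for name in ASSET_DIRS:
        src = source_dir / name
        if src.is_dir():
            # dirs_exist_ok lets a rebuild refresh an existing output tree.
            shutil.copytree(src, output_dir / name, dirs_exist_ok=True)
            copied.append(name)
    return copied
```

Documents that have neither directory are simply skipped, which is what makes the same step usable across all documents.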
When generating a book (or article) index, the filename is index.sgml. Not an
issue with DocBook XML, only with the older tools which make several passes
over the input sources to create the index data (output as SGML in a file
called index.sgml) and then incorporate that into the final document.
removing extraneous and empty <author/>, non-validating <toc/>
replacing an <ulink/> with a mailto: with an <email/> element
stuffing an &amp; in the url="" attribute value for a <ulink/>
document now validates
Though it is legal to define parameter entities in a DTD's internal
subset, you cannot reference them inside a markup declaration there; in
the internal subset a reference may stand only where a complete declaration
could. References within declarations are legal only in the external subset.
See this:
https://www.w3.org/TR/1998/REC-xml-19980210#wfc-PEinInternalSubset
Therefore, for the XML-RPC-HOWTO, the %GoodStyleSheets; definition is
left uncommented, but the suppression of the alternate definition of
legal.notice is commented out.
Document validates and processes correctly, now, using the legal.notice
definition intended. N.B., there is no actual difference in the license
specified, just the markup used to communicate it.
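The well-formedness constraint cited above can be seen with Python's standard-library (expat-based) parser. The document string below is a hypothetical reduction, not the XML-RPC-HOWTO source: the entity names `notice` and `legal` are invented, and the sketch shows only the rejected case of a parameter-entity reference inside another declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: %notice; is referenced inside another declaration
# within the internal subset, which the XML 1.0 constraint forbids.
PE_INSIDE_DECLARATION = """<?xml version="1.0"?>
<!DOCTYPE article [
  <!ENTITY % notice "GNU FDL">
  <!ENTITY legal "Licensed under the %notice;">
]>
<article>&legal;</article>
"""


def is_well_formed(text: str) -> bool:
    """Return True if the standard-library parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True
```

Writing the replacement text directly into the `legal` declaration, without the parameter-entity reference, gives a document that the same check accepts.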
The <emphasis/> tag cannot live inside a <systemitem/> or a <literal/>, but
it can certainly surround these elements; inverting to allow for validation
and processing.
first line of fdl.xml with XML text declaration was confusing xsltproc;
removed and things were fine, also removed commented out DOCTYPE declaration
used URI for DocBook 4.1.2 in system identifier in rpmupgrade.xml
commented out <xref endterm="xrefdemo" linkend="xrefdemo" /> which was causing
a recursion error in the DocBook XSLT layer
document now validates and builds (except for PDF)
files were declared with <?xml version="1.0" encoding="UTF-8"?>
but were definitely not UTF-8; corrected and added Unicode BOM
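A re-encoding step like the one described might be sketched as below. The note does not say what the original encoding was, so the ISO-8859-1 fallback is an assumption, and the function name is hypothetical.

```python
import codecs


def to_utf8_with_bom(raw: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return raw re-encoded as UTF-8 with a leading BOM."""
    try:
        text = raw.decode("utf-8-sig")  # already UTF-8, with or without a BOM
    except UnicodeDecodeError:
        text = raw.decode(fallback)  # assumption: legacy single-byte encoding
    return codecs.BOM_UTF8 + text.encode("utf-8")
```

Files that are already valid UTF-8 pass through with only the BOM added, so the conversion is safe to run over a whole directory.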
Linux-Networking.xml: fixed a doubly-closed </ulink>
Overview.xml: there were several large pasted sections of text which
contained characters understood to be markup; wrapped the entire
sections in <![CDATA[ ]]> blocks;
Protocols-Standards-Services.xml: closing </sect1> tags cannot have id="" on
them: removed these; wrapped several email addresses with <email/> to allow
validation; fixed tons of URLs with proper <ulink/> elements; wrapped a few
pasted sections in <![CDATA[ ]]> blocks;
jade processor was very unhappy to have closing tags on <imagedata>;
removing them (which felt like violence) allowed the document to be
validated and processed
In his document build system, each script is cooked (wrapped in <![CDATA[ ]]>)
during the build process. This is elegant, but we are not calling his build
Makefile, so we need to store the cooked scripts in the source tree.
jade did not like trying to process <graphic/> elements where it could not
find the source images; also, it wanted the filename extension included (did
early DocBook SGML processing tools simply locate an image of the preferred
type?)
Commented out an <inlinemediaobject/> which was using processing instructions
(IGNORE/INCLUDE) to control selection of several <imageobject/> children and
one <textobject/> child; jade got confused because there was no <imageobject/>
being selected, only a <textobject/> so refused to process the document
Also, cleaned up some stray </listitem> and missing <emphasis> elements
throughout the document. Squashed one stray </chapter> ending the
bibliography and removed an attribute (title=) that was forbidden on a
<bibliography/> element.
The DOCTYPE specification was missing the System Identifier
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"
Processing is still not validating properly, because the toolchain used to
generate this, before, created the index.xml file on a first pass, and then
the actual document on the second pass, in the style of SGML tools.
I would like to figure out how to support both with a processing instruction.
Georg Käfer had to become Georg K&auml;fer for SGML, anyway; the
problem was not in the generation of HTML, but rather PDF output
through the TeX engine