LDP/LDP/guide/docbook/LDP-Author-Guide/tools-validate.xml

491 lines
20 KiB
XML

<!--
<!DOCTYPE book PUBLIC '-//OASIS//DTD DocBook XML V4.2//EN'>
-->
<section id="tools-validate">
<title>Validation</title>
&why-validate;
<section id="validate-online">
<title>Validation for the Faint of Heart</title>
<para>
Your life is already hard enough without having to install a full
set of tools just to see if you validate as well. You can upload your
raw XML files to a web site, then go to
<ulink url="http://validate.sf.net">http://validate.sf.net</ulink>,
enter the URL to your document, then validate it.
</para>
<note><title>External entities</title><para>
When this information was added to the Author Guide external entities
were not supported. Follow the instructions provided on the
Validate site if you have trouble.
</para></note>
</section>
<section id="validate-local">
<title>Validation for the Not So Faint Of Heart</title>
<indexterm zone="validate-local">
<primary>configurations</primary>
</indexterm>
<section id="config-catalog">
<title>Catalogs</title>
<para>XML and SGML files contain most of the information you need;
however, there are sometimes entities which are specific to SGML in
general. To match these entities to their actual values you need to use a
<emphasis>catalog</emphasis>. The role of a catalog is to tell your
system where to find the files it is looking for. You may want to think of
a catalog as a guide book (or a map) for your tools.</para>
<para>
Most distributions (Red Hat/Fedora and Debian at least) have a common location
for the main SGML catalog file, called <filename>/etc/sgml/catalog</filename>.
In times past, it could also be found in <filename>/usr/lib/sgml/catalog</filename>.
</para>
<para>
The structure of XML catalog files is not the same as SGML catalog files.
The section on tailoring a catalog (see <xref
linkend="making-catalogs"/>) will give more details about what these
files actually contain.
</para>
<para>
If your system cannot find the catalog file, or you are using
custom catalog files, you may need to set the
<envar>SGML_CATALOG_FILES</envar> and
<envar>XML_CATALOG_FILES</envar> environment variables. Using
<command>echo <varname>$SGML_CATALOG_FILES</varname></command>,
check to see if it is currently set. If a blank line is returned,
the variable has not been set. Use the same command to see if
<envar>XML_CATALOG_FILES</envar> is set as well. If the variables
are not set, use the following example to set them now.
</para>
<example id="ex-catalog-files">
<title>Setting the SGML_CATALOG_FILES and XML_CATALOG_FILES
Environmental Variables</title>
<para>
<screen><prompt>bash$</prompt> <command>export</command> <envar>SGML_CATALOG_FILES=&quot;/etc/sgml/catalog&quot;</envar></screen></para>
<para>
To make this change permanent, you can add the following lines to
your <filename>~/.bashrc</filename> file.
</para>
<screen>
<envar>SGML_CATALOG_FILES=&quot;/etc/sgml/catalog&quot;</envar>
<command>export</command> <envar>SGML_CATALOG_FILES</envar>
</screen>
<para>
If you installed XML tools via a RedHat or Debian package, you
probably don't need to do this step. If you are using a custom XML catalog
you will definitely need to do this. There is more on custom catalogs in the next
section. To ensure my backup scripts grab this custom file, I have added
mine in a sub-directory of my home directory named <quote>docbook</quote>.
</para>
<para>
<screen><prompt>bash$</prompt> <command>export</command> <envar>XML_CATALOG_FILES=&quot;/home/user/docbook/db-catalog.xml&quot;</envar></screen></para>
<para>
You can also change your <filename>.bashrc</filename> if you want to
save these changes.</para>
<screen>
<envar>XML_CATALOG_FILES=&quot;/home/user/docbook/db-catalog.xml&quot;</envar>
<command>export</command> <envar>XML_CATALOG_FILES</envar>
</screen>
<para>
If you are adding the changes to your
<filename>.bashrc</filename> you will not see the changes
until you open a new terminal window. To make the changes immediate in the current terminal,
<quote>source</quote> the configuration file.
</para>
</example>
<indexterm zone="ex-catalog-files">
<primary>configurations</primary>
<secondary>variables</secondary>
<tertiary>SGML_CATALOG_FILES</tertiary>
</indexterm>
</section>
</section> <!-- config-catalog -->
<section id="making-catalogs">
<title>Creating and modifying catalogs</title>
<indexterm zone="making-catalogs">
<primary>catalog</primary>
<secondary>creating</secondary>
</indexterm>
<indexterm zone="making-catalogs">
<primary>catalog</primary>
<secondary>modifying</secondary>
</indexterm>
<para>
In the previous section I mentioned a catalog is like a guide book for
your tools. Specifically, a catalog maps the rules from the public
identifier to your system's files.
</para>
<para>
At the top of every DocBook (or indeed every XML) file there is a
DOCTYPE which tells the processing tool what kind of document it is
about to be processed. At a minimum this declaration will include a public
identifier, such as <literal>-//OASIS//DTD DocBook
V4.2//EN</literal>. This public identifier has a number of sections all
separated by <literal>//</literal>. It contains the following
information: ISO standard if any (<literal>-</literal> -- in this case
there is no ISO standard),
author (OASIS), type of document (DTD DocBook V4.2), language
(English). Your DOCTYPE may also include a URL.
</para>
<para>
A public identifier is useless to a processing tool, as it needs to be
able to access the actual DTD. A URL is useless if the processing tool
is off-line. To help your processor deal with these problems you can
download all of the necessary files and then <quote>map</quote>
them for your processing tools by using a catalog.
</para>
<para>
If you are using SGML processing tools (for instance Jade), you will need an
SGML catalog. If you are using XML processing tools (like XSLT), you
will need an XML catalog. Information on both is included.
</para>
<section id="catalog-sgml">
<title>SGML Catalogs</title>
<example id="example-catalog-sgml">
<title>Example of an SGML catalog</title>
<programlistingco>
<areaspec>
<area coords="1" id="ex.catalog.comment"/>
<area coords="5" id="ex.catalog.definition"/>
<area coords="11" id="ex.catalog.eof"/>
</areaspec>
<programlisting>
-- Catalog for the Conectiva Styles --
OVERRIDE YES
PUBLIC &quot;-//Conectiva SA//DTD DocBook Conectiva variant V1.0//EN&quot;
&quot;/home/ldp/styles/books.dtd&quot;
DELEGATE &quot;-//OASIS&quot;
&quot;/home/ldp/SGML/dtds/catalog.dtd&quot;
DOCTYPE BOOK /home/ldp/SGML/dtds/docbook/db31/docbook.dtd
-- EOF --
</programlisting>
<calloutlist>
<callout arearefs="ex.catalog.comment">
<para> Comment. Comments start with <quote>--</quote> and
follow to the end of the line. </para>
</callout>
<callout arearefs="ex.catalog.definition">
<para> The public type association <parameter
class="option">&quot;-//Conectiva SA//DTD books V1.0//EN&quot;</parameter>
with the file <filename>books.dtd</filename> on the directory <filename
class="directory">/home/ldp/styles</filename>. </para>
</callout>
<callout arearefs="ex.catalog.eof">
<para> Comment signifying the end of the file.</para>
</callout>
</calloutlist>
</programlistingco>
</example>
<para>As in the example above, to associate an identifier to a file just
follow the sequence shown:</para>
<orderedlist>
<listitem>
<para>Copy the identifier <emphasis>PUBLIC</emphasis></para>
</listitem>
<listitem>
<para>Type the identifying text </para>
</listitem>
<listitem>
<para>Indicate the path to the associated file</para>
</listitem>
</orderedlist>
<section id="making-catalogs-commands">
<title>Useful commands for catalogs</title>
<para>The most common mappings to be used in catalogs are:</para>
<variablelist>
<varlistentry>
<term><literal>PUBLIC</literal></term>
<listitem>
<para>The keyword <literal>PUBLIC</literal> maps
public identifiers for identifiers on the system.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><literal>SYSTEM</literal></term>
<listitem>
<para>The <literal>SYSTEM</literal> keyword maps
system identifiers for files on the system.</para>
<informalexample>
<para>
SYSTEM
&quot;http://nexus.conectiva/utilidades/publicacoes/livros.dtd&quot;
&quot;publicacoes/livros.dtd&quot;</para>
</informalexample>
</listitem>
</varlistentry>
<varlistentry>
<term><literal>SGMLDECL</literal></term>
<listitem>
<para>The keyword <literal>SGMLDECL</literal> designates the
system identifier of the SGML statement that should be used.
</para>
<informalexample>
<para>
SGMLDECL &quot;publishings/books.dcl&quot;</para>
</informalexample>
</listitem>
</varlistentry>
<varlistentry>
<term><literal>DTDDECL</literal></term>
<listitem>
<para>Similar to the <literal>SGMLDECL</literal> the
keyword <literal>DTDDECL</literal> identifies the SGML statement
that should be used. <literal>DTDDECL</literal> makes the
association of the statement with a public identifier to a
<acronym>DTD</acronym>. Unfortunately, this association isn't
supported by the open source tools available. The benefits of this
statement can be achieved somehow with multiple catalog files.
</para>
<informalexample>
<para>
DTDDECL &quot;-//Conectiva SA//DTD livros V1.0//EN&quot; &quot;publicacoes/livros.dcl&quot;</para>
</informalexample>
</listitem>
</varlistentry>
<varlistentry>
<term><literal>CATALOG</literal></term>
<listitem>
<para>The keyword <literal>CATALOG</literal> allows a catalog
to be included inside another. This is a way to make use of several
different catalogs without the need to alter them.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><literal>OVERRIDE</literal></term>
<listitem>
<para>The keyword <literal>OVERRIDE</literal> informs whether an
identifier has priority over a system identifier.
The standard on most systems is that the system identifier
has priority over the public one.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><literal>DELEGATE</literal></term>
<listitem>
<para>The keyword <literal>DELEGATE</literal> allows the
association of a catalog to a specific type of public identifier.
The clause <literal>DELEGATE</literal> is very similar to the
<literal>CATALOG</literal>, except for the fact that it doesn't do
anything until a specific pattern is specified.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><literal>DOCTYPE</literal></term>
<listitem>
<para>If a document starts with a type of document, but
has no public identifier and no system identifier the clause
<literal>DOCTYPE</literal> associates this document
with a specific DTD.</para>
</listitem>
</varlistentry>
</variablelist>
</section>
</section>
<section id="catalog-xml">
<title>XML Catalogs</title>
<para>The following sample catalog was provided by Martin A. Brown.</para>
<example id="ex-catalog-xml">
<title>Sample XML Catalog file</title>
<screen>
&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS/DTD Entity Resolution XML Catalog V1.0//EN"
"http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
<sgmltag class="starttag">catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"</sgmltag>
<sgmltag class="emptytag">public publicId="-//OASIS//DTD DocBook XML V4.2//EN"
uri="/home/mabrown/docbook/dtds/4.2/docbookx.dtd"</sgmltag>
<sgmltag class="emptytag">uri name="http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
uri="/home/mabrown/docbook/dtds/4.2/docbookx.dtd"</sgmltag>
<sgmltag class="emptytag">uri name="http://docbook.sourceforge.net/release/xsl/current/xhtml/docbook.xsl"
uri="/home/mabrown/docbook/xsl/xhtml/docbook.xsl"</sgmltag>
<sgmltag class="emptytag">uri name="http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl"
uri="/home/mabrown/docbook/xsl/xhtml/chunk.xsl"</sgmltag>
<sgmltag class="emptytag">uri name="http://docbook.sourceforge.net/release/xsl/current/xhtml/profile-chunk.xsl"
uri="/home/mabrown/docbook/xsl/xhtml/profile-chunk.xsl"</sgmltag>
<sgmltag class="endtag">catalog</sgmltag>
</screen>
</example>
</section>
</section> <!-- end of catalogs -->
<section id="validatexml">
<title>Validating XML</title>
<section id="validate-nsgmls">
<title>nsgmls</title>
<para>
You can use <application>nsgmls</application>, which is part of the
<application>jade</application> suite (on Debian apt-get the
<application>docbook-utils</application> package, see <xref linkend="docbook-utils" />), to validate SGML or XML documents.
</para>
<screen>
<prompt>bash$</prompt> <command>nsgmls <parameter>-s</parameter> <replaceable>HOWTO.xml</replaceable></command>
</screen>
<para> If there are no issues, you'll just get your command
prompt back. The <parameter>-s</parameter> tells
<application>nsgmls</application> to show only the errors.</para>
<tip><title>Function not found</title>
<para>
If you get errors about a function not being found, or
something about an ISO character not having an
authoritative source, you may
need to point <command>nsgmls</command> to your
<filename>xml.dcl</filename> file. For Red Hat 9, it
will look like this:
<command>nsgmls <parameter>-s</parameter>
<filename>/usr/share/sgml/xml.dcl</filename>
<replaceable>HOWTO.xml</replaceable></command>
</para>
</tip>
<para>
For more information on processing files with Jade/OpenJade please read
<ulink url="http://www.tldp.org/HOWTO/DocBook-OpenJade-SGML-XML-HOWTO/index.html">DocBook XML/SGML Processing Using OpenJade</ulink>.
</para>
</section> <!-- end nsgmls -->
<section id="validate-onsgmls">
<title>onsgmls</title>
<para>
This is an alternative to <application>nsgmls</application>. It ships
with the OpenJade package. This program gives more options than nsgmls
and allows you to quietly ignore a number of problems that arise while
trying to validate an XML file (as opposed to an SGML file). This also
means you don't have to type out the location of your
<filename>xml.dcl</filename> file each time.
</para>
<para>
I was able to simply use the following to validate a file with only
error messages that were related to my markup errors.
</para>
<screen>
<prompt>bash$ </prompt><command>onsgmls -s <replaceable>HOWTO.xml</replaceable></command>
</screen>
<para>
According to <ulink
url="http://lists.oasis-open.org/archives/docbook-apps/200305/msg00081.html">Bob
Stayton</ulink> you can also turn off specific error messages. The
following example turns off XML-specific error messages.
</para>
<screen>
<prompt>bash$ </prompt><command>onsgmls <parameter>-s</parameter> <parameter>-wxml</parameter> <parameter>-wno-explicit-sgml-decl</parameter> <replaceable>HOWTO.xml</replaceable></command>
</screen>
</section>
<section id="validate-xmllint">
<title>xmllint</title>
<para>
You can also use the <application>xmllint</application> command-line tool from the <application>libxml2</application> package to validate your documents. This tool does a simple check on completeness of tags and whether all tags that are opened, are also closed again.
By default
<application>xmllint</application> will output a results tree. So if your document comes out until the last line, you know there are no heavy errors having to do with tag mismatches, opening and closing errors and the like.</para>
<para>To prevent printing the entire document to your screen, add the <parameter>--noout</parameter>
parameter.
</para>
<screen>
<prompt>bash$</prompt> <command>xmllint <parameter>--noout</parameter> <replaceable>HOWTO.xml</replaceable></command>
</screen>
<para>
If nothing is returned, your document contains no syntax errors. Else, start with the first error that was reported. Fix that one error, and run the tool again on your document. If it still returns output, again fix the first error that you see, don't botter with the rest since further errors are usually generated because of the first one.</para>
<para>
If you would like to check your document for any errors which are
specific to your Document Type Definition, add
<parameter>--valid</parameter>.
</para>
<screen>
<prompt>bash$</prompt> <command>xmllint <parameter>--noout</parameter> <parameter>--valid</parameter> <replaceable>HOWTO.xml</replaceable></command>
</screen>
<para>The <command>xmllint</command> tool may also be used for checking errors in the XML catalogs, see the man pages for more info on how to set this behavior.</para>
<para>
If you are a Mac OSX or Windows user, you may also want to check out
<application>tkxmllint</application>, a GUI version of
<application>xmllint</application>. More information is available from:
<ulink url="http://tclxml.sourceforge.net/tkxmllint.html" />.
</para>
<example id="xmllint-example"><title>Debugging example using xmllint</title>
<para>The example below shows how you can use <application>xmllint</application> to check your documents. I've created some errors that I made a lot, as a beginning XML writer. At first, the document doesn't come through, and errors are shown:</para>
<screen>
<prompt>bash$</prompt> <command>xmllint <filename>ldp-history.xml</filename></command>
ldp-history.xml:22: error: Opening and ending tag mismatch: articlinfo line 6 and articleinfo
&lt;/articleinfo&gt;
^
ldp-history.xml:37: error: Opening and ending tag mismatch: listitem line 36 and orderedlist
&lt;/orderedlist&gt;
^
ldp-history.xml:39: error: Opening and ending tag mismatch: orderedlist line 34 and sect2
&lt;/sect2&gt;
^
ldp-history.xml:46: error: Opening and ending tag mismatch: sect1 line 41 and para
for many authors to contribute their part in their area of specialization.&lt;/para
^
ldp-history.xml:57: error: Opening and ending tag mismatch: para line 55 and sect1
&lt;/sect1&gt;
^
ldp-history.xml:59: error: Opening and ending tag mismatch: sect2 line 31 and article
&lt;/article&gt;
^
ldp-history.xml:61: error: Premature end of data in tag sect1 line 24
^
ldp-history.xml:61: error: Premature end of data in tag article line 5
^
</screen>
<para>Now, as we already mentioned, don't worry about anything except the first error. The first error says there is an inconsistency between the tags on line 6 and line 22 in the file. Indeed, on line 6 we left out the <quote>e</quote> in <quote>articleinfo</quote>. Fix the error, and run <command>xmllint</command> again. The first complaint now is about the offending line 37, where the closing tag for list items has been forgotten. Fix the error and run the validation tool again, until all errors are gone. Most common errors include forgetting to open or close the paragraph tag, spelling errors in tags and messed up sections.</para>
</example>
</section> <!-- end xmllint -->
</section> <!-- end validate -->
</section>