Copied information from word processors into x2docbook

emmajane 2005-01-24 02:35:29 +00:00
parent f0340dc3e0
commit 75e1b2a6b3
2 changed files with 237 additions and 2 deletions


@@ -14,6 +14,7 @@
improving.
</para>
<!-- duplicated in x2docbook.xml -->
<note><title>Converting Microsoft Word documents</title><para>
Even if you want to use MS Word to write your documents, you may
find <ulink url="http://www.docsoft.com/w2xmlv2.htm">w2XML</ulink>
@@ -35,6 +36,7 @@
the case.
</para> </section>
<!-- duplicated in x2docbook.xml -->
<section id="openoffice"> <title>OpenOffice.org</title> <para>
<ulink url="http://openoffice.org">http://openoffice.org</ulink>
</para>


@@ -3,9 +3,242 @@
-->
<appendix id="x2docbook">
<title>Converting Documents to DocBook XML</title>
<section id="oo2docbook">
<title>OpenOffice.org to DocBook</title>
<!-- duplicated from tools-word-processors.xml -->
<para>As of <ulink url="http://www.openoffice.org">OpenOffice.org (OOo)</ulink> 1.1RC there has been support for
exporting files to DocBook format.</para>
<para>Although OOo uses the full DocBook document type declaration,
it does not actually export the full list of DocBook elements. It
uses a <quote>simplified</quote> DocBook tagset which is geared
to on-the-fly rendering. (Although it is not the official
Simplified DocBook which is described in <xref linkend="dtd" />.)
The OpenOffice simplified (or <quote>special</quote>) DocBook is available from
<ulink
url="http://www.chez.com/ebellot/ooo2sdbk/">http://www.chez.com/ebellot/ooo2sdbk/</ulink>.</para>
<section id="ooo-1-0">
<title>OpenOffice.org 1.0.x</title>
<para>
OOo has been tested by LDP volunteers with mostly positive
results. Thanks to Charles Curley
(<ulink
url="http://www.charlescurley.com">charlescurley.com</ulink>)
for the following notes on using OOo version 1.0.x:
</para>
<note><title>Check the version of your OpenOffice</title>
<para>
These notes may not apply to the version of OOo you
are using.
</para>
</note>
<itemizedlist> <listitem><para>
To be able to export to DocBook, you must have a Java runtime
environment (JRE) installed and registered with OOo--a minimum of
version 1.4.2 is recommended. The configuration instructions will
depend on how you installed your JRE. Visit the OOo web site for
help with your setup.
</para>
<para>
Contrary to the OOo documentation, the Linux OOo did not come with
a JRE. I got one from Sun.
</para> </listitem> <!-- openoffice -->
<listitem><para>The exported file has many empty lines: my 54-line exported
file had only 5 lines of actual XML code.</para></listitem>
<listitem><para>There was no effort at pretty printing.</para></listitem>
<listitem><para> The header is:
<computeroutput>
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;!DOCTYPE article PUBLIC &quot;-//OASIS//DTD DocBook XML V4.1.2//EN&quot;
&quot;http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd&quot;&gt;
</computeroutput>
</para></listitem>
<listitem><para> The pull-down menu in the <menuchoice><guimenu>File</guimenu><guimenuitem>Save
As</guimenuitem></menuchoice> dialog box for
file format
indicates that the export format is <quote>DocBook (simplified).</quote> There is
no explanation of what that <quote>simplified</quote> indicates. Does OOo export
a subset of DocBook? If so, which elements are ignored? Is there any
way to enter any of them manually?
</para></listitem>
<listitem><para> There is no documentation on the DocBook export filter,
or on whether OOo can import the exported file again.
</para></listitem> </itemizedlist>
<para>
Conclusions: OOo 1.1RC is worth looking at if you want a word
processor for preparing DocBook documents.
</para>
<para>However, I hope they cure the lack of documentation. For one
thing, it would be nice to know which native OOo styles map to
which DocBook elements. It would also be nice to know how to
map one's own OOo styles to DocBook elements.</para>
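<para>To illustrate the size mismatch noted above: once blank lines
are removed, a short article needs only a handful of lines. The
following is a hand-written sketch, not actual OOo output. It reuses
the header quoted in the list above; the title and paragraph text are
placeholders, and the choice of elements is an assumption about what
a minimal export might contain.</para>
<programlisting>
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;!DOCTYPE article PUBLIC &quot;-//OASIS//DTD DocBook XML V4.1.2//EN&quot;
&quot;http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd&quot;&gt;
&lt;!-- Placeholder content: not produced by OOo --&gt;
&lt;article&gt;
  &lt;title&gt;Placeholder title&lt;/title&gt;
  &lt;para&gt;Placeholder paragraph.&lt;/para&gt;
&lt;/article&gt;
</programlisting>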
</section>
<section id="ooo-1-1">
<title>OpenOffice.org 1.1</title>
<para>
<ulink url="http://www.merlinmonroe.com">Tabatha Marshall</ulink>
offers the following additional information for OOo 1.1.
</para>
<blockquote>
<para> The first problem was when I tried to do everything on version
1.0.1. That was
obviously a problem. I have RH8, and it was installed via rpm packages,
so I ripped it out and did a full, new install of OpenOffice 1.1.
It took a while to find out 1.1 was a requirement for XML to work.
</para>
<para>
During the install process I believe I was offered the choice to install
the XML features. I have a tendency to do full installs of my office
programs, so I selected everything.
</para>
<para>
I can't offer any advice to those trying to update their current
OO 1.1. Their <quote>3 ways</quote> aren't documented very well at the site
(<ulink url="http://xml.openoffice.org">xml.openoffice.org</ulink>) and as of this writing, I can't even find THAT
on their site anymore. I think more current documentation is needed
there to walk people through the process. Most of this was unclear
and I had to pretty much experiment to get things working.
</para>
<para>
Well, after I installed everything I had some configuration to do.
I opened the application, and got started by opening a new file,
choosing templates, then selecting the DocBook template. A nice menu
of <guisubmenu>Paragraph Styles</guisubmenu> popped up for me, which are the names for all those
tags, I noticed (you can see I don't use WYSIWYG often).
</para>
<para>
With a blank doc before me (couldn't get to the <guisubmenu>XML Filter
Settings</guisubmenu> menu unless some type of doc was opened), I went into
<menuchoice><guimenu>Tools</guimenu><guimenuitem>XML
Filter Settings</guimenuitem></menuchoice>, and edited the entry for DocBook file.
I configured mine as follows:
</para>
<itemizedlist>
<listitem>
<para>
<guilabel>Doctype</guilabel>
<userinput>-//OASIS//DTD DocBook XML V4.2//EN</userinput>
</para>
</listitem>
<listitem>
<para>
<guilabel>DTD</guilabel>
<userinput>http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd</userinput>
</para>
</listitem>
<listitem>
<para>
<guilabel>XSLT for export</guilabel>
<userinput>/usr/local/OpenOffice.org1.1.0/share/xslt/docbook/ldp-html.xsl</userinput>
</para>
</listitem>
<listitem>
<para>
<guilabel>XSLT for import</guilabel>
<userinput>/usr/local/OpenOffice.org1.1.0/share/xslt/docbook/docbooktosoffheadings.xsl</userinput>
(this is the default)
</para>
</listitem>
<listitem>
<para>
<guilabel>Template for import</guilabel>
<userinput>/home/tabatha/OpenOffice/user/template/DocBook
File/DocBookTemplate.stw</userinput>
</para>
</listitem>
</itemizedlist>
<para>
At first, if I opened an XML file that had even one parsing error, it
would just open the file anyway and display the markup in OO. I have
many XML files that use &amp;copy; and other types of entities which show
up as parse errors (depending on the encoding) even though they can be
processed through. But today I was unable to open any of those files.
I got input/output errors instead. Still investigating that one.
</para>
<para>
However when you do successfully open a document (one parsing with no
errors), it puts it automatically into WYSIWYG based on the markup,
and you can then work from the paragraph styles menu like any other
such editor.
</para>
<para>
To validate the document, I used <menuchoice><guimenu>Tools</guimenu><guimenuitem>XML
Filter Settings</guimenuitem></menuchoice>, then
clicked the <guibutton>Test XSLTs</guibutton> button. On my screen, I set up the XSLT
for export to be <filename>ldp-html.xsl</filename>. If you test and there are errors,
a new window pops up with error messages at the bottom, and the lines
that need to be changed up at the top. You can change them there and
progress through the errors until they're all gone, and keep testing
until they're gone.
</para>
<para>
If you want to open a file to see the source instead of the processed
results, go to <menuchoice><guimenu>Tools</guimenu><guimenuitem>XML Filter
Settings</guimenuitem><guisubmenu>Test XSLTs</guisubmenu></menuchoice>, and then
under the <menuchoice><guimenu>Import</guimenu></menuchoice> section, check the
<guilabel>Display Source</guilabel> box. My import XSLT
is currently <filename>docbooktosoffheadings.xsl</filename> (the default) and the template
for import is <filename>DocBookTemplate.stw</filename> (also default).
</para>
<para>
I think this might work for some people, but unfortunately not for me.
I've never used WYSIWYG to edit markup. <application>Emacs with
PSGML</application> can tell me
what my next tag is no matter where I am, validate by moving through
the trouble spots, and I can parse and process from command line.
</para>
<para>
With OpenOffice, you have to visit <ulink
url="http://xml.openoffice.org/filters.html">http://xml.openoffice.org/filters.html</ulink>
to find conversion tools.
</para> </blockquote>
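<para>The entity problem described above can sometimes be avoided in
the document itself. The following is a minimal sketch, assuming the
parse errors come from a parser that does not load the DocBook DTD
(where <literal>copy</literal> is defined). It shows two options:
declaring the entity in the internal subset, or using the numeric
character reference <literal>&amp;#169;</literal>, which needs no
declaration. The DOCTYPE reuses the Doctype and DTD values from the
filter settings above; the article content is a placeholder, and
whether the OOo import filter then accepts the file has not been
tested here.</para>
<programlisting>
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;!DOCTYPE article PUBLIC &quot;-//OASIS//DTD DocBook XML V4.2//EN&quot;
&quot;http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd&quot; [
  &lt;!-- Option 1: declare the entity locally so it resolves without the DTD --&gt;
  &lt;!ENTITY copy &quot;&amp;#169;&quot;&gt;
]&gt;
&lt;article&gt;
  &lt;title&gt;Placeholder title&lt;/title&gt;
  &lt;para&gt;Declared entity: &amp;copy; 2005&lt;/para&gt;
  &lt;!-- Option 2: a numeric character reference needs no declaration --&gt;
  &lt;para&gt;Numeric reference: &amp;#169; 2005&lt;/para&gt;
&lt;/article&gt;
</programlisting>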
</section> </section>
</section>
<section id="word2docbook">
<title>Microsoft Word to DocBook</title>
<para>Even if you want to use MS Word to write your documents, you may
find <ulink url="http://www.docsoft.com/w2xmlv2.htm">w2XML</ulink>
useful. Note that this is not free software--it costs around US$130.
There is, however, a trial version of the software.</para>
</section>
<section id="xmldifferences">
<title>Differences between XML and SGML</title>
<title>XML and SGML DocBook</title>
<para>
There are a few changes between DocBook XML and SGML. Handling
these differences should be relatively easy for most small documents,
@@ -204,4 +437,4 @@
</section>
</section>
</appendix>
</appendix>