<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Chapter 11. Data conversion</title>
<link rel="stylesheet" href="debian-reference.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.75.2">
<link rel="home" href="index.en.html" title="Debian Reference">
<link rel="up" href="index.en.html" title="Debian Reference">
<link rel="prev" href="ch10.en.html" title="Chapter 10. Data management">
<link rel="next" href="ch12.en.html" title="Chapter 12. Programming">
<link rel="preface" href="pr01.en.html" title="Preface">
<link rel="chapter" href="ch01.en.html" title="Chapter 1. GNU/Linux tutorials">
<link rel="chapter" href="ch02.en.html" title="Chapter 2. Debian package management">
<link rel="chapter" href="ch03.en.html" title="Chapter 3. The system initialization">
<link rel="chapter" href="ch04.en.html" title="Chapter 4. Authentication">
<link rel="chapter" href="ch05.en.html" title="Chapter 5. Network setup">
<link rel="chapter" href="ch06.en.html" title="Chapter 6. Network applications">
<link rel="chapter" href="ch07.en.html" title="Chapter 7. The X Window System">
<link rel="chapter" href="ch08.en.html" title="Chapter 8. I18N and L10N">
<link rel="chapter" href="ch09.en.html" title="Chapter 9. System tips">
<link rel="chapter" href="ch10.en.html" title="Chapter 10. Data management">
<link rel="chapter" href="ch11.en.html" title="Chapter 11. Data conversion">
<link rel="chapter" href="ch12.en.html" title="Chapter 12. Programming">
<link rel="appendix" href="apa.en.html" title="Appendix A. Appendix">
<link rel="section" href="ch11.en.html#_text_data_conversion_tools" title="11.1. Text data conversion tools">
<link rel="section" href="ch11.en.html#_xml_data" title="11.2. XML data">
<link rel="section" href="ch11.en.html#_printable_data" title="11.3. Printable data">
<link rel="section" href="ch11.en.html#_type_setting" title="11.4. Type setting">
<link rel="section" href="ch11.en.html#_the_mail_data_conversion" title="11.5. The mail data conversion">
<link rel="section" href="ch11.en.html#_graphic_data_tools" title="11.6. Graphic data tools">
<link rel="section" href="ch11.en.html#_miscellaneous_data_conversion" title="11.7. Miscellaneous data conversion">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<div class="navheader">
<table width="100%" summary="Navigation header">
<tr><th colspan="3" align="center">Chapter 11. Data conversion</th></tr>
<tr>
<td width="20%" align="left">
<a accesskey="p" href="ch10.en.html"><img src="images/prev.gif" alt="Prev"></a> </td>
<th width="60%" align="center"> </th>
<td width="20%" align="right"> <a accesskey="n" href="ch12.en.html"><img src="images/next.gif" alt="Next"></a>
</td>
</tr>
</table>
<hr>
</div>
<div class="chapter">
<div class="titlepage"><div><div><h2 class="title">
<a name="_data_conversion"></a>Chapter 11. Data conversion</h2></div></div></div>
<div class="toc">
<p><b>Table of Contents</b></p>
<dl>
<dt><span class="section"><a href="ch11.en.html#_text_data_conversion_tools">11.1. Text data conversion tools</a></span></dt>
<dd><dl>
<dt><span class="section"><a href="ch11.en.html#_converting_a_text_file_with_iconv">11.1.1. Converting a text file with iconv</a></span></dt>
<dt><span class="section"><a href="ch11.en.html#_checking_file_to_be_utf_8_with_iconv">11.1.2. Checking file to be UTF-8 with iconv</a></span></dt>
<dt><span class="section"><a href="ch11.en.html#_converting_file_names_with_iconv">11.1.3. Converting file names with iconv</a></span></dt>
<dt><span class="section"><a href="ch11.en.html#_eol_conversion">11.1.4. EOL conversion</a></span></dt>
<dt><span class="section"><a href="ch11.en.html#_tab_conversion">11.1.5. TAB conversion</a></span></dt>
<dt><span class="section"><a href="ch11.en.html#_editors_with_auto_conversion">11.1.6. Editors with auto-conversion</a></span></dt>
<dt><span class="section"><a href="ch11.en.html#_plain_text_extraction">11.1.7. Plain text extraction</a></span></dt>
<dt><span class="section"><a href="ch11.en.html#_highlighting_and_formatting_plain_text_data">11.1.8. Highlighting and formatting plain text data</a></span></dt>
</dl></dd>
<dt><span class="section"><a href="ch11.en.html#_xml_data">11.2. XML data</a></span></dt>
<dd><dl>
<dt><span class="section"><a href="ch11.en.html#_basic_hints_for_xml">11.2.1. Basic hints for XML</a></span></dt>
<dt><span class="section"><a href="ch11.en.html#_xml_processing">11.2.2. XML processing</a></span></dt>
<dt><span class="section"><a href="ch11.en.html#_the_xml_data_extraction">11.2.3. The XML data extraction</a></span></dt>
</dl></dd>
<dt><span class="section"><a href="ch11.en.html#_printable_data">11.3. Printable data</a></span></dt>
<dd><dl>
<dt><span class="section"><a href="ch11.en.html#_ghostscript">11.3.1. Ghostscript</a></span></dt>
<dt><span class="section"><a href="ch11.en.html#_merge_two_ps_or_pdf_files">11.3.2. Merge two PS or PDF files</a></span></dt>
<dt><span class="section"><a href="ch11.en.html#_printable_data_utilities">11.3.3. Printable data utilities</a></span></dt>
<dt><span class="section"><a href="ch11.en.html#_printing_with_cups">11.3.4. Printing with CUPS</a></span></dt>
</dl></dd>
<dt><span class="section"><a href="ch11.en.html#_type_setting">11.4. Type setting</a></span></dt>
<dd><dl>
<dt><span class="section"><a href="ch11.en.html#_roff_typesetting">11.4.1. roff typesetting</a></span></dt>
<dt><span class="section"><a href="ch11.en.html#_tex_latex">11.4.2. TeX/LaTeX</a></span></dt>
<dt><span class="section"><a href="ch11.en.html#_pretty_print_a_manual_page">11.4.3. Pretty print a manual page</a></span></dt>
<dt><span class="section"><a href="ch11.en.html#_creating_a_manual_page">11.4.4. Creating a manual page</a></span></dt>
</dl></dd>
<dt><span class="section"><a href="ch11.en.html#_the_mail_data_conversion">11.5. The mail data conversion</a></span></dt>
<dd><dl><dt><span class="section"><a href="ch11.en.html#_mail_data_basics">11.5.1. Mail data basics</a></span></dt></dl></dd>
<dt><span class="section"><a href="ch11.en.html#_graphic_data_tools">11.6. Graphic data tools</a></span></dt>
<dt><span class="section"><a href="ch11.en.html#_miscellaneous_data_conversion">11.7. Miscellaneous data conversion</a></span></dt>
</dl>
</div>
<p>This chapter describes tools and tips for converting data formats on the Debian system.</p>
<p>Standards-based tools are in very good shape, but support for proprietary data formats is limited.</p>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="_text_data_conversion_tools"></a>11.1. Text data conversion tools</h2></div></div></div>
<p>The following packages for text data conversion caught my eye.</p>
<div class="table">
<a name="listoftextdataconversiontools"></a><p class="title"><b>Table 11.1. List of text data conversion tools</b></p>
<div class="table-contents"><table summary="List of text data conversion tools" border="1">
<colgroup>
<col align="left">
<col align="left">
<col align="left">
<col align="left">
<col align="left">
</colgroup>
<thead><tr>
<th align="left">
package
</th>
<th align="left">
popcon
</th>
<th align="left">
size
</th>
<th align="left">
keyword
</th>
<th align="left">
description
</th>
</tr></thead>
<tbody>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/libc6" target="_top">
<code class="literal">libc6</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=libc6" target="_top">http://qa.debian.org/popcon.php?package=libc6</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/libc/libc6.html" target="_top">9500</a></td>
<td align="left">
charset
</td>
<td align="left">
text encoding converter between locales by <span class="citerefentry"><span class="refentrytitle">iconv</span>(1)</span> (fundamental)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/recode" target="_top">
<code class="literal">recode</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=recode" target="_top">http://qa.debian.org/popcon.php?package=recode</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/r/recode.html" target="_top">768</a></td>
<td align="left">
charset+eol
</td>
<td align="left">
text encoding converter between locales (versatile, more aliases and features)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/konwert" target="_top">
<code class="literal">konwert</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=konwert" target="_top">http://qa.debian.org/popcon.php?package=konwert</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/k/konwert.html" target="_top">192</a></td>
<td align="left">
charset
</td>
<td align="left">
text encoding converter between locales (fancy)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/nkf" target="_top">
<code class="literal">nkf</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=nkf" target="_top">http://qa.debian.org/popcon.php?package=nkf</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/n/nkf.html" target="_top">205</a></td>
<td align="left">
charset
</td>
<td align="left">
character set translator for Japanese
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/tcs" target="_top">
<code class="literal">tcs</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=tcs" target="_top">http://qa.debian.org/popcon.php?package=tcs</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/t/tcs.html" target="_top">544</a></td>
<td align="left">
charset
</td>
<td align="left">
character set translator
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/unaccent" target="_top">
<code class="literal">unaccent</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=unaccent" target="_top">http://qa.debian.org/popcon.php?package=unaccent</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/u/unaccent.html" target="_top">76</a></td>
<td align="left">
charset
</td>
<td align="left">
replace accented letters by their unaccented equivalent
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/tofrodos" target="_top">
<code class="literal">tofrodos</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=tofrodos" target="_top">http://qa.debian.org/popcon.php?package=tofrodos</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/t/tofrodos.html" target="_top">67</a></td>
<td align="left">
eol
</td>
<td align="left">
text format converter between DOS and Unix: <span class="citerefentry"><span class="refentrytitle">fromdos</span>(1)</span> and <span class="citerefentry"><span class="refentrytitle">todos</span>(1)</span>
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/macutils" target="_top">
<code class="literal">macutils</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=macutils" target="_top">http://qa.debian.org/popcon.php?package=macutils</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/m/macutils.html" target="_top">320</a></td>
<td align="left">
eol
</td>
<td align="left">
text format converter between Macintosh and Unix: <span class="citerefentry"><span class="refentrytitle">frommac</span>(1)</span> and <span class="citerefentry"><span class="refentrytitle">tomac</span>(1)</span>
</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="_converting_a_text_file_with_iconv"></a>11.1.1. Converting a text file with iconv</h3></div></div></div>
<div class="tip" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Tip">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="images/tip.png"></td>
<th align="left">Tip</th>
</tr>
<tr><td align="left" valign="top"><p><span class="citerefentry"><span class="refentrytitle">iconv</span>(1)</span> is provided as a part of the <code class="literal">libc6</code> package, so it is available on practically all systems for converting character encodings.</p></td></tr>
</table></div>
<p>You can convert encodings of a text file with <span class="citerefentry"><span class="refentrytitle">iconv</span>(1)</span> by the following.</p>
<pre class="screen">$ iconv -f encoding1 -t encoding2 input.txt &gt;output.txt</pre>
<p>Encoding values are case insensitive and ignore "<code class="literal">-</code>" and "<code class="literal">_</code>" for matching. Supported encodings can be checked by the "<code class="literal">iconv -l</code>" command.</p>
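<p>As a concrete illustration, the following sketch converts a small ISO-8859-1 file to UTF-8 and dumps the resulting bytes. The file names <code class="literal">latin1.txt</code> and <code class="literal">utf8.txt</code> and the temporary directory are hypothetical, chosen only for this example.</p>

```shell
# Work in a throwaway directory (hypothetical file names).
tmpdir=$(mktemp -d)

# "café" in ISO-8859-1: the accented letter is the single byte 0xE9 (octal 351).
printf 'caf\351\n' >"$tmpdir/latin1.txt"

# Convert from ISO-8859-1 to UTF-8.
iconv -f ISO-8859-1 -t UTF-8 "$tmpdir/latin1.txt" >"$tmpdir/utf8.txt"

# Dump the converted bytes as hexadecimal.
hex=$(od -An -tx1 "$tmpdir/utf8.txt" | tr -d ' \n')
echo "$hex"

rm -r "$tmpdir"
```

<p>In the UTF-8 output, the single byte 0xE9 becomes the two-byte sequence 0xC3 0xA9.</p>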
<div class="table">
<a name="list-of-encoding-values"></a><p class="title"><b>Table 11.2. List of encoding values and their usage</b></p>
<div class="table-contents"><table summary="List of encoding values and their usage" border="1">
<colgroup>
<col align="left">
<col align="left">
</colgroup>
<thead><tr>
<th align="left">
encoding value
</th>
<th align="left">
usage
</th>
</tr></thead>
<tbody>
<tr>
<td align="left">
<a class="ulink" href="http://en.wikipedia.org/wiki/ASCII" target="_top">ASCII</a>
</td>
<td align="left">
<a class="ulink" href="http://en.wikipedia.org/wiki/ASCII" target="_top">American Standard Code for Information Interchange</a>, 7 bit code w/o accented characters
</td>
</tr>
<tr>
<td align="left">
<a class="ulink" href="http://en.wikipedia.org/wiki/UTF-8" target="_top">UTF-8</a>
</td>
<td align="left">
current multilingual standard for all modern OSs
</td>
</tr>
<tr>
<td align="left">
<a class="ulink" href="http://en.wikipedia.org/wiki/ISO/IEC_8859-1" target="_top">ISO-8859-1</a>
</td>
<td align="left">
old standard for western European languages, ASCII + accented characters
</td>
</tr>
<tr>
<td align="left">
<a class="ulink" href="http://en.wikipedia.org/wiki/ISO/IEC_8859-2" target="_top">ISO-8859-2</a>
</td>
<td align="left">
old standard for eastern European languages, ASCII + accented characters
</td>
</tr>
<tr>
<td align="left">
<a class="ulink" href="http://en.wikipedia.org/wiki/ISO/IEC_8859-15" target="_top">ISO-8859-15</a>
</td>
<td align="left">
old standard for western European languages, <a class="ulink" href="http://en.wikipedia.org/wiki/ISO/IEC_8859-1" target="_top">ISO-8859-1</a> with euro sign
</td>
</tr>
<tr>
<td align="left">
<a class="ulink" href="http://en.wikipedia.org/wiki/Code_page_850" target="_top">CP850</a>
</td>
<td align="left">
code page 850, Microsoft DOS characters with graphics for western European languages, <a class="ulink" href="http://en.wikipedia.org/wiki/ISO/IEC_8859-1" target="_top">ISO-8859-1</a> variant
</td>
</tr>
<tr>
<td align="left">
<a class="ulink" href="http://en.wikipedia.org/wiki/Code_page_932" target="_top">CP932</a>
</td>
<td align="left">
code page 932, Microsoft Windows style <a class="ulink" href="http://en.wikipedia.org/wiki/Shift_JIS" target="_top">Shift-JIS</a> variant for Japanese
</td>
</tr>
<tr>
<td align="left">
<a class="ulink" href="http://en.wikipedia.org/wiki/Code_page_936" target="_top">CP936</a>
</td>
<td align="left">
code page 936, Microsoft Windows style <a class="ulink" href="http://en.wikipedia.org/wiki/GB2312" target="_top">GB2312</a>, <a class="ulink" href="http://en.wikipedia.org/wiki/GBK" target="_top">GBK</a> or <a class="ulink" href="http://en.wikipedia.org/wiki/GB18030" target="_top">GB18030</a> variant for Simplified Chinese
</td>
</tr>
<tr>
<td align="left">
<a class="ulink" href="http://en.wikipedia.org/wiki/Code_page_949" target="_top">CP949</a>
</td>
<td align="left">
code page 949, Microsoft Windows style <a class="ulink" href="http://en.wikipedia.org/wiki/Extended_Unix_Code#EUC-KR" target="_top">EUC-KR</a> or Unified Hangul Code variant for Korean
</td>
</tr>
<tr>
<td align="left">
<a class="ulink" href="http://en.wikipedia.org/wiki/Code_page_950" target="_top">CP950</a>
</td>
<td align="left">
code page 950, Microsoft Windows style <a class="ulink" href="http://en.wikipedia.org/wiki/Big5" target="_top">Big5</a> variant for Traditional Chinese
</td>
</tr>
<tr>
<td align="left">
<a class="ulink" href="http://en.wikipedia.org/wiki/Windows-1251" target="_top">CP1251</a>
</td>
<td align="left">
code page 1251, Microsoft Windows style encoding for the Cyrillic alphabet
</td>
</tr>
<tr>
<td align="left">
<a class="ulink" href="http://en.wikipedia.org/wiki/Windows-1252" target="_top">CP1252</a>
</td>
<td align="left">
code page 1252, Microsoft Windows style <a class="ulink" href="http://en.wikipedia.org/wiki/ISO/IEC_8859-15" target="_top">ISO-8859-15</a> variant for western European languages
</td>
</tr>
<tr>
<td align="left">
<a class="ulink" href="http://en.wikipedia.org/wiki/KOI8-R" target="_top">KOI8-R</a>
</td>
<td align="left">
old Russian UNIX standard for the Cyrillic alphabet
</td>
</tr>
<tr>
<td align="left">
<a class="ulink" href="http://en.wikipedia.org/wiki/ISO/IEC_2022" target="_top">ISO-2022-JP</a>
</td>
<td align="left">
standard encoding for Japanese email which uses only 7 bit codes
</td>
</tr>
<tr>
<td align="left">
<a class="ulink" href="http://en.wikipedia.org/wiki/Extended_Unix_Code" target="_top">eucJP</a>
</td>
<td align="left">
old Japanese UNIX standard 8 bit code and completely different from <a class="ulink" href="http://en.wikipedia.org/wiki/Shift_JIS" target="_top">Shift-JIS</a>
</td>
</tr>
<tr>
<td align="left">
<a class="ulink" href="http://en.wikipedia.org/wiki/Shift_JIS" target="_top">Shift-JIS</a>
</td>
<td align="left">
JIS X 0208 Appendix 1 standard for Japanese (see <a class="ulink" href="http://en.wikipedia.org/wiki/Code_page_932" target="_top">CP932</a>)
</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td>
<th align="left">Note</th>
</tr>
<tr><td align="left" valign="top"><p>Some encodings are only supported for the data conversion and are not used as locale values (<a class="xref" href="ch08.en.html#_basics_of_encoding" title="8.3.1. Basics of encoding">Section 8.3.1, “Basics of encoding”</a>).</p></td></tr>
</table></div>
<p>For character sets which fit in a single byte such as <a class="ulink" href="http://en.wikipedia.org/wiki/ASCII" target="_top">ASCII</a> and <a class="ulink" href="http://en.wikipedia.org/wiki/ISO/IEC_8859" target="_top">ISO-8859</a> character sets, the <a class="ulink" href="http://en.wikipedia.org/wiki/Character_encoding" target="_top">character encoding</a> means almost the same thing as the character set.</p>
<p>For character sets with many characters such as <a class="ulink" href="http://en.wikipedia.org/wiki/JIS_X_0213" target="_top">JIS X 0213</a> for Japanese or <a class="ulink" href="http://en.wikipedia.org/wiki/Universal_Character_Set" target="_top">Universal Character Set (UCS, Unicode, ISO-10646-1)</a> for practically all languages, there are many encoding schemes for fitting them into a sequence of bytes.</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p><a class="ulink" href="http://en.wikipedia.org/wiki/Extended_Unix_Code" target="_top">EUC</a> and <a class="ulink" href="http://en.wikipedia.org/wiki/ISO/IEC_2022" target="_top">ISO/IEC 2022 (also known as JIS X 0202)</a> for Japanese
</p></li>
<li class="listitem"><p><a class="ulink" href="http://en.wikipedia.org/wiki/UTF-8" target="_top">UTF-8</a>, <a class="ulink" href="http://en.wikipedia.org/wiki/UTF-16/UCS-2" target="_top">UTF-16/UCS-2</a> and <a class="ulink" href="http://en.wikipedia.org/wiki/UTF-32/UCS-4" target="_top">UTF-32/UCS-4</a> for Unicode
</p></li>
</ul></div>
<p>For these, the character set and the character encoding are clearly distinct.</p>
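<p>The distinction can be seen by encoding one Unicode character in several schemes. The sketch below is only an illustration: the helper name <code class="literal">bytes_in</code> is hypothetical, and the character U+00E9 was picked arbitrarily.</p>

```shell
# U+00E9 (LATIN SMALL LETTER E WITH ACUTE) given as its UTF-8 bytes (octal 303 251).
char=$(printf '\303\251')

# Hypothetical helper: print the bytes of $char in the given encoding as hex.
bytes_in() {
    printf '%s' "$char" | iconv -f UTF-8 -t "$1" | od -An -tx1 | tr -d ' \n'
}

utf8=$(bytes_in UTF-8)
utf16=$(bytes_in UTF-16BE)
utf32=$(bytes_in UTF-32BE)
echo "UTF-8: $utf8  UTF-16BE: $utf16  UTF-32BE: $utf32"
```

<p>The same character of the same character set occupies two, two, and four bytes respectively, with different byte values in each encoding.</p>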
<p>The term <a class="ulink" href="http://en.wikipedia.org/wiki/Code_page" target="_top">code page</a> is used as a synonym for the character encoding table for some vendor-specific encodings.</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td>
<th align="left">Note</th>
</tr>
<tr><td align="left" valign="top"><p>Most encoding systems share the same codes as ASCII for the 7 bit characters, but there are some exceptions. If you are converting old Japanese C programs and URL data from what is casually called the shift-JIS encoding to UTF-8, use "<code class="literal">CP932</code>" as the encoding name instead of "<code class="literal">shift-JIS</code>" to get the expected results: <code class="literal">0x5C</code> → "<code class="literal">\</code>" and <code class="literal">0x7E</code> → "<code class="literal">~</code>". Otherwise, these are converted to the wrong characters.</p></td></tr>
</table></div>
<div class="tip" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Tip">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="images/tip.png"></td>
<th align="left">Tip</th>
</tr>
<tr><td align="left" valign="top"><p><span class="citerefentry"><span class="refentrytitle">recode</span>(1)</span> may be used too and offers more than the combined functionality of <span class="citerefentry"><span class="refentrytitle">iconv</span>(1)</span>, <span class="citerefentry"><span class="refentrytitle">fromdos</span>(1)</span>, <span class="citerefentry"><span class="refentrytitle">todos</span>(1)</span>, <span class="citerefentry"><span class="refentrytitle">frommac</span>(1)</span>, and <span class="citerefentry"><span class="refentrytitle">tomac</span>(1)</span>. For more, see "<code class="literal">info recode</code>".</p></td></tr>
</table></div>
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="_checking_file_to_be_utf_8_with_iconv"></a>11.1.2. Checking file to be UTF-8 with iconv</h3></div></div></div>
<p>You can check if a text file is encoded in UTF-8 with <span class="citerefentry"><span class="refentrytitle">iconv</span>(1)</span> by the following.</p>
<pre class="screen">$ iconv -f utf8 -t utf8 input.txt &gt;/dev/null || echo "non-UTF-8 found"</pre>
<div class="tip" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Tip">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="images/tip.png"></td>
<th align="left">Tip</th>
</tr>
<tr><td align="left" valign="top"><p>Use the "<code class="literal">--verbose</code>" option in the above example to find the first non-UTF-8 character.</p></td></tr>
</table></div>
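<p>The same check can be wrapped in a small function and applied to several files. This is a minimal sketch: the function name <code class="literal">is_utf8</code> and the two sample files are hypothetical.</p>

```shell
# Hypothetical helper: succeeds only when the file decodes cleanly as UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>/dev/null
}

tmpdir=$(mktemp -d)
printf 'caf\303\251\n' >"$tmpdir/good.txt"   # "café" in UTF-8
printf 'caf\351\n'     >"$tmpdir/bad.txt"    # "café" in ISO-8859-1

# Classify each file and collect the verdicts in one string.
result=""
for f in "$tmpdir"/good.txt "$tmpdir"/bad.txt; do
    if is_utf8 "$f"; then
        result="$result ${f##*/}:utf8"
    else
        result="$result ${f##*/}:other"
    fi
done
echo "$result"

rm -r "$tmpdir"
```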
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="_converting_file_names_with_iconv"></a>11.1.3. Converting file names with iconv</h3></div></div></div>
<p>Here is an example script to convert the encoding of file names in a single directory from the one used by an older OS to modern UTF-8.</p>
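<pre class="screen">#!/bin/sh
# Source encoding of the existing file names (see Table 11.2)
ENCDN=iso-8859-1
for x in *
do
  # Compute the UTF-8 name; skip the file if iconv cannot convert it
  y="$(printf '%s' "$x" | iconv -f "$ENCDN" -t utf-8)" || continue
  # Rename only when the name actually changes
  [ "$x" = "$y" ] || mv -- "$x" "$y"
done</pre>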
<p>The "<code class="literal">$ENCDN</code>" variable should be set to the encoding value of the existing file names, chosen from <a class="xref" href="ch11.en.html#list-of-encoding-values" title="Table 11.2. List of encoding values and their usage">Table 11.2, “List of encoding values and their usage”</a>.</p>
<p>For more complicated cases, please mount the filesystem (e.g. a partition on a disk drive) containing such file names with the proper encoding as the <span class="citerefentry"><span class="refentrytitle">mount</span>(8)</span> option (see <a class="xref" href="ch08.en.html#_filename_encoding" title="8.3.6. Filename encoding">Section 8.3.6, “Filename encoding”</a>) and copy its entire contents to another filesystem mounted as UTF-8 with the "<code class="literal">cp -a</code>" command.</p>
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="_eol_conversion"></a>11.1.4. EOL conversion</h3></div></div></div>
<p>The text file format, specifically the end-of-line (EOL) code, is dependent on the platform.</p>
<div class="table">
<a name="listofeolstylesffferentplatforms"></a><p class="title"><b>Table 11.3. List of EOL styles for different platforms</b></p>
<div class="table-contents"><table summary="List of EOL styles for different platforms" border="1">
<colgroup>
<col align="left">
<col align="left">
<col align="left">
<col align="left">
<col align="left">
</colgroup>
<thead><tr>
<th align="left">
platform
</th>
<th align="left">
EOL code
</th>
<th align="left">
control
</th>
<th align="left">
decimal
</th>
<th align="left">
hexadecimal
</th>
</tr></thead>
<tbody>
<tr>
<td align="left">
Debian (unix)
</td>
<td align="left">
LF
</td>
<td align="left">
<code class="literal">^J</code>
</td>
<td align="left">
10
</td>
<td align="left">
0A
</td>
</tr>
<tr>
<td align="left">
MSDOS and Windows
</td>
<td align="left">
CR-LF
</td>
<td align="left">
<code class="literal">^M^J</code>
</td>
<td align="left">
13 10
</td>
<td align="left">
0D 0A
</td>
</tr>
<tr>
<td align="left">
Apple's Macintosh
</td>
<td align="left">
CR
</td>
<td align="left">
<code class="literal">^M</code>
</td>
<td align="left">
13
</td>
<td align="left">
0D
</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><p>The EOL format conversion programs, <span class="citerefentry"><span class="refentrytitle">fromdos</span>(1)</span>, <span class="citerefentry"><span class="refentrytitle">todos</span>(1)</span>, <span class="citerefentry"><span class="refentrytitle">frommac</span>(1)</span>, and <span class="citerefentry"><span class="refentrytitle">tomac</span>(1)</span>, are quite handy. <span class="citerefentry"><span class="refentrytitle">recode</span>(1)</span> is also useful.</p>
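<p>When these packages are not installed, similar conversions to the Unix style can be sketched with standard tools. The function names below are hypothetical, and the <code class="literal">sed</code> expression assumes GNU <code class="literal">sed</code>, which understands "<code class="literal">\r</code>".</p>

```shell
# CR-LF (MSDOS and Windows) to LF (Unix): strip a trailing CR from each line.
dos_to_unix() { sed -e 's/\r$//'; }

# CR (old Macintosh) to LF (Unix): translate every CR into LF.
mac_to_unix() { tr '\r' '\n'; }

# Show the converted bytes as hexadecimal for two tiny samples.
from_dos=$(printf 'a\r\nb\r\n' | dos_to_unix | od -An -tx1 | tr -d ' \n')
from_mac=$(printf 'a\rb\r' | mac_to_unix | od -An -tx1 | tr -d ' \n')
echo "from DOS: $from_dos"
echo "from Mac: $from_mac"
```

<p>Both samples end up with a bare LF (0A) after each line.</p>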
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td>
<th align="left">Note</th>
</tr>
<tr><td align="left" valign="top"><p>Some data on the Debian system, such as the wiki page data for the <code class="literal">python-moinmoin</code> package, use MSDOS style CR-LF as the EOL code, so the table above gives only the general rule.</p></td></tr>
</table></div>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td>
<th align="left">Note</th>
</tr>
<tr><td align="left" valign="top"><p>Most editors (e.g. <code class="literal">vim</code>, <code class="literal">emacs</code>, <code class="literal">gedit</code>, …) can handle files in MSDOS style EOL transparently.</p></td></tr>
</table></div>
<div class="tip" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Tip">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="images/tip.png"></td>
<th align="left">Tip</th>
</tr>
<tr><td align="left" valign="top"><p>Using "<code class="literal">sed -e '/\r$/!s/$/\r/'</code>" instead of <span class="citerefentry"><span class="refentrytitle">todos</span>(1)</span> is better when you want to unify a mix of MSDOS and Unix EOL styles into the MSDOS style (e.g., after merging 2 MSDOS style files with <span class="citerefentry"><span class="refentrytitle">diff3</span>(1)</span>). This is because <code class="literal">todos</code> adds CR to all lines.</p></td></tr>
</table></div>
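<p>The effect of this <code class="literal">sed</code> expression can be checked on a tiny mixed sample. The sketch below assumes GNU <code class="literal">sed</code>; the function name <code class="literal">mixed</code> is hypothetical, and the "naive" variant merely imitates a conversion that appends CR to every line.</p>

```shell
# A mixed sample: the first line ends with CR-LF, the second with a bare LF.
mixed() { printf 'a\r\nb\n'; }

# Add CR only to lines that do not already end with CR.
unified=$(mixed | sed -e '/\r$/!s/$/\r/' | od -An -tx1 | tr -d ' \n')

# Naive approach for comparison: append CR to every line.
naive=$(mixed | sed -e 's/$/\r/' | od -An -tx1 | tr -d ' \n')

echo "unified: $unified"
echo "naive:   $naive"
```

<p>The naive variant leaves a doubled CR (0D 0D 0A) on the line that was already in the MSDOS style.</p>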
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="_tab_conversion"></a>11.1.5. TAB conversion</h3></div></div></div>
<p>There are a few popular specialized programs to convert the tab codes.</p>
<div class="table">
<a name="listoftabconversoreutilspackages"></a><p class="title"><b>Table 11.4. List of TAB conversion commands from <code class="literal">bsdmainutils</code> and <code class="literal">coreutils</code> packages</b></p>
<div class="table-contents"><table summary="List of TAB conversion commands from bsdmainutils and coreutils packages" border="1">
<colgroup>
<col align="left">
<col align="left">
<col align="left">
</colgroup>
<thead><tr>
<th align="left">
function
</th>
<th align="left">
<code class="literal">bsdmainutils</code>
</th>
<th align="left">
<code class="literal">coreutils</code>
</th>
</tr></thead>
<tbody>
<tr>
<td align="left">
expand tab to spaces
</td>
<td align="left">
"<code class="literal">col -x</code>"
</td>
<td align="left">
<code class="literal">expand</code>
</td>
</tr>
<tr>
<td align="left">
unexpand tab from spaces
</td>
<td align="left">
"<code class="literal">col -h</code>"
</td>
<td align="left">
<code class="literal">unexpand</code>
</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><p><span class="citerefentry"><span class="refentrytitle">indent</span>(1)</span> from the <code class="literal">indent</code> package completely reformats whitespace in C programs.</p>
<p>Editor programs such as <code class="literal">vim</code> and <code class="literal">emacs</code> can be used for TAB conversion, too. For example with <code class="literal">vim</code>, you can expand TAB with "<code class="literal">:set expandtab</code>" and "<code class="literal">:%retab</code>" command sequence. You can revert this with "<code class="literal">:set noexpandtab</code>" and "<code class="literal">:%retab!</code>" command sequence.</p>
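<p>On the command line, the <code class="literal">coreutils</code> commands can be tried on a one-line sample as sketched below. The tab width of 4 and the variable names are arbitrary choices for this illustration.</p>

```shell
# A line with a TAB between two letters.
tabbed=$(printf 'a\tb')

# Expand TAB to spaces with tab stops every 4 columns.
spaced=$(printf '%s\n' "$tabbed" | expand -t 4)

# Convert runs of spaces back to TAB (-a: not only leading blanks).
restored=$(printf '%s\n' "$spaced" | unexpand -a -t 4)

echo "$spaced"
```

<p>Here the TAB is replaced by the three spaces needed to reach column 4, and <code class="literal">unexpand</code> turns them back into a single TAB.</p>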
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="_editors_with_auto_conversion"></a>11.1.6. Editors with auto-conversion</h3></div></div></div>
<p>Intelligent modern editors such as the <code class="literal">vim</code> program are quite smart and cope well with any encoding system and any file format. You should use these editors under the UTF-8 locale in a UTF-8 capable console for the best compatibility.</p>
<p>An old western European Unix text file, "<code class="literal">u-file.txt</code>", stored in the latin1 (iso-8859-1) encoding can be edited simply with <code class="literal">vim</code> by the following.</p>
<pre class="screen">$ vim u-file.txt</pre>
<p>This is possible since the auto detection mechanism of the file encoding in <code class="literal">vim</code> assumes the UTF-8 encoding first and, if it fails, assumes it to be latin1.</p>
<p>An old Polish Unix text file, "<code class="literal">pu-file.txt</code>", stored in the latin2 (iso-8859-2) encoding can be edited with <code class="literal">vim</code> by the following.</p>
<pre class="screen">$ vim '+e ++enc=latin2 pu-file.txt'</pre>
<p>An old Japanese Unix text file, "<code class="literal">ju-file.txt</code>", stored in the eucJP encoding can be edited with <code class="literal">vim</code> by the following.</p>
<pre class="screen">$ vim '+e ++enc=eucJP ju-file.txt'</pre>
<p>An old Japanese MS-Windows text file, "<code class="literal">jw-file.txt</code>", stored in the so called shift-JIS encoding (more precisely: CP932) can be edited with <code class="literal">vim</code> by the following.</p>
<pre class="screen">$ vim '+e ++enc=CP932 ++ff=dos jw-file.txt'</pre>
<p>When a file is opened with the "<code class="literal">++enc</code>" and "<code class="literal">++ff</code>" options, "<code class="literal">:w</code>" in the Vim command line stores it in the original format and overwrites the original file. You can also specify the saving format and the file name in the Vim command line, e.g., "<code class="literal">:w ++enc=utf8 new.txt</code>".</p>
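<p>For batch work outside the editor, a similar conversion can be sketched in Python. The example below decodes CP932 data with DOS line endings and re-encodes it as UTF-8 with Unix line endings. Note that it also changes the line endings, which the "<code class="literal">:w ++enc=utf8 new.txt</code>" example above does not do by itself. The function name <code class="literal">cp932_dos_to_utf8_unix</code> and the sample text are assumptions made for illustration.</p>

```python
def cp932_dos_to_utf8_unix(raw):
    """Convert CP932 bytes with CRLF line endings to UTF-8 bytes with LF."""
    text = raw.decode("cp932")          # interpret the bytes as CP932 (shift-JIS variant)
    text = text.replace("\r\n", "\n")   # DOS line endings -> Unix line endings
    return text.encode("utf-8")


if __name__ == "__main__":
    sample = "日本語\r\n".encode("cp932")   # hypothetical file content
    print(cp932_dos_to_utf8_unix(sample))
```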
<p>Please refer to the mbyte.txt "multi-byte text support" in <code class="literal">vim</code> on-line help and <a class="xref" href="ch11.en.html#list-of-encoding-values" title="Table 11.2. List of encoding values and their usage">Table 11.2, “List of encoding values and their usage”</a> for the encoding values used with "<code class="literal">++enc</code>".</p>
<p>The <code class="literal">emacs</code> family of programs can perform the equivalent functions.</p>
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="_plain_text_extraction"></a>11.1.7. Plain text extraction</h3></div></div></div>
<p>The following reads a web page into a text file. This is very useful when copying configurations from the Web or applying basic Unix text tools such as <span class="citerefentry"><span class="refentrytitle">grep</span>(1)</span> to the web page.</p>
<pre class="screen">$ w3m -dump http://www.remote-site.com/help-info.html &gt;textfile</pre>
<p>Similarly, you can extract plain text data from other formats using the following.</p>
<div class="table">
<a name="listoftoolstoextactplaintextdata"></a><p class="title"><b>Table 11.5. List of tools to extract plain text data</b></p>
<div class="table-contents"><table summary="List of tools to extract plain text data" border="1">
<colgroup>
<col align="left">
<col align="left">
<col align="left">
<col align="left">
<col align="left">
</colgroup>
<thead><tr>
<th align="left">
package
</th>
<th align="left">
popcon
</th>
<th align="left">
size
</th>
<th align="left">
keyword
</th>
<th align="left">
function
</th>
</tr></thead>
<tbody>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/w3m" target="_top">
<code class="literal">w3m</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=w3m" target="_top">http://qa.debian.org/popcon.php?package=w3m</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/w/w3m.html" target="_top">1825</a></td>
<td align="left">
html→text
</td>
<td align="left">
HTML to text converter with the "<code class="literal">w3m -dump</code>" command
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/html2text" target="_top">
<code class="literal">html2text</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=html2text" target="_top">http://qa.debian.org/popcon.php?package=html2text</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/h/html2text.html" target="_top">248</a></td>
<td align="left">
html→text
</td>
<td align="left">
advanced HTML to text converter (ISO 8859-1)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/lynx" target="_top">
<code class="literal">lynx</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=lynx" target="_top">http://qa.debian.org/popcon.php?package=lynx</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/l/lynx.html" target="_top">242</a></td>
<td align="left">
html→text
</td>
<td align="left">
HTML to text converter with the "<code class="literal">lynx -dump</code>" command
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/elinks" target="_top">
<code class="literal">elinks</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=elinks" target="_top">http://qa.debian.org/popcon.php?package=elinks</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/e/elinks.html" target="_top">1364</a></td>
<td align="left">
html→text
</td>
<td align="left">
HTML to text converter with the "<code class="literal">elinks -dump</code>" command
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/links" target="_top">
<code class="literal">links</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=links" target="_top">http://qa.debian.org/popcon.php?package=links</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/l/links.html" target="_top">1275</a></td>
<td align="left">
html→text
</td>
<td align="left">
HTML to text converter with the "<code class="literal">links -dump</code>" command
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/links2" target="_top">
<code class="literal">links2</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=links2" target="_top">http://qa.debian.org/popcon.php?package=links2</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/l/links2.html" target="_top">3092</a></td>
<td align="left">
html→text
</td>
<td align="left">
HTML to text converter with the "<code class="literal">links2 -dump</code>" command
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/antiword" target="_top">
<code class="literal">antiword</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=antiword" target="_top">http://qa.debian.org/popcon.php?package=antiword</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/a/antiword.html" target="_top">560</a></td>
<td align="left">
MSWord→text,ps
</td>
<td align="left">
convert MSWord files to plain text or ps
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/catdoc" target="_top">
<code class="literal">catdoc</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=catdoc" target="_top">http://qa.debian.org/popcon.php?package=catdoc</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/c/catdoc.html" target="_top">2668</a></td>
<td align="left">
MSWord→text,TeX
</td>
<td align="left">
convert MSWord files to plain text or TeX
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/pstotext" target="_top">
<code class="literal">pstotext</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=pstotext" target="_top">http://qa.debian.org/popcon.php?package=pstotext</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/p/pstotext.html" target="_top">123</a></td>
<td align="left">
ps/pdf→text
</td>
<td align="left">
extract text from PostScript and PDF files
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/unhtml" target="_top">
<code class="literal">unhtml</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=unhtml" target="_top">http://qa.debian.org/popcon.php?package=unhtml</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/u/unhtml.html" target="_top">76</a></td>
<td align="left">
html→text
</td>
<td align="left">
remove the markup tags from an HTML file
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/odt2txt" target="_top">
<code class="literal">odt2txt</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=odt2txt" target="_top">http://qa.debian.org/popcon.php?package=odt2txt</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/o/odt2txt.html" target="_top">73</a></td>
<td align="left">
odt→text
</td>
<td align="left">
converter from OpenDocument Text to text
</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break">
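<p>To show what the simplest of these converters do, here is a minimal tag-stripping sketch in Python that uses only the standard library. It is not how any of the listed packages is implemented; the class name <code class="literal">TextExtractor</code> and the function name <code class="literal">html_to_text</code> are hypothetical, and page layout (tables, lists, links) is ignored entirely.</p>

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect character data, skipping the contents of script and style."""

    def __init__(self):
        super().__init__()   # character references are converted by default
        self.parts = []
        self._skip = 0       # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Return the text content of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    print(html_to_text("<p>Hello &amp; <b>world</b></p>"))
```

<p>For real pages, the dedicated tools in the table above usually give much more readable output since they render the page layout as well.</p>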
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="_highlighting_and_formatting_plain_text_data"></a>11.1.8. Highlighting and formatting plain text data</h3></div></div></div>
<p>You can highlight and format plain text data with the following tools.</p>
<div class="table">
<a name="listoftoolstohigghtplaintextdata"></a><p class="title"><b>Table 11.6. List of tools to highlight plain text data</b></p>
<div class="table-contents"><table summary="List of tools to highlight plain text data" border="1">
<colgroup>
<col align="left">
<col align="left">
<col align="left">
<col align="left">
<col align="left">
</colgroup>
<thead><tr>
<th align="left">
package
</th>
<th align="left">
popcon
</th>
<th align="left">
size
</th>
<th align="left">
keyword
</th>
<th align="left">
description
</th>
</tr></thead>
<tbody>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/vim-runtime" target="_top">
<code class="literal">vim-runtime</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=vim-runtime" target="_top">http://qa.debian.org/popcon.php?package=vim-runtime</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/v/vim-runtime.html" target="_top">22298</a></td>
<td align="left">
highlight
</td>
<td align="left">
Vim MACRO to convert source code to HTML with "<code class="literal">:source $VIMRUNTIME/syntax/2html.vim</code>"
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/cxref" target="_top">
<code class="literal">cxref</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=cxref" target="_top">http://qa.debian.org/popcon.php?package=cxref</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/c/cxref.html" target="_top">1115</a></td>
<td align="left">
c→html
</td>
<td align="left">
converter for C programs to LaTeX and HTML (C language)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/src2tex" target="_top">
<code class="literal">src2tex</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=src2tex" target="_top">http://qa.debian.org/popcon.php?package=src2tex</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/s/src2tex.html" target="_top">1968</a></td>
<td align="left">
highlight
</td>
<td align="left">
convert many source codes to TeX (C language)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/source-highlight" target="_top">
<code class="literal">source-highlight</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=source-highlight" target="_top">http://qa.debian.org/popcon.php?package=source-highlight</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/s/source-highlight.html" target="_top">1939</a></td>
<td align="left">
highlight
</td>
<td align="left">
convert many source codes to HTML, XHTML, LaTeX, Texinfo, ANSI color escape sequences and DocBook files with highlight (C++)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/highlight" target="_top">
<code class="literal">highlight</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=highlight" target="_top">http://qa.debian.org/popcon.php?package=highlight</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/h/highlight.html" target="_top">726</a></td>
<td align="left">
highlight
</td>
<td align="left">
convert many source codes to HTML, XHTML, RTF, LaTeX, TeX or XSL-FO files with highlight (C++)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/grc" target="_top">
<code class="literal">grc</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=grc" target="_top">http://qa.debian.org/popcon.php?package=grc</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/g/grc.html" target="_top">232</a></td>
<td align="left">
text→color
</td>
<td align="left">
generic colouriser for everything (Python)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/txt2html" target="_top">
<code class="literal">txt2html</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=txt2html" target="_top">http://qa.debian.org/popcon.php?package=txt2html</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/t/txt2html.html" target="_top">296</a></td>
<td align="left">
text→html
</td>
<td align="left">
text to HTML converter (Perl)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/markdown" target="_top">
<code class="literal">markdown</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=markdown" target="_top">http://qa.debian.org/popcon.php?package=markdown</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/m/markdown.html" target="_top">96</a></td>
<td align="left">
text→html
</td>
<td align="left">
markdown text document formatter to (X)HTML (Perl)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/asciidoc" target="_top">
<code class="literal">asciidoc</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=asciidoc" target="_top">http://qa.debian.org/popcon.php?package=asciidoc</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/a/asciidoc.html" target="_top">3165</a></td>
<td align="left">
text→any
</td>
<td align="left">
AsciiDoc text document formatter to XML/HTML (Python)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/python-docutils" target="_top">
<code class="literal">python-docutils</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=python-docutils" target="_top">http://qa.debian.org/popcon.php?package=python-docutils</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/p/python-docutils.html" target="_top">1548</a></td>
<td align="left">
text→any
</td>
<td align="left">
reStructuredText document formatter to XML (Python)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/txt2tags" target="_top">
<code class="literal">txt2tags</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=txt2tags" target="_top">http://qa.debian.org/popcon.php?package=txt2tags</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/t/txt2tags.html" target="_top">1152</a></td>
<td align="left">
text→any
</td>
<td align="left">
document conversion from text to HTML, SGML, LaTeX, man page, MoinMoin, Magic Point and PageMaker (Python)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/udo" target="_top">
<code class="literal">udo</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=udo" target="_top">http://qa.debian.org/popcon.php?package=udo</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/u/udo.html" target="_top">556</a></td>
<td align="left">
text→any
</td>
<td align="left">
universal document - text processing utility (C language)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/stx2any" target="_top">
<code class="literal">stx2any</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=stx2any" target="_top">http://qa.debian.org/popcon.php?package=stx2any</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/s/stx2any.html" target="_top">484</a></td>
<td align="left">
text→any
</td>
<td align="left">
document converter from structured plain text to other formats (m4)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/rest2web" target="_top">
<code class="literal">rest2web</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=rest2web" target="_top">http://qa.debian.org/popcon.php?package=rest2web</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/r/rest2web.html" target="_top">576</a></td>
<td align="left">
text→html
</td>
<td align="left">
document converter from reStructuredText to HTML (Python)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/aft" target="_top">
<code class="literal">aft</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=aft" target="_top">http://qa.debian.org/popcon.php?package=aft</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/a/aft.html" target="_top">259</a></td>
<td align="left">
text→any
</td>
<td align="left">
"free form" document preparation system (Perl)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/yodl" target="_top">
<code class="literal">yodl</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=yodl" target="_top">http://qa.debian.org/popcon.php?package=yodl</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/y/yodl.html" target="_top">409</a></td>
<td align="left">
text→any
</td>
<td align="left">
pre-document language and tools to process it (C language)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/sdf" target="_top">
<code class="literal">sdf</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=sdf" target="_top">http://qa.debian.org/popcon.php?package=sdf</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/s/sdf.html" target="_top">1414</a></td>
<td align="left">
text→any
</td>
<td align="left">
simple document parser (Perl)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/sisu" target="_top">
<code class="literal">sisu</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=sisu" target="_top">http://qa.debian.org/popcon.php?package=sisu</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/s/sisu.html" target="_top">9149</a></td>
<td align="left">
text→any
</td>
<td align="left">
document structuring, publishing and search framework (Ruby)
</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break">
</div>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="_xml_data"></a>11.2. XML data</h2></div></div></div>
<p><a class="ulink" href="http://en.wikipedia.org/wiki/XML" target="_top">The Extensible Markup Language (XML)</a> is a markup language for documents containing structured information.</p>
<p>See introductory information at <a class="ulink" href="http://xml.com/" target="_top">XML.COM</a>.</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p>
<a class="ulink" href="http://www.xml.com/pub/a/98/10/guide0.html" target="_top">"What is XML?"</a>
</p></li>
<li class="listitem"><p>
<a class="ulink" href="http://xml.com/pub/a/2000/08/holman/index.html" target="_top">"What Is XSLT?"</a>
</p></li>
<li class="listitem"><p>
<a class="ulink" href="http://xml.com/pub/a/2002/03/20/xsl-fo.html" target="_top">"What Is XSL-FO?"</a>
</p></li>
<li class="listitem"><p>
<a class="ulink" href="http://xml.com/pub/a/2000/09/xlink/index.html" target="_top">"What Is XLink?"</a>
</p></li>
</ul></div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="_basic_hints_for_xml"></a>11.2.1. Basic hints for XML</h3></div></div></div>
<p>XML text looks somewhat like <a class="ulink" href="http://en.wikipedia.org/wiki/HTML" target="_top">HTML</a>. It enables us to manage multiple formats of output for a document. One easy XML system is the <code class="literal">docbook-xsl</code> package, which is used here.</p>
<p>Each XML file starts with a standard XML declaration such as the following.</p>
<pre class="screen">&lt;?xml version="1.0" encoding="UTF-8"?&gt;</pre>
<p>The basic syntax for an XML element is as follows.</p>
<pre class="screen">&lt;name attribute="value"&gt;content&lt;/name&gt;</pre>
<p>An XML element with empty content can be marked up in the following short form.</p>
<pre class="screen">&lt;name attribute="value"/&gt;</pre>
<p>The "<code class="literal">attribute="value"</code>" in the above examples is optional.</p>
<p>A comment in XML is marked up as follows.</p>
<pre class="screen">&lt;!-- comment --&gt;</pre>
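<p>The markup forms above can be checked with a short Python sketch using the standard <code class="literal">xml.etree.ElementTree</code> module. The wrapping <code class="literal">root</code> element is an assumption added only so that the sample is a single well-formed document.</p>

```python
import xml.etree.ElementTree as ET

# Sample document: declaration, one element with content, one empty
# element in the short form, and a comment.  "root" is a made-up wrapper.
DOC = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<root>'
    b'<name attribute="value">content</name>'
    b'<name attribute="value"/>'
    b'<!-- comment -->'
    b'</root>'
)

root = ET.fromstring(DOC)
full, empty = root.findall("name")

if __name__ == "__main__":
    # The default parser drops comments, so only the two elements remain.
    print(full.tag, full.attrib, repr(full.text))
    print(empty.tag, empty.attrib, repr(empty.text))
```

<p>The empty element has no text at all, which the parser reports as <code class="literal">None</code> rather than as an empty string.</p>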
<p>Other than adding markup, XML requires a minor conversion of the content, using predefined entities for the following characters.</p>
<div class="table">
<a name="listofpredefinedentitiesforxml"></a><p class="title"><b>Table 11.7. List of predefined entities for XML</b></p>
<div class="table-contents"><table summary="List of predefined entities for XML" border="1">
<colgroup>
<col align="left">
<col align="left">
</colgroup>
<thead><tr>
<th align="left">
predefined entity
</th>
<th align="left">
character to be converted from
</th>
</tr></thead>
<tbody>
<tr>
<td align="left">
<code class="literal">&amp;quot;</code>
</td>
<td align="left">
<code class="literal">"</code> : quote
</td>
</tr>
<tr>
<td align="left">
<code class="literal">&amp;apos;</code>
</td>
<td align="left">
<code class="literal">'</code> : apostrophe
</td>
</tr>
<tr>
<td align="left">
<code class="literal">&amp;lt;</code>
</td>
<td align="left">
<code class="literal">&lt;</code> : less-than
</td>
</tr>
<tr>
<td align="left">
<code class="literal">&amp;gt;</code>
</td>
<td align="left">
<code class="literal">&gt;</code> : greater-than
</td>
</tr>
<tr>
<td align="left">
<code class="literal">&amp;amp;</code>
</td>
<td align="left">
<code class="literal">&amp;</code> : ampersand
</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><div class="caution" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Caution">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Caution]" src="images/caution.png"></td>
<th align="left">Caution</th>
</tr>
<tr><td align="left" valign="top"><p>"<code class="literal">&lt;</code>" or "<code class="literal">&amp;</code>" cannot be used literally in attribute values or element content. Use the predefined entities instead.</p></td></tr>
</table></div>
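<p>The conversion to and from the predefined entities can be sketched in Python with the standard <code class="literal">xml.sax.saxutils</code> module. Its <code class="literal">escape</code> function handles "<code class="literal">&amp;</code>", "<code class="literal">&lt;</code>" and "<code class="literal">&gt;</code>" by default; the quote and apostrophe mappings are passed explicitly. The helper names <code class="literal">to_xml_text</code> and <code class="literal">from_xml_text</code> are hypothetical.</p>

```python
from xml.sax.saxutils import escape, unescape

# Extra mappings for the two quote characters; &, < and > are built in.
EXTRA = {'"': "&quot;", "'": "&apos;"}
EXTRA_REVERSED = {entity: char for char, entity in EXTRA.items()}


def to_xml_text(text):
    """Replace the five special characters with predefined entities."""
    return escape(text, EXTRA)


def from_xml_text(text):
    """Convert the five predefined entities back to plain characters."""
    return unescape(text, EXTRA_REVERSED)


if __name__ == "__main__":
    original = 'if a < b && name == "x"'
    encoded = to_xml_text(original)
    print(encoded)
    print(from_xml_text(encoded) == original)
```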
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td>
<th align="left">Note</th>
</tr>
<tr><td align="left" valign="top"><p>When SGML style user defined entities, e.g. "<code class="literal">&amp;some-tag;</code>", are used, the first definition wins over others. The entity definition is expressed in "<code class="literal">&lt;!ENTITY some-tag "entity value"&gt;</code>".</p></td></tr>
</table></div>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td>
<th align="left">Note</th>
</tr>
<tr><td align="left" valign="top"><p>As long as the XML markup is done consistently with a certain set of tag names (with the data held either as content or as attribute values), conversion to another XML format is a trivial task using <a class="ulink" href="http://en.wikipedia.org/wiki/XSL_Transformations" target="_top">Extensible Stylesheet Language Transformations (XSLT)</a>.</p></td></tr>
</table></div>
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="_xml_processing"></a>11.2.2. XML processing</h3></div></div></div>
<p>There are many tools available to process XML files such as <a class="ulink" href="http://en.wikipedia.org/wiki/Extensible_Stylesheet_Language" target="_top">the Extensible Stylesheet Language (XSL)</a>.</p>
<p>Basically, once you create a well-formed XML file, you can convert it to any format using <a class="ulink" href="http://en.wikipedia.org/wiki/XSL_Transformations" target="_top">Extensible Stylesheet Language Transformations (XSLT)</a>.</p>
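<p>Whether a file is well formed can be checked before handing it to an XSLT processor. The following Python sketch relies on the standard library parser raising <code class="literal">ParseError</code> for malformed input; the function name <code class="literal">is_well_formed</code> is hypothetical. It checks well-formedness only and does not validate against a DTD.</p>

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for unclosed tags, stray "<" or "&", multiple roots, etc.
        return False


if __name__ == "__main__":
    print(is_well_formed("<a><b/></a>"))
    print(is_well_formed("<a><b></a>"))
```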
<p>The <a class="ulink" href="http://en.wikipedia.org/wiki/XSL_Formatting_Objects" target="_top">Extensible Stylesheet Language Formatting Objects (XSL-FO)</a> is supposed to be the solution for formatting. However, the <code class="literal">fop</code> package is still in the Debian <code class="literal">contrib</code> (not <code class="literal">main</code>) archive. So LaTeX code is usually generated from XML using XSLT, and the LaTeX system is used to create printable files such as DVI, PostScript, and PDF.</p>
<div class="table">
<a name="listofxmltools"></a><p class="title"><b>Table 11.8. List of XML tools</b></p>
<div class="table-contents"><table summary="List of XML tools" border="1">
<colgroup>
<col align="left">
<col align="left">
<col align="left">
<col align="left">
<col align="left">
</colgroup>
<thead><tr>
<th align="left">
package
</th>
<th align="left">
popcon
</th>
<th align="left">
size
</th>
<th align="left">
keyword
</th>
<th align="left">
description
</th>
</tr></thead>
<tbody>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/docbook-xml" target="_top">
<code class="literal">docbook-xml</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=docbook-xml" target="_top">http://qa.debian.org/popcon.php?package=docbook-xml</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/d/docbook-xml.html" target="_top">2488</a></td>
<td align="left">
xml
</td>
<td align="left">
XML document type definition (DTD) for DocBook
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/xsltproc" target="_top">
<code class="literal">xsltproc</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=xsltproc" target="_top">http://qa.debian.org/popcon.php?package=xsltproc</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/x/xsltproc.html" target="_top">165</a></td>
<td align="left">
xslt
</td>
<td align="left">
XSLT command line processor (XML→ XML, HTML, plain text, etc.)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/docbook-xsl" target="_top">
<code class="literal">docbook-xsl</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=docbook-xsl" target="_top">http://qa.debian.org/popcon.php?package=docbook-xsl</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/d/docbook-xsl.html" target="_top">11589</a></td>
<td align="left">
xml/xslt
</td>
<td align="left">
XSL stylesheets for processing DocBook XML to various output formats with XSLT
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/xmlto" target="_top">
<code class="literal">xmlto</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=xmlto" target="_top">http://qa.debian.org/popcon.php?package=xmlto</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/x/xmlto.html" target="_top">134</a></td>
<td align="left">
xml/xslt
</td>
<td align="left">
XML-to-any converter with XSLT
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/dblatex" target="_top">
<code class="literal">dblatex</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=dblatex" target="_top">http://qa.debian.org/popcon.php?package=dblatex</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/d/dblatex.html" target="_top">6799</a></td>
<td align="left">
xml/xslt
</td>
<td align="left">
convert DocBook files to DVI, PostScript, PDF documents with XSLT
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/fop" target="_top">
<code class="literal">fop</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=fop" target="_top">http://qa.debian.org/popcon.php?package=fop</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/f/fop.html" target="_top">90</a></td>
<td align="left">
xml/xsl-fo
</td>
<td align="left">
convert DocBook XML files to PDF
</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><p>Since XML is a subset of <a class="ulink" href="http://en.wikipedia.org/wiki/SGML" target="_top">Standard Generalized Markup Language (SGML)</a>, it can be processed by the extensive tools available for SGML, such as <a class="ulink" href="http://en.wikipedia.org/wiki/Document_Style_Semantics_and_Specification_Language" target="_top">Document Style Semantics and Specification Language (DSSSL)</a>.</p>
<div class="table">
<a name="listofdssltools"></a><p class="title"><b>Table 11.9. List of DSSSL tools</b></p>
<div class="table-contents"><table summary="List of DSSSL tools" border="1">
<colgroup>
<col align="left">
<col align="left">
<col align="left">
<col align="left">
<col align="left">
</colgroup>
<thead><tr>
<th align="left">
package
</th>
<th align="left">
popcon
</th>
<th align="left">
size
</th>
<th align="left">
keyword
</th>
<th align="left">
description
</th>
</tr></thead>
<tbody>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/openjade" target="_top">
<code class="literal">openjade</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=openjade" target="_top">http://qa.debian.org/popcon.php?package=openjade</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/o/openjade.html" target="_top">1063</a></td>
<td align="left">
dsssl
</td>
<td align="left">
ISO/IEC 10179:1996 standard DSSSL processor (latest)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/openjade1.3" target="_top">
<code class="literal">openjade1.3</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=openjade1.3" target="_top">http://qa.debian.org/popcon.php?package=openjade1.3</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/o/openjade1.3.html" target="_top">2226</a></td>
<td align="left">
dsssl
</td>
<td align="left">
ISO/IEC 10179:1996 standard DSSSL processor (1.3.x series)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/jade" target="_top">
<code class="literal">jade</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=jade" target="_top">http://qa.debian.org/popcon.php?package=jade</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/j/jade.html" target="_top">872</a></td>
<td align="left">
dsssl
</td>
<td align="left">
James Clark's original DSSSL processor (1.2.x series)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/docbook-dsssl" target="_top">
<code class="literal">docbook-dsssl</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=docbook-dsssl" target="_top">http://qa.debian.org/popcon.php?package=docbook-dsssl</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/d/docbook-dsssl.html" target="_top">3100</a></td>
<td align="left">
xml/dsssl
</td>
<td align="left">
DSSSL stylesheets for processing DocBook XML to various output formats with DSSSL
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/docbook-utils" target="_top">
<code class="literal">docbook-utils</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=docbook-utils" target="_top">http://qa.debian.org/popcon.php?package=docbook-utils</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/d/docbook-utils.html" target="_top">220</a></td>
<td align="left">
xml/dsssl
</td>
<td align="left">
utilities for DocBook files, including conversion to other formats (HTML, RTF, PS, man, PDF) using DSSSL via the <code class="literal">docbook2*</code> commands
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/sgml2x" target="_top">
<code class="literal">sgml2x</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=sgml2x" target="_top">http://qa.debian.org/popcon.php?package=sgml2x</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/s/sgml2x.html" target="_top">216</a></td>
<td align="left">
sgml/dsssl
</td>
<td align="left">
converter from SGML and XML using DSSSL stylesheets
</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><div class="tip" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Tip">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="images/tip.png"></td>
<th align="left">Tip</th>
</tr>
<tr><td align="left" valign="top"><p><a class="ulink" href="http://en.wikipedia.org/wiki/GNOME" target="_top">GNOME</a>'s <code class="literal">yelp</code> is sometimes handy for reading <a class="ulink" href="http://en.wikipedia.org/wiki/DocBook" target="_top">DocBook</a> XML files directly since it renders them decently on X.</p></td></tr>
</table></div>
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="_the_xml_data_extraction"></a>11.2.3. The XML data extraction</h3></div></div></div>
<p>You can extract HTML or XML data from other formats using the following tools.</p>
<div class="table">
<a name="listofxmldataextractiontools"></a><p class="title"><b>Table 11.10. List of XML data extraction tools</b></p>
<div class="table-contents"><table summary="List of XML data extraction tools" border="1">
<colgroup>
<col align="left">
<col align="left">
<col align="left">
<col align="left">
<col align="left">
</colgroup>
<thead><tr>
<th align="left">
package
</th>
<th align="left">
popcon
</th>
<th align="left">
size
</th>
<th align="left">
keyword
</th>
<th align="left">
description
</th>
</tr></thead>
<tbody>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/wv" target="_top">
<code class="literal">wv</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=wv" target="_top">http://qa.debian.org/popcon.php?package=wv</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/w/wv.html" target="_top">351</a></td>
<td align="left">
MSWord→any
</td>
<td align="left">
document converter from Microsoft Word to HTML, LaTeX, etc.
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/texi2html" target="_top">
<code class="literal">texi2html</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=texi2html" target="_top">http://qa.debian.org/popcon.php?package=texi2html</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/t/texi2html.html" target="_top">2076</a></td>
<td align="left">
texi→html
</td>
<td align="left">
converter from Texinfo to HTML
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/man2html" target="_top">
<code class="literal">man2html</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=man2html" target="_top">http://qa.debian.org/popcon.php?package=man2html</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/m/man2html.html" target="_top">180</a></td>
<td align="left">
manpage→html
</td>
<td align="left">
converter from manpage to HTML (CGI support)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/tex4ht" target="_top">
<code class="literal">tex4ht</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=tex4ht" target="_top">http://qa.debian.org/popcon.php?package=tex4ht</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/t/tex4ht.html" target="_top">515</a></td>
<td align="left">
tex↔html
</td>
<td align="left">
converter between (La)TeX and HTML
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/xlhtml" target="_top">
<code class="literal">xlhtml</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=xlhtml" target="_top">http://qa.debian.org/popcon.php?package=xlhtml</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/x/xlhtml.html" target="_top">184</a></td>
<td align="left">
MSExcel→html
</td>
<td align="left">
converter from MSExcel .xls to HTML
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/ppthtml" target="_top">
<code class="literal">ppthtml</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=ppthtml" target="_top">http://qa.debian.org/popcon.php?package=ppthtml</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/p/ppthtml.html" target="_top">120</a></td>
<td align="left">
MSPowerPoint→html
</td>
<td align="left">
converter from MSPowerPoint to HTML
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/unrtf" target="_top">
<code class="literal">unrtf</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=unrtf" target="_top">http://qa.debian.org/popcon.php?package=unrtf</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/u/unrtf.html" target="_top">224</a></td>
<td align="left">
rtf→html
</td>
<td align="left">
document converter from RTF to HTML, etc
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/info2www" target="_top">
<code class="literal">info2www</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=info2www" target="_top">http://qa.debian.org/popcon.php?package=info2www</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/i/info2www.html" target="_top">156</a></td>
<td align="left">
info→html
</td>
<td align="left">
converter from GNU info to HTML (CGI support)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/ooo2dbk" target="_top">
<code class="literal">ooo2dbk</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=ooo2dbk" target="_top">http://qa.debian.org/popcon.php?package=ooo2dbk</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/o/ooo2dbk.html" target="_top">941</a></td>
<td align="left">
sxw→xml
</td>
<td align="left">
converter from OpenOffice.org SXW documents to DocBook XML
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/wp2x" target="_top">
<code class="literal">wp2x</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=wp2x" target="_top">http://qa.debian.org/popcon.php?package=wp2x</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/w/wp2x.html" target="_top">156</a></td>
<td align="left">
WordPerfect→any
</td>
<td align="left">
converter from WordPerfect 5.0 and 5.1 files to TeX, LaTeX, troff, GML and HTML
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/doclifter" target="_top">
<code class="literal">doclifter</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=doclifter" target="_top">http://qa.debian.org/popcon.php?package=doclifter</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/d/doclifter.html" target="_top">460</a></td>
<td align="left">
troff→xml
</td>
<td align="left">
converter from troff to DocBook XML
</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><p>Non-XML HTML files can be converted to XHTML, which is an instance of well-formed XML, so that they can be processed by XML tools.</p>
<div class="table">
<a name="listofxmlprettyprinttools"></a><p class="title"><b>Table 11.11. List of XML pretty print tools</b></p>
<div class="table-contents"><table summary="List of XML pretty print tools" border="1">
<colgroup>
<col align="left">
<col align="left">
<col align="left">
<col align="left">
<col align="left">
</colgroup>
<thead><tr>
<th align="left">
package
</th>
<th align="left">
popcon
</th>
<th align="left">
size
</th>
<th align="left">
keyword
</th>
<th align="left">
description
</th>
</tr></thead>
<tbody>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/libxml2-utils" target="_top">
<code class="literal">libxml2-utils</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=libxml2-utils" target="_top">http://qa.debian.org/popcon.php?package=libxml2-utils</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/libx/libxml2-utils.html" target="_top">139</a></td>
<td align="left">
xml↔html↔xhtml
</td>
<td align="left">
command line XML tool with <span class="citerefentry"><span class="refentrytitle">xmllint</span>(1)</span> (syntax check, reformat, lint, …)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/tidy" target="_top">
<code class="literal">tidy</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=tidy" target="_top">http://qa.debian.org/popcon.php?package=tidy</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/t/tidy.html" target="_top">82</a></td>
<td align="left">
xml↔html↔xhtml
</td>
<td align="left">
HTML syntax checker and reformatter
</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><p>Once well-formed XML is generated, you can use XSLT to extract data based on the markup context.</p>
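<p>As a minimal sketch of this workflow, the following converts an HTML file to well-formed XML with <span class="citerefentry"><span class="refentrytitle">xmllint</span>(1)</span> and then lists the link targets in it with an XSLT stylesheet. The file names "<code class="literal">foo.html</code>", "<code class="literal">foo.xml</code>", and "<code class="literal">links.xsl</code>" are placeholders, and the choice of <span class="citerefentry"><span class="refentrytitle">xsltproc</span>(1)</span> as the XSLT processor is an assumption; any XSLT 1.0 processor should work similarly.</p>
<pre class="screen">$ xmllint --html --xmlout foo.html &gt; foo.xml
$ cat links.xsl
&lt;?xml version="1.0"?&gt;
&lt;xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
  &lt;xsl:output method="text"/&gt;
  &lt;xsl:template match="/"&gt;
    &lt;xsl:for-each select="//*[local-name()='a']/@href"&gt;
      &lt;xsl:value-of select="."/&gt;
      &lt;xsl:text&gt;&amp;#10;&lt;/xsl:text&gt;
    &lt;/xsl:for-each&gt;
  &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;
$ xsltproc --novalid links.xsl foo.xml</pre>
<p>The stylesheet matches elements by local name so that it works whether or not the converted file places its elements in the XHTML namespace. The "<code class="literal">--novalid</code>" option keeps <code class="literal">xsltproc</code> from trying to load the DTD named in the document type declaration.</p>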
</div>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="_printable_data"></a>11.3. Printable data</h2></div></div></div>
<p>Printable data is expressed in the <a class="ulink" href="http://en.wikipedia.org/wiki/PostScript" target="_top">PostScript</a> format on the Debian system. <a class="ulink" href="http://en.wikipedia.org/wiki/Common_Unix_Printing_System" target="_top">Common Unix Printing System (CUPS)</a> uses Ghostscript as its rasterizer backend program for non-PostScript printers.</p>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="_ghostscript"></a>11.3.1. Ghostscript</h3></div></div></div>
<p>The core of printable data manipulation is the <a class="ulink" href="http://en.wikipedia.org/wiki/Ghostscript" target="_top">Ghostscript</a> <a class="ulink" href="http://en.wikipedia.org/wiki/PostScript" target="_top">PostScript (PS)</a> interpreter, which generates raster images.</p>
<p>The latest upstream Ghostscript from Artifex was relicensed from AFPL to GPL, and it merged all the latest ESP version changes, such as the CUPS-related ones, into a unified release at version 8.60.</p>
<div class="table">
<a name="listofghostscripriptinterpreters"></a><p class="title"><b>Table 11.12. List of Ghostscript PostScript interpreters</b></p>
<div class="table-contents"><table summary="List of Ghostscript PostScript interpreters" border="1">
<colgroup>
<col align="left">
<col align="left">
<col align="left">
<col align="left">
</colgroup>
<thead><tr>
<th align="left">
package
</th>
<th align="left">
popcon
</th>
<th align="left">
size
</th>
<th align="left">
description
</th>
</tr></thead>
<tbody>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/ghostscript" target="_top">
<code class="literal">ghostscript</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=ghostscript" target="_top">http://qa.debian.org/popcon.php?package=ghostscript</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/g/ghostscript.html" target="_top">198</a></td>
<td align="left">
<a class="ulink" href="http://en.wikipedia.org/wiki/Ghostscript" target="_top">The GPL Ghostscript PostScript/PDF interpreter</a>
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/ghostscript-x" target="_top">
<code class="literal">ghostscript-x</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=ghostscript-x" target="_top">http://qa.debian.org/popcon.php?package=ghostscript-x</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/g/ghostscript-x.html" target="_top">193</a></td>
<td align="left">
GPL Ghostscript PostScript/PDF interpreter - X display support
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/gs-cjk-resource" target="_top">
<code class="literal">gs-cjk-resource</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=gs-cjk-resource" target="_top">http://qa.debian.org/popcon.php?package=gs-cjk-resource</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/g/gs-cjk-resource.html" target="_top">4528</a></td>
<td align="left">
resource files for gs-cjk, Ghostscript <a class="ulink" href="http://en.wikipedia.org/wiki/CJK_characters" target="_top">CJK</a>-TrueType extension
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/cmap-adobe-cns1" target="_top">
<code class="literal">cmap-adobe-cns1</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=cmap-adobe-cns1" target="_top">http://qa.debian.org/popcon.php?package=cmap-adobe-cns1</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/c/cmap-adobe-cns1.html" target="_top">1572</a></td>
<td align="left">
CMaps for Adobe-CNS1 (for traditional Chinese support)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/cmap-adobe-gb1" target="_top">
<code class="literal">cmap-adobe-gb1</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=cmap-adobe-gb1" target="_top">http://qa.debian.org/popcon.php?package=cmap-adobe-gb1</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/c/cmap-adobe-gb1.html" target="_top">1552</a></td>
<td align="left">
CMaps for Adobe-GB1 (for simplified Chinese support)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/cmap-adobe-japan1" target="_top">
<code class="literal">cmap-adobe-japan1</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=cmap-adobe-japan1" target="_top">http://qa.debian.org/popcon.php?package=cmap-adobe-japan1</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/c/cmap-adobe-japan1.html" target="_top">2428</a></td>
<td align="left">
CMaps for Adobe-Japan1 (for Japanese standard support)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/cmap-adobe-japan2" target="_top">
<code class="literal">cmap-adobe-japan2</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=cmap-adobe-japan2" target="_top">http://qa.debian.org/popcon.php?package=cmap-adobe-japan2</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/c/cmap-adobe-japan2.html" target="_top">416</a></td>
<td align="left">
CMaps for Adobe-Japan2 (for Japanese extra support)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/cmap-adobe-korea1" target="_top">
<code class="literal">cmap-adobe-korea1</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=cmap-adobe-korea1" target="_top">http://qa.debian.org/popcon.php?package=cmap-adobe-korea1</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/c/cmap-adobe-korea1.html" target="_top">872</a></td>
<td align="left">
CMaps for Adobe-Korea1 (for Korean support)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/libpoppler13" target="_top">
<code class="literal">libpoppler13</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=libpoppler13" target="_top">http://qa.debian.org/popcon.php?package=libpoppler13</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/libp/libpoppler13.html" target="_top">2377</a></td>
<td align="left">
PDF rendering library based on xpdf PDF viewer
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/libpoppler-glib6" target="_top">
<code class="literal">libpoppler-glib6</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=libpoppler-glib6" target="_top">http://qa.debian.org/popcon.php?package=libpoppler-glib6</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/libp/libpoppler-glib6.html" target="_top">577</a></td>
<td align="left">
PDF rendering library (GLib-based shared library)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/poppler-data" target="_top">
<code class="literal">poppler-data</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=poppler-data" target="_top">http://qa.debian.org/popcon.php?package=poppler-data</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/p/poppler-data.html" target="_top">12240</a></td>
<td align="left">
CMaps for PDF rendering library (for <a class="ulink" href="http://en.wikipedia.org/wiki/CJK_characters" target="_top">CJK</a> support: Adobe-*)
</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><div class="tip" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Tip">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="images/tip.png"></td>
<th align="left">Tip</th>
</tr>
<tr><td align="left" valign="top"><p>"<code class="literal">gs -h</code>" can display the configuration of Ghostscript.</p></td></tr>
</table></div>
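<p>As an illustration of generating raster images, the following sketch renders each page of a PostScript file to a separate PNG file at 150 dpi. It assumes that the "<code class="literal">png16m</code>" device appears in the device list printed by "<code class="literal">gs -h</code>"; "<code class="literal">foo.ps</code>" and the output file name pattern are placeholders.</p>
<pre class="screen">$ gs -q -dSAFER -dNOPAUSE -dBATCH -sDEVICE=png16m -r150 -sOutputFile=page-%03d.png foo.ps</pre>
<p>The "<code class="literal">%03d</code>" in the output file name is replaced by the page number, so a multi-page input yields one image per page.</p>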
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="_merge_two_ps_or_pdf_files"></a>11.3.2. Merge two PS or PDF files</h3></div></div></div>
<p>You can merge two <a class="ulink" href="http://en.wikipedia.org/wiki/PostScript" target="_top">PostScript (PS)</a> or <a class="ulink" href="http://en.wikipedia.org/wiki/Portable_Document_Format" target="_top">Portable Document Format (PDF)</a> files using <span class="citerefentry"><span class="refentrytitle">gs</span>(1)</span> of Ghostscript.</p>
<pre class="screen">$ gs -q -dNOPAUSE -dBATCH -sDEVICE=ps2write -sOutputFile=bla.ps -f foo1.ps foo2.ps
$ gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=bla.pdf -f foo1.pdf foo2.pdf</pre>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td>
<th align="left">Note</th>
</tr>
<tr><td align="left" valign="top"><p><a class="ulink" href="http://en.wikipedia.org/wiki/Portable_Document_Format" target="_top">PDF</a>, which is a widely used cross-platform printable data format, is essentially a compressed <a class="ulink" href="http://en.wikipedia.org/wiki/PostScript" target="_top">PS</a> format with a few additional features and extensions.</p></td></tr>
</table></div>
<div class="tip" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Tip">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="images/tip.png"></td>
<th align="left">Tip</th>
</tr>
<tr><td align="left" valign="top"><p>On the command line, <span class="citerefentry"><span class="refentrytitle">psmerge</span>(1)</span> and other commands from the <code class="literal">psutils</code> package are useful for manipulating PostScript documents. Commands in the <code class="literal">pdfjam</code> package work similarly for manipulating PDF documents. <span class="citerefentry"><span class="refentrytitle">pdftk</span>(1)</span> from the <code class="literal">pdftk</code> package is useful for manipulating PDF documents, too.</p></td></tr>
</table></div>
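<p>As a rough sketch of the alternatives mentioned in the tip above, the same merges might look as follows, assuming the <code class="literal">psutils</code>, <code class="literal">pdftk</code>, and <code class="literal">pdfjam</code> packages are installed and using the same placeholder file names as before.</p>
<pre class="screen">$ psmerge -obla.ps foo1.ps foo2.ps
$ pdftk foo1.pdf foo2.pdf cat output bla.pdf
$ pdfjoin foo1.pdf foo2.pdf --outfile bla.pdf</pre>
<p>Note that <span class="citerefentry"><span class="refentrytitle">psmerge</span>(1)</span> is documented to work reliably only when the input files were created by the same application with the same setup, so the Ghostscript approach may be the safer choice for mixed sources.</p>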
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="_printable_data_utilities"></a>11.3.3. Printable data utilities</h3></div></div></div>
<p>The following packages of printable data utilities caught my eye.</p>
<div class="table">
<a name="listofprintabledatautilities"></a><p class="title"><b>Table 11.13. List of printable data utilities</b></p>
<div class="table-contents"><table summary="List of printable data utilities" border="1">
<colgroup>
<col align="left">
<col align="left">
<col align="left">
<col align="left">
<col align="left">
</colgroup>
<thead><tr>
<th align="left">
package
</th>
<th align="left">
popcon
</th>
<th align="left">
size
</th>
<th align="left">
keyword
</th>
<th align="left">
description
</th>
</tr></thead>
<tbody>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/poppler-utils" target="_top">
<code class="literal">poppler-utils</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=poppler-utils" target="_top">http://qa.debian.org/popcon.php?package=poppler-utils</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/p/poppler-utils.html" target="_top">542</a></td>
<td align="left">
pdf→ps,text,…
</td>
<td align="left">
PDF utilities: <code class="literal">pdftops</code>, <code class="literal">pdfinfo</code>, <code class="literal">pdfimages</code>, <code class="literal">pdftotext</code>, <code class="literal">pdffonts</code>
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/psutils" target="_top">
<code class="literal">psutils</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=psutils" target="_top">http://qa.debian.org/popcon.php?package=psutils</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/p/psutils.html" target="_top">243</a></td>
<td align="left">
ps→ps
</td>
<td align="left">
PostScript document conversion tools
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/poster" target="_top">
<code class="literal">poster</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=poster" target="_top">http://qa.debian.org/popcon.php?package=poster</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/p/poster.html" target="_top">80</a></td>
<td align="left">
ps→ps
</td>
<td align="left">
create large posters out of PostScript pages
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/enscript" target="_top">
<code class="literal">enscript</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=enscript" target="_top">http://qa.debian.org/popcon.php?package=enscript</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/e/enscript.html" target="_top">2147</a></td>
<td align="left">
text→ps, html, rtf
</td>
<td align="left">
convert ASCII text to PostScript, HTML, RTF or Pretty-Print
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/a2ps" target="_top">
<code class="literal">a2ps</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=a2ps" target="_top">http://qa.debian.org/popcon.php?package=a2ps</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/a/a2ps.html" target="_top">4292</a></td>
<td align="left">
text→ps
</td>
<td align="left">
'Anything to PostScript' converter and pretty-printer
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/pdftk" target="_top">
<code class="literal">pdftk</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=pdftk" target="_top">http://qa.debian.org/popcon.php?package=pdftk</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/p/pdftk.html" target="_top">3039</a></td>
<td align="left">
pdf→pdf
</td>
<td align="left">
PDF document conversion tool: <code class="literal">pdftk</code>
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/mpage" target="_top">
<code class="literal">mpage</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=mpage" target="_top">http://qa.debian.org/popcon.php?package=mpage</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/m/mpage.html" target="_top">224</a></td>
<td align="left">
text,ps→ps
</td>
<td align="left">
print multiple pages per sheet
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/html2ps" target="_top">
<code class="literal">html2ps</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=html2ps" target="_top">http://qa.debian.org/popcon.php?package=html2ps</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/h/html2ps.html" target="_top">320</a></td>
<td align="left">
html→ps
</td>
<td align="left">
converter from HTML to PostScript
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/pdfjam" target="_top">
<code class="literal">pdfjam</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=pdfjam" target="_top">http://qa.debian.org/popcon.php?package=pdfjam</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/p/pdfjam.html" target="_top">122</a></td>
<td align="left">
pdf→pdf
</td>
<td align="left">
PDF document conversion tools: <code class="literal">pdf90</code>, <code class="literal">pdfjoin</code>, and <code class="literal">pdfnup</code>
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/gnuhtml2latex" target="_top">
<code class="literal">gnuhtml2latex</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=gnuhtml2latex" target="_top">http://qa.debian.org/popcon.php?package=gnuhtml2latex</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/g/gnuhtml2latex.html" target="_top">53</a></td>
<td align="left">
html→latex
</td>
<td align="left">
converter from HTML to LaTeX
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/latex2rtf" target="_top">
<code class="literal">latex2rtf</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=latex2rtf" target="_top">http://qa.debian.org/popcon.php?package=latex2rtf</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/l/latex2rtf.html" target="_top">508</a></td>
<td align="left">
latex→rtf
</td>
<td align="left">
convert documents from LaTeX to RTF which can be read by MS Word
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/ps2eps" target="_top">
<code class="literal">ps2eps</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=ps2eps" target="_top">http://qa.debian.org/popcon.php?package=ps2eps</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/p/ps2eps.html" target="_top">136</a></td>
<td align="left">
ps→eps
</td>
<td align="left">
converter from PostScript to EPS (Encapsulated PostScript)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/e2ps" target="_top">
<code class="literal">e2ps</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=e2ps" target="_top">http://qa.debian.org/popcon.php?package=e2ps</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/e/e2ps.html" target="_top">188</a></td>
<td align="left">
text→ps
</td>
<td align="left">
Text to PostScript converter with Japanese encoding support
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/impose+" target="_top">
<code class="literal">impose+</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=impose+" target="_top">http://qa.debian.org/popcon.php?package=impose+</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/i/impose+.html" target="_top">180</a></td>
<td align="left">
ps→ps
</td>
<td align="left">
PostScript utilities
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/trueprint" target="_top">
<code class="literal">trueprint</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=trueprint" target="_top">http://qa.debian.org/popcon.php?package=trueprint</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/t/trueprint.html" target="_top">188</a></td>
<td align="left">
text→ps
</td>
<td align="left">
pretty-print source code in many languages (C, C++, Java, Pascal, Perl, Pike, Sh, and Verilog) to PostScript (C language)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/pdf2svg" target="_top">
<code class="literal">pdf2svg</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=pdf2svg" target="_top">http://qa.debian.org/popcon.php?package=pdf2svg</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/p/pdf2svg.html" target="_top">60</a></td>
<td align="left">
pdf→svg
</td>
<td align="left">
converter from PDF to <a class="ulink" href="http://en.wikipedia.org/wiki/Scalable_Vector_Graphics" target="_top">Scalable Vector Graphics</a> format
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/pdftoipe" target="_top">
<code class="literal">pdftoipe</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=pdftoipe" target="_top">http://qa.debian.org/popcon.php?package=pdftoipe</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/p/pdftoipe.html" target="_top">91</a></td>
<td align="left">
pdf→ipe
</td>
<td align="left">
converter from PDF to IPE's XML format
</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break">
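<p>To give a feel for these utilities, here are a few typical invocations, offered as a sketch rather than a complete guide. The file names are placeholders; <code class="literal">pdfinfo</code>, <code class="literal">pdftotext</code>, and <code class="literal">pdftops</code> come from the <code class="literal">poppler-utils</code> package, <code class="literal">psnup</code> from <code class="literal">psutils</code>, and <code class="literal">a2ps</code> from the package of the same name.</p>
<pre class="screen">$ pdfinfo foo.pdf
$ pdftotext -layout foo.pdf foo.txt
$ pdftops foo.pdf foo.ps
$ psnup -2 foo.ps foo-2up.ps
$ a2ps -o bar.ps bar.txt</pre>
<p>These commands, in order, show the metadata of a PDF file, extract its text while trying to preserve the page layout, convert it to PostScript, place two PostScript pages on each sheet, and pretty-print a text file to PostScript.</p>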
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="_printing_with_cups"></a>11.3.4. Printing with CUPS</h3></div></div></div>
<p>Both the <span class="citerefentry"><span class="refentrytitle">lp</span>(1)</span> and <span class="citerefentry"><span class="refentrytitle">lpr</span>(1)</span> commands offered by the <a class="ulink" href="http://en.wikipedia.org/wiki/Common_Unix_Printing_System" target="_top">Common Unix Printing System (CUPS)</a> provide options for customized printing of printable data.</p>
<p>You can print 3 copies of a file collated using one of the following commands.</p>
<pre class="screen">$ lp -n 3 -o Collate=True filename</pre>
<pre class="screen">$ lpr -#3 -o Collate=True filename</pre>
<p>You can further customize printer operation by using printer options such as "<code class="literal">-o number-up=2</code>", "<code class="literal">-o page-set=even</code>", "<code class="literal">-o page-set=odd</code>", "<code class="literal">-o scaling=200</code>", "<code class="literal">-o natural-scaling=200</code>", etc., documented at <a class="ulink" href="http://localhost:631/help/options.html" target="_top">Command-Line Printing and Options</a>.</p>
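<p>As a rough illustration of combining such options, the following sketch builds one option string and prints the resulting <code class="literal">lp</code> command as text instead of submitting a print job. The file name is a placeholder, and the particular combination (two pages per sheet, odd pages only) is just an assumption chosen for the example.</p>

```shell
# Placeholder file name; substitute a real printable file.
file=filename
# Options taken from the list above: two pages per sheet, odd pages only.
opts="-o number-up=2 -o page-set=odd"
# Assemble the command as text so it can be inspected before use.
cmd="lp $opts $file"
echo "$cmd"
```

<p>Once the printed command looks right, the same options can be passed to <code class="literal">lp</code> or <code class="literal">lpr</code> directly.</p>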
</div>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="_type_setting"></a>11.4. Type setting</h2></div></div></div>
<p>The Unix <a class="ulink" href="http://en.wikipedia.org/wiki/Troff" target="_top">troff</a> program originally developed by AT&amp;T can be used for simple typesetting. It is usually used to create manpages.</p>
<p><a class="ulink" href="http://en.wikipedia.org/wiki/TeX" target="_top">TeX</a> created by Donald Knuth is a very powerful typesetting tool and is the de facto standard. <a class="ulink" href="http://en.wikipedia.org/wiki/LaTeX" target="_top">LaTeX</a> originally written by Leslie Lamport enables high-level access to the power of TeX.</p>
<div class="table">
<a name="listoftypesettingtools"></a><p class="title"><b>Table 11.14. List of type setting tools</b></p>
<div class="table-contents"><table summary="List of type setting tools" border="1">
<colgroup>
<col align="left">
<col align="left">
<col align="left">
<col align="left">
<col align="left">
</colgroup>
<thead><tr>
<th align="left">
package
</th>
<th align="left">
popcon
</th>
<th align="left">
size
</th>
<th align="left">
keyword
</th>
<th align="left">
description
</th>
</tr></thead>
<tbody>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/texlive" target="_top">
<code class="literal">texlive</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=texlive" target="_top">http://qa.debian.org/popcon.php?package=texlive</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/t/texlive.html" target="_top">103</a></td>
<td align="left">
(La)TeX
</td>
<td align="left">
TeX system for typesetting, previewing and printing
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/groff" target="_top">
<code class="literal">groff</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=groff" target="_top">http://qa.debian.org/popcon.php?package=groff</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/g/groff.html" target="_top">8095</a></td>
<td align="left">
troff
</td>
<td align="left">
GNU troff text-formatting system
</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="_roff_typesetting"></a>11.4.1. roff typesetting</h3></div></div></div>
<p>Traditionally, <a class="ulink" href="http://en.wikipedia.org/wiki/Roff" target="_top">roff</a> is the main Unix text processing system. See <span class="citerefentry"><span class="refentrytitle">roff</span>(7)</span>, <span class="citerefentry"><span class="refentrytitle">groff</span>(7)</span>, <span class="citerefentry"><span class="refentrytitle">groff</span>(1)</span>, <span class="citerefentry"><span class="refentrytitle">grotty</span>(1)</span>, <span class="citerefentry"><span class="refentrytitle">troff</span>(1)</span>, <span class="citerefentry"><span class="refentrytitle">groff_mdoc</span>(7)</span>, <span class="citerefentry"><span class="refentrytitle">groff_man</span>(7)</span>, <span class="citerefentry"><span class="refentrytitle">groff_ms</span>(7)</span>, <span class="citerefentry"><span class="refentrytitle">groff_me</span>(7)</span>, <span class="citerefentry"><span class="refentrytitle">groff_mm</span>(7)</span>, and "<code class="literal">info groff</code>".</p>
<p>You can read or print a good tutorial and reference on the "<code class="literal">-me</code>" <a class="ulink" href="http://en.wikipedia.org/wiki/Macro_(computer_science)" target="_top">macro</a> package in "<code class="literal">/usr/share/doc/groff/</code>" by installing the <code class="literal">groff</code> package.</p>
<div class="tip" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Tip">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="images/tip.png"></td>
<th align="left">Tip</th>
</tr>
<tr><td align="left" valign="top"><p>"<code class="literal">groff -Tascii -me -</code>" produces plain text output with <a class="ulink" href="http://en.wikipedia.org/wiki/ANSI_escape_code" target="_top">ANSI escape codes</a>. If you wish to get manpage-like output with many "^H" and "_", use "<code class="literal">GROFF_NO_SGR=1 groff -Tascii -me -</code>" instead.</p></td></tr>
</table></div>
<div class="tip" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Tip">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="images/tip.png"></td>
<th align="left">Tip</th>
</tr>
<tr><td align="left" valign="top"><p>To remove "^H" and "_" from a text file generated by <code class="literal">groff</code>, filter it through "<code class="literal">col -b -x</code>".</p></td></tr>
</table></div>
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="_tex_latex"></a>11.4.2. TeX/LaTeX</h3></div></div></div>
<p>The <a class="ulink" href="http://en.wikipedia.org/wiki/TeX_Live" target="_top">TeX Live</a> software distribution offers a complete TeX system. The <code class="literal">texlive</code> metapackage provides a decent selection of the <a class="ulink" href="http://en.wikipedia.org/wiki/TeX_Live" target="_top">TeX Live</a> packages which should suffice for the most common tasks.</p>
<p>There are many references available for <a class="ulink" href="http://en.wikipedia.org/wiki/TeX" target="_top">TeX</a> and <a class="ulink" href="http://en.wikipedia.org/wiki/LaTeX" target="_top">LaTeX</a>.</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p>
<a class="ulink" href="http://www.tldp.org/HOWTO/TeTeX-HOWTO.html" target="_top">The teTeX HOWTO: The Linux-teTeX Local Guide</a>
</p></li>
<li class="listitem"><p><span class="citerefentry"><span class="refentrytitle">tex</span>(1)</span>
</p></li>
<li class="listitem"><p><span class="citerefentry"><span class="refentrytitle">latex</span>(1)</span>
</p></li>
<li class="listitem"><p>
"The TeXbook", by Donald E. Knuth, (Addison-Wesley)
</p></li>
<li class="listitem"><p>
"LaTeX - A Document Preparation System", by Leslie Lamport, (Addison-Wesley)
</p></li>
<li class="listitem"><p>
"The LaTeX Companion", by Goossens, Mittelbach, Samarin, (Addison-Wesley)
</p></li>
</ul></div>
<p>TeX/LaTeX is the most powerful typesetting environment. Many <a class="ulink" href="http://en.wikipedia.org/wiki/Standard_Generalized_Markup_Language" target="_top">SGML</a> processors use it as their back-end text processor. <a class="ulink" href="http://en.wikipedia.org/wiki/Lyx" target="_top">LyX</a>, provided by the <code class="literal">lyx</code> package, and <a class="ulink" href="http://en.wikipedia.org/wiki/GNU_TeXmacs" target="_top">GNU TeXmacs</a>, provided by the <code class="literal">texmacs</code> package, offer nice <a class="ulink" href="http://en.wikipedia.org/wiki/WYSIWYG" target="_top">WYSIWYG</a> editing environments for <a class="ulink" href="http://en.wikipedia.org/wiki/LaTeX" target="_top">LaTeX</a>, while many users choose <a class="ulink" href="http://en.wikipedia.org/wiki/Emacs" target="_top">Emacs</a> or <a class="ulink" href="http://en.wikipedia.org/wiki/Vim_(text_editor)" target="_top">Vim</a> as the source editor.</p>
<p>There are many online resources available.</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p>
The TeX Live Guide - TeX Live 2007 ("<code class="literal">/usr/share/doc/texlive-doc-base/english/texlive-en/live.html</code>") (<code class="literal">texlive-doc-base</code> package)
</p></li>
<li class="listitem"><p>
<a class="ulink" href="http://www.stat.rice.edu/~helpdesk/howto/lyxguide.html" target="_top">A Simple Guide to Latex/Lyx</a>
</p></li>
<li class="listitem"><p>
<a class="ulink" href="http://www-h.eng.cam.ac.uk/help/tpl/textprocessing/latex_basic/latex_basic.html" target="_top">Word Processing Using LaTeX</a>
</p></li>
<li class="listitem"><p>
<a class="ulink" href="http://supportweb.cs.bham.ac.uk/documentation/LaTeX/lguide/local-guide/local-guide.html" target="_top">Local User Guide to teTeX/LaTeX</a>
</p></li>
</ul></div>
<p>When documents become bigger, TeX may sometimes fail with errors. To fix this, increase the pool size in "<code class="literal">/etc/texmf/texmf.cnf</code>" (or, more appropriately, edit "<code class="literal">/etc/texmf/texmf.d/95NonPath</code>" and run <span class="citerefentry"><span class="refentrytitle">update-texmf</span>(8)</span>).</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td>
<th align="left">Note</th>
</tr>
<tr><td align="left" valign="top"><p>The TeX source of "The TeXbook" is available at <a class="ulink" href="http://tug.ctan.org/tex-archive/systems/knuth/dist/tex/texbook.tex" target="_top">http://tug.ctan.org/tex-archive/systems/knuth/dist/tex/texbook.tex</a>.</p></td></tr>
</table></div>
<p>This file contains most of the required macros. I heard that you can process this document with <span class="citerefentry"><span class="refentrytitle">tex</span>(1)</span> after commenting out lines 7 to 10 and adding "<code class="literal">\input manmac \proofmodefalse</code>". It's strongly recommended to buy this book (and all other books from Donald E. Knuth) instead of using the online version, but the source is a great example of TeX input!</p>
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="_pretty_print_a_manual_page"></a>11.4.3. Pretty print a manual page</h3></div></div></div>
<p>You can print a manual page nicely in PostScript with one of the following commands.</p>
<pre class="screen">$ man -Tps some_manpage | lpr</pre>
<pre class="screen">$ man -Tps some_manpage | mpage -2 | lpr</pre>
<p>The second example prints 2 pages on one sheet.</p>
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="_creating_a_manual_page"></a>11.4.4. Creating a manual page</h3></div></div></div>
<p>Although writing a manual page (manpage) in the plain <a class="ulink" href="http://en.wikipedia.org/wiki/Troff" target="_top">troff</a> format is possible, there are a few helper packages to create it.</p>
<div class="table">
<a name="listofpackagestoeatingthemanpage"></a><p class="title"><b>Table 11.15. List of packages to help creating the manpage</b></p>
<div class="table-contents"><table summary="List of packages to help creating the manpage" border="1">
<colgroup>
<col align="left">
<col align="left">
<col align="left">
<col align="left">
<col align="left">
</colgroup>
<thead><tr>
<th align="left">
package
</th>
<th align="left">
popcon
</th>
<th align="left">
size
</th>
<th align="left">
keyword
</th>
<th align="left">
description
</th>
</tr></thead>
<tbody>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/docbook-to-man" target="_top">
<code class="literal">docbook-to-man</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=docbook-to-man" target="_top">http://qa.debian.org/popcon.php?package=docbook-to-man</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/d/docbook-to-man.html" target="_top">213</a></td>
<td align="left">
SGML→manpage
</td>
<td align="left">
converter from DocBook SGML into roff man macros
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/help2man" target="_top">
<code class="literal">help2man</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=help2man" target="_top">http://qa.debian.org/popcon.php?package=help2man</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/h/help2man.html" target="_top">485</a></td>
<td align="left">
text→manpage
</td>
<td align="left">
automatic manpage generator from --help
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/info2man" target="_top">
<code class="literal">info2man</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=info2man" target="_top">http://qa.debian.org/popcon.php?package=info2man</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/i/info2man.html" target="_top">161</a></td>
<td align="left">
info→manpage
</td>
<td align="left">
converter from GNU info to POD or man pages
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/txt2man" target="_top">
<code class="literal">txt2man</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=txt2man" target="_top">http://qa.debian.org/popcon.php?package=txt2man</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/t/txt2man.html" target="_top">88</a></td>
<td align="left">
text→manpage
</td>
<td align="left">
convert flat ASCII text to man page format
</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break">
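<p>For comparison with what these helpers generate, a hand-written manpage in plain troff with the man macros can be very short. The sketch below writes a minimal page for a made-up command called <code class="literal">hello</code>; the command name, date, and version string are all placeholders invented for this example.</p>

```shell
# Write a minimal manpage source to hello.1 (tee also echoes it to the terminal).
# .TH sets the title line, .SH starts a section, .B prints its argument in bold.
printf '%s\n' \
  '.TH HELLO 1 "2024-01-01" "hello 1.0" "User Commands"' \
  '.SH NAME' \
  'hello \- print a friendly greeting' \
  '.SH SYNOPSIS' \
  '.B hello' \
  '.SH DESCRIPTION' \
  'Prints a greeting to standard output.' | tee hello.1
```

<p>The result can be previewed with "<code class="literal">man -l hello.1</code>".</p>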
</div>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="_the_mail_data_conversion"></a>11.5. The mail data conversion</h2></div></div></div>
<p>The following packages for mail data conversion caught my eye.</p>
<div class="table">
<a name="listofpackagestoildataconversion"></a><p class="title"><b>Table 11.16. List of packages to help mail data conversion</b></p>
<div class="table-contents"><table summary="List of packages to help mail data conversion" border="1">
<colgroup>
<col align="left">
<col align="left">
<col align="left">
<col align="left">
<col align="left">
</colgroup>
<thead><tr>
<th align="left">
package
</th>
<th align="left">
popcon
</th>
<th align="left">
size
</th>
<th align="left">
keyword
</th>
<th align="left">
description
</th>
</tr></thead>
<tbody>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/sharutils" target="_top">
<code class="literal">sharutils</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=sharutils" target="_top">http://qa.debian.org/popcon.php?package=sharutils</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/s/sharutils.html" target="_top">1408</a></td>
<td align="left">
mail
</td>
<td align="left">
<span class="citerefentry"><span class="refentrytitle">shar</span>(1)</span>, <span class="citerefentry"><span class="refentrytitle">unshar</span>(1)</span>, <span class="citerefentry"><span class="refentrytitle">uuencode</span>(1)</span>, <span class="citerefentry"><span class="refentrytitle">uudecode</span>(1)</span>
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/mpack" target="_top">
<code class="literal">mpack</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=mpack" target="_top">http://qa.debian.org/popcon.php?package=mpack</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/m/mpack.html" target="_top">109</a></td>
<td align="left">
MIME
</td>
<td align="left">
encoder and decoder for <a class="ulink" href="http://en.wikipedia.org/wiki/MIME" target="_top">MIME</a> messages: <span class="citerefentry"><span class="refentrytitle">mpack</span>(1)</span> and <span class="citerefentry"><span class="refentrytitle">munpack</span>(1)</span>
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/tnef" target="_top">
<code class="literal">tnef</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=tnef" target="_top">http://qa.debian.org/popcon.php?package=tnef</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/t/tnef.html" target="_top">132</a></td>
<td align="left">
ms-tnef
</td>
<td align="left">
unpack <a class="ulink" href="http://en.wikipedia.org/wiki/MIME" target="_top">MIME</a> attachments of type "application/ms-tnef", which is a Microsoft-only format
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/uudeview" target="_top">
<code class="literal">uudeview</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=uudeview" target="_top">http://qa.debian.org/popcon.php?package=uudeview</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/u/uudeview.html" target="_top">117</a></td>
<td align="left">
mail
</td>
<td align="left">
encoder and decoder for the following formats: <a class="ulink" href="http://en.wikipedia.org/wiki/Uuencoding" target="_top">uuencode</a>, <a class="ulink" href="http://en.wikipedia.org/wiki/Xxencode" target="_top">xxencode</a>, <a class="ulink" href="http://en.wikipedia.org/wiki/Base64" target="_top">BASE64</a>, <a class="ulink" href="http://en.wikipedia.org/wiki/Quoted-printable" target="_top">quoted printable</a>, and <a class="ulink" href="http://en.wikipedia.org/wiki/BinHex" target="_top">BinHex</a>
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/readpst" target="_top">
<code class="literal">readpst</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=readpst" target="_top">http://qa.debian.org/popcon.php?package=readpst</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/r/readpst.html" target="_top">228</a></td>
<td align="left">
PST
</td>
<td align="left">
convert Microsoft <a class="ulink" href="http://en.wikipedia.org/wiki/Personal_Folders_(.pst)_file" target="_top">Outlook PST files</a> to <a class="ulink" href="http://en.wikipedia.org/wiki/Mbox" target="_top">mbox</a> format
</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><div class="tip" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Tip">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="images/tip.png"></td>
<th align="left">Tip</th>
</tr>
<tr><td align="left" valign="top"><p>The <a class="ulink" href="http://en.wikipedia.org/wiki/Internet_Message_Access_Protocol" target="_top">Internet Message Access Protocol</a> version 4 (IMAP4) server (see <a class="xref" href="ch06.en.html#_pop3_imap4_server" title="6.7. POP3/IMAP4 server">Section 6.7, “POP3/IMAP4 server”</a>) may be used to move mail out of proprietary mail systems if the mail client software can be configured to use an IMAP4 server too.</p></td></tr>
</table></div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="_mail_data_basics"></a>11.5.1. Mail data basics</h3></div></div></div>
<p>Mail (<a class="ulink" href="http://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol" target="_top">SMTP</a>) data should be limited to 7 bits. So binary data and 8-bit text data are encoded into a 7-bit format with the <a class="ulink" href="http://en.wikipedia.org/wiki/MIME" target="_top">Multipurpose Internet Mail Extensions (MIME)</a> and the selection of the charset (see <a class="xref" href="ch08.en.html#_basics_of_encoding" title="8.3.1. Basics of encoding">Section 8.3.1, “Basics of encoding”</a>).</p>
<p>The standard mail storage format is mbox formatted according to <a class="ulink" href="http://tools.ietf.org/html/rfc2822" target="_top">RFC2822 (updated RFC822)</a>. See <span class="citerefentry"><span class="refentrytitle">mbox</span>(5)</span> (provided by the <code class="literal">mutt</code> package).</p>
<p>For European languages, "<code class="literal">Content-Transfer-Encoding: quoted-printable</code>" with the ISO-8859-1 charset is usually used for mail since there are not many 8-bit characters. If European text is encoded in UTF-8, "<code class="literal">Content-Transfer-Encoding: quoted-printable</code>" is still likely to be used since the data is mostly 7-bit.</p>
<p>For Japanese, "<code class="literal">Content-Type: text/plain; charset=ISO-2022-JP</code>" is traditionally used for mail to keep text in 7 bits. But older Microsoft systems may send mail data in Shift-JIS without a proper declaration. If Japanese text is encoded in UTF-8, <a class="ulink" href="http://en.wikipedia.org/wiki/Base64" target="_top">Base64</a> is likely to be used since the text consists mostly of 8-bit data. The situation for other Asian languages is similar.</p>
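<p>To get a feel for what a Base64 body looks like, the following sketch encodes a single Japanese character given as raw UTF-8 bytes and then decodes it again. It assumes the <code class="literal">base64</code> command from GNU coreutils; the sample character is an arbitrary choice for illustration.</p>

```shell
# \343\201\202 are the UTF-8 bytes (E3 81 82) of the hiragana letter U+3042.
encoded=$(printf '\343\201\202' | base64)
echo "$encoded"    # prints 44GC
# Decoding restores the original three bytes; od shows them in hexadecimal.
printf '%s' "$encoded" | base64 -d | od -An -tx1
```

<p>Every 3 bytes of 8-bit input become 4 printable 7-bit characters, which is why Base64 suits text that is almost entirely 8-bit data.</p>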
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td>
<th align="left">Note</th>
</tr>
<tr><td align="left" valign="top"><p>If your non-Unix mail data is accessible by non-Debian client software which can talk to an IMAP4 server, you may be able to move it out by running your own IMAP4 server (see <a class="xref" href="ch06.en.html#_pop3_imap4_server" title="6.7. POP3/IMAP4 server">Section 6.7, “POP3/IMAP4 server”</a>).</p></td></tr>
</table></div>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td>
<th align="left">Note</th>
</tr>
<tr><td align="left" valign="top"><p>If you use other mail storage formats, moving them to mbox format is a good first step. A versatile client program such as <span class="citerefentry"><span class="refentrytitle">mutt</span>(1)</span> may be handy for this.</p></td></tr>
</table></div>
<p>You can split mailbox contents into individual messages using <span class="citerefentry"><span class="refentrytitle">procmail</span>(1)</span> and <span class="citerefentry"><span class="refentrytitle">formail</span>(1)</span>.</p>
<p>Each mail message can be unpacked using <span class="citerefentry"><span class="refentrytitle">munpack</span>(1)</span> from the <code class="literal">mpack</code> package (or other specialized tools) to obtain the MIME encoded contents.</p>
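<p>The splitting and unpacking steps might be scripted roughly as follows. This is a sketch under assumptions, not a tested recipe: the helper names <code class="literal">split_mbox</code> and <code class="literal">unpack_parts</code> and the "<code class="literal">msg.</code>" file prefix are invented here, and it relies on <code class="literal">formail -s</code> exporting the current message number in the <code class="literal">FILENO</code> environment variable as described in <span class="citerefentry"><span class="refentrytitle">formail</span>(1)</span>.</p>

```shell
# Hypothetical helper: split_mbox MBOX OUTDIR
# Writes each message of MBOX to its own file under OUTDIR.
split_mbox() {
  mkdir -p "$2"
  # formail -s runs the given command once per message, feeding the
  # message on stdin and setting FILENO to the message number.
  cat "$1" | OUTDIR="$2" formail -s sh -c 'cat > "$OUTDIR/msg.$FILENO"'
}

# Hypothetical helper: unpack_parts MESSAGE_FILE...
# munpack writes the decoded MIME parts into the current directory.
unpack_parts() {
  munpack "$@"
}

# Example use (inbox.mbox is a placeholder):
#   split_mbox inbox.mbox split
#   unpack_parts split/msg.*
```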
</div>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="_graphic_data_tools"></a>11.6. Graphic data tools</h2></div></div></div>
<p>The following packages for graphic data conversion, editing, and organization caught my eye.</p>
<div class="table">
<a name="listofgraphicdatatools"></a><p class="title"><b>Table 11.17. List of graphic data tools</b></p>
<div class="table-contents"><table summary="List of graphic data tools" border="1">
<colgroup>
<col align="left">
<col align="left">
<col align="left">
<col align="left">
<col align="left">
</colgroup>
<thead><tr>
<th align="left">
package
</th>
<th align="left">
popcon
</th>
<th align="left">
size
</th>
<th align="left">
keyword
</th>
<th align="left">
description
</th>
</tr></thead>
<tbody>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/gimp" target="_top">
<code class="literal">gimp</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=gimp" target="_top">http://qa.debian.org/popcon.php?package=gimp</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/g/gimp.html" target="_top">15168</a></td>
<td align="left">
image(bitmap)
</td>
<td align="left">
GNU Image Manipulation Program
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/imagemagick" target="_top">
<code class="literal">imagemagick</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=imagemagick" target="_top">http://qa.debian.org/popcon.php?package=imagemagick</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/i/imagemagick.html" target="_top">207</a></td>
<td align="left">
image(bitmap)
</td>
<td align="left">
image manipulation programs
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/graphicsmagick" target="_top">
<code class="literal">graphicsmagick</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=graphicsmagick" target="_top">http://qa.debian.org/popcon.php?package=graphicsmagick</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/g/graphicsmagick.html" target="_top">4335</a></td>
<td align="left">
image(bitmap)
</td>
<td align="left">
image manipulation programs (fork of <code class="literal">imagemagick</code>)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/xsane" target="_top">
<code class="literal">xsane</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=xsane" target="_top">http://qa.debian.org/popcon.php?package=xsane</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/x/xsane.html" target="_top">702</a></td>
<td align="left">
image(bitmap)
</td>
<td align="left">
GTK+-based X11 frontend for SANE (Scanner Access Now Easy)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/netpbm" target="_top">
<code class="literal">netpbm</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=netpbm" target="_top">http://qa.debian.org/popcon.php?package=netpbm</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/n/netpbm.html" target="_top">3464</a></td>
<td align="left">
image(bitmap)
</td>
<td align="left">
graphics conversion tools
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/icoutils" target="_top">
<code class="literal">icoutils</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=icoutils" target="_top">http://qa.debian.org/popcon.php?package=icoutils</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/i/icoutils.html" target="_top">160</a></td>
<td align="left">
png↔ico(bitmap)
</td>
<td align="left">
convert <a class="ulink" href="http://en.wikipedia.org/wiki/ICO_(icon_image_file_format)" target="_top">MS Windows icons and cursors to and from PNG formats</a> (<a class="ulink" href="http://en.wikipedia.org/wiki/Favicon" target="_top">favicon.ico</a>)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/scribus" target="_top">
<code class="literal">scribus</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=scribus" target="_top">http://qa.debian.org/popcon.php?package=scribus</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/s/scribus.html" target="_top">54492</a></td>
<td align="left">
ps/pdf/SVG/…
</td>
<td align="left">
<a class="ulink" href="http://en.wikipedia.org/wiki/Scribus" target="_top">Scribus</a> DTP editor
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/openoffice.org-draw" target="_top">
<code class="literal">openoffice.org-draw</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=openoffice.org-draw" target="_top">http://qa.debian.org/popcon.php?package=openoffice.org-draw</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/o/openoffice.org-draw.html" target="_top">164</a></td>
<td align="left">
image(vector)
</td>
<td align="left">
OpenOffice.org office suite - drawing
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/inkscape" target="_top">
<code class="literal">inkscape</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=inkscape" target="_top">http://qa.debian.org/popcon.php?package=inkscape</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/i/inkscape.html" target="_top">80425</a></td>
<td align="left">
image(vector)
</td>
<td align="left">
<a class="ulink" href="http://en.wikipedia.org/wiki/Scalable_Vector_Graphics" target="_top">SVG (Scalable Vector Graphics)</a> editor
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/dia-gnome" target="_top">
<code class="literal">dia-gnome</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=dia-gnome" target="_top">http://qa.debian.org/popcon.php?package=dia-gnome</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/d/dia-gnome.html" target="_top">617</a></td>
<td align="left">
image(vector)
</td>
<td align="left">
diagram editor (GNOME)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/dia" target="_top">
<code class="literal">dia</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=dia" target="_top">http://qa.debian.org/popcon.php?package=dia</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/d/dia.html" target="_top">617</a></td>
<td align="left">
image(vector)
</td>
<td align="left">
diagram editor (Gtk)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/xfig" target="_top">
<code class="literal">xfig</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=xfig" target="_top">http://qa.debian.org/popcon.php?package=xfig</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/x/xfig.html" target="_top">1597</a></td>
<td align="left">
image(vector)
</td>
<td align="left">
facility for Interactive Generation of figures under X11
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/pstoedit" target="_top">
<code class="literal">pstoedit</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=pstoedit" target="_top">http://qa.debian.org/popcon.php?package=pstoedit</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/p/pstoedit.html" target="_top">683</a></td>
<td align="left">
ps/pdf→image(vector)
</td>
<td align="left">
PostScript and PDF files to editable vector graphics converter (SVG)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/libwmf-bin" target="_top">
<code class="literal">libwmf-bin</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=libwmf-bin" target="_top">http://qa.debian.org/popcon.php?package=libwmf-bin</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/libw/libwmf-bin.html" target="_top">118</a></td>
<td align="left">
Windows/image(vector)
</td>
<td align="left">
Windows metafile (vector graphic data) conversion tools
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/fig2sxd" target="_top">
<code class="literal">fig2sxd</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=fig2sxd" target="_top">http://qa.debian.org/popcon.php?package=fig2sxd</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/f/fig2sxd.html" target="_top">200</a></td>
<td align="left">
fig→sxd(vector)
</td>
<td align="left">
convert XFig files to OpenOffice.org Draw format
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/unpaper" target="_top">
<code class="literal">unpaper</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=unpaper" target="_top">http://qa.debian.org/popcon.php?package=unpaper</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/u/unpaper.html" target="_top">736</a></td>
<td align="left">
image→image
</td>
<td align="left">
post-processing tool for scanned pages for <a class="ulink" href="http://en.wikipedia.org/wiki/Optical_character_recognition" target="_top">OCR</a>
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/tesseract-ocr" target="_top">
<code class="literal">tesseract-ocr</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=tesseract-ocr" target="_top">http://qa.debian.org/popcon.php?package=tesseract-ocr</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/t/tesseract-ocr.html" target="_top">435</a></td>
<td align="left">
image→text
</td>
<td align="left">
free <a class="ulink" href="http://en.wikipedia.org/wiki/Optical_character_recognition" target="_top">OCR</a> software based on HP's commercial OCR engine
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/tesseract-ocr-eng" target="_top">
<code class="literal">tesseract-ocr-eng</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=tesseract-ocr-eng" target="_top">http://qa.debian.org/popcon.php?package=tesseract-ocr-eng</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/t/tesseract-ocr-eng.html" target="_top">58870</a></td>
<td align="left">
image→text
</td>
<td align="left">
OCR engine data: tesseract-ocr language files for English text
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/gocr" target="_top">
<code class="literal">gocr</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=gocr" target="_top">http://qa.debian.org/popcon.php?package=gocr</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/g/gocr.html" target="_top">473</a></td>
<td align="left">
image→text
</td>
<td align="left">
free OCR software
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/ocrad" target="_top">
<code class="literal">ocrad</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=ocrad" target="_top">http://qa.debian.org/popcon.php?package=ocrad</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/o/ocrad.html" target="_top">255</a></td>
<td align="left">
image→text
</td>
<td align="left">
free OCR software
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/gtkam" target="_top">
<code class="literal">gtkam</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=gtkam" target="_top">http://qa.debian.org/popcon.php?package=gtkam</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/g/gtkam.html" target="_top">1255</a></td>
<td align="left">
image(Exif)
</td>
<td align="left">
manipulate digital camera photo files (GNOME) - GUI
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/gphoto2" target="_top">
<code class="literal">gphoto2</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=gphoto2" target="_top">http://qa.debian.org/popcon.php?package=gphoto2</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/g/gphoto2.html" target="_top">1036</a></td>
<td align="left">
image(Exif)
</td>
<td align="left">
manipulate digital camera photo files (GNOME) - command line
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/kamera" target="_top">
<code class="literal">kamera</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=kamera" target="_top">http://qa.debian.org/popcon.php?package=kamera</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/k/kamera.html" target="_top">245</a></td>
<td align="left">
image(Exif)
</td>
<td align="left">
manipulate digital camera photo files (KDE)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/jhead" target="_top">
<code class="literal">jhead</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=jhead" target="_top">http://qa.debian.org/popcon.php?package=jhead</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/j/jhead.html" target="_top">126</a></td>
<td align="left">
image(Exif)
</td>
<td align="left">
manipulate the non-image part of Exif compliant JPEG (digital camera photo) files
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/exif" target="_top">
<code class="literal">exif</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=exif" target="_top">http://qa.debian.org/popcon.php?package=exif</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/e/exif.html" target="_top">212</a></td>
<td align="left">
image(Exif)
</td>
<td align="left">
command-line utility to show EXIF information in JPEG files
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/exiftags" target="_top">
<code class="literal">exiftags</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=exiftags" target="_top">http://qa.debian.org/popcon.php?package=exiftags</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/e/exiftags.html" target="_top">198</a></td>
<td align="left">
image(Exif)
</td>
<td align="left">
utility to read Exif tags from a digital camera JPEG file
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/exiftran" target="_top">
<code class="literal">exiftran</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=exiftran" target="_top">http://qa.debian.org/popcon.php?package=exiftran</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/e/exiftran.html" target="_top">91</a></td>
<td align="left">
image(Exif)
</td>
<td align="left">
transform digital camera jpeg images
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/exifprobe" target="_top">
<code class="literal">exifprobe</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=exifprobe" target="_top">http://qa.debian.org/popcon.php?package=exifprobe</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/e/exifprobe.html" target="_top">484</a></td>
<td align="left">
image(Exif)
</td>
<td align="left">
read metadata from digital pictures
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/dcraw" target="_top">
<code class="literal">dcraw</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=dcraw" target="_top">http://qa.debian.org/popcon.php?package=dcraw</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/d/dcraw.html" target="_top">424</a></td>
<td align="left">
image(Raw)→ppm
</td>
<td align="left">
decode raw digital camera images
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/findimagedupes" target="_top">
<code class="literal">findimagedupes</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=findimagedupes" target="_top">http://qa.debian.org/popcon.php?package=findimagedupes</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/f/findimagedupes.html" target="_top">123</a></td>
<td align="left">
image→fingerprint
</td>
<td align="left">
find visually similar or duplicate images
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/ale" target="_top">
<code class="literal">ale</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=ale" target="_top">http://qa.debian.org/popcon.php?package=ale</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/a/ale.html" target="_top">757</a></td>
<td align="left">
image→image
</td>
<td align="left">
merge images to increase fidelity or create mosaics
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/imageindex" target="_top">
<code class="literal">imageindex</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=imageindex" target="_top">http://qa.debian.org/popcon.php?package=imageindex</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/i/imageindex.html" target="_top">171</a></td>
<td align="left">
image(Exif)→html
</td>
<td align="left">
generate static HTML galleries from images
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/f-spot" target="_top">
<code class="literal">f-spot</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=f-spot" target="_top">http://qa.debian.org/popcon.php?package=f-spot</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/f/f-spot.html" target="_top">8219</a></td>
<td align="left">
image(Exif)
</td>
<td align="left">
personal photo management application (GNOME)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/bins" target="_top">
<code class="literal">bins</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=bins" target="_top">http://qa.debian.org/popcon.php?package=bins</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/b/bins.html" target="_top">2008</a></td>
<td align="left">
image(Exif)→html
</td>
<td align="left">
generate static HTML photo albums using XML and EXIF tags
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/gallery2" target="_top">
<code class="literal">gallery2</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=gallery2" target="_top">http://qa.debian.org/popcon.php?package=gallery2</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/g/gallery2.html" target="_top">46635</a></td>
<td align="left">
image(Exif)→html
</td>
<td align="left">
generate browsable HTML photo albums with thumbnails
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/outguess" target="_top">
<code class="literal">outguess</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=outguess" target="_top">http://qa.debian.org/popcon.php?package=outguess</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/o/outguess.html" target="_top">252</a></td>
<td align="left">
jpeg,png
</td>
<td align="left">
universal <a class="ulink" href="http://en.wikipedia.org/wiki/Steganography" target="_top">Steganographic</a> tool
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/qcad" target="_top">
<code class="literal">qcad</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=qcad" target="_top">http://qa.debian.org/popcon.php?package=qcad</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/q/qcad.html" target="_top">31</a></td>
<td align="left">
DXF
</td>
<td align="left">
CAD data editor (KDE)
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/blender" target="_top">
<code class="literal">blender</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=blender" target="_top">http://qa.debian.org/popcon.php?package=blender</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/b/blender.html" target="_top">57306</a></td>
<td align="left">
blend, TIFF, VRML, …
</td>
<td align="left">
3D content editor for animation etc
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/mm3d" target="_top">
<code class="literal">mm3d</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=mm3d" target="_top">http://qa.debian.org/popcon.php?package=mm3d</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/m/mm3d.html" target="_top">5171</a></td>
<td align="left">
ms3d, obj, dxf, …
</td>
<td align="left">
OpenGL based 3D model editor
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/open-font-design-toolkit" target="_top">
<code class="literal">open-font-design-toolkit</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=open-font-design-toolkit" target="_top">http://qa.debian.org/popcon.php?package=open-font-design-toolkit</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/o/open-font-design-toolkit.html" target="_top">27</a></td>
<td align="left">
ttf, ps, …
</td>
<td align="left">
metapackage for open font design
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/fontforge" target="_top">
<code class="literal">fontforge</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=fontforge" target="_top">http://qa.debian.org/popcon.php?package=fontforge</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/f/fontforge.html" target="_top">6696</a></td>
<td align="left">
ttf, ps, …
</td>
<td align="left">
font editor for PS, TrueType and OpenType fonts
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/xgridfit" target="_top">
<code class="literal">xgridfit</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=xgridfit" target="_top">http://qa.debian.org/popcon.php?package=xgridfit</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/x/xgridfit.html" target="_top">1060</a></td>
<td align="left">
ttf
</td>
<td align="left">
program for <span class="strong"><strong>gridfitting</strong></span> and <span class="strong"><strong>hinting</strong></span> TrueType fonts
</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><div class="tip" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Tip">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="images/tip.png"></td>
<th align="left">Tip</th>
</tr>
<tr><td align="left" valign="top"><p>Search for more image tools using the regex "<code class="literal">~Gworks-with::image</code>" in <span class="citerefentry"><span class="refentrytitle">aptitude</span>(8)</span> (see <a class="xref" href="ch02.en.html#_search_method_options_with_aptitude" title="2.2.6. Search method options with aptitude">Section 2.2.6, “Search method options with aptitude”</a>).</p></td></tr>
</table></div>
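<p>The pattern in the tip above can also be passed to <code class="literal">aptitude search</code> from the shell. The following is only a small sketch; the function names <code class="literal">tag_pattern</code> and <code class="literal">search_by_tag</code> are hypothetical and chosen for this illustration.</p>
<pre class="screen"># Hypothetical helpers for searching packages by debtags tag.
# Build the aptitude search pattern for one tag, e.g. "~Gworks-with::image".
tag_pattern () {
  printf '~G%s\n' "$1"
}

# List packages carrying the tag; assumes aptitude is installed.
search_by_tag () {
  aptitude search "$(tag_pattern "$1")"
}</pre>
<p>For example, <code class="literal">search_by_tag works-with::image</code> runs <code class="literal">aptitude search '~Gworks-with::image'</code>.</p>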
<p>Although GUI programs such as <span class="citerefentry"><span class="refentrytitle">gimp</span>(1)</span> are very powerful, command line tools such as <span class="citerefentry"><span class="refentrytitle">imagemagick</span>(1)</span> are quite useful for automating image manipulation with scripts.</p>
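<p>As a minimal sketch of such scripting, the following shell function writes a half-size copy of every JPEG file in a directory. It assumes that the <code class="literal">convert</code> command from the <code class="literal">imagemagick</code> package is installed; the function name <code class="literal">shrink_jpegs</code> and the <code class="literal">-small</code> suffix are hypothetical choices made for this illustration.</p>
<pre class="screen"># Hypothetical helper: save a 50% scaled copy of each JPEG in a directory.
# Assumes the "convert" command from the imagemagick package.
shrink_jpegs () {
  dir=$1
  for f in "$dir"/*.jpg; do
    # Skip the unexpanded pattern when the directory has no *.jpg files.
    [ -e "$f" ] || continue
    # Skip copies made by an earlier run.
    case $f in
      *-small.jpg) continue ;;
    esac
    convert "$f" -resize 50% "${f%.jpg}-small.jpg"
  done
}</pre>
<p>For example, <code class="literal">shrink_jpegs ~/photos</code> would leave the originals untouched and create <code class="literal">*-small.jpg</code> files next to them.</p>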
<p>The de facto image file format of digital cameras is the <a class="ulink" href="http://en.wikipedia.org/wiki/Exchangeable_image_file_format" target="_top">Exchangeable Image File Format</a> (EXIF), which is the <a class="ulink" href="http://en.wikipedia.org/wiki/JPEG" target="_top">JPEG</a> image file format with additional metadata tags. These tags can hold information such as the date, the time, and the camera settings.</p>
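<p>As a rough illustration of reading this metadata, the sketch below wraps <code class="literal">jhead</code> (listed in the table above) in a small shell function. The function name <code class="literal">show_exif</code> is hypothetical, and the fields printed depend on what the camera recorded.</p>
<pre class="screen"># Hypothetical helper: print the Exif summary of one JPEG file.
# Assumes the "jhead" command from the jhead package.
show_exif () {
  if [ ! -f "$1" ]; then
    echo "no such file: $1"
    return 1
  fi
  jhead "$1"
}</pre>
<p>Calling <code class="literal">show_exif photo.jpg</code> on a file from a camera typically lists items such as the date and time of the shot and the camera model.</p>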
<p><a class="ulink" href="http://en.wikipedia.org/wiki/Lempel-Ziv-Welch" target="_top">The Lempel-Ziv-Welch (LZW) lossless data compression</a> patent has expired. <a class="ulink" href="http://en.wikipedia.org/wiki/Graphics_Interchange_Format" target="_top">Graphics Interchange Format (GIF)</a> utilities which use the LZW compression method are now freely available on the Debian system.</p>
<div class="tip" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Tip">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="images/tip.png"></td>
<th align="left">Tip</th>
</tr>
<tr><td align="left" valign="top"><p>Any digital camera or scanner with removable recording media works with Linux through <a class="ulink" href="http://en.wikipedia.org/wiki/USB_flash_drive" target="_top">USB storage</a> readers, since such media follow the <a class="ulink" href="http://en.wikipedia.org/wiki/Design_rule_for_Camera_File_system" target="_top">Design rule for Camera File system</a> and use the <a class="ulink" href="http://en.wikipedia.org/wiki/File_Allocation_Table" target="_top">FAT</a> filesystem. See <a class="xref" href="ch10.en.html#_removable_storage_device" title="10.1.10. Removable storage device">Section 10.1.10, “Removable storage device”</a>.</p></td></tr>
</table></div>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="_miscellaneous_data_conversion"></a>11.7. Miscellaneous data conversion</h2></div></div></div>
<p>There are many other programs for converting data. The following packages caught my eye when searching with the regex "<code class="literal">~Guse::converting</code>" in <span class="citerefentry"><span class="refentrytitle">aptitude</span>(8)</span> (see <a class="xref" href="ch02.en.html#_search_method_options_with_aptitude" title="2.2.6. Search method options with aptitude">Section 2.2.6, “Search method options with aptitude”</a>).</p>
<div class="table">
<a name="listofmiscellaneaconversiontools"></a><p class="title"><b>Table 11.18. List of miscellaneous data conversion tools</b></p>
<div class="table-contents"><table summary="List of miscellaneous data conversion tools" border="1">
<colgroup>
<col align="left">
<col align="left">
<col align="left">
<col align="left">
<col align="left">
</colgroup>
<thead><tr>
<th align="left">
package
</th>
<th align="left">
popcon
</th>
<th align="left">
size
</th>
<th align="left">
keyword
</th>
<th align="left">
description
</th>
</tr></thead>
<tbody>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/alien" target="_top">
<code class="literal">alien</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=alien" target="_top">http://qa.debian.org/popcon.php?package=alien</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/a/alien.html" target="_top">209</a></td>
<td align="left">
rpm/tgz→deb
</td>
<td align="left">
converter of foreign packages into Debian packages
</td>
</tr>
<tr>
<td align="left"><a class="ulink" href="http://packages.debian.org/sid/freepwing" target="_top">
<code class="literal">freepwing</code>
</a></td>
<td align="left"><a class="ulink" href="http://qa.debian.org/popcon.php?package=freepwing" target="_top">http://qa.debian.org/popcon.php?package=freepwing</a></td>
<td align="left"><a class="ulink" href="http://packages.qa.debian.org/f/freepwing.html" target="_top">568</a></td>
<td align="left">
EB→EPWING
</td>
<td align="left">
converter from "Electronic Book" (popular in Japan) to a single <a class="ulink" href="http://ja.wikipedia.org/wiki/JIS_X_4081" target="_top">JIS X 4081</a> format (a subset of the <a class="ulink" href="http://ja.wikipedia.org/wiki/EPWING" target="_top">EPWING</a> V1)
</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><p>You can also extract data from an RPM package with the following.</p>
<pre class="screen">$ rpm2cpio file.src.rpm | cpio --extract</pre>
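<p>The pipeline above unpacks into the current directory. As a hedged sketch, it can be wrapped so that the files land in a directory of their own. The function name <code class="literal">extract_rpm</code> is hypothetical, and the sketch assumes that the <code class="literal">rpm2cpio</code> command and GNU <code class="literal">cpio</code> (for the <code class="literal">--make-directories</code> option) are installed.</p>
<pre class="screen"># Hypothetical helper: unpack an RPM file into a directory of its own.
# Assumes the "rpm2cpio" command and GNU cpio.
extract_rpm () {
  rpm=$1
  dest=$2
  if [ ! -f "$rpm" ]; then
    echo "no such file: $rpm"
    return 1
  fi
  mkdir -p "$dest" || return 1
  # Only the cpio side changes directory, so a relative RPM path still works.
  rpm2cpio "$rpm" | ( cd "$dest" || exit 1; cpio --extract --make-directories )
}</pre>
<p>For example, <code class="literal">extract_rpm file.src.rpm unpacked</code> would place the package contents under <code class="literal">unpacked/</code>.</p>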
</div>
</div>
</body>
</html>