<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
<TITLE>The Unicode HOWTO: Specific applications</TITLE>
<LINK HREF="Unicode-HOWTO-5.html" REL=next>
<LINK HREF="Unicode-HOWTO-3.html" REL=previous>
<LINK HREF="Unicode-HOWTO.html#toc4" REL=contents>
</HEAD>
<BODY>
<A HREF="Unicode-HOWTO-5.html">Next</A>
<A HREF="Unicode-HOWTO-3.html">Previous</A>
<A HREF="Unicode-HOWTO.html#toc4">Contents</A>
<HR>
<H2><A NAME="s4">4. Specific applications</A></H2>
<P>
<P>
<H2><A NAME="ss4.1">4.1 Shells</A>
</H2>
<P>
<P>
<H3>bash</H3>
<P>
<P>By default, GNU bash assumes that every character is one byte long and one
column wide. A patch for bash 2.04, by Marcin 'Qrczak' Kowalczyk and
Ricardas Cepas, teaches bash about multibyte characters in UTF-8 encoding.
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/bash-2.04-diff">bash-2.04-diff</A><P>Double-width characters, combining characters and bidi are not supported by
this patch. It seems a complete redesign of the readline redisplay engine is
needed.
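<P>To see why the "one byte, one column" assumption breaks down, the following Python sketch compares the byte count, the character count and an approximate column count for a few strings. The helper name <CODE>display_width</CODE> and the sample strings are illustrative assumptions, not part of bash or of the patch; the width rule (two columns for East Asian wide characters, zero for combining marks) is a simplification.

```python
import unicodedata

def display_width(text):
    """Approximate number of terminal columns needed for text."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks overlay the previous character
        # East Asian Wide (W) and Fullwidth (F) characters take two columns
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

samples = {
    "ascii": "abc",
    "latin": "\u00e9t\u00e9",        # precomposed accented letters
    "combining": "e\u0301",          # "e" followed by COMBINING ACUTE ACCENT
    "cjk": "\u65e5\u672c\u8a9e",     # three double-width characters
}

for name, text in samples.items():
    nbytes = len(text.encode("utf-8"))
    print(name, "bytes:", nbytes, "characters:", len(text),
          "columns:", display_width(text))
```

<P>Only for the pure ASCII sample do the three numbers agree, which is the case an unpatched line editor handles correctly.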
<P>
<H2><A NAME="ss4.2">4.2 Networking</A>
</H2>
<P>
<P>
<P>
<H3>telnet</H3>
<P>
<P>In some installations, telnet is not 8-bit clean by default.
In order to be able to send Unicode keystrokes to the remote host, you need to
set telnet into "outbinary" mode.
There are two ways to do this:
<BLOCKQUOTE><CODE>
<PRE>
$ telnet -L &lt;host>
</PRE>
</CODE></BLOCKQUOTE>
and
<BLOCKQUOTE><CODE>
<PRE>
$ telnet
telnet> set outbinary
telnet> open &lt;host>
</PRE>
</CODE></BLOCKQUOTE>
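<P>What "not 8-bit clean" means for UTF-8 can be illustrated with a short Python sketch. It simulates a connection that clears the top bit of every byte; the function name <CODE>strip_high_bit</CODE> is a hypothetical helper, and real telnet implementations may mangle data in other ways.

```python
def strip_high_bit(data):
    """Simulate a 7-bit channel by clearing the top bit of every byte."""
    return bytes(b % 128 for b in data)

original = "caf\u00e9".encode("utf-8")   # the last character needs two bytes
clean = original                         # what an 8-bit clean connection delivers
mangled = strip_high_bit(original)       # what a 7-bit connection delivers

print(clean.decode("utf-8"))
print(mangled.decode("ascii"))           # no longer the original text
```

<P>ASCII characters survive unchanged, but every multibyte UTF-8 sequence is turned into unrelated ASCII characters, which is why the "outbinary" setting matters.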
<P>
<P>
<H3>kermit</H3>
<P>
<P>The communications program C-Kermit
<A HREF="http://www.columbia.edu/kermit/ckermit.html">http://www.columbia.edu/kermit/ckermit.html</A>,
(an interactive tool for connection setup, telnet, file transfer,
with support for TCP/IP and serial lines),
in versions 7.0 or newer, understands the file and transfer encodings
UTF-8 and UCS-2, and understands the terminal encoding UTF-8, and converts
between these encodings and many others. Documentation of these features
can be found in
<A HREF="http://www.columbia.edu/kermit/ckermit2.html#x6.6">http://www.columbia.edu/kermit/ckermit2.html#x6.6</A>.
<P>
<H2><A NAME="ss4.3">4.3 Browsers</A>
</H2>
<P>
<P>
<H3>Netscape</H3>
<P>
<P>Netscape 4.05 or newer can display HTML documents in UTF-8 encoding. All a
document needs is the following line between the
&lt;head&gt; and &lt;/head&gt; tags:
<BLOCKQUOTE><CODE>
<PRE>
&lt;meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</PRE>
</CODE></BLOCKQUOTE>
<P>Netscape 4.05 or newer can also display HTML and text files in UCS-2
encoding with byte-order mark.
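<P>The role of the byte-order mark can be sketched in Python as follows. This is only an illustration of the idea, not Netscape's implementation; the function name <CODE>decode_with_bom</CODE> is an assumption, and Python's UTF-16 codecs are used as a stand-in for UCS-2 (the two agree for characters in the Basic Multilingual Plane).

```python
import codecs

def decode_with_bom(data):
    """Decode UTF-16/UCS-2 data according to its byte-order mark.

    Returns None when no UTF-16 byte-order mark is present.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    return None

text = "\u03a9 Unicode"     # GREEK CAPITAL LETTER OMEGA, then ASCII
big_endian = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
little_endian = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(decode_with_bom(big_endian) == decode_with_bom(little_endian))
```

<P>Without the mark, a reader has no reliable way to tell the two byte orders apart, which is why the mark is required here.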
<P>
<A HREF="http://www.netscape.com/computing/download/">http://www.netscape.com/computing/download/</A><P>
<H3>Mozilla</H3>
<P>
<P>Mozilla milestone M16 has much better internationalization than Netscape 4.
It can display HTML documents in UTF-8 encoding with support for more
languages. Alas, there is a cosmetic problem with CJK fonts: some glyphs
can be bigger than the line's height, thus overlapping the previous or next
line.
<P>
<A HREF="http://www.mozilla.org/">http://www.mozilla.org/</A><P>
<H3>Amaya</H3>
<P>
<P>Amaya 4.2.1
(
<A HREF="http://www.w3.org/Amaya/">http://www.w3.org/Amaya/</A>,
<A HREF="http://www.w3.org/Amaya/User/SourceDist">http://www.w3.org/Amaya/User/SourceDist</A>)
now has limited handling of UTF-8 encoded HTML pages. It
recognizes the encoding, but it displays only ISO-8859-1 and symbol
characters; it only ever accesses the fonts
<BLOCKQUOTE><CODE>
<PRE>
-adobe-times-*-iso8859-1
-adobe-helvetica-*-iso8859-1
-adobe-new century schoolbook-*-iso8859-1
-adobe-courier-*-iso8859-1
-adobe-symbol-*-adobe-fontspecific
</PRE>
</CODE></BLOCKQUOTE>
<P>Amaya is in fact an HTML editor, not only a browser. Amaya's strengths among
the browsers are its speed, given enough memory, and its rendering
of mathematical formulas (MathML support).
<P>
<H3>lynx</H3>
<P>
<P>lynx-2.8 has an options screen (key 'O') which lets you set the display
character set. When running in an xterm or Linux console in UTF-8 mode,
set this to "UNICODE UTF-8". Note that for this setting to take effect
in the current browser session, you have to confirm on the "Accept Changes"
field, and for this setting to take effect in future browser sessions, you
have to enable the "Save options to disk" field and then confirm it on
the "Accept Changes" field.
<P>Now, again, all a document needs is the following line between the
&lt;head&gt; and &lt;/head&gt; tags:
<BLOCKQUOTE><CODE>
<PRE>
&lt;meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</PRE>
</CODE></BLOCKQUOTE>
<P>When you are viewing text files in UTF-8 encoding, you also need to
pass the command-line option "-assume_local_charset=UTF-8" (affects only
file:/... URLs) or "-assume_charset=UTF-8" (affects all URLs).
In lynx-2.8.2 you can alternatively, in the options screen (key 'O'),
change the assumed document character set to "utf-8".
<P>There is also an option in the options screen, to set the "preferred document
character set". But it has no effect, at least with file:/... URLs
and with http://... URLs served by apache-1.3.0.
<P>There is a spacing and line-breaking problem, however. (Look at the
Russian section of x-utf8.html, or at utf-8-demo.txt.)
<P>Also, in lynx-2.8.2, configured with --enable-prettysrc, the nice colour
scheme does not work correctly any more when the display character set
has been set to "UNICODE UTF-8". This is fixed by a simple patch
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/lynx282.diff">lynx282.diff</A>.
<P>The Lynx developers say: "For any serious use of UTF-8 screen output with
lynx, compiling with slang lib and -DSLANG_MBCS_HACK is still recommended."
<P>Latest stable release:
<A HREF="ftp://ftp.gnu.org/pub/gnu/lynx/lynx-2.8.2.tar.gz">ftp://ftp.gnu.org/pub/gnu/lynx/lynx-2.8.2.tar.gz</A><P>
<A HREF="http://lynx.isc.org/">http://lynx.isc.org/</A><P>General home page:
<A HREF="http://lynx.browser.org/">http://lynx.browser.org/</A><P>
<A HREF="http://www.slcc.edu/lynx/">http://www.slcc.edu/lynx/</A><P>Newer development snapshots:
<A HREF="http://lynx.isc.org/current/">http://lynx.isc.org/current/</A>,
<A HREF="ftp://lynx.isc.org/current/">ftp://lynx.isc.org/current/</A><P>
<H3>w3m</H3>
<P>
<P>w3m by Akinori Ito
<A HREF="http://ei5nazha.yz.yamagata-u.ac.jp/~aito/w3m/eng/">http://ei5nazha.yz.yamagata-u.ac.jp/~aito/w3m/eng/</A>
is a text mode browser for HTML pages and plain-text files.
Its layout of HTML tables, enumerations etc. is much prettier than that of lynx.
w3m can also be used as a high quality HTML to plain text converter.
<P>w3m 0.1.10 has command line options for the three major Japanese encodings, but
can also be used for UTF-8 encoded files. Without command line options,
you often have to press Ctrl-L to refresh the display, and line breaking
in Cyrillic and CJK paragraphs is not good.
<P>To fix this, Hironori Sakamoto has a patch
<A HREF="http://www2u.biglobe.ne.jp/~hsaka/w3m/">http://www2u.biglobe.ne.jp/~hsaka/w3m/</A>
which adds UTF-8 as display encoding.
<P>
<H3>Test pages</H3>
<P>
<P>Some test pages for browsers can be found at the pages of Alan Wood
<A HREF="http://www.hclrss.demon.co.uk/unicode/#links">http://www.hclrss.demon.co.uk/unicode/#links</A>
and James Kass
<A HREF="http://home.att.net/~jameskass/">http://home.att.net/~jameskass/</A>.
<P>
<H2><A NAME="ss4.4">4.4 Editors</A>
</H2>
<P>
<P>
<H3>yudit</H3>
<P>
<P>yudit by G&aacute;sp&aacute;r Sinai
<A HREF="http://www.yudit.org/">http://www.yudit.org/</A>
is a first-class Unicode text editor for the X Window System.
It supports simultaneous processing of many languages, input methods,
and conversions for local character standards.
It has facilities for entering text in all languages with only
an English keyboard, using keyboard configuration maps.
<P>
<H3>yudit-1.5</H3>
<P>
<P>It can be compiled in three versions: Xlib GUI, KDE GUI, or Motif GUI.
<P>Customization is very easy. Typically you will first customize your font.
From the font menu I chose "Unicode". Then, since the command
"xlsfonts '*-*-iso10646-1'" still showed some ambiguity, I chose a font
size of 13 (to match Markus Kuhn's 13-pixel fixed font).
<P>Next, you will customize your input method. The input methods "Straight",
"Unicode" and "SGML" are most remarkable. For details about the other
built-in input methods, look in /usr/local/share/yudit/data/.
<P>To change the default for the next session, edit your $HOME/.yuditrc
file.
<P>The general editor functionality is limited to editing, cut&amp;paste
and search&amp;replace. No undo.
<P>
<H3>yudit-2.1</H3>
<P>
<P>This version is less easy to learn, because it comes with a home-brewed
GUI and no easily accessible help. But it has an undo functionality and
should therefore be more usable than version 1.5.
<P>
<H3>Fonts for yudit</H3>
<P>
<P>yudit can display text using a TrueType font; see section "TrueType fonts"
above. The Bitstream Cyberbit gives good results. For yudit to find the
font, symlink it to <CODE>/usr/local/share/yudit/data/cyberbit.ttf</CODE>.
<P>
<H3>vim</H3>
<P>
<P>vim (as of version 6.0r) has good support for UTF-8: when started in a
UTF-8 locale, it assumes UTF-8 encoding for the console and the text files
being edited. It also supports double-width (CJK) characters and
combining characters, and therefore fits perfectly into a UTF-8 enabled
xterm.
<P>Installation: Download from
<A HREF="http://www.vim.org/">http://www.vim.org/</A>.
After unpacking the four parts, call <CODE>./configure</CODE> with
<CODE>--with-features=big</CODE> <CODE>--enable-multibyte</CODE> arguments
(or edit src/Makefile to include the <CODE>--with-features=big</CODE> and
<CODE>--enable-multibyte</CODE> options). This will turn on the feature
FEAT_MBYTE. Then do "make" and "make install".
<P>vim can be used to edit files in other encodings. For example, to edit
a BIG5 encoded file: <CODE>:e ++cc=BIG5 filename</CODE>. All encoding names
supported by iconv are accepted. Plus: vim automatically distinguishes
UTF-8 and ISO-8859-1 files without needing any command line option.
<P>
<H3>cooledit</H3>
<P>
<P>cooledit by Paul Sheer
<A HREF="http://www.cooledit.org/">http://www.cooledit.org/</A>
is a good text editor for the X Window System. Since version 3.15, it has
support for Unicode, including Bidi for Hebrew (but not Arabic).
<P>A build error message about a missing "vga_setpage" function is
worked around by adding "-DDO_NOT_USE_VGALIB" to the CFLAGS.
<P>To view UTF-8 files in a UTF-8 locale you have to modify a setting in
the "Options -> Switches" panel: Enable the checkbox "Display characters
outside locale". I also found it necessary to disable "Spellcheck as you
type".
<P>For viewing texts with both European and CJK characters, cooledit needs a
font which contains both, for example the GNU unifont (see section
"X11 Unicode fonts"): Start once
<BLOCKQUOTE><CODE>
<PRE>
$ cooledit -fn -gnu-unifont-medium-r-normal--16-160-75-75-c-80-iso10646-1
</PRE>
</CODE></BLOCKQUOTE>
cooledit will then use this font in all future invocations.
<P>Unfortunately, the only characters that can be entered through the keyboard
are ISO-8859-1 characters and, through a cooledit specific compose mechanism,
ISO-8859-2 characters. Inputting arbitrary Unicode characters in cooledit is
possible, but a bit tedious.
<P>
<H3>emacs</H3>
<P>
<P>First of all, you should read the section "International Character Set Support"
(node "International") in the Emacs manual. In particular, note that you need
to start Emacs using the command
<BLOCKQUOTE><CODE>
<PRE>
$ emacs -fn fontset-standard
</PRE>
</CODE></BLOCKQUOTE>
so that it will use a font set comprising a lot of international characters.
<P>In the short term, there are two packages for using UTF-8 in Emacs. Neither
of them requires recompiling Emacs.
<UL>
<LI>The emacs-utf package
<A HREF="http://www.cs.ust.hk/faculty/otfried/Mule/">http://www.cs.ust.hk/faculty/otfried/Mule/</A>
by Otfried Cheong provides a "unicode-utf8" encoding to Emacs.</LI>
<LI>The oc-unicode package
<A HREF="http://www.cs.ust.hk/faculty/otfried/Mule/">http://www.cs.ust.hk/faculty/otfried/Mule/</A>,
by Otfried Cheong, an extension of the Mule-UCS package
<A HREF="ftp://etlport.etl.go.jp/pub/mule/Mule-UCS/Mule-UCS-0.70.tar.gz">ftp://etlport.etl.go.jp/pub/mule/Mule-UCS/Mule-UCS-0.70.tar.gz</A>
(mirrored at
<A HREF="http://riksun.riken.go.jp/archives/misc/mule/Mule-UCS/Mule-UCS-0.70.tar.gz">http://riksun.riken.go.jp/archives/misc/mule/Mule-UCS/Mule-UCS-0.70.tar.gz</A>
and
<A HREF="ftp://ftp.m17n.org/pub/mule/Mule-UCS/Mule-UCS-0.70.tar.gz">ftp://ftp.m17n.org/pub/mule/Mule-UCS/Mule-UCS-0.70.tar.gz</A>)
by Hisashi Miyashita, provides a "utf-8" encoding to Emacs.</LI>
</UL>
<P>You can use either of these packages, or both together. The advantages
of the emacs-utf "unicode-utf8" encoding are: it loads faster, and it deals
better with combining characters (important for Thai).
The advantage of the Mule-UCS / oc-unicode "utf-8" encoding is: it can apply
to a process buffer (such as M-x shell), not only to loading and saving of
files; and it respects the widths of characters better (important for
Ethiopian). However, it is less reliable: After heavy editing of a file, I
have seen some Unicode characters replaced with U+FFFD after the file was
saved. (But maybe those were bugs in Emacs 20.5 and 20.6 that are fixed in
Emacs 20.7.)
<P>To install the emacs-utf package, compile the program "utf2mule" and install
it somewhere in your $PATH; also install unicode.el, muleuni-1.el, and
unicode-char.el somewhere. Then add the lines
<BLOCKQUOTE><CODE>
<PRE>
(setq load-path (cons "/home/user/somewhere/emacs" load-path))
(if (not (string-match "XEmacs" emacs-version))
    (progn
      (require 'unicode)
      ;(setq unicode-data-path "..../UnicodeData-3.0.0.txt")
      (if (eq window-system 'x)
          (progn
            (setq fontset12
                  (create-fontset-from-fontset-spec
                   "-misc-fixed-medium-r-normal-*-12-*-*-*-*-*-fontset-standard"))
            (setq fontset13
                  (create-fontset-from-fontset-spec
                   "-misc-fixed-medium-r-normal-*-13-*-*-*-*-*-fontset-standard"))
            (setq fontset14
                  (create-fontset-from-fontset-spec
                   "-misc-fixed-medium-r-normal-*-14-*-*-*-*-*-fontset-standard"))
            (setq fontset15
                  (create-fontset-from-fontset-spec
                   "-misc-fixed-medium-r-normal-*-15-*-*-*-*-*-fontset-standard"))
            (setq fontset16
                  (create-fontset-from-fontset-spec
                   "-misc-fixed-medium-r-normal-*-16-*-*-*-*-*-fontset-standard"))
            (setq fontset18
                  (create-fontset-from-fontset-spec
                   "-misc-fixed-medium-r-normal-*-18-*-*-*-*-*-fontset-standard"))
            ; (set-default-font fontset15)
            ))))
</PRE>
</CODE></BLOCKQUOTE>
to your $HOME/.emacs file. To activate any of the font sets, use the Mule
menu item "Set Font/FontSet" or Shift-down-mouse-1. The Unicode coverage
of the font sets at different sizes may depend on the installed fonts;
here are screen shots at various sizes of UTF-8-demo.txt (
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-UTF-8-demo-12.gif">12</A>,
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-UTF-8-demo-13.gif">13</A>,
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-UTF-8-demo-14.gif">14</A>,
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-UTF-8-demo-15.gif">15</A>,
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-UTF-8-demo-16.gif">16</A>,
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-UTF-8-demo-18.gif">18</A>)
and of the Mule script examples (
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-HELLO-12.gif">12</A>,
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-HELLO-13.gif">13</A>,
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-HELLO-14.gif">14</A>,
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-HELLO-15.gif">15</A>,
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-HELLO-16.gif">16</A>,
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-HELLO-18.gif">18</A>).
To designate a font set as the initial font set for the first frame at startup,
uncomment the <CODE>set-default-font</CODE> line in the code snippet above.
<P>To install the oc-unicode package, execute the command
<BLOCKQUOTE><CODE>
<PRE>
$ emacs -batch -l oc-comp.el
</PRE>
</CODE></BLOCKQUOTE>
and install the resulting file <CODE>un-define.elc</CODE>, as well as
<CODE>oc-unicode.el</CODE>, <CODE>oc-charsets.el</CODE>, <CODE>oc-tools.el</CODE>,
somewhere. Then add the lines
<BLOCKQUOTE><CODE>
<PRE>
(setq load-path (cons "/home/user/somewhere/emacs" load-path))
(if (not (string-match "XEmacs" emacs-version))
    (progn
      (require 'oc-unicode)
      ;(setq unicode-data-path "..../UnicodeData-3.0.0.txt")
      (if (eq window-system 'x)
          (progn
            (setq fontset12
                  (oc-create-fontset
                   "-misc-fixed-medium-r-normal-*-12-*-*-*-*-*-fontset-standard"
                   "-misc-fixed-medium-r-normal-ja-12-*-iso10646-*"))
            (setq fontset13
                  (oc-create-fontset
                   "-misc-fixed-medium-r-normal-*-13-*-*-*-*-*-fontset-standard"
                   "-misc-fixed-medium-r-normal-ja-13-*-iso10646-*"))
            (setq fontset14
                  (oc-create-fontset
                   "-misc-fixed-medium-r-normal-*-14-*-*-*-*-*-fontset-standard"
                   "-misc-fixed-medium-r-normal-ja-14-*-iso10646-*"))
            (setq fontset15
                  (oc-create-fontset
                   "-misc-fixed-medium-r-normal-*-15-*-*-*-*-*-fontset-standard"
                   "-misc-fixed-medium-r-normal-ja-15-*-iso10646-*"))
            (setq fontset16
                  (oc-create-fontset
                   "-misc-fixed-medium-r-normal-*-16-*-*-*-*-*-fontset-standard"
                   "-misc-fixed-medium-r-normal-ja-16-*-iso10646-*"))
            (setq fontset18
                  (oc-create-fontset
                   "-misc-fixed-medium-r-normal-*-18-*-*-*-*-*-fontset-standard"
                   "-misc-fixed-medium-r-normal-ja-18-*-iso10646-*"))
            ; (set-default-font fontset15)
            ))))
</PRE>
</CODE></BLOCKQUOTE>
to your $HOME/.emacs file. You can choose your appropriate font set as with
the emacs-utf package.
<P>To open a UTF-8 encoded file, type
<BLOCKQUOTE><CODE>
<PRE>
M-x universal-coding-system-argument unicode-utf8 RET
M-x find-file filename RET
</PRE>
</CODE></BLOCKQUOTE>
or
<BLOCKQUOTE><CODE>
<PRE>
C-x RET c unicode-utf8 RET
C-x C-f filename RET
</PRE>
</CODE></BLOCKQUOTE>
(or utf-8 instead of unicode-utf8, if you prefer oc-unicode/Mule-UCS).
<P>In order to start a shell buffer with UTF-8 I/O, you will type
<BLOCKQUOTE><CODE>
<PRE>
M-x universal-coding-system-argument utf-8 RET
M-x shell RET
</PRE>
</CODE></BLOCKQUOTE>
(This works with oc-unicode/Mule-UCS only.)
<P>There is a newer version, Mule-UCS-0.81. Unfortunately you need to rebuild Emacs
from source in order to use it.
<P>Note that all this works with Emacs 20 in windowing mode only, not in terminal
mode. None of the mentioned packages works in Emacs 21, as of this writing.
<P>Richard Stallman plans to add integrated UTF-8 support to Emacs in the long
term, and so do the XEmacs developers.
<P>
<H3>xemacs</H3>
<P>
<P>(This section is written by Gilbert Baumann.)
<P>Here is how to teach XEmacs (20.4 configured with MULE) the UTF-8 encoding.
Unfortunately you need its sources to be able to patch it.
<P>First you need these files provided by Tomohiko Morioka:
<P>
<A HREF="http://turnbull.sk.tsukuba.ac.jp/Tools/XEmacs/xemacs-21.0-b55-emc-b55-ucs.diff">http://turnbull.sk.tsukuba.ac.jp/Tools/XEmacs/xemacs-21.0-b55-emc-b55-ucs.diff</A>
and
<A HREF="http://turnbull.sk.tsukuba.ac.jp/Tools/XEmacs/xemacs-ucs-conv-0.1.tar.gz">http://turnbull.sk.tsukuba.ac.jp/Tools/XEmacs/xemacs-ucs-conv-0.1.tar.gz</A><P>The .diff is a diff against the C sources. The tar ball is elisp code,
which provides lots of code tables to map to and from Unicode. As the
name of the diff file suggests it is against XEmacs-21; I needed to
help `patch' a bit. The most notable difference to my XEmacs-20.4
sources is that file-coding.[ch] was called mule-coding.[ch].
<P>For those unfamiliar with the XEmacs-MULE stuff (as I am) a quick
guide:
<P>What we call an encoding is called by MULE a `coding-system'. The most
important commands are:
<P>
<BLOCKQUOTE><CODE>
<PRE>
M-x set-file-coding-system
M-x set-buffer-process-coding-system [comint buffers]
</PRE>
</CODE></BLOCKQUOTE>
<P>and the variable `file-coding-system-alist', which guides `find-file'
to guess the encoding used. After stuff was running, the very first
thing I did was
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/gb-hacks.el">this</A>.
<P>This code looks into the special mode line introduced by -*- somewhere
in the first 600 bytes of the file about to be opened; if there is a
field "Encoding: xyz;" and the xyz encoding ("coding system" in Emacs speak)
exists, it chooses that. So now you could do e.g.
<P>
<BLOCKQUOTE><CODE>
<PRE>
;;; -*- Mode: Lisp; Syntax: Common-Lisp; Package: CLEX; Encoding: utf-8; -*-
</PRE>
</CODE></BLOCKQUOTE>
<P>and XEmacs goes into utf-8 mode here.
<P>After everything was running I defined \u03BB (greek lambda) as a
macro like:
<P>
<BLOCKQUOTE><CODE>
<PRE>
(defmacro \u03BB (x) `(lambda .,x))
</PRE>
</CODE></BLOCKQUOTE>
<P>
<H3>nedit</H3>
<P>
<P>
<H3>xedit</H3>
<P>
<P>With XFree86-4.0.1, xedit is able to edit UTF-8 files if you set the locale
accordingly (see above), and add the line "Xedit*international: true" to
your $HOME/.Xdefaults file.
<P>
<H3>axe</H3>
<P>
<P>As of version 6.1.2, aXe supports only 8-bit locales. If you add the line
"Axe*international: true" to your $HOME/.Xdefaults file, it will simply dump
core.
<P>
<H3>pico</H3>
<P>
<P>As of pine version 4.30, pico (the editor shipped with pine) cannot be
reasonably used to view or edit UTF-8 files. In a UTF-8 enabled xterm, it has
severe redraw problems.
<P>
<H3>mined98</H3>
<P>
<P>mined98 is a small text editor by Michiel Huisjes, Achim M&uuml;ller and
Thomas Wolff.
<A HREF="http://www.inf.fu-berlin.de/~wolff/mined98.tar.gz">http://www.inf.fu-berlin.de/~wolff/mined98.tar.gz</A>
It lets you edit UTF-8 or 8-bit encoded files, in a UTF-8 or 8-bit xterm.
It also has powerful capabilities for entering Unicode characters.
<P>mined lets you edit both 8-bit encoded and UTF-8 encoded files. By default
it uses an autodetection heuristic. If you don't want to rely on heuristics,
pass the command-line option <CODE>-u</CODE> when editing a UTF-8 file, or
<CODE>+u</CODE> when editing an 8-bit encoded file. You can change the
interpretation at any time from within the editor: It displays the encoding
("L:h" for 8-bit, "U:h" for UTF-8) in the menu line. Click on the first
of these characters to change it.
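<P>A common way to build such an autodetection heuristic is to try a strict UTF-8 decode and fall back to an 8-bit interpretation when it fails. The Python sketch below shows that idea only; it is an assumption about how a detector can work, not a description of mined's actual algorithm, and the function name <CODE>guess_encoding</CODE> is hypothetical.

```python
def guess_encoding(data):
    """Guess whether data is UTF-8 or an 8-bit encoding."""
    try:
        data.decode("utf-8")   # strict: raises on any malformed sequence
    except UnicodeDecodeError:
        return "8-bit"
    return "UTF-8"

utf8_bytes = "na\u00efve".encode("utf-8")
latin1_bytes = "na\u00efve".encode("iso-8859-1")

print(guess_encoding(utf8_bytes))
print(guess_encoding(latin1_bytes))
```

<P>The heuristic works because 8-bit text with accented letters is rarely well-formed UTF-8. Pure ASCII input is valid in both interpretations, so it is reported as UTF-8 here, and short or unusual 8-bit inputs can still be misclassified, which is why an explicit override remains useful.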
<P>mined knows about double-width and combining characters and displays them
correctly. It also has a special display mode for combining characters.
<P>mined also has a scrollbar and very nice pull-down menus. Alas, the "Home",
"End", "Delete" keys do not work.
<P>
<H3>qemacs</H3>
<P>
<P>qemacs 0.2 by Fabrice Bellard
<A HREF="http://www-stud.enst.fr/~bellard/qemacs/">http://www-stud.enst.fr/~bellard/qemacs/</A>
is a small text editor with Emacs keybindings. It runs in a UTF-8 console or xterm, and can edit
both 8-bit encoded and UTF-8 encoded files. It still has a few rough edges,
but further development is underway.
<P>
<H2><A NAME="ss4.5">4.5 Mailers</A>
</H2>
<P>
<P>MIME: RFC 2279 defines UTF-8 as a MIME charset, which can be transported
under the 8bit, quoted-printable and base64 encodings. The older MIME
UTF-7 proposal (RFC 2152) is considered to be deprecated and should not
be used any further.
<P>Mail clients released after January 1, 1999, should be capable of sending and
displaying UTF-8 encoded mails, otherwise they are considered deficient.
But these mails have to carry the MIME labels
<BLOCKQUOTE><CODE>
<PRE>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
</PRE>
</CODE></BLOCKQUOTE>
Simply piping a UTF-8 file into "mail" without caring about the MIME labels
will not work.
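<P>As an illustration of producing these MIME labels programmatically, here is a minimal Python sketch using the standard library's <CODE>email.message.EmailMessage</CODE>. The addresses, subject and body are placeholders chosen for this example, and the sketch only builds the message; it does not send it.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.org"       # placeholder address
msg["To"] = "recipient@example.org"      # placeholder address
msg["Subject"] = "UTF-8 test"

body = "Gr\u00fc\u00dfe aus Z\u00fcrich\n"   # text with non-ASCII characters
# Request the UTF-8 charset and the 8bit transfer encoding explicitly.
msg.set_content(body, charset="utf-8", cte="8bit")

print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
```

<P>Whether the 8bit transfer encoding survives end to end depends on the mail servers involved; quoted-printable or base64 are the conservative alternatives mentioned above.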
<P>Mail client implementors should take a look at
<A HREF="http://www.imc.org/imc-intl/">http://www.imc.org/imc-intl/</A>
and
<A HREF="http://www.imc.org/mail-i18n.html">http://www.imc.org/mail-i18n.html</A>.
<P>Now about the individual mail clients (or "mail user agents"):
<P>
<H3>pine</H3>
<P>
<P>The situation for an unpatched pine version 4.30 is as follows.
<P>Pine does not do character set conversions. But it allows you to view
UTF-8 mails in a UTF-8 text window (Linux console or xterm).
<P>Normally, Pine will warn about different character sets each time you view
a UTF-8 encoded mail. To get rid of this warning, choose S (setup), then
C (config), then change the value of "character-set" to UTF-8. This option
will not do anything, except to reduce the warnings, as Pine has no built-in
knowledge of UTF-8.
<P>Also note that Pine's notion of Unicode characters is pretty limited: It
will display Latin and Greek characters, but not other kinds of Unicode
characters.
<P>A patch by Robert Brady
<A HREF="mailto:robert@suse.co.uk">&lt;robert@suse.co.uk&gt;</A>
<A HREF="http://www.ents.susu.soton.ac.uk/~robert/pine-utf8-0.1.diff">http://www.ents.susu.soton.ac.uk/~robert/pine-utf8-0.1.diff</A>
adds UTF-8 support to Pine. With this patch, it decodes and prints headers
and bodies properly. The patch depends on the GNOME libunicode
<A HREF="http://cvs.gnome.org/lxr/source/libunicode/">http://cvs.gnome.org/lxr/source/libunicode/</A>.
<P>However, alignment remains broken in many places; replying to a mail does
not cause the character set to be converted as appropriate; and the editor,
pico, cannot deal with multibyte characters.
<P>
<H3>kmail</H3>
<P>
<P>kmail (as of KDE 1.0) does not support UTF-8 mails at all.
<P>
<H3>Netscape Communicator</H3>
<P>
<P>Netscape Communicator's Messenger can send and display mails in UTF-8
encoding, but it needs a little bit of manual user intervention.
<P>To send a UTF-8 encoded mail: After opening the "Compose" window, but before
starting to compose the message, select from the menu
"View -> Character Set -> Unicode (UTF-8)". Then compose the message and
send it.
<P>When you receive a UTF-8 encoded mail, Netscape unfortunately does not
display it in UTF-8 right away, and does not even give a visual clue that
the mail was encoded in UTF-8. You have to manually select from the menu
"View -> Character Set -> Unicode (UTF-8)".
<P>For displaying UTF-8 mails, Netscape uses different fonts. You can adjust
your font settings in the "Edit -> Preferences -> Fonts" dialog; choose
the "Unicode" font category.
<P>
<H3>emacs (rmail, vm)</H3>
<P>
<P>
<H3>mutt</H3>
<P>
<P>mutt-1.2.x, as available from
<A HREF="http://www.mutt.org/">http://www.mutt.org/</A>,
has only rudimentary support for UTF-8: it can convert
from UTF-8 into an 8-bit display charset. The mutt-1.3.x
development branch also supports UTF-8 as the display charset,
so you can run Mutt in a UTF-8 xterm, and has thorough support
for MIME and charset conversion (relying on iconv).
<P>
<H3>exmh</H3>
<P>
<P>exmh 2.1.2 with Tk 8.4a1 can recognize and correctly display UTF-8 mails
(without CJK characters) if you add the following lines to your
<CODE>$HOME/.Xdefaults</CODE> file.
<BLOCKQUOTE><CODE>
<PRE>
!
! Exmh
!
exmh.mimeUCharsets: utf-8
exmh.mime_utf-8_registry: iso10646
exmh.mime_utf-8_encoding: 1
exmh.mime_utf-8_plain_families: fixed
exmh.mime_utf-8_fixed_families: fixed
exmh.mime_utf-8_proportional_families: fixed
exmh.mime_utf-8_title_families: fixed
</PRE>
</CODE></BLOCKQUOTE>
<P>
<H2><A NAME="ss4.6">4.6 Text processing</A>
</H2>
<P>
<P>
<H3>groff</H3>
<P>
<P>groff 1.16.1, the GNU implementation of the traditional Unix text processing
system troff/nroff, can output UTF-8 formatted text. Simply use
`<CODE>groff -Tutf8</CODE>' instead of `<CODE>groff -Tlatin1</CODE>' or
`<CODE>groff -Tascii</CODE>'.
<P>
<H3>TeX</H3>
<P>
<P>The teTeX 0.9 (and newer) distribution contains a Unicode adaptation of TeX,
called Omega
(
<A HREF="http://www.gutenberg.eu.org/omega/">http://www.gutenberg.eu.org/omega/</A>,
<A HREF="ftp://ftp.ens.fr/pub/tex/yannis/omega">ftp://ftp.ens.fr/pub/tex/yannis/omega</A>).
Together with the unicode.tex file contained in
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/utf8-tex-0.1.tar.gz">utf8-tex-0.1.tar.gz</A>
it enables you to use UTF-8 encoded sources as input for TeX. About a thousand
Unicode characters are currently supported.
<P>All that changes is that you run `omega' (instead of `tex') or `lambda'
(instead of `latex'), and insert the following lines at the head of
your source input.
<BLOCKQUOTE><CODE>
<PRE>
\ocp\TexUTF=inutf8
\InputTranslation currentfile \TexUTF
</PRE>
</CODE></BLOCKQUOTE>
<BLOCKQUOTE><CODE>
<PRE>
\input unicode
</PRE>
</CODE></BLOCKQUOTE>
<P>Other possibly related links:
<A HREF="http://www.dante.de/projekte/nts/NTS-FAQ.html">http://www.dante.de/projekte/nts/NTS-FAQ.html</A>,
<A HREF="ftp://ftp.dante.de/pub/tex/language/chinese/CJK/">ftp://ftp.dante.de/pub/tex/language/chinese/CJK/</A>.
<P>
<H2><A NAME="ss4.7">4.7 Databases</A>
</H2>
<P>
<P>
<H3>PostgreSQL</H3>
<P>
<P>PostgreSQL 6.4 or newer can be built with the configuration option
<CODE>--with-mb=UNICODE</CODE>.
<P>
<H3>Interbase</H3>
<P>
<P>Borland/Inprise's Interbase 6.0 can store string fields in UTF-8 format
if the option "CHARACTER SET UNICODE_FSS" is given.
<P>
<H2><A NAME="ss4.8">4.8 Other text-mode applications</A>
</H2>
<P>
<P>
<H3>less</H3>
<P>
<P>With
<A HREF="http://www.flash.net/~marknu/less/less-358.tar.gz">http://www.flash.net/~marknu/less/less-358.tar.gz</A>
you can browse UTF-8 encoded text files in a UTF-8 xterm or console.
Make sure that the environment variable LESSCHARSET is not set (or is set
to utf-8). If you also have a LESSKEY environment variable set, also make
sure that the file it points to does not define LESSCHARSET. If necessary,
regenerate this file using the `lesskey' command, or unset the LESSKEY
environment variable.
<P>
<H3>lv</H3>
<P>
<P>lv-4.49.3 by Tomio Narita
<A HREF="http://www.ff.iij4u.or.jp/~nrt/lv/">http://www.ff.iij4u.or.jp/~nrt/lv/</A>
is a file viewer with built-in character set converters. To view UTF-8 files
in a UTF-8 console, use "lv -Au8". But it can also be used to view
files in other CJK encodings in a UTF-8 console.
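<P>What a character set converter of this kind does can be sketched in a few lines of Python: decode the bytes from the source encoding, then re-encode them as UTF-8 for the console. This illustrates the general technique only, not lv's implementation; the function name <CODE>convert_to_utf8</CODE> and the sample text are assumptions.

```python
def convert_to_utf8(data, source_encoding):
    """Re-encode bytes from source_encoding into UTF-8."""
    return data.decode(source_encoding).encode("utf-8")

text = "\u65e5\u672c\u8a9e"                 # three Japanese characters
euc_jp_bytes = text.encode("euc_jp")        # same text in EUC-JP
shift_jis_bytes = text.encode("shift_jis")  # same text in Shift_JIS

# Different byte sequences on disk, identical UTF-8 after conversion.
print(convert_to_utf8(euc_jp_bytes, "euc_jp") ==
      convert_to_utf8(shift_jis_bytes, "shift_jis"))
```

<P>The source encoding has to be known or guessed correctly; decoding with the wrong one either fails or silently yields the wrong characters.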
<P>There is a small glitch: lv turns off xterm's cursor and doesn't turn it on
again.
<P>
<H3>expand</H3>
<P>
<P>Get the GNU textutils-2.0 and apply the patch
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/textutils-2.0.diff">textutils-2.0.diff</A>,
then configure, add "#define HAVE_FGETWC 1", "#define HAVE_FPUTWC 1" to
config.h. Then rebuild.
<P>
<H3>col, colcrt, colrm, column, rev, ul</H3>
<P>
<P>Get the util-linux-2.9y package, configure it, then define ENABLE_WIDECHAR in
defines.h, change the "#if 0" to "#if 1" in lib/widechar.h. In
text-utils/Makefile, modify CFLAGS and LDFLAGS so that they include the
directories where libutf8 is installed. Then rebuild.
<P>
<H3>figlet</H3>
<P>
<P>figlet 2.2 has an option for UTF-8 input: "figlet -C utf8"
<P>
<H3>Base utilities</H3>
<P>
<P>The Li18nux list of commands and utilities that ought to be made interoperable
with UTF-8 is as follows. Useful information still needs to be added here; I just
haven't gotten around to it yet :-)
<P>As of glibc-2.2, regular expressions only work for 8-bit characters.
In a UTF-8 locale, regular expressions that contain non-ASCII characters
or that expect to match a single multibyte character with "." do not work.
This affects all commands and utilities listed below.
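<P>The effect of byte-oriented matching can be reproduced with Python's <CODE>re</CODE> module, which matches bytes when given a bytes pattern and characters when given a string pattern. This sketch only illustrates the failure mode described above; it does not use or test glibc itself, and the sample word is an arbitrary choice.

```python
import re

word = "na\u00efve"                # contains one non-ASCII character
as_bytes = word.encode("utf-8")    # that character becomes two bytes

# Byte-oriented matching: "." consumes one byte, so five dots cannot
# cover the six bytes of the UTF-8 encoded word.
byte_match = re.fullmatch(rb".....", as_bytes)

# Character-oriented matching: "." consumes one character.
char_match = re.fullmatch(r".....", word)

print("byte-oriented match:", byte_match is not None)
print("character-oriented match:", char_match is not None)
```

<P>A user who counts five letters and writes five dots gets a match only from the character-oriented engine, which is the behaviour a UTF-8 aware regular expression library needs to provide.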
<P>
<DL>
<DT><B>alias</B><DD><P>No info available yet.
<DT><B>ar</B><DD><P>No info available yet.
<DT><B>arch</B><DD><P>No info available yet.
<DT><B>arp</B><DD><P>No info available yet.
<DT><B>at</B><DD><P>As of at-3.1.8: The two uses of isalnum in at.c are invalid and should be
replaced with a use of quotearg.c or an exclude list of the (fixed) list
of shell metacharacters. The two uses of %8s in at.c and atd.c are invalid
and should become arbitrary length.
<DT><B>awk</B><DD><P>No info available yet.
<DT><B>basename</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>batch</B><DD><P>No info available yet.
<DT><B>bc</B><DD><P>No info available yet.
<DT><B>bg</B><DD><P>No info available yet.
<DT><B>bunzip2</B><DD><P>No info available yet.
<DT><B>bzip2</B><DD><P>No info available yet.
<DT><B>bzip2recover</B><DD><P>No info available yet.
<DT><B>cal</B><DD><P>No info available yet.
<DT><B>cat</B><DD><P>No info available yet.
<DT><B>cd</B><DD><P>No info available yet.
<DT><B>cflow</B><DD><P>No info available yet.
<DT><B>chgrp</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>chmod</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>chown</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>chroot</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>cksum</B><DD><P>As of textutils-2.0e: OK.
<DT><B>clear</B><DD><P>No info available yet.
<DT><B>cmp</B><DD><P>No info available yet.
<DT><B>col</B><DD><P>No info available yet.
<DT><B>comm</B><DD><P>No info available yet.
<DT><B>command</B><DD><P>No info available yet.
<DT><B>compress</B><DD><P>No info available yet.
<DT><B>cp</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>cpio</B><DD><P>No info available yet.
<DT><B>crontab</B><DD><P>No info available yet.
<DT><B>csplit</B><DD><P>No info available yet.
<DT><B>ctags</B><DD><P>No info available yet.
<DT><B>cut</B><DD><P>No info available yet.
<DT><B>date</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>dd</B><DD><P>As of fileutils-4.0u: The conv=lcase, conv=ucase options don't work correctly.
<DT><B>df</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>diff</B><DD><P>As of diffutils-2.7.2: The --side-by-side mode doesn't compute
column width correctly.
<DT><B>diff3</B><DD><P>No info available yet.
<DT><B>dirname</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>domainname</B><DD><P>No info available yet.
<DT><B>du</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>echo</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>ed</B><DD><P>No info available yet.
<DT><B>egrep</B><DD><P>No info available yet.
<DT><B>env</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>ex</B><DD><P>No info available yet.
<DT><B>expand</B><DD><P>No info available yet.
<DT><B>expr</B><DD><P>As of sh-utils-2.0i: The operators "match", "substr", "index", "length"
don't work correctly.
<DT><B>false</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>fc</B><DD><P>No info available yet.
<DT><B>fg</B><DD><P>No info available yet.
<DT><B>fgrep</B><DD><P>No info available yet.
<DT><B>file</B><DD><P>No info available yet.
<DT><B>find</B><DD><P>As of findutils-4.1.6: The "-iregex" option does not work correctly; this
needs a fix in function find/parser.c:insert_regex.
<DT><B>fold</B><DD><P>No info available yet.
<DT><B>ftp[BSD]</B><DD><P>No info available yet.
<DT><B>fuser</B><DD><P>No info available yet.
<DT><B>gencat</B><DD><P>No info available yet.
<DT><B>getconf</B><DD><P>No info available yet.
<DT><B>getopts</B><DD><P>No info available yet.
<DT><B>gettext</B><DD><P>No info available yet.
<DT><B>grep</B><DD><P>No info available yet.
<DT><B>gunzip</B><DD><P>No info available yet.
<DT><B>gzip</B><DD><P>gzip-1.3 is UTF-8 capable, but it uses only English messages in ASCII
charset. Proper internationalization would require using gettext, calling
setlocale, and, in function check_ofname (file gzip.c), using the function
rpmatch from GNU text/sh/fileutils instead of asking for "y" or "n". The use
of strlen in gzip.c:852 is wrong; it needs to use the function mbswidth.
<DT><B>hash</B><DD><P>No info available yet.
<DT><B>head</B><DD><P>No info available yet.
<DT><B>hostname</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>iconv</B><DD><P>No info available yet.
<DT><B>id</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>ifconfig</B><DD><P>No info available yet.
<DT><B>imake</B><DD><P>No info available yet.
<DT><B>ipcrm</B><DD><P>No info available yet.
<DT><B>ipcs</B><DD><P>No info available yet.
<DT><B>jobs</B><DD><P>No info available yet.
<DT><B>join</B><DD><P>No info available yet.
<DT><B>kill</B><DD><P>No info available yet.
<DT><B>killall</B><DD><P>No info available yet.
<DT><B>ldd</B><DD><P>No info available yet.
<DT><B>less</B><DD><P>No complete info available yet.
<DT><B>lex</B><DD><P>No info available yet.
<DT><B>ln</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>locale</B><DD><P>As of glibc-2.2: OK.
<DT><B>localedef</B><DD><P>As of glibc-2.2: OK.
<DT><B>logger</B><DD><P>No info available yet.
<DT><B>logname</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>lp</B><DD><P>No info available yet.
<DT><B>lpc[BSD]</B><DD><P>No info available yet.
<DT><B>lpq[BSD]</B><DD><P>No info available yet.
<DT><B>lpr[BSD]</B><DD><P>No info available yet.
<DT><B>lprm[BSD]</B><DD><P>No info available yet.
<DT><B>lpstat(LEGACY)</B><DD><P>No info available yet.
<DT><B>ls</B><DD><P>As of fileutils-4.0y: OK.
<DT><B>m4</B><DD><P>No info available yet.
<DT><B>mailx</B><DD><P>No info available yet.
<DT><B>make</B><DD><P>No info available yet.
<DT><B>man</B><DD><P>No info available yet.
<DT><B>mesg</B><DD><P>No info available yet.
<DT><B>mkdir</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>mkfifo</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>mkfs</B><DD><P>No info available yet.
<DT><B>mkswap</B><DD><P>No info available yet.
<DT><B>more</B><DD><P>No info available yet.
<DT><B>mount</B><DD><P>No info available yet.
<DT><B>msgfmt</B><DD><P>No info available yet.
<DT><B>msgmerge</B><DD><P>No info available yet.
<DT><B>mv</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>netstat</B><DD><P>No info available yet.
<DT><B>newgrp</B><DD><P>No info available yet.
<DT><B>nice</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>nl</B><DD><P>No info available yet.
<DT><B>nohup</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>nslookup</B><DD><P>No info available yet.
<DT><B>nm</B><DD><P>No info available yet.
<DT><B>od</B><DD><P>No info available yet.
<DT><B>passwd[BSD]</B><DD><P>No info available yet.
<DT><B>paste</B><DD><P>No info available yet.
<DT><B>patch</B><DD><P>No info available yet.
<DT><B>pathchk</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>ping</B><DD><P>No info available yet.
<DT><B>pr</B><DD><P>No info available yet.
<DT><B>printf</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>ps</B><DD><P>No info available yet.
<DT><B>pwd</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>read</B><DD><P>No info available yet.
<DT><B>reboot</B><DD><P>No info available yet.
<DT><B>renice</B><DD><P>No info available yet.
<DT><B>rm</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>rmdir</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>sed</B><DD><P>No info available yet.
<DT><B>shar[BSD]</B><DD><P>No info available yet.
<DT><B>shutdown</B><DD><P>No info available yet.
<DT><B>sleep</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>sort</B><DD><P>No info available yet.
<DT><B>split</B><DD><P>No info available yet.
<DT><B>strings</B><DD><P>No info available yet.
<DT><B>strip</B><DD><P>No info available yet.
<DT><B>stty</B><DD><P>As of sh-utils-2.0.11: OK.
<DT><B>su[BSD]</B><DD><P>No info available yet.
<DT><B>sum</B><DD><P>As of textutils-2.0e: OK.
<DT><B>tail</B><DD><P>No info available yet.
<DT><B>talk</B><DD><P>No info available yet.
<DT><B>tar</B><DD><P>As of tar-1.13.17: OK, if user and group names are always ASCII.
<DT><B>tclsh</B><DD><P>No info available yet.
<DT><B>tee</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>telnet</B><DD><P>No info available yet.
<DT><B>test</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>time</B><DD><P>No info available yet.
<DT><B>touch</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>tput</B><DD><P>No info available yet.
<DT><B>tr</B><DD><P>No info available yet.
<DT><B>true</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>tsort</B><DD><P>No info available yet.
<DT><B>tty</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>type</B><DD><P>No info available yet.
<DT><B>ulimit</B><DD><P>No info available yet.
<DT><B>umask</B><DD><P>No info available yet.
<DT><B>umount</B><DD><P>No info available yet.
<DT><B>unalias</B><DD><P>No info available yet.
<DT><B>uname</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>uncompress</B><DD><P>No info available yet.
<DT><B>unexpand</B><DD><P>No info available yet.
<DT><B>uniq</B><DD><P>No info available yet.
<DT><B>uudecode</B><DD><P>No info available yet.
<DT><B>uuencode</B><DD><P>No info available yet.
<DT><B>vi</B><DD><P>No info available yet.
<DT><B>wait</B><DD><P>No info available yet.
<DT><B>wc</B><DD><P>As of textutils-2.0.8: OK.
<DT><B>who</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>wish</B><DD><P>No info available yet.
<DT><B>write</B><DD><P>No info available yet.
<DT><B>xargs</B><DD><P>As of findutils-4.1.5: The program uses strstr; a patch has been submitted
to the maintainer.
<DT><B>xgettext</B><DD><P>No info available yet.
<DT><B>yacc</B><DD><P>No info available yet.
<DT><B>zcat</B><DD><P>No info available yet.
</DL>
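<P>Several of the problems noted in the list (the "length" operator of expr,
the column computation in diff --side-by-side, the strlen call in gzip) involve
three quantities that coincide for ASCII text but differ in UTF-8: the number
of bytes, the number of characters, and the number of screen columns. The
following is a rough sketch in Python, not the code of any of these utilities.
The helper names are hypothetical, and the width rule is only an approximation
of what a function such as mbswidth would report.

```python
import unicodedata

def display_width(s):
    """Approximate the number of terminal columns needed to show s."""
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue                  # combining marks occupy no column
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2                # wide and fullwidth characters
        else:
            width += 1                # everything else: assume one column
    return width

def measures(s):
    """Return (bytes in UTF-8, characters, approximate columns) for s."""
    return len(s.encode("utf-8")), len(s), display_width(s)

samples = (
    "abc",                   # plain ASCII
    "\u00e9t\u00e9",         # Latin text with two accented letters
    "\u65e5\u672c\u8a9e",    # three double-width CJK characters
    "e\u0301",               # base letter followed by a combining accent
)
for sample in samples:
    print(measures(sample))
```

The four samples yield (3, 3, 3), (5, 3, 3), (9, 3, 6) and (3, 2, 1). A program
that pads or truncates output using the byte count would therefore misalign
every line that contains non-ASCII text.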
<P>
<H2><A NAME="ss4.9">4.9 Other X11 applications</A>
</H2>
<P>
<P>Owen Taylor is currently developing a library for rendering multilingual
text, called pango.
<A HREF="http://www.labs.redhat.com/~otaylor/pango/">http://www.labs.redhat.com/~otaylor/pango/</A>,
<A HREF="http://www.pango.org/">http://www.pango.org/</A>.
<P>
<P>
</BODY>
</HTML>