<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
|
||
<HTML>
|
||
<HEAD>
|
||
<META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
|
||
<TITLE>The Unicode HOWTO: Specific applications</TITLE>
|
||
<LINK HREF="Unicode-HOWTO-5.html" REL=next>
|
||
<LINK HREF="Unicode-HOWTO-3.html" REL=previous>
|
||
<LINK HREF="Unicode-HOWTO.html#toc4" REL=contents>
|
||
</HEAD>
|
||
<BODY>
|
||
<A HREF="Unicode-HOWTO-5.html">Next</A>
|
||
<A HREF="Unicode-HOWTO-3.html">Previous</A>
|
||
<A HREF="Unicode-HOWTO.html#toc4">Contents</A>
|
||
<HR>
|
||
<H2><A NAME="s4">4. Specific applications</A></H2>
|
||
|
||
<P>
|
||
<P>
|
||
<H2><A NAME="ss4.1">4.1 Shells</A>
|
||
</H2>
|
||
|
||
<P>
|
||
<P>
|
||
<H3>bash</H3>
|
||
|
||
<P>
|
||
<P>By default, GNU bash assumes that every character is one byte long and one
|
||
column wide. A patch for bash 2.04, by Marcin 'Qrczak' Kowalczyk and
|
||
Ricardas Cepas, teaches bash about multibyte characters in UTF-8 encoding.
|
||
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/bash-2.04-diff">bash-2.04-diff</A><P>Double-width characters, combining characters and bidi are not supported by
|
||
this patch. It seems a complete redesign of the readline redisplay engine is
|
||
needed.
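<P>The two assumptions bash makes (one byte per character, one column per
character) can be illustrated outside the shell. The following Python sketch is
not part of the patch; the helper name <CODE>display_width</CODE> is made up
for this illustration, and it assumes that combining characters occupy no
column and that East Asian wide characters occupy two.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate number of terminal columns needed to show text."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn over the preceding character.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            # Wide and fullwidth characters (typically CJK) take two columns.
            width += 2
        else:
            width += 1
    return width


if __name__ == "__main__":
    samples = ["abc", "na\u00efve", "\u65e5\u672c\u8a9e", "e\u0301"]
    for s in samples:
        # bytes in UTF-8, number of characters, approximate columns
        print(repr(s), len(s.encode("utf-8")), len(s), display_width(s))
```

<P>For plain ASCII the three numbers agree, which is why the one-byte,
one-column assumption goes unnoticed until non-ASCII text is typed.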
|
||
<P>
|
||
<H2><A NAME="ss4.2">4.2 Networking</A>
|
||
</H2>
|
||
|
||
<P>
|
||
<P>
|
||
<P>
|
||
<H3>telnet</H3>
|
||
|
||
<P>
|
||
<P>In some installations, telnet is not 8-bit clean by default.
|
||
In order to be able to send Unicode keystrokes to the remote host, you need to
|
||
set telnet into "outbinary" mode.
|
||
There are two ways to do this:
|
||
<BLOCKQUOTE><CODE>
|
||
<PRE>
$ telnet -L &lt;host&gt;
</PRE>
|
||
</CODE></BLOCKQUOTE>
|
||
|
||
and
|
||
<BLOCKQUOTE><CODE>
|
||
<PRE>
$ telnet
telnet&gt; set outbinary
telnet&gt; open &lt;host&gt;
</PRE>
|
||
</CODE></BLOCKQUOTE>
|
||
<P>
|
||
<P>
|
||
<H3>kermit</H3>
|
||
|
||
<P>
|
||
<P>The communications program C-Kermit
|
||
<A HREF="http://www.columbia.edu/kermit/ckermit.html">http://www.columbia.edu/kermit/ckermit.html</A>,
|
||
(an interactive tool for connection setup, telnet, file transfer,
|
||
with support for TCP/IP and serial lines),
|
||
in versions 7.0 or newer, understands the file and transfer encodings
|
||
UTF-8 and UCS-2, and understands the terminal encoding UTF-8, and converts
|
||
between these encodings and many others. Documentation of these features
|
||
can be found in
|
||
<A HREF="http://www.columbia.edu/kermit/ckermit2.html#x6.6">http://www.columbia.edu/kermit/ckermit2.html#x6.6</A>.
|
||
<P>
|
||
<H2><A NAME="ss4.3">4.3 Browsers</A>
|
||
</H2>
|
||
|
||
<P>
|
||
<P>
|
||
<H3>Netscape</H3>
|
||
|
||
<P>
|
||
<P>Netscape 4.05 or newer can display HTML documents in UTF-8 encoding. All a
|
||
document needs is the following line between the
|
||
&lt;head&gt; and &lt;/head&gt; tags:
|
||
<BLOCKQUOTE><CODE>
|
||
<PRE>
&lt;meta http-equiv="Content-Type" content="text/html; charset=UTF-8"&gt;
</PRE>
|
||
</CODE></BLOCKQUOTE>
|
||
<P>Netscape 4.05 or newer can also display HTML and text files in UCS-2
|
||
encoding with byte-order mark.
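<P>How a program might decide between these cases can be sketched in Python.
This is only an illustration, not Netscape's actual logic: the function name
<CODE>sniff_encoding</CODE>, the 1024-byte search window and the ISO-8859-1
fallback are assumptions, and Python's "utf-16" codec is used as a stand-in
for UCS-2 with byte-order mark.

```python
import codecs
import re

# Looks for charset=... inside a meta tag (hypothetical, simplified pattern).
META_CHARSET = re.compile(
    rb'<meta[^>]+charset=["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def sniff_encoding(data: bytes, default: str = "iso-8859-1") -> str:
    """Guess the encoding of an HTML document from its first bytes."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        # The "utf-16" codec reads the byte-order mark and picks the byte order.
        return "utf-16"
    match = META_CHARSET.search(data[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return default


if __name__ == "__main__":
    page = (b'<html><head><meta http-equiv="Content-Type" '
            b'content="text/html; charset=UTF-8"></head></html>')
    print(sniff_encoding(page))
```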
|
||
<P>
|
||
<A HREF="http://www.netscape.com/computing/download/">http://www.netscape.com/computing/download/</A><P>
|
||
<H3>Mozilla</H3>
|
||
|
||
<P>
|
||
<P>Mozilla milestone M16 has much better internationalization than Netscape 4.
|
||
It can display HTML documents in UTF-8 encoding with support for more
|
||
languages. Alas, there is a cosmetic problem with CJK fonts: some glyphs
|
||
can be bigger than the line's height, thus overlapping the previous or next
|
||
line.
|
||
<P>
|
||
<A HREF="http://www.mozilla.org/">http://www.mozilla.org/</A><P>
|
||
<H3>Amaya</H3>
|
||
|
||
<P>
|
||
<P>Amaya 4.2.1
(
<A HREF="http://www.w3.org/Amaya/">http://www.w3.org/Amaya/</A>,
<A HREF="http://www.w3.org/Amaya/User/SourceDist">http://www.w3.org/Amaya/User/SourceDist</A>)
now has limited handling of UTF-8 encoded HTML pages. It
recognizes the encoding, but it displays only ISO-8859-1 and symbol
characters; it only ever accesses the fonts
|
||
<BLOCKQUOTE><CODE>
|
||
<PRE>
-adobe-times-*-iso8859-1
-adobe-helvetica-*-iso8859-1
-adobe-new century schoolbook-*-iso8859-1
-adobe-courier-*-iso8859-1
-adobe-symbol-*-adobe-fontspecific
</PRE>
|
||
</CODE></BLOCKQUOTE>
|
||
<P>Amaya is in fact an HTML editor, not only a browser. Amaya's strengths among
|
||
the browsers are its speed, given enough memory, and its rendering
|
||
of mathematical formulas (MathML support).
|
||
<P>
|
||
<H3>lynx</H3>
|
||
|
||
<P>
|
||
<P>lynx-2.8 has an options screen (key 'O') which lets you set the display
|
||
character set. When running in an xterm or Linux console in UTF-8 mode,
|
||
set this to "UNICODE UTF-8". Note that for this setting to take effect
|
||
in the current browser session, you have to confirm on the "Accept Changes"
|
||
field, and for this setting to take effect in future browser sessions, you
|
||
have to enable the "Save options to disk" field and then confirm it on
|
||
the "Accept Changes" field.
|
||
<P>Now, again, all a document needs is the following line between the
|
||
&lt;head&gt; and &lt;/head&gt; tags:
|
||
<BLOCKQUOTE><CODE>
|
||
<PRE>
&lt;meta http-equiv="Content-Type" content="text/html; charset=UTF-8"&gt;
</PRE>
|
||
</CODE></BLOCKQUOTE>
|
||
<P>When you are viewing text files in UTF-8 encoding, you also need to
|
||
pass the command-line option "-assume_local_charset=UTF-8" (affects only
|
||
file:/... URLs) or "-assume_charset=UTF-8" (affects all URLs).
|
||
In lynx-2.8.2 you can alternatively, in the options screen (key 'O'),
|
||
change the assumed document character set to "utf-8".
|
||
<P>There is also an option in the options screen, to set the "preferred document
|
||
character set". But it has no effect, at least with file:/... URLs
|
||
and with http://... URLs served by apache-1.3.0.
|
||
<P>There is a spacing and line-breaking problem, however. (Look at the
Russian section of x-utf8.html, or at utf-8-demo.txt.)
|
||
<P>Also, in lynx-2.8.2, configured with --enable-prettysrc, the nice colour
|
||
scheme does not work correctly any more when the display character set
|
||
has been set to "UNICODE UTF-8". This is fixed by a simple patch
|
||
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/lynx282.diff">lynx282.diff</A>.
|
||
<P>The Lynx developers say: "For any serious use of UTF-8 screen output with
|
||
lynx, compiling with slang lib and -DSLANG_MBCS_HACK is still recommended."
|
||
<P>Latest stable release:
|
||
<A HREF="ftp://ftp.gnu.org/pub/gnu/lynx/lynx-2.8.2.tar.gz">ftp://ftp.gnu.org/pub/gnu/lynx/lynx-2.8.2.tar.gz</A><P>
|
||
<A HREF="http://lynx.isc.org/">http://lynx.isc.org/</A><P>General home page:
|
||
<A HREF="http://lynx.browser.org/">http://lynx.browser.org/</A><P>
|
||
<A HREF="http://www.slcc.edu/lynx/">http://www.slcc.edu/lynx/</A><P>Newer development snapshots:
|
||
<A HREF="http://lynx.isc.org/current/">http://lynx.isc.org/current/</A>,
|
||
<A HREF="ftp://lynx.isc.org/current/">ftp://lynx.isc.org/current/</A><P>
|
||
<H3>w3m</H3>
|
||
|
||
<P>
|
||
<P>w3m by Akinori Ito
|
||
<A HREF="http://ei5nazha.yz.yamagata-u.ac.jp/~aito/w3m/eng/">http://ei5nazha.yz.yamagata-u.ac.jp/~aito/w3m/eng/</A>
|
||
is a text mode browser for HTML pages and plain-text files.
|
||
Its layout of HTML tables, enumerations etc. is much prettier than that of lynx.
|
||
w3m can also be used as a high quality HTML to plain text converter.
|
||
<P>w3m 0.1.10 has command line options for the three major Japanese encodings, but
|
||
can also be used for UTF-8 encoded files. Without command line options,
|
||
you often have to press Ctrl-L to refresh the display, and line breaking
|
||
in Cyrillic and CJK paragraphs is not good.
|
||
<P>To fix this, Hironori Sakamoto has a patch
|
||
<A HREF="http://www2u.biglobe.ne.jp/~hsaka/w3m/">http://www2u.biglobe.ne.jp/~hsaka/w3m/</A>
|
||
which adds UTF-8 as display encoding.
|
||
<P>
|
||
<H3>Test pages</H3>
|
||
|
||
<P>
|
||
<P>Some test pages for browsers can be found at the pages of Alan Wood
|
||
<A HREF="http://www.hclrss.demon.co.uk/unicode/#links">http://www.hclrss.demon.co.uk/unicode/#links</A>
|
||
and James Kass
|
||
<A HREF="http://home.att.net/~jameskass/">http://home.att.net/~jameskass/</A>.
|
||
<P>
|
||
<H2><A NAME="ss4.4">4.4 Editors</A>
|
||
</H2>
|
||
|
||
<P>
|
||
<P>
|
||
<H3>yudit</H3>
|
||
|
||
<P>
|
||
<P>yudit by Gáspár Sinai
|
||
<A HREF="http://www.yudit.org/">http://www.yudit.org/</A>
|
||
is a first-class Unicode text editor for the X Window System.
It supports simultaneous processing of many languages, input methods,
and conversions for local character standards.
|
||
It has facilities for entering text in all languages with only
|
||
an English keyboard, using keyboard configuration maps.
|
||
<P>
|
||
<H3>yudit-1.5</H3>
|
||
|
||
<P>
|
||
<P>It can be compiled in three versions: Xlib GUI, KDE GUI, or Motif GUI.
|
||
<P>Customization is very easy. Typically you will first customize your font.
|
||
From the font menu I chose "Unicode". Then, since the command
|
||
"xlsfonts '*-*-iso10646-1'" still showed some ambiguity, I chose a font
|
||
size of 13 (to match Markus Kuhn's 13-pixel fixed font).
|
||
<P>Next, you will customize your input method. The input methods "Straight",
|
||
"Unicode" and "SGML" are most remarkable. For details about the other
|
||
built-in input methods, look in /usr/local/share/yudit/data/.
|
||
<P>To change the default for the next session, edit your $HOME/.yuditrc
|
||
file.
|
||
<P>The general editor functionality is limited to editing, cut&amp;paste
and search&amp;replace. There is no undo.
|
||
<P>
|
||
<H3>yudit-2.1</H3>
|
||
|
||
<P>
|
||
<P>This version is less easy to learn, because it comes with a home-brewed
|
||
GUI and no easily accessible help. But it has an undo functionality and
|
||
should therefore be more usable than version 1.5.
|
||
<P>
|
||
<H3>Fonts for yudit</H3>
|
||
|
||
<P>
|
||
<P>yudit can display text using a TrueType font; see section "TrueType fonts"
|
||
above. The Bitstream Cyberbit gives good results. For yudit to find the
|
||
font, symlink it to <CODE>/usr/local/share/yudit/data/cyberbit.ttf</CODE>.
|
||
<P>
|
||
<H3>vim</H3>
|
||
|
||
<P>
|
||
<P>vim (as of version 6.0r) has good support for UTF-8: when started in an
|
||
UTF-8 locale, it assumes UTF-8 encoding for the console and the text files
|
||
being edited. It also supports double-wide (CJK) characters and
combining characters, and therefore fits perfectly into a UTF-8 enabled
xterm.
|
||
<P>Installation: Download from
|
||
<A HREF="http://www.vim.org/">http://www.vim.org/</A>.
|
||
After unpacking the four parts, call <CODE>./configure</CODE> with
|
||
<CODE>--with-features=big</CODE> <CODE>--enable-multibyte</CODE> arguments
|
||
(or edit src/Makefile to include the <CODE>--with-features=big</CODE> and
|
||
<CODE>--enable-multibyte</CODE> options). This will turn on the feature
|
||
FEAT_MBYTE. Then do "make" and "make install".
|
||
<P>vim can be used to edit files in other encodings. For example, to edit
a BIG5 encoded file: <CODE>:e ++cc=BIG5 filename</CODE> (released versions of
Vim 6 and later spell this option <CODE>++enc</CODE>). All encoding names
supported by iconv are accepted. In addition, vim automatically distinguishes
UTF-8 and ISO-8859-1 files without needing any command line option.
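<P>Such an automatic distinction is possible because most ISO-8859-1 text
containing non-ASCII characters is not well-formed UTF-8. The Python sketch
below shows one simple heuristic of this kind. It is not claimed to be vim's
actual algorithm, and the function name <CODE>guess_encoding</CODE> is
hypothetical.

```python
def guess_encoding(data: bytes) -> str:
    """Return "utf-8" if data is well-formed UTF-8, else "iso-8859-1"."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Any byte sequence is valid ISO-8859-1, so it serves as the fallback.
        return "iso-8859-1"
    return "utf-8"


if __name__ == "__main__":
    text = "Gr\u00fc\u00dfe"
    print(guess_encoding(text.encode("utf-8")))
    print(guess_encoding(text.encode("iso-8859-1")))
```

<P>Pure ASCII files are reported as UTF-8, which is harmless because ASCII
is a subset of both encodings.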
|
||
<P>
|
||
<H3>cooledit</H3>
|
||
|
||
<P>
|
||
<P>cooledit by Paul Sheer
|
||
<A HREF="http://www.cooledit.org/">http://www.cooledit.org/</A>
|
||
is a good text editor for the X Window System. Since version 3.15, it has
|
||
support for Unicode, including Bidi for Hebrew (but not Arabic).
|
||
<P>A build error message about a missing "vga_setpage" function is
|
||
worked around by adding "-DDO_NOT_USE_VGALIB" to the CFLAGS.
|
||
<P>To view UTF-8 files in an UTF-8 locale you have to modify a setting in
|
||
the "Options -> Switches" panel: Enable the checkbox "Display characters
|
||
outside locale". I also found it necessary to disable "Spellcheck as you
|
||
type".
|
||
<P>For viewing texts with both European and CJK characters, cooledit needs a
|
||
font which contains both, for example the GNU unifont (see section
|
||
"X11 Unicode fonts"): Start once
|
||
<BLOCKQUOTE><CODE>
|
||
<PRE>
$ cooledit -fn -gnu-unifont-medium-r-normal--16-160-75-75-c-80-iso10646-1
</PRE>
|
||
</CODE></BLOCKQUOTE>
|
||
|
||
cooledit will then use this font in all future invocations.
|
||
<P>Unfortunately, the only characters that can be entered through the keyboard
|
||
are ISO-8859-1 characters and, through a cooledit specific compose mechanism,
|
||
ISO-8859-2 characters. Inputting arbitrary Unicode characters in cooledit is
|
||
possible, but a bit tedious.
|
||
<P>
|
||
<H3>emacs</H3>
|
||
|
||
<P>
|
||
<P>First of all, you should read the section "International Character Set Support"
|
||
(node "International") in the Emacs manual. In particular, note that you need
|
||
to start Emacs using the command
|
||
<BLOCKQUOTE><CODE>
|
||
<PRE>
$ emacs -fn fontset-standard
</PRE>
|
||
</CODE></BLOCKQUOTE>
|
||
|
||
so that it will use a font set comprising a lot of international characters.
|
||
<P>In the short term, there are two packages for using UTF-8 in Emacs. Neither
of them requires recompiling Emacs.
|
||
<UL>
|
||
<LI>The emacs-utf package
|
||
<A HREF="http://www.cs.ust.hk/faculty/otfried/Mule/">http://www.cs.ust.hk/faculty/otfried/Mule/</A>
|
||
by Otfried Cheong provides a "unicode-utf8" encoding to Emacs.</LI>
|
||
<LI>The oc-unicode package
|
||
<A HREF="http://www.cs.ust.hk/faculty/otfried/Mule/">http://www.cs.ust.hk/faculty/otfried/Mule/</A>,
|
||
by Otfried Cheong, an extension of the Mule-UCS package
|
||
<A HREF="ftp://etlport.etl.go.jp/pub/mule/Mule-UCS/Mule-UCS-0.70.tar.gz">ftp://etlport.etl.go.jp/pub/mule/Mule-UCS/Mule-UCS-0.70.tar.gz</A>
|
||
(mirrored at
|
||
<A HREF="http://riksun.riken.go.jp/archives/misc/mule/Mule-UCS/Mule-UCS-0.70.tar.gz">http://riksun.riken.go.jp/archives/misc/mule/Mule-UCS/Mule-UCS-0.70.tar.gz</A>
|
||
and
|
||
<A HREF="ftp://ftp.m17n.org/pub/mule/Mule-UCS/Mule-UCS-0.70.tar.gz">ftp://ftp.m17n.org/pub/mule/Mule-UCS/Mule-UCS-0.70.tar.gz</A>)
|
||
by Hisashi Miyashita, provides a "utf-8" encoding to Emacs.</LI>
|
||
</UL>
|
||
<P>You can use either of these packages, or both together. The advantages
|
||
of the emacs-utf "unicode-utf8" encoding are: it loads faster, and it deals
|
||
better with combining characters (important for Thai).
|
||
The advantage of the Mule-UCS / oc-unicode "utf-8" encoding is: it can apply
|
||
to a process buffer (such as M-x shell), not only to loading and saving of
|
||
files; and it respects the widths of characters better (important for
|
||
Ethiopian). However, it is less reliable: After heavy editing of a file, I
|
||
have seen some Unicode characters replaced with U+FFFD after the file was
|
||
saved. (But maybe those were bugs in Emacs 20.5 and 20.6 which are fixed in
|
||
Emacs 20.7.)
|
||
<P>To install the emacs-utf package, compile the program "utf2mule" and install
|
||
it somewhere in your $PATH, also install unicode.el, muleuni-1.el,
|
||
unicode-char.el somewhere. Then add the lines
|
||
<BLOCKQUOTE><CODE>
|
||
<PRE>
(setq load-path (cons "/home/user/somewhere/emacs" load-path))
(if (not (string-match "XEmacs" emacs-version))
  (progn
    (require 'unicode)
    ;(setq unicode-data-path "..../UnicodeData-3.0.0.txt")
    (if (eq window-system 'x)
      (progn
        (setq fontset12
          (create-fontset-from-fontset-spec
            "-misc-fixed-medium-r-normal-*-12-*-*-*-*-*-fontset-standard"))
        (setq fontset13
          (create-fontset-from-fontset-spec
            "-misc-fixed-medium-r-normal-*-13-*-*-*-*-*-fontset-standard"))
        (setq fontset14
          (create-fontset-from-fontset-spec
            "-misc-fixed-medium-r-normal-*-14-*-*-*-*-*-fontset-standard"))
        (setq fontset15
          (create-fontset-from-fontset-spec
            "-misc-fixed-medium-r-normal-*-15-*-*-*-*-*-fontset-standard"))
        (setq fontset16
          (create-fontset-from-fontset-spec
            "-misc-fixed-medium-r-normal-*-16-*-*-*-*-*-fontset-standard"))
        (setq fontset18
          (create-fontset-from-fontset-spec
            "-misc-fixed-medium-r-normal-*-18-*-*-*-*-*-fontset-standard"))
        ; (set-default-font fontset15)
        ))))
</PRE>
|
||
</CODE></BLOCKQUOTE>
|
||
|
||
to your $HOME/.emacs file. To activate any of the font sets, use the Mule
|
||
menu item "Set Font/FontSet" or Shift-down-mouse-1. The Unicode coverage
of the font sets at different sizes may depend on the installed fonts;
|
||
here are screen shots at various sizes of UTF-8-demo.txt (
|
||
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-UTF-8-demo-12.gif">12</A>,
|
||
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-UTF-8-demo-13.gif">13</A>,
|
||
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-UTF-8-demo-14.gif">14</A>,
|
||
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-UTF-8-demo-15.gif">15</A>,
|
||
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-UTF-8-demo-16.gif">16</A>,
|
||
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-UTF-8-demo-18.gif">18</A>)
|
||
and of the Mule script examples (
|
||
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-HELLO-12.gif">12</A>,
|
||
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-HELLO-13.gif">13</A>,
|
||
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-HELLO-14.gif">14</A>,
|
||
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-HELLO-15.gif">15</A>,
|
||
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-HELLO-16.gif">16</A>,
|
||
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/emacs-HELLO-18.gif">18</A>).
|
||
To designate a font set as the initial font set for the first frame at startup,
|
||
uncomment the <CODE>set-default-font</CODE> line in the code snippet above.
|
||
<P>To install the oc-unicode package, execute the command
|
||
<BLOCKQUOTE><CODE>
|
||
<PRE>
$ emacs -batch -l oc-comp.el
</PRE>
|
||
</CODE></BLOCKQUOTE>
|
||
|
||
and install the resulting file <CODE>un-define.elc</CODE>, as well as
|
||
<CODE>oc-unicode.el</CODE>, <CODE>oc-charsets.el</CODE>, <CODE>oc-tools.el</CODE>,
|
||
somewhere. Then add the lines
|
||
<BLOCKQUOTE><CODE>
|
||
<PRE>
(setq load-path (cons "/home/user/somewhere/emacs" load-path))
(if (not (string-match "XEmacs" emacs-version))
  (progn
    (require 'oc-unicode)
    ;(setq unicode-data-path "..../UnicodeData-3.0.0.txt")
    (if (eq window-system 'x)
      (progn
        (setq fontset12
          (oc-create-fontset
            "-misc-fixed-medium-r-normal-*-12-*-*-*-*-*-fontset-standard"
            "-misc-fixed-medium-r-normal-ja-12-*-iso10646-*"))
        (setq fontset13
          (oc-create-fontset
            "-misc-fixed-medium-r-normal-*-13-*-*-*-*-*-fontset-standard"
            "-misc-fixed-medium-r-normal-ja-13-*-iso10646-*"))
        (setq fontset14
          (oc-create-fontset
            "-misc-fixed-medium-r-normal-*-14-*-*-*-*-*-fontset-standard"
            "-misc-fixed-medium-r-normal-ja-14-*-iso10646-*"))
        (setq fontset15
          (oc-create-fontset
            "-misc-fixed-medium-r-normal-*-15-*-*-*-*-*-fontset-standard"
            "-misc-fixed-medium-r-normal-ja-15-*-iso10646-*"))
        (setq fontset16
          (oc-create-fontset
            "-misc-fixed-medium-r-normal-*-16-*-*-*-*-*-fontset-standard"
            "-misc-fixed-medium-r-normal-ja-16-*-iso10646-*"))
        (setq fontset18
          (oc-create-fontset
            "-misc-fixed-medium-r-normal-*-18-*-*-*-*-*-fontset-standard"
            "-misc-fixed-medium-r-normal-ja-18-*-iso10646-*"))
        ; (set-default-font fontset15)
        ))))
</PRE>
|
||
</CODE></BLOCKQUOTE>
|
||
|
||
to your $HOME/.emacs file. You can choose your appropriate font set as with
|
||
the emacs-utf package.
|
||
<P>In order to open an UTF-8 encoded file, you will type
|
||
<BLOCKQUOTE><CODE>
|
||
<PRE>
M-x universal-coding-system-argument unicode-utf8 RET
M-x find-file filename RET
</PRE>
|
||
</CODE></BLOCKQUOTE>
|
||
|
||
or
|
||
<BLOCKQUOTE><CODE>
|
||
<PRE>
C-x RET c unicode-utf8 RET
C-x C-f filename RET
</PRE>
|
||
</CODE></BLOCKQUOTE>
|
||
|
||
(or utf-8 instead of unicode-utf8, if you prefer oc-unicode/Mule-UCS).
|
||
<P>In order to start a shell buffer with UTF-8 I/O, you will type
|
||
<BLOCKQUOTE><CODE>
|
||
<PRE>
M-x universal-coding-system-argument utf-8 RET
M-x shell RET
</PRE>
|
||
</CODE></BLOCKQUOTE>
|
||
|
||
(This works with oc-unicode/Mule-UCS only.)
|
||
<P>There is a newer version Mule-UCS-0.81. Unfortunately you need to rebuild emacs
|
||
from source in order to use it.
|
||
<P>Note that all this works with Emacs 20 in windowing mode only, not in terminal
|
||
mode. None of the mentioned packages works in Emacs 21, as of this writing.
|
||
<P>Richard Stallman plans to add integrated UTF-8 support to Emacs in the long
term, and so do the XEmacs developers.
|
||
<P>
|
||
<H3>xemacs</H3>
|
||
|
||
<P>
|
||
<P>(This section is written by Gilbert Baumann.)
|
||
<P>Here is how to teach XEmacs (20.4 configured with MULE) the UTF-8 encoding.
|
||
Unfortunately you need its sources to be able to patch it.
|
||
<P>First you need these files provided by Tomohiko Morioka:
|
||
<P>
|
||
<A HREF="http://turnbull.sk.tsukuba.ac.jp/Tools/XEmacs/xemacs-21.0-b55-emc-b55-ucs.diff">http://turnbull.sk.tsukuba.ac.jp/Tools/XEmacs/xemacs-21.0-b55-emc-b55-ucs.diff</A>
|
||
and
|
||
<A HREF="http://turnbull.sk.tsukuba.ac.jp/Tools/XEmacs/xemacs-ucs-conv-0.1.tar.gz">http://turnbull.sk.tsukuba.ac.jp/Tools/XEmacs/xemacs-ucs-conv-0.1.tar.gz</A><P>The .diff is a diff against the C sources. The tar ball is elisp code,
|
||
which provides lots of code tables to map to and from Unicode. As the
|
||
name of the diff file suggests it is against XEmacs-21; I needed to
|
||
help `patch' a bit. The most notable difference to my XEmacs-20.4
|
||
sources is that file-coding.[ch] was called mule-coding.[ch].
|
||
<P>For those unfamiliar with the XEmacs-MULE stuff (as I am), here is a quick
guide:
|
||
<P>What we call an encoding is called by MULE a `coding-system'. The most
|
||
important commands are:
|
||
<P>
|
||
<BLOCKQUOTE><CODE>
|
||
<PRE>
M-x set-file-coding-system
M-x set-buffer-process-coding-system [comint buffers]
</PRE>
|
||
</CODE></BLOCKQUOTE>
|
||
<P>and the variable `file-coding-system-alist', which guides `find-file'
|
||
to guess the encoding used. After stuff was running, the very first
|
||
thing I did was
|
||
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/gb-hacks.el">this</A>.
|
||
<P>This code looks into the special mode line introduced by -*- somewhere
in the first 600 bytes of the file about to be opened; if there is a
field "Encoding: xyz;" and the xyz encoding ("coding system" in Emacs speak)
exists, it chooses that. So now you could do e.g.
|
||
<P>
|
||
<BLOCKQUOTE><CODE>
|
||
<PRE>
;;; -*- Mode: Lisp; Syntax: Common-Lisp; Package: CLEX; Encoding: utf-8; -*-
</PRE>
|
||
</CODE></BLOCKQUOTE>
|
||
<P>and XEmacs goes into utf-8 mode here.
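<P>The lookup described above can be sketched in Python. This is only an
illustration of the idea, not a translation of gb-hacks.el: the function name
<CODE>encoding_from_mode_line</CODE> is hypothetical, the mode line is assumed
to sit on a single line, and Python's codec registry stands in for the check
that the coding system exists.

```python
import codecs
import re

# Text between the two -*- markers of a mode line.
MODE_LINE = re.compile(rb"-\*-(.*?)-\*-")


def encoding_from_mode_line(head: bytes, limit: int = 600):
    """Return the encoding named in a mode line within the first bytes, or None."""
    match = MODE_LINE.search(head[:limit])
    if match is None:
        return None
    for field in match.group(1).split(b";"):
        name, sep, value = field.partition(b":")
        if sep and name.strip().lower() == b"encoding":
            try:
                candidate = value.strip().decode("ascii").lower()
                codecs.lookup(candidate)  # only accept known encodings
            except (LookupError, UnicodeError):
                return None
            return candidate
    return None


if __name__ == "__main__":
    line = (b";;; -*- Mode: Lisp; Syntax: Common-Lisp; "
            b"Package: CLEX; Encoding: utf-8; -*-\n")
    print(encoding_from_mode_line(line))
```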
|
||
<P>After everything was running I defined \u03BB (Greek lambda) as a
|
||
macro like:
|
||
<P>
|
||
<BLOCKQUOTE><CODE>
|
||
<PRE>
(defmacro \u03BB (x) `(lambda .,x))
</PRE>
|
||
</CODE></BLOCKQUOTE>
|
||
<P>
|
||
<H3>nedit</H3>
|
||
|
||
<P>
|
||
<P>
|
||
<H3>xedit</H3>
|
||
|
||
<P>
|
||
<P>With XFree86-4.0.1, xedit is able to edit UTF-8 files if you set the locale
|
||
accordingly (see above), and add the line "Xedit*international: true" to
|
||
your $HOME/.Xdefaults file.
|
||
<P>
|
||
<H3>axe</H3>
|
||
|
||
<P>
|
||
<P>As of version 6.1.2, aXe supports only 8-bit locales. If you add the line
|
||
"Axe*international: true" to your $HOME/.Xdefaults file, it will simply dump
|
||
core.
|
||
<P>
|
||
<H3>pico</H3>
|
||
|
||
<P>
|
||
<P>As of version 4.30, pine cannot be reasonably used to view or edit UTF-8
|
||
files. In UTF-8 enabled xterm, it has severe redraw problems.
|
||
<P>
|
||
<H3>mined98</H3>
|
||
|
||
<P>
|
||
<P>mined98 is a small text editor by Michiel Huisjes, Achim Müller and
|
||
Thomas Wolff.
|
||
<A HREF="http://www.inf.fu-berlin.de/~wolff/mined98.tar.gz">http://www.inf.fu-berlin.de/~wolff/mined98.tar.gz</A>
|
||
It lets you edit UTF-8 or 8-bit encoded files, in an UTF-8 or 8-bit xterm.
|
||
It also has powerful capabilities for entering Unicode characters.
|
||
<P>mined lets you edit both 8-bit encoded and UTF-8 encoded files. By default
|
||
it uses an autodetection heuristic. If you don't want to rely on heuristics,
|
||
pass the command-line option <CODE>-u</CODE> when editing an UTF-8 file, or
|
||
<CODE>+u</CODE> when editing an 8-bit encoded file. You can change the
|
||
interpretation at any time from within the editor: It displays the encoding
|
||
("L:h" for 8-bit, "U:h" for UTF-8) in the menu line. Click on the first
|
||
of these characters to change it.
|
||
<P>mined knows about double-width and combining characters and displays them
|
||
correctly. It also has a special display mode for combining characters.
|
||
<P>mined also has a scrollbar and very nice pull-down menus. Alas, the "Home",
|
||
"End", "Delete" keys do not work.
|
||
<P>
|
||
<H3>qemacs</H3>
|
||
|
||
<P>
|
||
<P>qemacs 0.2 is a small text editor by Fabrice Bellard.
|
||
<A HREF="http://www-stud.enst.fr/~bellard/qemacs/">http://www-stud.enst.fr/~bellard/qemacs/</A>
|
||
with Emacs keybindings. It runs in an UTF-8 console or xterm, and can edit
|
||
both 8-bit encoded and UTF-8 encoded files. It still has a few rough edges,
|
||
but further development is underway.
|
||
<P>
|
||
<H2><A NAME="ss4.5">4.5 Mailers</A>
|
||
</H2>
|
||
|
||
<P>
|
||
<P>MIME: RFC 2279 defines UTF-8 as a MIME charset, which can be transported
|
||
under the 8bit, quoted-printable and base64 encodings. The older MIME
|
||
UTF-7 proposal (RFC 2152) is considered to be deprecated and should not
|
||
be used any further.
|
||
<P>Mail clients released after January 1, 1999, should be capable of sending and
|
||
displaying UTF-8 encoded mails, otherwise they are considered deficient.
|
||
But these mails have to carry the MIME labels
|
||
<BLOCKQUOTE><CODE>
|
||
<PRE>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
</PRE>
|
||
</CODE></BLOCKQUOTE>
|
||
|
||
Simply piping an UTF-8 file into "mail" without caring about the MIME labels
|
||
will not work.
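<P>A program that sends mail can attach these labels itself. The following
Python sketch uses the standard <CODE>email</CODE> package to build such a
message; the function name <CODE>build_utf8_mail</CODE> and the example
addresses are made up for this illustration, and actually delivering the
message is left out.

```python
from email.message import EmailMessage


def build_utf8_mail(sender, recipient, subject, body):
    """Build a text/plain message labelled charset=utf-8, transfer encoding 8bit."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # set_content adds the Content-Type and Content-Transfer-Encoding headers.
    msg.set_content(body, charset="utf-8", cte="8bit")
    return msg


if __name__ == "__main__":
    mail = build_utf8_mail("alice@example.org", "bob@example.org",
                           "Gr\u00fc\u00dfe",
                           "Sch\u00f6ne Gr\u00fc\u00dfe aus K\u00f6ln\n")
    print(mail["Content-Type"])
    print(mail["Content-Transfer-Encoding"])
```

<P>Passing <CODE>cte="quoted-printable"</CODE> or <CODE>cte="base64"</CODE>
instead selects one of the other transfer encodings mentioned above, which is
useful when the transport is not 8-bit clean.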
|
||
<P>Mail client implementors should take a look at
|
||
<A HREF="http://www.imc.org/imc-intl/">http://www.imc.org/imc-intl/</A>
|
||
and
|
||
<A HREF="http://www.imc.org/mail-i18n.html">http://www.imc.org/mail-i18n.html</A>.
|
||
<P>Now about the individual mail clients (or "mail user agents"):
|
||
<P>
|
||
<H3>pine</H3>
|
||
|
||
<P>
|
||
<P>The situation for an unpatched pine version 4.30 is as follows.
|
||
<P>Pine does not do character set conversions. But it allows you to view
|
||
UTF-8 mails in an UTF-8 text window (Linux console or xterm).
|
||
<P>Normally, Pine will warn about different character sets each time you view
|
||
an UTF-8 encoded mail. To get rid of this warning, choose S (setup), then
|
||
C (config), then change the value of "character-set" to UTF-8. This option
|
||
will not do anything, except to reduce the warnings, as Pine has no built-in
|
||
knowledge of UTF-8.
|
||
<P>Also note that Pine's notion of Unicode characters is pretty limited: It
|
||
will display Latin and Greek characters, but not other kinds of Unicode
|
||
characters.
|
||
<P>A patch by Robert Brady
|
||
<A HREF="mailto:robert@suse.co.uk">&lt;robert@suse.co.uk&gt;</A>
|
||
<A HREF="http://www.ents.susu.soton.ac.uk/~robert/pine-utf8-0.1.diff">http://www.ents.susu.soton.ac.uk/~robert/pine-utf8-0.1.diff</A>
|
||
adds UTF-8 support to Pine. With this patch, it decodes and prints headers
|
||
and bodies properly. The patch depends on the GNOME libunicode
|
||
<A HREF="http://cvs.gnome.org/lxr/source/libunicode/">http://cvs.gnome.org/lxr/source/libunicode/</A>.
|
||
<P>However, alignment remains broken in many places; replying to a mail does
|
||
not cause the character set to be converted as appropriate; and the editor,
|
||
pico, cannot deal with multibyte characters.
|
||
<P>
|
||
<H3>kmail</H3>
|
||
|
||
<P>
|
||
<P>kmail (as of KDE 1.0) does not support UTF-8 mails at all.
|
||
<P>
|
||
<H3>Netscape Communicator</H3>
|
||
|
||
<P>
|
||
<P>Netscape Communicator's Messenger can send and display mails in UTF-8
|
||
encoding, but it needs a little bit of manual user intervention.
|
||
<P>To send an UTF-8 encoded mail: After opening the "Compose" window, but before
|
||
starting to compose the message, select from the menu
|
||
"View -> Character Set -> Unicode (UTF-8)". Then compose the message and
|
||
send it.
|
||
<P>When you receive an UTF-8 encoded mail, Netscape unfortunately does not
|
||
display it in UTF-8 right away, and does not even give a visual clue that
|
||
the mail was encoded in UTF-8. You have to manually select from the menu
|
||
"View -> Character Set -> Unicode (UTF-8)".
|
||
<P>For displaying UTF-8 mails, Netscape uses different fonts. You can adjust
|
||
your font settings in the "Edit -> Preferences -> Fonts" dialog; choose
|
||
the "Unicode" font category.
|
||
<P>
|
||
<H3>emacs (rmail, vm)</H3>
|
||
|
||
<P>
|
||
<P>
|
||
<H3>mutt</H3>
|
||
|
||
<P>
|
||
<P>mutt-1.2.x, as available from
|
||
<A HREF="http://www.mutt.org/">http://www.mutt.org/</A>,
|
||
has only rudimentary support for UTF-8: it can convert
|
||
from UTF-8 into an 8-bit display charset. The mutt-1.3.x
|
||
development branch also supports UTF-8 as the display charset,
|
||
so you can run Mutt in an UTF-8 xterm, and has thorough support
|
||
for MIME and charset conversion (relying on iconv).
|
||
<P>
|
||
<H3>exmh</H3>
|
||
|
||
<P>
|
||
<P>exmh 2.1.2 with Tk 8.4a1 can recognize and correctly display UTF-8 mails
|
||
(without CJK characters) if you add the following lines to your
|
||
<CODE>$HOME/.Xdefaults</CODE> file.
|
||
<BLOCKQUOTE><CODE>
|
||
<PRE>
!
! Exmh
!
exmh.mimeUCharsets: utf-8
exmh.mime_utf-8_registry: iso10646
exmh.mime_utf-8_encoding: 1
exmh.mime_utf-8_plain_families: fixed
exmh.mime_utf-8_fixed_families: fixed
exmh.mime_utf-8_proportional_families: fixed
exmh.mime_utf-8_title_families: fixed
</PRE>
|
||
</CODE></BLOCKQUOTE>
|
||
<P>
|
||
<H2><A NAME="ss4.6">4.6 Text processing</A>
|
||
</H2>
|
||
|
||
<P>
|
||
<P>
|
||
<H3>groff</H3>
|
||
|
||
<P>
|
||
<P>groff 1.16.1, the GNU implementation of the traditional Unix text processing
|
||
system troff/nroff, can output UTF-8 formatted text. Simply use
|
||
`<CODE>groff -Tutf8</CODE>' instead of `<CODE>groff -Tlatin1</CODE>' or
|
||
`<CODE>groff -Tascii</CODE>'.
|
||
<P>
|
||
<H3>TeX</H3>
|
||
|
||
<P>
|
||
<P>The teTeX 0.9 (and newer) distribution contains a Unicode adaptation of TeX,
|
||
called Omega
|
||
(
|
||
<A HREF="http://www.gutenberg.eu.org/omega/">http://www.gutenberg.eu.org/omega/</A>,
|
||
<A HREF="ftp://ftp.ens.fr/pub/tex/yannis/omega">ftp://ftp.ens.fr/pub/tex/yannis/omega</A>).
|
||
Together with the unicode.tex file contained in
|
||
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/utf8-tex-0.1.tar.gz">utf8-tex-0.1.tar.gz</A>
|
||
it enables you to use UTF-8 encoded sources as input for TeX. About a thousand
Unicode characters are currently supported.
|
||
<P>All that changes is that you run `omega' (instead of `tex') or `lambda'
|
||
(instead of `latex'), and insert the following lines at the head of
|
||
your source input.
|
||
<BLOCKQUOTE><CODE>
|
||
<PRE>
\ocp\TexUTF=inutf8
\InputTranslation currentfile \TexUTF
</PRE>
|
||
</CODE></BLOCKQUOTE>
|
||
|
||
<BLOCKQUOTE><CODE>
|
||
<PRE>
\input unicode
</PRE>
|
||
</CODE></BLOCKQUOTE>
|
||
<P>Other possibly related links:
|
||
<A HREF="http://www.dante.de/projekte/nts/NTS-FAQ.html">http://www.dante.de/projekte/nts/NTS-FAQ.html</A>,
|
||
<A HREF="ftp://ftp.dante.de/pub/tex/language/chinese/CJK/">ftp://ftp.dante.de/pub/tex/language/chinese/CJK/</A>.
|
||
<P>
|
||
<H2><A NAME="ss4.7">4.7 Databases</A>
|
||
</H2>
|
||
|
||
<P>
|
||
<P>
|
||
<H3>PostgreSQL</H3>
|
||
|
||
<P>
|
||
<P>PostgreSQL 6.4 or newer can be built with the configuration option
<CODE>--with-mb=UNICODE</CODE>.
<P>
<H3>Interbase</H3>

<P>
<P>Borland/Inprise's Interbase 6.0 can store string fields in UTF-8 format
if the option "CHARACTER SET UNICODE_FSS" is given.
<P>
<H2><A NAME="ss4.8">4.8 Other text-mode applications</A>
</H2>

<P>
<P>
<H3>less</H3>

<P>
<P>With
<A HREF="http://www.flash.net/~marknu/less/less-358.tar.gz">http://www.flash.net/~marknu/less/less-358.tar.gz</A>
you can browse UTF-8 encoded text files in a UTF-8 xterm or console.
Make sure that the environment variable LESSCHARSET is not set (or is set
to utf-8). If you also have a LESSKEY environment variable set, make
sure that the file it points to does not define LESSCHARSET. If necessary,
regenerate this file using the `lesskey' command, or unset the LESSKEY
environment variable.
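<P>The environment check described above can be sketched as a small wrapper.
This is only an illustration, not part of less: the helper name
less_environment is hypothetical, and it takes the blunt route of dropping
LESSKEY altogether rather than inspecting or regenerating the lesskey file.

```python
import os

def less_environment(environ):
    """Return a copy of environ that will not override UTF-8 in less.

    Hypothetical helper: LESSCHARSET is kept only if it is already
    "utf-8", and LESSKEY is removed so that a lesskey file cannot
    define LESSCHARSET behind our back.
    """
    env = dict(environ)
    charset = env.get("LESSCHARSET")
    if charset is not None and charset != "utf-8":
        del env["LESSCHARSET"]
    env.pop("LESSKEY", None)
    return env

env = less_environment(os.environ)
# The cleaned environment could then be handed to a child process,
# for example: subprocess.run(["less", "somefile.txt"], env=env)
```

Removing LESSKEY discards any key bindings defined in that file; regenerating
the file with `lesskey', as described above, avoids that loss.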
<P>
<H3>lv</H3>

<P>
<P>lv-4.49.3 by Tomio Narita
<A HREF="http://www.ff.iij4u.or.jp/~nrt/lv/">http://www.ff.iij4u.or.jp/~nrt/lv/</A>
is a file viewer with built-in character set converters. To view UTF-8 files
in a UTF-8 console, use "lv -Au8". It can also be used to view
files in CJK encodings in a UTF-8 console.
<P>There is a small glitch: lv turns off xterm's cursor and doesn't turn it on
again.
<P>
<H3>expand</H3>

<P>
<P>Get the GNU textutils-2.0 and apply the patch
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/textutils-2.0.diff">textutils-2.0.diff</A>.
Then run configure, add "#define HAVE_FGETWC 1" and "#define HAVE_FPUTWC 1" to
config.h, and rebuild.
<P>
<H3>col, colcrt, colrm, column, rev, ul</H3>

<P>
<P>Get the util-linux-2.9y package and configure it. Then define ENABLE_WIDECHAR in
defines.h and change the "#if 0" to "#if 1" in lib/widechar.h. In
text-utils/Makefile, modify CFLAGS and LDFLAGS so that they include the
directories where libutf8 is installed. Then rebuild.
<P>
<H3>figlet</H3>

<P>
<P>figlet 2.2 has an option for UTF-8 input: "figlet -C utf8".
<P>
<H3>Base utilities</H3>

<P>
<P>The Li18nux list of commands and utilities that ought to be made interoperable
with UTF-8 is as follows. Useful information needs to be added here; I just
haven't gotten around to it yet :-)
<P>As of glibc-2.2, regular expressions only work for 8-bit characters.
In a UTF-8 locale, regular expressions that contain non-ASCII characters
or that expect to match a single multibyte character with "." do not work.
This affects all commands and utilities listed below.
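<P>The effect of byte-oriented matching can be reproduced outside glibc.
The following sketch uses Python's re module purely as an illustration
(it is not the glibc regex code): a bytes pattern behaves like an 8-bit
matcher, while a str pattern works on whole characters.

```python
import re

# U+00E9 (e with acute accent) is one character but two bytes in UTF-8.
text = "\u00e9"
raw = text.encode("utf-8")

# Byte-oriented matching: "." consumes exactly one byte, so it cannot
# cover the whole two-byte sequence and the full match fails.
byte_match = re.fullmatch(rb".", raw)

# Character-oriented matching: "." consumes the whole character.
char_match = re.fullmatch(r".", text)

print(len(raw), byte_match is None, char_match is not None)
# prints: 2 True True
```

A utility whose regular expression engine works like the first case will
split a multibyte character whenever "." is expected to match it.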
<P>
<DL>
<DT><B>alias</B><DD><P>No info available yet.
<DT><B>ar</B><DD><P>No info available yet.
<DT><B>arch</B><DD><P>No info available yet.
<DT><B>arp</B><DD><P>No info available yet.
<DT><B>at</B><DD><P>As of at-3.1.8: The two uses of isalnum in at.c are invalid and should be
replaced with a use of quotearg.c or with an exclusion list built from the
(fixed) set of shell metacharacters. The two uses of %8s in at.c and atd.c are
invalid and should allow arbitrary length.
<DT><B>awk</B><DD><P>No info available yet.
<DT><B>basename</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>batch</B><DD><P>No info available yet.
<DT><B>bc</B><DD><P>No info available yet.
<DT><B>bg</B><DD><P>No info available yet.
<DT><B>bunzip2</B><DD><P>No info available yet.
<DT><B>bzip2</B><DD><P>No info available yet.
<DT><B>bzip2recover</B><DD><P>No info available yet.
<DT><B>cal</B><DD><P>No info available yet.
<DT><B>cat</B><DD><P>No info available yet.
<DT><B>cd</B><DD><P>No info available yet.
<DT><B>cflow</B><DD><P>No info available yet.
<DT><B>chgrp</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>chmod</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>chown</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>chroot</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>cksum</B><DD><P>As of textutils-2.0e: OK.
<DT><B>clear</B><DD><P>No info available yet.
<DT><B>cmp</B><DD><P>No info available yet.
<DT><B>col</B><DD><P>No info available yet.
<DT><B>comm</B><DD><P>No info available yet.
<DT><B>command</B><DD><P>No info available yet.
<DT><B>compress</B><DD><P>No info available yet.
<DT><B>cp</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>cpio</B><DD><P>No info available yet.
<DT><B>crontab</B><DD><P>No info available yet.
<DT><B>csplit</B><DD><P>No info available yet.
<DT><B>ctags</B><DD><P>No info available yet.
<DT><B>cut</B><DD><P>No info available yet.
<DT><B>date</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>dd</B><DD><P>As of fileutils-4.0u: The conv=lcase, conv=ucase options don't work correctly.
<DT><B>df</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>diff</B><DD><P>As of diffutils-2.7.2: the --side-by-side mode doesn't compute
column width correctly.
<DT><B>diff3</B><DD><P>No info available yet.
<DT><B>dirname</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>domainname</B><DD><P>No info available yet.
<DT><B>du</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>echo</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>ed</B><DD><P>No info available yet.
<DT><B>egrep</B><DD><P>No info available yet.
<DT><B>env</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>ex</B><DD><P>No info available yet.
<DT><B>expand</B><DD><P>No info available yet.
<DT><B>expr</B><DD><P>As of sh-utils-2.0i: The operators "match", "substr", "index", "length"
don't work correctly.
<DT><B>false</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>fc</B><DD><P>No info available yet.
<DT><B>fg</B><DD><P>No info available yet.
<DT><B>fgrep</B><DD><P>No info available yet.
<DT><B>file</B><DD><P>No info available yet.
<DT><B>find</B><DD><P>As of findutils-4.1.6: The "-iregex" option does not work correctly; this
needs a fix in function find/parser.c:insert_regex.
<DT><B>fold</B><DD><P>No info available yet.
<DT><B>ftp[BSD]</B><DD><P>No info available yet.
<DT><B>fuser</B><DD><P>No info available yet.
<DT><B>gencat</B><DD><P>No info available yet.
<DT><B>getconf</B><DD><P>No info available yet.
<DT><B>getopts</B><DD><P>No info available yet.
<DT><B>gettext</B><DD><P>No info available yet.
<DT><B>grep</B><DD><P>No info available yet.
<DT><B>gunzip</B><DD><P>No info available yet.
<DT><B>gzip</B><DD><P>gzip-1.3 is UTF-8 capable, but it uses only English messages in the ASCII
charset. Proper internationalization would require using gettext, calling
setlocale, and, in function check_ofname (file gzip.c), using the function
rpmatch from GNU text/sh/fileutils instead of asking for "y" or "n". The use
of strlen in gzip.c:852 is wrong; it needs to use the function mbswidth.
<DT><B>hash</B><DD><P>No info available yet.
<DT><B>head</B><DD><P>No info available yet.
<DT><B>hostname</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>iconv</B><DD><P>No info available yet.
<DT><B>id</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>ifconfig</B><DD><P>No info available yet.
<DT><B>imake</B><DD><P>No info available yet.
<DT><B>ipcrm</B><DD><P>No info available yet.
<DT><B>ipcs</B><DD><P>No info available yet.
<DT><B>jobs</B><DD><P>No info available yet.
<DT><B>join</B><DD><P>No info available yet.
<DT><B>kill</B><DD><P>No info available yet.
<DT><B>killall</B><DD><P>No info available yet.
<DT><B>ldd</B><DD><P>No info available yet.
<DT><B>less</B><DD><P>No complete info available yet.
<DT><B>lex</B><DD><P>No info available yet.
<DT><B>ln</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>locale</B><DD><P>As of glibc-2.2: OK.
<DT><B>localedef</B><DD><P>As of glibc-2.2: OK.
<DT><B>logger</B><DD><P>No info available yet.
<DT><B>logname</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>lp</B><DD><P>No info available yet.
<DT><B>lpc[BSD]</B><DD><P>No info available yet.
<DT><B>lpq[BSD]</B><DD><P>No info available yet.
<DT><B>lpr[BSD]</B><DD><P>No info available yet.
<DT><B>lprm[BSD]</B><DD><P>No info available yet.
<DT><B>lpstat(LEGACY)</B><DD><P>No info available yet.
<DT><B>ls</B><DD><P>As of fileutils-4.0y: OK.
<DT><B>m4</B><DD><P>No info available yet.
<DT><B>mailx</B><DD><P>No info available yet.
<DT><B>make</B><DD><P>No info available yet.
<DT><B>man</B><DD><P>No info available yet.
<DT><B>mesg</B><DD><P>No info available yet.
<DT><B>mkdir</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>mkfifo</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>mkfs</B><DD><P>No info available yet.
<DT><B>mkswap</B><DD><P>No info available yet.
<DT><B>more</B><DD><P>No info available yet.
<DT><B>mount</B><DD><P>No info available yet.
<DT><B>msgfmt</B><DD><P>No info available yet.
<DT><B>msgmerge</B><DD><P>No info available yet.
<DT><B>mv</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>netstat</B><DD><P>No info available yet.
<DT><B>newgrp</B><DD><P>No info available yet.
<DT><B>nice</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>nl</B><DD><P>No info available yet.
<DT><B>nohup</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>nslookup</B><DD><P>No info available yet.
<DT><B>nm</B><DD><P>No info available yet.
<DT><B>od</B><DD><P>No info available yet.
<DT><B>passwd[BSD]</B><DD><P>No info available yet.
<DT><B>paste</B><DD><P>No info available yet.
<DT><B>patch</B><DD><P>No info available yet.
<DT><B>pathchk</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>ping</B><DD><P>No info available yet.
<DT><B>pr</B><DD><P>No info available yet.
<DT><B>printf</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>ps</B><DD><P>No info available yet.
<DT><B>pwd</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>read</B><DD><P>No info available yet.
<DT><B>reboot</B><DD><P>No info available yet.
<DT><B>renice</B><DD><P>No info available yet.
<DT><B>rm</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>rmdir</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>sed</B><DD><P>No info available yet.
<DT><B>shar[BSD]</B><DD><P>No info available yet.
<DT><B>shutdown</B><DD><P>No info available yet.
<DT><B>sleep</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>sort</B><DD><P>No info available yet.
<DT><B>split</B><DD><P>No info available yet.
<DT><B>strings</B><DD><P>No info available yet.
<DT><B>strip</B><DD><P>No info available yet.
<DT><B>stty</B><DD><P>As of sh-utils-2.0.11: OK.
<DT><B>su[BSD]</B><DD><P>No info available yet.
<DT><B>sum</B><DD><P>As of textutils-2.0e: OK.
<DT><B>tail</B><DD><P>No info available yet.
<DT><B>talk</B><DD><P>No info available yet.
<DT><B>tar</B><DD><P>As of tar-1.13.17: OK, if user and group names are always ASCII.
<DT><B>tclsh</B><DD><P>No info available yet.
<DT><B>tee</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>telnet</B><DD><P>No info available yet.
<DT><B>test</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>time</B><DD><P>No info available yet.
<DT><B>touch</B><DD><P>As of fileutils-4.0u: OK.
<DT><B>tput</B><DD><P>No info available yet.
<DT><B>tr</B><DD><P>No info available yet.
<DT><B>true</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>tsort</B><DD><P>No info available yet.
<DT><B>tty</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>type</B><DD><P>No info available yet.
<DT><B>ulimit</B><DD><P>No info available yet.
<DT><B>umask</B><DD><P>No info available yet.
<DT><B>umount</B><DD><P>No info available yet.
<DT><B>unalias</B><DD><P>No info available yet.
<DT><B>uname</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>uncompress</B><DD><P>No info available yet.
<DT><B>unexpand</B><DD><P>No info available yet.
<DT><B>uniq</B><DD><P>No info available yet.
<DT><B>uudecode</B><DD><P>No info available yet.
<DT><B>uuencode</B><DD><P>No info available yet.
<DT><B>vi</B><DD><P>No info available yet.
<DT><B>wait</B><DD><P>No info available yet.
<DT><B>wc</B><DD><P>As of textutils-2.0.8: OK.
<DT><B>who</B><DD><P>As of sh-utils-2.0i: OK.
<DT><B>wish</B><DD><P>No info available yet.
<DT><B>write</B><DD><P>No info available yet.
<DT><B>xargs</B><DD><P>As of findutils-4.1.5: The program uses strstr; a patch has been submitted
to the maintainer.
<DT><B>xgettext</B><DD><P>No info available yet.
<DT><B>yacc</B><DD><P>No info available yet.
<DT><B>zcat</B><DD><P>No info available yet.
</DL>
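<P>Several of the problems noted in the list (the "length" operator of expr,
the strlen call in gzip, the column computation of diff --side-by-side) come
down to confusing bytes, characters and screen columns. The following Python
sketch illustrates the three quantities. It is not taken from any of these
programs: display_width is a hypothetical helper that only approximates what
a function such as mbswidth computes. It ignores control characters and
counts ambiguous-width characters as one column.

```python
import unicodedata

def display_width(s):
    """Approximate the number of terminal columns needed to show s."""
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            # Combining marks are drawn over the previous character.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            # Wide and fullwidth characters occupy two columns.
            width += 2
        else:
            width += 1
    return width

# Plain ASCII, a Latin-1 letter, three CJK ideographs, and "e" followed
# by a combining acute accent.
samples = ["abc", "na\u00efve", "\u65e5\u672c\u8a9e", "e\u0301"]
for s in samples:
    # Columns printed: bytes in UTF-8, characters, screen columns.
    print(len(s.encode("utf-8")), len(s), display_width(s))
# prints:
# 3 3 3
# 6 5 5
# 9 3 6
# 3 2 1
```

Only for pure ASCII text do the three numbers agree, which is why code that
uses strlen for alignment or truncation needs to be changed for UTF-8.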
<P>
<H2><A NAME="ss4.9">4.9 Other X11 applications</A>
</H2>

<P>
<P>Owen Taylor is currently developing a library for rendering multilingual
text, called pango.
<A HREF="http://www.labs.redhat.com/~otaylor/pango/">http://www.labs.redhat.com/~otaylor/pango/</A>,
<A HREF="http://www.pango.org/">http://www.pango.org/</A>.
<P>
<P>
<HR>
<A HREF="Unicode-HOWTO-5.html">Next</A>
<A HREF="Unicode-HOWTO-3.html">Previous</A>
<A HREF="Unicode-HOWTO.html#toc4">Contents</A>
</BODY>
</HTML>