old-www/HOWTO/Unicode-HOWTO-2.html

358 lines
16 KiB
HTML

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
<TITLE>The Unicode HOWTO: Display setup</TITLE>
<LINK HREF="Unicode-HOWTO-3.html" REL=next>
<LINK HREF="Unicode-HOWTO-1.html" REL=previous>
<LINK HREF="Unicode-HOWTO.html#toc2" REL=contents>
</HEAD>
<BODY>
<A HREF="Unicode-HOWTO-3.html">Next</A>
<A HREF="Unicode-HOWTO-1.html">Previous</A>
<A HREF="Unicode-HOWTO.html#toc2">Contents</A>
<HR>
<H2><A NAME="s2">2. Display setup</A></H2>
<P>
<P>We assume you have already adapted your Linux console and X11 configuration
to your keyboard and locale. This is explained in the Danish/International
HOWTO, and in the other national HOWTOs: Finnish, French, German, Italian,
Polish, Slovenian, Spanish, Cyrillic, Hebrew, Chinese, Thai, Esperanto. But
please do not follow the advice given in the Thai HOWTO, to pretend you
were using ISO-8859-1 characters (U0000..U00FF) when what you are typing
are actually Thai characters (U0E01..U0E5B). Doing so will only cause
problems when you switch to Unicode.
<P>
<H2><A NAME="ss2.1">2.1 Linux console</A>
</H2>
<P>
<P>I'm not talking much about the Linux console here, because on those machines
on which I don't have xdm running, I use it only to type my login name,
my password, and "xinit".
<P>Anyway, the kbd-0.99 package
<A HREF="ftp://sunsite.unc.edu/pub/Linux/system/keyboards/kbd-0.99.tar.gz">ftp://sunsite.unc.edu/pub/Linux/system/keyboards/kbd-0.99.tar.gz</A>
and a heavily extended version, the console-tools-0.2.3 package
<A HREF="ftp://sunsite.unc.edu/pub/Linux/system/keyboards/console-tools-0.2.3.tar.gz">ftp://sunsite.unc.edu/pub/Linux/system/keyboards/console-tools-0.2.3.tar.gz</A>
contains in the kbd-0.99/src/ (or console-tools-0.2.3/screenfonttools/)
directory two programs: `unicode_start' and `unicode_stop'. When you call
`unicode_start', the console's screen output is interpreted as UTF-8. Also,
the keyboard is put into Unicode mode (see "man kbd_mode"). In this mode,
Unicode characters typed as Alt-x1 ... Alt-xn (where x1,...,xn are digits on
the numeric keypad) will be emitted in UTF-8. If your keyboard or, more
precisely, your normal keymap has non-ASCII letter keys (like the German
Umlaute) which you would like to be CapsLockable, you need to apply the kernel
patch
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/linux-2.2.9-keyboard.diff">linux-2.2.9-keyboard.diff</A>
or
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/linux-2.3.12-keyboard.diff">linux-2.3.12-keyboard.diff</A>.
<P>You will want to use display characters from different scripts on the same
screen. For this, you need a Unicode console font. The
<A HREF="ftp://sunsite.unc.edu/pub/Linux/system/keyboards/kbd-0.99.tar.gz">ftp://sunsite.unc.edu/pub/Linux/system/keyboards/kbd-0.99.tar.gz</A>
and
<A HREF="ftp://sunsite.unc.edu/pub/Linux/system/keyboards/console-data-1999.08.29.tar.gz">ftp://sunsite.unc.edu/pub/Linux/system/keyboards/console-data-1999.08.29.tar.gz</A>
packages contain a font (LatArCyrHeb-{08,14,16,19}.psf) which
covers Latin, Cyrillic, Hebrew, Arabic scripts. It covers ISO 8859 parts
1,2,3,4,5,6,8,9,10 all at once. To install it, copy it to
/usr/lib/kbd/consolefonts/ and execute
"/usr/bin/setfont /usr/lib/kbd/consolefonts/LatArCyrHeb-14.psf".
<P>A more flexible approach is given by Dmitry Yu. Bolkhovityanov
<A HREF="mailto:D.Yu.Bolkhovityanov@inp.nsk.su">&lt;D.Yu.Bolkhovityanov@inp.nsk.su&gt;</A>
in
<A HREF="http://www.inp.nsk.su/~bolkhov/files/fonts/univga/index.html">http://www.inp.nsk.su/~bolkhov/files/fonts/univga/index.html</A>
and
<A HREF="http://www.inp.nsk.su/~bolkhov/files/fonts/univga/uni-vga.tgz">http://www.inp.nsk.su/~bolkhov/files/fonts/univga/uni-vga.tgz</A>.
To work around the constraint that a VGA font can only cover 512 characters simultaneously,
he provides a rich Unicode font (2279 characters, covering Latin, Greek, Cyrillic, Hebrew,
Armenian, IPA, math symbols, arrows, and more) in the typical 8x16 size and a script
which permits to extract any 512 characters as a console font.
<P>If you want cut&amp;paste to work with UTF-8 consoles, you need the patch
<A HREF="ftp://ftp.ilog.fr/pub/Users/haible/utf8/linux-2.3.12-console.diff">linux-2.3.12-console.diff</A>
from Edmund Thomas Grimley Evans and Stanislav Voronyi.
<P>In April 2000, Edmund Thomas Grimley Evans
<A HREF="mailto:edmundo@rano.org">&lt;edmundo@rano.org&gt;</A>
has implemented an UTF-8 console terminal emulator. It uses Unicode fonts
and relies on the Linux frame buffer device.
<P>
<H2><A NAME="ss2.2">2.2 X11 Foreign fonts</A>
</H2>
<P>
<P>Don't hesitate to install Cyrillic, Chinese, Japanese etc. fonts. Even
if they are not Unicode fonts, they will help in displaying Unicode
documents: at least Netscape Communicator 4 and Java will make use of
foreign fonts when available.
<P>The following programs are useful when installing fonts:
<UL>
<LI>"mkfontdir directory"
prepares a font directory for use by the X server, needs to be executed
after installing fonts in a directory.</LI>
<LI>"xset -q | sed -e '1,/^Font Path:/d' | sed -e '2,$d' -e 's/^ //'"
displays the X server's current font path.</LI>
<LI>"xset fp+ directory"
adds a directory to the X server's current font path.
To add a directory permanently, add a "FontPath" line to your
/etc/XF86Config file, in section "Files".</LI>
<LI>"xset fp rehash"
needs to be executed after calling mkfontdir on a directory that is
already contained in the X server's current font path.</LI>
<LI>"xfontsel"
allows you to browse the installed fonts by selecting various font
properties.</LI>
<LI>"xlsfonts -fn fontpattern"
lists all fonts matching a font pattern. Also displays various font
properties. In particular, "xlsfonts -ll -fn font" lists the font
properties CHARSET_REGISTRY and CHARSET_ENCODING, which together
determine the font's encoding.</LI>
<LI>"xfd -fn font"
displays a font page by page.</LI>
</UL>
<P>The following fonts are freely available (not a complete list):
<UL>
<LI>The ones contained in XFree86, sometimes packaged in separate packages.
For example, SuSE has only normal 75dpi fonts in the base `xf86' package.
The other fonts are in the packages `xfnt100', `xfntbig', `xfntcyr',
`xfntscl'.</LI>
<LI>The Emacs international fonts,
<A HREF="ftp://ftp.gnu.org/pub/gnu/intlfonts/intlfonts-1.2.tar.gz">ftp://ftp.gnu.org/pub/gnu/intlfonts/intlfonts-1.2.tar.gz</A>
As already mentioned, they are useful even if you prefer XEmacs to
GNU Emacs or don't use any Emacs at all.</LI>
</UL>
<P>
<H2><A NAME="ss2.3">2.3 X11 Unicode fonts</A>
</H2>
<P>
<P>Applications wishing to display text belonging to different scripts (like
Cyrillic and Greek) at the same time, can do so by using different X fonts
for the various pieces of text. This is what Netscape Communicator and Java
do. However, this approach is more complicated, because instead of working
with `Font' and `XFontStruct', the programmer has to deal with `XFontSet',
and also because not all fonts in the font set need to have the same
dimensions.
<P>
<UL>
<LI>Markus Kuhn has assembled fixed-width 75dpi fonts with Unicode encoding
covering Latin, Greek, Cyrillic, Armenian, Georgian, Hebrew scripts and
many symbols.
They cover ISO 8859 parts 1,2,3,4,5,7,8,9,10,13,14,15,16 all at once.
These fonts are required for running xterm in utf-8 mode. They are now
contained in XFree86 4.0.1, therefore you need to install them manually
only if you have an older XFree86 3.x version.
<A HREF="http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz">http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz</A>.</LI>
<LI>Markus Kuhn has also assembled double-width fixed 75dpi fonts with Unicode
encoding covering Chinese, Japanese and Korean. These fonts are contained
in XFree86 4.0.1 as well.
<A HREF="http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts-asian.tar.gz">http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts-asian.tar.gz</A></LI>
<LI>Roman Czyborra has assembled an 8x16 / 16x16 75dpi font with Unicode encoding
covering a huge part of Unicode. Download unifont.hex.gz and hex2bdf from
<A HREF="http://czyborra.com/unifont/">http://czyborra.com/unifont/</A>.
It is not fixed-width: 8 pixels wide for European characters, 16 pixels wide
for Chinese characters. Installation instructions:
<BLOCKQUOTE><CODE>
<PRE>
$ gunzip unifont.hex.gz
$ hex2bdf &lt; unifont.hex > unifont.bdf
$ bdftopcf -o unifont.pcf unifont.bdf
$ gzip -9 unifont.pcf
# cp unifont.pcf.gz /usr/X11R6/lib/X11/fonts/misc
# cd /usr/X11R6/lib/X11/fonts/misc
# mkfontdir
# xset fp rehash
</PRE>
</CODE></BLOCKQUOTE>
</LI>
<LI>Primoz Peterlin has assembled an ETL family fonts covering Latin, Greek,
Cyrillic, Armenian, Georgian, Hebrew scripts.
<A HREF="ftp://ftp.x.org/contrib/fonts/etl-unicode.tar.gz">ftp://ftp.x.org/contrib/fonts/etl-unicode.tar.gz</A>
Use the "bdftopcf" program in order to install it.</LI>
<LI>Mark Leisher has assembled a proportional, 17 pixel high (12 point), font,
called ClearlyU, covering Latin, Greek, Cyrillic, Armenian, Georgian, Hebrew,
Thai, Laotian scripts.
<A HREF="http://crl.nmsu.edu/~mleisher/cu.html">http://crl.nmsu.edu/~mleisher/cu.html</A>.
Installation instructions:
<BLOCKQUOTE><CODE>
<PRE>
$ bdftopcf -o cu12.pcf cu12.bdf
$ gzip -9 cu12.pcf
# cp cu12.pcf.gz /usr/X11R6/lib/X11/fonts/misc
# cd /usr/X11R6/lib/X11/fonts/misc
# mkfontdir
# xset fp rehash
</PRE>
</CODE></BLOCKQUOTE>
</LI>
</UL>
<P>
<P>
<H2><A NAME="ss2.4">2.4 Unicode xterm</A>
</H2>
<P>
<P>xterm is part of X11R6 and XFree86, but is maintained separately by Tom
Dickey.
<A HREF="http://www.clark.net/pub/dickey/xterm/xterm.html">http://www.clark.net/pub/dickey/xterm/xterm.html</A>
Newer versions (patch level 146 and above) contain support for converting
keystrokes to UTF-8 before sending them to the application running in the
xterm, and for displaying Unicode characters that the application outputs
as UTF-8 byte sequence. It also contains support for double-wide characters
(mostly CJK ideographs) and combining characters, contributed by Robert Brady
<A HREF="mailto:robert@suse.co.uk">&lt;robert@suse.co.uk&gt;</A>.
<P>To get an UTF-8 xterm running, you need to:
<UL>
<LI>Fetch
<A HREF="http://www.clark.net/pub/dickey/xterm/xterm.tar.gz">http://www.clark.net/pub/dickey/xterm/xterm.tar.gz</A>,</LI>
<LI>Configure it by calling "./configure --enable-wide-chars ...", then
compile and install it.</LI>
<LI>Have a Unicode fixed-width font installed. Markus Kuhn's ucs-fonts.tar.gz
(see above) is made for this.</LI>
<LI>Start "xterm -u8 -fn '-misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1'".
The option "-u8" turns on Unicode and UTF-8 handling. The font designated
by the long "-fn" option is Markus Kuhn's Unicode font. Without this option,
the default font called "fixed" would be used, an ISO-8859-1 6x13 font.</LI>
<LI>Take a look at the sample files contained in Markus Kuhn's ucs-fonts
package:
<BLOCKQUOTE><CODE>
<PRE>
$ cd .../ucs-fonts
$ cat quickbrown.txt
$ cat utf-8-demo.txt
</PRE>
</CODE></BLOCKQUOTE>
You should be seeing (among others) greek and russian characters.</LI>
<LI>To make xterm come up with UTF-8 handling each time it is started,
add the lines
<BLOCKQUOTE><CODE>
<PRE>
xterm*utf8: 1
xterm*VT100*font: -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1
xterm*VT100*wideFont: -misc-fixed-medium-r-normal-ja-13-125-75-75-c-120-iso10646-1
xterm*VT100*boldFont: -misc-fixed-bold-r-semicondensed--13-120-75-75-c-60-iso10646-1
</PRE>
</CODE></BLOCKQUOTE>
to your $HOME/.Xdefaults (for yourself only).
For CJK text processing with double-width characters, the following
settings are probably better:
<BLOCKQUOTE><CODE>
<PRE>
xterm*VT100*font: -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1
xterm*VT100*wideFont: -Misc-Fixed-Medium-R-Normal-ja-18-120-100-100-C-180-ISO10646-1
</PRE>
</CODE></BLOCKQUOTE>
I don't recommend changing
the system-wide /usr/X11R6/lib/X11/app-defaults/XTerm, because then your
changes will be erased next time you upgrade to a new XFree86 version.</LI>
</UL>
<P>
<P>
<H2><A NAME="ss2.5">2.5 TrueType fonts</A>
</H2>
<P>
<P>The fonts mentioned above are fixed size and not scalable. For some
applications, especially printing, high resolution fonts are necessary,
though. The most important type of scalable, high resolution fonts are
TrueType fonts.
They are currently supported by
<UL>
<LI>XFree86 4.0.1; you need to add the line
<BLOCKQUOTE><CODE>
<PRE>
Load "freetype"
</PRE>
</CODE></BLOCKQUOTE>
or
<BLOCKQUOTE><CODE>
<PRE>
Load "xtt"
</PRE>
</CODE></BLOCKQUOTE>
to the <CODE>"Module"</CODE> section of your XF86Config file.</LI>
<LI>The display engines of other operating systems.</LI>
<LI>The yudit editor, see below, and its printing engine.</LI>
</UL>
<P>Some no-cost TrueType fonts with large Unicode coverage are
<DL>
<DT><B>Bitstream Cyberbit</B><DD><P>Covers Roman, Cyrillic, Greek, Hebrew, Arabic, combining diacritical marks,
Chinese, Korean, Japanese, and more.
<P>Downloadable from
<A HREF="ftp://ftp.netscape.com/pub/communicator/extras/fonts/windows/Cyberbit.ZIP">ftp://ftp.netscape.com/pub/communicator/extras/fonts/windows/Cyberbit.ZIP</A>.
It is free for non-commercial purposes.
<P>
<DT><B>Microsoft Arial</B><DD><P>Covers Roman, Cyrillic, Greek, Hebrew, Arabic, some combining diacritical
marks, Vietnamese.
<P>Downloadable; look on a search engine for ftp-able files called
<CODE>arial.ttf</CODE>, <CODE>ariali.ttf</CODE>, <CODE>arialbd.ttf</CODE>,
<CODE>arialbi.ttf</CODE>.
<P>
<DT><B>Lucida Sans Unicode</B><DD><P>Covers Roman, Cyrillic, Greek, Hebrew, combining diacritical marks.
<P>Download: contained in IBM's JDK 1.3.0 for Linux, at
<A HREF="http://www.ibm.com/java/jdk/linux130/">http://www.ibm.com/java/jdk/linux130/</A>,
or directly downloadable as <CODE>LucidaSansRegular.ttf</CODE> and
<CODE>LucidaSansOblique.ttf</CODE> from
<A HREF="ftp://ftp.maths.tcd.ie/Linux/opt/IBMJava2-13/jre/lib/fonts/">ftp://ftp.maths.tcd.ie/Linux/opt/IBMJava2-13/jre/lib/fonts/</A>.
<P>
<DT><B>Arphic</B><DD><P>Cover Chinese (both traditional and simplified).
<P>Download: at
<A HREF="ftp://ftp.gnu.org/non-gnu/chinese-fonts-truetype/">ftp://ftp.gnu.org/non-gnu/chinese-fonts-truetype/</A>.
These fonts are truly free.
<P>
</DL>
<P>Download locations for these and other TrueType fonts can be found at
Christoph Singer's list of freely downloadable Unicode TrueType fonts
<A HREF="http://www.ccss.de/slovo/unifonts.htm">http://www.ccss.de/slovo/unifonts.htm</A>.
<P>Truetype fonts are installed similarly to fixed size fonts, except that
they go in a separate directory, and that <CODE>ttmkfdir</CODE> must be
called before <CODE>mkfontdir</CODE>:
<BLOCKQUOTE><CODE>
<PRE>
# mkdir -p /usr/X11R6/lib/X11/fonts/truetype
# cp /somewhere/Cyberbit.ttf ... /usr/X11R6/lib/X11/fonts/truetype
# cd /usr/X11R6/lib/X11/fonts/truetype
# ttmkfdir > fonts.scale
# mkfontdir
# xset fp rehash
</PRE>
</CODE></BLOCKQUOTE>
<P>TrueType fonts can be converted to low resolution, non-scalable X11 fonts by
use of Mark Leisher's ttf2bdf utility
<A HREF="ftp://crl.nmsu.edu/CLR/multiling/General/ttf2bdf-2.8-LINUX.tar.gz">ftp://crl.nmsu.edu/CLR/multiling/General/ttf2bdf-2.8-LINUX.tar.gz</A>.
For example, to generate a proportional Unicode font for use with cooledit:
<BLOCKQUOTE><CODE>
<PRE>
# cd /usr/X11R6/lib/X11/fonts/local
# ttf2bdf ../truetrype/Cyberbit.ttf > cyberbit.bdf
# bdftopcf -o cyberbit.pcf cyberbit.bdf
# gzip -9 cyberbit.pcf
# mkfontdir
# xset fp rehash
</PRE>
</CODE></BLOCKQUOTE>
<P>More information about TrueType fonts can be found in the Linux TrueType HOWTO
<A HREF="http://www.moisty.org/~brion/linux/TrueType-HOWTO.html">http://www.moisty.org/~brion/linux/TrueType-HOWTO.html</A>.
<P>
<H2><A NAME="ss2.6">2.6 Miscellaneous</A>
</H2>
<P>
<P>A small program which tests whether a Linux console or xterm is in UTF-8 mode
can be found in the
<A HREF="ftp://sunsite.unc.edu/pub/Linux/system/keyboards/x-lt-1.24.tar.gz">ftp://sunsite.unc.edu/pub/Linux/system/keyboards/x-lt-1.24.tar.gz</A>
package by Ricardas Cepas, files testUTF-8.c and testUTF8.c. Most applications
should not use this, however: they should look at the environment variables,
see section "Locale environment variables".
<P>
<P>
<HR>
<A HREF="Unicode-HOWTO-3.html">Next</A>
<A HREF="Unicode-HOWTO-1.html">Previous</A>
<A HREF="Unicode-HOWTO.html#toc2">Contents</A>
</BODY>
</HTML>