<!doctype linuxdoc system>

<!-- $Header$ -->

<article>

<title>The Linux Cyrillic HOWTO
<author> Alexander L. Belikoff, (<tt/abel@bfr.co.il/), Berger
Financial Research Ltd.
<date>v4.0, 23 January 1998

<abstract>
This document describes how to set up your Linux box to typeset, view
and print documents in Russian.
</abstract>

<toc>

<sect>Administrativia
<p>

<sect1>Introduction
<p>

This document covers the things you need to successfully work with
information containing Cyrillic text (mostly Russian) under Linux.
Although this document assumes you are using Linux as your operating
system, most of the information presented is equally applicable to
many other Unix flavors. I shall try to keep the distinction as
visible as possible.

There are a number of popular Linux distributions. As an example
system I describe RedHat 4.1 Linux (Vanderbilt), the one I am
personally using. Nevertheless, I shall try to highlight the
differences, if any, in other popular distributions, such as Debian
GNU/Linux and Slackware Linux.

Since such a setup directly modifies and extends the operating system,
you should understand what you are doing. Even though I tried to keep
things as easy as possible, some experience with a given piece of
software is an advantage. I am not going to describe what the X Window
System is, how to typeset documents with TeX and LaTeX, or how to
install a printer under Linux. Those issues are covered in other
documents.

For the same reason, in most cases I describe a system-wide setup,
which by default requires <em/root/ privileges. Still, where a
user-level setup is possible, I'll try to mention it.

<bf/NOTE:/ The X Window System, TeX and other Linux components are
complex systems with a sophisticated configuration. If you do
something wrong, you may not only fail to set up Russian support, but
also break the component, if not the entire system. This is not meant
to scare you off, but merely to make you understand the seriousness of
the process and be careful. Backing up the configuration files
beforehand is <bf/highly/ recommended. Having a guru around is also
advantageous.

<sect1>Availability and feedback
<p>

This document is available at <htmlurl
url="http://sunsite.unc.edu/LDP" name="sunsite.unc.edu"> or
<htmlurl url="ftp://tsx-11.mit.edu/pub/linux" name="tsx-11.mit.edu">
as a part of the <em/Linux Documentation Project/. It may also be
available at various FTP sites carrying Linux, and it may be included
as a part of Linux distributions.

If you have any suggestions or corrections regarding this document,
please don't hesitate to contact me at <htmlurl
url="mailto:abel@bfr.co.il" name="abel@bfr.co.il">. Any new
and useful information about Cyrillic support in various Unices is
<em/highly appreciated/. Remember, it will help others.

<sect1>Acknowledgments and copyrights
<p>

Many people helped me (and not only me) with valuable information and
suggestions. Even more people contributed software to the public
community. I am sorry if I forgot to mention somebody.

So, here they are:

<itemize>

<item>Bas V. de Bakker
<item>David Daves
<item>Serge Vakulenko
<item>Sergei O. Naoumov
<item>Winfried Truemper
<item>Ilya K. Orehov
<item>Michael Van Canneyt
<item>Alex Bogdanov
<item>...and the countless helpful people from the
<htmlurl url="news:relcom.fido.ru.unix" name="relcom.fido.ru.unix">
and <htmlurl url="news:relcom.fido.ru.linux" name="relcom.fido.ru.linux">
Usenet newsgroups.

</itemize>

This document is Copyright (C) 1995,1997 by Alexander L. Belikoff. It
may be used and distributed under the usual Linux HOWTO terms
described below.

The following is a Linux HOWTO copyright notice:

<quote>
<it>Unless otherwise stated, Linux HOWTO documents are copyrighted by their
respective authors. Linux HOWTO documents may be reproduced and distributed
in whole or in part, in any medium physical or electronic, as long as
this copyright notice is retained on all copies. Commercial redistribution
is allowed and encouraged; however, the author would like to be notified of
any such distributions.</it>
</quote>

<quote>
<it>All translations, derivative works, or aggregate works incorporating
any Linux HOWTO documents must be covered under this copyright notice.
That is, you may not produce a derivative work from a HOWTO and impose
additional restrictions on its distribution. Exceptions to these rules
may be granted under certain conditions; please contact the Linux HOWTO
coordinator at the address given below.</it>
</quote>

<quote>
<it>In short, we wish to promote dissemination of this information through as
many channels as possible. However, we do wish to retain copyright on the
HOWTO documents, and would like to be notified of any plans to redistribute
the HOWTOs.</it>
</quote>

If you have questions, please contact Tim Bynum, the Linux HOWTO
coordinator, at <htmlurl url="mailto:linux-howto@sunsite.unc.edu"
name="linux-howto@sunsite.unc.edu">. You may finger this address for a
phone number and additional contact information.

Unix is a technology trademark of X/Open Ltd.; MS-DOS, Windows,
Windows 95, and Windows NT are trademarks of Microsoft Corp.; The
X Window System is a trademark of The X Consortium Inc. Other
trademarks belong to the appropriate holders.

<sect>Theoretical background
<p>

<sect1>Characters and codesets
<p>

In order to process and print characters of various languages, the
system and software must be able to distinguish them from other
characters. That is, each unique character must have a unique
representation inside the operating system or the particular software
package. Such a collection of all unique characters that the system is
able to represent at once is called a <em/codeset/.

When most operating systems were created, nobody cared about software
being multilingual. Therefore, the most popular codeset was (and
actually still is) <em/ASCII/ (American Standard Code for Information
Interchange).

The <em/standard ASCII/ (aka 7-bit ASCII) comprises 128 unique
codes. ASCII defines some of them as real printable characters, and
others as so-called <em/control characters/, which had special
meanings in old communication protocols. Each element of the set is
identified by an integer <em/character code/ (0-127). The subset of
printable characters represents those found on a typewriter keyboard,
with some minor additions. Each character occupies the 7 least
significant bits of a byte, whereas the most significant bit was used
for control purposes (for example, for transmission control, such as
parity checking, in old communication packages).

The 7-bit ASCII concept was extended by 8-bit ASCII (aka <em/extended
ASCII/). In this codeset, the character codes range from 0 to 255. The
lower half (0-127) is pure ASCII, whereas the upper half (128-255)
contains 128 more characters. Since this codeset is backward
compatible with ASCII (a character still occupies 8 bits, and the
lower codes correspond to the old ASCII ones), it gained wide
popularity.

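To see these codes for yourself, you can dump the bytes of a string
with the standard <tt/od/ utility. The following is only a minimal
sketch; the function name <tt/char_codes/ is made up for this example.
Codes 0-127 are plain ASCII, while anything from 128 to 255 belongs to
the upper half, whose meaning depends on the codeset in use.

<verb>
#!/bin/sh
# char_codes: print the decimal code of every byte of the first argument.
# Hypothetical helper; od prints the codes, xargs joins them on one line.
char_codes() {
    printf '%s' "$1" | od -An -tu1 | xargs
}

char_codes 'Az'                    # prints: 65 122
char_codes "$(printf 'a\341')"     # prints: 97 225 ('a', then 'a' with bit 8 set)
</verb>
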
The 8-bit ASCII doesn't define the contents of the upper half of the
codeset. Therefore, ISO took the responsibility of defining a family
of standards known as the <em/ISO 8859-X/ family. It is a collection
of 8-bit codesets, where the lower half of each codeset (characters
with codes 0-127) matches ASCII and the upper half defines characters
for various languages. For example, the following codesets are
defined:

<itemize>

<item><tt/8859-1/ - Europe, Latin America (also known as <em/Latin 1/)

<item><tt/8859-2/ - Eastern Europe

<item><tt/8859-5/ - Cyrillic

<item><tt/8859-8/ - Hebrew

</itemize>

In Latin 1, the upper half of the table defines various characters
which are not part of the English alphabet, but are present in
various European languages (German umlauts, French accents, etc.).

Another popular extended ASCII implementation is the so-called
<em/IBM codepage/ (named after a certain computer company that
developed this codeset for its infamous personal computers). This one
contains pseudographic characters in the upper half.

Software that doesn't make any assumptions about the eighth bit of
ASCII data is called <em/8-bit clean/. Some older programs, designed
with 7-bit ASCII in mind, are not 8-bit clean and may work incorrectly
with your extended ASCII data. Most packages, however, are able to
deal with extended ASCII by default, or require some very basic
setup. <bf/NOTE:/ before posting the question <em>"I did all the
setup right, but I cannot enter/view Cyrillic characters!"</em>,
please consult section <ref id="shells"> for notes on the program you
are using.

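A quick way to find out whether a file contains any 8-bit data at all
(and therefore whether 8-bit cleanliness matters for it) is to count
the bytes that have the eighth bit set. The sketch below, with the
hypothetical name <tt/count_8bit/, deletes every 7-bit byte with
<tt/tr/ and counts what is left; setting <tt/LC_ALL=C/ makes <tt/tr/
treat the input as raw bytes.

<verb>
#!/bin/sh
# count_8bit [FILE...]: number of bytes with the eighth bit set in the
# given files (or in standard input).  Hypothetical helper.
count_8bit() {
    cat "$@" | LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' '
}

printf 'abc\301\302\n' | count_8bit    # prints: 2
</verb>

For a real file, run something like <tt/count_8bit letter.txt/; a
result of 0 means the file is pure 7-bit ASCII.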
For information about making your software 8-bit clean, see section
<ref id="locale-programming">.

Since on most systems a character occupies 8 bits, ASCII cannot be
extended any further. Instead, new symbols are implemented in
ASCII-based codesets by creating other extended ASCII variants. This
is how the Cyrillic ASCII codesets are implemented.

We already mentioned the <em/ISO 8859-5/ standard as the one defining
the Cyrillic codeset. But, as often happens with standards, this one
was developed without taking into account the actual practice in the
former USSR. Therefore, the one thing this standard really achieved
was another degree of confusion. I wouldn't say that <em/ISO 8859-5/
is widely used anywhere.

Other standards for Cyrillic include the so-called <em/Alt/ codeset
and the <em/Microsoft CP1251/ codepage. The former was developed (by
whom, I don't know) for MS-DOS quite a while ago. Back then, there was
not much buzz yet about internetworking, so the intention was to make
it as compatible as possible with the IBM standard. Therefore, the Alt
codeset is effectively the same IBM codepage, with all the specific
European characters in the upper half replaced by Cyrillic ones and
the pseudographic characters left in place. As a result, it didn't
break the text windowing facilities and provided Cyrillic characters
as well. The <em/Alt/ standard is still alive and extremely popular
in MS-DOS.

The <em/Microsoft CP1251 codepage/ is Microsoft's attempt to come up
with a new standard for the Cyrillic codeset in Windows. As far as I
know, it is not compatible with anything else (not very surprising,
huh?).

And finally there is <em/KOI8-R/. This one is also quite old, but it
was designed wisely, and nowadays its design decisions look really
useful.

Again, it is compatible with ASCII, and the Cyrillic characters are
located in the upper half. But the main design point of <em/KOI8-R/ is
that the positions of the Cyrillic characters correspond to the
English characters with the same phonetics. Namely, if we set the
eighth bit of the English character <tt/'a'/, we get the Cyrillic
<tt/'a'/ (with the case swapped: KOI8-R places the lowercase Cyrillic
letters at the codes of the uppercase Latin letters plus 128, and vice
versa). This means that, given Cyrillic text written in KOI8-R, we can
strip the eighth bit of each character <em/and still get readable
text, although written with English characters!/ This is very
important now, since there are many mailers on the Internet that
silently strip the eighth bit, being sure that every single soul on
the face of the Earth speaks English.

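This property is easy to try out. The sketch below (<tt/strip8/ is a
made-up name) clears the top bit of every byte, just as a 7-bit mail
relay would, and feeds it the KOI8-R encoding of the Russian word
<em/privet/ (hello), written as octal escapes so that the example does
not depend on your terminal setup. Note that the case flips: the
lowercase Cyrillic letters come out as uppercase Latin ones.

<verb>
#!/bin/sh
# strip8: clear the eighth bit of every byte on standard input,
# imitating a mailer that only passes 7-bit data.  Hypothetical helper.
strip8() {
    LC_ALL=C tr '\200-\377' '\000-\177'
}

# "privet" in KOI8-R: the bytes 320 322 311 327 305 324 (octal)
printf '\320\322\311\327\305\324\n' | strip8     # prints: PRIWET
</verb>
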
Not surprisingly, <em/KOI8-R/ quickly became the de facto standard for
Cyrillic on the Internet. <htmlurl url="http://www.nagual.ru/~ache"
name="Andrew A. Chernov"> did a tremendous amount of work to
standardize it. He is the author of <htmlurl
url="file://ds.internic.net/rfc/rfc1489.txt" name="RFC 1489">
(<em/"Registration of a Cyrillic Character Set"/).

KOI8-R and the Alt codeset differ only in the positions of the
Cyrillic characters in the table (that is, in the Cyrillic character
codes).

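Since KOI8-R and the Alt codeset differ only in where the Cyrillic
letters sit, converting text between them is a plain byte-for-byte
remapping. On current systems one way to do it is the <tt/iconv/
utility. The following sketch assumes an <tt/iconv/ that knows the
charset names <tt/KOI8-R/ and <tt/CP866/ (IBM's code page 866, which
as far as I know is essentially the Alt codeset; check <tt/iconv -l/
on your system); the function names are made up.

<verb>
#!/bin/sh
# Convert between the Alt (MS-DOS) codeset and KOI8-R.  Hypothetical
# helpers; assumes iconv accepts the charset names CP866 and KOI8-R.
alt2koi() { iconv -f CP866 -t KOI8-R "$@"; }
koi2alt() { iconv -f KOI8-R -t CP866 "$@"; }

# Cyrillic small letter a is code 240 (octal) in Alt and 301 in KOI8-R
printf '\240' | alt2koi | od -An -to1     # shows the single byte 301
</verb>

For example, <tt/alt2koi dosfile.txt/ would print the KOI8-R version
of a file written under MS-DOS on standard output.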
The principal difference is that the Alt codeset is used by MS-DOS
users only, whereas KOI8-R is used in Unix as well as in MS-DOS
(though in the latter KOI8-R is much less popular). Since we are doing
the right thing (namely, working in the Unix operating system), we
shall focus mostly on KOI8-R.

As for the ISO standard, it is more popular as a standard for Cyrillic
in Europe and the US. The leader in Russia is definitely KOI8-R.

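To see the confusion in numbers, the following sketch prints the code
of a single letter, the Cyrillic small <em/a/, in each of the four
codesets mentioned above. It takes the letter in KOI8-R, assumes an
<tt/iconv/ that knows all the listed charset names, and the function
name <tt/show_codes/ is hypothetical. The codes it should report are
c1 for KOI8-R, a0 for CP866 (Alt), e0 for CP1251 and d0 for ISO
8859-5.

<verb>
#!/bin/sh
# show_codes: print the hexadecimal code of one KOI8-R character in
# several Cyrillic codesets.  Hypothetical helper; assumes iconv knows
# all the charset names listed below.
show_codes() {
    for cs in KOI8-R CP866 CP1251 ISO-8859-5; do
        code=$(printf '%s' "$1" | iconv -f KOI8-R -t "$cs" | od -An -tx1 | tr -d ' ')
        printf '%-11s %s\n' "$cs" "$code"
    done
}

show_codes "$(printf '\301')"     # Cyrillic small letter a, given in KOI8-R
</verb>
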
There are other standards, which are not 8-bit ASCII extensions and
are much more flexible; <em/Unicode/ is the best known. However, they
are not implemented as well as the basic ones in Unix in general and
Linux in particular. Therefore, I am not describing them here.

<sect>Preparing your environment
<p>

Before we start customizing various parts of the system, we have to
set up a couple of basic things. Most of the tools described below
assume that Cyrillic fonts are available and that the user is able to
input Cyrillic characters. To make this true, we have to configure the
environment to provide both fonts and an input facility for Cyrillic.

There are effectively two interface models supported by Linux. One is
the text mode, and the other one is the graphic mode provided by the
X Window System. Each requires a different setup, described below.

<sect1>Text mode setup
<p>

Generally, the text mode setup is the easiest way to show and input
Cyrillic characters. There is one significant complication, however:
text mode font and keyboard layout manipulations depend on the
terminal driver implementation. Therefore, there is no portable way to
achieve the goal across different systems.

Here I describe how to deal with the Linux console driver. Thus, if
you have another system, don't expect it to work for you. Instead,
consult your terminal driver manual. Nevertheless, please send me any
information you find, so I can include it in further versions of this
document.

<sect2>Linux Console<label id="linux-console">
<p>

The Linux console driver is quite a flexible piece of software. It is
capable of changing fonts as well as keyboard layouts. To take
advantage of this, you'll need the <htmlurl
url="http://sunsite.unc.edu/pub/Linux/system/Keyboards/" name="kbd">
package. Both RedHat and Slackware install kbd as part of the system.

The kbd package contains keyboard control utilities as well as a big
collection of fonts and keyboard layouts.

Cyrillic setup with <bf/kbd/ usually involves two things:

<enum>
<item>Screen font setup. This is performed by the
<tt/setfont/ program. The font files are located in
<tt>/usr/lib/kbd/consolefonts</tt>.

<bf/NOTE:/ Never run the <tt/setfont/ program under X, because it will
hang your system. This is because it uses low-level video card calls
that conflict with the X server.

<item>Keyboard layout setup. Load the appropriate keyboard layout with
the <tt/loadkeys/ program.

</enum>

NOTE: In RedHat 3.0.3, <tt>/usr/bin/loadkeys</tt> has too restrictive
access permissions, namely 700 (<tt/rwx------/). There is no reason
for that, since anyone may compile their own copy and execute it (the
appropriate system calls are not root-only). Thus, just ask your
system administrator to set more reasonable permissions for it (for
example, 755).

The following is an excerpt from my <tt/cyrload/ script, which sets
up the Cyrillic mode for the Linux console:

<verb>
#!/bin/sh
# Refuse to run under X: setfont would hang the system (see above)
if [ -n "$DISPLAY" ]; then
    echo "`basename $0`: cannot run under X" >&2
    exit 1
fi

loadkeys /usr/lib/kbd/keytables/ru.map        # Russian keyboard layout
setfont /usr/lib/kbd/consolefonts/Cyr_a8x16    # font in the Alt codeset
mapscrn /usr/lib/kbd/consoletrans/koi2alt      # KOI8-R to Alt screen map
printf '\033(K'                                # the magic sequence
echo "Use the right Ctrl key to switch the mode..."
</verb>

Let me explain it a bit. First you load the appropriate keyboard
mapping. Then you load a font corresponding to the <em/Alt/
codeset. Then, in order to display text in <em/KOI8-R/ correctly, you
load a <it/screen translation table/. It translates <em/some/
characters from the upper half of the codeset to the <em/Alt/
encoding. The word 'some' is crucial here: not all characters get
translated, so some of them, like the IBM pseudographic characters,
go to the screen unmodified and display correctly, since they are
compatible with the <em/Alt/ codeset, as opposed to <em/KOI8-R/. To
see this, run <bf/mc/ and pretend you are back in MS-DOS 3.3...

Finally, the magic sequence is important. I stole/borrowed/learned it
from the German HOWTO back in 1994, when it was about the only
national language oriented HOWTO. According to the <bf/mapscrn(8)/
manual page, the console driver starts using the table loaded by
<tt/mapscrn/ only after the escape sequence <tt/ESC ( K/ selects it
for the G0 character set, and that is exactly what the magic sequence
does.

For those purists who don't want to give the <em/Alt/ codeset a
chance, here is yet another version of the script above, using native
<em/KOI8-R/ fonts:

<verb>
#!/bin/sh
# Variant using a native KOI8-R console font: no screen map is needed
if [ -n "$DISPLAY" ]; then
    echo "`basename $0`: cannot run under X" >&2
    exit 1
fi

loadkeys /usr/lib/kbd/keytables/ru.map
setfont /usr/lib/kbd/consolefonts/koi-8x16
echo "Use the right Ctrl key to switch the mode..."
</verb>

However, don't expect nice borders in your text mode-based windowing
applications.

Now you probably want to test it. Do the appropriate bash or tcsh
setup (see section <ref id="shells">), restart the shell, then press
the right <tt/Control/ key and make sure you are getting the Cyrillic
characters right. The '<tt/q/' key must produce the Russian
"<tt/short i/" character, '<tt/w/' generates "<tt/ts/", etc.

If you've screwed something up, the very best thing to do is to reset
to the original (that is, US) settings. Execute the following
commands:

<verb>
loadkeys /usr/lib/kbd/keytables/defkeymap.map
setfont /usr/lib/kbd/consolefonts/default8x16
</verb>

<bf/NOTE:/ unfortunately, the console driver is not able to preserve
its state (at least not easily) while the X Window System is running.
Therefore, after you leave X (or switch from it to a console), you
have to reload the Russian console font.

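Since the Cyrillic console setup is lost whenever X has been running,
it may be convenient to reload it automatically at login, but only on
a virtual console. The following is a sketch for a Bourne-style login
profile such as <tt>~/.profile</tt>; it assumes the <tt/cyrload/
script shown above is installed somewhere in your <tt/PATH/, and the
helper name <tt/is_linux_vt/ is made up. Linux virtual consoles are
named <tt>/dev/tty1</tt>, <tt>/dev/tty2</tt> and so on, while xterm
windows and network logins get pseudo-terminals with other names.

<verb>
#!/bin/sh
# is_linux_vt [TTY]: succeed if TTY (default: the current terminal) is a
# Linux virtual console such as /dev/tty1.  Hypothetical helper.
is_linux_vt() {
    case "${1:-$(tty)}" in
        /dev/tty[0-9]*) return 0 ;;
        *)              return 1 ;;
    esac
}

# Reload the Cyrillic console setup on virtual consoles only
# (assumes the cyrload script is in your PATH).
if is_linux_vt; then
    cyrload
fi
</verb>
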
<sect2>FreeBSD Console
<p>

I do not use FreeBSD, so I couldn't test the following information.
All data in this section should be treated as just pointers to begin
with. <htmlurl url="http://www.freebsd.org" name="The FreeBSD project
homepage"> may have some information on the subject. Another good
source is the <htmlurl url="news:relcom.fido.ru.unix"
name="relcom.fido.ru.unix"> newsgroup. Also, check the resources
listed in section <ref id="resources">.

Anyway, this is what <htmlurl url="mailto:elias@artx.ru" name="Ilya
K. Orehov"> suggests doing in order to make the FreeBSD console speak
Russian:

<enum>

<item>In <tt>/etc/sysconfig</tt> add:

<verb>
keymap=ru.koi8-r
keyrate=fast
# NOTE: '^[' below is a single control character
keychange="61 ^[[K"
cursor=destructive
scrnmap=koi8-r2cp866
font8x16=cp866b-8x16
font8x14=cp866-8x14
font8x8=cp866-8x8
</verb>

<item>In <tt>/etc/csh.login</tt>:

<verb>
setenv ENABLE_STARTUP_LOCALE
setenv LANG ru_SU.KOI8-R
setenv LESSCHARSET latin1
</verb>

<item>Make analogous changes in <tt>/etc/profile</tt>

</enum>

<sect1>The X Window System
<p>

Like the console mode, the X environment also requires some setup.
This involves setting up the input mode and the X fonts. Both are
discussed below.

<sect2>The X fonts<label id="xfonts">
<p>

First of all, you have to obtain fonts having the Cyrillic glyphs at
the appropriate positions.

If you are using the most recent X (or XFree86) distribution, chances
are that you already have such fonts. In late 1995, the X Window
System incorporated a set of Cyrillic fonts created by <htmlurl
url="http://www.cronyx.ru" name="Cronyx">. Ask your system
administrator or, if <em/you/ are the one, check your system, namely:

<enum>
<item>Run '<tt/xlsfonts | grep koi8/'. If any fonts are listed, your
X server is already aware of the fonts.

<item>Otherwise, run

<verb>
find / -name 'crox*.pcf*' -print
</verb>

to find the location of the Cyrillic fonts in the system. You'll have
to make those fonts known to the X server, as I explain below.

</enum>

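The two checks above can be combined into one small function. It is
only a sketch: the name <tt/check_cyr_fonts/ is hypothetical, and the
default search root <tt>/usr</tt> is an assumption you may want to
change.

<verb>
#!/bin/sh
# check_cyr_fonts [DIR]: report whether the X server already knows KOI8-R
# fonts; if not, list Cronyx font files found under DIR (default /usr).
# Hypothetical helper.
check_cyr_fonts() {
    if xlsfonts 2>/dev/null | grep -qi koi8; then
        echo "the X server already knows KOI8-R fonts"
    else
        echo "no KOI8-R fonts in the font path; candidate files:"
        find "${1:-/usr}" -name 'crox*.pcf*' -print 2>/dev/null
    fi
}
</verb>

Run it as <tt/check_cyr_fonts/ from an xterm; any files it lists still
have to be made known to the X server as described below.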
If you haven't found such fonts installed, you'll have to install them
yourself.

There is some ambiguity with the fonts. The XFree86 docs claim that the
Russian font collection included in the distribution was developed by
Cronyx. Nevertheless, you may find another set of Cronyx Cyrillic
fonts on the net (e.g. on <htmlurl
url="ftp://ftp.kiae.su/cyrillic/x11/fonts/xrus-2.1.1-src.tgz"
name="ftp.kiae.su">), known as the <bf/xrus/ package (don't confuse it
with the <tt/xrus/ program, which is used to set up a Cyrillic
keyboard layout; the latter was recently renamed to <bf/xruskb/).
<bf/Xrus/ has fewer fonts than the collection in XFree86 (38 vs 68),
but the latter didn't get along with my <ref
id="netscape" name="Netscape"> setup: it gave me a really huge font in
the menu bar. The <bf/xrus/ package doesn't have this problem.

I suggest downloading and trying both of them, then picking the one
you like more. Also, I'm going to create RPM packages for both
collections soon and upload them to <htmlurl
url="ftp://ftp.redhat.com/pub/contrib/i386/" name="ftp.redhat.com">.

There are also older collections, for example the <bf/vakufonts/
package created by <htmlurl url="mailto:vak@cronyx.ru" name="Serge
Vakulenko">, which was the base for the one in the X distribution.
There are a number of others as well. The important point is that the
font names in the old collections did not strictly conform to the
standard. This is usually harmless, but sometimes it may cause various
weird errors. For example, I had a bad experience with Maple V for
Linux, which crashed mysteriously with the <bf/vakufonts/ package, but
ran smoothly with the "standard" ones.

So, let's start with the fonts:

<enum>
<item>Download the appropriate font collection. The package for
XFree86 may be found at any FTP site containing the X distribution,
for example, directly at the <htmlurl url="http://www.xfree86.org"
name="XFree86 FTP site">. The <bf/xrus/ package may be found on
<htmlurl url="ftp://ftp.kiae.su/cyrillic/x11/fonts/xrus-2.1.1-src.tgz"
name="ftp.kiae.su">.

<item>Now that you have the fonts, create a directory for them. It is
generally a bad idea to put new fonts into an already existing font
directory. So, place them into, say,
<tt>/usr/lib/X11/fonts/cyrillic</tt> for a system-wide setup, or just
create a private directory for personal use.

<item>If the new fonts are in BDF format (<tt/*.bdf/ files), you have
to compile them. For each font do:

<verb>
bdftopcf -o <font>.pcf <font>.bdf
</verb>

If your server supports compressed fonts, compress them using the
<em/compress/ program:

<verb>
compress *.pcf
</verb>

Also, if you do want to put the new fonts into an already existing
font directory, you have to concatenate the old and the new files
named <tt/fonts.alias/ in case both of them exist.

<item>Each X font directory must contain a list of the fonts in it.
This list is stored in the file <tt/fonts.dir/. You don't have to
create this list manually. Instead, do:

<verb>
cd <new font directory>
mkfontdir .
</verb>

<item>Now you have to make this font directory known to the X
server. Here, you have a number of options:

<itemize>
<item>System-wide setup for XFree86. If you are running this version
of X, append the new directory to the list of font directories (the
<tt/FontPath/ entries in the <tt/Files/ section) in the file
<tt/XF86Config/. To find the location of this file, see the output of
<tt/startx/. Also, see <bf>XF86Config(4/5)</bf> for details.

<item>System-wide setup through <tt/xinit/. Add the new directory to
the <tt/xinit/ startup file. See <bf/xinit(1x)/ and the next option
for details.

<item>Personal setup. You have a special start-up file for X,
<tt>~/.xinitrc</tt> (or <tt>~/.Xclients</tt>, or <tt>~/.xsession</tt>
for RedHat users). Add the following commands to it:

<verb>
xset +fp <new font directory>
xset fp rehash
</verb>
</itemize>

It is important to note that '<tt/+fp/' means that the new fonts will
be added to the head of the font path list. That is, if an application
requests, say, a <tt/fixed/ font, it'll be given the one with Cyrillic
characters, which is definitely what we are trying to achieve.

There are problems, though. The <tt/fixed/ font in the Cyrillic fonts
distribution doesn't have its bold and italic counterparts. My font of
choice is <tt/6x13/, so, since it also lacks bold and italic
typefaces, I cannot use Emacs/XEmacs faces in their full glory.
Hopefully somebody will eventually create those fonts and the
situation will change.

<item>Now restart X. If you have done everything right, the tests at
the beginning of this section will succeed. Also, play with
<bf/xfontsel(1x)/ to make sure you are able to select the Cyrillic
fonts.
</enum>

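The steps above can be collected into one shell function. This is not
part of any package, just a sketch under a few assumptions: the fonts
are BDF sources already copied into the given directory, the optional
<tt/compress/ step is skipped, and every command is run through a
<tt/RUN/ prefix (a made-up convention) so that you can preview the
commands with <tt/RUN=echo/ before running them for real. Note that
the <tt/xset/ calls only affect the currently running X server, so for
a permanent setup you still need one of the options listed in step 5.

<verb>
#!/bin/sh
# install_cyr_fonts DIR: compile the BDF fonts in DIR, index them and add
# DIR to the font path of the running X server.  Hypothetical helper;
# set RUN=echo to print the commands instead of executing them.
install_cyr_fonts() {
    dir=$1
    for bdf in "$dir"/*.bdf; do
        [ -e "$bdf" ] || continue           # no BDF files at all
        $RUN bdftopcf -o "${bdf%.bdf}.pcf" "$bdf"
    done
    $RUN mkfontdir "$dir"                   # (re)creates DIR/fonts.dir
    $RUN xset +fp "$dir"                    # prepend DIR to the font path
    $RUN xset fp rehash
}

# Dry run for a system-wide setup:
RUN=echo install_cyr_fonts /usr/lib/X11/fonts/cyrillic
</verb>
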
In order to make X clients use the Cyrillic fonts, you have to set up
the appropriate X resources. For example, I make the Russian font the
default one in my <tt>~/.Xdefaults</tt>:

<verb>
*font: 6x13
</verb>

Since my Cyrillic fonts are first in the font path (see the output of
'<tt/xset q/'), the font above is taken from the "cyrillic" directory.

This is just a simple case. If you want to set a particular part of an
X client to a Cyrillic font, you have to figure out the name of the
resource (e.g., using <bf/editres(1x)/) and specify it either in the
resource database or on the command line. Here are some examples:

<verb>
$ xterm -font '-cronyx-*-bold-*-*-*-19-*-*-*-*-*-*-*'
</verb>

...will run xterm with some ugly font; and

<verb>
$ xfontsel -xrm '*quitButton.font: -*-times-*-*-*-*-13-*-*-*-*-*-koi8-*'
</verb>

...will set a Cyrillic Times font for the <bf/Quit/ button in
<tt/xfontsel/.

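The long names used above are X Logical Font Descriptions (XLFD): 14
fields, each introduced by a dash, where the second field is the
family, the third the weight, the seventh the pixel size and the last
two the character set registry and encoding. To avoid counting dashes
by hand, a tiny helper like the following may be handy; it is a
hypothetical convenience function, not part of X.

<verb>
#!/bin/sh
# koi8_font [FAMILY [WEIGHT [PIXELSIZE]]]: print an XLFD pattern matching
# KOI8-R fonts; omitted fields default to the wildcard '*'.
koi8_font() {
    printf '%s\n' "-*-${1:-*}-${2:-*}-*-*-*-${3:-*}-*-*-*-*-*-koi8-*"
}

koi8_font times '*' 13   # prints: -*-times-*-*-*-*-13-*-*-*-*-*-koi8-*
</verb>

You can pass its output to <tt/xlsfonts -fn/ or to the <tt/-font/
option of <tt/xterm/.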
<sect2>The input translation
<p>

In the newest X releases (X11R6.1 and higher) there are two "standard"
input methods: the original one, working through the <bf/xmodmap/
utility, and the new one called <em/Xkb/ (X KeyBoard). The very first
thing you have to do is <bf/to disable the Xkb method!/ Don't get
charmed by its ability to set up a "Russian keyboard". It looks like
this method uses the Cyrillic keysyms defined in
<tt/keysymdef.h/. This file defines keysyms for many languages. The
only problem is that those definitions have nothing to do with the
extended ASCII codesets, the only ones most programs are able to
operate with! I hardly know any programs able to grok
<tt/keysymdef.h/ keysyms other than the 8-bit ASCII ones. And our goal
is to get KOI8-R support working.

To disable <tt/Xkb/ support, browse through the <tt/Keyboard/
section of your <tt/XF86Config/ file and comment out all lines
starting with <em/Xkb/ (case doesn't matter). Instead, add the
following line:

<verb>
XkbDisable
</verb>

The <tt/xmodmap/ program allows customization of the codes emitted by
various keys and their combinations. It sets things up based on a file
containing the translation table.

In previous versions of this document I used to describe the
<tt/xmodmap/-based setup in great detail. This proved to be almost
useless. The <tt/xmodmap/-based input translation method is well known
for being non-portable, inflexible, and incomplete. Your configuration
may work with one XFree86 version and fail with a different one. Even
worse, sometimes things differ across different servers in the same
distribution.

I strongly suggest not playing with <tt/xmodmap/, at least for now.
Apart from headache and disappointment, you'll gain nothing. Instead,
I recommend installing the <htmlurl
url="ftp://ftp.relcom.ru/pub/x11/cyrillic/" name="xruskb"> package,
which allows you to configure most of the input translation parameters
without having to know about <tt/xmodmap/. Again, RedHat Linux users
are free to download and install an <htmlurl
url="ftp://ftp.redhat.com/pub/contrib/i386/xruskb-1.5.1-1.i386.rpm"
name="RPM"> package.

<sect1>First steps - Cyrillic in shells<label id="shells">
<p>

Once the fonts and the input facility are set up, most shells and
text-mode tools need a little extra configuration to accept and
display 8-bit characters. The settings for the popular ones are
described below.

<sect1>bash
<p>

Three variables should be set in order to make <tt/bash/ (more
precisely, the readline library it uses for line editing) understand
8-bit characters. The best place for them is the <tt>~/.inputrc</tt>
file:

<verb>
set meta-flag on
set convert-meta off
set output-meta on
</verb>

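If you set up several accounts, adding these lines by hand gets
tedious. The sketch below, with the hypothetical name
<tt/add_inputrc_8bit/, appends each setting to an inputrc file only if
it is not already there, so it is safe to run more than once. The
settings take effect in new <tt/bash/ sessions, or after re-reading
the file with <tt/C-x C-r/ (the readline <tt/re-read-init-file/
command).

<verb>
#!/bin/sh
# add_inputrc_8bit [FILE]: append the three 8-bit readline settings to FILE
# (default ~/.inputrc), skipping any that are already present.
add_inputrc_8bit() {
    rc=${1:-$HOME/.inputrc}
    for setting in 'set meta-flag on' 'set convert-meta off' 'set output-meta on'
    do
        grep -qxF "$setting" "$rc" 2>/dev/null || printf '%s\n' "$setting" >> "$rc"
    done
}
</verb>
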
<sect1>csh/tcsh<label id="csh">
<p>

The following should be set in <tt/.cshrc/:

<verb>
setenv LC_CTYPE iso_8859_5
stty pass8
</verb>

If your <tt/stty/ doesn't support <tt/pass8/ (the GNU <tt/stty/ used
on Linux does), replace the last call with the following:

<verb>
stty -istrip cs8
</verb>

<sect1>ksh
<p>

With the public domain <tt/ksh/ implementation, <tt/pdksh 5.1.3/, you
can input 8-bit characters only in <tt/vi/ input mode. Use:

<verb>
set -o vi
</verb>

<sect1>less
<p>

So far, <tt/less/ doesn't support the KOI8-R character set, but setting
the following environment variable will do the job (in <tt/csh/, use
<tt/setenv LESSCHARSET latin1/ instead):

<verb>
LESSCHARSET=latin1; export LESSCHARSET
</verb>

<sect1>mc (The Midnight Commander)
<p>

To display Cyrillic text correctly, select the <em/full 8 bits/ item
in the <bf>Options/Display</bf> menu.

If your problem is ugly window borders, consult the <ref
id="linux-console"> section.

Slightly off-topic: if you want to make <bf/mc/ use color in an
<tt/xterm/ window, set the <tt/COLORTERM/ variable:

<verb>
COLORTERM= ; export COLORTERM
</verb>

<sect1>rlogin
<p>

Make sure that the shell on the destination host is properly set
up. Then, if your <tt/rlogin/ doesn't work by default, use '<tt/rlogin
-8/'.

<sect1>zsh
<p>

Use the same settings as with <tt/csh/ (see section <ref id="csh"
name="csh">), but in Bourne shell syntax, since <tt/zsh/ has no
<tt/setenv/ command: for example, <tt/export LC_CTYPE=iso_8859_5/.
The startup files in this case are <tt/.zshrc/ or
<tt>/etc/zshrc</tt>.

<sect>Editing text
|
|
<p>
|
|
|
|
In this section I'll describe how to customize various text editors to
work with Cyrillic text. This doesn't cover <em/word processors/,
which are described later (see section <ref id="word-processors">).
|
|
|
|
|
|
<sect1>Emacs and XEmacs<label id="emacs">
|
|
<p>
|
|
|
|
There are two versions of the Emacs editor - <bf/GNU Emacs/ and
<bf/XEmacs/. While they provide more or less the same functionality,
some implementation details are significantly different. Cyrillic setup
requires some low-level (in the Emacs Lisp sense) tweaking, and it
differs a bit between the two versions.
|
|
|
|
<bf/NOTE:/ Apart from the setup described here, there is an
alternative way to configure both versions of Emacs: use <bf/MULE/
(MULtilingual Enhancement to GNU Emacs). That way is fairly complicated
and (to the best of my knowledge) rarely used, so I don't discuss it
here.
|
|
|
|
Minimal Cyrillic support in <bf/GNU Emacs/ (you don't have to do this
for <bf/XEmacs/) is enabled by adding the following calls to your
<tt/.emacs/ (provided that Cyrillic character set support is installed
for the console or X, respectively):
|
|
|
|
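<verb>
;; Display characters 160-255 as themselves instead of octal escapes
(standard-display-european t)

;; Keep the current INTERRUPT and FLOW settings and set META to 0,
;; which makes Emacs accept 8-bit input characters as they are
(set-input-mode (car (current-input-mode))
                (nth 1 (current-input-mode))
                0)
</verb>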
|
|
|
|
This allows the user to view and input documents in Russian.
|
|
|
|
However, this isn't enough. Emacs doesn't yet know that Cyrillic
characters may constitute a word, let alone the upper/lower case
conversion rules. To teach Emacs these, you have to modify its syntax
and case tables:
|
|
|
|
<verb>
(require 'case-table)

;; ruc: KOI8-R upper-case letters; rlc: the matching lower-case
;; letters in the same order (octal character codes)
(let* ((ruc "\341\342\367\347\344\345\263\366\372\351\352\353\354\355\356\357\360\362\363\364\365\346\350\343\376\373\375\370\371\377\374\340\361")
       (rlc "\301\302\327\307\304\305\243\326\332\311\312\313\314\315\316\317\320\322\323\324\325\306\310\303\336\333\335\330\331\337\334\300\321")
       (i 0)
       (len (length ruc)))
  (while (< i len)
    ;; Make both letters word constituents...
    (modify-syntax-entry (elt ruc i) "w ")
    (modify-syntax-entry (elt rlc i) "w ")
    ;; ...and register them as an upper/lower case pair
    (set-case-syntax-pair (elt ruc i) (elt rlc i) (standard-case-table))
    (setq i (+ i 1))))
</verb>
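If you want to double-check the octal tables above, or adapt them, Python can decode them with its standard <tt/koi8_r/ codec. Python bytes literals accept the same three-digit octal escapes as Emacs Lisp strings. This sketch only verifies the tables; it is not part of <tt/rusup.el/.

```python
# The two strings from the Emacs Lisp snippet above, copied verbatim.
RUC = b"\341\342\367\347\344\345\263\366\372\351\352\353\354\355\356\357\360\362\363\364\365\346\350\343\376\373\375\370\371\377\374\340\361"
RLC = b"\301\302\327\307\304\305\243\326\332\311\312\313\314\315\316\317\320\322\323\324\325\306\310\303\336\333\335\330\331\337\334\300\321"

def case_pairs(upper: bytes, lower: bytes) -> list:
    """Decode both KOI8-R tables and pair them up position by position."""
    if len(upper) != len(lower):
        raise ValueError("tables differ in length")
    return list(zip(upper.decode("koi8_r"), lower.decode("koi8_r")))

def bad_pairs(pairs: list) -> list:
    """Return pairs that are not a proper upper-/lower-case couple."""
    return [(u, l) for u, l in pairs if not (u.isupper() and u.lower() == l)]

pairs = case_pairs(RUC, RLC)
print(len(pairs))        # 33
print(bad_pairs(pairs))  # []
print({u for u, _ in pairs} == set("АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"))  # True
```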
|
|
|
|
For this purpose I created a <tt/rusup.el/ file, which does this and
also provides a couple of handy functions. Load it from your
<tt>~/.emacs</tt>.
|
|
|
|
Finally, the <url url="http://www.math.uga.edu/~valery/russian.el"
name="russian.el"> package by Valery Alexeev
(<tt/valery@math.uga.edu/) allows the user to switch between Cyrillic
and regular input modes and to translate the contents of a buffer from
one Cyrillic coding standard to another (which is especially useful
when reading texts imported from MS-DOS or Windows).
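Outside Emacs, a similar conversion can be sketched in Python with its standard codecs: <tt/cp866/ for MS-DOS text, <tt/cp1251/ for Windows text, and <tt/koi8_r/. The guessing rule is my own illustrative heuristic, not what <bf/russian.el/ does. It relies on the fact that correctly decoded Russian prose is mostly lower-case, while a wrong guess turns most lower-case letters into capitals or pseudo-graphics. Very short or mostly upper-case texts can fool it.

```python
CANDIDATES = ("koi8_r", "cp1251", "cp866")

def lowercase_cyrillic(text: str) -> int:
    """Count lower-case Russian letters in a decoded string."""
    return sum(1 for ch in text if "а" <= ch <= "я" or ch == "ё")

def guess_encoding(data: bytes) -> str:
    """Pick the candidate codec that yields the most lower-case Russian letters."""
    return max(CANDIDATES,
               key=lambda enc: lowercase_cyrillic(data.decode(enc, errors="replace")))

def to_koi8(data: bytes) -> bytes:
    """Re-encode a DOS, Windows or KOI8-R text as KOI8-R."""
    enc = guess_encoding(data)
    return data.decode(enc, errors="replace").encode("koi8_r", errors="replace")

sample = "съешь же ещё этих мягких французских булок, да выпей чаю"
for enc in CANDIDATES:
    raw = sample.encode(enc)
    print(enc, guess_encoding(raw), to_koi8(raw) == sample.encode("koi8_r"))
```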
|
|
|
|
|
|
<sect1>Using vi
|
|
<p>
|
|
|
|
The <bf/vi/ editor (at least its clone <bf/vim/, available in most
Linux distributions) is aware of 8-bit characters. It will allow you
to enter Cyrillic characters and will recognize word boundaries
correctly. I don't know about the upper-/lower-case conversion rules,
since I don't use <bf/vi/ much. <em/If you know something about it,
please inform me/.
|
|
|
|
|
|
<sect1>Editing text with joe
|
|
<p>
|
|
|
|
<bf/Joe/ requires a special <tt/-asis/ option to recognize 8-bit
characters. You may either specify this option on the command line or
put it in the <tt>~/.joerc</tt> file (for personal use) or in
<tt>/usr/lib/joerc</tt> (for a system-wide setup).
|
|
|
|
If your <bf/joe/ doesn't understand the <tt/-asis/ option, you have to
upgrade to a newer version.
|
|
|
|
However, <bf/joe/ doesn't seem to recognize Cyrillic word boundaries
correctly. I assume that the same applies to the case conversion
rules.
|
|
|
|
|
|
<sect1>Spell-checking Russian
|
|
<p>
|
|
|
|
The program I use to spell-check text is <bf/GNU ispell/. It is very
flexible and extensible, so it can spell-check text in languages other
than English once new <em/spell dictionaries/ are added.
|
|
|
|
Constantine Knizhnik has created a very good Russian dictionary for
<bf/ispell/. You may find it at his <htmlurl
url="http://www.ispras.ru/~knizhnik" name="homepage">. The
distribution includes a handy incremental spelling script for
<bf/Emacs/.
|
|
|
|
Ideally, if you already have <bf/ispell/ properly installed, you just
have to change into the newly created directory and generate the
dictionary using the commands provided in the <tt/Makefile/. However,
chances are quite high that you'll see a lot of complaints about
<bf/ispell/'s unawareness of 8-bit data. This is because in most
distributions <bf/ispell/ is compiled without 8-bit data support. In
this case, you cannot avoid recompiling the <bf/ispell/ package.
|
|
|
|
Again, RedHat users will be delighted to know that I've rebuilt the
|
|
<bf/ispell/ package with both Russian and German dictionaries. As
|
|
usual, you may grab it from the <htmlurl
|
|
url="ftp://ftp.redhat.com/pub/contrib/i386/ispell-3.1.20-6.i386.rpm"
|
|
name="RedHat FTP site">.
|
|
|
|
Once you have everything installed, you may invoke the Russian
spell-checker by supplying the <tt/'-d russian'/ option to <bf/ispell/.
|
|
|
|
Now, if you use <bf/Emacs/, you may want to add a menu item for the
Russian dictionary. I sent a proposed menu entry to the <tt/ispell.el/
maintainer and he kindly agreed to include it in the next public
release of the file. Meanwhile, you may add it yourself by putting the
following code in your <tt>~/.emacs</tt> (or in
<tt>/usr/share/emacs/site-lisp/site-start.el</tt> for a system-wide
setup):
|
|
|
|
<verb>
|
|
(setq ispell-dictionary-alist
|
|
(append ispell-dictionary-alist
|
|
'(("russian"
|
|
"[\341\342\367\347\344\345\263\366\372\351\352\353\354\355\356\357\360\362\363\364\365\346\350\343\376\373\375\370\371\377\374\340\361\301\302\327\307\304\305\243\326\332\311\312\313\314\315\316\317\320\322\323\324\325\306\310\303\336\333\335\330\331\337\334\300\321]"
|
|
"[^\341\342\367\347\344\345\263\366\372\351\352\353\354\355\356\357\360\362\363\364\365\346\350\343\376\373\375\370\371\377\374\340\361\301\302\327\307\304\305\243\326\332\311\312\313\314\315\316\317\320\322\323\324\325\306\310\303\336\333\335\330\331\337\334\300\321]"
|
|
"[']" t ("-C" "-d" "russian") "~latin1"))))
|
|
|
|
(define-key-after ispell-menu-map [ispell-select-russian]
|
|
'("Select Russian (KOI-8)" . (lambda ()
|
|
(interactive)
|
|
(ispell-change-dictionary "russian")))
|
|
'british)
|
|
</verb>
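Typing such octal strings by hand is error-prone. The Python sketch below generates them from readable Cyrillic text; the function name <tt/lisp_octal/ is hypothetical. It writes every KOI8-R byte as a three-digit octal escape, which Emacs Lisp strings understand. The letters come out in alphabetical order, which differs slightly from the strings above, but the order inside a bracket expression does not matter.

```python
ALPHABET = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"

def lisp_octal(text: str, encoding: str = "koi8_r") -> str:
    """Encode `text` and write every byte as a three-digit octal escape."""
    return "".join("\\%03o" % b for b in text.encode(encoding))

upper = lisp_octal(ALPHABET)
lower = lisp_octal(ALPHABET.lower())
casechars = "[" + upper + lower + "]"        # CASECHARS for ispell-dictionary-alist
not_casechars = "[^" + upper + lower + "]"   # NOT-CASECHARS

print(lisp_octal("АБВ"))   # \341\342\367
print(casechars)
```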
|
|
|
|
Unfortunately, this doesn't work with <bf/XEmacs/. I'll try to solve
this problem later.
|
|
|
|
|
|
<sect>Using Cyrillic with mail and news
|
|
<p>
|
|
|
|
Setting up your mail and news software to recognize Cyrillic text is
not very difficult, although you need some knowledge of the principles
mail and news work by.
|
|
|
|
Internet electronic mail software generally consists of two parts:
<bf/MUA/ (Mail User Agent) and <bf/MTA/ (Mail Transfer Agent). The MUA
is the program you use to read, compose, and send mail. However, the
MUA doesn't transfer mail messages by itself. Instead, it calls the MTA,
which is responsible for delivering the message in the appropriate
direction using an appropriate protocol. For example, your MUA may be
<bf/Pine/ and your MTA <bf/qmail/.
|
|
|
|
Until quite recently, neither MTAs nor MUAs were 8-bit clean by
default. Therefore, whenever you sent a message from, say, America to
Russia, you could never be sure that some intermediate MTA wouldn't
strip the 8th bit from each character of your message. To solve this,
a set of standards was developed that allows encoding various kinds of
data using only printable 7-bit ASCII characters. This family of
standards is called <bf/MIME/ (Multipurpose Internet Mail Extensions).
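To see what MIME does to Cyrillic text, here is a sketch using Python's standard <tt/email/ package, which is not one of the mail programs discussed in this HOWTO. For the <tt/koi8-r/ charset, Python picks the <em/base64/ transfer encoding by default. The message on the wire therefore consists of 7-bit ASCII only and survives an MTA that strips the 8th bit. The receiving MUA reverses the encoding.

```python
from email.header import Header
from email.mime.text import MIMEText

text = "Привет из Москвы!"                     # "Greetings from Moscow!"
msg = MIMEText(text, "plain", "koi8-r")        # body in KOI8-R, base64-encoded
msg["Subject"] = Header("Привет", "koi8-r")    # RFC 2047 encoded-word header

wire = msg.as_string()
print(msg["Content-Type"])                 # text/plain; charset="koi8-r"
print(msg["Content-Transfer-Encoding"])    # base64
print(wire.isascii())                      # True: safe for 7-bit MTAs

# The receiving side undoes the transfer encoding, then the charset.
body = msg.get_payload(decode=True).decode(msg.get_content_charset())
print(body == text)                        # True
```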
|
|
|
|
Since MIME is usually pre-configured to reasonable defaults, we won't
describe it here. We will talk more about MIME when we discuss
backward compatibility with other Cyrillic encodings (section <ref
id="mime">).
|
|
|
|
Meanwhile, we start with the MUA setup, because it is usually up to the
end user. Then we will describe the basic principles of MTA
configuration for Cyrillic.
|
|
|
|
|
|
<sect1>Setting up Mail User Agents
|
|
<p>
|
|
|
|
|
|
<sect2>Emacs-based mail readers
|
|
<p>
|
|
|
|
Basically, you don't need any special setup for Emacs-based readers,
provided that you have already configured Emacs itself (see section
<ref id="emacs">).
|
|
|
|
|
|
<sect2>pine
|
|
<p>
|
|
|
|
Set the following directive in <tt>~/.pinerc</tt> for personal
|
|
configuration, or in <tt>/usr/lib/pine.conf</tt> for a global one:
|
|
|
|
<verb>
|
|
character-set=ISO-8859-5
|
|
</verb>
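The <tt/character-set/ value matters because <bf/pine/ uses it to label outgoing messages, and the recipient's mailer interprets the bytes according to that label. The Python illustration below uses the standard <tt/koi8_r/ and <tt/iso8859_5/ codecs. It shows that the same Russian word has different bytes in the two encodings. A label that doesn't match the encoding you actually type in therefore produces garbage on the other end.

```python
word = "Привет"
koi8 = word.encode("koi8_r")
iso = word.encode("iso8859_5")

print(koi8.hex())                        # f0d2c9d7c5d4
print(koi8 == iso)                       # False: different byte values
# A mislabeled message: KOI8-R bytes declared as ISO-8859-5.
print(koi8.decode("iso8859_5") == word)  # False
print(iso.decode("iso8859_5") == word)   # True: label matches the bytes
```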
|
|
|
|
|
|
<sect1>Configuring your MTA
|
|
<p>
|
|
|
|
There are a number of MTAs available now. These include <bf/sendmail/,
|
|
<bf/qmail/, <bf/smail/, <bf/exim/, and others.
|
|
|
|
|
|
<sect2>sendmail
|
|
<p>
|
|
|
|
So far, <bf/sendmail/ is much more popular than other MTAs, because of
its long history and widespread use. Personally, I hate this program
- it is a perfect example of a completely moronic design, and even its
"improvements" over the passage of time show that this approach is
not going to change. Any system administrator shudders on hearing
the ominous "<tt/sendmail.cf/" name...
|
|
|
|
Modern versions of <bf/sendmail/ no longer strip the 8th bit. However,
<bf/sendmail/ may <em/encode/ 8-bit data using a MIME transfer encoding
such as <em/base64/. Although most MUAs are supposed to recognize it and
decode it back into regular data, you may want to start by sending raw
8-bit text to make sure everything works.
|
|
|
|
As of version 8, <bf/sendmail/ handles 8-bit data correctly by
default. If it doesn't do so for you, check the <tt/EightBitMode/
option and the <tt/7/ mailer flag (which strips output to 7 bits) in
your <tt>/etc/sendmail.cf</tt>. See the <em/"Sendmail Installation and
Operation Guide"/ for details.
|
|
|
|
|
|
<sect2>Other MTAs
|
|
<p>
|
|
|
|
I don't know much about other MTAs. If you know something that may be
important for Cyrillic setup, please inform me.
|
|
|
|
|
|
<sect>Browsing the Cyrillic Web
|
|
<p>
|
|
|
|
Unlike e-mail and news, there is no definitive standard Cyrillic
encoding for the Web. This is primarily because Microsoft offers Web
authoring tools that only allow the <em/cp1251/ codeset for Cyrillic,
completely ignoring the fact that other standards already exist.
|
|
|
|
The setup described here is very basic. It will allow you to view
|
|
pages in the <em/KOI8-R/ codeset. If the situation improves, I'll add
|
|
more information.
|
|
|
|
|
|
<sect1>lynx
|
|
<p>
|
|
|
|
Starting with version 2.6, you may select the appropriate encoding in
the <tt/display Character set/ option.
|
|
|
|
|
|
<sect1>Netscape navigator<label id="netscape">
|
|
<p>
|
|
|
|
Make sure you are using a <tt/Netscape/ version higher than 3. If your
<tt/Netscape/ is older, download a newer one from <htmlurl
url="http://www.netscape.com" name="www.netscape.com">.
|
|
|
|
|
|
<sect2>Basic setup
|
|
<p>
|
|
|
|
To be able to see Cyrillic text in most parts of the HTML document, do
|
|
the following:
|
|
|
|
<itemize>
|
|
<item>In menu <bf>Options/Document Encoding</bf> select
|
|
<bf/Cyrillic(KOI-8)/.
|
|
|
|
<item>In menu <bf>Options/General Preferences/Fonts</bf> select
|
|
<bf/Cyrillic (KOI-8)/ encoding, <bf/Times(Cronyx)/ as a proportional
|
|
font and <bf/Courier(Cronyx)/ as a fixed one.
|
|
|
|
<item>Save the options.
|
|
</itemize>
|
|
|
|
<bf/NOTE:/ This setup will work with most parts of the
document. However, you won't be able to display Cyrillic text in the
window header, menus and some controls. Attempts to fix this follow.
|
|
|
|
|
|
<sect2>Cyrillic text in frames and input areas
|
|
<p>
|
|
|
|
To fix this, it is usually enough to:
|
|
|
|
<enum>
|
|
|
|
<item>Copy the Netscape properties database (usually <tt/Netscape.ad/)
|
|
to <tt>~/Netscape</tt>.
|
|
|
|
<item>In the copied file, set the following property:
|
|
|
|
<verb>
|
|
*documentFonts.charset*iso8859-1: koi8-r
|
|
</verb>
|
|
|
|
</enum>
|
|
|
|
This will force all frame and input elements to use fonts with the
<em/koi8-r/ encoding instead of the default ones. Therefore, you have
to make sure that such fonts are installed (see section <ref
id="xfonts">).
|
|
|
|
The bad news about the trick above is that if you load a document
|
|
which is supposed to be displayed in <tt/iso-8859-1/ fonts, it will be
|
|
displayed using the <tt/koi8/ fonts instead. Sometimes such documents
|
|
will look worse.
|
|
|
|
|
|
<sect2>Advanced setup
|
|
<p>
|
|
|
|
Andrew A. Chernov knows more than anyone else about KOI-8 in general
and Netscape in particular. Visit his excellent <htmlurl
url="http://www.nagual.ru/~ache/koi8.html" name="KOI-8 page"> and
download a patch for the Netscape resource file that makes Netscape
speak Russian as much as it is able to.
|
|
|
|
|
|
<sect>Cyrillic word processing<label id="word-processors">
|
|
<p>
|
|
|
|
|
|
<sect1>TeX-based environments<label id="tex">
|
|
<p>
|
|
|
|
In this section I'll describe several ways to make TeX and LaTeX
typeset Cyrillic texts. They differ in setup sophistication and usage
convenience. For example, one possibility is to start without any
preliminary setup and use the <em/Washington AMSTeX Cyrillic fonts/. On
the other hand, you may install a LaTeX package providing very
thorough Cyrillic support. I have experience with two such packages.
One is the <tt/cmcyralt/ package by Vadim V. Zhytnikov
(<tt/vvzhy@phy.ncu.edu.tw/) and Alexander Harin
(<tt/harin@lourie.und.ac.za/), and the other one is the <tt/LH/
package by the <em/CyrTUG/ group with styles and hyphenation for
LaTeX2e by Sergei O. Naoumov (<tt/serge@astro.unc.edu/). I'll describe
both.
|
|
|
|
Note that there are two versions of LaTeX available: 2.09 is the old
one, while 2e is a new pre-3.0 release. If you are using LaTeX 2.09,
switch to 2e soon. The latter retains compatibility with the old one,
but has many more features. Hopefully, version 3 will be released
soon. I describe a LaTeX 2e setup.
|
|
|
|
Also, both of these packages require the Cyrillic text to be typeset
using the <em/Alt/ codeset, not <em/KOI8-R/! This is for historical
reasons: the creators of these packages used to work with <tt/EmTeX/,
the MS-DOG version of TeX (they didn't know about Linux yet :-).
Switching to <em/KOI8-R/ requires some effort and is expected to be
done soon. Meanwhile, use some utility to convert your Russian text
from <em/KOI8-R/ to <em/Alt/. See section <ref id="user-tools">.
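If no ready-made converter is at hand, a minimal Python sketch can do the job. It assumes that Python's standard <tt/cp866/ codec is an acceptable stand-in for the <em/Alt/ codeset. The Russian letters are in the same positions, but I haven't checked every pseudo-graphic symbol. Characters with no <tt/cp866/ equivalent are replaced with <tt/?/ and reported with their positions, so you can fix them in the source. Some of the mathematical symbols in the upper half of KOI8-R fall into this group.

```python
def koi8_to_alt(data: bytes):
    """Convert KOI8-R bytes to Alt (approximated by Python's cp866 codec).

    Returns (converted_bytes, problems), where `problems` lists
    (position, character) pairs that have no cp866 equivalent.
    """
    out = bytearray()
    problems = []
    for pos, ch in enumerate(data.decode("koi8_r")):
        try:
            out += ch.encode("cp866")
        except UnicodeEncodeError:
            problems.append((pos, ch))
            out += b"?"
    return bytes(out), problems

def convert_file(src: str, dst: str) -> list:
    """Convert a KOI8-R file (e.g. a LaTeX source) into Alt; return problems."""
    with open(src, "rb") as f:
        converted, problems = koi8_to_alt(f.read())
    with open(dst, "wb") as f:
        f.write(converted)
    return problems
```

For example, <tt>convert_file("paper-koi8.tex", "paper-alt.tex")</tt> (hypothetical file names) writes the Alt version and returns an empty list if everything converted cleanly.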
|
|
|
|
|
|
<sect2>Using the Washington Cyrillic
|
|
<p>
|
|
|
|
This package was created for the American Mathematical Society to
provide documents with Russian references. Since that was its only
purpose, the authors were not very careful and the fonts look quite
clumsy. This package is usually referred to as a
<tt/"really bad cyrillic package for TeX"/.
|
|
|
|
Nevertheless, we'll discuss it, because it is very easy to use and
doesn't require any setup - this collection is supplied with most
TeX distributions.
|
|
|
|
Of course, you won't be able to use such luxury as automatic
|
|
hyphenation, but anyway...
|
|
|
|
1. Prepend your document with the following directives:
|
|
|
|
<verb>
|
|
\input cyracc.def
|
|
\font\tencyr=wncyr10
|
|
\def\cyr{\tencyr\cyracc}
|
|
</verb>
|
|
|
|
2. Now, to type Cyrillic text, switch to the Cyrillic font with
|
|
|
|
<verb>
|
|
\cyr
|
|
</verb>
|
|
|
|
and then use the corresponding Latin letter or TeX command (typically
inside a group, such as <tt/{\cyr ...}/, to keep the font change
local). Thus, the lower-case Russian alphabet is expressed by the
following codes:

<verb>
a b v g d e \"e zh z i {\u i} k l m n o p r s t u f kh c ch sh shch
{\cdprime} y {\cprime} \`e yu ya
</verb>
|
|
|
|
It is extremely inconvenient to convert Russian texts to this encoding
by hand, but you can automate the process. The <bf/translit/ program
(section <ref id="user-tools">) supports a TeX output option.
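To illustrate what such a conversion involves, here is a minimal Python sketch built from the code table above. It follows the AMS convention, writing the hard sign as <tt/{\cdprime}/ and the soft sign as <tt/{\cprime}/. It is not the <bf/translit/ implementation. It handles lower-case letters only and passes everything else through unchanged. It also does not protect letter sequences that the font would presumably read as a single letter: for example, the codes <tt/y/ and <tt/u/ of two adjacent letters together read as <tt/yu/.

```python
# Lower-case Russian letters and their Washington Cyrillic (cyracc.def) codes.
WASHINGTON = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ё": '\\"e', "ж": "zh", "з": "z", "и": "i", "й": "{\\u i}",
    "к": "k", "л": "l", "м": "m", "н": "n", "о": "o", "п": "p",
    "р": "r", "с": "s", "т": "t", "у": "u", "ф": "f", "х": "kh",
    "ц": "c", "ч": "ch", "ш": "sh", "щ": "shch", "ъ": "{\\cdprime}",
    "ы": "y", "ь": "{\\cprime}", "э": "\\`e", "ю": "yu", "я": "ya",
}

def to_washington(text: str) -> str:
    """Transliterate lower-case Russian letters; leave other characters as-is."""
    return "".join(WASHINGTON.get(ch, ch) for ch in text)

def cyr_group(text: str) -> str:
    """Wrap the result in a TeX group that switches to the \\cyr font."""
    return "{\\cyr " + to_washington(text) + "}"

print(cyr_group("щи да каша"))   # {\cyr shchi da kasha}
```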
|
|
|
|
|
|
<sect2> KOI-8 package for teTeX
|
|
<p>
|
|
|
|
There is a new <htmlurl
url="ftp://xray.sai.msu.su/pub/outgoing/teTeX-rus/" name="teTeX-rus
package">. It is reported to support the KOI-8 character set and to
include all the basic things required for TeX and LaTeX. I haven't
tried it yet myself, although I have heard of its successful use.
|
|
|
|
<bf/NOTE:/ This package requires you to reconfigure and rebuild some
parts of your <bf/teTeX/ package (for example the precompiled LaTeX
macros). <bf>Unless you know what you are doing, don't try it without
due care. You may be better off borrowing the precompiled parts from
somebody on the net.</bf>
|
|
|
|
|
|
<sect2>Using the cmcyralt package for LaTeX
|
|
<p>
|
|
|
|
The <tt/cmcyralt/ package can be found on any CTAN (Comprehensive TeX
|
|
Archive Network) site like <tt/ftp.dante.de/. You should obtain two
|
|
pieces: the fonts collection from <tt>fonts/cmcyralt</tt> and the
|
|
styles and hyphenation rules from
|
|
<tt>macros/latex/contrib/others/cmcyralt</tt>.
|
|
|
|
<bf/Note:/ Make sure you have the <tt/Sauter/ package installed, since
<tt/cmcyralt/ requires some fonts from it. You can get this package
from a CTAN site as well.
|
|
|
|
Now you should do the following:
|
|
|
|
<enum>
|
|
<item>Put the new fonts into the TeX fonts tree. On my system (Slackware
2.2) I created a <tt/cmcyralt/ directory in
<tt>/usr/lib/texmf/fonts/cm/</tt>. Create the <tt/src/, <tt/tfm/, and
<tt/vf/ subdirectories in it and put the <tt/.mf/, <tt/.tfm/, and
<tt/.vf/ files there, respectively.
<item>Put the font definition files (<tt/*.fd/) from the styles archive
into the appropriate place (in my case it was
<tt>/usr/lib/texmf/tex/latex/fd</tt>).
<item>Put the style files (<tt/*.sty/) into the appropriate LaTeX styles
directory (in my case <tt>/usr/lib/texmf/tex/latex/sty</tt>).
|
|
</enum>
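The first step, sorting the font files by type into <tt/src/, <tt/tfm/ and <tt/vf/, can be scripted. The Python sketch below assumes the unpacked font archive is in a single directory. Both directories are parameters, and the function name <tt/install_fonts/ is made up for this example. It copies rather than moves, so the archive stays intact.

```python
import shutil
from pathlib import Path

# File extension -> subdirectory, as in the first step above.
LAYOUT = {".mf": "src", ".tfm": "tfm", ".vf": "vf"}

def install_fonts(unpacked_dir, target_dir):
    """Copy .mf/.tfm/.vf files into src/, tfm/ and vf/ under target_dir.

    Returns a sorted list of the relative paths that were installed.
    """
    target = Path(target_dir)
    installed = []
    for f in Path(unpacked_dir).iterdir():
        sub = LAYOUT.get(f.suffix.lower())
        if sub is None or not f.is_file():
            continue                      # skip README, .fd, .sty and the like
        dest = target / sub
        dest.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, dest / f.name)    # keep timestamps of the originals
        installed.append(f"{sub}/{f.name}")
    return sorted(installed)
```

On a system laid out like mine, you would call <tt>install_fonts("cmcyralt-fonts", "/usr/lib/texmf/fonts/cm/cmcyralt")</tt> as root, with the first argument naming wherever you unpacked the archive. If your TeX uses an <tt/ls-R/ filename database (teTeX does), update it afterwards, for example with <tt/texhash/.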
|
|
|
|
Now the hyphenation setup. This requires remaking the LaTeX format
file.
|
|
|
|
<enum>
|
|
<item>The file <tt/hyphen.cfg/ contains the directives for both
English and Russian hyphenation. Extract the one for Russian and put
it into the LaTeX hyphenation config file <tt/lthyphen.ltx/. In my case,
that file was in <tt>/usr/lib/texmf/tex/latex/latex-base</tt>.
|
|
|
|
<item>Put <tt/rhyphen.tex/ into the same directory. It is needed for
making the new format file; you can remove it afterwards.
|
|
|
|
<item>Run '<tt/make/' in that directory. Don't forget to make a link
from <tt/Makefile/ to <tt/Makefile.unx/ first. During the build, check
the output. There should be a message:
|
|
|
|
<verb>
|
|
Loading hyphenation patterns for Russian.
|
|
</verb>
|
|
|
|
If everything goes OK, you will get a new <tt/latex.fmt/ in that
directory. Put it where the previous one was
(like <tt>/usr/lib/texmf/ini/</tt>). <bf/Don't forget to save the
previous one!/
|
|
</enum>
|
|
|
|
This is it. The installation is complete. Try processing the examples
found in the styles archive. If you are able to create the PostScript
files without any problems, then everything is OK. Now, to use Cyrillic
in LaTeX, add the following directive to the preamble of your document:
|
|
|
|
<verb>
|
|
\usepackage{cmcyralt}
|
|
</verb>
|
|
|
|
For more details, see the <tt/README/ file in the <tt/cmcyralt/ styles
|
|
archive.
|
|
|
|
<bf/Note:/ if you do have problems with the examples although you have
installed everything correctly, then your TeX system itself probably
hasn't been installed correctly. For example, during my first try, every
attempt to create the <tt/.pk/ files for the Russian fonts failed
(at the <tt/MakeTeXPK/ stage). A lengthy investigation revealed an
implicit conflict between the <it/localfont/ and <it/ljfour/
<tt/METAFONT/ configurations. It used to work before, but kept
crashing after the <tt/cmcyralt/ installation. Contact your local TeX
guru - TeX is too (sometimes much too) complicated to reconfigure
without any prior knowledge.
|
|
|
|
|
|
<sect2>Using the CyrTUG package
|
|
<p>
|
|
|
|
You can obtain the CyrTUG package from the <htmlurl
|
|
url="ftp://sunsite.unc.edu/pub/academic/russian-studies/Software"
|
|
name="SunSite archive">. Get the files <tt/CyrTUGfonts.tar.gz/,
|
|
<tt/CyrTUGmacro.tar.gz/, and <tt/hyphen.tar.Z/.
|
|
|
|
The installation process doesn't differ much from the previous one.
|
|
|
|
|
|
<!--
|
|
|
|
<sect1>The ApplixWare suite
|
|
<p>
|
|
|
|
As far as I know, <bf/ApplixWare/ allows
|
|
|
|
-->
|
|
|
|
|
|
<sect1>The StarOffice suite
|
|
<p>
|
|
|
|
Youri Kovalenko (<htmlurl url="http://www.inp.nsk.su/~kovalenko">) has
compiled a concise summary of StarOffice russification. It is located
at <htmlurl
url="ftp://sky.inp.nsk.su/archives_src/linux/StarOffice/russification.txt">.
I never had a chance to try it, so I cannot say anything about its
correctness.
|
|
|
|
Another source of information on the subject was compiled by Eugene
Demidov (<htmlurl url="mailto:jack@gpi.ru">) and is located at
<htmlurl url="ftp://ftp.kapella.gpi.ru/pub/cyrillic/psfonts/README">.
|
|
|
|
|
|
<sect>Printing and PostScript
|
|
<p>
|
|
|
|
|
|
<sect1>Text to PostScript conversion
|
|
<p>
|
|
|
|
Sometimes you have just a plain-text KOI8-R file and simply want to get
it on paper. One of the easiest ways to achieve that is to use special
programs that convert text to PostScript.
|
|
|
|
There are a number of programs doing such conversion. I personally
prefer <htmlurl url="http://www-inf.enst.fr/~demaille/a2ps.html"
name="a2ps">. Originally developed as a simple text-to-PostScript
converter, it has become a big and highly configurable program with
many options that let you manage various page layouts, syntax
highlighting etc. Another tool (now available as a part of the
<em/GNU/ project) is <htmlurl url="ftp://prep.ai.mit.edu/pub/gnu"
name="enscript">.
|
|
|
|
|
|
<sect2>An a2ps converter
|
|
<p>
|
|
|
|
This text-to-PostScript converter has been around for a while and is
one of the most versatile printing tools. The author proved to be very
open to suggestions, so since release 4.9.8 <bf/a2ps/ supports
Cyrillic right off the shelf. All you need is a PostScript printer.
|
|
|
|
The command I use is:
|
|
|
|
<verb>
|
|
a2ps -X koi8r --print-anyway <file>
|
|
</verb>
|
|
|
|
|
|
<sect2>The GNU enscript
|
|
<p>
|
|
|
|
The GNU <bf/enscript/ program is also designed for converting text to
PostScript, and it also supports non-ASCII codesets. It doesn't come
with Cyrillic PostScript fonts, but it is very easy to add them, as
explained below (thanks to Michael Van Canneyt):
|
|
|
|
<enum>
|
|
<item>Install the newest <bf/enscript/. As of now, the most recent
|
|
release is 1.5. You may either get the one from the <htmlurl
|
|
url="ftp://prep.ai.mit.edu/pub/gnu" name="GNU FTP archive">, or take
|
|
an RPM package from the <htmlurl
|
|
url="ftp://ftp.redhat.com/pub/contrib/i386/" name="Redhat"> site.
|
|
|
|
<item>Now, if you are a lucky RedHat Linux user, download and install <url
|
|
url="ftp://ftp.redhat.com/pub/contrib/i386/enscript-fonts-koi8-1.0-1.i386.rpm"
|
|
name="Cyrillic Textbook font">.
|
|
|
|
<item>If you don't use RPM, download the file <tt/textbook.tar.gz/ from
the Cyrillic Software collection on <url
url="ftp://sunsite.unc.edu/pub/academic/russian-studies/Software/"
name="sunsite.unc.edu">. Extract it into the directory where the
<bf/enscript/ fonts are located (usually
<tt>/usr/share/enscript</tt>). Now change to that directory and run
the following command:
|
|
|
|
<verb>
|
|
mkafmmap *.afm
|
|
</verb>
|
|
|
|
<item>The setup is finished. Try to print some text in KOI8-R Cyrillic
|
|
with the following command:
|
|
|
|
<verb>
|
|
enscript --font=Textbook8 --encoding=koi8 some.file
|
|
</verb>
|
|
|
|
</enum>
|
|
|
|
If you want a really quick and dirty solution, don't care about the
output quality and just need Cyrillic on paper, try
the <htmlurl url="http://www.siber.com/sib/russify/converters/"
name="rtxt2ps"> package. It is a very simple no-frills
text-to-PostScript conversion program. The output quality is not very
good (or, to be honest, just <em/bad/), but it does its job.
|
|
|
|
|
|
<sect1>Text to TeX conversion
|
|
<p>
|
|
|
|
If all you need is to print a plain text without any additional
word processing, you may use a program that converts your Cyrillic
text into a ready-to-process TeX file. One of the best programs for
this purpose is <bf/translit/ (see section <ref
id="conversion">). In this case, you don't even have to bother about
installing the Cyrillic fonts for TeX, since <bf/translit/ uses the
<em/Washington Cyrillic/ package, which is included in most TeX
distributions (or am I wrong?).
|
|
|
|
|
|
<sect>Cyrillic in PostScript<label id="postscript">
|
|
<p>
|
|
|
|
Experts say PostScript is easy. I cannot judge - I've got too many
other things to learn to spare time for PostScript. So I'll draw on
my own sad experience with it. <bf/I'll appreciate any feedback from
you guys who know more on the subject than I do/ (approx. 99% of the
Earth's population).
|
|
|
|
Basically, in order to print Cyrillic text using PostScript, you
have to make sure of the following things:
|
|
|
|
<itemize>
|
|
<item>Cyrillic font is <em/loaded/ or included in the document.
|
|
|
|
<item>Cyrillic text is included in the document.
|
|
|
|
<item>Cyrillic text uses the appropriate character codes which
|
|
correspond to the font's requirements.
|
|
|
|
<item>An appropriate font is <em/selected/ in order to print Cyrillic
|
|
text.
|
|
</itemize>
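The checklist can be made concrete with a tiny PostScript program, and the Python sketch below writes one. It <em/selects/ a font by name and includes the text as KOI8-R bytes. The bytes are written as octal escapes inside a PostScript string, so the file itself stays 7-bit clean. The font name <tt/Cyrillic-KOI8/ is a placeholder: use the name of a KOI8-R encoded font that your printer or <bf/gs/ actually has. Loading the font is left to the printer, or to <bf/gs/ and its <tt/Fontmap/.

```python
def ps_string(text: str, encoding: str = "koi8_r") -> str:
    """Return a PostScript string literal holding `text` in `encoding`."""
    parts = []
    for b in text.encode(encoding):
        if b in b"()\\":
            parts.append("\\" + chr(b))      # escape delimiters and backslash
        elif 32 <= b < 127:
            parts.append(chr(b))             # printable ASCII as-is
        else:
            parts.append("\\%03o" % b)       # 8-bit bytes as octal escapes
    return "(" + "".join(parts) + ")"

def one_line_page(text: str, font: str = "Cyrillic-KOI8", size: int = 14) -> str:
    """Build a one-page PostScript program printing `text` near the top left."""
    return (
        "%!PS\n"
        f"/{font} findfont {size} scalefont setfont\n"   # select the font
        f"72 720 moveto {ps_string(text)} show\n"        # place and show the text
        "showpage\n"
    )

print(one_line_page("Привет (мир)"))
```

Saving the output to a file and viewing it with <bf/gs/ is a quick way to check whether a given font and encoding fit together.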
|
|
|
|
There is no solution general enough to be recommended as an ultimate
|
|
treatment. I'll try to outline various ways to cope with different
|
|
problems related to the subject.
|
|
|
|
One fairly general way to address Cyrillic setup problems is to use
<htmlurl url="http://www.cs.wisc.edu/~ghost/index.html"
name="Ghostscript">. <bf/Ghostscript/ (or just <bf/gs/ in the
newspeak) is a free (well, quasi-free) PostScript interpreter. It has
many advantages, among them:
|
|
|
|
<itemize>
|
|
|
|
<item>Ability to run on many platforms (various Unices, Windows etc)
|
|
|
|
<item>Support for a wide range of non-PostScript printers
|
|
|
|
<item>Good degree of configurability
|
|
|
|
</itemize>
|
|
|
|
What is important in our particular case is that once
<bf/Ghostscript/ is set up, we can do all printing through it, thus
eliminating extra setup for other output devices (for example an
<em/HP LaserJet IV/).
|
|
|
|
|
|
<sect1>Adding Cyrillic fonts to Ghostscript
|
|
<p>
|
|
|
|
This is important, since you probably don't want to make other
programs responsible for embedding Cyrillic fonts in their PostScript
output. Instead, you add the fonts to <bf/gs/ and just make the
programs generate Cyrillic output compatible with them.
|
|
|
|
To add a new font (in <tt/pfa/ or <tt/pfb/ form) to <bf/gs/, you have
to:
|
|
|
|
<enum>
|
|
|
|
<item>Put it in the <bf/gs/ fonts directory (e.g.
<tt>/usr/lib/ghostscript/fonts</tt>).
|
|
|
|
<item>Add the appropriate names and aliases for the font in the
|
|
<tt/Fontmap/ file in the <bf/gs/ directory.
|
|
|
|
</enum>
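Each <tt/Fontmap/ entry maps a PostScript font name to a file, as in <tt>/FontName (file.pfb) ;</tt>. An alias maps one name to another, as in <tt>/Alias /FontName ;</tt>. When you add many fonts, these lines can be generated. The Python sketch below reads the font name from the <tt>/FontName</tt> entry near the start of each <tt/.pfa/ or <tt/.pfb/ file and yields the matching lines. It assumes the entry lies within the first 8 KB, which is usual for Type 1 fonts but not guaranteed. Review the output before appending it to <tt/Fontmap/.

```python
import re
from pathlib import Path

# Type 1 fonts declare their name as e.g. "/FontName /Times-Roman def".
FONTNAME_RE = re.compile(rb"/FontName\s*/(\S+)\s+def")

def font_name(path) -> str:
    """Read the PostScript font name from the cleartext header of a Type 1 font."""
    with open(path, "rb") as f:
        head = f.read(8192)          # assumption: the name is in the first 8 KB
    m = FONTNAME_RE.search(head)
    if m is None:
        raise ValueError(f"no /FontName found in {path}")
    return m.group(1).decode("ascii")

def fontmap_lines(font_dir, aliases=None):
    """Yield Fontmap entries for all .pfa/.pfb files, then optional aliases."""
    for p in sorted(Path(font_dir).iterdir()):
        if p.suffix.lower() in (".pfa", ".pfb"):
            yield f"/{font_name(p)} ({p.name}) ;"
    for alias, target in (aliases or {}).items():
        yield f"/{alias} /{target} ;"
```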
|
|
|
|
Recently a decent set of Cyrillic fonts for <bf/Ghostscript/ appeared.
It is located at <htmlurl
url="ftp://ftp.kapella.gpi.ru/pub/cyrillic/psfonts"
name="ftp.kapella.gpi.ru">. This set even includes the lines you need
to add to the <tt/Fontmap/ file. Download the contents of the
<tt>/pub/cyrillic/psfonts</tt> directory; the <tt/README/ file
describes the necessary details.
|
|
|
|
|
|
<sect>Print setup
|
|
<p>
|
|
|
|
Printing is always tricky. There are different printers from different
vendors with different facilities. Even for native printing there is
no uniform solution (this applies not only to UNIX, but to other
operating systems as well).
|
|
|
|
Printers have different control languages, and often they have very
different views on foreign language support. The good news is that one
control language seems to be recognized as a de facto standard for
print job description: the PostScript language developed by
<htmlurl url="http://www.adobe.com" name="Adobe Corporation">.
|
|
|
|
Another problem is the variety of requirements for print services.
For example, sometimes you just want to print a piece of a C program
containing comments in Russian, so you don't need any pretty-printing,
just raw ASCII output in a single font. Another time, when you
design a postcard for your girlfriend, you'll probably need to typeset
a document with different fonts etc. This will definitely require
more effort to set up Cyrillic support.
|
|
|
|
To accomplish the former task, you just have to make your printer
understand <em/one/ Cyrillic font and (maybe) install a filter
program to generate data in the appropriate format. To accomplish the
latter, you have to teach your printer different fonts and use
special software.
|
|
|
|
There is also a middle ground: a program that knows how to generate
both the fonts and the appropriate printer input, so that you can,
say, pretty-print source code without a sophisticated word processing
system.
|
|
|
|
All these options will be more or less covered below.
|
|
|
|
|
|
<sect1>Pre-loading Cyrillic fonts into a non-PostScript printer
|
|
<p>
|
|
|
|
If you have a good old dot matrix printer and all you need is to print
|
|
a raw KOI8-R text, try the following:
|
|
|
|
<enum>
|
|
|
|
<item>Find a proper KOI8-R font for your printer. Check out the
|
|
MS-DOSish stuff on the Internet (for example the <url
|
|
url="ftp://ftp.simtel.net" name="SimTel archive">).
|
|
|
|
<item>Learn from the printer manual how to load such a font into your
printer and, probably, write a simple program that does it.
|
|
|
|
<item>Run this program from the appropriate <tt/rc/ file at boot
time.
|
|
|
|
</enum>
|
|
|
|
Thus, having Cyrillic characters in the upper part of the printer's
character set will allow you to print your texts in Russian without any
hassle.
|
|
|
|
As an alternative to <em/KOI8-R/ fonts, you may try to use an <em/Alt/
font. There are two reasons for that:
|
|
|
|
<itemize>
|
|
|
|
<item>It is probably much easier to find an <em/Alt/ font, since
those were very widespread in the MS-DOS culture.
|
|
|
|
<item>Having a proper <em/Alt/ font will allow you to print
|
|
pseudo-graphic characters as well.
|
|
|
|
</itemize>
|
|
|
|
However, in this case you'll have to convert your texts from
<em/KOI8-R/ to <em/Alt/ before sending them to a printer. This is quite
easy, since there are a lot of programs doing that (see <ref
id="user-tools" name="translit"> for example), so you just have to
call such a program properly in the <tt/if/ field of the
<tt>/etc/printcap</tt> file. For example, with the <bf/translit/
program you may specify:
|
|
|
|
<verb>
|
|
|
|
if=/usr/bin/translit -t koi8-alt.rus
|
|
|
|
</verb>
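Some <tt/lpd/ implementations don't accept arguments in the <tt/if/ field. Instead, they run the filter with their own options, such as page width and accounting file. In that case a stand-alone filter program is handy. Here is a minimal Python sketch of one: it ignores its arguments, reads KOI8-R from standard input and writes <em/Alt/ to standard output. It assumes Python's <tt/cp866/ codec is an acceptable stand-in for Alt. It replaces unconvertible characters with <tt/?/, so a print job is never rejected.

```python
import sys

def koi8_to_alt_stream(src, dst, chunk_size=65536):
    """Copy binary stream src to dst, converting KOI8-R to cp866 (Alt).

    Both encodings use one byte per character, so chunks can be
    converted independently without splitting a character.
    """
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk.decode("koi8_r").encode("cp866", errors="replace"))
    dst.flush()

def main():
    # lpd may pass options such as -w and -l; this filter ignores them.
    koi8_to_alt_stream(sys.stdin.buffer, sys.stdout.buffer)
    return 0
```

To use it, save it as an executable script, for example as a hypothetical <tt>/usr/local/bin/koi2alt</tt>. Add a <tt>#!/usr/bin/env python3</tt> first line and a final <tt/sys.exit(main())/ line, then point the <tt/if/ field at the script.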
|
|
|
|
See <bf/printcap(5)/ for details.
|
|
|
|
|
|
<sect1>Printing with different fonts
<p>

One great way to cope with different printers and fonts is to use
<bf/TeX/ (see section <ref id="tex">). TeX drivers handle all the details,
so once you make TeX understand Cyrillic fonts, you are done.

Another possibility is to use <em/PostScript/. I decided to devote an
entire chapter (<ref id="postscript">) to the subject, since it is not
simple.

Finally, there are other word processors that have their own printer
drivers. I have never tried anything apart from TeX, so I cannot
recommend anything.

<sect>Localization and Internationalization<label id="l-n-i">
|
|
<p>
|
|
|
|
So far, I described how to make various programs understand Cyrillic
|
|
text. Basically, each program required it's own method, very different
|
|
from the others. Moreover, some programs had incomplete support of
|
|
languages other than English. Not to mention their inability to
|
|
interact using user's mother tongue instead of English.
|
|
|
|
The problems outlined above are very pressing, since software is
|
|
rarely developed for home market only. Therefore, rewriting
|
|
substantial parts of software each time the new international market
|
|
is approached is very ineffective; and making each program implement
|
|
it's own proprietary solution for handling different languages is not
|
|
a great idea in a long term either.
|
|
|
|
Therefore, a need for standardization arises. And the standard shows
|
|
up.
|
|
|
|
Everything related to the problems above is divided by two basic
|
|
concepts: <em/localization/ and <em/internationalization/. By
|
|
localization we mean making programs able to handle different language
|
|
conventions for different countries. Let me give an example. The way
|
|
date is printed in the United States is MM/DD/YY. In Russia however,
|
|
the most popular format is DD.MM.YY. Another issues include
|
|
time representation, printing numbers and currency representation
|
|
format. Apart from it, one of the most important aspect of
|
|
localization is defining the appropriate character classes, that is,
|
|
defining which characters in the character set are language units
|
|
(letters) and how they are ordered. On the other hand, localization
|
|
doesn't deal with fonts.
|
|
|
|
Internationalization (or <em/i18n/ for brevity) is supposed to solve
the problems related to the ability of a program to interact with the
user in his native language.

Both of the concepts above had to be implemented in a standard, giving
programmers a consistent way of making their programs aware of national
environments.

Although the standard hasn't been finished yet, many parts of it
have been, so they can be used without much of a problem.

I am going to outline the general scheme of making programs use
the features above in a standard way. Since this deserves a separate
document, I'll just try to give a very basic description and pointers
to more thorough sources.

<sect1>Locale<label id="locale">
|
|
<p>
|
|
|
|
One of the main concept of the localization is a <em/locale/. By
|
|
locale is meant a set of conventions specific to a certain language in
|
|
a certain country. It is usually wrong to say that locale is just
|
|
country-specific. For example, in Canada two locales can be defined -
|
|
Canada/English language and Canada/French language. Moreover,
|
|
Canada/English is not equivalent to UK/English or US/English, just as
|
|
Canada/French is not equivalent to France/French or
|
|
Switzerland/French.
|
|
|
|
|
|
<sect2>How to use locale<label id="locale-use">
|
|
<p>
|
|
|
|
Each locale is a special database, defining at least the following
|
|
rules:
|
|
|
|
<enum>
|
|
|
|
<item>character classification and conversion
|
|
|
|
<item>monetary values representation
|
|
|
|
<item>number representation (ie. the decimal character)
|
|
|
|
<item>date/time formatting
|
|
|
|
</enum>
|
|
|
|
|
|
In RedHat 4.1, which I am using there are actually <it/two/ locale
|
|
databases: one for the C library (<tt/libc/) and one for the <tt/X/
|
|
libraries. In the ideal case there should be only one locale database
|
|
for everything.
|
|
|
|
To change your default locale, it is usually enough to set the
|
|
<tt/LANG/ environment variable. For example, in <bf/sh/:
|
|
|
|
<verb>
|
|
LANG=ru_RU
|
|
export LANG
|
|
</verb>
|
|
|
|
Sometimes, you may want to change only one aspect of the locale
|
|
without affecting the others. For example, you may decide (God knows
|
|
why) to stick with <tt/ru_RU/ locale, but print numbers according to
|
|
the standard POSIX one. For such cases, there is a set of environment
|
|
variables, which you can you to configure specific parts for the
|
|
current locale. In the last exaple it would be:
|
|
|
|
<verb>
|
|
LANG=ru_RU
|
|
LC_NUMERIC=POSIX
|
|
export LANG LC_NUMERIC
|
|
</verb>
|
|
|
|
For the full description of those variables, see <bf/locale(7)/.
|
|
|
|
Now let's be more Linux-specific. Unfortunately, Linux <tt/libc/
version 5.3.12, supplied with RedHat 4.1, doesn't have a Russian
locale. In this case one must be downloaded from the Internet (I don't
know the exact address, however).

To check which locales you have, run '<tt/locale -a/'. It will list all
locale databases available to libc.

Fortunately, the Linux community is rapidly moving to the new GNU libc
(<tt/glibc/ version 2), which is much more POSIX-compliant and has a
proper Russian locale. The next "stable" RedHat system will already use
<tt/glibc/.

As for the <tt/X/ libraries, they have their own locale database. In
the version I am using (<tt/XFree86 3.3/), there already is a Russian
locale database. I am not sure about the previous versions. In any
case, you may check it by looking into <tt>/usr/lib/X11/locale/</tt> (on
most systems). In my case, there already are subdirectories named
<tt/koi8-r/ and even <tt/iso8859-5/.

<sect2>Locale-aware programming<label id="locale-programming">
|
|
<p>
|
|
|
|
With locale, program don't have to implement explicitly various
|
|
character conversion and comparison rules, described above. Instead,
|
|
they use special API which make use of the rules defined by
|
|
locale. Also, it is not necessary for program to use the same locale
|
|
for all rules - it is possible to handle different rules using
|
|
different locales (although such technique should be strongly
|
|
discouraged).
|
|
|
|
From the <bf/setlocale(3)/ manual page:
|
|
|
|
<quote>
|
|
A program may be made portable to all locales by calling
|
|
<tt/setlocale(LC_ALL, "" )/ after program initialization, by
|
|
using the values returned from a <tt/localeconv()/ call for
|
|
locale - dependent information and by using <tt/strcoll()/ or
|
|
<tt/strxfrm()/ to compare strings.
|
|
</quote>
|
|
|
|
SunSoft, for example, defines 5 levels of program localization:

<enum>

<item><em/8-bit clean/ software. That is, the program calls
<tt/setlocale()/, it doesn't make any assumptions about the 8th bit of
each character, it uses functions from <tt/ctype.h/ and limits from
<tt/limits.h/, and it takes care of <tt>signed/unsigned</tt>
issues.

It is very important <em/not/ to make any assumptions about the nature
and ordering of the character set. The following programming practice
must be avoided:

<verb>
if (c >= 'A' && c <= 'Z') {
    ...
}
</verb>

Instead, the macros from the <tt/ctype.h/ header file, which are
locale-aware, should be used on all such occasions.

<item>Formats, sorting methods, paper sizes. The program uses
<tt/strcoll()/ and <tt/strxfrm()/ instead of <tt/strcmp()/ for
strings, it uses <tt/time()/, <tt/localtime()/, and <tt/strftime()/ for
time services, and finally, it uses <tt/localeconv()/ for proper
number and currency representation.

<item>Visible text in message catalogs. The program must isolate all
visible text in special <em/message catalogs/. These map strings in
English to their translations into other languages. Selecting the
messages in the language appropriate for a particular environment is done
in a way which is completely transparent for both the program and its
user. To make use of those facilities, the program must call
<tt/gettext()/ (Sun/POSIX standard), or <tt/catgets()/ (X/Open
standard). For more information on that see section <ref id="i18n">.

<item>EUC/Unicode support. At this level, the program doesn't use the
<tt/char/ type. Instead it uses <tt/wchar_t/, which defines entities
big enough to contain Unicode characters. ANSI C defines this data
type and an appropriate API.

</enum>

For a more detailed explanation of locale, see, for example, <ref
id="Voropay1"> or <ref id="SingleUnix">.

<sect1>Internationalization<label id="i18n">
|
|
<p>
|
|
|
|
While localization describes, how to adapt a program to a foreign
|
|
environment, <em/internationalization/ (or <em/i18n/ for brevity)
|
|
details the ways to make program communicate with a non-English
|
|
speaking user.
|
|
|
|
Before, that was done by developing some abstraction of the messages
|
|
to output from the program's code. Now, such mechanism is (more or
|
|
less) standardized. And, of course, there are free implementations of
|
|
it!
|
|
|
|
The GNU project has finally adopted a way of making
internationalized applications. Ulrich Drepper
(<tt/drepper@ipd.info.uni-karlsruhe.de/) developed the <tt/gettext/
package. This package is available at all GNU sites, such as <htmlurl
url="ftp://prep.ai.mit.edu/pub/gnu/" name="prep.ai.mit.edu">. It
allows you to develop programs in such a way that you can easily make
them support more languages. I don't intend to describe the
programming techniques, especially because the <tt/gettext/ package is
delivered with an excellent manual.

<bf/Request for collaboration:/ If you want to learn the <tt/gettext/
package and contribute to the GNU project at the same time, or even
if you just want to contribute, then you can do it! GNU goes
international, so all the utilities are being made locale-aware. The
problem is to translate the messages from English to Russian (and
other languages if you'd like). Basically, what one has to do is to
get the special <tt/.po/ file consisting of the English messages for a
certain utility and to append to each message its equivalent in
Russian. Ultimately, this will make the system speak Russian if the
user wants it to! For more details and further directions contact
Ulrich Drepper (<htmlurl
url="mailto:drepper@ipd.info.uni-karlsruhe.de"
name="drepper@ipd.info.uni-karlsruhe.de">).

<sect>Staying compatible
<p>

Being standard is not the only issue. To be really nice, one has to
provide backward compatibility. In our case, this means that the
configuration should be tolerant of data created using
non-standard character sets, that is, the <em/Alt (cp866)/ and
<em/cp1251/ ones. Also, we should be able to run Cyrillic programs for
MS-DOS.

In most cases (except for HTTP), it is enough to provide a timely
conversion of data to <em/KOI8-R/. For raw unstructured
data this is quite trivial: see section <ref id="user-tools"
name="Conversion Utilities">.

Structured data is another issue, and a trickier one. I'll
try to outline the basic roadmap for handling it.

<sect1>MIME-based data compatibility<label id="mime">
|
|
<p>
|
|
|
|
<em/MIME/ is a standard for architecture-independent data
|
|
representation. Originally developed for mail messages, it has now
|
|
many more applications. MIME defines format, which is open to
|
|
extensions and allows architecture-specific handling of data. For
|
|
example, if I receive a mail message, containing a <em/MIME object/ of
|
|
the <tt>video/mpeg</tt> type (an encoded MPEG file), my mail reader
|
|
will automatically decode it and start an MPEG player.
|
|
|
|
Most UNIX programs, offering MIME capabilities, are based on the
|
|
<bf/metamail/ package, which contains a set of utilities and data
|
|
files to work with MIME objects. Several configuration files
|
|
(<tt>/etc/mailcap</tt> for global usage and <tt>~/.mailcap</tt> for
|
|
personal setup) define rules for handling MIME object of various
|
|
types.
|
|
|
|
Thus, if you receive a proper MIME data stream, containing text in one
|
|
of the obsolete character sets, you may define a MIME rule to convert
|
|
such text to KOI8.
|
|
|
|
Below a number of MIME rules are shown, which are supposed to handle
|
|
plain text and richtext objects, using both of the obsolete codesets,
|
|
discussed above. You may incorporate these rules into one of the MIME
|
|
configuration files.
|
|
|
|
Note, that these rules use the <bf/translit/ package to perform the
|
|
actual conversion. For more information on that program and the
|
|
conversion in general see section <ref id="user-tools"
|
|
name="Conversion Utilities">.
|
|
|
|
<verb>
text/plain; translit -t cp1251-koi8.rus < %s; test=test \
"`echo %{charset} | tr '[A-Z]' '[a-z]'`" = cp1251; copiousoutput

text/richtext; translit -t cp1251-koi8.rus < %s; test=test \
"`echo %{charset} | tr '[A-Z]' '[a-z]'`" = cp1251; copiousoutput

text/plain; translit -t alt-koi8.rus < %s; test=test \
"`echo %{charset} | tr '[A-Z]' '[a-z]'`" = cp866; copiousoutput

text/richtext; translit -t alt-koi8.rus < %s; test=test \
"`echo %{charset} | tr '[A-Z]' '[a-z]'`" = cp866; copiousoutput

text/plain; translit -t alt-koi8.rus < %s; test=test \
"`echo %{charset} | tr '[A-Z]' '[a-z]'`" = alt; copiousoutput

text/richtext; translit -t alt-koi8.rus < %s; test=test \
"`echo %{charset} | tr '[A-Z]' '[a-z]'`" = alt; copiousoutput
</verb>

Obviously enough, these rules will work for plain text data only. Binary
files are supposed to handle the codeset issues themselves (or at least
their "parent" applications are). Therefore, if you receive a
Microsoft Word document in the <em/cp1251/ character set, the duty of
providing appropriate conversion capabilities lies with the application
you use to read that document (for example Microsoft Word, or Applix
Words).

Unfortunately, the real situation is not that ideal. Many applications
have their own idea of how to use MIME. Until recently, Microsoft Mail
software had a broken MIME engine. Also, the Netscape
Navigator/Communicator mail client is notorious for
sending mail messages encoded in <em/cp1251/ with the
<em/charset=koi8-r/ field in the message header, and vice versa.

<sect1>Explicit character set conversion<label id="conversion">
|
|
<p>
|
|
|
|
There are a lot of conversion routines for Cyrillic on the
|
|
Internet. Each of them has it's own quirks and it's own degree of
|
|
Cyrillic support.
|
|
|
|
In my opinion tools must be standard. In this particular case the
|
|
"standard" conversion tool is <bf/GNU recode/. Unfortunately, the
|
|
version, found on the official GNU site (3.4) doesn't support Cyrillic
|
|
yet (only <em/ISO-8859-5/). I developed a set of conversion tables for
|
|
<em/KOI8-R/, <em/Alt/, and <em/cp1251/ for <bf/recode/ and submitted
|
|
them to the <bf/recode/ maintainer. He promised to provide Cyrillic
|
|
support in the upcoming release. Once it happens, I'll rewrite this
|
|
section to recommend <bf/GNU recode/ as the standard conversion engine
|
|
for Cyrillic.
|
|
|
|
Meanwhile, I would recommend a <htmlurl
|
|
url="ftp://ftp.osc.edu/pub/russian/translit/translit.tar.Z"
|
|
name="translit"> package. It supports many popular codesets and is
|
|
even able to produce a *TeX files (see section <ref id="tex">) from
|
|
text in Russian. Also, RedHat users will enjoy an <htmlurl
|
|
url="ftp://ftp.redhat.com/pub/contrib/i386/translit-1.03-1.i386.rpm"
|
|
name="RPM package"> for translit.
|
|
|
|
For other conversion routines, look at <htmlurl
url="http://www.siber.com/sib/russify/" name="SovInformBureau"> or
<htmlurl url="ftp://ftp.funet.fi/pub/culture/russian/comp/converters/"
name="ftp.funet.fi">. You can even use the special mode for <tt/emacs/
(see section <ref id="emacs" name="Emacs">).

<sect1>Cyrillic in the DOS emulator<label id="dosemu">
|
|
<p>
|
|
|
|
This seems to be the only application, which may require <tt/Alt/
|
|
Cyrillic character set. The reason is that <tt/Alt/ is native to DOS
|
|
and most of DOS programs dealing with Cyrillic are <tt/Alt/-oriented.
|
|
|
|
For the console version (<tt/dos/) you just have to load a keyboard
|
|
and screen driver. Most of DOS drivers will work fine. I personally
|
|
use the <bf/rk/ driver by A. Strakhov, which works for both console
|
|
and X versions of <bf/dosemu/. Another choice is the <tt/r/ driver by
|
|
V. Kurland (sorry for possible misspelling). It is perfectly
|
|
customizable and supports many codesets, <tt/Alt/ and <tt/KOI8/ among
|
|
them. However it won't work for the X window (at least version 1.14
|
|
I'm using).
|
|
|
|
Both drivers can be found on most Russian Internet sites, for example
|
|
<url url="ftp://ftp.kiae.su/pub/cyrillic/msdos" name="Kurchatov
|
|
Institute FTP server">.
|
|
|
|
For the X version of <bf/dosemu/ you have to provide an appropriate
|
|
X font as well. Alex Bogdanov sent me such font by e-mail. It is an
|
|
original <tt/vga/ font from the <bf/dosemu/ distribution, modified for
|
|
the <tt/Alt/ codeset. Unfortunately I don't know who is the creator of
|
|
this font and where the official site is.
|
|
|
|
To setup the font for <tt/dosemu/ you should
|
|
|
|
<itemize>
|
|
<item>Introduce this font to the X. This is described in <ref
|
|
id="xfonts" name="X fonts setup">.
|
|
|
|
<item>Introduce this font to <tt/dosemu/. If the font just replaces the
|
|
original <tt/vga/ font, then it will be recognized by
|
|
default. Otherwise, you have to describe it in
|
|
<tt>/etc/dosemu.conf</tt>:
|
|
|
|
<verb>
|
|
# Font to use (without filename extensions). For example:
|
|
X { updatefreq 8 title "MS DOS" icon_name "xdos" font "vga-alt"}
|
|
</verb>
|
|
|
|
</itemize>
|
|
|
|
Finally, you have to load a keyboard driver. Note, the you don't need
|
|
a screen driver for the X window. Therefore, not all drivers will
|
|
work. At least two will: <tt/rk/ by A. Strakhov, and <tt/cyrkeyb/ by
|
|
Pete Kvitek.
|
|
|
|
|
|
<sect>Bibliography<label id="bibliography">
|
|
<p>
|
|
|
|
<enum>
|
|
|
|
<item>Andrey Chernov. <url url="http://www.nagual.ru/~ache/koi8.html"
|
|
name="KOI-8">. KOI-8 information and setup.<label id="Chernov1">
|
|
|
|
<item>Ulrich Drepper. <url
|
|
url="http://i44www.info.uni-karlsruhe.de/~drepper/conf96/paper.html"
|
|
name="Internationalization in the GNU project">. Very thorough
|
|
description of a GNU approach to i18n.
|
|
|
|
<item>Michael Karl Gschwind. <url
|
|
url="http://www.vlsivie.tuwien.ac.at/mike/i18n.html"
|
|
name="Internationalization">. Various resources on i18n.
|
|
|
|
<item>Sergei Naumov. <url
|
|
url="http://sunsite.oit.unc.edu/sergei/Software/Software.html"
|
|
name="Information on Cyrillic Software">. Cyrillic setup
|
|
information.<label id="Naumov1">
|
|
|
|
<item>The Open Group <url
|
|
url="http://www.UNIX-systems.org/online.html" name="Single UNIX
|
|
specification">.<label id="SingleUnix">
|
|
|
|
<item>RFC 1489 <url
|
|
url="file://ds.internic.net/rfc/rfc1489.txt" name="RFC 1489">
|
|
|
|
<item>Alec Voropay. <url url="http://www.sensi.org/~alec/locale"
|
|
name="Localization as it is">. General locale usage in Russian.<label
|
|
id="Voropay1">
|
|
|
|
</enum>
|
|
|
|
|
|
<sect>Summary of the various useful resources<label id="resources">
|
|
<p>
|
|
|
|
<url url="http://www-inf.enst.fr/~demaille/a2ps.html"
|
|
name="a2ps homepage">
|
|
|
|
<url url="http://www.linux.org"
|
|
name="General Linux Information">
|
|
|
|
<url url="ftp://ftp.ccl.net/pub/central\_eastern\_europe/russian"
|
|
name="Collection of Cyrillic resources">
|
|
|
|
<url url="ftp://ftp.kiae.su/cyrillic/"
|
|
name="Cyrillic resources at KIAE">
|
|
|
|
<url url="ftp://ftp.relcom.ru/cyrillic/"
|
|
name="Cyrillic resources at RELCOM">
|
|
|
|
<url url="ftp://ftp.funet.fi/pub/culture/russian/comp/"
|
|
name="Cyrillic resources at FUNET">
|
|
|
|
<url url="http://www.cronyx.ru"
|
|
name="Cronyx"> - the creators of Cyrillic fonts for the X Window System.
|
|
|
|
<url url="ftp://ftp.kapella.gpi.ru/pub/cyrillic/psfonts"
|
|
name="Cyrillic fonts for Ghostscript and StarOffice">
|
|
|
|
<url url="ftp://ftp.kiae.su/cyrillic/x11/fonts/xrus-2.1.1-src.tgz"
|
|
name="Cyrillic fonts for X">
|
|
|
|
<url url="http://www.cs.wisc.edu/~ghost/index.html"
|
|
name="Ghostscript">
|
|
|
|
<url url="ftp://prep.ai.mit.edu/pub/gnu"
|
|
name="GNU enscript">
|
|
|
|
<htmlurl url="news:relcom.fido.ru.linux"
|
|
name="relcom.fido.ru.linux"> newsgoup.
|
|
|
|
<htmlurl url="news:relcom.fido.ru.unix"
|
|
name="relcom.fido.ru.unix"> newsgoup.
|
|
|
|
<url url="http://www.ispras.ru/~knizhnik"
|
|
name="Russian dictionary for GNU ispell">
|
|
|
|
<url url="http://www.siber.com/sib/russify/"
|
|
name="SovInformBureau">
|
|
|
|
<url url="ftp://xray.sai.msu.su/pub/outgoing/teTeX-rus/"
|
|
name="teTeX russification package">
|
|
|
|
<url url="ftp://sunsite.unc.edu/pub/Linux/system/Keyboards/"
|
|
name="The kbd package for Linux">
|
|
|
|
<url url="ftp://ftp.iesd.auc.dk/"
|
|
name="The remap package for Emacs">
|
|
|
|
<url url="http://www.siber.com/sib/russify/converters/"
|
|
name="The rtxt2ps package">
|
|
|
|
<url url="http://www.math.uga.edu/~valery/russian.el"
|
|
name="The russian.el package for emacs">
|
|
|
|
<url url="ftp://ftp.osc.edu/pub/russian/translit/translit.tar.Z"
|
|
name="The translit package">
|
|
|
|
<url url="ftp://ftp.relcom.ru/pub/x11/cyrillic/"
|
|
name="The xruskb package">
|
|
|
|
<url url="ftp://sunsite.unc.edu/pub/academic/russian-studies/Software"
|
|
name="Useful Cyrillic packages">
|
|
|
|
<url url="ftp://ftp.switch.ch/mirror/linux/X11/fonts/"
|
|
name="X fonts collections">
|
|
|
|
<url url="http://www.xfree86.org"
|
|
name="XFree86 FTP site">
|
|
|
|
</article>

<!--
Local Variables:
compile-command: "sgmlcheck Cyrillic-HOWTO.sgml"
End:
-->

<!-- end of $Source$ -->