.\" Copyright (c) 1996 Eric S. Raymond <esr@thyrsus.com>
.\" and Andries Brouwer <aeb@cwi.nl>
.\"
.\" This is free documentation; you can redistribute it and/or
.\" modify it under the terms of the GNU General Public License as
.\" published by the Free Software Foundation; either version 2 of
.\" the License, or (at your option) any later version.
.\"
.\" This is combined from many sources, including notes by aeb and
.\" research by esr. Portions derive from a writeup by Roman Czyborra.
.\"
.\" Last changed by David Starner <dstarner98@aasaa.ofe.org>.
.TH CHARSETS 7 2001-05-07 "Linux" "Linux Programmer's Manual"
.SH NAME
charsets \- programmer's view of character sets and internationalization
.SH DESCRIPTION
Linux is an international operating system.
Many of its utilities
and device drivers (including the console driver) support multilingual
character sets including Latin-alphabet letters with diacritical
marks, accents, ligatures, and entire non-Latin alphabets including
Greek, Cyrillic, Arabic, and Hebrew.
.LP
This manual page presents a programmer's-eye view of different
character-set standards and how they fit together on Linux.
Standards
discussed include ASCII, ISO 8859, KOI8-R, Unicode, ISO 2022 and
ISO 4873.
The primary emphasis is on character sets actually used as
locale character sets, not the myriad others that can be found in data
from other systems.
.LP
A complete list of charsets used in an officially supported locale in glibc
2.2.3 is: ISO-8859-{1,2,3,5,6,7,8,9,13,15}, CP1251, UTF-8, EUC-{KR,JP,TW},
KOI8-{R,U}, GB2312, GB18030, GBK, BIG5, BIG5-HKSCS and TIS-620 (in no
particular order).
(Romanian may be switching to ISO-8859-16.)
.SH ASCII
ASCII (American Standard Code for Information Interchange) is the original
7-bit character set, originally designed for American English.
It is currently described by the ECMA-6 standard.
.LP
Various ASCII variants exist that replace the dollar sign with other currency
symbols and replace punctuation with non-English alphabetic characters
to cover German, French, Spanish and others in 7 bits.
All are
deprecated; GNU libc doesn't support locales whose character sets aren't
true supersets of ASCII. (These sets are also known as ISO-646, a close
relative of ASCII that permitted replacing these characters.)
.LP
As Linux was written for hardware designed in the US, it natively
supports ASCII.
.SH ISO 8859
ISO 8859 is a series of 15 8-bit character sets, all of which have US
ASCII in their low (7-bit) half, invisible control characters in
positions 128 to 159, and 96 fixed-width graphics in positions 160-255.
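.LP
The layout just described can be sketched in a few lines of Python.
This is only an illustration, not part of any standard interface:
the function name classify_iso8859_byte is invented here, and the
codec names iso8859_1 and iso8859_15 are those of Python's standard
library.
Position 0xa4 (currency sign in Latin-1, Euro sign in Latin-9) is
picked as one sample of a graphic position that differs between parts.
.LP
.nf
```python
def classify_iso8859_byte(value):
    """Name the ISO 8859 region that a byte value (0-255) falls in."""
    if not 0 <= value <= 255:
        raise ValueError("not a byte value")
    if value < 128:
        return "ascii"      # low half: identical to US ASCII
    if value < 160:
        return "control"    # positions 128 to 159: invisible controls
    return "graphic"        # positions 160-255: the 96 graphics of each part


# The low half decodes identically in every part; the high half need not.
same_low = bytes([0x41]).decode("iso8859_1") == bytes([0x41]).decode("iso8859_15")
latin1_a4 = bytes([0xA4]).decode("iso8859_1")    # U+00A4, currency sign
latin9_a4 = bytes([0xA4]).decode("iso8859_15")   # U+20AC, Euro sign
```
.fi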
.LP
Of these, the most important is ISO 8859-1 (Latin-1).
It is natively
supported in the Linux console driver, fairly well supported in X11R6,
and is the base character set of HTML.
.LP
Console support for the other 8859 character sets is available under
Linux through user-mode utilities (such as
.BR setfont (8))
.\" // some distributions still have the deprecated consolechars
that modify keyboard bindings and the EGA graphics
table and employ the "user mapping" font table in the console
driver.
.LP
Here are brief descriptions of each set:
.TP
8859-1 (Latin-1)
Latin-1 covers most Western European languages such as Albanian, Catalan,
Danish, Dutch, English, Faroese, Finnish, French, German, Galician,
Irish, Icelandic, Italian, Norwegian, Portuguese, Spanish, and
Swedish.
The lack of the ligatures Dutch ij, French oe and old-style
,,German`` quotation marks is considered tolerable.
.TP
8859-2 (Latin-2)
Latin-2 supports most Latin-written Slavic and Central European
languages: Croatian, Czech, German, Hungarian, Polish, Romanian,
Slovak, and Slovene.
.TP
8859-3 (Latin-3)
Latin-3 is popular with authors of Esperanto, Galician, and Maltese.
(Turkish is now written with 8859-9 instead.)
.TP
8859-4 (Latin-4)
Latin-4 introduced letters for Estonian, Latvian, and Lithuanian.
It is essentially obsolete; see 8859-10 (Latin-6) and 8859-13 (Latin-7).
.TP
8859-5
Cyrillic letters supporting Bulgarian, Byelorussian, Macedonian,
Russian, Serbian and Ukrainian.
Ukrainians read the letter `ghe'
with downstroke as `heh' and would need a ghe with upstroke to write a
correct ghe.
See the discussion of KOI8-R below.
.TP
8859-6
Supports Arabic.
The 8859-6 glyph table is a fixed font of separate
letter forms, but a proper display engine should combine these
using the proper initial, medial, and final forms.
.TP
8859-7
Supports Modern Greek.
.TP
8859-8
Supports modern Hebrew without niqud (vowel points).
Niqud and full-fledged Biblical Hebrew are outside the scope of this
character set; under Linux, UTF-8 is the preferred encoding for
these.
.TP
8859-9 (Latin-5)
This is a variant of Latin-1 that replaces Icelandic letters with
Turkish ones.
.TP
8859-10 (Latin-6)
Latin-6 adds the last Inuit (Greenlandic) and Sami (Lappish) letters
that were missing in Latin-4 to cover the entire Nordic area.
RFC 1345 listed a preliminary and different `latin6'.
Skolt Sami still
needs a few more accents than these.
.TP
8859-11
This only exists as a rejected draft standard.
The draft standard
was identical to TIS-620, which is used under Linux for Thai.
.TP
8859-12
This set does not exist.
While Vietnamese has been suggested for this
space, it does not fit within the 96 (non-combining) characters ISO
8859 offers.
UTF-8 is the preferred character set for Vietnamese use
under Linux.
.TP
8859-13 (Latin-7)
Supports the Baltic Rim languages; in particular, it includes Latvian
characters not found in Latin-4.
.TP
8859-14 (Latin-8)
This is the Celtic character set, covering Gaelic and Welsh.
This charset also contains the dotted characters needed for Old Irish.
.TP
8859-15 (Latin-9)
This adds the Euro sign and French and Finnish letters that were missing in
Latin-1.
.TP
8859-16 (Latin-10)
This set covers many of the languages covered by 8859-2, and supports
Romanian more completely than that set does.
.SH KOI8-R
KOI8-R is a non-ISO character set popular in Russia.
The lower half
is US ASCII; the upper is a Cyrillic character set somewhat better
designed than ISO 8859-5.
KOI8-U is a common character set, based on
KOI8-R, that has better support for Ukrainian.
Neither of these sets
is ISO-2022 compatible, unlike the ISO-8859 series.
.LP
Console support for KOI8-R is available under Linux through user-mode
utilities that modify keyboard bindings and the EGA graphics table,
and employ the "user mapping" font table in the console driver.
.\" Thanks to Tomohiro KUBOTA for the following sections about
.\" national standards.
.SH JIS X 0208
JIS X 0208 is a Japanese national standard character set.
Though there are other Japanese national standard character sets (like
JIS X 0201, JIS X 0212, and JIS X 0213), this is the most important one.
Characters are mapped into a 94x94 two-byte matrix,
each byte of which is in the range 0x21-0x7e.
Note that JIS X 0208 is a character set, not an encoding.
This means that JIS X 0208
itself is not used for expressing text data.
JIS X 0208 is used
as a component to construct encodings such as EUC-JP, Shift_JIS,
and ISO-2022-JP.
EUC-JP is the most important encoding for Linux
and includes US ASCII and JIS X 0208.
In EUC-JP, JIS X 0208
characters are expressed in two bytes, each of which is the
JIS X 0208 code plus 0x80.
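.LP
The "code plus 0x80" rule for EUC-JP can be tried out with a short
Python sketch.
The helper name jis_to_euc_jp is invented for this illustration, and
the sample code 0x2422 is assumed to be HIRAGANA LETTER A (U+3042);
the result is compared against the euc_jp codec from Python's standard
library rather than taken on trust.
.LP
.nf
```python
def jis_to_euc_jp(jis_code):
    """Turn a two-byte JIS X 0208 code such as 0x2422 into EUC-JP bytes."""
    high, low = jis_code >> 8, jis_code & 0xFF
    for byte in (high, low):
        if not 0x21 <= byte <= 0x7E:
            raise ValueError("each JIS X 0208 byte must be in 0x21-0x7e")
    # EUC-JP: add 0x80 to each of the two bytes
    return bytes([high + 0x80, low + 0x80])


# Assumed sample: 0x2422 is HIRAGANA LETTER A (U+3042) in JIS X 0208.
euc = jis_to_euc_jp(0x2422)
matches_codec = euc == chr(0x3042).encode("euc_jp")
```
.fi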
.SH KS X 1001
KS X 1001 is a Korean national standard character set.
As with
JIS X 0208, characters are mapped into a 94x94 two-byte matrix.
KS X 1001 is used like JIS X 0208, as a component
to construct encodings such as EUC-KR, Johab, and ISO-2022-KR.
EUC-KR is the most important encoding for Linux and includes
US ASCII and KS X 1001.
KS C 5601 is an older name for KS X 1001.
.SH GB 2312
GB 2312 is a mainland Chinese national standard character set used
to express simplified Chinese.
Just like JIS X 0208, characters are
mapped into a 94x94 two-byte matrix used to construct EUC-CN.
EUC-CN
is the most important encoding for Linux and includes US ASCII and
GB 2312.
Note that EUC-CN is often called GB, GB 2312, or CN-GB.
.SH Big5
Big5 is a popular character set in Taiwan to express traditional
Chinese.
(Big5 is both a character set and an encoding.)
It is a superset of US ASCII.
Non-ASCII characters are expressed in two bytes.
Bytes 0xa1-0xfe are used as leading bytes for two-byte characters.
Big5 and its extensions are widely used in Taiwan and Hong Kong.
It is not ISO 2022-compliant.
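.LP
Because only the leading byte identifies a two-byte character, a Big5
byte string has to be scanned from its start.
The following Python sketch shows one way to do that; split_big5 is a
hypothetical helper, not a library function, and the sample character
U+4E2D is an arbitrary choice encoded with the big5 codec of Python's
standard library.
Note that the second byte of a Big5 character may itself lie in the
ASCII range, which is why each byte cannot be classified in isolation.
.LP
.nf
```python
def split_big5(data):
    """Split Big5 bytes into one-byte and two-byte characters."""
    chars = []
    i = 0
    while i < len(data):
        # A byte in 0xa1-0xfe leads a two-byte character; anything
        # else (ASCII in particular) is a character on its own.
        width = 2 if 0xA1 <= data[i] <= 0xFE else 1
        if i + width > len(data):
            raise ValueError("truncated two-byte character")
        chars.append(data[i:i + width])
        i += width
    return chars


# Assumed sample: U+4E2D placed between two ASCII letters.
sample = "A".encode("big5") + chr(0x4E2D).encode("big5") + "B".encode("big5")
pieces = split_big5(sample)
```
.fi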
.SH TIS 620
TIS 620 is a Thai national standard character set and a superset
of US ASCII.
As in the ISO 8859 series, Thai characters are mapped into
0xa1-0xfe.
TIS 620 is the only commonly used character set under
Linux besides UTF-8 to have combining characters.
.SH UNICODE
Unicode (ISO 10646) is a standard which aims to unambiguously represent every
character in every human language.
Unicode's structure permits 20.1 bits to encode every character.
Since most computers don't include 20.1-bit
integers, Unicode is usually encoded as 32-bit integers internally and
as either a series of 16-bit integers (UTF-16, needing two 16-bit integers
only when encoding certain rare characters) or a series of 8-bit bytes
(UTF-8).
Information on Unicode is available at <http://www.unicode.com>.
.LP
Linux represents Unicode using the 8-bit Unicode Transformation Format
(UTF-8).
UTF-8 is a variable-length encoding of Unicode.
It uses 1
byte to code 7 bits, 2 bytes for 11 bits, 3 bytes for 16 bits, 4 bytes
for 21 bits, 5 bytes for 26 bits, 6 bytes for 31 bits.
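.LP
The table of lengths above translates directly into code.
The Python sketch below is an illustration only; utf8_length is a
name made up for it, and it follows the 31-bit scheme described on
this page.
(Current definitions of UTF-8 stop at 4 bytes, so the 5- and 6-byte
rows matter only for the original ISO 10646 range.)
.LP
.nf
```python
def utf8_length(code_point):
    """Bytes needed for a code value under the 31-bit scheme."""
    if code_point < 0 or code_point.bit_length() > 31:
        raise ValueError("code value does not fit in 31 bits")
    # (payload bits, bytes used), as listed in the text
    for bits, length in ((7, 1), (11, 2), (16, 3), (21, 4), (26, 5), (31, 6)):
        if code_point.bit_length() <= bits:
            return length


lengths = [utf8_length(cp) for cp in (0x41, 0xE9, 0x20AC, 0x10000, 0x7FFFFFFF)]
# lengths is [1, 2, 3, 4, 6]
```
.fi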
.LP
Let 0,1,x stand for a zero, one, or arbitrary bit.
A byte 0xxxxxxx
stands for the Unicode 00000000 0xxxxxxx which codes the same symbol
as the ASCII 0xxxxxxx.
Thus, ASCII goes unchanged into UTF-8, and
people using only ASCII do not notice any change: not in code, and not
in file size.
.LP
A byte 110xxxxx is the start of a 2-byte code, and 110xxxxx 10yyyyyy
is assembled into 00000xxx xxyyyyyy.
A byte 1110xxxx is the start
of a 3-byte code, and 1110xxxx 10yyyyyy 10zzzzzz is assembled
into xxxxyyyy yyzzzzzz.
(When UTF-8 is used to code the 31-bit ISO 10646
then this progression continues up to 6-byte codes.)
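.LP
The assembly rules for 1-, 2- and 3-byte codes can be written out as
follows.
This Python sketch is a teaching aid with an invented function name
(decode_utf8_char); it handles a single sequence, does not reject
overlong forms, and is not a substitute for a real decoder.
.LP
.nf
```python
def decode_utf8_char(seq):
    """Assemble the code value of one 1-, 2- or 3-byte UTF-8 sequence."""
    head = seq[0]
    if head < 0x80:                        # 0xxxxxxx
        value, tails = head, 0
    elif (head & 0xE0) == 0xC0:            # 110xxxxx
        value, tails = head & 0x1F, 1
    elif (head & 0xF0) == 0xE0:            # 1110xxxx
        value, tails = head & 0x0F, 2
    else:
        raise ValueError("tail byte or longer sequence; not handled here")
    if len(seq) != tails + 1:
        raise ValueError("wrong number of bytes for this head byte")
    for tail in seq[1:]:
        if (tail & 0xC0) != 0x80:          # every tail must be 10yyyyyy
            raise ValueError("expected a 10xxxxxx tail byte")
        value = (value << 6) | (tail & 0x3F)
    return value


two_byte = decode_utf8_char(bytes([0xC3, 0xA9]))          # 0xE9
three_byte = decode_utf8_char(bytes([0xE2, 0x82, 0xAC]))  # 0x20AC
```
.fi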
.LP
For most people who use ISO-8859 character sets, this means that the
characters outside of ASCII are now coded with two bytes.
This tends
to expand ordinary text files by only one or two percent.
For Russian
or Greek users, this expands ordinary text files by 100%, since text in
those languages is mostly outside of ASCII.
For Japanese users this means
that the 16-bit codes now in common use will take three bytes.
While there
are algorithmic conversions from some character sets (esp. ISO-8859-1) to
Unicode, general conversion requires carrying around conversion tables,
which can be quite large for 16-bit codes.
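.LP
As a rough illustration of both points, the Python sketch below
converts ISO 8859-1 to UTF-8 without a table and measures how much a
string grows.
The names latin1_to_utf8 and utf8_growth and the tiny sample strings
are assumptions made for this example; a single short word exaggerates
the growth compared with running text, where most bytes are ASCII.
.LP
.nf
```python
def latin1_to_utf8(data):
    """Convert ISO 8859-1 bytes to UTF-8 without any lookup table."""
    out = bytearray()
    for byte in data:
        if byte < 0x80:
            out.append(byte)                    # ASCII: unchanged
        else:
            # A Latin-1 byte value is its own Unicode code value, so it
            # always lands in the 2-byte form 110xxxxx 10yyyyyy.
            out.append(0xC0 | (byte >> 6))
            out.append(0x80 | (byte & 0x3F))
    return bytes(out)


def utf8_growth(text, legacy_codec):
    """Size of text in UTF-8 divided by its size in a legacy codec."""
    return len(text.encode("utf-8")) / len(text.encode(legacy_codec))


# Assumed samples, built from code values to keep this listing ASCII-only.
cafe = "caf" + chr(0xE9)                                # one accented letter
mir = "".join(chr(c) for c in (0x43C, 0x438, 0x440))    # a Cyrillic word
kana = chr(0x3042)                                      # one Japanese letter
```
.fi
.LP
With these samples, utf8_growth(mir, "koi8_r") is 2.0 (the 100%
expansion mentioned above) and utf8_growth(kana, "euc_jp") is 1.5
(two bytes becoming three).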
.LP
Note that UTF-8 is self-synchronizing: 10xxxxxx is a tail, any other
byte is the head of a code.
Note that the only way ASCII bytes occur
in a UTF-8 stream is as themselves.
In particular, there are no
embedded NULs ('\\0') or '/'s that form part of some larger code.
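.LP
Self-synchronization means a program can land anywhere in a UTF-8
stream and find the nearest character boundary by looking only at the
top two bits of each byte.
The Python sketch below shows the idea; char_start and count_chars
are illustrative names, not existing library calls, and the input is
assumed to be well-formed UTF-8.
.LP
.nf
```python
def char_start(data, offset):
    """Back up from offset to the head byte of the UTF-8 character there."""
    while offset > 0 and (data[offset] & 0xC0) == 0x80:   # 10xxxxxx: a tail
        offset -= 1
    return offset


def count_chars(data):
    """Count characters by counting every byte that is not a tail."""
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)


# Assumed sample: "a", U+00E9 (2 bytes), U+20AC (3 bytes), "/".
sample = ("a" + chr(0xE9) + chr(0x20AC) + "/").encode("utf-8")
```
.fi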
.LP
Since ASCII, and, in particular, NUL and '/', are unchanged, the
kernel does not notice that UTF-8 is being used.
It does not care at
all what the bytes it is handling stand for.
.LP
Rendering of Unicode data streams is typically handled through
`subfont' tables which map a subset of Unicode to glyphs.
Internally
the kernel uses Unicode to describe the subfont loaded in video RAM.
This means that in UTF-8 mode one can use a character set with 512
different symbols.
This is not enough for Japanese, Chinese and
Korean, but it is enough for most other purposes.
.LP
At the current time, the console driver does not handle combining
characters.
So Thai, Sioux and any other script needing combining
characters can't be handled on the console.
.SH "ISO 2022 AND ISO 4873"
The ISO 2022 and 4873 standards describe a font-control model
based on VT100 practice.
This model is (partially) supported
by the Linux kernel and by
.BR xterm (1).
It is popular in Japan and Korea.
.LP
There are 4 graphic character sets, called G0, G1, G2 and G3,
and one of them is the current character set for codes with
high bit zero (initially G0), and one of them is the current
character set for codes with high bit one (initially G1).
Each graphic character set has 94 or 96 characters, and is
essentially a 7-bit character set.
It uses codes either
040-0177 (041-0176) or 0240-0377 (0241-0376).
G0 always has size 94 and uses codes 041-0176.
.LP
Switching between character sets is done using the shift functions
^N (SO or LS1), ^O (SI or LS0), ESC n (LS2), ESC o (LS3),
ESC N (SS2), ESC O (SS3), ESC ~ (LS1R), ESC } (LS2R), ESC | (LS3R).
The function LS\fIn\fP makes character set G\fIn\fP the current one
for codes with high bit zero.
The function LS\fIn\fPR makes character set G\fIn\fP the current one
for codes with high bit one.
The function SS\fIn\fP makes character set G\fIn\fP (\fIn\fP=2 or 3)
the current one for the next character only (regardless of the value
of its high order bit).
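.LP
The locking shifts can be modelled as a small state tracker.
The Python sketch below is a simplification written for this page:
the names LOCKING_SHIFTS and track_shifts are invented, the labels GL
and GR are used here for "codes with high bit zero" and "codes with
high bit one", the single shifts SS2 and SS3 are ignored, and other
escape sequences are skipped rather than parsed.
.LP
.nf
```python
ESC = 0x1B

# Locking shifts from the text: function -> (half affected, G set number).
LOCKING_SHIFTS = {
    (0x0E,): ("GL", 1),                 # ^N      SO / LS1
    (0x0F,): ("GL", 0),                 # ^O      SI / LS0
    (ESC, ord("n")): ("GL", 2),         # ESC n   LS2
    (ESC, ord("o")): ("GL", 3),         # ESC o   LS3
    (ESC, ord("~")): ("GR", 1),         # ESC ~   LS1R
    (ESC, ord("}")): ("GR", 2),         # ESC }   LS2R
    (ESC, ord("|")): ("GR", 3),         # ESC |   LS3R
}


def track_shifts(data):
    """Return which G sets are current for high bit zero (GL) and one (GR)."""
    state = {"GL": 0, "GR": 1}          # initial state given in the text
    i = 0
    while i < len(data):
        # An escape is looked up together with the byte that follows it.
        key = tuple(data[i:i + 2]) if data[i] == ESC else (data[i],)
        if key in LOCKING_SHIFTS:
            half, g_set = LOCKING_SHIFTS[key]
            state[half] = g_set
        i += len(key)
    return state
```
.fi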
.LP
A 94-character set is designated as G\fIn\fP character set
by an escape sequence ESC ( xx (for G0), ESC ) xx (for G1),
ESC * xx (for G2), ESC + xx (for G3), where xx is a symbol
or a pair of symbols found in the ISO 2375 International
Register of Coded Character Sets.
For example, ESC ( @ selects the ISO 646 character set as G0,
ESC ( A selects the UK standard character set (with pound
instead of number sign), ESC ( B selects ASCII (with dollar
instead of currency sign), ESC ( M selects a character set
for African languages, ESC ( ! A selects the Cuban character
set, and so on.
.LP
A 96-character set is designated as G\fIn\fP character set
by an escape sequence ESC \- xx (for G1), ESC . xx (for G2)
or ESC / xx (for G3).
For example, ESC \- G selects the Hebrew alphabet as G1.
.LP
A multibyte character set is designated as G\fIn\fP character set
by an escape sequence ESC $ xx or ESC $ ( xx (for G0),
ESC $ ) xx (for G1), ESC $ * xx (for G2), ESC $ + xx (for G3).
For example, ESC $ ( C selects the Korean character set for G0.
The Japanese character set selected by ESC $ B has a more
recent version selected by ESC & @ ESC $ B.
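.LP
The designation sequences described in this section follow a regular
pattern that can be generated mechanically.
The Python sketch below builds them from the G set number and the
final symbol; the function name designate and its parameters are
assumptions of this example, and the short form ESC $ xx for G0 is
deliberately not produced.
.LP
.nf
```python
ESC = 0x1B
INTERMEDIATE_94 = {0: "(", 1: ")", 2: "*", 3: "+"}
INTERMEDIATE_96 = {1: "-", 2: ".", 3: "/"}      # no 96-set form for G0


def designate(g_set, final, size=94, multibyte=False):
    """Build the escape sequence that designates a set as G0..G3."""
    table = INTERMEDIATE_96 if size == 96 else INTERMEDIATE_94
    if g_set not in table:
        raise ValueError("no such designation")
    prefix = "$" if multibyte else ""
    return bytes([ESC]) + (prefix + table[g_set] + final).encode("ascii")


ascii_as_g0 = designate(0, "B")                    # ESC ( B
hebrew_as_g1 = designate(1, "G", size=96)          # ESC - G
korean_as_g0 = designate(0, "C", multibyte=True)   # ESC $ ( C
```
.fi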
.LP
ISO 4873 stipulates a narrower use of character sets, where G0
is fixed (always ASCII), so that G1, G2 and G3
can only be invoked for codes with the high order bit set.
In particular, ^N and ^O are not used anymore, ESC ( xx
can be used only with xx=B, and ESC ) xx, ESC * xx, ESC + xx
are equivalent to ESC \- xx, ESC . xx, ESC / xx, respectively.
.SH "SEE ALSO"
.BR console (4),
.BR console_codes (4),
.BR console_ioctl (4),
.BR ascii (7),
.BR iso_8859-1 (7),
.BR unicode (7),
.BR utf-8 (7)