mirror of https://github.com/mkerrisk/man-pages
charsets.7: Minor tweaks
And restore a piece about Biblical Hebrew that was inadvertently deleted by Marko Myllynen's patch.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
parent a8ed5f7430
commit 42d940faf8
@@ -1,3 +1,4 @@
+'\" t -*- coding: UTF-8 -*-
 .\" Copyright (c) 1996 Eric S. Raymond <esr@thyrsus.com>
 .\" and Copyright (c) Andries Brouwer <aeb@cwi.nl>
 .\"
@@ -46,7 +47,7 @@ supersets of ASCII.
 As Unicode, when using UTF-8, is ASCII-compatible, plain ASCII text
 still renders properly on modern UTF-8 using systems.
 .SS ISO 8859
-ISO 8859 is a series of 15 8-bit character sets all of which have ASCII
+ISO 8859 is a series of 15 8-bit character sets, all of which have ASCII
 in their low (7-bit) half, invisible control characters in positions
 128 to 159, and 96 fixed-width graphics in positions 160-255.
 .LP
@@ -79,12 +80,12 @@ Slovak, and Slovene.
 Replacing Romanian ș/ț with ş/ţ was considered tolerable.
 .TP
 8859-3 (Latin-3)
-Latin-3 was designed to cover of Esperanto, Maltese, and Turkish but
+Latin-3 was designed to cover of Esperanto, Maltese, and Turkish, but
 8859-9 later superseded it for Turkish.
 .TP
 8859-4 (Latin-4)
 Latin-4 introduced letters for North European languages such as
-Estonian, Latvian, Lithuanian but was superseded by 8859-10 and
+Estonian, Latvian, and Lithuanian, but was superseded by 8859-10 and
 8859-13.
 .TP
 8859-5
@@ -99,19 +100,20 @@ letter forms, but a proper display engine should combine these
 using the proper initial, medial, and final forms.
 .TP
 8859-7
-Was created for modern Greek in 1987, updated in 2003.
+Was created for Modern Greek in 1987, updated in 2003.
 .TP
 8859-8
-Supports modern Hebrew without niqud (punctuation signs).
+Supports Modern Hebrew without niqud (punctuation signs).
 Niqud and full-fledged Biblical Hebrew were outside the scope of this
-character set.
+character set;
+under Linux, UTF-8 is the preferred encoding for these.
 .TP
 8859-9 (Latin-5)
 This is a variant of Latin-1 that replaces Icelandic letters with
 Turkish ones.
 .TP
 8859-10 (Latin-6)
-Latin-6 added Inuit (Greenlandic) and Sami (Lappish) letters that were
+Latin-6 added the Inuit (Greenlandic) and Sami (Lappish) letters that were
 missing in Latin-4 to cover the entire Nordic area.
 .TP
 8859-11
@@ -130,7 +132,7 @@ This is the Celtic character set, covering Old Irish, Manx, Gaelic,
 Welsh, Cornish, and Breton.
 .TP
 8859-15 (Latin-9)
-Latin-9 is similar to widely used Latin-1 but replaces some less
+Latin-9 is similar to the widely used Latin-1 but replaces some less
 common symbols with the Euro sign and French and Finnish letters that
 were missing in Latin-1.
 .TP
@@ -142,7 +144,7 @@ KOI8-R is a non-ISO character set popular in Russia before Unicode.
 The lower half is ASCII;
 the upper is a Cyrillic character set somewhat better designed than
 ISO 8859-5.
-KOI8-U, based off KOI8-R, has better support for Ukrainian.
+KOI8-U, based on KOI8-R, has better support for Ukrainian.
 Neither of these sets are ISO-2022 compatible,
 unlike the ISO-8859 series.
 .LP
@@ -198,7 +200,7 @@ It is not ISO 2022 compliant.
 .SS TIS-620
 TIS-620 is a Thai national standard character set and a superset
 of ASCII.
-Like in the ISO 8859 series, Thai characters are mapped into
+In the same fashion as the ISO 8859 series, Thai characters are mapped into
 0xa1-0xfe.
 .SS Unicode
 Unicode (ISO 10646) is a standard which aims to unambiguously represent
@@ -262,9 +264,9 @@ Rendering of Unicode data streams is typically handled through
 "subfont" tables which map a subset of Unicode to glyphs.
 Internally
 the kernel uses Unicode to describe the subfont loaded in video RAM.
-This means that the Linux console in UTF-8 mode one can use a character
+This means that in the Linux console in UTF-8 mode, one can use a character
 set with 512 different symbols.
-This is not enough for Japanese, Chinese and
+This is not enough for Japanese, Chinese, and
 Korean, but it is enough for most other purposes.
 .LP
 .SS ISO 2022 and ISO 4873
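
Two statements in the text touched by this commit can be checked with a short sketch: that Latin-9 swaps a less common Latin-1 symbol for the Euro sign, and that plain ASCII reads the same under ASCII supersets. The sketch assumes Python 3 and its standard codec names (`iso8859_1`, `iso8859_15`, `koi8_r`, `utf_8`), none of which appear in the man page. The choice of byte 0xA4 is ours, picked as one position where the two Latin sets differ.

```python
# Illustrative only: byte 0xA4 is one position where Latin-9 (ISO 8859-15)
# departs from Latin-1 (ISO 8859-1).
raw = bytes([0xA4])

latin1 = raw.decode("iso8859_1")    # generic currency sign, U+00A4
latin9 = raw.decode("iso8859_15")   # Euro sign, U+20AC

# Plain ASCII bytes decode to the same text under ASCII and its supersets.
ascii_text = b"plain ASCII text"
encodings = ("ascii", "iso8859_1", "iso8859_15", "koi8_r", "utf_8")
decoded = {enc: ascii_text.decode(enc) for enc in encodings}

print(f"Latin-1: U+{ord(latin1):04X}  Latin-9: U+{ord(latin9):04X}")
print("ASCII identical in all encodings:", len(set(decoded.values())) == 1)
```

The same approach works for any other byte in the 160-255 range if a different pair of 8859 parts needs comparing.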