charsets.7: Minor tweaks

And restore a piece about Biblical Hebrew that was
inadvertently deleted by Marko Myllynen's patch.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk 2014-06-05 14:25:56 +02:00
parent a8ed5f7430
commit 42d940faf8
1 changed file with 14 additions and 12 deletions

@@ -1,3 +1,4 @@
+'\" t -*- coding: UTF-8 -*-
 .\" Copyright (c) 1996 Eric S. Raymond <esr@thyrsus.com>
 .\" and Copyright (c) Andries Brouwer <aeb@cwi.nl>
 .\"
@@ -46,7 +47,7 @@ supersets of ASCII.
 As Unicode, when using UTF-8, is ASCII-compatible, plain ASCII text
 still renders properly on modern UTF-8 using systems.
 .SS ISO 8859
-ISO 8859 is a series of 15 8-bit character sets all of which have ASCII
+ISO 8859 is a series of 15 8-bit character sets, all of which have ASCII
 in their low (7-bit) half, invisible control characters in positions
 128 to 159, and 96 fixed-width graphics in positions 160-255.
 .LP
@@ -79,12 +80,12 @@ Slovak, and Slovene.
 Replacing Romanian ș/ț with ş/ţ was considered tolerable.
 .TP
 8859-3 (Latin-3)
-Latin-3 was designed to cover of Esperanto, Maltese, and Turkish but
+Latin-3 was designed to cover of Esperanto, Maltese, and Turkish, but
 8859-9 later superseded it for Turkish.
 .TP
 8859-4 (Latin-4)
 Latin-4 introduced letters for North European languages such as
-Estonian, Latvian, Lithuanian but was superseded by 8859-10 and
+Estonian, Latvian, and Lithuanian, but was superseded by 8859-10 and
 8859-13.
 .TP
 8859-5
@@ -99,19 +100,20 @@ letter forms, but a proper display engine should combine these
 using the proper initial, medial, and final forms.
 .TP
 8859-7
-Was created for modern Greek in 1987, updated in 2003.
+Was created for Modern Greek in 1987, updated in 2003.
 .TP
 8859-8
-Supports modern Hebrew without niqud (punctuation signs).
+Supports Modern Hebrew without niqud (punctuation signs).
 Niqud and full-fledged Biblical Hebrew were outside the scope of this
-character set.
+character set;
+under Linux, UTF-8 is the preferred encoding for these.
 .TP
 8859-9 (Latin-5)
 This is a variant of Latin-1 that replaces Icelandic letters with
 Turkish ones.
 .TP
 8859-10 (Latin-6)
-Latin-6 added Inuit (Greenlandic) and Sami (Lappish) letters that were
+Latin-6 added the Inuit (Greenlandic) and Sami (Lappish) letters that were
 missing in Latin-4 to cover the entire Nordic area.
 .TP
 8859-11
@@ -130,7 +132,7 @@ This is the Celtic character set, covering Old Irish, Manx, Gaelic,
 Welsh, Cornish, and Breton.
 .TP
 8859-15 (Latin-9)
-Latin-9 is similar to widely used Latin-1 but replaces some less
+Latin-9 is similar to the widely used Latin-1 but replaces some less
 common symbols with the Euro sign and French and Finnish letters that
 were missing in Latin-1.
 .TP
@@ -142,7 +144,7 @@ KOI8-R is a non-ISO character set popular in Russia before Unicode.
 The lower half is ASCII;
 the upper is a Cyrillic character set somewhat better designed than
 ISO 8859-5.
-KOI8-U, based off KOI8-R, has better support for Ukrainian.
+KOI8-U, based on KOI8-R, has better support for Ukrainian.
 Neither of these sets are ISO-2022 compatible,
 unlike the ISO-8859 series.
 .LP
@@ -198,7 +200,7 @@ It is not ISO 2022 compliant.
 .SS TIS-620
 TIS-620 is a Thai national standard character set and a superset
 of ASCII.
-Like in the ISO 8859 series, Thai characters are mapped into
+In the same fashion as the ISO 8859 series, Thai characters are mapped into
 0xa1-0xfe.
 .SS Unicode
 Unicode (ISO 10646) is a standard which aims to unambiguously represent
@@ -262,9 +264,9 @@ Rendering of Unicode data streams is typically handled through
 "subfont" tables which map a subset of Unicode to glyphs.
 Internally
 the kernel uses Unicode to describe the subfont loaded in video RAM.
-This means that the Linux console in UTF-8 mode one can use a character
+This means that in the Linux console in UTF-8 mode, one can use a character
 set with 512 different symbols.
-This is not enough for Japanese, Chinese and
+This is not enough for Japanese, Chinese, and
 Korean, but it is enough for most other purposes.
 .LP
 .SS ISO 2022 and ISO 4873