mirror of https://github.com/mkerrisk/man-pages
unicode.7: Minor formatting fixes
There's no need really to boldface names of standards and character sets.

Reported-by: Marko Myllynen <myllynen@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
parent 66676c9124
commit 9423e95b07
@@ -30,13 +30,10 @@
 .SH NAME
 Unicode \- universal character set
 .SH DESCRIPTION
-The international standard
-.B ISO 10646
-defines the
-.BR "Universal Character Set (UCS)" .
+The international standard ISO 10646 defines the
+Universal Character Set (UCS).
 UCS contains all characters of all other character set standards.
-It also guarantees
-.BR "round-trip compatibility";
+It also guarantees "round-trip compatibility";
 in other words,
 conversion tables can be built such that no information is lost
 when a string is converted from any other encoding to UCS and back.
@@ -74,14 +71,12 @@ made up of 256 8-bit
 with 256
 .I column
 positions, one for each character.
-Part 1 of the standard
-.RB ( "ISO 10646-1" )
+Part 1 of the standard (ISO 10646-1)
 defines the first 65534 code positions (0x0000 to 0xfffd), which form
 the
 .IR "Basic Multilingual Plane (BMP)" ,
 that is plane 0 in group 0.
-Part 2 of the standard
-.RB ( "ISO 10646-2" )
+Part 2 of the standard (ISO 10646-2)
 adds characters to group 0 outside the BMP in several
 .I "supplementary planes"
 in the range 0x10000 to 0x10ffff.
@@ -97,27 +92,20 @@ dictionary printing, publishing industry, higher-level protocol and
 enthusiast needs.
 .PP
 The representation of each UCS character as a 2-byte word is referred
-to as the
-.B UCS-2
-form (only for BMP characters), whereas
-.B UCS-4
-is the representation of each character by a 4-byte word.
-In addition, there exist two encoding forms
-.B UTF-8
-for backward compatibility with ASCII processing software and
-.B UTF-16
+to as the UCS-2 form (only for BMP characters),
+whereas UCS-4 is the representation of each character by a 4-byte word.
+In addition, there exist two encoding forms UTF-8
+for backward compatibility with ASCII processing software and UTF-16
 for the backward-compatible handling of non-BMP characters up to
 0x10ffff by UCS-2 software.
 .PP
 The UCS characters 0x0000 to 0x007f are identical to those of the
-classic
-.B US-ASCII
+classic US-ASCII
 character set and the characters in the range 0x0000 to 0x00ff
 are identical to those in
-.BR "ISO 8859-1 Latin-1" .
+ISO 8859-1 (Latin-1).
 .SS Combining characters
-Some code points in
-.B UCS
+Some code points in UCS
 have been assigned to
 .IR "combining characters" .
 These are similar to the nonspacing accent keys on a typewriter.
@@ -143,8 +131,7 @@ combining characters, ISO 10646-1 specifies the following three
 of UCS:
 .TP 0.9i
 Level 1
-Combining characters and
-.B Hangul Jamo
+Combining characters and Hangul Jamo
 (a variant encoding of the Korean script, where a Hangul syllable
 glyph is coded as a triplet or pair of vovel/consonant codes) are not
 supported.
@@ -155,19 +142,13 @@ languages where they are essential (e.g., Thai, Lao, Hebrew,
 Arabic, Devanagari, Malayalam).
 .TP
 Level 3
-All
-.B UCS
-characters are supported.
+All UCS characters are supported.
 .PP
-The
-.B Unicode 3.0 Standard
-published by the
-.B Unicode Consortium
-contains exactly the
-.B UCS Basic Multilingual Plane
+The Unicode 3.0 Standard
+published by the Unicode Consortium
+contains exactly the UCS Basic Multilingual Plane
 at implementation level 3, as described in ISO 10646-1:2000.
-.B Unicode 3.1
-added the supplemental planes of ISO 10646-2.
+Unicode 3.1 added the supplemental planes of ISO 10646-2.
 The Unicode standard and
 technical reports published by the Unicode Consortium provide much
 additional information on the semantics and recommended usages of
@@ -180,8 +161,7 @@ Under GNU/Linux, the C type
 .I wchar_t
 is a signed 32-bit integer type.
 Its values are always interpreted
-by the C library as
-.B UCS
+by the C library as UCS
 code values (in all locales), a convention that is signaled by the GNU
 C library to applications by defining the constant
 .B __STDC_ISO_10646__
@@ -189,9 +169,7 @@ as specified in the ISO C99 standard.
 
 UCS/Unicode can be used just like ASCII in input/output streams,
 terminal communication, plaintext files, filenames, and environment
-variables in the ASCII compatible
-.B UTF-8
-multibyte encoding.
+variables in the ASCII compatible UTF-8 multibyte encoding.
 To signal the use of UTF-8 as the character
 encoding to all applications, a suitable
 .I locale
@@ -236,8 +214,7 @@ Set (UCS) \(em Part 1: Architecture and Basic Multilingual Plane.
 International Standard ISO/IEC 10646-1, International Organization
 for Standardization, Geneva, 2000.
 
-This is the official specification of
-.BR UCS .
+This is the official specification of UCS .
 Available from
 .UR http://www.iso.ch/
 .UE .