mirror of https://github.com/mkerrisk/man-pages
unicode.7: Minor formatting fixes
There's no need really to boldface names of standards and character sets.

Reported-by: Marko Myllynen <myllynen@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
parent 66676c9124
commit 9423e95b07
@@ -30,13 +30,10 @@
 .SH NAME
 Unicode \- universal character set
 .SH DESCRIPTION
-The international standard
-.B ISO 10646
-defines the
-.BR "Universal Character Set (UCS)" .
+The international standard ISO 10646 defines the
+Universal Character Set (UCS).
 UCS contains all characters of all other character set standards.
-It also guarantees
-.BR "round-trip compatibility";
+It also guarantees "round-trip compatibility";
 in other words,
 conversion tables can be built such that no information is lost
 when a string is converted from any other encoding to UCS and back.
@@ -74,14 +71,12 @@ made up of 256 8-bit
 with 256
 .I column
 positions, one for each character.
-Part 1 of the standard
-.RB ( "ISO 10646-1" )
+Part 1 of the standard (ISO 10646-1)
 defines the first 65534 code positions (0x0000 to 0xfffd), which form
 the
 .IR "Basic Multilingual Plane (BMP)" ,
 that is plane 0 in group 0.
-Part 2 of the standard
-.RB ( "ISO 10646-2" )
+Part 2 of the standard (ISO 10646-2)
 adds characters to group 0 outside the BMP in several
 .I "supplementary planes"
 in the range 0x10000 to 0x10ffff.
@@ -97,27 +92,20 @@ dictionary printing, publishing industry, higher-level protocol and
 enthusiast needs.
 .PP
 The representation of each UCS character as a 2-byte word is referred
-to as the
-.B UCS-2
-form (only for BMP characters), whereas
-.B UCS-4
-is the representation of each character by a 4-byte word.
-In addition, there exist two encoding forms
-.B UTF-8
-for backward compatibility with ASCII processing software and
-.B UTF-16
+to as the UCS-2 form (only for BMP characters),
+whereas UCS-4 is the representation of each character by a 4-byte word.
+In addition, there exist two encoding forms UTF-8
+for backward compatibility with ASCII processing software and UTF-16
 for the backward-compatible handling of non-BMP characters up to
 0x10ffff by UCS-2 software.
 .PP
 The UCS characters 0x0000 to 0x007f are identical to those of the
-classic
-.B US-ASCII
+classic US-ASCII
 character set and the characters in the range 0x0000 to 0x00ff
 are identical to those in
-.BR "ISO 8859-1 Latin-1" .
+ISO 8859-1 (Latin-1).
 .SS Combining characters
-Some code points in
-.B UCS
+Some code points in UCS
 have been assigned to
 .IR "combining characters" .
 These are similar to the nonspacing accent keys on a typewriter.
@@ -143,8 +131,7 @@ combining characters, ISO 10646-1 specifies the following three
 of UCS:
 .TP 0.9i
 Level 1
-Combining characters and
-.B Hangul Jamo
+Combining characters and Hangul Jamo
 (a variant encoding of the Korean script, where a Hangul syllable
 glyph is coded as a triplet or pair of vovel/consonant codes) are not
 supported.
@@ -155,19 +142,13 @@ languages where they are essential (e.g., Thai, Lao, Hebrew,
 Arabic, Devanagari, Malayalam).
 .TP
 Level 3
-All
-.B UCS
-characters are supported.
+All UCS characters are supported.
 .PP
-The
-.B Unicode 3.0 Standard
-published by the
-.B Unicode Consortium
-contains exactly the
-.B UCS Basic Multilingual Plane
+The Unicode 3.0 Standard
+published by the Unicode Consortium
+contains exactly the UCS Basic Multilingual Plane
 at implementation level 3, as described in ISO 10646-1:2000.
-.B Unicode 3.1
-added the supplemental planes of ISO 10646-2.
+Unicode 3.1 added the supplemental planes of ISO 10646-2.
 The Unicode standard and
 technical reports published by the Unicode Consortium provide much
 additional information on the semantics and recommended usages of
@@ -180,8 +161,7 @@ Under GNU/Linux, the C type
 .I wchar_t
 is a signed 32-bit integer type.
 Its values are always interpreted
-by the C library as
-.B UCS
+by the C library as UCS
 code values (in all locales), a convention that is signaled by the GNU
 C library to applications by defining the constant
 .B __STDC_ISO_10646__
@@ -189,9 +169,7 @@ as specified in the ISO C99 standard.
 
 UCS/Unicode can be used just like ASCII in input/output streams,
 terminal communication, plaintext files, filenames, and environment
-variables in the ASCII compatible
-.B UTF-8
-multibyte encoding.
+variables in the ASCII compatible UTF-8 multibyte encoding.
 To signal the use of UTF-8 as the character
 encoding to all applications, a suitable
 .I locale
@@ -236,8 +214,7 @@ Set (UCS) \(em Part 1: Architecture and Basic Multilingual Plane.
 International Standard ISO/IEC 10646-1, International Organization
 for Standardization, Geneva, 2000.
 
-This is the official specification of
-.BR UCS .
+This is the official specification of UCS .
 Available from
 .UR http://www.iso.ch/
 .UE .
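
Note for readers of the touched paragraphs: the page describes UTF-8 as the ASCII-compatible encoding of UCS, with code points 0x0000 to 0x007f identical to US-ASCII and supplementary planes reaching 0x10ffff. A minimal sketch of that mapping in C follows. The helper name ucs_to_utf8 is hypothetical and is not part of the man page or of any C library; it only illustrates the byte layout and assumes the caller supplies a 4-byte buffer.

```c
#include <stddef.h>

/*
 * Encode one UCS code point as UTF-8 into out[] (room for 4 bytes).
 * Returns the number of bytes written, or 0 if cp is a surrogate
 * (0xd800..0xdfff) or lies above 0x10ffff.
 * Hypothetical helper for illustration; not a library function.
 */
static size_t ucs_to_utf8(unsigned long cp, unsigned char out[4])
{
    if (cp <= 0x7f) {                  /* US-ASCII range: one identical byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7ff) {                 /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xc0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3f));
        return 2;
    }
    if (cp >= 0xd800 && cp <= 0xdfff)  /* surrogates are not characters */
        return 0;
    if (cp <= 0xffff) {                /* rest of the BMP: three bytes */
        out[0] = (unsigned char)(0xe0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
        out[2] = (unsigned char)(0x80 | (cp & 0x3f));
        return 3;
    }
    if (cp <= 0x10ffff) {              /* supplementary planes: four bytes */
        out[0] = (unsigned char)(0xf0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
        out[3] = (unsigned char)(0x80 | (cp & 0x3f));
        return 4;
    }
    return 0;                          /* beyond the UCS range in the page */
}
```

Because every byte of a multibyte sequence has its high bit set, ASCII-only software never mistakes part of a non-ASCII character for an ASCII byte, which is the backward compatibility the page refers to.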