unicode.7: Minor formatting fixes

There's no need really to boldface names of standards and
character sets.

Reported-by: Marko Myllynen <myllynen@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk 2014-06-24 11:57:00 +02:00
parent 66676c9124
commit 9423e95b07
1 changed files with 21 additions and 44 deletions

@@ -30,13 +30,10 @@
 .SH NAME
 Unicode \- universal character set
 .SH DESCRIPTION
-The international standard
-.B ISO 10646
-defines the
-.BR "Universal Character Set (UCS)" .
+The international standard ISO 10646 defines the
+Universal Character Set (UCS).
 UCS contains all characters of all other character set standards.
-It also guarantees
-.BR "round-trip compatibility";
+It also guarantees "round-trip compatibility";
 in other words,
 conversion tables can be built such that no information is lost
 when a string is converted from any other encoding to UCS and back.
@@ -74,14 +71,12 @@ made up of 256 8-bit
 with 256
 .I column
 positions, one for each character.
-Part 1 of the standard
-.RB ( "ISO 10646-1" )
+Part 1 of the standard (ISO 10646-1)
 defines the first 65534 code positions (0x0000 to 0xfffd), which form
 the
 .IR "Basic Multilingual Plane (BMP)" ,
 that is plane 0 in group 0.
-Part 2 of the standard
-.RB ( "ISO 10646-2" )
+Part 2 of the standard (ISO 10646-2)
 adds characters to group 0 outside the BMP in several
 .I "supplementary planes"
 in the range 0x10000 to 0x10ffff.
@@ -97,27 +92,20 @@ dictionary printing, publishing industry, higher-level protocol and
 enthusiast needs.
 .PP
 The representation of each UCS character as a 2-byte word is referred
-to as the
-.B UCS-2
-form (only for BMP characters), whereas
-.B UCS-4
-is the representation of each character by a 4-byte word.
-In addition, there exist two encoding forms
-.B UTF-8
-for backward compatibility with ASCII processing software and
-.B UTF-16
+to as the UCS-2 form (only for BMP characters),
+whereas UCS-4 is the representation of each character by a 4-byte word.
+In addition, there exist two encoding forms UTF-8
+for backward compatibility with ASCII processing software and UTF-16
 for the backward-compatible handling of non-BMP characters up to
 0x10ffff by UCS-2 software.
 .PP
 The UCS characters 0x0000 to 0x007f are identical to those of the
-classic
-.B US-ASCII
+classic US-ASCII
 character set and the characters in the range 0x0000 to 0x00ff
 are identical to those in
-.BR "ISO 8859-1 Latin-1" .
+ISO 8859-1 (Latin-1).
 .SS Combining characters
-Some code points in
-.B UCS
+Some code points in UCS
 have been assigned to
 .IR "combining characters" .
 These are similar to the nonspacing accent keys on a typewriter.
@@ -143,8 +131,7 @@ combining characters, ISO 10646-1 specifies the following three
 of UCS:
 .TP 0.9i
 Level 1
-Combining characters and
-.B Hangul Jamo
+Combining characters and Hangul Jamo
 (a variant encoding of the Korean script, where a Hangul syllable
 glyph is coded as a triplet or pair of vovel/consonant codes) are not
 supported.
@@ -155,19 +142,13 @@ languages where they are essential (e.g., Thai, Lao, Hebrew,
 Arabic, Devanagari, Malayalam).
 .TP
 Level 3
-All
-.B UCS
-characters are supported.
+All UCS characters are supported.
 .PP
-The
-.B Unicode 3.0 Standard
-published by the
-.B Unicode Consortium
-contains exactly the
-.B UCS Basic Multilingual Plane
+The Unicode 3.0 Standard
+published by the Unicode Consortium
+contains exactly the UCS Basic Multilingual Plane
 at implementation level 3, as described in ISO 10646-1:2000.
-.B Unicode 3.1
-added the supplemental planes of ISO 10646-2.
+Unicode 3.1 added the supplemental planes of ISO 10646-2.
 The Unicode standard and
 technical reports published by the Unicode Consortium provide much
 additional information on the semantics and recommended usages of
@@ -180,8 +161,7 @@ Under GNU/Linux, the C type
 .I wchar_t
 is a signed 32-bit integer type.
 Its values are always interpreted
-by the C library as
-.B UCS
+by the C library as UCS
 code values (in all locales), a convention that is signaled by the GNU
 C library to applications by defining the constant
 .B __STDC_ISO_10646__
@@ -189,9 +169,7 @@ as specified in the ISO C99 standard.
 UCS/Unicode can be used just like ASCII in input/output streams,
 terminal communication, plaintext files, filenames, and environment
-variables in the ASCII compatible
-.B UTF-8
-multibyte encoding.
+variables in the ASCII compatible UTF-8 multibyte encoding.
 To signal the use of UTF-8 as the character
 encoding to all applications, a suitable
 .I locale
@@ -236,8 +214,7 @@ Set (UCS) \(em Part 1: Architecture and Basic Multilingual Plane.
 International Standard ISO/IEC 10646-1, International Organization
 for Standardization, Geneva, 2000.
-This is the official specification of
-.BR UCS .
+This is the official specification of UCS .
 Available from
 .UR http://www.iso.ch/
 .UE .