unicode.7: Minor formatting fixes

There's no need really to boldface names of standards and character sets.

Reported-by: Marko Myllynen <myllynen@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2014-06-24 11:57:00 +02:00
parent 66676c9124
commit 9423e95b07
1 changed file with 21 additions and 44 deletions
--- a/man7/unicode.7
+++ b/man7/unicode.7
@@ -30,13 +30,10 @@
 .SH NAME
 Unicode \- universal character set
 .SH DESCRIPTION
-The international standard
-.B ISO 10646
-defines the
-.BR "Universal Character Set (UCS)" .
+The international standard ISO 10646 defines the
+Universal Character Set (UCS).
 UCS contains all characters of all other character set standards.
-It also guarantees
-.BR "round-trip compatibility";
+It also guarantees "round-trip compatibility";
 in other words,
 conversion tables can be built such that no information is lost
 when a string is converted from any other encoding to UCS and back.
@@ -74,14 +71,12 @@ made up of 256 8-bit
 with 256
 .I column
 positions, one for each character.
-Part 1 of the standard
-.RB ( "ISO 10646-1" )
+Part 1 of the standard (ISO 10646-1)
 defines the first 65534 code positions (0x0000 to 0xfffd), which form
 the
 .IR "Basic Multilingual Plane (BMP)" ,
 that is plane 0 in group 0.
-Part 2 of the standard
-.RB ( "ISO 10646-2" )
+Part 2 of the standard (ISO 10646-2)
 adds characters to group 0 outside the BMP in several
 .I "supplementary planes"
 in the range 0x10000 to 0x10ffff.
@@ -97,27 +92,20 @@ dictionary printing, publishing industry, higher-level protocol and
 enthusiast needs.
 .PP
 The representation of each UCS character as a 2-byte word is referred
-to as the
-.B UCS-2
-form (only for BMP characters), whereas
-.B UCS-4
-is the representation of each character by a 4-byte word.
-In addition, there exist two encoding forms
-.B UTF-8
-for backward compatibility with ASCII processing software and
-.B UTF-16
+to as the UCS-2 form (only for BMP characters),
+whereas UCS-4 is the representation of each character by a 4-byte word.
+In addition, there exist two encoding forms UTF-8
+for backward compatibility with ASCII processing software and UTF-16
 for the backward-compatible handling of non-BMP characters up to
 0x10ffff by UCS-2 software.
 .PP
 The UCS characters 0x0000 to 0x007f are identical to those of the
-classic
-.B US-ASCII
+classic US-ASCII
 character set and the characters in the range 0x0000 to 0x00ff
 are identical to those in
-.BR "ISO 8859-1 Latin-1" .
+ISO 8859-1 (Latin-1).
 .SS Combining characters
-Some code points in
-.B UCS
+Some code points in UCS
 have been assigned to
 .IR "combining characters" .
 These are similar to the nonspacing accent keys on a typewriter.
@@ -143,8 +131,7 @@ combining characters, ISO 10646-1 specifies the following three
 of UCS:
 .TP 0.9i
 Level 1
-Combining characters and
-.B Hangul Jamo
+Combining characters and Hangul Jamo
 (a variant encoding of the Korean script, where a Hangul syllable
 glyph is coded as a triplet or pair of vovel/consonant codes) are not
 supported.
@@ -155,19 +142,13 @@ languages where they are essential (e.g., Thai, Lao, Hebrew,
 Arabic, Devanagari, Malayalam).
 .TP
 Level 3
-All
-.B UCS
-characters are supported.
+All UCS characters are supported.
 .PP
-The
-.B Unicode 3.0 Standard
-published by the
-.B Unicode Consortium
-contains exactly the
-.B UCS Basic Multilingual Plane
+The Unicode 3.0 Standard
+published by the Unicode Consortium
+contains exactly the UCS Basic Multilingual Plane
 at implementation level 3, as described in ISO 10646-1:2000.
-.B Unicode 3.1
-added the supplemental planes of ISO 10646-2.
+Unicode 3.1 added the supplemental planes of ISO 10646-2.
 The Unicode standard and
 technical reports published by the Unicode Consortium provide much
 additional information on the semantics and recommended usages of
@@ -180,8 +161,7 @@ Under GNU/Linux, the C type
 .I wchar_t
 is a signed 32-bit integer type.
 Its values are always interpreted
-by the C library as
-.B UCS
+by the C library as UCS
 code values (in all locales), a convention that is signaled by the GNU
 C library to applications by defining the constant
 .B __STDC_ISO_10646__
@@ -189,9 +169,7 @@ as specified in the ISO C99 standard.
 UCS/Unicode can be used just like ASCII in input/output streams,
 terminal communication, plaintext files, filenames, and environment
-variables in the ASCII compatible
-.B UTF-8
-multibyte encoding.
+variables in the ASCII compatible UTF-8 multibyte encoding.
 To signal the use of UTF-8 as the character
 encoding to all applications, a suitable
 .I locale
@@ -236,8 +214,7 @@ Set (UCS) \(em Part 1: Architecture and Basic Multilingual Plane.
 International Standard ISO/IEC 10646-1, International Organization
 for Standardization, Geneva, 2000.
-This is the official specification of
-.BR UCS .
+This is the official specification of UCS .
 Available from
 .UR http://www.iso.ch/
 .UE .