utf-8.7: Minor rewordings in the opening paragraph

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk 2014-02-26 10:55:52 +01:00
parent 99c2f1a20f
commit 76f6db57c7
1 changed files with 6 additions and 6 deletions


@@ -26,7 +26,7 @@
 .\" 2001-05-11 Markus Kuhn <mgk25@cl.cam.ac.uk>
 .\" Update
 .\"
-.TH UTF-8 7 2012-04-30 "GNU" "Linux Programmer's Manual"
+.TH UTF-8 7 2014-02-26 "GNU" "Linux Programmer's Manual"
 .SH NAME
 UTF-8 \- an ASCII compatible multibyte Unicode encoding
 .SH DESCRIPTION
@@ -37,11 +37,10 @@ The most obvious
 Unicode encoding (known as
 .BR UCS-2 )
 consists of a sequence of 16-bit words.
-Such strings can contain as
-parts of many 16-bit characters bytes
-like \(aq\\0\(aq or \(aq/\(aq which have a
+Such strings can contain\(emas part of many 16-bit characters\(embytes
+such as \(aq\\0\(aq or \(aq/\(aq, which have a
 special meaning in filenames and other C library function arguments.
-In addition, the majority of UNIX tools expects ASCII files and can't
+In addition, the majority of UNIX tools expect ASCII files and can't
 read 16-bit words as characters without major modifications.
 For these reasons,
 .B UCS-2
@@ -50,7 +49,8 @@ is not a suitable external encoding of
 in filenames, text files, environment variables, and so on.
 The
 .BR "ISO 10646 Universal Character Set (UCS)" ,
-a superset of Unicode, occupies even a 31-bit code space and the obvious
+a superset of Unicode, occupies an even larger code
+space\(em31\ bits\(emand the obvious
 .B UCS-4
 encoding for it (a sequence of 32-bit words) has the same problems.