diff --git a/man7/utf-8.7 b/man7/utf-8.7
index b194b94df..bdf546850 100644
--- a/man7/utf-8.7
+++ b/man7/utf-8.7
@@ -26,7 +26,7 @@
 .\" 2001-05-11 Markus Kuhn
 .\" Update
 .\"
-.TH UTF-8 7 2012-04-30 "GNU" "Linux Programmer's Manual"
+.TH UTF-8 7 2014-02-26 "GNU" "Linux Programmer's Manual"
 .SH NAME
 UTF-8 \- an ASCII compatible multibyte Unicode encoding
 .SH DESCRIPTION
@@ -37,11 +37,10 @@ The most obvious
 Unicode encoding (known as
 .BR UCS-2 )
 consists of a sequence of 16-bit words.
-Such strings can contain as
-parts of many 16-bit characters bytes
-like \(aq\\0\(aq or \(aq/\(aq which have a
+Such strings can contain\(emas part of many 16-bit characters\(embytes
+such as \(aq\\0\(aq or \(aq/\(aq, which have a
 special meaning in filenames and other C library function arguments.
-In addition, the majority of UNIX tools expects ASCII files and can't
+In addition, the majority of UNIX tools expect ASCII files and can't
 read 16-bit words as characters without major modifications.
 For these reasons,
 .B UCS-2
@@ -50,7 +49,8 @@ is not a suitable external encoding of
 in filenames, text files, environment variables, and so on.
 The
 .BR "ISO 10646 Universal Character Set (UCS)" ,
-a superset of Unicode, occupies even a 31-bit code space and the obvious
+a superset of Unicode, occupies an even larger code
+space\(em31\ bits\(emand the obvious
 .B UCS-4
 encoding for it (a sequence of 32-bit words) has the same problems.
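
Note on the reworded UCS-2 sentence: the claim that 16-bit characters can contain '\0' or '/' bytes can be checked with a small sketch. The code below is not part of the patch or of the man page. The helper names `ucs2_be` and `special_bytes` are hypothetical, big-endian byte order is assumed for the 16-bit words, and the two sample characters were picked only because their encodings happen to contain the bytes in question.

```python
def ucs2_be(text):
    """Encode text as big-endian 16-bit words (assumed byte order).

    UCS-2 covers only code points up to U+FFFF, so anything larger is rejected.
    """
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS-2 cannot represent code points above U+FFFF")
    return text.encode("utf-16-be")


def special_bytes(data):
    """Return (offset, byte) pairs for NUL (0x00) and '/' (0x2F) bytes."""
    return [(i, b) for i, b in enumerate(data) if b in (0x00, 0x2F)]


# U+0100 and U+012F: neither character is NUL or '/' itself.
sample = "\u0100\u012f"

ucs2 = ucs2_be(sample)         # bytes 01 00 01 2F
utf8 = sample.encode("utf-8")  # bytes C4 80 C4 AF

print(special_bytes(ucs2))     # a NUL byte at offset 1 and a '/' byte at offset 3
print(special_bytes(utf8))     # empty: UTF-8 multibyte sequences use only bytes >= 0x80
```

A C function that treats its argument as a NUL-terminated string would stop after the first byte of the 16-bit form, and a pathname lookup would read the 0x2F byte as a directory separator. Neither problem arises with the UTF-8 bytes of the same two characters.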