From 4550bf19cf63901e5c605f641b0fb88eabc5129c Mon Sep 17 00:00:00 2001
From: "Brian M. Carlson" <sandals@crustytoothpaste.ath.cx>
Date: Mon, 30 Apr 2012 10:24:06 +1200
Subject: [PATCH] utf-8.7: Two clarifications

This patch clarifies that 0xc0 and 0xc1 are not valid in any UTF-8
encoding[0], and it also references RFC 3629 instead of RFC 2279.

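[0] A lead byte of 0xc0 (binary 11000000) would start a two-byte
sequence in which the first byte contributes only zero data bits,
leaving just the six data bits of the continuation byte.  Such a
sequence could only encode U+0000 to U+003F, which are ASCII
characters in non-shortest form.  Likewise, 0xc1 (binary 11000001)
could only encode U+0040 to U+007F, which are also ASCII.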

See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=538641

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
---
 man7/utf-8.7 | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
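
The overlong-encoding argument in footnote [0] can be checked with a
short Python sketch.  It is only an illustration, not part of the man
page change: the helper names two_byte_lead and is_rejected are
hypothetical, and the check relies on Python's built-in strict "utf-8"
codec.

```python
def two_byte_lead(code_point: int) -> int:
    """Lead byte of the two-byte form 110xxxxx 10xxxxxx (hypothetical helper)."""
    if not 0 <= code_point <= 0x7FF:
        raise ValueError("a two-byte sequence carries at most 11 data bits")
    # The lead byte holds the top five of the eleven data bits.
    return 0xC0 | (code_point >> 6)


def is_rejected(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder refuses the byte string."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# Forcing ASCII code points into two bytes only ever yields 0xc0 or 0xc1.
overlong_leads = {two_byte_lead(cp) for cp in range(0x80)}

# Code points that really need two bytes start at lead byte 0xc2.
valid_leads = {two_byte_lead(cp) for cp in range(0x80, 0x800)}

if __name__ == "__main__":
    print(sorted(hex(b) for b in overlong_leads))
    print(hex(min(valid_leads)), hex(max(valid_leads)))
    print(is_rejected(b"\xc0\x80"), is_rejected(b"\xc2\x80"))
```

Every two-byte sequence that begins with 0xc0 or 0xc1 is therefore a
non-shortest encoding of an ASCII character, which is why a strict
decoder refuses both bytes.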

diff --git a/man7/utf-8.7 b/man7/utf-8.7
index ad159ae5b..392d90fc8 100644
--- a/man7/utf-8.7
+++ b/man7/utf-8.7
@@ -27,7 +27,7 @@
 .\" 2001-05-11  Markus Kuhn <mgk25@cl.cam.ac.uk>
 .\"      Update
 .\"
-.TH UTF-8 7 2001-05-11 "GNU" "Linux Programmer's Manual"
+.TH UTF-8 7 2012-04-30 "GNU" "Linux Programmer's Manual"
 .SH NAME
 UTF-8 \- an ASCII compatible multibyte Unicode encoding
 .SH DESCRIPTION
@@ -99,14 +99,14 @@ All possible 2^31 UCS codes can be encoded using
 .BR UTF-8 .
 .TP
 *
-The bytes 0xfe and 0xff are never used in the
+The bytes 0xc0, 0xc1, 0xfe and 0xff are never used in the
 .B UTF-8
 encoding.
 .TP
 *
 The first byte of a multibyte sequence which represents a single non-ASCII
 .B UCS
-character is always in the range 0xc0 to 0xfd and indicates how long
+character is always in the range 0xc2 to 0xfd and indicates how long
 this multibyte sequence is.
 All further bytes in a multibyte sequence
 are in the range 0x80 to 0xbf.
@@ -288,7 +288,7 @@ ways to represent these things in a non-shortest
 .B UTF-8
 encoding.
 .SS Standards
-ISO/IEC 10646-1:2000, Unicode 3.1, RFC\ 2279, Plan 9.
+ISO/IEC 10646-1:2000, Unicode 3.1, RFC\ 3629, Plan 9.
 .\" .SH AUTHOR
 .\" Markus Kuhn <mgk25@cl.cam.ac.uk>
 .SH "SEE ALSO"