mirror of https://github.com/mkerrisk/man-pages
unicode.7: Update to reflect past developments
The unicode(7) page will look more modern with few small changes:

- drop old BUGS section, editors cope with UTF-8 ok these days, and
  perhaps the state-of-the-art is better described elsewhere anyway
  than in a man page
- drop old suggestion about avoiding combined characters
- refer to LANANA for Linux zone, add registry file reference
- drop a reference to an inactive/dead mailing list
- update some reference URLs

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
parent c664680afc
commit 79172100b6
@@ -213,14 +213,6 @@ and
 tells, how many positions (0\(en2) the cursor is advanced by the
 output of a character.
 .PP
-Under Linux, in general only the BMP at implementation level 1 should
-be used at the moment.
-Up to two combining characters per base
-character for certain scripts (in particular Thai) are also supported
-by some UTF-8 terminal emulators and ISO 10646 fonts (level 2), but in
-general precomposed characters should be preferred where available
-(Unicode calls this
-.BR "Normalization Form C" ).
 .SS Private area
 In the
 .BR BMP ,
@@ -232,8 +224,10 @@ range 0xe000 to 0xefff which can be used individually by any end-user
 and the Linux zone in the range 0xf000 to 0xf8ff where extensions are
 coordinated among all Linux users.
 The registry of the characters
-assigned to the Linux zone is currently maintained by H. Peter Anvin
-<Peter.Anvin@linux.org>.
+assigned to the Linux zone is maintained by LANANA and the registry
+itself is
+.I Documentation/unicode.txt
+in the Linux kernel sources.
 .SS Literature
 .TP 0.2i
 *
@@ -244,7 +238,7 @@ for Standardization, Geneva, 2000.
 
 This is the official specification of
 .BR UCS .
-Available as a PDF file on CD-ROM from
+Available from
 .UR http://www.iso.ch/
 .UE .
 .TP
@@ -267,7 +261,7 @@ which improved wide and multibyte character support even further.
 *
 Unicode Technical Reports.
 .RS
-.UR http://www.unicode.org\:/unicode\:/reports/
+.UR http://www.unicode.org\:/reports/
 .UE
 .RE
 .TP
@@ -276,39 +270,18 @@ Markus Kuhn: UTF-8 and Unicode FAQ for UNIX/Linux.
 .RS
 .UR http://www.cl.cam.ac.uk\:/~mgk25\:/unicode.html
 .UE
-
-Provides subscription information for the
-.I linux-utf8
-mailing list, which is the best place to look for advice on using
-Unicode under Linux.
 .RE
 .TP
 *
 Bruno Haible: Unicode HOWTO.
 .RS
-.UR ftp://ftp.ilog.fr\:/pub\:/Users\:/haible\:/utf8\:/Unicode-HOWTO.html
+.UR http://www.tldp.org\:/HOWTO\:/Unicode-HOWTO.html
 .UE
 .RE
-.SH BUGS
-When this man page was last revised, the GNU C Library support for
-.B UTF-8
-locales was mature and XFree86 support was in an advanced state, but
-work on making applications (most notably editors) suitable for use in
-.B UTF-8
-locales was still fully in progress.
-Current general
-.B UCS
-support under Linux usually provides for CJK double-width characters
-and sometimes even simple overstriking combining characters, but
-usually does not include support for scripts with right-to-left
-writing direction or ligature substitution requirements such as
-Hebrew, Arabic, or the Indic scripts.
-These scripts are currently
-supported only in certain GUI applications (HTML viewers, word processors)
-with sophisticated text rendering engines.
 .\" .SH AUTHOR
 .\" Markus Kuhn <mgk25@cl.cam.ac.uk>
 .SH SEE ALSO
+.BR locale (1),
 .BR setlocale (3),
 .BR charsets (7),
 .BR utf-8 (7)