charmap.5: Update to match current glibc

charmap(5) was outdated; bring it closer to reality by fixing
syntax descriptions to match current glibc code and practices,
adding missing options, removing obsolete comments and references,
and removing now incorrect examples.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Marko Myllynen 2014-06-04 08:53:18 +03:00 committed by Michael Kerrisk
parent d98127cdea
commit 83d1d0dd86
1 changed file with 73 additions and 89 deletions

@@ -1,5 +1,3 @@
.\" This file is part of locale(1) which displays the settings of the
.\" current locale.
.\" Copyright (C) 1994 Jochen Hein (Hein@Student.TU-Clausthal.de)
.\"
.\" %%%LICENSE_START(GPLv2+_SW_3_PARA)
@@ -18,112 +16,98 @@
.\" <http://www.gnu.org/licenses/>.
.\" %%%LICENSE_END
.\"
.TH CHARMAP 5 1994-11-28 "" "Linux User Manual"
.TH CHARMAP 5 2014-06-02 "GNU" "Linux Programmer's Manual"
.SH NAME
charmap \- character symbols to define character encodings
charmap \- characters to define character sets
.SH DESCRIPTION
A character set description (charmap) defines a character set of
available characters and their encodings.
All supported character
sets should have the
.B portable character set
as a proper subset.
.\" Not true anymore:
.\" The portable character set is defined in the file
.\" .I /usr/lib/nls/charmap/POSIX
.\" .I /usr/share/i18n/charmap/POSIX
.\" for reference purposes.
A character set description (charmap) defines all available characters
and their encodings in a character set.
All ISO C compliant character sets should have
the ASCII character set as a proper subset.
.SS Syntax
The charmap file starts with a header, that may consist of the
The charmap file starts with a header that may consist of the
following keywords:
.TP
.I <codeset>
is followed by the name of the codeset.
.TP
.I <mb_cur_max>
is followed by the max number of bytes for a multibyte-character.
Multibyte characters are currently not supported.
The default value
is 1.
.TP
.I <mb_cur_min>
is followed by the min number of bytes for a character.
This
value must be less than or equal than
.BR mb_cur_max .
If not specified, it defaults to
.BR mb_cur_max .
.TP
.I <escape_char>
is followed by a character that should be used as the
escape-character for the rest of the file to mark characters that
should be interpreted in a special way.
It defaults to
the backslash (
.B \\\\
).
.I <code_set_name>
is followed by the name of the character map.
.TP
.I <comment_char>
is followed by a character that will be used as the
comment-character for the rest of the file.
It defaults to the
number sign (
.B #
).
is followed by a character that will be used as the comment character
for the rest of the file.
It defaults to the number sign (#).
.TP
.I <escape_char>
is followed by a character that should be used as the escape character
for the rest of the file to mark characters that should be interpreted
in a special way.
It defaults to the backslash (\\).
.TP
.I <mb_cur_max>
is followed by the maximum number of bytes for a character.
The default value is 1.
.TP
.I <mb_cur_min>
is followed by the minimum number of bytes for a character.
This value must be less than or equal to
.IR mb_cur_max .
If not specified, it defaults to
.IR mb_cur_max .
.PP
The charmap-definition itself starts with the keyword
The character set definition section starts with the keyword
.B CHARMAP
in column 1.
in the first column.
The following lines may have one of the two following forms to
define the character-encodings:
define the character set:
.TP
.I <symbolic-name> <encoding> <comments>
This form defines exactly one character and its encoding.
.I <character> <byte-sequence> <comment>
This form defines exactly one character and its byte sequence,
.I <comment>
being optional.
.TP
.I <symbolic-name>...<symbolic-name> <encoding> <comments>
This form defines a couple of characters.
This is useful only for
multibyte-characters, which are currently not implemented.
.I <character>..<character> <byte-sequence> <comment>
This form defines a character range and its byte sequence,
.I <comment>
being optional.
.PP
The last line in a charmap-definition file must contain
.B END CHARMAP.
.SS Symbolic names
A
.B symbolic name
for a character contains only characters of the
.B portable character set.
The name itself is enclosed between angle brackets.
Characters following an
.B <escape_char>
are interpreted as itself; for example, the sequence
.B "<\\\\\\\\\\\\>>"
represents the symbolic name
.B "\\\\>"
enclosed in angle brackets.
.SS Character encoding
The
encoding may be in each of the following three forms:
The character set definition section ends with the string
.IR "END CHARMAP" .
.PP
The character set definition section may optionally be followed by a
section to define widths of characters.
.PP
The width section starts with the keyword
.B WIDTH
in the first column.
The following lines may have one of the two following forms to
define the widths of the characters:
.TP
.I <escape_char>d<number>
with a decimal number
.I <character> <width>
This form defines the width of exactly one character.
.TP
.I <escape_char>x<number>
with a hexadecimal number
.TP
.I <escape_char><number>
with an octal number.
.\" FIXME comments
.\" FIXME char ... char
.I <character>...<character> <width>
This form defines the width for all the characters in the range.
.PP
The width definition section ends with the string
.IR "END WIDTH" .
.SH FILES
.I /usr/share/i18n/charmaps/*
.\" .SH AUTHOR
.\" Jochen Hein (jochen.hein@delphi.central.de)
.TP
.I /usr/share/i18n/charmaps
Usual default character map path.
.SH CONFORMING TO
POSIX.2.
.SH EXAMPLE
The Euro sign is defined as follows in the
.I UTF\-8
charmap:
.PP
.nf
<U20AC> /xe2/x82/xac
.fi
.SH SEE ALSO
.BR iconv (1),
.BR locale (1),
.BR localedef (1),
.BR localeconv (3),
.BR setlocale (3),
.BR locale (5)
.BR locale (5),
.BR charsets (7)
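
For readers who want to check the EXAMPLE entry added above, the following is a minimal sketch of how a single-character line of the form <character> <byte-sequence> <comment> could be decoded. It is not part of the patch or of glibc. The function name parse_charmap_line is hypothetical, the escape character is assumed to be "/" as in the UTF-8 example, and only the hexadecimal byte form is handled.

```python
import re


def parse_charmap_line(line, escape_char="/"):
    """Split '<character> <byte-sequence> [<comment>]' into (name, bytes).

    Hypothetical helper: assumes every byte is written as
    <escape_char>x<two hex digits>, as in the UTF-8 charmap example.
    """
    fields = line.split(None, 2)  # the optional comment lands in fields[2]
    if len(fields) < 2:
        raise ValueError("expected '<character> <byte-sequence>'")
    name, encoded = fields[0], fields[1]
    if not (name.startswith("<") and name.endswith(">")):
        raise ValueError("character name must be enclosed in angle brackets")

    # Collect every <escape_char>x<hex> group in the byte sequence.
    pattern = re.escape(escape_char) + r"x([0-9A-Fa-f]{2})"
    hex_bytes = re.findall(pattern, encoded)

    # Reject sequences that contain anything besides hex byte groups.
    rebuilt = "".join(escape_char + "x" + h for h in hex_bytes)
    if not hex_bytes or rebuilt != encoded:
        raise ValueError("unsupported byte sequence: " + encoded)

    return name[1:-1], bytes(int(h, 16) for h in hex_bytes)


name, seq = parse_charmap_line("<U20AC> /xe2/x82/xac")
print(name, seq.decode("utf-8"))
```

Decoding the returned bytes as UTF-8 should give back the character that the symbolic name refers to, which makes this a quick sanity check for a single entry. Ranges and the WIDTH section are deliberately left out of the sketch.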