pico
+As of pine version 4.30, pico cannot be reasonably used to view or edit UTF-8
+files. In UTF-8 enabled xterm, it has severe redraw problems.
+
mined98
mined98 is a small text editor by Michiel Huisjes, Achim Müller and
Thomas Wolff.
-
+
It lets you edit UTF-8 or 8-bit encoded files, in an UTF-8 or 8-bit xterm.
It also has powerful capabilities for entering Unicode characters.
@@ -1353,10 +1528,20 @@ interpretation at any time from within the editor: It displays the encoding
of these characters to change it.
mined knows about double-width and combining characters and displays them
-correctly.
+correctly. It also has a special display mode for combining characters.
-mined also has very nice pull-down menus. Alas, the "Home", "End", "Delete"
-keys do not work.
+mined also has a scrollbar and very nice pull-down menus. Alas, the "Home",
+"End", "Delete" keys do not work.
+
+qemacs
+
+
+qemacs 0.2 is a small text editor by Fabrice Bellard
+
+with Emacs keybindings. It runs in an UTF-8 console or xterm, and can edit
+both 8-bit encoded and UTF-8 encoded files. It still has a few rough edges,
+but further development is underway.
Mailers
@@ -1388,7 +1573,7 @@ Now about the individual mail clients (or "mail user agents"):
pine
-The situation for an unpatched pine version 4.10 is as follows.
+The situation for an unpatched pine version 4.30 is as follows.
Pine does not do character set conversions. But it allows you to view
UTF-8 mails in an UTF-8 text window (Linux console or xterm).
@@ -1404,8 +1589,8 @@ will display Latin and Greek characters, but not other kinds of Unicode
characters.
A patch by Robert Brady
-
+
adds UTF-8 support to Pine. With this patch, it decodes and prints headers
@@ -1448,13 +1633,14 @@ the "Unicode" font category.
mutt
-mutt-1.0, as available from
+mutt-1.2.x, as available from
,
-contains only rudimentary UTF-8 support. For full UTF-8 support, there are
-patches by Edmund Grimley Evans at
-.
+has only rudimentary support for UTF-8: it can convert
+from UTF-8 into an 8-bit display charset. The mutt-1.3.x
+development branch also supports UTF-8 as the display charset,
+so you can run Mutt in an UTF-8 xterm, and has thorough support
+for MIME and charset conversion (relying on iconv).
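The conversion "from UTF-8 into an 8-bit display charset" described above can be sketched as follows. This is only an illustration of the idea, not Mutt's code: the function name to_display_charset and the sample text are made up for this example, and Python's built-in codecs stand in for iconv.

```python
# Minimal sketch: convert UTF-8 encoded mail text into an 8-bit display
# charset. to_display_charset is a hypothetical helper for illustration.
def to_display_charset(raw: bytes, display_charset: str = "iso-8859-1") -> bytes:
    # Decode the UTF-8 input; malformed byte sequences become U+FFFD.
    text = raw.decode("utf-8", errors="replace")
    # Characters the display charset cannot represent become "?".
    return text.encode(display_charset, errors="replace")


utf8_body = "Grüße, Ελλάδα".encode("utf-8")
print(to_display_charset(utf8_body))
```

The Latin-1 letters survive the conversion, while the Greek word is replaced by question marks. That loss is the reason an UTF-8 display charset is preferable when the terminal supports it.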
exmh
@@ -1481,7 +1667,7 @@ exmh.mime_utf-8_title_families: fixed
groff
-groff 1.16, the GNU implementation of the traditional Unix text processing
+groff 1.16.1, the GNU implementation of the traditional Unix text processing
system troff/nroff, can output UTF-8 formatted text. Simply use
`groff -Tutf8' instead of `groff -Tlatin1' or
`groff -Tascii'.
@@ -1527,6 +1713,12 @@ Other maybe related links:
PostgreSQL 6.4 or newer can be built with the configuration option
--with-mb=UNICODE.
+Interbase
+
+
+Borland/Inprise's Interbase 6.0 can store string fields in UTF-8 format
+if the option "CHARACTER SET UNICODE_FSS" is given.
+
Other text-mode applications
@@ -1546,9 +1738,9 @@ environment variable.
lv
-lv-4.21 by Tomio Narita
-
+lv-4.49.3 by Tomio Narita
+
is a file viewer with builtin character set converters. To view UTF-8 files
in an UTF-8 console, use "lv -Au8". But it can also be used to view
files in other CJK encodings in an UTF-8 console.
@@ -1556,15 +1748,14 @@ files in other CJK encodings in an UTF-8 console.
There is a small glitch: lv turns off xterm's cursor and doesn't turn it on
again.
-expand, wc
+expand
Get the GNU textutils-2.0 and apply the patch
,
-then configure, add "#define HAVE_MBRTOWC 1", "#define HAVE_FGETWC 1",
-"#define HAVE_FPUTWC 1" to config.h. In src/Makefile, modify CFLAGS and LDFLAGS
-so that they include the directories where libutf8 is installed. Then rebuild.
+then configure, add "#define HAVE_FGETWC 1", "#define HAVE_FPUTWC 1" to
+config.h. Then rebuild.
col, colcrt, colrm, column, rev, ul
@@ -1586,9 +1777,9 @@ The Li18nux list of commands and utilities that ought to be made interoperable
with UTF-8 is as follows. Useful information needs to get added here; I just
didn't get around it yet :-)
-As of glibc-2.2, regular expressions will only work for 8-bit characters.
+As of glibc-2.2, regular expressions only work for 8-bit characters.
In an UTF-8 locale, regular expressions that contain non-ASCII characters
-or that expect to match a single multibyte character with "." will not work.
+or that expect to match a single multibyte character with "." do not work.
This affects all commands and utilities listed below.
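The effect of an 8-bit-only regular expression matcher can be illustrated with a small sketch. It uses Python's re module rather than the glibc regex functions, so it only models the behaviour described above: a byte-oriented "." consumes one byte, not one multibyte character.

```python
import re

text = "é"                  # one character, U+00E9
raw = text.encode("utf-8")  # two bytes: 0xC3 0xA9

# Byte-oriented matching: "." stands for a single byte, so it cannot
# cover the whole two-byte UTF-8 sequence.
byte_match = re.fullmatch(b".", raw)

# Character-oriented matching: "." stands for a single character.
char_match = re.fullmatch(".", text)

print(byte_match is None, char_match is not None)
```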
-
alias
No info available yet.
ar
@@ -1615,13 +1802,13 @@ This affects all commands and utilities listed below.
No info available yet.
arp
No info available yet.
-asa
- No info available yet.
at
As of at-3.1.8: The two uses of isalnum in at.c are invalid and should be
replaced with a use of quotearg.c or an exclude list of the (fixed) list
of shell metacharacters. The two uses of %8s in at.c and atd.c are invalid
and should become arbitrary length.
+awk
+ No info available yet.
basename
As of sh-utils-2.0i: OK.
batch
@@ -1639,8 +1826,6 @@ This affects all commands and utilities listed below.
cal
No info available yet.
@@ -1676,58 +1861,41 @@ This affects all commands and utilities listed below.
As of fileutils-4.0u: OK.
cpio
No info available yet.
+crontab
+ No info available yet.
csplit
No info available yet.
ctags
No info available yet.
-crontab
- No info available yet.
-
cut
No info available yet.
-
date
As of sh-utils-2.0i: OK.
dd
As of fileutils-4.0u: The conv=lcase, conv=ucase options don't work correctly.
-
-depmod
- No info available yet.
df
As of fileutils-4.0u: OK.
diff
- As of diffutils-2.7 (1994): diff is not locale aware; the --side-by-side
- mode therefore doesn't compute column width correctly, not even in ISO-8859-1
- locales.
+ As of diffutils-2.7.2: the --side-by-side mode doesn't compute
+ column width correctly.
diff3
No info available yet.
-
dirname
As of sh-utils-2.0i: OK.
-
domainname
No info available yet.
du
As of fileutils-4.0u: OK.
echo
As of sh-utils-2.0i: OK.
+ed
+ No info available yet.
+egrep
+ No info available yet.
env
As of sh-utils-2.0i: OK.
+ex
+ No info available yet.
expand
No info available yet.
expr
@@ -1739,26 +1907,29 @@ This affects all commands and utilities listed below.
No info available yet.
fg
No info available yet.
+fgrep
+ No info available yet.
file
No info available yet.
find
- As of findutils-4.1.5: The "-ok" option is not internationalized; a patch
- has been submitted to the maintainer. The "-iregex" does not work correctly;
- this needs a fix in function find/parser.c:insert_regex.
-fort77
+ As of findutils-4.1.6: The "-iregex" does not work correctly; this needs a
+ fix in function find/parser.c:insert_regex.
+fold
No info available yet.
ftp[BSD]
No info available yet.
fuser
No info available yet.
-
+gencat
+ No info available yet.
getconf
No info available yet.
getopts
No info available yet.
+gettext
+ No info available yet.
+grep
+ No info available yet.
gunzip
No info available yet.
gzip
@@ -1773,54 +1944,38 @@ This affects all commands and utilities listed below.
No info available yet.
hostname
As of sh-utils-2.0i: OK.
+iconv
+ No info available yet.
id
As of sh-utils-2.0i: OK.
ifconfig
No info available yet.
imake
No info available yet.
-insmod
- No info available yet.
-ipchains
- No info available yet.
ipcrm
No info available yet.
ipcs
No info available yet.
-ipmasqadm
- No info available yet.
jobs
No info available yet.
join
No info available yet.
-kerneld
- No info available yet.
kill
No info available yet.
killall
No info available yet.
-ksyms
- No info available yet.
ldd
No info available yet.
less
No complete info available yet.
lex
No info available yet.
-lilo
- No info available yet.
-
ln
As of fileutils-4.0u: OK.
-loadkeys
- No info available yet.
+locale
+ As of glibc-2.2: OK.
+localedef
+ As of glibc-2.2: OK.
logger
No info available yet.
logname
@@ -1829,26 +1984,24 @@ This affects all commands and utilities listed below.
No info available yet.
lpc[BSD]
No info available yet.
+lpq[BSD]
+ No info available yet.
lpr[BSD]
No info available yet.
lprm[BSD]
No info available yet.
-lpq[BSD]
- No info available yet.
-
ls
As of fileutils-4.0y: OK.
-lsmod
- No info available yet.
m4
No info available yet.
mailx
No info available yet.
make
No info available yet.
+man
+ No info available yet.
mesg
No info available yet.
mkdir
@@ -1859,12 +2012,14 @@ This affects all commands and utilities listed below.
No info available yet.
mkswap
No info available yet.
-modprobe
- No info available yet.
more
No info available yet.
mount
No info available yet.
+msgfmt
+ No info available yet.
+msgmerge
+ No info available yet.
mv
As of fileutils-4.0u: OK.
netstat
@@ -1883,10 +2038,6 @@ This affects all commands and utilities listed below.
No info available yet.
od
No info available yet.
-
passwd[BSD]
No info available yet.
paste
@@ -1895,62 +2046,36 @@ This affects all commands and utilities listed below.
No info available yet.
pathchk
As of sh-utils-2.0i: OK.
-
ping
No info available yet.
+pr
+ No info available yet.
printf
As of sh-utils-2.0i: OK.
-pr
- No info available yet.
-
ps
No info available yet.
pwd
As of sh-utils-2.0i: OK.
read
No info available yet.
-rdev
- No info available yet.
reboot
No info available yet.
renice
No info available yet.
rm
As of fileutils-4.0u: OK.
-
rmdir
As of fileutils-4.0u: OK.
-rmmod
+sed
No info available yet.
-
shar[BSD]
No info available yet.
shutdown
No info available yet.
sleep
As of sh-utils-2.0i: OK.
-
split
No info available yet.
strings
@@ -1958,18 +2083,11 @@ This affects all commands and utilities listed below.
strip
No info available yet.
stty
- As of sh-utils-2.0i: The string "<undef>" should not be translated;
- this needs a fix in function stty.c:visible.
+ As of sh-utils-2.0.11: OK.
su[BSD]
No info available yet.
sum
As of textutils-2.0e: OK.
-
-tac
- No info available yet.
tail
No info available yet.
talk
@@ -2014,51 +2132,18 @@ This affects all commands and utilities listed below.
No info available yet.
unexpand
No info available yet.
-
uniq
No info available yet.
-unlink
- No info available yet.
-
uudecode
No info available yet.
uuencode
No info available yet.
-
wait
No info available yet.
wc
- As of textutils-2.0e: wc cannot count characters; a patch has been submitted
- to the maintainer.
-
+ As of textutils-2.0.8: OK.
who
As of sh-utils-2.0i: OK.
wish
@@ -2068,6 +2153,8 @@ This affects all commands and utilities listed below.
xargs
As of findutils-4.1.5: The program uses strstr; a patch has been submitted
to the maintainer.
+xgettext
+ No info available yet.
yacc
No info available yet.
zcat
@@ -2131,6 +2218,34 @@ is incorrect: the lines are only about half as wide as they should be.
For plain text, uniprint has a better overall layout. On the other hand,
only wprint gets Thai output correct.
+Printing using fixed-size fonts
+
+
+Generally, printing using fixed-size fonts does not give output as professional
+as printing using TrueType fonts.
+
+txtbdf2ps
+
+
+The txtbdf2ps 0.7 program by Serge Winitzki
+
+converts a plain text file to Postscript, by use of a BDF font.
+Installation:
+
+# install -m 755 txtbdf2ps-dev.txt /usr/local/bin/txtbdf2ps
+
+Example with a proportional font:
+
+$ txtbdf2ps -BDF=cyberbit.bdf -UTF-8 -nowrap < input.txt > output.ps
+
+Example with a fixed-width font:
+
+$ txtbdf2ps -BDF=unifont.bdf -UTF-8 -nowrap < input.txt > output.ps
+
+
+Note: txtbdf2ps does not support combining characters or bidi.
+
The classical approach
@@ -2139,7 +2254,9 @@ a Postscript font using the ttf2pt1 utility
(,
). Details can be
+ name="http://quadrant.netspace.net.au/ttf2pt1/">,
+). Details can be
found in Julius Chroboczek's "Printing with TrueType fonts in Unix" writeup,
.
@@ -2316,10 +2433,16 @@ a message database en.po which translates "'Hello', he said" to
"\u201cHello\u201d, he said".
Here is a survey of the portability of the ISO/ANSI C facilities on various
-Unix flavours. GNU glibc-2.2 will support all of it, but for now we have
-the following picture.
+Unix flavours.
+GNU glibc-2.2.x
+
+ - <wchar.h> and <wctype.h> exist.
+
- Has wcs/mbs functions, fgetwc/fputwc/wprintf, everything.
+
- Has five UTF-8 locales.
+
- mbrtowc works.
+
GNU glibc-2.0.x, glibc-2.1.x
- <wchar.h> and <wctype.h> exist.
@@ -2484,10 +2607,7 @@ classes, and includes a Unicode regular expression matcher.
ICU
International Components for Unicode
-(look also at
-).
+ name="http://oss.software.ibm.com/icu/">.
IBM's very comprehensive internationalization library featuring Unicode strings,
resource bundles, number formatters, date/time formatters, message formatters,
collation and more. Lots of supported locales. Portable to Unix and Win32,
@@ -2511,16 +2631,16 @@ of 8-bit character sets, are available:
iconv
-The iconv implementation by Ulrich Drepper, contained in the GNU glibc-2.1.3.
-.
+The iconv implementation by Ulrich Drepper, contained in the GNU glibc-2.2.
+.
The iconv manpages are now contained in
.
The portable iconv implementation by Bruno Haible.
-
+
The portable iconv implementation by Konstantin Chuguev.
librecode by François Pinard
-.
+.
Advantages:
@@ -2567,12 +2687,9 @@ Slow initialization.
ICU
-International Components for Unicode
+International Components for Unicode 1.7
-(look also at
-).
+ name="http://oss.software.ibm.com/icu/">.
IBM's internationalization library also has conversion facilities, declared
in `ucnv.h'.
@@ -2696,8 +2813,13 @@ the `:element-type' and `:external-format' arguments to `open'.
Limitations: Character attribute functions are locale dependent. Source and
compiled source files cannot contain Unicode string literals.
-The commercial Common Lisp implementation Allegro CL will have Unicode
-support in its upcoming release 6.0.
+The commercial Common Lisp implementation Allegro CL, in version 6.0, has
+Unicode support. The types `base-char' and `character' are both equivalent
+to 16-bit Unicode. The encoding used for file I/O can be specified through the
+`:external-format' argument, for example :external-format :utf8.
+The default encoding is locale dependent. More details are at
+.
Ada95
@@ -2721,15 +2843,19 @@ reference manuals for details.
Python 2.0
-(,
+ ,
+ )
-will contain Unicode support. In particular, it will have a data type
-`unicode', representing a Unicode string. a module `unicodedata' for the
+contains Unicode support. It has a new fundamental data type
+`unicode', representing a Unicode string, a module `unicodedata' for the
character properties, and a set of converters for the most important encodings.
See
-for details.
+ name="http://starship.python.net/crew/lemburg/unicode-proposal.txt">,
+or the file Misc/unicode.txt in the distribution, for details.
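A short sketch of the `unicodedata' module and the encoding converters follows. It is written for a current Python 3 interpreter, where the ordinary string type is already a Unicode string; the separate `unicode' type belongs to the Python 2.x series described here. The sample characters are arbitrary.

```python
import unicodedata

# Character properties for a few sample characters.
for ch in ("A", "\u00e9", "\u0301", "\u4e2d"):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),          # general category, e.g. Lu, Mn
        unicodedata.combining(ch),         # nonzero for combining characters
        unicodedata.east_asian_width(ch),  # "W" marks double-width characters
    )

# Converters: encode a Unicode string to UTF-8 bytes and decode it again.
encoded = "caf\u00e9".encode("utf-8")
decoded = encoded.decode("utf-8")
print(encoded, decoded)
```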
JavaScript/ECMAscript
@@ -2766,6 +2892,19 @@ characters of a string. For details, see the Perl-i18n FAQ at
.
+Support for other (non-8-bit) encodings is available through the iconv
+interface module
+.
+
+Related reading
+
+
+Tomohiro Kubota has written an introduction to internationalization
+.
+The emphasis of his document is on writing software that runs in any locale,
+using the locale's encoding.
Other sources of information