diff --git a/LDP/howto/docbook/HOWTO-INDEX/howtoChap.sgml b/LDP/howto/docbook/HOWTO-INDEX/howtoChap.sgml index 9e157825..fbb52ec2 100644 --- a/LDP/howto/docbook/HOWTO-INDEX/howtoChap.sgml +++ b/LDP/howto/docbook/HOWTO-INDEX/howtoChap.sgml @@ -985,7 +985,7 @@ Addresses Linux localization issues specific to Greek users HighQuality-Apps-HOWTO, Creating Integrated High Quality Linux Applications HOWTO -Updated: March 2002. +Updated: April 2002. Tries to clarify some issues and give tips on how to create Linux applications highly integrated to the Operating System, security and ease of use. @@ -2769,7 +2769,7 @@ configurations, and its operation. Unicode-HOWTO, The Unicode HOWTO -Updated: August 2000. +Updated: January 2001. How to change your Linux system so it uses UTF-8 as text encoding. diff --git a/LDP/howto/docbook/HOWTO-INDEX/otherLangSect.sgml b/LDP/howto/docbook/HOWTO-INDEX/otherLangSect.sgml index 8951ea37..2798bf95 100644 --- a/LDP/howto/docbook/HOWTO-INDEX/otherLangSect.sgml +++ b/LDP/howto/docbook/HOWTO-INDEX/otherLangSect.sgml @@ -22,7 +22,7 @@ Topics covered in this section include: Unicode-HOWTO, The Unicode HOWTO -Updated: August 2000. +Updated: January 2001. How to change your Linux system so it uses UTF-8 as text encoding. diff --git a/LDP/howto/docbook/HOWTO-INDEX/programmSect.sgml b/LDP/howto/docbook/HOWTO-INDEX/programmSect.sgml index e6b98d03..93c8069a 100644 --- a/LDP/howto/docbook/HOWTO-INDEX/programmSect.sgml +++ b/LDP/howto/docbook/HOWTO-INDEX/programmSect.sgml @@ -716,7 +716,7 @@ mod_dynvhost, mod_roaming, mod_jserv, and mod_php. HighQuality-Apps-HOWTO, Creating Integrated High Quality Linux Applications HOWTO -Updated: March 2002. +Updated: April 2002. Tries to clarify some issues and give tips on how to create Linux applications highly integrated to the Operating System, security and ease of use. 
diff --git a/LDP/howto/docbook/HighQuality-Apps-HOWTO/conffile.sgml b/LDP/howto/docbook/HighQuality-Apps-HOWTO/conffile.sgml index 893d1129..0bd8590f 100644 --- a/LDP/howto/docbook/HighQuality-Apps-HOWTO/conffile.sgml +++ b/LDP/howto/docbook/HighQuality-Apps-HOWTO/conffile.sgml @@ -1,10 +1,3 @@ - - - - - - - ############################################################################# ## @@ -18,11 +11,11 @@ # A ':' separated list of directories for your content. # The directories /var/www and /var/MySoftware are already there, so # include here your special directories, if any. -CONF_CONTENT_PATH=/var/NewInstance:/var/NewInstance2 +CONF_CONTENT_PATH=/var/NewInstance:/var/NewInstance2 # Your e-mail address, for notifications. -EMAIL=john@mycompany.com +EMAIL=john@mycompany.com # Logs directory -LOG_DIR=/var/log/myInstance +LOG_DIR=/var/log/myInstance diff --git a/LDP/howto/docbook/HighQuality-Apps-HOWTO/externalconf.sgml b/LDP/howto/docbook/HighQuality-Apps-HOWTO/externalconf.sgml index 48980ad2..5bdb9c4c 100644 --- a/LDP/howto/docbook/HighQuality-Apps-HOWTO/externalconf.sgml +++ b/LDP/howto/docbook/HighQuality-Apps-HOWTO/externalconf.sgml @@ -1,9 +1,3 @@ - - - - - - #!/bin/sh @@ -19,18 +13,18 @@ ## # Default configuration file -CONF=/etc/MySoftware.conf +CONF=/etc/MySoftware.conf # Minimal content directories -MIN_CONTENT_PATH=/var/www:/var/MySoftware/www +MIN_CONTENT_PATH=/var/www:/var/MySoftware/www if [ -r "$CONF" ]; then - . "$CONF" + . "$CONF" fi # All the content I'll serve are the "minimal" plus the ones provided # by the user in the configuration file $CONF -CONTENT_PATH=$MIN_CONTENT_PATH:$CONF_CONTENT_PATH +CONTENT_PATH=$MIN_CONTENT_PATH:$CONF_CONTENT_PATH . . 
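The externalconf.sgml listing above follows a common pattern: built-in defaults, an optional user file sourced on top of them, then a combined search path. A minimal self-contained sketch of that pattern might look like the following. The function name build_content_path and the ${VAR:+...} guard against a trailing ':' are assumptions made for illustration; they are not part of the HOWTO's script.

```shell
#!/bin/sh
# Sketch of the "defaults plus optional user configuration" pattern.
# build_content_path is a hypothetical helper, not part of the HOWTO.

# Minimal content directories that are always served.
MIN_CONTENT_PATH=/var/www:/var/MySoftware/www

build_content_path() {
    conf=$1
    CONF_CONTENT_PATH=
    # Source the user's file only if it exists and is readable.
    # The space before the closing bracket is required by test(1).
    if [ -r "$conf" ]; then
        . "$conf"
    fi
    # Append the user's directories only when some were given,
    # so the result never ends in a stray ':'.
    CONTENT_PATH=$MIN_CONTENT_PATH${CONF_CONTENT_PATH:+:$CONF_CONTENT_PATH}
    printf '%s\n' "$CONTENT_PATH"
}
```

Called as build_content_path /etc/MySoftware.conf, this prints only the minimal directories when the file is missing or unreadable, and the minimal directories followed by the user's list otherwise.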
diff --git a/LDP/howto/docbook/HighQuality-Apps-HOWTO/initscript.sgml b/LDP/howto/docbook/HighQuality-Apps-HOWTO/initscript.sgml index e1d54af2..91475c37 100644 --- a/LDP/howto/docbook/HighQuality-Apps-HOWTO/initscript.sgml +++ b/LDP/howto/docbook/HighQuality-Apps-HOWTO/initscript.sgml @@ -1,26 +1,10 @@ - - - - - - - - - - - - - - - - #!/bin/sh # # /etc/init.d/mysystem # Subsystem file for "MySystem" server # -# chkconfig: 2345 95 05 +# chkconfig: 2345 95 05 # description: MySystem server daemon # # processname: MySystem @@ -32,43 +16,43 @@ . /etc/rc.d/init.d/functions # pull in sysconfig settings -[ -f /etc/sysconfig/mySystem ] && . /etc/sysconfig/mySystem +[ -f /etc/sysconfig/mySystem ] && . /etc/sysconfig/mySystem RETVAL=0 prog="MySystem" . -. +. . -start() { +start() { echo -n $"Starting $prog:" . - . + . . RETVAL=$? - [ "$RETVAL" = 0 ] && touch /var/lock/subsys/$prog + [ "$RETVAL" = 0 ] && touch /var/lock/subsys/$prog echo } -stop() { +stop() { echo -n $"Stopping $prog:" . - . + . . killproc $prog -TERM RETVAL=$? - [ "$RETVAL" = 0 ] && rm -f /var/lock/subsys/$prog + [ "$RETVAL" = 0 ] && rm -f /var/lock/subsys/$prog echo } -reload() { +reload() { echo -n $"Reloading $prog:" killproc $prog -HUP RETVAL=$? echo } -case "$1" in +case "$1" in start) start ;; @@ -94,7 +78,7 @@ case "$1" in status $prog RETVAL=$? 
;; - *) + *) echo $"Usage: $0 {start|stop|restart|reload|condrestart|status}" RETVAL=1 esac diff --git a/LDP/howto/docbook/HighQuality-Apps-HOWTO/manysouls.sgml b/LDP/howto/docbook/HighQuality-Apps-HOWTO/manysouls.sgml index 47179789..f58f01c7 100644 --- a/LDP/howto/docbook/HighQuality-Apps-HOWTO/manysouls.sgml +++ b/LDP/howto/docbook/HighQuality-Apps-HOWTO/manysouls.sgml @@ -1,5 +1,5 @@ -bash# /usr/sbin/httpd & -bash# /usr/sbin/httpd -f /etc/httpd/dom1.com.br.conf & -bash# /usr/sbin/httpd -f /etc/httpd/dom2.com.br.conf & -bash# /usr/sbin/httpd -f /etc/httpd/dom3.com.br.conf & +bash# /usr/sbin/httpd & +bash# /usr/sbin/httpd -f /etc/httpd/dom1.com.br.conf & +bash# /usr/sbin/httpd -f /etc/httpd/dom2.com.br.conf & +bash# /usr/sbin/httpd -f /etc/httpd/dom3.com.br.conf & diff --git a/LDP/howto/docbook/HighQuality-Apps-HOWTO/rc3d.sgml b/LDP/howto/docbook/HighQuality-Apps-HOWTO/rc3d.sgml index 569885ca..b4d04ff2 100644 --- a/LDP/howto/docbook/HighQuality-Apps-HOWTO/rc3d.sgml +++ b/LDP/howto/docbook/HighQuality-Apps-HOWTO/rc3d.sgml @@ -1,10 +1,10 @@ bash:/etc/rc3.d# ls -l lrwxrwxrwx 1 root root 18 Jan 14 11:59 K92firewall -> ../init.d/firewall -lrwxrwxrwx 1 root root 17 Jan 14 11:59 S10network -> ../init.d/network +lrwxrwxrwx 1 root root 17 Jan 14 11:59 S10network -> ../init.d/network lrwxrwxrwx 1 root root 16 Jan 14 11:59 S12syslog -> ../init.d/syslog lrwxrwxrwx 1 root root 18 Jan 14 11:59 S17keytable -> ../init.d/keytable lrwxrwxrwx 1 root root 20 Jan 14 11:59 S56rawdevices -> ../init.d/rawdevices lrwxrwxrwx 1 root root 16 Jan 14 11:59 S56xinetd -> ../init.d/xinetd -lrwxrwxrwx 1 root root 18 Jan 14 11:59 S75httpd -> ../init.d/httpd +lrwxrwxrwx 1 root root 18 Jan 14 11:59 S75httpd -> ../init.d/httpd lrwxrwxrwx 1 root root 11 Jan 13 21:45 S99local -> ../rc.local diff --git a/LDP/howto/linuxdoc/Unicode-HOWTO.sgml b/LDP/howto/linuxdoc/Unicode-HOWTO.sgml index d86f7657..a4f85419 100644 --- a/LDP/howto/linuxdoc/Unicode-HOWTO.sgml +++ b/LDP/howto/linuxdoc/Unicode-HOWTO.sgml @@ 
-6,7 +6,7 @@ Bruno Haible, -v0.18, 4 August 2000 +v1.0, 23 January 2001 This document describes how to change your Linux system so it uses UTF-8 as text encoding. - @@ -86,7 +86,7 @@ There are basically four ways to encode Unicode characters in bytes: The other 2147418112 characters (not assigned yet) can be encoded using 4, 5 or 6 bytes. For more info about UTF-8, do `man 7 utf-8' (manpage contained - in the ldpman-1.20 package). + in the man-pages-1.20 package). UCS-2 Every character is represented as two bytes. This encoding can only represent the first 65536 Unicode characters. @@ -188,6 +188,9 @@ In Markus Kuhn's ucs-fonts package: . + + @@ -256,6 +259,18 @@ covers Latin, Cyrillic, Hebrew, Arabic scripts. It covers ISO 8859 parts /usr/lib/kbd/consolefonts/ and execute "/usr/bin/setfont /usr/lib/kbd/consolefonts/LatArCyrHeb-14.psf". +A more flexible approach is given by Dmitry Yu. Bolkhovityanov + +in +and . +To work around the constraint that a VGA font can only cover 512 characters simultaneously, +he provides a rich Unicode font (2279 characters, covering Latin, Greek, Cyrillic, Hebrew, +Armenian, IPA, math symbols, arrows, and more) in the typical 8x16 size and a script +which extracts any 512 characters as a console font. + If you want cut&paste to work with UTF-8 consoles, you need the patch @@ -281,6 +296,9 @@ The following programs are useful when installing fonts: "mkfontdir directory" prepares a font directory for use by the X server, needs to be executed after installing fonts in a directory. + + "xset -q | sed -e '1,/^Font Path:/d' | sed -e '2,$d' -e 's/^ //'" + displays the X server's current font path. "xset fp+ directory" adds a directory to the X server's current font path. @@ -334,11 +352,20 @@ dimensions. Markus Kuhn has assembled fixed-width 75dpi fonts with Unicode encoding - covering Latin, Greek, Cyrillic, Armenian, Georgian, Hebrew, Symbol scripts. - They cover ISO 8859 parts 1,2,3,4,5,7,8,9,10,13,14,15 all at once. 
- This font is required for running xterm in utf-8 mode. + covering Latin, Greek, Cyrillic, Armenian, Georgian, Hebrew scripts and + many symbols. + They cover ISO 8859 parts 1,2,3,4,5,7,8,9,10,13,14,15,16 all at once. + These fonts are required for running xterm in utf-8 mode. They are now + contained in XFree86 4.0.1, therefore you need to install them manually + only if you have an older XFree86 3.x version. + name="http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz">. + + Markus Kuhn has also assembled double-width fixed 75dpi fonts with Unicode + encoding covering Chinese, Japanese and Korean. These fonts are contained + in XFree86 4.0.1 as well. + Roman Czyborra has assembled an 8x16 / 16x16 75dpi font with Unicode encoding covering a huge part of Unicode. Download unifont.hex.gz and hex2bdf from @@ -390,10 +417,13 @@ xterm is part of X11R6 and XFree86, but is maintained separately by Tom Dickey. -Newer versions (patch level 109 and above) contain support for converting +Newer versions (patch level 146 and above) contain support for converting keystrokes to UTF-8 before sending them to the application running in the xterm, and for displaying Unicode characters that the application outputs -as UTF-8 byte sequence. +as UTF-8 byte sequence. It also contains support for double-wide characters +(mostly CJK ideographs) and combining characters, contributed by Robert Brady +. To get an UTF-8 xterm running, you need to: @@ -424,29 +454,29 @@ $ cat utf-8-demo.txt To make xterm come up with UTF-8 handling each time it is started, add the lines -XTerm*utf8: 1 -*VT100*font: -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1 +xterm*utf8: 1 +xterm*VT100*font: -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1 +xterm*VT100*wideFont: -misc-fixed-medium-r-normal-ja-13-125-75-75-c-120-iso10646-1 +xterm*VT100*boldFont: -misc-fixed-bold-r-semicondensed--13-120-75-75-c-60-iso10646-1 - to your $HOME/.Xdefaults (for yourself only). 
I don't recommend changing + to your $HOME/.Xdefaults (for yourself only). + For CJK text processing with double-width characters, the following + settings are probably better: + +xterm*VT100*font: -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1 +xterm*VT100*wideFont: -Misc-Fixed-Medium-R-Normal-ja-18-120-100-100-C-180-ISO10646-1 + + I don't recommend changing the system-wide /usr/X11R6/lib/X11/app-defaults/XTerm, because then your changes will be erased next time you upgrade to a new XFree86 version. -A further patch which implements support for double-wide characters (mostly -CJK ideographs) and combining characters, by Robert Brady -, -is available from -. -It is based on xterm patch level 140 - -and is best used with the following settings: - - *VT100*font: -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1 - *VT100*wideFont: -Daewoo-Gothic-Medium-R-Normal--18-18-100-100-M-180-ISO10646-1 - + TrueType fonts

@@ -478,7 +508,7 @@ or Load "xtt" -to the Modules section of your XF86Config file. +to the "Module" section of your XF86Config file. The display engines of other operating systems. @@ -503,6 +533,7 @@ Some no-cost TrueType fonts with large Unicode coverage are Downloadable from . + It is free for non-commercial purposes. Microsoft Arial Covers Roman, Cyrillic, Greek, Hebrew, Arabic, some combining diacritical @@ -520,12 +551,25 @@ Some no-cost TrueType fonts with large Unicode coverage are Lucida Sans Unicode Covers Roman, Cyrillic, Greek, Hebrew, combining diacritical marks. - Download: contained in IBM's JDK 1.3.0beta for Linux, or directly - downloadable as LucidaSansRegular.ttf and + Download: contained in IBM's JDK 1.3.0 for Linux, at + , + or directly downloadable as LucidaSansRegular.ttf and LucidaSansOblique.ttf from . +Arphic + Cover Chinese (both traditional and simplified). + + Download: at + . + These fonts are truly free. + + Download locations for these and other TrueType fonts can be found at @@ -533,10 +577,31 @@ Christoph Singer's list of freely downloadable Unicode TrueType fonts . +Truetype fonts are installed similarly to fixed size fonts, except that +they go in a separate directory, and that ttmkfdir must be +called before mkfontdir: + +# mkdir -p /usr/X11R6/lib/X11/fonts/truetype +# cp /somewhere/Cyberbit.ttf ... /usr/X11R6/lib/X11/fonts/truetype +# cd /usr/X11R6/lib/X11/fonts/truetype +# ttmkfdir > fonts.scale +# mkfontdir +# xset fp rehash + + TrueType fonts can be converted to low resolution, non-scalable X11 fonts by use of Mark Leisher's ttf2bdf utility . 
+For example, to generate a proportional Unicode font for use with cooledit: + +# cd /usr/X11R6/lib/X11/fonts/local +# ttf2bdf ../truetype/Cyberbit.ttf > cyberbit.bdf +# bdftopcf -o cyberbit.pcf cyberbit.bdf +# gzip -9 cyberbit.pcf +# mkfontdir +# xset fp rehash + More information about TrueType fonts can be found in the Linux TrueType HOWTO + package by Ricardas Cepas, files testUTF-8.c and testUTF8.c. Most applications should not use this, however: they should look at the environment variables, see section "Locale environment variables". @@ -600,6 +665,10 @@ if the other operating system supports them. Recall that to enable a mount option for all future remounts, you add it to the fourth column of the corresponding /etc/fstab line. + +Upgrading the C library +

 + +glibc-2.2 supports multibyte locales, in particular UTF-8 locales. But +glibc-2.1.x and earlier C libraries do not support them. Therefore you need +to upgrade to glibc-2.2. Upgrading from glibc-2.1.x is riskless, because +glibc-2.2 is binary compatible with glibc-2.1.x (at least on i386 platforms, +and except for IPv6). Nevertheless, I recommend having a bootable rescue +disk handy in case something goes wrong. + +Prepare the kernel sources. You must have them unpacked and configured. +/usr/src/linux/include/linux/autoconf.h must exist. Building the kernel +is not needed. + +Retrieve the glibc sources +, +su to root, then unpack, build and install them: + +# unset LD_PRELOAD +# unset LD_LIBRARY_PATH +# tar xvfz glibc-2.2.tar.gz +# tar xvfz glibc-linuxthreads-2.2.tar.gz -C glibc-2.2 +# mkdir glibc-2.2-build +# cd glibc-2.2-build +# ../glibc-2.2/configure --prefix=/usr --with-headers=/usr/src/linux/include --enable-add-ons +# make +# make check +# make info +# LC_ALL=C make install +# make localedata/install-locales + + +Upgrading from glibc versions earlier than 2.1.x cannot be done this way; +consider first installing a Linux distribution based on glibc-2.1.x, and +then upgrading to glibc-2.2 as described above. + +Note that if -- for any reason -- you want to rebuild GCC after having +installed glibc-2.2, you need to first apply this patch + +to the GCC sources. + General data conversion

You will need a program to convert your locally (probably ISO-8859-1) encoded texts to UTF-8. (The alternative would be to keep using texts in different encodings on the same machine; this is not fun in the long run.) -One such program is `iconv', which comes with glibc-2.1. Simply use +One such program is `iconv', which comes with glibc-2.2. Simply use $ iconv --from-code=ISO-8859-1 --to-code=UTF-8 < old_file > new_file @@ -714,7 +827,7 @@ Here are two handy shell scripts, called "i2u" (for UTF to ISO conversion). Adapt according to your current 8-bit character set. -If you don't have glibc-2.1 and iconv installed, you can use GNU recode 3.5 +If you don't have glibc-2.2 and iconv installed, you can use GNU recode 3.6 instead. "i2u" is @@ -722,16 +835,8 @@ instead. "u2i" is "recode UTF-8..ISO-8859-1". - - -Notes: You need GNU recode 3.5 or newer. To compile GNU recode 3.5 on -platforms without glibc2 (i.e. on all platforms except recent Linux systems), -you need to configure it with --disable-nls, otherwise it won't link. -Newer development versions of GNU recode with CJK support are available at -. + Or you can also use CLISP instead. Here are "i2u" You do not need to change your LANGUAGE environment variable. -GNU gettext has the ability to convert translations to the right encoding. -Until glibc-2.2 is released, all you have to do is to set the OUTPUT_CHARSET -environment variable. - -$ export OUTPUT_CHARSET=UTF-8 - -glibc-2.2 will not need this OUTPUT_CHARSET variable; it will correctly -infer it from the LC_CTYPE environment variable. +GNU gettext in glibc-2.2 has the ability to convert translations to the right +encoding. Creating the locale support files

 -If you have glibc-2.1 or glibc-2.1.1 or glibc-2.1.2 installed, first check -using "localedef --help" that the system directory for character maps is -/usr/share/i18n/charmaps. Then apply to the file /usr/share/i18n/charmaps/UTF8 -the patch -or -or , respectively. -Then create the support files for each UTF-8 locale you intend to use, for -example: +Use localedef to create the support files for each UTF-8 locale +you intend to use, for example: -$ localedef -v -c -i de_DE -f UTF8 /usr/share/locale/de_DE.UTF-8 +$ localedef -v -c -i de_DE -f UTF-8 de_DE.UTF-8 -You must give an absolute pathname here; otherwise localedef creates the -locale in a directory named "de_DE.utf8", which does not work with -XFree86-4.0.1. You typically don't need to create locales named "de" or "fr" without country suffix, because these locales are normally only used by the LANGUAGE variable and not by the LC_* variables, and LANGUAGE is only used as an override for LC_MESSAGES. -Adding support to the C library -

- -The glibc-2.2 will support multibyte locales, in particular the UTF-8 locales -created above. But glibc-2.1.x does not really support it. -Therefore the only real effect of the above creation of the -/usr/share/locale/de_DE.UTF-8/* files is that `setlocale(LC_ALL,"")' -will return "de_DE.UTF-8", according to your environment variables, instead -of stripping off the ".UTF-8" suffix. - -To add support for the UTF-8 locale, you should build and install the -following three libraries: - - -`libutf8_plug.so', from -, - -`libiconv_plug.so', from -, - -`libintl_plug.so', from -. - -Then you can set the LD_PRELOAD environment variable to point to the -installed libraries: - -$ export LD_PRELOAD=/usr/local/lib/libutf8_plug.so:/usr/local/lib/libiconv_plug.so:/usr/local/lib/libintl_plug.so - -Then, in every program launched with this environment variable set, the -functions in libutf8_plug.so, libiconv_plug.so and libintl_plug.so will -override the original ones in /lib/libc.so.6. For more info about LD_PRELOAD, -see "man 8 ld.so". - -This entire thing will not be necessary any more once glibc-2.2 comes out. - Specific applications

+Shells +

+ +bash +

+ +By default, GNU bash assumes that every character is one byte long and one +column wide. A patch for bash 2.04, by Marcin 'Qrczak' Kowalczyk and +Ricardas Cepas, teaches bash about multibyte characters in UTF-8 encoding. + + +Double-width characters, combining characters and bidi are not supported by +this patch. It seems a complete redesign of the readline redisplay engine is +needed. + Networking

@@ -967,6 +1035,29 @@ line. +Amaya +

 + +Amaya 4.2.1 +(, +) +now has limited handling of UTF-8 encoded HTML pages. It +recognizes the encoding, but it displays only ISO-8859-1 and symbol +characters; it only ever accesses the fonts + + -adobe-times-*-iso8859-1 + -adobe-helvetica-*-iso8859-1 + -adobe-new century schoolbook-*-iso8859-1 + -adobe-courier-*-iso8859-1 + -adobe-symbol-*-adobe-fontspecific + + +Amaya is in fact an HTML editor, not only a browser. Amaya's strengths among +the browsers are its speed, given enough memory, and its rendering +of mathematical formulas (MathML support). + lynx

@@ -1062,14 +1153,17 @@ and James Kass

yudit by Gáspár Sinai - + is a first-class unicode text editor for the X Window System. It supports simultaneous processing of many languages, input methods, conversions for local character standards. It has facilities for entering text in all languages with only an English keyboard, using keyboard configuration maps. +yudit-1.5 +

+ It can be compiled in three versions: Xlib GUI, KDE GUI, or Motif GUI. Customization is very easy. Typically you will first customize your font. @@ -1081,12 +1175,22 @@ Next, you will customize your input method. The input methods "Straight", "Unicode" and "SGML" are most remarkable. For details about the other built-in input methods, look in /usr/local/share/yudit/data/. -To make a change the default for the next session, edit your $HOME/.yuditrc +To change the default for the next session, edit your $HOME/.yuditrc file. The general editor functionality is limited to editing, cut&paste and search&replace. No undo. +yudit-2.1 +

 + +This version is less easy to learn, because it comes with a homebrewed +GUI and no easily accessible help. But it has undo functionality and +should therefore be more usable than version 1.5. + +Fonts for yudit +

+ yudit can display text using a TrueType font; see section "TrueType fonts" above. The Bitstream Cyberbit gives good results. For yudit to find the font, symlink it to /usr/local/share/yudit/data/cyberbit.ttf. @@ -1094,7 +1198,7 @@ font, symlink it to /usr/local/share/yudit/data/cyberbit.ttf. vim

-vim (as of version 6.0b) has good support for UTF-8: when started in an +vim (as of version 6.0r) has good support for UTF-8: when started in an UTF-8 locale, it assumes UTF-8 encoding for the console and the text files being edited. It supports double-wide (CJK) characters as well and combining characters and therefore fits perfectly into UTF-8 enabled @@ -1103,10 +1207,46 @@ xterm. Installation: Download from . -After unpacking the four parts, edit src/Makefile to -include the --with-features=big option. This will turn on the -features FEAT_MBYTE, FEAT_RIGHTLEFT, FEAT_LANGMAP. Then do "make" and -"make install". +After unpacking the four parts, call ./configure with +--with-features=big --enable-multibyte arguments +(or edit src/Makefile to include the --with-features=big and +--enable-multibyte options). This will turn on the feature +FEAT_MBYTE. Then do "make" and "make install". + +vim can be used to edit files in other encodings. For example, to edit +a BIG5 encoded file: :e ++cc=BIG5 filename. All encoding names +supported by iconv are accepted. Plus: vim automatically distinguishes +UTF-8 and ISO-8859-1 files without needing any command line option. + +cooledit +

 + +cooledit by Paul Sheer + +is a good text editor for the X Window System. Since version 3.15, it has +support for Unicode, including Bidi for Hebrew (but not Arabic). + +A build error message about a missing "vga_setpage" function can be +worked around by adding "-DDO_NOT_USE_VGALIB" to the CFLAGS. + +To view UTF-8 files in an UTF-8 locale you have to modify a setting in +the "Options -> Switches" panel: Enable the checkbox "Display characters +outside locale". I also found it necessary to disable "Spellcheck as you +type". + +For viewing texts with both European and CJK characters, cooledit needs a +font which contains both, for example the GNU unifont (see section +"X11 Unicode fonts"): start it once with + +$ cooledit -fn -gnu-unifont-medium-r-normal--16-160-75-75-c-80-iso10646-1 + +cooledit will then use this font in all future invocations. + +Unfortunately, the only characters that can be entered through the keyboard +are ISO-8859-1 characters and, through a cooledit specific compose mechanism, +ISO-8859-2 characters. Inputting arbitrary Unicode characters in cooledit is +possible, but a bit tedious. emacs

@@ -1136,8 +1276,11 @@ of them needs recompiling Emacs. name="ftp://etlport.etl.go.jp/pub/mule/Mule-UCS/Mule-UCS-0.70.tar.gz"> (mirrored at ) - by Miyashita Hisashi, provides a "utf-8" encoding to Emacs. + name="http://riksun.riken.go.jp/archives/misc/mule/Mule-UCS/Mule-UCS-0.70.tar.gz"> + and + ) + by Hisashi Miyashita, provides a "utf-8" encoding to Emacs. You can use either of these packages, or both together. The advantages @@ -1148,7 +1291,8 @@ to a process buffer (such as M-x shell), not only to loading and saving of files; and it respects the widths of characters better (important for Ethiopian). However, it is less reliable: After heavy editing of a file, I have seen some Unicode characters replaced with U+FFFD after the file was -saved. +saved. (But maybe those were bugs in Emacs 20.5 and 20.6 which are fixed in +Emacs 20.7.) To install the emacs-utf package, compile the program "utf2mule" and install it somewhere in your $PATH, also install unicode.el, muleuni-1.el, @@ -1183,11 +1327,36 @@ unicode-char.el somewhere. Then add the lines )))) to your $HOME/.emacs file. To activate any of the font sets, use the Mule -menu item "Set Font/FontSet" or Shift-down-mouse-1. Currently the font sets -with height 15 and 13 have the best Unicode coverage, due to Markus Kuhn's -9x15 and 6x13 fonts. To designate a font set as the initial font set for -the first frame at startup, uncomment the set-default-font line -in the code snippet above. +menu item "Set Font/FontSet" or Shift-down-mouse-1. The Unicode coverage +of the font sets at different sizes may depend on the installed fonts; +here are screen shots at various sizes of UTF-8-demo.txt ( +, +, +, +, +, +) +and of the Mule script examples ( +, +, +, +, +, +). +To designate a font set as the initial font set for the first frame at startup, +uncomment the set-default-font line in the code snippet above. 
To install the oc-unicode package, execute the command @@ -1253,8 +1422,11 @@ M-x shell RET (This works with oc-unicode/Mule-UCS only.) -Note that all this works with Emacs in windowing mode only, not in terminal -mode. +There is a newer version Mule-UCS-0.81. Unfortunately you need to rebuild emacs +from source in order to use it. + +Note that all this works with Emacs 20 in windowing mode only, not in terminal +mode. None of the mentioned packages works in Emacs 21, as of this writing. Richard Stallman plans to add integrated UTF-8 support to Emacs in the long term, and so does the XEmacs developers group. @@ -1334,13 +1506,16 @@ core. pico

+As of version 4.30, pine cannot be reasonably used to view or edit UTF-8 +files. In UTF-8 enabled xterm, it has severe redraw problems. + mined98

mined98 is a small text editor by Michiel Huisjes, Achim Müller and Thomas Wolff. - + It lets you edit UTF-8 or 8-bit encoded files, in an UTF-8 or 8-bit xterm. It also has powerful capabilities for entering Unicode characters. @@ -1353,10 +1528,20 @@ interpretation at any time from within the editor: It displays the encoding of these characters to change it. mined knows about double-width and combining characters and displays them -correctly. +correctly. It also has a special display mode for combining characters. -mined also has very nice pull-down menus. Alas, the "Home", "End", "Delete" -keys do not work. +mined also has a scrollbar and very nice pull-down menus. Alas, the "Home", +"End", "Delete" keys do not work. + +qemacs +

+ +qemacs 0.2 is a small text editor by Fabrice Bellard. + +with Emacs keybindings. It runs in an UTF-8 console or xterm, and can edit +both 8-bit encoded and UTF-8 encoded files. It still has a few rough edges, +but further development is underway. Mailers

@@ -1388,7 +1573,7 @@ Now about the individual mail clients (or "mail user agents"): pine

-The situation for an unpatched pine version 4.10 is as follows. +The situation for an unpatched pine version 4.30 is as follows. Pine does not do character set conversions. But it allows you to view UTF-8 mails in an UTF-8 text window (Linux console or xterm). @@ -1404,8 +1589,8 @@ will display Latin and Greek characters, but not other kinds of Unicode characters. A patch by Robert Brady - + adds UTF-8 support to Pine. With this patch, it decodes and prints headers @@ -1448,13 +1633,14 @@ the "Unicode" font category. mutt

-mutt-1.0, as available from +mutt-1.2.x, as available from , -contains only rudimentary UTF-8 support. For full UTF-8 support, there are -patches by Edmund Grimley Evans at -. +has only rudimentary support for UTF-8: it can convert +from UTF-8 into an 8-bit display charset. The mutt-1.3.x +development branch also supports UTF-8 as the display charset, +so you can run Mutt in an UTF-8 xterm, and has thorough support +for MIME and charset conversion (relying on iconv). exmh

@@ -1481,7 +1667,7 @@ exmh.mime_utf-8_title_families: fixed groff

-groff 1.16, the GNU implementation of the traditional Unix text processing +groff 1.16.1, the GNU implementation of the traditional Unix text processing system troff/nroff, can output UTF-8 formatted text. Simply use `groff -Tutf8' instead of `groff -Tlatin1' or `groff -Tascii'. @@ -1527,6 +1713,12 @@ Other maybe related links: PostgreSQL 6.4 or newer can be built with the configuration option --with-mb=UNICODE. +Interbase +

+ +Borland/Inprise's Interbase 6.0 can store string fields in UTF-8 format +if the option "CHARACTER SET UNICODE_FSS" is given. + Other text-mode applications

@@ -1546,9 +1738,9 @@ environment variable. lv

-lv-4.21 by Tomio Narita - +lv-4.49.3 by Tomio Narita + is a file viewer with builtin character set converters. To view UTF-8 files in an UTF-8 console, use "lv -Au8". But it can also be used to view files in other CJK encodings in an UTF-8 console. @@ -1556,15 +1748,14 @@ files in other CJK encodings in an UTF-8 console. There is a small glitch: lv turns off xterm's cursor and doesn't turn it on again. -expand, wc +expand

Get the GNU textutils-2.0 and apply the patch , -then configure, add "#define HAVE_MBRTOWC 1", "#define HAVE_FGETWC 1", -"#define HAVE_FPUTWC 1" to config.h. In src/Makefile, modify CFLAGS and LDFLAGS -so that they include the directories where libutf8 is installed. Then rebuild. +then configure, add "#define HAVE_FGETWC 1", "#define HAVE_FPUTWC 1" to +config.h. Then rebuild. col, colcrt, colrm, column, rev, ul

@@ -1586,9 +1777,9 @@ The Li18nux list of commands and utilities that ought to be made interoperable with UTF-8 is as follows. Useful information needs to get added here; I just didn't get around to it yet :-) -As of glibc-2.2, regular expressions will only work for 8-bit characters. +As of glibc-2.2, regular expressions only work for 8-bit characters. In an UTF-8 locale, regular expressions that contain non-ASCII characters -or that expect to match a single multibyte character with "." will not work. +or that expect to match a single multibyte character with "." do not work. This affects all commands and utilities listed below. - alias No info available yet. ar @@ -1615,13 +1802,13 @@ This affects all commands and utilities listed below. No info available yet. arp No info available yet. -asa - No info available yet. at As of at-3.1.8: The two uses of isalnum in at.c are invalid and should be replaced with a use of quotearg.c or an exclude list of the (fixed) list of shell metacharacters. The two uses of %8s in at.c and atd.c are invalid and should become arbitrary length. +awk + No info available yet. basename As of sh-utils-2.0i: OK. batch @@ -1639,8 +1826,6 @@ This affects all commands and utilities listed below. cal No info available yet. @@ -1676,58 +1861,41 @@ This affects all commands and utilities listed below. As of fileutils-4.0u: OK. cpio No info available yet. +crontab + No info available yet. csplit No info available yet. ctags No info available yet. -crontab - No info available yet. - cut No info available yet. - date As of sh-utils-2.0i: OK. dd As of fileutils-4.0u: The conv=lcase, conv=ucase options don't work correctly. - -depmod - No info available yet. df As of fileutils-4.0u: OK. diff - As of diffutils-2.7 (1994): diff is not locale aware; the --side-by-side - mode therefore doesn't compute column width correctly, not even in ISO-8859-1 - locales. + As of diffutils-2.7.2: the --side-by-side mode doesn't compute + column width correctly. 
diff3 No info available yet. - dirname As of sh-utils-2.0i: OK. - domainname No info available yet. du As of fileutils-4.0u: OK. echo As of sh-utils-2.0i: OK. +ed + No info available yet. +egrep + No info available yet. env As of sh-utils-2.0i: OK. +ex + No info available yet. expand No info available yet. expr @@ -1739,26 +1907,29 @@ This affects all commands and utilities listed below. No info available yet. fg No info available yet. +fgrep + No info available yet. file No info available yet. find - As of findutils-4.1.5: The "-ok" option is not internationalized; a patch - has been submitted to the maintainer. The "-iregex" does not work correctly; - this needs a fix in function find/parser.c:insert_regex. -fort77 + As of findutils-4.1.6: The "-iregex" does not work correctly; this needs a + fix in function find/parser.c:insert_regex. +fold No info available yet. ftp[BSD] No info available yet. fuser No info available yet. - +gencat + No info available yet. getconf No info available yet. getopts No info available yet. +gettext + No info available yet. +grep + No info available yet. gunzip No info available yet. gzip @@ -1773,54 +1944,38 @@ This affects all commands and utilities listed below. No info available yet. hostname As of sh-utils-2.0i: OK. +iconv + No info available yet. id As of sh-utils-2.0i: OK. ifconfig No info available yet. imake No info available yet. -insmod - No info available yet. -ipchains - No info available yet. ipcrm No info available yet. ipcs No info available yet. -ipmasqadm - No info available yet. jobs No info available yet. join No info available yet. -kerneld - No info available yet. kill No info available yet. killall No info available yet. -ksyms - No info available yet. ldd No info available yet. less No complete info available yet. lex No info available yet. -lilo - No info available yet. - ln As of fileutils-4.0u: OK. -loadkeys - No info available yet. +locale + As of glibc-2.2: OK. +localedef + As of glibc-2.2: OK. 
logger No info available yet. logname @@ -1829,26 +1984,24 @@ This affects all commands and utilities listed below. No info available yet. lpc[BSD] No info available yet. +lpq[BSD] + No info available yet. lpr[BSD] No info available yet. lprm[BSD] No info available yet. -lpq[BSD] - No info available yet. - ls As of fileutils-4.0y: OK. -lsmod - No info available yet. m4 No info available yet. mailx No info available yet. make No info available yet. +man + No info available yet. mesg No info available yet. mkdir @@ -1859,12 +2012,14 @@ This affects all commands and utilities listed below. No info available yet. mkswap No info available yet. -modprobe - No info available yet. more No info available yet. mount No info available yet. +msgfmt + No info available yet. +msgmerge + No info available yet. mv As of fileutils-4.0u: OK. netstat @@ -1883,10 +2038,6 @@ This affects all commands and utilities listed below. No info available yet. od No info available yet. - passwd[BSD] No info available yet. paste @@ -1895,62 +2046,36 @@ This affects all commands and utilities listed below. No info available yet. pathchk As of sh-utils-2.0i: OK. - ping No info available yet. +pr + No info available yet. printf As of sh-utils-2.0i: OK. -pr - No info available yet. - ps No info available yet. pwd As of sh-utils-2.0i: OK. read No info available yet. -rdev - No info available yet. reboot No info available yet. renice No info available yet. rm As of fileutils-4.0u: OK. - rmdir As of fileutils-4.0u: OK. -rmmod +sed No info available yet. - shar[BSD] No info available yet. shutdown No info available yet. sleep As of sh-utils-2.0i: OK. - split No info available yet. strings @@ -1958,18 +2083,11 @@ This affects all commands and utilities listed below. strip No info available yet. stty - As of sh-utils-2.0i: The string "<undef>" should not be translated; - this needs a fix in function stty.c:visible. + As of sh-utils-2.0.11: OK. su[BSD] No info available yet. sum As of textutils-2.0e: OK. 
- -tac - No info available yet. tail No info available yet. talk @@ -2014,51 +2132,18 @@ This affects all commands and utilities listed below. No info available yet. unexpand No info available yet. - uniq No info available yet. -unlink - No info available yet. - uudecode No info available yet. uuencode No info available yet. - wait No info available yet. wc - As of textutils-2.0e: wc cannot count characters; a patch has been submitted - to the maintainer. - + As of textutils-2.0.8: OK. who As of sh-utils-2.0i: OK. wish @@ -2068,6 +2153,8 @@ This affects all commands and utilities listed below. xargs As of findutils-4.1.5: The program uses strstr; a patch has been submitted to the maintainer. +xgettext + No info available yet. yacc No info available yet. zcat @@ -2131,6 +2218,34 @@ is incorrect: the lines are only about half as wide as they should be. For plain text, uniprint has a better overall layout. On the other hand, only wprint gets Thai output correct. +Printing using fixed-size fonts +

+ +Generally, printing using fixed-size fonts does not give output as professional +as using TrueType fonts. + +txtbdf2ps +

+ +The txtbdf2ps 0.7 program by Serge Winitzki + +converts a plain text file to Postscript, by use of a BDF font. +Installation: + +# install -m 777 txtbdf2ps-dev.txt /usr/local/bin/txtbdf2ps + +Example with a proportional font: + +$ txtbdf2ps -BDF=cyberbit.bdf -UTF-8 -nowrap < input.txt > output.ps + +Example with a fixed-width font: + +$ txtbdf2ps -BDF=unifont.bdf -UTF-8 -nowrap < input.txt > output.ps + + +Note: txtbdf2ps does not support combining characters and bidi. + The classical approach

@@ -2139,7 +2254,9 @@ a Postscript font using the ttf2pt1 utility (, ). Details can be + name="http://quadrant.netspace.net.au/ttf2pt1/">, +). Details can be found in Julius Chroboczek's "Printing with TrueType fonts in Unix" writeup, . @@ -2316,10 +2433,16 @@ a message database en.po which translates "'Hello', he said" to "\u201cHello\u201d, he said". Here is a survey of the portability of the ISO/ANSI C facilities on various -Unix flavours. GNU glibc-2.2 will support all of it, but for now we have -the following picture. +Unix flavours. +GNU glibc-2.2.x + + <wchar.h> and <wctype.h> exist. + Has wcs/mbs functions, fgetwc/fputwc/wprintf, everything. + Has five UTF-8 locales. + mbrtowc works. + GNU glibc-2.0.x, glibc-2.1.x <wchar.h> and <wctype.h> exist. @@ -2484,10 +2607,7 @@ classes, and includes a Unicode regular expression matcher. ICU International Components for Unicode -(look also at -). + name="http://oss.software.ibm.com/icu/">. IBM's very comprehensive internationalization library featuring Unicode strings, resource bundles, number formatters, date/time formatters, message formatters, collation and more. Lots of supported locales. Portable to Unix and Win32, @@ -2511,16 +2631,16 @@ of 8-bit character sets, are available: iconv

-The iconv implementation by Ulrich Drepper, contained in the GNU glibc-2.1.3. -. +The iconv implementation by Ulrich Drepper, contained in the GNU glibc-2.2. +. The iconv manpages are now contained in . The portable iconv implementation by Bruno Haible. - + The portable iconv implementation by Konstantin Chuguev. librecode by François Pinard -. +. Advantages: @@ -2567,12 +2687,9 @@ Slow initialization. ICU

-International Components for Unicode +International Components for Unicode 1.7 -(look also at -). + name="http://oss.software.ibm.com/icu/">. IBM's internationalization library also has conversion facilities, declared in `ucnv.h'. @@ -2696,8 +2813,13 @@ the `:element-type' and `:external-format' arguments to `open'. Limitations: Character attribute functions are locale dependent. Source and compiled source files cannot contain Unicode string literals. -The commercial Common Lisp implementation Allegro CL will have Unicode -support in its upcoming release 6.0. +The commercial Common Lisp implementation Allegro CL, in version 6.0, has +Unicode support. The types `base-char' and `character' are both equivalent +to 16-bit Unicode. The encoding used for file I/O can be specified through the +`:external-format' argument, for example :external-format :utf8. +The default encoding is locale dependent. More details are at +. Ada95

@@ -2721,15 +2843,19 @@ reference manuals for details.

Python 2.0 -(, + , + ) -will contain Unicode support. In particular, it will have a data type -`unicode', representing a Unicode string. a module `unicodedata' for the +contains Unicode support. It has a new fundamental data type +`unicode', representing a Unicode string, a module `unicodedata' for the character properties, and a set of converters for the most important encodings. See -for details. + name="http://starship.python.net/crew/lemburg/unicode-proposal.txt">, +or the file Misc/unicode.txt in the distribution, for details. JavaScript/ECMAscript

@@ -2766,6 +2892,19 @@ characters of a string. For details, see the Perl-i18n FAQ at . +Support for other (non-8-bit) encodings is available through the iconv +interface module +. + +Related reading +

+ +Tomohiro Kubota has written an introduction to internationalization +. +The emphasis of his document is on writing software that runs in any locale, +using the locale's encoding. Other sources of information