<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
<TITLE>The Hebrew HOWTO: Standards for representation of Hebrew characters</TITLE>
<LINK HREF="Hebrew-HOWTO-3.html" REL=next>
<LINK HREF="Hebrew-HOWTO-1.html" REL=previous>
<LINK HREF="Hebrew-HOWTO.html#toc2" REL=contents>
</HEAD>
<BODY>
<A HREF="Hebrew-HOWTO-3.html">Next</A>
<A HREF="Hebrew-HOWTO-1.html">Previous</A>
<A HREF="Hebrew-HOWTO.html#toc2">Contents</A>
<HR>
<H2><A NAME="s2">2. Standards for representation of Hebrew characters</A></H2>
<H2><A NAME="ss2.1">2.1 ASCII</A>
</H2>
<P>To make one thing clear, once and for all: there is no such thing as
8-bit ASCII. ASCII is only 7 bits. Any 8-bit code is not ASCII, but that
doesn't mean it's not standard. ISO-8859-8 is standard, but not ASCII. Thanks!
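<P>A quick way to see whether a file stays within 7-bit ASCII is to count the
bytes that have the eighth bit set. The following is only a sketch: the helper
name <CODE>count_high_bytes</CODE> is made up for this example, and it assumes
a <CODE>tr</CODE> that accepts octal escapes.

```shell
# count_high_bytes: hypothetical helper, reads stdin and prints the number
# of bytes that are NOT 7-bit ASCII (that is, bytes above octal 177).
count_high_bytes() {
    # Delete every 7-bit byte; whatever is left has the eighth bit set.
    # The final tr removes padding that some wc implementations add.
    LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' '
}

# Six ASCII letters, a space, and three 8-bit bytes.
printf 'shalom \340\341\342\n' | count_high_bytes    # prints: 3
```

A result of 0 means the input is plain ASCII; anything else means some 8-bit
code is in use, and you still have to find out which one.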
<P>
<H2><A NAME="ss2.2">2.2 DOS Hebrew</A>
</H2>
<P>The Hebrew letters start at 128 (decimal) for Aleph, so the encoding
requires 8 bits. This is the table found in the hardware fonts on video card
EPROMs, and all of the Hebrew DOS-based editors (Qtext, HED, etc.) use it.
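<P>Since <CODE>tr</CODE> wants octal, it may help to work out the octal range
that a table occupies from its decimal starting point. This sketch assumes the
letters sit in one contiguous block of 27 positions (22 letters plus 5 final
forms), which matches the ranges used in the Conversions section;
<CODE>octal_range</CODE> is a name invented for this example.

```shell
# octal_range: hypothetical helper; given the decimal code of Aleph, prints
# the tr-style octal range covering 27 contiguous Hebrew code points.
octal_range() {
    first=$1
    last=$((first + 27 - 1))          # Tav is 26 positions after Aleph
    printf '\\%o-\\%o\n' "$first" "$last"
}

octal_range 128    # prints: \200-\232
```

The same arithmetic works for any table that keeps the letters contiguous and
in the same order.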
<P>
<H2><A NAME="ss2.3">2.3 ISO Hebrew</A>
</H2>
<P>The Hebrew letters start at 224 (decimal) for Aleph. This is the Internet
standard and the international standard (ISO-8859-8), and basically the
standard for MS-Windows and for Macintoshes (Dagesh, etc.).
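<P>If you are handed a file and do not know which of the two 8-bit tables it
uses, counting the bytes that fall in each letter range gives a rough hint.
This is a heuristic sketch only, not a reliable detector: the function name
<CODE>guess_hebrew_table</CODE> is hypothetical, and other 8-bit data
(line-drawing characters, accented Latin letters) will fool it.

```shell
# guess_hebrew_table: hypothetical heuristic. Takes a file name and prints
# "dos", "iso" or "unknown" depending on which letter range has more bytes.
guess_hebrew_table() {
    # $(( ... )) normalizes any padding that wc may put around the number.
    dos=$(( $(cat "$1" | LC_ALL=C tr -cd '\200-\232' | wc -c) ))
    iso=$(( $(cat "$1" | LC_ALL=C tr -cd '\340-\372' | wc -c) ))
    if [ "$dos" -gt "$iso" ]; then
        echo dos
    elif [ "$iso" -gt "$dos" ]; then
        echo iso
    else
        echo unknown
    fi
}

# Small demonstration with three bytes from the DOS letter range.
sample=$(mktemp)
printf '\200\201\202 hello\n' > "$sample"
guess_hebrew_table "$sample"    # prints: dos
rm -f "$sample"
```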
<P>
<H2><A NAME="ss2.4">2.4 OLD PC Hebrew</A>
</H2>
<P>This is a 7-bit encoding and is obsolete, since it occupies essentially the
same ASCII range as the English lowercase letters, so it is best avoided.
However, you will get exactly this when ISO Hebrew has its eighth bit stripped
off by some ignorant Unix mail program: the Hebrew part of your message turns
into a jumble of English letters, mixed in with the regular English text
(reversed or not). You will then need to transform it to DOS (PC) or ISO
Hebrew. If there was English mixed in with the Hebrew, this is a sad
situation, as you will get either Hebrew plus jumble or English plus jumble...
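<P>To see how the jumble arises, you can imitate the damage on the Hebrew
letter range. This is a sketch under the assumption that stripping the eighth
bit simply moves each ISO letter down by 128, from octal 340-372 to octal
140-172 (backquote through lowercase z); <CODE>strip_eighth_bit</CODE> is a
made-up name, and a real mail program would strip every 8-bit byte, not only
the letters.

```shell
# strip_eighth_bit: hypothetical helper that imitates a 7-bit mail path,
# for the ISO Hebrew letter range only. Reads stdin, writes stdout.
strip_eighth_bit() {
    LC_ALL=C tr '\340-\372' '\140-\172'
}

# The four bytes are Shin, Lamed, Vav, Final Mem in ISO Hebrew.
printf '\371\354\345\355\n' | strip_eighth_bit    # prints: ylem
```

Once this has happened, a genuine English "ylem" and a damaged Hebrew word are
the same bytes, which is why mixed text cannot be repaired automatically.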
<P>
<H2><A NAME="ss2.5">2.5 Conversions</A>
</H2>
<P>Here are some simple scripts to convert from each standard to the other:
<BLOCKQUOTE><CODE>
<PRE>
DOS - ISO: tr '\200-\232' '\340-\372' &lt; {dos_file} > {iso_file}
ISO - DOS: tr '\340-\372' '\200-\232' &lt; {iso_file} > {dos_file}
OLD - DOS: tr '\140-\172' '\200-\232' &lt; {old_Hebrew_file} > {dos_file}
</PRE>
</CODE></BLOCKQUOTE>
<P>NOTE: The numbers used by <CODE>tr</CODE> are in octal! For example, \200 is 128 decimal and \340 is 224 decimal.
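<P>If you convert files often, the commands can be wrapped in small shell
functions that work as filters. The function names below are invented for this
sketch, and the Old PC Hebrew range is assumed to be octal 140-172, that is,
the ISO range with the eighth bit cleared.

```shell
# Hypothetical filter functions; each reads stdin and writes stdout.
# LC_ALL=C keeps tr working on single bytes regardless of the locale.
dos2iso() { LC_ALL=C tr '\200-\232' '\340-\372'; }
iso2dos() { LC_ALL=C tr '\340-\372' '\200-\232'; }
old2dos() { LC_ALL=C tr '\140-\172' '\200-\232'; }   # assumed Old PC range
old2iso() { LC_ALL=C tr '\140-\172' '\340-\372'; }   # assumed Old PC range

# Round trip: DOS to ISO and back should reproduce the input bytes.
# od prints the result in octal so that it can be checked by eye.
printf '\200\201\232 abc\n' | dos2iso | iso2dos | od -An -to1
```

Note that the two Old PC functions also convert any genuine English lowercase
letters in the input, so they are only safe on text that was purely Hebrew.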
<P>
<HR>
<A HREF="Hebrew-HOWTO-3.html">Next</A>
<A HREF="Hebrew-HOWTO-1.html">Previous</A>
<A HREF="Hebrew-HOWTO.html#toc2">Contents</A>
</BODY>
</HTML>