<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
<TITLE>The Hebrew HOWTO: Standards for representation of Hebrew characters</TITLE>
<LINK HREF="Hebrew-HOWTO-3.html" REL=next>
<LINK HREF="Hebrew-HOWTO-1.html" REL=previous>
<LINK HREF="Hebrew-HOWTO.html#toc2" REL=contents>
</HEAD>
<BODY>
<A HREF="Hebrew-HOWTO-3.html">Next</A>
<A HREF="Hebrew-HOWTO-1.html">Previous</A>
<A HREF="Hebrew-HOWTO.html#toc2">Contents</A>
<HR>
<H2><A NAME="s2">2. Standards for representation of Hebrew characters</A></H2>

<H2><A NAME="ss2.1">2.1 ASCII</A>
</H2>

<P>To make one thing clear, once and for all: there is no such thing as
8-bit ASCII. ASCII defines only 128 codes (0 to 127), so it is only 7 bits.
Any 8-bit code is not ASCII, but that doesn't mean it's not standard.
ISO-8859-8 is a standard, but it is not ASCII. Thanks!
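<P>A quick way to check whether a file is really ASCII is to delete every 7-bit
byte and count what is left. This is a minimal sketch, assuming a POSIX shell
with GNU or BSD <CODE>tr</CODE> and <CODE>wc</CODE>; the function name
<CODE>count_8bit</CODE> is hypothetical, made up for this example.
<BLOCKQUOTE><CODE>
<PRE>
# count_8bit (hypothetical helper): read standard input and print how many
# bytes have the eighth bit set, i.e. values 128-255 (octal 200-377).
# tr -d deletes every 7-bit byte (octal 000-177), so only 8-bit bytes survive.
count_8bit() {
    LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' '
}

printf 'plain English text\n' | count_8bit        # 0: pure 7-bit ASCII
printf 'ISO-8859-8 Aleph: \340\n' | count_8bit    # 1: not ASCII
</PRE>
</CODE></BLOCKQUOTE>
<P>A result of 0 means the text fits in 7 bits; anything else means it uses
some 8-bit code, such as DOS or ISO Hebrew.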
<P>

<H2><A NAME="ss2.2">2.2 DOS Hebrew</A>
</H2>

<P>Here the Hebrew letters start at 128 decimal for Aleph, so the encoding
requires 8 bits. This is the table in the hardware fonts on the video card
EPROM, and all of the DOS-based Hebrew editors (Qtext, HED, etc.) use it.
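<P>Since <CODE>tr</CODE> takes byte values in octal, it helps to see where the
DOS Hebrew range lands. This is a small sketch using the shell's
<CODE>printf</CODE>; <CODE>to_octal</CODE> is a hypothetical helper name, and
the upper bound 154 is read off the <CODE>'\200-\232'</CODE> range in the
conversion commands.
<BLOCKQUOTE><CODE>
<PRE>
# to_octal (hypothetical helper): print a decimal byte value as the
# backslash-octal escape that tr understands.
to_octal() {
    printf '\\%03o\n' "$1"
}

to_octal 128    # \200  DOS Aleph, the first Hebrew code
to_octal 154    # \232  DOS Tav, the 27th and last letter code
</PRE>
</CODE></BLOCKQUOTE>
<P>The 27 codes from <CODE>\200</CODE> to <CODE>\232</CODE> cover the 22
letters plus the 5 final forms.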
<P>

<H2><A NAME="ss2.3">2.3 ISO Hebrew</A>
</H2>

<P>This is ISO-8859-8, in which the Hebrew letters start at 224 decimal for
Aleph. It is the Internet standard and the international standard, and it is
basically also the standard for MS-Windows and for Macintoshes (Dagesh, etc.).
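<P>DOS and ISO Hebrew list the letters in the same order (that is what lets a
single <CODE>tr</CODE> range map one onto the other); only the starting point
differs, 128 versus 224, an offset of 96. This sketch prints both codes in
octal for each of the 27 letter positions; <CODE>hebrew_code_table</CODE> is a
hypothetical name used only for illustration.
<BLOCKQUOTE><CODE>
<PRE>
# hebrew_code_table (hypothetical helper): for each letter position 0-26,
# print its DOS and ISO Hebrew codes in octal.
hebrew_code_table() {
    i=0
    while [ "$i" -lt 27 ]; do
        printf '%2d  DOS %o  ISO %o\n' "$i" $((128 + i)) $((224 + i))
        i=$((i + 1))
    done
}

hebrew_code_table | head -n 1    # " 0  DOS 200  ISO 340"  (Aleph)
hebrew_code_table | tail -n 1    # "26  DOS 232  ISO 372"  (Tav)
</PRE>
</CODE></BLOCKQUOTE>
<P>The first and last lines match the ranges <CODE>'\200-\232'</CODE> and
<CODE>'\340-\372'</CODE> used in the conversion commands.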
<P>

<H2><A NAME="ss2.4">2.4 OLD PC Hebrew</A>
</H2>

<P>This is a 7-bit encoding, and it is obsolete because it occupies essentially
the same ASCII range as the English lowercase letters: Aleph is 96 decimal (the
backquote), so Bet through Tav land on <CODE>a</CODE> through <CODE>z</CODE>.
It is best avoided. However, when ISO Hebrew has its eighth bit stripped off
by some ignorant Unix mail program, this is exactly what you get: a jumble of
English letters for the Hebrew part of your message, with the regular English
(reversed or not) mixed in. You will then need to transform it to DOS or ISO
Hebrew. If there was English mixed in with the Hebrew, this is a sad
situation: the damaged Hebrew and the English lowercase letters now share the
same codes, so no conversion can tell them apart, and you will end up with
either Hebrew plus jumble, or English plus jumble...
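<P>The damage is easy to reproduce. This sketch imitates such a mail program
with <CODE>tr</CODE>, mapping bytes 128-255 onto 0-127 (which is what clearing
the eighth bit does), then tries to convert the result back to ISO Hebrew. The
function names are hypothetical, and the sample word Shalom (Shin, Lamed, Vav,
final Mem) is assumed to be stored in logical order, first letter first.
<BLOCKQUOTE><CODE>
<PRE>
# strip_8th_bit (hypothetical): imitate a mailer that clears the eighth bit.
strip_8th_bit() {
    LC_ALL=C tr '\200-\377' '\000-\177'
}

# old_to_iso (hypothetical): map 7-bit OLD Hebrew back up to ISO Hebrew.
old_to_iso() {
    LC_ALL=C tr '\140-\172' '\340-\372'
}

# "Shalom" in ISO Hebrew (octal 371 354 345 355), followed by English.
printf '\371\354\345\355 hello\n' | strip_8th_bit    # ylem hello

# Converting back also turns "hello" into Hebrew codes.
printf '\371\354\345\355 hello\n' | strip_8th_bit | old_to_iso | od -c
</PRE>
</CODE></BLOCKQUOTE>
<P>After stripping, the Hebrew word reads <CODE>ylem</CODE>, and the second
command shows why the mixed case is hopeless: <CODE>old_to_iso</CODE> cannot
skip <CODE>hello</CODE>, because it uses the same codes as the damaged Hebrew.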
<P>

<H2><A NAME="ss2.5">2.5 Conversions</A>
</H2>

<P>Here are some simple commands to convert from each standard to the other:
<BLOCKQUOTE><CODE>
<PRE>
DOS - ISO: tr '\200-\232' '\340-\372' < {dos_file} > {iso_file}
ISO - DOS: tr '\340-\372' '\200-\232' < {iso_file} > {dos_file}
OLD - DOS: tr '\140-\172' '\200-\232' < {old_Hebrew_file} > {dos_file}
OLD - ISO: tr '\140-\172' '\340-\372' < {old_Hebrew_file} > {iso_file}
</PRE>
</CODE></BLOCKQUOTE>
<P>NOTE: The numbers used by <CODE>tr</CODE> are in octal! For example,
<CODE>\200</CODE> is 128 decimal (DOS Aleph) and <CODE>\340</CODE> is 224
decimal (ISO Aleph).
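<P>One way to sanity-check the DOS and ISO commands is a round trip:
converting DOS to ISO and back should reproduce the input byte for byte, while
English text (all below 128) passes through untouched. This is a minimal
sketch; the wrapper functions and the sample text are hypothetical, and the
byte dumps are compared with <CODE>od</CODE>.
<BLOCKQUOTE><CODE>
<PRE>
# Hypothetical wrappers around the DOS - ISO and ISO - DOS commands.
dos_to_iso() { LC_ALL=C tr '\200-\232' '\340-\372'; }
iso_to_dos() { LC_ALL=C tr '\340-\372' '\200-\232'; }

# Sample text (made up): "Shalom" in DOS Hebrew (Shin, Lamed, Vav,
# final Mem) followed by the English word "shalom".
sample() { printf '\231\214\205\215 shalom'; }

before=$(sample | od -An -to1)
after=$(sample | dos_to_iso | iso_to_dos | od -An -to1)

if [ "$before" = "$after" ]; then
    echo "round trip OK"
else
    echo "round trip changed the data"
fi
</PRE>
</CODE></BLOCKQUOTE>
<P>The same check would fail for OLD Hebrew text that mixes in English
lowercase letters, for the reasons given in section 2.4.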
<P>
<HR>
<A HREF="Hebrew-HOWTO-3.html">Next</A>
<A HREF="Hebrew-HOWTO-1.html">Previous</A>
<A HREF="Hebrew-HOWTO.html#toc2">Contents</A>
</BODY>
</HTML>