214 lines
10 KiB
HTML
214 lines
10 KiB
HTML
|
<!--startcut ==========================================================-->
|
||
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
|
||
|
<HTML>
|
||
|
<HEAD>
|
||
|
<title>The DICT Project LG #33</title>
|
||
|
</HEAD>
|
||
|
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#A000A0"
|
||
|
ALINK="#FF0000">
|
||
|
<!--endcut ============================================================-->
|
||
|
|
||
|
<H4>
|
||
|
"Linux Gazette...<I>making Linux just a little more fun!</I>"
|
||
|
</H4>
|
||
|
|
||
|
<P> <HR> <P>
|
||
|
<!--===================================================================-->
|
||
|
|
||
|
<center>
|
||
|
<h1><font color="maroon">DICT and Word Inspector</font></h1>
|
||
|
<h4>By <a href="mailto: layers@marktwain.net">Larry Ayers</a></h4>
|
||
|
</center>
|
||
|
<P> <HR> <P>
|
||
|
|
||
|
<center><h3>Introduction</h3></center>
|
||
|
|
||
|
<p>Access to an on-line dictionary has been possible for several years now due
|
||
|
to the Webster TCPIP protocol. Webster is useful but the number of servers
|
||
|
has been on the decline, and the protocol itself is limited by its dependence
|
||
|
on a single dictionary database. Rik Faith, a programmer responsible for many
|
||
|
of the essential-but-taken-for-granted Linux utilities, has created a new,
|
||
|
more flexible protocol known as DICT. DICT is another TCPIP protocol (usable
|
||
|
either over a network or on a local machine) which provides access to any
|
||
|
number of dictionary databases. Local access is provided by a client program
|
||
|
called <i>dict</i> which contacts the <i>dictd</i> server daemon.
|
||
|
<i>Dictd</i> then searches the available databases and makes any hits
|
||
|
available to <i>dict</i>, which pipes its output to the default pager on the
|
||
|
local machine (usually either <i>more</i>, <i>less</i>, or <i>most</i>). Net
|
||
|
access is available from several servers, including the home
|
||
|
<a href="http://www.dict.org">DICT</a> site. Looking up words while on-line
|
||
|
frees the user from needing to install and run the <i>dictd</i> and
|
||
|
<i>dict</i> client and server programs (as well as having to make room for the
|
||
|
bulky databases on a local disk), but if you have the disk space it's
|
||
|
convenient to have the service available at any time.
|
||
|
|
||
|
<p>The <i>dictd</i> and <i>dict</i> programs are licensed under the GPL, so
|
||
|
naturally they are set up to use freely available word databases.
|
||
|
|
||
|
<center><h3>Installing The DICT Distribution Locally</h3></center>
|
||
|
|
||
|
<p>DICT is a typical Unix-style command-line set of programs. GUI-fans will
|
||
|
regret the absence of a graphical interface, but the glass is really
|
||
|
half-full. Due to the absence of oft-troublesome GUI toolkit dependencies,
|
||
|
the source for the client and server programs should compile easily. Toolkits
|
||
|
come and go, but applications written with a simple console interface can
|
||
|
easily be adapted to whatever the future toolkit du jour might be. There are
|
||
|
numerous programmers who lack the time or inclination to develop Linux
|
||
|
utilities from scratch, but welcome the opportunity to write GUI front-ends to
|
||
|
console programs (see the Word Inspector section below).
|
||
|
|
||
|
<p>
|
||
|
Compiling and installing <i>dictd</i> and <i>dict</i> isn't difficult, but to
|
||
|
make use of them the word databases need to be downloaded and installed. Here
|
||
|
is a list of the free databases which are currently available from the
|
||
|
<a href="ftp://ftp.dict.org/pub/dict/pre/">DICT</a> FTP site:
|
||
|
|
||
|
<ul>
|
||
|
<li>A 1913 edition of Webster's Revised Unabridged Dictionary
|
||
|
<li>The Free On-Line Dictionary of Computing
|
||
|
<li>Eric Raymond's Jargon File
|
||
|
<li>The WordNet database
|
||
|
<li>Easton's 1897 Bible Dictionary
|
||
|
<li>Hitchcock's Bible Names Dictionary
|
||
|
<li>The Elements (physical elements)
|
||
|
<li>U.S Gazetteer (1990)
|
||
|
<li>The 1995 CIA World Factbook
|
||
|
</ul>
|
||
|
|
||
|
<p>All of these files and their indices will occupy about thirty-one megabytes
|
||
|
of disk space, roughly the same amount as the WordNet dictionary files alone.
|
||
|
The DICT data-files are compressed with a variant of <i>gzip</i> called
|
||
|
<i>dictzip</i>, also written by Rik Faith. <i>Dictzip</i> adds extra header
|
||
|
information to a compressed file which allows pseudo-random access to the
|
||
|
file. When the <i>dictd</i> server processes a request for a word it looks
|
||
|
first in the various index files. These files (which are human-readable)
|
||
|
are just simple lists with the location of each word within the compressed
|
||
|
dictionary file. <i>Dictd</i> is able to use this information to uncompress
|
||
|
just the single 64-kb. block of data which contains the word-entry. This
|
||
|
greatly speeds up access, as the entire dictionary file doesn't need to be
|
||
|
uncompressed and subsequently re-compressed for each transaction. Files
|
||
|
compressed with <i>dictzip</i> can be recognized by the <i>*.dz</i> suffix.
|
||
|
|
||
|
<p>Although <i>dictzip</i> doesn't compress quite as tightly as <i>gzip</i>,
|
||
|
the added advantage of the header information (at least for the sort of access
|
||
|
<i>dictd</i> needs) is a compensation. The above-listed dictionary files
|
||
|
would need nearly seventy-five megabytes of disk space if they weren't
|
||
|
compressed.
|
||
|
|
||
|
<center><h3>Comparison With WordNet</h3></center>
|
||
|
|
||
|
<p>In issue 27 of the Gazette, (April, 1998) I wrote about another
|
||
|
dictionary-database system called WordNet. In order to access a DICT database
|
||
|
the <i>dict</i> server must be running which communicates with <i>dict</i>
|
||
|
client programs, whereas WordNet isn't a client-server program; the small
|
||
|
<i>wn</i> program searches the database indices directly. The upshot is that
|
||
|
WordNet uses less memory than a DICT system, but since WordNet databases
|
||
|
aren't compressed they occupy more disk space than the specially compressed
|
||
|
DICT files. DICT files contain more words (along with etymologies, which
|
||
|
WordNet lacks) and can be supplemented with new files in the future, but DICT
|
||
|
lacks WordNet's powerful thesaurus and lexical usage capabilities. Another
|
||
|
factor to consider is that development of WordNet has ceased, whereas DICT is
|
||
|
still being improved and the chances of its continued development seem likely.
|
||
|
Additionally, DICT can use the WordNet data-files in a compressed format.
|
||
|
|
||
|
<center><h3>Configuration</h3></center>
|
||
|
|
||
|
<p>Sample configuration files are included with the DICT distribution. The
|
||
|
file <kbd>/etc/dictd.conf</kbd> should contain the location of your local
|
||
|
dictionary files in this format:<br>
|
||
|
|
||
|
<pre><kbd>
|
||
|
database web1913 { data "/mt/dict/web1913.dict.dz"
|
||
|
index "/mt/dict/web1913.index" }
|
||
|
database jargon { data "/mt/dict/jargon.dict.dz"
|
||
|
index "/mt/dict/jargon.index" }
|
||
|
</kbd></pre>
|
||
|
|
||
|
<p>The <i>dict</i> client needs to know where the server is; if a local
|
||
|
server is used a simple <kbd>~/.dictrc</kbd> file containing this line will
|
||
|
work:<br>
|
||
|
|
||
|
<pre><kbd>
|
||
|
server localhost
|
||
|
</kbd></pre>
|
||
|
|
||
|
<p>If both <kbd>~.dictrc</kbd> and <kbd>/etc/dict.conf</kbd> are missing the
|
||
|
<i>dict </i> client program will first attempt to access the www.dict.org
|
||
|
web-server; if that fails it will try some alternate sites. To prevent these
|
||
|
attempts (when running a local <i>dictd</i> server) just use the above
|
||
|
<kbd>~/.dictrc</kbd> file.
|
||
|
|
||
|
<center><h3>Drawbacks</h3></center>
|
||
|
|
||
|
<p><i>Dictd</i> might not be a service which you would want to run all of the
|
||
|
time. Though not a large executable, it uses a significant amount of memory,
|
||
|
typically four to five megabytes. I surmise that the daemon reads the
|
||
|
dictionary index-files into memory when it starts up and keeps them there.
|
||
|
This premise also would explain why the word look-ups are so speedy. Memory
|
||
|
access is much faster than disk access, and once the daemon determines from
|
||
|
the index which sixty-four kilobyte block holds the desired information it can
|
||
|
quickly decompress that small chunk of the dictionary file and serve up the
|
||
|
word information. I've found that starting <i>dictd</i> while writing or
|
||
|
whenever I become curious about word-usage and killing the daemon at other
|
||
|
times works well.
|
||
|
|
||
|
|
||
|
<center><h3>Word Inspector</h3></center>
|
||
|
|
||
|
<p>Scott W. Gifford has written a nice graphical front-end to the <i>dict</i>
|
||
|
client program called Word Inspector. Here's a screenshot of the initial
|
||
|
window:<br>
|
||
|
|
||
|
<p><img alt="Word Inspector Main Window" src="./gx/ayers/inspect1.gif">
|
||
|
|
||
|
<p>And here is one showing the output window:<br>
|
||
|
|
||
|
<p><img alt="Word Inspector Output Window" src="./gx/ayers/inspect2.gif">
|
||
|
|
||
|
<p>In the README file accompanying Word Inspector Scott Gifford suggests
|
||
|
setting up the application with several different window-manager menu-items.
|
||
|
Running <kbd>wordinspect --define --clipboard</kbd> will bring up a Word
|
||
|
Inspector output window (as shown in the second screenshot) with the contents
|
||
|
of the current X primary selection as the input. Alternatively,
|
||
|
<kbd>wordinspect --search --clipboard</kbd> will cause the initial window to
|
||
|
appear with the X primary selection already shown in the entry field, and
|
||
|
running just plain <kbd>wordinspect</kbd> will bring up an empty initial
|
||
|
window, so that a word can be typed in which isn't a mouse-selection.
|
||
|
These three commands could be set up in a submenu stemming from a top-level
|
||
|
Word Inspector menu-item.
|
||
|
|
||
|
<p>Word Inspector makes good use of right-mouse-button pop-up menus.
|
||
|
Right-clicking on any word in a definition pops up a menu allowing you to
|
||
|
either open a search (initial) window with the selected word already filled
|
||
|
in, or open a definition window for the word. Highlighting a series of words
|
||
|
with the mouse, then right-clicking, will enable the same behavior for
|
||
|
phrases.
|
||
|
|
||
|
<p>The source of the current version of Word Inspector is this
|
||
|
<a href="http://www.tir.com/~sgifford/wordinspect/">web-site</a>. The GTK
|
||
|
toolkit is required for compilation, with version 1.06 recommended.
|
||
|
|
||
|
<!-- hhmts start -->
|
||
|
Last modified: Mon 28 Sep 1998
|
||
|
<!-- hhmts end -->
|
||
|
|
||
|
<!--===================================================================-->
|
||
|
<P> <hr> <P>
|
||
|
<center><H5>Copyright © 1998, Larry Ayers <BR>
|
||
|
Published in Issue 33 of <i>Linux Gazette</i>, October 1998</H5></center>
|
||
|
|
||
|
<!--===================================================================-->
|
||
|
<P> <hr> <P>
|
||
|
<A HREF="./index.html"><IMG ALIGN=BOTTOM SRC="../gx/indexnew.gif"
|
||
|
ALT="[ TABLE OF CONTENTS ]"></A>
|
||
|
<A HREF="../index.html"><IMG ALIGN=BOTTOM SRC="../gx/homenew.gif"
|
||
|
ALT="[ FRONT PAGE ]"></A>
|
||
|
<A HREF="./jenkins2.html"><IMG SRC="../gx/back2.gif"
|
||
|
ALT=" Back "></A>
|
||
|
<A HREF="./ayers2.html"><IMG SRC="../gx/fwd.gif" ALT=" Next "></A>
|
||
|
<P> <hr> <P>
|
||
|
<!--startcut ==========================================================-->
|
||
|
</BODY>
|
||
|
</HTML>
|
||
|
<!--endcut ============================================================-->
|