260 lines
10 KiB
HTML
260 lines
10 KiB
HTML
<!--startcut ==========================================================-->
|
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
|
|
<HTML>
|
|
<HEAD>
|
|
<title>A Dictionary-Thesaurus for Linux</title>
|
|
</head>
|
|
|
|
<body bgcolor="#FFFFFF">
|
|
|
|
<!--endcut ============================================================-->
|
|
|
|
<H4>
|
|
"Linux Gazette...<I>making Linux just a little more fun!</I>"
|
|
</H4>
|
|
|
|
<P> <HR> <P>
|
|
<!--===================================================================-->
|
|
|
|
<center><img alt="WordNet logo" src="./gx/ayers/wn.gif"></center>
|
|
<hr>
|
|
<center><h1>WordNet: A Fast and Flexible Word Database</h1></center>
|
|
|
|
<center>
|
|
<h4>By <a href="mailto: layers@marktwain.net">Larry Ayers</a></h4>
|
|
</center>
|
|
<hr>
|
|
|
|
<center><h3>Introduction</h3></center>
|
|
|
|
<p>There have been times when I've wished I had dictionary program with a
|
|
Linux interface. One of my favorite print dictionaries is the American
|
|
Heritage Dictionary, which I've used and appreciated for many years.
|
|
<a href="http://www.dux.com/ahdictux.html">Dux Software</a>
|
|
has been offering a Linux version of their computer interface to the
|
|
dictionary (combined with a thesaurus) for some time now, but I was deterred
|
|
by the $49.95 price-tag, considering the existence of a perfectly usable print
|
|
copy of the book, sitting on a shelf not six feet from where I'm typing this.
|
|
But there are powers which the computer-based dictionary possesses which can
|
|
be quite useful. A computer excels at searching for information in a
|
|
database, and combined with the power of regular expressions a digital
|
|
dictionary has significant advantages.
|
|
|
|
<p>The only free digital dictionary I've come across was an old edition of
|
|
Webster's, available from Project Gutenberg in the form of two large text
|
|
files. These could be searched for a word with <em>grep</em>, but I was
|
|
looking for something with an X interface; <em>grep</em> would also find
|
|
instances of words used within a definition, which would clutter up the
|
|
output. Of course there are on-line WWW dictionaries, which are fine for
|
|
people who are on-line most of the time. Users accessing the net via a
|
|
dial-up connection with an ISP are unlikely to be online while writing text
|
|
for which a dictionary would be needful. I happened across a usenet posting
|
|
recently which led me to to this <a
|
|
href="http://www.cogsci.princeton.edu/~wn/">site</a>, and before long I was
|
|
downloading a 13 mb. archive containing a dictionary/thesaurus called WordNet.
|
|
|
|
<center><h3>WordNet Basics</h3></center>
|
|
|
|
<p>The usenet announcement of the most recent WordNet release contained a
|
|
good description of the package:<br>
|
|
|
|
<blockquote>
|
|
WordNet is a powerful lexical reference system that combines aspects of
|
|
dictionaries and thesauri with current psycholinguistic theories of human
|
|
lexical memory. It is produced by the Cognitive Science Laboratory at
|
|
Princeton University, under the direction of Professor George Miller. In
|
|
WordNet, words are defined and grouped into various related sets of
|
|
synonyms. Not only is the system valuable to the casual user as a powerful
|
|
thesaurus and dictionary, but also to the researcher as one of the few
|
|
freely available, lexical databases. WordNet is available via an on-line
|
|
interface and also as easy-to-compile C source code for Unix.
|
|
</blockquote>
|
|
|
|
<p>WordNet consists of interlinked databases of words, synonyms, antonyms,
|
|
and usage examples. In the best unix tradition, this data can be manually
|
|
accessed via the command-line. This makes it relatively easy to create
|
|
script-based interfaces which can simplify the usage of the tool and provide a
|
|
windowed, menu-driven front-end. The distribution contains the source code
|
|
for the basic utilities and a Tcl/Tk interface, as well as statically linked
|
|
binaries and the database files.
|
|
|
|
<p>One difference between WordNet and a traditional dictionary is the lack of
|
|
etymologies, a feature typically used much less often than the simple display
|
|
of meaning and syntax. The inclusion of thesaurus-like features more than
|
|
makes up for this lack.
|
|
|
|
<p>A full WordNet installation, consisting of the data-files and the
|
|
command-line and statically-linked executables, occupies more than thirty
|
|
megabytes of disk space. This is an ideal job for the
|
|
<a href="http://e2compr.memalpha.cx/e2compr/">e2compr</a>
|
|
kernel-level transparent file-compression system; I compressed the database
|
|
directory and reduced it from thirty megabytes to eleven and one-half, with no
|
|
noticeable speed penalty. See LG #18 (June 1997) for an introduction to
|
|
<b>e2compr</b>.
|
|
|
|
<center><h3>Examples</h3></center>
|
|
|
|
<p>Here are a few examples of command-line use of WordNet:<br>
|
|
|
|
<p><kbd><pre>%->wn gazette -over
|
|
Overview of noun gazette
|
|
|
|
The noun gazette has 1 sense (no senses from tagged texts)
|
|
|
|
1. gazette -- (a newspaper)
|
|
|
|
Overview of verb gazette
|
|
|
|
The verb gazette has 1 sense (no senses from tagged texts)
|
|
|
|
1. gazette -- (publish in a gazette)</pre></kbd>
|
|
|
|
<p><b>wn</b> is the command-line search tool, and the switch <b>-over</b>
|
|
shows an overview of meaning and parts of speech the word can have.
|
|
|
|
<p><kbd><pre>%->wn gaz -grepn
|
|
|
|
Grep of noun gaz
|
|
gaza strip
|
|
gazania
|
|
gazania rigens
|
|
gaze
|
|
gazebo
|
|
gazella
|
|
gazella subgutturosa
|
|
gazella thomsoni
|
|
gazelle
|
|
gazelle hound
|
|
gazette
|
|
gazetteer
|
|
</pre></kbd>
|
|
|
|
<p>The switch <b>-grepn</b> searches the noun database for any noun containing
|
|
the string <i>gaz</i>; there are variants of this switch: <b>-grepv</b>,
|
|
<b>-grepa</b>, and <b>-grepr</b>, which respectively search for verbs,
|
|
adjectives, and adverbs. The various <b>grep</b> switches can be used to
|
|
determine the correct spelling of a word when you are certain of the spelling
|
|
of only a syllable or portion of the word.
|
|
|
|
<p><pre><kbd>
|
|
%->wn quell -framv
|
|
|
|
Sample Sentences of verb quell
|
|
|
|
2 senses of quell
|
|
|
|
Sense 1
|
|
squelch, quell
|
|
*> Somebody ----s something
|
|
|
|
Sense 2
|
|
quell, stay, appease
|
|
*> Something ----s
|
|
*> Something ----s somebody
|
|
</kbd></pre>
|
|
|
|
<p>The <b>-framv</b> switch used above shows how the word is used in
|
|
sentences.
|
|
|
|
<p><pre><kbd>
|
|
%->wn quell -simsv
|
|
|
|
Synonyms (Grouped by Similarity of Meaning) of verb quell
|
|
|
|
Sense 1
|
|
squelch, quell
|
|
=> suppress, stamp down, inhibit, subdue, conquer, curb
|
|
--------------
|
|
|
|
Sense 2
|
|
quell, stay, appease
|
|
=> meet, satisfy, fill, fulfill
|
|
--------------
|
|
</kbd></pre>
|
|
|
|
<p>The <b>-simsv</b> switch shows verb synonyms, and a variant <b>-simsn</b>
|
|
lists the noun synonyms of a word.There are a plethora of other <b>wn</b>
|
|
switches for finding antonyms, homonyms, and several other more obscure
|
|
lexical types, many of which have easier-to-use equivalents in the Tcl/Tk
|
|
windowed interface, <b>wnb</b>, which stands for WordNet Browser.
|
|
|
|
<center><h3>Using The Browser</h3></center>
|
|
|
|
<p>Here are screenshots of the browser window and a subsidiary sub-string
|
|
window, which takes the place of the <b>-grep[nvar]</b> switches used with
|
|
<b>wn</b>.<br>
|
|
|
|
<p><img alt="Main wnb window" src="./gx/ayers/wnb.gif">
|
|
|
|
<p><img alt="Substring (grep) window" src="./gx/ayers/grep.gif">
|
|
|
|
<p>This is a convenient and easy-to-use interface, with all functions
|
|
available from the menus. The output, though, isn't wrapped to fit the
|
|
screen, so to avoid having to scroll sideways to see it all the window should
|
|
be resized so that it is wider horizontally. You might be tempted (as I was)
|
|
to try compiling the source code, so that the <b>wnb</b> executable will use
|
|
your own Tcl/Tk libraries rather than the bulky statically-linked
|
|
libraries compiled into the supplied executable file. Unless you happen to
|
|
have the particular patch-level of Tcl-7.6 and Tk-4.2 which the source needs,
|
|
it probably won't compile (at least it wouldn't for me). If the <b>wnb</b>
|
|
interface was just a Tk script, it wouldn't be a big job to modify it so that
|
|
it uses a particular Tcl/Tk installation, but <b>wnb</b> has its own
|
|
specialized wish interpreter, which complicates updating the source for a
|
|
newer version of Tcl/Tk. Since the supplied Tcl/Tk interface is just a
|
|
convenient way of viewing the output from <b>wn</b>, perhaps a GTK, Qt, or
|
|
Emacs-LISP interface could be coded; this would make a welcome addition to the
|
|
KDE and GNOME projects. I've found that a handy way to run <b>wn</b> is in a
|
|
separate wide-and-short XEmacs shell-buffer frame.
|
|
|
|
<center><h3>Documentation and License Issues</h3></center>
|
|
|
|
<p>The documentation supplied with the distribution is complete and clearly
|
|
written; it's all an end-user should need. HTML, Postscript, and man-page
|
|
formats are included to cater to various reading preferences. If you are
|
|
curious about the psycholinguistic theoretical underpinnings of the project, a
|
|
Postscript file (5papers.ps) is available from the web-site.
|
|
|
|
<p>While writing this article I happened to be paging through the introductory
|
|
essays in the American Heritage dictionary. One of these essays was written
|
|
by one of the linguists responsible for the work which inspired the WordNet
|
|
project, Henry Kucera. It's called <cite>Computers in Language Analysis and
|
|
Lexicography</cite>, and it's a more general (though dated) overview of
|
|
psycholinguistics than the above mentioned collection of papers. If you're
|
|
wondering just what in the world the "Browne Corpus" is (mentioned on the
|
|
WordNet web-site), this essay explains it clearly.
|
|
|
|
<p>WordNet isn't licensed under the GPL, but the license isn't very
|
|
restrictive at all. The utilities and programs needed to create the word
|
|
databases are not distributed, but the supplied files are sufficient for most
|
|
needs.
|
|
|
|
<center><h3>FTP Sites</h3></center>
|
|
|
|
<p>WordNet can be obtained from its home
|
|
<a href="ftp://ftp.cogsci.princeton.edu/pub/wordnet">site</a>, but this is a
|
|
really slow site, and I had better luck obtaining the archive from this
|
|
<a href="ftp://ftp.ims.uni-stuttgart.de/pub/WordNet">mirror</a> site in
|
|
Germany. As useful as this package is, it really should be mirrored
|
|
elsewhere as well.
|
|
|
|
|
|
<!--===================================================================-->
|
|
<P> <hr> <P>
|
|
<center><H5>Copyright © 1998, Larry Ayers <BR>
|
|
Published in Issue 27 of <i>Linux Gazette</i>, April 1998</H5></center>
|
|
|
|
<!--===================================================================-->
|
|
<P> <hr> <P>
|
|
<A HREF="./index.html"><IMG ALIGN=BOTTOM SRC="../gx/indexnew.gif"
|
|
ALT="[ TABLE OF CONTENTS ]"></A>
|
|
<A HREF="../index.html"><IMG ALIGN=BOTTOM SRC="../gx/homenew.gif"
|
|
ALT="[ FRONT PAGE ]"></A>
|
|
<A HREF="./ayers2.html"><IMG SRC="../gx/back2.gif"
|
|
ALT=" Back "></A>
|
|
<A HREF="./ayers4.html"><IMG SRC="../gx/fwd.gif" ALT=" Next "></A>
|
|
<P> <hr> <P>
|
|
<!--startcut ==========================================================-->
|
|
</BODY>
|
|
</HTML>
|
|
<!--endcut ============================================================-->
|