<!--startcut ==========================================================-->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<title>How Not To Re-Invent The Wheel LG #34</title>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#A000A0"
ALINK="#FF0000">
<!--endcut ============================================================-->

<H4>
"Linux Gazette...<I>making Linux just a little more fun!</I>"
</H4>

<P> <HR> <P>
<!--===================================================================-->

<center><h1>
<font color="maroon">How Not To Re-Invent The Wheel</font>
</h1></center>
<center>
<h4>By <a href="mailto:layers@marktwain.net">Larry Ayers</a></h4>
</center>
<p><hr><p>

<center><h3>Introduction</h3></center>

<p>With all of the excitement lately about various software firms planning Linux
ports of their products, it's easy to lose sight of the great power and
versatility of the unsung small utilities which are a part of every Linux
distribution. These tools, mostly GNU versions of small programs such as
<i>awk</i>, <i>grep</i> and <i>sed</i>, date back to the early pioneer days of
Unix and have been in wide use ever since. They typically have
specialized capabilities and become especially useful when they are chained
together and data is piped from one to another. Often a shell script serves
as the matrix in which they do their work.

<p>Sometimes a piece of software native to another operating system is ported
to Linux as an independent unit without taking advantage of pre-existing tools
which might have reduced both the size of the program and its memory usage.
It's always a pleasure to happen upon software written with an awareness of
the power of Linux and its native utilities. <i>Bu</i> is a backup program
and <i>NoSQL</i> is an ASCII-table relational database system; what they have
in common is their use of simple but effective Linux tools to accomplish
their respective tasks.
<hr>

<center><h3>
<font color="maroon">Shell-script Backups With bu</font>
</h3></center>

<p>Making a backup of the myriad files on a Linux system isn't necessary
for most stand-alone single-user machines. Backing up configuration and
personal files to floppies or other removable media is normally all that is
necessary, assuming that a recent Linux distribution CD and a CDROM drive are
available. The situation becomes more complex with multi-user servers or with
machines used in a business setting, where the sheer number of irreplaceable
files makes this simple method impractical and time-consuming; in these cases
the traditional method in the Unix world has been to use <i>cpio</i> or
another archiving utility to copy files to a tape drive. Though the price of
hard disks has plummeted in recent years while their capacity has ballooned,
reliable tape drives capable of storing the vast amounts of data a modern
hard disk can hold are still quite expensive, sometimes rivalling the cost of
the computer they protect from loss of data.

<p>Vincent Stemen has developed a small backup utility called <i>bu</i> which
is shell-based and makes good use of standard Linux utilities such as
<i>cp</i> and <i>sed</i>. Rather than being intended for
backups to tape or another streaming device, <i>bu</i> is designed to mirror
files on another file-system, preferably located on a separate hard drive.

<p><i>Bu</i> is just a twelve-kilobyte shell script along with a few
configuration files. It's remarkably capable; compare this list of features
with those of other backup utilities:<br>

<ul>
<li>Checks timestamps and only copies new or changed files
<li>Deals with symbolic links intelligently
<li>Writes a log-file upon completion
<li>Will ignore directories which are mounted filesystems
<li>Easy specification of files and directories to include or exclude
</ul>

<p><i>Bu</i> in its earlier versions used <i>cpio</i> extensively, but due to
a problem with new directory permissions <i>cp</i> is the main engine of the
utility now. Used by itself, <i>cp -a</i> can bulk-copy entire
filesystems to a new location, but the symbolic links would have to be dealt
with manually, which is time-consuming. Also missing would be the ability to
automatically include and exclude specific files and directories; <i>bu</i>
refers to two configuration files, <kbd>/usr/local/backups/Exclude</kbd> and
<kbd>/usr/local/backups/Include</kbd>, for this information.
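
<p>To give a feel for how little shell is needed for this kind of mirroring,
here is a minimal sketch. It is not <i>bu</i>'s actual code: the function
name <kbd>mirror_tree</kbd>, its three arguments and the exclude file
holding one <i>grep</i> pattern per line are all assumptions made for
illustration. It relies on the GNU <i>cp</i> options <kbd>-u</kbd> and
<kbd>--parents</kbd>, copies regular files only (symbolic links are left
out), and expects file names without embedded newlines.

```shell
#!/bin/sh
# mirror_tree SRC DEST EXCLUDE_FILE
# Copy regular files under SRC into DEST, skipping every path that matches
# a pattern listed (one per line) in EXCLUDE_FILE.
# Hypothetical helper for illustration only; needs GNU cp (-u, --parents).
mirror_tree() {
    src=$1
    dest=$2
    exclude=$3

    mkdir -p "$dest" || return 1
    # Resolve DEST and EXCLUDE_FILE to absolute paths before changing directory.
    dest=$(cd "$dest" && pwd) || return 1
    case $exclude in
        /*) ;;
        *) exclude=$PWD/$exclude ;;
    esac

    (
        cd "$src" || exit 1
        # -xdev keeps find from descending into other mounted filesystems.
        find . -xdev -type f -print |
            grep -v -f "$exclude" |
            while IFS= read -r path; do
                # -u: copy only if the source is newer than the copy in DEST
                #     (or no copy exists yet)
                # -p: preserve mode, ownership and timestamps
                # --parents: recreate the directory part of the path in DEST
                cp -u -p --parents "$path" "$dest"/ || exit 1
            done
    )
}

# Example call (paths are placeholders):
#   mirror_tree /home /mnt/mirror/home /usr/local/backups/Exclude
```

<p>Because <kbd>-p</kbd> preserves timestamps, a second run finds nothing
newer than the mirror and copies only what has changed in the meantime.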

<p>This small and handy utility isn't intended to completely supplant
traditional tape-drive backup systems, but its author has been using <i>bu</i>
as the basis of a backup strategy involving several development machines and
several gigabytes of data. <i>Bu</i> can be obtained from this
<a href="http://www.crel.com/bu/">web-page</a>; be sure to read the white
paper included in the distribution, which details the rationale behind the utility.
<hr>

<center><h3>
<font color="maroon">The NoSQL Relational Database</font>
</h3></center>

<p>Carlo Strozzi (a member of the Italian Linux society) has developed a
relational database management system (RDBMS) which uses tab-delimited ASCII
tables as its data format. <i>NoSQL</i> is a descendant of an RDBMS called
<i>RDB</i>, developed by Walter W. Hobbs of the RAND Corporation. The
commercial product <i>/rdb</i> sold by Revolutionary Software is similar, but
uses more compiled C code for greater speed.

<p>Carlo Strozzi had this to say about his motivation for developing
<i>NoSQL</i> (excerpted from the documentation):

<blockquote><pre>
Several times I have found myself writing applications that
needed to rely upon simple database management tasks. Most
commercial database products are often too costly and too
feature-packed to encourage casual use. There are also plenty of
good freeware databases around, but they too tend to provide far
more that I need most of the times, and they too lack the
shell-level approach of NoSQL. Admittedly, having been written
with interpretive languages (Shell, Perl, AWK), NoSQL is not the
fastest DBMS of all, at least not always (a lot depends on the
application).
</pre></blockquote>

<p>The philosophy behind these database systems is well-expressed in an
article titled <cite>A 4GL Language</cite>, which was written by Evan
Schaffer and Mike Wolf, founders of Revolutionary Software. The paper
originally appeared in the March 1991 issue of <i>Unix Review</i>; a PostScript
version is included with the <i>NoSQL</i> documentation. Here is the
abstract:

<blockquote><pre>
There are many database systems available for UNIX. But almost
all are software prisons that you must get into and leave the
power of UNIX behind. Most were developed on operating systems
other than UNIX. Consequently their developers had very few
software features to build upon, and wrote the functionality they
needed directly, without regard for the features provided by the
operating system. The resulting database systems are large,
complex programs which degrade total system performance,
especially when they are run in a multi-user environment.

UNIX provides hundreds of programs that can be piped together to
easily perform almost any function imaginable. Nothing comes
close to providing the functions that come standard with
UNIX. Programs and philosophies carried over from other systems
put walls between the user and UNIX, and the power of UNIX is
thrown away.

The shell, extended with a few relational operators, is the
fourth generation language most appropriate to the UNIX
environment.
</pre></blockquote>

<p>The complete article is well worth reading for anyone who has ever wondered
just why Linux software is different from that used with mainstream operating
systems, and why GUI software has only recently begun to become common.

<p><i>NoSQL</i> incorporates the ideas presented above. A major
difference between Walter W. Hobbs' <i>RDB</i> database and <i>NoSQL</i> is
that <i>NoSQL</i> uses <i>awk</i> extensively to perform tasks handled by
<i>perl</i> in <i>RDB</i>. <i>Awk</i> is a more specialized tool with a much
smaller memory footprint, and since the data-pipelining which is the essence
of these relational database management systems requires repeated invocation
of their respective interpreters, <i>NoSQL</i> exerts less of a strain
on a system's resources, which is especially important in a multi-user environment.

<p>After installing the package (no compilation is involved) a new
subdirectory under <kbd>/usr/local/lib</kbd> called <kbd>nosql</kbd> will be
created and populated; it will have these subdirectories:

<dl>
<dt><kbd>awk</kbd>
<dd>contains several <i>awk</i> scripts which are
responsible for most of the table-manipulation jobs
<p><dt><kbd>doc</kbd>
<dd>contains both PostScript and HTML versions of the
readable and complete <i>NoSQL</i> documentation, as well as a
PostScript version of the Schaffer and Wolf article from <i>Unix Review</i>
<p><dt><kbd>mylib</kbd>
<dd>an empty directory for new scripts and programs
<p><dt><kbd>perl</kbd>
<dd><i>perl</i> scripts which perform other <i>NoSQL</i> functions
<p><dt><kbd>sh</kbd>
<dd>shell scripts which act as wrappers for the <i>awk</i>
and <i>perl</i> scripts.
</dl>

<p>The entire subdirectory occupies just under 600 KB, most of which is
documentation.

<p>After installing the files, the only other step needed before trying out
the database is setting three environment variables. Here are three lines
from my <kbd>.zshenv</kbd> file (<i>bash</i> users should have these lines in
the <kbd>.bash_profile</kbd> file):<br>

<pre><kbd>
export NSQLIB=/usr/local/lib/nosql
export NSQSH=/bin/ash
export NSQAWK=/usr/bin/mawk
</kbd></pre>

<p>Carlo Strozzi recommends using <i>ash</i> rather than one of the larger and
more powerful shells such as <i>bash</i> or <i>zsh</i>; <i>ash</i> uses less
memory, and since the shell is repeatedly invoked while using <i>NoSQL</i> the
upshot will be a noticeable increase in speed and a reduction in memory
requirements.

<p>Since there is no compiled code in the package, <i>NoSQL</i> should run on
any machine which has <i>awk</i> and <i>perl</i> available; in other words
the database isn't Linux-centric. The ASCII format of the data tables is also
very portable, and can be manipulated by text editors and common filesystem
tools. Data can be extracted from tables by means of various "operators"
via input-output redirection (e.g., pipes, STDIN and STDOUT). The only limits
on the amount of data which can be handled are in the machine running
<i>NoSQL</i>; the installed memory and processor speed are the limiting
factors.

<p>As the name implies, this is not an SQL database, which should make
<i>NoSQL</i> more accessible to users lacking SQL expertise. I don't know SQL
at all and I found the basic commands of <i>NoSQL</i> easy to learn. All
commands are executed as parameters of the <kbd>nosql</kbd> shell script.
Here's an example <i>NoSQL</i> table:<br>

<blockquote><pre>
Name       Freq  Height  Season
----       ----  ------  ------
laccaria   27    6       Fall
lepiota    5     8       Summer
amanita    42    7       Summer
lentinus   85    5       Spring-Fall
morchella  45    6       Spring
boletus    65    5       Summer
russula    75    4       Summer
</pre></blockquote>

<p>Single tabs must separate the fields from each other; even the gaps
between the groups of dashes on the dashed separator line must be single tabs.
An alternate format for the tabular data is the list; the above table can be
converted to this format with the command<br>

<p><kbd>nosql tabletolist &lt; [filename]</kbd>

<p>The results look like this:<br>

<pre>

Name    laccaria
Freq    27
Height  6
Season  Fall

Name    lepiota
Freq    5
Height  8
Season  Summer

Name    amanita
Freq    42
Height  7
Season  Summer

Name    lentinus
Freq    85
Height  5
Season  Spring-Fall

Name    morchella
Freq    45
Height  6
Season  Spring

Name    boletus
Freq    65
Height  5
Season  Summer

Name    russula
Freq    75
Height  4
Season  Summer

</pre>
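
<p>The table-to-list conversion is simple enough to imitate with a few lines
of <i>awk</i>. The sketch below is not the script shipped with <i>NoSQL</i>;
the function name <kbd>table_to_list</kbd> is made up for illustration, and
it assumes the input really is tab-delimited with a header line followed by
a dashed line.

```shell
#!/bin/sh
# table_to_list: read a tab-delimited table (header line, dashed line, then
# data rows) on standard input and print one "name<TAB>value" line per
# field, with a blank line in front of every record.
# Hypothetical stand-in for illustration, not NoSQL's own script.
table_to_list() {
    awk -F '\t' '
        NR == 1 {                       # remember the column names
            for (i = 1; i <= NF; i++) name[i] = $i
            ncol = NF
            next
        }
        NR == 2 { next }                # skip the dashed separator line
        {
            print ""                    # a blank line starts each record
            for (i = 1; i <= ncol; i++) print name[i] "\t" $i
        }
    '
}

# Example call:
#   table_to_list < pilze.rdb
```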

<p>If the above table were named <kbd>pilze.rdb</kbd>, either the command<br>

<p><kbd>nosql istable &lt; pilze.rdb</kbd>

<p>or

<p><kbd>nosql islist &lt; pilze.rdb</kbd>

<p>would ask <kbd>nosql</kbd> to check the validity of the file's format,
as a table or as a list respectively. Another command,<br>

<p><kbd>nosql edit &lt; pilze.rdb</kbd><br>

<p>will open the file in the editor defined by the EDITOR environment variable
(often set to <i>vi</i> by default). A file in table format is automatically
converted into the vertical list format for easier editing, then changed back
into a table when exiting the editor. When the file is saved or closed
<i>NoSQL</i> will automatically check the validity of the format and give the
line numbers where any errors occur. This seemingly obsessive concern with
correct format isn't mere pedantry; the various <i>NoSQL</i> operators which
manipulate and extract data need to be able to quickly distinguish headers
from data and data-fields from each other, and single tabs are the criterion.
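
<p>A format check of this kind can be approximated in plain <i>awk</i>. The
following is only a sketch of the idea, not the check <i>NoSQL</i> itself
performs: the function name <kbd>check_table</kbd> and the wording of its
messages are invented for the example. It verifies that the second line
consists of dashes divided by single tabs and that every line has as many
tab-separated fields as the header.

```shell
#!/bin/sh
# check_table: read a table on standard input, report malformed lines by
# number, and return a non-zero status if any were found.
# Hypothetical helper for illustration, not part of NoSQL.
check_table() {
    awk -F '\t' '
        NR == 1 { ncol = NF; next }     # the header fixes the column count
        NR == 2 && $0 !~ /^-+(\t-+)*$/ {
            printf "line 2: separator must be dashes divided by single tabs\n"
            bad = 1
        }
        NF != ncol {
            printf "line %d: expected %d fields, found %d\n", NR, ncol, NF
            bad = 1
        }
        END { exit bad + 0 }
    '
}

# Example call:
#   check_table < pilze.rdb && echo "format looks valid"
```

<p>A row in which a space was typed instead of a tab shows up as a line with
too few fields, which is exactly the mistake that is hardest to spot by eye.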

<p>There are over forty operator functions available, some of which extract or
rearrange fields while others are used to generate reports. Their names are
more-or-less mnemonic, such as <i>inscol</i> and <i>addcol</i>, which are used
to insert a column into a table, respectively on the left- or right-hand side.
Other operators index and search tables. Examples of typical usage (i.e.,
connecting <i>NoSQL</i> commands with pipes) are included in the
documentation.
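
<p>Because the tables are plain text, the same pipeline style works with the
standard utilities alone. The sketch below deliberately uses only
<i>head</i>, <i>tail</i>, <i>awk</i> and <i>sort</i> rather than any
<i>NoSQL</i> operator (whose exact syntax is best taken from the
documentation); the function name <kbd>select_summer</kbd> is an assumption
for the example, and the column positions match the mushroom table shown
earlier.

```shell
#!/bin/sh
# select_summer FILE
# Print the two header lines of a tab-delimited table unchanged, followed
# by the rows whose fourth field (Season) is "Summer", sorted by the second
# field (Freq) in descending numeric order.
# Hypothetical helper built from standard tools, not a NoSQL operator.
select_summer() {
    tab=$(printf '\t')
    head -n 2 "$1"
    tail -n +3 "$1" |
        awk -F '\t' '$4 == "Summer"' |
        sort -t "$tab" -k 2,2nr
}

# Example call:
#   select_summer pilze.rdb
```

<p>The output is itself a valid table, so it can be piped straight into
another filter or saved as a new file.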

<p>As with any Open-Source software, it's hard to tell how many people or
organizations are using it. In an e-mail, I asked Carlo Strozzi for examples
of real-world usage of <i>NoSQL</i>; he replied that he has been using it
quite a bit for database-backed CGI scripts for the WWW. He also stated that
several companies in Italy are using it internally.

<p>Carlo Strozzi works for IBM in Italy, and he has developed several web
applications backed by <i>NoSQL</i>; three of the publicly accessible pages
are:<br>

<p>
<a href="http://www.whoswho-sutter.com">Fortune companies and people profiles</a>
<p><a href="http://www.secondamano.it">Classifieds - this is in Italian</a>
<p><a href="http://annunci-auto.repubblica.it">Car classifieds, in Italian</a>

<p>The latest version of <i>NoSQL</i> can be obtained from
this <a href="ftp://ftp.linux.it/pub/database/NoSQL">FTP site</a>.

<hr>

<!-- hhmts start -->
Last modified: Thu 29 Oct 1998
<!-- hhmts end -->

<!--===================================================================-->
<P> <hr> <P>
<center><H5>Copyright © 1998, Larry Ayers <BR>
Published in Issue 34 of <i>Linux Gazette</i>, November 1998</H5></center>

<!--===================================================================-->
<P> <hr> <P>
<A HREF="./index.html"><IMG ALIGN=BOTTOM SRC="../gx/indexnew.gif"
ALT="[ TABLE OF CONTENTS ]"></A>
<A HREF="../index.html"><IMG ALIGN=BOTTOM SRC="../gx/homenew.gif"
ALT="[ FRONT PAGE ]"></A>
<A HREF="./ayers1.html"><IMG SRC="../gx/back2.gif"
ALT=" Back "></A>
<A HREF="./ayers3.html"><IMG SRC="../gx/fwd.gif" ALT=" Next "></A>
<P> <hr> <P>
<!--startcut ==========================================================-->
</BODY>
</HTML>
<!--endcut ============================================================-->