
<!--startcut ==============================================-->
<!-- *** BEGIN HTML header *** -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML><HEAD>
<title>Perl One-Liner of the Month: The Adventure of the Arbitrary Archives LG #87</title>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#0000AF"
ALINK="#FF0000">
<!-- *** END HTML header *** -->
<!-- *** BEGIN navbar *** -->
<IMG ALT="" SRC="../gx/navbar/left.jpg" WIDTH="14" HEIGHT="45" BORDER="0" ALIGN="bottom"><A HREF="lodato.html"><IMG ALT="[ Prev ]" SRC="../gx/navbar/prev.jpg" WIDTH="16" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="index.html"><IMG ALT="[ Table of Contents ]" SRC="../gx/navbar/toc.jpg" WIDTH="220" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../index.html"><IMG ALT="[ Front Page ]" SRC="../gx/navbar/frontpage.jpg" WIDTH="137" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="http://www.linuxgazette.com/cgi-bin/talkback/all.py?site=LG&article=http://www.linuxgazette.com/issue87/okopnik.html"><IMG ALT="[ Talkback ]" SRC="../gx/navbar/talkback.jpg" WIDTH="121" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../lg_faq.html"><IMG ALT="[ FAQ ]" SRC="./../gx/navbar/faq.jpg"WIDTH="62" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="pramode.html"><IMG ALT="[ Next ]" SRC="../gx/navbar/next.jpg" WIDTH="15" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><IMG ALT="" SRC="../gx/navbar/right.jpg" WIDTH="15" HEIGHT="45" ALIGN="bottom">
<!-- *** END navbar *** -->
<!--endcut ============================================================-->
<TABLE BORDER><TR><TD WIDTH="200">
<A HREF="http://www.linuxgazette.com/">
<IMG ALT="LINUX GAZETTE" SRC="../gx/2002/lglogo_200x41.png"
WIDTH="200" HEIGHT="41" border="0"></A>
<BR CLEAR="all">
<SMALL>...<I>making Linux just a little more fun!</I></SMALL>
</TD><TD WIDTH="380">
<CENTER>
<BIG><BIG><STRONG><FONT COLOR="maroon">Perl One-Liner of the Month: The Adventure of the Arbitrary Archives</FONT></STRONG></BIG></BIG>
<BR>
<STRONG>By <A HREF="../authors/okopnik.html">Ben Okopnik</A></STRONG>
</CENTER>
</TD></TR>
</TABLE>
<P>
<!-- END header -->
<p>Spring was in full bloom, and Woomert Foonly was enjoying another perfect
day. It had featured a trivially easy configuration of a 1,000-node Linux
cluster, and had been brought to an acme by lunching on Moroccan
<i>b'stila</i>
with just a touch of <i>ras el hanout</i> curry and a fruited <i>couscous</i>
on the side, complemented by a dessert of sweet rice with cinnamon. All
was at peace... until Frink Ooblick burst in, supporting - almost carrying
- a man who seemed to be in the last extremity of shock. Frink helped him
to the couch, then dropped into the easy chair, clearly exhausted by his
effort.
<p>&nbsp;-- "Woomert, it's simply scandalous. This is Resolv Dot Conf, a humble...
erm, well, a sysadmin, anyway. Recently, he was cruelly forced to install
some kind of a legacy OS on his manager's computer - can you imagine? -
and now, he's being asked to do something that sounds nearly impossible,
although I could only get scant details. He had heard of your reputation
(who hasn't, these days?), and was coming to see if you could help him,
but collapsed in the street just outside your door due to the residual
shock and a severe Jolt Cola deficiency. As to the problem... well, I'll
let him tell you."
<p>Woomert had been tending to their guest while listening, with the result
that the latter now looked almost normal. Indeed, Woomert's "sysadmin-grade
coffee" was well-known among the <i>cognoscenti</i> for its restorative
powers, although the exact recipe (it was thought to have Espresso Alexander
and coffee ice cream somewhere in its ancestry, but the various theories
diverged widely after that point) remained a deep secret.
<p>Now, though, the famous detective's eyes sharpened to that look of concentration
he habitually wore while working.
<p>&nbsp;-- "Please state your problem clearly and concisely."
<p>The quickly recovering sysadmin shook his head mournfully.
<p>&nbsp;-- "Well, Mr. Foonly... you see, what I have is a script that processes
the data submitted to us by our satellite offices. The thing is, it all
comes in various forms: we're a health insurance data processor, and every
company's format is different. Not only that, but the way everyone submits
the data is different: some just send us a plain data file, others use
'gzip', or 'compress', or 'bzip2', or 'rar', or even 'tar' <i>and</i>
'compress' (or 'gzip'), and others - fortunately, all of those are just
plain data - hand us a live data stream out of their proprietary
applications. Our programmers handled the various format conversions as
soon as they got the specs, but this arbitrary compression problem was left
up to me, and it's got me up a tree!"
<p>He stopped to take a deep breath and another gulp of Woomert's coffee,
which seemed to revive him further, although he still sat hunched over,
his forehead resting on his hand.
<p>"Anyway, at this point, making it all work still requires human intervention;
we've got two people doing nothing but sorting and decompressing the files,
all day long. If it wasn't for that, the whole system could be completely
automated... and of course, management keeps at me: 'Why isn't it fixed
yet? Aren't you computer people supposed to...' and so on."
<p>When he finally sat up and looked at Woomert, his jaw was firmly set.
He was a man clearly resigned to his fate, no matter how horrible.
<p>"Be honest with me, Mr. Foonly. Is there a possibility of a solution,
or am I finished? I know The Mantra <a href="#1">[1]</a>, of course, but
I'd like to go on if possible; my users need me, and I know of The Dark
Powers that slaver to descend upon their innocent souls without a sysadmin
to protect them."
<p>Woomert nodded, recognizing the weary old warrior's words as completely
true; he, too, had encountered and battled The Dark Ones, creatures that
would completely unhinge the minds of the users if they were freed for
even a moment, and knew of the valiant SysAdmin's Guild
(<a href="http://sage.org">http://sage.org</a>) which had sworn
to protect the innocent (even though it was often protection
from themselves, via the application of the mystic and holy LART <a href="#2">[2]</a>).
<p>&nbsp;-- "Resolv, I'm very happy to say that there is indeed a solution to
the problem. I'm sure that you've done your research on the available tools,
and have heard of '<tt>atool</tt>', an archive manager by Oskar Liljeblad..."
<p>At Resolv's nod, he went on.
<p>"All right; then you also know that it will handle all of the above
archive formats and more. Despite the fact that it's written in Perl, we're
not going to use any of its code in your script - that would be a wasteful
duplication of effort. Instead, we're simply going to use '<tt>acat</tt>',
one of the '<tt>atool</tt>' utilities, as an external filter - a conditional
one. All we have to do is insert it right at the beginning of your script,
like so:
<pre>
<hr WIDTH="100%">#!/usr/bin/perl -w
# Created by Resolv Dot Conf on Pungenday, Chaos 43, 3166 YOLD
<b>@ARGV = map { /\.(gz|tgz|zip|bz2|rar|Z)$/ ? "acat $_ '*' 2&gt;/dev/null|" : $_ } @ARGV;
</b># Rest of script follows</pre>
<pre>...
<hr WIDTH="100%"></pre>
"Perl will take care of the appropriate magic - and that will take care
of the problem."
<p>The sysadmin was on his feet in a moment, fervently shaking Woomert's
hand.
<p>&nbsp;-- "Mr. Foonly, I don't know how to thank you. You've saved... well,
I won't speak of that, but I want you to know that you've always got a
friend wherever I happen to be. Wait until they see <i>this</i>!... Uh,
just to make sure I understand - what <b>is</b> it? How does it work?"
<p>Woomert glanced over at Frink, who also seemed to be on the edge of
his seat, eager for the explanation.
<p>&nbsp;-- "What do you think, Frink - can you handle this one? I've only used
one function and one operator; the rest of it happened automagically, simply
because of the way that Perl deals with files on the command line."
<p>Frink turned a little pink, and chewed his thumb as he always did when
he was nervous.
<p>&nbsp;-- "Well, Woomert... I know you told me to study the 'map' function,
but it was pretty deep; I got lost early on, and then there was this new
movie out..."
<p>Woomert smiled and shook his head.
<p>&nbsp;-- "All right, then. 'map', as per the info from '<tt>perldoc -f map</tt>',
evaluates the specified expression or block of expressions for each element
of a list - sort of like a 'for' loop, but much shorter and more convenient
in many cases. I also used the ternary conditional operator ('<tt>?:</tt>')
which works somewhat like an "if-then-else" construct:
<pre>
<hr WIDTH="100%"># Ternary conditional op - sets $a to 5 if $b is true, to 10 otherwise
$a = $b ? 5 : 10;
# "if-then-else" construct - same action
if ( $b ){
    $a = 5;
}
else {
    $a = 10;
}
<hr WIDTH="100%"></pre>
"Both of the above do the same thing, but again, the first method is shorter
and often more convenient. Examining the script one step at a time, what
I have done is test each of the elements in @ARGV, which initially contains
everything on the command line that follows the script name, against the
following regular expression:
<p><tt>/\.(gz|tgz|zip|bz2|rar|Z)$/</tt>
<p>This will match any filename that ends in a period (a literal dot) followed
by any of the specified extensions.
<p>Now, if the filename <i>doesn't</i> match the regex, the ternary
operator returns the part after the colon, '<tt>$_</tt>' - which simply
contains the original filename. Perl then processes the filename as it
normally does the ones contained in @ARGV: it opens a filehandle to that
file and makes its contents available within the script. In fact, there
are a number of ways to access the data once that's done; read up on the
diamond operator ('<tt>&lt;&gt;</tt>'), the STDIN filehandle, and the ARGV
filehandle (note the similarity <i>and</i> the difference, Frink!) for
information on some of the many available methods of doing file I/O in
Perl."
<p>"On the other hand, if the current element <i>does</i> match, the ternary
operator will return the string before the colon, in this case:
<p><tt>"acat $_ '*' 2&gt;/dev/null|"</tt>
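<p>When the script reaches that argument, Perl will execute the above command (with the current filename already substituted for '<tt>$_</tt>') rather than opening a file. The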
syntax may seem a little odd, but it's what '<tt>acat</tt>' (or, more to
the point, the archive utilities that it uses) requires to process the
files and ignore the error messages. Note that the command ends in '|',
the pipe symbol; what happens here is much like doing a pipe within the
shell. The command will be executed, the output will be placed in a memory
buffer, and the contents of that buffer will become available on the filehandle
that Perl would normally have opened for that file - presto, pure magic!
<a href="#3">[3]</a>"
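<p>A minimal sketch of that pipe magic, for readers who want to try it in isolation. It assumes a Unix-like system and uses '<tt>echo</tt>' as a stand-in for '<tt>acat</tt>'; the helper name '<tt>read_source</tt>' is made up for this illustration. It calls the two-argument form of 'open' directly, which is the form the diamond operator applies to each element of @ARGV:
<pre>
```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: open a "source" with the two-argument form of
# open, as the diamond operator would, and return everything it yields.
sub read_source {
    my ($source) = @_;
    open(my $fh, $source) or die "Cannot open '$source': $!";
    my @lines = readline($fh);
    close($fh);
    return @lines;
}

# A trailing '|' tells Perl to run the string as a command and read
# the command's standard output; 'echo' stands in for 'acat' here.
my @from_pipe = read_source("echo piped data|");
print "Got: $_" for @from_pipe;
```
</pre>
<p>The same helper, given an ordinary filename, would simply read that file; only the trailing pipe symbol changes the behavior.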
<p>"So, to break it all out in long form, here's what I did:
<pre>
<hr WIDTH="100%">@ARGV =
    map {                                     # Use the BLOCK syntax of 'map'
        if ( /\.(gz|tgz|zip|bz2|rar|Z)$/ ){   # Look for archive extensions
            "acat $_ '*' 2&gt;/dev/null|";       # Uncompress/pipe out the contents
        }
        else {
            $_;                               # Otherwise, return original name
        }
    } @ARGV;                                  # This is the list to "walk" over
<hr WIDTH="100%"></pre>
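<p>For the skeptical reader, here is a small sketch of what that 'map' does to an argument list. The filenames are invented for illustration, and nothing is opened or executed; the rewritten list is simply printed. Plain names (including a bare 'zip' with no dot in front of it) should pass through untouched, while anything ending in an archive extension is turned into an '<tt>acat</tt>' command with a trailing pipe symbol:
<pre>
```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sample arguments; no file or command is touched here.
my @args = ('claims.dat', 'march.tgz', 'april.bz2', 'notes.txt.Z', 'zip');

# The same test and replacement as in the one-liner, applied to @args
# instead of @ARGV so that the result can be inspected.
my @rewritten = map {
    /\.(gz|tgz|zip|bz2|rar|Z)$/ ? "acat $_ '*' 2>/dev/null|" : $_
} @args;

print "$_\n" for @rewritten;
```
</pre>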
"Perl handles it from that point on. Once you pass it something useful
on the command line or standard input, it knows just what to do. In fact,"
he glanced sternly over at Frink, who once again looked abashed,
"studying '<tt>perldoc perlopentut</tt>' is something I recommend to anyone
who wants to understand how Perl does I/O. This includes files, pipes,
forking child processes, building filters, dealing with binary files, duplicating
file handles, the single-argument version of 'open', and many other things.
In some ways, this could be called the most important document that comes
with Perl. Taking a look at '<tt>perldoc perlipc</tt>' as a follow-up would
be a good idea as well - it deals with a number of related issues, including
opening safe (low privilege) pipes to possibly insecure processes, something
that can become very important in a hurry."
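<p>On that last point, a hedged sketch rather than a fix to Resolv's script: the one-liner hands each filename to a shell, so a name containing spaces or shell metacharacters could misbehave. One possible alternative, assuming Perl 5.8 or later on a Unix-like system, is the list form of a pipe 'open', which starts the program directly with no shell involved. The helper name '<tt>open_command</tt>' and the odd filename are invented for this illustration, and the running perl stands in for '<tt>acat</tt>' by echoing back the argument it receives:
<pre>
```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: run a command given as a list and return a
# filehandle on its standard output. The list form of a pipe open
# starts the program directly, without a shell, so odd characters
# in an argument are passed through untouched.
sub open_command {
    my @command = @_;
    open(my $fh, '-|', @command) or die "Cannot run '$command[0]': $!";
    return $fh;
}

# Stand-in for ('acat', $file, '*'): the child perl simply prints its
# first argument, a made-up filename full of shell metacharacters.
my $tricky = q{odd name; $(date) 'quoted'.gz};
my $fh     = open_command($^X, '-e', 'print "$ARGV[0]\n"', $tricky);
my @output = readline($fh);
close($fh);

print "Child saw: $output[0]";
```
</pre>
<p>The trade-off is that this form no longer fits into @ARGV for the diamond operator to pick up, and the error messages are not redirected; the script would have to read from the returned filehandle itself.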
<p>&nbsp;-- "Now, Resolv, I believe that you have a bright new future stretching
out ahead of you; your problem will be solved, your management will be
pleased, and your users will remain safe from Those Outside The Pale. If
you would care to join us in a little celebration, I've just finished boiling
a Spotted Dog, and - oh. Where did he go?... It's a very fine English pudding
with currants, after all. Well, I suppose he wanted to implement that change
as soon as possible..."
<hr WIDTH="100%">
<H3>Footnotes</H3>
<p><a NAME="1"></a>[1] "Down, Not Across." For those who need additional
clues on the grim meaning of The Sysadmin Mantra, search the archives of
<b>alt.sysadmin.recovery</b>
at <a href="http://groups.google.com">&lt;http://groups.google.com&gt;</a>,
and all will become clear. If it does not, then you weren't meant to know.
:)
<p><a NAME="2"></a>[2] From The Jargon File:
<pre> Luser Attitude Readjustment Tool. ... The LART classic is a 2x4 or
other large billet of wood usable as a club, to be applied upside the
head of spammers and other people who cause sysadmins more grief than
just naturally goes with the job. Perennial debates rage on
alt.sysadmin.recovery over what constitutes the truly effective LART;
knobkerries, semiautomatic weapons, flamethrowers, and tactical nukes
all have their partisans. Compare {clue-by-four}.</pre>
<p><br><a NAME="3"></a>[3] See "perldoc perlopentut" for a tutorial on
opening files, the 'magic' in @ARGV, and even "Dispelling the Dweomer"
for those who have seen too much magic already. :)
<!-- *** BEGIN author bio *** -->
<P>&nbsp;
<P>
<P> Ben is a Contributing Editor for Linux Gazette and a member of
The Answer Gang.
<!-- *** BEGIN bio *** -->
<P>
<IMG ALT="picture" SRC="../../gx/2002/tagbio/ben-okopnik.jpg" WIDTH="199"
HEIGHT="200" ALIGN="left" HSPACE="10" VSPACE="10">
<em>
Ben was born in Moscow, Russia in 1962. He became interested in
electricity at age six--promptly demonstrating it by sticking a fork into
a socket and starting a fire--and has been falling down technological mineshafts
ever since. He has been working with computers since the Elder Days, when
they had to be built by soldering parts onto printed circuit boards and
programs had to fit into 4k of memory. He would gladly pay good money to any
psychologist who can cure him of the resulting nightmares.
<p>Ben's subsequent experiences include creating software in nearly a dozen
languages, network and database maintenance during the approach of a hurricane,
and writing articles for publications ranging from sailing magazines to
technological journals. Having recently completed a seven-year
Atlantic/Caribbean cruise under sail, he is currently docked in Baltimore, MD,
where he works as a technical instructor for Sun Microsystems.
<p>Ben has been working with Linux since 1997, and credits it with his complete
loss of interest in waging nuclear warfare on parts of the Pacific Northwest.
</em>
<br CLEAR="all">
<!-- *** END bio *** -->
<!-- *** END author bio *** -->
<!-- *** BEGIN copyright *** -->
<hr>
<CENTER><SMALL><STRONG>
Copyright &copy; 2003, Ben Okopnik.
Copying license <A HREF="../copying.html">http://www.linuxgazette.com/copying.html</A><BR>
Published in Issue 87 of <i>Linux Gazette</i>, February 2003
</STRONG></SMALL></CENTER>
<!-- *** END copyright *** -->
<HR>
<!--startcut ==========================================================-->
</BODY></HTML>
<!--endcut ============================================================-->