<!--startcut ==============================================-->
<!-- *** BEGIN HTML header *** -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML><HEAD>
<title>Perl One-Liner of the Month: The Adventure of the Arbitrary Archives LG #87</title>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#0000AF"
ALINK="#FF0000">
<!-- *** END HTML header *** -->

<!-- *** BEGIN navbar *** -->
<IMG ALT="" SRC="../gx/navbar/left.jpg" WIDTH="14" HEIGHT="45" BORDER="0" ALIGN="bottom"><A HREF="lodato.html"><IMG ALT="[ Prev ]" SRC="../gx/navbar/prev.jpg" WIDTH="16" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="index.html"><IMG ALT="[ Table of Contents ]" SRC="../gx/navbar/toc.jpg" WIDTH="220" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../index.html"><IMG ALT="[ Front Page ]" SRC="../gx/navbar/frontpage.jpg" WIDTH="137" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="http://www.linuxgazette.com/cgi-bin/talkback/all.py?site=LG&article=http://www.linuxgazette.com/issue87/okopnik.html"><IMG ALT="[ Talkback ]" SRC="../gx/navbar/talkback.jpg" WIDTH="121" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../lg_faq.html"><IMG ALT="[ FAQ ]" SRC="./../gx/navbar/faq.jpg" WIDTH="62" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="pramode.html"><IMG ALT="[ Next ]" SRC="../gx/navbar/next.jpg" WIDTH="15" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><IMG ALT="" SRC="../gx/navbar/right.jpg" WIDTH="15" HEIGHT="45" ALIGN="bottom">
<!-- *** END navbar *** -->

<!--endcut ============================================================-->

<TABLE BORDER><TR><TD WIDTH="200">
<A HREF="http://www.linuxgazette.com/">
<IMG ALT="LINUX GAZETTE" SRC="../gx/2002/lglogo_200x41.png"
WIDTH="200" HEIGHT="41" border="0"></A>
<BR CLEAR="all">
<SMALL>...<I>making Linux just a little more fun!</I></SMALL>
</TD><TD WIDTH="380">


<CENTER>
<BIG><BIG><STRONG><FONT COLOR="maroon">Perl One-Liner of the Month: The Adventure of the Arbitrary Archives</FONT></STRONG></BIG></BIG>
<BR>
<STRONG>By <A HREF="../authors/okopnik.html">Ben Okopnik</A></STRONG>
</CENTER>

</TD></TR>
</TABLE>
<P>

<!-- END header -->



<p>Spring was in full bloom, and Woomert Foonly was enjoying another perfect
day. It had featured a trivially easy configuration of a 1,000-node Linux
cluster, and had been brought to an acme by lunching on Moroccan
<i>b'stila</i>
with just a touch of <i>ras el hanout</i> curry and a fruited <i>couscous</i>
on the side, complemented by a dessert of sweet rice with cinnamon. All
was at peace... until Frink Ooblick burst in, supporting - almost carrying
- a man who seemed to be in the last extremity of shock. Frink helped him
to the couch, then dropped into the easy chair, clearly exhausted by his
effort.
<p> -- "Woomert, it's simply scandalous. This is Resolv Dot Conf, a humble...
erm, well, a sysadmin, anyway. Recently, he was cruelly forced to install
some kind of a legacy OS on his manager's computer - can you imagine? -
and now, he's being asked to do something that sounds nearly impossible,
although I could only get scant details. He had heard of your reputation
(who hasn't, these days?), and was coming to see if you could help him,
but collapsed in the street just outside your door due to the residual
shock and a severe Jolt Cola deficiency. As to the problem... well, I'll
let him tell you."
<p>Woomert had been tending to their guest while listening, with the result
that the latter now looked almost normal. Indeed, Woomert's "sysadmin-grade
coffee" was well-known among the <i>cognoscenti</i> for its restorative
powers, although the exact recipe (it was thought to have Espresso Alexander
and coffee ice cream somewhere in its ancestry, but the various theories
diverged widely after that point) remained a deep secret.
<p>Now, though, the famous detective's eyes sharpened to that look of concentration
he habitually wore while working.
<p> -- "Please state your problem clearly and concisely."
<p>The quickly recovering sysadmin shook his head mournfully.
<p> -- "Well, Mr. Foonly... you see, what I have is a script that processes
the data submitted to us by our satellite offices. The thing is, it all
comes in various forms: we're a health insurance data processor, and every
company's format is different. Not only that, but the way everyone submits
the data is different: some just send us a plain data file, others use
'gzip', or 'compress', or 'bzip', or 'rar', or even 'tar' <i>and</i>
'compress' (or 'gzip'), and others - fortunately, all of those are just
plain data - hand us a live data stream out of their proprietary
applications. Our programmers handled the various format conversions as
soon as they got the specs, but this arbitrary compression problem was left
up to me, and it's got me up a tree!"
<p>He stopped to take a deep breath and another gulp of Woomert's coffee,
which seemed to revive him further, although he still sat hunched over,
his forehead resting on his hand.
<p>"Anyway, at this point, making it all work still requires human intervention;
we've got two people doing nothing but sorting and decompressing the files,
all day long. If it wasn't for that, the whole system could be completely
automated... and of course, management keeps at me: 'Why isn't it fixed
yet? Aren't you computer people supposed to...' and so on."
<p>When he finally sat up and looked at Woomert, his jaw was firmly set.
He was a man clearly resigned to his fate, no matter how horrible.
<p>"Be honest with me, Mr. Foonly. Is there a possibility of a solution,
or am I finished? I know The Mantra <a href="#1">[1]</a>, of course, but
I'd like to go on if possible; my users need me, and I know of The Dark
Powers that slaver to descend upon their innocent souls without a sysadmin
to protect them."
<p>Woomert nodded, recognizing the weary old warrior's words as completely
true; he, too, had encountered and battled The Dark Ones, creatures that
would completely unhinge the minds of the users if they were freed for
even a moment, and knew of the valiant SysAdmin's Guild
(<a href="http://sage.org">http://sage.org</a>) which had sworn
to protect the innocent (even though it was often protection
from themselves, via the application of the mystic and holy LART <a href="#2">[2]</a>).
<p> -- "Resolv, I'm very happy to say that there is indeed a solution to
the problem. I'm sure that you've done your research on the available tools,
and have heard of '<tt>atool</tt>', an archive manager by Oskar Liljeblad..."
<p>At Resolv's nod, he went on.
<p>"All right; then you also know that it will handle all of the above
archive formats and more. Despite the fact that it's written in Perl, we're
not going to use any of its code in your script - that would be a wasteful
duplication of effort. Instead, we're simply going to use '<tt>acat</tt>',
one of the utilities that come with '<tt>atool</tt>', as an external filter - a conditional
one. All we have to do is insert it right at the beginning of your script,
like so:
<pre>
<hr WIDTH="100%">#!/usr/bin/perl -w
# Created by Resolv Dot Conf on Pungenday, Chaos 43, 3166 YOLD

<b>@ARGV = map { /\.(gz|tgz|zip|bz2|rar|Z)$/ ? "acat $_ '*' 2>/dev/null|" : $_ } @ARGV;</b>

# Rest of script follows</pre>

<pre>...

<hr WIDTH="100%"></pre>
"Perl will take care of the appropriate magic - and that will take care
of the problem."
<p>The sysadmin was on his feet in a moment, fervently shaking Woomert's
hand.
<p> -- "Mr. Foonly, I don't know how to thank you. You've saved... well,
I won't speak of that, but I want you to know that you've always got a
friend wherever I happen to be. Wait until they see <i>this</i>!... Uh,
just to make sure I understand - what <b>is</b> it? How does it work?"
<p>Woomert glanced over at Frink, who also seemed to be on the edge of
his seat, eager for the explanation.
<p> -- "What do you think, Frink - can you handle this one? I've only used
one function and one operator; the rest of it happened automagically, simply
because of the way that Perl deals with files on the command line."
<p>Frink turned a little pink, and chewed his thumb as he always did when
he was nervous.
<p> -- "Well, Woomert... I know you told me to study the 'map' function,
but it was pretty deep; I got lost early on, and then there was this new
movie out..."
<p>Woomert smiled and shook his head.
<p> -- "All right, then. 'map', as per the info from '<tt>perldoc -f map</tt>',
evaluates the specified expression or block of expressions for each element
of a list and returns the list of results - sort of like a 'for' loop, but
much shorter and more convenient in many cases. I also used the ternary
conditional operator ('<tt>?:</tt>'), which works somewhat like an
'if-then-else' construct:
<pre>
<hr WIDTH="100%"># Ternary conditional op - sets $a to 5 if $b is true, to 10 otherwise
$a = $b ? 5 : 10;

# "if-then-else" construct - same action
if ( $b ){
    $a = 5;
}
else {
    $a = 10;
}
<hr WIDTH="100%"></pre>
"Both of the above do the same thing, but again, the first method is shorter
and often more convenient. Examining the script one step at a time, what
I have done is test each of the elements in @ARGV, which initially contains
everything on the command line that follows the script name, against the
following regular expression:
<p><tt>/\.(gz|tgz|zip|bz2|rar|Z)$/</tt>
<p>This will match any filename that ends in a period (a literal dot) followed
by any of the specified extensions.
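<p>A minimal sketch of that test in isolation, combining 'map', the ternary
operator, and the same regex. The filenames and the 'archive:'/'plain:' labels
are made up for illustration; they are not part of Resolv's script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample names, for illustration only
my @names = qw(claims.gz claims.tar.Z claims.txt notes.z report.bz2.bak);

# Same regex as in the script: a literal dot, one of the listed
# extensions, and then the end of the string. 'map' returns one
# result per input element; the ternary picks which label to use.
my @tagged = map {
    /\.(gz|tgz|zip|bz2|rar|Z)$/ ? "archive:$_" : "plain:$_"
} @names;

print "$_\n" for @tagged;
```

<p>The match is case-sensitive and anchored at the end of the name, so
'notes.z' and 'report.bz2.bak' are both treated as plain files, while
'claims.tar.Z' qualifies through its final '.Z'.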
<p>Now, if the filename <i>doesn't</i> match the regex, the ternary
operator returns the part after the colon, '<tt>$_</tt>' - which simply
contains the original filename. Perl then processes the filename as it
normally does the ones contained in @ARGV: it opens a filehandle to that
file and makes its contents available within the script. In fact, there
are a number of ways to access the data once that's done; read up on the
diamond operator ('<tt>&lt;&gt;</tt>'), the STDIN filehandle, and the ARGV
filehandle (note the similarity <i>and</i> the difference, Frink!) for
information on some of the many available methods of doing file I/O in
Perl."
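<p>A small sketch of that default behaviour, assuming a throwaway temporary
file stands in for a data file named on the command line (the file and its
contents are invented for the example):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small throwaway file to stand in for a plain data file
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "alpha\nbeta\n";
close $tmp or die "close: $!";

# Pretend the file was named on the command line
@ARGV = ($path);

my @lines;
while (my $line = <>) {    # <> opens each entry of @ARGV in turn
    chomp $line;
    push @lines, $line;
}

print "read: @lines\n";
```

<p>With an empty @ARGV, the same loop would read from standard input instead.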
<p>"On the other hand, if the current element <i>does</i> match, the ternary
operator will return the string before the colon, in this case:
<p><tt>"acat $_ '*' 2>/dev/null|"</tt>
<p>Perl will then execute the above command for the current filename. The
syntax may seem a little odd, but it's what '<tt>acat</tt>' (or, more to
the point, the archive utilities that it uses) requires to process the
files and ignore the error messages. Note that the command ends in '|',
the pipe symbol; what happens here is much like doing a pipe within the
shell. The command will be executed, and its output will be piped straight
to the filehandle that Perl would normally have opened for that file -
presto, pure magic!
<a href="#3">[3]</a>"
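<p>A sketch of the trailing-pipe behaviour on its own, assuming a Unix-like
system with a POSIX shell. Here 'echo' is only a stand-in for
'<tt>acat</tt>', so that the example runs without any archives on hand:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 'echo' stands in for 'acat' in this sketch. The trailing '|'
# is what tells Perl to run the entry as a command and read its
# output, rather than open a file of that name.
@ARGV = ("echo one; echo two |");

my @got;
while (my $line = <>) {
    chomp $line;
    push @got, $line;
}

print "got: @got\n";
```

<p>The script itself never calls 'open'; the diamond operator does a
two-argument open on each @ARGV entry, and that form of open is the one that
gives a trailing '|' its special meaning.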
<p>"So, to break it all out in long form, here's what I did:
<pre>
<hr WIDTH="100%">@ARGV =
    map {                                      # Use the BLOCK syntax of 'map'
        if ( /\.(gz|tgz|zip|bz2|rar|Z)$/ ){    # Look for archive extensions
            "acat $_ '*' 2>/dev/null|";        # Uncompress/pipe out the contents
        }
        else {
            $_;                                # Otherwise, return original name
        }
    } @ARGV;                                   # This is the list to "walk" over

<hr WIDTH="100%"></pre>
"Perl handles it from that point on. Once you pass it something useful
on the command line or standard input, it knows just what to do. In fact,"
he glanced sternly over at Frink, who once again looked abashed,
"studying '<tt>perldoc perlopentut</tt>' is something I recommend to anyone
who wants to understand how Perl does I/O. This includes files, pipes,
forking child processes, building filters, dealing with binary files, duplicating
file handles, the single-argument version of 'open', and many other things.
In some ways, this could be called the most important document that comes
with Perl. Taking a look at '<tt>perldoc perlipc</tt>' as a follow-up would
be a good idea as well - it deals with a number of related issues, including
opening safe (low privilege) pipes to possibly insecure processes, something
that can become very important in a hurry."
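<p>One such issue, sketched under stated assumptions: a filename interpolated
into a command string is parsed by the shell, whereas the list form of a pipe
open hands the program its arguments directly. The hostile-looking name below
is invented, and 'echo' again stands in for a real filter; whether
'<tt>acat</tt>' should be invoked this way in Resolv's script is not something
the story settles:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A hostile-looking name, made up for this sketch
my $name = 'data.gz; echo injected';

# List form of a pipe open: the program and each argument are
# passed separately, so no shell ever parses $name
open(my $fh, '-|', 'echo', $name) or die "cannot run echo: $!";
my @out = <$fh>;
close $fh or die "echo failed: $?";

print @out;    # a single line: the name, printed verbatim
```

<p>Had the same name been pasted into a string such as
<tt>"echo $name |"</tt>, the shell would have seen two commands instead of one.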
<p> -- "Now, Resolv, I believe that you have a bright new future stretching
out ahead of you; your problem will be solved, your management will be
pleased, and your users will remain safe from Those Outside The Pale. If
you would care to join us in a little celebration, I've just finished boiling
a Spotted Dog, and - oh. Where did he go?... It's a very fine English pudding
with currants, after all. Well, I suppose he wanted to implement that change
as soon as possible..."
<hr WIDTH="100%">
<H3>Footnotes</H3>
<p><a NAME="1"></a>[1] "Down, Not Across." For those who need additional
clues on the grim meaning of The Sysadmin Mantra, search the archives of
<b>alt.sysadmin.recovery</b>
at <a href="http://groups.google.com">&lt;http://groups.google.com&gt;</a>,
and all will become clear. If it does not, then you weren't meant to know.
:)
<p><a NAME="2"></a>[2] From The Jargon File:
<pre> Luser Attitude Readjustment Tool. ... The LART classic is a 2x4 or
other large billet of wood usable as a club, to be applied upside the
head of spammers and other people who cause sysadmins more grief than
just naturally goes with the job. Perennial debates rage on
alt.sysadmin.recovery over what constitutes the truly effective LART;
knobkerries, semiautomatic weapons, flamethrowers, and tactical nukes
all have their partisans. Compare {clue-by-four}.</pre>

<p><br><a NAME="3"></a>[3] See "perldoc perlopentut" for a tutorial on
opening files, the 'magic' in @ARGV, and even "Dispelling the Dweomer"
for those who have seen too much magic already. :)



<!-- *** BEGIN author bio *** -->
<P>
<P>
<P> Ben is a Contributing Editor for Linux Gazette and a member of
The Answer Gang.

<!-- *** BEGIN bio *** -->
<P>
<IMG ALT="picture" SRC="../../gx/2002/tagbio/ben-okopnik.jpg" WIDTH="199"
HEIGHT="200" ALIGN="left" HSPACE="10" VSPACE="10">
<em>
Ben was born in Moscow, Russia in 1962. He became interested in
electricity at age six--promptly demonstrating it by sticking a fork into
a socket and starting a fire--and has been falling down technological mineshafts
ever since. He has been working with computers since the Elder Days, when
they had to be built by soldering parts onto printed circuit boards and
programs had to fit into 4k of memory. He would gladly pay good money to any
psychologist who can cure him of the resulting nightmares.

<p>Ben's subsequent experiences include creating software in nearly a dozen
languages, network and database maintenance during the approach of a hurricane,
and writing articles for publications ranging from sailing magazines to
technological journals. Having recently completed a seven-year
Atlantic/Caribbean cruise under sail, he is currently docked in Baltimore, MD,
where he works as a technical instructor for Sun Microsystems.

<p>Ben has been working with Linux since 1997, and credits it with his complete
loss of interest in waging nuclear warfare on parts of the Pacific Northwest.
</em>
<br CLEAR="all">
<!-- *** END bio *** -->

<!-- *** END author bio *** -->


<!-- *** BEGIN copyright *** -->
<hr>
<CENTER><SMALL><STRONG>
Copyright © 2003, Ben Okopnik.
Copying license <A HREF="../copying.html">http://www.linuxgazette.com/copying.html</A><BR>
Published in Issue 87 of <i>Linux Gazette</i>, February 2003
</STRONG></SMALL></CENTER>
<!-- *** END copyright *** -->
<HR>

<!--startcut ==========================================================-->
<CENTER>
<!-- *** BEGIN navbar *** -->
<IMG ALT="" SRC="../gx/navbar/left.jpg" WIDTH="14" HEIGHT="45" BORDER="0" ALIGN="bottom"><A HREF="lodato.html"><IMG ALT="[ Prev ]" SRC="../gx/navbar/prev.jpg" WIDTH="16" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="index.html"><IMG ALT="[ Table of Contents ]" SRC="../gx/navbar/toc.jpg" WIDTH="220" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../index.html"><IMG ALT="[ Front Page ]" SRC="../gx/navbar/frontpage.jpg" WIDTH="137" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="http://www.linuxgazette.com/cgi-bin/talkback/all.py?site=LG&article=http://www.linuxgazette.com/issue87/okopnik.html"><IMG ALT="[ Talkback ]" SRC="../gx/navbar/talkback.jpg" WIDTH="121" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../lg_faq.html"><IMG ALT="[ FAQ ]" SRC="./../gx/navbar/faq.jpg" WIDTH="62" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="pramode.html"><IMG ALT="[ Next ]" SRC="../gx/navbar/next.jpg" WIDTH="15" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><IMG ALT="" SRC="../gx/navbar/right.jpg" WIDTH="15" HEIGHT="45" ALIGN="bottom">
<!-- *** END navbar *** -->
</CENTER>
</BODY></HTML>
<!--endcut ============================================================-->