old-www/LDP/LG/issue14/outline.html

341 lines
13 KiB
HTML

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<title>Directory Trees Issue 14</title>
</HEAD>
<BODY >
<H4>
&quot;Linux Gazette...<I>making Linux just a little more lovable!</I>&quot;
<IMG ALIGN=MIDDLE SRC="../gx/heart.gif"> </H4>
<P> <HR> <P>
<!--===================================================================-->
<center>
<H2>Directory Trees in Outline Format</H2>
<H4>By James T. Dennis
<a href="mailto:jim@starshine.org">jim@starshine.org</a></H4>
</center>
<P> <HR> <P>
Since I frequently post messages to various Unix and Linux
newsgroups and mailing lists I often get technical questions
mailed to me ``out of the blue.''
<P>
I recently received a request for a script to produce
the following sort of output:
<PRE>
dir/
file1
file2
file
dir/
dir/
file
(etc)
</PRE>
Here was my quick and dirty solution:
<PRE>
find . | awk -F/ '{for (x=1;x&lt;NF;x++) { printf "\t"}; print $NF}'
</PRE> <P>
... which only does about 80% of the job. The only problem is
that the directory entries don't end with the ``/'' to indicate their
file type. It was late -- so that's what I sent him.
<P>
Here's how that works:
<P>
find . just prints a list of full paths (using GNU find). Some
non-Linux users may have to using 'find . -print' to accomplish
this (or update to the GNU version on their systems).
<P>
awk is a text processing language/utility.
<P>
The -F (capital ``f'') sets a field separator to the '/'
(slash character). Awk defaults to parsing it's input into
records (lines) of fields (whitespace delimited). Using the
-F allows me to tell awk to treat each record (still just lines)
as a group of fields that are separated by slashes -- allowing
me to deal with each directory element as a separate element
very easily.
<P>
The next parameter to awk is a short program -- a for loop
(like the C for() construct). It iterates from 1 to NF.
<P>
NF in awk is the ``number of fields'' for each record.
This, among many other values, is preset by awk as it parses its
input.
<P>
Awk defaults to reading it's input from a pipe or from
each file listed after it's script on the command line. We're
supplying it with input through the pipe, of course.
<P>
In the body of my awk 'for' loop I simply print a tab for each
directory named in that line. This has the appearance of "wiping
out" all of the leading directory names and indenting my line as
desired.
<P>
Finally, after the end of the for loop I simply print the last
field ($NF). Note how the printf takes a string similar to
C's printf -- and it doesn't assume a newline. I could put
C-like format specifiers like %s and %f in there -- and I'd have
to supply additional parameters to the printf call if I did.
<P>
By contrast the awk print command (no trailing ``f'') does add
an ORS (output record separator) character to the end of its line
and doesn't treat its first argument as a format specification.
<P>
This evening I happened to be cleaning up my home directory (while
procrastinating on doing paying work and cleaning the house) I
happened across a copy of this and decided to fix it.
<PRE>
find . | { while read i ;
do
[ -d $i ] \
&& echo $i/ \
|| echo $i
done } \
| awk -F/ '
/\/$/ { for (x = 1; x < NF -1 ;x++) {
printf "\t" };
print $(NF-1) "/";
next;
}
{ for (x = 1; x < NF; x++) {
printf "\t" }
print $NF }'
</PRE>
Note that the original script: 'find ....| awk -F/ ...'
is mostly still there. But the script has gone from
one line to eleven -- all to get that silly little slash
character on the end of each directory name.
<P>
(If anyone as a shorter program -- I'd like to see it
-- there's probably a fairly quick way to do this using
perl and find2perl)
<P>
The main thing I've added is the while loop which works
like this:
<P>
find's output is piped into a group of commands
(that's what the braces are for).
That group of commands starts with a bash "while...
do" loop. The bash "while...do" loop works like this:
<PRE>
'while'
some command returns no error
'do'
some commands
</PRE> 'done'
Note that, unlike C or Pascal programming the
``condition'' for the while loop is actually any
command (or group of commands -- enclosed in
braces or parentheses). The fact that programs
return values (called errorlevels in DOS and
some Mainframe OS) makes all commands implicitly
``conditions.'' (Actually C allows a variety of
function calls within conditionals -- but we
won't go into that).
<P>
Note that some commands might not return values that
make any sense -- so those would not be suitable
for use with any of the conditional contexts in any
shell.
<P>
The command I'm using is bash' internal ``read''
command which just takes a variable name as an
argument. Note that I don't say ``read $i'' --
the shell would then fill the value of $i into
the command (i.e it would ``dereference'' it) and
the read command would have no arguments. If you
give the read command no argument it simply reads
a value and throws it away (no error).
<P>
When you set values in bash (or Bourne shell, or zsh
etc) you also don't ``dereference'' it. $i=foo would be
an error unless you actually wanted to set the
value of some variable -- whose name was currently stored
in $i to be set to foo.
<P>
Back to our script. When the find command stops printing
filenames into the pipe, the 'read i' command will fail
to get any value -- so the body of the do loop will be
skipped.
<P>
The 'do' keyword just marks the end of the list of
commands in the conditional section and the beginning
of the body of the loop (big surprise -- huh?).
<P>
The next three lines of the script are another common
shell construct --
<ol>
<li>[ is really an alias for or link to the 'test'
command.
<P>
<li>-d is a parameter to 'test' that is true if
the next parameter ($i) is a directory.
<P>
<li>That line ends with a ``\'' (backslash) to mark
a continuation character. This causes the shell
to treat the next line as an extension of this
one.
</ul>
<P>
I could certainly have put all of this one line.
However, for readability I broke it up and formatted
it with leading tabs -- otherwise *I* couldn't read
it, much less expect anyone else to do so.
<P>
The next line (continuation) starts with the '&&'
operator. In bash and related shells you have things
like the familiar ``|'' (pipe) and ``;'' semicolon which are
called operators. This operator means ``if that last command
was O.K. -- returned no error -- then ...''
<P>
You can think of the '&&' operator as do this ``and''
to that (in the *conditional* sense of the the word
and).
<P>
The next line uses the '||' operator -- which is,
as you might expect, similar to the '&&' operator except
it means -- ``if the last command executed returned an
error then ...'' This is roughly analogous to the English
``or'' (again, it the conditional sense).
<P>
Of course I could have wrapped this in an 'if ....;
then ....; else...' construct -- but I'm used to the '&&'
and '||' as are most shell programmers.
<P>
So far all we've done is added a ``/'' character to the end
of each directory.
<P>
Now I'm left with a print out of full paths with directories ending in
``/'' (slashes) and other files printed normally -- back to replacing all
but the last thing with tabs -- so we pipe the 'while' loop's output
into the same awk script we were using before.
<P>
Ooops! Well, almost the same script -- it turns out that awk -F is
happy to consider the trailing slash as a blank field on the end of a
line. Hmm. O.K. we add an extra condition to the awk script.
<P>
An awk script consists of condition-action pairs. The most
common awk ``conditions'' are patterns. That is so say that they
are regular expressions (like the things you use grep to search
for). A pattern is usually delimited by slashes (a mnemonic to
the users of ed, later upgraded ex, later upgraded to vi) although
you can also ``match'' against strings that are enclosed in quotes.
<P>
Actions in awk are enclosed in braces.
<P>
Awk is an extremely forgiving language. If you leave out the
``condition'' or ``pattern'' it will execute the action on that
line for every record (line) that it comes across. That's
what my first script did.
<P>
If you leave off the action (i.e. if you have a line that
consists just of a condition) then awk will simply print
the record. In other words the default action is {print}.
<P>
When I was a regular in the comp.lang.awk newsgroup (and
alt.lang.awk that preceded it) I used to enjoy pointing out
that the shorted awk programs in the work are:
<PRE>
1
and
.
</PRE>
(The first one just prints every line it sees since ``1'' is
a ``true'' condition; the second program (a dot) prints every
line that has at least one character -- since that is the
regular expression for ``any character''. The second program
actually does filter out blank lines since awk doesn't count
the record separator as part of the line).
<P>
So, the modification of my awk script for this purpose is
to add a condition that handles any record that *ends* with a
slash. In those cases I convert all *but* the next-to-last field
to a tab, and print that ``next-to-last'' field. I also have to
add the ``/'' character to the end of that since awk doesn't consider
the field separator to be part of any field.
<P>
Finally I add a 'next' command which tells awk not to look
for any more pattern-action pairs with *this* record. If I
didn't do that than awk would execute the action for each
``directory'' line -- and also execute the other action for it
(i.e. it would print a blank line after printing each directory
line).
<P>
Is the extra 10 lines of code worth it just to add a slash to the end
of the directory names in our outline? Depends on how much your customer
is willing to pay -- or how much grief it causes you, your boss or your
users.
<P>
Mostly I decided to work on this as a training example. I think there are
some neat constructs that every budding shell programmer might benefit
from learning.
<P>
The ``find .... | {while read i .... do ... done}'' construct is well worth
remember for other cases. It allows you to do complex operations on
large numbers of files without resorting to writing a temporary file and
having to clean up after it.
<P>
When you write scripts that explicitly create temporary files you suddenly
have a host of new concerns -- what do I name it? where do I put it?
don't forget to remove it! do I have enough space for it? what if my
script gets interrupted? etc.
<P>
To be sure there are answers to each of these. For example I
suggest ~/tmp/$0.`date +%Y%m%d`.$$ for a generic temporary filename
for any script -- it gives the name of your script, the date in
YYYYMMDD format and the process ID of the current instance of your
script as the filename. It puts that into the temporary directory
under your home (which no one else should have access to). There is
virtually no chance of a name collision using this scheme (particularly
if you change the date format to +%s which is the total number of seconds
since midnight on Jan. 1, 1970). You can use the 'trap' command to
ensure that your temp files are cleaned in all but the most extreme
cases etc.
<P>
However, as I've said, it's worth understanding how to avoid temporary
files -- and usually your scripts will execute faster as a result.
<P>
The [ ... ] && ... || ... construct is absolutely essential to
any Unix sysadmin. Many of legacy scripts (particularly those in
/etc/rc.d/ -- or it's local equivalent) rely on these operators and
the test or '[' command.
<P>
Finally there is 'awk'. I've heard it argued that awk is a dinosaur
and that we should convert all the awk code to perl (and presumably most
of the Bourne shell and sed code with it). I won't argue that point
here. Suffice it to say that anything you learn how to do in awk will
just make learning perl that much easier when you get to it. awk is a
much simpler language and is phenomenally easy to integrate into shell scripts
(as you can see here).
<P>
Jim Dennis, Starshine Technical Services
<!--===================================================================-->
<P> <hr> <P>
<center><H5>Copyright &copy; 1997, James T. Dennis <BR>
Published in Issue 14 of the Linux Gazette</H5></center>
<!--===================================================================-->
<P> <hr> <P>
<A HREF="./index.html"><IMG ALIGN=BOTTOM SRC="../gx/indexnew.gif"
ALT="[ TABLE OF CONTENTS ]"></A>
<A HREF="../index.html"><IMG ALIGN=BOTTOM SRC="../gx/homenew.gif"
ALT="[ FRONT PAGE ]"></A>
<A HREF="./dired.html"><IMG SRC="../gx/back2.gif"
ALT=" Back "></A>
<A HREF="./gm.html"><IMG SRC="../gx/fwd.gif" ALT=" Next "></A>
<P> <hr> <P>
</BODY>
</HTML>