<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<title>Directory Trees Issue 14</title>
</HEAD>
<BODY >

<H4>
"Linux Gazette...<I>making Linux just a little more lovable!</I>"
<IMG ALIGN=MIDDLE SRC="../gx/heart.gif"> </H4>

<P> <HR> <P>
<!--===================================================================-->

<center>
<H2>Directory Trees in Outline Format</H2>
<H4>By James T. Dennis
<a href="mailto:jim@starshine.org">jim@starshine.org</a></H4>
</center>
<P> <HR> <P>

Since I frequently post messages to various Unix and Linux
newsgroups and mailing lists, I often get technical questions
mailed to me ``out of the blue.''
<P>
I recently received a request for a script to produce
the following sort of output:
<PRE>
dir/
    file1
    file2
    file
    dir/
        dir/
            file

(etc)
</PRE>
Here was my quick and dirty solution:
<PRE>
find . | awk -F/ '{ for (x = 1; x < NF; x++) { printf "\t" }; print $NF }'
</PRE> <P>
... which only does about 80% of the job. The only problem is
that the directory entries don't end with a ``/'' to indicate their
file type. It was late -- so that's what I sent him.
<P>
Here's how that works:
<P>
find . just prints a list of paths, one per line, each relative to the
current directory and starting with ``./'' (using GNU find). Some
non-Linux users may have to use 'find . -print' to accomplish
this (or update to the GNU version on their systems).
<P>
awk is a text processing language/utility.
<P>
The -F (capital ``f'') sets the field separator to '/'
(the slash character). Awk defaults to parsing its input into
records (lines) of fields (whitespace delimited). Using
-F allows me to tell awk to treat each record (still just a line)
as a group of fields that are separated by slashes -- allowing
me to deal with each directory element as a separate field
very easily.
<P>
The next parameter to awk is a short program -- a for loop
(like the C for() construct). Its counter runs from 1 up to,
but not including, NF.
<P>
NF in awk is the ``number of fields'' for each record.
This, among many other values, is preset by awk as it parses its
input.
<P>
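As a small illustration of -F and NF (the sample path is made up,
not taken from the original request), this feeds one line to awk and
prints the field count along with the first, second and last fields:

```shell
# Hypothetical sample path; -F/ makes awk split each line on slashes.
# "./usr/local/bin" becomes four fields: "." "usr" "local" "bin".
fields=$(printf './usr/local/bin\n' |
    awk -F/ '{ print NF ": " $1 " " $2 " " $NF }')
echo "$fields"
```

The leading ``.'' counts as a field of its own, so an entry three
levels down has an NF of 4 and gets NF - 1 = 3 tabs from the loop.
<P>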
Awk defaults to reading its input from standard input (a pipe, here)
or from each file listed after its script on the command line. We're
supplying it with input through the pipe, of course.
<P>
In the body of my awk 'for' loop I simply print a tab for each
directory named in that line. This has the appearance of "wiping
out" all of the leading directory names and indenting my line as
desired.
<P>
Finally, after the end of the for loop I simply print the last
field ($NF). Note how printf takes a format string similar to
C's printf -- and it doesn't assume a newline. I could put
C-like format specifiers like %s and %f in there -- and I'd have
to supply additional parameters to the printf call if I did.
<P>
By contrast the awk print command (no trailing ``f'') does add
an ORS (output record separator) to the end of its line
and doesn't treat its first argument as a format specification.
<P>
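To see the loop, printf and print working together, here is the
one-liner run on three made-up paths instead of real find output
(the variable name outline is only a label for this sketch):

```shell
# Three made-up paths stand in for the output of "find .".
outline=$(printf '.\n./dir\n./dir/file1\n' |
    awk -F/ '{ for (x = 1; x < NF; x++) printf "\t"; print $NF }')
echo "$outline"
```

Each line comes out with one tab per leading path element: none for
``.'', one for dir and two for file1.
<P>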
This evening, while cleaning up my home directory (and
procrastinating on doing paying work and cleaning the house), I
happened across a copy of this and decided to fix it.
<PRE>
find . | { while read i ;
    do
        [ -d "$i" ] \
            && echo "$i/" \
            || echo "$i"
    done ; } \
  | awk -F/ '
    # directory lines end in a slash: the name is the next-to-last field
    /\/$/ { for (x = 1; x < NF - 1; x++) {
                printf "\t" };
            print $(NF-1) "/";
            next;
          }
    # everything else: the name is the last field
          { for (x = 1; x < NF; x++) {
                printf "\t" }
            print $NF }'
</PRE>

Note that the original script: 'find ....| awk -F/ ...'
is mostly still there. But the script has gone from
one line to eleven -- all to get that silly little slash
character on the end of each directory name.
<P>
(If anyone has a shorter program -- I'd like to see it
-- there's probably a fairly quick way to do this using
perl and find2perl)
<P>
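One possible shortcut, offered as a sketch rather than a tested
replacement: GNU find can append the slash itself through its -printf
action, which removes the need for the shell loop. This assumes GNU
find (-printf is not part of POSIX find), and the function name
outline is just a label for the example.

```shell
# Assumes GNU find; -printf is a GNU extension.
outline() {
    # Directories are printed with a trailing slash, everything else as-is.
    find . -type d -printf '%p/\n' -o -print |
    awk -F/ '
        /\/$/ { for (x = 1; x < NF - 1; x++) printf "\t"
                print $(NF-1) "/"; next }
              { for (x = 1; x < NF; x++) printf "\t"
                print $NF }'
}
```

Run it as outline from the top of the tree you want to list.
<P>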
The main thing I've added is the while loop, which works
like this:
<P>
find's output is piped into a group of commands
(that's what the braces are for).

That group of commands starts with a bash "while...
do" loop. The bash "while...do" loop works like this:
<PRE>
'while'
    some command returns no error
'do'
    some commands
'done'
</PRE>

Note that, unlike C or Pascal programming, the
``condition'' for the while loop is actually any
command (or group of commands -- enclosed in
braces or parentheses). The fact that programs
return values (called exit statuses in Unix and errorlevels
in DOS and some mainframe OSes) makes all commands implicitly
``conditions.'' (Actually C allows a variety of
function calls within conditionals -- but we
won't go into that.)
<P>
Note that some commands might not return values that
make any sense -- so those would not be suitable
for use in any of the conditional contexts in any
shell.
<P>
The command I'm using is bash's built-in ``read''
command, which takes a variable name as an
argument and stores one line of input in it.
Note that I don't say ``read $i'' --
the shell would then fill the value of $i into
the command (i.e. it would ``dereference'' it) and,
since i starts out unset, the read command would have
no arguments. If you give the read command no argument,
bash simply reads the line into its default variable,
REPLY, and i stays empty (no error).
<P>
When you set values in bash (or Bourne shell, or zsh
etc.) you also don't ``dereference'' the variable: you
write i=foo. $i=foo would be an error -- the shell expands
$i first and then tries to run the resulting word as a
command. (Setting a variable whose name is currently stored
in $i takes an extra step, such as eval.)
<P>
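A minimal sketch of the difference between naming a variable and
dereferencing it. The variable called name and the use of eval for
the indirect case are illustrative additions, not part of the script
above:

```shell
i=foo              # assignment uses the bare name: no dollar sign
echo "$i"          # reading the value back does use the dollar sign

name=i             # hypothetical: a variable holding another variable's name
eval "$name=bar"   # the shell expands this to i=bar, then eval runs it
echo "$i"
```

The second echo shows that i was changed through the name stored in
another variable, which a plain $name=bar cannot do.
<P>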
Back to our script. When the find command stops printing
filenames into the pipe, the 'read i' command will fail
to get any value -- it returns an error, so the while loop
ends instead of running its body again.
<P>
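The loop and its exit condition can be tried without touching the
filesystem. In this sketch two made-up lines stand in for find's
output, and the names result and count are only for illustration:

```shell
# Two made-up lines stand in for the output of "find .".
result=$(printf './dir\n./dir/file1\n' | {
    count=0
    while read i
    do
        count=$((count + 1))
        echo "line $count: $i"
    done
    # read has failed (end of input), so the loop is over.
    echo "read $count lines"
})
echo "$result"
```

The final count is printed inside the braces because bash normally
runs each part of a pipeline in a subshell, so variables set in the
loop are gone once the pipeline finishes.
<P>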
The 'do' keyword just marks the end of the list of
commands in the conditional section and the beginning
of the body of the loop (big surprise -- huh?).
<P>
The next three lines of the script are another common
shell construct --
<ol>
<li>[ is really an alias for or link to the 'test'
command.
<P>
<li>-d is a parameter to 'test' that is true if
the next parameter ($i) is a directory.
<P>
<li>That line ends with a ``\'' (backslash), which serves as
a continuation character. This causes the shell
to treat the next line as an extension of this
one.
</ol>
<P>
I could certainly have put all of this on one line.
However, for readability I broke it up and formatted
it with leading tabs -- otherwise *I* couldn't read
it, much less expect anyone else to do so.
<P>
The next line (continuation) starts with the '&&'
operator. In bash and related shells you have things
like the familiar ``|'' (pipe) and ``;'' (semicolon) which are
called operators. This operator means ``if that last command
was O.K. -- returned no error -- then ...''
<P>
You can think of the '&&' operator as do this ``and''
do that (in the *conditional* sense of the word
and).
<P>
The next line uses the '||' operator -- which is,
as you might expect, similar to the '&&' operator except
it means -- ``if the last command executed returned an
error then ...'' This is roughly analogous to the English
``or'' (again, in the conditional sense).
<P>
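A short sketch of both operators, using the standard true and false
commands as stand-ins for a command that succeeds and one that fails
(the variable names are arbitrary):

```shell
a=$(true  && echo "ran after success")
b=$(false || echo "ran after failure")
# Caveat: in A && B || C, C also runs when B fails.
d=$(true && false || echo "fallback ran anyway")
printf '%s\n' "$a" "$b" "$d"
```

The last line is worth a second look: the '||' branch reacts to the
most recent command that ran, so this construct only behaves like an
if/else when the middle command cannot fail. With echo in the middle,
as in the script above, that is close enough in practice.
<P>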
Of course I could have wrapped this in an 'if ....;
then ....; else...' construct -- but I'm used to the '&&'
and '||' as are most shell programmers.
<P>
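For comparison, here is roughly what the if/then/else form of the
same test could look like. The function name mark is hypothetical and
only serves to make the sketch easy to try:

```shell
# mark is a hypothetical helper name, not part of the original script.
mark() {
    if [ -d "$1" ]; then
        echo "$1/"      # directories get the trailing slash
    else
        echo "$1"       # everything else is printed unchanged
    fi
}
mark .
```

Inside the while loop this would be called as mark "$i".
<P>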
So far all we've done is add a ``/'' character to the end
of each directory.
<P>
Now I'm left with a printout of full paths, with directories ending in
``/'' (slashes) and other files printed normally -- back to replacing all
but the last thing with tabs -- so we pipe the 'while' loop's output
into the same awk script we were using before.
<P>
Oops! Well, almost the same script -- it turns out that awk -F is
happy to consider the trailing slash as being followed by a blank field
on the end of a line. Hmm. O.K. we add an extra condition to the awk script.
<P>
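The blank field is easy to see on a single made-up directory line.
This sketch prints NF, then the last field and the next-to-last field
in brackets (the variable name seen is arbitrary):

```shell
# A made-up directory line as the while loop would emit it.
seen=$(printf './dir/\n' |
    awk -F/ '{ print NF ":[" $NF "]:[" $(NF-1) "]" }')
echo "$seen"
```

There are three fields, the last one is empty, and the directory name
has moved to the next-to-last position.
<P>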
An awk script consists of condition-action pairs. The most
common awk ``conditions'' are patterns. That is to say that they
are regular expressions (like the things you use grep to search
for). A pattern is usually delimited by slashes (familiar to
users of ed, which was later upgraded to ex and then to vi) although
you can also ``match'' against strings that are enclosed in quotes.
<P>
Actions in awk are enclosed in braces.
<P>
Awk is an extremely forgiving language. If you leave out the
``condition'' or ``pattern'' it will execute the action on that
line for every record (line) that it comes across. That's
what my first script did.
<P>
If you leave off the action (i.e. if you have a line that
consists just of a condition) then awk will simply print
the record. In other words the default action is {print}.
<P>
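Both defaults can be seen side by side on a few made-up lines. The
variable names here are only for the sketch:

```shell
sample='usr
home/
x'
# Pattern with no action: matching records are printed unchanged.
matched=$(printf '%s\n' "$sample" | awk '/\/$/')
# Action with no pattern: the action runs for every record.
lengths=$(printf '%s\n' "$sample" | awk '{ print length($0) }')
echo "$matched"
echo "$lengths"
```

The first awk prints only the line that ends in a slash; the second
prints the length of every line.
<P>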
When I was a regular in the comp.lang.awk newsgroup (and
alt.lang.awk that preceded it) I used to enjoy pointing out
that the shortest awk programs in the world are:
<PRE>
1

and

.
</PRE>
(The first one just prints every line it sees since ``1'' is
a ``true'' condition; the second program (a dot) prints every
line that has at least one character -- since that is the
regular expression for ``any character''. The second program
actually does filter out blank lines since awk doesn't count
the record separator as part of the line).
<P>
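A quick way to compare the two behaviours on made-up input. In this
sketch the ``any character'' pattern is written with explicit slash
delimiters, /./, which is an assumption about the portable spelling
rather than a quote from the newsgroup:

```shell
# Made-up input: two words with an empty line between them.
all=$(printf 'one\n\ntwo\n' | awk '1')
nonblank=$(printf 'one\n\ntwo\n' | awk '/./')
echo "$all"
echo "$nonblank"
```

The first result keeps the empty line; the second drops it.
<P>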
So, the modification of my awk script for this purpose is
to add a condition that handles any record that *ends* with a
slash. In those cases I convert every field *before* the
next-to-last one to a tab, and print that ``next-to-last'' field. I also
have to add the ``/'' character to the end of that since awk doesn't
consider the field separator to be part of any field.
<P>
Finally I add a 'next' command which tells awk not to look
for any more pattern-action pairs with *this* record. If I
didn't do that then awk would execute the action for each
``directory'' line -- and also execute the other action for it
(i.e. it would print a blank-looking line, containing nothing
but tabs, after printing each directory line).

<P>
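The effect of 'next' can be checked by running one made-up directory
line through the awk program with and without it and counting the
output lines. The helper count_lines is hypothetical:

```shell
count_lines() { awk 'END { print NR }'; }   # hypothetical helper

with_next=$(printf './dir/\n' | awk -F/ '
    /\/$/ { for (x = 1; x < NF - 1; x++) printf "\t"
            print $(NF-1) "/"; next }
          { for (x = 1; x < NF; x++) printf "\t"
            print $NF }' | count_lines)

# Same program with "next" removed: the second rule also fires.
without_next=$(printf './dir/\n' | awk -F/ '
    /\/$/ { for (x = 1; x < NF - 1; x++) printf "\t"
            print $(NF-1) "/" }
          { for (x = 1; x < NF; x++) printf "\t"
            print $NF }' | count_lines)

echo "with next: $with_next line(s); without: $without_next line(s)"
```

Without 'next' the single input line produces two output lines, the
second holding only tabs followed by the empty last field.
<P>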
Are the extra 10 lines of code worth it just to add a slash to the end
of the directory names in our outline? Depends on how much your customer
is willing to pay -- or how much grief it causes you, your boss or your
users.
<P>
Mostly I decided to work on this as a training example. I think there are
some neat constructs that every budding shell programmer might benefit
from learning.
<P>
The ``find .... | {while read i .... do ... done}'' construct is well worth
remembering for other cases. It allows you to do complex operations on
large numbers of files without resorting to writing a temporary file and
having to clean up after it.
<P>
When you write scripts that explicitly create temporary files you suddenly
have a host of new concerns -- what do I name it? where do I put it?
don't forget to remove it! do I have enough space for it? what if my
script gets interrupted? etc.
<P>
To be sure there are answers to each of these. For example I
suggest ~/tmp/$0.`date +%Y%m%d`.$$ for a generic temporary filename
for any script -- it gives the name of your script, the date in
YYYYMMDD format and the process ID of the current instance of your
script as the filename (use `basename $0` in place of $0 if the script
may be invoked with a path). It puts that into the temporary directory
under your home (which no one else should have access to). There is
virtually no chance of a name collision using this scheme (particularly
if you change the date format to +%s which is the total number of seconds
since midnight on Jan. 1, 1970). You can use the 'trap' command to
ensure that your temp files are cleaned up in all but the most extreme
cases etc.
<P>
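A minimal sketch of that naming scheme combined with 'trap'. The
function names scratch_name and count_entries are hypothetical, the
three-line file is a stand-in for real work, and trapping EXIT inside
a subshell is just one way to arrange the cleanup:

```shell
# scratch_name and count_entries are hypothetical helper names.
scratch_name() {
    # $1 is the directory for temporary files (the text suggests ~/tmp).
    # ${0##*/} strips any directory part from the script name.
    echo "$1/${0##*/}.$(date +%Y%m%d).$$"
}

count_entries() {
    # The parentheses start a subshell, so the EXIT trap below fires as
    # soon as this function's work is done.
    (
        tmpfile=$(scratch_name "$1")
        trap 'rm -f "$tmpfile"' EXIT
        printf 'a\nb\nc\n' > "$tmpfile"     # stand-in for real output
        awk 'END { print NR }' "$tmpfile"
    )
}

# Typical use: mkdir -p "$HOME/tmp" && count_entries "$HOME/tmp"
```

The trap removes the file whether the subshell finishes normally or a
command inside it fails along the way.
<P>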
However, as I've said, it's worth understanding how to avoid temporary
files -- and usually your scripts will execute faster as a result.
<P>
The [ ... ] && ... || ... construct is absolutely essential to
any Unix sysadmin. Many legacy scripts (particularly those in
/etc/rc.d/ -- or its local equivalent) rely on these operators and
the test or '[' command.
<P>
Finally there is 'awk'. I've heard it argued that awk is a dinosaur
and that we should convert all the awk code to perl (and presumably most
of the Bourne shell and sed code with it). I won't argue that point
here. Suffice it to say that anything you learn how to do in awk will
just make learning perl that much easier when you get to it. awk is a
much simpler language and is phenomenally easy to integrate into shell scripts
(as you can see here).
<P>
Jim Dennis, Starshine Technical Services

<!--===================================================================-->
<P> <hr> <P>
<center><H5>Copyright © 1997, James T. Dennis <BR>
Published in Issue 14 of the Linux Gazette</H5></center>

<!--===================================================================-->
<P> <hr> <P>
<A HREF="./index.html"><IMG ALIGN=BOTTOM SRC="../gx/indexnew.gif"
ALT="[ TABLE OF CONTENTS ]"></A>
<A HREF="../index.html"><IMG ALIGN=BOTTOM SRC="../gx/homenew.gif"
ALT="[ FRONT PAGE ]"></A>
<A HREF="./dired.html"><IMG SRC="../gx/back2.gif"
ALT=" Back "></A>
<A HREF="./gm.html"><IMG SRC="../gx/fwd.gif" ALT=" Next "></A>
<P> <hr> <P>
</BODY>
</HTML>