<!--startcut ==============================================-->
<!-- *** BEGIN HTML header *** -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML><HEAD>
<title>An Introduction to awk LG #67</title>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#0000AF"
ALINK="#FF0000">
<!-- *** END HTML header *** -->

<CENTER>
<A HREF="http://www.linuxgazette.com/">
<H1><IMG ALT="LINUX GAZETTE" SRC="../gx/lglogo.png"
WIDTH="600" HEIGHT="124" border="0"></H1></A>
<BR>

<!-- *** BEGIN navbar *** -->
<IMG ALT="" SRC="../gx/navbar/left.jpg" WIDTH="14" HEIGHT="45" BORDER="0" ALIGN="bottom"><A HREF="gilliam.html"><IMG ALT="[ Prev ]" SRC="../gx/navbar/prev.jpg" WIDTH="16" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="index.html"><IMG ALT="[ Table of Contents ]" SRC="../gx/navbar/toc.jpg" WIDTH="220" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../index.html"><IMG ALT="[ Front Page ]" SRC="../gx/navbar/frontpage.jpg" WIDTH="137" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="http://www.linuxgazette.com/cgi-bin/talkback/all.py?site=LG&article=http://www.linuxgazette.com/issue67/nazario.html"><IMG ALT="[ Talkback ]" SRC="../gx/navbar/talkback.jpg" WIDTH="121" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../faq/index.html"><IMG ALT="[ FAQ ]" SRC="./../gx/navbar/faq.jpg"WIDTH="62" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="nazario2.html"><IMG ALT="[ Next ]" SRC="../gx/navbar/next.jpg" WIDTH="15" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><IMG ALT="" SRC="../gx/navbar/right.jpg" WIDTH="15" HEIGHT="45" ALIGN="bottom">
<!-- *** END navbar *** -->
<P>
</CENTER>

<!--endcut ============================================================-->

<H4 ALIGN="center">
"Linux Gazette...<I>making Linux just a little more fun!</I>"
</H4>

<P> <HR> <P>
<!--===================================================================-->

<center>
<H1><font color="maroon">An Introduction to awk</font></H1>
<H4>By <a href="mailto:jose@cwru.edu">Jose Nazario</a></H4>
</center>
<P> <HR> <P>

<!-- END header -->

<h3>
Abstract</h3>
<p>
The awk programming language often gets overlooked in favor of Perl,
which is a more capable language. However, awk is even more widely
installed than Perl, has a gentler learning curve, and can be used just
about everywhere in system monitoring scripts where efficiency is key.
This brief tutorial is designed to help you get started in awk programming.
<p>
<h3>
The Basics</h3>
<p>

The awk language is a small, C-style language which was designed for the
processing of regularly formatted text. This usually includes database
dumps and system log files. It's built around regular expressions and
pattern handling, much like Perl is. In fact, Perl is considered to be a
grandchild of the awk language.
<p>
The funny name of the awk language comes from the names of its original
authors: Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger. Most
of you will recognize the name of Kernighan, co-author of <i>The C
Programming Language</i> and a major force in the UNIX world.
<p><h3>
Using awk in a One Liner</h3><p>

This is how I began using awk: to print specific fields from a command's
output. It works surprisingly well, although the efficiency went through
the floor once I was writing large scripts that took minutes to complete.
<p>

Still, a one-liner like this can be useful:
<p>
<pre>
ls -l /tmp/foobar | awk '{print $1"\t"$9}'
</pre>
<p>
This takes input like this:
<pre>
-rw-rw-rw- 1 root root 1 Jul 14 1997 tmpmsg
</pre><p>

and generates output like this (the two fields are separated by a tab):
<pre>
-rw-rw-rw-      tmpmsg
</pre>

<p>
What it did is quite intuitive: it printed only the first and ninth
whitespace-separated fields of each line. Now you can see why awk is so
popular for one-line data extraction. But let's move on to a full-fledged
awk program.
<p>
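If you want to try the field selection without depending on what happens to
be in /tmp, one option is to feed the sample line in by hand. This is only a
sketch, assuming a POSIX shell with printf; the input is the sample line
shown above:
<pre>
# Feed one ls-style line to awk and keep fields 1 and 9, joined by a tab.
printf '%s\n' '-rw-rw-rw- 1 root root 1 Jul 14 1997 tmpmsg' | awk '{print $1"\t"$9}'
</pre>
Fields are numbered from 1, and $0 refers to the whole line, so the file
name here is field 9 only because the ls output has eight fields in front
of it.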
<h3>
An awk Program Structure</h3><p>

One of my favorite things about awk is how readable it is, despite its
power, when compared to Perl or Python. A full awk program has three
parts: a BEGIN block, which is executed once before any input is read; a
main loop, which is executed for every line of input; and an END block,
which is executed after all of the input is read. Any of the three may be
left out. Quite intuitive! Yes, I'll keep saying that about awk, because I
find it to be very true.
<p>

This is a very simple awk program highlighting some of the features of the
language. See if you can pick out what is happening before we dissect it:
<pre>
#!/usr/bin/awk -f
#
# check the sulog for failures..
# copyright 2001 (c) jose nazario
#
# works for Solaris, IRIX and HPUX 10.20
BEGIN {
    print "--- checking sulog"
    failed=0
}
{
    if ($4 == "-") {
        print "failed su:\t"$6"\tat\t"$2"\t"$3
        failed=failed+1
    }
}
END {
    print "---------------------------------------"
    printf("\ttotal number of records:\t%d\n", NR)
    printf("\ttotal number of failed su's:\t%d\n",failed)
}
</pre><p>

Have you figured it out yet? It may help to know the format of the input
file (sulog, from, say, IRIX). Here's a typical pair of lines:
<pre>
SU 01/30 13:15 - ttyq1 jose-root
SU 01/30 13:15 + ttyq1 jose-root
</pre>
OK, read back up and see if you can follow the script now. The BEGIN block
sets everything up, printing out a header and initializing our one variable
(in this case failed) to zero. The main loop then reads each line of input
(the sulog file, a log of su attempts) and compares field four against the
minus sign. If they match, the attempt failed, so we increment our counter
by one and note which attempt failed and when. At the end the final tallies
are presented, showing the total number of lines of input as the number of
records (NR, an internal awk variable) and the number of failed su attempts
we noted. Output looks like this:
<pre>
failed su: jose-root at 01/30 13:15
---------------------------------------
total number of records: 272
total number of failed su's: 73
</pre>
You should also be able to see how printf works: almost exactly like
printf does in C. In short, awk is a rather intuitive language.
<p>
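The same check can also be written in awk's pattern-action form, where a
condition placed in front of a block takes the place of the explicit if.
What follows is only a sketch: it feeds in the two sample sulog lines from
above with printf instead of reading a real sulog, and the wording of the
summary line is my own shorthand, not the script's:
<pre>
# Two sample records: one failed su ("-") and one successful su ("+").
printf 'SU 01/30 13:15 - ttyq1 jose-root\nSU 01/30 13:15 + ttyq1 jose-root\n' |
awk '$4 == "-" { failed++; print "failed su:\t" $6 "\tat\t" $2 "\t" $3 }
     END       { printf("%d of %d records failed\n", failed, NR) }'
</pre>
Note that failed is never initialized here. An unset awk variable counts as
zero when used as a number, so the BEGIN block in the full script is there
for clarity rather than out of necessity.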
By default the field separator is whitespace, but you can change that. For
password files I set it to a colon. This small script prints users with a
UID of 0 (root equivalent) and users with an empty password field:
<pre>
#!/usr/bin/awk -f
BEGIN { FS=":" }
{
    if ($3 == 0) print $1
    if ($2 == "") print $1
}
</pre>
<p>
Other awk internals to know and use are RS, the record separator (defaults
to a newline, or \n); OFS, the output field separator (defaults to a single
space); and ORS, the output record separator (defaults to a newline). These
can all be set within the script, of course.
<p>
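To see FS and OFS working together, here is a small sketch. The passwd-style
line is made up for illustration, and " - " is just an arbitrary separator
chosen so that it stands out in the output:
<pre>
# Split the input on colons, then join the printed fields with " - ".
printf 'root:x:0:0:root:/root:/bin/sh\n' | awk 'BEGIN { FS=":"; OFS=" - " } { print $1, $3 }'
</pre>
This prints "root - 0". OFS is only inserted where the arguments to print
are separated by commas; writing print $1 $3 would concatenate the two
fields with nothing in between.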
<h3> Regular Expressions</h3><p>

The awk language matches the normal regular expressions that you have come
to know and love, and in some respects it is more flexible about them than
grep. For instance, I use the following awk search pattern to look for the
presence of a likely exploit on Intel Linux systems:
<pre>
#!/usr/bin/awk -f
{ if ($0 ~ /\x90/) print "exploit at line " NR }
</pre><p>
Searching for the hex value 0x90 is awkward with a traditional grep, but
0x90 is popular in Intel exploits: it's the NOP instruction, which is used
as padding in shellcode.
<p>
You can look for hex values using \xdd, where dd is the hex number to look
for (this works in gawk, but support for \x varies between awk
implementations); you can look for octal values using \ddd, where ddd is
the octal number of the character; and regular expressions based on text
will, of course, work.
<p>
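As a harmless stand-in for the exploit search, this sketch uses the octal
escape \011 (the tab character) in the same kind of test. The two input
lines are invented for the example, and it assumes your awk accepts octal
escapes inside a regular expression, which the common implementations do:
<pre>
# Only the first line contains a tab, so only line 1 is reported.
printf 'a\tb\nplain text\n' | awk '{ if ($0 ~ /\011/) print "tab at line " NR }'
</pre>
Swapping \011 for another octal code changes which character is searched
for, without touching the rest of the program.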
<h3>Random awk bits</h3><p>

Random numbers in awk are readily generated, but there is an interesting
caveat. The rand() function does what you would expect: it returns a random
number, in this case from 0 up to (but never including) 1. You can scale
it, of course, to get larger values. Here's some example code that calls
rand() ten times, and that also shows an interesting bit of behavior:

<pre>
#!/usr/bin/awk -f
{
    for (i = 1; i <= 10; i++)
        print rand()
    exit
}
</pre>

Run that a couple of times, giving it at least one line of input since the
main block only runs once a line has been read, and you will see a problem:
the random numbers are hardly random; they repeat every time you run it!
<p>
So what's the problem? Well, we didn't seed the random number generator.
Normally, we're used to our random number generator pulling entropy from a
good source, like (in Linux) /dev/random. However, awk doesn't do this on
its own. To really get random numbers, we should seed our random number
generator. This improved code will do this:

<pre>
#!/usr/bin/awk -f
BEGIN {
    srand()
}
{
    for (i = 1; i <= 10; i++)
        print rand()
    exit
}
</pre>

The seeding of the random number generator in the BEGIN block is what does
the trick. The function srand() can take an argument; in the absence of
one, the current date and time are used to seed the generator. <em>Note
that the same seed will always produce the same 'random' sequence.</em>
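<p>
Two things from this section can be sketched together: scaling rand() and
the effect of a fixed seed. The seed 42 and the die-roll range of 1 to 6 are
arbitrary choices for illustration, and the actual numbers printed will
differ from one awk implementation to another:
<pre>
# Scale rand() to a whole number from 1 to 6, using an explicit seed.
roll='BEGIN { srand(42); for (i = 1; i <= 3; i++) print int(rand() * 6) + 1 }'

# Run the same seeded program twice; both runs print the same three rolls.
awk "$roll"
awk "$roll"
</pre>
A fixed seed like this is handy when you want repeatable test runs; drop the
argument to srand() when you want a different sequence each time.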

<p><h3>
Conclusion</h3><p>

This isn't the most detailed intro to awk you will find, but I hope it has
made clearer how to use awk in a program setting. Myself, I'm quite happy
programming in awk, and I've got a lot more to learn.
<p>

We haven't even touched upon arrays, user-defined functions or other complex
language features, but suffice it to say awk is hardly Perl's little
brother.
<p>

Go forth and awk!
<h3>
Resources</h3><p>

Kernighan's homepage contains a list of good awk books as well as the
source for the 'one true awk', aka "nawk". It also contains a host of
other interesting links and information from Kernighan.<p>
<A HREF="http://cm.bell-labs.com/who/bwk/">http://cm.bell-labs.com/who/bwk/</A>

<p>
The standard awk implementation, nawk (for "new awk", as opposed to the "old
awk", sometimes found as 'oawk' for compatibility), is based on the POSIX
awk definitions, and contains a few functions that were introduced by two
other awk implementations, gawk and mawk. I usually keep this one around
as 'nawk' and use it to test the portability of my awk scripts. It is
usually the one found on my commercial UNIX machines, where I often don't
have gawk installed.<p>
Source for nawk: <A HREF="http://cm.bell-labs.com/who/bwk/awk.tar.gz">http://cm.bell-labs.com/who/bwk/awk.tar.gz</A>

<p>
The GNU project's awk, gawk, is also based on the POSIX awk standard, but
adds a significant number of useful features as well. These include command
line features like 'lint' checking and reversion to strict POSIX mode. My
favorite features in gawk are line continuation, using '\', and the extended
regular expressions. The gawk documentation has a complete discussion of
GNU extensions to the awk language. This is also the standard awk on Linux
and BSD systems.
<p>
Source for gawk: <A HREF="ftp://gnudist.gnu.org/gnu/gawk/gawk-3.0.6.tar.gz">ftp://gnudist.gnu.org/gnu/gawk/gawk-3.0.6.tar.gz</A> (the GNU Project's version of awk)

<p>
The O'Reilly sed & awk book is perhaps the most popular book on these two
small programs, and is highly regarded. It contains, among other things, a
discussion of popular awk implementations (i.e. gawk, nawk, mawk), a great
selection of functions and the usual O'Reilly readability. The awk homepage
lists several other books on the awk programming language, though this one
remains my favorite.
<p>
The sed & awk book: <A HREF="http://www.oreilly.com/catalog/sed2">http://www.oreilly.com/catalog/sed2</A>

<!-- *** BEGIN bio *** -->
<SPACER TYPE="vertical" SIZE="30">
<P>
<H4><IMG ALIGN=BOTTOM ALT="" SRC="../gx/note.gif">Jose Nazario</H4>
<CITE>José is a Ph.D. student in the department of biochemistry at Case
Western Reserve University in Cleveland, OH. He has been using UNIX for
nearly ten years, and Linux since kernels 1.2.</CITE>

<!-- *** END bio *** -->

<!-- *** BEGIN copyright *** -->
<P> <hr> <!-- P -->
<H5 ALIGN=center>

Copyright © 2001, Jose Nazario.<BR>
Copying license <A HREF="../copying.html">http://www.linuxgazette.com/copying.html</A><BR>
Published in Issue 67 of <i>Linux Gazette</i>, June 2001</H5>
<!-- *** END copyright *** -->

<!--startcut ==========================================================-->
<HR><P>
</BODY></HTML>
<!--endcut ============================================================-->