<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML
><HEAD
><TITLE
>Awk</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.7"><LINK
REL="HOME"
TITLE="Advanced Bash-Scripting Guide"
HREF="index.html"><LINK
REL="UP"
TITLE="A Sed and Awk Micro-Primer"
HREF="sedawk.html"><LINK
REL="PREVIOUS"
TITLE="Sed"
HREF="x23170.html"><LINK
REL="NEXT"
TITLE="Parsing and Managing Pathnames"
HREF="pathmanagement.html"></HEAD
><BODY
CLASS="SECT1"
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000FF"
VLINK="#840084"
ALINK="#0000FF"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
>Advanced Bash-Scripting Guide: </TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="x23170.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
>Appendix C. A Sed and Awk Micro-Primer</TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="pathmanagement.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="SECT1"
><H1
CLASS="SECT1"
><A
NAME="AWK"
></A
>C.2. Awk</H1
><P
><A
NAME="AWKREF"
></A
></P
><P
><I
CLASS="FIRSTTERM"
>Awk</I
>
<A
NAME="AEN23443"
HREF="#FTN.AEN23443"
><SPAN
CLASS="footnote"
>[1]</SPAN
></A
>
is a full-featured text processing language with a syntax
reminiscent of <I
CLASS="FIRSTTERM"
>C</I
>. While it possesses an
extensive set of operators and capabilities, we will cover only
a few of these here - the ones most useful in shell scripts.</P
><P
>Awk breaks each line of input passed to it into
<A
NAME="FIELDREF2"
></A
>
<A
HREF="special-chars.html#FIELDREF"
>fields</A
>. By default, a field
is a string of consecutive characters delimited by <A
HREF="special-chars.html#WHITESPACEREF"
>whitespace</A
>, though there are options
for changing this. Awk parses and operates on each separate
field. This makes it ideal for handling structured text files,
especially tables: data organized into consistent chunks,
such as rows and columns.</P
><P
><A
HREF="varsubn.html#SNGLQUO"
>Strong quoting</A
> and <A
HREF="special-chars.html#CODEBLOCKREF"
>curly brackets</A
> enclose blocks of
awk code within a shell script.</P
><P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
># $1 is field #1, $2 is field #2, etc.
echo one two | awk '{print $1}'
# one
echo one two | awk '{print $2}'
# two
# But what is field #0 ($0)?
echo one two | awk '{print $0}'
# one two
# All the fields!
awk '{print $3}' $filename
# Prints field #3 of file $filename to stdout.
awk '{print $1 $5 $6}' $filename
# Prints fields #1, #5, and #6 of file $filename,
#+ run together with no separator between them.
awk '{print $0}' $filename
# Prints the entire file!
# Same effect as: cat $filename . . . or . . . sed '' $filename</PRE
></FONT
></TD
></TR
></TABLE
></P
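><P
>As a hedged illustration of changing the field delimiter, the sketch below uses awk's -F option to split on colons instead of whitespace. The sample record and the function name first_and_shell are made up for this example. The comma in the print statement inserts the output field separator, a space by default, between the two fields.</P>

```shell
# Hypothetical sample record in /etc/passwd style (colon-delimited).
record='alice:x:1001:1001:Alice Example:/home/alice:/bin/bash'

# -F: tells awk to split each line on ':' rather than on whitespace.
first_and_shell() {
  printf '%s\n' "$1" | awk -F: '{ print $1, $7 }'
}

first_and_shell "$record"
# alice /bin/bash
```

<P
>The same separator can also be set inside the script by assigning to the built-in variable FS in a BEGIN block.</P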
><P
>We have just seen the awk <I
CLASS="FIRSTTERM"
>print</I
> command
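in action. The only other feature of awk we need to deal with
here is variables. Awk handles variables similarly to shell
scripts, though a bit more flexibly: a variable needs no
declaration, starts out empty (zero in arithmetic), and is
referenced by its bare name, without a leading $.</P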
><P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>{ total += ${column_number} }</PRE
></FONT
></TD
></TR
></TABLE
>
This adds the value of the field numbered <TT
CLASS="PARAMETER"
><I
>column_number</I
></TT
> to
the running total, <TT
CLASS="PARAMETER"
><I
>total</I
></TT
>. Finally, to print
<SPAN
CLASS="QUOTE"
>"total"</SPAN
>, there is an <B
CLASS="COMMAND"
>END</B
> command
block, executed after the script has processed all its input.
<TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>END { print total }</PRE
></FONT
></TD
></TR
></TABLE
></P
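><P
>Putting the two fragments together, here is a minimal sketch of a column totaler. It assumes the column number is handed to awk with the -v option rather than by shell substitution; the function name column_total, the awk variable col, and the sample data are hypothetical.</P>

```shell
# column_total COLUMN: sum one whitespace-delimited column read from stdin.
column_total() {
  # $col is the field whose number is stored in col.
  # "total + 0" forces numeric output, so empty input prints 0.
  awk -v col="$1" '{ total += $col } END { print total + 0 }'
}

printf '%s\n' 'apples 3 0.50' 'pears 5 0.25' 'plums 2 1.00' | column_total 2
# 10
```

<P
>Passing the column number with -v keeps the awk program inside single quotes. The fragment above instead appears to rely on the shell expanding ${column_number} before awk sees the script, which requires double quotes around the awk code.</P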
><P
>Corresponding to the <B
CLASS="COMMAND"
>END</B
>, there is a
<B
CLASS="COMMAND"
>BEGIN</B
>, for a code block to be performed before awk
starts processing its input.</P
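><P
>A small sketch may help show the order in which the three kinds of block run. The function name report and the sample input are assumptions made for illustration only.</P>

```shell
# BEGIN runs once before any input is read; the unlabeled block runs
# once per input line; END runs once after the last line.
report() {
  awk 'BEGIN { print "name qty" }
       { print $1, $2; lines++ }
       END { print lines + 0, "lines read" }'
}

printf '%s\n' 'apples 3' 'pears 5' | report
# name qty
# apples 3
# pears 5
# 2 lines read
```

<P
>The header line is printed even when the input is empty, since the BEGIN block does not depend on any input line.</P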
><P
>The following example illustrates how <B
CLASS="COMMAND"
>awk</B
> can
add text-parsing tools to a shell script.</P
><DIV
CLASS="EXAMPLE"
><A
NAME="LETTERCOUNT2"
></A
><P
><B
>Example C-1. Counting Letter Occurrences</B
></P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>#! /bin/sh
# letter-count2.sh: Counting letter occurrences in a text file.
#
# Script by nyal [nyal@voila.fr].
# Used in ABS Guide with permission.
# Recommented and reformatted by ABS Guide author.
# Version 1.1: Modified to work with gawk 3.1.3.
# (Will still work with earlier versions.)
INIT_TAB_AWK=""
# Parameter to initialize awk script.
count_case=0
FILE_PARSE=$1
E_PARAMERR=85
usage()
{
echo "Usage: letter-count2.sh file letters" 1&#62;&#38;2
# For example: ./letter-count2.sh filename.txt a b c
exit $E_PARAMERR # Too few arguments passed to script.
}
if [ ! -f "$1" ] ; then
echo "$1: No such file." 1&#62;&#38;2
usage # Print usage message and exit.
fi
if [ -z "$2" ] ; then
echo "No letters specified." 1&#62;&#38;2
usage
fi
shift # Letters specified.
for letter in "$@" # For each one . . .
do
INIT_TAB_AWK="$INIT_TAB_AWK tab_search[${count_case}] = \
\"$letter\"; final_tab[${count_case}] = 0; "
# Pass as parameter to awk script below.
count_case=`expr $count_case + 1`
done
# DEBUG:
# echo $INIT_TAB_AWK;
cat "$FILE_PARSE" |
# Pipe the target file to the following awk script.
# ---------------------------------------------------------------------
# Earlier version of script:
# awk -v tab_search=0 -v final_tab=0 -v tab=0 -v \
# nb_letter=0 -v chara=0 -v chara2=0 \
awk \
"BEGIN { $INIT_TAB_AWK } \
{ split(\$0, tab, \"\"); \
for (chara in tab) \
{ for (chara2 in tab_search) \
{ if (tab_search[chara2] == tab[chara]) { final_tab[chara2]++ } } } } \
END { for (chara in final_tab) \
{ print tab_search[chara] \" =&#62; \" final_tab[chara] } }"
# ---------------------------------------------------------------------
# Nothing all that complicated, just . . .
#+ for-loops, if-tests, and a couple of specialized functions.
exit $?
# Compare this script to letter-count.sh.</PRE
></FONT
></TD
></TR
></TABLE
></DIV
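><P
>For comparison, here is a much smaller sketch of the same idea that passes the letter to awk with -v instead of assembling awk source from shell strings. It is not the method used in Example C-1: the function name count_letter is hypothetical, it reads from stdin, and it counts only one letter per call.</P>

```shell
# count_letter LETTER: print how many times LETTER occurs on stdin.
count_letter() {
  awk -v letter="$1" '
    {
      rest = $0
      # index() returns 0 once the letter no longer occurs in the remainder.
      while ((pos = index(rest, letter)) != 0) {
        count++
        rest = substr(rest, pos + 1)
      }
    }
    END { print letter " = " count + 0 }
  '
}

printf '%s\n' 'banana' 'bandana' | count_letter a
# a = 6
```

<P
>Because the letter arrives as an awk variable, the awk program can stay in single quotes, which avoids the backslash escaping needed in the example above.</P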
><P
>For simpler examples of awk within shell scripts, see:
<P
></P
><OL
TYPE="1"
><LI
><P
><A
HREF="internal.html#EX44"
>Example 15-14</A
></P
></LI
><LI
><P
><A
HREF="redircb.html#REDIR4"
>Example 20-8</A
></P
></LI
><LI
><P
><A
HREF="filearchiv.html#STRIPC"
>Example 16-32</A
></P
></LI
><LI
><P
><A
HREF="wrapper.html#COLTOTALER"
>Example 36-5</A
></P
></LI
><LI
><P
><A
HREF="ivr.html#COLTOTALER2"
>Example 28-2</A
></P
></LI
><LI
><P
><A
HREF="internal.html#COLTOTALER3"
>Example 15-20</A
></P
></LI
><LI
><P
><A
HREF="procref1.html#PIDID"
>Example 29-3</A
></P
></LI
><LI
><P
><A
HREF="procref1.html#CONSTAT"
>Example 29-4</A
></P
></LI
><LI
><P
><A
HREF="loops1.html#FILEINFO"
>Example 11-3</A
></P
></LI
><LI
><P
><A
HREF="extmisc.html#BLOTOUT"
>Example 16-61</A
></P
></LI
><LI
><P
><A
HREF="randomvar.html#SEEDINGRANDOM"
>Example 9-16</A
></P
></LI
><LI
><P
><A
HREF="moreadv.html#IDELETE"
>Example 16-4</A
></P
></LI
><LI
><P
><A
HREF="string-manipulation.html#SUBSTRINGEX"
>Example 10-6</A
></P
></LI
><LI
><P
><A
HREF="assortedtips.html#SUMPRODUCT"
>Example 36-19</A
></P
></LI
><LI
><P
><A
HREF="loops1.html#USERLIST"
>Example 11-9</A
></P
></LI
><LI
><P
><A
HREF="wrapper.html#PRASC"
>Example 36-4</A
></P
></LI
><LI
><P
><A
HREF="mathc.html#HYPOT"
>Example 16-53</A
></P
></LI
><LI
><P
><A
HREF="asciitable.html#ASCII3SH"
>Example T-3</A
></P
></LI
></OL
>
</P
><P
>That's all the awk we'll cover here, folks, but there's lots
more to learn. See the appropriate references in the <A
HREF="biblio.html"
><I
>Bibliography</I
></A
>.</P
></DIV
><H3
CLASS="FOOTNOTES"
>Notes</H3
><TABLE
BORDER="0"
CLASS="FOOTNOTES"
WIDTH="100%"
><TR
><TD
ALIGN="LEFT"
VALIGN="TOP"
WIDTH="5%"
><A
NAME="FTN.AEN23443"
HREF="awk.html#AEN23443"
><SPAN
CLASS="footnote"
>[1]</SPAN
></A
></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
WIDTH="95%"
><P
>Its name derives from the initials of its authors,
<B
CLASS="COMMAND"
>A</B
>ho, <B
CLASS="COMMAND"
>W</B
>einberger, and
<B
CLASS="COMMAND"
>K</B
>ernighan.</P
></TD
></TR
></TABLE
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="x23170.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="pathmanagement.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Sed</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="sedawk.html"
ACCESSKEY="U"
>Up</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Parsing and Managing Pathnames</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>