<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML
><HEAD
><TITLE
>Text Processing Commands</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.7"><LINK
REL="HOME"
TITLE="Advanced Bash-Scripting Guide"
HREF="index.html"><LINK
REL="UP"
TITLE="External Filters, Programs and Commands"
HREF="external.html"><LINK
REL="PREVIOUS"
TITLE="Time / Date Commands"
HREF="timedate.html"><LINK
REL="NEXT"
TITLE="File and Archiving Commands"
HREF="filearchiv.html"></HEAD
><BODY
CLASS="SECT1"
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000FF"
VLINK="#840084"
ALINK="#0000FF"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
>Advanced Bash-Scripting Guide: </TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="timedate.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
>Chapter 16. External Filters, Programs and Commands</TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="filearchiv.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="SECT1"
><H1
CLASS="SECT1"
><A
NAME="TEXTPROC"
></A
>16.4. Text Processing Commands</H1
><P
></P
><DIV
CLASS="VARIABLELIST"
><P
><B
><A
NAME="TPCOMMANDLISTING1"
></A
>Commands affecting text and
	   text files</B
></P
><DL
><DT
><A
NAME="SORTREF"
></A
><B
CLASS="COMMAND"
>sort</B
></DT
><DD
><P
>File sort utility, often used as a filter in a pipe. This
	      command sorts a <I
CLASS="FIRSTTERM"
>text stream</I
>
	      or file forwards or backwards, or according to various
	      keys or character positions. Using the <TT
CLASS="OPTION"
>-m</TT
>
	      option, it merges presorted input files.	The <I
CLASS="FIRSTTERM"
>info
	      page</I
> lists its many capabilities and options. See
	      <A
HREF="loops1.html#FINDSTRING"
>Example 11-10</A
>, <A
HREF="loops1.html#SYMLINKS"
>Example 11-11</A
>,
	      and <A
HREF="contributed-scripts.html#MAKEDICT"
>Example A-8</A
>.</P
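><P
>As a minimal sketch of key-based sorting and of merging with <TT
CLASS="OPTION"
>-m</TT
>, consider the following. The sample records, variable names, and temporary file names are invented for illustration, and the locale is pinned to <TT
CLASS="LITERAL"
>C</TT
> only to keep the ordering predictable.</P
>
```shell
#!/bin/bash
# Sketch only: sample data and file names are made up for illustration.
export LC_ALL=C   # Pin the collation order so results are reproducible.

data='carol sales 300
alice ops 200
bob sales 100'

# Sort numerically on the third field only.
by_salary=$(printf '%s\n' "$data" | sort -k3,3n)

# Sort on the second field, then in reverse order on the first field.
by_dept=$(printf '%s\n' "$data" | sort -k2,2 -k1,1r)

# Merge two inputs that are each already sorted.
workdir=$(mktemp -d)
printf 'apple\ncherry\n' > "$workdir/a.sorted"
printf 'banana\ndate\n'  > "$workdir/b.sorted"
merged=$(sort -m "$workdir/a.sorted" "$workdir/b.sorted")
rm -rf "$workdir"

printf '%s\n' "$by_salary" "---" "$by_dept" "---" "$merged"
```
<P
>The <TT
CLASS="OPTION"
>-m</TT
> option only interleaves its inputs, so it gives a correctly ordered result only when each input file is already sorted.</P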
></DD
><DT
><A
NAME="TSORTREF"
></A
><B
CLASS="COMMAND"
>tsort</B
></DT
><DD
><P
><I
CLASS="FIRSTTERM"
>Topological sort</I
>, reading in
	      pairs of whitespace-separated strings and sorting
	      according to input patterns. The original purpose of
	      <B
CLASS="COMMAND"
>tsort</B
> was to sort a list of dependencies
	      for an obsolete version of the <I
CLASS="FIRSTTERM"
>ld</I
>
	      linker in an <SPAN
CLASS="QUOTE"
>"ancient"</SPAN
> version of UNIX.</P
><P
>The results of a <I
CLASS="FIRSTTERM"
>tsort</I
> will usually
	      differ markedly from those of the standard
	      <B
CLASS="COMMAND"
>sort</B
> command, above.</P
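><P
>A small sketch may make the idea concrete. Each input pair is read as <SPAN
CLASS="QUOTE"
>"the first item must come before the second"</SPAN
>. The build-step names below are hypothetical.</P
>
```shell
#!/bin/bash
# Sketch only: the dependency names are invented for illustration.
# Each pair means "left item must precede right item".
order=$(printf '%s\n' \
  'configure compile' \
  'compile link' \
  'link install' | tsort)

printf '%s\n' "$order"
```
<P
>Because these pairs form a single chain, only one ordering satisfies them. With less constrained input, several orderings may be valid, and the one printed can vary between implementations.</P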
></DD
><DT
><A
NAME="UNIQREF"
></A
><B
CLASS="COMMAND"
>uniq</B
></DT
><DD
><P
>This filter removes duplicate lines from a sorted
	      file. It is often seen in a pipe coupled with
	      <A
HREF="textproc.html#SORTREF"
>sort</A
>.</P
><P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>cat list-1 list-2 list-3 | sort | uniq &#62; final.list
# Concatenates the list files,
# sorts them,
# removes duplicate lines,
# and finally writes the result to an output file.</PRE
></FONT
></TD
></TR
></TABLE
></P
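><P
>The reason for sorting first is that <B
CLASS="COMMAND"
>uniq</B
> compares only adjacent lines. A minimal sketch, using made-up sample data:</P
>
```shell
#!/bin/bash
# Sketch only: the fruit list is invented sample data.
export LC_ALL=C

unsorted='pear
apple
pear
apple'

# Duplicates are not adjacent, so uniq removes nothing here.
adjacent_only=$(printf '%s\n' "$unsorted" | uniq)

# After sorting, duplicates sit next to each other and are collapsed.
deduped=$(printf '%s\n' "$unsorted" | sort | uniq)

printf '%s\n' "$adjacent_only" "---" "$deduped"
```
<P
>On unsorted input, all four lines survive. After sorting, only one copy of each remains.</P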
><P
>The useful <TT
CLASS="OPTION"
>-c</TT
> option prefixes each line of
	       the input file with its number of occurrences.</P
><P
>	      <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>cat testfile</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>This line occurs only once.
 This line occurs twice.
 This line occurs twice.
 This line occurs three times.
 This line occurs three times.
 This line occurs three times.</TT
>


<TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>uniq -c testfile</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>      1 This line occurs only once.
       2 This line occurs twice.
       3 This line occurs three times.</TT
>


<TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>sort testfile | uniq -c | sort -nr</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>      3 This line occurs three times.
       2 This line occurs twice.
       1 This line occurs only once.</TT
>
	      </PRE
></FONT
></TD
></TR
></TABLE
>
	     </P
><P
>The <TT
CLASS="USERINPUT"
><B
>sort INPUTFILE | uniq -c | sort -nr</B
></TT
>
	       command string produces a <I
CLASS="FIRSTTERM"
>frequency
	       of occurrence</I
> listing on the
	       <TT
CLASS="FILENAME"
>INPUTFILE</TT
> file (the
	       <TT
CLASS="OPTION"
>-nr</TT
> options to <B
CLASS="COMMAND"
>sort</B
>
	       cause a reverse numerical sort). This template finds
	       use in analysis of log files and dictionary lists, and
	       wherever the lexical structure of a document needs to
	       be examined.</P
><DIV
CLASS="EXAMPLE"
><A
NAME="WF"
></A
><P
><B
>Example 16-12. Word Frequency Analysis</B
></P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>#!/bin/bash
# wf.sh: Crude word frequency analysis on a text file.
# This is a more efficient version of the "wf2.sh" script.


# Check for input file on command-line.
ARGS=1
E_BADARGS=85
E_NOFILE=86

if [ $# -ne "$ARGS" ]  # Correct number of arguments passed to script?
then
  echo "Usage: `basename $0` filename"
  exit $E_BADARGS
fi

if [ ! -f "$1" ]       # Check if file exists.
then
  echo "File \"$1\" does not exist."
  exit $E_NOFILE
fi


########################################################
# main ()
sed -e 's/\.//g'  -e 's/\,//g' -e 's/ /\
/g' "$1" | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr
#                           =========================
#                            Frequency of occurrence

#  Filter out periods and commas, and
#+ change space between words to linefeed,
#+ then shift characters to lowercase, and
#+ finally prefix occurrence count and sort numerically.

#  Arun Giridhar suggests modifying the above to:
#  . . . | sort | uniq -c | sort +1 [-f] | sort +0 -nr
#  This adds a secondary sort key, so instances of
#+ equal occurrence are sorted alphabetically.
#  As he explains it:
#  "This is effectively a radix sort, first on the
#+ least significant column
#+ (word or string, optionally case-insensitive)
#+ and last on the most significant column (frequency)."
#
#  As Frank Wang explains, the above is equivalent to
#+       . . . | sort | uniq -c | sort +0 -nr
#+ and the following also works:
#+       . . . | sort | uniq -c | sort -k1,1nr -k2
########################################################

exit 0

# Exercises:
# ---------
# 1) Add 'sed' commands to filter out other punctuation,
#+   such as semicolons.
# 2) Modify the script to also filter out multiple spaces and
#+   other whitespace.</PRE
></FONT
></TD
></TR
></TABLE
></DIV
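><P
>The secondary sort key mentioned in the script comments can also be written with the <TT
CLASS="OPTION"
>-k</TT
> option of <B
CLASS="COMMAND"
>sort</B
>. The following is a sketch under that assumption, using an invented word list; the <I
CLASS="FIRSTTERM"
>awk</I
> step only strips the padding that <B
CLASS="COMMAND"
>uniq -c</B
> puts in front of each count.</P
>
```shell
#!/bin/bash
# Sketch only: the word list is invented sample data.
export LC_ALL=C

words='pear
apple
pear
fig
apple
kiwi'

# Primary key: count, numeric, descending.
# Secondary key: the word itself, ascending.
freq=$(printf '%s\n' "$words" | sort | uniq -c | sort -k1,1nr -k2)

# Strip the leading padding that uniq -c adds before each count.
normalized=$(printf '%s\n' "$freq" | awk '{ print $1, $2 }')

printf '%s\n' "$normalized"
```
<P
>Words with the same count then appear in alphabetical order, rather than in whatever order the numeric sort happens to leave them.</P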
><P
>	       <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>cat testfile</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>This line occurs only once.
 This line occurs twice.
 This line occurs twice.
 This line occurs three times.
 This line occurs three times.
 This line occurs three times.</TT
>


<TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>./wf.sh testfile</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>      6 this
       6 occurs
       6 line
       3 times
       3 three
       2 twice
       1 only
       1 once</TT
>
	       </PRE
></FONT
></TD
></TR
></TABLE
>
	     </P
></DD
><DT
><A
NAME="EXPANDREF"
></A
><B
CLASS="COMMAND"
>expand</B
>, <B
CLASS="COMMAND"
>unexpand</B
></DT
><DD
><P
>The <B
CLASS="COMMAND"
>expand</B
> filter converts tabs to
	      spaces. It is often used in a <A
HREF="special-chars.html#PIPEREF"
>pipe</A
>.</P
><P
>The <B
CLASS="COMMAND"
>unexpand</B
> filter
	      converts spaces to tabs. This reverses the effect of
	      <B
CLASS="COMMAND"
>expand</B
>.</P
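><P
>A short sketch of both filters follows. It assumes tab stops every 4 columns, set with the <TT
CLASS="OPTION"
>-t</TT
> option, and uses the <TT
CLASS="OPTION"
>-a</TT
> option of <B
CLASS="COMMAND"
>unexpand</B
> so that blanks after the first word are converted too. The variable names are illustrative.</P
>
```shell
#!/bin/bash
# Sketch only: tab stops of 4 columns are an arbitrary choice.

tabbed=$(printf 'a\tb\n')

# The tab after "a" is replaced by spaces up to the next tab stop.
spaced=$(printf '%s\n' "$tabbed" | expand -t 4)

# -a converts runs of blanks that end at a tab stop, not just leading ones.
restored=$(printf '%s\n' "$spaced" | unexpand -a -t 4)

printf '[%s]\n' "$spaced"
```
<P
>Here the tab after <SPAN
CLASS="QUOTE"
>"a"</SPAN
> becomes three spaces, since the next tab stop is at column 4.</P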
></DD
><DT
><A
NAME="CUTREF"
></A
><B
CLASS="COMMAND"
>cut</B
></DT
><DD
><P
>A tool for extracting <A
HREF="special-chars.html#FIELDREF"
>fields</A
> from files. It is similar
	      to the <TT
CLASS="USERINPUT"
><B
>print $N</B
></TT
> command set in <A
HREF="awk.html#AWKREF"
>awk</A
>, but more limited. It may be
	      simpler to use <I
CLASS="FIRSTTERM"
>cut</I
> in a script than
	      <I
CLASS="FIRSTTERM"
>awk</I
>. Particularly important are the
	      <TT
CLASS="OPTION"
>-d</TT
> (delimiter) and <TT
CLASS="OPTION"
>-f</TT
>
	      (field specifier) options.</P
><P
>Using <B
CLASS="COMMAND"
>cut</B
> to obtain a listing of the
	      mounted filesystems:
	      <TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>cut -d ' ' -f1,2 /etc/mtab</PRE
></FONT
></TD
></TR
></TABLE
></P
><P
>Using <B
CLASS="COMMAND"
>cut</B
> to list the OS and kernel version:
	      <TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>uname -a | cut -d" " -f1,3,11,12</PRE
></FONT
></TD
></TR
></TABLE
></P
><P
>Using <B
CLASS="COMMAND"
>cut</B
> to extract message headers from
	      an e-mail folder:

	      <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>grep '^Subject:' read-messages | cut -c10-80</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>Re: Linux suitable for mission-critical apps?
 MAKE MILLIONS WORKING AT HOME!!!
 Spam complaint
 Re: Spam complaint</TT
></PRE
></FONT
></TD
></TR
></TABLE
>
	    </P
><P
>Using <B
CLASS="COMMAND"
>cut</B
> to parse a file:
	      <TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
># List all the users in /etc/passwd.

FILENAME=/etc/passwd

for user in $(cut -d: -f1 $FILENAME)
do
  echo $user
done

# Thanks, Oleg Philon for suggesting this.</PRE
></FONT
></TD
></TR
></TABLE
></P
><P
><TT
CLASS="USERINPUT"
><B
>cut -d ' ' -f2,3 filename</B
></TT
> is equivalent to
	      <TT
CLASS="USERINPUT"
><B
>awk -F'[ ]' '{ print $2, $3 }' filename</B
></TT
></P
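><P
>The equivalence holds because both commands treat every single space as a field separator. A sketch with made-up input shows this, and also shows how the default field splitting of <I
CLASS="FIRSTTERM"
>awk</I
> differs when the input contains consecutive spaces.</P
>
```shell
#!/bin/bash
# Sketch only: the input strings are invented for illustration.

single='alpha beta gamma'
double='alpha  beta gamma'   # Two spaces after "alpha".

cut_single=$(printf '%s\n' "$single" | cut -d ' ' -f2,3)
awk_single=$(printf '%s\n' "$single" | awk -F'[ ]' '{ print $2, $3 }')

# With two spaces, field 2 is empty for both cut and awk -F'[ ]'.
cut_double=$(printf '%s\n' "$double" | cut -d ' ' -f2,3)
awk_double=$(printf '%s\n' "$double" | awk -F'[ ]' '{ print $2, $3 }')

# Default awk splitting collapses runs of blanks into one separator.
awk_default=$(printf '%s\n' "$double" | awk '{ print $2, $3 }')

printf '[%s]\n' "$cut_single" "$awk_single" "$cut_double" "$awk_double" "$awk_default"
```
<P
>For whitespace-aligned columns, the default splitting of <I
CLASS="FIRSTTERM"
>awk</I
> is therefore often the more convenient choice.</P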
><DIV
CLASS="NOTE"
><P
></P
><TABLE
CLASS="NOTE"
WIDTH="90%"
BORDER="0"
><TR
><TD
WIDTH="25"
ALIGN="CENTER"
VALIGN="TOP"
><IMG
SRC="../images/note.gif"
HSPACE="5"
ALT="Note"></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
><P
>It is even possible to specify a linefeed as a
	      delimiter. The trick is to actually embed a linefeed
	      (<B
CLASS="KEYCAP"
>RETURN</B
>) in the command sequence.</P
><P
>	      <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>cut -d'
 ' -f3,7,19 testfile</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>This is line 3 of testfile.
 This is line 7 of testfile.
 This is line 19 of testfile.</TT
>
	      </PRE
></FONT
></TD
></TR
></TABLE
>
	  </P
><P
>Thank you, Jaka Kranjc, for pointing this out.</P
></TD
></TR
></TABLE
></DIV
><P
>See also <A
HREF="mathc.html#BASE"
>Example 16-48</A
>.</P
></DD
><DT
><A
NAME="PASTEREF"
></A
><B
CLASS="COMMAND"
>paste</B
></DT
><DD
><P
>Tool for merging together different files into a single,
	      multi-column file.  In combination with
	      <A
HREF="textproc.html#CUTREF"
>cut</A
>, useful for creating system log
	      files.
	    </P
><P
>	      <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>cat items</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>alphabet blocks
 building blocks
 cables</TT
>

<TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>cat prices</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>$1.00/dozen
 $2.50 ea.
 $3.75</TT
>

<TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>paste items prices</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>alphabet blocks $1.00/dozen
 building blocks $2.50 ea.
 cables  $3.75</TT
></PRE
></FONT
></TD
></TR
></TABLE
>
	  </P
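><P
>Two options not shown above are often handy: <TT
CLASS="OPTION"
>-d</TT
> chooses the column separator and <TT
CLASS="OPTION"
>-s</TT
> joins the lines of a single file into one line. The sketch below recreates the two sample files in a temporary directory, whose name is an assumption of the example.</P
>
```shell
#!/bin/bash
# Sketch only: files are created in a throwaway temporary directory.

workdir=$(mktemp -d)
printf 'alphabet blocks\nbuilding blocks\ncables\n' > "$workdir/items"
printf '$1.00/dozen\n$2.50 ea.\n$3.75\n'           > "$workdir/prices"

# Use a colon instead of the default tab between columns.
colon_joined=$(paste -d: "$workdir/items" "$workdir/prices")

# Serialize one file: all of its lines on a single comma-separated line.
one_line=$(paste -s -d, "$workdir/items")

rm -rf "$workdir"

printf '%s\n' "$colon_joined" "---" "$one_line"
```
<P
>An explicit separator makes the result easier to split again later with <B
CLASS="COMMAND"
>cut</B
>.</P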
></DD
><DT
><A
NAME="JOINREF"
></A
><B
CLASS="COMMAND"
>join</B
></DT
><DD
><P
>Consider this a special-purpose cousin of
	      <B
CLASS="COMMAND"
>paste</B
>. This powerful utility allows
	      merging two files in a meaningful fashion, which essentially
	      creates a simple version of a relational database.</P
><P
>The <B
CLASS="COMMAND"
>join</B
> command operates on
	      exactly two files, but pastes together only those lines
	      with a common tagged <A
HREF="special-chars.html#FIELDREF"
>field</A
>
	      (usually a numerical label), and writes the result to
	      <TT
CLASS="FILENAME"
>stdout</TT
>.  The files to be joined should
	      be sorted according to the tagged field for the matchups
	      to work properly.</P
><P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>File: 1.data

100 Shoes
200 Laces
300 Socks</PRE
></FONT
></TD
></TR
></TABLE
></P
><P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>File: 2.data

100 $40.00
200 $1.00
300 $2.00</PRE
></FONT
></TD
></TR
></TABLE
></P
><P
>	      <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>join 1.data 2.data</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>File: 1.data 2.data

 100 Shoes $40.00
 200 Laces $1.00
 300 Socks $2.00</TT
>
	      </PRE
></FONT
></TD
></TR
></TABLE
>
	    </P
><DIV
CLASS="NOTE"
><P
></P
><TABLE
CLASS="NOTE"
WIDTH="90%"
BORDER="0"
><TR
><TD
WIDTH="25"
ALIGN="CENTER"
VALIGN="TOP"
><IMG
SRC="../images/note.gif"
HSPACE="5"
ALT="Note"></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
><P
>The tagged field appears only once in the
	      output.</P
></TD
></TR
></TABLE
></DIV
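><P
>The following sketch shows why sorting matters and what happens to unmatched lines. The data echoes the example above, but the file names are invented, the inputs start out unsorted, and one file deliberately has no entry for label 300.</P
>
```shell
#!/bin/bash
# Sketch only: file names and the missing "300" price are illustrative.
export LC_ALL=C   # Keep sort and join in agreement about ordering.

workdir=$(mktemp -d)
printf '300 Socks\n100 Shoes\n200 Laces\n' > "$workdir/names"    # Unsorted.
printf '200 $1.00\n100 $40.00\n'           > "$workdir/prices"   # No 300.

# Sort both inputs on the join field before joining.
sort -k1,1 "$workdir/names"  > "$workdir/names.sorted"
sort -k1,1 "$workdir/prices" > "$workdir/prices.sorted"

joined=$(join "$workdir/names.sorted" "$workdir/prices.sorted")

rm -rf "$workdir"

printf '%s\n' "$joined"
```
<P
>The line for label 300 is dropped, because it has no partner in the second file.</P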
></DD
><DT
><A
NAME="HEADREF"
></A
><B
CLASS="COMMAND"
>head</B
></DT
><DD
><P
>Lists the beginning of a file to <TT
CLASS="FILENAME"
>stdout</TT
>.
	      The default is <TT
CLASS="LITERAL"
>10</TT
> lines, but a different
	      number can be specified with the <TT
CLASS="OPTION"
>-n</TT
> option. The <TT
CLASS="OPTION"
>-c</TT
> option outputs a given number of bytes
	      instead of lines.

	    <DIV
CLASS="EXAMPLE"
><A
NAME="SCRIPTDETECTOR"
></A
><P
><B
>Example 16-13. Which files are scripts?</B
></P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>#!/bin/bash
# script-detector.sh: Detects scripts within a directory.

TESTCHARS=2    # Test first 2 characters.
SHABANG='#!'   # Scripts begin with a "sha-bang."

for file in *  # Traverse all the files in current directory.
do
  if [[ `head -c$TESTCHARS "$file"` = "$SHABANG" ]]
  #      head -c2                      #!
  #  The '-c' option to "head" outputs a specified
  #+ number of characters, rather than lines (the default).
  then
    echo "File \"$file\" is a script."
  else
    echo "File \"$file\" is *not* a script."
  fi
done

exit 0

#  Exercises:
#  ---------
#  1) Modify this script to take as an optional argument
#+    the directory to scan for scripts
#+    (rather than just the current working directory).
#
#  2) As it stands, this script gives "false positives" for
#+    Perl, awk, and other scripting language scripts.
#     Correct this.</PRE
></FONT
></TD
></TR
></TABLE
></DIV
>

	    <DIV
CLASS="EXAMPLE"
><A
NAME="RND"
></A
><P
><B
>Example 16-14. Generating 10-digit random numbers</B
></P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>#!/bin/bash
# rnd.sh: Outputs a 10-digit random number

# Script by Stephane Chazelas.

head -c4 /dev/urandom | od -N4 -tu4 | sed -ne '1s/.* //p'


# =================================================================== #

# Analysis
# --------

# head:
# -c4 option takes first 4 bytes.

# od:
# -N4 option limits output to 4 bytes.
# -tu4 option selects unsigned decimal format for output.

# sed:
# -n option, in combination with "p" flag to the "s" command,
# outputs only matched lines.


# The author of this script explains the action of 'sed', as follows.

# head -c4 /dev/urandom | od -N4 -tu4 | sed -ne '1s/.* //p'
# ----------------------------------&#62; |

# Assume output up to "sed" --------&#62; |
# is 0000000 1198195154\n

#  sed begins reading characters: 0000000 1198195154\n.
#  Here it finds a newline character,
#+ so it is ready to process the first line (0000000 1198195154).
#  It looks at its &#60;range&#62;&#60;action&#62;s. The first and only one is

#   range     action
#   1         s/.* //p

#  The line number is in the range, so it executes the action:
#+ tries to substitute the longest string ending with a space in the line
#  ("0000000 ") with nothing (//), and if it succeeds, prints the result
#  ("p" is a flag to the "s" command here, this is different
#+ from the "p" command).

#  sed is now ready to continue reading its input. (Note that before
#+ continuing, if -n option had not been passed, sed would have printed
#+ the line once again).

#  Now, sed reads the remainder of the characters, and finds the
#+ end of the file.
#  It is now ready to process its 2nd line (which is also numbered '$' as
#+ it's the last one).
#  It sees it is not matched by any &#60;range&#62;, so its job is done.

#  In a few words, this sed command means:
#  "On the first line only, remove any character up to the right-most space,
#+ then print it."

# A better way to do this would have been:
#           sed -e 's/.* //;q'

# Here, two &#60;range&#62;&#60;action&#62;s (could have been written
#           sed -e 's/.* //' -e q):

#   range                    action
#   nothing (matches line)   s/.* //
#   nothing (matches line)   q (quit)

#  Here, sed only reads its first line of input.
#  It performs both actions, and prints the line (substituted) before
#+ quitting (because of the "q" action) since the "-n" option is not passed.

# =================================================================== #

# An even simpler alternative to the above one-line script would be:
#           head -c4 /dev/urandom| od -An -tu4

exit</PRE
></FONT
></TD
></TR
></TABLE
></DIV
>

	      See also <A
HREF="filearchiv.html#EX52"
>Example 16-39</A
>.</P
></DD
><DT
><A
NAME="TAILREF"
></A
><B
CLASS="COMMAND"
>tail</B
></DT
><DD
><P
>Lists the (tail) end of a file to <TT
CLASS="FILENAME"
>stdout</TT
>.
	      The default is <TT
CLASS="LITERAL"
>10</TT
> lines, but this can
	      be changed with the <TT
CLASS="OPTION"
>-n</TT
> option.
	      Commonly used to keep track of
	      changes to a system logfile, using the <TT
CLASS="OPTION"
>-f</TT
>
	      option, which outputs lines appended to the file.</P
><DIV
CLASS="EXAMPLE"
><A
NAME="EX12"
></A
><P
><B
>Example 16-15. Using <I
CLASS="FIRSTTERM"
>tail</I
> to monitor the system log</B
></P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>#!/bin/bash

filename=sys.log

cat /dev/null &#62; $filename; echo "Creating / cleaning out file."
#  Creates the file if it does not already exist,
#+ and truncates it to zero length if it does.
#  : &#62; filename   and   &#62; filename also work.

tail /var/log/messages &#62; $filename
# /var/log/messages must have world read permission for this to work.

echo "$filename contains tail end of system log."

exit 0</PRE
></FONT
></TD
></TR
></TABLE
></DIV
><DIV
CLASS="TIP"
><P
></P
><TABLE
CLASS="TIP"
WIDTH="90%"
BORDER="0"
><TR
><TD
WIDTH="25"
ALIGN="CENTER"
VALIGN="TOP"
><IMG
SRC="../images/tip.gif"
HSPACE="5"
ALT="Tip"></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
><P
>To list a specific line of a text file,
	        <A
HREF="special-chars.html#PIPEREF"
>pipe</A
> the output of
	        <B
CLASS="COMMAND"
>head</B
> to <B
CLASS="COMMAND"
>tail -n 1</B
>.
		For example, <TT
CLASS="USERINPUT"
><B
>head -n 8 database.txt | tail
		-n 1</B
></TT
> lists the 8th line of the file
		<TT
CLASS="FILENAME"
>database.txt</TT
>.</P
><P
>To set a variable to a given block of a text file:
	        <TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>var=$(head -n $m $filename | tail -n $n)

# filename = name of file
# m = from beginning of file, number of lines to end of block
# n = number of lines to set variable to (trim from end of block)</PRE
></FONT
></TD
></TR
></TABLE
></P
></TD
></TR
></TABLE
></DIV
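><P
>A worked sketch of the block-extraction idiom follows, using <B
CLASS="COMMAND"
>seq</B
> to generate twenty numbered lines in place of a real file. The values of <TT
CLASS="LITERAL"
>m</TT
> and <TT
CLASS="LITERAL"
>n</TT
> are arbitrary.</P
>
```shell
#!/bin/bash
# Sketch only: "seq 1 20" stands in for a 20-line text file.

m=8   # Last line of the wanted block.
n=3   # Number of lines in the block.

# head keeps lines 1..m, tail then keeps the last n of those.
block=$(seq 1 20 | head -n "$m" | tail -n "$n")

# Special case n=1: a single specific line.
eighth=$(seq 1 20 | head -n 8 | tail -n 1)

printf '%s\n' "$block" "---" "$eighth"
```
<P
>The block therefore covers lines <TT
CLASS="LITERAL"
>m-n+1</TT
> through <TT
CLASS="LITERAL"
>m</TT
>, here lines 6 to 8.</P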
><DIV
CLASS="NOTE"
><P
></P
><TABLE
CLASS="NOTE"
WIDTH="90%"
BORDER="0"
><TR
><TD
WIDTH="25"
ALIGN="CENTER"
VALIGN="TOP"
><IMG
SRC="../images/note.gif"
HSPACE="5"
ALT="Note"></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
><P
>Newer implementations of <B
CLASS="COMMAND"
>tail</B
>
	        deprecate the older <B
CLASS="COMMAND"
>tail -$LINES
	        filename</B
> usage. The standard <B
CLASS="COMMAND"
>tail -n $LINES
	        filename</B
> is correct.</P
></TD
></TR
></TABLE
></DIV
><P
>See also <A
HREF="moreadv.html#EX41"
>Example 16-5</A
>, <A
HREF="filearchiv.html#EX52"
>Example 16-39</A
> and
		<A
HREF="debugging.html#ONLINE"
>Example 32-6</A
>.</P
></DD
><DT
><A
NAME="GREPREF"
></A
><B
CLASS="COMMAND"
>grep</B
></DT
><DD
><P
>A multi-purpose file search tool that uses
	      <A
HREF="regexp.html#REGEXREF"
>Regular Expressions</A
>.
	      It was originally a command/filter in the
	      venerable <B
CLASS="COMMAND"
>ed</B
> line editor:
	      <TT
CLASS="USERINPUT"
><B
>g/re/p</B
></TT
> -- <I
CLASS="FIRSTTERM"
>global -
	      regular expression - print</I
>.</P
><P
><P
><B
CLASS="COMMAND"
>grep</B
>   <TT
CLASS="REPLACEABLE"
><I
>pattern</I
></TT
>  [<TT
CLASS="REPLACEABLE"
><I
>file</I
></TT
>...]</P
>Search the target file(s) for
	      occurrences of <TT
CLASS="REPLACEABLE"
><I
>pattern</I
></TT
>, where
	      <TT
CLASS="REPLACEABLE"
><I
>pattern</I
></TT
> may be literal text
	      or a Regular Expression.</P
><P
>	      <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>grep '[rst]ystem.$' osinfo.txt</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>The GPL governs the distribution of the Linux operating system.</TT
>
	      </PRE
></FONT
></TD
></TR
></TABLE
>
	      </P
><P
>If no target file(s) are specified, <B
CLASS="COMMAND"
>grep</B
>
	      works as a filter on <TT
CLASS="FILENAME"
>stdin</TT
>, as in
	      a <A
HREF="special-chars.html#PIPEREF"
>pipe</A
>.</P
><P
>	      <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>ps ax | grep clock</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>765 tty1     S      0:00 xclock
 901 pts/1    S      0:00 grep clock</TT
>
	      </PRE
></FONT
></TD
></TR
></TABLE
>
	      </P
><P
>The <TT
CLASS="OPTION"
>-i</TT
> option causes a case-insensitive
	      search.</P
><P
>The <TT
CLASS="OPTION"
>-w</TT
> option matches only whole
	      words.</P
><P
>The <TT
CLASS="OPTION"
>-l</TT
> option lists only the files in which
	      matches were found, but not the matching lines.</P
><P
>The <TT
CLASS="OPTION"
>-r</TT
> (recursive) option searches files in
	      the current working directory and all subdirectories below
	      it.</P
><P
>The <TT
CLASS="OPTION"
>-n</TT
> option lists the matching lines,
	      together with line numbers.</P
><P
>	      <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>grep -n Linux osinfo.txt</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>2:This is a file containing information about Linux.
 6:The GPL governs the distribution of the Linux operating system.</TT
>
	      </PRE
></FONT
></TD
></TR
></TABLE
>
	      </P
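><P
>The options above can be tried side by side in a small sketch. The file contents and the temporary directory are invented for illustration.</P
>
```shell
#!/bin/bash
# Sketch only: sample files live in a throwaway temporary directory.

workdir=$(mktemp -d)
printf 'Linux is a kernel.\nlinux in lowercase.\nGNU/Linuxes abound.\n' \
  > "$workdir/osinfo.txt"
printf 'Nothing relevant here.\n' > "$workdir/misc.txt"

# -i: case-insensitive (combined with -c to count matching lines).
ci_count=$(grep -ci 'linux' "$workdir/osinfo.txt")

# -w: whole words only, so "Linuxes" does not match.
whole=$(grep -w 'Linux' "$workdir/osinfo.txt")

# -l: names of files with matches, not the matching lines.
files=$(cd "$workdir" || exit; grep -l 'Linux' osinfo.txt misc.txt)

# -n: matching lines prefixed with their line numbers.
numbered=$(grep -n 'lowercase' "$workdir/osinfo.txt")

# -r: search a directory tree (here combined with -l).
hits=$(grep -rl 'kernel' "$workdir")

rm -rf "$workdir"

printf '%s\n' "$ci_count" "$whole" "$files" "$numbered" "$hits"
```
<P
>Options may be combined, as with <TT
CLASS="OPTION"
>-ci</TT
> and <TT
CLASS="OPTION"
>-rl</TT
> in the sketch.</P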
><P
>The <TT
CLASS="OPTION"
>-v</TT
> (or <TT
CLASS="OPTION"
>--invert-match</TT
>)
	      option <I
CLASS="FIRSTTERM"
>filters out</I
> matches.
	      <TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>grep pattern1 *.txt | grep -v pattern2

# Matches all lines in "*.txt" files containing "pattern1",
# but ***not*** "pattern2".	      </PRE
></FONT
></TD
></TR
></TABLE
></P
><P
>The <TT
CLASS="OPTION"
>-c</TT
> (<TT
CLASS="OPTION"
>--count</TT
>)
	      option gives a count of the matching lines in each file, rather than
	      listing the lines themselves.

	        <TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
grep -c txt *.sgml   # (number of lines containing "txt" in each "*.sgml" file)


#   grep -cz .
#            ^ dot
# means count (-c) zero-separated (-z) items matching "."
# that is, non-empty ones (containing at least 1 character).
#
printf 'a b\nc  d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -cz .     # 3
printf 'a b\nc  d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -cz '$'   # 5
printf 'a b\nc  d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -cz '^'   # 5
#
printf 'a b\nc  d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -c '$'    # 9
# By default, newline chars (\n) separate items to match.

# Note that the -z option is GNU "grep" specific.


# Thanks, S.C.</PRE
></FONT
></TD
></TR
></TABLE
>
            </P
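><P
>Because <TT
CLASS="OPTION"
>-c</TT
> counts lines, a line with several matches is counted once. One way to count every occurrence, sketched here with made-up input, is to print each match on its own line with <TT
CLASS="OPTION"
>-o</TT
> and count those lines with <B
CLASS="COMMAND"
>wc -l</B
>.</P
>
```shell
#!/bin/bash
# Sketch only: the sample text is invented for illustration.

sample='txt and txt again
no match here
one more txt'

# Lines containing at least one match.
matching_lines=$(printf '%s\n' "$sample" | grep -c 'txt')

# Every individual match, one per output line, then counted.
occurrences=$(printf '%s\n' "$sample" | grep -o 'txt' | wc -l)
occurrences=$((occurrences))   # Strip any padding added by wc.

echo "lines: $matching_lines, occurrences: $occurrences"
```
<P
>Two lines contain the pattern, but it occurs three times in all.</P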
><P
>The <TT
CLASS="OPTION"
>--color</TT
> (or <TT
CLASS="OPTION"
>--colour</TT
>)
	      option marks the matching string in color (on the console
	      or in an <I
CLASS="FIRSTTERM"
>xterm</I
> window). Since
	      <I
CLASS="FIRSTTERM"
>grep</I
> prints out each entire line
	      containing the matching pattern, this lets you see exactly
	      <EM
>what</EM
> is being matched. See also
	      the <TT
CLASS="OPTION"
>-o</TT
> option, which shows only the
	      matching portion of the line(s).</P
><DIV
CLASS="EXAMPLE"
><A
NAME="FROMSH"
></A
><P
><B
>Example 16-16. Printing out the <I
CLASS="FIRSTTERM"
>From</I
> lines in
	        stored e-mail messages</B
></P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>#!/bin/bash
# from.sh

#  Emulates the useful 'from' utility in Solaris, BSD, etc.
#  Echoes the "From" header line in all messages
#+ in your e-mail directory.


MAILDIR=~/mail/*               #  No quoting of variable. Why?
# Maybe check if-exists $MAILDIR:   if [ -d $MAILDIR ] . . .
GREP_OPTS="-H -A 5 --color"    #  Show file, plus extra context lines
                               #+ and display "From" in color.
TARGETSTR="^From"              # "From" at beginning of line.

for file in $MAILDIR           #  No quoting of variable.
do
  grep $GREP_OPTS "$TARGETSTR" "$file"
  #    ^^^^^^^^^^              #  Again, do not quote this variable.
  echo
done

exit $?

#  You might wish to pipe the output of this script to 'more'
#+ or redirect it to a file . . .</PRE
></FONT
></TD
></TR
></TABLE
></DIV
><P
>When invoked with more than one target file given,
	      <B
CLASS="COMMAND"
>grep</B
> specifies which file contains
	      matches.</P
><P
>	      <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>grep Linux osinfo.txt misc.txt</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>osinfo.txt:This is a file containing information about Linux.
 osinfo.txt:The GPL governs the distribution of the Linux operating system.
 misc.txt:The Linux operating system is steadily gaining in popularity.</TT
>
	      </PRE
></FONT
></TD
></TR
></TABLE
>
	    </P
><DIV
CLASS="TIP"
><P
></P
><TABLE
CLASS="TIP"
WIDTH="90%"
BORDER="0"
><TR
><TD
WIDTH="25"
ALIGN="CENTER"
VALIGN="TOP"
><IMG
SRC="../images/tip.gif"
HSPACE="5"
ALT="Tip"></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
><P
>To force <B
CLASS="COMMAND"
>grep</B
> to show the filename
	      when searching only one target file, simply give
	      <TT
CLASS="FILENAME"
>/dev/null</TT
> as the second file.</P
><P
>	      <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>grep Linux osinfo.txt /dev/null</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>osinfo.txt:This is a file containing information about Linux.
 osinfo.txt:The GPL governs the distribution of the Linux operating system.</TT
>
	      </PRE
></FONT
></TD
></TR
></TABLE
>
	    </P
></TD
></TR
></TABLE
></DIV
><P
>If there is a successful match, <B
CLASS="COMMAND"
>grep</B
>
	      returns an <A
HREF="exit-status.html#EXITSTATUSREF"
>exit status</A
>
	      of 0, which makes it useful in a condition test in a
	      script, especially in combination with the <TT
CLASS="OPTION"
>-q</TT
>
	      option to suppress output.
	        <TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>SUCCESS=0                      # if grep lookup succeeds
word=Linux
filename=data.file

grep -q "$word" "$filename"    #  The "-q" option
                               #+ causes nothing to echo to stdout.
if [ $? -eq $SUCCESS ]
# if grep -q "$word" "$filename"   can replace lines 5 - 7.
then
  echo "$word found in $filename"
else
  echo "$word not found in $filename"
fi</PRE
></FONT
></TD
></TR
></TABLE
>
            </P
><P
><A
HREF="debugging.html#ONLINE"
>Example 32-6</A
> demonstrates how to use
	      <B
CLASS="COMMAND"
>grep</B
> to search for a word pattern in
	      a system logfile.</P
><DIV
CLASS="EXAMPLE"
><A
NAME="GRP"
></A
><P
><B
>Example 16-17. Emulating <I
CLASS="FIRSTTERM"
>grep</I
> in a script</B
></P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>#!/bin/bash
# grp.sh: Rudimentary reimplementation of grep.

E_BADARGS=85

if [ -z "$1" ]    # Check for argument to script.
then
  echo "Usage: `basename $0` pattern"
  exit $E_BADARGS
fi

echo

for file in *     # Traverse all files in $PWD.
do
  output=$(sed -n /"$1"/p "$file")  # Command substitution.

  if [ ! -z "$output" ]           # What happens if "$output" is not quoted?
  then
    echo -n "$file: "
    echo "$output"
  fi              #  sed -ne "/$1/s|^|${file}: |p"  is equivalent to above.

  echo
done

echo

exit 0

# Exercises:
# ---------
# 1) Add newlines to output, if more than one match in any given file.
# 2) Add features.</PRE
></FONT
></TD
></TR
></TABLE
></DIV
><P
>How can <B
CLASS="COMMAND"
>grep</B
> search for two (or
	      more) separate patterns? What if you want
	      <B
CLASS="COMMAND"
>grep</B
> to display all lines in a file
	      or files that contain both <SPAN
CLASS="QUOTE"
>"pattern1"</SPAN
>
	      <EM
>and</EM
> <SPAN
CLASS="QUOTE"
>"pattern2"</SPAN
>?</P
><P
>One method is to <A
HREF="special-chars.html#PIPEREF"
>pipe</A
> the result of <B
CLASS="COMMAND"
>grep
	      pattern1</B
> to <B
CLASS="COMMAND"
>grep pattern2</B
>.</P
><P
>For example, given the following file:</P
><P
>	    <TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
># Filename: tstfile

This is a sample file.
This is an ordinary text file.
This file does not contain any unusual text.
This file is not unusual.
Here is some text.</PRE
></FONT
></TD
></TR
></TABLE
>
            </P
><P
>Now, let's search this file for lines containing
	      <EM
>both</EM
> <SPAN
CLASS="QUOTE"
>"file"</SPAN
> and
	      <SPAN
CLASS="QUOTE"
>"text"</SPAN
> . . . </P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>grep file tstfile</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
># Filename: tstfile
 This is a sample file.
 This is an ordinary text file.
 This file does not contain any unusual text.
 This file is not unusual.</TT
>

<TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>grep file tstfile | grep text</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>This is an ordinary text file.
 This file does not contain any unusual text.</TT
></PRE
></FONT
></TD
></TR
></TABLE
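><P
>The piped form can be compared with two alternatives in a short sketch. It recreates the sample file in a temporary directory (an assumption of the example), uses an extended regular expression with <TT
CLASS="OPTION"
>-E</TT
> to require both words in either order, and uses repeated <TT
CLASS="OPTION"
>-e</TT
> options to match lines containing <EM
>either</EM
> word.</P
>
```shell
#!/bin/bash
# Sketch only: the sample file is rebuilt in a throwaway directory.

workdir=$(mktemp -d)
printf '%s\n' \
  '# Filename: tstfile' \
  'This is a sample file.' \
  'This is an ordinary text file.' \
  'This file does not contain any unusual text.' \
  'This file is not unusual.' \
  'Here is some text.' > "$workdir/tstfile"

# AND, by piping one grep into another.
piped=$(grep file "$workdir/tstfile" | grep text)

# AND, in a single regex that allows either order.
single=$(grep -E 'file.*text|text.*file' "$workdir/tstfile")

# OR: count lines that contain at least one of the two patterns.
either=$(grep -c -e file -e text "$workdir/tstfile")

rm -rf "$workdir"

printf '%s\n' "$piped" "---" "$single" "---" "$either"
```
<P
>The pipe scales more readily to three or more patterns, since the single-regex form must list every possible ordering.</P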
><P
>Now, for an interesting recreational use
	      of <I
CLASS="FIRSTTERM"
>grep</I
> . . .</P
><DIV
CLASS="EXAMPLE"
><A
NAME="CWSOLVER"
></A
><P
><B
>Example 16-18. Crossword puzzle solver</B
></P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>#!/bin/bash
# cw-solver.sh
# This is actually a wrapper around a one-liner (line 46).

#  Crossword puzzle and anagramming word game solver.
#  You know *some* of the letters in the word you're looking for,
#+ so you need a list of all valid words
#+ with the known letters in given positions.
#  For example: w...i....n
#               1???5????10
# w in position 1, 3 unknowns, i in the 5th, 4 unknowns, n at the end.
# (See comments at end of script.)


E_NOPATT=71
DICT=/usr/share/dict/word.lst
#                    ^^^^^^^^   Looks for word list here.
#  ASCII word list, one word per line.
#  If you happen to need an appropriate list,
#+ download the author's "yawl" word list package.
#  http://ibiblio.org/pub/Linux/libs/yawl-0.3.2.tar.gz
#  or
#  http://bash.deta.in/yawl-0.3.2.tar.gz


if [ -z "$1" ]   #  If no word pattern specified
then             #+ as a command-line argument . . .
  echo           #+ . . . then . . .
  echo "Usage:"  #+ Usage message.
  echo
  echo "$0 \"pattern,\""
  echo "where \"pattern\" is in the form"
  echo "xxx..x.x..."
  echo
  echo "The x's represent known letters,"
  echo "and the periods are unknown letters (blanks)."
  echo "Letters and periods can be in any position."
  echo "For example, try:   sh cw-solver.sh w...i....n"
  echo
  exit $E_NOPATT
fi

echo
# ===============================================
# This is where all the work gets done.
grep ^"$1"$ "$DICT"   # Yes, only one line!
#    |    |
# ^ is the start-of-line regex anchor.
# $ is the end-of-line regex anchor (one word per line, so a line is a word).

#  From _Stupid Grep Tricks_, vol. 1,
#+ a book the ABS Guide author may yet get around
#+ to writing . . . one of these days . . .
# ===============================================
echo


exit $?  # Script terminates here.
#  If there are too many words generated,
#+ redirect the output to a file.

$ sh cw-solver.sh w...i....n

wellington
workingman
workingmen</PRE
></FONT
></TD
></TR
></TABLE
></DIV
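><P
>To see why the pattern in the script is anchored at both ends,
	      compare the anchored and unanchored forms on a small inline
	      word list. This is only a sketch: the word list and variable
	      names are made up for illustration.</P
>

```shell
#!/bin/bash
# Sketch: effect of the ^ and $ anchors on a crossword-style pattern.

words='wellington
workingman
workingmanship
working'

pattern='w...i....n'

# Unanchored: the pattern may match anywhere inside a longer word.
loose=$(printf '%s\n' "$words" | grep "$pattern")

# Anchored: the whole line must match, so only 10-letter words qualify.
strict=$(printf '%s\n' "$words" | grep "^${pattern}\$")

# grep -x ("match the whole line") is an equivalent spelling.
exact=$(printf '%s\n' "$words" | grep -x "$pattern")

echo "loose:";  echo "$loose"
echo "strict:"; echo "$strict"
```

<P
>Without the anchors, the longer word slips through because it
	      contains a ten-letter stretch that fits the pattern.</P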
><P
><A
NAME="EGREPREF"
></A
><B
CLASS="COMMAND"
>egrep</B
>
	      -- <I
CLASS="FIRSTTERM"
>extended grep</I
> -- is the same
	      as <B
CLASS="COMMAND"
>grep -E</B
>. This uses a somewhat
	      different, extended set of <A
HREF="regexp.html#REGEXREF"
>Regular
	      Expressions</A
>, which can make the search a bit more
	      flexible. It also allows the boolean |
	      (<I
CLASS="FIRSTTERM"
>or</I
>) operator.
	      <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>egrep 'matches|Matches' file.txt</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>Line 1 matches.
 Line 3 Matches.
 Line 4 contains matches, but also Matches</TT
>
              </PRE
></FONT
></TD
></TR
></TABLE
>
	      </P
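><P
>The same <SPAN
CLASS="QUOTE"
>"either pattern"</SPAN
> search can be spelled in several ways. The sketch below uses
	      an inline sample (illustrative data, including one line that
	      should not match) rather than a real file.</P
>

```shell
#!/bin/bash
# Sketch: three spellings of "match either pattern".

sample='Line 1 matches.
Line 2 has nothing of interest.
Line 3 Matches.
Line 4 contains matches, but also Matches'

# Extended regex alternation (what egrep does).
ere=$(printf '%s\n' "$sample" | grep -E 'matches|Matches')

# Basic regex, with one -e option per pattern.
multi=$(printf '%s\n' "$sample" | grep -e 'matches' -e 'Matches')

# A bracket expression, since only the first letter differs.
bracket=$(printf '%s\n' "$sample" | grep '[mM]atches')

echo "$ere"
```

<P
>All three commands print lines 1, 3, and 4 of the sample.</P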
><P
><A
NAME="FGREPREF"
></A
><B
CLASS="COMMAND"
>fgrep</B
> --
	      <I
CLASS="FIRSTTERM"
>fixed-string grep</I
> -- is the same as
	      <B
CLASS="COMMAND"
>grep -F</B
>. It does a literal string search
	      (no <A
HREF="regexp.html#REGEXREF"
>Regular Expressions</A
>),
	      which generally speeds things up a bit.</P
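><P
>Speed aside, a literal search also changes what matches. A
	      minimal sketch, using made-up sample lines, of a pattern that
	      contains a regex metacharacter:</P
>

```shell
#!/bin/bash
# Sketch: regex search versus literal (fixed-string) search.

lines='version 1.5 released
version 105 released
price is $1.50'

# As a regular expression, "." matches any character,
# so "1.5" also matches "105".
as_regex=$(printf '%s\n' "$lines" | grep -c '1.5')

# With -F the pattern is taken literally.
as_literal=$(printf '%s\n' "$lines" | grep -cF '1.5')

echo "regex: $as_regex   literal: $as_literal"
```

<P
>Here the regex form counts three lines and the literal form
	      counts two, which is why a literal search is the safer choice when
	      the search string comes from user input.</P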
><DIV
CLASS="NOTE"
><P
></P
><TABLE
CLASS="NOTE"
WIDTH="90%"
BORDER="0"
><TR
><TD
WIDTH="25"
ALIGN="CENTER"
VALIGN="TOP"
><IMG
SRC="../images/note.gif"
HSPACE="5"
ALT="Note"></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
><P
>On some Linux distros, <B
CLASS="COMMAND"
>egrep</B
> and
	      <B
CLASS="COMMAND"
>fgrep</B
> are symbolic links to, or aliases for
	      <B
CLASS="COMMAND"
>grep</B
>, but invoked with the
	      <TT
CLASS="OPTION"
>-E</TT
> and <TT
CLASS="OPTION"
>-F</TT
> options,
	      respectively.</P
></TD
></TR
></TABLE
></DIV
><DIV
CLASS="EXAMPLE"
><A
NAME="DICTLOOKUP"
></A
><P
><B
>Example 16-19. Looking up definitions in <I
CLASS="CITETITLE"
>Webster's 1913 Dictionary</I
></B
></P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>#!/bin/bash
# dict-lookup.sh

#  This script looks up definitions in the 1913 Webster's Dictionary.
#  This Public Domain dictionary is available for download
#+ from various sites, including
#+ Project Gutenberg (http://www.gutenberg.org/etext/247).
#
#  Convert it from DOS to UNIX format (with only LF at end of line)
#+ before using it with this script.
#  Store the file in plain, uncompressed ASCII text.
#  Set DEFAULT_DICTFILE variable below to path/filename.


E_BADARGS=85
MAXCONTEXTLINES=50                        # Maximum number of lines to show.
DEFAULT_DICTFILE="/usr/share/dict/webster1913-dict.txt"
                                          # Default dictionary file pathname.
                                          # Change this as necessary.
#  Note:
#  ----
#  This particular edition of the 1913 Webster's
#+ begins each entry with an uppercase letter
#+ (lowercase for the remaining characters).
#  Only the *very first line* of an entry begins this way,
#+ and that's why the search algorithm below works.


if [[ -z $(echo "$1" | sed -n '/^[A-Z]/p') ]]
#  Must at least specify word to look up, and
#+ it must start with an uppercase letter.
then
  echo "Usage: `basename $0` Word-to-define [dictionary-file]"
  echo
  echo "Note: Word to look up must start with capital letter,"
  echo "with the rest of the word in lowercase."
  echo "--------------------------------------------"
  echo "Examples: Abandon, Dictionary, Marking, etc."
  exit $E_BADARGS
fi


if [ -z "$2" ]                            #  May specify different dictionary
                                          #+ as an argument to this script.
then
  dictfile=$DEFAULT_DICTFILE
else
  dictfile="$2"
fi

# ---------------------------------------------------------
Definition=$(fgrep -A $MAXCONTEXTLINES "$1 \\" "$dictfile")
#                  Definitions in form "Word \..."
#
#  And, yes, "fgrep" is fast enough
#+ to search even a very large text file.


# Now, snip out just the definition block.

echo "$Definition" |
sed -n '1,/^[A-Z]/p' |
#  Print from first line of output
#+ to the first line of the next entry.
sed '$d' | sed '$d'
#  Delete last two lines of output
#+ (blank line and first line of next entry).
# ---------------------------------------------------------

exit $?

# Exercises:
# ---------
# 1)  Modify the script to accept any type of alphabetic input
#   + (uppercase, lowercase, mixed case), and convert it
#   + to an acceptable format for processing.
#
# 2)  Convert the script to a GUI application,
#   + using something like 'gdialog' or 'zenity' . . .
#     The script will then no longer take its argument(s)
#   + from the command-line.
#
# 3)  Modify the script to parse one of the other available
#   + Public Domain Dictionaries, such as the U.S. Census Bureau Gazetteer.</PRE
></FONT
></TD
></TR
></TABLE
></DIV
><DIV
CLASS="NOTE"
><P
></P
><TABLE
CLASS="NOTE"
WIDTH="90%"
BORDER="0"
><TR
><TD
WIDTH="25"
ALIGN="CENTER"
VALIGN="TOP"
><IMG
SRC="../images/note.gif"
HSPACE="5"
ALT="Note"></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
><P
>See also <A
HREF="contributed-scripts.html#QKY"
>Example A-41</A
> for an example
	      of speedy <I
CLASS="FIRSTTERM"
>fgrep</I
> lookup on a large
	      text file.</P
></TD
></TR
></TABLE
></DIV
><P
><A
NAME="AGREPREF"
></A
></P
><P
><B
CLASS="COMMAND"
>agrep</B
> (<I
CLASS="FIRSTTERM"
>approximate
	      grep</I
>) extends the capabilities of
	      <B
CLASS="COMMAND"
>grep</B
> to approximate matching. The search
	      string may differ by a specified number of characters
	      from the resulting matches. This utility is not part of
	      the core Linux distribution.</P
><P
><A
NAME="ZEGREPREF"
></A
></P
><DIV
CLASS="TIP"
><P
></P
><TABLE
CLASS="TIP"
WIDTH="90%"
BORDER="0"
><TR
><TD
WIDTH="25"
ALIGN="CENTER"
VALIGN="TOP"
><IMG
SRC="../images/tip.gif"
HSPACE="5"
ALT="Tip"></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
><P
>To search compressed files, use
	      <B
CLASS="COMMAND"
>zgrep</B
>, <B
CLASS="COMMAND"
>zegrep</B
>, or
	      <B
CLASS="COMMAND"
>zfgrep</B
>. These also work on uncompressed
	      files, though more slowly than plain <B
CLASS="COMMAND"
>grep</B
>,
	      <B
CLASS="COMMAND"
>egrep</B
>, or <B
CLASS="COMMAND"
>fgrep</B
>.
	      They are handy for searching through a mixed set of files,
	      some compressed, some not.</P
><P
><A
NAME="BZGREPREF"
></A
></P
><P
>To search <A
HREF="filearchiv.html#BZIPREF"
>bzipped</A
>
	      files, use <B
CLASS="COMMAND"
>bzgrep</B
>.</P
></TD
></TR
></TABLE
></DIV
></DD
><DT
><A
NAME="LOOKREF"
></A
><B
CLASS="COMMAND"
>look</B
></DT
><DD
><P
>The command <B
CLASS="COMMAND"
>look</B
> works like
	      <B
CLASS="COMMAND"
>grep</B
>, but does a lookup on
	      a <SPAN
CLASS="QUOTE"
>"dictionary,"</SPAN
> a sorted word list.
	      By default, <B
CLASS="COMMAND"
>look</B
> searches for a match
	      in <TT
CLASS="FILENAME"
>/usr/share/dict/words</TT
>, but a different
	      dictionary file may be specified.</P
><DIV
CLASS="EXAMPLE"
><A
NAME="LOOKUP"
></A
><P
><B
>Example 16-20. Checking words in a list for validity</B
></P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>#!/bin/bash
# lookup: Does a dictionary lookup on each word in a data file.

file=words.data  # Data file from which to read words to test.

echo
echo "Testing file $file"
echo

while [ "$word" != end ]  # Last word in data file.
do               # ^^^
  read word      # From data file, because of redirection at end of loop.
  look "$word" &#62; /dev/null  # Don't want to display lines in dictionary file.
  #  Searches for words in the file /usr/share/dict/words
  #+ (usually a link to linux.words).
  lookup=$?      # Exit status of 'look' command.

  if [ "$lookup" -eq 0 ]
  then
    echo "\"$word\" is valid."
  else
    echo "\"$word\" is invalid."
  fi

done &#60;"$file"    # Redirects stdin to $file, so "reads" come from there.

echo

exit 0

# ----------------------------------------------------------------
# Code below line will not execute because of "exit" command above.


# Stephane Chazelas proposes the following, more concise alternative:

while read word &#38;&#38; [[ $word != end ]]
do if look "$word" &#62; /dev/null
   then echo "\"$word\" is valid."
   else echo "\"$word\" is invalid."
   fi
done &#60;"$file"

exit 0</PRE
></FONT
></TD
></TR
></TABLE
></DIV
></DD
><DT
><B
CLASS="COMMAND"
>sed</B
>, <B
CLASS="COMMAND"
>awk</B
></DT
><DD
><P
>Scripting languages especially suited for parsing text
	      files and command output. May be embedded singly or in
	      combination in pipes and shell scripts.</P
></DD
><DT
><B
CLASS="COMMAND"
><A
HREF="sedawk.html#SEDREF"
>sed</A
></B
></DT
><DD
><P
>Non-interactive <SPAN
CLASS="QUOTE"
>"stream editor"</SPAN
>, permits using
	      many <B
CLASS="COMMAND"
>ex</B
> commands in <A
HREF="timedate.html#BATCHPROCREF"
>batch</A
> mode. It finds many
	      uses in shell scripts.</P
></DD
><DT
><B
CLASS="COMMAND"
><A
HREF="awk.html#AWKREF"
>awk</A
></B
></DT
><DD
><P
>Programmable file extractor and formatter, good for
	      manipulating and/or extracting <A
HREF="special-chars.html#FIELDREF"
>fields</A
> (columns) in structured
	      text files. Its syntax is similar to C.</P
></DD
><DT
><A
NAME="WCREF"
></A
><B
CLASS="COMMAND"
>wc</B
></DT
><DD
><P
><I
CLASS="FIRSTTERM"
>wc</I
> gives a <SPAN
CLASS="QUOTE"
>"word
	      count"</SPAN
> on a file or I/O stream:

	      <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>wc /usr/share/doc/sed-4.1.2/README</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>13  70  447 README</TT
>
[13 lines  70 words  447 bytes]</PRE
></FONT
></TD
></TR
></TABLE
></P
><P
><TT
CLASS="USERINPUT"
><B
>wc -w</B
></TT
> gives only the word count.</P
><P
><TT
CLASS="USERINPUT"
><B
>wc -l</B
></TT
> gives only the line count.</P
><P
><TT
CLASS="USERINPUT"
><B
>wc -c</B
></TT
> gives only the byte count.</P
><P
><TT
CLASS="USERINPUT"
><B
>wc -m</B
></TT
> gives only the character count.</P
><P
><TT
CLASS="USERINPUT"
><B
>wc -L</B
></TT
> gives only the length of the longest line.</P
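><P
>A short sketch of the individual counters, run on a two-line
	      inline sample (the sample text and variable names are
	      illustrative). The <TT
CLASS="OPTION"
>-L</TT
> option is an extension and may be missing from some versions of
	      wc.</P
>

```shell
#!/bin/bash
# Sketch: the individual wc counters on a two-line sample.

sample='one two
three'

# $(( ... )) strips the leading blanks that some wc versions print.
lines=$((   $(printf '%s\n' "$sample" | wc -l) ))
words=$((   $(printf '%s\n' "$sample" | wc -w) ))
bytes=$((   $(printf '%s\n' "$sample" | wc -c) ))
longest=$(( $(printf '%s\n' "$sample" | wc -L) ))   # Not in POSIX.

echo "lines=$lines words=$words bytes=$bytes longest=$longest"
```

<P
>For plain ASCII input such as this, the byte count and the
	      character count are the same. They differ only when the input
	      contains multibyte characters.</P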
><P
>Using <B
CLASS="COMMAND"
>wc</B
> to count how many
	    <TT
CLASS="FILENAME"
>.txt</TT
> files are in current working directory:
	      <TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>$ ls *.txt | wc -l
#  Will work as long as none of the "*.txt" files
#+ have a linefeed embedded in their name.

#  Alternative ways of doing this are:
#      find . -maxdepth 1 -name \*.txt -print0 | grep -cz .
#      (shopt -s nullglob; set -- *.txt; echo $#)

#  Thanks, S.C.</PRE
></FONT
></TD
></TR
></TABLE
>
	    </P
><P
>Using <B
CLASS="COMMAND"
>wc</B
> to total up the size of all the
	      files whose names begin with letters in the range d - h:
	      <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>wc [d-h]* | grep total | awk '{print $3}'</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>71832</TT
>
	      </PRE
></FONT
></TD
></TR
></TABLE
>
	    </P
><P
>Using <B
CLASS="COMMAND"
>wc</B
> to count the lines containing the
	      word <SPAN
CLASS="QUOTE"
>"Linux"</SPAN
> in the main source file for
	      this book:
	      <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>grep Linux abs-book.sgml | wc -l</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>138</TT
>
	      </PRE
></FONT
></TD
></TR
></TABLE
>
	    </P
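><P
>Counting lines that match is not the same as counting every
	      occurrence, because one line may hold several matches. A
	      sketch of the difference, on made-up sample text, using the
	      widely available (though not POSIX-mandated) <TT
CLASS="OPTION"
>-o</TT
> option of grep:</P
>

```shell
#!/bin/bash
# Sketch: counting matching lines versus counting occurrences.

text='Linux is a kernel. GNU/Linux is a system.
No penguins on this line.
Linux again.'

# Lines containing the word (grep | wc -l, or simply grep -c).
matching_lines=$(printf '%s\n' "$text" | grep -c 'Linux')

# Every occurrence: -o prints each match on a line of its own.
occurrences=$(( $(printf '%s\n' "$text" | grep -o 'Linux' | wc -l) ))

echo "lines: $matching_lines   occurrences: $occurrences"
```

<P
>The sample has two matching lines but three occurrences.</P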
><P
>See also <A
HREF="filearchiv.html#EX52"
>Example 16-39</A
> and <A
HREF="redircb.html#REDIR4"
>Example 20-8</A
>.</P
><P
>Certain commands include some of the
	      functionality of <B
CLASS="COMMAND"
>wc</B
> as options.

	    <TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>... | grep foo | wc -l
# This frequently used construct can be more concisely rendered.

... | grep -c foo
# Just use the "-c" (or "--count") option of grep.

# Thanks, S.C.</PRE
></FONT
></TD
></TR
></TABLE
></P
></DD
><DT
><A
NAME="TRREF"
></A
><B
CLASS="COMMAND"
>tr</B
></DT
><DD
><P
>Character translation filter.</P
><DIV
CLASS="CAUTION"
><P
></P
><TABLE
CLASS="CAUTION"
WIDTH="90%"
BORDER="0"
><TR
><TD
WIDTH="25"
ALIGN="CENTER"
VALIGN="TOP"
><IMG
SRC="../images/caution.gif"
HSPACE="5"
ALT="Caution"></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
><P
><A
HREF="special-chars.html#UCREF"
>Must use quoting and/or
	      brackets</A
>, as appropriate. Quotes prevent the
	      shell from reinterpreting the special characters in
	      <B
CLASS="COMMAND"
>tr</B
> command sequences. Brackets should be
	      quoted to prevent expansion by the shell.  </P
></TD
></TR
></TABLE
></DIV
><P
>Either <TT
CLASS="USERINPUT"
><B
>tr "A-Z" "*" &#60;filename</B
></TT
>
	      or <TT
CLASS="USERINPUT"
><B
>tr A-Z \* &#60;filename</B
></TT
> changes
	      all the uppercase letters in <TT
CLASS="FILENAME"
>filename</TT
>
	      to asterisks (writes to <TT
CLASS="FILENAME"
>stdout</TT
>).
	      On some systems this may not work, but <TT
CLASS="USERINPUT"
><B
>tr A-Z
	      '[**]'</B
></TT
> will.</P
><P
><A
NAME="TROPTIONS"
></A
></P
><P
>The <TT
CLASS="OPTION"
>-d</TT
> option deletes a range of
	      characters.
	    <TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>echo "abcdef"                 # abcdef
echo "abcdef" | tr -d b-d     # aef


tr -d 0-9 &#60;filename
# Deletes all digits from the file "filename".</PRE
></FONT
></TD
></TR
></TABLE
></P
><P
>The <TT
CLASS="OPTION"
>--squeeze-repeats</TT
> (or
              <TT
CLASS="OPTION"
>-s</TT
>) option deletes all but the
              first character in each run of consecutive repeated characters.
              This option is useful for removing excess <A
HREF="special-chars.html#WHITESPACEREF"
>whitespace</A
>.


	      <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>echo "XXXXX" | tr --squeeze-repeats 'X'</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>X</TT
></PRE
></FONT
></TD
></TR
></TABLE
></P
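><P
>Applied to whitespace, the squeeze option looks something like
	      the following sketch (the sample string is illustrative):</P
>

```shell
#!/bin/bash
# Sketch: squeezing runs of blanks with tr -s.

ragged='alpha     beta   gamma'

# Squeeze each run of spaces down to a single space.
squeezed=$(echo "$ragged" | tr -s ' ')

# Translate and squeeze in one step: each run of spaces becomes one comma.
csv=$(echo "$ragged" | tr -s ' ' ',')

echo "$squeezed"   # alpha beta gamma
echo "$csv"        # alpha,beta,gamma
```

<P
>When two sets are given along with the squeeze option, the
	      translation happens first and the squeezing is applied to the
	      characters of the second set.</P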
><P
>The <TT
CLASS="OPTION"
>-c</TT
> <SPAN
CLASS="QUOTE"
>"complement"</SPAN
>
	      option <I
CLASS="FIRSTTERM"
>inverts</I
> the character set to
	      match. With this option, <B
CLASS="COMMAND"
>tr</B
> acts only
	      upon those characters <EM
>not</EM
> matching
	      the specified set.</P
><P
>	      <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>echo "acfdeb123" | tr -c b-d +</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>+c+d+b++++</TT
></PRE
></FONT
></TD
></TR
></TABLE
>
            </P
><P
>Note that <B
CLASS="COMMAND"
>tr</B
> recognizes <A
HREF="x17129.html#POSIXREF"
>POSIX character classes</A
>.
	         <A
NAME="AEN11502"
HREF="#FTN.AEN11502"
><SPAN
CLASS="footnote"
>[1]</SPAN
></A
>
	      </P
><P
>	      <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>echo "abcd2ef1" | tr '[:alpha:]' -</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>----2--1</TT
>
	      </PRE
></FONT
></TD
></TR
></TABLE
>
	    </P
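><P
>Character classes combine naturally with the complement and
	      delete options. The following is a sketch only, with a made-up
	      input string, of two common idioms:</P
>

```shell
#!/bin/bash
# Sketch: combining -c with -d and with a POSIX character class.

raw='Tel: +1 (555) 010-9999'

# Delete everything that is NOT a digit or a newline.
digits=$(echo "$raw" | tr -cd '[:digit:]\n')

# Replace every character that is NOT alphanumeric
# (or a newline) with an underscore.
slug=$(echo "$raw" | tr -c '[:alnum:]\n' '_')

echo "$digits"
echo "$slug"
```

<P
>The newline is listed in the set so that the line terminator
	      added by echo survives the filter.</P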
><DIV
CLASS="EXAMPLE"
><A
NAME="EX49"
></A
><P
><B
>Example 16-21. <I
CLASS="FIRSTTERM"
>toupper</I
>: Transforms a file
	      to all uppercase.</B
></P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>#!/bin/bash
# Changes a file to all uppercase.

E_BADARGS=85

if [ -z "$1" ]  # Standard check for command-line arg.
then
  echo "Usage: `basename $0` filename"
  exit $E_BADARGS
fi

tr a-z A-Z &#60;"$1"

# Same effect as above, but using POSIX character set notation:
#        tr '[:lower:]' '[:upper:]' &#60;"$1"
# Thanks, S.C.

#     Or even . . .
#     cat "$1" | tr a-z A-Z
#     Or dozens of other ways . . .

exit 0

#  Exercise:
#  Rewrite this script to give the option of changing a file
#+ to *either* upper or lowercase.
#  Hint: Use either the "case" or "select" command.</PRE
></FONT
></TD
></TR
></TABLE
></DIV
><DIV
CLASS="EXAMPLE"
><A
NAME="LOWERCASE"
></A
><P
><B
>Example 16-22. <I
CLASS="FIRSTTERM"
>lowercase</I
>: Changes all
	      filenames in working directory to lowercase.</B
></P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>#!/bin/bash
#
#  Changes every filename in working directory to all lowercase.
#
#  Inspired by a script of John Dubois,
#+ which was translated into Bash by Chet Ramey,
#+ and considerably simplified by the author of the ABS Guide.


for filename in *                # Traverse all files in directory.
do
   fname=`basename $filename`
   n=`echo $fname | tr A-Z a-z`  # Change name to lowercase.
   if [ "$fname" != "$n" ]       # Rename only files not already lowercase.
   then
     mv $fname $n
   fi
done

exit $?


# Code below this line will not execute because of "exit".
#--------------------------------------------------------#
# To run it, delete script above line.

# The above script will not work on filenames containing blanks or newlines.
# Stephane Chazelas therefore suggests the following alternative:


for filename in *    # Not necessary to use basename,
                     # since "*" won't return any file containing "/".
do n=`echo "$filename/" | tr '[:upper:]' '[:lower:]'`
#                             POSIX char set notation.
#                    Slash added so that trailing newlines are not
#                    removed by command substitution.
   # Variable substitution:
   n=${n%/}          # Removes trailing slash, added above, from filename.
   [[ $filename == $n ]] || mv "$filename" "$n"
                     # Checks if filename already lowercase.
done

exit $?</PRE
></FONT
></TD
></TR
></TABLE
></DIV
><P
><A
NAME="TRD2U"
></A
></P
><DIV
CLASS="EXAMPLE"
><A
NAME="DU"
></A
><P
><B
>Example 16-23. <I
CLASS="FIRSTTERM"
>du</I
>: DOS to UNIX text file conversion.</B
></P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>#!/bin/bash
# Du.sh: DOS to UNIX text file converter.

E_WRONGARGS=85

if [ -z "$1" ]
then
  echo "Usage: `basename $0` filename-to-convert"
  exit $E_WRONGARGS
fi

NEWFILENAME=$1.unx

CR='\015'  # Carriage return.
           # 015 is octal ASCII code for CR.
           # Lines in a DOS text file end in CR-LF.
           # Lines in a UNIX text file end in LF only.

tr -d "$CR" &#60; "$1" &#62; "$NEWFILENAME"
# Delete CR's and write to new file.

echo "Original DOS text file is \"$1\"."
echo "Converted UNIX text file is \"$NEWFILENAME\"."

exit 0

# Exercise:
# --------
# Change the above script to convert from UNIX to DOS.</PRE
></FONT
></TD
></TR
></TABLE
></DIV
><DIV
CLASS="EXAMPLE"
><A
NAME="ROT13"
></A
><P
><B
>Example 16-24. <I
CLASS="FIRSTTERM"
>rot13</I
>: ultra-weak encryption.</B
></P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>#!/bin/bash
# rot13.sh: Classic rot13 algorithm,
#           encryption that might fool a 3-year-old
#           for about 10 minutes.

# Usage: ./rot13.sh filename
# or     ./rot13.sh &#60;filename
# or     ./rot13.sh and supply keyboard input (stdin)

cat "$@" | tr 'a-zA-Z' 'n-za-mN-ZA-M'   # "a" goes to "n", "b" to "o" ...
#  The   cat "$@"   construct
#+ permits input either from stdin or from files.

exit 0</PRE
></FONT
></TD
></TR
></TABLE
></DIV
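><P
>Because the alphabet has 26 letters, shifting by 13 twice
	      returns the original text, so the same filter both encodes and
	      decodes. A small sketch (the function name <TT
CLASS="REPLACEABLE"
><I
>rot13</I
></TT
> and the sample string are illustrative):</P
>

```shell
#!/bin/bash
# Sketch: rot13 is its own inverse.

rot13 () {
  tr 'a-zA-Z' 'n-za-mN-ZA-M'
}

plain='Hello, World!'
once=$(echo "$plain" | rot13)     # Encode.
twice=$(echo "$once" | rot13)     # Encode again, which decodes.

echo "$once"    # Uryyb, Jbeyq!
echo "$twice"   # Hello, World!
```

<P
>Punctuation and spaces pass through unchanged, since they are
	      not in either set.</P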
><DIV
CLASS="EXAMPLE"
><A
NAME="CRYPTOQUOTE"
></A
><P
><B
>Example 16-25. Generating <SPAN
CLASS="QUOTE"
>"Crypto-Quote"</SPAN
> Puzzles</B
></P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>#!/bin/bash
# crypto-quote.sh: Encrypt quotes

#  Will encrypt famous quotes in a simple monoalphabetic substitution.
#  The result is similar to the "Crypto Quote" puzzles
#+ seen in the Op Ed pages of the Sunday paper.


key=ETAOINSHRDLUBCFGJMQPVWZYXK
# The "key" is nothing more than a scrambled alphabet.
# Changing the "key" changes the encryption.

# The 'cat "$@"' construction gets input either from stdin or from files.
# If using stdin, terminate input with a Control-D.
# Otherwise, specify filename as command-line parameter.

cat "$@" | tr "a-z" "A-Z" | tr "A-Z" "$key"
#        |  to uppercase  |     encrypt
# Will work on lowercase, uppercase, or mixed-case quotes.
# Passes non-alphabetic characters through unchanged.


# Try this script with something like:
# "Nothing so needs reforming as other people's habits."
# --Mark Twain
#
# Output is:
# "CFPHRCS QF CIIOQ MINFMBRCS EQ FPHIM GIFGUI'Q HETRPQ."
# --BEML PZERC

# To reverse the encryption:
# cat "$@" | tr "$key" "A-Z"


#  This simple-minded cipher can be broken by an average 12-year-old
#+ using only pencil and paper.

exit 0

#  Exercise:
#  --------
#  Modify the script so that it will either encrypt or decrypt,
#+ depending on command-line argument(s).</PRE
></FONT
></TD
></TR
></TABLE
></DIV
><P
><A
NAME="JABH"
></A
>Of course, <I
CLASS="FIRSTTERM"
>tr</I
>
              lends itself to <I
CLASS="FIRSTTERM"
>code
              obfuscation</I
>.</P
><P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>#!/bin/bash
# jabh.sh

x="wftedskaebjgdBstbdbsmnjgz"
echo $x | tr "a-z" 'oh, turtleneck Phrase Jar!'

# Based on the Wikipedia "Just another Perl hacker" article.</PRE
></FONT
></TD
></TR
></TABLE
></P
><P
><A
NAME="TRVARIANTS"
></A
></P
><TABLE
CLASS="SIDEBAR"
BORDER="1"
CELLPADDING="5"
><TR
><TD
><DIV
CLASS="SIDEBAR"
><A
NAME="AEN11540"
></A
><P
><B
><I
CLASS="FIRSTTERM"
>tr</I
> variants</B
></P
><P
>	    The <B
CLASS="COMMAND"
>tr</B
> utility has two historic
	    variants. The BSD version does not use brackets
	    (<TT
CLASS="USERINPUT"
><B
>tr a-z A-Z</B
></TT
>), but the SysV one does
	    (<TT
CLASS="USERINPUT"
><B
>tr '[a-z]' '[A-Z]'</B
></TT
>). The GNU version
	    of <B
CLASS="COMMAND"
>tr</B
> resembles the BSD one.
	    </P
></DIV
></TD
></TR
></TABLE
></DD
><DT
><A
NAME="FOLDREF"
></A
><B
CLASS="COMMAND"
>fold</B
></DT
><DD
><P
>A filter that wraps lines of input to a specified width.
	      This is especially useful with the <TT
CLASS="OPTION"
>-s</TT
>
	      option, which breaks lines at word spaces (see <A
HREF="textproc.html#EX50"
>Example 16-26</A
> and <A
HREF="contributed-scripts.html#MAILFORMAT"
>Example A-1</A
>).</P
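><P
>The effect of the option is easiest to see side by side. The
	      sketch below uses an arbitrary sample sentence and an arbitrary
	      width of 20 columns.</P
>

```shell
#!/bin/bash
# Sketch: fold with and without -s.

WIDTH=20
text='The quick brown fox jumps over the lazy dog.'

echo "--- fold -w $WIDTH (may split words) ---"
echo "$text" | fold -w "$WIDTH"

echo "--- fold -s -w $WIDTH (breaks at spaces) ---"
echo "$text" | fold -s -w "$WIDTH"

# Longest line produced by fold -s; it should never exceed WIDTH.
longest=$(echo "$text" | fold -s -w "$WIDTH" |
          awk '{ if (length($0) > max) max = length($0) } END { print max }')
echo "longest line: $longest"
```

<P
>Note that fold only breaks lines. Unlike fmt, it does not join
	      short lines together.</P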
></DD
><DT
><A
NAME="FMTREF"
></A
><B
CLASS="COMMAND"
>fmt</B
></DT
><DD
><P
>Simple-minded file formatter, used as a filter in a
	      pipe to <SPAN
CLASS="QUOTE"
>"wrap"</SPAN
> long lines of text
	      output.</P
><DIV
CLASS="EXAMPLE"
><A
NAME="EX50"
></A
><P
><B
>Example 16-26. Formatted file listing.</B
></P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>#!/bin/bash

WIDTH=40                    # 40 columns wide.

b=`ls /usr/local/bin`       # Get a file listing...

echo $b | fmt -w $WIDTH

# Could also have been done by
#    echo $b | fold - -s -w $WIDTH

exit 0</PRE
></FONT
></TD
></TR
></TABLE
></DIV
><P
>See also <A
HREF="moreadv.html#EX41"
>Example 16-5</A
>.</P
><DIV
CLASS="TIP"
><P
></P
><TABLE
CLASS="TIP"
WIDTH="90%"
BORDER="0"
><TR
><TD
WIDTH="25"
ALIGN="CENTER"
VALIGN="TOP"
><IMG
SRC="../images/tip.gif"
HSPACE="5"
ALT="Tip"></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
><P
>A powerful alternative to <B
CLASS="COMMAND"
>fmt</B
> is
	      Adam M. Costello's <B
CLASS="COMMAND"
>par</B
>
	      utility, available from <A
HREF="http://www.cs.berkeley.edu/~amc/Par/"
TARGET="_top"
>http://www.cs.berkeley.edu/~amc/Par/</A
>.
	      </P
></TD
></TR
></TABLE
></DIV
></DD
><DT
><A
NAME="COLREF"
></A
><B
CLASS="COMMAND"
>col</B
></DT
><DD
><P
>This deceptively named filter removes reverse line feeds
	      from an input stream. It also attempts to replace
	      whitespace with equivalent tabs. The chief use of
	      <B
CLASS="COMMAND"
>col</B
> is in filtering the output
	      from certain text processing utilities, such as
	      <B
CLASS="COMMAND"
>groff</B
> and <B
CLASS="COMMAND"
>tbl</B
>.</P
></DD
><DT
><A
NAME="COLUMNREF"
></A
><B
CLASS="COMMAND"
>column</B
></DT
><DD
><P
>Column formatter. This filter transforms list-type
	      text output into a <SPAN
CLASS="QUOTE"
>"pretty-printed"</SPAN
> table
	      by inserting tabs at appropriate places.</P
><DIV
CLASS="EXAMPLE"
><A
NAME="COL"
></A
><P
><B
>Example 16-27. Using <I
CLASS="FIRSTTERM"
>column</I
> to format a directory
	        listing</B
></P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>#!/bin/bash
# colms.sh
# A minor modification of the example file in the "column" man page.


(printf "PERMISSIONS LINKS OWNER GROUP SIZE MONTH DAY HH:MM PROG-NAME\n" \
; ls -l | sed 1d) | column -t
#         ^^^^^^           ^^

#  The "sed 1d" in the pipe deletes the first line of output,
#+ which would be "total        N",
#+ where "N" is the total number of disk blocks used by the files listed.

# The -t option to "column" pretty-prints a table.

exit 0</PRE
></FONT
></TD
></TR
></TABLE
></DIV
></DD
><DT
><A
NAME="COLRMREF"
></A
><B
CLASS="COMMAND"
>colrm</B
></DT
><DD
><P
>Column removal filter. This removes columns (characters)
	      from a file and writes the file, lacking the range of
	      specified columns, back to <TT
CLASS="FILENAME"
>stdout</TT
>.
	      <TT
CLASS="USERINPUT"
><B
>colrm 2 4 &#60;filename</B
></TT
> removes the
	      second through fourth characters from each line of the
	      text file <TT
CLASS="FILENAME"
>filename</TT
>.</P
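><P
>Where colrm is not installed, the same result can usually be had
	      from cut by keeping the complementary columns. This is a sketch
	      on made-up sample lines. The colrm call is guarded, since the
	      utility may be absent.</P
>

```shell
#!/bin/bash
# Sketch: removing character columns 2 through 4 from each line.

sample='abcdefgh
12345678'

# Keeping column 1 and columns 5 onward is the same as deleting 2-4.
with_cut=$(printf '%s\n' "$sample" | cut -c1,5-)
echo "$with_cut"

# colrm itself, if it is available on this system.
if command -v colrm >/dev/null 2>&1
then
  printf '%s\n' "$sample" | colrm 2 4
fi
```

<P
>Both commands count columns starting from 1.</P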
><DIV
CLASS="CAUTION"
><P
></P
><TABLE
CLASS="CAUTION"
WIDTH="90%"
BORDER="0"
><TR
><TD
WIDTH="25"
ALIGN="CENTER"
VALIGN="TOP"
><IMG
SRC="../images/caution.gif"
HSPACE="5"
ALT="Caution"></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
><P
>If the file contains tabs or nonprintable
	      characters, this may cause unpredictable
	      behavior. In such cases, consider using
	      <A
HREF="textproc.html#EXPANDREF"
>expand</A
> and
	      <B
CLASS="COMMAND"
>unexpand</B
> in a pipe preceding
	      <B
CLASS="COMMAND"
>colrm</B
>.</P
></TD
></TR
></TABLE
></DIV
></DD
><DT
><A
NAME="NLREF"
></A
><B
CLASS="COMMAND"
>nl</B
></DT
><DD
><P
>Line numbering filter: <TT
CLASS="USERINPUT"
><B
>nl filename</B
></TT
>
	    lists <TT
CLASS="FILENAME"
>filename</TT
> to
	    <TT
CLASS="FILENAME"
>stdout</TT
>, but inserts consecutive
	    numbers at the beginning of each non-blank line. If
	    <TT
CLASS="FILENAME"
>filename</TT
> is omitted, it operates on
	    <TT
CLASS="FILENAME"
>stdin</TT
>.</P
><P
>The output of <B
CLASS="COMMAND"
>nl</B
> is very similar to
	      <TT
CLASS="USERINPUT"
><B
>cat -b</B
></TT
>, since, by default
	      <B
CLASS="COMMAND"
>nl</B
> does not number blank lines.</P
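><P
>A quick way to compare the numbering schemes is to look at the
	      number each command assigns to the last line of a sample
	      containing one blank line. The sample and the variable names
	      in this sketch are illustrative.</P
>

```shell
#!/bin/bash
# Sketch: how nl, nl -ba, and cat -b number input with a blank line.

sample='first

third'

# Number assigned to the last line by each command.
last_nl=$(printf '%s\n' "$sample"  | nl     | tail -n 1 | awk '{print $1}')
last_ba=$(printf '%s\n' "$sample"  | nl -ba | tail -n 1 | awk '{print $1}')
last_cat=$(printf '%s\n' "$sample" | cat -b | tail -n 1 | awk '{print $1}')

echo "nl: $last_nl   nl -ba: $last_ba   cat -b: $last_cat"
```

<P
>Both nl and cat -b skip the blank line when counting, so the last
	      line is number 2. With -ba, every line is counted and it becomes
	      number 3.</P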
><DIV
CLASS="EXAMPLE"
><A
NAME="LNUM"
></A
><P
><B
>Example 16-28. <I
CLASS="FIRSTTERM"
>nl</I
>: A self-numbering script.</B
></P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>#!/bin/bash
# line-number.sh

# This script echoes itself twice to stdout with its lines numbered.

echo "     line number = $LINENO" # 'nl' sees this as line 4
#                                   (nl does not number blank lines).
#                                   'cat -n' sees it correctly as line #6.

nl `basename $0`

echo; echo  # Now, let's try it with 'cat -n'

cat -n `basename $0`
# The difference is that 'cat -n' numbers the blank lines.
# Note that 'nl -ba' will also do so.

exit 0
# -----------------------------------------------------------------</PRE
></FONT
></TD
></TR
></TABLE
></DIV
></DD
><DT
><A
NAME="PRREF"
></A
><B
CLASS="COMMAND"
>pr</B
></DT
><DD
><P
>Print formatting filter. This will paginate files
	      (or <TT
CLASS="FILENAME"
>stdin</TT
>) into sections suitable for
	      hard copy printing or viewing on screen.	Various options
	      permit row and column manipulation, joining lines, setting
	      margins, numbering lines, adding page headers, and merging
	      files, among other things. The <B
CLASS="COMMAND"
>pr</B
>
	      command combines much of the functionality of
	      <B
CLASS="COMMAND"
>nl</B
>, <B
CLASS="COMMAND"
>paste</B
>,
	      <B
CLASS="COMMAND"
>fold</B
>, <B
CLASS="COMMAND"
>column</B
>, and
	      <B
CLASS="COMMAND"
>expand</B
>.</P
><P
><TT
CLASS="USERINPUT"
><B
>pr -o 5 --width=65 fileZZZ | more</B
></TT
>
	     gives a nice paginated listing to screen of
	     <TT
CLASS="FILENAME"
>fileZZZ</TT
> with margins set at 5 and
	     65.</P
><P
>A particularly useful option is <TT
CLASS="OPTION"
>-d</TT
>,
	      forcing double-spacing (same effect as <B
CLASS="COMMAND"
>sed
	      G</B
>).</P
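><P
>In sed, double-spacing is done with the G editing command, which
	      is written without a leading dash. A minimal sketch on a
	      two-line sample (illustrative data):</P
>

```shell
#!/bin/bash
# Sketch: double-spacing a stream with sed's G command.

sample='line one
line two'

# G appends a newline plus the (empty) hold space to each line,
# which leaves a blank line after every input line.
printf '%s\n' "$sample" | sed G

# Count the lines before and after.
before=$(( $(printf '%s\n' "$sample" | wc -l) ))
after=$((  $(printf '%s\n' "$sample" | sed G | wc -l) ))

echo "before: $before   after: $after"
```

<P
>Unlike pr, sed adds no page headers or pagination to the
	      output.</P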
></DD
><DT
><A
NAME="GETTEXTREF"
></A
><B
CLASS="COMMAND"
>gettext</B
></DT
><DD
><P
>The GNU <B
CLASS="COMMAND"
>gettext</B
> package is a set of
	      utilities for <A
HREF="localization.html"
>localizing</A
>
	      and translating the text output of programs into foreign
	      languages. While originally intended for C programs, it
	      now supports quite a number of programming and scripting
	      languages.</P
><P
>The  <B
CLASS="COMMAND"
>gettext</B
>
	      <EM
>program</EM
> works on shell scripts. See
	      the <TT
CLASS="REPLACEABLE"
><I
>info page</I
></TT
>.</P
></DD
><DT
><A
NAME="MSGFMTREF"
></A
><B
CLASS="COMMAND"
>msgfmt</B
></DT
><DD
><P
>A program for generating binary
	      message catalogs. It is used for <A
HREF="localization.html"
>localization</A
>.</P
></DD
><DT
><A
NAME="ICONVREF"
></A
><B
CLASS="COMMAND"
>iconv</B
></DT
><DD
><P
>A utility for converting file(s) to a different encoding
	      (character set). Its chief use is for <A
HREF="localization.html"
>localization</A
>.</P
><P
>	    <TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
># Convert a string from UTF-8 to UTF-16 and print to the BookList
function write_utf8_string {
    STRING=$1
    BOOKLIST=$2
    echo -n "$STRING" | iconv -f UTF8 -t UTF16 | \
    cut -b 3- | tr -d \\n &#62;&#62; "$BOOKLIST"
}

#  From Peter Knowles' "booklistgen.sh" script
#+ for converting files to Sony Librie/PRS-50X format.
#  (http://booklistgensh.peterknowles.com)</PRE
></FONT
></TD
></TR
></TABLE
>
	    </P
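><P
>In the listing above, the <B
CLASS="COMMAND"
>cut -b 3-</B
> step apparently strips the two-byte byte-order
	      mark that <B
CLASS="COMMAND"
>iconv</B
> writes at the start of plain UTF16 output.
	      As a minimal sketch of an alternative, naming the
	      byte order explicitly (UTF-16LE) avoids the mark
	      altogether. The function names below are made up for
	      this illustration, and the encoding names are those
	      accepted by GNU <B
CLASS="COMMAND"
>iconv</B
>.</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>
```shell
#!/bin/bash
# iconv-sketch.sh: Converting to UTF-16 without a byte-order mark.
#  The function names are made up for this illustration.

to_utf16le ()
{ #  Naming the byte order explicitly (UTF-16LE)
  #+ keeps iconv from emitting a byte-order mark.
  iconv -f UTF-8 -t UTF-16LE
}

from_utf16le ()
{ #  The reverse conversion.
  iconv -f UTF-16LE -t UTF-8
}

#  Each ASCII character becomes two bytes, low byte first.
printf '%s' "A" | to_utf16le | od -An -tx1      # 41 00

#  A round trip returns the original text.
printf '%s' "Hello" | to_utf16le | from_utf16le
echo
```
</PRE
></FONT
></TD
></TR
></TABLE
><P
>Whether the little-endian or the big-endian form
	      (UTF-16BE) is wanted depends on the program that will
	      read the converted text.</P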
></DD
><DT
><A
NAME="RECODEREF"
></A
><B
CLASS="COMMAND"
>recode</B
></DT
><DD
><P
>Consider this a fancier version of
	      <B
CLASS="COMMAND"
>iconv</B
>, above. It is a very versatile utility
	      for converting a file to a different encoding scheme.
	      Note that <I
CLASS="FIRSTTERM"
>recode</I
> is not part of the
	      standard Linux installation.</P
></DD
><DT
><A
NAME="TEXREF"
></A
><B
CLASS="COMMAND"
>TeX</B
>, <A
NAME="GSREF"
></A
><B
CLASS="COMMAND"
>gs</B
></DT
><DD
><P
><B
CLASS="COMMAND"
>TeX</B
> and <B
CLASS="COMMAND"
>PostScript</B
>
	      are text markup languages used for preparing copy for
	      printing or formatted video display.</P
><P
><B
CLASS="COMMAND"
>TeX</B
> is Donald Knuth's elaborate
		typesetting system. It is often convenient to write a
		shell script encapsulating all the options and arguments
		passed to one of these markup languages.</P
><P
><I
CLASS="FIRSTTERM"
>Ghostscript</I
>
		(<B
CLASS="COMMAND"
>gs</B
>) is a GPL-ed PostScript
		interpreter.</P
></DD
><DT
><A
NAME="TEXEXECREF"
></A
><B
CLASS="COMMAND"
>texexec</B
></DT
><DD
><P
>Utility for processing <I
CLASS="FIRSTTERM"
>TeX</I
> and
	      <I
CLASS="FIRSTTERM"
>pdf</I
> files. Found in
	      <TT
CLASS="FILENAME"
>/usr/bin</TT
>
	      on many Linux distros, it is actually a <A
HREF="wrapper.html#SHWRAPPER"
>shell wrapper</A
> that
	      calls <A
HREF="wrapper.html#PERLREF"
>Perl</A
> to invoke
	      <I
CLASS="FIRSTTERM"
>TeX</I
>.</P
><P
>	    <TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>texexec --pdfarrange --result=Concatenated.pdf *pdf

#  Concatenates all the pdf files in the current working directory
#+ into the merged file, Concatenated.pdf . . .
#  (The --pdfarrange option repaginates a pdf file. See also --pdfcombine.)
#  The above command-line could be parameterized and put into a shell script.</PRE
></FONT
></TD
></TR
></TABLE
>
            </P
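><P
>The parameterization suggested in the last comment of
	      the listing might look like the following minimal
	      sketch. The function name, the <TT
CLASS="VARNAME"
>DRY_RUN</TT
> variable, and the usage message are assumptions
	      made for this illustration. Only the <B
CLASS="COMMAND"
>texexec</B
> options come from the listing above.</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>
```shell
#!/bin/bash
# pdfcat.sh: A parameterized wrapper around the texexec command.
#  The function name and the DRY_RUN variable
#+ are made up for this illustration.

E_WRONGARGS=85

pdfcat ()
{ #  Usage: pdfcat output.pdf input1.pdf [input2.pdf ...]
  if [ "$#" -lt 2 ]
  then
    echo "Usage: pdfcat output.pdf input1.pdf [input2.pdf ...]"
    return $E_WRONGARGS
  fi

  local result=$1     # First argument names the merged file.
  shift               # The remaining arguments are the input files.

  #  With DRY_RUN set, only show the command that would be run.
  ${DRY_RUN:+echo} texexec --pdfarrange --result="$result" "$@"
}

DRY_RUN=1 pdfcat Concatenated.pdf one.pdf two.pdf
```
</PRE
></FONT
></TD
></TR
></TABLE
><P
>Leaving <TT
CLASS="VARNAME"
>DRY_RUN</TT
> unset runs the command for real, which of course
	      requires <B
CLASS="COMMAND"
>texexec</B
> to be installed.</P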
></DD
><DT
><A
NAME="ENSCRIPTREF"
></A
><B
CLASS="COMMAND"
>enscript</B
></DT
><DD
><P
>Utility for converting a plain text file to PostScript.</P
><P
>For example, <B
CLASS="COMMAND"
>enscript filename.txt -p filename.ps</B
>
	      produces the PostScript output file
	      <TT
CLASS="FILENAME"
>filename.ps</TT
>.</P
></DD
><DT
><A
NAME="GROFFREF"
></A
><B
CLASS="COMMAND"
>groff</B
>, <A
NAME="TBLREF"
></A
><B
CLASS="COMMAND"
>tbl</B
>, <A
NAME="EQNREF"
></A
><B
CLASS="COMMAND"
>eqn</B
></DT
><DD
><P
>Yet another text markup and display formatting language
	      is <B
CLASS="COMMAND"
>groff</B
>. This is the enhanced GNU version
	      of the venerable UNIX <B
CLASS="COMMAND"
>roff/troff</B
> display
	      and typesetting package. <A
HREF="basic.html#MANREF"
>Manpages</A
>
	      use <B
CLASS="COMMAND"
>groff</B
>.</P
><P
>The <B
CLASS="COMMAND"
>tbl</B
> table processing utility
	      is considered part of <B
CLASS="COMMAND"
>groff</B
>, as its
	      function is to convert table markup into
	      <B
CLASS="COMMAND"
>groff</B
> commands.</P
><P
>The <B
CLASS="COMMAND"
>eqn</B
> equation processing utility
	      is likewise part of <B
CLASS="COMMAND"
>groff</B
>, and
	      its function is to convert equation markup into
	      <B
CLASS="COMMAND"
>groff</B
> commands.</P
><DIV
CLASS="EXAMPLE"
><A
NAME="MANVIEW"
></A
><P
><B
>Example 16-29. <I
CLASS="FIRSTTERM"
>manview</I
>: Viewing formatted manpages</B
></P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>#!/bin/bash
# manview.sh: Formats the source of a man page for viewing.

#  This script is useful when writing man page source.
#  It lets you look at the intermediate results on the fly
#+ while working on it.

E_WRONGARGS=85

if [ -z "$1" ]
then
  echo "Usage: `basename $0` filename"
  exit $E_WRONGARGS
fi

# ---------------------------
groff -Tascii -man "$1" | less
# From the man page for groff.
# ---------------------------

#  If the man page includes tables and/or equations,
#+ then the above code will barf.
#  The following line can handle such cases.
#
#   gtbl &#60; "$1" | geqn -Tlatin1 | groff -Tlatin1 -mtty-char -man
#
#   Thanks, S.C.

exit $?   # See also the "maned.sh" script.</PRE
></FONT
></TD
></TR
></TABLE
></DIV
><P
>See also <A
HREF="contributed-scripts.html#MANED"
>Example A-39</A
>.</P
></DD
><DT
><A
NAME="LEXREF"
></A
><B
CLASS="COMMAND"
>lex</B
>, <A
NAME="YACCREF"
></A
><B
CLASS="COMMAND"
>yacc</B
></DT
><DD
><P
><A
NAME="FLEXREF"
></A
></P
><P
>The <B
CLASS="COMMAND"
>lex</B
> lexical analyzer produces
	      programs for pattern matching. This has been replaced
	      by the nonproprietary <B
CLASS="COMMAND"
>flex</B
> on Linux
	      systems.</P
><P
><A
NAME="BISONREF"
></A
></P
><P
>The <B
CLASS="COMMAND"
>yacc</B
> utility creates a
	      parser based on a set of specifications. This has been
	      replaced by the nonproprietary <B
CLASS="COMMAND"
>bison</B
>
	      on Linux systems.</P
></DD
></DL
></DIV
></DIV
><H3
CLASS="FOOTNOTES"
>Notes</H3
><TABLE
BORDER="0"
CLASS="FOOTNOTES"
WIDTH="100%"
><TR
><TD
ALIGN="LEFT"
VALIGN="TOP"
WIDTH="5%"
><A
NAME="FTN.AEN11502"
HREF="textproc.html#AEN11502"
><SPAN
CLASS="footnote"
>[1]</SPAN
></A
></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
WIDTH="95%"
><P
>This is only true of the GNU version of
		 <B
CLASS="COMMAND"
>tr</B
>, not the generic version often found on
		 commercial UNIX systems.</P
></TD
></TR
></TABLE
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="timedate.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="filearchiv.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Time / Date Commands</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="external.html"
ACCESSKEY="U"
>Up</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>File and Archiving Commands</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>