<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML
><HEAD
><TITLE
>A Brief Introduction to Regular Expressions</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.7"><LINK
REL="HOME"
TITLE="Advanced Bash-Scripting Guide"
HREF="index.html"><LINK
REL="UP"
TITLE="Regular Expressions"
HREF="regexp.html"><LINK
REL="PREVIOUS"
TITLE="Regular Expressions"
HREF="regexp.html"><LINK
REL="NEXT"
TITLE="Globbing"
HREF="globbingref.html"></HEAD
><BODY
CLASS="SECT1"
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000FF"
VLINK="#840084"
ALINK="#0000FF"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="3"
ALIGN="center"
>Advanced Bash-Scripting Guide: </TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="bottom"
><A
HREF="regexp.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="80%"
ALIGN="center"
VALIGN="bottom"
>Chapter 18. Regular Expressions</TD
><TD
WIDTH="10%"
ALIGN="right"
VALIGN="bottom"
><A
HREF="globbingref.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="SECT1"
><H1
CLASS="SECT1"
><A
NAME="AEN17129"
></A
>18.1. A Brief Introduction to Regular Expressions</H1
><P
>An expression is a string of characters. Those characters
having an interpretation above and beyond their literal
meaning are called <I
CLASS="FIRSTTERM"
>metacharacters</I
>.
A quote symbol, for example, may denote speech by a person,
<I
CLASS="FIRSTTERM"
>ditto</I
>, or a meta-meaning
<A
NAME="AEN17134"
HREF="#FTN.AEN17134"
><SPAN
CLASS="footnote"
>[1]</SPAN
></A
>
for the symbols that follow. Regular Expressions are sets
of characters and/or metacharacters that match (or specify)
patterns.</P
><P
>A Regular Expression contains one or more of the
following:</P
><P
></P
><UL
><LI
><P
><I
CLASS="FIRSTTERM"
>A character set</I
>. These are the
characters retaining their literal meaning. The
simplest type of Regular Expression consists
<EM
>only</EM
> of a character set, with no
metacharacters.</P
></LI
><LI
><P
><A
NAME="ANCHORREF"
></A
></P
><P
><I
CLASS="FIRSTTERM"
>An anchor</I
>. These designate
(<I
CLASS="FIRSTTERM"
>anchor</I
>) the position in the line of
text that the RE is to match. For example, <SPAN
CLASS="TOKEN"
>^</SPAN
>
and <SPAN
CLASS="TOKEN"
>$</SPAN
> are anchors.</P
></LI
><LI
><P
><I
CLASS="FIRSTTERM"
>Modifiers</I
>. These expand or narrow
(<I
CLASS="FIRSTTERM"
>modify</I
>) the range of text the RE is
to match. Modifiers include the asterisk, brackets, and
the backslash.</P
></LI
></UL
><P
>The main uses for Regular Expressions
(<I
CLASS="FIRSTTERM"
>RE</I
>s) are text searches and string
manipulation. An RE <I
CLASS="FIRSTTERM"
>matches</I
> a single
character or a set of characters -- a string or a part of
a string.</P
><P
></P
><UL
><LI
><P
><A
NAME="ASTERISKREG"
></A
>The asterisk --
<SPAN
CLASS="TOKEN"
>*</SPAN
> -- matches any number of
repeats of the single character or RE preceding it,
including <EM
>zero</EM
> instances.</P
><P
><SPAN
CLASS="QUOTE"
>"1133*"</SPAN
> matches <TT
CLASS="REPLACEABLE"
><I
>11 +
one or more 3's</I
></TT
>:
<TT
CLASS="REPLACEABLE"
><I
>113</I
></TT
>, <TT
CLASS="REPLACEABLE"
><I
>1133</I
></TT
>,
<TT
CLASS="REPLACEABLE"
><I
>1133333</I
></TT
>, and so forth.</P
></LI
><LI
><P
><A
NAME="REGEXDOT"
></A
>The <I
CLASS="FIRSTTERM"
>dot</I
>
-- <SPAN
CLASS="TOKEN"
>.</SPAN
> -- matches
any one character, except a newline.
<A
NAME="AEN17189"
HREF="#FTN.AEN17189"
><SPAN
CLASS="footnote"
>[2]</SPAN
></A
>
</P
><P
><SPAN
CLASS="QUOTE"
>"13."</SPAN
> matches <TT
CLASS="REPLACEABLE"
><I
>13 + exactly
one character of any kind (including a
space)</I
></TT
>: <TT
CLASS="REPLACEABLE"
><I
>1133</I
></TT
>,
<TT
CLASS="REPLACEABLE"
><I
>11333</I
></TT
>, but not
<TT
CLASS="REPLACEABLE"
><I
>13</I
></TT
> (additional character
missing).</P
><P
>See <A
HREF="textproc.html#CWSOLVER"
>Example 16-18</A
> for a demonstration
of <I
CLASS="FIRSTTERM"
>dot single-character</I
>
matching.</P
></LI
><LI
><P
><A
NAME="CARETREF"
></A
>The caret -- <SPAN
CLASS="TOKEN"
>^</SPAN
>
-- matches the beginning of a line. As the first character
inside brackets, however, it negates the set of characters
that follows.</P
></LI
><LI
><P
><A
NAME="DOLLARSIGNREF"
></A
></P
><P
>The dollar sign -- <SPAN
CLASS="TOKEN"
>$</SPAN
> -- at the end of an
RE matches the end of a line.</P
><P
><SPAN
CLASS="QUOTE"
>"XXX$"</SPAN
> matches <SPAN
CLASS="TOKEN"
>XXX</SPAN
> at the
end of a line.</P
><P
><SPAN
CLASS="QUOTE"
>"^$"</SPAN
> matches blank lines.</P
></LI
><LI
><P
><A
NAME="BRACKETSREF"
></A
></P
><P
>Brackets -- <SPAN
CLASS="TOKEN"
>[...]</SPAN
> -- enclose a set of characters
to match in a single RE.</P
><P
><SPAN
CLASS="QUOTE"
>"[xyz]"</SPAN
> matches any one of the characters
<TT
CLASS="REPLACEABLE"
><I
>x</I
></TT
>, <TT
CLASS="REPLACEABLE"
><I
>y</I
></TT
>,
or <TT
CLASS="REPLACEABLE"
><I
>z</I
></TT
>.</P
><P
><SPAN
CLASS="QUOTE"
>"[c-n]"</SPAN
> matches any one of the
characters in the range <TT
CLASS="REPLACEABLE"
><I
>c</I
></TT
>
to <TT
CLASS="REPLACEABLE"
><I
>n</I
></TT
>.</P
><P
><SPAN
CLASS="QUOTE"
>"[B-Pk-y]"</SPAN
> matches any one of the
characters in the ranges <TT
CLASS="REPLACEABLE"
><I
>B</I
></TT
>
to <TT
CLASS="REPLACEABLE"
><I
>P</I
></TT
>
and <TT
CLASS="REPLACEABLE"
><I
>k</I
></TT
> to
<TT
CLASS="REPLACEABLE"
><I
>y</I
></TT
>.</P
><P
><SPAN
CLASS="QUOTE"
>"[a-z0-9]"</SPAN
> matches any single lowercase
letter or any digit.</P
><P
><SPAN
CLASS="QUOTE"
>"[^b-d]"</SPAN
> matches any character
<EM
>except</EM
> those in
the range <TT
CLASS="REPLACEABLE"
><I
>b</I
></TT
> to
<TT
CLASS="REPLACEABLE"
><I
>d</I
></TT
>. This is an instance of
<SPAN
CLASS="TOKEN"
>^</SPAN
> negating or inverting the meaning
of the following RE (taking on a role similar to
<SPAN
CLASS="TOKEN"
>!</SPAN
> in a different context).</P
><P
>Combined sequences of bracketed characters match
common word patterns. <SPAN
CLASS="QUOTE"
>"[Yy][Ee][Ss]"</SPAN
> matches
<TT
CLASS="REPLACEABLE"
><I
>yes</I
></TT
>, <TT
CLASS="REPLACEABLE"
><I
>Yes</I
></TT
>,
<TT
CLASS="REPLACEABLE"
><I
>YES</I
></TT
>, <TT
CLASS="REPLACEABLE"
><I
>yEs</I
></TT
>,
and so forth.
<SPAN
CLASS="QUOTE"
>"[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]"</SPAN
>
matches any string in the format of a U.S. Social Security number.</P
></LI
><LI
><P
><A
NAME="REGEXBS"
></A
></P
><P
>The backslash -- <SPAN
CLASS="TOKEN"
>\</SPAN
> -- <A
HREF="escapingsection.html#ESCP"
>escapes</A
> a special character, which
means that character gets interpreted literally (and is
therefore no longer <I
CLASS="FIRSTTERM"
>special</I
>).</P
><P
>A <SPAN
CLASS="QUOTE"
>"\$"</SPAN
> reverts back to its
literal meaning of <SPAN
CLASS="QUOTE"
>"$"</SPAN
>, rather than its
RE meaning of end-of-line. Likewise a <SPAN
CLASS="QUOTE"
>"\\"</SPAN
>
has the literal meaning of <SPAN
CLASS="QUOTE"
>"\"</SPAN
>.</P
></LI
><LI
><P
><A
NAME="ANGLEBRAC"
></A
></P
><P
><A
HREF="escapingsection.html#ESCP"
>Escaped</A
> <SPAN
CLASS="QUOTE"
>"angle
brackets"</SPAN
> -- <SPAN
CLASS="TOKEN"
>\&#60;...\&#62;</SPAN
> -- mark word
boundaries.</P
><P
>The angle brackets must be escaped, since otherwise
they have only their literal character meaning.</P
><P
><SPAN
CLASS="QUOTE"
>"\&#60;the\&#62;"</SPAN
> matches the word
<SPAN
CLASS="QUOTE"
>"the,"</SPAN
> but not the words <SPAN
CLASS="QUOTE"
>"them,"</SPAN
>
<SPAN
CLASS="QUOTE"
>"there,"</SPAN
> <SPAN
CLASS="QUOTE"
>"other,"</SPAN
> etc.</P
><P
> <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>cat textfile</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>This is line 1, of which there is only one instance.
This is the only instance of line 2.
This is line 3, another line.
This is line 4.</TT
>
<TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>grep 'the' textfile</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>This is line 1, of which there is only one instance.
This is the only instance of line 2.
This is line 3, another line.</TT
>
<TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>grep '\&#60;the\&#62;' textfile</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>This is the only instance of line 2.</TT
>
</PRE
></FONT
></TD
></TR
></TABLE
>
</P
></LI
></UL
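><P
>The basic metacharacters described above can be tried out side by
side. What follows is only a sketch: the sample lines and the function
names sample_lines and show_matches are invented for illustration, and
a POSIX-compatible grep is assumed.</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>
```shell
#!/bin/bash
# Sketch: try several basic REs against the same sample lines.
# sample_lines and show_matches are illustrative names, not from the guide.

sample_lines() {
  printf '%s\n' '113' '13' 'cost: $5' '' 'Yes' 'no' 'x13y'
}

# Print the sample lines that match the basic RE given as argument.
show_matches() {
  sample_lines | grep -e "$1"
}

show_matches '1133*'        # "113" followed by zero or more 3's
show_matches '13.'          # "13" followed by exactly one more character
show_matches '^1'           # lines that begin with 1
show_matches '^$'           # blank lines
show_matches '[Yy][Ee][Ss]' # "yes" in any mix of upper and lower case
show_matches '\$'           # a literal dollar sign
```
</PRE
></FONT
></TD
></TR
></TABLE
><P
>Each call prints the sample lines that its RE selects, so the effect
of changing a single metacharacter is easy to see.</P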
><TABLE
CLASS="SIDEBAR"
BORDER="1"
CELLPADDING="5"
><TR
><TD
><DIV
CLASS="SIDEBAR"
><A
NAME="AEN17316"
></A
><P
></P
><P
>The only way to be certain that a particular RE works is to
test it.</P
><P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>TEST FILE: tstfile # No match.
# No match.
Run grep "1133*" on this file. # Match.
# No match.
# No match.
This line contains the number 113. # Match.
This line contains the number 13. # No match.
This line contains the number 133. # No match.
This line contains the number 1133. # Match.
This line contains the number 113312. # Match.
This line contains the number 1112. # No match.
This line contains the number 113312312. # Match.
This line contains no numbers at all. # No match.</PRE
></FONT
></TD
></TR
></TABLE
></P
><TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>grep "1133*" tstfile</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>Run grep "1133*" on this file. # Match.
This line contains the number 113. # Match.
This line contains the number 1133. # Match.
This line contains the number 113312. # Match.
This line contains the number 113312312. # Match.</TT
>
</PRE
></FONT
></TD
></TR
></TABLE
><P
></P
></DIV
></TD
></TR
></TABLE
><P
></P
><UL
><LI
STYLE="list-style-type: square"
><DIV
CLASS="FORMALPARA"
><P
><B
><A
NAME="EXTREGEX"
></A
>Extended REs. </B
>Additional metacharacters added to the basic set. Used
in <A
HREF="textproc.html#EGREPREF"
>egrep</A
>,
<A
HREF="awk.html#AWKREF"
>awk</A
>, and <A
HREF="wrapper.html#PERLREF"
>Perl</A
>.</P
></DIV
></LI
><LI
><P
><A
NAME="QUEXREGEX"
></A
></P
><P
>The question mark -- <SPAN
CLASS="TOKEN"
>?</SPAN
> -- matches zero or
one of the previous RE. It is generally used for matching
single characters.</P
></LI
><LI
><P
><A
NAME="PLUSREF"
></A
></P
><P
>The plus -- <SPAN
CLASS="TOKEN"
>+</SPAN
> -- matches one or more of the
previous RE. It serves a role similar to the <SPAN
CLASS="TOKEN"
>*</SPAN
>, but
does <EM
>not</EM
> match zero occurrences.</P
><P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
># GNU versions of sed and grep accept "+" in a basic RE,
# but there it needs to be escaped. gawk uses extended REs, so it does not.
echo a111b | sed -ne '/a1\+b/p'
echo a111b | grep 'a1\+b'
echo a111b | gawk '/a1+b/'
# All of the above are equivalent.
# Thanks, S.C.</PRE
></FONT
></TD
></TR
></TABLE
></P
><P
><A
NAME="ESCPCB"
></A
></P
></LI
><LI
><P
><A
HREF="escapingsection.html#ESCP"
>Escaped</A
> <SPAN
CLASS="QUOTE"
>"curly
brackets"</SPAN
> -- <SPAN
CLASS="TOKEN"
>\{ \}</SPAN
> -- indicate the number
of occurrences of a preceding RE to match.</P
><P
>In a basic RE it is necessary to escape the curly brackets, since
they have only their literal character meaning
otherwise. Extended REs use the same construct with the
brackets unescaped.</P
><P
><SPAN
CLASS="QUOTE"
>"[0-9]\{5\}"</SPAN
> matches exactly five digits
(characters in the range of 0 to 9).</P
><DIV
CLASS="NOTE"
><P
></P
><TABLE
CLASS="NOTE"
WIDTH="90%"
BORDER="0"
><TR
><TD
WIDTH="25"
ALIGN="CENTER"
VALIGN="TOP"
><IMG
SRC="../images/note.gif"
HSPACE="5"
ALT="Note"></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
><P
>Curly brackets are not available as an RE in the
<SPAN
CLASS="QUOTE"
>"classic"</SPAN
> (non-POSIX compliant) version
of <A
HREF="awk.html#AWKREF"
>awk</A
>.
<A
NAME="GNUGAWK"
></A
>However, the GNU extended version
of <I
CLASS="FIRSTTERM"
>awk</I
>, <B
CLASS="COMMAND"
>gawk</B
>,
has the <TT
CLASS="OPTION"
>--re-interval</TT
> option that permits
them (without being escaped). Versions of gawk from 4.0 onward
enable them by default.</P
><P
> <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>echo 2222 | gawk --re-interval '/2{3}/'</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>2222</TT
>
</PRE
></FONT
></TD
></TR
></TABLE
>
</P
><P
><B
CLASS="COMMAND"
>Perl</B
> and some
<B
CLASS="COMMAND"
>egrep</B
> versions do not require escaping
the curly brackets.</P
></TD
></TR
></TABLE
></DIV
></LI
><LI
><P
><A
NAME="PARENGRPS"
></A
></P
><P
>Parentheses -- <B
CLASS="COMMAND"
>( )</B
> -- enclose a group of
REs. They are useful with the following
<SPAN
CLASS="QUOTE"
>"<SPAN
CLASS="TOKEN"
>|</SPAN
>"</SPAN
> operator and in <A
HREF="string-manipulation.html#EXPRPAREN"
>substring extraction</A
> using <A
HREF="moreadv.html#EXPRREF"
>expr</A
>.</P
></LI
><LI
><P
>The -- <B
CLASS="COMMAND"
>|</B
> -- <SPAN
CLASS="QUOTE"
>"or"</SPAN
> RE operator
matches any one of a set of alternative REs.</P
><P
> <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>egrep 're(a|e)d' misc.txt</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>People who read seem to be better informed than those who do not.
The clarinet produces sound by the vibration of its reed.</TT
>
</PRE
></FONT
></TD
></TR
></TABLE
>
</P
></LI
></UL
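><P
>The extended operators can be compared in the same way. This is a
minimal sketch that assumes a grep supporting the -E and -x options.
The word list and the function names words and ere_matches are made up
for the example.</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>
```shell
#!/bin/bash
# Sketch: extended RE operators with grep -E (illustrative helper names).

words() {
  printf '%s\n' 'color' 'colour' 'ab' 'aab' 'b' '12345' '1234' 'read' 'reed' 'red'
}

# Print the sample words that the extended RE matches in full (-x).
ere_matches() {
  words | grep -E -x -e "$1"
}

ere_matches 'colou?r'     # ?     zero or one "u"
ere_matches 'a+b'         # +     one or more "a"
ere_matches '[0-9]{5}'    # {5}   exactly five digits
ere_matches 're(a|e)d'    # ( ) and |   "read" or "reed", but not "red"
```
</PRE
></FONT
></TD
></TR
></TABLE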
><DIV
CLASS="NOTE"
><P
></P
><TABLE
CLASS="NOTE"
WIDTH="100%"
BORDER="0"
><TR
><TD
WIDTH="25"
ALIGN="CENTER"
VALIGN="TOP"
><IMG
SRC="../images/note.gif"
HSPACE="5"
ALT="Note"></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
><P
>Some versions of <B
CLASS="COMMAND"
>sed</B
>,
<B
CLASS="COMMAND"
>ed</B
>, and <B
CLASS="COMMAND"
>ex</B
> support
escaped versions of the extended Regular Expressions
described above, as do the GNU utilities.</P
></TD
></TR
></TABLE
></DIV
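><P
>Where a utility does not accept the escaped + and ? forms, an
interval expression written with escaped curly brackets can usually
stand in for them. The sketch below assumes a grep that supports
intervals in basic REs; bre_match is an illustrative helper name.</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>
```shell
#!/bin/bash
# Sketch: interval expressions as a stand-in for "+" and "?" in basic REs.
# bre_match is an illustrative helper: it succeeds if the whole string
# given as first argument matches the basic RE given as second argument.

bre_match() {
  printf '%s\n' "$1" | grep -q -x -e "$2"
}

# a1\{1,\}b behaves like the extended RE a1+b.
if bre_match 'a111b' 'a1\{1,\}b'; then echo "a111b: match"; fi
if bre_match 'ab' 'a1\{1,\}b'; then echo "ab: match"; else echo "ab: no match"; fi

# colou\{0,1\}r behaves like the extended RE colou?r.
if bre_match 'color' 'colou\{0,1\}r'; then echo "color: match"; fi
```
</PRE
></FONT
></TD
></TR
></TABLE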
><P
></P
><UL
><LI
STYLE="list-style-type: square"
><DIV
CLASS="FORMALPARA"
><P
><B
><A
NAME="POSIXREF"
></A
>POSIX Character Classes. </B
><TT
CLASS="USERINPUT"
><B
>[:class:]</B
></TT
></P
></DIV
><P
>This is an alternate method of specifying a range of
characters to match.</P
></LI
><LI
><P
><TT
CLASS="USERINPUT"
><B
>[:alnum:]</B
></TT
> matches alphabetic or
numeric characters. This is equivalent to
<TT
CLASS="USERINPUT"
><B
>A-Za-z0-9</B
></TT
>.</P
></LI
><LI
><P
><TT
CLASS="USERINPUT"
><B
>[:alpha:]</B
></TT
> matches alphabetic
characters. This is equivalent to
<TT
CLASS="USERINPUT"
><B
>A-Za-z</B
></TT
>.</P
></LI
><LI
><P
><TT
CLASS="USERINPUT"
><B
>[:blank:]</B
></TT
> matches a space or a
tab.</P
></LI
><LI
><P
><TT
CLASS="USERINPUT"
><B
>[:cntrl:]</B
></TT
> matches control
characters.</P
></LI
><LI
><P
><TT
CLASS="USERINPUT"
><B
>[:digit:]</B
></TT
> matches (decimal)
digits. This is equivalent to
<TT
CLASS="USERINPUT"
><B
>0-9</B
></TT
>.</P
></LI
><LI
><P
><TT
CLASS="USERINPUT"
><B
>[:graph:]</B
></TT
> (graphic printable
characters). Matches characters in the range of <A
HREF="special-chars.html#ASCIIDEF"
>ASCII</A
> 33 - 126. This is
the same as <TT
CLASS="USERINPUT"
><B
>[:print:]</B
></TT
>, below,
but excluding the space character.</P
></LI
><LI
><P
><TT
CLASS="USERINPUT"
><B
>[:lower:]</B
></TT
> matches lowercase
alphabetic characters. This is equivalent to
<TT
CLASS="USERINPUT"
><B
>a-z</B
></TT
>.</P
></LI
><LI
><P
><TT
CLASS="USERINPUT"
><B
>[:print:]</B
></TT
> (printable
characters). Matches characters in the range of ASCII 32 -
126. This is the same as <TT
CLASS="USERINPUT"
><B
>[:graph:]</B
></TT
>,
above, but adding the space character.</P
></LI
><LI
><P
><A
NAME="WSPOSIX"
></A
><TT
CLASS="USERINPUT"
><B
>[:space:]</B
></TT
>
matches whitespace characters (space, horizontal tab, newline,
vertical tab, form feed, and carriage return).</P
></LI
><LI
><P
><TT
CLASS="USERINPUT"
><B
>[:upper:]</B
></TT
> matches uppercase
alphabetic characters. This is equivalent to
<TT
CLASS="USERINPUT"
><B
>A-Z</B
></TT
>.</P
></LI
><LI
><P
><TT
CLASS="USERINPUT"
><B
>[:xdigit:]</B
></TT
> matches hexadecimal
digits. This is equivalent to
<TT
CLASS="USERINPUT"
><B
>0-9A-Fa-f</B
></TT
>.</P
><DIV
CLASS="IMPORTANT"
><P
></P
><TABLE
CLASS="IMPORTANT"
WIDTH="90%"
BORDER="0"
><TR
><TD
WIDTH="25"
ALIGN="CENTER"
VALIGN="TOP"
><IMG
SRC="../images/important.gif"
HSPACE="5"
ALT="Important"></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
><P
>POSIX character classes generally require quoting
or <A
HREF="testconstructs.html#DBLBRACKETS"
>double brackets</A
>
([[ ]]).</P
></TD
></TR
></TABLE
></DIV
><P
> <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>grep [[:digit:]] test.file</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>abc=723</TT
>
</PRE
></FONT
></TD
></TR
></TABLE
>
</P
><P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
># ...
if [[ $arow =~ [[:digit:]] ]] # Numerical input?
then # POSIX char class
if [[ $acol =~ [[:alpha:]] ]] # Number followed by a letter? Illegal!
# ...
# From ktour.sh example script.</PRE
></FONT
></TD
></TR
></TABLE
>
</P
><P
>These character classes may even be used with <A
HREF="globbingref.html"
>globbing</A
>, to a limited
extent.</P
><P
> <TABLE
BORDER="1"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="SCREEN"
><TT
CLASS="PROMPT"
>bash$ </TT
><TT
CLASS="USERINPUT"
><B
>ls -l ?[[:digit:]][[:digit:]]?</B
></TT
>
<TT
CLASS="COMPUTEROUTPUT"
>-rw-rw-r-- 1 bozo bozo 0 Aug 21 14:47 a33b</TT
>
</PRE
></FONT
></TD
></TR
></TABLE
>
</P
><P
>POSIX character classes are used in
<A
HREF="textproc.html#EX49"
>Example 16-21</A
> and <A
HREF="textproc.html#LOWERCASE"
>Example 16-22</A
>.</P
></LI
></UL
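><P
>As a rough illustration of the character classes listed above, the
following sketch sorts a string into a category. The function names
whole_match and classify are hypothetical, the inputs are plain ASCII,
and a grep with the -E, -q, and -x options is assumed.</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>
```shell
#!/bin/bash
# Sketch: POSIX character classes inside bracket expressions.
# whole_match and classify are hypothetical helpers, not part of the guide.

# Succeed if the whole string ($1) matches the extended RE ($2).
whole_match() {
  printf '%s\n' "$1" | grep -E -q -x -e "$2"
}

# Report the first category that fits; the order of the tests matters,
# since a string such as "abc" is both alphabetic and hexadecimal.
classify() {
  if whole_match "$1" '[[:digit:]]+'; then
    echo "digits"
  elif whole_match "$1" '[[:alpha:]]+'; then
    echo "letters"
  elif whole_match "$1" '[[:xdigit:]]+'; then
    echo "hex"
  elif whole_match "$1" '.*[[:blank:]].*'; then
    echo "contains a space or tab"
  else
    echo "other"
  fi
}

classify 723
classify abc
classify 7f3a
classify 'a b'
classify 'a_b'
```
</PRE
></FONT
></TD
></TR
></TABLE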
><P
><A
HREF="sedawk.html#SEDREF"
>Sed</A
>, <A
HREF="awk.html#AWKREF"
>awk</A
>, and <A
HREF="wrapper.html#PERLREF"
>Perl</A
>, used as filters in scripts, take
REs as arguments when "sifting" or transforming files or I/O
streams. See <A
HREF="contributed-scripts.html#BEHEAD"
>Example A-12</A
> and <A
HREF="contributed-scripts.html#TREE"
>Example A-16</A
>
for illustrations of this.</P
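><P
>A small sketch of this kind of filtering follows. The log lines and
the function names log_lines, errors_only, and count_non_errors are
invented for the example, and only standard sed and awk features are
assumed.</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>
```shell
#!/bin/bash
# Sketch: sed and awk as RE-driven filters (the sample data is invented).

log_lines() {
  printf '%s\n' 'error: disk full' 'info: started' 'error: timeout' 'debug: x=1'
}

# sed: keep only the lines that begin with "error: " and strip that tag.
errors_only() {
  log_lines | sed -n 's/^error: *//p'
}

# awk: count the lines whose tag is "info" or "debug".
count_non_errors() {
  log_lines | awk '/^(info|debug):/ { n++ } END { print n + 0 }'
}

errors_only
count_non_errors
```
</PRE
></FONT
></TD
></TR
></TABLE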
><P
>The standard reference on this complex topic is Friedl's
<I
CLASS="CITETITLE"
>Mastering Regular
Expressions</I
>. <I
CLASS="CITETITLE"
>Sed &#38;
Awk</I
>, by Dougherty and Robbins, also gives a very
lucid treatment of REs. See the <A
HREF="biblio.html"
><I
>Bibliography</I
></A
> for
more information on these books.</P
></DIV
><H3
CLASS="FOOTNOTES"
>Notes</H3
><TABLE
BORDER="0"
CLASS="FOOTNOTES"
WIDTH="100%"
><TR
><TD
ALIGN="LEFT"
VALIGN="TOP"
WIDTH="5%"
><A
NAME="FTN.AEN17134"
HREF="x17129.html#AEN17134"
><SPAN
CLASS="footnote"
>[1]</SPAN
></A
></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
WIDTH="95%"
><P
><A
NAME="METAMEANINGREF"
></A
>A
<I
CLASS="FIRSTTERM"
>meta-meaning</I
> is the meaning of a
term or expression on a higher level of abstraction. For
example, the <I
CLASS="FIRSTTERM"
>literal</I
> meaning
of <I
CLASS="FIRSTTERM"
>regular expression</I
> is an
ordinary expression that conforms to accepted usage. The
<I
CLASS="FIRSTTERM"
>meta-meaning</I
> is drastically different,
as discussed at length in this chapter.</P
></TD
></TR
><TR
><TD
ALIGN="LEFT"
VALIGN="TOP"
WIDTH="5%"
><A
NAME="FTN.AEN17189"
HREF="x17129.html#AEN17189"
><SPAN
CLASS="footnote"
>[2]</SPAN
></A
></TD
><TD
ALIGN="LEFT"
VALIGN="TOP"
WIDTH="95%"
><P
>Since <A
HREF="sedawk.html#SEDREF"
>sed</A
>, <A
HREF="awk.html#AWKREF"
>awk</A
>, and <A
HREF="textproc.html#GREPREF"
>grep</A
> process single lines, there
will usually not be a newline to match. In those cases where
there is a newline in a multiple line expression, the dot
will match the newline.
<TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><FONT
COLOR="#000000"
><PRE
CLASS="PROGRAMLISTING"
>#!/bin/bash
sed -e 'N;s/.*/[&#38;]/' &#60;&#60; EOF # Here Document
line1
line2
EOF
# OUTPUT:
# [line1
# line2]
echo
awk '{ $0=$1 "\n" $2; if (/line.1/) {print}}' &#60;&#60; EOF
line 1
line 2
EOF
# OUTPUT:
# line
# 1
# Thanks, S.C.
exit 0</PRE
></FONT
></TD
></TR
></TABLE
></P
></TD
></TR
></TABLE
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="regexp.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="globbingref.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Regular Expressions</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="regexp.html"
ACCESSKEY="U"
>Up</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Globbing</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>