
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
   <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
   <META NAME="GENERATOR" CONTENT="Mozilla/4.06 [en] (X11; I; Linux 2.0.34 i686) [Netscape]">
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#FF0000">
A regular expression that consists solely of
<UL>
<LI>
a <TT>Character</TT> matches this character.</LI>

<BR>&nbsp;
<LI>
&nbsp;a character class <TT>'[' (Character|Character'-'Character)+ ']'</TT>
matches any character in that class. A <TT>Character</TT> is considered
an element of a class if it is listed in the class or if its code lies
within a listed character range <TT>Character'-'Character</TT>. For instance,
<TT>[a0-3\n]</TT> matches the characters
<P><TT>a 0 1 2 3 \n</TT></LI>

<BR>&nbsp;
<LI>
&nbsp;a negated character class <TT>'[^' (Character|Character'-'Character)+
']'</TT> matches all characters not listed in the class.</LI>

<BR>&nbsp;
<LI>
&nbsp;a string <TT>'"' StringCharacter+ '"'</TT> matches the
exact text enclosed in double quotes. All meta characters except <TT>\</TT>
and <TT>"</TT> lose their special meaning inside a string.</LI>

<BR>&nbsp;
<LI>
a macro usage <TT>'{' Identifier '}'</TT> matches the input that is matched
by the right-hand side of the macro named "<TT>Identifier</TT>".</LI>

<BR>&nbsp;
<LI>
&nbsp;a predefined character class matches any of the characters in that
class. The predefined character classes are:
<P><TT>.</TT> contains all characters except <TT>\n</TT>.</LI>
</UL>
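<P>The class rules above can be tried out with <TT>java.util.regex</TT>. This is only a stand-in sketch: it is not JFlex, and its syntax differs (it has no quoted strings or macro usages), but simple character classes such as <TT>[a0-3\n]</TT> are written the same way. The class name <TT>CharClassDemo</TT> and the helper <TT>inClass</TT> are hypothetical, and the <TT>UNIX_LINES</TT> flag is an assumption chosen so that <TT>.</TT> excludes only <TT>\n</TT>, as described above.</P>

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // Returns true if the whole input is matched by the given expression.
    // UNIX_LINES makes \n the only line terminator recognised by '.'.
    static boolean inClass(String regex, String input) {
        return Pattern.compile(regex, Pattern.UNIX_LINES).matcher(input).matches();
    }

    public static void main(String[] args) {
        String cls = "[a0-3\\n]";   // the listed characters a and \n plus the range 0-3
        String neg = "[^a0-3\\n]";  // every character not in the class above

        System.out.println(inClass(cls, "2"));   // 2 lies inside the range 0-3
        System.out.println(inClass(cls, "4"));   // 4 lies outside the range
        System.out.println(inClass(neg, "4"));   // so the negated class accepts it
        System.out.println(inClass(".", "\r"));  // '.' accepts everything except \n
        System.out.println(inClass(".", "\n"));
    }
}
```

<P>Running <TT>main</TT> prints one boolean per check, which makes it easy to compare each line against the rules in the list.</P>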
If <TT>a</TT> and <TT>b</TT> are regular expressions, then
<DL COMPACT>
<DT>
<TT>a | b</TT></DT>

<DD>
(union) is the regular expression that matches all input that is matched
by <TT>a</TT> or by <TT>b</TT>.</DD>

<DT>
<TT>a b</TT></DT>

<DD>
(concatenation) is the regular expression that matches the input matched
by <TT>a</TT> followed by the input matched by <TT>b</TT>.</DD>

<DT>
<TT>a*</TT></DT>

<DD>
(Kleene closure) matches zero or more repetitions of the input matched
by <TT>a</TT>.</DD>

<DT>
<TT>a+</TT></DT>

<DD>
is equivalent to <TT>aa*</TT></DD>

<DT>
<TT>a?</TT></DT>

<DD>
matches the empty input or the input matched by <TT>a</TT></DD>

<DT>
<TT>a{n}</TT></DT>

<DD>
is equivalent to the concatenation of <TT>n</TT> copies of <TT>a</TT>.
For instance, <TT>a{4}</TT> is equivalent to the expression <TT>a a a a</TT>.
The decimal integer <TT>n</TT> must be positive.</DD>

<DT>
<TT>a{n,m}</TT></DT>

<DD>
matches at least <TT>n</TT> and at most <TT>m</TT> consecutive repetitions
of <TT>a</TT>. For instance, <TT>a{2,4}</TT> is equivalent
to the expression <TT>a a a? a?</TT>. Both <TT>n</TT> and <TT>m</TT> are
non-negative decimal integers, and <TT>m</TT> must not be smaller than <TT>n</TT>.</DD>

<DT>
<TT>( a )</TT></DT>

<DD>
matches the same input as <TT>a</TT>.</DD>
</DL>
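<P>The equivalences listed above can be checked empirically. The following is a minimal sketch using <TT>java.util.regex</TT> as a stand-in for JFlex. It assumes the operators <TT>|</TT>, <TT>*</TT>, <TT>+</TT>, <TT>?</TT>, <TT>{n}</TT> and <TT>{n,m}</TT> behave alike in both notations for these simple cases. Because whitespace is significant in Java expressions, <TT>a a a? a?</TT> is written <TT>aaa?a?</TT>. The class <TT>OperatorDemo</TT> and the helper <TT>sameOnRunsOfA</TT> are hypothetical names.</P>

```java
import java.util.regex.Pattern;

public class OperatorDemo {
    // Returns true if both expressions give the same answer on every
    // input made of 0, 1, ..., max copies of the letter a.
    static boolean sameOnRunsOfA(String r1, String r2, int max) {
        String input = "";
        for (int i = 0; i != max + 1; i++) {
            if (Pattern.matches(r1, input) != Pattern.matches(r2, input)) {
                return false;
            }
            input = input + "a";
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameOnRunsOfA("a+", "aa*", 6));         // a+ versus aa*
        System.out.println(sameOnRunsOfA("a{4}", "aaaa", 6));      // a{4} versus a a a a
        System.out.println(sameOnRunsOfA("a{2,4}", "aaa?a?", 6));  // a{2,4} versus a a a? a?
        System.out.println(sameOnRunsOfA("a?", "(a|)", 6));        // a? versus a or empty
        System.out.println(sameOnRunsOfA("a*", "a+", 6));          // these differ on the empty input
    }
}
```

<P>Testing only runs of a single letter is not a proof of equivalence, but it is enough to catch a wrongly expanded repetition count.</P>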
In a lexical rule, a regular expression <TT>r</TT> may be preceded by a
'<TT>^</TT>' (the beginning of line operator). <TT>r</TT> is then only
matched at the beginning of a line in the input. A line begins after each
<TT>\r|\n|\r\n</TT>
and at the beginning of input. The preceding line terminator in the input
is not consumed and can be matched by another rule.
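<P>The beginning of line operator can be illustrated with the <TT>MULTILINE</TT> flag of <TT>java.util.regex</TT>. This is a sketch under assumptions, not JFlex itself: Java also treats a few Unicode characters as line terminators in addition to <TT>\r</TT>, <TT>\n</TT> and <TT>\r\n</TT>, and the names <TT>BolDemo</TT> and <TT>countAtLineStart</TT> are hypothetical. As in the description above, the line terminator in front of a match is not part of the match.</P>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BolDemo {
    // Counts the matches of the expression that start at the beginning of a line.
    static int countAtLineStart(String regex, String input) {
        // The non-capturing group keeps ^ applied to the whole expression.
        Matcher m = Pattern.compile("^(?:" + regex + ")", Pattern.MULTILINE).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "foo" occurs four times: at the start of input, in the middle of
        // a line, after \r\n, and after a lone \r.
        String input = "foo\nbar foo\r\nfoo\rfoo";
        System.out.println(countAtLineStart("foo", input));
    }
}
```

<P>Only the occurrence in the middle of the second line is skipped, because it is not preceded by a line terminator or by the beginning of input.</P>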
<P>&nbsp;<A HREF="lopes.html#decl">Return to Using JFlex</A>
</BODY>
</HTML>