<!--startcut ==============================================-->
|
|
<!-- *** BEGIN HTML header *** -->
|
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
|
|
<HTML><HEAD>
|
|
<title>Programming in Ruby, part 2 LG #83</title>
|
|
</HEAD>
|
|
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#0000AF"
|
|
ALINK="#FF0000">
|
|
<!-- *** END HTML header *** -->
|
|
|
|
<!-- *** BEGIN navbar *** -->
|
|
<IMG ALT="" SRC="../gx/navbar/left.jpg" WIDTH="14" HEIGHT="45" BORDER="0" ALIGN="bottom"><A HREF="qubism.html"><IMG ALT="[ Prev ]" SRC="../gx/navbar/prev.jpg" WIDTH="16" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="index.html"><IMG ALT="[ Table of Contents ]" SRC="../gx/navbar/toc.jpg" WIDTH="220" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../index.html"><IMG ALT="[ Front Page ]" SRC="../gx/navbar/frontpage.jpg" WIDTH="137" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="http://www.linuxgazette.com/cgi-bin/talkback/all.py?site=LG&article=http://www.linuxgazette.com/issue83/ramankutty.html"><IMG ALT="[ Talkback ]" SRC="../gx/navbar/talkback.jpg" WIDTH="121" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../lg_faq.html"><IMG ALT="[ FAQ ]" SRC="./../gx/navbar/faq.jpg"WIDTH="62" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="sandeep.html"><IMG ALT="[ Next ]" SRC="../gx/navbar/next.jpg" WIDTH="15" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><IMG ALT="" SRC="../gx/navbar/right.jpg" WIDTH="15" HEIGHT="45" ALIGN="bottom">
|
|
<!-- *** END navbar *** -->
|
|
|
|
<!--endcut ============================================================-->
|
|
|
|
<TABLE BORDER><TR><TD WIDTH="200">
|
|
<A HREF="http://www.linuxgazette.com/">
|
|
<IMG ALT="LINUX GAZETTE" SRC="../gx/2002/lglogo_200x41.png"
|
|
WIDTH="200" HEIGHT="41" border="0"></A>
|
|
<BR CLEAR="all">
|
|
<SMALL>...<I>making Linux just a little more fun!</I></SMALL>
|
|
</TD><TD WIDTH="380">
|
|
|
|
|
|
<center>
|
|
<BIG><BIG><STRONG><FONT COLOR="maroon">Programming in Ruby, part 2</FONT></STRONG></BIG></BIG><BR>
|
|
<STRONG>By <A HREF="../authors/ramankutty.html">Hiran Ramankutty</A></STRONG>
|
|
</CENTER>
|
|
|
|
</TD></TR>
|
|
</TABLE>
|
|
<P>
|
|
|
|
<!-- END header -->
|
|
|
|
|
|
|
|
|
|
<H2><B>Review</B></H2>
|
|
<p>
|
|
Applications from different domains need different levels of organization.
We covered the fundamentals of Ruby in
<A HREF="../issue81/ramankutty.html">Part 1</A>, and now we move on to the next
level of organization.
|
|
</p>
|
|
<H2><B>Regular Expressions</B></H2>
|
|
<p>
|
|
In Ruby, a regular expression is delimited by slashes ('/'), as in Perl and
awk, rather than by quotation marks.
Regular expressions are a compact and expressive way to describe patterns,
which makes them the natural tool for pattern matching. Some methods also
convert a string into a regular expression.
|
|
<p></p>
|
|
<pre>p("abcdef" =~ /de/)
p("aaaaaa" =~ /d/)
^D
3
nil
</pre>
|
|
</p>
|
|
<p>
|
|
The `=~' operator matches a string against a regular expression. It returns
the position in the string where the match starts (counting from 0), or nil
if the pattern does not match. Regular expressions have their own vocabulary
of special characters and sequences, shown below:
|
|
</p>
|
|
<DIV class="table"><PRE>
|
|
[ ]     range specification (e.g., [a-z] means a letter in the range a to z)
\w      letter, digit, or underscore; same as [0-9A-Za-z_]
\W      any character not matched by \w
\s      whitespace character; same as [ \t\n\r\f]
\S      non-whitespace character
\d      digit character; same as [0-9]
\D      non-digit character
\b      word boundary (outside a range specification)
\B      non-word boundary
\b      backspace (0x08) (inside a range specification)
*       zero or more repetitions of the preceding expression
+       one or more repetitions of the preceding expression
{m,n}   at least m but not more than n repetitions of the preceding expression
?       zero or one repetition of the preceding expression
|       either the preceding or the following expression may match
( )     grouping
|
|
</PRE></DIV>
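<p>
A few entries from the table can be tried out directly. The following is a
minimal sketch: the helper name first_match and the sample strings are made up
for illustration and are not part of Ruby itself.
</p>

```ruby
# Each entry pairs a pattern with a sample string (samples are illustrative only).
SAMPLES = [
  [/\d+/,        "asd987"],    # one or more digits
  [/\bfoo\b/,    "a foo b"],   # "foo" as a whole word
  [/[a-c]{2,3}/, "xxabcabc"],  # two to three letters from the range a..c
  [/colou?r/,    "my color"],  # the "u" is optional
  [/(cat|dog)s/, "hotdogs"],   # alternation inside a group
]

# Hypothetical helper: return the matched substring, or nil if nothing matches.
def first_match(pattern, text)
  m = pattern.match(text)
  m && m[0]
end

SAMPLES.each do |pattern, text|
  puts "#{pattern.inspect} on #{text.inspect} -> #{first_match(pattern, text).inspect}"
end
```

<p>
Each line of output shows the pattern, the string it was tried on, and the
part of the string that matched (or nil).
</p>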
|
|
<p>
|
|
For example, `^f[a-z]+' means "an f at the start of the line, followed by one
or more letters in the range `a' to `z'". Now suppose we want to check whether
a string fits a description such as: "Starts with lower case `f', which
is immediately followed by exactly one upper case letter, and
optionally more junk after that, as long as there are no more lower
case characters." You would have to write a dozen lines in C, right?
Admit it; you can hardly help yourself. In Ruby you just ask for the
string to be matched against the regular expression
/^f[A-Z][^a-z]*$/. Here [^a-z] is a negated range: it matches any single
character that is not a lower case letter. This kind of string matching is
used throughout the UNIX environment; a typical example is `grep'.
Let us get acquainted with regular expressions. Consider the program
given below:
|
|
</p>
|
|
<p></p>
|
|
<PRE># Store this as regx.rb
st = "\033[7m"   # ANSI escape: start reverse video
en = "\033[m"    # ANSI escape: back to normal

while true
  print "str> "
  STDOUT.flush
  str = gets
  break if not str
  str.chomp!
  print "pat> "
  STDOUT.flush
  pat = gets
  break if not pat
  re = Regexp.new(pat.chomp)   # turn the typed pattern into a regular expression
  print str.gsub(re, "#{st}\\&#{en}"), "\n"
end
print "\n"
# Now run: ruby regx.rb
</PRE>
|
|
<p>The program prompts for two inputs each time round the loop: a string, then
a regular expression. The string is tested against the regular
expression, and the matched parts are highlighted in reverse video. This
requires an ANSI terminal, since the program uses reverse video
escape sequences. Do not worry about the details of the program for now.
|
|
</p>
|
|
<p></p>
|
|
<pre>str> foobar
pat> ^fo+
foobar
~~~
</pre>
|
|
<p>
|
|
Here foo is shown in reverse video. The ``~~~'' line marks the matched part
for text-based browsers. Let us experiment with different inputs.
|
|
</p>
|
|
<p></p>
|
|
<pre>str> asd987wonew06521
pat> \d
asd987wonew06521
   ~~~     ~~~~~
str> foozboozer
pat> f.*z
foozboozer
~~~~~~~~
</pre>
|
|
<p>
|
|
Note that foozbooz is matched, not just fooz. This is because `*' is greedy:
the regular expression matches the longest possible substring. Some patterns
are hard to interpret at first glance. Try this:
|
|
</p>
|
|
<p></p>
|
|
<pre>str> Wed Feb 7 08:58:04 JST 1996
pat> [0-9]+:[0-9]+(:[0-9]+)?
Wed Feb 7 08:58:04 JST 1996
          ~~~~~~~~
</pre>
|
|
<p>
|
|
Now try to represent a hexadecimal number using regular expressions
(for example, 0x123af00c and 0Xbc13590ae are both hexadecimal numbers).
The method below tests whether a string contains a hexadecimal number
enclosed in angle brackets:
|
|
</p>
|
|
<p></p>
|
|
<pre>def chab(s)   # "contains hex in angle brackets"
  (s =~ /<0(x|X)(\d|[a-f]|[A-F])+>/) != nil
end
print chab "Not this one."
print "\n",chab "Maybe this? {0x35}"    # use of wrong kind of brackets
print "\n",chab "Or this? <0x38z7e>"    # is this a hex number?
print "\n",chab "Okay, this: <0xfc0004>."
print "\n"
^D
false
false
false
true
</pre>
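<p>
Returning to the earlier description ("starts with lower case `f', then
exactly one upper case letter, then no more lower case characters"), the check
can be sketched with a negated range. The constant F_PATTERN, the method
fits_description? and the sample strings are hypothetical names chosen for
this illustration.
</p>

```ruby
# Lower case f, exactly one upper case letter, then anything except a-z.
F_PATTERN = /^f[A-Z][^a-z]*$/

# Hypothetical helper: true when the string fits the description.
def fits_description?(s)
  !(s =~ F_PATTERN).nil?
end

["fX", "fX99_OK", "fXy", "fx", "FX"].each do |s|
  puts "#{s.inspect}: #{fits_description?(s)}"
end
```

<p>
Only the first two strings fit: "fXy" contains a later lower case letter,
"fx" has no upper case letter after the f, and "FX" does not start with a
lower case f.
</p>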
|
|
<H2><B>Iterators</B></H2>
|
|
<p>
|
|
Iterator means "one which does the same thing many times". Consider the
|
|
C code given below:
|
|
</p>
|
|
<p></p>
|
|
<pre>char *str;
for (str = "abcdefg"; *str != '\0'; str++) {
    /* process a character here */
}
</pre>
|
|
<p>
|
|
C's for(...) syntax provides some abstraction for writing loops, but the
programmer still has to know the internal structure of a string in order to
test *str against the null character.
|
|
</p>
|
|
<p>
|
|
Flexible support for iteration is one of the features that mark a
high-level language. Consider the following shell script (/bin/sh):
|
|
</p>
|
|
<p></p>
|
|
<pre>for i in *.[ch]; do
    # ... something to do for each file
done
</pre>
|
|
<p>
|
|
All the C source and header files in the current directory are
processed, and the command shell handles the details of picking up and
substituting file names one by one. Isn't this working at a higher
level than C? What do you think?
|
|
</p>
|
|
<p>
|
|
It is fine for a programming language to provide iterators for its
built-in data types, but it is a disappointment if we have to go back to
writing low-level loops to iterate over our own data types. In OOP this
can be a serious problem, since users often define one data type after
another.
|
|
</p>
|
|
<p>
|
|
To address this, every OOP language has some way to make iteration
easy; some languages, for example, provide a special class that controls
iteration. Ruby, on the other hand, allows us to define control
structures directly. In Ruby terms, such user-defined control
structures are called iterators.
|
|
</p>
|
|
<p>
|
|
Let us see a few examples:
|
|
</p>
|
|
<p></p>
|
|
<pre>"abc".each_byte{|c| printf "%c\n", c}
^D
a
b
c
</pre>
|
|
<p>
|
|
Here, each_byte is an iterator that visits each byte (here, each character)
of the string. The block's local variable `c' receives each one in turn.
This can be translated into something that looks a lot like C code ...
|
|
</p>
|
|
<p></p>
|
|
<pre>s="abc"
i=0
while i < s.length
  printf "%c\n",s[i]
  i+=1
end
^D
a
b
c
</pre>
|
|
<p>
|
|
... however, the each_byte iterator is conceptually simpler and is
more likely to keep working even if the String class is radically
modified in the future. One benefit of iterators is that they tend to be
robust in the face of such changes, and I think that is a characteristic
of good code.
|
|
</p>
|
|
<p>
Another iterator of String is each_line.
|
|
</p><p></p>
|
|
<pre>"a\nb\nc\n".each_line{|l| print l}
^D
a
b
c
</pre>
|
|
<p>
|
|
All the irksome tasks, such as finding line delimiters and generating
substrings, are taken care of by the iterator.
|
|
</p>
|
|
<p>
|
|
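Now, let's rewrite this example with the for statement. (In Ruby 1.9 and
later String no longer has an each method, so each_line is called explicitly.)
</p><p></p>
<pre>for l in "a\nb\nc\n".each_line
  print l
end
^D
a
b
c
</pre>
<p>
The for statement does its iteration by way of an each iterator. In the Ruby
versions current when this article was written, String's each worked the same
as each_line in the previous example; on later versions, each_line without a
block returns an enumerator that supplies each.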
|
|
</p>
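<p>
Because for relies on each, any object that provides an each method can be
used in a for statement. The following is a minimal sketch; the class
Countdown and the method collect_with_for are hypothetical names used only
for illustration.
</p>

```ruby
# Countdown is a hypothetical class; its only job is to provide `each`.
class Countdown
  def initialize(from)
    @from = from
  end

  # Yield from, from-1, ..., 1 to the caller's block.
  def each
    n = @from
    while n > 0
      yield n
      n -= 1
    end
  end
end

# Gather the values a for statement sees when looping over the object.
def collect_with_for(countdown)
  seen = []
  for n in countdown   # `for` calls countdown.each behind the scenes
    seen << n
  end
  seen
end

p collect_with_for(Countdown.new(3))   # prints [3, 2, 1]
```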
|
|
<p>
|
|
An iteration can be restarted from the top by using the control structure
`retry' inside an iterated loop. (This use of `retry' is accepted by Ruby 1.8
and earlier; from Ruby 1.9 on, `retry' is only allowed inside a rescue
clause.) See below:
|
|
</p><p></p>
|
|
<pre>c = 0
for i in 0..4
  print i
  if i==2 and c==0
    c = 1
    print "\n"
    retry
  end
end
^D
012
01234
</pre>
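<p>
Ruby 1.9 and later reject `retry' inside a plain loop, so the restart has to
be spelled differently there. One possible sketch, not the only way, raises a
marker exception and uses `retry' in the rescue clause, where it is still
allowed. The names Restart and count_with_restart are hypothetical.
</p>

```ruby
# Hypothetical marker exception, used only to signal "start over".
class Restart < StandardError; end

# Builds the same text the loop above prints and returns it as a string.
def count_with_restart
  out = String.new
  restarted = false
  begin
    for i in 0..4
      out << i.to_s
      if i == 2 && !restarted
        restarted = true
        out << "\n"
        raise Restart      # abandon the loop ...
      end
    end
  rescue Restart
    retry                  # ... and re-run the begin block from the top
  end
  out
end

print count_with_restart, "\n"
```

<p>
The result is the same two lines as before, 012 followed by 01234.
</p>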
|
|
<p>
|
|
The definition of an iterator may contain `yield', which
moves control to the block of code that is passed to the iterator (we
will see more of this later). The example below defines the iterator
repeat, which repeats a block of code the number of times specified in
its argument.
|
|
</p>
|
|
<p></p>
|
|
<pre>def repeat(num)
  while num > 0
    yield
    num -= 1
  end
end
repeat(4) {print "hello world\n"}
^D
hello world
hello world
hello world
hello world
</pre>
|
|
<p>
|
|
If this is not clear, print the value of num before and after the
occurrence of `yield'.
|
|
</p>
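<p>
Another way to watch control move back and forth is to let `yield' hand a
value to the block. The sketch below is a variant written for illustration;
repeat_counting and trace are hypothetical names.
</p>

```ruby
# Like a counting-down repeat, but passes the remaining count to the block.
def repeat_counting(num)
  while num > 0
    yield num      # control jumps to the caller's block, then comes back here
    num -= 1
  end
end

# Record the value the block receives on every pass.
def trace(num)
  seen = []
  repeat_counting(num) { |left| seen << left }
  seen
end

p trace(4)   # prints [4, 3, 2, 1]
```

<p>
The block runs once per pass through the while loop, and each time it sees
the value num had at the moment of the `yield'.
</p>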
|
|
<p>
|
|
With `retry' one can define an iterator that works the same as `while',
although it is too slow to be practical. (Like the loop example above, this
relies on the behaviour of `retry' in Ruby 1.8 and earlier.)
|
|
</p><p></p>
|
|
<pre>def MYWHILE(cond)
  return if not cond
  yield
  retry
end
i = 0
MYWHILE(i<3) {print i,"\n" ;i+=1}
^D
0
1
2
</pre>
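<p>
On Ruby 1.9 and later, where `retry' cannot be used this way, a similar
effect can be sketched by passing the condition as a lambda, so that it can
be evaluated again on every pass. This is an assumption-laden illustration
rather than a drop-in replacement: my_while and count_up_to are hypothetical
names, and the recursion makes it just as impractical as the original.
</p>

```ruby
# cond is a lambda, so the test is re-evaluated each time it is called.
def my_while(cond, &body)
  return unless cond.call
  body.call
  my_while(cond, &body)   # recurse instead of using retry
end

# Collect 0, 1, ..., limit-1 using my_while.
def count_up_to(limit)
  seen = []
  i = 0
  my_while(-> { i < limit }) { seen << i; i += 1 }
  seen
end

p count_up_to(3)   # prints [0, 1, 2]
```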
|
|
<p>
|
|
By now I hope you have got an idea of what iterators are. There are a
few restrictions, but you can write your own iterators; in
fact, whenever you define a new data type, it is often convenient to
define suitable iterators to go with it. In this sense, the above
examples `repeat()' and `MYWHILE()' are not very useful. We will talk
about practical iterators after we have a better understanding of what
classes are.
|
|
</p>
|
|
|
|
<H2><B>Object Oriented Thinking</B></H2>
|
|
<p>
|
|
<B>`Object Oriented'</B> is indeed a very catchy phrase. Ruby claims to
|
|
be an object oriented scripting language; but what does 'object
|
|
oriented' exactly mean?
|
|
</p>
|
|
<p>
|
|
There have been a variety of answers to that question, all of which
probably boil down to about the same thing. Before summing it up
too quickly, let's think for a moment about the
traditional programming paradigm.
|
|
</p>
|
|
<p>
|
|
Traditionally, a programming problem is attacked by coming up with
some kinds of <B>data representations</B>, and <B>procedures</B> that
operate on that data. Under this model, data is inert, passive, and
helpless; it sits completely at the mercy of a large procedural body,
which is active, logical, and all-powerful.
|
|
</p>
|
|
<p>
|
|
The problem with this approach is that programs are written by
|
|
programmers, who are only human and can only keep so much detail clear
|
|
in their heads at any one time. As a project gets larger, its
|
|
procedural core grows to the point where it is difficult to remember how
|
|
the whole thing works. Minor lapses of thinking and typographical
|
|
errors become more likely to result in well-concealed bugs. Complex
|
|
and unintended interactions begin to emerge within the procedural core,
|
|
and maintaining it becomes like trying to carry around an angry squid
|
|
without letting any tentacles touch your face. There are guidelines
|
|
for programming that can help to minimize and localize bugs within this
|
|
traditional paradigm, but there is a better solution that involves
|
|
fundamentally changing the way we work.
|
|
</p>
|
|
<p>
|
|
What object-oriented programming does is to let us delegate most of the
mundane and repetitive logical work <B>to the data itself</B>; it
changes our concept of data from <B>passive</B> to <B>active</B>. Put
another way,
|
|
</p>
|
|
<ul>
|
|
<li>We stop treating each piece of data as a box with an open lid that
lets us reach in and throw things around.
<li>We start treating each piece of data as a working machine with a
closed lid and a few well marked switches and dials.
|
|
</ul>
|
|
<p>
|
|
What is described above as a "machine" may be very simple or complex
|
|
on the inside; we can't tell from the outside, and we don't allow
|
|
ourselves to open up the machine (except when we are absolutely sure
|
|
something is wrong with its design), so we are required to do things
|
|
like flip the switches and read the dials to interact with the data.
|
|
Once the machine is built, we don't want to have to think about how it
|
|
operates.
|
|
</p>
|
|
<p>
|
|
You might think we are just making more work for ourselves, but this
approach tends to do a nice job of preventing all kinds of things from
going wrong.
|
|
</p>
|
|
<p>
|
|
Let's start with an example that is too simple to be of practical value,
but should illustrate at least part of the concept. My 2-wheeler has a
trip meter. Its job is to keep track of the distance travelled
since the last time its reset button was pushed. How would we model
this in a programming language? In C, the trip meter would just be a
numeric variable, possibly of type float. The program would manipulate
the variable by increasing its value in small increments, with
occasional resets to zero when appropriate. What's wrong with that? A
bug in the program could assign a bogus value to the variable, for any
number of unexpected reasons. Anyone who has programmed in C knows what
it is like to spend hours or days tracking down such a bug whose cause
seems absurdly simple once it has been found. (The moment of finding the
bug is commonly indicated by the sound of a loud slap to the forehead.)
|
|
</p>
|
|
<p>
|
|
In an object-oriented context, the same problem is attacked in a
different manner. A programmer designing a trip meter should not
ask "which of the familiar data types comes closest to resembling this
thing?" but rather "how exactly is this thing supposed
to act?" The difference winds up being a profound one. It is necessary
to spend a little bit of time deciding exactly what a trip meter is for,
and how the outside world expects to interact with it. We decide to
build a little machine with controls that allow us to increment it,
reset it, read its value, and nothing else.
|
|
</p>
|
|
<p>
|
|
We don't provide a way for a trip meter to be assigned arbitrary values.
Why? Because we all know trip meters don't work that way. There are
only a few things you should be able to do with a trip meter, and those
are all we allow. Thus, if something else in the program mistakenly
tries to place some other value (say, the target temperature of the
vehicle's climate control) into the trip meter, there is an immediate
indication of what went wrong. We are told when running the program
(or possibly while compiling, depending on the nature of the language)
that <B>we are not allowed to assign arbitrary values to trip meter
objects</B>. The message might not be exactly that clear, but it will be
reasonably close to that. It doesn't prevent the error, does it? But
it quickly points us in the direction of the cause. This is only one
of several ways in which OO programming can save a lot of wasted time.
|
|
</p>
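<p>
As a rough sketch of such a machine in Ruby (classes are covered properly
later in this series): the class name TripMeter, its method names, and the
choice to reject negative increments are assumptions made for this
illustration only.
</p>

```ruby
# TripMeter is a hypothetical class; units are deliberately left unspecified.
class TripMeter
  def initialize
    @distance = 0.0
  end

  # Add a non-negative amount to the running total.
  def increment(amount)
    raise ArgumentError, "amount must be non-negative" if amount < 0
    @distance += amount
    self
  end

  # Push the reset button.
  def reset
    @distance = 0.0
    self
  end

  # Read-only access: there is deliberately no `distance=` writer.
  def distance
    @distance
  end
end

meter = TripMeter.new
meter.increment(1.5).increment(2.0)
puts meter.distance            # prints 3.5

begin
  meter.distance = 22.0        # e.g. a climate-control value sent to the wrong place
rescue NoMethodError => e
  puts "rejected: #{e.message}"
end
```

<p>
The stray assignment is not silently accepted; Ruby reports at run time that
the object has no such method, which points straight at the offending line.
</p>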
|
|
<p>
|
|
We commonly take one step of abstraction above this, because it turns
out to be as easy to build a factory that makes machines as it is to
make an individual machine. We aren't likely to build a single
trip meter directly; rather, we arrange for any number of trip meters to
be built from a single pattern. The pattern (or if you like, the
trip meter factory) corresponds to what we call a <B>class</B>, and an
individual trip meter generated by the pattern (or made by the
factory) corresponds to an <B>object</B>. Most OO languages require
a class to be defined before we can have a new kind of object, but
ruby does not.
|
|
</p>
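<p>
One way Ruby allows a new kind of object without a new class is to attach
methods to a single object. The sketch below is only an illustration of that
idea; build_trip_meter and the method names are hypothetical.
</p>

```ruby
# Build a one-off trip meter without defining a TripMeter class.
def build_trip_meter
  meter = Object.new
  meter.instance_variable_set(:@distance, 0.0)

  # These methods belong to this one object only.
  def meter.increment(amount)
    @distance += amount
    self
  end

  def meter.reset
    @distance = 0.0
    self
  end

  def meter.distance
    @distance
  end

  meter
end

one_off = build_trip_meter
one_off.increment(4.2)
puts one_off.distance   # prints 4.2
puts one_off.class      # prints Object
```

<p>
The object behaves like a trip meter, yet its class is still plain Object.
For more than one or two such objects, a class is usually the tidier choice.
</p>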
|
|
<p>
|
|
I would like to emphasize that the use of an OO language
will not enforce proper OO design. Indeed it is possible in any
language to write code that is unclear, sloppy, ill-conceived, buggy,
and wobbly all over. What ruby does for you (as opposed, especially, to
C++) is to make the practice of OO programming feel natural enough
that even when you are working on a small scale you don't feel a
necessity to resort to ugly code to save effort. We will be discussing
the ways in which ruby accomplishes that admirable goal as this guide
progresses; the next topic will be the "switches and dials" (object
methods) and from there we'll move on to the "factories" (classes). Are
you still with us?
|
|
</p>
|
|
|
|
|
|
|
|
|
|
<!-- *** BEGIN copyright *** -->
|
|
<hr>
|
|
<CENTER><SMALL><STRONG>
|
|
|
|
Copyright © 2002, Hiran Ramankutty.
|
|
Copying license <A HREF="../copying.html">http://www.linuxgazette.com/copying.html</A><BR>
|
|
Published in Issue 83 of <i>Linux Gazette</i>, October 2002
|
|
</STRONG></SMALL></CENTER>
|
|
<!-- *** END copyright *** -->
|
|
<HR>
|
|
|
|
<!--startcut ==========================================================-->
|
|
|
|
</BODY></HTML>
|
|
<!--endcut ============================================================-->
|