old-www/LDP/LG/issue83/ramankutty.html

<!--startcut  ==============================================-->
<!-- *** BEGIN HTML header *** -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML><HEAD>
<title>Programming in Ruby, part 2 LG #83</title>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#0000AF"
ALINK="#FF0000">
<!-- *** END HTML header *** -->

<!-- *** BEGIN navbar *** -->
<IMG ALT="" SRC="../gx/navbar/left.jpg" WIDTH="14" HEIGHT="45" BORDER="0" ALIGN="bottom"><A HREF="qubism.html"><IMG ALT="[ Prev ]" SRC="../gx/navbar/prev.jpg" WIDTH="16" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="index.html"><IMG ALT="[ Table of Contents ]" SRC="../gx/navbar/toc.jpg" WIDTH="220" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../index.html"><IMG ALT="[ Front Page ]" SRC="../gx/navbar/frontpage.jpg" WIDTH="137" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="http://www.linuxgazette.com/cgi-bin/talkback/all.py?site=LG&article=http://www.linuxgazette.com/issue83/ramankutty.html"><IMG ALT="[ Talkback ]" SRC="../gx/navbar/talkback.jpg" WIDTH="121" HEIGHT="45" BORDER="0" ALIGN="bottom"  ></A><A HREF="../lg_faq.html"><IMG ALT="[ FAQ ]" SRC="./../gx/navbar/faq.jpg"WIDTH="62" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="sandeep.html"><IMG ALT="[ Next ]" SRC="../gx/navbar/next.jpg" WIDTH="15" HEIGHT="45" BORDER="0" ALIGN="bottom"  ></A><IMG ALT="" SRC="../gx/navbar/right.jpg" WIDTH="15" HEIGHT="45" ALIGN="bottom">
<!-- *** END navbar *** -->

<!--endcut ============================================================-->

<TABLE BORDER><TR><TD WIDTH="200">
<A HREF="http://www.linuxgazette.com/">
<IMG ALT="LINUX GAZETTE" SRC="../gx/2002/lglogo_200x41.png"
	WIDTH="200" HEIGHT="41" border="0"></A>
<BR CLEAR="all">
<SMALL>...<I>making Linux just a little more fun!</I></SMALL>
</TD><TD WIDTH="380">


<center>
<BIG><BIG><STRONG><FONT COLOR="maroon">Programming in Ruby, part 2</FONT></STRONG></BIG></BIG><BR>
<STRONG>By <A HREF="../authors/ramankutty.html">Hiran  Ramankutty</A></STRONG>
</CENTER>

</TD></TR>
</TABLE>
<P>

<!-- END header -->


<H2><B>Review</B></H2>
<p>
A wide variety of applications from different domains need different
levels of organization. We have seen the fundamentals of Ruby in
<A HREF="../issue81/ramankutty.html">Part 1</A>, and now we jump on to the next
level of organization.
</p>
<H2><B>Regular Expressions</B></H2>
<p>
In Ruby, a regular expression is quoted by '/' as in Perl and awk rather than
by quotation marks.
Regular expressions have an efficient expressive
power, whenever you deal with patterns (as in pattern matching). Also
some methods convert a string into a regular expression.
<p></p>
<pre>print "abcdef" =~ /de/,"\n"
print "aaaaaa" =~ /d/,"\n"
^D
3
FALSE
</pre>
</p>
<p>
The operator `=~' is a matching operator with respect to regular
expressions. It returns the position in a string where a match was
found, or nil if the pattern did not match. It is interesting to see
that regular expressions share a particular kind of vocabulary as shown
below:
</p>
<DIV class="table"><PRE>
	  [ ]     range specification. (e.g., [a-z] means a letter in range of from a to z)
          \w      letter or digit. same as [0-9A-Za-z_]
          \W      neither letter nor digit
          \s      blank character. same as [ \t\n\r\f]
          \S      non-space character.
          \d      digit character. same as [0-9].
          \D      non digit character.
          \b      word boundary (outside of range specification).
          \B      non word boundary.
          \b      back space (0x08) (inside of range specification)
          *       zero or more times repetition of followed expression
          +       one or more times repetition of followed expression
          {m,n}   at least n times, but not more than m times repetition
                  of followed expression
          ?       at least 0 times, but not more than 1 times repetition
                  of followed expression
          |       either preceding or next expression may match
          ( )     grouping
</PRE></DIV>
<p>
For example, `^f[a-z]+' means "f followed by repetition of letters in
range from `a' to `z'.  Now what if we want check whether a string fits a
given description say for example: "Starts with lower case `f', which
is immediately followed by exactly one upper case letter, and
optionally more junk after that, as long as there are no more lower
case characters. You will have to write a dozen lines in C, right?
Admit it; you can hardly help yourself. In Ruby you just have to
request the string to be matched with the regular expression
/^f[A-Z](^[a-z])*$/. This ability of regular expressions in string
matching is often used in UNIX environment, typical example is `grep'.
Let us get acquainted with regular expressions. Consider the program
given below:
</p>
<p></p>
<PRE> #Store this as regx.rb
 st = "\033[7m"
 en = "\033[m"

 while TRUE
	print "str&gt; "
	STDOUT.flush
	str = gets
	break if not str
	str.chop!
	print "pat&gt; "
	STDOUT.flush
	re = gets
	break if not re
	re.chop!
	str.gsub! re, "#{st}\\&amp;#{en}"
	print str, "\n"
end
print "\n"
# Now run ruby regx.rb
</PRE>
<p>The program requires inputs twice, once for a string and once for a
regular expression. The test is performed for the string against the regular
expression, and matched parts are highlighted in reverse video.  Note
that this requires an ANSI terminal since it uses reverse video
escape sequences. Do not mind the details of the program.
</p>
<p></p>
<pre>str&gt;foobar
pat&gt;^fo+
foobar
~~~
</pre>
<p>
We see that foo is reversed. Note that ``~~~'' is just for text-based
browsers. We shall experiment with different inputs.
</p>
<p></p>
<pre>str&gt;asd987wonew06521
pat&gt;\d
asd987wonew06521
   ~~~     ~~~~~
str&gt;foozboozer
pat&gt;f.*z
foozboozer
~~~~~~~~
</pre>
<p>
Note that foozbooz is matched and not fooz. This is because here the
regular expression matches the longest possible substring. First
glance interpretation is difficult. Try this:
</p>
<p></p>
<pre>str&gt; Wed Feb  7 08:58:04 JST 1996
pat&gt; [0-9]+:[0-9]+(:[0-9]+)?
Wed Feb  7 08:58:04 JST 1996
           ~~~~~~~~
</pre>
<p>
Now try to represent a hexadecimal number using regular expressions.
(for example: 0x123af00c as well as 0Xbc13590ae are hexadecimal numbers)
</p>
<p></p>
<pre>def chab(s)   # "contains hex in angle brackets"
	(s =~ /<0(x|X)(\d|[a-f]|[A-F])+>/) != nil
end
print chab "Not this one."
print "\n",chab "Maybe this? {0x35}" # use of wrong kind of brackets
print "\n",chab "Or this? <0x38z7e>" # Is this a HEX number
print "\n",chab "Okay, this: <0xfc0004>."
print "\n"
^D
false
false
false
true
</pre>
<H2><B>Iterators</B></H2>
<p>
Iterator means "one which does the same thing many times". Consider the
C code given below:
</p>
<p></p>
<pre>char *str;
for (str = "abcdefg"; *str != '\0'; str++) {
  /* process a character here */
}
</pre>
<p>
Note the abstraction provided by C's for(...) syntax to create loops,
but in fact, the programmer has to know the internal structure of a
string to test *str with the null character.
</p>
<p>
Flexible support for iteration is one of the few features that mark a
high-level language. Consider the following shell script (/bin/sh):
</p>
<p></p>
<pre>for i in *.[ch]; do
  # ...  something to do for each file
done
</pre>
<p>
All the C source and header files in the current directory are,
processed, and the command shell handles the details of picking up and
substituting file names one by one. Isn't this working at a higher
level than C? What do you think ?
</p>
<p>
Considering the fact that, it is fine to provide iterators in a
programming language for built-in data types, but it is a
disappointment if we have to write low-level loops to iterate our
dat types. In OOP, this can be a serious problem, since users often
define one data type after another.
</p>
<p>
To  solve above matters, every OOP language has elaborate ways to make
iterations  easy, for example some languages provide class controlling
iteration,  etc.  On  the other hand, ruby allows us to define control
structures  directly.  In  term  of  ruby,  such  user-defined control
structures are called iterators.
</p>
<p>
Let us see few examples:
</p>
<p></p>
<pre>"abc".each_byte{|c| printf "%c\n", c}
^D
a
b
c
</pre>
<p>
Here, each_byte is an iterator for each character in the string. A
local variable `c' is being used here, and each character is being
substituted into it. This can be translated into something that looks
a lot like C code ...
</p>
<p></p>
<pre>s="abc"
i=0
while i &lt; s.length
	printf "%c\n",s[i]
	i+=1
end
^D
a
b
c
</pre>
<p>
... however, the each_byte iterator is simpler conceptually and is
more likely to continuously work even if the string class happens to
be radically modified in the near future. One benefit of the iterators
is their tendency of robustness in the face of such changes, and I
think that is a characteristic of a good code.
</p>
Another iterator of string is each_line.
</p><p></p>
<pre>"a\nb\nc\n".each_line{|l| print l}
^D
a
b
c
</pre>
<p>
Every irksome task like finding delimiters for lines, generating
sub strings etc. are undertaken by iterators.
</p>
<p>
Now, let's rewrite this example with for statement.

</p><p></p>
<pre>for l in "a\nb\nc\n"
        print l
end
^D
a
b
c
</pre>
<p>
The  for statement does iteration by way of an each iterator. String's
each works the same as each_line as seen in the previous example.
</p>
<p>
Current iteration can be done or retried again from the top, by using
a control structure `retry' in conjunction with an iterated loop.
See below:
</p><p></p>
<pre>c = 0
for i in 0..4
	print i
     if i==2 and c==0
		c = 1
          print "\n"
          retry
     end
end
^D
012
01234
</pre>
<p>
The definition of an iterator may have an occurrence of `yield', which
moves control to the block of code that is passed to the iterator (we
will see more of this later). The example below defines the iterator
repeat, which repeats a block of code the number of times specified in
an argument.
</p>
<p></p>
<pre>def repeat(num)
	while num &lt; num
		yield
		num-=1
	end
end
repeat(4) {print "hello world\n"}
^D
hello world
hello world
hello world
</pre>
<p>
If it is not clear, then, print the value of num before and after the
occurrence of `yield'.
</p>
<p>
With `retry' one can define an iterator which works the same as `while',
but it is not practical due to slowness.
</p><p></p>
<pre>def MYWHILE(cond)
	return if not cond
	yield
	retry
end
i = 0
MYWHILE(i&lt;3) {print i,"\n" ;i+=1}
^D
0
1
2
</pre>
<p>
By now, I hope you must have got an idea about iterators. There are a
few restrictions, but you can write your original iterators; and in
fact, whenever you define a new data type, it is often convenient to
define suitable iterators to go with it. In this sense this, the above
examples `repeat() and `MYWHILE()' are not very useful. We will talk
about practical iterators after we have a better understanding of what
classes are.
</p>

<H2><B>Object Oriented Thinking</B></H2>
<p>
<B>`Object Oriented'</B> is indeed a very catchy phrase. Ruby claims to
be an object oriented scripting language; but what does 'object
oriented' exactly mean?
</p>
<p>
There have been a variety of answers to that question, all of which
probably boil down to about the same thing. Before arguing and summing
our definitions too quickly, let's think for a moment about the
traditional programming paradigm.
</p>
<p>
Traditionally, a programming problem is attacked by coming up with
some kinds of <B>data representations</B>, and <B>procedures</B> that
operate on that data. We can associate terms inert, passive,and
helpless with `data' under this model and that the data sits
completely at the mercy of a large procedural body with which we
associate terms active, logical, and all-powerful.
</p>
<p>
The problem with this approach is that programs are written by
programmers, who are only human and can only keep so much detail clear
in their heads at any one time.  As a project gets larger, its
procedural core grows to the point where it is difficult to remember how
the whole thing works.  Minor lapses of thinking and typographical
errors become more likely to result in well-concealed bugs.  Complex
and unintended interactions begin to emerge within the procedural core,
and maintaining it becomes like trying to carry around an angry squid
without letting any tentacles touch your face.  There are guidelines
for programming that can help to minimize and localize bugs within this
traditional paradigm, but there is a better solution that involves
fundamentally changing the way we work.
</p>
<p>
What object-oriented does is to let us delegate most of he mundane
and repetitive logical work <B>to the data itself</B>; we can then
change our concept of data from <B>passive</B> to <B>active</B>. Put
another way,
</p>
<ul>
<li>we stop treating each piece of data as a box with an open lid that
lets us reach in and throw things around.
<li>We start treating each piece of data as a working machine with a
closed lid and a few well marked switches and dials.
</ul>
<p>
What is described above as a "machine" may be very simple or complex
on the inside; we can't tell from the outside, and we don't allow
ourselves to open up the machine (except when we are absolutely sure
something is wrong with its design), so we are required to do things
like flip the switches and read the dials to interact with the data.
Once the machine is built, we don't want to have to think about how it
operates.
</p>
<p>
You might think we are just making more work for ourselves,but this
approach tends to do a nice job of preventing all kinds of things from
going wrong.
</p>
<p>
Let's start with an example that is to simple to be of practical value,
but should illustrate at least part of the concept. My 2-wheeler has a
trip meter. Its job is to keep track of the distance it has travelled
since the last time its reset button was pushed. How would we model
this in a programming language? In C, the trip meter would just be a
numeric variable, possibly of type float. The program would manipulate
the variable by increasing its value in small increments, with
occasional resets to zero when appropriate. What's wrong with that? A
bug in the program would assign a bogus value to the variable, for any
number of unexpected reasons. Anyone who has programmed in C knows what
it is like to spend hours or days tracking down such a bug whose cause
seems absurdly simple once it has been found. (The moment of finding the
bug is commonly indicated by the sound of a loud slap to the forehead.)
</p>
<p>
In object-oriented context, the same problem can be attacked in a
different manner. A programmer designing a trip meter is supposed not to
ask "which of the familiar data-types comes closest to resembling the
thing" but instead be interested in "how exactly is this thing supposed
to act?" The difference winds up being a profound one. It is necessary
to spend a little bit of time deciding exactly what an odometer is for,
and how the outside world expects to interact with it. We decide to
build a little machine with controls that allows us to increment it,
reset it, read its value, and nothing else.
</p>
<p>
We don't provide a way for a trip meter to be assigned arbitrary values;
why?  because  we  all  know trip meters don't work that way. There are
only a few things you should be able to do with a trip meter, and those
are all we allow. Thus, if something else in the program mistakenly
tries to place some other value (say, the target temperature of the
vehicle's  climate  control) into the trip meter, there is an immediate
indication  of  what  went wrong. We are told when running the program
(or possibly while compiling, depending on the nature of the language)
that <B>we  are  not  allowed  to  assign  arbitrary values to Trip meter
objects</B>. The message might not be exactly that clear, but it will be
reasonably  close  to that. It doesn't prevent the error, does it? But
it  quickly  points us in the direction of the cause. This is only one
of several ways in which OO programming can save a lot of wasted time.
</p>
<p>
We  commonly take one step of abstraction above this, because it turns
out  to  be as easy to build a factory that makes machines as it is to
make an individual machine. We aren't likely to build a single
trip meter directly; rather, we arrange for any number of trip meters to
be built from a single pattern. The pattern (or if you like, the
trip meter  factory)  corresponds  to  what  we  call  a  class, and an
factory)  corresponds  to an <B>object</B>. Most OO languages require
a class to  be  defined before we can have a new kind of object, but
ruby does not.
</p>
<p>
I would like to emphasize on the fact that the use of an OO language
will not enforce proper OO design. Indeed it is possible in any
language to write code that is unclear, sloppy, ill-conceived, buggy,
and wobbly all over. What ruby does for you (as opposed, especially, to
C++) is to make the practice  of OO programming feel natural enough
that even when you are working  on a small scale you don't feel a
necessity to resort to ugly code to save effort. We will be discussing
the ways in which ruby accomplishes that admirable goal as this guide
progresses; the next topic will be the "switches and dials" (object
methods) and from there we'll move on to the "factories" (classes). Are
you still with us?
</p>


<!-- *** BEGIN copyright *** -->
<hr>
<CENTER><SMALL><STRONG>

Copyright &copy; 2002, Hiran  Ramankutty.
Copying license <A HREF="../copying.html">http://www.linuxgazette.com/copying.html</A><BR>
Published in Issue 83 of <i>Linux Gazette</i>, October 2002
</STRONG></SMALL></CENTER>
<!-- *** END copyright *** -->
<HR>

<!--startcut ==========================================================-->
<CENTER>
<!-- *** BEGIN navbar *** -->
<IMG ALT="" SRC="../gx/navbar/left.jpg" WIDTH="14" HEIGHT="45" BORDER="0" ALIGN="bottom"><A HREF="qubism.html"><IMG ALT="[ Prev ]" SRC="../gx/navbar/prev.jpg" WIDTH="16" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="index.html"><IMG ALT="[ Table of Contents ]" SRC="../gx/navbar/toc.jpg" WIDTH="220" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../index.html"><IMG ALT="[ Front Page ]" SRC="../gx/navbar/frontpage.jpg" WIDTH="137" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="http://www.linuxgazette.com/cgi-bin/talkback/all.py?site=LG&article=http://www.linuxgazette.com/issue83/ramankutty.html"><IMG ALT="[ Talkback ]" SRC="../gx/navbar/talkback.jpg" WIDTH="121" HEIGHT="45" BORDER="0" ALIGN="bottom"  ></A><A HREF="../lg_faq.html"><IMG ALT="[ FAQ ]" SRC="./../gx/navbar/faq.jpg"WIDTH="62" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="sandeep.html"><IMG ALT="[ Next ]" SRC="../gx/navbar/next.jpg" WIDTH="15" HEIGHT="45" BORDER="0" ALIGN="bottom"  ></A><IMG ALT="" SRC="../gx/navbar/right.jpg" WIDTH="15" HEIGHT="45" ALIGN="bottom">
<!-- *** END navbar *** -->
</CENTER>
</BODY></HTML>
<!--endcut ============================================================-->