
<!--startcut  ==============================================-->
<!-- *** BEGIN HTML header *** -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML><HEAD>
<title>Programming Bits: C# Data Types LG #85</title>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#0000AF"
ALINK="#FF0000">
<!-- *** END HTML header *** -->

<!-- *** BEGIN navbar *** -->
<IMG ALT="" SRC="../gx/navbar/left.jpg" WIDTH="14" HEIGHT="45" BORDER="0" ALIGN="bottom"><A HREF="foolish.html"><IMG ALT="[ Prev ]" SRC="../gx/navbar/prev.jpg" WIDTH="16" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="index.html"><IMG ALT="[ Table of Contents ]" SRC="../gx/navbar/toc.jpg" WIDTH="220" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../index.html"><IMG ALT="[ Front Page ]" SRC="../gx/navbar/frontpage.jpg" WIDTH="137" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="http://www.linuxgazette.com/cgi-bin/talkback/all.py?site=LG&article=http://www.linuxgazette.com/issue85/ortiz.html"><IMG ALT="[ Talkback ]" SRC="../gx/navbar/talkback.jpg" WIDTH="121" HEIGHT="45" BORDER="0" ALIGN="bottom"  ></A><A HREF="../lg_faq.html"><IMG ALT="[ FAQ ]" SRC="./../gx/navbar/faq.jpg"WIDTH="62" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="qubism.html"><IMG ALT="[ Next ]" SRC="../gx/navbar/next.jpg" WIDTH="15" HEIGHT="45" BORDER="0" ALIGN="bottom"  ></A><IMG ALT="" SRC="../gx/navbar/right.jpg" WIDTH="15" HEIGHT="45" ALIGN="bottom">
<!-- *** END navbar *** -->

<!--endcut ============================================================-->

<TABLE BORDER><TR><TD WIDTH="200">
<A HREF="http://www.linuxgazette.com/">
<IMG ALT="LINUX GAZETTE" SRC="../gx/2002/lglogo_200x41.png"
	WIDTH="200" HEIGHT="41" border="0"></A>
<BR CLEAR="all">
<SMALL>...<I>making Linux just a little more fun!</I></SMALL>
</TD><TD WIDTH="380">


<CENTER>
<BIG><BIG><STRONG><FONT COLOR="maroon">Programming Bits: C# Data Types</FONT></STRONG></BIG></BIG>
<BR>
<STRONG>By <A HREF="../authors/ortiz.html">Ariel Ortiz Ramirez</A></STRONG>
</CENTER>

</TD></TR>
</TABLE>
<P>

<!-- END header -->


In my <a href="../issue84/ortiz.html">previous
article</a>, I introduced the C# programming language and
explained how it works in the context of the Mono environment, an open-source
implementation of Microsoft's .NET framework. I will now go into some detail
on the data types supported by the C# programming language.</p>

<p>In the subsequent discussion I will use the following diagrammatic notation
to represent variables and objects:</p>

<p><img border="0" src="misc/ortiz/notation.png" width="363" height="151" alt="Variable and object diagram notation."></p>

<p>The variable diagram is a cubic figure that depicts three traits (name, value
and type) relevant during the compilation and execution of a program. In the von
Neumann architecture tradition, we will consider a variable as a chunk of memory
in which we hold a value that can be read or overwritten. The object diagram is
a rounded rectangle that denotes an object created at runtime and allocated
in a garbage-collected heap. For any object at a given point in time, we
know what type (class) it is and the current values of its instance
variables.</p>

<h2>Type Categories</h2>

<p>In the C# programming language, types are divided into three categories:</p>

<ul>
  <li>value types</li>
  <li>reference types</li>
  <li>pointer types</li>
</ul>
<p>In a variable that holds a <b>value type</b>, the data itself is directly
contained within the memory allotted to the variable. For example, the following
code</p>
<blockquote>
  <pre>int x = 5;</pre>

</blockquote>
<p>declares a 32-bit signed integer variable, called <code>x</code>,
initialized with a value of 5. The following figure represents the corresponding
variable diagram:</p>

<p><img border="0" src="misc/ortiz/intvar.png" width="136" height="102" alt="A simple int variable diagram."></p>

<p>Note how the value 5 is contained within the variable itself.</p>

<p>On the other hand, a variable that holds a <b>reference type</b> contains the
address of an object stored in the heap. The following code declares a variable
called <code>y</code> of type <code>object</code> which gets initialized, thanks
to the <code>new</code> operator, so that it refers to a new heap-allocated <code>object</code>
instance (<code>object</code> is the base class of all C# types, but more on
this later).</p>

<blockquote>
  <pre>object y = new object();</pre>

</blockquote>
<p>The corresponding variable/object diagram would be:</p>

<p><img border="0" src="misc/ortiz/objectvar.png" alt="An object diagram." width="290" height="159"></p>

<p>In this case, we can observe that the &quot;value&quot; part of the variable
diagram contains the start of an arrow that points to the referred object.
This arrow represents the address of the object inside the memory
heap.&nbsp;</p>

<p>Now, let us analyze what happens when we introduce two new variables and do
some copying from the original variables. Assume we have the following code:</p>

<blockquote>
  <pre>int a = x;
object b = y; </pre>

</blockquote>
<p>The result is displayed below:&nbsp;</p>

<p><img border="0" src="misc/ortiz/copying.png" width="473" height="282" alt="Results after copying values and references."></p>

<p>As can be observed, <code>a</code> has a copy of the value of <code>x</code>.
If we modify the value of one of these variables, the other variable
remains unchanged. In the case of <code>y</code> and <code>b</code>, both
variables refer to the same object. If we alter the state of the object using variable
<code>y</code>, then the resulting changes will be observable using variable <code>b</code>,
and vice versa.</p>
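
<p>The difference between copying a value and copying a reference can be
sketched with a short program. The <code>Cell</code> class below is a
hypothetical helper introduced only for this illustration (classes are covered
later in this article); it is needed because a plain <code>object</code>
instance has no state that we could modify.</p>

<blockquote>
  <pre>// Hypothetical helper class, used only for this illustration.
class Cell {
    public int content;
}

public class CopyDemo {
    public static void Main() {
        int x = 5;
        int a = x;          // the value 5 is copied into a
        a = 10;             // changing a leaves x untouched
        System.Console.WriteLine(&quot;x = &quot; + x + &quot;, a = &quot; + a);   // x = 5, a = 10

        Cell y = new Cell();
        Cell b = y;         // only the reference is copied
        b.content = 99;     // the single shared object is modified
        System.Console.WriteLine(&quot;y.content = &quot; + y.content);  // y.content = 99
    }
}</pre>

</blockquote>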

<p>Aside from references into the heap, a reference type variable may also
contain the special value <code>null</code>, which denotes a nonexistent object.
Continuing with the last example, if we have the statements</p>

<blockquote>
  <pre>y = null;
b = null;</pre>

</blockquote>
<p>then variables <code>y</code> and <code>b</code> no longer refer to any
specific object, as shown below:</p>

<p><img border="0" src="misc/ortiz/nullvar.png" width="469" height="196" alt="Making references equal to null."></p>

<p>As can be seen, all references to the object instance have been lost. This
object has now turned into &quot;garbage&quot; because no other live reference to it
exists. As noted before, in C# the heap is garbage collected, which means that
the memory occupied by these &quot;dead&quot; objects is at some point
automatically reclaimed and recycled by the runtime system. Other languages, such
as C++ and Pascal, do not have this kind of automatic memory management scheme.
Programmers using these languages must explicitly free any heap-allocated memory
chunks that the program no longer requires. Failing to do so leads to
memory leaks, in which certain portions of memory in a program are wasted
because they haven't been released for reuse. Experience has shown that explicit memory de-allocation is
cumbersome and error prone. This is why many modern programming languages (such as Java,
Python, Scheme and Smalltalk, just to name a few) also incorporate garbage
collection as part of their runtime environment.</p>

<p>Finally, a <b>pointer type</b> gives you capabilities similar to those found
with pointers in languages like C and C++. It is important to understand that
both pointers and references actually represent memory addresses, but that's
where their similarities end. References are tracked by the garbage collector;
pointers are not. You can perform pointer arithmetic on pointers, but not on
references. Because of the unwieldy nature of pointers, they can only
be used in C# within code marked as <code>unsafe</code>. This is an advanced
topic and I won't go
deeper into this matter at this time.</p>
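
<p>Just to give a taste of what such code looks like, here is a minimal sketch
of an <code>unsafe</code> method. The class name <code>UnsafeDemo</code> is
made up for this illustration, and the compiler must be told explicitly to
accept unsafe code (with the Mono compiler, that would be the
<code>-unsafe</code> option of <code>mcs</code>).</p>

<blockquote>
  <pre>public class UnsafeDemo {
    public static unsafe void Main() {
        int x = 5;
        int* p = &amp;x;        // p holds the address of the local variable x
        *p = 7;              // writing through the pointer changes x
        System.Console.WriteLine(&quot;x = &quot; + x);     // x = 7

        // Reserve room for three ints in the stack frame (not in the heap).
        int* buffer = stackalloc int[3];
        buffer[0] = 10;
        buffer[1] = 20;
        buffer[2] = 30;
        int* q = buffer + 2; // pointer arithmetic: advance by two ints
        System.Console.WriteLine(&quot;*q = &quot; + *q);   // *q = 30
    }
}</pre>

</blockquote>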

<h2>Predefined Types</h2>

<p>C# has a rich set of predefined data types which you can use in your
programs. The following figure illustrates the hierarchy of the predefined data types
found in C#:</p>

<p><img border="0" src="misc/ortiz/types.png" width="576" height="226" alt="The C# type hierarchy."></p>

<p>Here is a brief summary of each of these types:</p>

<table border="0" cellpadding="10" bordercolorlight="#EBEBEB" bordercolordark="#000000" bordercolor="#FFFFFF" cellspacing="1" bgcolor="#000000">
  <tr>
    <th style="background-color: #C0C0C0">Type</th>
    <th style="background-color: #C0C0C0">Size in&nbsp;<br>
      Bytes</th>
    <th style="background-color: #C0C0C0">Description</th>
  </tr>
  <tr>
    <td bgcolor="#EBEBEB"><code>bool</code></td>
    <td align="right" bgcolor="#EBEBEB">1</td>
    <td bgcolor="#EBEBEB">Boolean value. The only valid literals are <code>true</code>
      and <code>false</code>.</td>
  </tr>
  <tr>
    <td bgcolor="#EBEBEB"><code>sbyte</code></td>
    <td align="right" bgcolor="#EBEBEB">1</td>
    <td bgcolor="#EBEBEB">Signed byte integer.</td>
  </tr>
  <tr>
    <td bgcolor="#EBEBEB"><code>byte</code></td>
    <td align="right" bgcolor="#EBEBEB">1</td>
    <td bgcolor="#EBEBEB">Unsigned byte integer.</td>
  </tr>
  <tr>
    <td bgcolor="#EBEBEB"><code>short</code></td>
    <td align="right" bgcolor="#EBEBEB">2</td>
    <td bgcolor="#EBEBEB">Signed short integer.</td>
  </tr>
  <tr>
    <td bgcolor="#EBEBEB"><code>ushort</code></td>
    <td align="right" bgcolor="#EBEBEB">2</td>
    <td bgcolor="#EBEBEB">Unsigned short integer.</td>
  </tr>
  <tr>
    <td bgcolor="#EBEBEB"><code>int</code></td>
    <td align="right" bgcolor="#EBEBEB">4</td>
    <td bgcolor="#EBEBEB">Signed integer. Literals may be in decimal (default)
      or hexadecimal notation (with an <code>0x</code> prefix).&nbsp; Examples: <code>26</code>,
      <code>0x1A</code></td>
  </tr>
  <tr>
    <td bgcolor="#EBEBEB"><code>uint</code></td>
    <td align="right" bgcolor="#EBEBEB">4</td>
    <td bgcolor="#EBEBEB">Unsigned integer. Examples: <code>26U</code>, <code>0x1AU</code>
      (the <code>U</code> suffix marks the literal as unsigned)</td>
  </tr>
  <tr>
    <td bgcolor="#EBEBEB"><code>long</code></td>
    <td align="right" bgcolor="#EBEBEB">8</td>
    <td bgcolor="#EBEBEB">Signed long integer. Examples: <code>26L</code>, <code>0x1AL</code>
      (the <code>L</code> suffix marks the literal as long)</td>
  </tr>
  <tr>
    <td bgcolor="#EBEBEB"><code>ulong</code></td>
    <td align="right" bgcolor="#EBEBEB">8</td>
    <td bgcolor="#EBEBEB">Unsigned long integer. Examples: <code>26UL</code>, <code>0x1AUL</code>
      (the <code>UL</code> suffix marks the literal as unsigned long)</td>
  </tr>
  <tr>
    <td bgcolor="#EBEBEB"><code>char</code></td>
    <td align="right" bgcolor="#EBEBEB">2</td>
    <td bgcolor="#EBEBEB">Unicode character. Example: <code>'A'</code> (contained within
      single quotes)</td>
  </tr>
  <tr>
    <td bgcolor="#EBEBEB"><code>float</code></td>
    <td align="right" bgcolor="#EBEBEB">4</td>
    <td bgcolor="#EBEBEB">IEEE 754 single precision floating point number.
      Examples: <code>1.2F</code>, <code>1E10F </code>(mandatory <code>F</code> suffix)</td>
  </tr>
  <tr>
    <td bgcolor="#EBEBEB"><code>double</code></td>
    <td align="right" bgcolor="#EBEBEB">8</td>
    <td bgcolor="#EBEBEB">IEEE 754 double precision floating point number.
      Examples: <code>1.2</code>, <code>1E10</code>, <code>1D</code> (optional <code>D</code>
      suffix)</td>
  </tr>
  <tr>
    <td bgcolor="#EBEBEB"><code>decimal</code></td>
    <td align="right" bgcolor="#EBEBEB">16</td>
    <td bgcolor="#EBEBEB">Numeric data type suitable for financial and monetary
      calculations, with 28 to 29 significant decimal digits of precision. Example: <code>123.45M</code>
      (mandatory <code>M</code> suffix)</td>
  </tr>
  <tr>
    <td bgcolor="#EBEBEB"><code>object</code></td>
    <td align="right" bgcolor="#EBEBEB">8+</td>
    <td bgcolor="#EBEBEB">Ultimate base type for both value and reference types.
      Has no literal representation.</td>
  </tr>
  <tr>
    <td bgcolor="#EBEBEB"><code>string</code></td>
    <td align="right" bgcolor="#EBEBEB">20+</td>
    <td bgcolor="#EBEBEB">Immutable sequence of Unicode characters. Example: <code>&quot;hello
      world!\n&quot; </code>(contained within double quotes)</td>
  </tr>
</table>
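
<p>The literal forms and sizes listed in the table can be checked with a small
program along these lines. This is only a sketch: the class name
<code>LiteralDemo</code> is made up, and it assumes a compiler that accepts
<code>sizeof</code> on predefined types outside of <code>unsafe</code> code, as
current compilers do. The names printed by <code>GetType</code> are those of
the underlying .NET types.</p>

<blockquote>
  <pre>public class LiteralDemo {
    public static void Main() {
        int i = 0x1A;                                    // hexadecimal literal
        System.Console.WriteLine(i);                     // 26

        // The suffix of a literal determines its type.
        System.Console.WriteLine((26U).GetType());       // System.UInt32
        System.Console.WriteLine((26L).GetType());       // System.Int64
        System.Console.WriteLine((26UL).GetType());      // System.UInt64
        System.Console.WriteLine((1.2F).GetType());      // System.Single
        System.Console.WriteLine((1.2).GetType());       // System.Double
        System.Console.WriteLine((123.45M).GetType());   // System.Decimal

        // Sizes in bytes, as listed in the table.
        System.Console.WriteLine(sizeof(char));          // 2
        System.Console.WriteLine(sizeof(long));          // 8
        System.Console.WriteLine(sizeof(decimal));       // 16
    }
}</pre>

</blockquote>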

<p align="left">C# has a unified type system such that a value of any type can
be treated as an object. Every type in C# derives, directly or indirectly, from
the <code>object</code> class. Reference types are treated as objects simply by
viewing them as <code>object</code> types. Value types are treated as objects by
performing <i>boxing</i> and <i>unboxing</i> operations. I will go deeper into
these concepts in my next article.</p>
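
<p>As a brief preview, the following sketch shows a value being boxed and
unboxed. The class name <code>BoxingDemo</code> is made up for this
illustration.</p>

<blockquote>
  <pre>public class BoxingDemo {
    public static void Main() {
        int n = 42;
        object o = n;        // boxing: the value is copied into a heap object
        n = 100;             // the boxed copy is not affected
        int m = (int) o;     // unboxing: the value is copied back out
        System.Console.WriteLine(&quot;n = &quot; + n + &quot;, o = &quot; + o + &quot;, m = &quot; + m);
        // n = 100, o = 42, m = 42
    }
}</pre>

</blockquote>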

<h2>Classes and Structures</h2>

<p>C# allows you to define new reference and value types. Reference types are
defined using the <code>class</code> construct, while value types are defined
using <code>struct</code>. Let's see them both in action in the following
program:</p>

<blockquote>
  <pre>struct ValType {
    public int i;
    public double d;
    public ValType(int i, double d) {
        this.i = i;
        this.d = d;
    }
    public override string ToString() {
        return &quot;(&quot; + i + &quot;, &quot; + d + &quot;)&quot;;
    }
}

class RefType {
    public int i;
    public double d;
    public RefType(int i, double d) {
        this.i = i;
        this.d = d;
    }
    public override string ToString() {
        return &quot;(&quot; + i + &quot;, &quot; + d + &quot;)&quot;;
    }
}

public class Test {
    public static void Main (string[] args) {

        // PART 1
        ValType v1;
        RefType r1;
        v1 = new ValType(3, 4.2);
        r1 = new RefType(4, 5.1);
        System.Console.WriteLine(&quot;PART 1&quot;);
        System.Console.WriteLine(&quot;v1 = &quot; + v1);
        System.Console.WriteLine(&quot;r1 = &quot; + r1);

        // PART 2
        ValType v2;
        RefType r2;
        v2 = v1;
        r2 = r1;
        v2.i++; v2.d++;
        r2.i++; r2.d++;
        System.Console.WriteLine(&quot;PART 2&quot;);
        System.Console.WriteLine(&quot;v1 = &quot; + v1);
        System.Console.WriteLine(&quot;r1 = &quot; + r1);
    }
}</pre>

</blockquote>
<p>First we have the structure <code>ValType</code>. It defines two instance
variables, <code>i</code> and <code>d</code>, of type <code>int</code> and <code>double</code>,
respectively. They are declared as <code>public</code>, which means they can be accessed
from any part of the program where this structure is visible. The structure
defines a constructor, which has the same name as the structure itself and,
unlike a method definition, has no return type. Our constructor is in charge
of initializing the two instance variables. The keyword <code>this</code>
is used here to refer to the instance being created; it has to be
used explicitly to resolve the ambiguity that arises when a parameter name
clashes with an instance variable name. The structure also defines a method
called <code>ToString</code>, which returns the external representation of a
structure instance as a string of characters. This method overrides the <code>ToString</code>
method (thus the use of the <code>override</code> modifier) inherited from the
<code>object</code> class. The body of this method
uses the string concatenation operator (<code>+</code>) to build a string of the form
&quot;(<i>i</i>, <i>d</i>)&quot;, where <i>i</i> and <i>d</i> represent the
current values of those instance variables, and returns it.</p>

<p>As can be observed, the <code>RefType</code> class has basically the same
code as <code>ValType</code>. Let us examine the runtime behavior of variables
declared using both types so we can further understand their differences. The <code>Test</code>
class has a <code>Main</code> method that establishes the program entry point.
In the first part of the program (marked with the &quot;PART 1&quot; comment) we
have one value type variable and one reference type variable. This is how they
look after the assignments:</p>

<p><img border="0" src="misc/ortiz/v1r1.png" alt="Class and structure variable diagram." width="509" height="231"></p>

<p>The value type variable, <code>v1</code>, has its instance variables contained
within the variable itself. The <code>new</code> operator used in the assignment</p>

<blockquote>
  <pre>v1 = new ValType(3, 4.2);</pre>

</blockquote>
<p>does not allocate any memory in the heap, contrary to what we might expect
from other languages. Because <code>ValType</code> is a value type, the <code>new</code> operator is
used in this context only to call its constructor and thereby initialize the
instance variables. Because <code>v1</code> is a local variable, it's actually
stored as part of the method's activation record (stack frame), and it exists
simply because it has been declared.</p>
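
<p>To illustrate that a value type variable exists as soon as it is declared,
the following sketch fills in a structure without using <code>new</code> at
all. The names <code>Pair</code> and <code>DeclDemo</code> are made up for this
example. The compiler only insists that every field has been assigned before
the variable is read as a whole.</p>

<blockquote>
  <pre>// Hypothetical structure, used only for this illustration.
struct Pair {
    public int i;
    public double d;
}

public class DeclDemo {
    public static void Main() {
        Pair p;              // storage for p already exists in the stack frame
        p.i = 3;             // so its fields can be assigned directly
        p.d = 4.5;
        Pair copy = p;       // allowed only because all fields of p are assigned
        System.Console.WriteLine(copy.i);   // 3
    }
}</pre>

</blockquote>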

<p>Objects referred to by reference type variables must be created explicitly at
some point in the program. In the assignment</p>

<blockquote>
  <pre>r1 = new RefType(4, 5.1);</pre>

</blockquote>

<p>the <code>new</code> operator does the expected dynamic memory allocation
because in this case <code>RefType</code> is a reference type. The corresponding
constructor gets called immediately afterwards. Variable <code>r1</code> is also
stored in the method's activation record (because it's also a local variable)
but it's just big enough to hold the reference (address) of the newly created
instance. All the instance's data is in fact stored in the heap.</p>

<p>Now let's check what happens when the second part of the program (marked with
the &quot;PART 2&quot; comment) is executed. Two new variables are introduced and
they are assigned the values of the two original ones. Then, each of the
instance variables of the new variables is incremented by one (using the <code>++</code>
operator).</p>

<p><img border="0" src="misc/ortiz/v2r2.png" alt="Class and structure instances after copying." width="442" height="393"></p>

<p>When <code>v1</code> is copied into <code>v2</code>, each instance
variable of the source is copied into the destination, thus
producing totally independent values. So any modification made to <code>v2</code>
doesn't affect <code>v1</code> at all. This is not so with <code>r1</code> and <code>r2</code>,
in which only the reference (address) is copied. Any change to the object
referred to by <code>r2</code> is immediately seen through <code>r1</code>, because they
both refer in fact to the same object.</p>
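
<p>One way to confirm that two reference variables share a single object is the
<code>object.ReferenceEquals</code> method, which tells whether its two
arguments refer to the same instance. The sketch below uses a hypothetical
<code>Node</code> class that is not part of the example program.</p>

<blockquote>
  <pre>// Hypothetical class, used only for this illustration.
class Node {
    public int i;
}

public class IdentityDemo {
    public static void Main() {
        Node r1 = new Node();
        Node r2 = r1;            // copies the reference, not the object
        Node r3 = new Node();    // a second, distinct object

        r2.i = 5;                // visible through r1 as well
        System.Console.WriteLine(r1.i);                             // 5
        System.Console.WriteLine(object.ReferenceEquals(r1, r2));   // True
        System.Console.WriteLine(object.ReferenceEquals(r1, r3));   // False
    }
}</pre>

</blockquote>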

<p>If you check the type hierarchy diagram above, you will notice that simple
data types such as <code>int</code>, <code>bool</code> and <code>char</code> are
actually <code>struct</code> value types, while <code>object</code> and <code>string</code>
are <code>class</code> reference types.</p>
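
<p>This can be verified at runtime with a few lines of code. In the sketch
below (the class name <code>AliasDemo</code> is made up), <code>typeof</code>
yields a description of a type, and its <code>IsValueType</code> property tells
which category the type belongs to. The first line also shows that the keyword
<code>int</code> is just another name for the <code>System.Int32</code>
structure of the class library.</p>

<blockquote>
  <pre>public class AliasDemo {
    public static void Main() {
        System.Console.WriteLine(typeof(int) == typeof(System.Int32));   // True
        System.Console.WriteLine(typeof(int).IsValueType);               // True
        System.Console.WriteLine(typeof(bool).IsValueType);              // True
        System.Console.WriteLine(typeof(char).IsValueType);              // True
        System.Console.WriteLine(typeof(object).IsValueType);            // False
        System.Console.WriteLine(typeof(string).IsValueType);            // False
    }
}</pre>

</blockquote>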

<p>If you want to compile and run the <a href="misc/ortiz/varsexample.cs.txt">source
code</a> of the above example, type at the Linux shell prompt:</p>

<blockquote>
  <pre>mcs varsexample.cs</pre>

  <pre>mono varsexample.exe</pre>

</blockquote>

<p>The output should be:</p>

<blockquote>

<pre>PART 1
v1 = (3, 4.2)
r1 = (4, 5.1)
PART 2
v1 = (3, 4.2)
r1 = (5, 6.1)</pre>

</blockquote>

<h2>Resources</h2>

<dl>
  <dt>    <a href="http://www.go-mono.com/">http://www.go-mono.com/</a></dt>
  <dd>The official Mono home page. Here you can find download and installation instructions
    for the Mono platform, which includes a C# compiler, a runtime environment, and
    a class library.</dd>
  <dt><a href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cscon/html/vcoriCStartPage.asp">http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cscon/html/vcoriCStartPage.asp</a></dt>
  <dd>General information on the C# programming language.</dd>
</dl>


<!-- *** BEGIN copyright *** -->
<hr>
<CENTER><SMALL><STRONG>
Copyright &copy; 2002, Ariel Ortiz Ramirez.
Copying license <A HREF="../copying.html">http://www.linuxgazette.com/copying.html</A><BR>
Published in Issue 85 of <i>Linux Gazette</i>, December 2002
</STRONG></SMALL></CENTER>
<!-- *** END copyright *** -->
<HR>

<!--startcut ==========================================================-->
</BODY></HTML>
<!--endcut ============================================================-->