450 lines
20 KiB
HTML
450 lines
20 KiB
HTML
<HTML>
|
||
<center>
|
||
<A HREF="../tlk-toc.html"> Table of Contents</A>,
|
||
<A href="../tlk.html" target="_top"> Show Frames</A>,
|
||
<A href="../basics/sw.html" target="_top"> No Frames</A>
|
||
</center>
|
||
<hr>
|
||
<META NAME="TtH" CONTENT="1.03">
|
||
|
||
<p>
|
||
<H1><A NAME="tth_chAp2">Chapter 2 <br>Software Basics</H1>
|
||
<A NAME="sw-basics-chapter"></A>
|
||
<p>
|
||
<img src="../logos/sit3-bw-tran.1.gif"><br> <tt><b></tt></b>
|
||
A program is a set of computer instructions that perform a particular task.
|
||
That program can be written in assembler, a very low level computer language,
|
||
or in a high level, machine independent language such as the C programming
|
||
language.
|
||
An operating system is a special program which allows the user to run applications such as
|
||
spreadsheets and word processors.
|
||
This chapter introduces basic programming principles and gives an overview of
|
||
the aims and functions of an operating system.
|
||
|
||
<p>
|
||
|
||
<H2><A NAME="tth_sEc2.1">2.1 </A> Computer Languages</H2>
|
||
|
||
<H3><A NAME="tth_sEc2.1.1">2.1.1 </A> Assembly Languages</H3>
|
||
|
||
<p>
|
||
The instructions that a CPU fetches from memory and executes are not at all understandable
|
||
to human beings.
|
||
They are machine codes which tell the computer precisely what to do.
|
||
The hexadecimal number <em>0x89E5</em> is an Intel 80486 instruction which copies the contents
|
||
of the ESP register to the EBP register.
|
||
One of the first software tools invented for the earliest computers was an assembler,
|
||
a program which takes a human readable source file and assembles it into
|
||
machine code.
|
||
Assembly languages explicitly handle registers and operations on
|
||
data and they are specific to a particular microprocessor.
|
||
The assembly language for an Intel X86 microprocessor is very different
|
||
to the assembly language for an Alpha AXP microprocessor.
|
||
The following Alpha AXP assembly code shows the sort of operations that a
|
||
program can perform:
|
||
|
||
<pre>
|
||
ldr r16, (r15) ; Line 1
|
||
ldr r17, 4(r15) ; Line 2
|
||
beq r16,r17,100 ; Line 3
|
||
str r17, (r15) ; Line 4
|
||
100: ; Line 5
|
||
|
||
</pre>
|
||
<p>
|
||
The first statement (on line 1) loads register 16 from the
|
||
address held in register 15.
|
||
The next instruction loads register 17 from the next location in
|
||
memory.
|
||
Line 3 compares the contents of register 16 with that of register 17
|
||
and, if they are equal, branches to label <em>100</em>.
|
||
If the registers do not contain the same value then the program
|
||
continues to line 4 where the contents of r17 are saved into memory.
|
||
If the registers do contain the same value then no data needs to
|
||
be saved.
|
||
Assembly level programs are tedious and tricky to write and prone to
|
||
errors.
|
||
Very little of the Linux kernel is written in assembly language and
|
||
those parts that are are written only for efficiency and they are
|
||
specific to particular microprocessors.
|
||
|
||
<p>
|
||
|
||
<H3><A NAME="tth_sEc2.1.2">2.1.2 </A> The C Programming Language and Compiler</H3>
|
||
|
||
<p>
|
||
Writing large programs in assembly language is a difficult and time
|
||
consuming task.
|
||
It is prone to error and the resulting program is not portable, being
|
||
tied to one particular processor family.
|
||
It is far better to use a machine independent language
|
||
like C.
|
||
C allows you to describe programs in terms of their logical algorithms and the data
|
||
that they operate on.
|
||
Special programs called compilers read the C program and translate
|
||
it into assembly language, generating machine specific code from it.
|
||
A good compiler can generate assembly instructions that are very
|
||
nearly as efficient as those written by a good assembly programmer.
|
||
Most of the Linux kernel is written in the C language.
|
||
The following C fragment:
|
||
|
||
<pre>
|
||
if (x != y)
|
||
x = y ;
|
||
</pre>
|
||
<p>
|
||
performs exactly the same operations as the previous example assembly code.
|
||
If the contents of the variable <tt>x</tt> are not the same as the contents
|
||
of variable <tt>y</tt> then the contents of <tt>y</tt> will be copied to <tt>x</tt>.
|
||
C code is organized into routines, each of which perform a task.
|
||
Routines may return any value or data type supported by C.
|
||
Large programs like the Linux kernel comprise many separate C source
|
||
modules each with its own routines and data structures.
|
||
These C source code modules group together logical functions such as
|
||
filesystem handling code.
|
||
|
||
<p>
|
||
C supports many types of variables, a variable is a location in memory which
|
||
can be referenced by a symbolic name.
|
||
In the above C fragment <tt>x</tt> and <tt>y</tt> refer to locations in memory.
|
||
The programmer does not care where in memory the variables are put, it
|
||
is the linker (see below) that has to worry about that.
|
||
Some variables contain different sorts of data,
|
||
integer and floating point and others are pointers.
|
||
|
||
<p>
|
||
Pointers are variables that contain the address, the location in memory
|
||
of other data.
|
||
Consider a variable called <em>x</em>, it might live in memory at address <em>0x80010000</em>.
|
||
You could have a pointer, called <em>px</em>, which points at <em>x</em>.
|
||
<em>px</em> might live at address <em>0x80010030</em>.
|
||
The value of <em>px</em> would be <em>0x80010000</em>: the address of the variable <em>x</em>.
|
||
|
||
<p>
|
||
C allows you to bundle together related variables into data structures.
|
||
For example,
|
||
|
||
<pre>
|
||
struct {
|
||
int i ;
|
||
char b ;
|
||
} my_struct ;
|
||
</pre>
|
||
<p>
|
||
is a data structure called <tt>my_struct</tt> which contains two elements,
|
||
an integer (32 bits of data storage) called <tt>i</tt> and a character (8 bits
|
||
of data) called <tt>b</tt>.
|
||
|
||
<p>
|
||
|
||
<H3><A NAME="tth_sEc2.1.3">2.1.3 </A> Linkers</H3>
|
||
|
||
<p>
|
||
Linkers are programs that link together several object modules and libraries
|
||
to form a single, coherent, program.
|
||
Object modules are the machine code output from an assembler or compiler and contain
|
||
executable machine code and data together with information that allows the linker
|
||
to combine the modules together to form a program.
|
||
For example one module might contain all of a program's database functions and
|
||
another module its command line argument handling functions.
|
||
Linkers fix up references between these object modules, where a routine or data
|
||
structure referenced in one module actually exists in another module.
|
||
The Linux kernel is a single, large program linked together from its
|
||
many constituent object modules.
|
||
|
||
<p>
|
||
|
||
<H2><A NAME="tth_sEc2.2">2.2 </A> What is an Operating System?</H2>
|
||
|
||
<p>
|
||
Without software a computer is just a pile of electronics that gives off heat.
|
||
If the hardware is the heart of a computer then the software is its soul.
|
||
An operating system is a collection of system programs which allow the user to
|
||
run application software.
|
||
The operating system abstracts the real hardware of the system and presents
|
||
the system's users and its applications with a virtual machine.
|
||
In a very real sense the software provides
|
||
the character of the system. Most PCs can run one or more operating
|
||
systems and each one can have a very different look and feel.
|
||
Linux is made up of a number of functionally
|
||
separate pieces that, together, comprise the operating system. One
|
||
obvious part of Linux is the kernel itself; but even that would be useless
|
||
without libraries or shells.
|
||
|
||
<p>
|
||
In order to start understanding what an operating system is,
|
||
consider what happens when you type an apparently simple command:
|
||
|
||
<pre>
|
||
|
||
$ ls
|
||
Mail c images perl
|
||
docs tcl
|
||
$
|
||
|
||
</pre>
|
||
<p>
|
||
The $ is a prompt put out by a login shell (in this case <tt>bash</tt>).
|
||
This means that it is waiting for you, the user, to type some command. Typing
|
||
<font face="helvetica">ls</font> causes the keyboard driver to recognize that characters have
|
||
been typed. The keyboard driver
|
||
passes them to the shell which processes that command
|
||
by looking for an executable image of the same name.
|
||
It finds that image,
|
||
in <tt>/bin/ls</tt>. Kernel services are called to pull the <font face="helvetica">ls</font>
|
||
executable image into virtual memory and start executing it. The <font face="helvetica">ls</font>
|
||
image makes calls to the file subsystem of the kernel to find out what files
|
||
are available.
|
||
The filesystem might make use of cached filesystem
|
||
information or use the disk device
|
||
driver to read this information from the disk.
|
||
It might even cause a network driver to exchange information with
|
||
a remote machine to find out details of remote files that this system has
|
||
access to (filesystems can be remotely mounted via the Networked File System or NFS).
|
||
Whichever way the information is located, <font face="helvetica">ls</font>
|
||
writes that information out and the video driver displays it on the
|
||
screen.
|
||
|
||
<p>
|
||
All of the above seems rather complicated but it shows that even
|
||
most simple commands reveal that
|
||
an operating system is in fact a co-operating set of functions that
|
||
together give you, the user, a coherent view of the system.
|
||
|
||
<p>
|
||
|
||
<H3><A NAME="tth_sEc2.2.1">2.2.1 </A> Memory management</H3>
|
||
|
||
<p>
|
||
With infinite resources, for example memory, many of the
|
||
things that an operating system has to do would be redundant.
|
||
One of the basic tricks of any operating system is the ability to make a small amount
|
||
of physical memory behave like rather more memory.
|
||
This apparently large memory is known as virtual memory.
|
||
The idea is that the software running in the system
|
||
is fooled into believing that it is running in a lot of memory.
|
||
The system divides the memory into easily handled pages and swaps these pages onto a
|
||
hard disk as the system runs.
|
||
The software does not notice because of another trick, multi-processing.
|
||
|
||
<p>
|
||
|
||
<H3><A NAME="tth_sEc2.2.2">2.2.2 </A> Processes</H3>
|
||
|
||
<p>
|
||
A process could be thought of as a program in action, each process is a separate entity that is
|
||
running a particular program.
|
||
If you look at the processes on your Linux system, you will see that
|
||
there are rather a lot. For example, typing <font face="helvetica">ps</font> shows the
|
||
following processes on my system:
|
||
|
||
<pre>
|
||
$ ps
|
||
PID TTY STAT TIME COMMAND
|
||
158 pRe 1 0:00 -bash
|
||
174 pRe 1 0:00 sh /usr/X11R6/bin/startx
|
||
175 pRe 1 0:00 xinit /usr/X11R6/lib/X11/xinit/xinitrc --
|
||
178 pRe 1 N 0:00 bowman
|
||
182 pRe 1 N 0:01 rxvt -geometry 120x35 -fg white -bg black
|
||
184 pRe 1 < 0:00 xclock -bg grey -geometry -1500-1500 -padding 0
|
||
185 pRe 1 < 0:00 xload -bg grey -geometry -0-0 -label xload
|
||
187 pp6 1 9:26 /bin/bash
|
||
202 pRe 1 N 0:00 rxvt -geometry 120x35 -fg white -bg black
|
||
203 ppc 2 0:00 /bin/bash
|
||
1796 pRe 1 N 0:00 rxvt -geometry 120x35 -fg white -bg black
|
||
1797 v06 1 0:00 /bin/bash
|
||
3056 pp6 3 < 0:02 emacs intro/introduction.tex
|
||
3270 pp6 3 0:00 ps
|
||
$
|
||
</pre>
|
||
<p>
|
||
If my system had many CPUs then each process could (theoretically at least)
|
||
run on a different CPU. Unfortunately, there is only one so again the
|
||
operating system resorts to trickery by running each process in turn
|
||
for a short period.
|
||
This period of time is known as a time-slice.
|
||
This trick is known as multi-processing or scheduling and it fools each
|
||
process into thinking that it is the only process.
|
||
Processes are protected from one another so that if one process crashes or malfunctions then
|
||
it will not affect any others.
|
||
The operating system achieves this by giving each process a separate address
|
||
space which only they have access to.
|
||
|
||
<p>
|
||
|
||
<H3><A NAME="tth_sEc2.2.3">2.2.3 </A> Device drivers</H3>
|
||
|
||
<p>
|
||
Device drivers make up the major part of the Linux kernel. Like other
|
||
parts of the operating system, they operate in a highly privileged
|
||
environment and can cause disaster if they get things wrong.
|
||
Device drivers control the interaction between the operating system
|
||
and the hardware device that they are controlling. For example,
|
||
the filesystem makes use of a general block device interface when writing
|
||
blocks to an IDE disk. The driver takes care of the details and makes
|
||
device specific things happen.
|
||
Device drivers are specific to the controller chip that they are driving
|
||
which is why, for example, you need the NCR810 SCSI driver if your system
|
||
has an NCR810 SCSI controller.
|
||
|
||
<p>
|
||
|
||
<H3><A NAME="tth_sEc2.2.4">2.2.4 </A> The Filesystems</H3>
|
||
|
||
<p>
|
||
In Linux, as it is for Unix <sup><font size=-4><tt>T</tt>M</font></sup> , the separate filesystems that the system may use are
|
||
not accessed by device identifiers (such as a drive number or a drive name) but instead
|
||
they are combined into a single hierarchical tree structure that represents the
|
||
filesystem as a single entity.
|
||
Linux adds each new filesystem into this single filesystem tree as they are mounted
|
||
onto a mount directory, for example <tt>/mnt/cdrom</tt>.
|
||
One of the most important features of Linux is its support for many different filesystems.
|
||
This makes it very flexible and well able to coexist with other operating systems.
|
||
The most popular filesystem for Linux is the <tt>EXT2</tt> filesystem and this is the
|
||
filesystem supported by most of the Linux distributions.
|
||
|
||
<p>
|
||
A filesystem gives the user a sensible view of files and directories held on the
|
||
hard disks of the system regardless of the filesystem type or the characteristics
|
||
of the underlying physical device.
|
||
Linux transparently supports many different filesystems
|
||
(for example <tt>MS-DOS</tt> and <tt>EXT2</tt>)
|
||
and presents all of the mounted files and filesystems as one integrated virtual
|
||
filesystem.
|
||
So, in general, users and processes do not need to know what sort of filesystem that any
|
||
file is part of, they just use them.
|
||
|
||
<p>
|
||
The block device drivers hide the differences between the physical block device
|
||
types (for example, <tt>IDE</tt> and <tt>SCSI</tt>) and, so far as each filesystem is
|
||
concerned, the physical devices are just linear collections of blocks of data.
|
||
The block sizes may vary between devices, for example 512 bytes is common for
|
||
floppy devices whereas 1024 bytes is common for IDE devices and, again, this
|
||
is hidden from the users of the system.
|
||
An <tt>EXT2</tt> filesystem looks the same no matter what device holds it.
|
||
|
||
<p>
|
||
|
||
<H2><A NAME="tth_sEc2.3">2.3 </A> Kernel Data Structures</H2>
|
||
The operating system must keep a lot of information about the current state of the
|
||
system.
|
||
As things happen within the system these data structures must be changed to reflect
|
||
the current reality.
|
||
For example, a new process might be created when a user logs onto the system.
|
||
The kernel must create a data structure representing the new process and
|
||
link it with the data structures representing all of the other processes in the system.
|
||
|
||
<p>
|
||
Mostly these data structures exist in physical memory and are accessible only by
|
||
the kernel and its subsystems.
|
||
Data structures contain data and pointers; addresses of other data structures or the
|
||
addresses of routines. Taken all together, the data structures used by the Linux
|
||
kernel can look very confusing.
|
||
Every data structure has a purpose and although some are used by several kernel
|
||
subsystems, they are more simple than they appear at first sight.
|
||
|
||
<p>
|
||
Understanding the Linux kernel hinges on understanding its data structures and the
|
||
use that the various functions within the Linux kernel makes of them.
|
||
This book bases its description of the Linux kernel on its data structures.
|
||
It talks about each kernel subsystem in terms of its algorithms, its methods of getting
|
||
things done, and their usage of the kernel's data structures.
|
||
|
||
<p>
|
||
|
||
<H3><A NAME="tth_sEc2.3.1">2.3.1 </A> Linked Lists</H3>
|
||
Linux uses a number of software engineering techniques to link together its data structures.
|
||
On a lot of occasions it uses <em>linked</em> or <em>chained</em> data structures.
|
||
If each data structure describes a single instance or occurance of something, for example a process
|
||
or a network device, the kernel must be able to find all of the instances.
|
||
In a linked list a root pointer contains the address of the first data structure, or <em>element</em>,
|
||
in the list and each data structure contains a pointer to the next element in the list.
|
||
The last element's next pointer would be 0 or NULL to show that it is the end of the list.
|
||
In a <em>doubly linked</em> list each element contains both a pointer to the next element in the
|
||
list but also a pointer to the previous element in the list.
|
||
Using doubly linked lists makes it easier to add or remove elements from the middle of list although
|
||
you do need more memory accesses.
|
||
This is a typical operating system trade off: memory accesses versus CPU cycles.
|
||
|
||
<p>
|
||
|
||
<H3><A NAME="tth_sEc2.3.2">2.3.2 </A> Hash Tables</H3>
|
||
Linked lists are handy ways of tying data structures together but navigating linked lists
|
||
can be inefficient.
|
||
If you were searching for a particular element, you might easily have to look at the whole
|
||
list before you find the one that you need.
|
||
Linux uses another technique, <em>hashing</em> to get around this restriction.
|
||
A <em>hash table</em> is an <em>array</em> or <em>vector</em> of pointers.
|
||
An array, or vector, is simply a set of things coming one after another in memory.
|
||
A bookshelf could be said to be an array of books.
|
||
Arrays are accessed by an <em>index</em>, the index is an offset into the array.
|
||
Taking the bookshelf analogy a little further, you could describe each book by its position
|
||
on the shelf; you might ask for the 5th book.
|
||
|
||
<p>
|
||
A hash table is an array of pointers to data structures and its index is derived from information
|
||
in those data structures.
|
||
If you had data structures describing the population of a village then you could use a person's
|
||
age as an index.
|
||
To find a particular person's data you could use their age as an index into the population hash
|
||
table and then follow the pointer to the data structure containing the person's details.
|
||
Unfortunately many people in the village are likely to have the same age and so the hash
|
||
table pointer becomes a pointer to a chain or list of data structures each describing people
|
||
of the same age.
|
||
However, searching these shorter chains is still faster than searching all of the data
|
||
structures.
|
||
|
||
<p>
|
||
As a hash table speeds up access to commonly used data structures, Linux often uses hash tables
|
||
to implement <em>caches</em>.
|
||
Caches are handy information that needs to be accessed quickly and are usually a subset
|
||
of the full set of information available.
|
||
Data structures are put into a cache and kept there because the kernel often
|
||
accesses them.
|
||
There is a drawback to caches in that they are more complex to use and maintain than simple
|
||
linked lists or hash tables.
|
||
If the data structure can be found in the cache (this is known as a <em>cache hit</em>,
|
||
then all well and good.
|
||
If it cannot then all of the relevant data structures must be searched and, if the
|
||
data structure exists at all, it must be added into the cache.
|
||
In adding new data structures into the cache an old cache entry may need discarding.
|
||
Linux must decide which one to discard, the danger being that the discarded data structure
|
||
may be the next one that Linux needs.
|
||
|
||
<p>
|
||
|
||
<H3><A NAME="tth_sEc2.3.3">2.3.3 </A> Abstract Interfaces</H3>
|
||
The Linux kernel often abstracts its interfaces.
|
||
An interface is a collection of routines and data structures which operate in a particular
|
||
way.
|
||
For example all network device drivers have to provide certain routines in which particular
|
||
data structures are operated on.
|
||
This way there can be generic layers of code using the services (interfaces) of lower layers
|
||
of specific code.
|
||
The network layer is generic and it is supported by device specific code that conforms to a
|
||
standard interface.
|
||
|
||
<p>
|
||
Often these lower layers <em>register</em> themselves with the upper layer at boot time.
|
||
This registration usually involves adding a data structure to a linked list.
|
||
For example each filesystem built into the kernel registers itself with the kernel
|
||
at boot time or, if you are using modules, when the filesystem is first used.
|
||
You can see which filesystems have registered themselves by looking at the file <tt>/proc/filesystems</tt>.
|
||
The registration data structure often includes pointers to functions.
|
||
These are the addresses of software functions that perform particular tasks.
|
||
Again, using filesystem registration as an example, the data structure that each filesystem passes
|
||
to the Linux kernel as it registers includes the address of a filesystem specfic routine which must be
|
||
called whenever that filesystem is mounted.
|
||
|
||
<p>
|
||
|
||
<p><hr><small>File translated from T<sub><font size=-1>E</font></sub>X by <a href="http://hutchinson.belmont.ma.us/tth/tth.html">T<sub><font size=-1>T</font></sub>H</a>, version 1.0.</small>
|
||
<hr>
|
||
<center>
|
||
<A HREF="../basics/sw.html"> Top of Chapter</A>,
|
||
<A HREF="../tlk-toc.html"> Table of Contents</A>,
|
||
<A href="../tlk.html" target="_top"> Show Frames</A>,
|
||
<A href="../basics/sw.html" target="_top"> No Frames</A><br>
|
||
<EFBFBD> 1996-1999 David A Rusling <A HREF="../misc/copyright.html">copyright notice</a>.
|
||
</center>
|
||
</HTML> |