450 lines
20 KiB
HTML
450 lines
20 KiB
HTML
|
<HTML>
|
|||
|
<center>
|
|||
|
<A HREF="../tlk-toc.html"> Table of Contents</A>,
|
|||
|
<A href="../tlk.html" target="_top"> Show Frames</A>,
|
|||
|
<A href="../basics/sw.html" target="_top"> No Frames</A>
|
|||
|
</center>
|
|||
|
<hr>
|
|||
|
<META NAME="TtH" CONTENT="1.03">
|
|||
|
|
|||
|
<p>
|
|||
|
<H1><A NAME="tth_chAp2">Chapter 2 <br>Software Basics</H1>
|
|||
|
<A NAME="sw-basics-chapter"></A>
|
|||
|
<p>
|
|||
|
<img src="../logos/sit3-bw-tran.1.gif"><br> <tt><b></tt></b>
|
|||
|
A program is a set of computer instructions that perform a particular task.
|
|||
|
That program can be written in assembler, a very low level computer language,
|
|||
|
or in a high level, machine independent language such as the C programming
|
|||
|
language.
|
|||
|
An operating system is a special program which allows the user to run applications such as
|
|||
|
spreadsheets and word processors.
|
|||
|
This chapter introduces basic programming principles and gives an overview of
|
|||
|
the aims and functions of an operating system.
|
|||
|
|
|||
|
<p>
|
|||
|
|
|||
|
<H2><A NAME="tth_sEc2.1">2.1 </A> Computer Languages</H2>
|
|||
|
|
|||
|
<H3><A NAME="tth_sEc2.1.1">2.1.1 </A> Assembly Languages</H3>
|
|||
|
|
|||
|
<p>
|
|||
|
The instructions that a CPU fetches from memory and executes are not at all understandable
|
|||
|
to human beings.
|
|||
|
They are machine codes which tell the computer precisely what to do.
|
|||
|
The hexadecimal number <em>0x89E5</em> is an Intel 80486 instruction which copies the contents
|
|||
|
of the ESP register to the EBP register.
|
|||
|
One of the first software tools invented for the earliest computers was an assembler,
|
|||
|
a program which takes a human readable source file and assembles it into
|
|||
|
machine code.
|
|||
|
Assembly languages explicitly handle registers and operations on
|
|||
|
data and they are specific to a particular microprocessor.
|
|||
|
The assembly language for an Intel X86 microprocessor is very different
|
|||
|
to the assembly language for an Alpha AXP microprocessor.
|
|||
|
The following Alpha AXP assembly code shows the sort of operations that a
|
|||
|
program can perform:
|
|||
|
|
|||
|
<pre>
|
|||
|
ldr r16, (r15) ; Line 1
|
|||
|
ldr r17, 4(r15) ; Line 2
|
|||
|
beq r16,r17,100 ; Line 3
|
|||
|
str r17, (r15) ; Line 4
|
|||
|
100: ; Line 5
|
|||
|
|
|||
|
</pre>
|
|||
|
<p>
|
|||
|
The first statement (on line 1) loads register 16 from the
|
|||
|
address held in register 15.
|
|||
|
The next instruction loads register 17 from the next location in
|
|||
|
memory.
|
|||
|
Line 3 compares the contents of register 16 with that of register 17
|
|||
|
and, if they are equal, branches to label <em>100</em>.
|
|||
|
If the registers do not contain the same value then the program
|
|||
|
continues to line 4 where the contents of r17 are saved into memory.
|
|||
|
If the registers do contain the same value then no data needs to
|
|||
|
be saved.
|
|||
|
Assembly level programs are tedious and tricky to write and prone to
|
|||
|
errors.
|
|||
|
Very little of the Linux kernel is written in assembly language and
|
|||
|
those parts that are are written only for efficiency and they are
|
|||
|
specific to particular microprocessors.
|
|||
|
|
|||
|
<p>
|
|||
|
|
|||
|
<H3><A NAME="tth_sEc2.1.2">2.1.2 </A> The C Programming Language and Compiler</H3>
|
|||
|
|
|||
|
<p>
|
|||
|
Writing large programs in assembly language is a difficult and time
|
|||
|
consuming task.
|
|||
|
It is prone to error and the resulting program is not portable, being
|
|||
|
tied to one particular processor family.
|
|||
|
It is far better to use a machine independent language
|
|||
|
like C.
|
|||
|
C allows you to describe programs in terms of their logical algorithms and the data
|
|||
|
that they operate on.
|
|||
|
Special programs called compilers read the C program and translate
|
|||
|
it into assembly language, generating machine specific code from it.
|
|||
|
A good compiler can generate assembly instructions that are very
|
|||
|
nearly as efficient as those written by a good assembly programmer.
|
|||
|
Most of the Linux kernel is written in the C language.
|
|||
|
The following C fragment:
|
|||
|
|
|||
|
<pre>
|
|||
|
if (x != y)
|
|||
|
x = y ;
|
|||
|
</pre>
|
|||
|
<p>
|
|||
|
performs exactly the same operations as the previous example assembly code.
|
|||
|
If the contents of the variable <tt>x</tt> are not the same as the contents
|
|||
|
of variable <tt>y</tt> then the contents of <tt>y</tt> will be copied to <tt>x</tt>.
|
|||
|
C code is organized into routines, each of which perform a task.
|
|||
|
Routines may return any value or data type supported by C.
|
|||
|
Large programs like the Linux kernel comprise many separate C source
|
|||
|
modules each with its own routines and data structures.
|
|||
|
These C source code modules group together logical functions such as
|
|||
|
filesystem handling code.
|
|||
|
|
|||
|
<p>
|
|||
|
C supports many types of variables, a variable is a location in memory which
|
|||
|
can be referenced by a symbolic name.
|
|||
|
In the above C fragment <tt>x</tt> and <tt>y</tt> refer to locations in memory.
|
|||
|
The programmer does not care where in memory the variables are put, it
|
|||
|
is the linker (see below) that has to worry about that.
|
|||
|
Some variables contain different sorts of data,
|
|||
|
integer and floating point and others are pointers.
|
|||
|
|
|||
|
<p>
|
|||
|
Pointers are variables that contain the address, the location in memory
|
|||
|
of other data.
|
|||
|
Consider a variable called <em>x</em>, it might live in memory at address <em>0x80010000</em>.
|
|||
|
You could have a pointer, called <em>px</em>, which points at <em>x</em>.
|
|||
|
<em>px</em> might live at address <em>0x80010030</em>.
|
|||
|
The value of <em>px</em> would be <em>0x80010000</em>: the address of the variable <em>x</em>.
|
|||
|
|
|||
|
<p>
|
|||
|
C allows you to bundle together related variables into data structures.
|
|||
|
For example,
|
|||
|
|
|||
|
<pre>
|
|||
|
struct {
|
|||
|
int i ;
|
|||
|
char b ;
|
|||
|
} my_struct ;
|
|||
|
</pre>
|
|||
|
<p>
|
|||
|
is a data structure called <tt>my_struct</tt> which contains two elements,
|
|||
|
an integer (32 bits of data storage) called <tt>i</tt> and a character (8 bits
|
|||
|
of data) called <tt>b</tt>.
|
|||
|
|
|||
|
<p>
|
|||
|
|
|||
|
<H3><A NAME="tth_sEc2.1.3">2.1.3 </A> Linkers</H3>
|
|||
|
|
|||
|
<p>
|
|||
|
Linkers are programs that link together several object modules and libraries
|
|||
|
to form a single, coherent, program.
|
|||
|
Object modules are the machine code output from an assembler or compiler and contain
|
|||
|
executable machine code and data together with information that allows the linker
|
|||
|
to combine the modules together to form a program.
|
|||
|
For example one module might contain all of a program's database functions and
|
|||
|
another module its command line argument handling functions.
|
|||
|
Linkers fix up references between these object modules, where a routine or data
|
|||
|
structure referenced in one module actually exists in another module.
|
|||
|
The Linux kernel is a single, large program linked together from its
|
|||
|
many constituent object modules.
|
|||
|
|
|||
|
<p>
|
|||
|
|
|||
|
<H2><A NAME="tth_sEc2.2">2.2 </A> What is an Operating System?</H2>
|
|||
|
|
|||
|
<p>
|
|||
|
Without software a computer is just a pile of electronics that gives off heat.
|
|||
|
If the hardware is the heart of a computer then the software is its soul.
|
|||
|
An operating system is a collection of system programs which allow the user to
|
|||
|
run application software.
|
|||
|
The operating system abstracts the real hardware of the system and presents
|
|||
|
the system's users and its applications with a virtual machine.
|
|||
|
In a very real sense the software provides
|
|||
|
the character of the system. Most PCs can run one or more operating
|
|||
|
systems and each one can have a very different look and feel.
|
|||
|
Linux is made up of a number of functionally
|
|||
|
separate pieces that, together, comprise the operating system. One
|
|||
|
obvious part of Linux is the kernel itself; but even that would be useless
|
|||
|
without libraries or shells.
|
|||
|
|
|||
|
<p>
|
|||
|
In order to start understanding what an operating system is,
|
|||
|
consider what happens when you type an apparently simple command:
|
|||
|
|
|||
|
<pre>
|
|||
|
|
|||
|
$ ls
|
|||
|
Mail c images perl
|
|||
|
docs tcl
|
|||
|
$
|
|||
|
|
|||
|
</pre>
|
|||
|
<p>
|
|||
|
The $ is a prompt put out by a login shell (in this case <tt>bash</tt>).
|
|||
|
This means that it is waiting for you, the user, to type some command. Typing
|
|||
|
<font face="helvetica">ls</font> causes the keyboard driver to recognize that characters have
|
|||
|
been typed. The keyboard driver
|
|||
|
passes them to the shell which processes that command
|
|||
|
by looking for an executable image of the same name.
|
|||
|
It finds that image,
|
|||
|
in <tt>/bin/ls</tt>. Kernel services are called to pull the <font face="helvetica">ls</font>
|
|||
|
executable image into virtual memory and start executing it. The <font face="helvetica">ls</font>
|
|||
|
image makes calls to the file subsystem of the kernel to find out what files
|
|||
|
are available.
|
|||
|
The filesystem might make use of cached filesystem
|
|||
|
information or use the disk device
|
|||
|
driver to read this information from the disk.
|
|||
|
It might even cause a network driver to exchange information with
|
|||
|
a remote machine to find out details of remote files that this system has
|
|||
|
access to (filesystems can be remotely mounted via the Networked File System or NFS).
|
|||
|
Whichever way the information is located, <font face="helvetica">ls</font>
|
|||
|
writes that information out and the video driver displays it on the
|
|||
|
screen.
|
|||
|
|
|||
|
<p>
|
|||
|
All of the above seems rather complicated but it shows that even
|
|||
|
most simple commands reveal that
|
|||
|
an operating system is in fact a co-operating set of functions that
|
|||
|
together give you, the user, a coherent view of the system.
|
|||
|
|
|||
|
<p>
|
|||
|
|
|||
|
<H3><A NAME="tth_sEc2.2.1">2.2.1 </A> Memory management</H3>
|
|||
|
|
|||
|
<p>
|
|||
|
With infinite resources, for example memory, many of the
|
|||
|
things that an operating system has to do would be redundant.
|
|||
|
One of the basic tricks of any operating system is the ability to make a small amount
|
|||
|
of physical memory behave like rather more memory.
|
|||
|
This apparently large memory is known as virtual memory.
|
|||
|
The idea is that the software running in the system
|
|||
|
is fooled into believing that it is running in a lot of memory.
|
|||
|
The system divides the memory into easily handled pages and swaps these pages onto a
|
|||
|
hard disk as the system runs.
|
|||
|
The software does not notice because of another trick, multi-processing.
|
|||
|
|
|||
|
<p>
|
|||
|
|
|||
|
<H3><A NAME="tth_sEc2.2.2">2.2.2 </A> Processes</H3>
|
|||
|
|
|||
|
<p>
|
|||
|
A process could be thought of as a program in action, each process is a separate entity that is
|
|||
|
running a particular program.
|
|||
|
If you look at the processes on your Linux system, you will see that
|
|||
|
there are rather a lot. For example, typing <font face="helvetica">ps</font> shows the
|
|||
|
following processes on my system:
|
|||
|
|
|||
|
<pre>
|
|||
|
$ ps
|
|||
|
PID TTY STAT TIME COMMAND
|
|||
|
158 pRe 1 0:00 -bash
|
|||
|
174 pRe 1 0:00 sh /usr/X11R6/bin/startx
|
|||
|
175 pRe 1 0:00 xinit /usr/X11R6/lib/X11/xinit/xinitrc --
|
|||
|
178 pRe 1 N 0:00 bowman
|
|||
|
182 pRe 1 N 0:01 rxvt -geometry 120x35 -fg white -bg black
|
|||
|
184 pRe 1 < 0:00 xclock -bg grey -geometry -1500-1500 -padding 0
|
|||
|
185 pRe 1 < 0:00 xload -bg grey -geometry -0-0 -label xload
|
|||
|
187 pp6 1 9:26 /bin/bash
|
|||
|
202 pRe 1 N 0:00 rxvt -geometry 120x35 -fg white -bg black
|
|||
|
203 ppc 2 0:00 /bin/bash
|
|||
|
1796 pRe 1 N 0:00 rxvt -geometry 120x35 -fg white -bg black
|
|||
|
1797 v06 1 0:00 /bin/bash
|
|||
|
3056 pp6 3 < 0:02 emacs intro/introduction.tex
|
|||
|
3270 pp6 3 0:00 ps
|
|||
|
$
|
|||
|
</pre>
|
|||
|
<p>
|
|||
|
If my system had many CPUs then each process could (theoretically at least)
|
|||
|
run on a different CPU. Unfortunately, there is only one so again the
|
|||
|
operating system resorts to trickery by running each process in turn
|
|||
|
for a short period.
|
|||
|
This period of time is known as a time-slice.
|
|||
|
This trick is known as multi-processing or scheduling and it fools each
|
|||
|
process into thinking that it is the only process.
|
|||
|
Processes are protected from one another so that if one process crashes or malfunctions then
|
|||
|
it will not affect any others.
|
|||
|
The operating system achieves this by giving each process a separate address
|
|||
|
space which only they have access to.
|
|||
|
|
|||
|
<p>
|
|||
|
|
|||
|
<H3><A NAME="tth_sEc2.2.3">2.2.3 </A> Device drivers</H3>
|
|||
|
|
|||
|
<p>
|
|||
|
Device drivers make up the major part of the Linux kernel. Like other
|
|||
|
parts of the operating system, they operate in a highly privileged
|
|||
|
environment and can cause disaster if they get things wrong.
|
|||
|
Device drivers control the interaction between the operating system
|
|||
|
and the hardware device that they are controlling. For example,
|
|||
|
the filesystem makes use of a general block device interface when writing
|
|||
|
blocks to an IDE disk. The driver takes care of the details and makes
|
|||
|
device specific things happen.
|
|||
|
Device drivers are specific to the controller chip that they are driving
|
|||
|
which is why, for example, you need the NCR810 SCSI driver if your system
|
|||
|
has an NCR810 SCSI controller.
|
|||
|
|
|||
|
<p>
|
|||
|
|
|||
|
<H3><A NAME="tth_sEc2.2.4">2.2.4 </A> The Filesystems</H3>
|
|||
|
|
|||
|
<p>
|
|||
|
In Linux, as it is for Unix <sup><font size=-4><tt>T</tt>M</font></sup> , the separate filesystems that the system may use are
|
|||
|
not accessed by device identifiers (such as a drive number or a drive name) but instead
|
|||
|
they are combined into a single hierarchical tree structure that represents the
|
|||
|
filesystem as a single entity.
|
|||
|
Linux adds each new filesystem into this single filesystem tree as they are mounted
|
|||
|
onto a mount directory, for example <tt>/mnt/cdrom</tt>.
|
|||
|
One of the most important features of Linux is its support for many different filesystems.
|
|||
|
This makes it very flexible and well able to coexist with other operating systems.
|
|||
|
The most popular filesystem for Linux is the <tt>EXT2</tt> filesystem and this is the
|
|||
|
filesystem supported by most of the Linux distributions.
|
|||
|
|
|||
|
<p>
|
|||
|
A filesystem gives the user a sensible view of files and directories held on the
|
|||
|
hard disks of the system regardless of the filesystem type or the characteristics
|
|||
|
of the underlying physical device.
|
|||
|
Linux transparently supports many different filesystems
|
|||
|
(for example <tt>MS-DOS</tt> and <tt>EXT2</tt>)
|
|||
|
and presents all of the mounted files and filesystems as one integrated virtual
|
|||
|
filesystem.
|
|||
|
So, in general, users and processes do not need to know what sort of filesystem that any
|
|||
|
file is part of, they just use them.
|
|||
|
|
|||
|
<p>
|
|||
|
The block device drivers hide the differences between the physical block device
|
|||
|
types (for example, <tt>IDE</tt> and <tt>SCSI</tt>) and, so far as each filesystem is
|
|||
|
concerned, the physical devices are just linear collections of blocks of data.
|
|||
|
The block sizes may vary between devices, for example 512 bytes is common for
|
|||
|
floppy devices whereas 1024 bytes is common for IDE devices and, again, this
|
|||
|
is hidden from the users of the system.
|
|||
|
An <tt>EXT2</tt> filesystem looks the same no matter what device holds it.
|
|||
|
|
|||
|
<p>
|
|||
|
|
|||
|
<H2><A NAME="tth_sEc2.3">2.3 </A> Kernel Data Structures</H2>
|
|||
|
The operating system must keep a lot of information about the current state of the
|
|||
|
system.
|
|||
|
As things happen within the system these data structures must be changed to reflect
|
|||
|
the current reality.
|
|||
|
For example, a new process might be created when a user logs onto the system.
|
|||
|
The kernel must create a data structure representing the new process and
|
|||
|
link it with the data structures representing all of the other processes in the system.
|
|||
|
|
|||
|
<p>
|
|||
|
Mostly these data structures exist in physical memory and are accessible only by
|
|||
|
the kernel and its subsystems.
|
|||
|
Data structures contain data and pointers; addresses of other data structures or the
|
|||
|
addresses of routines. Taken all together, the data structures used by the Linux
|
|||
|
kernel can look very confusing.
|
|||
|
Every data structure has a purpose and although some are used by several kernel
|
|||
|
subsystems, they are more simple than they appear at first sight.
|
|||
|
|
|||
|
<p>
|
|||
|
Understanding the Linux kernel hinges on understanding its data structures and the
|
|||
|
use that the various functions within the Linux kernel makes of them.
|
|||
|
This book bases its description of the Linux kernel on its data structures.
|
|||
|
It talks about each kernel subsystem in terms of its algorithms, its methods of getting
|
|||
|
things done, and their usage of the kernel's data structures.
|
|||
|
|
|||
|
<p>
|
|||
|
|
|||
|
<H3><A NAME="tth_sEc2.3.1">2.3.1 </A> Linked Lists</H3>
|
|||
|
Linux uses a number of software engineering techniques to link together its data structures.
|
|||
|
On a lot of occasions it uses <em>linked</em> or <em>chained</em> data structures.
|
|||
|
If each data structure describes a single instance or occurance of something, for example a process
|
|||
|
or a network device, the kernel must be able to find all of the instances.
|
|||
|
In a linked list a root pointer contains the address of the first data structure, or <em>element</em>,
|
|||
|
in the list and each data structure contains a pointer to the next element in the list.
|
|||
|
The last element's next pointer would be 0 or NULL to show that it is the end of the list.
|
|||
|
In a <em>doubly linked</em> list each element contains both a pointer to the next element in the
|
|||
|
list but also a pointer to the previous element in the list.
|
|||
|
Using doubly linked lists makes it easier to add or remove elements from the middle of list although
|
|||
|
you do need more memory accesses.
|
|||
|
This is a typical operating system trade off: memory accesses versus CPU cycles.
|
|||
|
|
|||
|
<p>
|
|||
|
|
|||
|
<H3><A NAME="tth_sEc2.3.2">2.3.2 </A> Hash Tables</H3>
|
|||
|
Linked lists are handy ways of tying data structures together but navigating linked lists
|
|||
|
can be inefficient.
|
|||
|
If you were searching for a particular element, you might easily have to look at the whole
|
|||
|
list before you find the one that you need.
|
|||
|
Linux uses another technique, <em>hashing</em> to get around this restriction.
|
|||
|
A <em>hash table</em> is an <em>array</em> or <em>vector</em> of pointers.
|
|||
|
An array, or vector, is simply a set of things coming one after another in memory.
|
|||
|
A bookshelf could be said to be an array of books.
|
|||
|
Arrays are accessed by an <em>index</em>, the index is an offset into the array.
|
|||
|
Taking the bookshelf analogy a little further, you could describe each book by its position
|
|||
|
on the shelf; you might ask for the 5th book.
|
|||
|
|
|||
|
<p>
|
|||
|
A hash table is an array of pointers to data structures and its index is derived from information
|
|||
|
in those data structures.
|
|||
|
If you had data structures describing the population of a village then you could use a person's
|
|||
|
age as an index.
|
|||
|
To find a particular person's data you could use their age as an index into the population hash
|
|||
|
table and then follow the pointer to the data structure containing the person's details.
|
|||
|
Unfortunately many people in the village are likely to have the same age and so the hash
|
|||
|
table pointer becomes a pointer to a chain or list of data structures each describing people
|
|||
|
of the same age.
|
|||
|
However, searching these shorter chains is still faster than searching all of the data
|
|||
|
structures.
|
|||
|
|
|||
|
<p>
|
|||
|
As a hash table speeds up access to commonly used data structures, Linux often uses hash tables
|
|||
|
to implement <em>caches</em>.
|
|||
|
Caches are handy information that needs to be accessed quickly and are usually a subset
|
|||
|
of the full set of information available.
|
|||
|
Data structures are put into a cache and kept there because the kernel often
|
|||
|
accesses them.
|
|||
|
There is a drawback to caches in that they are more complex to use and maintain than simple
|
|||
|
linked lists or hash tables.
|
|||
|
If the data structure can be found in the cache (this is known as a <em>cache hit</em>,
|
|||
|
then all well and good.
|
|||
|
If it cannot then all of the relevant data structures must be searched and, if the
|
|||
|
data structure exists at all, it must be added into the cache.
|
|||
|
In adding new data structures into the cache an old cache entry may need discarding.
|
|||
|
Linux must decide which one to discard, the danger being that the discarded data structure
|
|||
|
may be the next one that Linux needs.
|
|||
|
|
|||
|
<p>
|
|||
|
|
|||
|
<H3><A NAME="tth_sEc2.3.3">2.3.3 </A> Abstract Interfaces</H3>
|
|||
|
The Linux kernel often abstracts its interfaces.
|
|||
|
An interface is a collection of routines and data structures which operate in a particular
|
|||
|
way.
|
|||
|
For example all network device drivers have to provide certain routines in which particular
|
|||
|
data structures are operated on.
|
|||
|
This way there can be generic layers of code using the services (interfaces) of lower layers
|
|||
|
of specific code.
|
|||
|
The network layer is generic and it is supported by device specific code that conforms to a
|
|||
|
standard interface.
|
|||
|
|
|||
|
<p>
|
|||
|
Often these lower layers <em>register</em> themselves with the upper layer at boot time.
|
|||
|
This registration usually involves adding a data structure to a linked list.
|
|||
|
For example each filesystem built into the kernel registers itself with the kernel
|
|||
|
at boot time or, if you are using modules, when the filesystem is first used.
|
|||
|
You can see which filesystems have registered themselves by looking at the file <tt>/proc/filesystems</tt>.
|
|||
|
The registration data structure often includes pointers to functions.
|
|||
|
These are the addresses of software functions that perform particular tasks.
|
|||
|
Again, using filesystem registration as an example, the data structure that each filesystem passes
|
|||
|
to the Linux kernel as it registers includes the address of a filesystem specfic routine which must be
|
|||
|
called whenever that filesystem is mounted.
|
|||
|
|
|||
|
<p>
|
|||
|
|
|||
|
<p><hr><small>File translated from T<sub><font size=-1>E</font></sub>X by <a href="http://hutchinson.belmont.ma.us/tth/tth.html">T<sub><font size=-1>T</font></sub>H</a>, version 1.0.</small>
|
|||
|
<hr>
|
|||
|
<center>
|
|||
|
<A HREF="../basics/sw.html"> Top of Chapter</A>,
|
|||
|
<A HREF="../tlk-toc.html"> Table of Contents</A>,
|
|||
|
<A href="../tlk.html" target="_top"> Show Frames</A>,
|
|||
|
<A href="../basics/sw.html" target="_top"> No Frames</A><br>
|
|||
|
<EFBFBD> 1996-1999 David A Rusling <A HREF="../misc/copyright.html">copyright notice</a>.
|
|||
|
</center>
|
|||
|
</HTML>
|