398 lines
17 KiB
HTML
398 lines
17 KiB
HTML
<HTML>
|
||
<center>
|
||
<A HREF="../tlk-toc.html"> Table of Contents</A>,
|
||
<A href="../tlk.html" target="_top"> Show Frames</A>,
|
||
<A href="../kernel/kernel.html" target="_top"> No Frames</A>
|
||
</center>
|
||
<hr>
|
||
<META NAME="TtH" CONTENT="1.03">
|
||
|
||
<p>
|
||
<H1><A NAME="tth_chAp11">Chapter 11 <br>Kernel Mechanisms</H1>
|
||
<A NAME="kernel-chapter"></A>
|
||
<p>
|
||
<img src="../logos/sit3-bw-tran.1.gif"><br> <tt><b></tt></b> This chapter describes some of the general tasks
|
||
and mechanisms that the Linux kernel needs to supply so that other
|
||
parts of the kernel work effectively together.
|
||
|
||
<p>
|
||
|
||
<H2><A NAME="tth_sEc11.1">11.1 </A> Bottom Half Handling</H2>
|
||
<A NAME="kernel-bh-section"></A>
|
||
<p>
|
||
|
||
<p><A NAME="tth_fIg11.1"></A>
|
||
<center><center> <img src="bh.gif"><br>
|
||
<p>
|
||
</center></center><center> Figure 11.1: Bottom Half Handling Data Structures</center>
|
||
<A NAME="bh-figure"></A>
|
||
<p>
|
||
<p>There are often times in a kernel when you do not want to do work at this moment.
|
||
A good example of this is during interrupt processing.
|
||
When the interrupt was asserted, the processor stopped what it was doing
|
||
and the operating system delivered the interrupt to the appropriate
|
||
device driver.
|
||
Device drivers should not spend too much time handling interrupts as, during
|
||
this time, nothing else in the system can run.
|
||
There is often some work that could just as well be done later on.
|
||
Linux's bottom half handlers were invented so that device drivers and other
|
||
parts of the Linux kernel could queue work to be done later on.
|
||
Figure <A href="#bh-figure"
|
||
> 11.1</A> shows the kernel data structures associated with bottom
|
||
half handling.
|
||
|
||
<p>
|
||
There can be up to 32 different bottom half handlers; <tt>bh_base</tt> is a vector
|
||
of pointers to each of the kernel's bottom half handling routines.
|
||
<tt>bh_active</tt> and <tt>bh_mask</tt>
|
||
have their bits set according to what handlers have been installed and are active.
|
||
If bit N of <tt>bh_mask</tt> is set then the Nth element of
|
||
<tt>bh_base</tt> contains the address of a bottom half routine.
|
||
If bit N of <tt>bh_active</tt> is set then the N'th bottom half handler routine
|
||
should be called as soon as the scheduler deems reasonable.
|
||
These indices are statically defined; the timer bottom half handler is the highest
|
||
priority (index 0), the console bottom half handler is next in priority (index 1) and
|
||
so on.
|
||
Typically the bottom half handling routines have lists of tasks associated with
|
||
them.
|
||
For example, the <em>immediate</em> bottom half handler works its way through the immediate
|
||
tasks queue (<tt>tq_immediate</tt>) which contains tasks that need to be performed
|
||
immediately.
|
||
|
||
<p>
|
||
Some of the kernel's bottom half handers are device specific, but others are more
|
||
generic:
|
||
|
||
<DL compact>
|
||
<p>
|
||
<dt><b>TIMER</b></dt><dd> This handler is marked as active each time the system's periodic
|
||
timer interrupts and is used to drive the kernel's timer queue mechanisms,
|
||
<dt><b>CONSOLE</b></dt><dd> This handler is used to process console messages,
|
||
<dt><b>TQUEUE</b></dt><dd> This handler is used to process <tt>tty</tt> messages,
|
||
<dt><b>NET</b></dt><dd> This handler handles general network processing,
|
||
<dt><b>IMMEDIATE</b></dt><dd> This is a generic handler used by several device drivers
|
||
to queue work to be done later.
|
||
</DL>
|
||
<p>
|
||
Whenever a device driver, or some other part of the kernel, needs to schedule work
|
||
to be done later, it adds work to the appropriate system queue, for example the timer
|
||
queue, and then signals the kernel that some bottom half handling needs to be
|
||
done.
|
||
It does this by setting the appropriate bit in <tt>bh_active</tt>.
|
||
Bit 8 is set if the driver has queued something on the immediate queue and wishes the
|
||
immediate bottom half handler to run and process it.
|
||
The <tt>bh_active</tt> bitmask is checked at the end of each system call, just before
|
||
control is returned to the calling process.
|
||
If it has any bits set, the bottom half handler routines that are active are called.
|
||
Bit 0 is checked first, then 1 and so on until bit 31.
|
||
|
||
<p>
|
||
The bit in <tt>bh_active</tt> is cleared as each bottom half handling routine
|
||
is called.
|
||
<tt>bh_active</tt> is transient; it only has meaning between calls to the scheduler and
|
||
is a way of not calling bottom half handling routines when there is no work for them to
|
||
do.
|
||
|
||
<p>
|
||
|
||
<H2><A NAME="tth_sEc11.2">11.2 </A> Task Queues</H2>
|
||
<A NAME="kernel-task-queue-section"></A>
|
||
<p>
|
||
|
||
<p><A NAME="tth_fIg11.2"></A>
|
||
<center><center> <img src="tqueue.gif"><br>
|
||
<p>
|
||
</center></center><center> Figure 11.2: A Task Queue</center>
|
||
<A NAME="tq-figure"></A>
|
||
<p>
|
||
<p>Task queues are the kernel's way of deferring work until later.
|
||
Linux has a generic mechanism for queuing work on queues and for processing
|
||
them later.
|
||
|
||
<p>
|
||
Task queues are often used in conjunction with bottom half handlers; the timer task queue
|
||
is processed when the timer queue bottom half handler runs.
|
||
A task queue is a simple data structure, see figure <A href="#tq-figure"
|
||
> 11.2</A> which consists
|
||
of a singly linked list of <tt>tq_struct</tt> data structures each of which contains
|
||
the address of a routine and a pointer to some data.
|
||
|
||
<p>
|
||
The routine will be called when the element on the task queue is processed and it will
|
||
be passed a pointer to the data.
|
||
|
||
<p>
|
||
Anything in the kernel, for example a device driver, can create and use task
|
||
queues but there are three task queues created and managed by the kernel:
|
||
|
||
<DL compact>
|
||
<dt><b>timer</b></dt><dd> This queue is used to queue work that will be done as soon after
|
||
the next system clock tick as is possible. Each clock tick, this queue
|
||
is checked to see if it contains any entries and, if it does, the timer
|
||
queue bottom half handler is made active. The timer queue bottom half
|
||
handler is processed, along with all the other bottom half handlers,
|
||
when the scheduler next runs.
|
||
This queue should not be confused with system timers, which are a much
|
||
more sophisticated mechanism.
|
||
<dt><b>immediate</b></dt><dd> This queue is also processed when the scheduler processes the
|
||
active bottom half handlers. The immediate bottom half handler is not
|
||
as high in priority as the timer queue bottom half handler and so these
|
||
tasks will be run later.
|
||
<dt><b>scheduler</b></dt><dd> This task queue is processed directly by the scheduler.
|
||
It is used to support other task queues in the system and, in this case,
|
||
the task to be run will be a routine that processes a task queue, say
|
||
for a device driver.
|
||
</DL>
|
||
<p>
|
||
When task queues are processed, the pointer to the first element in the queue is
|
||
removed from the queue and replaced with a null pointer.
|
||
In fact, this removal is an atomic operation, one that cannot be interrupted.
|
||
Then each element in the queue has its handling routine called in turn.
|
||
The elements in the queue are often statically allocated data.
|
||
However there is no inherent mechanism for discarding allocated memory.
|
||
The task queue processing routine simply moves onto the next element in the list.
|
||
It is the job of the task itself to ensure that it properly cleans up any allocated
|
||
kernel memory.
|
||
|
||
<p>
|
||
|
||
<H2><A NAME="tth_sEc11.3">11.3 </A> Timers</H2>
|
||
<A NAME="kernel-timers-section"></A>
|
||
<p>
|
||
|
||
<p><A NAME="tth_fIg11.3"></A>
|
||
<center><center> <img src="timers.gif"><br>
|
||
<p>
|
||
</center></center><center> Figure 11.3: System Timers</center>
|
||
<A NAME="timer-figure"></A>
|
||
<p>
|
||
<p>An operating system needs to be able to schedule an activity sometime in the
|
||
future.
|
||
A mechanism is needed whereby activities can be scheduled to run at some
|
||
relatively precise time.
|
||
Any microprocessor that wishes to support an operating system must have a programmable
|
||
interval timer that periodically interrupts the processor.
|
||
This periodic interrupt is known as a system clock tick and it acts like a
|
||
metronome, orchestrating the system's activities.
|
||
|
||
<p>
|
||
Linux has a very simple view of what time it is; it measures time in clock ticks
|
||
since the system booted. All system times are based on this measurement, which is
|
||
known as <tt>jiffies</tt> after the globally available variable of the same name.
|
||
|
||
<p>
|
||
Linux has two types of system timers, both queue routines to be called at some
|
||
system time but they are slightly different in their implementations.
|
||
Figure <A href="#timer-figure"
|
||
> 11.3</A> shows both mechanisms.
|
||
|
||
<p>
|
||
The first, the old timer mechanism, has a static array of 32 pointers to
|
||
<tt>timer_struct</tt> data structures and a mask of active timers, <tt>timer_active</tt>.
|
||
|
||
<p>
|
||
Where the timers go in the timer table is statically defined (rather like the bottom
|
||
half handler table <tt>bh_base</tt>).
|
||
Entries are added into this table mostly at system initialization time.
|
||
The second, newer, mechanism uses a linked list of <tt>timer_list</tt> data structures
|
||
held in ascending expiry time order.
|
||
|
||
<p>
|
||
Both methods use the time in <tt>jiffies</tt> as an expiry time so that a timer that
|
||
wished to run in 5s would have to convert 5s to units of <tt>jiffies</tt> and add that
|
||
to the current system time to get the system time in <tt>jiffies</tt> when the timer
|
||
should expire.
|
||
Every system clock tick the timer bottom half handler is marked as active so that
|
||
the when the scheduler next runs, the timer queues will be processed.
|
||
The timer bottom half handler
|
||
processes
|
||
both types of system timer.
|
||
For the old system timers the <tt>timer_active</tt> bit mask is check for bits that are
|
||
set.
|
||
|
||
<p>
|
||
If the expiry time for an active timer has expired (expiry time is less than
|
||
the current system <tt>jiffies</tt>), its timer routine is called
|
||
and its active bit is cleared.
|
||
For new system timers, the entries in the linked list of <tt>timer_list</tt> data
|
||
structures are checked.
|
||
|
||
<p>
|
||
|
||
Every expired timer is removed from the list and its routine is called.
|
||
The new timer mechanism has the advantage of being able to pass an argument
|
||
to the timer routine.
|
||
|
||
<p>
|
||
|
||
<H2><A NAME="tth_sEc11.4">11.4 </A> Wait Queues</H2>
|
||
<A NAME="wait-queues-section"></A>
|
||
<p>
|
||
There are many times when a process must wait for a system resource.
|
||
For example a process may need the VFS inode describing a directory in the file system
|
||
and that inode may not be in the buffer cache.
|
||
In this case the process must wait for that inode to be fetched from the physical
|
||
media containing the file system before it can carry on.
|
||
|
||
<p>
|
||
|
||
<p><A NAME="tth_fIg11.4"></A> <center>
|
||
wait_queue <br>
|
||
|
||
|
||
|
||
<table border><tr><td>
|
||
<p>
|
||
|
||
*task
|
||
<tr><td>
|
||
|
||
*next</table>
|
||
|
||
|
||
<p>
|
||
</center><center> Figure 11.4: Wait Queue</center>
|
||
<A NAME="wait-queue-struct"></A>
|
||
<p>
|
||
<p>The Linux kernel uses a simple data structure, a wait queue (see figure <A href="#wait-queue-struct"
|
||
> 11.4</A>),
|
||
|
||
<p>
|
||
which consists of a pointer to the processes <tt>task_struct</tt> and a pointer to the
|
||
next element in the wait queue.
|
||
|
||
<p>
|
||
When processes are added to the end of a wait queue they can either be interruptible or
|
||
uninterruptible.
|
||
Interruptible processes may be interrupted by events such as timers expiring or
|
||
signals being delivered whilst they are waiting on a wait queue.
|
||
The waiting processes state will reflect this and either be <tt>INTERRUPTIBLE</tt> or
|
||
<tt>UNINTERRUPTIBLE</tt>.
|
||
As this process can not now continue to run, the scheduler is run and, when it selects
|
||
a new process to run, the waiting process will be suspended.
|
||
<a href="#tthFtNtAAB" name=tthFrefAAB><sup>1</sup></a>
|
||
|
||
<p>
|
||
When the wait queue is processed, the state of every process in the wait queue is set to
|
||
<tt>RUNNING</tt>.
|
||
If the process has been removed from the run queue, it is put back onto the run queue.
|
||
The next time the scheduler runs, the processes that are on the wait queue are now candidates
|
||
to be run as they are now no longer waiting.
|
||
When a process on the wait queue is scheduled the first thing that it will do is remove
|
||
itself from the wait queue.
|
||
Wait queues can be used to synchronize access to system resources and they are used by Linux in
|
||
its implementation of semaphores (see below).
|
||
|
||
<p>
|
||
|
||
<H2><A NAME="tth_sEc11.5">11.5 </A> Buzz Locks</H2>
|
||
|
||
<p>
|
||
These are better known as spin locks and they are a primitive way of protecting a data structure
|
||
or piece of code.
|
||
They only allow one process at a time to be within a critical region of code.
|
||
They are used in Linux to restrict access to fields in data structures, using a single integer field
|
||
as a lock.
|
||
Each process wishing to enter the region attempts to change the lock's initial value from 0 to 1.
|
||
If its current value is 1, the process tries again, spinning in a tight loop of code.
|
||
The access to the memory location holding the lock must be atomic, the action of reading its value, checking
|
||
that it is 0 and then changing it to 1 cannot be interrupted by any other process.
|
||
Most CPU architectures provide support for this via special instructions but you can also implement buzz locks
|
||
using uncached main memory.
|
||
|
||
<p>
|
||
When the owning process leaves the critical region of code it decrements the buzz lock, returning its value to
|
||
0.
|
||
Any processes spinning on the lock will now read it as 0,
|
||
the first one to do this will increment it to 1 and enter the critical region.
|
||
|
||
<p>
|
||
|
||
<H2><A NAME="tth_sEc11.6">11.6 </A> Semaphores</H2>
|
||
<A NAME="kernel-semaphore-section"></A>
|
||
<p>
|
||
Semaphores are used to protect critical regions of code or data structures.
|
||
Remember that each access of a critical piece of data such as a VFS inode describing
|
||
a directory is made by kernel code running on behalf of a process.
|
||
It would be very dangerous to allow one process to alter a critical data structure
|
||
that is being used by another process.
|
||
One way to achieve this would be to use a buzz lock around the critical piece
|
||
of data is being accessed but this is a simplistic approach that would not give
|
||
very good system performance.
|
||
Instead Linux uses semaphores to allow just one process at a time to access
|
||
critical regions of code and data; all other processes wishing to access this
|
||
resource will be made to wait until it becomes free.
|
||
The waiting processes are suspended, other processes in the system can continue
|
||
to run as normal.
|
||
|
||
<p>
|
||
A Linux <tt>semaphore</tt>
|
||
data structure contains the following information:
|
||
|
||
<DL compact>
|
||
<p>
|
||
<dt><b>count</b></dt><dd> This field keeps track of the count of processes wishing
|
||
to use this resource. A positive value means that the resource is
|
||
available. A negative or zero value means that processes are waiting
|
||
for it. An initial value of 1 means that one and only one process at
|
||
a time can use this resource. When processes want this resource they
|
||
decrement the count and when they have finished with this resource they
|
||
increment the count,
|
||
<dt><b>waking</b></dt><dd> This is the count of processes waiting for this resource which is
|
||
also the number of process waiting to be woken up when this resource
|
||
becomes free,
|
||
<dt><b>wait queue</b></dt><dd> When processes are waiting for this resource they are put
|
||
onto this wait queue,
|
||
<dt><b>lock</b></dt><dd> A buzz lock used when accessing the <tt>waking</tt> field.
|
||
</DL>
|
||
<p>
|
||
Suppose the initial count for a semaphore is 1, the first process to come along
|
||
will see that the count is positive and decrement it by 1, making it 0.
|
||
The process now ``owns'' the critical piece of code or resource that is being protected
|
||
by the semaphore.
|
||
When the process leaves the critical region it increments the semphore's count.
|
||
The most optimal case is where there are no other processes contending for ownership of
|
||
the critical region.
|
||
Linux has implemented semaphores to work efficiently for this, the most common, case.
|
||
|
||
<p>
|
||
If another process wishes to enter the critical region whilst it is owned by a
|
||
process it too will decrement the count.
|
||
As the count is now negative (-1) the process cannot enter the critical region.
|
||
Instead it must wait until the owning process exits it.
|
||
Linux makes the waiting process sleep until the owning process wakes it on exiting
|
||
the critical region.
|
||
The waiting process adds itself to the semaphore's wait queue and sits in a loop
|
||
checking the value of the <tt>waking</tt> field and calling the scheduler until
|
||
<tt>waking</tt> is non-zero.
|
||
|
||
<p>
|
||
The owner of the critical region increments the semaphore's count and if it is less
|
||
than or equal to zero then there are processes sleeping, waiting for this resource.
|
||
In the optimal case the semaphore's count would have been returned to its initial
|
||
value of 1 and no further work would be neccessary.
|
||
The owning process increments the waking counter and wakes up the process sleeping on
|
||
the semaphore's wait queue.
|
||
When the waiting process wakes up, the waking counter is now 1 and it knows that it
|
||
it may now enter the critical region.
|
||
It decrements the waking counter, returning it to a value of zero, and continues.
|
||
All access to the waking field of semaphore are protected by a buzz lock using the semaphore's
|
||
lock.
|
||
|
||
<p>
|
||
<hr><H3>Footnotes:</H3>
|
||
|
||
<p><a name=tthFtNtAAB></a><a href="#tthFrefAAB"><sup>1</sup></a>
|
||
<font face="helvetica">REVIEW NOTE:</font> <em>What is to stop a task in state <tt>INTERRUPTIBLE</tt> being made
|
||
to run the next time the scheduler runs? Processes in a wait queue should never
|
||
run until they are woken up.</em>
|
||
<p><hr><small>File translated from T<sub><font size=-1>E</font></sub>X by <a href="http://hutchinson.belmont.ma.us/tth/tth.html">T<sub><font size=-1>T</font></sub>H</a>, version 1.0.</small>
|
||
<hr>
|
||
<center>
|
||
<A HREF="../kernel/kernel.html"> Top of Chapter</A>,
|
||
<A HREF="../tlk-toc.html"> Table of Contents</A>,
|
||
<A href="../tlk.html" target="_top"> Show Frames</A>,
|
||
<A href="../kernel/kernel.html" target="_top"> No Frames</A><br>
|
||
<EFBFBD> 1996-1999 David A Rusling <A HREF="../misc/copyright.html">copyright notice</a>.
|
||
</center>
|
||
</HTML> |