gferg 2003-03-26 00:38:22 +00:00
parent 5befdbbb0f
commit 80e714b3ab
12 changed files with 476 additions and 712 deletions

@ -28,6 +28,9 @@
Linus, and probably never will, but he has made a profound difference in my
life.</para>
<para>The following people have written to me with corrections or good
suggestions: Ignacio Martin and David Porter.</para>
</sect1>

@ -301,7 +301,7 @@ int register_chrdev(unsigned int major, const char *name,
{
Major = register_chrdev(0, DEVICE_NAME, &fops);
if (Major > 0) {
if (Major < 0) {
printk ("Registering the character device failed with %d\n", Major);
return Major;
}

@ -1,24 +1,18 @@
<sect1><title>Using /proc For Input</title>
<!-- \label{proc-input
\index{Input\\using /proc for}
\index{/proc\\using for input}
\index{proc\\using for input} -->
<indexterm><primary>input</primary><secondary>using /proc for</secondary></indexterm>
<indexterm><primary>proc</primary><secondary>using for input</secondary></indexterm>
<para>So far we have two ways to generate output from kernel modules: we can
register a device driver and <command>mknod</command> a device file, or we
can create a <filename role="directory">/proc</filename> file. This allows
the kernel module to tell us anything it likes. The only problem is that
there is no way for us to talk back. The first way we'll send input to kernel
modules will be by writing back to the <filename
role="directory">/proc</filename> file.</para>
<para>So far we have two ways to generate output from kernel modules: we can register a device driver and
<command>mknod</command> a device file, or we can create a <filename role="directory">/proc</filename> file. This
allows the kernel module to tell us anything it likes. The only problem is that there is no way for us to talk
back. The first way we'll send input to kernel modules will be by writing back to the <filename
role="directory">/proc</filename> file.</para>
<para>Because the proc filesystem was written mainly to allow the kernel to
report its situation to processes, there are no special provisions for input.
The <varname>struct proc_dir_entry</varname> doesn't include a pointer to an
input function, the way it includes a pointer to an output function. Instead,
to write into a <filename role="directory">/proc</filename> file, we need to
use the standard filesystem mechanism.</para>
<para>Because the proc filesystem was written mainly to allow the kernel to report its situation to processes,
there are no special provisions for input. The <varname>struct proc_dir_entry</varname> doesn't include a pointer
to an input function, the way it includes a pointer to an output function. Instead, to write into a <filename
role="directory">/proc</filename> file, we need to use the standard filesystem mechanism.</para>
<!-- \index{proc\_dir\_entry structure}
\index{struct proc\_dir\_entry} -->
@ -409,3 +403,6 @@ void cleanup_module()
</sect1>
<!--
vim: tw=116
-->

@ -1,76 +1,66 @@
<sect1><title>Talking to Device Files (writes and IOCTLs)</title>
<!-- \label{dev-input} \index{device files\\input to}
\index{input to device files} \index{ioctl}
\index{write\\to device files} -->
<indexterm><primary>ioctl</primary></indexterm>
<indexterm><primary>device files</primary><secondary>input to</secondary></indexterm>
<indexterm><primary>device files</primary><secondary>write to</secondary></indexterm>
<para>Device files are supposed to represent physical devices. Most physical
devices are used for output as well as input, so there has to be some
mechanism for device drivers in the kernel to get the output to send to
the device from processes. This is done by opening the device file for
output and writing to it, just like writing to a file. In the following
example, this is implemented by <function>device_write</function>.</para>
<para>Device files are supposed to represent physical devices. Most
physical devices are used for output as well as input, so there has to be
some mechanism for device drivers in the kernel to get the output to send
to the device from processes. This is done by opening the device file for
output and writing to it, just like writing to a file. In the following
example, this is implemented by <function>device_write</function>.</para>
<para>This is not always enough. Imagine you had a serial port connected to a
modem (even if you have an internal modem, it is still implemented from the
CPU's perspective as a serial port connected to a modem, so you don't have to
tax your imagination too hard). The natural thing to do would be to use the
device file to write things to the modem (either modem commands or data to be
sent through the phone line) and read things from the modem (either responses
for commands or the data received through the phone line). However, this
leaves open the question of what to do when you need to talk to the serial
port itself, for example to send the rate at which data is sent and
received.</para>
<para>This is not always enough. Imagine you had a serial port connected to a modem (even if you have an internal
modem, it is still implemented from the CPU's perspective as a serial port connected to a modem, so you don't have
to tax your imagination too hard). The natural thing to do would be to use the device file to write things to the
modem (either modem commands or data to be sent through the phone line) and read things from the modem (either
responses for commands or the data received through the phone line). However, this leaves open the question of
what to do when you need to talk to the serial port itself, for example to send the rate at which data is sent and
received.</para>
<indexterm><primary>serial port</primary></indexterm>
<indexterm><primary>modem</primary></indexterm>
<indexterm><primary>serial port</primary></indexterm>
<indexterm><primary>modem</primary></indexterm>
<para>The answer in Unix is to use a special function called
<function>ioctl</function> (short for Input Output ConTroL). Every device can
have its own <function>ioctl</function> commands, which can be read
<function>ioctl</function>'s (to send information from a process to the
kernel), write <function>ioctl</function>'s (to return information to a
process), <footnote><para>Notice that here the roles of read and write are
reversed <emphasis>again</emphasis>, so in <function>ioctl</function>'s read
is to send information to the kernel and write is to receive information from
the kernel.</para></footnote> both or neither. The
<function>ioctl</function> function is called with three parameters: the file
descriptor of the appropriate device file, the ioctl number, and a parameter,
which is of type long so you can use a cast to use it to pass anything.
<footnote><para>This isn't exact. You won't be able to pass a structure, for
example, through an ioctl --- but you will be able to pass a pointer to the
structure.</para></footnote></para>
<para>The answer in Unix is to use a special function called <function>ioctl</function> (short for Input Output
ConTroL). Every device can have its own <function>ioctl</function> commands, which can be read
<function>ioctl</function>'s (to send information from a process to the kernel), write
<function>ioctl</function>'s (to return information to a process), <footnote><para>Notice that here the roles of
read and write are reversed <emphasis>again</emphasis>, so in <function>ioctl</function>'s read is to send
information to the kernel and write is to receive information from the kernel.</para></footnote> both or neither.
The <function>ioctl</function> function is called with three parameters: the file descriptor of the appropriate
device file, the ioctl number, and a parameter, which is of type long so you can use a cast to use it to pass
anything. <footnote><para>This isn't exact. You won't be able to pass a structure, for example, through an ioctl
--- but you will be able to pass a pointer to the structure.</para></footnote></para>
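<para>As a minimal user-space sketch of the three-parameter call described above, the function below uses the
standard <varname>FIONREAD</varname> request, which is unrelated to the example device in this chapter. The
helper name <function>bytes_pending</function> is an assumption made for illustration. It shows the third
parameter carrying a pointer, so the kernel can hand a value back to the process.</para>

```c
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Ask the kernel how many bytes are waiting to be read on fd.
 * FIONREAD hands its answer back through the third ioctl argument,
 * so we pass a pointer to an int rather than a plain value.
 * Returns the byte count, or -1 if the ioctl call fails.
 * (bytes_pending is a hypothetical helper name.)
 */
int bytes_pending(int fd)
{
    int count = 0;

    if (ioctl(fd, FIONREAD, &count) < 0)
        return -1;
    return count;
}
```

<para>Called on the read end of a pipe right after five bytes were written to the other end, this should report
five. Called on an invalid descriptor, the <function>ioctl</function> fails and the helper returns -1.</para>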
<para>The ioctl number encodes the major device number, the type of the
ioctl, the command, and the type of the parameter. This ioctl number is
usually created by a macro call (<varname>_IO</varname>,
<varname>_IOR</varname>, <varname>_IOW</varname> or <varname>_IOWR</varname>
--- depending on the type) in a header file. This header file should then be
included both by the programs which will use
<function>ioctl</function> (so they can generate the appropriate
<function>ioctl</function>'s) and by the kernel module (so it can understand
it). In the example below, the header file is <filename
class="headerfile">chardev.h</filename> and the program which uses it is
<function>ioctl.c</function>.</para>
<para>The ioctl number encodes the major device number, the type of the ioctl, the command, and the type of the
parameter. This ioctl number is usually created by a macro call (<varname>_IO</varname>, <varname>_IOR</varname>,
<varname>_IOW</varname> or <varname>_IOWR</varname> --- depending on the type) in a header file. This header file
should then be included both by the programs which will use <function>ioctl</function> (so they can generate the
appropriate <function>ioctl</function>'s) and by the kernel module (so it can understand it). In the example
below, the header file is <filename class="headerfile">chardev.h</filename> and the program which uses it is
<function>ioctl.c</function>.</para>
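<para>The sketch below shows how these macros pack several fields into one number and how the fields can be read
back. It is not the header used by the example in this chapter: <varname>DEMO_MAGIC</varname>, the three
<varname>DEMO_IOCTL_*</varname> commands and <function>decode_ioctl</function> are hypothetical names, and the
magic value is arbitrary. In the Linux headers, <varname>_IOW</varname> marks a parameter that carries data from
the process to the kernel, and <varname>_IOR</varname> the opposite direction.</para>

```c
#include <linux/ioctl.h>

/* Hypothetical magic number and commands, invented for this sketch. */
#define DEMO_MAGIC 'k'
#define DEMO_IOCTL_SET_MSG _IOW(DEMO_MAGIC, 0, char *)
#define DEMO_IOCTL_GET_MSG _IOR(DEMO_MAGIC, 1, char *)
#define DEMO_IOCTL_RESET   _IO(DEMO_MAGIC, 2)

/* The fields packed into an ioctl number, unpacked again. */
struct ioctl_fields {
    unsigned int dir;   /* _IOC_NONE, _IOC_READ, _IOC_WRITE, or both */
    unsigned int type;  /* the magic number identifying the driver */
    unsigned int nr;    /* the command's sequence number */
    unsigned int size;  /* size in bytes of the parameter type */
};

/* Split an ioctl number into its fields with the kernel's own macros. */
struct ioctl_fields decode_ioctl(unsigned long cmd)
{
    struct ioctl_fields f;

    f.dir  = _IOC_DIR(cmd);
    f.type = _IOC_TYPE(cmd);
    f.nr   = _IOC_NR(cmd);
    f.size = _IOC_SIZE(cmd);
    return f;
}
```

<para>Because the direction and the parameter size are part of the number, two commands with the same sequence
number but different parameter types still get different ioctl numbers.</para>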
<!-- \index{\_IO} \index{\_IOR} \index{\_IOW} \index{\_IOWR} -->
<indexterm><primary>_IO</primary></indexterm>
<indexterm><primary>_IOR</primary></indexterm>
<indexterm><primary>_IOW</primary></indexterm>
<indexterm><primary>_IOWR</primary></indexterm>
<para>If you want to use <function>ioctl</function>s in your own kernel
modules, it is best to receive an official <function>ioctl</function>
assignment, so if you accidentally get somebody else's
<function>ioctl</function>s, or if they get yours, you'll know something is
wrong. For more information, consult the kernel source tree at
<filename>Documentation/ioctl-number.txt</filename>.</para>
<para>If you want to use <function>ioctl</function>s in your own kernel modules, it is best to receive an official
<function>ioctl</function> assignment, so if you accidentally get somebody else's <function>ioctl</function>s, or
if they get yours, you'll know something is wrong. For more information, consult the kernel source tree at
<filename>Documentation/ioctl-number.txt</filename>.</para>
<!-- \index{official ioctl assignment} \index{ioctl\\official assignment} -->
<indexterm><primary>official ioctl assignment</primary></indexterm>
<indexterm><primary>ioctl</primary><secondary>official assignment</secondary></indexterm>
<!-- \index{chardev.c, source file}\index{source\\chardev.c} -->
<indexterm><primary>chardev.c</primary></indexterm>
<example><title>chardev.c</title>
<programlisting><![CDATA[
/* chardev.c
/* chardev.c
*
* Create an input/output character device
* Create an input/output character device
*/
@ -639,8 +629,11 @@ main()
close(file_desc);
}
]]></programlisting>
</example>
</sect1>
<!--
vim: tw=116
-->

@ -1,127 +1,114 @@
<sect1><title>System Calls</title>
<!-- \label{sys-call} \index{system calls} \index{calls\\system} -->
<indexterm><primary>system calls</primary></indexterm>
<para>So far, the only thing we've done was to use well defined kernel
mechanisms to register <filename role="directory">/proc</filename> files and
device handlers. This is fine if you want to do something the kernel
programmers thought you'd want, such as write a device driver. But what if
you want to do something unusual, to change the behavior of the system in
some way? Then, you're mostly on your own.</para>
<para>So far, the only thing we've done was to use well defined kernel
mechanisms to register <filename role="directory">/proc</filename> files
and device handlers. This is fine if you want to do something the kernel
programmers thought you'd want, such as write a device driver. But what if
you want to do something unusual, to change the behavior of the system in
some way? Then, you're mostly on your own.</para>
<para>This is where kernel programming gets dangerous. While writing the
example below, I killed the <function>open()</function> system call. This
meant I couldn't open any files, I couldn't run any programs, and I couldn't
<command>shutdown</command> the computer. I had to pull the power switch.
Luckily, no files died. To ensure you won't lose any files either, please run
<command>sync</command> right before you do the <command>insmod</command> and
the <command>rmmod</command>.</para>
<para>This is where kernel programming gets dangerous. While writing the
example below, I killed the <function>open()</function> system call. This
meant I couldn't open any files, I couldn't run any programs, and I
couldn't <command>shutdown</command> the computer. I had to pull the power
switch. Luckily, no files died. To ensure you won't lose any files either,
please run <command>sync</command> right before you do the
<command>insmod</command> and the <command>rmmod</command>.</para>
<!-- \index{sync} \index{insmod} \index{rmmod} \index{shutdown} -->
<indexterm><primary>sync</primary></indexterm>
<indexterm><primary>insmod</primary></indexterm>
<indexterm><primary>rmmod</primary></indexterm>
<indexterm><primary>shutdown</primary></indexterm>
<para>Forget about <filename role="directory">/proc</filename> files, forget
about device files. They're just minor details. The <emphasis>real</emphasis>
process to kernel communication mechanism, the one used by all processes, is
system calls. When a process requests a service from the kernel (such as
opening a file, forking to a new process, or requesting more memory), this is
the mechanism used. If you want to change the behaviour of the kernel in
interesting ways, this is the place to do it. By the way, if you want to see
which system calls a program uses, run <command>strace
&lt;arguments&gt;</command>.</para> <!-- \index{strace} -->
<para>Forget about <filename role="directory">/proc</filename> files, forget about device files. They're just
minor details. The <emphasis>real</emphasis> process to kernel communication mechanism, the one used by all
processes, is system calls. When a process requests a service from the kernel (such as opening a file, forking to
a new process, or requesting more memory), this is the mechanism used. If you want to change the behaviour of the
kernel in interesting ways, this is the place to do it. By the way, if you want to see which system calls a
program uses, run <command>strace &lt;arguments&gt;</command>.</para>
<para>In general, a process is not supposed to be able to access the kernel. It
can't access kernel memory and it can't call kernel functions. The hardware
of the CPU enforces this (that's the reason why it's called `protected
mode').</para>
<indexterm><primary>strace</primary></indexterm>
<para>System calls are an exception to this general rule. What happens is that
the process fills the registers with the appropriate values and then calls
a special instruction which jumps to a previously defined location in the
kernel (of course, that location is readable by user processes, but it is not
writable by them). Under Intel CPUs, this is done by means of interrupt 0x80.
The hardware knows that once you jump to this location, you are no longer
running in restricted user mode, but as the operating system kernel --- and
therefore you're allowed to do whatever you want.</para>
<!-- \index{interrupt 0x80} -->
<para>In general, a process is not supposed to be able to access the kernel. It can't access kernel memory and it
can't call kernel functions. The hardware of the CPU enforces this (that's the reason why it's called `protected
mode').</para>
<para>The location in the kernel a process can jump to is called
<emphasis>system_call</emphasis>. The procedure at that location checks the
system call number, which tells the kernel what service the process
requested. Then, it looks at the table of system calls
(<varname>sys_call_table</varname>) to see the address of the kernel function
to call. Then it calls the function, and after it returns, does a few system
checks and then returns to the process (or to a different process, if the
process time ran out). If you want to read this code, it's in the source file
<filename>arch/&lt;architecture&gt;/kernel/entry.S</filename>, after the line
<function>ENTRY(system_call)</function>.</para>
<indexterm><primary>interrupt 0x80</primary></indexterm>
<!-- \index{system\_call} \index{ENTRY(system\_call)} \index{sys\_call\_table}
\index{entry.S} -->
<para>System calls are an exception to this general rule. What happens is that the process fills the registers
with the appropriate values and then calls a special instruction which jumps to a previously defined location in
the kernel (of course, that location is readable by user processes, but it is not writable by them). Under Intel
CPUs, this is done by means of interrupt 0x80. The hardware knows that once you jump to this location, you are no
longer running in restricted user mode, but as the operating system kernel --- and therefore you're allowed to do
whatever you want.</para>
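<para>Seen from user space, the mechanism can be sketched with the C library's <function>syscall()</function>
wrapper, which takes the system call number and its arguments, places them where the kernel expects them, and
executes the trap instruction. The helper name <function>raw_getpid</function> is an assumption made for
illustration, and the instruction the library actually uses depends on the architecture, so it may not be
interrupt 0x80.</para>

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Request the getpid service by its system call number instead of
 * through the usual getpid() library function. syscall() passes the
 * number and any arguments to the kernel and returns the result.
 * (raw_getpid is a hypothetical helper name.)
 */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

<para>The value returned should match what the ordinary <function>getpid()</function> call reports for the same
process, since both end up in the same kernel function.</para>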
<para>So, if we want to change the way a certain system call works, what we
need to do is to write our own function to implement it (usually by adding a
bit of our own code, and then calling the original function) and then change
the pointer at <varname>sys_call_table</varname> to point to our function.
Because we might be removed later and we don't want to leave the system in an
unstable state, it's important for <function>cleanup_module</function> to
restore the table to its original state.</para>
<para>The location in the kernel a process can jump to is called <emphasis>system_call</emphasis>. The procedure
at that location checks the system call number, which tells the kernel what service the process requested. Then,
it looks at the table of system calls (<varname>sys_call_table</varname>) to see the address of the kernel
function to call. Then it calls the function, and after it returns, does a few system checks and then returns
to the process (or to a different process, if the process time ran out). If you want to read this code, it's in
the source file <filename>arch/&lt;architecture&gt;/kernel/entry.S</filename>, after the line
<function>ENTRY(system_call)</function>.</para>
<para>The source code here is an example of such a kernel module. We want to
`spy' on a certain user, and to <function>printk()</function> a message
whenever that user opens a file. Towards this end, we replace the system call
to open a file with our own function, called
<function>our_sys_open</function>. This function checks the uid (user's id)
of the current process, and if it's equal to the uid we spy on, it calls
<function>printk()</function> to display the name of the file to be opened.
Then, either way, it calls the original <function>open()</function> function
with the same parameters, to actually open the file.</para>
<indexterm><primary>system_call</primary></indexterm>
<indexterm><primary>ENTRY(system_call)</primary></indexterm>
<indexterm><primary>sys_call_table</primary></indexterm>
<indexterm><primary>entry.S</primary></indexterm>
<!-- \index{open\\system call} -->
<para>So, if we want to change the way a certain system call works, what we need to do is to write our own
function to implement it (usually by adding a bit of our own code, and then calling the original function) and
then change the pointer at <varname>sys_call_table</varname> to point to our function. Because we might be
removed later and we don't want to leave the system in an unstable state, it's important for
<function>cleanup_module</function> to restore the table to its original state.</para>
<para>The <function>init_module</function> function replaces the appropriate
location in <varname>sys_call_table</varname> and keeps the original pointer
in a variable. The <function>cleanup_module</function> function uses that
variable to restore everything back to normal. This approach is dangerous,
because of the possibility of two kernel modules changing the same system
call. Imagine we have two kernel modules, A and B. A's open system call will
be A_open and B's will be B_open. Now, when A is inserted into the kernel,
the system call is replaced with A_open, which will call the original
sys_open when it's done. Next, B is inserted into the kernel, which replaces
the system call with B_open, which will call what it thinks is the original
system call, A_open, when it's done.</para>
<para>The source code here is an example of such a kernel module. We want to `spy' on a certain user, and to
<function>printk()</function> a message whenever that user opens a file. Towards this end, we replace the system
call to open a file with our own function, called <function>our_sys_open</function>. This function checks the uid
(user's id) of the current process, and if it's equal to the uid we spy on, it calls <function>printk()</function>
to display the name of the file to be opened. Then, either way, it calls the original <function>open()</function>
function with the same parameters, to actually open the file.</para>
<para>Now, if B is removed first, everything will be well --- it will simply
restore the system call to A_open, which calls the original. However, if A
is removed and then B is removed, the system will crash. A's removal will
restore the system call to the original, sys_open, cutting B out of the
loop. Then, when B is removed, it will restore the system call to what
<emphasis>it</emphasis> thinks is the original, A_open, which is no longer
in memory. At first glance, it appears we could solve this particular problem
by checking if the system call is equal to our open function and if so not
changing it at all (so that B won't change the system call when it's
removed), but that will cause an even worse problem. When A is removed, it
sees that the system call was changed to B_open so that it is no longer
pointing to A_open, so it won't restore it to sys_open before it is removed
from memory. Unfortunately, B_open will still try to call A_open which is
no longer there, so that even without removing B the system would
crash.</para>
<indexterm><primary>system call</primary><secondary>open</secondary></indexterm>
<para>I can think of two ways to prevent this problem. The first is to
restore the call to the original value, sys_open. Unfortunately, sys_open is
not part of the kernel symbol table in <filename>/proc/ksyms</filename>, so
we can't access it. The other solution is to use the reference count to
prevent root from <command>rmmod</command>'ing the module once it is loaded.
This is good for production modules, but bad for an educational sample ---
which is why I didn't do it here.</para>
<para>The <function>init_module</function> function replaces the appropriate location in
<varname>sys_call_table</varname> and keeps the original pointer in a variable. The
<function>cleanup_module</function> function uses that variable to restore everything back to normal. This
approach is dangerous, because of the possibility of two kernel modules changing the same system call. Imagine we
have two kernel modules, A and B. A's open system call will be A_open and B's will be B_open. Now, when A is
inserted into the kernel, the system call is replaced with A_open, which will call the original sys_open when it's
done. Next, B is inserted into the kernel, which replaces the system call with B_open, which will call what it
thinks is the original system call, A_open, when it's done.</para>
<!-- \index{rmmod}\index{MOD\_INC\_USE\_COUNT} \index{sys\_open} -->
<para>Now, if B is removed first, everything will be well---it will simply restore the system call to A_open,
which calls the original. However, if A is removed and then B is removed, the system will crash. A's removal will
restore the system call to the original, sys_open, cutting B out of the loop. Then, when B is removed, it will
restore the system call to what <emphasis>it</emphasis> thinks is the original, A_open, which is no longer in
memory. At first glance, it appears we could solve this particular problem by checking if the system call is equal
to our open function and if so not changing it at all (so that B won't change the system call when it's removed),
but that will cause an even worse problem. When A is removed, it sees that the system call was changed to B_open
so that it is no longer pointing to A_open, so it won't restore it to sys_open before it is removed from memory.
Unfortunately, B_open will still try to call A_open which is no longer there, so that even without removing B the
system would crash.</para>
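<para>The removal-order problem can be reproduced in an ordinary user-space program. The following is only a
simulation under stated assumptions: a one-slot array stands in for <varname>sys_call_table</varname>, and every
name in it (<varname>call_table</varname>, <function>sim_insmod</function>, <function>sim_rmmod</function>,
<function>table_is_dangling</function>, and so on) is hypothetical. Nothing is really loaded or unloaded, so the
dangling pointer is harmless here, whereas in the kernel it would point to freed memory.</para>

```c
#include <stddef.h>

typedef int (*open_fn)(const char *path);

/* Stand-in for the kernel's original open implementation. */
int sys_open_stub(const char *path) { (void)path; return 0; }

/* One-slot stand-in for sys_call_table. */
open_fn call_table[1] = { sys_open_stub };

struct module_sim {
    open_fn hook;   /* the function this module installs */
    open_fn saved;  /* what it found in the table when it was loaded */
    int loaded;     /* 1 while the simulated module is "in memory" */
};

int a_open(const char *path);
int b_open(const char *path);

struct module_sim mod_a = { a_open, NULL, 0 };
struct module_sim mod_b = { b_open, NULL, 0 };

/* Each hook does its own work (omitted), then calls what it saved. */
int a_open(const char *path) { return mod_a.saved(path); }
int b_open(const char *path) { return mod_b.saved(path); }

/* Like init_module: remember the current pointer, install our own. */
void sim_insmod(struct module_sim *m)
{
    m->saved = call_table[0];
    call_table[0] = m->hook;
    m->loaded = 1;
}

/* Like cleanup_module: blindly put back whatever we saved. */
void sim_rmmod(struct module_sim *m)
{
    call_table[0] = m->saved;
    m->loaded = 0;
}

/* Returns 1 if the table points into a module that is no longer loaded. */
int table_is_dangling(void)
{
    if (call_table[0] == a_open)
        return !mod_a.loaded;
    if (call_table[0] == b_open)
        return !mod_b.loaded;
    return 0;
}
```

<para>Loading A then B and removing them in the order B, A leaves the table pointing at the original function.
Removing them in the order A, B leaves it pointing at A's hook after A is gone, which is the crash scenario
described above.</para>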
<para>I can think of two ways to prevent this problem. The first is to restore the call to the original value,
sys_open. Unfortunately, sys_open is not part of the kernel symbol table in <filename>/proc/ksyms</filename>, so
we can't access it. The other solution is to use the reference count to prevent root from
<command>rmmod</command>'ing the module once it is loaded. This is good for production modules, but bad for an
educational sample --- which is why I didn't do it here.</para>
<indexterm><primary>rmmod</primary></indexterm>
<indexterm><primary>MOD_INC_USE_COUNT</primary></indexterm>
<indexterm><primary>sys_open</primary></indexterm>
<indexterm><primary>syscall.c</primary></indexterm>
<example><title>syscall.c</title>
<programlisting><![CDATA[
/* syscall.c
/* syscall.c
*
* System call "stealing" sample
* System call "stealing" sample.
*/
@ -304,3 +291,6 @@ void cleanup_module()
</sect1>
<!--
vim: tw=116
-->

@ -1,151 +1,89 @@
<sect1><title>Blocking Processes</title>
<indexterm><primary>blocking processes</primary></indexterm>
<indexterm>
<primary>processes</primary>
<secondary>blocking</secondary>
</indexterm>
<indexterm><primary>processes</primary><secondary>blocking</secondary></indexterm>
<sect2><title>Replacing <function>printk</function></title>
<para>
What do you do when somebody asks you for something you can't do right
away? If you're a human being and you're bothered by a human being, the
only thing you can say is: <quote>Not right now, I'm busy. <emphasis>Go
away!</emphasis></quote>. But if you're a kernel module and you're
bothered by a process, you have another possibility. You can put the
process to sleep until you can service it. After all, processes are
being put to sleep by the kernel and woken up all the time (that's the
way multiple processes appear to run at the same time on a single
<acronym>CPU</acronym>).
</para>
<sect2><title>Replacing <function>printk</function></title>
<indexterm><primary>multi-tasking</primary></indexterm>
<indexterm><primary>busy</primary></indexterm>
<para>What do you do when somebody asks you for something you can't do right away? If you're a human being
and you're bothered by a human being, the only thing you can say is: <quote>Not right now, I'm busy.
<emphasis>Go away!</emphasis></quote>. But if you're a kernel module and you're bothered by a process, you
have another possibility. You can put the process to sleep until you can service it. After all, processes
are being put to sleep by the kernel and woken up all the time (that's the way multiple processes appear to
run at the same time on a single CPU).</para>
<para>
This kernel module is an example of this. The file (called
<filename>/proc/sleep</filename>) can only be opened by a single process
at a time. If the file is already open, the kernel module calls
<function>module_interruptible_sleep_on</function>.
<indexterm><primary>multi-tasking</primary></indexterm>
<indexterm><primary>busy</primary></indexterm>
<indexterm><primary>module_interruptible_sleep_on</primary></indexterm>
<indexterm><primary>interruptible_sleep_on</primary></indexterm>
<footnote>
<para>
The easiest way to keep a file open is to open it with <command>tail
-f</command>.
</para>
</footnote>
This function changes the status of the task (a task is the kernel data
structure which holds information about a process and the system call
it's in, if any) to <parameter>TASK_INTERRUPTIBLE</parameter>,
<indexterm><primary>TASK_INTERRUPTIBLE</primary></indexterm> which means
that the task will not run until it is woken up somehow, and adds it to
<structname>WaitQ</structname>, the queue of tasks waiting to access the
file. Then, the function calls the scheduler to context switch to a
different process, one which has some use for the <acronym>CPU</acronym>.
</para>
<indexterm><primary>putting processes to sleep</primary></indexterm>
<indexterm>
<primary>sleep</primary>
<secondary>putting processes to</secondary>
</indexterm>
<indexterm><primary>TASK_INTERRUPTIBLE</primary></indexterm>
<indexterm><primary>putting processes to sleep</primary></indexterm>
<indexterm><primary>sleep</primary><secondary>putting processes to</secondary></indexterm>
<indexterm><primary>waking up processes</primary></indexterm>
<indexterm><primary>processes</primary><secondary>waking up</secondary></indexterm>
<indexterm><primary>multitasking</primary></indexterm>
<indexterm><primary>scheduler</primary></indexterm>
<para>
When a process is done with the file, it closes it, and
<function>module_close</function> is called. That function wakes up all
the processes in the queue (there's no mechanism to only wake up one of
them). It then returns and the process which just closed the file can
continue to run. In time, the scheduler decides that that process has
had enough and gives control of the <acronym>CPU</acronym> to another
process. Eventually, one of the processes which was in the queue will be
given control of the <acronym>CPU</acronym> by the scheduler. It starts
at the point right after the call to
<function>module_interruptible_sleep_on</function>.
<footnote>
<para>
This means that the process is still in kernel mode -- as far as the
process is concerned, it issued the <function>open</function> system
call and the system call hasn't returned yet. The process doesn't
know somebody else used the <acronym>CPU</acronym> for most of the
time between the moment it issued the call and the moment it
returned.
</para>
</footnote>
It can then proceed to set a global variable to tell all the other
processes that the file is still open and go on with its life. When the
other processes get a piece of the <acronym>CPU</acronym>, they'll see
that global variable and go back to sleep.
</para>
<indexterm><primary>waking up processes</primary></indexterm>
<indexterm>
<primary>processes</primary>
<secondary>waking up</secondary>
</indexterm>
<indexterm><primary>multitasking</primary></indexterm>
<indexterm><primary>scheduler</primary></indexterm>
<para>This kernel module is an example of this. The file (called <filename>/proc/sleep</filename>) can only be
opened by a single process at a time. If the file is already open, the kernel module calls
<function>module_interruptible_sleep_on</function><footnote><para>The easiest way to keep a file open is to
open it with <command>tail -f</command>.</para></footnote>. This function changes the status of the task (a
task is the kernel data structure which holds information about a process and the system call it's in, if any)
to <parameter>TASK_INTERRUPTIBLE</parameter>, which means that the task will not run until it is woken up
somehow, and adds it to <structname>WaitQ</structname>, the queue of tasks waiting to access the file. Then,
the function calls the scheduler to context switch to a different process, one which has some use for the
CPU.</para>
<para>When a process is done with the file, it closes it, and <function>module_close</function> is called.
That function wakes up all the processes in the queue (there's no mechanism to only wake up one of them). It
then returns and the process which just closed the file can continue to run. In time, the scheduler decides
that that process has had enough and gives control of the CPU to another process. Eventually, one of the
processes which was in the queue will be given control of the CPU by the scheduler. It starts at the point
right after the call to <function>module_interruptible_sleep_on</function><footnote><para>This means that the
process is still in kernel mode -- as far as the process is concerned, it issued the <function>open</function>
system call and the system call hasn't returned yet. The process doesn't know somebody else used the CPU for
most of the time between the moment it issued the call and the moment it returned.</para></footnote>. It can
then proceed to set a global variable to tell all the other processes that the file is still open and go on
with its life. When the other processes get a piece of the CPU, they'll see that global variable and go back
to sleep.</para>
<para>
To make our life more interesting, <function>module_close</function>
doesn't have a monopoly on waking up the processes which wait to access
the file. A signal, such as <keycombo
action="simul"><keycap>Ctrl</keycap><keycap>c</keycap></keycombo>
<indexterm><primary>ctrl-c</primary></indexterm>
(<parameter>SIGINT</parameter>)
<indexterm><primary>signal</primary></indexterm>
<indexterm><primary>SIGINT</primary></indexterm>
can also wake up a process.
<indexterm><primary>module_wake_up</primary></indexterm>
<footnote>
<para>
This is because we used
<function>module_interruptible_sleep_on</function>. We could have
used <function>module_sleep_on</function>
<indexterm><primary>module_sleep_on</primary></indexterm>
<indexterm><primary>sleep_on</primary></indexterm>
instead, but that would have resulted in extremely angry users whose
<keycombo
action="simul"><keycap>Ctrl</keycap><keycap>c</keycap></keycombo>s
are ignored.
</para>
</footnote>
In that case, we want to return with <parameter>-EINTR</parameter>
<indexterm><primary>EINTR</primary></indexterm>
immediately. This is important so users can, for example, kill the
process before it receives the file.
</para>
<indexterm><primary>module_sleep_on</primary></indexterm>
<indexterm><primary>sleep_on</primary></indexterm>
<indexterm><primary>ctrl-c</primary></indexterm>
<indexterm>
<primary>processes</primary>
<secondary>killing</secondary>
</indexterm>
<para>To make our life more interesting, <function>module_close</function> doesn't have a monopoly on waking
up the processes which wait to access the file. A signal, such as <keycombo
action="simul"><keycap>Ctrl</keycap><keycap>c</keycap></keycombo> (<parameter>SIGINT</parameter>) can also
wake up a process. <footnote> <para> This is because we used
<function>module_interruptible_sleep_on</function>. We could have used <function>module_sleep_on</function>
instead, but that would have resulted in extremely angry users whose <keycombo
action="simul"><keycap>Ctrl</keycap><keycap>c</keycap></keycombo>s are ignored. </para> </footnote> In that
case, we want to return with <parameter>-EINTR</parameter> <indexterm><primary>EINTR</primary></indexterm>
immediately. This is important so users can, for example, kill the process before it receives the
file.</para>
<para>
There is one more point to remember. Sometimes processes don't want to
sleep; they want either to get what they want immediately, or to be told
it cannot be done. Such processes use the
<parameter>O_NONBLOCK</parameter>
<indexterm><primary>processes</primary><secondary>killing</secondary></indexterm>
<indexterm><primary>O_NONBLOCK</primary></indexterm>
<indexterm><primary>non-blocking</primary></indexterm> flag when opening
the file. The kernel is supposed to respond by returning with the error
code <parameter>-EAGAIN</parameter>
<indexterm><primary>EAGAIN</primary></indexterm> from operations which
would otherwise block, such as opening the file in this example. The
program <command>cat_noblock</command>, available in the source directory
for this chapter, can be used to open a file with
<parameter>O_NONBLOCK</parameter>.
</para>
<indexterm><primary>non-blocking</primary></indexterm>
<indexterm><primary>EAGAIN</primary></indexterm>
<indexterm><primary>blocking, how to avoid</primary></indexterm>
<indexterm><primary>blocking, how to avoid</primary></indexterm>
<para>There is one more point to remember. Sometimes processes don't want to sleep; they want either to get
what they want immediately, or to be told it cannot be done. Such processes use the
<parameter>O_NONBLOCK</parameter> flag when opening the file. The kernel is supposed to respond by returning
with the error code <parameter>-EAGAIN</parameter> from operations which would otherwise block, such as
opening the file in this example. The program <command>cat_noblock</command>, available in the source
directory for this chapter, can be used to open a file with <parameter>O_NONBLOCK</parameter>.</para>
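<para>The sleeping, waking and <parameter>O_NONBLOCK</parameter> behaviour described above can be imitated in
user space. The following is only a sketch of the idea, not the module's code: it assumes POSIX threads, a
condition variable stands in for <structname>WaitQ</structname>, and the names <function>sim_open</function>,
<function>sim_close</function> and <varname>already_open</varname> are hypothetical. Signals and
<parameter>-EINTR</parameter> are not modelled.</para>

```c
#include <errno.h>
#include <pthread.h>

/* User-space stand-ins for the module's state; all names are hypothetical. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wait_q = PTHREAD_COND_INITIALIZER; /* plays the role of WaitQ */
static int already_open = 0;   /* the global "file is open" flag */

/* Returns 0 on success, or -EAGAIN if the file is busy and nonblock is set. */
int sim_open(int nonblock)
{
    pthread_mutex_lock(&lock);
    while (already_open) {
        if (nonblock) {
            /* The caller asked not to sleep: report that it would block. */
            pthread_mutex_unlock(&lock);
            return -EAGAIN;
        }
        /* Sleep until a close wakes us, then check the flag again. */
        pthread_cond_wait(&wait_q, &lock);
    }
    already_open = 1;
    pthread_mutex_unlock(&lock);
    return 0;
}

void sim_close(void)
{
    pthread_mutex_lock(&lock);
    already_open = 0;
    /* Wake every sleeper; all but one will see the flag and sleep again. */
    pthread_cond_broadcast(&wait_q);
    pthread_mutex_unlock(&lock);
}
```

<para>The <literal>while</literal> loop is the important part: a woken caller cannot assume the file is free,
because another caller may have set the flag first, so it checks again before going on.</para>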
<example>
<title>sleep.c</title>
<indexterm><primary>sleep.c</primary></indexterm>
<programlisting>
<![CDATA[
<indexterm><primary>sleep.c</primary></indexterm>
<example><title>sleep.c</title>
<programlisting><![CDATA[
/* sleep.c - create a /proc file, and if several processes try to open it at
* the same time, put all but one to sleep
*
@ -502,8 +440,13 @@ void cleanup_module()
proc_unregister(&proc_root, Our_Proc_File.low_ino);
}
]]>
</programlisting>
</example>
]]></programlisting>
</example>
</sect2>
</sect1>
<!--
vim: tw=116
-->

View File

@ -1,58 +1,33 @@
<sect1><title>Replacing <function>printk</function></title>
<indexterm><primary>replacing printk</primary></indexterm>
<indexterm>
<primary>printk</primary>
<secondary>replacing</secondary>
</indexterm>
<indexterm><primary>printk</primary><secondary>replacing</secondary></indexterm>
<para>Good writing style says we have a paragraph here.</para>
<sect2><title>Replacing <function>printk</function></title>
<sect2><title>Replacing <function>printk</function></title>
<para>In the beginning (chapter \ref{hello-world}), I said that X and kernel module programming don't mix.
That's true while developing the kernel module, but in actual use you want to be able to send messages to
whichever tty<footnote><para><emphasis>T</emphasis>ele<emphasis>ty</emphasis>pe, originally a combination
keyboard-printer used to communicate with a Unix system, and today an abstraction for the text stream used for
a Unix program, whether it's a physical terminal, an xterm on an X display, a network connection used with
telnet, etc.</para></footnote> the command to the module came from. This is important for identifying errors
after the kernel module is released, because it will be used through all of them.</para>
<para>
In the beginning (chapter \ref{hello-world}), I said that X and kernel
module programming don't mix. That's true while developing the kernel
module, but in actual use you want to be able to send messages to
whichever <acronym>tty</acronym>
<footnote>
<para>
<emphasis>T</emphasis>ele<emphasis>ty</emphasis>pe, originally a
combination keyboard-printer used to communicate with a Unix system,
and today an abstraction for the text stream used for a Unix program,
whether it's a physical terminal, an xterm on an X display, a network
connection used with telnet, etc.
</para>
</footnote>
the command to the module came from. This is important for identifying
errors after the kernel module is released, because it will be used
through all of them.
</para>
<para>
The way this is done is by using <parameter>current</parameter>,
<indexterm><primary>current task</primary></indexterm>
<indexterm>
<primary>task</primary>
<secondary>current</secondary>
</indexterm>
a pointer to the currently running task, to get the current task's
<structname>tty</structname> structure.
<indexterm><primary>task</primary><secondary>current</secondary></indexterm>
<indexterm><primary>tty_structure</primary></indexterm>
<indexterm>
<primary>struct</primary>
<secondary>tty</secondary>
</indexterm>
Then, we look inside that <structname>tty</structname> structure to find
a pointer to a string write function, which we use to write a string to
the <acronym>tty</acronym>.
</para>
<indexterm><primary>struct</primary><secondary>tty</secondary></indexterm>
<para>The way this is done is by using <parameter>current</parameter>, a pointer to the currently running
task, to get the current task's <structname>tty</structname> structure. Then, we look inside that
<structname>tty</structname> structure to find a pointer to a string write function, which we use to write a
string to the tty.</para>
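<para>The pattern of following a task to its terminal and calling that terminal's write function can be tried
outside the kernel with mock structures. This is a simplified sketch under stated assumptions:
<structname>mock_task</structname>, <structname>mock_tty</structname> and
<function>mock_tty_write</function> are invented for illustration and are much smaller than the real kernel
structures.</para>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel's tty and task structures. */
struct mock_tty {
    /* Pointer to this terminal's "write a string" function. */
    int (*write)(struct mock_tty *tty, const char *buf, int count);
    char out[128];   /* where this mock terminal collects its output */
    int used;        /* number of bytes already stored in out */
};

struct mock_task {
    struct mock_tty *tty;   /* NULL when the task has no terminal */
};

/* A write function that appends to the mock terminal's buffer. */
int mock_tty_write(struct mock_tty *tty, const char *buf, int count)
{
    int room = (int)sizeof(tty->out) - 1 - tty->used;

    if (count > room)
        count = room;   /* silently truncate when the buffer is full */
    memcpy(tty->out + tty->used, buf, (size_t)count);
    tty->used += count;
    tty->out[tty->used] = '\0';
    return count;
}

/* Send a string to the terminal of the given task, if it has one. */
int print_string(struct mock_task *task, const char *str)
{
    struct mock_tty *tty = task->tty;

    if (tty == NULL || tty->write == NULL)
        return -1;      /* nothing to write to */
    return tty->write(tty, str, (int)strlen(str));
}
```

<para>Checking the pointer before using it matters because not every task has a terminal attached.</para>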
<example><title>printk.c</title>
<indexterm><primary>printk.c</primary></indexterm>
<programlisting><![CDATA[
<example><title>printk.c</title>
<programlisting><![CDATA[
/* printk.c - send textual output to the tty you're running on, regardless of
* whether it's passed through X11, telnet, etc.
*
@ -137,8 +112,13 @@ void cleanup_module()
{
print_string("Module Removed");
}
]]></programlisting>
</example>
]]></programlisting>
</example>
</sect2>
</sect1>
<!--
vim: tw=116
-->

View File

@ -1,91 +1,61 @@
<sect1><title>Scheduling Tasks</title>
<indexterm><primary>scheduling tasks</primary></indexterm>
<indexterm>
<primary>tasks</primary>
<secondary>scheduling</secondary>
</indexterm>
<indexterm><primary>tasks</primary><secondary>scheduling</secondary></indexterm>
<indexterm><primary>housekeeping</primary></indexterm>
<indexterm><primary>crontab</primary></indexterm>
<indexterm><primary>crontab</primary></indexterm>
<para>Good writing style says we have a paragraph here.</para>
<para>Very often, we have <quote>housekeeping</quote> tasks which have to be done at a certain time, or every so
often. If the task is to be done by a process, we do it by putting it in the <filename>crontab</filename> file.
If the task is to be done by a kernel module, we have two possibilities. The first is to put a process in the
<filename>crontab</filename> file which will wake up the module by a system call when necessary, for example by
opening a file. This is terribly inefficient, however -- we run a new process off of <filename>crontab</filename>,
read a new executable to memory, and all this just to wake up a kernel module which is in memory anyway.</para>
<sect2><title>Scheduling Tasks</title>
<indexterm><primary>task</primary></indexterm>
<indexterm><primary>tq_struct</primary></indexterm>
<indexterm><primary>queue_task</primary></indexterm>
<indexterm><primary>tq_timer</primary></indexterm>
<para>
Very often, we have <quote>housekeeping</quote>
<indexterm><primary>housekeeping</primary></indexterm>
<indexterm><primary>crontab</primary></indexterm> tasks which have to be
done at a certain time, or every so often. If the task is to be done by a
process, we do it by putting it in the <filename>crontab</filename> file.
If the task is to be done by a kernel module, we have two possibilities.
The first is to put a process in the <filename>crontab</filename> file
which will wake up the module by a system call when necessary, for
example by opening a file. This is terribly inefficient, however -- we
run a new process off of <filename>crontab</filename>, read a new
executable to memory, and all this just to wake up a kernel module which
is in memory anyway.
</para>
<para>Instead of doing that, we can create a function that will be called once for every timer interrupt. The way
we do this is we create a task, held in a <structname>tq_struct</structname> structure, which will hold a pointer
to the function. Then, we use <function>queue_task</function> to put that task on a task list called
<structname>tq_timer</structname>, which is the list of tasks to be executed on the next timer interrupt. Because
we want the function to keep on being executed, we need to put it back on <structname>tq_timer</structname>
whenever it is called, for the next timer interrupt.</para>
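<para>The idea of a task that puts itself back on the list can be modelled in user space. The sketch below is
an assumption-laden simulation, not kernel code: <structname>sim_task</structname>,
<function>sim_queue_task</function> and <function>sim_timer_tick</function> are hypothetical stand-ins for
<structname>tq_struct</structname>, <function>queue_task</function> and the timer interrupt.</para>

```c
#include <stddef.h>

/* Hypothetical user-space model of a kernel task queue entry. */
struct sim_task {
    struct sim_task *next;
    void (*routine)(void *);
    void *data;
};

static struct sim_task *sim_tq_timer = NULL;  /* tasks to run on the next tick */
static int timer_calls = 0;                   /* how often our function has run */
static int stop_requested = 0;                /* set when we want to stop */

/* Put a task at the front of a task list. */
static void sim_queue_task(struct sim_task *t, struct sim_task **list)
{
    t->next = *list;
    *list = t;
}

static void on_timer(void *unused);
static struct sim_task timer_task = { NULL, on_timer, NULL };

/* Runs once per tick and puts itself back on the list for the next tick. */
static void on_timer(void *unused)
{
    (void)unused;
    timer_calls++;
    if (!stop_requested)
        sim_queue_task(&timer_task, &sim_tq_timer);
}

/* One simulated timer interrupt: detach the list, then run each task once. */
void sim_timer_tick(void)
{
    struct sim_task *t = sim_tq_timer;

    sim_tq_timer = NULL;
    while (t != NULL) {
        struct sim_task *next = t->next;  /* save it: the routine may re-queue t */
        t->routine(t->data);
        t = next;
    }
}

void sim_start(void) { sim_queue_task(&timer_task, &sim_tq_timer); }
void sim_stop(void)  { stop_requested = 1; }
int  sim_calls(void) { return timer_calls; }
```

<para>In this model, after <function>sim_stop</function> the function still runs on one more tick, because it
was already on the list; only after that tick is the list free of it.</para>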
<para>
Instead of doing that, we can create a function that will be called once
for every timer interrupt. The way we do this is we create a task,
<indexterm><primary>task</primary></indexterm> held in a
<structname>tq_struct</structname>
<indexterm><primary>tq_struct</primary></indexterm> structure, which will
hold a pointer to the function. Then, we use
<function>queue_task</function>
<indexterm><primary>queue_task</primary></indexterm> to put that task on
a task list called <structname>tq_timer</structname>,
<indexterm><primary>tq_timer</primary></indexterm> which is the list of
tasks to be executed on the next timer interrupt. Because we want the
function to keep on being executed, we need to put it back on
<structname>tq_timer</structname> whenever it is called, for the next
timer interrupt.
</para>
<indexterm><primary>rmmod</primary></indexterm>
<indexterm><primary>reference count</primary></indexterm>
<indexterm><primary>cleanup_module</primary></indexterm>
<para>
There's one more point we need to remember here. When a module is
removed by <command>rmmod</command>,
<indexterm><primary>rmmod</primary></indexterm> first its reference count
<indexterm><primary>reference count</primary></indexterm> is checked. If
it is zero, <function>cleanup_module</function>
<indexterm><primary>cleanup_module</primary></indexterm> is called.
Then, the module is removed from memory with all its functions. Nobody
checks to see if the timer's task list happens to contain a pointer to
one of those functions, which will no longer be available. Ages later
(from the computer's perspective, from a human perspective it's nothing,
less than a hundredth of a second), the kernel has a timer interrupt and
tries to call the function on the task list. Unfortunately, the function
is no longer there. In most cases, the memory page where it sat is
unused, and you get an ugly error message. But if some other code is now
sitting at the same memory location, things could get
<emphasis>very</emphasis> ugly. Unfortunately, we don't have an easy way
to unregister a task from a task list.
</para>
<para>There's one more point we need to remember here. When a module is removed by <command>rmmod</command>,
first its reference count is checked. If it is zero, <function>cleanup_module</function> is called. Then, the
module is removed from memory with all its functions. Nobody checks to see if the timer's task list happens to
contain a pointer to one of those functions, which will no longer be available. Ages later (from the computer's
perspective; from a human perspective it's nothing, less than a hundredth of a second), the kernel has a timer
interrupt and tries to call the function on the task list. Unfortunately, the function is no longer there. In
most cases, the memory page where it sat is unused, and you get an ugly error message. But if some other code is
now sitting at the same memory location, things could get <emphasis>very</emphasis> ugly. Unfortunately, we don't
have an easy way to unregister a task from a task list.</para>
<para>
Since <function>cleanup_module</function> can't return with an error code
(it's a void function), the solution is to not let it return at all.
Instead, it calls <function>sleep_on</function>
<indexterm><primary>sleep_on</primary></indexterm> or
<function>module_sleep_on</function>
<indexterm><primary>module_sleep_on</primary></indexterm>
<footnote><para>They're really the same.</para></footnote>
to put the <command>rmmod</command> process to sleep. Before that, it
informs the function called on the timer interrupt to stop attaching
itself by setting a global variable. Then, on the next timer interrupt,
the <command>rmmod</command> process will be woken up, when our function
is no longer in the queue and it's safe to remove the module.
</para>
<indexterm><primary>sleep_on</primary></indexterm>
<indexterm><primary>module_sleep_on</primary></indexterm>
<example><title>sched.c</title>
<para>Since <function>cleanup_module</function> can't return with an error code (it's a void function), the
solution is to not let it return at all. Instead, it calls <function>sleep_on</function> or
<function>module_sleep_on</function><footnote><para>They're really the same.</para></footnote> to put the
<command>rmmod</command> process to sleep. Before that, it informs the function called on the timer interrupt to
stop attaching itself by setting a global variable. Then, on the next timer interrupt, the
<command>rmmod</command> process will be woken up, when our function is no longer in the queue and it's safe to
remove the module.</para>
<indexterm><primary>sched.c</primary></indexterm>
<programlisting><![CDATA[
/* sched.c - schedule a function to be called on every timer interrupt.
<example><title>sched.c</title>
<indexterm><primary>sched.c</primary></indexterm>
<programlisting><![CDATA[
/* sched.c - schedule a function to be called on every timer interrupt.
*
* Copyright (C) 2001 by Peter Jay Salzman
* Copyright (C) 2001 by Peter Jay Salzman
*/
/* The necessary header files */
@ -236,9 +206,11 @@ void cleanup_module()
*/
sleep_on(&WaitQ);
}
]]></programlisting>
</example>
]]></programlisting>
</example>
</sect2>
</sect1>
<!--
vim: tw=116
-->

View File

@ -1,173 +1,112 @@
<sect1><title>Interrupt Handlers</title>
<indexterm><primary>interrupt handlers</primary></indexterm>
<indexterm>
<primary>handlers</primary>
<secondary>interrupt</secondary>
</indexterm>
<indexterm><primary>handlers</primary><secondary>interrupt</secondary></indexterm>
<sect2><title>Interrupt Handlers</title>
<para>
Except for the last chapter, everything we did in the kernel so far we've
done as a response to a process asking for it, either by dealing with a
special file, sending an <function>ioctl</function>, or issuing a system
call. But the job of the kernel isn't just to respond to process
requests. Another job, which is every bit as important, is to speak to
the hardware connected to the machine.
</para>
<sect2><title>Interrupt Handlers</title>
<para>
There are two types of interaction between the <acronym>CPU</acronym> and
the rest of the computer's hardware. The first type is when the
<acronym>CPU</acronym> gives orders to the hardware, the other is when
the hardware needs to tell the <acronym>CPU</acronym> something. The
second, called interrupts, is much harder to implement because it has to
be dealt with when convenient for the hardware, not the
<acronym>CPU</acronym>. Hardware devices typically have a very small
amount of <acronym>RAM</acronym>, and if you don't read their information
when available, it is lost.
</para>
<para>Except for the last chapter, everything we did in the kernel so far we've done as a response to a
process asking for it, either by dealing with a special file, sending an <function>ioctl()</function>, or
issuing a system call. But the job of the kernel isn't just to respond to process requests. Another job,
which is every bit as important, is to speak to the hardware connected to the machine.</para>
<para>
Under Linux, hardware interrupts are called <acronym>IRQ</acronym>s
(short for <emphasis>I</emphasis>nterrupt
<emphasis>R</emphasis>e<emphasis>q</emphasis>uests).
<footnote>
<para>
This is standard nomenclature on the Intel architecture where Linux
originated.
</para>
</footnote>
There are two types of <acronym>IRQ</acronym>s, short and long. A short
<acronym>IRQ</acronym> is one which is expected to take a
<emphasis>very</emphasis> short period of time, during which the rest of
the machine will be blocked and no other interrupts will be handled. A
long <acronym>IRQ</acronym> is one which can take longer, and during
which other interrupts may occur (but not interrupts from the same
device). If at all possible, it's better to declare an interrupt handler
to be long.
</para>
<para>There are two types of interaction between the CPU and the rest of the computer's hardware. The first
type is when the CPU gives orders to the hardware; the other is when the hardware needs to tell the CPU
something. The second, called interrupts, is much harder to implement because it has to be dealt with when
convenient for the hardware, not the CPU. Hardware devices typically have a very small amount of RAM, and if
you don't read their information when available, it is lost.</para>
<para>
When the <acronym>CPU</acronym> receives an interrupt, it stops whatever
it's doing (unless it's processing a more important interrupt, in which
case it will deal with this one only when the more important one is
done), saves certain parameters on the stack and calls the interrupt
handler. This means that certain things are not allowed in the interrupt
handler itself, because the system is in an unknown state. The solution
to this problem is for the interrupt handler to do what needs to be done
immediately, usually read something from the hardware or send something
to the hardware, and then schedule the handling of the new information at
a later time (this is called the <quote>bottom half</quote>)
<indexterm><primary>bottom half</primary></indexterm> and return. The
kernel is then guaranteed to call the bottom half as soon as possible --
and when it does, everything allowed in kernel modules will be allowed.
</para>
<para>Under Linux, hardware interrupts are called IRQs (<emphasis>I</emphasis>nterrupt
<emphasis>R</emphasis>e<emphasis>q</emphasis>uests)<footnote><para>This is standard nomenclature on the Intel
architecture where Linux originated.</para></footnote>. There are two types of IRQs, short and long. A short
IRQ is one which is expected to take a <emphasis>very</emphasis> short period of time, during which the rest
of the machine will be blocked and no other interrupts will be handled. A long IRQ is one which can take
longer, and during which other interrupts may occur (but not interrupts from the same device). If at all
possible, it's better to declare an interrupt handler to be long.</para>
<para>
The way to implement this is to call <function>request_irq</function>
<indexterm><primary>request_irq</primary></indexterm> to get your
interrupt handler called when the relevant <acronym>IRQ</acronym> is
received (there are 15 of them, plus 1 which is used to cascade the
interrupt controllers, on Intel platforms). This function receives the
<acronym>IRQ</acronym> number, the name of the function, flags, a name
for <filename>/proc/interrupts</filename>
<indexterm><primary>/proc/interrupts</primary></indexterm> and a
parameter to pass to the interrupt handler. The flags can include
<parameter>SA_SHIRQ</parameter>
<indexterm><primary>SA_SHIRQ</primary></indexterm> to indicate you're
willing to share the <acronym>IRQ</acronym> with other interrupt handlers
(usually because a number of hardware devices sit on the same
<acronym>IRQ</acronym>) and <parameter>SA_INTERRUPT</parameter>
<indexterm><primary>SA_INTERRUPT</primary></indexterm> to indicate this
is a fast interrupt. This function will only succeed if there isn't
already a handler on this <acronym>IRQ</acronym>, or if you're both
willing to share.
</para>
<indexterm><primary>bottom half</primary></indexterm>
<para>
Then, from within the interrupt handler, we communicate with the hardware
and then use <function>queue_task_irq</function>
<indexterm><primary>queue_task_irq</primary></indexterm> with
<function>tq_immediate</function>
<indexterm><primary>tq_immediate</primary></indexterm> and
<function>mark_bh(BH_IMMEDIATE)</function>
<para>When the CPU receives an interrupt, it stops whatever it's doing (unless it's processing a more
important interrupt, in which case it will deal with this one only when the more important one is done), saves
certain parameters on the stack and calls the interrupt handler. This means that certain things are not
allowed in the interrupt handler itself, because the system is in an unknown state. The solution to this
problem is for the interrupt handler to do what needs to be done immediately, usually read something from the
hardware or send something to the hardware, and then schedule the handling of the new information at a later
time (this is called the <quote>bottom half</quote>) and return. The kernel is then guaranteed to call the
bottom half as soon as possible -- and when it does, everything allowed in kernel modules will be
allowed.</para>
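<para>The split between the interrupt handler and its bottom half can be pictured with a small user-space
model. This is a sketch only, with invented names (<function>top_half</function>,
<function>bottom_half</function>, <function>run_pending_bottom_half</function>); it uses a ring buffer and a
flag where the kernel uses its own queues, and it runs in a single thread.</para>

```c
#include <stddef.h>

#define RING_SIZE 16

/* Hypothetical model: data captured by the handler, consumed later. */
static unsigned char ring[RING_SIZE];
static size_t head = 0, tail = 0;        /* head: next write, tail: next read */
static int bottom_half_pending = 0;
static unsigned long processed_sum = 0;  /* stands in for the "real" work */

/* "Top half": do the minimum (grab the byte) and schedule the rest. */
void top_half(unsigned char value_from_hardware)
{
    size_t next = (head + 1) % RING_SIZE;

    if (next == tail)
        return;                          /* buffer full: this byte is lost */
    ring[head] = value_from_hardware;
    head = next;
    bottom_half_pending = 1;
}

/* "Bottom half": runs later, when slower work is allowed. */
void bottom_half(void)
{
    while (tail != head) {
        processed_sum += ring[tail];
        tail = (tail + 1) % RING_SIZE;
    }
    bottom_half_pending = 0;
}

/* What the kernel would do soon after the handler returns. */
int run_pending_bottom_half(void)
{
    if (!bottom_half_pending)
        return 0;
    bottom_half();
    return 1;
}
```

<para>The handler only copies the value and sets a flag, so it stays short; everything that takes time happens
in the deferred function.</para>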
<indexterm><primary>request_irq()</primary></indexterm>
<indexterm><primary>/proc/interrupts</primary></indexterm>
<indexterm><primary>SA_INTERRUPT</primary></indexterm>
<indexterm><primary>SA_SHIRQ</primary></indexterm>
<para>The way to implement this is to call <function>request_irq()</function> to get your interrupt handler
called when the relevant IRQ is received (there are 15 of them, plus 1 which is used to cascade the interrupt
controllers, on Intel platforms). This function receives the IRQ number, the name of the function, flags, a
name for <filename>/proc/interrupts</filename> and a parameter to pass to the interrupt handler. The flags
can include <parameter>SA_SHIRQ</parameter> to indicate you're willing to share the IRQ with other interrupt
handlers (usually because a number of hardware devices sit on the same IRQ) and
<parameter>SA_INTERRUPT</parameter> to indicate this is a fast interrupt (the short kind described above). This function will only succeed if
there isn't already a handler on this IRQ, or if you're both willing to share.</para>
<indexterm><primary>queue_task_irq</primary></indexterm>
<indexterm><primary>tq_immediate</primary></indexterm>
<indexterm><primary>mark_bh</primary></indexterm>
<indexterm><primary>BH_IMMEDIATE</primary></indexterm> to schedule the
bottom half. The reason we can't use the standard
<function>queue_task</function>
<indexterm><primary>queue_task</primary></indexterm> in version 2.0 is
that the interrupt might happen right in the middle of somebody else's
<function>queue_task</function>.
<footnote>
<para>
<function>queue_task_irq</function> is protected from this by a
global lock -- in 2.2 there is no <function>queue_task_irq</function>
and <function>queue_task</function> is protected by a lock.
</para>
</footnote>
We need <function>mark_bh</function> because earlier versions of Linux
only had an array of 32 bottom halves, and now one of them
(<parameter>BH_IMMEDIATE</parameter>) is used for the linked list of
bottom halves for drivers which didn't get a bottom half entry assigned
to them.
</para>
</sect2>
<indexterm><primary>BH_IMMEDIATE</primary></indexterm>
<sect2 id="keyboard">
<title>Keyboards on the Intel Architecture</title>
<para>Then, from within the interrupt handler, we communicate with the hardware and then use
<function>queue_task_irq()</function> with <structname>tq_immediate</structname> and
<function>mark_bh(BH_IMMEDIATE)</function> to schedule the bottom half. The reason we can't use the standard
<function>queue_task</function> <indexterm><primary>queue_task</primary></indexterm> in version 2.0 is that
the interrupt might happen right in the middle of somebody else's
<function>queue_task</function><footnote><para><function>queue_task_irq</function> is protected from this by a
global lock -- in 2.2 there is no <function>queue_task_irq</function> and <function>queue_task</function> is
protected by a lock.</para></footnote>. We need <function>mark_bh</function> because earlier versions of Linux
only had an array of 32 bottom halves, and now one of them (<parameter>BH_IMMEDIATE</parameter>) is used for
the linked list of bottom halves for drivers which didn't get a bottom half entry assigned to them.</para>
<indexterm><primary>keyboard</primary></indexterm>
<indexterm>
<primary>Intel architecture</primary>
<secondary>keyboard</secondary>
</indexterm>
</sect2>
<warning>
<para>
The rest of this chapter is completely Intel specific. If you're not
running on an Intel platform, it will not work. Don't even try to
compile the code here.
</para>
</warning>
<para>
I had a problem with writing the sample code for this chapter. On one
hand, for an example to be useful it has to run on everybody's computer
with meaningful results. On the other hand, the kernel already includes
device drivers for all of the common devices, and those device drivers
won't coexist with what I'm going to write. The solution I've found was
to write something for the keyboard interrupt, and disable the regular
keyboard interrupt handler first. Since it is defined as a static symbol
in the kernel source files (specifically,
<filename>drivers/char/keyboard.c</filename>), there is no way to restore
it. Before <userinput>insmod</userinput>'ing this code, do on another
terminal <userinput>sleep 120 ; reboot</userinput> if you value your
file system.
</para>
<para>
This code binds itself to <acronym>IRQ</acronym> 1, which is the
<acronym>IRQ</acronym> of the keyboard controlled under Intel
architectures. Then, when it receives a keyboard interrupt, it reads the
keyboard's status (that's the purpose of the
<userinput>inb(0x64)</userinput>)
<indexterm><primary>inb</primary></indexterm> and the scan code, which is
the value returned by the keyboard. Then, as soon as the kernel thinks
it's feasible, it runs <function>got_char</function> which gives the code
of the key used (the first seven bits of the scan code) and whether it
has been pressed (if the 8th bit is zero) or released (if it's one).
</para>
<sect2 id="keyboard"><title>Keyboards on the Intel Architecture</title>
<indexterm><primary>keyboard</primary></indexterm>
<indexterm><primary>Intel architecture</primary><secondary>keyboard</secondary></indexterm>
<warning>
<para>The rest of this chapter is completely Intel specific. If you're not running on an Intel platform, it
will not work. Don't even try to compile the code here.</para>
</warning>
<para>I had a problem with writing the sample code for this chapter. On one hand, for an example to be useful
it has to run on everybody's computer with meaningful results. On the other hand, the kernel already includes
device drivers for all of the common devices, and those device drivers won't coexist with what I'm going to
write. The solution I've found was to write something for the keyboard interrupt, and disable the regular
keyboard interrupt handler first. Since it is defined as a static symbol in the kernel source files
(specifically, <filename>drivers/char/keyboard.c</filename>), there is no way to restore it. Before
<userinput>insmod</userinput>'ing this code, run <userinput>sleep 120 ; reboot</userinput> on another terminal
if you value your file system.</para>
<indexterm><primary>inb</primary></indexterm>
<para>This code binds itself to IRQ 1, which is the IRQ of the keyboard controller under Intel architectures.
Then, when it receives a keyboard interrupt, it reads the keyboard's status (that's the purpose of the
<userinput>inb(0x64)</userinput>) and the scan code, which is the value returned by the keyboard. Then, as
soon as the kernel thinks it's feasible, it runs <function>got_char</function> which gives the code of the key
used (the low seven bits of the scan code) and whether it has been pressed (if the 8th bit is zero) or
released (if it's one).</para>
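<para>Splitting a scan code into the key code and the pressed or released bit, as described above, is plain
bit masking. A minimal sketch follows; <structname>key_event</structname> and
<function>decode_scancode</function> are hypothetical names, not taken from the example source.</para>

```c
/* Result of splitting one raw scan code. */
struct key_event {
    unsigned char code;   /* low seven bits of the scan code */
    int released;         /* 1 if the 8th bit is set, 0 if the key was pressed */
};

/* Split a raw scan code into the key code and the press/release bit. */
struct key_event decode_scancode(unsigned char scancode)
{
    struct key_event ev;

    ev.code = scancode & 0x7F;                /* keep bits 0 to 6 */
    ev.released = (scancode & 0x80) ? 1 : 0;  /* test bit 7 */
    return ev;
}
```

<para>With this function, the scan codes 0x1E and 0x9E both give key code 0x1E; the first is reported as a
press and the second as a release.</para>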
<example><title>intrpt.c</title>
<indexterm><primary>intrpt.c</primary></indexterm>
<programlisting><![CDATA[
/* intrpt.c - An interrupt handler.
<example><title>intrpt.c</title>
<programlisting><![CDATA[
/* intrpt.c - An interrupt handler.
*
* Copyright (C) 2001 by Peter Jay Salzman
* Copyright (C) 2001 by Peter Jay Salzman
*/
/* The necessary header files */
@ -263,7 +202,14 @@ void cleanup_module()
*/
free_irq(1, NULL);
}
]]></programlisting>
</example>
</sect2>
]]></programlisting>
</example>
</sect2>
</sect1>
<!--
vim: tw=116
-->

View File

@ -1,83 +1,39 @@
<sect1><title>Symmetrical Multi-Processing</title>
<indexterm><primary>SMP</primary></indexterm>
<indexterm><primary>multi-processing</primary></indexterm>
<indexterm><primary>symmetrical multi-processing</primary></indexterm>
<indexterm>
<primary>processing</primary>
<secondary>multi</secondary>
</indexterm>
<indexterm><primary>SMP</primary></indexterm>
<indexterm><primary>multi-processing</primary></indexterm>
<indexterm><primary>symmetrical multi-processing</primary></indexterm>
<indexterm><primary>processing</primary><secondary>multi</secondary></indexterm>
<indexterm><primary>CPU</primary><secondary>multiple</secondary></indexterm>
<para>Good writing style says we have a paragraph here.</para>
<sect2><title>Symmetrical Multi-Processing</title>
<para>
One of the easiest (read, cheapest) ways to improve hardware performance
is to put more than one <acronym>CPU</acronym> on the board.
<indexterm>
<primary>CPU</primary>
<secondary>multiple</secondary>
</indexterm>
	This can be done either by making the different <acronym>CPU</acronym>s take
on different jobs (asymmetrical multi-processing) or by making them all
run in parallel, doing the same job (symmetrical multi-processing, a.k.a.
<acronym>SMP</acronym>). Doing asymmetrical multi-processing effectively
requires specialized knowledge about the tasks the computer should do,
which is unavailable in a general purpose operating system such as Linux.
On the other hand, symmetrical multi-processing is relatively easy to
implement.
</para>
<para>One of the easiest and cheapest ways to improve hardware performance is to put more than one CPU on the
board. This can be done either by making the different CPUs take on different jobs (asymmetrical multi-processing)
or by making them all run in parallel, doing the same job (symmetrical multi-processing, a.k.a. SMP). Doing
asymmetrical multi-processing effectively requires specialized knowledge about the tasks the computer should do,
which is unavailable in a general purpose operating system such as Linux. On the other hand, symmetrical
multi-processing is relatively easy to implement.</para>
<para>
By relatively easy, I mean exactly that -- not that it's
<emphasis>really</emphasis> easy. In a symmetrical multi-processing
environment, the <acronym>CPU</acronym>s share the same memory, and as a
result code running in one <acronym>CPU</acronym> can affect the memory
used by another. You can no longer be certain that a variable you've set
to a certain value in the previous line still has that value -- the other
<acronym>CPU</acronym> might have played with it while you weren't
looking. Obviously, it's impossible to program like this.
</para>
<para>By relatively easy, I mean exactly that: not that it's <emphasis>really</emphasis> easy. In a symmetrical
multi-processing environment, the CPUs share the same memory, and as a result code running in one CPU can affect
the memory used by another. You can no longer be certain that a variable you've set to a certain value in the
previous line still has that value; the other CPU might have played with it while you weren't looking. Obviously,
it's impossible to program like this.</para>
<para>
In the case of process programming this normally isn't an issue, because
a process will normally only run on one <acronym>CPU</acronym> at a time.
<footnote>
<para>
The exception is threaded processes, which can run on several
<acronym>CPU</acronym>s at once.
</para>
</footnote>
The kernel, on the other hand, could be called by different processes
running on different <acronym>CPU</acronym>s.
</para>
<para>In the case of process programming this normally isn't an issue, because a process will normally only run on
one CPU at a time<footnote><para>The exception is threaded processes, which can run on several CPUs at
once.</para></footnote>. The kernel, on the other hand, could be called by different processes running on
different CPUs.</para>
<para>
In version 2.0.x, this isn't a problem because the entire kernel is in
one big spinlock. This means that if one <acronym>CPU</acronym> is in
the kernel and another <acronym>CPU</acronym> wants to get in, for
example because of a system call, it has to wait until the first
<acronym>CPU</acronym> is done. This makes Linux <acronym>SMP</acronym>
safe,
<footnote>
<para>Meaning it is safe to use it with <acronym>SMP</acronym></para>
</footnote>
	but terribly inefficient.
</para>
<para>In version 2.0.x, this isn't a problem because the entire kernel is in one big spinlock. This means that if
one CPU is in the kernel and another CPU wants to get in, for example because of a system call, it has to wait
until the first CPU is done. This makes Linux SMP safe<footnote><para>Meaning it is safe to use it with
SMP</para></footnote>, but inefficient.</para>
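<para>The idea of one big lock around the whole kernel can be sketched with a test-and-set spinlock. The following
is a user-space illustration using C11 atomics, under the assumption that a single flag guards all kernel entry;
it is not the kernel's actual locking code, and the names <function>kernel_enter</function>,
<function>kernel_try_enter</function> and <function>kernel_exit</function> are hypothetical.</para>

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One flag for the whole "kernel": set while some CPU is inside. */
static atomic_flag big_lock = ATOMIC_FLAG_INIT;

/* Try once to enter; returns true if we got in, false if another
 * CPU already holds the lock. */
bool kernel_try_enter(void)
{
    return !atomic_flag_test_and_set_explicit(&big_lock,
                                              memory_order_acquire);
}

/* Spin until the lock is free. This busy wait is where a second CPU
 * burns time while the first one is still in the kernel. */
void kernel_enter(void)
{
    while (atomic_flag_test_and_set_explicit(&big_lock,
                                             memory_order_acquire))
        ; /* busy wait */
}

/* Leave the "kernel" and let a waiting CPU in. */
void kernel_exit(void)
{
    atomic_flag_clear_explicit(&big_lock, memory_order_release);
}
```

<para>Because every entry goes through the same flag, only one caller can be inside at a time. That is what makes
the scheme safe, and the busy wait in <function>kernel_enter</function> is what makes it inefficient.</para>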
<para>
In version 2.2.x, several <acronym>CPU</acronym>s can be in the kernel at
the same time. This is something module writers need to be aware of. I
got somebody to give me access to an <acronym>SMP</acronym> box, so
hopefully the next version of this book will include more information.
</para>
<para>In version 2.2.x, several CPUs can be in the kernel at the same time. This is something module writers
need to be aware of.</para>
<!-- Unfortunately, I don't have access to an SMP box to test things, so I
can't write a chapter about how to do it right. If anybody out there has
access to one and is willing to help me with this, I'll be grateful. If a
company will provide me with this access, I'll give them a free one
paragraph ad at the top of this chapter.
-->
</sect2>
</sect1>
<!--
vim: tw=116
-->

View File

@ -1,61 +1,41 @@
<sect1><title>Common Pitfalls</title>
<sect2><title>Common Pitfalls</title>
<para>Before I send you on your way to go out into the world and write kernel modules, there are a few things I
need to warn you about. If I fail to warn you and something bad happens, please report the problem to me for a
full refund of the amount I was paid for your copy of the book.</para>
<para>Before I send you on your way to go out into the world and write
kernel modules, there are a few things I need to warn you about. If I
fail to warn you and something bad happens, please report the problem to
me for a full refund of the amount I was paid for your copy of the book.
</para>
<indexterm><primary>refund policy</primary></indexterm>
<indexterm><primary>refund policy</primary></indexterm>
<variablelist>
<variablelist>
<varlistentry>
<term>Using standard libraries</term>
<indexterm><primary>standard libraries</primary></indexterm>
<indexterm>
<primary>libraries</primary>
<secondary>standard</secondary>
</indexterm>
<listitem>
<para>
You can't do that. In a kernel module you can only use kernel
functions, which are the functions you can see in
<filename>/proc/ksyms</filename>.
<indexterm><primary>/proc/ksyms</primary></indexterm>
<indexterm>
<primary>proc file</primary>
<secondary>ksyms</secondary>
</indexterm>
</para>
</listitem>
</varlistentry>
<indexterm><primary>standard libraries</primary></indexterm>
<indexterm><primary>libraries</primary><secondary>standard</secondary></indexterm>
<indexterm><primary>/proc/ksyms</primary></indexterm>
<indexterm><primary>proc file</primary><secondary>ksyms</secondary></indexterm>
<varlistentry>
<term>Disabling interrupts</term>
<indexterm>
<primary>interrupts</primary>
<secondary>disabling</secondary>
</indexterm>
<listitem>
<para>
You might need to do this for a short time and that is OK, but if
you don't enable them afterwards, your system will be stuck and
you'll have to power it off.
</para>
</listitem>
</varlistentry>
<varlistentry><term>Using standard libraries</term>
<listitem><para> You can't do that. In a kernel module you can only use kernel functions, which are the functions
you can see in <filename>/proc/ksyms</filename>.</para></listitem>
</varlistentry>
<indexterm><primary>interrupts</primary><secondary>disabling</secondary></indexterm>
<varlistentry><term>Disabling interrupts</term>
<listitem><para>You might need to do this for a short time and that is OK, but if you don't enable them
afterwards, your system will be stuck and you'll have to power it off.</para></listitem>
</varlistentry>
<varlistentry>
<term>Sticking your head inside a large carnivore</term>
<listitem>
<para>
I probably don't have to warn you about this, but I figured I will
anyway, just in case.
</para>
</listitem>
</varlistentry>
</variablelist>
</sect2>
<indexterm><primary>carnivore</primary><secondary>large</secondary></indexterm>
<varlistentry><term>Sticking your head inside a large carnivore</term>
<listitem><para>I probably don't have to warn you about this, but I figured I would anyway, just in
case.</para></listitem>
</varlistentry>
</variablelist>
</sect1>
<!--
vim: tw=116
-->

View File

@ -29,6 +29,10 @@
<collab><collabname>Ori Pomerantz</collabname></collab>
</authorgroup>
<!-- year-month-day -->
<pubdate>2003-03-25 ver 2.1</pubdate>
<copyright>
<year>2001</year>
<holder>Peter Jay Salzman</holder>