833 lines
30 KiB
HTML
833 lines
30 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
|
|
<HTML>
|
|
<HEAD>
|
|
<META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
|
|
<TITLE>KernelAnalysis-HOWTO: Fundamentals</TITLE>
|
|
<LINK HREF="KernelAnalysis-HOWTO-4.html" REL=next>
|
|
<LINK HREF="KernelAnalysis-HOWTO-2.html" REL=previous>
|
|
<LINK HREF="KernelAnalysis-HOWTO.html#toc3" REL=contents>
|
|
</HEAD>
|
|
<BODY>
|
|
<A HREF="KernelAnalysis-HOWTO-4.html">Next</A>
|
|
<A HREF="KernelAnalysis-HOWTO-2.html">Previous</A>
|
|
<A HREF="KernelAnalysis-HOWTO.html#toc3">Contents</A>
|
|
<HR>
|
|
<H2><A NAME="s3">3. Fundamentals</A></H2>
|
|
|
|
<H2><A NAME="ss3.1">3.1 What is the kernel?</A>
|
|
</H2>
|
|
|
|
<P>The kernel is the "core" of any computer system: it is the "software"
|
|
which allows users to share computer resources.
|
|
<P>
|
|
<P>The kernel can be thought as the main software of the OS (Operating
|
|
System), which may also include graphics management.
|
|
<P>
|
|
<P>For example, under Linux (like other Unix-like OSs), the XWindow
|
|
environment doesn't belong to the Linux Kernel, because it manages
|
|
only graphical operations (it uses user mode I/O to access video
|
|
card devices).
|
|
<P>
|
|
<P>By contrast, Windows environments (Win9x, WinME, WinNT, Win2K,
|
|
WinXP, and so on) are a mix between a graphical environment and kernel.
|
|
<P>
|
|
<H2><A NAME="ss3.2">3.2 What is the difference between User Mode and Kernel Mode?</A>
|
|
</H2>
|
|
|
|
<H3>Overview</H3>
|
|
|
|
<P>Many years ago, when computers were as big as a room, users ran
|
|
their applications with much difficulty and, sometimes, their applications
|
|
crashed the computer.
|
|
<P>
|
|
<H3>Operative modes</H3>
|
|
|
|
<P>To avoid having applications that constantly crashed, newer OSs
|
|
were designed with 2 different operative modes:
|
|
<P>
|
|
<P>
|
|
<OL>
|
|
<LI>Kernel Mode: the machine operates with critical data structure,
|
|
direct hardware (IN/OUT or memory mapped), direct memory, IRQ, DMA,
|
|
and so on.</LI>
|
|
<LI>User Mode: users can run applications.
|
|
</LI>
|
|
</OL>
|
|
<P>
|
|
<PRE>
|
|
|
|
| Applications /|\
|
|
| ______________ |
|
|
| | User Mode | |
|
|
| ______________ |
|
|
| | |
|
|
Implementation | _______ _______ | Abstraction
|
|
Detail | | Kernel Mode | |
|
|
| _______________ |
|
|
| | |
|
|
| | |
|
|
| | |
|
|
\|/ Hardware |
|
|
</PRE>
|
|
<P>Kernel Mode "prevents" User Mode applications from damaging the
|
|
system or its features.
|
|
<P>
|
|
<P>Modern microprocessors implement in hardware at least 2 different
|
|
states. For example under Intel, 4 states determine the PL (Privilege
|
|
Level). It is possible to use 0,1,2,3 states, with 0 used in Kernel
|
|
Mode.
|
|
<P>
|
|
<P>Unix OS requires only 2 privilege levels, and we will use such
|
|
a paradigm as point of reference.
|
|
<P>
|
|
<H2><A NAME="ss3.3">3.3 Switching from User Mode to Kernel Mode</A>
|
|
</H2>
|
|
|
|
<H3>When do we switch?</H3>
|
|
|
|
<P>Once we understand that there are 2 different modes, we have
|
|
to know when we switch from one to the other.
|
|
<P>
|
|
<P>Typically, there are 2 points of switching:
|
|
<P>
|
|
<P>
|
|
<OL>
|
|
<LI>When calling a System Call: after calling a System Call, the
|
|
task voluntary calls pieces of code living in Kernel Mode</LI>
|
|
<LI>When an IRQ (or exception) comes: after the IRQ an IRQ handler
|
|
(or exception handler) is called, then control returns back to the
|
|
task that was interrupted like nothing was happened.
|
|
</LI>
|
|
</OL>
|
|
<H3>System Calls</H3>
|
|
|
|
<P>System calls are like special functions that manage OS routines
|
|
which live in Kernel Mode.
|
|
<P>
|
|
<P>A system call can be called when we:
|
|
<P>
|
|
<P>
|
|
<UL>
|
|
<LI>access an I/O device or a file (like read or write)</LI>
|
|
<LI>need to access privileged information (like pid, changing scheduling
|
|
policy or other information)</LI>
|
|
<LI>need to change execution context (like forking or executing some
|
|
other application) </LI>
|
|
<LI>need to execute a particular command (like ''chdir'', ''kill",
|
|
''brk'', or ''signal'')
|
|
</LI>
|
|
</UL>
|
|
<P>
|
|
<PRE>
|
|
| |
|
|
------->| System Call i | (Accessing Devices)
|
|
| | | | [sys_read()] |
|
|
| ... | | | |
|
|
| system_call(i) |-------- | |
|
|
| [read()] | | |
|
|
| ... | | |
|
|
| system_call(j) |-------- | |
|
|
| [get_pid()] | | | |
|
|
| ... | ------->| System Call j | (Accessing kernel data structures)
|
|
| | | [sys_getpid()]|
|
|
| |
|
|
|
|
USER MODE KERNEL MODE
|
|
|
|
|
|
Unix System Calls Working
|
|
</PRE>
|
|
<P>System calls are almost the only interface used by User Mode
|
|
to talk with low level resources (hardware). The only exception to
|
|
this statement is when a process uses ''ioperm'' system call. In
|
|
this case a device can be accessed directly by User Mode process
|
|
(IRQs cannot be used).
|
|
<P>
|
|
<P>NOTE: Not every ''C'' function is a system call, only some of
|
|
them.
|
|
<P>
|
|
<P>Below is a list of System Calls under Linux Kernel 2.4.17, from
|
|
[ arch/i386/kernel/entry.S ]
|
|
<P>
|
|
<P>
|
|
<PRE>
|
|
.long SYMBOL_NAME(sys_ni_syscall) /* 0 - old "setup()" system call*/
|
|
.long SYMBOL_NAME(sys_exit)
|
|
.long SYMBOL_NAME(sys_fork)
|
|
.long SYMBOL_NAME(sys_read)
|
|
.long SYMBOL_NAME(sys_write)
|
|
.long SYMBOL_NAME(sys_open) /* 5 */
|
|
.long SYMBOL_NAME(sys_close)
|
|
.long SYMBOL_NAME(sys_waitpid)
|
|
.long SYMBOL_NAME(sys_creat)
|
|
.long SYMBOL_NAME(sys_link)
|
|
.long SYMBOL_NAME(sys_unlink) /* 10 */
|
|
.long SYMBOL_NAME(sys_execve)
|
|
.long SYMBOL_NAME(sys_chdir)
|
|
.long SYMBOL_NAME(sys_time)
|
|
.long SYMBOL_NAME(sys_mknod)
|
|
.long SYMBOL_NAME(sys_chmod) /* 15 */
|
|
.long SYMBOL_NAME(sys_lchown16)
|
|
.long SYMBOL_NAME(sys_ni_syscall) /* old break syscall holder */
|
|
.long SYMBOL_NAME(sys_stat)
|
|
.long SYMBOL_NAME(sys_lseek)
|
|
.long SYMBOL_NAME(sys_getpid) /* 20 */
|
|
.long SYMBOL_NAME(sys_mount)
|
|
.long SYMBOL_NAME(sys_oldumount)
|
|
.long SYMBOL_NAME(sys_setuid16)
|
|
.long SYMBOL_NAME(sys_getuid16)
|
|
.long SYMBOL_NAME(sys_stime) /* 25 */
|
|
.long SYMBOL_NAME(sys_ptrace)
|
|
.long SYMBOL_NAME(sys_alarm)
|
|
.long SYMBOL_NAME(sys_fstat)
|
|
.long SYMBOL_NAME(sys_pause)
|
|
.long SYMBOL_NAME(sys_utime) /* 30 */
|
|
.long SYMBOL_NAME(sys_ni_syscall) /* old stty syscall holder */
|
|
.long SYMBOL_NAME(sys_ni_syscall) /* old gtty syscall holder */
|
|
.long SYMBOL_NAME(sys_access)
|
|
.long SYMBOL_NAME(sys_nice)
|
|
.long SYMBOL_NAME(sys_ni_syscall) /* 35 */ /* old ftime syscall holder */
|
|
.long SYMBOL_NAME(sys_sync)
|
|
.long SYMBOL_NAME(sys_kill)
|
|
.long SYMBOL_NAME(sys_rename)
|
|
.long SYMBOL_NAME(sys_mkdir)
|
|
.long SYMBOL_NAME(sys_rmdir) /* 40 */
|
|
.long SYMBOL_NAME(sys_dup)
|
|
.long SYMBOL_NAME(sys_pipe)
|
|
.long SYMBOL_NAME(sys_times)
|
|
.long SYMBOL_NAME(sys_ni_syscall) /* old prof syscall holder */
|
|
.long SYMBOL_NAME(sys_brk) /* 45 */
|
|
.long SYMBOL_NAME(sys_setgid16)
|
|
.long SYMBOL_NAME(sys_getgid16)
|
|
.long SYMBOL_NAME(sys_signal)
|
|
.long SYMBOL_NAME(sys_geteuid16)
|
|
.long SYMBOL_NAME(sys_getegid16) /* 50 */
|
|
.long SYMBOL_NAME(sys_acct)
|
|
.long SYMBOL_NAME(sys_umount) /* recycled never used phys() */
|
|
.long SYMBOL_NAME(sys_ni_syscall) /* old lock syscall holder */
|
|
.long SYMBOL_NAME(sys_ioctl)
|
|
.long SYMBOL_NAME(sys_fcntl) /* 55 */
|
|
.long SYMBOL_NAME(sys_ni_syscall) /* old mpx syscall holder */
|
|
.long SYMBOL_NAME(sys_setpgid)
|
|
.long SYMBOL_NAME(sys_ni_syscall) /* old ulimit syscall holder */
|
|
.long SYMBOL_NAME(sys_olduname)
|
|
.long SYMBOL_NAME(sys_umask) /* 60 */
|
|
.long SYMBOL_NAME(sys_chroot)
|
|
.long SYMBOL_NAME(sys_ustat)
|
|
.long SYMBOL_NAME(sys_dup2)
|
|
.long SYMBOL_NAME(sys_getppid)
|
|
.long SYMBOL_NAME(sys_getpgrp) /* 65 */
|
|
.long SYMBOL_NAME(sys_setsid)
|
|
.long SYMBOL_NAME(sys_sigaction)
|
|
.long SYMBOL_NAME(sys_sgetmask)
|
|
.long SYMBOL_NAME(sys_ssetmask)
|
|
.long SYMBOL_NAME(sys_setreuid16) /* 70 */
|
|
.long SYMBOL_NAME(sys_setregid16)
|
|
.long SYMBOL_NAME(sys_sigsuspend)
|
|
.long SYMBOL_NAME(sys_sigpending)
|
|
.long SYMBOL_NAME(sys_sethostname)
|
|
.long SYMBOL_NAME(sys_setrlimit) /* 75 */
|
|
.long SYMBOL_NAME(sys_old_getrlimit)
|
|
.long SYMBOL_NAME(sys_getrusage)
|
|
.long SYMBOL_NAME(sys_gettimeofday)
|
|
.long SYMBOL_NAME(sys_settimeofday)
|
|
.long SYMBOL_NAME(sys_getgroups16) /* 80 */
|
|
.long SYMBOL_NAME(sys_setgroups16)
|
|
.long SYMBOL_NAME(old_select)
|
|
.long SYMBOL_NAME(sys_symlink)
|
|
.long SYMBOL_NAME(sys_lstat)
|
|
.long SYMBOL_NAME(sys_readlink) /* 85 */
|
|
.long SYMBOL_NAME(sys_uselib)
|
|
.long SYMBOL_NAME(sys_swapon)
|
|
.long SYMBOL_NAME(sys_reboot)
|
|
.long SYMBOL_NAME(old_readdir)
|
|
.long SYMBOL_NAME(old_mmap) /* 90 */
|
|
.long SYMBOL_NAME(sys_munmap)
|
|
.long SYMBOL_NAME(sys_truncate)
|
|
.long SYMBOL_NAME(sys_ftruncate)
|
|
.long SYMBOL_NAME(sys_fchmod)
|
|
.long SYMBOL_NAME(sys_fchown16) /* 95 */
|
|
.long SYMBOL_NAME(sys_getpriority)
|
|
.long SYMBOL_NAME(sys_setpriority)
|
|
.long SYMBOL_NAME(sys_ni_syscall) /* old profil syscall holder */
|
|
.long SYMBOL_NAME(sys_statfs)
|
|
.long SYMBOL_NAME(sys_fstatfs) /* 100 */
|
|
.long SYMBOL_NAME(sys_ioperm)
|
|
.long SYMBOL_NAME(sys_socketcall)
|
|
.long SYMBOL_NAME(sys_syslog)
|
|
.long SYMBOL_NAME(sys_setitimer)
|
|
.long SYMBOL_NAME(sys_getitimer) /* 105 */
|
|
.long SYMBOL_NAME(sys_newstat)
|
|
.long SYMBOL_NAME(sys_newlstat)
|
|
.long SYMBOL_NAME(sys_newfstat)
|
|
.long SYMBOL_NAME(sys_uname)
|
|
.long SYMBOL_NAME(sys_iopl) /* 110 */
|
|
.long SYMBOL_NAME(sys_vhangup)
|
|
.long SYMBOL_NAME(sys_ni_syscall) /* old "idle" system call */
|
|
.long SYMBOL_NAME(sys_vm86old)
|
|
.long SYMBOL_NAME(sys_wait4)
|
|
.long SYMBOL_NAME(sys_swapoff) /* 115 */
|
|
.long SYMBOL_NAME(sys_sysinfo)
|
|
.long SYMBOL_NAME(sys_ipc)
|
|
.long SYMBOL_NAME(sys_fsync)
|
|
.long SYMBOL_NAME(sys_sigreturn)
|
|
.long SYMBOL_NAME(sys_clone) /* 120 */
|
|
.long SYMBOL_NAME(sys_setdomainname)
|
|
.long SYMBOL_NAME(sys_newuname)
|
|
.long SYMBOL_NAME(sys_modify_ldt)
|
|
.long SYMBOL_NAME(sys_adjtimex)
|
|
.long SYMBOL_NAME(sys_mprotect) /* 125 */
|
|
.long SYMBOL_NAME(sys_sigprocmask)
|
|
.long SYMBOL_NAME(sys_create_module)
|
|
.long SYMBOL_NAME(sys_init_module)
|
|
.long SYMBOL_NAME(sys_delete_module)
|
|
.long SYMBOL_NAME(sys_get_kernel_syms) /* 130 */
|
|
.long SYMBOL_NAME(sys_quotactl)
|
|
.long SYMBOL_NAME(sys_getpgid)
|
|
.long SYMBOL_NAME(sys_fchdir)
|
|
.long SYMBOL_NAME(sys_bdflush)
|
|
.long SYMBOL_NAME(sys_sysfs) /* 135 */
|
|
.long SYMBOL_NAME(sys_personality)
|
|
.long SYMBOL_NAME(sys_ni_syscall) /* for afs_syscall */
|
|
.long SYMBOL_NAME(sys_setfsuid16)
|
|
.long SYMBOL_NAME(sys_setfsgid16)
|
|
.long SYMBOL_NAME(sys_llseek) /* 140 */
|
|
.long SYMBOL_NAME(sys_getdents)
|
|
.long SYMBOL_NAME(sys_select)
|
|
.long SYMBOL_NAME(sys_flock)
|
|
.long SYMBOL_NAME(sys_msync)
|
|
.long SYMBOL_NAME(sys_readv) /* 145 */
|
|
.long SYMBOL_NAME(sys_writev)
|
|
.long SYMBOL_NAME(sys_getsid)
|
|
.long SYMBOL_NAME(sys_fdatasync)
|
|
.long SYMBOL_NAME(sys_sysctl)
|
|
.long SYMBOL_NAME(sys_mlock) /* 150 */
|
|
.long SYMBOL_NAME(sys_munlock)
|
|
.long SYMBOL_NAME(sys_mlockall)
|
|
.long SYMBOL_NAME(sys_munlockall)
|
|
.long SYMBOL_NAME(sys_sched_setparam)
|
|
.long SYMBOL_NAME(sys_sched_getparam) /* 155 */
|
|
.long SYMBOL_NAME(sys_sched_setscheduler)
|
|
.long SYMBOL_NAME(sys_sched_getscheduler)
|
|
.long SYMBOL_NAME(sys_sched_yield)
|
|
.long SYMBOL_NAME(sys_sched_get_priority_max)
|
|
.long SYMBOL_NAME(sys_sched_get_priority_min) /* 160 */
|
|
.long SYMBOL_NAME(sys_sched_rr_get_interval)
|
|
.long SYMBOL_NAME(sys_nanosleep)
|
|
.long SYMBOL_NAME(sys_mremap)
|
|
.long SYMBOL_NAME(sys_setresuid16)
|
|
.long SYMBOL_NAME(sys_getresuid16) /* 165 */
|
|
.long SYMBOL_NAME(sys_vm86)
|
|
.long SYMBOL_NAME(sys_query_module)
|
|
.long SYMBOL_NAME(sys_poll)
|
|
.long SYMBOL_NAME(sys_nfsservctl)
|
|
.long SYMBOL_NAME(sys_setresgid16) /* 170 */
|
|
.long SYMBOL_NAME(sys_getresgid16)
|
|
.long SYMBOL_NAME(sys_prctl)
|
|
.long SYMBOL_NAME(sys_rt_sigreturn)
|
|
.long SYMBOL_NAME(sys_rt_sigaction)
|
|
.long SYMBOL_NAME(sys_rt_sigprocmask) /* 175 */
|
|
.long SYMBOL_NAME(sys_rt_sigpending)
|
|
.long SYMBOL_NAME(sys_rt_sigtimedwait)
|
|
.long SYMBOL_NAME(sys_rt_sigqueueinfo)
|
|
.long SYMBOL_NAME(sys_rt_sigsuspend)
|
|
.long SYMBOL_NAME(sys_pread) /* 180 */
|
|
.long SYMBOL_NAME(sys_pwrite)
|
|
.long SYMBOL_NAME(sys_chown16)
|
|
.long SYMBOL_NAME(sys_getcwd)
|
|
.long SYMBOL_NAME(sys_capget)
|
|
.long SYMBOL_NAME(sys_capset) /* 185 */
|
|
.long SYMBOL_NAME(sys_sigaltstack)
|
|
.long SYMBOL_NAME(sys_sendfile)
|
|
.long SYMBOL_NAME(sys_ni_syscall) /* streams1 */
|
|
.long SYMBOL_NAME(sys_ni_syscall) /* streams2 */
|
|
.long SYMBOL_NAME(sys_vfork) /* 190 */
|
|
.long SYMBOL_NAME(sys_getrlimit)
|
|
.long SYMBOL_NAME(sys_mmap2)
|
|
.long SYMBOL_NAME(sys_truncate64)
|
|
.long SYMBOL_NAME(sys_ftruncate64)
|
|
.long SYMBOL_NAME(sys_stat64) /* 195 */
|
|
.long SYMBOL_NAME(sys_lstat64)
|
|
.long SYMBOL_NAME(sys_fstat64)
|
|
.long SYMBOL_NAME(sys_lchown)
|
|
.long SYMBOL_NAME(sys_getuid)
|
|
.long SYMBOL_NAME(sys_getgid) /* 200 */
|
|
.long SYMBOL_NAME(sys_geteuid)
|
|
.long SYMBOL_NAME(sys_getegid)
|
|
.long SYMBOL_NAME(sys_setreuid)
|
|
.long SYMBOL_NAME(sys_setregid)
|
|
.long SYMBOL_NAME(sys_getgroups) /* 205 */
|
|
.long SYMBOL_NAME(sys_setgroups)
|
|
.long SYMBOL_NAME(sys_fchown)
|
|
.long SYMBOL_NAME(sys_setresuid)
|
|
.long SYMBOL_NAME(sys_getresuid)
|
|
.long SYMBOL_NAME(sys_setresgid) /* 210 */
|
|
.long SYMBOL_NAME(sys_getresgid)
|
|
.long SYMBOL_NAME(sys_chown)
|
|
.long SYMBOL_NAME(sys_setuid)
|
|
.long SYMBOL_NAME(sys_setgid)
|
|
.long SYMBOL_NAME(sys_setfsuid) /* 215 */
|
|
.long SYMBOL_NAME(sys_setfsgid)
|
|
.long SYMBOL_NAME(sys_pivot_root)
|
|
.long SYMBOL_NAME(sys_mincore)
|
|
.long SYMBOL_NAME(sys_madvise)
|
|
.long SYMBOL_NAME(sys_getdents64) /* 220 */
|
|
.long SYMBOL_NAME(sys_fcntl64)
|
|
.long SYMBOL_NAME(sys_ni_syscall) /* reserved for TUX */
|
|
.long SYMBOL_NAME(sys_ni_syscall) /* Reserved for Security */
|
|
.long SYMBOL_NAME(sys_gettid)
|
|
.long SYMBOL_NAME(sys_readahead) /* 225 */
|
|
|
|
|
|
</PRE>
|
|
<H3>IRQ Event</H3>
|
|
|
|
<P>When an IRQ comes, the task that is running is interrupted in
|
|
order to service the IRQ Handler.
|
|
<P>
|
|
<P>After the IRQ is handled, control returns backs exactly to point
|
|
of interrupt, like nothing happened.
|
|
<P>
|
|
<P>
|
|
<PRE>
|
|
|
|
|
|
Running Task
|
|
|-----------| (3)
|
|
NORMAL | | | [break execution] IRQ Handler
|
|
EXECUTION (1)| | | ------------->|---------|
|
|
| \|/ | | | does |
|
|
IRQ (2)---->| .. |-----> | some |
|
|
| | |<----- | work |
|
|
BACK TO | | | | | ..(4). |
|
|
NORMAL (6)| \|/ | <-------------|_________|
|
|
EXECUTION |___________| [return to code]
|
|
(5)
|
|
USER MODE KERNEL MODE
|
|
|
|
User->Kernel Mode Transition caused by IRQ event
|
|
|
|
</PRE>
|
|
<P>The numbered steps below refer to the sequence of events in the
|
|
diagram above:
|
|
<P>
|
|
<P>
|
|
<OL>
|
|
<LI>Process is executing</LI>
|
|
<LI>IRQ comes while the task is running.</LI>
|
|
<LI>Task is interrupted to call an "Interrupt handler".</LI>
|
|
<LI>The "Interrupt handler" code is executed.</LI>
|
|
<LI>Control returns back to task user mode (as if nothing happened)</LI>
|
|
<LI>Process returns back to normal execution
|
|
</LI>
|
|
</OL>
|
|
<P>Special interest has the Timer IRQ, coming every TIMER ms to
|
|
manage:
|
|
<P>
|
|
<P>
|
|
<OL>
|
|
<LI>Alarms</LI>
|
|
<LI>System and task counters (used by schedule to decide when stop
|
|
a process or for accounting)</LI>
|
|
<LI>Multitasking based on wake up mechanism after TIMESLICE time.
|
|
</LI>
|
|
</OL>
|
|
<H2><A NAME="ss3.4">3.4 Multitasking</A>
|
|
</H2>
|
|
|
|
<H3>Mechanism</H3>
|
|
|
|
<P>The key point of modern OSs is the "Task". The Task is an application
|
|
running in memory sharing all resources (included CPU and Memory)
|
|
with other Tasks.
|
|
<P>
|
|
<P>This "resource sharing" is managed by the "Multitasking Mechanism".
|
|
The Multitasking Mechanism switches from one task to another after
|
|
a "timeslice" time. Users have the "illusion" that they own all resources.
|
|
We can also imagine a single user scenario, where a user can have
|
|
the "illusion" of running many tasks at the same time.
|
|
<P>
|
|
<P>To implement this multitasking, the task uses "the state" variable,
|
|
which can be:
|
|
<P>
|
|
<P>
|
|
<OL>
|
|
<LI>READY, ready for execution</LI>
|
|
<LI>BLOCKED, waiting for a resource
|
|
</LI>
|
|
</OL>
|
|
<P>The task state is managed by its presence in a relative list:
|
|
READY list and BLOCKED list.
|
|
<P>
|
|
<H3>Task Switching</H3>
|
|
|
|
<P>The movement from one task to another is called ''Task Switching''.
|
|
many computers have a hardware instruction which automatically performs
|
|
this operation. Task Switching occurs in the following cases:
|
|
<P>
|
|
<P>
|
|
<OL>
|
|
<LI>After Timeslice ends: we need to schedule a "Ready for execution"
|
|
task and give it access.</LI>
|
|
<LI>When a Task has to wait for a device: we need to schedule a new
|
|
task and switch to it *
|
|
</LI>
|
|
</OL>
|
|
<P>* We schedule another task to prevent "Busy Form Waiting", which
|
|
occurs when we are waiting for a device instead performing other
|
|
work.
|
|
<P>
|
|
<P>Task Switching is managed by the "Schedule" entity.
|
|
<P>
|
|
<P>
|
|
<PRE>
|
|
|
|
Timer | |
|
|
IRQ | | Schedule
|
|
| | | ________________________
|
|
|----->| Task 1 |<------------------>|(1)Chooses a Ready Task |
|
|
| | | |(2)Task Switching |
|
|
| |___________| |________________________|
|
|
| | | /|\
|
|
| | | |
|
|
| | | |
|
|
| | | |
|
|
| | | |
|
|
|----->| Task 2 |<-------------------------------|
|
|
| | | |
|
|
| |___________| |
|
|
. . . . .
|
|
. . . . .
|
|
. . . . .
|
|
| | | |
|
|
| | | |
|
|
------>| Task N |<--------------------------------
|
|
| |
|
|
|___________|
|
|
|
|
Task Switching based on TimeSlice
|
|
|
|
</PRE>
|
|
<P>A typical Timeslice for Linux is about 10 ms.
|
|
<P>
|
|
<P>
|
|
<PRE>
|
|
|
|
|
|
|
|
| |
|
|
| | Resource _____________________________
|
|
| Task 1 |----------->|(1) Enqueue Resource request |
|
|
| | Access |(2) Mark Task as blocked |
|
|
| | |(3) Choose a Ready Task |
|
|
|___________| |(4) Task Switching |
|
|
|_____________________________|
|
|
|
|
|
|
|
|
| | |
|
|
| | |
|
|
| Task 2 |<-------------------------
|
|
| |
|
|
| |
|
|
|___________|
|
|
|
|
Task Switching based on Waiting for a Resource
|
|
|
|
</PRE>
|
|
<H2><A NAME="ss3.5">3.5 Microkernel vs Monolithic OS</A>
|
|
</H2>
|
|
|
|
<H3>Overview</H3>
|
|
|
|
<P>Until now we viewed so called Monolithic OS, but there is also
|
|
another kind of OS: ''Microkernel''.
|
|
<P>
|
|
<P>A Microkernel OS uses Tasks, not only for user mode processes,
|
|
but also as a real kernel manager, like Floppy-Task, HDD-Task, Net-Task
|
|
and so on. Some examples are Amoeba, and Mach.
|
|
<P>
|
|
<H3>PROs and CONTROs of Microkernel OS </H3>
|
|
|
|
<P>PROS:
|
|
<P>
|
|
<P>
|
|
<UL>
|
|
<LI>OS is simpler to maintain because each Task manages a single
|
|
kind of operation. So if you want to modify networking, you modify
|
|
Net-Task (ideally, if it is not needed a structural update).
|
|
</LI>
|
|
</UL>
|
|
<P>CONS:
|
|
<P>
|
|
<P>
|
|
<UL>
|
|
<LI>Performances are worse than Monolithic OS, because you have to
|
|
add 2*TASK_SWITCH times (the first to enter the specific Task, the
|
|
second to go out from it).
|
|
</LI>
|
|
</UL>
|
|
<P>My personal opinion is that, Microkernels are a good didactic
|
|
example (like Minix) but they are not ''optimal'', so not really
|
|
suitable. Linux uses a few Tasks, called "Kernel Threads" to implement
|
|
a little microkernel structure (like kswapd, which is used to retrieve
|
|
memory pages from mass storage). In this case there are no problems
|
|
with perfomance because swapping is a very slow job.
|
|
<P>
|
|
<H2><A NAME="ss3.6">3.6 Networking</A>
|
|
</H2>
|
|
|
|
<H3>ISO OSI levels</H3>
|
|
|
|
<P>Standard ISO-OSI describes a network architecture with the following
|
|
levels:
|
|
<P>
|
|
<P>
|
|
<OL>
|
|
<LI>Physical level (examples: PPP and Ethernet)</LI>
|
|
<LI>Data-link level (examples: PPP and Ethernet)</LI>
|
|
<LI>Network level (examples: IP, and X.25)</LI>
|
|
<LI>Transport level (examples: TCP, UDP)</LI>
|
|
<LI>Session level (SSL)</LI>
|
|
<LI>Presentation level (FTP binary-ascii coding)</LI>
|
|
<LI>Application level (applications like Netscape)
|
|
</LI>
|
|
</OL>
|
|
<P>The first 2 levels listed above are often implemented in hardware.
|
|
Next levels are in software (or firmware for routers).
|
|
<P>
|
|
<P>Many protocols are used by an OS: one of these is TCP/IP (the
|
|
most important living on 3-4 levels).
|
|
<P>
|
|
<H3>What does the kernel?</H3>
|
|
|
|
<P>The kernel doesn't know anything (only addresses) about first
|
|
2 levels of ISO-OSI.
|
|
<P>
|
|
<P>In RX it:
|
|
<P>
|
|
<P>
|
|
<OL>
|
|
<LI>Manages handshake with low levels devices (like ethernet card
|
|
or modem) receiving "frames" from them.</LI>
|
|
<LI>Builds TCP/IP "packets" from "frames" (like Ethernet or PPP ones),
|
|
</LI>
|
|
<LI>Convers ''packets'' in ''sockets'' passing them to the right
|
|
application (using port number) or</LI>
|
|
<LI>Forwards packets to the right queue
|
|
</LI>
|
|
</OL>
|
|
<P>
|
|
<PRE>
|
|
frames packets sockets
|
|
NIC ---------> Kernel ----------> Application
|
|
| packets
|
|
--------------> Forward
|
|
- RX -
|
|
</PRE>
|
|
<P>In TX stage it:
|
|
<P>
|
|
<P>
|
|
<OL>
|
|
<LI>Converts sockets or </LI>
|
|
<LI>Queues datas into TCP/IP ''packets''</LI>
|
|
<LI>Splits ''packets" into "frames" (like Ethernet or PPP ones)</LI>
|
|
<LI>Sends ''frames'' using HW drivers
|
|
</LI>
|
|
</OL>
|
|
<P>
|
|
<PRE>
|
|
sockets packets frames
|
|
Application ---------> Kernel ----------> NIC
|
|
packets /|\
|
|
Forward -------------------
|
|
- TX -
|
|
|
|
|
|
</PRE>
|
|
<H2><A NAME="ss3.7">3.7 Virtual Memory</A>
|
|
</H2>
|
|
|
|
<H3>Segmentation</H3>
|
|
|
|
<P>Segmentation is the first method to solve memory allocation problems:
|
|
it allows you to compile source code without caring where the application
|
|
will be placed in memory. As a matter of fact, this feature helps
|
|
applications developers to develop in a independent fashion from
|
|
the OS e also from the hardware.
|
|
<P>
|
|
<P>
|
|
<PRE>
|
|
|
|
| Stack |
|
|
| | |
|
|
| \|/ |
|
|
| Free |
|
|
| /|\ | Segment <---> Process
|
|
| | |
|
|
| Heap |
|
|
| Data uninitialized |
|
|
| Data initialized |
|
|
| Code |
|
|
|____________________|
|
|
|
|
Segment
|
|
|
|
</PRE>
|
|
<P>We can say that a segment is the logical entity of an application,
|
|
or the image of the application in memory.
|
|
<P>
|
|
<P>When programming, we don't care where our data is put in memory,
|
|
we only care about the offset inside our segment (our application).
|
|
<P>
|
|
<P>We use to assign a Segment to each Process and vice versa. In
|
|
Linux this is not true. Linux uses only 4 segments for either Kernel
|
|
and all Processes.
|
|
<P>
|
|
<H3>Problems of Segmentation</H3>
|
|
|
|
<P>
|
|
<PRE>
|
|
|
|
____________________
|
|
----->| |----->
|
|
| IN | Segment A | OUT
|
|
____________________ | |____________________|
|
|
| |____| | |
|
|
| Segment B | | Segment B |
|
|
| |____ | |
|
|
|____________________| | |____________________|
|
|
| | Segment C |
|
|
| |____________________|
|
|
----->| Segment D |----->
|
|
IN |____________________| OUT
|
|
|
|
Segmentation problem
|
|
|
|
|
|
</PRE>
|
|
<P>In the diagram above, we want to get exit processes A, and D
|
|
and enter process B. As we can see there is enough space for B, but
|
|
we cannot split it in 2 pieces, so we CANNOT load it (memory out).
|
|
<P>
|
|
<P>The reason this problem occurs is because pure segments are continuous
|
|
areas (because they are logical areas) and cannot be split.
|
|
<P>
|
|
<H3>Pagination</H3>
|
|
|
|
<P>
|
|
<PRE>
|
|
|
|
____________________
|
|
| Page 1 |
|
|
|____________________|
|
|
| Page 2 |
|
|
|____________________|
|
|
| .. | Segment <---> Process
|
|
|____________________|
|
|
| Page n |
|
|
|____________________|
|
|
| |
|
|
|____________________|
|
|
| |
|
|
|____________________|
|
|
|
|
Segment
|
|
|
|
</PRE>
|
|
<P>Pagination splits memory in "n" pieces, each one with
|
|
a fixed length.
|
|
<P>
|
|
<P>A process may be loaded in one or more Pages. When memory is
|
|
freed, all pages are freed (see Segmentation Problem, before).
|
|
<P>
|
|
<P>Pagination is also used for another important purpose, "Swapping".
|
|
If a page is not present in physical memory then it generates an
|
|
EXCEPTION, that will make the Kernel search for a new page in storage
|
|
memory. This mechanism allow OS to load more applications than the
|
|
ones allowed by physical memory only.
|
|
<P>
|
|
<H3>Pagination Problem</H3>
|
|
|
|
<P>
|
|
<PRE>
|
|
____________________
|
|
Page X | Process Y |
|
|
|____________________|
|
|
| |
|
|
| WASTE |
|
|
| SPACE |
|
|
|____________________|
|
|
|
|
Pagination Problem
|
|
|
|
</PRE>
|
|
<P>In the diagram above, we can see what is wrong with the pagination
|
|
policy: when a Process Y loads into Page X, ALL memory space of the
|
|
Page is allocated, so the remaining space at the end of Page is wasted.
|
|
<P>
|
|
<H3>Segmentation and Pagination</H3>
|
|
|
|
<P>How can we solve segmentation and pagination problems? Using
|
|
either 2 policies.
|
|
<P>
|
|
<P>
|
|
<PRE>
|
|
|
|
| .. |
|
|
|____________________|
|
|
----->| Page 1 |
|
|
| |____________________|
|
|
| | .. |
|
|
____________________ | |____________________|
|
|
| | |---->| Page 2 |
|
|
| Segment X | ----| |____________________|
|
|
| | | | .. |
|
|
|____________________| | |____________________|
|
|
| | .. |
|
|
| |____________________|
|
|
|---->| Page 3 |
|
|
|____________________|
|
|
| .. |
|
|
|
|
</PRE>
|
|
<P>Process X, identified by Segment X, is split in 3 pieces and
|
|
each of one is loaded in a page.
|
|
<P>
|
|
<P>We do not have:
|
|
<P>
|
|
<P>
|
|
<OL>
|
|
<LI>Segmentation problem: we allocate per Pages, so we also free
|
|
Pages and we manage free space in an optimized way.</LI>
|
|
<LI>Pagination problem: only last page wastes space, but we can decide
|
|
to use very small pages, for example 4096 bytes length (losing at
|
|
maximum 4096*N_Tasks bytes) and manage hierarchical paging (using
|
|
2 or 3 levels of paging)
|
|
</LI>
|
|
</OL>
|
|
<P>
|
|
<PRE>
|
|
|
|
|
|
|
|
| | | |
|
|
| | Offset2 | Value |
|
|
| | /|\| |
|
|
Offset1 | |----- | | |
|
|
/|\ | | | | | |
|
|
| | | | \|/| |
|
|
| | | ------>| |
|
|
\|/ | | | |
|
|
Base Paging Address ---->| | | |
|
|
| ....... | | ....... |
|
|
| | | |
|
|
|
|
Hierarchical Paging
|
|
</PRE>
|
|
<HR>
|
|
<A HREF="KernelAnalysis-HOWTO-4.html">Next</A>
|
|
<A HREF="KernelAnalysis-HOWTO-2.html">Previous</A>
|
|
<A HREF="KernelAnalysis-HOWTO.html#toc3">Contents</A>
|
|
</BODY>
|
|
</HTML>
|