From e607922fe231720832de0fa2cab7cbef8ea21386 Mon Sep 17 00:00:00 2001 From: gferg <> Date: Fri, 28 Mar 2003 20:32:58 +0000 Subject: [PATCH] updated --- LDP/howto/docbook/HOWTO-INDEX/adminSect.sgml | 2 +- LDP/howto/docbook/HOWTO-INDEX/howtoChap.sgml | 2 +- LDP/howto/docbook/HOWTO-INDEX/osSect.sgml | 2 +- LDP/howto/linuxdoc/KernelAnalysis-HOWTO.sgml | 1598 ++++++++++++------ 4 files changed, 1098 insertions(+), 506 deletions(-) diff --git a/LDP/howto/docbook/HOWTO-INDEX/adminSect.sgml b/LDP/howto/docbook/HOWTO-INDEX/adminSect.sgml index d6e980a8..47944d0e 100644 --- a/LDP/howto/docbook/HOWTO-INDEX/adminSect.sgml +++ b/LDP/howto/docbook/HOWTO-INDEX/adminSect.sgml @@ -160,7 +160,7 @@ troubleshooting for ix86-based systems. KernelAnalysis-HOWTO, KernelAnalysis-HOWTO -Updated: July 2002. +Updated: March 2003. Explains some things about the Linux Kernel, such as the most important components, how they work, and so on. diff --git a/LDP/howto/docbook/HOWTO-INDEX/howtoChap.sgml b/LDP/howto/docbook/HOWTO-INDEX/howtoChap.sgml index 543337d2..c60493ac 100644 --- a/LDP/howto/docbook/HOWTO-INDEX/howtoChap.sgml +++ b/LDP/howto/docbook/HOWTO-INDEX/howtoChap.sgml @@ -1394,7 +1394,7 @@ troubleshooting for ix86-based systems. KernelAnalysis-HOWTO, KernelAnalysis-HOWTO -Updated: July 2002. +Updated: March 2003. Explains some things about the Linux Kernel, such as the most important components, how they work, and so on. diff --git a/LDP/howto/docbook/HOWTO-INDEX/osSect.sgml b/LDP/howto/docbook/HOWTO-INDEX/osSect.sgml index 6f15d3bd..0af6e9f4 100644 --- a/LDP/howto/docbook/HOWTO-INDEX/osSect.sgml +++ b/LDP/howto/docbook/HOWTO-INDEX/osSect.sgml @@ -499,7 +499,7 @@ troubleshooting for ix86-based systems. KernelAnalysis-HOWTO, KernelAnalysis-HOWTO -Updated: July 2002. +Updated: March 2003. Explains some things about the Linux Kernel, such as the most important components, how they work, and so on. 
diff --git a/LDP/howto/linuxdoc/KernelAnalysis-HOWTO.sgml b/LDP/howto/linuxdoc/KernelAnalysis-HOWTO.sgml index 3dc19a97..bc5184a1 100644 --- a/LDP/howto/linuxdoc/KernelAnalysis-HOWTO.sgml +++ b/LDP/howto/linuxdoc/KernelAnalysis-HOWTO.sgml @@ -1,80 +1,101 @@ -
+ KernelAnalysis-HOWTO + -Roberto Arcomano +Roberto Arcomano berto@bertolinux.com + -v0.63 - July 31, 2002 +v0.7, March 26, 2003 + -This document tries to explain some things about the Linux Kernel, such - as the most important components, how they work, and so on. This HOWTO should - help prevent the reader from needing to browse all the kernel source files - searching for the"right function," declaration, and definition, and then linking - each to the other. You can find the latest version of this document at If - you have suggestions to help make this document better, please submit your - ideas to me at the following address: +This document tries to explain some things about the Linux Kernel, + such as the most important components, how they work, and so on. + This HOWTO should help prevent the reader from needing to browse + all the kernel source files searching for the"right function," declaration, + and definition, and then linking each to the other. You can find + the latest version of this document at If you have suggestions to + help make this document better, please submit your ideas to me at + the following address: + Introduction Introduction

-This HOWTO tries to define how parts of the Linux Kernel work, what are - the main functions and data structures used, and how the "wheel spins". You can - find the latest version of this document at If you have suggestions to help - make this document better, please submit your ideas to me at the following - address: Code used within this document refers to the Linux Kernel version - 2.4.x, which is the last stable kernel version at time of writing this HOWTO. +This HOWTO tries to define how parts of the Linux Kernel work, + what are the main functions and data structures used, and how the + "wheel spins". You can find the latest version of this document at + If you have suggestions to help make this document better, please + submit your ideas to me at the following address: Code used within + this document refers to the Linux Kernel version 2.4.x, which is + the last stable kernel version at time of writing this HOWTO. +

Copyright

-Copyright (C) 2000,2001,2002 Roberto Arcomano. This document is free; you - can redistribute it and/or modify it under the terms of the GNU General Public - License as published by the Free Software Foundation; either version 2 of the - License, or (at your option) any later version. This document is distributed - in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even - the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. - See the GNU General Public License for more details. You can get a copy of - the GNU GPL +Copyright (C) 2000,2001,2002 Roberto Arcomano. This document + is free; you can redistribute it and/or modify it under the terms + of the GNU General Public License as published by the Free Software + Foundation; either version 2 of the License, or (at your option) + any later version. This document is distributed in the hope that + it will be useful, but WITHOUT ANY WARRANTY; without even the implied + warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. + See the GNU General Public License for more details. You can get + a copy of the GNU GPL +

Translations

-If you want to translate this document you are free to do so. However, - you will need to do the following: +If you want to translate this document you are free to do so. + However, you will need to do the following: +

+

-Check that another version of the document doesn't already exist at your - local LDP +Check that another version of the document doesn't already exist + at your local LDP -Maintain all 'Introduction' sections (including 'Introduction', 'Copyright', - 'Translations' , 'Credits'). +Maintain all 'Introduction' sections (including 'Introduction', + 'Copyright', 'Translations' , 'Credits'). +

-Warning! You don't have to translate TXT or HTML file, you have to modify - LYX file, so that it is possible to convert it all other formats (TXT, HTML, - RIFF, etc.): to do that you can use "LyX" application you download from . +Warning! Do not translate the TXT or HTML files; modify the LYX + file instead, so that it can be converted to all other + formats (TXT, HTML, RIFF, etc.). To do that you can use the "LyX" application, + which you can download from . +

-No need to ask me to translate! You just have to let me know (if you want) - about your translation. +No need to ask me to translate! You just have to let me know + (if you want) about your translation. +

Thank you for your translation! +

Credits

-Thanks to for publishing and uploading my document quickly. +Thanks to for publishing and uploading my document quickly. + +

+

+Thanks to Klaas de Waal for his suggestions. +

Syntax used @@ -82,49 +103,64 @@ Syntax used Function Syntax

When speaking about a function, we write: +

+

"function_name [ file location . extension ]" +

For example: +

+

"schedule [kernel/sched.c]" +

tells us that we talk about +

"schedule" +

function retrievable from file +

[ kernel/sched.c ] +

Note: We also assume /usr/src/linux as the starting directory. +

Indentation

Indentation in source code is 3 blank characters. +

InterCallings Analysis Overview

-We use the"InterCallings Analysis "(ICA) to see (in an indented fashion) - how kernel functions call each other. +We use the "InterCallings Analysis" (ICA) to see (in an indented + fashion) how kernel functions call each other. +

For example, the sleep_on command is described in ICA below: +

+

|sleep_on @@ -139,10 +175,13 @@ For example, the sleep_on command is described in ICA below: sleep_on ICA +

The indented ICA is followed by functions' locations: +

+

@@ -163,116 +202,148 @@ __remove_wait_queue [include/linux/wait.h] list_del [include/linux/list.h] __list_del +

-Note: We don't specify anymore file location, if specified just before. +Note: We don't repeat the file location if it was specified just + before. +

Details

In an ICA a line like looks like the following +

+

function1 -> function2 +

-means that < function1 > is a generic pointer to another function. - In this case < function1 > points to < function2 >. +means that < function1 > is a generic pointer to another + function. In this case < function1 > points to < function2 + >. +

When we write: +

+

function: +

-it means that < function > is not a real function. It is a label - (typically assembler label). +it means that < function > is not a real function. It is + a label (typically assembler label). +

-In many sections we may report a ''C'' code or a ''pseudo-code''. In real - source files, you could use ''assembler'' or ''not structured'' code. This - difference is for learning purposes. +In many sections we may report a ''C'' code or a ''pseudo-code''. + In real source files, you could use ''assembler'' or ''not structured'' + code. This difference is for learning purposes. +

PROs of using ICA

The advantages of using ICA (InterCallings Analysis) are many: +

+

-You get an overview of what happens when you call a kernel function +You get an overview of what happens when you call a kernel function + -Function locations are indicated after the function, so ICA could also - be considered as a little ''function reference'' +Function locations are indicated after the function, so ICA could + also be considered as a little ''function reference'' -InterCallings Analysis (ICA) is useful in sleep/awake mechanisms, where - we can view what we do before sleeping, the proper sleeping action, and what - we'll do after waking up (after schedule). +InterCallings Analysis (ICA) is useful in sleep/awake mechanisms, + where we can view what we do before sleeping, the proper sleeping + action, and what we'll do after waking up (after schedule). +

CONTROs of using ICA +

Some of the disadvantages of using ICA are listed below: +

-As all theoretical models, we simplify reality avoiding many details, such - as real source code and special conditions. +As all theoretical models, we simplify reality avoiding many + details, such as real source code and special conditions. +

+

-Additional diagrams should be added to better represent stack conditions, - data values, and so on. +Additional diagrams should be added to better represent stack + conditions, data values, and so on. +

Fundamentals What is the kernel?

-The kernel is the "core" of any computer system: it is the "software" which - allows users to share computer resources. +The kernel is the "core" of any computer system: it is the "software" + which allows users to share computer resources. +

-The kernel can be thought ofas the main software of the OS (Operating System), - which may also include graphics management. +The kernel can be thought of as the main software of the OS (Operating + System), which may also include graphics management. +

-For example, under Linux (like other Unix-like OSs), the XWindow environment - doesn't belong to the Linux Kernel, because it manages only graphical operations - (it uses user mode I/O to access video card devices). +For example, under Linux (like other Unix-like OSs), the XWindow + environment doesn't belong to the Linux Kernel, because it manages + only graphical operations (it uses user mode I/O to access video + card devices). +

-By contrast, Windows environments (Win9x, WinME, WinNT, Win2K, WinXP, and - so on) are a mix between a graphical environment and kernel. +By contrast, Windows environments (Win9x, WinME, WinNT, Win2K, + WinXP, and so on) are a mix between a graphical environment and kernel. +

What is the difference between User Mode and Kernel Mode? Overview

-Many years ago, when computers were as big as a room, users ran their applications - with much difficulty and, sometimes, their applications crashed the computer. - +Many years ago, when computers were as big as a room, users ran + their applications with much difficulty and, sometimes, their applications + crashed the computer. +

Operative modes

-To avoid having applications that constantly crashed, newer OSs were designed - with 2 different operative modes: +To avoid having applications that constantly crashed, newer OSs + were designed with 2 different operative modes: +

+

-Kernel Mode: the machine operates with critical data structure, direct - hardware (IN/OUT or memory mapped), direct memory, IRQ, DMA, and so on. +Kernel Mode: the machine operates with critical data structure, + direct hardware (IN/OUT or memory mapped), direct memory, IRQ, DMA, + and so on. User Mode: users can run applications. +

@@ -289,63 +360,76 @@ Implementation | _______ _______ | Abstraction | | | | | | \|/ Hardware | +

-Kernel Mode "prevents" User Mode applications from damaging the system or - its features. +Kernel Mode "prevents" User Mode applications from damaging the + system or its features. +

-Modern microprocessors implement in hardware at least 2 different states. - For example under Intel, 4 states determine the PL (Privilege Level). It is - possible to use 0,1,2,3 states, with 0 used in Kernel Mode. +Modern microprocessors implement in hardware at least 2 different + states. For example under Intel, 4 states determine the PL (Privilege + Level). It is possible to use 0,1,2,3 states, with 0 used in Kernel + Mode. +

-Unix OS requires only 2 privilege levels, and we will use such a paradigm - as point of reference. +Unix OS requires only 2 privilege levels, and we will use such + a paradigm as point of reference. +

Switching from User Mode to Kernel Mode When do we switch?

-Once we understand that there are 2 different modes, we have to know when - we switch from one to the other. +Once we understand that there are 2 different modes, we have + to know when we switch from one to the other. +

Typically, there are 2 points of switching: +

+

-When calling a System Call: after calling a System Call, the task voluntary - calls pieces of code living in Kernel Mode +When calling a System Call: after calling a System Call, the + task voluntarily calls pieces of code living in Kernel Mode -When an IRQ (or exception) comes: after the IRQ an IRQ handler (or exception - handler) is called, then control returns back to the task that was interrupted - like nothing was happened. +When an IRQ (or exception) comes: after the IRQ an IRQ handler + (or exception handler) is called, then control returns to the + task that was interrupted as if nothing had happened. +

System Calls

-System calls are like special functions that manage OS routines which live - in Kernel Mode. +System calls are like special functions that manage OS routines + which live in Kernel Mode. +

A system call can be called when we: +

+

access an I/O device or a file (like read or write) -need to access privileged information (like pid, changing scheduling policy - or other information) +need to access privileged information (like pid, changing scheduling + policy or other information) -need to change execution context (like forking or executing some other - application) +need to change execution context (like forking or executing some + other application) -need to execute a particular command (like ''chdir'', ''kill", ''brk'', - or ''signal'') +need to execute a particular command (like ''chdir'', ''kill", + ''brk'', or ''signal'') +

@@ -366,20 +450,27 @@ need to execute a particular command (like ''chdir'', ''kill", ''brk'', Unix System Calls Working +

-System calls are almost the only interface used by User Mode to talk with - low level resources (hardware). The only exception to this statement is when - a process uses ''ioperm'' system call. In this case a device can be accessed - directly by User Mode process (IRQs cannot be used). +System calls are almost the only interface used by User Mode + to talk with low-level resources (hardware). The only exception to + this statement is when a process uses the ''ioperm'' system call. In + this case a device can be accessed directly by a User Mode process + (IRQs cannot be used). +

-NOTE: Not every ''C'' function is a system call, only some of them. +NOTE: Not every ''C'' function is a system call, only some of + them. +
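The note above can be illustrated with a minimal sketch, assuming a Linux system with glibc. The helper names (`pid_via_wrapper`, `pid_via_raw_syscall`, `length_in_user_mode`) are hypothetical and exist only for this example. `getpid()` is a thin wrapper around a real system call, while `strlen()` is an ordinary ''C'' function that never leaves User Mode.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* getpid() is a libc wrapper around a real system call:
 * the kernel is asked for the process id. */
static pid_t pid_via_wrapper(void)
{
    return getpid();
}

/* The same request issued through the generic syscall(2) entry point. */
static pid_t pid_via_raw_syscall(void)
{
    return (pid_t)syscall(SYS_getpid);
}

/* strlen() is a plain library function: it runs entirely in
 * User Mode and does not enter the kernel. */
static size_t length_in_user_mode(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}
```

Both pid helpers should return the same value, because they reach the same kernel routine by two different User Mode paths.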

-Below is a list of System Calls under Linux Kernel 2.4.17, from [ - arch/i386/kernel/entry.S ] +Below is a list of System Calls under Linux Kernel 2.4.17, from + [ arch/i386/kernel/entry.S ] +

+

.long SYMBOL_NAME(sys_ni_syscall) /* 0 - old "setup()" system call*/ @@ -610,17 +701,21 @@ Below is a list of System Calls under Linux Kernel 2.4.17, from [ .long SYMBOL_NAME(sys_readahead) /* 225 */ +

IRQ Event

-When an IRQ comes, the task that is running is interrupted in order to - service the IRQ Handler. +When an IRQ comes, the task that is running is interrupted in + order to service the IRQ Handler. +

-After the IRQ is handled, control returns backs exactly to point of interrupt, - like nothing happened. +After the IRQ is handled, control returns exactly to the point + of interruption, as if nothing had happened. +

+

@@ -640,11 +735,14 @@ EXECUTION |___________| [return to code] User->Kernel Mode Transition caused by IRQ event +

-The numbered steps below refer to the sequence of events in the diagram - above: +The numbered steps below refer to the sequence of events in the + diagram above: +

+

@@ -659,73 +757,92 @@ The "Interrupt handler" code is executed. Control returns back to task user mode (as if nothing happened) Process returns back to normal execution +

-Special interest has the Timer IRQ, coming every TIMER ms to manage: +Of special interest is the Timer IRQ, which arrives every TIMER ms to + manage: +

+

Alarms -System and task counters (used by schedule to decide when stop a process - or for accounting) +System and task counters (used by schedule to decide when to stop + a process or for accounting) Multitasking based on wake up mechanism after TIMESLICE time. +

Multitasking Mechanism

-The key point of modern OSs is the "Task". The Task is an application running - in memory sharing all resources (included CPU and Memory) with other Tasks. +The key point of modern OSs is the "Task". The Task is an application + running in memory, sharing all resources (including CPU and Memory) + with other Tasks. +

-This "resource sharing" is managed by the "Multitasking Mechanism". The Multitasking - Mechanism switches from one task to another after a "timeslice" time. Users have - the "illusion" that they own all resources. We can also imagine a single user - scenario, where a user can have the "illusion" of running many tasks at the same - time. +This "resource sharing" is managed by the "Multitasking Mechanism". + The Multitasking Mechanism switches from one task to another after + a "timeslice" time. Users have the "illusion" that they own all resources. + We can also imagine a single user scenario, where a user can have + the "illusion" of running many tasks at the same time. +

-To implement this multitasking, the task uses "the state" variable, which - can be: +To implement this multitasking, the task uses "the state" variable, + which can be: +

+

READY, ready for execution BLOCKED, waiting for a resource +

-The task state is managed by its presence in a relative list: READY list - and BLOCKED list. +The task state is managed by its presence in a relative list: + READY list and BLOCKED list. +
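As a minimal sketch of the idea above, a task's state can be changed simply by moving it from one list to the other. All names here (`struct task`, `struct task_list`, `task_set_state`) are hypothetical and chosen for illustration; they are not the kernel's own structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the kernel's task_struct. */
enum task_state { TASK_READY, TASK_BLOCKED };

struct task {
    int id;
    enum task_state state;
    struct task *next;   /* link inside the list the task currently sits on */
};

struct task_list {
    struct task *head;
};

/* Push a task on the front of a list. */
static void list_push(struct task_list *list, struct task *t)
{
    t->next = list->head;
    list->head = t;
}

/* Unlink a task from a list; returns 1 if it was found, 0 otherwise. */
static int list_remove(struct task_list *list, struct task *t)
{
    struct task **link = &list->head;

    while (*link != NULL) {
        if (*link == t) {
            *link = t->next;
            t->next = NULL;
            return 1;
        }
        link = &(*link)->next;
    }
    return 0;
}

/* Change state by moving the task between the READY and BLOCKED lists. */
static void task_set_state(struct task *t, enum task_state new_state,
                           struct task_list *ready, struct task_list *blocked)
{
    assert(t != NULL && ready != NULL && blocked != NULL);
    if (t->state == new_state)
        return;
    list_remove(t->state == TASK_READY ? ready : blocked, t);
    t->state = new_state;
    list_push(new_state == TASK_READY ? ready : blocked, t);
}
```

With this layout the scheduler only ever needs to look at the READY list when choosing the next task to run.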

Task Switching

-The movement from one task to another is called ''Task Switching''. many - computers have a hardware instruction which automatically performs this operation. - Task Switching occurs in the following cases: +The movement from one task to another is called ''Task Switching''. + Many computers have a hardware instruction which automatically performs + this operation. Task Switching occurs in the following cases: +

+

-After Timeslice ends: we need to schedule a "Ready for execution" task and - give it access. +After Timeslice ends: we need to schedule a "Ready for execution" + task and give it access. -When a Task has to wait for a device: we need to schedule a new task and - switch to it * +When a Task has to wait for a device: we need to schedule a new + task and switch to it * +

-* We schedule another task to prevent "Busy Form Waiting", which occurs - when we are waiting for a device instead performing other work. +* We schedule another task to prevent "Busy Waiting", which + occurs when we are waiting for a device instead of performing other + work. +

Task Switching is managed by the "Schedule" entity. +

+

@@ -754,10 +871,13 @@ Timer | | Task Switching based on TimeSlice +

A typical Timeslice for Linux is about 10 ms. +

+

@@ -781,56 +901,70 @@ A typical Timeslice for Linux is about 10 ms. Task Switching based on Waiting for a Resource +

Microkernel vs Monolithic OS Overview

-Until now we viewed so called Monolithic OS, but there is also another - kind of OS: ''Microkernel''. +Until now we have looked at the so-called Monolithic OS, but there is also + another kind of OS: ''Microkernel''. +

-A Microkernel OS uses Tasks, not only for user mode processes, but also - as a real kernel manager, like Floppy-Task, HDD-Task, Net-Task and so on. Some - examples are Amoeba, and Mach. +A Microkernel OS uses Tasks, not only for user mode processes, + but also as a real kernel manager, like Floppy-Task, HDD-Task, Net-Task + and so on. Some examples are Amoeba, and Mach. +

PROs and CONTROs of Microkernel OS

PROS: +

+

-OS is simpler to maintain because each Task manages a single kind of operation. - So if you want to modify networking, you modify Net-Task (ideally, if it is - not needed a structural update). +The OS is simpler to maintain because each Task manages a single + kind of operation. So if you want to modify networking, you modify + Net-Task (ideally, if no structural update is needed). +

CONS: +

+

-Performances are worse than Monolithic OS, because you have to add 2*TASK_SWITCH - times (the first to enter the specific Task, the second to go out from it). +Performance is worse than in a Monolithic OS, because you have to + add 2*TASK_SWITCH times (the first to enter the specific Task, the + second to leave it). +

-My personal opinion is that, Microkernels are a good didactic example (like - Minix) but they are not ''optimal'', so not really suitable. Linux uses a few - Tasks, called "Kernel Threads" to implement a little microkernel structure (like - kswapd, which is used to retrieve memory pages from mass storage). In this - case there are no problems with perfomance because swapping is a very slow - job. +My personal opinion is that Microkernels are a good didactic + example (like Minix) but they are not ''optimal'', so not really + suitable. Linux uses a few Tasks, called "Kernel Threads", to implement + a little microkernel structure (like kswapd, which is used to retrieve + memory pages from mass storage). In this case there are no problems + with performance because swapping is a very slow job. +

Networking ISO OSI levels

-Standard ISO-OSI describes a network architecture with the following levels: +Standard ISO-OSI describes a network architecture with the following + levels: +

+

@@ -847,36 +981,44 @@ Session level (SSL) Presentation level (FTP binary-ascii coding) Application level (applications like Netscape) +

-The first 2 levels listed above are often implemented in hardware. Next - levels are in software (or firmware for routers). +The first 2 levels listed above are often implemented in hardware. + Next levels are in software (or firmware for routers). +

-Many protocols are used by an OS: one of these is TCP/IP (the most important - living on 3-4 levels). +Many protocols are used by an OS: one of these is TCP/IP (the + most important living on 3-4 levels). +

What does the kernel?

-The kernel doesn't know anything (only addresses) about first 2 levels - of ISO-OSI. +The kernel doesn't know anything (only addresses) about first + 2 levels of ISO-OSI. +

In RX it: +

+

-Manages handshake with low levels devices (like ethernet card or modem) - receiving "frames" from them. +Manages handshake with low-level devices (like ethernet card + or modem) receiving "frames" from them. -Builds TCP/IP "packets" from "frames" (like Ethernet or PPP ones), +Builds TCP/IP "packets" from "frames" (like Ethernet or PPP ones), + -Convers ''packets'' in ''sockets'' passing them to the right application - (using port number) or +Converts ''packets'' into ''sockets'', passing them to the right + application (using port number) or Forwards packets to the right queue +

@@ -885,10 +1027,13 @@ NIC ---------> Kernel ----------> Application | packets --------------> Forward - RX - +

In TX stage it: +

+

@@ -899,6 +1044,7 @@ Queues datas into TCP/IP ''packets'' Splits ''packets" into "frames" (like Ethernet or PPP ones) Sends ''frames'' using HW drivers +

@@ -909,17 +1055,21 @@ Forward ------------------- - TX - +

Virtual Memory Segmentation

-Segmentation is the first method to solve memory allocation problems: it - allows you to compile source code without caring where the application will - be placed in memory. As a matter of fact, this feature helps applications developers - to develop in a independent fashion from the OS e also from the hardware. +Segmentation is the first method to solve memory allocation problems: + it allows you to compile source code without caring where the application + will be placed in memory. As a matter of fact, this feature helps + application developers to develop in a fashion independent from + the OS and also from the hardware. +

+

@@ -937,21 +1087,27 @@ Segmentation is the first method to solve memory allocation problems: it Segment +

-We can say that a segment is the logical entity of an application, or the - image of the application in memory. +We can say that a segment is the logical entity of an application, + or the image of the application in memory. +

-When programming, we don't care where our data is put in memory, we only - care about the offset inside our segment (our application). +When programming, we don't care where our data is put in memory, + we only care about the offset inside our segment (our application). +

-We use to assign a Segment to each Process and vice versa. In Linux this - is not true. Linux uses only 4 segments for either Kernel and all Processes. +Usually a Segment is assigned to each Process and vice versa. In + Linux this is not true. Linux uses only 4 segments for both the Kernel + and all Processes. +

Problems of Segmentation +

@@ -971,18 +1127,22 @@ Problems of Segmentation Segmentation problem +

-In the diagram above, we want to get exit processes A, and D and enter - process B. As we can see there is enough space for B, but we cannot split it - in 2 pieces, so we CANNOT load it (memory out). +In the diagram above, we want processes A and D to exit + and process B to enter. As we can see there is enough space for B, but + we cannot split it in 2 pieces, so we CANNOT load it (out of memory). +

The reason this problem occurs is because pure segments are continuous areas (because they are logical areas) and cannot be split. +

Pagination +

@@ -1002,23 +1162,29 @@ Pagination Segment +

-Pagination splits memory in "n" pieces, each one with a fixed - length. +Pagination splits memory in "n" pieces, each one with + a fixed length. +

-A process may be loaded in one or more Pages. When memory is freed, all - pages are freed (see Segmentation Problem, before). +A process may be loaded in one or more Pages. When memory is + freed, all pages are freed (see Segmentation Problem, before). +

-Pagination is also used for another important purpose, "Swapping". If a page - is not present in physical memory then it generates an EXCEPTION, that will - make the Kernel search for a new page in storage memory. This mechanism allow - OS to load more applications than the ones allowed by physical memory only. +Pagination is also used for another important purpose, "Swapping". + If a page is not present in physical memory, accessing it generates an + EXCEPTION that makes the Kernel search for the page in storage + memory. This mechanism allows the OS to load more applications than + physical memory alone would allow. +

Pagination Problem +

____________________ @@ -1031,17 +1197,22 @@ Pagination Problem Pagination Problem +

-In the diagram above, we can see what is wrong with the pagination policy: - when a Process Y loads into Page X, ALL memory space of the Page is allocated, - so the remaining space at the end of Page is wasted. +In the diagram above, we can see what is wrong with the pagination + policy: when a Process Y loads into Page X, ALL memory space of the + Page is allocated, so the remaining space at the end of Page is wasted. +
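The wasted space described above can be computed with a short sketch. The function names `pages_needed` and `wasted_bytes` are hypothetical, and the 4096-byte page size used in the usage note is only an example value.

```c
#include <assert.h>
#include <stddef.h>

/* Number of fixed-size pages needed to hold `size` bytes (rounded up). */
static size_t pages_needed(size_t size, size_t page_size)
{
    assert(page_size > 0);
    return (size + page_size - 1) / page_size;
}

/* Bytes left unused at the end of the last allocated page. */
static size_t wasted_bytes(size_t size, size_t page_size)
{
    return pages_needed(size, page_size) * page_size - size;
}
```

For instance, a 10000-byte process with 4096-byte pages needs 3 pages (12288 bytes), so 2288 bytes at the end of the last page are wasted. The waste is always smaller than one page.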

Segmentation and Pagination

-How can we solve segmentation and pagination problems? Using either 2 policies. +How can we solve segmentation and pagination problems? By using + both policies together. +

+

@@ -1061,23 +1232,29 @@ How can we solve segmentation and pagination problems? Using either 2 policies. |____________________| | .. | +

-Process X, identified by Segment X, is split in 3 pieces and each of one - is loaded in a page. +Process X, identified by Segment X, is split in 3 pieces and + each one is loaded in a page. +

We do not have: +

+

-Segmentation problem: we allocate per Pages, so we also free Pages and - we manage free space in an optimized way. +Segmentation problem: we allocate per Pages, so we also free + Pages and we manage free space in an optimized way. -Pagination problem: only last page wastes space, but we can decide to use - very small pages, for example 4096 bytes length (losing at maximum 4096*N_Tasks - bytes) and manage hierarchical paging (using 2 or 3 levels of paging) +Pagination problem: only last page wastes space, but we can decide + to use very small pages, for example 4096 bytes length (losing at + maximum 4096*N_Tasks bytes) and manage hierarchical paging (using + 2 or 3 levels of paging) +

@@ -1097,13 +1274,16 @@ Pagination problem: only last page wastes space, but we can decide to use | | | | Hierarchical Paging +
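As a sketch of 2-level hierarchical paging, the code below splits a 32-bit linear address the way the classic i386 layout does: 10 bits of page directory index, 10 bits of page table index and 12 bits of offset inside a 4096-byte page. The macro, type and function names are hypothetical and are not taken from the kernel sources.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT   12u                          /* 4096-byte pages */
#define EX_TABLE_BITS   10u                          /* 1024 entries per table */
#define EX_INDEX_MASK   ((1u << EX_TABLE_BITS) - 1u)
#define EX_OFFSET_MASK  ((1u << EX_PAGE_SHIFT) - 1u)

struct linear_parts {
    uint32_t dir_index;    /* entry in the first-level table (page directory) */
    uint32_t table_index;  /* entry in the second-level table (page table) */
    uint32_t offset;       /* byte offset inside the page */
};

/* Split a linear address into its two table indexes and page offset. */
static struct linear_parts split_linear(uint32_t addr)
{
    struct linear_parts p;

    p.dir_index = (addr >> (EX_PAGE_SHIFT + EX_TABLE_BITS)) & EX_INDEX_MASK;
    p.table_index = (addr >> EX_PAGE_SHIFT) & EX_INDEX_MASK;
    p.offset = addr & EX_OFFSET_MASK;

    /* Sanity check: the three fields recombine into the original address. */
    assert(((p.dir_index << (EX_PAGE_SHIFT + EX_TABLE_BITS)) |
            (p.table_index << EX_PAGE_SHIFT) | p.offset) == addr);
    return p;
}
```

Only the second-level tables that are actually used need to exist, which is why a hierarchy keeps the table overhead small even with small pages.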

Linux Startup

We start the Linux kernel first from C code executed from ''startup_32:'' asm label: +

+

|startup_32: @@ -1142,6 +1322,7 @@ We start the Linux kernel first from C code executed from ''startup_32:'' |kernel_thread |unlock_kernel |cpu_idle +

@@ -1205,10 +1386,13 @@ kernel_thread [arch/i386/kernel/process.c] unlock_kernel [include/asm/smplock.h] cpu_idle [arch/i386/kernel/process.c] +

The last function ''rest_init'' does the following: +

+

@@ -1216,16 +1400,20 @@ launches the kernel thread ''init'' calls unlock_kernel -makes the kernel run cpu_idle routine, that will be the idle loop executing - when nothing is scheduled +makes the kernel run cpu_idle routine, that will be the idle + loop executing when nothing is scheduled +

-In fact the start_kernel procedure never ends. It will execute cpu_idle - routine endlessly. +In fact the start_kernel procedure never ends. It will execute + cpu_idle routine endlessly. +

Follows ''init'' description, which is the first Kernel Thread: +

+

|init @@ -1242,15 +1430,18 @@ Follows ''init'' description, which is the first Kernel Thread: |free_initmem |unlock_kernel |execve +

Linux Peculiarities Overview

-Linux has some peculiarities that distinguish it from other OSs. These - peculiarities include: +Linux has some peculiarities that distinguish it from other OSs. + These peculiarities include: +

+

@@ -1263,33 +1454,43 @@ Kernel threads Kernel modules ''Proc'' directory +

Flexibility Elements

-Points 4 and 5 give system administrators an enormous flexibility on system - configuration from user mode allowing them to solve also critical kernel bugs - or specific problems without have to reboot the machine. For example, if you - needed to change something on a big server and you didn't want to make a reboot, - you could prepare the kernel to talk with a module, that you'll write. +Points 4 and 5 give system administrators enormous flexibility + in system configuration from user mode, allowing them to solve even + critical kernel bugs or specific problems without having to reboot + the machine. For example, if you needed to change something on a + big server and you didn't want to reboot, you could prepare + the kernel to talk with a module that you'll write. +

Pagination only

-Linux doesn't use segmentation to distinguish Tasks from each other; it - uses pagination. (Only 2 segments are used for all Tasks, CODE and DATA/STACK) - +Linux doesn't use segmentation to distinguish Tasks from each + other; it uses pagination. (Only 2 segments are used for all Tasks, + CODE and DATA/STACK) +

-We can also say that an interTask page fault never occurs, because each - Task uses a set of Page Tables that are different for each Task. These tables - cannot point to the same physical addresses. +We can also say that an interTask page fault never occurs, because + each Task uses a set of Page Tables that are different for each Task. + There are some cases where different Tasks point to the same Page Tables, + as with shared libraries: this is needed to reduce memory usage; remember + that shared libraries are CODE only, because all data is stored in + the actual Task's stack. +

Linux segments

Under the Linux kernel only 4 segments exist: +

+

@@ -1300,13 +1501,17 @@ Kernel Data / Stack [0x18] User Code [0x23] User Data / Stack [0x2b] +

[syntax is ''Purpose [Segment]''] +

Under Intel architecture, the segment registers used are: +

+

@@ -1316,85 +1521,106 @@ DS for Data Segment SS for Stack Segment -ES for Alternative Segment (for example used to make a memory copy between - 2 different segments) +ES for Alternative Segment (for example used to make a memory + copy between 2 different segments) +

So, every Task uses 0x23 for code and 0x2b for data/stack. +
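The selector values quoted above can be decoded by hand. The following is a minimal user-space sketch, assuming the standard x86 selector layout (descriptor index in the upper bits, table indicator in bit 2, requested privilege level in the two lowest bits); the helper names are hypothetical and are not taken from the kernel sources.

```c
#include <stdint.h>

/* Index of the descriptor inside the descriptor table (bits 15..3). */
unsigned selector_index(uint16_t sel) { return sel >> 3; }

/* Table indicator (bit 2): 0 = GDT, 1 = LDT. */
unsigned selector_table(uint16_t sel) { return (sel >> 2) & 1u; }

/* Requested privilege level (bits 1..0): 0 = kernel, 3 = user. */
unsigned selector_rpl(uint16_t sel) { return sel & 3u; }
```

Decoded this way, 0x23 and 0x2b are GDT entries 4 and 5 requested at privilege level 3 (user), while 0x18 is GDT entry 3 at privilege level 0 (kernel).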

Linux pagination

Under Linux 3 levels of pages are used, depending on the architecture. - Under Intel only 2 levels are supported. Linux also supports Copy on Write - mechanisms (please see Cap.10 for more information). + Under Intel only 2 levels are supported. Linux also supports Copy + on Write mechanisms (please see Cap.10 for more information). +

Why don't interTasks address conflicts exist?

-The answer is very very simple: interTask address conflicts cannot exist - because they are impossible. Linear -> physical mapping is done by "Pagination", - so it just needs to assign physical pages in an univocal fashion. +The answer is very simple: interTask address conflicts + cannot exist because they are impossible. Linear -> physical + mapping is done by "Pagination", so it just needs to assign physical + pages in a unique fashion. +

Do we need to defragment memory?

-No. Page assigning is a dynamic process. We need a page only when a Task - asks for it, so we choose it from free memory paging in an ordered fashion. - When we want to release the page, we only have to add it to the free pages - list. +No. Page assignment is a dynamic process. We need a page only + when a Task asks for it, so we choose it from the free memory pages + in an ordered fashion. When we want to release the page, we only + have to add it to the free pages list. +

What about Kernel Pages?

-Kernel pages have a problem: they can be allocated in a dynamic fashion - but we cannot have a guarantee that they are in contiguous area allocation, - because linear kernel space is equivalent to physical kernel space. +Kernel pages have a problem: they can be allocated in a dynamic + fashion but we cannot have a guarantee that they are in contiguous + area allocation, because linear kernel space is equivalent to physical + kernel space. +

-For Code Segment there is no problem. Boot code is allocated at boot time - (so we have a fixed amount of memory to allocate), and on modules we only have - to allocate a memory area which could contain module code. +For Code Segment there is no problem. Boot code is allocated + at boot time (so we have a fixed amount of memory to allocate), and + on modules we only have to allocate a memory area which could contain + module code. +

-The real problem is the stack segment because each Task uses some kernel - stack pages. Stack segments must be contiguous (according to stack definition), - so we have to establish a maximum limit for each Task's stack dimension. If - we exceed this limit bad things happen. We overwrite kernel mode process data - structures. +The real problem is the stack segment because each Task uses + some kernel stack pages. Stack segments must be contiguous (according + to stack definition), so we have to establish a maximum limit for + each Task's stack dimension. If we exceed this limit bad things happen. + We overwrite kernel mode process data structures. +

-The structure of the Kernel helps us, because kernel functions are never: +The structure of the Kernel helps us, because kernel functions + are never: +

+

recursive intercalling more than N times. +

-Once we know N, and we know the average of static variables for all kernel - functions, we can estimate a stack limit. +Once we know N, and we know the average size of the local variables of + all kernel functions, we can estimate a stack limit. +

-If you want to try the problem out, you can create a module with a function - inside calling itself many times. After a fixed number of times, the kernel - module will hang because of a page fault exception handler (typically write - to a read-only page). +If you want to try the problem out, you can create a module with + a function inside calling itself many times. After a fixed number + of times, the kernel module will hang because of a page fault exception + handler (typically write to a read-only page). +

Softirq

-When an IRQ comes, task switching is deferred until later to get better - performance. Some Task jobs (that could have to be done just after the IRQ - and that could take much CPU in interrupt time, like building up a TCP/IP packet) - are queued and will be done at scheduling time (once a time-slice will end). +When an IRQ comes, task switching is deferred until later to + get better performance. Some Task jobs (that could have to be done + just after the IRQ and that could take much CPU in interrupt time, + like building up a TCP/IP packet) are queued and will be done at + scheduling time (once a time-slice ends). +

-In recent kernels (2.4.x) the softirq mechanisms are given to a kernel_thread: - ''ksoftirqd_CPUn''. n stands for the number of CPU executing kernel_thread - (in a monoprocessor system ''ksoftirqd_CPU0'' uses PID 3). +In recent kernels (2.4.x) the softirq mechanisms are given to + a kernel_thread: ''ksoftirqd_CPUn''. n stands for the number of CPU + executing kernel_thread (in a monoprocessor system ''ksoftirqd_CPU0'' + uses PID 3). +

Preparing Softirq @@ -1403,13 +1629,16 @@ Enabling Softirq

''cpu_raise_softirq'' is a routine that will wake_up ''ksoftirqd_CPU0'' kernel thread, to let it manage the enqueued job. +

+

|cpu_raise_softirq |__cpu_raise_softirq |wakeup_softirqd |wake_up_process +

@@ -1421,27 +1650,34 @@ __cpu_raise_softirq [include/linux/interrupt.h] wakeup_softirqd [kernel/softirq.c] wake_up_process [kernel/sched.c] +

-''__cpu_raise_softirq'' routine will set right bit in the vector describing - softirq pending. +''__cpu_raise_softirq'' routine will set right bit in the vector + describing softirq pending. +

''wakeup_softirqd'' uses ''wake_up_process'' to wake up the ''ksoftirqd_CPU0'' kernel thread. +
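The "set a bit, then wake the thread" pattern described above can be sketched in user space. This is only an illustration: the variables and functions are hypothetical stand-ins, the softirq numbering is illustrative, and the real kernel keeps the pending vector per CPU and wakes ''ksoftirqd_CPUn'' through the scheduler.

```c
#include <stdint.h>

/* Illustrative softirq numbers (assumption: one bit per softirq type). */
enum { HI_SOFTIRQ = 0, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ, TASKLET_SOFTIRQ };

uint32_t softirq_pending_sketch;  /* stand-in for the pending vector    */
int ksoftirqd_awake_sketch;       /* stand-in for "thread has been woken" */

/* Producer side: mark softirq nr as pending and wake the worker. */
void raise_softirq_sketch(unsigned nr)
{
    softirq_pending_sketch |= (1u << nr);
    ksoftirqd_awake_sketch = 1;
}

/* Consumer side: handle every pending bit, then go back to sleep.
 * Returns how many softirqs were handled. */
int run_pending_sketch(void)
{
    int handled = 0;
    for (unsigned nr = 0; nr < 32; nr++) {
        if (softirq_pending_sketch & (1u << nr)) {
            softirq_pending_sketch &= ~(1u << nr);  /* clear before handling */
            handled++;
        }
    }
    ksoftirqd_awake_sketch = 0;
    return handled;
}
```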

Executing Softirq

TODO: describing data structures involved in softirq mechanism. +

-When kernel thread ''ksoftirqd_CPU0'' has been woken up, it will execute - queued jobs +When kernel thread ''ksoftirqd_CPU0'' has been woken up, it will + execute queued jobs +

The code of ''ksoftirqd_CPU0'' is (main endless loop): +

+

for (;;) { @@ -1455,31 +1691,34 @@ for (;;) { } __set_current_state(TASK_INTERRUPTIBLE) } +

ksoftirqd [kernel/softirq.c] - -

-

- + +

Kernel Threads

-Even though Linux is a monolithic OS, a few ''kernel threads'' exist to - do housekeeping work. +Even though Linux is a monolithic OS, a few ''kernel threads'' + exist to do housekeeping work. +

-These Tasks don't utilize USER memory; they share KERNEL memory. They also - operate at the highest privilege (RING 0 on a i386 architecture) like any other - kernel mode piece of code. +These Tasks don't utilize USER memory; they share KERNEL memory. + They also operate at the highest privilege (RING 0 on a i386 architecture) + like any other kernel mode piece of code. +

Kernel threads are created by ''kernel_thread [arch/i386/kernel/process]'' - function, which calls ''clone'' [arch/i386/kernel/process.c] system - call from assembler (which is a ''fork'' like system call): + function, which calls ''clone'' [arch/i386/kernel/process.c] + system call from assembler (which is a ''fork'' like system call): +

+

int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags) @@ -1507,15 +1746,21 @@ int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags) : "memory"); return retval; } +

-Once called, we have a new Task (usually with very low PID number, like - 2,3, etc.) waiting for a very slow resource, like swap or usb event. A very - slow resource is used because we would have a task switching overhead otherwise. +Once called, we have a new Task (usually with very low PID number, + like 2,3, etc.) waiting for a very slow resource, like swap or usb + event. A very slow resource is used because we would have a task + switching overhead otherwise. +

-Below is a list of most common kernel threads (from ''ps x'' command): +Below is a list of most common kernel threads (from ''ps x'' + command): +

+

PID COMMAND @@ -1528,104 +1773,133 @@ PID COMMAND 7 kacpid 67 khubd +

-'init' kernel thread is the first process created, at boot time. It will - call all other User Mode Tasks (from file /etc/inittab) like console daemons, - tty daemons and network daemons (''rc'' scripts). +'init' kernel thread is the first process created, at boot time. + It will call all other User Mode Tasks (from file /etc/inittab) like + console daemons, tty daemons and network daemons (''rc'' scripts). +

Example of Kernel Threads: kswapd [mm/vmscan.c].

''kswapd'' is created by ''clone() [arch/i386/kernel/process.c]'' +

Initialisation routines: +

+

|do_initcalls |kswapd_init |kernel_thread |syscall fork (in assembler) +

do_initcalls [init/main.c] +

kswapd_init [mm/vmscan.c] +

kernel_thread [arch/i386/kernel/process.c] +

Kernel Modules Overview

-Linux Kernel modules are pieces of code (examples: fs, net, and hw driver) - running in kernel mode that you can add at runtime. +Linux Kernel modules are pieces of code (examples: fs, net, and + hw driver) running in kernel mode that you can add at runtime. +

-The Linux core cannot be modularized: scheduling and interrupt management - or core network, and so on. +The Linux core cannot be modularized: scheduling and interrupt + management or core network, and so on. +

-Under "/lib/modules/KERNEL_VERSION/" you can find all the modules installed - on your system. +Under "/lib/modules/KERNEL_VERSION/" you can find all the modules + installed on your system. +

Module loading and unloading

To load a module, type the following: +

+

insmod MODULE_NAME parameters example: insmod ne io=0x300 irq=9 +

-NOTE: You can use modprobe in place of insmod if you want the kernel automatically - search some parameter (for example when using PCI driver, or if you have specified - parameter under /etc/conf.modules file). +NOTE: You can use modprobe in place of insmod if you want the + kernel to search automatically for some parameters (for example when using + a PCI driver, or if you have specified parameters in the /etc/conf.modules + file). +

To unload a module, type the following: +

+

rmmod MODULE_NAME +

Module definition

A module always contains: +

+

-"init_module" function, executed at insmod (or modprobe) command +"init_module" function, executed at insmod (or modprobe) command + "cleanup_module" function, executed at rmmod command +

-If these functions are not in the module, you need to add 2 macros to specify - what functions will act as init and exit module: +If these functions are not in the module, you need to add 2 macros + to specify what functions will act as init and exit module: +

+

module_init(FUNCTION_NAME) module_exit(FUNCTION_NAME) +

-NOTE: a module can "see" a kernel variable only if it has been exported (with - macro EXPORT_SYMBOL). +NOTE: a module can "see" a kernel variable only if it has been + exported (with macro EXPORT_SYMBOL). +

A useful trick for adding flexibility to your kernel +

// kernel sources side @@ -1651,36 +1925,44 @@ int init_module() { int cleanup_module() { foo_function_pointer = NULL; } +

-This simple trick allows you to have very high flexibility in your Kernel, - because only when you load the module you'll make "my_function" routine execute. - This routine will do everything you want to do: for example ''rshaper'' module, - which controls bandwidth input traffic from the network, works in this kind - of matter. +This simple trick gives you very high flexibility in + your Kernel, because the "my_function" routine executes only once you load the module. + This routine will do everything you want to do: + for example the ''rshaper'' module, which controls the bandwidth of input traffic + from the network, works in this manner. +
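The hook-pointer trick can be tried out in plain user-space C. The sketch below keeps the names ''foo_function_pointer'' and ''my_function'' from the example above, but the function signature, the ''kernel_path'' caller and the ''_sketch'' suffixes are assumptions made for illustration only.

```c
#include <stddef.h>

/* "Kernel side": a hook that stays NULL until a module installs a handler. */
int (*foo_function_pointer)(int) = NULL;

/* Kernel code path: call the hook only if a module has registered it. */
int kernel_path(int value)
{
    if (foo_function_pointer != NULL)
        return foo_function_pointer(value);
    return value;               /* default behaviour without the module */
}

/* "Module side": the routine we want the kernel to run. */
int my_function(int value) { return value * 2; }

/* Stand-ins for init_module / cleanup_module. */
int init_module_sketch(void)     { foo_function_pointer = my_function; return 0; }
void cleanup_module_sketch(void) { foo_function_pointer = NULL; }
```

Before ''init_module_sketch()'' runs, ''kernel_path()'' returns its argument unchanged; after it, the module routine is used; after ''cleanup_module_sketch()'' the default behaviour is back.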

-Notice that the whole module mechanism is possible thanks to some global - variables exported to modules, such as head list (allowing you to extend the - list as much as you want). Typical examples are fs, generic devices (char, - block, net, telephony). You have to prepare the kernel to accept your new module; - in some cases you have to create an infrastructure (like telephony one, that - was recently created) to be as standard as possible. +Notice that the whole module mechanism is possible thanks to + some global variables exported to modules, such as head list (allowing + you to extend the list as much as you want). Typical examples are + fs, generic devices (char, block, net, telephony). You have to prepare + the kernel to accept your new module; in some cases you have to create + an infrastructure (like telephony one, that was recently created) + to be as standard as possible. +

Proc directory

-Proc fs is located in the /proc directory, which is a special directory - allowing you to talk directly with kernel. +Proc fs is located in the /proc directory, which is a special + directory allowing you to talk directly with kernel. +

Linux uses ''proc'' directory to support direct kernel communications: - this is necessary in many cases, for example when you want see main processes - data structures or enable ''proxy-arp'' feature on one interface and not in - others, you want to change max number of threads, or if you want to debug some - bus state, like ISA or PCI, to know what cards are installed and what I/O addresses - and IRQs are assigned to them. + this is necessary in many cases, for example when you want to see the main + process data structures, enable the ''proxy-arp'' feature on one + interface and not on others, change the max number of threads, + or debug some bus state, like ISA or PCI, to know + what cards are installed and what I/O addresses and IRQs are assigned + to them. +

+

|-- bus @@ -2139,18 +2421,22 @@ Linux uses ''proc'' directory to support direct kernel communications: |-- uptime `-- version +

-In the directory there are also all the tasks using PID as file names (you - have access to all Task information, like path of binary file, memory used, - and so on). +In the directory there are also all the tasks using PID as file + names (you have access to all Task information, like path of binary + file, memory used, and so on). +

-The interesting point is that you cannot only see kernel values (for example, - see info about any task or about network options enabled of your TCP/IP stack) - but you are also able to modify some of it, typically that ones under /proc/sys - directory: +The interesting point is that you can not only see kernel values + (for example, info about any task or about the network options enabled + in your TCP/IP stack) but also modify some of them, + typically the ones under the /proc/sys directory: +

+

/proc/sys/ @@ -2162,12 +2448,16 @@ The interesting point is that you cannot only see kernel values (for example, net vm kernel +

/proc/sys/kernel

-Below are very important and well-know kernel values, ready to be modified: +Below are very important and well-known kernel values, ready to + be modified: +

+

overflowgid @@ -2194,13 +2484,17 @@ hostname // host name of your Linux box version // date info about kernel compilation osrelease // kernel version (i.e. 2.4.5) ostype // Linux! +

/proc/sys/net

-This can be considered the most useful proc subdirectory. It allows you - to change very important settings for your network kernel configuration. +This can be considered the most useful proc subdirectory. It + allows you to change very important settings for your network kernel + configuration. +

+

core @@ -2209,15 +2503,19 @@ ipv6 unix ethernet 802 +

/proc/sys/net/core

-Listed below are general net settings, like "netdev_max_backlog" (typically - 300), the length of all your network packets. This value can limit your network - bandwidth when receiving packets, Linux has to wait up to scheduling time to - flush buffers (due to bottom half mechanism), about 1000/HZ ms +Listed below are general net settings, like "netdev_max_backlog" + (typically 300), the maximum length of the queue of received network packets. This value + can limit your network bandwidth: when receiving packets, Linux has + to wait up to scheduling time to flush buffers (due to the bottom half + mechanism), about 1000/HZ ms. +

+

300 * 100 = 30 000 @@ -2225,58 +2523,75 @@ packets HZ(Timeslice freq) packets/s 30 000 * 1000 = 30 M packets average (Bytes/packet) throughput Bytes/s +

If you want to get higher throughput, you need to increase netdev_max_backlog, by typing: +

+

echo 4000 > /proc/sys/net/core/netdev_max_backlog +

-Note: Warning for some HZ values: under some architecture (like alpha or - arm-tbox) it is 1000, so you can have 300 MBytes/s of average throughput. +Note: beware of the HZ value: under some architectures (like + alpha or arm-tbox) it is 1000, so you can have 300 MBytes/s of average + throughput. +
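The throughput estimate above is simple arithmetic: at most one backlog's worth of packets is flushed per timeslice, so packets/s = backlog * HZ and Bytes/s = packets/s * average packet size. A small sketch of that calculation follows; the function name is hypothetical and the 1000 Bytes/packet average is the figure used in the table above.

```c
/* Upper bound on receive throughput in Bytes/s, assuming the backlog
 * queue is drained once per timeslice (HZ times per second). */
unsigned long long max_rx_bytes_per_sec(unsigned backlog,
                                        unsigned hz,
                                        unsigned avg_packet_bytes)
{
    unsigned long long packets_per_sec = (unsigned long long)backlog * hz;
    return packets_per_sec * avg_packet_bytes;
}
```

With backlog = 300, HZ = 100 and 1000 Bytes/packet this gives 30 000 000 Bytes/s; with HZ = 1000 it gives 300 000 000 Bytes/s, matching the figures in the text.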

/proc/sys/net/ipv4

-"ip_forward", enables or disables ip forwarding in your Linux box. This is - a generic setting for all devices, you can specify each device you choose. +"ip_forward" enables or disables IP forwarding in your Linux box. + This is a generic setting for all devices; you can also set it for each + device you choose. +

/proc/sys/net/ipv4/conf/interface

-I think this is the most useful /proc entry, because it allows you to change - some net settings to support wireless networks (see for more information). +I think this is the most useful /proc entry, because it allows + you to change some net settings to support wireless networks (see + for more information). +

Here are some examples of when you could use this setting: +

+

"forwarding", to enable ip forwarding for your interface -"proxy_arp", to enable proxy arp feature. For more see Proxy arp HOWTO under - and for proxy arp use in Wireless networks. +"proxy_arp", to enable proxy arp feature. For more see Proxy arp + HOWTO under and for proxy arp use in Wireless networks. -"send_redirects" to avoid interface to send ICMP_REDIRECT (as before, see - for more). +"send_redirects" to avoid interface to send ICMP_REDIRECT (as before, + see for more). +

Linux Multitasking Overview

-This section will analyze data structures--the mechanism used to manage - multitasking environment under Linux. +This section will analyze data structures--the mechanism used + to manage multitasking environment under Linux. +

Task States

-A Linux Task can be one of the following states (according to [include/linux.h]): +A Linux Task can be in one of the following states (according to + [include/linux/sched.h]): +

+

@@ -2284,15 +2599,17 @@ TASK_RUNNING, it means that it is in the "Ready List" TASK_INTERRUPTIBLE, task waiting for a signal or a resource (sleeping) -TASK_UNINTERRUPTIBLE, task waiting for a resource (sleeping), it is in - same "Wait Queue" +TASK_UNINTERRUPTIBLE, task waiting for a resource (sleeping), + it is in same "Wait Queue" TASK_ZOMBIE, task child without father TASK_STOPPED, task being debugged +

Graphical Interaction +

______________ CPU Available ______________ @@ -2311,16 +2628,20 @@ Waiting for | | Resource |______________________| Main Multitasking Flow +

Timeslice PIT 8253 Programming

-Each 10 ms (depending on HZ value) an IRQ0 comes, which helps us in a multitasking - environment. This signal comes from PIC 8259 (in arch 386+) which is connected - to PIT 8253 with a clock of 1.19318 MHz. +Every 10 ms (depending on the HZ value) an IRQ0 comes, which helps + us in a multitasking environment. This signal comes from PIC 8259 + (in arch 386+) which is connected to PIT 8253 with a clock of 1.19318 + MHz. +

+

_____ ______ ______ @@ -2344,25 +2665,32 @@ outb_p(0x34,0x43); /* binary, mode 2, LSB/MSB, ch 0 */ outb_p(LATCH & 0xff , 0x40); /* LSB */ outb(LATCH >> 8 , 0x40); /* MSB */ +

-So we program 8253 (PIT, Programmable Interval Timer) with LATCH = (1193180/HZ) - = 11931.8 when HZ=100 (default). LATCH indicates the frequency divisor factor. +So we program 8253 (PIT, Programmable Interval Timer) with LATCH + = (1193180/HZ) = 11931.8 when HZ=100 (default). LATCH indicates the + frequency divisor factor. +

-LATCH = 11931.8 gives to 8253 (in output) a frequency of 1193180 / 11931.8 - = 100 Hz, so period = 10ms +LATCH = 11931.8 gives to 8253 (in output) a frequency of 1193180 + / 11931.8 = 100 Hz, so period = 10ms +

So Timeslice = 1/HZ. +
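The divisor arithmetic can be checked with a few lines of C. This is a sketch under one assumption: since the 8253 counter only holds integers, the divisor is rounded to the nearest integer here (11932 rather than 11931.8). The function names are hypothetical; the two byte helpers correspond to the LSB and MSB values written to port 0x40 in the code above.

```c
#define PIT_CLOCK_HZ 1193180UL   /* 8253 input clock, 1.19318 MHz */

/* Frequency divisor for a given HZ, rounded to the nearest integer
 * (rounding is an assumption of this sketch). */
unsigned long pit_latch(unsigned long hz)
{
    return (PIT_CLOCK_HZ + hz / 2) / hz;
}

/* The divisor is sent to the PIT as two bytes, low byte first. */
unsigned char latch_lsb(unsigned long latch) { return latch & 0xff; }
unsigned char latch_msb(unsigned long latch) { return (latch >> 8) & 0xff; }

/* Timeslice = 1/HZ, expressed here in microseconds. */
unsigned long timeslice_us(unsigned long hz) { return 1000000UL / hz; }
```

For HZ = 100 the divisor is 11932 (0x2E9C), so the bytes written are 0x9C and then 0x2E, and the timeslice is 10 000 microseconds.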

-With each Timeslice we temporarily interrupt current process execution - (without task switching), and we do some housekeeping work, after which we'll - return back to our previous process. +With each Timeslice we temporarily interrupt current process + execution (without task switching), and we do some housekeeping work, + after which we'll return back to our previous process. +

Linux Timer IRQ ICA +

Linux Timer IRQ @@ -2389,10 +2717,13 @@ IRQ 0 [Timer] |} |RESTORE_ALL +

Functions can be found under: +

+

@@ -2407,51 +2738,61 @@ do_timer, update_process_times [kernel/timer.c] do_softirq [kernel/softirq.c] RESTORE_ALL, while loop [arch/i386/kernel/entry.S] +

Notes: +

+

-Function "IRQ0x00_interrupt" (like others IRQ0xXY_interrupt) is directly - pointed by IDT (Interrupt Descriptor Table, similar to Real Mode Interrupt - Vector Table, see Cap 11 for more), so EVERY interrupt coming to the processor - is managed by "IRQ0x#NR_interrupt" routine, where #NR is the interrupt - number. We refer to it as "wrapper irq handler". +Function "IRQ0x00_interrupt" (like others IRQ0xXY_interrupt) is + directly pointed by IDT (Interrupt Descriptor Table, similar to Real + Mode Interrupt Vector Table, see Cap 11 for more), so EVERY interrupt + coming to the processor is managed by "IRQ0x#NR_interrupt" routine, + where #NR is the interrupt number. We refer to it as "wrapper + irq handler". wrapper routines are executed, like "do_IRQ","handle_IRQ_event" [arch/i386/kernel/irq.c]. -After this, control is passed to official IRQ routine (pointed by "handler()"), - previously registered with "request_irq" [arch/i386/kernel/irq.c], +After this, control is passed to official IRQ routine (pointed + by "handler()"), previously registered with "request_irq" [arch/i386/kernel/irq.c], in this case "timer_interrupt" [arch/i386/kernel/time.c]. -"timer_interrupt" [arch/i386/kernel/time.c] routine is executed - and, when it ends, +"timer_interrupt" [arch/i386/kernel/time.c] routine is + executed and, when it ends, control backs to some assembler routines [arch/i386/kernel/entry.S]. +

Description: +

-To manage Multitasking, Linux (like every other Unix) uses a ''counter'' - variable to keep track of how much CPU was used by the task. So, on each IRQ - 0, the counter is decremented (point 4) and, when it reaches 0, we need to - switch task to manage timesharing (point 4 "need_resched" variable is set to - 1, then, in point 5 assembler routines control "need_resched" and call, if needed, - "schedule" [kernel/sched.c]). +To manage Multitasking, Linux (like every other Unix) uses a + ''counter'' variable to keep track of how much CPU time is left to the + task. So, on each IRQ 0, the counter is decremented (point 4) and, + when it reaches 0, we need to switch task to manage timesharing (point + 4 "need_resched" variable is set to 1, then, in point 5 assembler routines + check "need_resched" and call, if needed, "schedule" [kernel/sched.c]). +
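The counter mechanism can be illustrated with a tiny user-space model. The structure and function below are hypothetical simplifications: only the two fields named in the text are kept, and the real kernel does this work inside the timer interrupt path.

```c
/* Simplified stand-in for the per-task scheduling fields. */
struct task_sketch {
    int counter;       /* timer ticks left in the current timeslice */
    int need_resched;  /* set to 1 when the scheduler must be called */
};

/* What happens to the running task on every timer tick (IRQ 0). */
void timer_tick_sketch(struct task_sketch *t)
{
    if (t->counter > 0)
        t->counter--;
    if (t->counter == 0)
        t->need_resched = 1;   /* checked later, on return from the IRQ */
}
```

A task that starts with ''counter'' = 3 keeps running for two ticks; on the third tick ''need_resched'' becomes 1 and the return path would call the scheduler.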

Scheduler

-The scheduler is the piece of code that chooses what Task has to be executed - at a given time. +The scheduler is the piece of code that chooses what Task has + to be executed at a given time. +

-Any time you need to change running task, select a candidate. Below is - the ''schedule [kernel/sched.c]'' function. +Any time you need to change running task, select a candidate. + Below is the ''schedule [kernel/sched.c]'' function. +

+

|schedule @@ -2471,30 +2812,38 @@ Any time you need to change running task, select a candidate. Below is |ret *** ret from call using future_EIP in place of call address new_task +

Bottom Half, Task Queues, and Tasklets Overview

-In classic Unix, when an IRQ comes (from a device), Unix makes "task switching" - to interrogate the task that requested the device. +In classic Unix, when an IRQ comes (from a device), Unix makes + "task switching" to interrogate the task that requested the device. +

-To improve performance, Linux can postpone the non-urgent work until later, - to better manage high speed event. +To improve performance, Linux can postpone the non-urgent work + until later, to better manage high speed events. +

-This feature is managed since kernel 1.x by the "bottom half" (BH). The irq - handler "marks" a bottom half, to be executed later, in scheduling time. +This feature is managed since kernel 1.x by the "bottom half" (BH). + The irq handler "marks" a bottom half, to be executed later, in scheduling + time. +

-In the latest kernels there is a "task queue"that is more dynamic than BH - and there is also a "tasklet" to manage multiprocessor environments. +In the latest kernels there is a "task queue" that is more dynamic + than BH and there is also a "tasklet" to manage multiprocessor environments. +

BH schema is: +

+

@@ -2503,9 +2852,11 @@ Declaration Mark Execution +

Declaration +

#define DECLARE_TASK_QUEUE(q) LIST_HEAD(q) @@ -2517,17 +2868,21 @@ struct list_head { #define LIST_HEAD_INIT(name) { &(name), &(name) } ''DECLARE_TASK_QUEUE'' [include/linux/tqueue.h, include/linux/list.h] +

-"DECLARE_TASK_QUEUE(q)" macro is used to declare a structure named "q" managing - task queue. +"DECLARE_TASK_QUEUE(q)" macro is used to declare a structure named + "q" managing task queue. +
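To see what the declaration produces, the macros quoted above can be compiled in user space. In the sketch below the three macros and ''struct list_head'' follow the fragment shown; ''demo_queue'', ''demo_item'' and the two ''_sketch'' helpers are hypothetical names added for illustration.

```c
struct list_head {
    struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)
#define DECLARE_TASK_QUEUE(q) LIST_HEAD(q)

/* An empty task queue: a list head pointing at itself in both directions. */
DECLARE_TASK_QUEUE(demo_queue);
struct list_head demo_item;

/* The queue is empty while the head still points to itself. */
int list_empty_sketch(const struct list_head *head)
{
    return head->next == head;
}

/* Append item at the tail of the circular doubly linked list. */
void list_add_tail_sketch(struct list_head *item, struct list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}
```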

Mark

Here is the ICA schema for "mark_bh" [include/linux/interrupt.h] function: +

+

|mark_bh(NUMBER) @@ -2537,17 +2892,22 @@ Here is the ICA schema for "mark_bh" [include/linux/interrupt.h] |soft_active |= (1 << HI_SOFTIRQ) ''mark_bh''[include/linux/interrupt.h] +

-For example, when an IRQ handler wants to "postpone" some work, it would - "mark_bh(NUMBER)", where NUMBER is a BH declarated (see section before). +For example, when an IRQ handler wants to "postpone" some work, + it would call "mark_bh(NUMBER)", where NUMBER is a previously declared BH (see section + before). +

Execution

We can see this calling from "do_IRQ" [arch/i386/kernel/irq.c] function: +

+

|do_softirq @@ -2555,27 +2915,34 @@ We can see this calling from "do_IRQ" [arch/i386/kernel/irq.c] |tasklet_vec[0].list->func +

"h->action(h);" is the function has been previously queued. +

Very low level routines

set_intr_gate +

set_trap_gate +

set_task_gate (not used). +

-(*interrupt)[NR_IRQS](void) = { IRQ0x00_interrupt, IRQ0x01_interrupt, - ..} +(*interrupt)[NR_IRQS](void) = { IRQ0x00_interrupt, + IRQ0x01_interrupt, ..} +

NR_IRQS = 224 [kernel 2.4.2] +

Task Switching @@ -2583,23 +2950,28 @@ Task Switching When does Task switching occur?

Now we'll see how the Linux Kernel switches from one task to another. +

Task Switching is needed in many cases, such as the following: +

+

when TimeSlice ends, we need to give access to some other task -when a task decide to access a resource, it sleeps for it, so we have to - choose another task +when a task decides to access a resource, it sleeps waiting for it, so + we have to choose another task -when a task waits for a pipe, we have to give access to other task, which - would write to pipe +when a task waits for a pipe, we have to give access to another + task, which would write to the pipe +

Task Switching +

TASK SWITCHING TRICK @@ -2622,17 +2994,22 @@ Task Switching "a" (prev), "d" (next), \ "b" (prev)); \ } while (0) +

Trick is here: +

+

''pushl %4'' which puts future_EIP into the stack -''jmp __switch_to'' which execute ''__switch_to'' function, but in opposite - of ''call'' we will return to valued pushed in point 1 (so new Task!) +''jmp __switch_to'' which executes the ''__switch_to'' function but, + unlike ''call'', makes us return to the value pushed in point + 1 (so new Task!) +

@@ -2660,15 +3037,18 @@ Task1 Data/Stack Task1 Code | | |w | | |__________| |__________| |__________| |__________| Task2 Data/Stack Task2 Code Kernel Code Kernel Data/Stack +

Fork Overview

-Fork is used to create another task. We start from a Task Parent, and we - copy many data structures to Task Child. +Fork is used to create another task. We start from a Task Parent, + and we copy many data structures to Task Child. +

+

@@ -2690,28 +3070,35 @@ Fork is used to create another task. We start from a Task Parent, and we |_________| Fork SysCall +

What is not copied

-New Task just created (''Task Child'') is almost equal to Parent (''Task - Parent''), there are only few differences: +New Task just created (''Task Child'') is almost equal to Parent + (''Task Parent''), there are only few differences: +

+

obviously PID -child ''fork()'' will return 0, while parent ''fork()'' will return PID - of Task Child, to distinguish them each other in User Mode +child ''fork()'' will return 0, while parent ''fork()'' will + return PID of Task Child, to distinguish them from each other in User + Mode -All child data pages are marked ''READ + EXECUTE'', no "WRITE'' (while parent - has WRITE right for its own pages) so, when a write request comes, a ''Page - Fault'' exception is generated which will create a new independent page: this - mechanism is called ''Copy on Write'' (see Cap.10 for more). +All child data pages are marked ''READ + EXECUTE'', no ''WRITE'' + (while parent has WRITE right for its own pages) so, when a write + request comes, a ''Page Fault'' exception is generated which will + create a new independent page: this mechanism is called ''Copy on + Write'' (see Cap.10 for more). +

Fork ICA +

|sys_fork @@ -2746,6 +3133,7 @@ Fork ICA fork ICA +

@@ -2791,19 +3179,23 @@ copy_thread SET_LINKS [include/linux/sched.h] wake_up_process [kernel/sched.c] +

Copy on Write

To implement Copy on Write for Linux: +

+

-Mark all copied pages as read-only, causing a Page Fault when a child tries - to write to them. +Mark all copied pages as read-only, causing a Page Fault when + a Task tries to write to them. -Page Fault handler creates a new page for the Task caused exception. +Page Fault handler creates a new page. +
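The page fault itself is invisible from user mode, but its effect can be observed: after a fork, a write made by one Task never shows up in the other. The following is a minimal POSIX sketch of that observation; the variable and function names are hypothetical, and it demonstrates only the visible result of Copy on Write, not the kernel mechanism.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int shared_value = 1;   /* lives in a data page shared right after fork() */

/* Fork, let the child overwrite shared_value, and return the value the
 * parent still sees afterwards (-1 on error). */
int parent_value_after_child_write(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: this write goes to a private copy of the page. */
        shared_value = 99;
        _exit(shared_value == 99 ? 0 : 1);
    }

    /* Parent: wait for the child, then read its own, untouched copy. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return shared_value;
}
```

The parent still reads 1 after the child has written 99, because the child's write was redirected to a new independent page.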

@@ -2825,6 +3217,7 @@ Page Fault handler creates a new page for the Task caused exception. Page Fault ICA +

@@ -2846,27 +3239,33 @@ copy_cow_page establish_pte set_pte [include/asm/pgtable-3level.h] +

Linux Memory Management Overview

-Linux uses segmentation + pagination, which simplifies notation. +Linux uses segmentation + pagination, which simplifies notation. + +

Segments

Linux uses only 4 segments: +

+

-2 segments (code and data/stack) for KERNEL SPACE from [0xC000 0000] - (3 GB) to [0xFFFF FFFF] (4 GB) +2 segments (code and data/stack) for KERNEL SPACE from [0xC000 + 0000] (3 GB) to [0xFFFF FFFF] (4 GB) -2 segments (code and data/stack) for USER SPACE from [0] (0 GB) - to [0xBFFF FFFF] (3 GB) +2 segments (code and data/stack) for USER SPACE from [0] + (0 GB) to [0xBFFF FFFF] (3 GB) +

@@ -2886,13 +3285,16 @@ Linux uses only 4 segments: 0x00000000 Kernel/User Linear addresses +

Specific i386 implementation

-Again, Linux implements Pagination using 3 Levels of Paging, but in i386 - architecture only 2 of them are really used: +Again, Linux implements Pagination using 3 Levels of Paging, + but in i386 architecture only 2 of them are really used: +

+

@@ -2930,27 +3332,36 @@ Again, Linux implements Pagination using 3 Levels of Paging, but in i386 +

Memory Mapping

-Linux manages Access Control with Pagination only, so different Tasks will - have the same segment addresses, but different CR3 (register used to store - Directory Page Address), pointing to different Page Entries. +Linux manages Access Control with Pagination only, so different + Tasks will have the same segment addresses, but different CR3 (register + used to store the Page Directory address), pointing to different Page + Entries. +

-In User mode a task cannot overcome 3 GB limit (0 x C0 00 00 00), so only - the first 768 page directory entries are meaningful (768*4MB = 3GB). +In User mode a task cannot go beyond the 3 GB limit (0xC0000000), + so only the first 768 page directory entries are meaningful + (768*4MB = 3GB). +
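The 768-entry figure follows from how a 32-bit linear address is split under two-level i386 paging. The sketch below is a minimal illustration, assuming the usual 10/10/12 bit split (directory index, table index, page offset); the helper names are hypothetical and are not taken from the kernel sources.

```c
#include <stdint.h>

/* Assumed i386 two-level layout: 10 bits directory, 10 bits table, 12 bits offset. */
#define SKETCH_PAGE_SHIFT  12u
#define SKETCH_PGDIR_SHIFT 22u
#define SKETCH_ENTRIES     1024u

/* Index of the page directory entry that covers this linear address. */
static uint32_t dir_index(uint32_t linear)
{
    return linear >> SKETCH_PGDIR_SHIFT;
}

/* Index of the entry inside the page table selected by dir_index(). */
static uint32_t table_index(uint32_t linear)
{
    return (linear >> SKETCH_PAGE_SHIFT) & (SKETCH_ENTRIES - 1u);
}

/* Byte offset inside the 4 KB page. */
static uint32_t offset_in_page(uint32_t linear)
{
    return linear & ((1u << SKETCH_PAGE_SHIFT) - 1u);
}
```

Each directory entry covers 4 MB, so the last user address (0xBFFFFFFF) falls in entry 767 and the first kernel address (0xC0000000) in entry 768.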

-When a Task goes in Kernel Mode (by System call or by IRQ) the other 256 - pages directory entries become important, and they point to the same page files - as all other Tasks (which are the same as the Kernel). +When a Task enters Kernel Mode (by System call or by IRQ) the + other 256 page directory entries become important, and they point + to the same page tables as in all other Tasks (which are the same as + the Kernel's). +

-Note that Kernel (and only kernel) Linear Space is equal to Kernel Physical - Space, so: +Note that Kernel (and only kernel) Linear Space is equal to Kernel + Physical Space, so: +

+

@@ -2970,16 +3381,20 @@ Note that Kernel (and only kernel) Linear Space is equal to Kernel Physical Logical Addresses Physical Addresses +

-Linear Kernel Space corresponds to Physical Kernel Space translated 3 - GB down (in fact page tables are something like { "00000000", "00000001" }, - so they operate no virtualization, they only report physical addresses they - take from linear ones). +Linear Kernel Space corresponds to Physical Kernel Space translated + 3 GB down (in fact page tables are something like { "00000000", + "00000001" }, so they operate no virtualization, they only report + physical addresses they take from linear ones). +

-Notice that you'll not have an "addresses conflict" between Kernel and User - spaces because we can manage physical addresses with Page Tables. +Notice that you'll not have an "addresses conflict" between Kernel + and User spaces because we can manage physical addresses with Page + Tables. +
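The "translated 3 GB down" relation can be written as plain arithmetic. This is only a sketch under the 3 GB split described above; the macro and function names are hypothetical stand-ins, not the kernel's own definitions.

```c
#include <stdint.h>

/* Assumption: kernel linear space starts at 3 GB, as described in the text. */
#define SKETCH_KERNEL_BASE 0xC0000000u

/* Kernel linear address -> physical address: shift 3 GB down. */
static uint32_t kernel_linear_to_phys(uint32_t linear)
{
    return linear - SKETCH_KERNEL_BASE;
}

/* Physical address -> kernel linear address: shift 3 GB up. */
static uint32_t phys_to_kernel_linear(uint32_t phys)
{
    return phys + SKETCH_KERNEL_BASE;
}
```

The mapping is a fixed offset, which is why the text says these page tables perform no real virtualization for kernel space.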

Low level memory allocation @@ -2988,43 +3403,56 @@ Boot Initialization

We start from kmem_cache_init (launched by start_kernel [init/main.c] at boot up). +

+

|kmem_cache_init |kmem_cache_estimate +

kmem_cache_init [mm/slab.c] +

kmem_cache_estimate +

Now we continue with mem_init (also launched by start_kernel[init/main.c]) +

+

|mem_init |free_all_bootmem |free_all_bootmem_core +

mem_init [arch/i386/mm/init.c] +

free_all_bootmem [mm/bootmem.c] +

free_all_bootmem_core +

Run-time allocation

-Under Linux, when we want to allocate memory, for example during "copy_on_write" - mechanism (see Cap.10), we call: +Under Linux, when we want to allocate memory, for example during + "copy_on_write" mechanism (see Cap.10), we call: +

+

|copy_mm @@ -3041,10 +3469,13 @@ Under Linux, when we want to allocate memory, for example during "copy_on_write" |rmqueue |reclaim_pages +

Functions can be found under: +

+

@@ -3075,9 +3506,11 @@ __alloc_pages [mm/page_alloc.c] rm_queue reclaim_pages [mm/vmscan.c] +

TODO: Understand Zones +

Swap @@ -3085,12 +3518,16 @@ Swap Overview

Swap is managed by the kswapd daemon (kernel thread). +

kswapd

-As other kernel threads, kswapd has a main loop that wait to wake up. +Like other kernel threads, kswapd has a main loop that waits to + be woken up. +

+

|kswapd @@ -3102,6 +3539,7 @@ As other kernel threads, kswapd has a main loop that wait to wake up. |run_task_queue |interruptible_sleep_on_timeout // we sleep for a new swap request |} +

@@ -3117,17 +3555,21 @@ refill_inactive_scan [mm/vmswap.c] run_task_queue [kernel/softirq.c] interruptible_sleep_on_timeout [kernel/sched.c] +

When do we need swapping?

-Swapping is needed when we have to access a page that is not in physical - memory. +Swapping is needed when we have to access a page that is not + in physical memory. +

-Linux uses ''kswapd'' kernel thread to carry out this purpose. When the - Task receives a page fault exception we do the following: +Linux uses ''kswapd'' kernel thread to carry out this purpose. + When the Task receives a page fault exception we do the following: +

+

@@ -3150,6 +3592,7 @@ Linux uses ''kswapd'' kernel thread to carry out this purpose. When the Page Fault ICA +

@@ -3173,26 +3616,30 @@ alloc_pages_pgdat __alloc_pages wakeup_kswapd [mm/vmscan.c] +

Linux Networking How Linux networking is managed?

-There exists a device driver for each kind of NIC. Inside it, Linux will - ALWAYS call a standard high level routing: "netif_rx [net/core/dev.c]", - which will controls what 3 level protocol the frame belong to, and it will - call the right 3 level function (so we'll use a pointer to the function to - determine which is right). +There exists a device driver for each kind of NIC. Inside it, + Linux will ALWAYS call a standard high level routine: "netif_rx [net/core/dev.c]", + which checks which level 3 protocol the frame belongs to, and + calls the right level 3 function (so we'll use a pointer to + the function to determine which is right). +
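The "pointer to the function" idea can be sketched in user space as a list of registered packet types, each carrying its own handler. Everything here (struct layout, handler names, return values) is a hypothetical illustration and not the kernel's actual code; only the EtherType numbers 0x0800 (IPv4) and 0x86DD (IPv6) are standard values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registered packet type: protocol id plus handler pointer. */
struct sketch_packet_type {
    uint16_t type;
    int (*func)(const uint8_t *frame, size_t len);
    struct sketch_packet_type *next;
};

/* Placeholder level 3 handlers; they only report which one was chosen. */
static int handle_ip(const uint8_t *frame, size_t len)
{
    (void)frame; (void)len;
    return 4;
}

static int handle_ipv6(const uint8_t *frame, size_t len)
{
    (void)frame; (void)len;
    return 6;
}

static struct sketch_packet_type ipv6_type = { 0x86DD, handle_ipv6, NULL };
static struct sketch_packet_type ip_type   = { 0x0800, handle_ip, &ipv6_type };
static struct sketch_packet_type *packet_types = &ip_type;

/* Walk the list and call the handler registered for this protocol id. */
int deliver_frame(uint16_t type, const uint8_t *frame, size_t len)
{
    for (struct sketch_packet_type *pt = packet_types; pt != NULL; pt = pt->next)
        if (pt->type == type)
            return pt->func(frame, len);
    return -1; /* no handler registered for this protocol */
}
```

The caller never names a specific protocol function; adding a protocol only means adding one more element to the list.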

TCP example

-We'll see now an example of what happens when we send a TCP packet to Linux, - starting from ''netif_rx [net/core/dev.c]'' call. +We'll see now an example of what happens when we send a TCP packet + to Linux, starting from ''netif_rx [net/core/dev.c]'' call. +

Interrupt management: "netif_rx" +

|netif_rx @@ -3202,27 +3649,34 @@ Interrupt management: "netif_rx" |cpu_raise_softirq |softirq_active(cpu) |= (1 << NET_RX_SOFTIRQ) // set bit NET_RX_SOFTIRQ in the BH vector +

Functions: +

+

__skb_queue_tail [include/linux/skbuff.h] cpu_raise_softirq [kernel/softirq.c] +
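The last step of the trace above, setting a bit in a per-CPU mask so that work is picked up later, amounts to simple bit manipulation. The sketch below assumes a plain 32-bit mask; the bit number and function names are illustrative only and should not be read as the kernel's definitions.

```c
#include <stdint.h>

/* Illustrative bit number for the receive softirq (an assumption for this sketch). */
enum { SKETCH_NET_RX = 2 };

/* Mark softirq number nr as pending and return the new mask. */
static uint32_t raise_pending(uint32_t pending, unsigned nr)
{
    return pending | (UINT32_C(1) << nr);
}

/* Return 1 if softirq number nr is pending in the mask, else 0. */
static int is_pending(uint32_t pending, unsigned nr)
{
    return (int)((pending >> nr) & 1u);
}

/* Clear softirq number nr once its handler has run. */
static uint32_t clear_pending(uint32_t pending, unsigned nr)
{
    return pending & ~(UINT32_C(1) << nr);
}
```

The interrupt path only raises the bit; the heavier protocol processing runs later, when the pending mask is examined.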

Post Interrupt management: "net_rx_action"

-Once IRQ interaction is ended, we need to follow the next part of the frame - life and examine what NET_RX_SOFTIRQ does. +Once IRQ interaction is ended, we need to follow the next part + of the frame life and examine what NET_RX_SOFTIRQ does. +

-We will next call ''net_rx_action [net/core/dev.c]'' according - to "net_dev_init [net/core/dev.c]". +We will next call ''net_rx_action [net/core/dev.c]'' + according to "net_dev_init [net/core/dev.c]". +

+

|net_rx_action @@ -3281,10 +3735,13 @@ We will next call ''net_rx_action [net/core/dev.c]'' according |if (ACK) |tcp_set_state(TCP_ESTABLISHED) +

Functions can be found under: +

+

@@ -3343,19 +3800,23 @@ tcp_rcv_synsent_state_process [net/ipv4/tcp_input.c] tcp_set_state [include/net/tcp.h] tcp_send_ack [net/ipv4/tcp_output.c] +

Description: +

+

First we determine protocol type (IP, then TCP) -NF_HOOK (function) is a wrapper routine that first manages the network - filter (for example firewall), then it calls ''function''. +NF_HOOK (function) is a wrapper routine that first manages the + network filter (for example firewall), then it calls ''function''. After we manage 3-way TCP Handshake which consists of: +

@@ -3373,17 +3834,20 @@ SERVER (LISTENING) CLIENT (CONNECTING) 3-Way TCP handshake +

In the end we only have to launch "tcp_rcv_established [net/ipv4/tcp_input.c]" which gives the packet to the user socket and wakes it up. +

Linux File System

TODO +

Useful Tips @@ -3393,9 +3857,11 @@ Stack and Heap Overview

Here we view how "stack" and "heap" are allocated in memory +

Memory allocation +

@@ -3412,17 +3878,22 @@ XX.. | | <-- top of the stack [Stack Pointer] Stack +

-Memory address values start from 00.. (which is also where Stack Segment - begins) and they grow going toward FF.. value. +Memory address values start from 00.. (which is also where Stack + Segment begins) and they grow going toward FF.. value. +

XX.. is the actual value of the Stack Pointer. +

Stack is used by functions for: +

+

@@ -3431,10 +3902,13 @@ global variables local variables return address +

For example, for a classical function: +

+

@@ -3480,6 +3954,7 @@ we have Note: variables order can be different depending on hardware architecture. +

Application vs Process @@ -3487,17 +3962,21 @@ Application vs Process Base definition

We have to distinguish 2 concepts: +

+

Application: that is the useful code we want to execute -Process: that is the IMAGE on memory of the application (it depends on - memory strategy used, segmentation and/or Pagination). +Process: that is the IMAGE on memory of the application (it depends + on memory strategy used, segmentation and/or Pagination). +

Often Process is also called Task or Thread. +

Locks @@ -3505,56 +3984,62 @@ Locks Overview

2 kinds of locks: +

+

intraCPU interCPU +

Copy_on_write

-Copy_on_write is a mechanism used to reduce memory usage. It postpones - memory allocation until the memory is really needed. +Copy_on_write is a mechanism used to reduce memory usage. It + postpones memory allocation until the memory is really needed. +

-For example, when a task executes the "fork()" system call (to create another - task), we still use the same memory pages as the parent, in read only mode. - When the new task WRITES into the old page, it causes an exception and the - page is copied and marked "rw" (read, write). +For example, when a task executes the "fork()" system call (to + create another task), we still use the same memory pages as the + parent, in read only mode. When a task WRITES into the page, it causes + an exception and the page is copied and marked "rw" (read, write). +

+

1-) Page X is shared between Task Parent and Task Child Task Parent - | | RW Access ______ + | | RO Access ______ | |---------->|Page X| |_________| |______| /|\ | Task Child | - | | R Access | + | | RO Access | | |---------------- |_________| -2-) Write request from Task Child +2-) Write request Task Parent - | | RW Access ______ - | |---------->|Page X| + | | RO Access ______ + | |---------->|Page X| Trying to write |_________| |______| /|\ | Task Child | - | | W Access | + | | RO Access | | |---------------- |_________| -3-) Final Configuration: Task Parent and Task Child have an independent copy of the Page, X and Y +3-) Final Configuration: Either Task Parent and Task Child have an independent copy of the Page, X and Y Task Parent | | RW Access ______ | |---------->|Page X| @@ -3565,28 +4050,33 @@ For example, when a task executes the "fork()" system call (to create another | | RW Access ______ | |---------->|Page Y| |_________| |______| +
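From user space the effect of the three steps above can be observed with a small fork() test: after the child writes to a variable, the parent still sees its own value. This is a sketch of the visible behaviour only; whether and when the kernel actually copies the page is internal, and the function name is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that writes to `value`, then return the value the parent
 * still sees. Returns -1 on any failure.
 */
int parent_value_after_child_write(void)
{
    volatile int value = 1;      /* lives in a page shared right after fork() */
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        value = 2;               /* child write: the child gets its own copy */
        _exit(value == 2 ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    return value;                /* parent's copy is untouched */
}
```

Note that fork() returns 0 in the child and the child's PID in the parent, which is how the two branches are told apart.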

80386 specific details Boot procedure +

bbootsect.s [arch/i386/boot] setup.S (+video.S) head.S (+misc.c) [arch/i386/boot/compressed] start_kernel [init/main.c] +

80386 (and more) Descriptors Overview

-Descriptors are data structure used by Intel microprocessor i386+ to virtualize - memory. +Descriptors are data structures used by Intel i386+ microprocessors + to virtualize memory. +

Kind of descriptors +

@@ -3595,17 +4085,20 @@ GDT (Global Descriptor Table) LDT (Local Descriptor Table) IDT (Interrupt Descriptor Table) +

IRQ Overview

-IRQ is an asyncronous signal sent to microprocessor to advertise a requested - work is completed +IRQ is an asynchronous signal sent to the microprocessor to announce + that a requested piece of work is completed +

Interaction schema +

|<--> IRQ(0) [Timer] @@ -3623,13 +4116,16 @@ Interaction schema IRQ - Tasks Interaction Schema +

What happens?

-A typical O.S. uses many IRQ signals to interrupt normal process execution - and does some housekeeping work. So: +A typical O.S. uses many IRQ signals to interrupt normal process + execution and do some housekeeping work. So: +

+

@@ -3638,11 +4134,13 @@ IRQ (i) occurs and Task(j) is interrupted IRQ(i)_handler is executed control returns to the interrupted Task(j) +

-Under Linux, when an IRQ comes, first the IRQ wrapper routine (named "interrupt0x??") - is called, then the "official" IRQ(i)_handler will be executed. This allows some - duties like timeslice preemption. +Under Linux, when an IRQ comes, first the IRQ wrapper routine + (named "interrupt0x??") is called, then the "official" IRQ(i)_handler + will be executed. This allows some duties like timeslice preemption. +

Utility functions @@ -3650,22 +4148,29 @@ Utility functions list_entry [include/linux/list.h]

Definition: +

+

#define list_entry(ptr, type, member) \ ((type *)((char *)(ptr)-(unsigned long)(&((type *)0)->member))) +

Meaning: +

-"list_entry" macro is used to retrieve a parent struct pointer, by using - only one of internal struct pointer. +"list_entry" macro is used to retrieve a parent struct pointer, + by using only a pointer to one of its internal members. +

Example: +

+

struct __wait_queue { @@ -3684,11 +4189,14 @@ typedef struct __wait_queue wait_queue_t; wait_queue_t *out = list_entry(tmp, wait_queue_t, task_list); // where tmp points to list_head +

-So, in this case, by means of *tmp pointer [list_head] we retrieve - an *out pointer [wait_queue_t]. +So, in this case, by means of *tmp pointer [list_head] + we retrieve an *out pointer [wait_queue_t]. +

+

@@ -3700,6 +4208,7 @@ So, in this case, by means of *tmp pointer [list_head] we retrieve | next * -->| | | |____________| ----- *tmp [we have this] +
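The pointer arithmetic behind list_entry can be tried outside the kernel. The sketch below uses the standard offsetof macro in place of the hand-written null-pointer cast, and a hypothetical `struct waiter` standing in for wait_queue_t; it is an illustration of the technique rather than the kernel's header.

```c
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Same idea as the kernel macro: subtract the member's offset from its address. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical container struct with an embedded list node. */
struct waiter {
    int id;
    struct list_head task_list;
};

/* Recover the enclosing struct from a pointer to its embedded node. */
int list_entry_demo(void)
{
    struct waiter w = { 42, { NULL, NULL } };
    struct list_head *tmp = &w.task_list;   /* all we are given */
    struct waiter *out = list_entry(tmp, struct waiter, task_list);

    return (out == &w) ? out->id : -1;
}
```

Because the node is embedded in the struct, one generic list implementation can link objects of any type without knowing their layout.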

Sleep @@ -3707,7 +4216,9 @@ Sleep Sleep code

Files: +

+

@@ -3718,10 +4229,13 @@ include/linux/sched.h include/linux/wait.h include/linux/list.h +

Functions: +

+

@@ -3732,10 +4246,13 @@ interruptible_sleep_on_timeout sleep_on sleep_on_timeout +

Called functions: +

+

@@ -3748,10 +4265,13 @@ list_add __list_add __remove_wait_queue +

InterCallings Analysis: +

+

|sleep_on @@ -3765,18 +4285,24 @@ InterCallings Analysis: |__list_del -- +

Description: +

-Under Linux each resource (ideally an object shared between many users - and many processes), , has a queue to manage ALL tasks requesting it. +Under Linux each resource (ideally an object shared between many + users and many processes) has a queue to manage ALL tasks requesting + it. +

-This queue is called "wait queue" and it consists of many items we'll call - the"wait queue element": +This queue is called "wait queue" and it consists of many items + we'll call the "wait queue element": +

+

*** wait queue structure [include/linux/wait.h] *** @@ -3790,10 +4316,13 @@ struct __wait_queue { struct list_head { struct list_head *next, *prev; }; +

Graphic working: +

+

*** wait queue element *** @@ -3820,16 +4349,20 @@ Graphic working: task1 <--[prev *, lock, next *]--> taskN +

-"wait queue head" point to first (with next *) and last (with prev *) elements - of the "wait queue list". +"wait queue head" points to the first (with next *) and last (with prev + *) elements of the "wait queue list". +

When a new element has to be added, "__add_wait_queue" [include/linux/wait.h] is called, after which the generic routine "list_add" [include/linux/list.h] will be executed: +

+

*** function list_add [include/linux/list.h] *** @@ -3843,12 +4376,15 @@ static __inline__ void __list_add (struct list_head * new, \ new->prev = prev; prev->next = new; } +

To complete the description, we see also "__list_del" [include/linux/list.h] - function called by "list_del" [include/linux/list.h] inside "remove_wait_queue" - [include/linux/wait.h]: + function called by "list_del" [include/linux/list.h] inside + "remove_wait_queue" [include/linux/wait.h]: +

+

*** function list_del [include/linux/list.h] *** @@ -3859,16 +4395,20 @@ static __inline__ void __list_del (struct list_head * prev, struct list_head * n next->prev = prev; prev->next = next; } +
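The add and delete steps shown above can be exercised together in a small user-space sketch. The helper names are deliberately different from the kernel's (identifiers starting with a double underscore are reserved in ordinary C programs), and the demo function is hypothetical.

```c
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* An empty circular list points to itself in both directions. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert new_node between two known consecutive nodes. */
static void link_between(struct list_head *new_node,
                         struct list_head *prev, struct list_head *next)
{
    next->prev = new_node;
    new_node->next = next;
    new_node->prev = prev;
    prev->next = new_node;
}

/* Add right after the head. */
static void sketch_list_add(struct list_head *new_node, struct list_head *head)
{
    link_between(new_node, head, head->next);
}

/* Make prev and next point to each other, dropping whatever was between them. */
static void unlink_between(struct list_head *prev, struct list_head *next)
{
    next->prev = prev;
    prev->next = next;
}

static void sketch_list_del(struct list_head *entry)
{
    unlink_between(entry->prev, entry->next);
}

static int list_length(const struct list_head *head)
{
    int n = 0;
    for (const struct list_head *p = head->next; p != head; p = p->next)
        n++;
    return n;
}

/* Returns 21: two elements before the delete, one after. */
int list_demo(void)
{
    struct list_head head, a, b;

    init_list_head(&head);
    sketch_list_add(&a, &head);          /* head -> a */
    sketch_list_add(&b, &head);          /* head -> b -> a */
    int before = list_length(&head);

    sketch_list_del(&a);                 /* head -> b */
    int after = list_length(&head);

    if (head.next != &b || head.prev != &b)
        return -1;
    return before * 10 + after;
}
```

Neither operation walks the list: both only rewrite the pointers of the neighbouring nodes, so they take constant time.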

Stack consideration

-A typical list (or queue) is usually managed allocating it into the Heap - (see Cap.10 for Heap and Stack definition and about where variables are allocated). - Otherwise here, we statically allocate Wait Queue data in a local variable - (Stack), then function is interrupted by scheduling, in the end, (returning - from scheduling) we'll erase local variable. +A typical list (or queue) is usually managed by allocating it in + the Heap (see Cap.10 for Heap and Stack definitions and for where + variables are allocated). Here, instead, we statically allocate + Wait Queue data in a local variable (Stack); the function is then interrupted + by scheduling and, in the end (returning from scheduling), the + local variable is discarded. +

+

new task <----| task1 <------| task2 <------| @@ -3885,32 +4425,42 @@ A typical list (or queue) is usually managed allocating it into the Heap |__________| |__________| |__________| Stack Stack Stack +

Static variables Overview

-Linux is written in ''C'' language, and as every application has: +Linux is written in ''C'' language, and as every application + has: +

+

Local variables -Module variables (inside the source file and relative only to that module) +Module variables (inside the source file and relative only to + that module) -Global/Static variables present in only 1 copy (the same for all modules) +Global/Static variables present in only 1 copy (the same for + all modules) +

-When a Static variable is modified by a module, all other modules will - see the new value. +When a Static variable is modified by a module, all other modules + will see the new value. +

-Static variables under Linux are very important, cause they are the only - kind to add new support to kernel: they typically are pointers to the head - of a list of registered elements, which can be: +Static variables under Linux are very important, because they are + the only way to add new support to the kernel: they typically are pointers + to the head of a list of registered elements, which can be: +

+

@@ -3919,37 +4469,47 @@ added deleted maybe modified +

_______ _______ _______ Global variable -------> |Item(1)| -> |Item(2)| -> |Item(3)| .. |_______| |_______| |_______| +
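The "global variable pointing to a list of registered items" pattern can be sketched as follows. All names here (the struct, the register and unregister functions, the demo) are hypothetical; the file system names are used only as sample data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical registered item. */
struct sketch_fs_type {
    const char *name;
    struct sketch_fs_type *next;
};

/* The single static head pointer: every caller sees the same list. */
static struct sketch_fs_type *registered = NULL;

/* Append an item; refuse duplicates by name. Returns 0 on success, -1 otherwise. */
int sketch_register(struct sketch_fs_type *fs)
{
    struct sketch_fs_type **p;

    for (p = &registered; *p != NULL; p = &(*p)->next)
        if (strcmp((*p)->name, fs->name) == 0)
            return -1;
    fs->next = NULL;
    *p = fs;
    return 0;
}

/* Remove an item. Returns 0 on success, -1 if it was not registered. */
int sketch_unregister(struct sketch_fs_type *fs)
{
    struct sketch_fs_type **p;

    for (p = &registered; *p != NULL; p = &(*p)->next) {
        if (*p == fs) {
            *p = fs->next;
            fs->next = NULL;
            return 0;
        }
    }
    return -1;
}

int sketch_count(void)
{
    int n = 0;
    for (struct sketch_fs_type *p = registered; p != NULL; p = p->next)
        n++;
    return n;
}

/* Register two items, reject a duplicate, remove one: one item remains. */
int registry_demo(void)
{
    static struct sketch_fs_type ext2  = { "ext2",  NULL };
    static struct sketch_fs_type msdos = { "msdos", NULL };
    static struct sketch_fs_type dup   = { "ext2",  NULL };

    if (sketch_register(&ext2) != 0 || sketch_register(&msdos) != 0)
        return -1;
    if (sketch_register(&dup) != -1)     /* same name: must be refused */
        return -1;
    if (sketch_unregister(&ext2) != 0)
        return -1;
    return sketch_count();
}
```

Loading a module corresponds to the register call and unloading it to the unregister call; the rest of the code only ever walks the list from the head pointer.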

Main variables Current +

________________ Current ----------------> | Actual process | |________________| +

-Current points to ''task_struct'' structure, which contains all data about - a process like: +Current points to ''task_struct'' structure, which contains all + data about a process like: +

+

pid, name, state, counter, policy of scheduling -pointers to many data structures like: files, vfs, other processes, signals... +pointers to many data structures like: files, vfs, other processes, + signals... +

Current is not a real variable, it is +

+

static inline struct task_struct * get_current(void) { @@ -3958,103 +4518,129 @@ static inline struct task_struct * get_current(void) { return current; } #define current get_current() +

-Above lines just takes value of ''esp'' register (stack pointer) and get - it available like a variable, from which we can point to our task_struct structure. -

-

-From ''current'' element we can access directly to any other process (ready, - stopped or in any other state) kernel data structure, for example changing - STATE (like a I/O driver does), PID, presence in ready list or blocked list, - etc. +Above lines just takes value of ''esp'' register (stack pointer) + and get it available like a variable, from which we can point to + our task_struct structure. +

+From the ''current'' element we can directly access the kernel data + structures of any other process (ready, stopped or in any other state), + for example changing STATE (like an I/O driver does), PID, presence + in ready list or blocked list, etc.

Registered filesystems +

______ _______ ______ file_systems ------> | ext2 | -> | msdos | -> | ntfs | [fs/super.c] |______| |_______| |______| +

-When you use command like ''modprobe some_fs'' you will add a new entry - to file systems list, while removing it (by using ''rmmod'') will delete it. +When you use command like ''modprobe some_fs'' you will add a + new entry to file systems list, while removing it (by using ''rmmod'') + will delete it. +

Mounted filesystems +

______ _______ ______ mount_hash_table ---->| / | -> | /usr | -> | /var | [fs/namespace.c] |______| |_______| |______| +

-When you use ''mount'' command to add a fs, the new entry will be inserted - in the list, while an ''umount'' command will delete the entry. +When you use ''mount'' command to add a fs, the new entry will + be inserted in the list, while an ''umount'' command will delete + the entry. +

Registered Network Packet Type +

______ _______ ______ ptype_all ------>| ip | -> | x25 | -> | ipv6 | [net/core/dev.c] |______| |_______| |______| +

-For example, if you add support for IPv6 (loading relative module) a new - entry will be added in the list. +For example, if you add support for IPv6 (loading relative module) + a new entry will be added in the list. +

Registered Network Internet Protocol +

______ _______ _______ inet_protocol_base ----->| icmp | -> | tcp | -> | udp | [net/ipv4/protocol.c] |______| |_______| |_______| +

-Also others packet type have many internal protocols in each list (like - IPv6). +Other packet types also have many internal protocols in each + list (like IPv6). +

+

______ _______ _______ inet6_protos ----------->|icmpv6| -> | tcpv6 | -> | udpv6 | [net/ipv6/protocol.c] |______| |_______| |_______| +

Registered Network Device +

______ _______ _______ dev_base --------------->| lo | -> | eth0 | -> | ppp0 | [drivers/core/Space.c] |______| |_______| |_______| +

Registered Char Device +

______ _______ ________ chrdevs ---------------->| lp | -> | keyb | -> | serial | [fs/devices.c] |______| |_______| |________| +

-''chrdevs'' is not a pointer to a real list, but it is a standard vector. +''chrdevs'' is not a pointer to a real list, but it is a standard + vector. +

Registered Block Device +

______ ______ ________ bdev_hashtable --------->| fd | -> | hd | -> | scsi | [fs/block_dev.c] |______| |______| |________| +

''bdev_hashtable'' is a hash vector. +

Glossary @@ -4062,16 +4648,22 @@ Glossary Links

+

+

+

- + +

+ +