<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
 <META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
 <TITLE>Linux Kernel 2.4 Internals: IPC mechanisms</TITLE>
 <LINK HREF="lki-4.html" REL=previous>
 <LINK HREF="lki.html#toc5" REL=contents>
</HEAD>
<BODY>
Next
<A HREF="lki-4.html">Previous</A>
<A HREF="lki.html#toc5">Contents</A>
<HR>
<H2><A NAME="s5">5. IPC mechanisms</A></H2>

<P>This chapter describes the semaphore, shared memory, and
message queue IPC mechanisms as implemented in the Linux 2.4
kernel. It is organized into four sections. The
first three sections cover the interfaces and support functions
for
<A HREF="#semaphores">semaphores</A>,
<A HREF="#message">message queues</A>,
and
<A HREF="#sharedmem">shared memory</A> respectively.
The
<A HREF="#ipc_primitives">last</A> section describes
a set of common functions and data structures that are shared by
all three mechanisms.
<P>
<H2><A NAME="semaphores"></A> <A NAME="ss5.1">5.1 Semaphores</A>
</H2>

<P>The functions described in this section implement the user level
semaphore mechanisms. Note that this implementation relies on the
use of kernel spinlocks and kernel semaphores. To avoid confusion,
the term "kernel semaphore" will be used in reference to kernel
semaphores. All other uses of the word "semaphore" will be in
reference to the user level semaphores.
<P>
<H3><A NAME="sem_apis"></A> Semaphore System Call Interfaces</H3>

<P>
<P>
<H3><A NAME="sys_semget"></A> sys_semget()</H3>

<P>The entire call to sys_semget() is protected by the
global
<A HREF="#struct_ipc_ids">sem_ids.sem</A>
kernel semaphore.
<P>In the case where a new set of semaphores must be
created, the
<A HREF="#newary">newary()</A> function is
called to create and initialize a new semaphore set. The ID of
the new set is returned to the caller.
<P>In the case where a key value is provided for an existing
semaphore set,
<A HREF="#ipc_findkey">ipc_findkey()</A>
is invoked to look up the corresponding semaphore descriptor
array index.  The parameters and permissions of the caller are
verified before returning the semaphore set ID.
<H3><A NAME="sys_semctl"></A> sys_semctl()</H3>

<P>For the
<A HREF="#IPC_INFO_and_SEM_INFO">IPC_INFO</A>,
<A HREF="#IPC_INFO_and_SEM_INFO">SEM_INFO</A>, and
<A HREF="#SEM_STAT">SEM_STAT</A> commands,
<A HREF="#semctl_nolock">semctl_nolock()</A>
is called to perform the necessary functions.
<P>For the
<A HREF="#GETALL">GETALL</A>,
<A HREF="#GETVAL">GETVAL</A>,
<A HREF="#GETPID">GETPID</A>,
<A HREF="#GETNCNT">GETNCNT</A>,
<A HREF="#GETZCNT">GETZCNT</A>,
<A HREF="#IPC_STAT">IPC_STAT</A>,
<A HREF="#SETVAL">SETVAL</A>, and
<A HREF="#SETALL">SETALL</A> commands,
<A HREF="#semctl_main">semctl_main()</A> is called to perform the
necessary functions.
<P>For the
<A HREF="#semctl_ipc_rmid">IPC_RMID</A>
and
<A HREF="#semctl_ipc_set">IPC_SET</A> commands,
<A HREF="#semctl_down">semctl_down()</A> is called
to perform the necessary functions. Throughout both of these
operations, the global
<A HREF="#struct_ipc_ids">sem_ids.sem</A>
kernel semaphore is held.
<H3><A NAME="sys_semop"></A> sys_semop()</H3>

<P>After validating the call parameters, the semaphore
operations data is copied from user space to a temporary buffer.
If a small temporary buffer is sufficient, then a stack buffer is
used. Otherwise, a larger buffer is allocated. After copying in the
semaphore operations data, the global semaphores spinlock is
locked, and the user-specified semaphore set ID is validated.
Access permissions for the semaphore set are also validated.
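<P>The choice between a stack buffer and an allocated buffer can be
sketched in user-space C as follows. This is only an illustration and not
the kernel code: the names with_sops_buffer(), FAST_NSOPS and struct op are
hypothetical, the threshold of 64 is an assumed value, and memcpy() and
malloc() stand in for the kernel's copy-in and allocation routines.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FAST_NSOPS 64   /* assumed size of the on-stack fast buffer */

/* Illustrative mirror of struct sembuf. */
struct op { unsigned short num; short op; short flg; };

/* Copy nsops operations into a temporary buffer and sum their op values.
 * *used_heap reports which buffer was used.
 * Returns 0 on success, -1 if the larger buffer could not be allocated. */
int with_sops_buffer(const struct op *user_ops, size_t nsops,
                     long *sum, int *used_heap)
{
    struct op fast[FAST_NSOPS];
    struct op *sops = fast;
    size_t i;

    assert(user_ops != NULL && nsops > 0);
    *used_heap = 0;
    if (nsops > FAST_NSOPS) {           /* too large for the stack buffer */
        sops = malloc(nsops * sizeof(*sops));
        if (sops == NULL)
            return -1;
        *used_heap = 1;
    }
    memcpy(sops, user_ops, nsops * sizeof(*sops));

    *sum = 0;
    for (i = 0; i < nsops; i++)
        *sum += sops[i].op;

    if (sops != fast)                   /* only the allocated buffer is freed */
        free(sops);
    return 0;
}
```

<P>The common case of a few operations never touches the allocator, which
is the point of keeping a small buffer on the stack.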
<P>All of the user-specified semaphore operations are parsed.
During this process, a count is maintained of all the operations that
have the SEM_UNDO flag set. A <CODE>decrease</CODE> flag is set if any of the
operations subtract from a semaphore value, and an <CODE>alter</CODE> flag is set
if any of the semaphore values are modified (i.e. increased or
decreased). The number of each
semaphore to be modified is validated.
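<P>The single pass over the operations described above might look like
the following user-space sketch. The names scan_ops(), struct scan,
struct op and SIM_SEM_UNDO are hypothetical, and the flag value is
illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_SEM_UNDO 0x1000   /* illustrative flag value */

/* Illustrative mirror of struct sembuf. */
struct op { unsigned short num; short op; short flg; };

struct scan { int undos; int decrease; int alter; };

/* Scan the operations once and collect the three pieces of state.
 * Returns 0 on success, -1 if an operation names a semaphore number
 * that does not exist in a set of nsems semaphores. */
int scan_ops(const struct op *sops, size_t nsops,
             unsigned long nsems, struct scan *out)
{
    size_t i;

    assert(sops != NULL && out != NULL);
    out->undos = out->decrease = out->alter = 0;
    for (i = 0; i < nsops; i++) {
        if (sops[i].num >= nsems)
            return -1;                  /* semaphore number out of range */
        if (sops[i].flg & SIM_SEM_UNDO)
            out->undos++;               /* count of SEM_UNDO operations */
        if (sops[i].op < 0)
            out->decrease = 1;          /* some operation subtracts */
        if (sops[i].op != 0)
            out->alter = 1;             /* some operation changes a value */
    }
    return 0;
}
```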
<P>If SEM_UNDO was asserted for any of the semaphore operations,
then the undo list for the current task is searched for an undo
structure associated with this semaphore set. During this search,
if the semaphore set ID of any of the undo structures is found
to be -1, then
<A HREF="#freeundos">freeundos()</A>
is called to free the undo structure
and remove it from the list. If no undo structure is found for
this semaphore set then
<A HREF="#alloc_undo">alloc_undo()</A>
is called to allocate and initialize one.
<P>The
<A HREF="#try_atomic_semop">try_atomic_semop()</A>
function is called with the <CODE>do_undo</CODE>
parameter equal to 0 in order to execute the sequence of
operations. The return value indicates that either the
operations passed, failed, or were not executed because
they need to block. Each of these cases is described further below:
<P>
<H3><A NAME="Non-blocking_Semaphore_Operations"></A> Non-blocking Semaphore Operations</H3>

<P>The
<A HREF="#try_atomic_semop">try_atomic_semop()</A>
function returns zero to indicate that all operations in the
sequence succeeded. In this case,
<A HREF="#update_queue">update_queue()</A>
is called to traverse the queue of pending semaphore
operations for the semaphore set and awaken any
sleeping tasks that no longer need to block. This completes the
execution of the sys_semop() system call for this case.
<H3><A NAME="Failing_Semaphore_Operations"></A> Failing Semaphore Operations</H3>

<P>If
<A HREF="#try_atomic_semop">try_atomic_semop()</A>
returns a negative value, then a failure condition was encountered.
In this case, none of the operations have been executed.
This occurs when either a semaphore operation would cause an
invalid semaphore value, or an operation marked IPC_NOWAIT is
unable to complete.  The error condition is then returned to the
caller of sys_semop().
<P>Before sys_semop() returns, a call is made to
<A HREF="#update_queue">update_queue()</A> to traverse
the queue of pending semaphore operations for the semaphore set
and awaken any sleeping tasks that no longer need to block.
<H3><A NAME="Blocking_Semaphore_Operations"></A> Blocking Semaphore Operations</H3>

<P>The
<A HREF="#try_atomic_semop">try_atomic_semop()</A>
function returns 1 to indicate that the
sequence of semaphore operations was not executed because
one of the semaphores would block. For this case, a new
<A HREF="#struct_sem_queue">sem_queue</A> element is
initialized containing these semaphore operations. If any of
these operations would alter the state of the semaphore, then
the new queue element is added at the tail of the queue.
Otherwise, the new queue element is added at the head of the queue.
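<P>The head-or-tail placement can be sketched with a queue that uses the
same next and prev linkage as
<A HREF="#struct_sem_queue">sem_queue</A>. This is a user-space
illustration under assumed names (struct qnode, struct pending, enqueue());
it is not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

struct qnode {
    struct qnode  *next;
    struct qnode **prev;    /* invariant: *(q->prev) == q */
    int            alter;   /* non-zero if the operations alter a semaphore */
};

struct pending {
    struct qnode  *first;   /* plays the role of sem_pending */
    struct qnode **last;    /* plays the role of sem_pending_last */
};

void pending_init(struct pending *p)
{
    p->first = NULL;
    p->last = &p->first;
}

static void append_to_queue(struct pending *p, struct qnode *q)
{
    q->prev = p->last;      /* points at the slot that will hold q */
    *p->last = q;
    q->next = NULL;
    p->last = &q->next;
}

static void prepend_to_queue(struct pending *p, struct qnode *q)
{
    q->next = p->first;
    q->prev = &p->first;
    p->first = q;
    if (q->next)
        q->next->prev = &q->next;
    else
        p->last = &q->next; /* queue was empty, q is also the tail */
}

/* Altering operations go to the tail, non-altering ones to the head. */
void enqueue(struct pending *p, struct qnode *q)
{
    assert(p != NULL && q != NULL);
    if (q->alter)
        append_to_queue(p, q);
    else
        prepend_to_queue(p, q);
}
```

<P>Because prev points at the pointer that refers to the element, an
element can later be unlinked without knowing whether it is the first
entry in the queue.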
<P>The <CODE>semsleeping</CODE> element of the current
task is set to indicate that the task is sleeping on this
<A HREF="#struct_sem_queue">sem_queue</A> element.
The current task is marked as TASK_INTERRUPTIBLE, and the
<CODE>sleeper</CODE> element of the
<A HREF="#struct_sem_queue">sem_queue</A>
is set to identify this task as the sleeper. The
global semaphore spinlock is then unlocked, and schedule() is called
to put the current task to sleep.
<P>When awakened, the task re-locks the global semaphore spinlock,
determines why it was awakened, and how it should
respond.  The following cases are handled:
<P>
<UL>
<LI>  If the semaphore set has been removed, then
the system call fails with EIDRM.
</LI>
<LI>  If the <CODE>status</CODE> element of the
<A HREF="#struct_sem_queue">sem_queue</A> structure
is set to 1, then the task was awakened in order to retry the
semaphore operations. Another call to
<A HREF="#try_atomic_semop">try_atomic_semop()</A> is
made to execute the sequence of semaphore operations.  If
try_atomic_semop() returns 1, then the task must block again
as described above. Otherwise, 0 is returned for success,
or an appropriate error code is returned in case of failure.

Before sys_semop() returns, current->semsleeping is cleared,
and the
<A HREF="#struct_sem_queue">sem_queue</A>
is removed from the queue.  If any of the specified semaphore
operations were altering operations (increase or decrease),
then
<A HREF="#update_queue">update_queue()</A> is
called to traverse the queue of pending semaphore operations
for the semaphore set and awaken any sleeping tasks that no
longer need to block.
</LI>
<LI>  If the <CODE>status</CODE> element of the
<A HREF="#struct_sem_queue">sem_queue</A> structure is
NOT set to 1, and the
<A HREF="#struct_sem_queue">sem_queue</A> element has
not been dequeued, then the task was awakened by an interrupt.
In this case, the system call fails with EINTR.  Before
returning, current->semsleeping is cleared, and the
<A HREF="#struct_sem_queue">sem_queue</A> is removed
from the queue.  Also,
<A HREF="#update_queue">update_queue()</A> is called
if any of the operations were altering operations.
</LI>
<LI>  If the <CODE>status</CODE> element of the
<A HREF="#struct_sem_queue">sem_queue</A> structure is
NOT set to 1, and the
<A HREF="#struct_sem_queue">sem_queue</A> element
has been dequeued,
then the semaphore operations have already been executed by
<A HREF="#update_queue">update_queue()</A>.  The
queue <CODE>status</CODE>, which could be 0 for success
or a negated error code for failure, becomes the return value of
the system call.
</LI>
</UL>
<H3><A NAME="sem_structures"></A> Semaphore Specific Support Structures</H3>

<P>The following structures are used specifically for semaphore support:
<P>
<H3><A NAME="struct_sem_array"></A> struct sem_array</H3>

<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
/* One sem_array data structure for each set of semaphores in the system. */
struct sem_array {
    struct kern_ipc_perm sem_perm; /* permissions .. see ipc.h */
    time_t sem_otime; /* last semop time */
    time_t sem_ctime; /* last change time */
    struct sem *sem_base; /* ptr to first semaphore in array */
    struct sem_queue *sem_pending; /* pending operations to be processed */
    struct sem_queue **sem_pending_last; /* last pending operation */
    struct sem_undo *undo; /* undo requests on this array */
    unsigned long sem_nsems; /* no. of semaphores in array */
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<H3><A NAME="struct_sem"></A> struct sem</H3>

<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
/* One semaphore structure for each semaphore in the system. */
struct sem {
        int     semval;         /* current value */
        int     sempid;         /* pid of last operation */
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<H3><A NAME="struct_seminfo"></A> struct seminfo</H3>

<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
struct  seminfo {
        int semmap;
        int semmni;
        int semmns;
        int semmnu;
        int semmsl;
        int semopm;
        int semume;
        int semusz;
        int semvmx;
        int semaem;
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<H3><A NAME="struct_semid64_ds"></A> struct semid64_ds</H3>

<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
struct semid64_ds {
        struct ipc64_perm sem_perm;             /* permissions .. see ipc.h */
        __kernel_time_t sem_otime;              /* last semop time */
        unsigned long   __unused1;
        __kernel_time_t sem_ctime;              /* last change time */
        unsigned long   __unused2;
        unsigned long   sem_nsems;              /* no. of semaphores in array */
        unsigned long   __unused3;
        unsigned long   __unused4;
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<H3><A NAME="struct_sem_queue"></A> struct sem_queue</H3>

<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
/* One queue for each sleeping process in the system. */
struct sem_queue {
        struct sem_queue *      next;    /* next entry in the queue */
        struct sem_queue **     prev;    /* previous entry in the queue, *(q->prev) == q */
        struct task_struct*     sleeper; /* this process */
        struct sem_undo *       undo;    /* undo structure */
        int                     pid;     /* process id of requesting process */
        int                     status;  /* completion status of operation */
        struct sem_array *      sma;     /* semaphore array for operations */
        int                     id;      /* internal sem id */
        struct sembuf *         sops;    /* array of pending operations */
        int                     nsops;   /* number of operations */
        int                     alter;   /* operation will alter semaphore */
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<H3><A NAME="struct_sembuf"></A> struct sembuf</H3>

<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
/* semop system calls takes an array of these. */
struct sembuf {
        unsigned short  sem_num;        /* semaphore index in array */
        short           sem_op;         /* semaphore operation */
        short           sem_flg;        /* operation flags */
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<H3><A NAME="struct_sem_undo"></A> struct sem_undo</H3>

<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
/* Each task has a list of undo requests. They are executed automatically
 * when the process exits.
 */
struct sem_undo {
        struct sem_undo *       proc_next;      /* next entry on this process */
        struct sem_undo *       id_next;        /* next entry on this semaphore set */
        int                     semid;          /* semaphore set identifier */
        short *                 semadj;         /* array of adjustments, one per semaphore */
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<H3><A NAME="sem_primitives"></A> Semaphore Support Functions</H3>

<P>The following functions are used specifically in support of
semaphores:
<P>
<H3><A NAME="newary"></A> newary()</H3>

<P>newary() relies on the
<A HREF="#ipc_alloc">ipc_alloc()</A>
function to allocate the memory
required for the new semaphore set. It allocates enough memory
for the semaphore set descriptor and for each of the semaphores
in the set.  The allocated memory is cleared, and the address of the
first element of the semaphore set descriptor is passed to
<A HREF="#ipc_addid">ipc_addid()</A>.
<A HREF="#ipc_addid">ipc_addid()</A> reserves an array entry
for the new semaphore set descriptor and initializes the
(
<A HREF="#struct_kern_ipc_perm">struct kern_ipc_perm</A>) data for the set.
The global <CODE>used_sems</CODE> variable is updated by the number of
semaphores in the new set and the initialization of the
(
<A HREF="#struct_kern_ipc_perm">struct kern_ipc_perm</A>)
data for the new set is completed. Other
initialization steps performed for this set are listed below:
<P>
<UL>
<LI>  The <CODE>sem_base</CODE> element for the set is initialized
to the address immediately following the
(
<A HREF="#struct_sem_array">struct sem_array</A>)
portion of the newly allocated data. This corresponds to
the location of the first semaphore in the set.
</LI>
<LI>  The <CODE>sem_pending</CODE> queue is initialized as empty.</LI>
</UL>
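<P>The layout produced by this single allocation, with the semaphores
placed directly after the descriptor, can be sketched in user-space C.
The sim_ names are hypothetical, the descriptor is reduced to the fields
needed here, and calloc() stands in for
<A HREF="#ipc_alloc">ipc_alloc()</A> followed by clearing the memory.

```c
#include <assert.h>
#include <stdlib.h>

struct sim_queue;   /* pending operations are not modelled in this sketch */

struct sim_sem { int semval; int sempid; };

struct sim_sem_array {
    struct sim_sem    *sem_base;          /* first semaphore in the set */
    struct sim_queue  *sem_pending;       /* pending operations */
    struct sim_queue **sem_pending_last;  /* tail slot of the pending queue */
    unsigned long      sem_nsems;         /* number of semaphores */
};

/* Allocate the descriptor and its semaphores as one zeroed block. */
struct sim_sem_array *sim_newary(unsigned long nsems)
{
    struct sim_sem_array *sma;
    size_t size;

    assert(nsems > 0);
    size = sizeof(*sma) + nsems * sizeof(struct sim_sem);
    sma = calloc(1, size);
    if (sma == NULL)
        return NULL;

    sma->sem_base = (struct sim_sem *)&sma[1];  /* right after the descriptor */
    sma->sem_pending = NULL;                    /* empty pending queue */
    sma->sem_pending_last = &sma->sem_pending;
    sma->sem_nsems = nsems;
    return sma;
}
```

<P>A single free() of the descriptor releases the semaphores as well,
which matches the way the set is freed as one unit when it is removed.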
<P>All of the operations following the call to
<A HREF="#ipc_addid">ipc_addid()</A>
are performed while holding the global semaphores spinlock. After
unlocking the global semaphores spinlock, newary() calls
<A HREF="#ipc_buildid">ipc_buildid()</A>
(via sem_buildid()). This function uses the index
of the semaphore set descriptor to create a unique ID, which is then
returned to the caller of newary().
<P>
<H3><A NAME="freeary"></A> freeary()</H3>

<P>freeary() is called by
<A HREF="#semctl_down">semctl_down()</A> to perform the
functions listed below. It is called with the global semaphores
spinlock locked and it returns with the spinlock unlocked.
<P>
<UL>
<LI>  The
<A HREF="#func_ipc_rmid">ipc_rmid()</A> function
is called (via the
sem_rmid() wrapper) to delete the ID for the semaphore
set and to retrieve a pointer to the semaphore set.
</LI>
<LI>  The undo list for the semaphore set is invalidated.</LI>
<LI>  All pending processes are awakened and caused to fail
with EIDRM.
</LI>
<LI>  The number of used semaphores is reduced by the number
of semaphores in the removed set.
</LI>
<LI>  The memory associated with the semaphore set is freed.</LI>
</UL>
<H3><A NAME="semctl_down"></A> semctl_down()</H3>

<P>semctl_down() provides the
<A HREF="#semctl_ipc_rmid">IPC_RMID</A> and
<A HREF="#semctl_ipc_set">IPC_SET</A> operations of the
semctl() system call.  The semaphore set ID and the access permissions
are verified prior to either of these operations, and in either
case, the global semaphore spinlock is held throughout the
operation.
<P>
<H3><A NAME="semctl_ipc_rmid"></A> IPC_RMID</H3>

<P>The IPC_RMID operation calls
<A HREF="#freeary">freeary()</A> to remove the semaphore set.
<H3><A NAME="semctl_ipc_set"></A> IPC_SET</H3>

<P>The IPC_SET operation updates the <CODE>uid</CODE>, <CODE>gid</CODE>,
<CODE>mode</CODE>, and <CODE>ctime</CODE> elements of the semaphore set.
<H3><A NAME="semctl_nolock"></A> semctl_nolock()</H3>

<P>semctl_nolock() is called by
<A HREF="#sys_semctl">sys_semctl()</A>
to perform the IPC_INFO, SEM_INFO and SEM_STAT functions.
<P>
<H3><A NAME="IPC_INFO_and_SEM_INFO"></A> IPC_INFO and SEM_INFO</H3>

<P>IPC_INFO and SEM_INFO cause a temporary
<A HREF="#struct_seminfo">seminfo</A>
buffer to be initialized and loaded with unchanging semaphore
statistical data. Then, while holding the global <CODE>sem_ids.sem</CODE>
kernel semaphore, the <CODE>semusz</CODE> and <CODE>semaem</CODE> elements of
the
<A HREF="#struct_seminfo">seminfo</A> structure are
updated according to the given command (IPC_INFO or SEM_INFO).
The return value of the system call is set to the maximum
semaphore set ID.
<H3><A NAME="SEM_STAT"></A> SEM_STAT</H3>

<P>SEM_STAT causes a temporary
<A HREF="#struct_semid64_ds">semid64_ds</A>
buffer to be initialized.  The global
semaphore spinlock is then held while copying the <CODE>sem_otime</CODE>,
<CODE>sem_ctime</CODE>, and <CODE>sem_nsems</CODE> values into the buffer. This data is
then copied to user space.
<H3><A NAME="semctl_main"></A> semctl_main()</H3>

<P>semctl_main() is called by
<A HREF="#sys_semctl">sys_semctl()</A> to perform many
of the supported functions, as described in the subsections below.
Prior to performing any of the following operations, semctl_main()
locks the global semaphore spinlock and validates the
semaphore set ID and the permissions. The spinlock is released
before returning.
<P>
<H3><A NAME="GETALL"></A> GETALL</H3>

<P>The GETALL operation loads the current semaphore values into
a temporary kernel buffer and copies
them out to user space. The small stack buffer is used if the
semaphore set is small. Otherwise, the spinlock is temporarily
dropped in order to allocate a larger buffer. The spinlock is
held while copying the semaphore values into the temporary buffer.
<H3><A NAME="SETALL"></A> SETALL</H3>

<P>The SETALL operation copies semaphore values from user space into a temporary buffer,
and then into the semaphore set. The spinlock is dropped while
copying the values from user space into the temporary buffer,
and while verifying reasonable values. If the semaphore set
is small, then a stack buffer is used, otherwise a larger buffer
is allocated. The spinlock is regained and held while the
following operations are performed on the semaphore set:
<P>
<UL>
<LI>  The semaphore values are copied into the semaphore set.</LI>
<LI>  The semaphore adjustments of the undo queue for
the semaphore set are cleared.
</LI>
<LI>  The <CODE>sem_ctime</CODE> value for the semaphore set is set.
</LI>
<LI>  The
<A HREF="#update_queue">update_queue()</A>
function is called to traverse
the queue of pending semops and look for any tasks that
can be completed as a result of the SETALL operation. Any
pending tasks that are no longer blocked are awakened.</LI>
</UL>
<H3><A NAME="IPC_STAT"></A> IPC_STAT</H3>

<P>In the IPC_STAT operation, the <CODE>sem_otime</CODE>,
<CODE>sem_ctime</CODE>, and <CODE>sem_nsems</CODE> values are copied into
a stack buffer. The data is then copied to user space after
dropping the spinlock.
<H3><A NAME="GETVAL"></A> GETVAL</H3>

<P>For GETVAL in the non-error case, the return value for the system call is
set to the value of the specified semaphore.
<H3><A NAME="GETPID"></A> GETPID</H3>

<P>For GETPID in the non-error case, the return value for the system call is
set to the <CODE>pid</CODE> associated with the last operation on the
semaphore.
<H3><A NAME="GETNCNT"></A> GETNCNT</H3>

<P>For GETNCNT in the non-error case, the return value for the system call
is set to the number of processes waiting for the value of the semaphore
to increase, that is, processes blocked on an operation that decrements it. This number is calculated by the
<A HREF="#count_semncnt">count_semncnt()</A> function.
<H3><A NAME="GETZCNT"></A> GETZCNT</H3>

<P>For GETZCNT in the non-error case, the return value for the system call
is set to the number of processes waiting on the semaphore
being set to zero. This number is calculated by the
<A HREF="#count_semzcnt">count_semzcnt()</A> function.
<H3><A NAME="SETVAL"></A> SETVAL</H3>

<P>After validating the new semaphore value, the following
functions are performed:
<P>
<UL>
<LI>  The undo queue is searched for any adjustments to
this semaphore. Any adjustments that are found are reset to
zero.
</LI>
<LI>  The semaphore value is set to the value provided.</LI>
<LI>  The <CODE>sem_ctime</CODE> value for the semaphore set is updated.</LI>
<LI>  The
<A HREF="#update_queue">update_queue()</A>
function is called to traverse
the queue of pending semops and look for any tasks that
can be completed as a result of the
<A HREF="#SETVAL">SETVAL</A> operation. Any
pending tasks that are no longer blocked are awakened.</LI>
</UL>
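<P>The first two steps of the list above, resetting the undo adjustments
and storing the new value, can be sketched as follows. This is a user-space
illustration: sim_setval(), struct sim_undo and the semvmx parameter are
hypothetical names, and the range check stands in for the validation
mentioned at the start of this section.

```c
#include <assert.h>
#include <stddef.h>

struct sim_undo {
    struct sim_undo *id_next;   /* next undo entry on this semaphore set */
    short           *semadj;    /* one adjustment per semaphore */
};

/* Set semaphore semnum to val and reset every recorded adjustment for it.
 * Returns 0 on success, -1 if the index or the value is out of range. */
int sim_setval(int *semvals, unsigned long nsems, struct sim_undo *undo_list,
               unsigned long semnum, int val, int semvmx)
{
    struct sim_undo *un;

    assert(semvals != NULL);
    if (semnum >= nsems || val < 0 || val > semvmx)
        return -1;

    /* An explicit SETVAL supersedes adjustments recorded earlier. */
    for (un = undo_list; un != NULL; un = un->id_next)
        un->semadj[semnum] = 0;

    semvals[semnum] = val;
    return 0;
}
```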
<H3><A NAME="count_semncnt"></A> count_semncnt()</H3>

<P>count_semncnt() counts the number of tasks waiting for the value of a semaphore
to increase, that is, tasks whose pending operation on the semaphore has a sem_op less than zero.
<H3><A NAME="count_semzcnt"></A> count_semzcnt()</H3>

<P>count_semzcnt() counts the number of tasks waiting on the value of a semaphore
to be zero.
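<P>Both counters can be sketched as one scan over the pending queue. In
this sketch a pending operation counts toward semncnt when its sem_op is
negative, and toward semzcnt when its sem_op is zero. Skipping operations
flagged IPC_NOWAIT is an assumption of this sketch (such operations never
sleep), and all names, including the SIM_IPC_NOWAIT value, are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_IPC_NOWAIT 0x800   /* illustrative flag value */

/* Illustrative mirror of struct sembuf. */
struct op { unsigned short num; short op; short flg; };

/* One pending sequence of operations, reduced from struct sem_queue. */
struct pend { struct pend *next; const struct op *sops; int nsops; };

static int count_waiters(const struct pend *q, unsigned short semnum,
                         int want_zero)
{
    int count = 0, i;

    for (; q != NULL; q = q->next) {
        assert(q->sops != NULL || q->nsops == 0);
        for (i = 0; i < q->nsops; i++) {
            const struct op *s = &q->sops[i];
            int match = want_zero ? (s->op == 0) : (s->op < 0);

            if (s->num == semnum && match && !(s->flg & SIM_IPC_NOWAIT))
                count++;
        }
    }
    return count;
}

/* Waiters that need the semaphore value to increase. */
int sim_count_semncnt(const struct pend *q, unsigned short semnum)
{
    return count_waiters(q, semnum, 0);
}

/* Waiters that need the semaphore value to become zero. */
int sim_count_semzcnt(const struct pend *q, unsigned short semnum)
{
    return count_waiters(q, semnum, 1);
}
```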
<H3><A NAME="update_queue"></A> update_queue()</H3>

<P>update_queue() traverses the queue of pending semops for
a semaphore set and calls
<A HREF="#try_atomic_semop">try_atomic_semop()</A>
to determine which sequences of semaphore operations
would succeed. If the status of the queue element
indicates that blocked tasks have already
been awakened, then the queue element is skipped over. For other
elements of the queue, the <CODE>q->alter</CODE> flag
is passed as the undo parameter to
<A HREF="#try_atomic_semop">try_atomic_semop()</A>,
indicating that any
altering operations should be undone before returning.
<P>If the sequence of operations would block, then
update_queue() leaves that queue element unchanged and moves on to the next one.
<P>A sequence of operations can fail if one of the semaphore
operations would cause an invalid semaphore value, or an
operation marked IPC_NOWAIT is unable to complete. In such a
case, the task that is blocked on the sequence of semaphore
operations is awakened, and the queue status is set with an
appropriate error code. The queue element is also dequeued.
<P>If the sequence of operations is non-altering, then
they would have passed a zero value as the undo parameter to
<A HREF="#try_atomic_semop">try_atomic_semop()</A>.
If these operations succeeded, then they
are considered complete and are removed from the queue.
The blocked task is awakened, and the queue element
<CODE>status</CODE> is set to indicate success.
<P>If the sequence of operations would alter the semaphore
values, but can succeed, then sleeping tasks that no longer
need to be blocked are awakened. The queue status is set to
1 to indicate that the blocked task has been awakened. The
operations have not been performed, so the queue element is not
removed from the queue. The semaphore operations would be
executed by the awakened task.
<H3><A NAME="try_atomic_semop"></A> try_atomic_semop()</H3>

<P>try_atomic_semop() is called by
<A HREF="#sys_semop">sys_semop()</A>
and
<A HREF="#update_queue">update_queue()</A>
to determine if a sequence of semaphore operations will all
succeed. It determines this by attempting to perform each of the
operations.
<P>If a blocking operation is encountered, then the process
is aborted and all operations are reversed. -EAGAIN is returned
if IPC_NOWAIT is set. Otherwise 1 is returned to indicate that
the sequence of semaphore operations is blocked.
<P>If a semaphore value is adjusted beyond system limits, then
all operations are reversed, and -ERANGE is returned.
<P>If all operations in the sequence succeed, and the <CODE>do_undo</CODE>
parameter is non-zero, then all operations are reversed, and 0
is returned. If the <CODE>do_undo</CODE> parameter is zero, then all operations
succeeded and remain in force, and the <CODE>sem_otime</CODE> field of the
semaphore set is updated.
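<P>The all-or-nothing behaviour described above can be sketched in
user-space C. This is a simplified model, not the kernel function: the
semaphore set is a plain array of int, the names and the SIM_ constants
are hypothetical (32767 is an assumed value limit), and the
<CODE>sem_otime</CODE> update is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SIM_IPC_NOWAIT 0x800   /* illustrative flag value */
#define SIM_SEMVMX     32767   /* assumed maximum semaphore value */

/* Illustrative mirror of struct sembuf. */
struct op { unsigned short num; short op; short flg; };

/* Returns 0 if every operation succeeded, 1 if the sequence would block,
 * -EAGAIN if it would block but IPC_NOWAIT was given, -ERANGE if a value
 * would exceed the limit. The values are left modified only when the
 * result is 0 and do_undo is zero. */
int sim_try_atomic_semop(int *semvals, const struct op *sops, int nsops,
                         int do_undo)
{
    int i, result = 0;

    assert(semvals != NULL && sops != NULL);
    for (i = 0; i < nsops; i++) {
        int cur = semvals[sops[i].num];
        int next = cur + sops[i].op;

        /* Wait-for-zero on a non-zero value, or a decrement below zero. */
        if ((sops[i].op == 0 && cur != 0) || next < 0) {
            result = (sops[i].flg & SIM_IPC_NOWAIT) ? -EAGAIN : 1;
            break;
        }
        if (next > SIM_SEMVMX) {
            result = -ERANGE;
            break;
        }
        semvals[sops[i].num] = next;
    }

    /* Reverse the operations applied so far (indices 0 .. i-1). */
    if (result != 0 || do_undo)
        while (--i >= 0)
            semvals[sops[i].num] -= sops[i].op;

    return result;
}
```

<P>Calling the function with a non-zero <CODE>do_undo</CODE> gives a dry
run: the caller learns whether the sequence could succeed without leaving
any change behind.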
<H3><A NAME="sem_revalidate"></A> sem_revalidate()</H3>

<P>sem_revalidate() is called when the global semaphores spinlock
has been temporarily dropped and needs to be locked again. It is
called by
<A HREF="#semctl_main">semctl_main()</A>
and
<A HREF="#alloc_undo">alloc_undo()</A>.  It validates the
semaphore ID and permissions and on success, returns with the
global semaphores spinlock locked.
<H3><A NAME="freeundos"></A> freeundos()</H3>

<P>freeundos() traverses the process undo list in search of
the desired undo structure. If found, the undo structure is removed from the
list and freed. A pointer to the next undo structure on the
process list is returned.
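<P>Removing an entry from the singly linked process undo list can be
sketched with a pointer-to-pointer walk. This is a user-space
illustration: sim_freeundos() and struct sim_undo are hypothetical names,
free() stands in for the kernel's deallocation, and returning NULL when
the entry is not on the list is a choice made for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

struct sim_undo {
    struct sim_undo *proc_next;   /* next entry on this process */
    int              semid;       /* semaphore set identifier */
};

/* Unlink and free un from the list at *head.
 * Returns the entry that followed un, or NULL if un was last or absent. */
struct sim_undo *sim_freeundos(struct sim_undo **head, struct sim_undo *un)
{
    struct sim_undo **up;
    struct sim_undo *u;

    assert(head != NULL && un != NULL);
    for (up = head; (u = *up) != NULL; up = &u->proc_next) {
        if (u == un) {
            struct sim_undo *next = u->proc_next;

            *up = next;           /* works for the head and for inner nodes */
            free(u);
            return next;
        }
    }
    return NULL;
}
```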
<H3><A NAME="alloc_undo"></A> alloc_undo()</H3>

<P>alloc_undo() expects to be called with the global semaphores
spinlock locked. In the case of an error, it returns with it
unlocked.
<P>The global semaphores spinlock is unlocked, and kmalloc() is
called to allocate sufficient memory for both the
<A HREF="#struct_sem_undo">sem_undo</A>
structure, and also an array of one adjustment value for each
semaphore in the set. On success, the global spinlock is regained
with a call to
<A HREF="#sem_revalidate">sem_revalidate()</A>.
<P>The new sem_undo structure is then initialized, and the address
of this structure is placed at the address provided by the
caller. The new undo structure is then placed at the head of the undo
list for the current task.
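<P>The combined allocation of the undo structure and its adjustment
array can be sketched as follows. This is a user-space illustration with
hypothetical names (sim_alloc_undo(), struct sim_undo); calloc() stands in
for kmalloc() plus zeroing, and the locking and revalidation steps are
left out.

```c
#include <assert.h>
#include <stdlib.h>

struct sim_undo {
    struct sim_undo *proc_next;   /* next entry on this process */
    int              semid;       /* semaphore set identifier */
    short           *semadj;      /* one adjustment per semaphore */
};

/* Allocate an undo structure for a set of nsems semaphores and push it
 * onto the head of the task's undo list. Returns NULL on failure. */
struct sim_undo *sim_alloc_undo(struct sim_undo **task_list, int semid,
                                unsigned long nsems)
{
    struct sim_undo *un;
    size_t size;

    assert(task_list != NULL && nsems > 0);
    size = sizeof(*un) + nsems * sizeof(short);
    un = calloc(1, size);          /* adjustments start out as zero */
    if (un == NULL)
        return NULL;

    un->semadj = (short *)&un[1];  /* array follows the structure */
    un->semid = semid;
    un->proc_next = *task_list;    /* insert at the head of the list */
    *task_list = un;
    return un;
}
```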
<H3><A NAME="sem_exit"></A> sem_exit()</H3>

<P>sem_exit() is called by do_exit(), and is responsible for
executing all of the undo adjustments for the exiting task.
<P>If the current process was blocked on a semaphore, then it is
removed from the
<A HREF="#struct_sem_queue">sem_queue</A>
list while holding the global semaphores spinlock.
<P>The undo list for the current task is then traversed. The global
semaphores spinlock is acquired and released around the processing of
each element of the list, and the following operations are performed
for each of the undo elements:
<P>
<UL>
<LI>  The undo structure and the semaphore set ID are validated.</LI>
<LI>  The undo list of the corresponding semaphore set is
searched to find a reference to the same undo structure and to
remove it from that list.</LI>
<LI>  The adjustments indicated in the undo structure are
applied to the semaphore set.</LI>
<LI>  The <CODE>sem_otime</CODE> parameter of the semaphore set is updated.</LI>
<LI>
<A HREF="#update_queue">update_queue()</A> is called
to traverse the queue of
pending semops and awaken any sleeping tasks that no longer
need to be blocked as a result of executing the undo
operations.</LI>
<LI>  The undo structure is freed.</LI>
</UL>
<P>When the processing of the list is complete, the
current->semundo value is cleared.
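<P>The step that applies the recorded adjustments to a semaphore set can
be sketched as below. This is a user-space illustration only:
sim_apply_undo() is a hypothetical name, the set is modelled as a plain
array of int, and clamping a value that would become negative to zero is
an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Add each recorded adjustment to the matching semaphore value.
 * A result below zero is clamped to zero (assumption of this sketch). */
void sim_apply_undo(int *semvals, const short *semadj, unsigned long nsems)
{
    unsigned long i;

    assert(semvals != NULL && semadj != NULL);
    for (i = 0; i < nsems; i++) {
        semvals[i] += semadj[i];
        if (semvals[i] < 0)
            semvals[i] = 0;
    }
}
```

<P>In the sequence described above, this step would be followed by the
<CODE>sem_otime</CODE> update and the call to
<A HREF="#update_queue">update_queue()</A>, since changed values may
unblock other tasks.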
<H2><A NAME="message"></A> <A NAME="ss5.2">5.2 Message queues</A>
</H2>

<P>
<H3><A NAME="Message_System_Call_Interfaces"></A> Message System Call Interfaces</H3>

<P>
<H3><A NAME="sys_msgget"></A> sys_msgget()</H3>

<P>The entire call to sys_msgget() is protected by
the global message queue semaphore
(
<A HREF="#struct_ipc_ids">msg_ids.sem</A>).
<P>In the case where a new message queue must be created,
the
<A HREF="#newque">newque()</A> function is
called to create and initialize
a new message queue, and the new queue ID is returned to
the caller.
<P>If a key value is provided for an existing message queue,
then
<A HREF="#ipc_findkey">ipc_findkey()</A> is called
to look up the corresponding index in the global message queue
descriptor array (msg_ids.entries). The
parameters and permissions of the caller are verified before
returning the message queue ID. The look up operation and
verification are performed while the global message queue
spinlock (msg_ids.ary) is held.
<H3><A NAME="sys_msgctl"></A> sys_msgctl()</H3>

<P>The parameters passed to sys_msgctl() are: a message
queue ID (<CODE>msqid</CODE>), the operation
(<CODE>cmd</CODE>), and a pointer to a user space buffer of type
<A HREF="#struct_msqid_ds">msqid_ds</A>
(<CODE>buf</CODE>).  Six operations are
provided in this function: IPC_INFO, MSG_INFO, IPC_STAT,
MSG_STAT, IPC_SET and IPC_RMID.  The message queue
ID and the operation parameters are validated; then, the operation (<CODE>cmd</CODE>)
is performed as follows:
<P>
<H3><A NAME="msgctl_IPCINFO"></A> IPC_INFO (or MSG_INFO)</H3>

<P>The global message queue information is copied to user space.
<H3><A NAME="msgctl_IPCSTAT"></A> IPC_STAT (or MSG_STAT)</H3>

<P>A temporary buffer of type
<A HREF="#struct_msqid64_ds">struct msqid64_ds</A>
is initialized and the global message queue spinlock is locked.
After verifying the access permissions of the calling process,
the message queue information associated with the message
queue ID is loaded into the temporary buffer, the global
message queue spinlock is unlocked, and the contents of
the temporary buffer are copied out to user space by
<A HREF="#copy_msqid_to_user">copy_msqid_to_user()</A>.
<H3><A NAME="msgctl_IPCSET"></A> IPC_SET</H3>

<P>The user data is copied in via
<A HREF="#copy_msqid_from_user">copy_msqid_from_user()</A>.  The global
message queue semaphore and spinlock are obtained, and both are released
at the end of the operation.  After the message queue ID and the current
process access permissions are validated, the message queue
information is updated with the user provided data.  Later,
<A HREF="#expunge_all">expunge_all()</A> and
<A HREF="#ss_wakeup">ss_wakeup()</A>
are called to wake up all
processes sleeping on the receiver and sender waiting queues
of the message queue. This is because some receivers may now
be excluded by stricter access permissions and some senders
may now be able to send the message due to an increased
queue size.
<H3><A NAME="msgctl_IPCRMID"></A> IPC_RMID</H3>

<P>The global message queue semaphore
is obtained and the global message queue spinlock is locked.
After validating the message queue ID and the current task
access permissions,
<A HREF="#freeque">freeque()</A>
is called to free the resources related to the message queue ID.
The global message queue semaphore and spinlock are released.
<H3><A NAME="sys_msgsnd"></A> sys_msgsnd()</H3>

<P>sys_msgsnd() receives as parameters a message queue ID
(<CODE>msqid</CODE>), a pointer to a buffer of type
<A HREF="#struct_msg_msg">struct msg_msg</A>
(<CODE>msgp</CODE>), the size of the message to be sent
(<CODE>msgsz</CODE>), and a flag indicating wait vs.
not wait (<CODE>msgflg</CODE>). There are two task waiting
queues and one message waiting queue associated with the message
queue ID. If there is a task in the receiver waiting queue
that is waiting for this message, then the message is
delivered directly to the receiver, and the receiver is
awakened. Otherwise, if there is enough space available in
the message waiting queue, the message is saved in this
queue. As a last resort, the sending task enqueues itself
on the sender waiting queue. A more in-depth discussion of the
operations performed by sys_msgsnd() follows:
<P>
<OL>
<LI>  Validates the user buffer address and the message
type, then invokes
<A HREF="#load_msg">load_msg()</A> to load the
contents of the user message into a temporary object
<CODE>
<A NAME="msg"></A> msg</CODE> of type
<A HREF="#struct_msg_msg">struct msg_msg</A>.
The message type and message size fields
of <CODE>msg</CODE> are also initialized.</LI>
<LI>  Locks the global message queue spinlock and gets
the message queue descriptor associated with the
message queue ID. If no such message queue exists,
returns EINVAL.</LI>
<LI>
<A NAME="sndretry"></A>
Invokes
<A HREF="#ipc_checkid">ipc_checkid()</A>
(via msg_checkid()) to verify that the message
queue ID is valid and calls
<A HREF="#ipcperms">ipcperms()</A> to check the
calling process' access permissions.</LI>
<LI>  Checks the message size and the space left in
the message waiting queue to see if there is enough
room to store the message. If not, the following
substeps are performed:

<OL>
<LI>  If IPC_NOWAIT is specified in
<CODE>msgflg</CODE> the global message
queue spinlock is unlocked, the memory
resources for the message are freed, and EAGAIN
is returned.</LI>
<LI>  Invokes
<A HREF="#ss_add">ss_add()</A> to
enqueue the current
task in the sender waiting queue. It also unlocks
the global message queue spinlock and invokes
schedule() to put the current task to sleep.</LI>
<LI>  When awakened, obtains the global spinlock
again and verifies that the message queue ID
is still valid. If the message queue ID is not valid,
EIDRM is returned.</LI>
<LI>  Invokes
<A HREF="#ss_del">ss_del()</A>
to remove the sending task from the sender
waiting queue. If there is any signal pending
for the task, sys_msgsnd() unlocks the
global spinlock,
invokes
<A HREF="#free_msg">free_msg()</A>
to free the message buffer,
and returns EINTR. Otherwise, the function goes
<A HREF="#sndretry">back</A>
to check again whether there is enough space
in the message waiting queue.</LI>
</OL>
</LI>
<LI>  Invokes
<A HREF="#pipelined_send">pipelined_send()</A>
to try to send the message to the waiting receiver directly.</LI>
<LI>  If there is no receiver waiting for this message,
enqueues <CODE>msg</CODE> into the message waiting
queue (msq->q_messages). Updates the
<CODE>q_cbytes</CODE> and
the <CODE>q_qnum</CODE> fields of the message
queue descriptor, as well as the global variables
<CODE>msg_bytes</CODE> and
<CODE>msg_hdrs</CODE>, which indicate the total
number of bytes used for messages and the total number
of messages system wide.</LI>
<LI>  If the message has been successfully sent or
enqueued, updates the <CODE>q_lspid</CODE>
and the <CODE>q_stime</CODE> fields
of the message queue descriptor and releases the global
message queue spinlock.</LI>
</OL>
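<P>The numbered steps above reduce to a three-way decision once the queue
is locked: wait (or fail) when the queue is full, hand the message to a
sleeping receiver, or store it on the queue. The following is a minimal
user-space sketch of that decision only, not kernel code. The names
<CODE>toy_queue</CODE>, <CODE>toy_send_decision()</CODE>, and the
<CODE>send_result</CODE> values are hypothetical, and the single
<CODE>receiver_waiting</CODE> flag stands in for the search that
<A HREF="#pipelined_send">pipelined_send()</A> performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the queue state that
 * sys_msgsnd() consults; not the kernel's struct msg_queue. */
struct toy_queue {
    size_t q_cbytes;       /* bytes currently on the queue */
    size_t q_qbytes;       /* maximum bytes allowed on the queue */
    size_t q_qnum;         /* messages currently on the queue */
    int receiver_waiting;  /* nonzero if a matching receiver sleeps */
};

enum send_result { SENT_DIRECT, SENT_QUEUED, WOULD_BLOCK, MUST_SLEEP };

enum send_result toy_send_decision(struct toy_queue *q, size_t msgsz, int nowait)
{
    assert(q != NULL);

    /* Step 4: no room, so fail (IPC_NOWAIT) or sleep and retry. */
    if (q->q_cbytes + msgsz > q->q_qbytes)
        return nowait ? WOULD_BLOCK : MUST_SLEEP;

    /* Step 5: hand the message straight to a waiting receiver. */
    if (q->receiver_waiting) {
        q->receiver_waiting = 0;
        return SENT_DIRECT;
    }

    /* Step 6: store the message and update the queue statistics. */
    q->q_cbytes += msgsz;
    q->q_qnum++;
    return SENT_QUEUED;
}
```

<P>A direct hand-off leaves <CODE>q_cbytes</CODE> and <CODE>q_qnum</CODE>
untouched in this sketch, because the message never occupies space on the
queue.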
<H3><A NAME="sys_msgrcv"></A> sys_msgrcv()</H3>

<P>The sys_msgrcv() function receives as parameters
a message queue ID
(<CODE>msqid</CODE>), a pointer to a buffer of type
<A HREF="#struct_msg_msg">msg_msg</A>
(<CODE>msgp</CODE>), the desired
message size (<CODE>msgsz</CODE>), the message type
(<CODE>msgtyp</CODE>), and the flags
(<CODE>msgflg</CODE>). It searches the message waiting queue
associated with the message queue ID, finds the first
message in the queue which matches the request type, and
copies it into the given user buffer. If no such message
is found in the message waiting queue, the requesting task
is enqueued into the receiver waiting queue until the
desired message is available. A more in-depth discussion of the
operations performed by sys_msgrcv() follows:
<P>
<OL>
<LI>  First, invokes
<A HREF="#convert_mode">convert_mode()</A>
to derive the search mode from
<CODE>msgtyp</CODE>.  sys_msgrcv() then locks
the global message
queue spinlock and obtains the message queue descriptor
associated with the message queue ID. If no such
message queue exists, it returns EINVAL.</LI>
<LI>  Checks whether the current task has the correct
permissions to access the message queue.</LI>
<LI>
<A NAME="rcvretry"></A>
Starting from the first message in the message
waiting queue, invokes
<A HREF="#testmsg">testmsg()</A> to check whether
the message type matches the required type.  sys_msgrcv()
continues searching until a matched message is found or the whole
waiting queue is exhausted. If the search mode is
SEARCH_LESSEQUAL, then the first message on the queue
with the lowest type less than or equal to
<CODE>msgtyp</CODE> is selected.</LI>
<LI>  If a message is found, sys_msgrcv() performs
the following substeps:
<OL>
<LI>  If the message size is larger than
the desired size and <CODE>msgflg</CODE>
indicates no error allowed, unlocks the global
message queue spinlock and returns E2BIG.</LI>
<LI>  Removes the message from the message
waiting queue and updates the message queue
statistics.</LI>
<LI>  Wakes up all tasks sleeping on the senders
waiting queue. The removal of a message from
the queue in the previous step makes it possible
for one of the senders to progress. Goes to
the
<A HREF="#laststep">last step</A>.</LI>
</OL>
</LI>
<LI>  If no message matching the receiver's criteria is found
in the message waiting queue, then <CODE>msgflg</CODE>
is checked. If IPC_NOWAIT is set, then the global message
queue spinlock is unlocked and ENOMSG is returned. Otherwise,
the receiver is enqueued on the receiver waiting queue as
follows:
<OL>
<LI>  A
<A HREF="#struct_msg_receiver">msg_receiver</A> data structure
<CODE>msr</CODE> is allocated and is
added to the head of the receiver waiting queue.</LI>
<LI>  The <CODE>r_tsk</CODE> field of <CODE>msr</CODE>
is set to the current task.</LI>
<LI>  The <CODE>r_msgtype</CODE> and
<CODE>r_mode</CODE> fields are
initialized with the desired message type and
mode respectively.</LI>
<LI>  If <CODE>msgflg</CODE> indicates
MSG_NOERROR, then the <CODE>r_maxsize</CODE> field of
<CODE>msr</CODE> is set to INT_MAX, since an oversized
message may be truncated rather than rejected; otherwise
it is set to the value of <CODE>msgsz</CODE>.</LI>
<LI>  The <CODE>r_msg</CODE> field
is initialized to indicate that
no message has been received yet.</LI>
<LI>  After the initialization is complete,
the status of the receiving task is set to
TASK_INTERRUPTIBLE, the global message queue
spinlock is unlocked, and schedule() is invoked.</LI>
</OL>
</LI>
<LI>  After the receiver is awakened,
the <CODE>r_msg</CODE> field of
<CODE>msr</CODE> is checked.  This field is used to
store the pipelined message or in the case of an error,
to store the error status.
If the <CODE>r_msg</CODE> field is filled
with the desired message, then go to the
<A HREF="#laststep">last step</A>. Otherwise,
the global message queue spinlock is locked again.</LI>
<LI>  After obtaining the spinlock,
the <CODE>r_msg</CODE> field is
re-checked to see if the message was received while
waiting for the spinlock. If the message has been
received, the
<A HREF="#laststep">last step</A>
occurs.</LI>
<LI>  If the <CODE>r_msg</CODE> field remains
unchanged, then the task was
awakened in order to retry.  In this case,
<CODE>msr</CODE> is dequeued. If there is a
signal pending for the task, then the global message
queue spinlock is unlocked and EINTR is returned.
Otherwise, the function needs to go
<A HREF="#rcvretry">back</A> and retry.</LI>
<LI>  If the <CODE>r_msg</CODE> field shows
that an error occurred
while sleeping, the global message queue spinlock
is unlocked and the error is returned.</LI>
<LI>
<A NAME="laststep"></A>
After validating the address of the user buffer
<CODE>msgp</CODE>, the message type is loaded
into the <CODE>mtype</CODE> field of
<CODE>msgp</CODE>, and
<A HREF="#store_msg">store_msg()</A>
is invoked to copy the message contents to
the <CODE>mtext</CODE> field of
<CODE>msgp</CODE>. Finally, the memory for the message is
freed by
<A HREF="#free_msg">free_msg()</A>.</LI>
</OL>
<H3><A NAME="datastructs"></A> Message Specific Structures</H3>

<P>Data structures for message queues are defined in msg.c.
<H3><A NAME="struct_msg_queue"></A> struct msg_queue</H3>

<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
/* one msq_queue structure for each present queue on the system */
struct msg_queue {
        struct kern_ipc_perm q_perm;
        time_t q_stime;                 /* last msgsnd time */
        time_t q_rtime;                 /* last msgrcv time */
        time_t q_ctime;                 /* last change time */
        unsigned long q_cbytes;         /* current number of bytes on queue */
        unsigned long q_qnum;           /* number of messages in queue */
        unsigned long q_qbytes;         /* max number of bytes on queue */
        pid_t q_lspid;                  /* pid of last msgsnd */
        pid_t q_lrpid;                  /* last receive pid */

        struct list_head q_messages;
        struct list_head q_receivers;
        struct list_head q_senders;
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<H3><A NAME="struct_msg_msg"></A> struct msg_msg</H3>

<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
/* one msg_msg structure for each message */
struct msg_msg {
        struct list_head m_list;
        long  m_type;
        int m_ts;           /* message text size */
        struct msg_msgseg* next;
        /* the actual message follows immediately */
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<H3><A NAME="struct_msg_msgseg"></A> struct msg_msgseg</H3>

<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
/* message segment for each message */
struct msg_msgseg {
        struct msg_msgseg* next;
        /* the next part of the message follows immediately */
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<H3><A NAME="struct_msg_sender"></A> struct msg_sender</H3>

<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
/* one msg_sender for each sleeping sender */
struct msg_sender {
        struct list_head list;
        struct task_struct* tsk;
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<H3><A NAME="struct_msg_receiver"></A> struct msg_receiver</H3>

<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
/* one msg_receiver structure for each sleeping receiver */
struct msg_receiver {
        struct list_head r_list;
        struct task_struct* r_tsk;

        int r_mode;
        long r_msgtype;
        long r_maxsize;

        struct msg_msg* volatile r_msg;
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<H3><A NAME="struct_msqid64_ds"></A> struct msqid64_ds</H3>

<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
struct msqid64_ds {
        struct ipc64_perm msg_perm;
        __kernel_time_t msg_stime;      /* last msgsnd time */
        unsigned long   __unused1;
        __kernel_time_t msg_rtime;      /* last msgrcv time */
        unsigned long   __unused2;
        __kernel_time_t msg_ctime;      /* last change time */
        unsigned long   __unused3;
        unsigned long  msg_cbytes;      /* current number of bytes on queue */
        unsigned long  msg_qnum;        /* number of messages in queue */
        unsigned long  msg_qbytes;      /* max number of bytes on queue */
        __kernel_pid_t msg_lspid;       /* pid of last msgsnd */
        __kernel_pid_t msg_lrpid;       /* last receive pid */
        unsigned long  __unused4;
        unsigned long  __unused5;
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<H3><A NAME="struct_msqid_ds"></A> struct msqid_ds</H3>

<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
 struct msqid_ds {
        struct ipc_perm msg_perm;
        struct msg *msg_first;          /* first message on queue,unused  */
        struct msg *msg_last;           /* last message in queue,unused */
        __kernel_time_t msg_stime;      /* last msgsnd time */
        __kernel_time_t msg_rtime;      /* last msgrcv time */
        __kernel_time_t msg_ctime;      /* last change time */
        unsigned long  msg_lcbytes;     /* Reuse junk fields for 32 bit */
        unsigned long  msg_lqbytes;     /* ditto */
        unsigned short msg_cbytes;      /* current number of bytes on queue */
        unsigned short msg_qnum;        /* number of messages in queue */
        unsigned short msg_qbytes;      /* max number of bytes on queue */
        __kernel_ipc_pid_t msg_lspid;   /* pid of last msgsnd */
        __kernel_ipc_pid_t msg_lrpid;   /* last receive pid */
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<H3><A NAME="msg_setbuf"></A> struct msq_setbuf</H3>

<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
struct msq_setbuf {
        unsigned long   qbytes;
        uid_t           uid;
        gid_t           gid;
        mode_t          mode;
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<H3><A NAME="msgfuncs"></A> Message Support Functions</H3>

<P>
<H3><A NAME="newque"></A> newque()</H3>

<P>newque() allocates the memory for a new message queue
descriptor (
<A HREF="#struct_msg_queue">struct msg_queue</A>)
and then calls
<A HREF="#ipc_addid">ipc_addid()</A>, which
reserves a message queue array entry for the new message queue
descriptor.  The message queue descriptor is initialized as
follows:
<P>
<UL>
<LI>  The
<A HREF="#struct_kern_ipc_perm">kern_ipc_perm</A>
structure is initialized.</LI>
<LI>  The <CODE>q_stime</CODE> and <CODE>q_rtime</CODE> fields of the message
queue descriptor are initialized as 0. The <CODE>q_ctime</CODE>
field is set to be CURRENT_TIME.</LI>
<LI>  The maximum number of bytes allowed in this
message queue (<CODE>q_qbytes</CODE>) is set to be MSGMNB,
and the number of bytes currently used by the queue
(<CODE>q_cbytes</CODE>) is initialized as 0.</LI>
<LI>  The message waiting queue (<CODE>q_messages</CODE>),
the receiver waiting queue (<CODE>q_receivers</CODE>),
and the sender waiting queue (<CODE>q_senders</CODE>)
are each initialized as empty.</LI>
</UL>
<P>All the operations following the call to
<A HREF="#ipc_addid">ipc_addid()</A> are
performed while holding the global message queue spinlock.
After unlocking the spinlock, newque() calls msg_buildid(),
which maps directly to
<A HREF="#ipc_buildid">ipc_buildid()</A>.
<A HREF="#ipc_buildid">ipc_buildid()</A>
uses the index of the message queue descriptor to create a unique
message queue ID that is then returned to the caller of newque().
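<P>How an array index can become a unique ID is easiest to see in a small
example. The sketch below assumes that the ID combines the slot index with
a per-slot sequence number scaled by a fixed multiplier, so that a reused
slot yields a different ID. The function names and the multiplier value
32768 are assumptions made for illustration; consult
<A HREF="#ipc_buildid">ipc_buildid()</A> for the actual definition.

```c
#include <assert.h>

/* Assumed multiplier; it must exceed the largest valid slot index. */
#define TOY_SEQ_MULTIPLIER 32768

/* Combine a descriptor array index with a sequence number. */
int toy_buildid(int index, int seq)
{
    assert(index >= 0 && index < TOY_SEQ_MULTIPLIER);
    return TOY_SEQ_MULTIPLIER * seq + index;
}

/* Recover the array index from an ID. */
int toy_index_of(int id)
{
    return id % TOY_SEQ_MULTIPLIER;
}

/* Recover the sequence number from an ID. */
int toy_seq_of(int id)
{
    return id / TOY_SEQ_MULTIPLIER;
}
```

<P>With this encoding a stale ID that refers to a recycled slot can be
detected, because its sequence number no longer matches the one stored
for the slot.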
<H3><A NAME="freeque"></A> freeque()</H3>

<P>When a message queue is going to be removed, the freeque() function is
called.  This function assumes that the global message queue spinlock
is already locked by the calling function.  It frees all kernel
resources associated with that message queue. First, it calls
<A HREF="#func_ipc_rmid">ipc_rmid()</A> (via msg_rmid())
to remove the message queue descriptor from the array of global
message queue descriptors. Then it calls
<A HREF="#expunge_all">expunge_all()</A> to wake up
all receivers and
<A HREF="#ss_wakeup">ss_wakeup()</A>
to wake up all senders sleeping on this message queue. The
global message queue spinlock is then released.
Finally, all messages stored in this message queue and the
memory for the message queue descriptor are freed.
<H3><A NAME="ss_wakeup"></A> ss_wakeup()</H3>

<P>ss_wakeup() wakes up all the tasks waiting in the given
message sender waiting queue. If this function is called by
<A HREF="#freeque">freeque()</A>, then all senders
in the queue are dequeued.
<H3><A NAME="ss_add"></A> ss_add()</H3>

<P>ss_add() receives as parameters a message queue descriptor
and a message sender data structure.  It fills the
<CODE>tsk</CODE> field of the message sender data
structure with the current process, changes the status of the
current process to TASK_INTERRUPTIBLE,
then inserts the message sender data structure at the head of
the sender waiting queue of the given message queue.
<H3><A NAME="ss_del"></A> ss_del()</H3>

<P>If the given message sender data structure
(<CODE>mss</CODE>) is still in the associated sender
waiting queue, then ss_del() removes
<CODE>mss</CODE> from the queue.
<H3><A NAME="expunge_all"></A> expunge_all()</H3>

<P>expunge_all() receives as parameters a message queue
descriptor (<CODE>msq</CODE>) and an integer value
(<CODE>res</CODE>) indicating the reason for waking up the
receivers. For each sleeping receiver associated with
<CODE>msq</CODE>, the <CODE>r_msg</CODE>
field is set to the indicated
wakeup reason (<CODE>res</CODE>), and the associated receiving
task is awakened. This function is called when a message queue is
removed or a message control operation has been performed.
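<P>Because <CODE>r_msg</CODE> is a pointer field, storing a wakeup reason
in it requires encoding an error number as a pointer value. The sketch
below assumes the encoding used by the kernel's ERR_PTR() family of
helpers, in which small negative numbers are cast to addresses at the very
top of the address space. All <CODE>toy_</CODE> names are hypothetical
user-space stand-ins, and the <CODE>awake</CODE> flag replaces the real
task wakeup.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct toy_msg { long m_type; };

/* Assumed upper bound on errno values that may be encoded. */
#define TOY_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
void *toy_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True if the pointer lies in the range reserved for error codes. */
int toy_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Decode the errno value stored in an error pointer. */
long toy_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

struct toy_receiver {
    struct toy_msg *r_msg;  /* message, or an encoded wakeup reason */
    int awake;              /* stand-in for waking the task */
};

/* Mark every sleeping receiver with the wakeup reason and wake it. */
void toy_expunge_all(struct toy_receiver *rcv, size_t n, int res)
{
    for (size_t i = 0; i < n; i++) {
        rcv[i].r_msg = toy_err_ptr(res);
        rcv[i].awake = 1;
    }
}
```

<P>A receiver that wakes up can then tell a delivered message from an
error by testing <CODE>r_msg</CODE> with <CODE>toy_is_err()</CODE>.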
<H3><A NAME="load_msg"></A> load_msg()</H3>

<P>When a process sends a message, the
<A HREF="#sys_msgsnd">sys_msgsnd()</A> function
first invokes the load_msg() function to load the message
from user space to kernel space.  The message is represented in
kernel memory as a linked list of data blocks. Associated with
the first data block is a
<A HREF="#struct_msg_msg">msg_msg</A>
structure that describes the overall message. The data block
associated with the msg_msg structure is limited to a size of
DATA_MSG_LEN. The data block and the structure are allocated in one
contiguous memory block that can be as large as one page in memory.
If the full message will not fit into this first data block, then
additional data blocks are allocated and are organized into a
linked list.  These additional data blocks are limited to a size
of DATA_SEG_LEN, and each includes an associated
<A HREF="#struct_msg_msgseg">msg_msgseg</A> structure. The
msg_msgseg structure and the associated data block are allocated in
one contiguous memory block that can be as large as one page in
memory.  This function returns the address of the new
<A HREF="#struct_msg_msg">msg_msg</A> structure on success.
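<P>The chunking described above can be sketched in user space as follows.
This is an illustration under stated assumptions, not the kernel routine:
it uses malloc() and memcpy() in place of the kernel allocator and
copy_from_user(), assumes a 4096-byte page, and all <CODE>toy_</CODE>
names are hypothetical. The header and its data share one allocation, as
do each segment header and its data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096  /* assumed page size */

struct toy_msgseg { struct toy_msgseg *next; /* data follows */ };
struct toy_msg {
    long m_type;
    size_t m_ts;               /* total message text size */
    struct toy_msgseg *next;   /* first extra segment, if any */
    /* first part of the data follows */
};

#define TOY_DATA_MSG_LEN (TOY_PAGE_SIZE - sizeof(struct toy_msg))
#define TOY_DATA_SEG_LEN (TOY_PAGE_SIZE - sizeof(struct toy_msgseg))

/* Release the header block and every segment chained to it. */
void toy_free_msg(struct toy_msg *msg)
{
    struct toy_msgseg *seg = msg ? msg->next : NULL;
    free(msg);
    while (seg) {
        struct toy_msgseg *next = seg->next;
        free(seg);
        seg = next;
    }
}

/* Copy len bytes from src into a chain of at-most-page-sized blocks. */
struct toy_msg *toy_load_msg(const char *src, size_t len)
{
    size_t chunk = len < TOY_DATA_MSG_LEN ? len : TOY_DATA_MSG_LEN;
    struct toy_msg *msg = malloc(sizeof(*msg) + chunk);
    if (!msg)
        return NULL;
    msg->m_type = 0;
    msg->m_ts = len;
    msg->next = NULL;
    memcpy(msg + 1, src, chunk);
    src += chunk;
    len -= chunk;

    struct toy_msgseg **link = &msg->next;
    while (len > 0) {
        chunk = len < TOY_DATA_SEG_LEN ? len : TOY_DATA_SEG_LEN;
        struct toy_msgseg *seg = malloc(sizeof(*seg) + chunk);
        if (!seg) {
            toy_free_msg(msg);
            return NULL;
        }
        seg->next = NULL;
        memcpy(seg + 1, src, chunk);
        *link = seg;
        link = &seg->next;
        src += chunk;
        len -= chunk;
    }
    return msg;
}

/* Count the extra segments chained behind the header block. */
size_t toy_count_segments(const struct toy_msg *msg)
{
    size_t n = 0;
    for (const struct toy_msgseg *s = msg->next; s; s = s->next)
        n++;
    return n;
}
```

<P>Reassembly walks the same chain in order, copying at most
<CODE>TOY_DATA_MSG_LEN</CODE> bytes from the first block and at most
<CODE>TOY_DATA_SEG_LEN</CODE> bytes from each following segment.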
<H3><A NAME="store_msg"></A> store_msg()</H3>

<P>The store_msg() function is called by
<A HREF="#sys_msgrcv">sys_msgrcv()</A> to reassemble a received
message into the user space buffer provided by the caller. The data
described by the
<A HREF="#struct_msg_msg">msg_msg</A>
structure and any
<A HREF="#struct_msg_msgseg">msg_msgseg</A>
structures are sequentially copied to the user space buffer.
<H3><A NAME="free_msg"></A> free_msg()</H3>

<P>The free_msg() function releases the memory for a message
data structure
<A HREF="#struct_msg_msg">msg_msg</A>,
and the message segments.
<H3><A NAME="convert_mode"></A> convert_mode()</H3>

<P>convert_mode() is called by
<A HREF="#sys_msgrcv">sys_msgrcv()</A>.
It receives as parameters the address of the specified message
type (<CODE>msgtyp</CODE>) and a flag (<CODE>msgflg</CODE>).
It returns the search mode to the caller based on the value of
<CODE>msgtyp</CODE> and <CODE>msgflg</CODE>.  If
<CODE>msgtyp</CODE> is 0, then SEARCH_ANY is returned.
If <CODE>msgtyp</CODE> is less than 0, then <CODE>msgtyp</CODE> is
set to its absolute value and SEARCH_LESSEQUAL is returned.
If MSG_EXCEPT is specified in <CODE>msgflg</CODE>, then SEARCH_NOTEQUAL is returned.
Otherwise SEARCH_EQUAL is returned.
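<P>The mapping just described can be written out as a short function. This
is a sketch that follows the prose above rather than a copy of the kernel
source. The <CODE>TOY_</CODE> constants are illustrative values chosen for
this example, and <CODE>toy_convert_mode()</CODE> is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative search modes; the numeric values are arbitrary. */
enum toy_search_mode {
    TOY_SEARCH_ANY = 1,
    TOY_SEARCH_EQUAL,
    TOY_SEARCH_NOTEQUAL,
    TOY_SEARCH_LESSEQUAL
};

/* Illustrative flag bit standing in for MSG_EXCEPT. */
#define TOY_MSG_EXCEPT 020000

/* Derive the search mode; may rewrite *msgtyp to its absolute value. */
int toy_convert_mode(long *msgtyp, int msgflg)
{
    assert(msgtyp != NULL);

    if (*msgtyp == 0)
        return TOY_SEARCH_ANY;
    if (*msgtyp < 0) {
        *msgtyp = -(*msgtyp);
        return TOY_SEARCH_LESSEQUAL;
    }
    if (msgflg & TOY_MSG_EXCEPT)
        return TOY_SEARCH_NOTEQUAL;
    return TOY_SEARCH_EQUAL;
}
```

<P>The type is passed by address because the negative case is normalized
in place, so later comparisons can use a plain less-than-or-equal test.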
<H3><A NAME="testmsg"></A> testmsg()</H3>

<P>The testmsg() function checks whether a message meets the
criteria specified by the receiver.  It returns 1 if one of the
following conditions is true:
<P>
<UL>
<LI>  The search mode indicates searching any message (SEARCH_ANY).</LI>
<LI>  The search mode is SEARCH_LESSEQUAL and the message type
is less than or equal to desired type.</LI>
<LI>  The search mode is SEARCH_EQUAL and the message type is
the same as desired type.</LI>
<LI>  The search mode is SEARCH_NOTEQUAL and the message type is
not equal to the specified type.</LI>
</UL>
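<P>The four conditions listed above translate directly into a switch
statement. The sketch below takes the message type as a plain
<CODE>long</CODE> instead of a message structure to stay self-contained;
<CODE>toy_testmsg()</CODE> and the <CODE>TOY_</CODE> mode constants are
hypothetical names with arbitrary values.

```c
#include <assert.h>

/* Illustrative search modes; the numeric values are arbitrary. */
enum toy_search_mode {
    TOY_SEARCH_ANY = 1,
    TOY_SEARCH_EQUAL,
    TOY_SEARCH_NOTEQUAL,
    TOY_SEARCH_LESSEQUAL
};

/* Return 1 if a message of type m_type satisfies the request. */
int toy_testmsg(long m_type, long type, int mode)
{
    switch (mode) {
    case TOY_SEARCH_ANY:
        return 1;
    case TOY_SEARCH_LESSEQUAL:
        return m_type <= type;
    case TOY_SEARCH_EQUAL:
        return m_type == type;
    case TOY_SEARCH_NOTEQUAL:
        return m_type != type;
    }
    return 0;  /* unknown mode: no match */
}
```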
<H3><A NAME="pipelined_send"></A> pipelined_send()</H3>

<P>pipelined_send() allows a process to directly send a message
to a waiting receiver rather than deposit the message in the
associated message waiting queue. The
<A HREF="#testmsg">testmsg()</A> function is
invoked to find the first receiver which is waiting for the
given message. If found, the waiting receiver is removed from
the receiver waiting queue, and the associated receiving task is
awakened. The message is stored in the <CODE>r_msg</CODE>
field of the receiver, and 1 is returned. In the case where no
receiver is waiting for the message, 0 is returned.
<P>In the process of searching for a receiver, potential
receivers may be found which have requested a size that is too small
for the given message. Such receivers are removed from the queue,
and are awakened with an error status of E2BIG, which is stored in the
<CODE>r_msg</CODE> field. The search then continues until
either a valid receiver is found, or the queue is exhausted.
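<P>The search loop, including the E2BIG case, can be sketched as follows.
This is a simplified user-space model, not the kernel function: receivers
live in an array instead of a linked list, the <CODE>queued</CODE> flag
stands in for list membership and task wakeup, matching is reduced to
"any type or exact type", and the error is kept in a separate
<CODE>r_err</CODE> field rather than in <CODE>r_msg</CODE>. All names are
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_msg { long m_type; long m_ts; };

struct toy_receiver {
    int queued;                   /* 1 while on the waiting queue */
    long r_msgtype;               /* wanted type, 0 means any */
    long r_maxsize;               /* largest acceptable message */
    const struct toy_msg *r_msg;  /* delivered message, if any */
    int r_err;                    /* error status, 0 if none */
};

/* Try to hand msg to a waiting receiver; 1 on delivery, 0 otherwise. */
int toy_pipelined_send(struct toy_receiver *rcv, size_t n,
                       const struct toy_msg *msg)
{
    assert(rcv != NULL && msg != NULL);

    for (size_t i = 0; i < n; i++) {
        struct toy_receiver *r = &rcv[i];

        if (!r->queued)
            continue;
        if (r->r_msgtype != 0 && r->r_msgtype != msg->m_type)
            continue;             /* not waiting for this message */

        r->queued = 0;            /* dequeue and wake the receiver */
        if (r->r_maxsize < msg->m_ts) {
            r->r_err = -E2BIG;    /* buffer too small, keep looking */
            continue;
        }
        r->r_msg = msg;
        return 1;
    }
    return 0;
}
```

<P>A receiver whose buffer is too small is removed even though the message
is not delivered to it, which matches the behavior described above.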
<H3><A NAME="copy_msqid_to_user"></A> copy_msqid_to_user()</H3>

<P>copy_msqid_to_user() copies the contents of a kernel buffer to
the user buffer.  It receives as parameters a user buffer, a
kernel buffer of type
<A HREF="#struct_msqid64_ds">msqid64_ds</A>, and a
version flag indicating
the new IPC version vs. the old IPC version.  If the version
flag equals IPC_64, then copy_to_user() is invoked to copy from
the kernel buffer to the user buffer directly.  Otherwise a
temporary buffer of type struct msqid_ds is initialized, and the
kernel data is translated to this temporary buffer.
copy_to_user() is then called to copy the contents of the temporary
buffer to the user buffer.
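<P>The translation step matters because the old structure stores its
counters in <CODE>unsigned short</CODE> fields, while the new one uses
<CODE>unsigned long</CODE>. The sketch below shows one plausible
translation, assuming that values too large for the short fields are
clamped to USHRT_MAX and that the full values are kept in the
<CODE>msg_lcbytes</CODE> and <CODE>msg_lqbytes</CODE> fields. Both
structures are cut down to the counter fields, and all <CODE>toy_</CODE>
names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Counter fields of the new-style structure only. */
struct toy_msqid64 {
    unsigned long msg_cbytes;
    unsigned long msg_qnum;
    unsigned long msg_qbytes;
};

/* Counter fields of the old-style structure only. */
struct toy_msqid_old {
    unsigned long msg_lcbytes;   /* full-width copy of msg_cbytes */
    unsigned long msg_lqbytes;   /* full-width copy of msg_qbytes */
    unsigned short msg_cbytes;
    unsigned short msg_qnum;
    unsigned short msg_qbytes;
};

/* Saturate a long counter so that it fits an unsigned short. */
static unsigned short toy_clamp(unsigned long v)
{
    return v > USHRT_MAX ? (unsigned short)USHRT_MAX : (unsigned short)v;
}

/* Translate new-style counters into the old layout. */
void toy_to_old(const struct toy_msqid64 *in, struct toy_msqid_old *out)
{
    assert(in != NULL && out != NULL);

    out->msg_lcbytes = in->msg_cbytes;
    out->msg_lqbytes = in->msg_qbytes;
    out->msg_cbytes = toy_clamp(in->msg_cbytes);
    out->msg_qnum = toy_clamp(in->msg_qnum);
    out->msg_qbytes = toy_clamp(in->msg_qbytes);
}
```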
<H3><A NAME="copy_msqid_from_user"></A> copy_msqid_from_user()</H3>

<P>The function copy_msqid_from_user() receives as parameters
a kernel message buffer of type struct msq_setbuf, a user buffer
and a version flag indicating the new IPC version vs. the old IPC
version.  In the case of the new IPC version, copy_from_user()
is called to copy the contents of the user buffer
to a temporary buffer of type
<A HREF="#struct_msqid64_ds">msqid64_ds</A>.
Then, the <CODE>qbytes</CODE>, <CODE>uid</CODE>, <CODE>gid</CODE>,
and <CODE>mode</CODE> fields of the kernel buffer are
filled with the values of the
corresponding fields from the temporary buffer.  In the case of the
old IPC version, a temporary buffer of type struct
<A HREF="#struct_msqid_ds">msqid_ds</A> is used instead.
<H2><A NAME="sharedmem"></A> <A NAME="ss5.3">5.3 Shared Memory</A>
</H2>

<P>
<H3><A NAME="Shared_Memory_System_Call_Interfaces"></A> Shared Memory System Call Interfaces</H3>

<P>
<H3><A NAME="sys_shmget"></A> sys_shmget()</H3>

<P>The entire call to sys_shmget() is protected by the
global shared memory semaphore.
<P>In the case where a new shared memory segment must
be created, the
<A HREF="#newseg">newseg()</A> function is called to create
and initialize a new shared memory segment.  The ID of
the new segment is returned to the caller.
<P>In the case where a key value is provided for an
existing shared memory segment, the corresponding index
in the shared memory descriptors array is looked up, and
the parameters and permissions of the caller are verified
before returning the shared memory segment ID. The look up
operation and verification are performed while the global
shared memory spinlock is held.
<H3><A NAME="sys_shmctl"></A> sys_shmctl()</H3>

<P>
<H3><A NAME="IPC_INFO"></A> IPC_INFO</H3>

<P>A temporary
<A HREF="#struct_shminfo64">shminfo64</A>
buffer is loaded with system-wide
shared memory parameters and is copied out to user space for
access by the calling application.
<H3><A NAME="SHM_INFO"></A> SHM_INFO</H3>

<P>The global shared memory semaphore and the global shared
memory spinlock are held while gathering system-wide statistical
information for shared memory.  The
<A HREF="#shm_get_stat">shm_get_stat()</A> function is called
to calculate both the number of shared memory pages that are
resident in memory and the number of shared memory pages that are
swapped out. Other statistics include the total number of shared
memory pages and the number of shared memory segments in use.
The counts of <CODE>swap_attempts</CODE> and <CODE>swap_successes</CODE>
are hard-coded to zero. These statistics are stored in a temporary
<A HREF="#struct_shm_info">shm_info</A> buffer and copied out
to user space for the calling application.
<H3><A NAME="SHM_STAT_IPC_STAT"></A> SHM_STAT, IPC_STAT</H3>

<P>For SHM_STAT and IPC_STAT, a temporary buffer of type
<A HREF="#struct_shmid64_ds">struct shmid64_ds</A> is
initialized, and the global shared memory spinlock is locked.
<P>For the SHM_STAT case, the shared memory segment ID parameter is
expected to be a straight index (i.e. 0 to n where n is the
number of shared memory IDs in the system). After validating
the index,
<A HREF="#ipc_buildid">ipc_buildid()</A>
is called (via shm_buildid()) to
convert the index into a shared memory ID. In the passing case
of SHM_STAT, the shared memory ID will be the return value.
Note that this is an undocumented feature, but is maintained
for the ipcs(8) program.
<P>For the IPC_STAT case, the shared memory segment ID parameter is
expected to be an ID that was generated by a call to
<A HREF="#sys_shmget">shmget()</A>.
The ID is validated before proceeding. In the passing case of
IPC_STAT, 0 will be the return value.
<P>For both SHM_STAT and IPC_STAT, the access permissions of
the caller are verified. The desired statistics are loaded into
the temporary buffer and then copied out to the calling
application.
<H3><A NAME="SHM_LOCK_SHM_UNLOCK"></A> SHM_LOCK, SHM_UNLOCK</H3>

<P>After validating access permissions, the global shared
memory spinlock is locked, and the shared memory segment ID
is validated.  For both SHM_LOCK and SHM_UNLOCK,
<A HREF="#shmem_lock">shmem_lock()</A>
is called to perform the function. The parameters for
<A HREF="#shmem_lock">shmem_lock()</A>
identify the function to be performed.
<H3><A NAME="IPC_RMID"></A> IPC_RMID</H3>

<P>During IPC_RMID the global shared memory semaphore and
the global shared memory spinlock are held throughout this
function. The shared memory ID is validated, and then if
there are no current attachments,
<A HREF="#shm_destroy">shm_destroy()</A>
is called to destroy the shared memory segment.
Otherwise, the SHM_DEST flag is set to mark it for destruction,
and the IPC_PRIVATE flag is set to prevent other processes from
being able to reference the shared memory ID.
<H3><A NAME="IPC_SET"></A> IPC_SET</H3>

<P>After validating the shared memory segment ID and the user
access permissions, the <CODE>uid</CODE>, <CODE>gid</CODE>, and <CODE>mode</CODE> flags of the
shared memory segment are updated with the user data. The
<CODE>shm_ctime</CODE> field is also updated.  These changes are made
while holding the global shared memory semaphore and the
global shared memory spinlock.
<H3><A NAME="sys_shmat"></A> sys_shmat()</H3>

<P>sys_shmat() takes as parameters a shared memory segment ID,
an address at which the shared memory segment should be
attached (<CODE>shmaddr</CODE>), and flags which will be described below.
<P>If <CODE>shmaddr</CODE> is non-zero, and the SHM_RND flag is
specified, then <CODE>shmaddr</CODE> is rounded down to a multiple of
SHMLBA. If <CODE>shmaddr</CODE> is not a multiple of SHMLBA and SHM_RND
is not specified, then EINVAL is returned.
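<P>The address check above amounts to a mask operation. The sketch below
assumes that SHMLBA is a power of two and uses 4096 as an illustrative
value (the real value is architecture specific). The function name, the
<CODE>shm_rnd</CODE> parameter, and the output parameter are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed alignment; must be a power of two for the mask to work. */
#define TOY_SHMLBA 4096UL

/* Validate or round down an attach address.
 * Returns 0 and stores the result in *out, or -EINVAL if the address
 * is misaligned and rounding was not requested. */
int toy_align_shmaddr(unsigned long shmaddr, int shm_rnd, unsigned long *out)
{
    assert(out != NULL);

    if (shmaddr & (TOY_SHMLBA - 1)) {
        if (!shm_rnd)
            return -EINVAL;
        shmaddr &= ~(TOY_SHMLBA - 1);  /* round down to a multiple */
    }
    *out = shmaddr;
    return 0;
}
```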
<P>The access permissions of the caller are validated and
the <CODE>shm_nattch</CODE> field for the shared memory segment is
incremented. Note that this increment guarantees that the
attachment count is non-zero and prevents the shared memory
segment from being destroyed during the process of attaching
to the segment.  These operations are performed while holding the
global shared memory spinlock.
<P>The do_mmap() function is called to create a virtual memory
mapping to the shared memory segment pages. This is done while
holding the <CODE>mmap_sem</CODE> semaphore of the current task. The
MAP_SHARED flag is passed to do_mmap(). If an address was
provided by the caller, then the MAP_FIXED flag is also passed
to do_mmap(). Otherwise, do_mmap() will select the virtual
address at which to map the shared memory segment.
<P>Note that
<A HREF="#shm_inc">shm_inc()</A> will be invoked within the do_mmap()
function call via the <CODE>shm_file_operations</CODE> structure. This
function is called to set the PID, to set the current time, and
to increment the number of attachments to this shared memory
segment.
<P>After the call to do_mmap(), the global shared memory
semaphore and the global shared memory spinlock are both
obtained.  The attachment count is then decremented.  The net
change to the attachment count is 1 for a call
to shmat() because of the call to
<A HREF="#shm_inc">shm_inc()</A>. If, after
decrementing the attachment count, the resulting count is found
to be zero, and if the segment is marked for destruction
(SHM_DEST), then
<A HREF="#shm_destroy">shm_destroy()</A> is
called to release the shared memory segment resources.
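<P>The attachment count bookkeeping is easier to follow as a sequence of
three updates: a temporary increment that pins the segment, the increment
made by <A HREF="#shm_inc">shm_inc()</A> when the mapping succeeds, and
the final decrement that drops the pin. The sketch below models only the
counter; locking and the real mapping are omitted, and
<CODE>toy_seg</CODE>, <CODE>toy_shmat()</CODE>, and the
<CODE>mmap_ok</CODE> flag are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct toy_seg {
    unsigned long shm_nattch;  /* current number of attachments */
    int marked_dest;           /* stand-in for the SHM_DEST flag */
    int destroyed;             /* set when the segment is released */
};

/* Model the attachment count changes made by one shmat() call. */
void toy_shmat(struct toy_seg *s, int mmap_ok)
{
    assert(s != NULL);

    s->shm_nattch++;           /* pin: the segment cannot vanish now */
    if (mmap_ok)
        s->shm_nattch++;       /* done by shm_inc() during do_mmap() */
    s->shm_nattch--;           /* drop the temporary pin */

    /* Last reference gone on a segment marked for destruction. */
    if (s->shm_nattch == 0 && s->marked_dest)
        s->destroyed = 1;
}
```

<P>A successful call therefore leaves the count one higher than before,
while a failed mapping on a segment marked for destruction releases it.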
<P>Finally, the virtual address at which the shared memory is
mapped is returned to the caller at the user specified address.
If an error code had been returned by do_mmap(), then this
failure code is passed on as the return value for the system call.
<H3><A NAME="sys_shmdt"></A> sys_shmdt()</H3>

<P>The global shared memory semaphore is held while performing
sys_shmdt(). The <CODE>mm_struct</CODE> of the current
process is searched for the <CODE>vm_area_struct</CODE> associated with
the shared memory address. When it is found, do_munmap() is
called to undo the virtual address mapping for the shared memory segment.
<P>Note also that do_munmap() performs a call-back to
<A HREF="#shm_close">shm_close()</A>,
which performs the shared-memory book keeping functions, and
releases the shared memory segment resources if there are no other
attachments.
<P>sys_shmdt() unconditionally returns 0.
<H3><A NAME="shm_structures"></A> Shared Memory Support Structures</H3>

<P>
<H3><A NAME="struct_shminfo64"></A> struct shminfo64</H3>

<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
struct shminfo64 {
        unsigned long   shmmax;
        unsigned long   shmmin;
        unsigned long   shmmni;
        unsigned long   shmseg;
        unsigned long   shmall;
        unsigned long   __unused1;
        unsigned long   __unused2;
        unsigned long   __unused3;
        unsigned long   __unused4;
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<H3><A NAME="struct_shm_info"></A> struct shm_info</H3>

<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
struct shm_info {
        int used_ids;
        unsigned long shm_tot;  /* total allocated shm */
        unsigned long shm_rss;  /* total resident shm */
        unsigned long shm_swp;  /* total swapped shm */
        unsigned long swap_attempts;
        unsigned long swap_successes;
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<H3><A NAME="struct_shmid_kernel"></A> struct shmid_kernel</H3>

<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
struct shmid_kernel /* private to the kernel */
{
        struct kern_ipc_perm    shm_perm;
        struct file *           shm_file;
        int                     id;
        unsigned long           shm_nattch;
        unsigned long           shm_segsz;
        time_t                  shm_atim;
        time_t                  shm_dtim;
        time_t                  shm_ctim;
        pid_t                   shm_cprid;
        pid_t                   shm_lprid;
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<H3><A NAME="struct_shmid64_ds"></A> struct shmid64_ds</H3>

<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
struct shmid64_ds {
        struct ipc64_perm       shm_perm;       /* operation perms */
        size_t                  shm_segsz;      /* size of segment (bytes) */
        __kernel_time_t         shm_atime;      /* last attach time */
        unsigned long           __unused1;
        __kernel_time_t         shm_dtime;      /* last detach time */
        unsigned long           __unused2;
        __kernel_time_t         shm_ctime;      /* last change time */
        unsigned long           __unused3;
        __kernel_pid_t          shm_cpid;       /* pid of creator */
        __kernel_pid_t          shm_lpid;       /* pid of last operator */
        unsigned long           shm_nattch;     /* no. of current attaches */
        unsigned long           __unused4;
        unsigned long           __unused5;
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<H3><A NAME="struct_shmem_inode_info"></A> struct shmem_inode_info</H3>

<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
struct shmem_inode_info {
        spinlock_t      lock;
        unsigned long   max_index;
        swp_entry_t     i_direct[SHMEM_NR_DIRECT]; /* for the first blocks */
        swp_entry_t   **i_indirect; /* doubly indirect blocks */
        unsigned long   swapped;
        int             locked;     /* into memory */
        struct list_head        list;
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<H3><A NAME="shm_primitives"></A> Shared Memory Support Functions</H3>

<P>
<H3><A NAME="newseg"></A> newseg()</H3>

<P>The newseg() function is called when a new shared memory
segment needs to be created.  It acts on three parameters for
the new segment: the key, the flag, and the size.  After
validating that the size of the shared memory segment to be
created is between SHMMIN and SHMMAX and that the total number
of shared memory segments does not exceed SHMALL, it allocates
a new shared memory segment descriptor.
The
<A HREF="#shmem_file_setup">shmem_file_setup()</A>
function is then invoked to create an unlinked file of type
tmpfs.  The returned file pointer is saved in the <CODE>shm_file</CODE> field
of the associated shared memory segment descriptor. The file's
size is set to be the same as the size of the segment.  The
new shared memory segment descriptor is initialized and inserted
into the global IPC shared memory descriptors array.  The shared
memory segment ID is created by shm_buildid()
(via
<A HREF="#ipc_buildid">ipc_buildid()</A>).
This segment ID is saved in the <CODE>id</CODE> field of the shared memory
segment descriptor, as well as in the <CODE>i_ino</CODE> field of the associated
inode.  In addition, the address of the shared memory operations
defined in structure <CODE>shm_file_operations</CODE> is stored in the associated
file.  The value of the global variable <CODE>shm_tot</CODE>, which indicates
the total number of shared memory pages system wide, is also
increased to reflect this change.  On success, the segment ID is
returned to the calling application.
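<P>The limit checks at the start of newseg() can be sketched in user space
as follows. This is only an illustration: the <CODE>sketch_</CODE> names and the
numeric limits are hypothetical, and it assumes that the system-wide total is
counted in pages, with the segment size rounded up to whole pages.

```c
#include <stddef.h>

/* Illustrative limits only; the real values are kernel tunables. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_SHMMIN    1UL
#define SKETCH_SHMMAX    (32UL * 1024UL * 1024UL)
#define SKETCH_SHMALL    2048UL   /* assumed limit on total shared pages */

/* Running total of shared memory pages (plays the role of shm_tot). */
unsigned long sketch_shm_tot = 0;

/*
 * Returns 0 and charges the pages to sketch_shm_tot when a segment of
 * `size` bytes passes the checks, or -1 when it must be rejected.
 */
int sketch_check_segment_size(size_t size)
{
    unsigned long numpages;

    if (size < SKETCH_SHMMIN || size > SKETCH_SHMMAX)
        return -1;

    /* Round the byte size up to a whole number of pages. */
    numpages = (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

    if (sketch_shm_tot + numpages > SKETCH_SHMALL)
        return -1;

    sketch_shm_tot += numpages;
    return 0;
}
```

<P>The sketch omits descriptor allocation, the tmpfs file, and all locking.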
<H3><A NAME="shm_get_stat"></A> shm_get_stat()</H3>

<P>shm_get_stat() cycles through all of the shared memory
structures, and calculates the total number of memory pages in
use by shared memory and the total number of shared memory pages
that are swapped out. There is a file structure and an inode
structure for each shared memory segment.  Since the required
data is obtained via the inode, the spinlock for each inode
structure that is accessed is locked and unlocked in sequence.
<H3><A NAME="shmem_lock"></A> shmem_lock()</H3>

<P>shmem_lock() receives as parameters a pointer to the
shared memory segment descriptor and a flag indicating
lock vs. unlock. The locking state of the shared memory
segment is stored in an associated inode. This state is compared
with the desired locking state; shmem_lock() simply returns if they match.
<P>While holding the semaphore of the associated inode, the
locking state of the inode is set. The following steps
occur for each page in the shared memory segment:
<UL>
<LI>  find_lock_page() is called to lock the page (setting
PG_locked) and to increment the reference count of the page.
Incrementing the reference count assures that the shared
memory segment remains locked in memory throughout this
operation.</LI>
<LI>  If the desired state is locked, then PG_locked is cleared,
but the reference count remains incremented.</LI>
<LI>  If the desired state is unlocked, then the reference count
is decremented twice: once for the current reference, and once
for the existing reference which caused the page to remain
locked in memory. Then PG_locked is cleared.</LI>
</UL>
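<P>The per-page bookkeeping listed above can be modelled with a small
user-space sketch. The <CODE>sketch_page</CODE> structure and both functions are
hypothetical stand-ins for the kernel page structure and find_lock_page().
The caller is assumed to have already checked that the desired state
differs from the current one.

```c
#include <stdbool.h>

/* Hypothetical model of the two page attributes the text refers to. */
struct sketch_page {
    bool pg_locked;   /* models the PG_locked bit */
    int  count;       /* models the page reference count */
};

/* Models find_lock_page(): take the page lock and one reference. */
void sketch_find_lock_page(struct sketch_page *page)
{
    page->pg_locked = true;
    page->count++;
}

/* Applies the per-page steps; lock == true means "lock into memory". */
void sketch_lock_one_page(struct sketch_page *page, bool lock)
{
    sketch_find_lock_page(page);

    if (!lock) {
        page->count--;   /* drop the reference taken just above */
        page->count--;   /* drop the reference that pinned the page */
    }
    /* In the lock case the extra reference is deliberately kept. */

    page->pg_locked = false;
}
```

<P>Locking a page whose count is 1 leaves it at 2; a later unlock brings it
back to 1.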
<H3><A NAME="shm_destroy"></A> shm_destroy()</H3>

<P>During shm_destroy() the total number of shared memory pages
is adjusted to account for the removal of the shared memory segment.
<A HREF="#func_ipc_rmid">ipc_rmid()</A> is called
(via shm_rmid()) to remove the shared memory ID.
<A HREF="#shmem_lock">shmem_lock()</A> is
called to unlock the shared memory pages, effectively decrementing
the reference counts to zero for each page. fput() is called to
decrement the usage counter <CODE>f_count</CODE> for the associated file object,
and if necessary, to release the file object resources.  kfree() is
called to free the shared memory segment descriptor.
<H3><A NAME="shm_inc"></A> shm_inc()</H3>

<P>shm_inc() sets the PID, sets the current time, and increments
the number of attachments for the given shared memory segment.
These operations are performed while holding the global shared
memory spinlock.
<H3><A NAME="shm_close"></A> shm_close()</H3>

<P>shm_close() updates the <CODE>shm_lprid</CODE> and the <CODE>shm_dtim</CODE> fields
and decrements the number of attachments to the shared memory segment. If
there are no other attachments to the shared memory segment,
then
<A HREF="#shm_destroy">shm_destroy()</A> is called to
release the shared memory segment resources. These operations are
all performed while holding both the global shared memory semaphore
and the global shared memory spinlock.
<H3><A NAME="shmem_file_setup"></A> shmem_file_setup()</H3>

<P>The function shmem_file_setup() sets up an unlinked file living
in the tmpfs file system with the given name and size.  If enough
system memory resources are available for this file, it creates a new
dentry under the mount root of tmpfs, and allocates a new file
descriptor and a new inode object of tmpfs type.  Then it associates
the new dentry object with the new inode object by calling
d_instantiate() and saves the address of the dentry object in the
file descriptor. The <CODE>i_size</CODE> field of the inode object is set to
be the file size and the <CODE>i_nlink</CODE> field is set to be 0 in order to
mark the inode unlinked.  Also, shmem_file_setup() stores the
address of the <CODE>shmem_file_operations</CODE> structure in the <CODE>f_op</CODE> field,
and initializes <CODE>f_mode</CODE> and <CODE>f_vfsmnt</CODE> fields of the file descriptor
properly.  The function shmem_truncate() is called to complete the
initialization of the inode object. On success, shmem_file_setup()
returns the new file descriptor.
<H2><A NAME="ipc_primitives"></A> <A NAME="ss5.4">5.4 Linux IPC Primitives</A>
</H2>

<P>
<H3><A NAME="Generic_Linux_IPC_Primitives_used_with_Semaphores_Messages_and_Shared_Memory"></A> Generic Linux IPC Primitives used with Semaphores, Messages, and Shared Memory</H3>

<P>The semaphore, message, and shared memory mechanisms of Linux
are built on a set of common primitives. These primitives are described in the sections below.
<P>
<H3><A NAME="ipc_alloc"></A> ipc_alloc()</H3>

<P>If the memory allocation is greater than PAGE_SIZE, then
vmalloc() is used to allocate memory. Otherwise, kmalloc() is
called with GFP_KERNEL to allocate the memory.
<H3><A NAME="ipc_addid"></A> ipc_addid()</H3>

<P>When a new semaphore set, message queue, or shared memory
segment is added, ipc_addid() first calls
<A HREF="#grow_ary">grow_ary()</A> to
ensure that the size of the corresponding descriptor array is
sufficiently large for the system maximum.  The array of descriptors
is searched for the first unused element. If an unused element
is found, the count of descriptors which are in use is incremented.
The
<A HREF="#struct_kern_ipc_perm">kern_ipc_perm</A> structure for the new resource descriptor
is then initialized, and the array index for the new descriptor
is returned. When ipc_addid() succeeds, it returns with the global
spinlock for the given IPC type locked.
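<P>The slot search performed by ipc_addid() might look roughly like the
following user-space sketch. The <CODE>sketch_</CODE> types are simplified,
hypothetical versions of the kernel structures; the call to grow_ary(), the
wrap-around of the sequence number, and the spinlock are all left out.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for kern_ipc_perm and ipc_ids. */
struct sketch_perm {
    int           key;
    unsigned long seq;
};

struct sketch_ids {
    int                  size;     /* number of slots in entries[] */
    int                  in_use;   /* number of occupied slots */
    int                  max_id;   /* highest occupied index, -1 if none */
    unsigned short       seq;      /* next sequence number to hand out */
    struct sketch_perm **entries;  /* the descriptor array */
};

/* Returns the array index used for new_perm, or -1 if the array is full. */
int sketch_addid(struct sketch_ids *ids, struct sketch_perm *new_perm)
{
    int id;

    /* Search for the first unused element. */
    for (id = 0; id < ids->size; id++) {
        if (ids->entries[id] == NULL)
            break;
    }
    if (id == ids->size)
        return -1;

    ids->in_use++;
    if (id > ids->max_id)
        ids->max_id = id;

    /* Record the sequence number later used to build the IPC ID. */
    new_perm->seq = ids->seq++;

    ids->entries[id] = new_perm;
    return id;
}
```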
<H3><A NAME="func_ipc_rmid"></A> ipc_rmid()</H3>

<P>ipc_rmid() removes the IPC descriptor from the global
descriptor array of the IPC type, updates the count of IDs which
are in use, and adjusts the maximum ID in the corresponding
descriptor array if necessary. A pointer to the IPC
descriptor associated with the given IPC ID is returned.
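<P>A minimal sketch of the removal step, including the downward scan that
finds the new maximum ID, is shown below. The types are hypothetical
simplifications, the function takes an array index rather than a full IPC
ID, and no locking is modelled.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for kern_ipc_perm and ipc_ids. */
struct sketch_perm {
    int key;
};

struct sketch_ids {
    int                  size;     /* number of slots in entries[] */
    int                  in_use;   /* number of occupied slots */
    int                  max_id;   /* highest occupied index, -1 if none */
    struct sketch_perm **entries;  /* the descriptor array */
};

/* Empties slot `index` and returns the descriptor that was stored there. */
struct sketch_perm *sketch_rmid(struct sketch_ids *ids, int index)
{
    struct sketch_perm *p;

    if (index < 0 || index >= ids->size)
        return NULL;

    p = ids->entries[index];
    if (p == NULL)
        return NULL;

    ids->entries[index] = NULL;
    ids->in_use--;

    /* If the highest slot was freed, scan down for the new maximum. */
    if (index == ids->max_id) {
        int i = index;

        do {
            i--;
        } while (i >= 0 && ids->entries[i] == NULL);
        ids->max_id = i;
    }
    return p;
}
```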
<H3><A NAME="ipc_buildid"></A> ipc_buildid()</H3>

<P>ipc_buildid() creates a unique ID to be associated with
each descriptor within a given IPC type. This ID is created at
the time a new IPC element is added (e.g. a new shared memory
segment or a new semaphore set).  The IPC ID converts
easily into the corresponding descriptor array index. Each
IPC type maintains a sequence number which is incremented
each time a descriptor is added.  An ID is created by
multiplying the sequence number by SEQ_MULTIPLIER and adding
the product to the descriptor array index. The sequence number
used in creating a particular IPC ID is then stored in the
corresponding descriptor. The existence of the sequence number
makes it possible to detect the use of a stale IPC ID.
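<P>The arithmetic can be illustrated with a few helper functions. The value
chosen for the multiplier and all of the <CODE>sketch_</CODE> names are assumptions
made for this example. In particular, <CODE>sketch_id_is_stale()</CODE> is a
hypothetical helper with its own return convention, not a kernel interface.

```c
/* Assumed value for illustration; the kernel defines SEQ_MULTIPLIER. */
#define SKETCH_SEQ_MULTIPLIER 32768

/* Combine a sequence number and an array index into an IPC ID. */
int sketch_buildid(int index, unsigned long seq)
{
    return (int)(SKETCH_SEQ_MULTIPLIER * seq + (unsigned long)index);
}

/* Recover the descriptor array index from an IPC ID. */
int sketch_id_to_index(int id)
{
    return id % SKETCH_SEQ_MULTIPLIER;
}

/* Recover the sequence number that was used to build an IPC ID. */
unsigned long sketch_id_to_seq(int id)
{
    return (unsigned long)(id / SKETCH_SEQ_MULTIPLIER);
}

/*
 * Returns nonzero when the sequence number encoded in the ID no longer
 * matches the one stored in the descriptor occupying that slot.
 */
int sketch_id_is_stale(int id, unsigned long stored_seq)
{
    return sketch_id_to_seq(id) != stored_seq;
}
```

<P>If a slot is freed and reused, the new descriptor carries a different
sequence number, so an ID built for the old occupant no longer matches
even though it maps to the same array index.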
<H3><A NAME="ipc_checkid"></A> ipc_checkid()</H3>

<P>ipc_checkid() divides the given IPC ID by the SEQ_MULTIPLIER
and compares the quotient with the seq value saved in the corresponding
descriptor.  If they are equal, then the IPC ID is considered to
be valid and 0 is returned.  Otherwise, 1 is returned.
<H3><A NAME="grow_ary"></A> grow_ary()</H3>

<P>grow_ary() handles the possibility that the maximum
(tunable) number of IDs for a given IPC type can be dynamically
changed. It enforces the current maximum limit so that it is no
greater than the permanent system limit (IPCMNI) and adjusts it down
if necessary. It also ensures that the existing descriptor array
is large enough.  If the existing array size is sufficiently large,
then the current maximum limit is returned.  Otherwise, a new larger
array is allocated, the old array is copied into the new array,
and the old array is freed.  The corresponding global
spinlock is held when updating the descriptor array for the
given IPC type.
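<P>The clamp, allocate, copy, and free sequence can be sketched in user space
as follows. Here malloc() and free() stand in for the kernel allocators,
the limit value is an assumption, the <CODE>sketch_</CODE> names are hypothetical,
and the spinlock taken while the array pointer is swapped is not modelled.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed permanent limit, for illustration only. */
#define SKETCH_IPCMNI 32768

/* Hypothetical, reduced form of the ipc_ids structure. */
struct sketch_ids {
    int    size;      /* current number of slots */
    void **entries;   /* descriptor array */
};

/* Returns the size the caller may rely on after the call. */
int sketch_grow_ary(struct sketch_ids *ids, int newsize)
{
    void **new_entries;
    int i;

    /* Enforce the permanent system limit. */
    if (newsize > SKETCH_IPCMNI)
        newsize = SKETCH_IPCMNI;

    /* The existing array is already large enough. */
    if (newsize <= ids->size)
        return newsize;

    new_entries = malloc(sizeof(*new_entries) * (size_t)newsize);
    if (new_entries == NULL)
        return ids->size;   /* keep using the old array */

    /* Copy the old slots, then mark the additional slots unused. */
    if (ids->size > 0)
        memcpy(new_entries, ids->entries,
               sizeof(*new_entries) * (size_t)ids->size);
    for (i = ids->size; i < newsize; i++)
        new_entries[i] = NULL;

    free(ids->entries);
    ids->entries = new_entries;
    ids->size = newsize;
    return ids->size;
}
```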
<H3><A NAME="ipc_findkey"></A> ipc_findkey()</H3>

<P>ipc_findkey() searches the descriptor array of
the specified
<A HREF="#struct_ipc_ids">ipc_ids</A> object
for the specified key. If the key is found, the index of
the corresponding descriptor is returned. If the key is not found,
then -1 is returned.
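<P>A linear search of this kind can be sketched as shown below. The
structures are hypothetical simplifications, and the sketch assumes that
only slots up to the highest index in use need to be examined.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for kern_ipc_perm and ipc_ids. */
struct sketch_perm {
    int key;
};

struct sketch_ids {
    int                  max_id;   /* highest occupied index */
    struct sketch_perm **entries;  /* the descriptor array */
};

/* Returns the index of the descriptor holding `key`, or -1. */
int sketch_findkey(const struct sketch_ids *ids, int key)
{
    int id;

    for (id = 0; id <= ids->max_id; id++) {
        const struct sketch_perm *p = ids->entries[id];

        if (p == NULL)
            continue;   /* unused slot */
        if (p->key == key)
            return id;
    }
    return -1;
}
```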
<H3><A NAME="ipcperms"></A> ipcperms()</H3>

<P>ipcperms() checks the user, group, and other permissions
for access to the IPC resources. It returns 0 if permission
is granted and -1 otherwise.
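<P>One way to picture the check is the following user-space sketch. It is a
simplification under stated assumptions: the caller is described by a
hypothetical <CODE>sketch_cred</CODE> structure with a single group, and any
capability-based override is ignored. The requested access is folded into
three permission bits and compared with the owner, group, or other bits of
the mode.

```c
/* Hypothetical, reduced form of the permission fields in kern_ipc_perm. */
struct sketch_perm {
    unsigned uid, gid;     /* current owner */
    unsigned cuid, cgid;   /* creator */
    unsigned mode;         /* e.g. 0640 */
};

/* Hypothetical description of the calling process. */
struct sketch_cred {
    unsigned euid, egid;
};

/* Returns 0 if the requested access is granted and -1 otherwise. */
int sketch_ipcperms(const struct sketch_perm *p,
                    const struct sketch_cred *who, unsigned flag)
{
    /* Fold the user, group, and other request bits into the low 3 bits. */
    unsigned requested = ((flag >> 6) | (flag >> 3) | flag) & 07;
    unsigned granted = p->mode;

    if (who->euid == p->cuid || who->euid == p->uid)
        granted >>= 6;   /* use the owner bits */
    else if (who->egid == p->cgid || who->egid == p->gid)
        granted >>= 3;   /* use the group bits */
    /* otherwise the "other" bits are already in the low 3 bits */

    /* Deny if any requested bit is missing from the granted bits. */
    if (requested & ~granted & 07)
        return -1;
    return 0;
}
```

<P>With a mode of 0640, the owner may read and write, a group member may
only read, and all other callers are refused.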
<H3><A NAME="ipc_lock"></A> ipc_lock()</H3>

<P>ipc_lock() takes an IPC ID as one of its parameters.
It locks the global spinlock for the given IPC type, and
returns a pointer to the descriptor corresponding to the
specified IPC ID.
<H3><A NAME="ipc_unlock"></A> ipc_unlock()</H3>

<P>ipc_unlock() releases the global spinlock for the indicated IPC
type.
<H3><A NAME="ipc_lockall"></A> ipc_lockall()</H3>

<P>ipc_lockall() locks the global spinlock for the given
IPC mechanism (i.e. shared memory, semaphores, or messaging).
<H3><A NAME="ipc_unlockall"></A> ipc_unlockall()</H3>

<P>ipc_unlockall() unlocks the global spinlock for the given
IPC mechanism (i.e. shared memory, semaphores, or messaging).
<H3><A NAME="ipc_get"></A> ipc_get()</H3>

<P>ipc_get() takes a pointer to a particular IPC type
(i.e. shared memory, semaphores, or message queues) and a
descriptor ID, and returns a pointer to the corresponding
IPC descriptor.  Note that although the descriptors for each
IPC type are of different data types, the common
<A HREF="#struct_kern_ipc_perm">kern_ipc_perm</A>
structure type is embedded as the first entity in every case.
The ipc_get() function returns this common data type. The expected
model is that ipc_get() is called through a wrapper function
(e.g. shm_get()) which casts the data type to the correct
descriptor data type.
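<P>The wrapper pattern relies on the common structure being the first member
of each descriptor, so that a pointer to one can be converted to a pointer
to the other. A minimal sketch, using hypothetical <CODE>sketch_</CODE> types and an
assumed multiplier value, follows.

```c
#include <stddef.h>

/* Assumed value for illustration; the kernel defines SEQ_MULTIPLIER. */
#define SKETCH_SEQ_MULTIPLIER 32768

/* Hypothetical common part shared by every descriptor type. */
struct sketch_perm {
    int key;
};

/* Hypothetical shared memory descriptor; the common part comes first. */
struct sketch_shm {
    struct sketch_perm perm;    /* must remain the first member */
    unsigned long      segsz;
};

struct sketch_ids {
    int                  size;
    struct sketch_perm **entries;
};

/* Generic lookup: returns the common part, or NULL if out of range. */
struct sketch_perm *sketch_ipc_get(struct sketch_ids *ids, int id)
{
    int index = id % SKETCH_SEQ_MULTIPLIER;

    if (index < 0 || index >= ids->size)
        return NULL;
    return ids->entries[index];
}

/* Type-specific wrapper: casts the common part back to the descriptor. */
struct sketch_shm *sketch_shm_get(struct sketch_ids *ids, int id)
{
    return (struct sketch_shm *)sketch_ipc_get(ids, id);
}
```

<P>The cast is valid because a pointer to a structure and a pointer to its
first member refer to the same address.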
<H3><A NAME="ipc_parse_version"></A> ipc_parse_version()</H3>

<P>ipc_parse_version() removes the IPC_64 flag from the command
if it is present and returns either IPC_64 or IPC_OLD.
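<P>The flag handling amounts to a test and a mask, as in this sketch. The
constant values are assumptions chosen for the example, and the function
name is hypothetical.

```c
/* Assumed values for illustration. */
#define SKETCH_IPC_OLD 0
#define SKETCH_IPC_64  0x0100

/*
 * Strips the version flag from *cmd when present and reports which
 * version of the user-space structures the caller expects.
 */
int sketch_parse_version(int *cmd)
{
    if (*cmd & SKETCH_IPC_64) {
        *cmd &= ~SKETCH_IPC_64;
        return SKETCH_IPC_64;
    }
    return SKETCH_IPC_OLD;
}
```

<P>After the call, the command value can be compared directly against the
plain command codes regardless of which version was requested.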
<H3><A NAME="ipc_structures"></A> Generic IPC Structures used with Semaphores, Messages, and Shared Memory</H3>

<P>The semaphore, message, and shared memory mechanisms all make
use of the following common structures:
<P>
<H3><A NAME="struct_kern_ipc_perm"></A> struct kern_ipc_perm</H3>

<P>Each of the IPC descriptors has a data object of this type
as the first element. This makes it possible to access any
descriptor from any of the generic IPC functions using a pointer
of this data type.
<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
/* used by in-kernel data structures */
struct kern_ipc_perm {
    key_t key;
    uid_t uid;
    gid_t gid;
    uid_t cuid;
    gid_t cgid;
    mode_t mode;
    unsigned long seq;
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<H3><A NAME="struct_ipc_ids"></A> struct ipc_ids</H3>

<P>The ipc_ids structure describes the common data for semaphores,
message queues, and shared memory. There are three global instances of
this data structure, <CODE>sem_ids</CODE>,
<CODE>msg_ids</CODE> and <CODE>shm_ids</CODE>, for
semaphores, messages and shared memory respectively. In each
instance, the <CODE>sem</CODE> kernel semaphore is used to
protect access to the structure.
The <CODE>entries</CODE> field points to an IPC
descriptor array, and the
<CODE>ary</CODE> spinlock protects access to this array.  The
<CODE>seq</CODE> field is a global sequence number which is
incremented when a new IPC resource is created.
<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
struct ipc_ids {
    int size;
    int in_use;
    int max_id;
    unsigned short seq;
    unsigned short seq_max;
    struct semaphore sem;
    spinlock_t ary;
    struct ipc_id* entries;
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<H3><A NAME="struct_ipc_id"></A> struct ipc_id</H3>

<P>An array of struct ipc_id exists in each instance of
the
<A HREF="#struct_ipc_ids">ipc_ids</A> structure.
The array is dynamically allocated and may be replaced with
a larger array by
<A HREF="#grow_ary">grow_ary()</A>
as required. The array is
sometimes referred to as the descriptor array, since the
<A HREF="#struct_kern_ipc_perm">kern_ipc_perm</A> data
type is used as the common descriptor data type by the IPC generic
functions.
<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
struct ipc_id {
    struct kern_ipc_perm* p;
};
</PRE>
<HR>
</CODE></BLOCKQUOTE>
</BODY>
</HTML>