419 lines
12 KiB
HTML
419 lines
12 KiB
HTML
<!--startcut ==============================================-->
|
|
<!-- *** BEGIN HTML header *** -->
|
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
|
|
<HTML><HEAD>
|
|
<title>LinuxThreads Programming LG #48</title>
|
|
</HEAD>
|
|
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#0000AF"
|
|
ALINK="#FF0000">
|
|
<!-- *** END HTML header *** -->
|
|
|
|
<!-- *** BEGIN navbar *** -->
|
|
<A HREF="index.html"><IMG ALT="[ Table of Contents ]"
|
|
SRC="../gx/indexnew.gif" WIDTH=163 HEIGHT=60 ALIGN=bottom ></A>
|
|
<A HREF="../index.html"><IMG ALT="[ Front Page ]"
|
|
SRC="../gx/homenew.gif" WIDTH=163 HEIGHT=60 ALIGN=bottom></A>
|
|
<A HREF="blanchard.html"><IMG ALT="[ Prev ]" SRC="../gx/back2.gif" WIDTH=41 HEIGHT=60 ALIGN=bottom></A>
|
|
<A HREF="../faq/index.html"><IMG ALT="[ Linux Gazette FAQ ]"
|
|
SRC="./../gx/dennis/faq.gif"WIDTH=163 HEIGHT=60 ALIGN=bottom></A>
|
|
<A HREF="fischer.html"><IMG ALT="[ Next ]" SRC="../gx/fwd.gif" WIDTH=41 HEIGHT=60 ALIGN=bottom ></A>
|
|
<!-- *** END navbar *** -->
|
|
|
|
<!--endcut ============================================================-->
|
|
|
|
<H4>
|
|
"Linux Gazette...<I>making Linux just a little more fun!</I>"
|
|
</H4>
|
|
|
|
<P> <HR> <P>
|
|
<!--===================================================================-->
|
|
|
|
<center>
|
|
<H1><font color="maroon">LinuxThreads Programming</font></H1>
|
|
<H4>By <a href="mailto:matt@martine2.difi.unipi.it">Matteo Dell'Omodarme</a></H4>
|
|
</center>
|
|
<P> <HR> <P>
|
|
|
|
<!-- END header -->
|
|
|
|
|
|
|
|
|
|
<h2 align=center>Some theory...</h2>
|
|
|
|
<b>Introduction</b><p>
|
|
|
|
LinuxThreads
|
|
is a Linux library for multi-threaded programming. LinuxThreads
|
|
provides kernel-level
|
|
threads: threads are created with the <em>clone()</em> system call and
|
|
all scheduling is done in the kernel. It
|
|
implements the Posix 1003.1c
|
|
API (Application Programming Interface) for threads and runs on any
|
|
Linux system with kernel 2.0.0
|
|
or more recent, and a suitable C library.
|
|
|
|
<p>
|
|
<b>What are threads?</b>
|
|
<p>
|
|
|
|
A thread is a sequential flow of control through a
|
|
program. Thus multi-threaded programming is a
|
|
form of parallel programming where several threads of control are
|
|
executing concurrently in the program.
|
|
<p>
|
|
Multi-threaded programming differs from Unix-style multi-processing in
|
|
that all threads share the
|
|
same memory space (and a few other system resources, such as file
|
|
descriptors), instead of running in
|
|
their own memory space as is the case with Unix processes.
|
|
So a context switch between two threads in a single process is
|
|
considerably cheaper than a context
|
|
switch between two processes
|
|
<p>
|
|
There are two main reasons to use threads:
|
|
<OL>
|
|
<LI> Some programs reach their best performance only expressed as
|
|
several threads that communicate together (i.e. servers), rather than
|
|
a single flow of instructions.<P>
|
|
<LI> On a multiprocessor system, they can run in parallel on
|
|
several processors, allowing a single program to divide its work
|
|
between different processor. Such programs run faster than a single-thread
|
|
program which can exploits only a CPU at a time.<P>
|
|
</OL>
|
|
|
|
|
|
<b>Atomicity and volatility</b>
|
|
<p>
|
|
|
|
Accessing the memory shared from threads require some care, because
|
|
your parallel program can't access shared memory
|
|
objects as they were in ordinary local memory.<p>
|
|
|
|
<em>Atomicity</em> refers to the concept that an operation on an object is
|
|
accomplished as an indivisible, uninterruptible, sequence.
|
|
|
|
Operations on data in shared memory can occur not atomically,
|
|
and, in addition of that, GCC compiler will often performs some
|
|
optimizations, buffering values of shared variables in registers,
|
|
avoiding the memory operations needed to ensure that all processors
|
|
can see the changes on shared data. <p>
|
|
|
|
To prevent GCC's optimizer from buffering values of shared memory
|
|
objects in registers, all objects in shared memory should be declared
|
|
as having types with the <em>volatile</em> attribute, since
|
|
volatile objects reads and writes that require just one word access will
|
|
occur atomically.<p>
|
|
|
|
<b>Locks</b>
|
|
<p>
|
|
The load and store of the result are
|
|
separate memory transactions: ++<em>i</em> doesn't always work to
|
|
add one to a variable <em>i</em>
|
|
in shared memory because other processors could access <em>i</em>
|
|
between these two transactions. So, having two processes
|
|
both perform ++<em>i</em> might only increment <em>i</em> by one,
|
|
rather than by two.
|
|
<p>
|
|
So you need a system call that prevents a thread to work
|
|
on a variable while another one is changing its value. This
|
|
mechanism is implemented by the lock scheme, explained just below.<br>
|
|
|
|
Suppose that you have two threads running a routine which change the
|
|
value of a shared variable.
|
|
To obtain the correct result the routine must:
|
|
|
|
<ul>
|
|
<li> assert a lock on variable <em>i</em>
|
|
<li> modify the value of the locked variable
|
|
<li> remove the lock
|
|
</ul>
|
|
|
|
When a lock is asserted on a variable only the thread which locked the
|
|
variable can change its value. Even more the flux of the other thread
|
|
is blocked on the lock assertion, since only one lock at a time is
|
|
allowed for a variable. Only when the first thread remove the lock the
|
|
second one can restart asserting its own lock.<p>
|
|
|
|
Consequently using shared variables may delay memory activity from other
|
|
processors, whereas ordinary references may use local cache.
|
|
|
|
|
|
<h2 align=center>... and some practice</h2>
|
|
<p>
|
|
|
|
<b>The header pthread.h</b>
|
|
<p>
|
|
|
|
The facilities provided by LinuxThreads are available trough the
|
|
header /usr/include/pthread.h which declare the prototypes of the
|
|
thread routines.<p>
|
|
Writing a multi-thread program is basically a 2 step process:
|
|
<ul>
|
|
<li> use the pthread routines to assert locks on shared variables and
|
|
generate the threads.
|
|
<li> create a structure for all the parameters you must pass to the
|
|
thread subroutine
|
|
</ul>
|
|
|
|
Let's analyze the two steps starting from a brief description of
|
|
some basic pthread.h routines.<p>
|
|
|
|
|
|
<b>Initialize locks</b>
|
|
<p>
|
|
|
|
One of the first actions you must accomplish is initialize all the locks.
|
|
POSIX locks are declared as variables of type <em>pthread_mutex_t</em>;
|
|
to initialize each lock you will need, call the routine:
|
|
|
|
<pre>
|
|
int pthread_mutex_init(pthread_mutex_t *mutex,
|
|
const pthread_mutexattr_t *mutexattr);
|
|
</pre>
|
|
|
|
as in the costruction:
|
|
|
|
<pre>
|
|
#include <pthread.h>
|
|
...
|
|
pthread_mutex_t lock;
|
|
pthread_mutex_init(&lock,NULL);
|
|
...
|
|
</pre>
|
|
|
|
The function
|
|
<em>pthread_mutex_init</em> initializes the mutex object pointed to
|
|
by mutex according to the mutex attributes specified in
|
|
mutexattr. If mutexattr is NULL, default attributes are
|
|
used instead.<p>
|
|
|
|
In the continuation is shown how to use this initialized locks.<p>
|
|
|
|
<b>Spawning threads</b>
|
|
<p>
|
|
|
|
POSIX requires the user to declare a variable of type
|
|
<em>pthread_t</em> to identify each thread. <br>
|
|
A thread is generated by the call to:
|
|
|
|
<pre>
|
|
int pthread_create(pthread_t *thread, pthread_attr_t *attr,
|
|
void *(*start_routine)(void *), void *arg);
|
|
</pre>
|
|
|
|
On success, the identifier of the newly created thread is
|
|
stored in the location pointed by the thread argument, and
|
|
a 0 is returned. On error, a non-zero error code is
|
|
returned.<p>
|
|
|
|
To create a thread running the routine <em>f()</em> and pass to <em>f()</em>
|
|
a pointer to the variable <em>arg</em> use:
|
|
|
|
<pre>
|
|
#include <pthread.h>
|
|
...
|
|
pthread_t thread;
|
|
pthread_create(&thread, NULL, f, &arg).
|
|
...
|
|
</pre>
|
|
|
|
The routine <em>f()</em> must have the prototype:
|
|
|
|
<pre>
|
|
void *f(void *arg);
|
|
</pre>
|
|
|
|
<b>Clean termination</b>
|
|
<p>
|
|
|
|
As the last step you need to wait for the termination of all the
|
|
threads spawned before accessing the result of the routine
|
|
<em>f()</em>. The call to:
|
|
|
|
<pre>
|
|
int pthread_join(pthread_t th, void **thread_return);
|
|
</pre>
|
|
|
|
suspends the execution of the calling thread
|
|
until the thread identified by th terminates.<br>
|
|
|
|
If thread_return is not NULL, the return value of th is
|
|
stored in the location pointed to by thread_return. <p>
|
|
|
|
<b>Passing data to a thread routine</b>
|
|
<p>
|
|
There are two ways to pass informations from a caller routine to a
|
|
thread routine:
|
|
|
|
<ul>
|
|
<li> Global variables
|
|
<li> Structures
|
|
</ul>
|
|
|
|
The second one is the best choice in order to preserve the modularity
|
|
of the code.<br>
|
|
The structure must contain three levels of information; first of all
|
|
informations about the shared variables and locks, second
|
|
informations about all data needed by the routine; third an
|
|
identification index distinguishing among threads and the number of
|
|
CPU the program can exploit (making easy to provide this information
|
|
at run time).<p>
|
|
|
|
Let's inspect the first level of that structure; the information passed
|
|
must be shared among every threads, so you must use pointers to the
|
|
needed variables and locks. To pass a shared variable <em>var</em> of
|
|
the type <em>double</em>, and its lock, the structure must
|
|
contain two members:
|
|
|
|
<pre>
|
|
double volatile *var;
|
|
pthread_mutex_t *var_lock;
|
|
</pre>
|
|
|
|
Note the use of the volatile attribute, specifying that not pointer
|
|
itself but <em>var</em> is volatile.<p>
|
|
|
|
<b>Example of parallel code</b>
|
|
<p>
|
|
An example of program which can be easily parallelized using threads is the
|
|
computation of the scalar product of two vectors. <br>
|
|
The code is shown below with comments inserted.
|
|
|
|
|
|
<pre>
|
|
/* use gcc <progname> -D_REENTRANT -lpthread to compile */
|
|
|
|
#include<stdio.h>
|
|
#include<pthread.h>
|
|
|
|
/* definition of a suitable structure */
|
|
typedef struct
|
|
{
|
|
double volatile *p_s; /* the shared value of scalar product */
|
|
pthread_mutex_t *p_s_lock; /* the lock for variable s */
|
|
int n; /* the number of the thread */
|
|
int nproc; /* the number of processors to exploit */
|
|
double *x; /* data for first vector */
|
|
double *y; /* data for second vector */
|
|
int l; /* length of vectors */
|
|
} DATA;
|
|
|
|
void *SMP_scalprod(void *arg)
|
|
{
|
|
register double localsum;
|
|
long i;
|
|
DATA D = *(DATA *)arg;
|
|
|
|
localsum = 0.0;
|
|
|
|
/* Each thread start calculating the scalar product from i = D.n
|
|
with D.n = 1, 2, ... , D.nproc.
|
|
Since there are exactly D.nproc threads the increment on i is just
|
|
D.nproc */
|
|
|
|
for(i=D.n;i<D.l;i+=D.nproc)
|
|
localsum += D.x[i]*D.y[i];
|
|
|
|
/* the thread assert the lock on s ... */
|
|
pthread_mutex_lock(D.p_s_lock);
|
|
|
|
/* ... change the value of s ... */
|
|
*(D.p_s) += localsum;
|
|
|
|
/* ... and remove the lock */
|
|
pthread_mutex_unlock(D.p_s_lock);
|
|
|
|
return NULL;
|
|
}
|
|
|
|
#define L 9 /* dimension of vectors */
|
|
|
|
int main(int argc, char **argv)
|
|
{
|
|
pthread_t *thread;
|
|
void *retval;
|
|
int cpu, i;
|
|
DATA *A;
|
|
volatile double s=0; /* the shared variable */
|
|
pthread_mutex_t s_lock;
|
|
double x[L], y[L];
|
|
|
|
if(argc != 2)
|
|
{
|
|
printf("usage: %s <number of CPU>\n", argv[0]);
|
|
exit(1);
|
|
}
|
|
|
|
cpu = atoi(argv[1]);
|
|
thread = (pthread_t *) calloc(cpu, sizeof(pthread_t));
|
|
A = (DATA *)calloc(cpu, sizeof(DATA));
|
|
|
|
|
|
for(i=0;i<L;i++)
|
|
x[i] = y[i] = i;
|
|
|
|
/* initialize the lock variable */
|
|
pthread_mutex_init(&s_lock, NULL);
|
|
|
|
for(i=0;i&lt;cpu;i++)
|
|
{
|
|
/* initialize the structure */
|
|
A[i].n = i; /* the number of the thread */
|
|
A[i].x = x;
|
|
A[i].y = y;
|
|
A[i].l = L;
|
|
A[i].nproc = cpu; /* the number of CPU */
|
|
A[i].p_s = &s;
|
|
A[i].p_s_lock = &s_lock;
|
|
|
|
if(pthread_create(&thread[i], NULL, SMP_scalprod, &A[i] ))
|
|
{
|
|
fprintf(stderr, "%s: cannot make thread\n", argv[0]);
|
|
exit(1);
|
|
}
|
|
}
|
|
|
|
for(i=0;i<cpu;i++)
|
|
{
|
|
if(pthread_join(thread[i], &retval))
|
|
{
|
|
fprintf(stderr, "%s: cannot join thread\n", argv[0]);
|
|
exit(1);
|
|
}
|
|
}
|
|
|
|
printf("s = %f\n", s);
|
|
exit(0);
|
|
}
|
|
</pre>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<!-- *** BEGIN copyright *** -->
|
|
<P> <hr> <P>
|
|
<H5 ALIGN=center>
|
|
|
|
Copyright © 1999, Matteo Dell'Omodarme<BR>
|
|
Published in Issue 48 of <i>Linux Gazette</i>, December 1999</H5>
|
|
<!-- *** END copyright *** -->
|
|
|
|
<!--startcut ==========================================================-->
|
|
<P> <HR> <P>
|
|
<!-- *** BEGIN navbar *** -->
|
|
<A HREF="index.html"><IMG ALT="[ Table of Contents ]"
|
|
SRC="../gx/indexnew.gif" WIDTH=163 HEIGHT=60 ALIGN=bottom ></A>
|
|
<A HREF="../index.html"><IMG ALT="[ Front Page ]"
|
|
SRC="../gx/homenew.gif" WIDTH=163 HEIGHT=60 ALIGN=bottom></A>
|
|
<A HREF="blanchard.html"><IMG ALT="[ Prev ]" SRC="../gx/back2.gif" WIDTH=41 HEIGHT=60 ALIGN=bottom></A>
|
|
<A HREF="../faq/index.html"><IMG ALT="[ Linux Gazette FAQ ]"
|
|
SRC="./../gx/dennis/faq.gif"WIDTH=163 HEIGHT=60 ALIGN=bottom></A>
|
|
<A HREF="fischer.html"><IMG ALT="[ Next ]" SRC="../gx/fwd.gif" WIDTH=41 HEIGHT=60 ALIGN=bottom ></A>
|
|
<!-- *** END navbar *** -->
|
|
</BODY></HTML>
|
|
<!--endcut ============================================================-->
|