mirror of https://github.com/mkerrisk/man-pages
futex.2: Terminology fixes
Here is the result of a first pass over futex.2. I tried to do nothing that is too controversial. I tried to consistently apply the terminology that at least Darren and I had in mind; but please check again. The major changes are in how futexes are described in the introductory parts of the page. I hope it's easier to understand now. I've also tried to add some more precision to the description of the synchronization semantics (e.g., it makes a difference whether we claim something is atomic (without further qualification), or just atomic with respect to other futex operations). In some cases, that adds some verbosity to the text, but I believe that this is worth it for the clarity and consistency in using terms.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
parent 63de469c6a
commit 4b35dc5dab
318
man2/futex.2
|
@ -40,50 +40,65 @@ There is no glibc wrapper for this system call; see NOTES.
|
||||||
.PP
|
.PP
|
||||||
The
|
The
|
||||||
.BR futex ()
|
.BR futex ()
|
||||||
system call provides a method for
|
system call provides a method for waiting until a certain condition becomes
|
||||||
a program to wait for a value at a given address to change, and a
|
true. It is typically used as a blocking construct in the context of
|
||||||
method to wake up anyone waiting on a particular address.
|
shared-memory synchronization: The program implements the majority of the
|
||||||
|
synchronization in user space, and uses one of the operations of the system call
|
||||||
|
when it is likely that it has to block for a longer time until the condition
|
||||||
|
becomes true. The program uses another operation of the system call to wake
|
||||||
|
anyone waiting for a particular condition.
|
||||||
|
|
||||||
|
The condition is represented by the futex word, which is an address in memory
|
||||||
|
supplied to the
|
||||||
|
.BR futex ()
|
||||||
|
system call, and the value at this memory location.
|
||||||
(While the virtual addresses for the same memory in separate
|
(While the virtual addresses for the same memory in separate
|
||||||
processes may not be equal,
|
processes may not be equal,
|
||||||
the kernel maps them internally so that the same memory mapped
|
the kernel maps them internally so that the same memory mapped
|
||||||
in different locations will correspond for
|
in different locations will correspond for
|
||||||
.BR futex ()
|
.BR futex ()
|
||||||
calls.)
|
calls.)
|
||||||
This system call is typically used to
|
|
||||||
implement the contended case of a lock in shared memory, as
|
|
||||||
described in
|
|
||||||
.BR futex (7).
|
|
||||||
|
|
||||||
In the uncontended case,
|
When executing a futex operation that requests to block a thread, the kernel
|
||||||
all operations on the futex memory location are performed
|
will only block if the futex word has the value that the calling thread
|
||||||
in user space using atomic machine-language instructions,
|
supplied as the expected value. The load from the futex word, the comparison with
|
||||||
and the kernel maintains no information about the futex.
|
the expected value, and the actual blocking will happen atomically and totally
|
||||||
The kernel allocates state information for the futex only
|
ordered with respect to concurrently executing futex operations on the same
|
||||||
in the contended case, when operations such as
|
futex word, such as operations that wake threads blocked on this futex word.
|
||||||
|
Thus, the futex word is used to connect the synchronization in user space with
|
||||||
|
the implementation of blocking by the kernel; similar to an atomic
|
||||||
|
compare-and-exchange operation that potentially changes shared memory,
|
||||||
|
blocking via a futex is an atomic compare-and-block operation. See NOTES for
|
||||||
|
a detailed specification of the synchronization semantics.
|
||||||
|
|
||||||
|
One example use of futexes is implementing locks. The state of the lock (i.e.,
|
||||||
|
acquired or not acquired) can be represented as an atomically accessed flag
|
||||||
|
in shared memory. In the uncontended case, a thread can access or modify the
|
||||||
|
lock state with atomic instructions, for example atomically changing it from
|
||||||
|
not acquired to acquired using an atomic compare-and-exchange instruction. If
|
||||||
|
a thread cannot acquire a lock because it is already acquired by another
|
||||||
|
thread, it can request to block if and only if the lock is still acquired by
|
||||||
|
using the lock's flag as the futex word and expecting a value that represents the
|
||||||
|
acquired state. When releasing the lock, a thread has to first reset the
|
||||||
|
lock state to not acquired and then execute the futex operation that wakes
|
||||||
|
one thread blocked on the futex word that is the lock's flag (this can be
|
||||||
|
further optimized to avoid unnecessary wake-ups). See
|
||||||
|
.BR futex (7)
|
||||||
|
for more detail on how to use futexes.
|
||||||
|
|
||||||
|
Besides the basic wait and wake-up futex functionality, there are further
|
||||||
|
futex operations aimed at supporting more complex use cases. Also note that
|
||||||
|
no explicit initialization or destruction is necessary to use futexes; the
|
||||||
|
kernel maintains a futex (i.e., the kernel-internal implementation artifact)
|
||||||
|
only while operations such as
|
||||||
.BR FUTEX_WAIT ,
|
.BR FUTEX_WAIT ,
|
||||||
described below, are performed.
|
described below, are being performed on a particular futex word.
|
||||||
|
|
||||||
When a futex operation did not finish uncontended in user space, a
|
|
||||||
.BR futex ()
|
|
||||||
call needs to be made to the kernel to arbitrate.
|
|
||||||
Arbitration can either mean putting the caller
|
|
||||||
to sleep or, conversely, waking a waiting process or thread.
|
|
||||||
.PP
|
|
||||||
Callers of
|
|
||||||
.BR futex ()
|
|
||||||
are expected to adhere to the semantics described in
|
|
||||||
.BR futex (7).
|
|
||||||
As these semantics involve writing nonportable assembly instructions
|
|
||||||
(see the example library referred to in SEE ALSO),
|
|
||||||
this in turn probably means that most users will in fact be
|
|
||||||
library authors and not general application developers.
|
|
||||||
.\"
|
.\"
|
||||||
.SS Arguments
|
.SS Arguments
|
||||||
The
|
The
|
||||||
.I uaddr
|
.I uaddr
|
||||||
argument points to an integer which stores the counter (futex).
|
argument points to the futex word. On all platforms, futexes are four-byte
|
||||||
On all platforms, futexes are four-byte integers that
|
integers that must be aligned on a four-byte boundary.
|
||||||
must be aligned on a four-byte boundary.
|
|
||||||
The operation to perform on the futex is specified in the
|
The operation to perform on the futex is specified in the
|
||||||
.I futex_op
|
.I futex_op
|
||||||
argument;
|
argument;
|
||||||
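The lock example described in the new introductory text of this hunk can be sketched in C roughly as follows. This is a minimal illustration, not code from the man page: the names futex_call, sketch_lock, and sketch_unlock are hypothetical, the encoding 0 = not acquired and 1 = acquired is an assumption of the sketch, and the wrapper goes through syscall(2) because there is no glibc wrapper for futex().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical thin wrapper around the raw system call. */
static long futex_call(_Atomic uint32_t *uaddr, int futex_op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, futex_op, val, NULL, NULL, 0);
}

/* Assumed encoding of the lock flag: 0 = not acquired, 1 = acquired. */
static void sketch_lock(_Atomic uint32_t *flag)
{
    for (;;) {
        uint32_t expected = 0;

        /* Uncontended case: acquire entirely in user space. */
        if (atomic_compare_exchange_strong(flag, &expected, 1))
            return;

        /* Contended case: block only if the flag still reads 1.
           If it changed in the meantime, the call fails with EAGAIN;
           in every case (wake-up, EAGAIN, EINTR) we simply retry. */
        futex_call(flag, FUTEX_WAIT, 1);
    }
}

static void sketch_unlock(_Atomic uint32_t *flag)
{
    /* First reset the lock state, then wake one blocked thread. */
    atomic_store(flag, 0);
    futex_call(flag, FUTEX_WAKE, 1);
}
```

This version issues FUTEX_WAKE on every release. The optimization that the text mentions (avoiding unnecessary wake-ups) would need an additional "acquired, with waiters" state and is deliberately left out here.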
|
@ -117,7 +132,7 @@ when interpreted in this fashion.
|
||||||
|
|
||||||
Where it is required, the
|
Where it is required, the
|
||||||
.IR uaddr2
|
.IR uaddr2
|
||||||
argument is a pointer to a second futex that is employed by the operation.
|
argument is a pointer to a second futex word that is employed by the operation.
|
||||||
The interpretation of the final integer argument,
|
The interpretation of the final integer argument,
|
||||||
.IR val3 ,
|
.IR val3 ,
|
||||||
depends on the operation.
|
depends on the operation.
|
||||||
|
@ -139,8 +154,8 @@ are as follows:
|
||||||
.\" commit 34f01cc1f512fa783302982776895c73714ebbc2
|
.\" commit 34f01cc1f512fa783302982776895c73714ebbc2
|
||||||
This option bit can be employed with all futex operations.
|
This option bit can be employed with all futex operations.
|
||||||
It tells the kernel that the futex is process-private and not shared
|
It tells the kernel that the futex is process-private and not shared
|
||||||
with another process
|
with another process (i.e., it is only being used for synchronization between
|
||||||
(i.e., it is being used for synchronization between threads).
|
threads of the same process).
|
||||||
This allows the kernel to choose the fast path for validating
|
This allows the kernel to choose the fast path for validating
|
||||||
the user-space address and avoids expensive VMA lookups,
|
the user-space address and avoids expensive VMA lookups,
|
||||||
taking reference counts on file backing store, and so on.
|
taking reference counts on file backing store, and so on.
|
||||||
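As a small illustration of the option bit discussed in this hunk, the sketch below ORs FUTEX_PRIVATE_FLAG into the operation when the futex word is only used by threads of one process. The function name futex_wake_private is hypothetical; the constants come from <linux/futex.h>.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: wake at most nr waiters on a process-private
   futex word. Returns the number of waiters woken, or -1 on error. */
static long futex_wake_private(uint32_t *uaddr, int nr)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE | FUTEX_PRIVATE_FLAG,
                   nr, NULL, NULL, 0);
}
```

Waiters and wakers must agree on the flag: a thread that waits with the private flag is expected to be woken by an operation that also uses the private flag.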
|
@ -191,24 +206,32 @@ is one of the following:
|
||||||
.BR FUTEX_WAIT " (since Linux 2.6.0)"
|
.BR FUTEX_WAIT " (since Linux 2.6.0)"
|
||||||
.\" Strictly speaking, since some time in 2.5.x
|
.\" Strictly speaking, since some time in 2.5.x
|
||||||
This operation tests that the value at the
|
This operation tests that the value at the
|
||||||
location pointed to by the futex address
|
futex word pointed to by the address
|
||||||
.I uaddr
|
.I uaddr
|
||||||
still contains the value
|
still contains the expected value
|
||||||
.IR val ,
|
.IR val ,
|
||||||
and then sleeps awaiting
|
and if so, then sleeps awaiting
|
||||||
.B FUTEX_WAKE
|
.B FUTEX_WAKE
|
||||||
on the futex address.
|
on the futex word. The load of the value of the futex word is an atomic memory
|
||||||
The test and sleep steps are performed atomically.
|
access (i.e., using atomic machine instructions of the respective
|
||||||
|
architecture). This load, the comparison with the expected value, and
|
||||||
|
starting to sleep are performed atomically and totally ordered with respect
|
||||||
|
to other futex operations on the same futex word. If the thread starts to
|
||||||
|
sleep, it is considered a waiter on this futex word.
|
||||||
If the futex value does not match
|
If the futex value does not match
|
||||||
.IR val ,
|
.IR val ,
|
||||||
then the call fails immediately with the error
|
then the call fails immediately with the error
|
||||||
.BR EAGAIN .
|
.BR EAGAIN .
|
||||||
.\" FIXME I added the following sentence. Please confirm that it is correct.
|
|
||||||
The purpose of the test step is to detect races where
|
The purpose of the comparison with the expected value is to prevent lost
|
||||||
another process or thread changes the value of the futex between
|
wake-ups: If another thread changed the value of the futex word after the
|
||||||
the time it was last checked and the time of the
|
calling thread decided to block based on the prior value, and if the other
|
||||||
|
thread executed a
|
||||||
|
.BR FUTEX_WAKE
|
||||||
|
operation (or similar wake-up) after the value change and before this
|
||||||
.BR FUTEX_WAIT
|
.BR FUTEX_WAIT
|
||||||
operation.
|
operation, then the latter will observe the value change and will not start
|
||||||
|
to sleep.
|
||||||
|
|
||||||
If the
|
If the
|
||||||
.I timeout
|
.I timeout
|
||||||
|
@ -230,14 +253,15 @@ and
|
||||||
.I val3
|
.I val3
|
||||||
are ignored.
|
are ignored.
|
||||||
|
|
||||||
For
|
.\" XXX I think we should remove this. Or maybe adapt to a different example.
|
||||||
.BR futex (7),
|
.\" For
|
||||||
this call is executed if decrementing the count gave a negative value
|
.\" .BR futex (7),
|
||||||
(indicating contention),
|
.\" this call is executed if decrementing the count gave a negative value
|
||||||
and will sleep until another process or thread releases
|
.\" (indicating contention),
|
||||||
the futex and executes the
|
.\" and will sleep until another process or thread releases
|
||||||
.B FUTEX_WAKE
|
.\" the futex and executes the
|
||||||
operation.
|
.\" .B FUTEX_WAKE
|
||||||
|
.\" operation.
|
||||||
.\"
|
.\"
|
||||||
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
|
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
|
||||||
.\"
|
.\"
|
||||||
|
@ -246,9 +270,11 @@ operation.
|
||||||
.\" Strictly speaking, since Linux 2.5.x
|
.\" Strictly speaking, since Linux 2.5.x
|
||||||
This operation wakes at most
|
This operation wakes at most
|
||||||
.I val
|
.I val
|
||||||
of the waiters that are waiting (i.e., inside
|
.\" XXX I believe FUTEX_WAIT_BITSET waiters, for example, could also be woken
|
||||||
|
.\" (therefore, make it e.g. instead of i.e.)?
|
||||||
|
of the waiters that are waiting (e.g., inside
|
||||||
.BR FUTEX_WAIT )
|
.BR FUTEX_WAIT )
|
||||||
on the futex at the address
|
on the futex word at the address
|
||||||
.IR uaddr .
|
.IR uaddr .
|
||||||
Most commonly,
|
Most commonly,
|
||||||
.I val
|
.I val
|
||||||
|
@ -267,11 +293,12 @@ and
|
||||||
.I val3
|
.I val3
|
||||||
are ignored.
|
are ignored.
|
||||||
|
|
||||||
For
|
.\" XXX I think we should remove this. Or maybe adapt to a different example.
|
||||||
.BR futex (7),
|
.\" For
|
||||||
this is executed if incrementing the count showed that there were waiters,
|
.\" .BR futex (7),
|
||||||
|
.\" this is executed if incrementing the count showed that there were waiters,
|
||||||
.\" FIXME How does "incrementing the count showed that there were waiters"?
|
.\" FIXME How does "incrementing the count showed that there were waiters"?
|
||||||
once the futex value has been set to 1 (indicating that it is available).
|
.\" once the futex value has been set to 1 (indicating that it is available).
|
||||||
.\"
|
.\"
|
||||||
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
|
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
|
||||||
.\"
|
.\"
|
||||||
|
@ -283,7 +310,7 @@ This operation creates a file descriptor that is associated with the futex at
|
||||||
The caller must close the returned file descriptor after use.
|
The caller must close the returned file descriptor after use.
|
||||||
When another process or thread performs a
|
When another process or thread performs a
|
||||||
.BR FUTEX_WAKE
|
.BR FUTEX_WAKE
|
||||||
on the futex, the file descriptor indicates as being readable with
|
on the futex word, the file descriptor indicates as being readable with
|
||||||
.BR select (2),
|
.BR select (2),
|
||||||
.BR poll (2),
|
.BR poll (2),
|
||||||
and
|
and
|
||||||
|
@ -303,6 +330,7 @@ and
|
||||||
.I val3
|
.I val3
|
||||||
are ignored.
|
are ignored.
|
||||||
|
|
||||||
|
.\" FIXME We never define "upped". Maybe just remove that sentence?
|
||||||
To prevent race conditions, the caller should test if the futex has
|
To prevent race conditions, the caller should test if the futex has
|
||||||
been upped after
|
been upped after
|
||||||
.B FUTEX_FD
|
.B FUTEX_FD
|
||||||
|
@ -319,8 +347,11 @@ from Linux 2.6.26 onward.
|
||||||
.TP
|
.TP
|
||||||
.BR FUTEX_REQUEUE " (since Linux 2.6.0)"
|
.BR FUTEX_REQUEUE " (since Linux 2.6.0)"
|
||||||
.\" Strictly speaking: from Linux 2.5.70
|
.\" Strictly speaking: from Linux 2.5.70
|
||||||
|
.\" FIXME Is there some indication that it is broken in general, or is this
|
||||||
|
.\" comment implicitly speaking about the condvar (?) use case? If the latter
|
||||||
|
.\" we might want to weaken the advice a little.
|
||||||
.IR "Avoid using this operation" .
|
.IR "Avoid using this operation" .
|
||||||
It is broken (unavoidably racy) for its intended purpose.
|
It is broken for its intended purpose.
|
||||||
Use
|
Use
|
||||||
.BR FUTEX_CMP_REQUEUE
|
.BR FUTEX_CMP_REQUEUE
|
||||||
instead.
|
instead.
|
||||||
|
@ -337,35 +368,13 @@ is ignored.)
|
||||||
.\"
|
.\"
|
||||||
.TP
|
.TP
|
||||||
.BR FUTEX_CMP_REQUEUE " (since Linux 2.6.7)"
|
.BR FUTEX_CMP_REQUEUE " (since Linux 2.6.7)"
|
||||||
This operation was added as a replacement for the earlier
|
This operation first checks whether the location
|
||||||
.BR FUTEX_REQUEUE ,
|
|
||||||
because that operation was racy for its intended use.
|
|
||||||
|
|
||||||
As with
|
|
||||||
.BR FUTEX_REQUEUE ,
|
|
||||||
the
|
|
||||||
.BR FUTEX_CMP_REQUEUE
|
|
||||||
operation is used to avoid a "thundering herd" effect when
|
|
||||||
.B FUTEX_WAKE
|
|
||||||
is used and all of the waiters that are woken up
|
|
||||||
need to acquire another futex.
|
|
||||||
It differs from
|
|
||||||
.BR FUTEX_REQUEUE
|
|
||||||
in that it first checks whether the location
|
|
||||||
.I uaddr
|
.I uaddr
|
||||||
still contains the value
|
still contains the value
|
||||||
.IR val3 .
|
.IR val3 .
|
||||||
If not, the operation fails with the error
|
If not, the operation fails with the error
|
||||||
.BR EAGAIN .
|
.BR EAGAIN .
|
||||||
.\" FIXME I added the following sentence on the rationale for
|
Otherwise, the operation wakes up a maximum of
|
||||||
.\" FUTEX_CMP_REQUEUE. Is it correct? Should it be expanded?
|
|
||||||
This additional feature of
|
|
||||||
.BR FUTEX_CMP_REQUEUE
|
|
||||||
can be used by the caller to (atomically) detect changes
|
|
||||||
in the value of the target futex at
|
|
||||||
.IR uaddr2 .
|
|
||||||
|
|
||||||
The operation wakes up a maximum of
|
|
||||||
.I val
|
.I val
|
||||||
waiters that are waiting on the futex at
|
waiters that are waiting on the futex at
|
||||||
.IR uaddr .
|
.IR uaddr .
|
||||||
|
@ -376,13 +385,31 @@ from the wait queue of the source futex at
|
||||||
.I uaddr
|
.I uaddr
|
||||||
and added to the wait queue of the target futex at
|
and added to the wait queue of the target futex at
|
||||||
.IR uaddr2 .
|
.IR uaddr2 .
|
||||||
|
|
||||||
The
|
The
|
||||||
.I val2
|
.I val2
|
||||||
argument specifies an upper limit on the number of waiters
|
argument specifies an upper limit on the number of waiters
|
||||||
that are requeued to the futex at
|
that are requeued to the futex at
|
||||||
.IR uaddr2 .
|
.IR uaddr2 .
|
||||||
|
|
||||||
|
.\" FIXME Is this correct? Or is just the decision which threads to wake or
|
||||||
|
.\" requeue part of the atomic operation?
|
||||||
|
The load from
|
||||||
|
.I uaddr
|
||||||
|
is an atomic memory access (i.e., using atomic machine instructions of the
|
||||||
|
respective architecture). This load, the comparison with
|
||||||
|
.IR val3 ,
|
||||||
|
and the requeueing of any waiters are performed atomically and totally ordered
|
||||||
|
with respect to other operations on the same futex word.
|
||||||
|
|
||||||
|
This operation was added as a replacement for the earlier
|
||||||
|
.BR FUTEX_REQUEUE .
|
||||||
|
The difference is that the check of the value at
|
||||||
|
.I uaddr
|
||||||
|
can be used to ensure that requeueing only happens under certain conditions.
|
||||||
|
Both operations can be used to avoid a "thundering herd" effect when
|
||||||
|
.B FUTEX_WAKE
|
||||||
|
is used and all of the waiters that are woken need to acquire another futex.
|
||||||
|
|
||||||
.\" FIXME Please review the following new paragraph to see if it is
|
.\" FIXME Please review the following new paragraph to see if it is
|
||||||
.\" accurate.
|
.\" accurate.
|
||||||
Typical values to specify for
|
Typical values to specify for
|
||||||
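The FUTEX_CMP_REQUEUE behavior described in this hunk (compare the word at uaddr with val3, fail with EAGAIN on mismatch, otherwise wake up to val waiters and requeue up to val2 more) could be invoked as sketched below. The wrapper name futex_cmp_requeue is hypothetical. For this operation, val2 is passed in the position that other operations use for the timeout argument.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper. On success, returns the number of waiters woken
   plus the number requeued; on failure, returns -1 and sets errno
   (EAGAIN if *uaddr did not contain val3). */
static long futex_cmp_requeue(uint32_t *uaddr, uint32_t val, uint32_t val2,
                              uint32_t *uaddr2, uint32_t val3)
{
    /* val2 occupies the fourth (timeout) argument slot. */
    return syscall(SYS_futex, uaddr, FUTEX_CMP_REQUEUE, val,
                   (unsigned long) val2, uaddr2, val3);
}
```

A caller that sees EAGAIN typically re-reads the futex word in user space and decides whether requeueing is still appropriate.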
|
@ -416,6 +443,9 @@ operation equivalent to
|
||||||
.\" commit 4732efbeb997189d9f9b04708dc26bf8613ed721
|
.\" commit 4732efbeb997189d9f9b04708dc26bf8613ed721
|
||||||
.\" Author: Jakub Jelinek <jakub@redhat.com>
|
.\" Author: Jakub Jelinek <jakub@redhat.com>
|
||||||
.\" Date: Tue Sep 6 15:16:25 2005 -0700
|
.\" Date: Tue Sep 6 15:16:25 2005 -0700
|
||||||
|
.\" FIXME The glibc condvar implementation is currently being revised (e.g.,
|
||||||
|
.\" to not use an internal lock anymore).
|
||||||
|
.\" It is probably more future-proof to remove this paragraph.
|
||||||
This operation was added to support some user-space use cases
|
This operation was added to support some user-space use cases
|
||||||
where more than one futex must be handled at the same time.
|
where more than one futex must be handled at the same time.
|
||||||
The most notable example is the implementation of
|
The most notable example is the implementation of
|
||||||
|
@ -429,7 +459,9 @@ high rates of contention and context switching.
|
||||||
|
|
||||||
The
|
The
|
||||||
.BR FUTEX_WAKE_OP
|
.BR FUTEX_WAKE_OP
|
||||||
operation is equivalent to atomically executing the following code:
|
operation is equivalent to executing the following code atomically and totally
|
||||||
|
ordered with respect to other futex operations on either of the two supplied
|
||||||
|
futex words:
|
||||||
|
|
||||||
.in +4n
|
.in +4n
|
||||||
.nf
|
.nf
|
||||||
|
@ -446,23 +478,24 @@ In other words,
|
||||||
does the following:
|
does the following:
|
||||||
.RS
|
.RS
|
||||||
.IP * 3
|
.IP * 3
|
||||||
saves the original value of the futex at
|
saves the original value of the futex word at
|
||||||
.IR uaddr2 ;
|
.IR uaddr2
|
||||||
.IP *
|
and performs an operation to modify the value of the futex at
|
||||||
performs an operation to modify the value of the futex at
|
|
||||||
.IR uaddr2 ;
|
.IR uaddr2 ;
|
||||||
|
this is an atomic read-modify-write memory access (i.e., using atomic machine
|
||||||
|
instructions of the respective architecture)
|
||||||
.IP *
|
.IP *
|
||||||
wakes up a maximum of
|
wakes up a maximum of
|
||||||
.I val
|
.I val
|
||||||
waiters on the futex
|
waiters on the futex for the futex word at
|
||||||
.IR uaddr ;
|
.IR uaddr ;
|
||||||
and
|
and
|
||||||
.IP *
|
.IP *
|
||||||
dependent on the results of a test of the original value of the futex at
|
dependent on the results of a test of the original value of the futex word at
|
||||||
.IR uaddr2 ,
|
.IR uaddr2 ,
|
||||||
wakes up a maximum of
|
wakes up a maximum of
|
||||||
.I val2
|
.I val2
|
||||||
waiters on the futex
|
waiters on the futex for the futex word at
|
||||||
.IR uaddr2 .
|
.IR uaddr2 .
|
||||||
.RE
|
.RE
|
||||||
.IP
|
.IP
|
||||||
|
@ -676,7 +709,7 @@ have their priorities raised to be the same as the high-priority task.
|
||||||
.\" based on mail discussions with Darren Hart. Does it seem okay?
|
.\" based on mail discussions with Darren Hart. Does it seem okay?
|
||||||
From a user-space perspective,
|
From a user-space perspective,
|
||||||
what makes a futex PI-aware is a policy agreement between user space
|
what makes a futex PI-aware is a policy agreement between user space
|
||||||
and the kernel about the value of the futex (described in a moment),
|
and the kernel about the value of the futex word (described in a moment),
|
||||||
coupled with the use of the PI futex operations described below
|
coupled with the use of the PI futex operations described below
|
||||||
(in particular,
|
(in particular,
|
||||||
.BR FUTEX_LOCK_PI ,
|
.BR FUTEX_LOCK_PI ,
|
||||||
|
@ -697,11 +730,13 @@ and
|
||||||
.\" significantly. Please check it.
|
.\" significantly. Please check it.
|
||||||
.\"
|
.\"
|
||||||
The PI futex operations described below differ from the other
|
The PI futex operations described below differ from the other
|
||||||
futex operations in that they impose policy on the use of the futex value:
|
futex operations in that they impose policy on the use of the value of the
|
||||||
|
futex word:
|
||||||
.IP * 3
|
.IP * 3
|
||||||
If the lock is unowned, the futex value shall be 0.
|
If the lock is not acquired, the futex word's value shall be 0.
|
||||||
.IP *
|
.IP *
|
||||||
If the lock is owned, the futex value shall be the thread ID (TID; see
|
If the lock is acquired, the futex word's value shall be the thread ID (TID;
|
||||||
|
see
|
||||||
.BR gettid (2))
|
.BR gettid (2))
|
||||||
of the owning thread.
|
of the owning thread.
|
||||||
.IP *
|
.IP *
|
||||||
|
@ -709,32 +744,34 @@ of the owning thread.
|
||||||
If the lock is owned and there are threads contending for the lock,
|
If the lock is owned and there are threads contending for the lock,
|
||||||
then the
|
then the
|
||||||
.B FUTEX_WAITERS
|
.B FUTEX_WAITERS
|
||||||
bit shall be set in the futex value; in other words, the futex value is:
|
bit shall be set in the futex word's value; in other words, this value is:
|
||||||
|
|
||||||
FUTEX_WAITERS | TID
|
FUTEX_WAITERS | TID
|
||||||
|
|
||||||
.PP
|
.PP
|
||||||
Note that a PI futex never just has the value
|
Note that a PI futex word never just has the value
|
||||||
.BR FUTEX_WAITERS ,
|
.BR FUTEX_WAITERS ,
|
||||||
which is a permissible state for non-PI futexes.
|
which is a permissible state for non-PI futexes.
|
||||||
|
|
||||||
With this policy in place,
|
With this policy in place,
|
||||||
a user-space application can acquire an unowned
|
a user-space application can acquire a not-acquired
|
||||||
lock or release an uncontended lock using atomic
|
lock or release a lock that no other threads try to acquire using atomic
|
||||||
instructions executed in user-space (e.g.,
|
instructions executed in user space (e.g., a compare-and-swap operation such
|
||||||
|
as
|
||||||
.I cmpxchg
|
.I cmpxchg
|
||||||
on the x86 architecture).
|
on the x86 architecture).
|
||||||
Locking an unowned lock simply consists of setting
|
Acquiring a lock simply consists of using compare-and-swap to atomically set
|
||||||
the futex value to the caller's TID.
|
the futex word's value to the caller's TID if its previous value was 0.
|
||||||
Releasing an uncontended lock simply requires setting the futex value to 0.
|
Releasing a lock requires using compare-and-swap to set the futex word's
|
||||||
|
value to 0 if the previous value was the expected TID.
|
||||||
|
|
||||||
If a futex is currently owned (i.e., has a nonzero value),
|
If a futex is already acquired (i.e., has a nonzero value),
|
||||||
waiters must employ the
|
waiters must employ the
|
||||||
.B FUTEX_LOCK_PI
|
.B FUTEX_LOCK_PI
|
||||||
operation to acquire the lock.
|
operation to acquire the lock.
|
||||||
If a lock is contended (i.e., the
|
If other threads are waiting for the lock, then the
|
||||||
.B FUTEX_WAITERS
|
.B FUTEX_WAITERS
|
||||||
bit is set in the futex value), the lock owner must employ the
|
bit is set in the futex value; in this case, the lock owner must employ the
|
||||||
.B FUTEX_UNLOCK_PI
|
.B FUTEX_UNLOCK_PI
|
||||||
operation to release the lock.
|
operation to release the lock.
|
||||||
|
|
||||||
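The PI policy in this hunk (futex word is 0 when the lock is not acquired, the owner's TID when acquired, and FUTEX_WAITERS | TID when other threads are waiting) suggests the following user-space fast paths. This is only a sketch under those stated rules: all four function names are hypothetical, and error handling for the kernel slow paths is reduced to passing the return value through.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fast path: acquire only if the word is 0 (lock not acquired). */
static bool pi_try_acquire_fast(_Atomic uint32_t *word, uint32_t tid)
{
    uint32_t expected = 0;
    return atomic_compare_exchange_strong(word, &expected, tid);
}

/* Fast path: release only if the word is exactly our TID, that is,
   the FUTEX_WAITERS bit is not set. */
static bool pi_try_release_fast(_Atomic uint32_t *word, uint32_t tid)
{
    uint32_t expected = tid;
    return atomic_compare_exchange_strong(word, &expected, 0);
}

static int pi_lock(_Atomic uint32_t *word)
{
    uint32_t tid = (uint32_t) syscall(SYS_gettid);

    if (pi_try_acquire_fast(word, tid))
        return 0;
    /* Already acquired: the kernel must arbitrate. */
    return (int) syscall(SYS_futex, word, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
}

static int pi_unlock(_Atomic uint32_t *word)
{
    uint32_t tid = (uint32_t) syscall(SYS_gettid);

    if (pi_try_release_fast(word, tid))
        return 0;
    /* FUTEX_WAITERS is set: the kernel must hand over the lock. */
    return (int) syscall(SYS_futex, word, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0);
}
```

Using compare-and-swap for the release, rather than a plain store of 0, is what keeps the owner from overwriting a FUTEX_WAITERS bit that the kernel set concurrently.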
|
@ -752,7 +789,7 @@ before the calling thread returns to user space.
|
||||||
It is important to note
|
It is important to note
|
||||||
.\" FIXME We need some explanation here of *why* it is important to
|
.\" FIXME We need some explanation here of *why* it is important to
|
||||||
.\" note this. Can someone explain?
|
.\" note this. Can someone explain?
|
||||||
that the kernel will update the futex value prior
|
that the kernel will update the futex word's value prior
|
||||||
to returning to user space.
|
to returning to user space.
|
||||||
Unlike the other futex operations described above,
|
Unlike the other futex operations described above,
|
||||||
the PI futex operations are designed
|
the PI futex operations are designed
|
||||||
|
@ -782,8 +819,8 @@ PI futexes are operated on by specifying one of the following values in
|
||||||
.\" Please check, in case I injected errors.
|
.\" Please check, in case I injected errors.
|
||||||
.\"
|
.\"
|
||||||
This operation is used after an attempt to acquire
|
This operation is used after an attempt to acquire
|
||||||
the futex lock via an atomic user-space instruction failed
|
the lock via an atomic user-space instruction failed
|
||||||
because the futex has a nonzero value\(emspecifically,
|
because the futex word has a nonzero value\(emspecifically,
|
||||||
because it contained the namespace-specific TID of the lock owner.
|
because it contained the namespace-specific TID of the lock owner.
|
||||||
.\" FIXME In the preceding line, what does "namespace-specific" mean?
|
.\" FIXME In the preceding line, what does "namespace-specific" mean?
|
||||||
.\" (I kept those words from tglx.)
|
.\" (I kept those words from tglx.)
|
||||||
|
@ -791,13 +828,13 @@ because it contained the namespace-specific TID of the lock owner.
|
||||||
.\" (I suppose we are talking PID namespaces here, but I want to
|
.\" (I suppose we are talking PID namespaces here, but I want to
|
||||||
.\" be sure.)
|
.\" be sure.)
|
||||||
|
|
||||||
The operation checks the value of the futex at the address
|
The operation checks the value of the futex word at the address
|
||||||
.IR uaddr .
|
.IR uaddr .
|
||||||
If the value is 0, then the kernel tries to atomically set
|
If the value is 0, then the kernel tries to atomically set
|
||||||
the futex value to the caller's TID.
|
the futex value to the caller's TID.
|
||||||
If that fails,
|
If that fails,
|
||||||
.\" FIXME What would be the cause of failure?
|
.\" FIXME What would be the cause of failure?
|
||||||
or the futex value is nonzero,
|
or the futex word's value is nonzero,
|
||||||
the kernel atomically sets the
|
the kernel atomically sets the
|
||||||
.B FUTEX_WAITERS
|
.B FUTEX_WAITERS
|
||||||
bit, which signals the futex owner that it cannot unlock the futex in
|
bit, which signals the futex owner that it cannot unlock the futex in
|
||||||
|
@ -855,6 +892,8 @@ This operation tries to acquire the futex at
|
||||||
.\" the difference(s) between FUTEX_LOCK_PI and FUTEX_TRYLOCK_PI.
|
.\" the difference(s) between FUTEX_LOCK_PI and FUTEX_TRYLOCK_PI.
|
||||||
.\" Can someone propose something?
|
.\" Can someone propose something?
|
||||||
.\"
|
.\"
|
||||||
|
.\" FIXME Additionally, we claim above that just FUTEX_WAITERS is never an
|
||||||
|
.\" allowed state.
|
||||||
It deals with the situation where the TID value at
|
It deals with the situation where the TID value at
|
||||||
.I uaddr
|
.I uaddr
|
||||||
is 0, but the
|
is 0, but the
|
||||||
|
@ -1049,7 +1088,13 @@ The return value on success depends on the operation,
|
||||||
as described in the following list:
|
as described in the following list:
|
||||||
.TP
|
.TP
|
||||||
.B FUTEX_WAIT
|
.B FUTEX_WAIT
|
||||||
Returns 0 if the caller was woken up.
|
Returns 0 if the caller was woken up. Note that a wake-up can also be
|
||||||
|
caused by common futex usage patterns in unrelated code that happened to have
|
||||||
|
previously used the futex word's memory location (e.g., typical futex-based
|
||||||
|
implementations of Pthreads mutexes can cause this under some conditions).
|
||||||
|
Therefore, callers should always conservatively assume that a return value of
|
||||||
|
0 can mean a spurious wake-up, and use the futex word's value (i.e., the
|
||||||
|
user-space synchronization scheme) to decide whether to continue to block or not.
|
||||||
.TP
|
.TP
|
||||||
.B FUTEX_WAKE
|
.B FUTEX_WAKE
|
||||||
Returns the number of waiters that were woken up.
|
Returns the number of waiters that were woken up.
|
||||||
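The advice added for the FUTEX_WAIT return value (treat a return of 0 as a possibly spurious wake-up and let the futex word decide) amounts to always waiting in a loop. A minimal sketch, in which the function name wait_while_equal and its return convention are assumptions made for illustration:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: block for as long as *word equals blocked_value.
   Returns 0 once the word holds a different value, or -1 (with errno
   set) on an unexpected error. */
static int wait_while_equal(_Atomic uint32_t *word, uint32_t blocked_value)
{
    while (atomic_load(word) == blocked_value) {
        long r = syscall(SYS_futex, word, FUTEX_WAIT, blocked_value,
                         NULL, NULL, 0);

        /* r == 0 may be a spurious wake-up, EAGAIN means the word
           already changed, EINTR means a signal arrived: in all three
           cases the loop condition re-checks the word itself. */
        if (r == -1 && errno != EAGAIN && errno != EINTR)
            return -1;
    }
    return 0;
}
```

The return value of the system call is never used to conclude that the condition holds; only the re-read of the futex word is.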
|
@ -1062,22 +1107,25 @@ Returns the number of waiters that were woken up.
|
||||||
.TP
|
.TP
|
||||||
.B FUTEX_CMP_REQUEUE
|
.B FUTEX_CMP_REQUEUE
|
||||||
Returns the total number of waiters that were woken up or
|
Returns the total number of waiters that were woken up or
|
||||||
requeued to the futex at
|
requeued to the futex for the futex word at
|
||||||
.IR uaddr2 .
|
.IR uaddr2 .
|
||||||
If this value is greater than
|
If this value is greater than
|
||||||
.IR val ,
|
.IR val ,
|
||||||
then difference is the number of waiters requeued to the futex at
|
then the difference is the number of waiters requeued to the futex for the futex
|
||||||
|
word at
|
||||||
.IR uaddr2 .
|
.IR uaddr2 .
|
||||||
.TP
|
.TP
|
||||||
.B FUTEX_WAKE_OP
|
.B FUTEX_WAKE_OP
|
||||||
Returns the total number of waiters that were woken up.
|
Returns the total number of waiters that were woken up.
|
||||||
This is the sum of the woken waiters on the two futexes at
|
This is the sum of the woken waiters on the two futexes for the futex words at
|
||||||
.I uaddr
|
.I uaddr
|
||||||
and
|
and
|
||||||
.IR uaddr2 .
|
.IR uaddr2 .
|
||||||
.TP
|
.TP
|
||||||
.B FUTEX_WAIT_BITSET
|
.B FUTEX_WAIT_BITSET
|
||||||
Returns 0 if the caller was woken up.
|
Returns 0 if the caller was woken up. See
|
||||||
|
.B FUTEX_WAIT
|
||||||
|
for how to interpret this correctly in practice.
|
||||||
.TP
|
.TP
|
||||||
.B FUTEX_WAKE_BITSET
|
.B FUTEX_WAKE_BITSET
|
||||||
Returns the number of waiters that were woken up.
|
Returns the number of waiters that were woken up.
|
||||||
|
@ -1093,15 +1141,17 @@ Returns 0 if the futex was successfully unlocked.
|
||||||
.TP
|
.TP
|
||||||
.B FUTEX_CMP_REQUEUE_PI
|
.B FUTEX_CMP_REQUEUE_PI
|
||||||
Returns the total number of waiters that were woken up or
|
Returns the total number of waiters that were woken up or
|
||||||
requeued to the futex at
|
requeued to the futex for the futex word at
|
||||||
.IR uaddr2 .
|
.IR uaddr2 .
|
||||||
If this value is greater than
|
If this value is greater than
|
||||||
.IR val ,
|
.IR val ,
|
||||||
then difference is the number of waiters requeued to the futex at
|
then difference is the number of waiters requeued to the futex for the futex
|
||||||
|
word at
|
||||||
.IR uaddr2 .
|
.IR uaddr2 .
|
||||||
.TP
|
.TP
|
||||||
.B FUTEX_WAIT_REQUEUE_PI
|
.B FUTEX_WAIT_REQUEUE_PI
|
||||||
Returns 0 if the caller was successfully requeued to the futex at
|
Returns 0 if the caller was successfully requeued to the futex for the futex
|
||||||
|
word at
|
||||||
.IR uaddr2 .
|
.IR uaddr2 .
|
||||||
.\"
|
.\"
|
||||||
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
|
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
|
||||||
|
@ -1109,10 +1159,11 @@ Returns 0 if the caller was successfully requeued to the futex at
|
||||||
.SH ERRORS
|
.SH ERRORS
|
||||||
.TP
|
.TP
|
||||||
.B EACCES
|
.B EACCES
|
||||||
No read access to futex memory.
|
No read access to the memory of a futex word.
|
||||||
.TP
|
.TP
|
||||||
.B EAGAIN
|
.B EAGAIN
|
||||||
.RB ( FUTEX_WAIT ,
|
.RB ( FUTEX_WAIT ,
|
||||||
|
.BR FUTEX_WAIT_BITSET ,
|
||||||
.BR FUTEX_WAIT_REQUEUE_PI )
|
.BR FUTEX_WAIT_REQUEUE_PI )
|
||||||
The value pointed to by
|
The value pointed to by
|
||||||
.I uaddr
|
.I uaddr
|
||||||
|
@ -1136,6 +1187,7 @@ The value pointed to by
|
||||||
is not equal to the expected value
|
is not equal to the expected value
|
||||||
.IR val3 .
|
.IR val3 .
|
||||||
.\" FIXME: Is the following sentence correct?
|
.\" FIXME: Is the following sentence correct?
|
||||||
|
.\" I would prefer to remove this sentence. --triegel@redhat.com
|
||||||
(This probably indicates a race;
|
(This probably indicates a race;
|
||||||
use the safe
|
use the safe
|
||||||
.B FUTEX_WAKE
|
.B FUTEX_WAKE
|
||||||
|
@ -1164,7 +1216,7 @@ Try again.
|
||||||
.RB ( FUTEX_LOCK_PI ,
|
.RB ( FUTEX_LOCK_PI ,
|
||||||
.BR FUTEX_TRYLOCK_PI ,
|
.BR FUTEX_TRYLOCK_PI ,
|
||||||
.BR FUTEX_CMP_REQUEUE_PI )
|
.BR FUTEX_CMP_REQUEUE_PI )
|
||||||
The futex at
|
The futex word at
|
||||||
.I uaddr
|
.I uaddr
|
||||||
is already locked by the caller.
|
is already locked by the caller.
|
||||||
.TP
|
.TP
|
||||||
|
@ -1175,7 +1227,7 @@ is already locked by the caller.
|
||||||
.\" constants are synonymous. Is there a reason that both names
|
.\" constants are synonymous. Is there a reason that both names
|
||||||
.\" are used?
|
.\" are used?
|
||||||
.RB ( FUTEX_CMP_REQUEUE_PI )
|
.RB ( FUTEX_CMP_REQUEUE_PI )
|
||||||
While requeueing a waiter to the PI futex at
|
While requeueing a waiter to the PI futex for the futex word at
|
||||||
.IR uaddr2 ,
|
.IR uaddr2 ,
|
||||||
the kernel detected a deadlock.
|
the kernel detected a deadlock.
|
||||||
.TP
|
.TP
|
||||||
|
@ -1196,7 +1248,6 @@ operation was interrupted by a signal (see
|
||||||
.BR signal (7)).
|
.BR signal (7)).
|
||||||
In kernels before Linux 2.6.22, this error could also be returned for
|
In kernels before Linux 2.6.22, this error could also be returned for
|
||||||
on a spurious wakeup; since Linux 2.6.22, this no longer happens.
|
on a spurious wakeup; since Linux 2.6.22, this no longer happens.
|
||||||
or a spurious wakeup.
|
|
||||||
.TP
|
.TP
|
||||||
.B EINVAL
|
.B EINVAL
|
||||||
The operation in
|
The operation in
|
||||||
|
@ -1353,7 +1404,7 @@ nor
|
||||||
.BR FUTEX_UNLOCK_PI ,
|
.BR FUTEX_UNLOCK_PI ,
|
||||||
.BR FUTEX_CMP_REQUEUE_PI ,
|
.BR FUTEX_CMP_REQUEUE_PI ,
|
||||||
.BR FUTEX_WAIT_REQUEUE_PI )
|
.BR FUTEX_WAIT_REQUEUE_PI )
|
||||||
A run-time check determined that the operation not available.
|
A run-time check determined that the operation is not available.
|
||||||
The PI futex operations are not implemented on all architectures and
|
The PI futex operations are not implemented on all architectures and
|
||||||
are not supported on some CPU variants.
|
are not supported on some CPU variants.
|
||||||
.TP
|
.TP
|
||||||
|
@ -1371,7 +1422,7 @@ the futex at
|
||||||
.TP
|
.TP
|
||||||
.BR EPERM
|
.BR EPERM
|
||||||
.RB ( FUTEX_UNLOCK_PI )
|
.RB ( FUTEX_UNLOCK_PI )
|
||||||
The caller does not own the futex.
|
The caller does not own the lock represented by the futex word.
|
||||||
.TP
|
.TP
|
||||||
.BR ESRCH
|
.BR ESRCH
|
||||||
.RB ( FUTEX_LOCK_PI ,
|
.RB ( FUTEX_LOCK_PI ,
|
||||||
|
@ -1379,7 +1430,7 @@ The caller does not own the futex.
|
||||||
.BR FUTEX_CMP_REQUEUE_PI )
|
.BR FUTEX_CMP_REQUEUE_PI )
|
||||||
.\" FIXME I reworded the following sentence a bit differently from
|
.\" FIXME I reworded the following sentence a bit differently from
|
||||||
.\" tglx's formulation. Is it okay?
|
.\" tglx's formulation. Is it okay?
|
||||||
The thread ID in the futex at
|
The thread ID in the futex word at
|
||||||
.I uaddr
|
.I uaddr
|
||||||
does not exist.
|
does not exist.
|
||||||
.TP
|
.TP
|
||||||
|
@ -1387,7 +1438,7 @@ does not exist.
|
||||||
.RB ( FUTEX_CMP_REQUEUE_PI )
|
.RB ( FUTEX_CMP_REQUEUE_PI )
|
||||||
.\" FIXME I reworded the following sentence a bit differently from
|
.\" FIXME I reworded the following sentence a bit differently from
|
||||||
.\" tglx's formulation. Is it okay?
|
.\" tglx's formulation. Is it okay?
|
||||||
The thread ID in the futex at
|
The thread ID in the futex word at
|
||||||
.I uaddr2
|
.I uaddr2
|
||||||
does not exist.
|
does not exist.
|
||||||
.TP
|
.TP
|
||||||
|
@ -1418,6 +1469,9 @@ This system call is Linux-specific.
|
||||||
.SH NOTES
|
.SH NOTES
|
||||||
Glibc does not provide a wrapper for this system call; call it using
|
Glibc does not provide a wrapper for this system call; call it using
|
||||||
.BR syscall (2).
|
.BR syscall (2).
|
||||||
|
.\" TODO FIXME Above, we cite this section and claim it contains details on
|
||||||
|
.\" the synchronization semantics; add the C11 equivalents here (or whatever
|
||||||
|
.\" we find consensus for).
|
||||||
.\"
|
.\"
|
||||||
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
|
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
|
||||||
.\"
|
.\"
|
||||||
|
@ -1651,3 +1705,7 @@ Futex example library, futex-*.tar.bz2 at
|
||||||
.\"
|
.\"
|
||||||
.\" FIXME Are there any other resources that should be listed
|
.\" FIXME Are there any other resources that should be listed
|
||||||
.\" in the SEE ALSO section?
|
.\" in the SEE ALSO section?
|
||||||
|
.\" FIXME We should probably refer to the glibc code here, in particular the
|
||||||
|
.\" glibc-internal futex wrapper functions that are WIP, and the
|
||||||
|
.\" generic pthread_mutex_t and perhaps condvar implementations.
|
||||||
|
|
||||||
|
|
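The FUTEX_WAIT return-value text in this diff says that a return value of 0 can mean a spurious wake-up and that the caller should consult the futex word's value to decide whether to keep blocking. A minimal sketch of that pattern follows. The helper names `futex_call`, `wait_until_nonzero`, and `set_and_wake` are hypothetical and not part of the man page, and the "nonzero means the condition holds" convention is an assumption chosen for illustration. The sketch also assumes Linux with GCC or Clang, where an `_Atomic uint32_t` has the same representation as a plain `uint32_t`.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: glibc has no futex() wrapper, so go through syscall(2). */
static long futex_call(uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/*
 * Block until the futex word becomes nonzero (assumed convention).
 * A 0 return from FUTEX_WAIT is not trusted: the loop re-reads the
 * futex word after every return, so a spurious wake-up just loops again.
 */
static int wait_until_nonzero(_Atomic uint32_t *word)
{
    while (atomic_load(word) == 0) {
        long r = futex_call((uint32_t *)word, FUTEX_WAIT, 0);
        /* EAGAIN: the word no longer held 0; EINTR: interrupted by a signal. */
        if (r == -1 && errno != EAGAIN && errno != EINTR)
            return -1;
    }
    return 0;
}

/* Publish the new value first, then wake every waiter on the futex word. */
static long set_and_wake(_Atomic uint32_t *word, uint32_t value)
{
    atomic_store(word, value);
    return futex_call((uint32_t *)word, FUTEX_WAKE, INT_MAX);
}
```

The waiter never interprets the return value of FUTEX_WAIT as "the condition is true"; only the value of the futex word decides that, which is what the revised text recommends.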