.\" Mirror of https://github.com/mkerrisk/man-pages
.\" Page by b.hubert
|
|
.\" and Copyright (C) 2015, Thomas Gleixner <tglx@linutronix.de>
|
|
.\" and Copyright (C) 2015, Michael Kerrisk <mtk.manpages@gmail.com>
|
|
.\"
|
|
.\" %%%LICENSE_START(FREELY_REDISTRIBUTABLE)
|
|
.\" may be freely modified and distributed
|
|
.\" %%%LICENSE_END
|
|
.\"
|
|
.\" Niki A. Rahimi (LTC Security Development, narahimi@us.ibm.com)
|
|
.\" added ERRORS section.
|
|
.\"
|
|
.\" Modified 2004-06-17 mtk
|
|
.\" Modified 2004-10-07 aeb, added FUTEX_REQUEUE, FUTEX_CMP_REQUEUE
|
|
.\"
|
|
.\" 2.6.31 adds FUTEX_WAIT_REQUEUE_PI, FUTEX_CMP_REQUEUE_PI
|
|
.\" commit 52400ba946759af28442dee6265c5c0180ac7122
|
|
.\" Author: Darren Hart <dvhltc@us.ibm.com>
|
|
.\" Date: Fri Apr 3 13:40:49 2009 -0700
|
|
.\"
|
|
.\" commit ba9c22f2c01cf5c88beed5a6b9e07d42e10bd358
|
|
.\" Author: Darren Hart <dvhltc@us.ibm.com>
|
|
.\" Date: Mon Apr 20 22:22:22 2009 -0700
|
|
.\"
|
|
.\" See Documentation/futex-requeue-pi.txt
|
|
.\"
|
|
.TH FUTEX 2 2014-05-21 "Linux" "Linux Programmer's Manual"
|
|
.SH NAME
|
|
futex \- fast user-space locking
|
|
.SH SYNOPSIS
|
|
.nf
|
|
.sp
|
|
.B "#include <linux/futex.h>"
|
|
.B "#include <sys/time.h>"
|
|
.sp
|
|
.BI "int futex(int *" uaddr ", int " futex_op ", int " val ,
.BI "          const struct timespec *" timeout , \
"   \fR  /* or: \fBu32 \fIval2\fP */
.BI "          int *" uaddr2 ", int " val3 );
|
|
.fi
|
|
|
|
.IR Note :
|
|
There is no glibc wrapper for this system call; see NOTES.
|
|
.SH DESCRIPTION
|
|
.PP
|
|
The
|
|
.BR futex ()
|
|
system call provides a method for
|
|
a program to wait for a value at a given address to change, and a
|
|
method to wake up anyone waiting on a particular address (while the
|
|
addresses for the same memory in separate processes may not be
|
|
equal, the kernel maps them internally so the same memory mapped in
|
|
different locations will correspond for
|
|
.BR futex ()
|
|
calls).
|
|
This system call is typically used to
|
|
implement the contended case of a lock in shared memory, as
|
|
described in
|
|
.BR futex (7).
|
|
.PP
|
|
When a futex operation did not finish uncontended in user space, a
|
|
.BR futex ()
|
|
call needs to be made to the kernel to arbitrate.
|
|
Arbitration can either mean putting the calling
|
|
process to sleep or, conversely, waking a waiting process.
|
|
.PP
|
|
Callers of
|
|
.BR futex ()
|
|
are expected to adhere to the semantics described in
|
|
.BR futex (7).
|
|
As these
|
|
semantics involve writing nonportable assembly instructions, this in turn
|
|
probably means that most users will in fact be library authors and not
|
|
general application developers.
|
|
.PP
|
|
The
|
|
.I uaddr
|
|
argument points to an integer which stores the counter (futex).
|
|
On all platforms, futexes are four-byte integers that
|
|
must be aligned on a four-byte boundary.
|
|
The operation to perform on the futex is specified in the
|
|
.I futex_op
|
|
argument;
|
|
.IR val
|
|
is a value whose meaning and purpose depend on
|
|
.IR futex_op .
|
|
|
|
The remaining arguments
|
|
.RI ( timeout ,
|
|
.IR uaddr2 ,
|
|
and
|
|
.IR val3 )
|
|
are required only for certain of the futex operations described below.
|
|
Where one of these arguments is not required, it is ignored.
|
|
|
|
For several blocking operations, the
|
|
.I timeout
|
|
argument is a pointer to a
|
|
.IR timespec
|
|
structure that specifies a timeout for the operation.
|
|
However, notwithstanding the prototype shown above, for some operations,
|
|
this argument is instead a four-byte integer whose meaning
|
|
is determined by the operation.
|
|
For these operations, the kernel casts the
|
|
.I timeout
|
|
value to
|
|
.IR u32 ,
|
|
and in the remainder of this page, this argument is referred to as
|
|
.I val2
|
|
when interpreted in this fashion.
|
|
|
|
Where it is required, the
|
|
.IR uaddr2
|
|
argument is a pointer to a second futex that is employed by the operation.
|
|
The interpretation of the final integer argument,
|
|
.IR val3 ,
|
|
depends on the operation.
|
|
|
|
The
|
|
.I futex_op
|
|
argument consists of two parts:
|
|
a command that specifies the operation to be performed,
|
|
bit-wise ORed with zero or more options that
|
|
modify the behaviour of the operation.
|
|
The options that may be included in
|
|
.I futex_op
|
|
are as follows:
|
|
.TP
|
|
.BR FUTEX_PRIVATE_FLAG " (since Linux 2.6.22)"
|
|
.\" commit 34f01cc1f512fa783302982776895c73714ebbc2
|
|
This option bit can be employed with all futex operations.
|
|
It tells the kernel that the futex is process private and not shared
|
|
with another process.
|
|
This allows the kernel to choose the fast path for validating
|
|
the user-space address and avoids expensive VMA lookups,
|
|
taking reference counts on file backing store, and so on.
|
|
|
|
As a convenience,
|
|
.IR <linux/futex.h>
|
|
defines a set of constants with the suffix
|
|
.BR _PRIVATE
|
|
that are equivalents of all of the operations listed below,
|
|
.\" except the obsolete FUTEX_FD, for which the "private" flag was
|
|
.\" meaningless
|
|
but with the
|
|
.BR FUTEX_PRIVATE_FLAG
|
|
ORed into the constant value.
|
|
Thus, there are
|
|
.BR FUTEX_WAIT_PRIVATE ,
|
|
.BR FUTEX_WAKE_PRIVATE ,
|
|
and so on.
|
|
.TP
|
|
.BR FUTEX_CLOCK_REALTIME " (since Linux 2.6.28)"
|
|
.\" commit 1acdac104668a0834cfa267de9946fac7764d486
|
|
This option bit can be employed only with the
|
|
.BR FUTEX_WAIT_BITSET
|
|
and
|
|
.BR FUTEX_WAIT_REQUEUE_PI
|
|
operations.
|
|
|
|
If this option is set, the kernel treats
|
|
.I timeout
|
|
as an absolute time based on
|
|
.BR CLOCK_REALTIME .
|
|
|
|
If this option is not set, the kernel treats
|
|
.I timeout
|
|
as relative time,
|
|
.\" FIXME I added CLOCK_MONOTONIC here. Is it correct?
|
|
measured against the
|
|
.BR CLOCK_MONOTONIC
|
|
clock.
|
|
.PP
|
|
The operation specified in
|
|
.I futex_op
|
|
is one of the following:
|
|
.TP
|
|
.BR FUTEX_WAIT " (since Linux 2.6.0)"
|
|
.\" Strictly speaking, since some time in 2.5.x
|
|
This operation tests that the value at the
|
|
location pointed to by the futex address
|
|
.I uaddr
|
|
still contains the value
|
|
.IR val ,
|
|
and then sleeps awaiting
|
|
.B FUTEX_WAKE
|
|
on the futex address.
|
|
The test and sleep steps are performed atomically.
|
|
If the futex value does not match
|
|
.IR val ,
|
|
then the call fails immediately with the error
|
|
.BR EAGAIN .
|
|
.\" FIXME I added the following sentence. Please confirm that it is correct.
|
|
The purpose of the test step is to detect races where
|
|
another process changes the value of the futex between
|
|
the time it was last checked and the time of the
|
|
.BR FUTEX_WAIT
|
|
operation.
|
|
|
|
If the
|
|
.I timeout
|
|
argument is non-NULL, its contents specify a relative timeout for the wait
|
|
.\" FIXME I added CLOCK_MONOTONIC here. Is it correct?
|
|
measured according to the
|
|
.BR CLOCK_MONOTONIC
|
|
clock.
|
|
(This interval will be rounded up to the system clock granularity,
|
|
and kernel scheduling delays mean that the
|
|
blocking interval may overrun by a small amount.)
|
|
If
|
|
.I timeout
|
|
is NULL, the call blocks indefinitely.
|
|
|
|
The arguments
|
|
.I uaddr2
|
|
and
|
|
.I val3
|
|
are ignored.
|
|
|
|
For
|
|
.BR futex (7),
|
|
this call is executed if decrementing the count gave a negative value
|
|
(indicating contention), and will sleep until another process releases
|
|
the futex and executes the
|
|
.B FUTEX_WAKE
|
|
operation.
|
|
.TP
|
|
.BR FUTEX_WAKE " (since Linux 2.6.0)"
|
|
.\" Strictly speaking, since Linux 2.5.x
|
|
This operation wakes at most
|
|
.I val
|
|
processes waiting (i.e., inside
|
|
.BR FUTEX_WAIT )
|
|
on the futex at the address
|
|
.IR uaddr .
|
|
Most commonly,
|
|
.I val
|
|
is specified as either 1 (wake up a single waiter) or
|
|
.BR INT_MAX
|
|
(wake up all waiters).
|
|
.\" FIXME Please confirm that the following is correct:
|
|
No guarantee is provided about which waiters are awoken
|
|
(e.g., a waiter with a higher scheduling priority is not guaranteed
|
|
to be awoken in preference to a waiter with a lower priority).
|
|
|
|
The arguments
|
|
.IR timeout ,
|
|
.IR uaddr2 ,
|
|
and
|
|
.I val3
|
|
are ignored.
|
|
|
|
For
|
|
.BR futex (7),
|
|
this is executed if incrementing
|
|
the count showed that there were waiters, once the futex value has been set
|
|
to 1 (indicating that it is available).
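.IP
The two operations above can be sketched as thin wrappers around
.BR syscall (2).
This is a minimal sketch, not a definitive implementation: the helper names
futex_wait() and futex_wake() are hypothetical, the code assumes that
SYS_futex is defined on the target architecture,
and it relies on the ETIMEDOUT error that the kernel reports
when a relative timeout expires.
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not a glibc function): block while
   *uaddr == expected. Returns 0 when woken; otherwise returns -1
   with errno set (EAGAIN if the value already differed,
   ETIMEDOUT if the relative timeout expired). */
static long futex_wait(uint32_t *uaddr, uint32_t expected,
                       const struct timespec *rel_timeout)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAIT, expected,
                   rel_timeout, NULL, 0);
}

/* Hypothetical helper: wake at most nwake waiters on uaddr.
   Returns the number of waiters woken, or -1 on error. */
static long futex_wake(uint32_t *uaddr, int nwake)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE, nwake,
                   NULL, NULL, 0);
}
```
.fi
.in
.IP
A caller would typically re-check the futex value in a loop around
futex_wait(), since a return with EAGAIN or EINTR
does not by itself mean that the lock is available.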
|
|
.TP
|
|
.BR FUTEX_FD " (from Linux 2.6.0 up to and including Linux 2.6.25)"
|
|
.\" Strictly speaking, from Linux 2.5.x to 2.6.25
|
|
This operation creates a file descriptor that is associated with the futex at
|
|
.IR uaddr .
|
|
.\" , suitable for .BR poll (2).
|
|
The calling process must close the returned file descriptor after use.
|
|
When another process performs a
|
|
.BR FUTEX_WAKE
|
|
on the futex, the file descriptor is reported as readable by
.BR select (2),
.BR poll (2),
and
.BR epoll (7).
|
|
|
|
The file descriptor can be used to obtain asynchronous notifications:
|
|
if
|
|
.I val
|
|
is nonzero, then when another process executes a
|
|
.BR FUTEX_WAKE ,
|
|
the caller will receive the signal number that was passed in
|
|
.IR val .
|
|
|
|
The arguments
|
|
.IR timeout ,
|
|
.I uaddr2
|
|
and
|
|
.I val3
|
|
are ignored.
|
|
|
|
To prevent race conditions, the caller should test if the futex has
|
|
been upped after
|
|
.B FUTEX_FD
|
|
returns.
|
|
|
|
Because it was inherently racy,
|
|
.B FUTEX_FD
|
|
has been removed
|
|
.\" commit 82af7aca56c67061420d618cc5a30f0fd4106b80
|
|
from Linux 2.6.26 onward.
|
|
.TP
|
|
.BR FUTEX_REQUEUE " (since Linux 2.6.0)"
|
|
.\" Strictly speaking: from Linux 2.5.70
|
|
.\"
|
|
.\" FIXME I added this warning. Okay?
|
|
.IR "Avoid using this operation" .
|
|
It is broken (unavoidably racy) for its intended purpose.
|
|
Use
|
|
.BR FUTEX_CMP_REQUEUE
|
|
instead.
|
|
|
|
This operation performs the same task as
|
|
.BR FUTEX_CMP_REQUEUE ,
|
|
except that no check is made using the value in
|
|
.IR val3 .
|
|
(The argument
|
|
.I val3
|
|
is ignored.)
|
|
.TP
|
|
.BR FUTEX_CMP_REQUEUE " (since Linux 2.6.7)"
|
|
This operation was added as a replacement for the earlier
|
|
.BR FUTEX_REQUEUE ,
|
|
because that operation was racy for its intended use.
|
|
|
|
As with
|
|
.BR FUTEX_REQUEUE ,
|
|
the
|
|
.BR FUTEX_CMP_REQUEUE
|
|
operation is used to avoid a "thundering herd" effect when
|
|
.B FUTEX_WAKE
|
|
is used and all processes woken up need to acquire another futex.
|
|
It differs from
|
|
.BR FUTEX_REQUEUE
|
|
in that it first checks whether the location
|
|
.I uaddr
|
|
still contains the value
|
|
.IR val3 .
|
|
If not, the operation fails with the error
|
|
.BR EAGAIN .
|
|
.\" FIXME I added the following sentence on rational for FUTEX_CMP_REQUEUE.
|
|
.\" Is it correct? SHould it be expanded?
|
|
This additional feature of
|
|
.BR FUTEX_CMP_REQUEUE
|
|
can be used by the caller to (atomically) detect changes
|
|
in the value of the source futex at
.IR uaddr .
|
|
|
|
The operation wakes up a maximum of
|
|
.I val
|
|
waiters that are waiting on the futex at
|
|
.IR uaddr .
|
|
If there are more than
|
|
.I val
|
|
waiters, then the remaining waiters are removed
|
|
from the wait queue of the source futex at
|
|
.I uaddr
|
|
and added to the wait queue of the target futex at
|
|
.IR uaddr2 .
|
|
|
|
The
|
|
.I val2
|
|
argument specifies an upper limit on the number of waiters
|
|
that are requeued to the futex at
|
|
.IR uaddr2 .
|
|
|
|
.\" FIXME Please review the following new paragraph to see if it is
|
|
.\" accurate.
|
|
Typical values to specify for
|
|
.I val
|
|
are 0 or 1.
|
|
(Specifying
|
|
.BR INT_MAX
|
|
is not useful, because it would make the
|
|
.BR FUTEX_CMP_REQUEUE
|
|
operation equivalent to
|
|
.BR FUTEX_WAKE .)
|
|
The limit value specified via
|
|
.I val2
|
|
is typically either 1 or
|
|
.BR INT_MAX .
|
|
(Specifying the argument as 0 is not useful, because it would make the
|
|
.BR FUTEX_CMP_REQUEUE
|
|
operation equivalent to
|
|
.BR FUTEX_WAKE .)
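.IP
As an illustration of the argument layout, the following sketch wakes at most
one waiter and requeues the rest.
The helper name requeue_all_but_one() is hypothetical;
the sketch assumes that SYS_futex is available and passes
.I val2
as an integer in the
.I timeout
argument position, as described earlier on this page.
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: if *src still equals expected, wake at most
   one waiter on src and move the remaining waiters to dst.
   Returns the number of waiters woken or requeued, or -1 with
   errno set (EAGAIN if *src != expected). */
static long requeue_all_but_one(uint32_t *src, uint32_t *dst,
                                uint32_t expected)
{
    return syscall(SYS_futex, src, FUTEX_CMP_REQUEUE,
                   1,               /* val: wake at most one */
                   (long) INT_MAX,  /* val2: requeue limit */
                   dst,             /* uaddr2: target futex */
                   expected);       /* val3: expected *src */
}
```
.fi
.in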
|
|
.\"
|
|
.\" FIXME I added some FUTEX_WAKE_OP text, and I'd be happy if someone
|
|
.\" checked it.
|
|
.TP
|
|
.BR FUTEX_WAKE_OP " (since Linux 2.6.14)"
|
|
.\" commit 4732efbeb997189d9f9b04708dc26bf8613ed721
|
|
.\" Author: Jakub Jelinek <jakub@redhat.com>
|
|
.\" Date: Tue Sep 6 15:16:25 2005 -0700
|
|
This operation was added to support some user-space use cases
|
|
where more than one futex must be handled at the same time.
|
|
The most notable example is the implementation of
|
|
.BR pthread_cond_signal (3),
|
|
which requires operations on two futexes,
|
|
the one used to implement the mutex and the one used in the implementation
|
|
of the wait queue associated with the condition variable.
|
|
.BR FUTEX_WAKE_OP
|
|
allows such cases to be implemented without leading to
|
|
high rates of contention and context switching.
|
|
|
|
The
|
|
.BR FUTEX_WAKE_OP
|
|
operation is equivalent to atomically executing the following code:
|
|
|
|
.in +4n
|
|
.nf
int oldval = *(int *) uaddr2;
*(int *) uaddr2 = oldval \fIop\fP \fIoparg\fP;
futex(uaddr, FUTEX_WAKE, val, 0, 0, 0);
if (oldval \fIcmp\fP \fIcmparg\fP)
    futex(uaddr2, FUTEX_WAKE, val2, 0, 0, 0);
.fi
|
|
.in
|
|
|
|
In other words,
|
|
.BR FUTEX_WAKE_OP
|
|
does the following:
|
|
.RS
|
|
.IP * 3
|
|
saves the original value of the futex at
|
|
.IR uaddr2 ;
|
|
.IP *
|
|
performs an operation to modify the value of the futex at
|
|
.IR uaddr2 ;
|
|
.IP *
|
|
wakes up a maximum of
|
|
.I val
|
|
waiters on the futex
|
|
.IR uaddr ;
|
|
and
|
|
.IP *
|
|
dependent on the results of a test of the original value of the futex at
|
|
.IR uaddr2 ,
|
|
wakes up a maximum of
|
|
.I val2
|
|
waiters on the futex
|
|
.IR uaddr2 .
|
|
.RE
|
|
.IP
|
|
The operation and comparison that are to be performed are encoded
|
|
in the bits of the argument
|
|
.IR val3 .
|
|
Pictorially, the encoding is:
|
|
|
|
.in +8n
|
|
.nf
+---+---+-----------+-----------+
|op |cmp|   oparg   |  cmparg   |
+---+---+-----------+-----------+
  4   4       12          12    <== # of bits
.fi
|
|
.in
|
|
|
|
Expressed in code, the encoding is:
|
|
|
|
.in +4n
|
|
.nf
#define FUTEX_OP(op, oparg, cmp, cmparg) \\
                (((op & 0xf) << 28) | \\
                ((cmp & 0xf) << 24) | \\
                ((oparg & 0xfff) << 12) | \\
                (cmparg & 0xfff))
.fi
|
|
.in
|
|
|
|
In the above,
|
|
.I op
|
|
and
|
|
.I cmp
|
|
are each one of the codes listed below.
|
|
The
|
|
.I oparg
|
|
and
|
|
.I cmparg
|
|
components are literal numeric values, except as noted below.
|
|
|
|
The
|
|
.I op
|
|
component has one of the following values:
|
|
|
|
.in +4n
|
|
.nf
FUTEX_OP_SET        0  /* uaddr2 = oparg; */
FUTEX_OP_ADD        1  /* uaddr2 += oparg; */
FUTEX_OP_OR         2  /* uaddr2 |= oparg; */
FUTEX_OP_ANDN       3  /* uaddr2 &= ~oparg; */
FUTEX_OP_XOR        4  /* uaddr2 ^= oparg; */
.fi
|
|
.in
|
|
|
|
In addition, bit-wise ORing the following value into
|
|
.I op
|
|
causes
|
|
.IR "(1\ <<\ oparg)"
|
|
to be used as the operand:
|
|
|
|
.in +4n
|
|
.nf
FUTEX_OP_OPARG_SHIFT  8  /* Use (1 << oparg) as operand */
.fi
|
|
.in
|
|
|
|
The
|
|
.I cmp
|
|
field is one of the following:
|
|
|
|
.in +4n
|
|
.nf
FUTEX_OP_CMP_EQ     0  /* if (oldval == cmparg) wake */
FUTEX_OP_CMP_NE     1  /* if (oldval != cmparg) wake */
FUTEX_OP_CMP_LT     2  /* if (oldval < cmparg) wake */
FUTEX_OP_CMP_LE     3  /* if (oldval <= cmparg) wake */
FUTEX_OP_CMP_GT     4  /* if (oldval > cmparg) wake */
FUTEX_OP_CMP_GE     5  /* if (oldval >= cmparg) wake */
.fi
|
|
.in
|
|
|
|
The return value of
|
|
.BR FUTEX_WAKE_OP
|
|
is the sum of the number of waiters woken on the futex
|
|
.IR uaddr
|
|
plus the number of waiters woken on the futex
|
|
.IR uaddr2 .
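.IP
The encoding of
.I val3
can be explored with a small user-space model.
The sketch below is an illustration only, not kernel code:
all names in it are hypothetical, the numeric codes are copied from the
tables above, and it assumes small non-negative
.I oparg
and
.I cmparg
values (it does not model negative 12-bit arguments).
.in +4n
.nf
```c
#include <stdbool.h>
#include <stdint.h>

/* Codes copied from the tables above (hypothetical local names). */
enum { OP_SET = 0, OP_ADD = 1, OP_OR = 2, OP_ANDN = 3, OP_XOR = 4,
       OP_ARG_SHIFT = 8 };
enum { CMP_EQ = 0, CMP_NE = 1, CMP_LT = 2, CMP_LE = 3,
       CMP_GT = 4, CMP_GE = 5 };

/* Pack op, cmp, oparg and cmparg into the 4/4/12/12-bit layout. */
static uint32_t encode_val3(uint32_t op, uint32_t oparg,
                            uint32_t cmp, uint32_t cmparg)
{
    return ((op & 0xf) << 28) | ((cmp & 0xf) << 24) |
           ((oparg & 0xfff) << 12) | (cmparg & 0xfff);
}

/* Model of the update step: returns the new value for *uaddr2. */
static int32_t model_apply_op(int32_t oldval, uint32_t val3)
{
    uint32_t op = (val3 >> 28) & 0xf;
    int32_t oparg = (int32_t) ((val3 >> 12) & 0xfff);

    if (op & OP_ARG_SHIFT)          /* use (1 << oparg) as operand */
        oparg = (int32_t) (1u << (oparg & 31));

    switch (op & 7) {
    case OP_SET:  return oparg;
    case OP_ADD:  return oldval + oparg;
    case OP_OR:   return oldval | oparg;
    case OP_ANDN: return oldval & ~oparg;
    case OP_XOR:  return oldval ^ oparg;
    default:      return oldval;    /* unknown code: leave unchanged */
    }
}

/* Model of the test step: true if waiters on uaddr2 would be woken. */
static bool model_cmp(int32_t oldval, uint32_t val3)
{
    uint32_t cmp = (val3 >> 24) & 0xf;
    int32_t cmparg = (int32_t) (val3 & 0xfff);

    switch (cmp) {
    case CMP_EQ: return oldval == cmparg;
    case CMP_NE: return oldval != cmparg;
    case CMP_LT: return oldval < cmparg;
    case CMP_LE: return oldval <= cmparg;
    case CMP_GT: return oldval > cmparg;
    case CMP_GE: return oldval >= cmparg;
    default:     return false;
    }
}
```
.fi
.in
.IP
For example, encode_val3(OP_ADD, 1, CMP_GT, 0) describes
"add 1 to the value at uaddr2, and wake waiters on uaddr2
if the old value was greater than 0".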
|
|
.TP
|
|
.BR FUTEX_WAIT_BITSET " (since Linux 2.6.25)"
|
|
.\" commit cd689985cf49f6ff5c8eddc48d98b9d581d9475d
|
|
This operation is like
|
|
.BR FUTEX_WAIT
|
|
except that
|
|
.I val3
|
|
is used to provide a 32-bit bitset to the kernel.
|
|
This bitset is stored in the kernel-internal state of the waiter.
|
|
See the description of
|
|
.BR FUTEX_WAKE_BITSET
|
|
for further details.
|
|
|
|
The
|
|
.BR FUTEX_WAIT_BITSET
|
|
also interprets the
|
|
.I timeout
|
|
argument differently from
|
|
.BR FUTEX_WAIT .
|
|
See the discussion of
|
|
.BR FUTEX_CLOCK_REALTIME ,
|
|
above.
|
|
|
|
The
|
|
.I uaddr2
|
|
argument is ignored.
|
|
.TP
|
|
.BR FUTEX_WAKE_BITSET " (since Linux 2.6.25)"
|
|
.\" commit cd689985cf49f6ff5c8eddc48d98b9d581d9475d
|
|
This operation is the same as
|
|
.BR FUTEX_WAKE
|
|
except that the
|
|
.I val3
|
|
argument is used to provide a 32-bit bitset to the kernel.
|
|
This bitset is used to select which waiters should be woken up.
|
|
The selection is done by a bit-wise AND of the "wake" bitset
|
|
(i.e., the value in
|
|
.IR val3 )
|
|
and the bitset which is stored in the kernel-internal
|
|
state of the waiter (the "wait" bitset that is set using
|
|
.BR FUTEX_WAIT_BITSET ).
|
|
All of the waiters for which the result of the AND is nonzero are woken up;
|
|
the remaining waiters are left sleeping.
|
|
|
|
.\" FIXME please review this paragraph that I added
|
|
The effect of
|
|
.BR FUTEX_WAIT_BITSET
|
|
and
|
|
.BR FUTEX_WAKE_BITSET
|
|
is to allow selective wake-ups among multiple waiters that are waiting
|
|
on the same futex;
|
|
since a futex has a size of 32 bits,
|
|
these operations provide 32 wakeup "channels".
|
|
(The
|
|
.BR FUTEX_WAIT
|
|
and
|
|
.BR FUTEX_WAKE
|
|
operations correspond to
|
|
.BR FUTEX_WAIT_BITSET
|
|
and
|
|
.BR FUTEX_WAKE_BITSET
|
|
operations where the bitsets are all ones.)
|
|
Note, however, that using this bitset multiplexing feature on a
|
|
futex is less efficient than simply using multiple futexes,
|
|
because employing bitset multiplexing requires the kernel
|
|
to check all waiters on a futex,
|
|
including those that are not interested in being woken up
|
|
(i.e., they do not have the relevant bit set in their "wait" bitset).
|
|
.\" According to http://locklessinc.com/articles/futex_cheat_sheet/:
|
|
.\"
|
|
.\" "The original reason for the addition of these extensions
|
|
.\" was to improve the performance of pthread read-write locks
|
|
.\" in glibc. However, the pthreads library no longer uses the
|
|
.\" same locking algorithm, and these extensions are not used
|
|
.\" without the bitset parameter being all ones.
|
|
.\"
|
|
.\" The page goes on to note that the FUTEX_WAIT_BITSET operation
|
|
.\" is nevertheless used (with a bitset of all ones) in order to
|
|
.\" obtain the absolute timeout functionality that is useful
|
|
.\" for efficiently implementing Pthreads APIs (which use absolute
|
|
.\" timeouts); FUTEX_WAIT provides only relative timeouts.
|
|
|
|
The
|
|
.I uaddr2
|
|
and
|
|
.I timeout
|
|
arguments are ignored.
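.IP
The selection rule described above can be modelled in user space as follows.
This is a sketch for illustration only: the function and constant names are
hypothetical, and scanning the waiters in array order is an assumption of the
model, since this page makes no guarantee about the order
in which waiters are woken.
.in +4n
.nf
```c
#include <stddef.h>
#include <stdint.h>

/* Bitset with every channel set, as used implicitly by plain
   FUTEX_WAIT and FUTEX_WAKE (hypothetical local name). */
#define ALL_CHANNELS 0xffffffffu

/* User-space model only: mark the waiters that a wake with
   wake_mask would select, waking at most max_wake of them.
   woken[i] is set to 1 for each selected waiter, else 0.
   Returns the number of waiters selected. */
static int model_wake_bitset(const uint32_t *wait_masks,
                             size_t nwaiters, uint32_t wake_mask,
                             int max_wake, int *woken)
{
    int count = 0;

    for (size_t i = 0; i < nwaiters; i++) {
        woken[i] = 0;
        /* A waiter is eligible if the AND of the two bitsets
           is nonzero. */
        if (count < max_wake && (wait_masks[i] & wake_mask) != 0) {
            woken[i] = 1;
            count++;
        }
    }
    return count;
}
```
.fi
.in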
|
|
.\"
|
|
.\"
|
|
.SS Priority-inheritance futexes
|
|
Linux supports priority-inheritance (PI) futexes in order to handle
|
|
priority-inversion problems that can be encountered with
|
|
normal futex locks.
|
|
.\"
|
|
.\" FIXME ===== Start of adapted Hart/Guniguntala text =====
|
|
.\" The following text is drawn from the Hart/Guniguntala paper,
|
|
.\" but I have reworded some pieces significantly. Please check it.
|
|
.\"
|
|
The PI futex operations described below differ from the other
|
|
futex operations in that they impose policy on the use of the futex value:
|
|
.IP * 3
|
|
If the lock is unowned, the futex value shall be 0.
|
|
.IP *
|
|
If the lock is owned, the futex value shall be the thread ID (TID; see
|
|
.BR gettid (2))
|
|
of the owning thread.
|
|
.IP *
|
|
.\" FIXME In the following line, I added "the lock is owned and". Okay?
|
|
If the lock is owned and there are threads contending for the lock,
|
|
then the
|
|
.B FUTEX_WAITERS
|
|
bit shall be set in the futex value; in other words, the futex value is:
|
|
|
|
FUTEX_WAITERS | TID
|
|
.PP
|
|
With this policy in place,
|
|
a user-space application can acquire an unowned
|
|
lock or release an uncontended lock using an atomic
.\" FIXME In the following line, I added "user-space". Okay?
user-space instruction (e.g.,
|
|
.I cmpxchg
|
|
on the x86 architecture).
|
|
Locking an unowned lock simply consists of setting
|
|
the futex value to the caller's TID.
|
|
Releasing an uncontended lock simply requires setting the futex value to 0.
|
|
|
|
If a futex is currently owned (i.e., has a nonzero value),
|
|
waiters must employ the
|
|
.B FUTEX_LOCK_PI
|
|
operation to acquire the lock.
|
|
If a lock is contended (i.e., the
|
|
.B FUTEX_WAITERS
|
|
bit is set in the futex value), the lock owner must employ the
|
|
.B FUTEX_UNLOCK_PI
|
|
operation to release the lock.
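.PP
The division of labor between user space and the kernel described above
can be sketched with C11 atomics.
This is a minimal sketch under stated assumptions, not a definitive
implementation: the names pi_lock() and pi_unlock() are hypothetical,
the code assumes that an _Atomic uint32_t has the same size and
representation as a plain 32-bit futex word, and it obtains the TID via
syscall(SYS_gettid).
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: acquire a PI futex lock. */
static int pi_lock(_Atomic uint32_t *futex_word)
{
    uint32_t expected = 0;
    uint32_t tid = (uint32_t) syscall(SYS_gettid);

    /* Fast path: the lock is unowned (0), so store our TID. */
    if (atomic_compare_exchange_strong(futex_word, &expected, tid))
        return 0;

    /* Slow path: the lock is owned; let the kernel arbitrate. */
    return (int) syscall(SYS_futex, futex_word, FUTEX_LOCK_PI,
                         0, NULL, NULL, 0);
}

/* Hypothetical helper: release a PI futex lock held by the caller. */
static int pi_unlock(_Atomic uint32_t *futex_word)
{
    uint32_t expected = (uint32_t) syscall(SYS_gettid);

    /* Fast path: the value is exactly our TID (no FUTEX_WAITERS
       bit), so the lock is uncontended; reset it to 0. */
    if (atomic_compare_exchange_strong(futex_word, &expected, 0))
        return 0;

    /* Slow path: waiters exist; the kernel must hand over the lock. */
    return (int) syscall(SYS_futex, futex_word, FUTEX_UNLOCK_PI,
                         0, NULL, NULL, 0);
}
```
.fi
.in
.PP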
|
|
|
|
In the cases where callers are forced into the kernel
|
|
(i.e., required to perform a
|
|
.BR futex ()
|
|
operation),
|
|
they then deal directly with a so-called RT-mutex,
|
|
a kernel locking mechanism which implements the required
|
|
priority-inheritance semantics.
|
|
After the RT-mutex is acquired, the futex value is updated accordingly,
|
|
before the calling thread returns to user space.
|
|
.\" FIXME ===== End of adapted Hart/Guniguntala text =====
|
|
|
|
It is important
|
|
.\" FIXME We need some explanation here of why it is important to note this
|
|
to note that the kernel will update the futex value prior
|
|
to returning to user space.
|
|
Unlike the other futex operations described above,
|
|
the PI futex operations are designed
|
|
for the implementation of very specific IPC mechanisms.
|
|
.\"
|
|
.\" FIXME We don't quite have a definition anywhere of what a PI futex
|
|
.\" is (vs a non-PI futex). Below, we have the information of
|
|
.\" FUTEX_CMP_REQUEUE_PI requeues from a non-PI futex to a
|
|
.\" PI futex, but what determines whether the futex is of one
|
|
.\" kind of the other? We should have such a definition somewhere
|
|
.\" about here.
|
|
|
|
PI futexes are operated on by specifying one of the following values in
|
|
.IR futex_op :
|
|
.TP
|
|
.BR FUTEX_LOCK_PI " (since Linux 2.6.18)"
|
|
.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
|
|
.\"
|
|
.\" FIXME I did some significant rewording of tglx's text.
|
|
.\" Please check, in case I injected errors.
|
|
.\"
|
|
This operation is used after an attempt to acquire
|
|
the futex lock via an atomic user-space instruction failed
|
|
because the futex has a nonzero value\(emspecifically,
|
|
because it contained the namespace-specific TID of the lock owner.
|
|
.\" FIXME In the preceding line, what does "namespace-specific" mean?
|
|
.\" (I kept those words from tglx.)
|
|
.\" That is, what kind of namespace are we talking about?
|
|
.\" (I suppose we are talking PID namespaces here, but I want to
|
|
.\" be sure.)
|
|
|
|
The operation checks the value of the futex at the address
|
|
.IR uaddr .
|
|
If the value is 0, then the kernel tries to atomically set the futex value to
|
|
the caller's TID.
|
|
If that fails,
|
|
.\" FIXME What would be the cause of failure?
|
|
or the futex value is nonzero,
|
|
the kernel atomically sets the
|
|
.B FUTEX_WAITERS
|
|
bit, which signals the futex owner that it cannot unlock the futex in
|
|
user space atomically by setting the futex value to 0.
|
|
After that, the kernel tries to find the thread which is
|
|
associated with the owner TID,
|
|
.\" FIXME Could I get a bit more detail on the next two lines?
|
|
.\" What is "creates or reuses kernel state" about?
|
|
creates or reuses kernel state on behalf of the owner
|
|
and attaches the waiter to it.
|
|
.\" FIXME In the next line, what type of "priority" are we talking about?
|
|
.\" Realtime priorities for SCHED_FIFO and SCHED_RR?
|
|
.\" Or something else?
|
|
The enqueueing of the waiter is in descending priority order if more
|
|
than one waiter exists.
|
|
.\" FIXME What does "bandwidth" refer to in the next line?
|
|
The owner inherits either the priority or the bandwidth of the waiter.
|
|
.\" FIXME In the preceding line, what determines whether the
|
|
.\" owner inherits the priority versus the bandwidth?
|
|
.\"
|
|
.\" FIXME Could I get some help translating the next sentence into
|
|
.\" something that user-space developers (and I) can understand?
|
|
.\" In particular, what are "nexted locks" in this context?
|
|
This inheritance follows the lock chain in the case of
|
|
nested locking and performs deadlock detection.
|
|
|
|
.\" FIXME tglx says "The timeout argument is handled as described in
|
|
.\" FUTEX_WAIT." However, it appears to me that this is not right.
|
|
.\" Is the following formulation correct.
|
|
The
|
|
.I timeout
|
|
argument provides a timeout for the lock attempt.
|
|
It is interpreted as an absolute time, measured against the
|
|
.BR CLOCK_REALTIME
|
|
clock.
|
|
If
|
|
.I timeout
|
|
is NULL, the operation will block indefinitely.
|
|
|
|
The
|
|
.IR uaddr2 ,
|
|
.IR val ,
|
|
and
|
|
.IR val3
|
|
arguments are ignored.
|
|
.\" FIXME
|
|
.\" tglx noted the following "ERROR" case for FUTEX_LOCK_PI and
|
|
.\" FUTEX_TRYLOCK_PI and FUTEX_WAIT_REQUEUE_PI:
|
|
.\"
|
|
.\" > [EOWNERDIED] The owner of the futex died and the kernel made the
|
|
.\" > caller the new owner. The kernel sets the FUTEX_OWNER_DIED bit
|
|
.\" > in the futex userspace value. Caller is responsible for cleanup
|
|
.\"
|
|
.\" However, there is no such thing as an EOWNERDIED error. I had a look
|
|
.\" through the kernel source for the FUTEX_OWNER_DIED cases and didn't
|
|
.\" see an obvious error associated with them. Can you clarify? (I think
|
|
.\" the point is that this condition, which is described in
|
|
.\" Documentation/robust-futexes.txt, is not an error as such. However,
|
|
.\" I'm not yet sure of how to describe it in the man page.)
|
|
.\" Suggestions please!
|
|
.\"
|
|
.TP
|
|
.BR FUTEX_TRYLOCK_PI " (since Linux 2.6.18)"
|
|
.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
|
|
This operation tries to acquire the futex at
|
|
.IR uaddr .
|
|
.\" FIXME I think it would be helpful here to say a few more words about
|
|
.\" the difference(s) between FUTEX_LOCK_PI and FUTEX_TRYLOCK_PI
|
|
It deals with the situation where the TID value at
|
|
.I uaddr
|
|
is 0, but the
|
|
.B FUTEX_WAITERS
|
|
bit is set.
|
|
.\" FIXME How does the situation in the previous sentence come about?
|
|
.\" Probably it would be helpful to say something about that in
|
|
.\" the man page.
|
|
.\" FIXME And *how* does FUTEX_TRYLOCK_PI deal with this situation?
|
|
User space cannot handle this condition in a race-free manner.
|
|
|
|
The
|
|
.IR uaddr2 ,
|
|
.IR val ,
|
|
.IR timeout ,
|
|
and
|
|
.IR val3
|
|
arguments are ignored.
|
|
.TP
|
|
.BR FUTEX_UNLOCK_PI " (since Linux 2.6.18)"
|
|
.\" commit c87e2837be82df479a6bae9f155c43516d2feebc
|
|
This operation wakes the top priority waiter which is waiting in
|
|
.B FUTEX_LOCK_PI
|
|
on the futex address provided by the
|
|
.I uaddr
|
|
argument.
|
|
|
|
This is called when the user-space value at
|
|
.I uaddr
|
|
cannot be changed atomically from a TID (of the owner) to 0.
|
|
|
|
The
|
|
.IR uaddr2 ,
|
|
.IR val ,
|
|
.IR timeout ,
|
|
and
|
|
.IR val3
|
|
arguments are ignored.
|
|
.TP
|
|
.BR FUTEX_CMP_REQUEUE_PI " (since Linux 2.6.31)"
|
|
.\" commit 52400ba946759af28442dee6265c5c0180ac7122
|
|
.\" FIXME to complete
|
|
This operation is a PI-aware variant of
|
|
.BR FUTEX_CMP_REQUEUE .
|
|
It requeues waiters that are blocked via
|
|
.B FUTEX_WAIT_REQUEUE_PI
|
|
on
|
|
.I uaddr
|
|
from a non-PI source futex
|
|
.RI ( uaddr )
|
|
to a PI target futex
|
|
.RI ( uaddr2 ).
|
|
|
|
As with
|
|
.BR FUTEX_CMP_REQUEUE ,
|
|
this operation wakes up a maximum of
|
|
.I val
|
|
waiters that are waiting on the futex at
|
|
.IR uaddr .
|
|
However, for
|
|
.BR FUTEX_CMP_REQUEUE_PI ,
|
|
.I val
|
|
is required to be 1.
|
|
The remaining waiters are removed from the wait queue of the source futex at
|
|
.I uaddr
|
|
and added to the wait queue of the target futex at
|
|
.IR uaddr2 .
|
|
|
|
The
|
|
.I val2
|
|
and
|
|
.I val3
|
|
arguments serve the same purposes as for
|
|
.BR FUTEX_CMP_REQUEUE .
|
|
.\" FIXME The page at http://locklessinc.com/articles/futex_cheat_sheet/
|
|
.\" notes that "priority-inheritance Futex to priority-inheritance
|
|
.\" Futex requeues are currently unsupported". Do we need to say
|
|
.\" something in the man page about that?
|
|
.TP
|
|
.BR FUTEX_WAIT_REQUEUE_PI " (since Linux 2.6.31)"
|
|
.\" commit 52400ba946759af28442dee6265c5c0180ac7122
|
|
.\" FIXME to complete
|
|
.\"
|
|
.\" FIXME Employs 'timeout' argument, supports FUTEX_CLOCK_REALTIME
|
|
.\" 'timeout' can be NULL
|
|
.\"
|
|
[As yet undocumented]
|
|
.SH RETURN VALUE
|
|
.PP
|
|
In the event of an error, all operations return \-1 and set
|
|
.I errno
|
|
to indicate the cause of the error.
|
|
The return value on success depends on the operation,
|
|
as described in the following list:
|
|
.TP
|
|
.B FUTEX_WAIT
|
|
Returns 0 if the process was woken by a
|
|
.B FUTEX_WAKE
|
|
or
|
|
.B FUTEX_WAKE_BITSET
|
|
call.
|
|
.TP
|
|
.B FUTEX_WAKE
|
|
Returns the number of processes woken up.
|
|
.TP
|
|
.B FUTEX_FD
|
|
Returns the new file descriptor associated with the futex.
|
|
.TP
|
|
.B FUTEX_REQUEUE
|
|
Returns the number of processes woken up.
|
|
.TP
|
|
.B FUTEX_CMP_REQUEUE
|
|
Returns the total number of processes woken up or requeued to the futex at
|
|
.IR uaddr2 .
|
|
If this value is greater than
|
|
.IR val ,
|
|
then the difference is the number of waiters requeued to the futex at
|
|
.IR uaddr2 .
|
|
.\"
|
|
.\" FIXME Add success returns for other operations
|
|
.TP
|
|
.B FUTEX_WAKE_OP
|
|
.\" FIXME Is the following correct?
|
|
Returns the total number of waiters that were woken up.
|
|
This is the sum of the woken waiters on the two futexes at
|
|
.I uaddr
|
|
and
|
|
.IR uaddr2 .
|
|
.TP
|
|
.B FUTEX_WAIT_BITSET
|
|
.\" FIXME Is the following correct?
|
|
Returns 0 if the process was woken by a
|
|
.B FUTEX_WAKE
|
|
or
|
|
.B FUTEX_WAKE_BITSET
|
|
call.
|
|
.TP
|
|
.B FUTEX_WAKE_BITSET
|
|
.\" FIXME Is the following correct?
|
|
Returns the number of processes woken up.
|
|
.TP
|
|
.B FUTEX_LOCK_PI
|
|
.\" FIXME Is the following correct?
|
|
Returns 0 if the futex was successfully locked.
|
|
.TP
|
|
.B FUTEX_TRYLOCK_PI
|
|
.\" FIXME Is the following correct?
|
|
Returns 0 if the futex was successfully locked.
|
|
.TP
|
|
.B FUTEX_UNLOCK_PI
|
|
.\" FIXME Is the following correct?
|
|
Returns 0 if the futex was successfully unlocked.
|
|
.TP
|
|
.B FUTEX_CMP_REQUEUE_PI
|
|
.\" FIXME Is the following correct?
|
|
Returns the total number of processes woken up or requeued to the futex at
|
|
.IR uaddr2 .
|
|
If this value is greater than
|
|
.IR val ,
|
|
then the difference is the number of waiters requeued to the futex at
|
|
.IR uaddr2 .
|
|
.TP
|
|
.B FUTEX_WAIT_REQUEUE_PI
|
|
.\" FIXME Is the following correct?
|
|
Returns 0 if the caller was successfully requeued to the futex at
|
|
.IR uaddr2 .
|
|
.SH ERRORS
.TP
.B EACCES
No read access to futex memory.
.TP
.B EAGAIN
.RB ( FUTEX_WAIT )
The value pointed to by
.I uaddr
was not equal to the expected value
.I val
at the time of the call.
.TP
.B EAGAIN
.RB ( FUTEX_CMP_REQUEUE ,
.BR FUTEX_CMP_REQUEUE_PI )
The value pointed to by
.I uaddr
is not equal to the expected value
.IR val3 .
.\" FIXME: Is the following sentence correct?
(This probably indicates a race;
use the safe
.B FUTEX_WAKE
now.)
.\"
.\" FIXME Should there be an EAGAIN case for FUTEX_TRYLOCK_PI?
.\" It seems so, looking at the handling of the rt_mutex_trylock()
.\" call in futex_lock_pi()
.\"
.TP
.BR EAGAIN
.RB ( FUTEX_LOCK_PI ,
.BR FUTEX_TRYLOCK_PI ,
.BR FUTEX_CMP_REQUEUE_PI )
The futex owner thread ID of
.I uaddr
(for
.BR FUTEX_CMP_REQUEUE_PI :
.IR uaddr2 )
is about to exit,
but has not yet handled the internal state cleanup.
Try again.
.\"
.\" FIXME Is there not also an EAGAIN error case on 'uaddr2' for
.\" FUTEX_REQUEUE and FUTEX_CMP_REQUEUE via
.\" futex_requeue() ==> futex_proxy_trylock_atomic() ==>
.\" futex_lock_pi_atomic() ==> attach_to_pi_owner() ==> EAGAIN?
.TP
.BR EDEADLK
.RB ( FUTEX_LOCK_PI ,
.BR FUTEX_TRYLOCK_PI )
The futex at
.I uaddr
is already locked by the caller.
.\"
.\" FIXME Is there not also an EDEADLK error case on 'uaddr2' for
.\" FUTEX_REQUEUE and FUTEX_CMP_REQUEUE via
.\" futex_requeue() ==> futex_proxy_trylock_atomic() ==>
.\" futex_lock_pi_atomic() ==> attach_to_pi_owner() ==> EDEADLK?
.TP
.BR EDEADLK
.\" FIXME I reworded tglx's text somewhat; is the following okay?
.RB ( FUTEX_CMP_REQUEUE_PI )
While requeueing a waiter to the PI futex at
.IR uaddr2 ,
the kernel detected a deadlock.
.TP
.B EFAULT
A required pointer argument (i.e.,
.IR uaddr ,
.IR uaddr2 ,
or
.IR timeout )
did not point to a valid user-space address.
.TP
.B EINTR
A
.B FUTEX_WAIT
or
.B FUTEX_WAIT_BITSET
operation was interrupted by a signal (see
.BR signal (7))
or a spurious wakeup.
.\" FIXME
.\" Regarding the words "spurious wakeup" above, I received this
.\" bug report from Rich Felker:
.\"
.\" I see no code in the kernel whereby a "spurious wakeup", or anything
.\" other than interruption by a signal handler that's not SA_RESTART,
.\" can cause futex to fail with EINTR. In general, overloading of EINTR
.\" and/or spurious EINTRs from a syscall make it impossible to use that
.\" syscall for implementing any function where EINTR is a mandatory
.\" failure on interruption-by-signal, since there is no way for
.\" userspace to distinguish whether the EINTR occurred as a result of
.\" an interrupting signal or some other reason. The kernel folks have
.\" gone to great lengths to fix spurious EINTRs (see signal(7) for
.\" history), especially by non-interrupting signal handlers, including
.\" in futex, and allowing EINTR here would be contrary to that goal.
.\"
.\" It's my belief that the "or a spurious wakeup" text should simply be
.\" removed.
.\"
.\" The reason I'm raising this topic is its relevance to a thread on
.\" libc-alpha:
.\" [RFC] mutex destruction (#13690): problem description and workarounds
.\"
.\" The bug and mailing list discussions to which Rich refers are:
.\" https://sourceware.org/bugzilla/show_bug.cgi?id=13690
.\" https://sourceware.org/ml/libc-alpha/2014-12/threads.html#0001
.\"
.\" Can anyone comment on whether the words "spurious wakeup" are correct?
.\"
.TP
.B EINVAL
The operation in
.IR futex_op
is one of those that employs a timeout, but the supplied
.I timeout
argument was invalid
.RI ( tv_sec
was less than zero, or
.IR tv_nsec
was not less than 1,000,000,000).
.TP
.B EINVAL
The operation specified in
.IR futex_op
employs one or both of the pointers
.I uaddr
and
.IR uaddr2 ,
but one of these does not point to a valid object\(emthat is,
the address is not four-byte-aligned.
.TP
.B EINVAL
.RB ( FUTEX_WAIT_BITSET ,
.BR FUTEX_WAKE_BITSET )
The bitset supplied in
.IR val3
is zero.
.TP
.B EINVAL
.RB ( FUTEX_REQUEUE ,
.\" FIXME tglx suggested adding this, but does this error really occur for
.\" FUTEX_REQUEUE? (The case where it occurs for FUTEX_CMP_REQUEUE_PI
.\" is obvious at the start of futex_requeue().)
.BR FUTEX_CMP_REQUEUE_PI )
.I uaddr
equals
.IR uaddr2
(i.e., an attempt was made to requeue to the same futex).
.TP
.BR EINVAL
.RB ( FUTEX_FD )
The signal number supplied in
.I val
is invalid.
.TP
.B EINVAL
.RB ( FUTEX_WAKE ,
.BR FUTEX_WAKE_OP ,
.BR FUTEX_WAKE_BITSET ,
.BR FUTEX_REQUEUE ,
.BR FUTEX_CMP_REQUEUE )
The kernel detected an inconsistency between the user-space state at
.I uaddr
and the kernel state\(emthat is, it detected a waiter which waits in
.BR FUTEX_LOCK_PI
on
.IR uaddr .
.TP
.B EINVAL
.RB ( FUTEX_LOCK_PI ,
.BR FUTEX_TRYLOCK_PI ,
.BR FUTEX_UNLOCK_PI )
The kernel detected an inconsistency between the user-space state at
.I uaddr
and the kernel state.
This indicates either state corruption
.\" FIXME tglx did not mention the "state corruption" for FUTEX_UNLOCK_PI.
.\" Does that case also apply for FUTEX_UNLOCK_PI?
or that the kernel found a waiter on
.I uaddr
which is waiting via
.BR FUTEX_WAIT
or
.BR FUTEX_WAIT_BITSET .
.TP
.B EINVAL
.RB ( FUTEX_CMP_REQUEUE_PI )
The kernel detected an inconsistency between the user-space state at
.I uaddr2
and the kernel state;
that is, the kernel detected a waiter which waits via
.BR FUTEX_WAIT
.\" FIXME tglx did not mention FUTEX_WAIT_BITSET here,
.\" but should that not also be included here?
on
.IR uaddr2 .
.TP
.B EINVAL
.RB ( FUTEX_CMP_REQUEUE_PI )
The kernel detected an inconsistency between the user-space state at
.I uaddr
and the kernel state;
that is, the kernel detected a waiter which waits via
.BR FUTEX_LOCK_PI ,
.BR FUTEX_WAIT ,
or
.BR FUTEX_WAIT_BITSET ,
on
.IR uaddr .
.TP
.B EINVAL
.RB ( FUTEX_CMP_REQUEUE_PI )
.TP
.B EINVAL
Invalid argument.
.TP
.BR ENOMEM
.RB ( FUTEX_LOCK_PI ,
.BR FUTEX_TRYLOCK_PI ,
.BR FUTEX_CMP_REQUEUE_PI )
The kernel could not allocate memory to hold state information.
.TP
.B ENFILE
.RB ( FUTEX_FD )
The system limit on the total number of open files has been reached.
.TP
.B ENOSYS
Invalid operation specified in
.IR futex_op .
.TP
.B ENOSYS
The
.BR FUTEX_CLOCK_REALTIME
option was specified in
.IR futex_op ,
but the accompanying operation was neither
.BR FUTEX_WAIT_BITSET
nor
.BR FUTEX_WAIT_REQUEUE_PI .
.TP
.BR ENOSYS
.RB ( FUTEX_LOCK_PI ,
.BR FUTEX_TRYLOCK_PI ,
.BR FUTEX_UNLOCK_PI ,
.BR FUTEX_CMP_REQUEUE_PI ,
.BR FUTEX_WAIT_REQUEUE_PI )
A run-time check determined that the operation is not available.
The PI futex operations are not implemented on all architectures and
are not supported on some CPU variants.
.TP
.BR EPERM
.RB ( FUTEX_LOCK_PI ,
.BR FUTEX_TRYLOCK_PI ,
.BR FUTEX_CMP_REQUEUE_PI )
The caller is not allowed to attach itself to the futex at
.I uaddr
(for
.BR FUTEX_CMP_REQUEUE_PI :
the futex at
.IR uaddr2 ).
(This may be caused by a state corruption in user space.)
.\"
.\" FIXME Is there not also an EPERM error case on 'uaddr2' for
.\" FUTEX_REQUEUE and FUTEX_CMP_REQUEUE via
.\" futex_requeue() ==> futex_proxy_trylock_atomic() ==>
.\" futex_lock_pi_atomic() ==> attach_to_pi_owner() ==> EPERM?
.TP
.BR EPERM
.RB ( FUTEX_UNLOCK_PI )
The caller does not own the futex.
.TP
.BR ESRCH
.RB ( FUTEX_LOCK_PI ,
.BR FUTEX_TRYLOCK_PI )
.\" FIXME I reworded the following sentence a bit differently from
.\" tglx's formulation. Is it okay?
The thread ID in the futex at
.I uaddr
does not exist.
.\"
.\" FIXME Is there not also an ESRCH error case on 'uaddr2' for
.\" FUTEX_REQUEUE and FUTEX_CMP_REQUEUE via
.\" futex_requeue() ==> futex_proxy_trylock_atomic() ==>
.\" futex_lock_pi_atomic() ==> attach_to_pi_owner() ==> ESRCH?
.TP
.BR ESRCH
.RB ( FUTEX_CMP_REQUEUE_PI )
.\" FIXME I reworded the following sentence a bit differently from
.\" tglx's formulation. Is it okay?
The thread ID in the futex at
.I uaddr2
does not exist.
.TP
.B ETIMEDOUT
The operation in
.IR futex_op
employed the timeout specified in
.IR timeout ,
and the timeout expired before the operation completed.
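.PP
The
.B FUTEX_WAIT
errors listed above can be told apart by inspecting
.IR errno .
The following is a minimal sketch, not a definitive implementation:
the helper name
.I wait_while_equal
is hypothetical, and the policy of retrying on
.B EINTR
is an assumption of this example rather than something the kernel requires.
.nf

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not a library function): block in FUTEX_WAIT
   while *uaddr == expected.  Returns 0 after a wake-up, otherwise
   the errno value reported by the kernel. */
static int
wait_while_equal(uint32_t *uaddr, uint32_t expected,
                 const struct timespec *timeout)
{
    for (;;) {
        if (syscall(SYS_futex, uaddr, FUTEX_WAIT, expected,
                    timeout, NULL, 0) == 0)
            return 0;
        if (errno != EINTR)
            return errno;   /* EAGAIN, ETIMEDOUT, EINVAL, ... */
        /* EINTR: interrupted by a signal; retry.  Note that a
           relative timeout starts over on each retry. */
    }
}
```
.fi
.PP
With this helper, a mismatch between the futex word and
.I expected
is reported as
.BR EAGAIN ,
an expired timeout as
.BR ETIMEDOUT ,
and a
.I tv_nsec
value of 1,000,000,000 or more as
.BR EINVAL .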
.SH VERSIONS
.PP
Futexes were first made available in a stable kernel release
with Linux 2.6.0.

Initial futex support was merged in Linux 2.5.7 but with different semantics
from what is described above.
A four-argument system call with the semantics
described in this page was introduced in Linux 2.5.40.
In Linux 2.5.70, one argument
was added.
In Linux 2.6.7, a sixth argument was added\(emmessy, especially
on the s390 architecture.
.SH CONFORMING TO
This system call is Linux-specific.
.SH NOTES
.PP
To reiterate, bare futexes are not intended as an easy-to-use abstraction
for end-users.
(There is no wrapper function for this system call in glibc.)
Implementors are expected to be assembly literate and to have
read the sources of the futex user-space library referenced below.
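.PP
Because glibc provides no wrapper, callers typically invoke the
system call through
.BR syscall (2).
The following is a minimal sketch of such a wrapper;
the function is defined locally by the program (an assumption of this
example, not a glibc interface) and simply mirrors the argument list
shown in the SYNOPSIS.
.nf

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Local wrapper with the argument list shown in the SYNOPSIS.
   It is supplied by the program itself, not by glibc.  On error
   it returns -1 and leaves the error number in errno. */
static int
futex(int *uaddr, int futex_op, int val,
      const struct timespec *timeout, int *uaddr2, int val3)
{
    return syscall(SYS_futex, uaddr, futex_op, val,
                   timeout, uaddr2, val3);
}
```
.fi
.PP
For instance, calling this wrapper with
.B FUTEX_WAKE
on a futex word that has no waiters returns 0, and calling it with
.B FUTEX_WAIT
and a
.I val
that differs from the futex word fails with
.BR EAGAIN .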
.\" .SH AUTHORS
.\" .PP
.\" Futexes were designed and worked on by
.\" Hubertus Franke (IBM Thomas J. Watson Research Center),
.\" Matthew Kirkwood, Ingo Molnar (Red Hat)
.\" and Rusty Russell (IBM Linux Technology Center).
.\" This page written by bert hubert.
.SH SEE ALSO
.BR get_robust_list (2),
.BR restart_syscall (2),
.BR futex (7)
.PP
The following kernel source files:
.IP * 2
.I Documentation/pi-futex.txt
.IP *
.I Documentation/futex-requeue-pi.txt
.IP *
.I Documentation/locking/rt-mutex.txt
.IP *
.I Documentation/locking/rt-mutex-design.txt
.PP
\fIFuss, Futexes and Furwocks: Fast Userlevel Locking in Linux\fP
(proceedings of the Ottawa Linux Symposium 2002), online at
.br
.UR http://kernel.org\:/doc\:/ols\:/2002\:/ols2002-pages-479-495.pdf
.UE

\fIA futex overview and update\fP, 11 November 2009
.UR http://lwn.net/Articles/360699/
.UE

\fIRequeue-PI: Making Glibc Condvars PI-Aware\fP
(2009 Real-Time Linux Workshop)
.UR http://lwn.net/images/conf/rtlws11/papers/proc/p10.pdf
.UE

\fIFutexes Are Tricky\fP (updated in 2011), Ulrich Drepper
.UR http://www.akkadia.org/drepper/futex.pdf
.UE
.PP
Futex example library, futex-*.tar.bz2 at
.br
.UR ftp://ftp.kernel.org\:/pub\:/linux\:/kernel\:/people\:/rusty/
.UE
.\"
.\" FIXME Are there any other resources that should be listed
.\" in the SEE ALSO section?