futex.2: Fixes after review comments from Thomas Gleixner

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk 2015-08-07 22:41:53 +02:00
parent 30239c10a8
commit c3875d1d3a
1 changed file with 76 additions and 50 deletions

@@ -913,22 +913,58 @@ The operation checks the value of the futex word at the address
 .IR uaddr .
 If the value is 0, then the kernel tries to atomically set
 the futex value to the caller's TID.
-.\" FIXME What would be the cause(s) of failure referred to
-.\" in the following sentence?
-If that fails,
-or the futex word's value is nonzero,
+If the futex word's value is nonzero,
 the kernel atomically sets the
 .B FUTEX_WAITERS
 bit, which signals the futex owner that it cannot unlock the futex in
 user space atomically by setting the futex value to 0.
-After that, the kernel tries to find the thread which is
-associated with the owner TID,
-.\" FIXME Could I get a bit more detail on the next two lines?
-.\" What is "creates or reuses kernel state" about?
-.\" (I think this needs to be clearer in the page)
-creates or reuses kernel state on behalf of the owner
-and attaches the waiter to it.
+.\" tglx (July 2015):
+.\" The operation here is similar to the FUTEX_WAIT logic. When the user
+.\" space atomic acquire does not succeed because the futex value was non
+.\" zero, then the waiter goes into the kernel, takes the kernel internal
+.\" lock and retries the acquisition under the lock. If the acquisition
+.\" does not succeed either, then it sets the FUTEX_WAITERS bit, to signal
+.\" the lock owner that it needs to go into the kernel. Here is the pseudo
+.\" code:
+.\"
+.\"     lock(kernel_lock);
+.\" retry:
+.\"
+.\"     /*
+.\"      * Owner might have unlocked in userspace before we
+.\"      * were able to set the waiter bit.
+.\"      */
+.\"     if (atomic_acquire(futex) == SUCCESS) {
+.\"         unlock(kernel_lock);
+.\"         return 0;
+.\"     }
+.\"
+.\"     /*
+.\"      * Owner might have unlocked after the above atomic_acquire()
+.\"      * attempt.
+.\"      */
+.\"     if (atomic_set_waiters_bit(futex) != SUCCESS)
+.\"         goto retry;
+.\"
+.\"     queue_waiter();
+.\"     unlock(kernel_lock);
+.\"     block();
+.\"
+After that, the kernel:
+.RS
+.IP 1. 3
+Tries to find the thread which is associated with the owner TID.
+.IP 2.
+Creates or reuses kernel state on behalf of the owner.
+(If this is the first waiter, there is no kernel state for this
+futex, so kernel state is created by locking the RT-mutex
+and the futex owner is made the owner of the RT-mutex.
+If there are existing waiters, then the existing state is reused.)
+.IP 3.
+Attaches the waiter to it
+(i.e., the waiter is enqueued on the RT-mutex waiter list).
+.RE
+.IP
 If more than one waiter exists,
 the enqueueing of the waiter is in descending priority order.
 (For information on priority ordering, see the discussion of the
@@ -946,16 +982,11 @@ policy) or the waiter's priority (if the waiter is scheduled under the
 or
 .BR SCHED_FIFO
 policy).
-.\"
-.\" FIXME Could I get some help translating the next sentence into
-.\" something that user-space developers (and I) can understand?
-.\" In particular, what are "nested locks" in this context?
-This inheritance follows the lock chain in the case of
-nested locking and performs deadlock detection.
-.\" FIXME tglx said "The timeout argument is handled as described in
-.\" FUTEX_WAIT." However, it appears to me that this is not right.
-.\" Is the following formulation correct?
+This inheritance follows the lock chain in the case of nested locking
+(i.e., task 1 blocks on lock A, held by task 2,
+while task 2 blocks on lock B, held by task 3)
+and performs deadlock detection.
 The
 .I timeout
 argument provides a timeout for the lock attempt.
@@ -980,21 +1011,19 @@ arguments are ignored.
 .\" commit c87e2837be82df479a6bae9f155c43516d2feebc
 This operation tries to acquire the futex at
 .IR uaddr .
+It is invoked when a user-space atomic acquire did not
+succeed because the futex word was not 0.
+The trylock in kernel might succeed because the futex word
+contains stale state
+.RB ( FUTEX_WAITERS
+and/or
+.BR FUTEX_OWNER_DIED ).
+This can happen when the owner of the futex died.
 .\" FIXME I think it would be helpful here to say a few more words about
 .\" the difference(s) between FUTEX_LOCK_PI and FUTEX_TRYLOCK_PI.
 .\" Can someone propose something?
 .\"
-.\" FIXME(Torvald) Additionally, we claim above that just FUTEX_WAITERS
-.\" is never an allowed state.
-It deals with the situation where the TID value at
-.I uaddr
-is 0, but the
-.B FUTEX_WAITERS
-bit is set.
-.\" FIXME How does the situation in the previous sentence come about?
-.\" Probably it would be helpful to say something about that in
-.\" the man page.
-.\" FIXME And *how* does FUTEX_TRYLOCK_PI deal with this situation?
 User space cannot handle this condition in a race-free manner
 The
@@ -1083,31 +1112,28 @@ arguments serve the same purposes as for
 .BR FUTEX_WAIT_REQUEUE_PI " (since Linux 2.6.31)"
 .\" commit 52400ba946759af28442dee6265c5c0180ac7122
 .\"
-.\" FIXME I find the next sentence (from tglx) pretty hard to grok.
-.\" Could someone explain it a bit more?
-Wait operation to wait on a non-PI futex at
+Wait on a non-PI futex at
 .I uaddr
-and potentially be requeued onto a PI futex at
+and potentially be requeued (via a
+.BR FUTEX_CMP_REQUEUE_PI
+operation in another task) onto a PI futex at
 .IR uaddr2 .
 The wait operation on
 .I uaddr
-is the same as
+is the same as for
 .BR FUTEX_WAIT .
-.\"
-.\" FIXME I'm not quite clear on the meaning of the following sentence.
-.\" Is this trying to say that while blocked in a
-.\" FUTEX_WAIT_REQUEUE_PI, it could happen that another
-.\" task does a FUTEX_WAKE on uaddr that simply causes
-.\" a normal wake, with the result that the FUTEX_WAIT_REQUEUE_PI
-.\" does not complete? What happens then to the FUTEX_WAIT_REQUEUE_PI
-.\" operation? Does it remain blocked, or does it unblock
-.\" In which case, what does user space see?
 The waiter can be removed from the wait on
 .I uaddr
-via
-.BR FUTEX_WAKE
 without requeueing on
-.IR uaddr2 .
+.IR uaddr2
+via a
+.BR FUTEX_WAKE
+operation in another task.
+In this case, the
+.BR FUTEX_WAIT_REQUEUE_PI
+operation returns with the error
+.BR EWOULDBLOCK .
 If
 .I timeout
@@ -1304,7 +1330,7 @@ The futex word at
 is already locked by the caller.
 .TP
 .BR EDEADLK
-.\" FIXME XXX I see that kernel/locking/rtmutex.c uses EDEADLK in some
+.\" FIXME . I see that kernel/locking/rtmutex.c uses EDEADLK in some
 .\" places, and EDEADLOCK in others. On almost all architectures
 .\" these constants are synonymous. Is there a reason that both
 .\" names are used?
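
Reading aid for the FUTEX_LOCK_PI hunk: the retry logic in the tglx comment (try to acquire, otherwise set the waiters bit, otherwise retry) can be modeled in user space. What follows is only a single-threaded Python sketch under stated assumptions. `FutexWord`, `compare_exchange`, `lock_pi_slow_path`, and `owner_tid` are hypothetical names invented for illustration, not kernel or libc interfaces. The bit constants are the values defined in `<linux/futex.h>`. The kernel lock, the RT-mutex queueing, and the actual blocking are deliberately left out.

```python
# Bit layout of a PI futex word, as defined in <linux/futex.h>.
FUTEX_WAITERS = 0x80000000
FUTEX_TID_MASK = 0x3FFFFFFF


class FutexWord:
    """Hypothetical stand-in for a 32-bit futex word (not thread-safe)."""

    def __init__(self, value=0):
        self.value = value

    def compare_exchange(self, expected, new):
        """Model an atomic compare-and-swap; return True if `new` was stored."""
        if self.value == expected:
            self.value = new
            return True
        return False


def lock_pi_slow_path(word, tid):
    """Model the retry loop from the pseudocode (assumption: no real locking).

    Returns "acquired" if `tid` became the owner, or "queued" if the
    waiters bit was set and the caller would now block in the kernel.
    """
    while True:
        # The owner might have unlocked in user space before we got here.
        if word.compare_exchange(0, tid):
            return "acquired"
        # Otherwise try to set FUTEX_WAITERS on the value we observed.
        observed = word.value
        if observed != 0 and word.compare_exchange(
            observed, observed | FUTEX_WAITERS
        ):
            return "queued"
        # The word changed underneath us (the owner unlocked): retry.


def owner_tid(word):
    """Extract the owner TID from the futex word."""
    return word.value & FUTEX_TID_MASK


if __name__ == "__main__":
    word = FutexWord()
    print(lock_pi_slow_path(word, 1234))  # first caller finds the word at 0
    print(lock_pi_slow_path(word, 5678))  # second caller finds it owned
    print(hex(word.value), owner_tid(word))
```

In this model the second caller leaves the owner's TID in place and only adds the `FUTEX_WAITERS` bit, which is what tells the owner that it can no longer unlock by simply storing 0 from user space.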