futex.2: Fixes after review comments from Thomas Gleixner

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2015-08-07 22:41:53 +02:00
parent 30239c10a8
commit c3875d1d3a
1 changed file with 76 additions and 50 deletions
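The first hunk below clarifies how FUTEX_LOCK_PI relates to the user-space fast path: the caller first tries to move the futex word from 0 to its own TID atomically, and enters the kernel only when the word was nonzero. The following is a minimal sketch of that split, not code from this commit or from the man page. The helper names futex(), pi_lock(), and pi_unlock() are hypothetical, and the sketch assumes a Linux system with C11 atomics; it ignores timeouts, EINTR, and robust-futex (FUTEX_OWNER_DIED) handling.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: thin wrapper around the raw futex system call
   (no timeout, no second address, no val3). */
static long futex(_Atomic uint32_t *uaddr, int op)
{
    return syscall(SYS_futex, uaddr, op, 0, NULL, NULL, 0);
}

/* Hypothetical helper: acquire a PI futex. */
static int pi_lock(_Atomic uint32_t *uaddr)
{
    uint32_t expected = 0;
    uint32_t tid = (uint32_t) syscall(SYS_gettid);

    /* Fast path: atomically change 0 -> TID without entering the kernel. */
    if (atomic_compare_exchange_strong(uaddr, &expected, tid))
        return 0;

    /* Slow path: the futex word was nonzero. The kernel retries the
       acquisition under its own lock and, if that also fails, sets
       FUTEX_WAITERS and blocks the caller. */
    return (int) futex(uaddr, FUTEX_LOCK_PI);
}

/* Hypothetical helper: release a PI futex held by the caller. */
static int pi_unlock(_Atomic uint32_t *uaddr)
{
    uint32_t expected = (uint32_t) syscall(SYS_gettid);

    /* Fast path: TID -> 0 succeeds only if FUTEX_WAITERS is not set. */
    if (atomic_compare_exchange_strong(uaddr, &expected, 0))
        return 0;

    /* Slow path: waiters exist, so the kernel must hand over the lock. */
    return (int) futex(uaddr, FUTEX_UNLOCK_PI);
}
```

In the uncontended case neither helper makes a futex system call, which is why the FUTEX_WAITERS bit matters: once it is set, the unlock compare-and-exchange fails and the owner is forced into the kernel.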
--- a/man2/futex.2
+++ b/man2/futex.2
@@ -913,22 +913,58 @@ The operation checks the value of the futex word at the address
 .IR uaddr .
 If the value is 0, then the kernel tries to atomically set
 the futex value to the caller's TID.
-.\" FIXME What would be the cause(s) of failure referred to
-.\"       in the following sentence?
-If that fails,
-or the futex word's value is nonzero,
+If the futex word's value is nonzero,
 the kernel atomically sets the
 .B FUTEX_WAITERS
 bit, which signals the futex owner that it cannot unlock the futex in
 user space atomically by setting the futex value to 0.
-After that, the kernel tries to find the thread which is
-associated with the owner TID,
-.\" FIXME Could I get a bit more detail on the next two lines?
-.\"       What is "creates or reuses kernel state" about?
-.\"       (I think this needs to be clearer in the page)
-creates or reuses kernel state on behalf of the owner
-and attaches the waiter to it.
-
+.\" tglx (July 2015):
+.\"     The operation here is similar to the FUTEX_WAIT logic. When the user
+.\"     space atomic acquire does not succeed because the futex value was non
+.\"     zero, then the waiter goes into the kernel, takes the kernel internal
+.\"     lock and retries the acquisition under the lock. If the acquisition
+.\"     does not succeed either, then it sets the FUTEX_WAITERS bit, to signal
+.\"     the lock owner that it needs to go into the kernel. Here is the pseudo
+.\"     code:
+.\"
+.\"     	lock(kernel_lock);
+.\"     retry:
+.\"     	
+.\"     	/*
+.\"     	 * Owner might have unlocked in userspace before we
+.\"     	 * were able to set the waiter bit.
+.\"              */
+.\"             if (atomic_acquire(futex) == SUCCESS) {
+.\"     	   unlock(kernel_lock);
+.\"     	   return 0;
+.\"     	}
+.\"
+.\"     	/*
+.\"     	 * Owner might have unlocked after the above atomic_acquire()
+.\"     	 * attempt.
+.\"     	 */
+.\"     	if (atomic_set_waiters_bit(futex) != SUCCESS)
+.\"     	   goto retry;
+.\"
+.\"     	queue_waiter();
+.\"     	unlock(kernel_lock);
+.\"     	block();
+.\"
+After that, the kernel:
+.RS
+.IP 1. 3
+Tries to find the thread which is associated with the owner TID.
+.IP 2.
+Creates or reuses kernel state on behalf of the owner.
+(If this is the first waiter, there is no kernel state for this
+futex, so kernel state is created by locking the RT-mutex
+and the futex owner is made the owner of the RT-mutex.
+If there are existing waiters, then the existing state is reused.)
+.IP 3.
+Attaches the waiter to it
+(i.e., the waiter is enqueued on the RT-mutex waiter list).
+.RE
+.IP
 If more than one waiter exists,
 the enqueueing of the waiter is in descending priority order.
 (For information on priority ordering, see the discussion of the
@@ -946,16 +982,11 @@ policy) or the waiter's priority (if the waiter is scheduled under the
 or
 .BR SCHED_FIFO
 policy).
-.\"
-.\" FIXME Could I get some help translating the next sentence into
-.\"       something that user-space developers (and I) can understand?
-.\"       In particular, what are "nested locks" in this context?
-This inheritance follows the lock chain in the case of
-nested locking and performs deadlock detection.
+This inheritance follows the lock chain in the case of nested locking
+(i.e., task 1 blocks on lock A, held by task 2,
+while task 2 blocks on lock B, held by task 3)
+and performs deadlock detection.

-.\" FIXME tglx said "The timeout argument is handled as described in
-.\"       FUTEX_WAIT." However, it appears to me that this is not right.
-.\"       Is the following formulation correct?
 The
 .I timeout
 argument provides a timeout for the lock attempt.
@@ -980,21 +1011,19 @@ arguments are ignored.
 .\" commit c87e2837be82df479a6bae9f155c43516d2feebc
 This operation tries to acquire the futex at
 .IR uaddr .
+It is invoked when a user-space atomic acquire did not
+succeed because the futex word was not 0.
+
+The trylock in kernel might succeed because the futex word
+contains stale state
+.RB ( FUTEX_WAITERS
+and/or
+.BR FUTEX_OWNER_DIED ).
+This can happen when the owner of the futex died.
 .\" FIXME I think it would be helpful here to say a few more words about
 .\"       the difference(s) between FUTEX_LOCK_PI and FUTEX_TRYLOCK_PI.
 .\"       Can someone propose something?
 .\"
-.\" FIXME(Torvald)  Additionally, we claim above that just FUTEX_WAITERS
-.\"       is never an allowed state.
-It deals with the situation where the TID value at
-.I uaddr
-is 0, but the
-.B FUTEX_WAITERS
-bit is set.
-.\" FIXME How does the situation in the previous sentence come about?
-.\"       Probably it would be helpful to say something about that in
-.\"       the man page.
-.\" FIXME And *how* does FUTEX_TRYLOCK_PI deal with this situation?
 User space cannot handle this condition in a race-free manner.

 The
@@ -1083,31 +1112,28 @@ arguments serve the same purposes as for
 .BR FUTEX_WAIT_REQUEUE_PI " (since Linux 2.6.31)"
 .\" commit 52400ba946759af28442dee6265c5c0180ac7122
 .\"
-.\" FIXME I find the next sentence (from tglx) pretty hard to grok.
-.\"       Could someone explain it a bit more?
-Wait operation to wait on a non-PI futex at
+Wait on a non-PI futex at
 .I uaddr
-and potentially be requeued onto a PI futex at
+and potentially be requeued (via a
+.BR FUTEX_CMP_REQUEUE_PI
+operation in another task) onto a PI futex at
 .IR uaddr2 .
 The wait operation on
 .I uaddr
-is the same as
+is the same as for
 .BR FUTEX_WAIT .
-.\"
-.\" FIXME I'm not quite clear on the meaning of the following sentence.
-.\"       Is this trying to say that while blocked in a
-.\"       FUTEX_WAIT_REQUEUE_PI, it could happen that another
-.\"       task does a FUTEX_WAKE on uaddr that simply causes
-.\"       a normal wake, with the result that the FUTEX_WAIT_REQUEUE_PI
-.\"       does not complete? What happens then to the FUTEX_WAIT_REQUEUE_PI
-.\"       operation? Does it remain blocked, or does it unblock?
-.\"       In which case, what does user space see?
+
 The waiter can be removed from the wait on
 .I uaddr
-via
-.BR FUTEX_WAKE
 without requeueing on
-.IR uaddr2 .
+.IR uaddr2
+via a
+.BR FUTEX_WAKE
+operation in another task.
+In this case, the
+.BR FUTEX_WAIT_REQUEUE_PI
+operation returns with the error
+.BR EWOULDBLOCK .

 If
 .I timeout
@@ -1304,7 +1330,7 @@ The futex word at
 is already locked by the caller.
 .TP
 .BR EDEADLK
-.\" FIXME XXX I see that kernel/locking/rtmutex.c uses EDEADLK in some
+.\" FIXME . I see that kernel/locking/rtmutex.c uses EDEADLK in some
 .\"       places, and EDEADLOCK in others. On almost all architectures
 .\"       these constants are synonymous. Is there a reason that both
 .\"       names are used?