mmap.2: Clarify MAP_LOCKED semantics

MAP_LOCKED has had subtly different semantics from
mmap(2)+mlock(2) ever since it was introduced.
mlock(2) fails if the memory range cannot be populated, so a
successful call guarantees that no major faults will happen on
the range later. mmap(MAP_LOCKED), on the other hand, silently
succeeds even if the range was only partially populated.
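
As an illustration only (not part of this patch), a caller could
check how much of a mapping is actually resident with mincore(2).
The helper name resident_pages() is hypothetical, and the result is
just a snapshot of residency at the time of the call:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* for mincore() on glibc */
#endif
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: count the pages of [addr, addr + len) that are
 * resident in memory right now. addr must be page-aligned.
 * Returns the count, or -1 on error. */
static long resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages);
    long n = 0;

    if (vec == NULL)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    /* The least significant bit of each byte is set if the page is resident. */
    for (size_t i = 0; i < npages; i++)
        n += vec[i] & 1;
    free(vec);
    return n;
}
```

A count lower than the number of pages in a MAP_LOCKED mapping would
indicate the partial population described above.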

Fixing this subtle difference in the kernel is rather awkward
because the memory is populated after the mm locks have been
dropped, so the cleanup before returning failure (munlock)
could operate on something other than the originally mapped area.

For example, a speculative userspace page fault handler that
catches SEGV and calls mmap(fault_addr, MAP_FIXED|MAP_LOCKED) might
discard part of a racing mmap and lose data. Although it is not
clear whether such usage would be valid, the mmap man page doesn't
explicitly describe requirements for threaded applications, so we
cannot exclude this possibility.

This patch makes the semantics of MAP_LOCKED explicit and suggests
using mmap + mlock as the only way to guarantee no later major
page faults.
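
A minimal sketch of the suggested mmap + mlock pattern, for
illustration only. The helper name map_and_lock() and the choice of
an anonymous private mapping are assumptions, not part of this patch:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on glibc */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map len bytes of anonymous memory and lock them.
 * Returns the mapping, or NULL with errno set if either step fails. */
static void *map_and_lock(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* Unlike MAP_LOCKED, mlock() reports failure if the range cannot
     * be populated and locked, so the caller can react to it. */
    if (mlock(p, len) != 0) {
        int saved = errno;

        munmap(p, len);         /* undo the mapping before failing */
        errno = saved;
        return NULL;
    }
    return p;
}
```

With this pattern a failure to populate the range surfaces as an
error from mlock(2) (for example ENOMEM) instead of as a later major
fault.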

Reviewed-by: Eric B Munson <emunson@akamai.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michal Hocko 2015-05-14 15:27:13 +02:00 committed by Michael Kerrisk
parent 79ae0b1fbd
commit 7e3786bcdc
1 changed file with 12 additions and 1 deletion

@@ -261,8 +261,19 @@ can be discovered by listing the subdirectories in
 .IR /sys/kernel/mm/hugepages .
 .TP
 .BR MAP_LOCKED " (since Linux 2.5.37)"
-Lock the pages of the mapped region into memory in the manner of
+Mark the mmaped region to be locked in the same way as
 .BR mlock (2).
+This implementation will try to populate (prefault) the whole range but
+the mmap call doesn't fail with
+.B ENOMEM
+if this fails. Therefore major faults might happen later on. So the semantic
+is not as strong as
+.BR mlock (2).
+.BR mmap (2)
++
+.BR mlock (2)
+should be used when major faults are not acceptable after the initialization
+of the mapping.
 This flag is ignored in older kernels.
 .\" If set, the mapped pages will not be swapped out.
 .TP