userfaultfd.2: Add write-protect mode

Write-protect mode is supported starting from Linux 5.7.

Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Peter Xu 2021-04-05 15:13:06 +02:00 committed by Michael Kerrisk
parent e70f957d81
commit 4b338b38e6
1 changed files with 104 additions and 4 deletions

@@ -78,6 +78,32 @@ all memory ranges that were registered with the object are unregistered
and unread events are flushed.
.\"
.PP
Userfaultfd supports two modes of registration:
.TP
.BR UFFDIO_REGISTER_MODE_MISSING " (since 4.10)"
When registered with
.B UFFDIO_REGISTER_MODE_MISSING
mode, user space will receive a page-fault message
when a missing page is accessed.
The faulting thread will be stopped from execution until the page fault is
resolved from user space by either a
.B UFFDIO_COPY
or a
.B UFFDIO_ZEROPAGE
ioctl.
.TP
.BR UFFDIO_REGISTER_MODE_WP " (since 5.7)"
When registered with
.B UFFDIO_REGISTER_MODE_WP
mode, user space will receive a page-fault message
when a write-protected page is written.
The faulting thread will be stopped from execution
until user space write-unprotects the page using a
.B UFFDIO_WRITEPROTECT
ioctl.
.PP
Multiple modes can be enabled at the same time for the same memory range.
.PP
Since Linux 4.14, a userfaultfd page-fault message can selectively embed
faulting thread ID information.
One needs to enable this feature explicitly using the
@@ -107,7 +133,7 @@ the process that monitors userfaultfd and handles page faults
needs to be aware of the changes in the virtual memory layout
of the faulting process to avoid memory corruption.
.PP
Since Linux 4.11,
userfaultfd can also notify the fault-handling threads about changes
in the virtual memory layout of the faulting process.
In addition, if the faulting process invokes
@@ -144,6 +170,17 @@ single threaded non-cooperative userfaultfd manager implementations.
.\" and limitations remaining in 4.11
.\" Maybe it's worth adding a dedicated sub-section...
.\"
.PP
Since Linux 5.7, userfaultfd can do
synchronous page dirty tracking using the new write-protect register mode.
One should check against the feature bit
.B UFFD_FEATURE_PAGEFAULT_FLAG_WP
before using this feature.
Similar to the original userfaultfd missing mode, the write-protect mode will
generate a userfaultfd message when the protected page is written.
The user needs to resolve the page fault by unprotecting the faulted page and
kicking the faulted thread to continue.
For more information, please refer to the "Userfaultfd write-protect mode" section.
.SS Userfaultfd operation
After the userfaultfd object is created with
.BR userfaultfd (),
@@ -179,7 +216,7 @@ or
.BR ioctl (2)
operations to resolve the page fault.
.PP
Since Linux 4.14, if the application sets the
.B UFFD_FEATURE_SIGBUS
feature bit using the
.B UFFDIO_API
@@ -219,6 +256,65 @@ userfaultfd can be used only with anonymous private memory mappings.
Since Linux 4.11,
userfaultfd can also be used with hugetlbfs and shared memory mappings.
.\"
.SS Userfaultfd write-protect mode (since 5.7)
Since Linux 5.7, userfaultfd supports write-protect mode.
The user needs to first check the availability of this feature using the
.B UFFDIO_API
ioctl against the feature bit
.B UFFD_FEATURE_PAGEFAULT_FLAG_WP
before using this feature.
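.PP
The feature check described above can be sketched in C as follows.
This is a minimal illustration, not part of the original patch:
the helper names make_api_request(), api_has_wp(), and uffd_check_wp()
are hypothetical, and the sketch assumes that uffd is a descriptor
already obtained from userfaultfd(2).
A kernel that does not know the requested feature bit may fail the
ioctl outright, which the sketch reports as \-1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: build a UFFDIO_API argument that requests
   write-protect fault reporting. */
static struct uffdio_api
make_api_request(void)
{
    struct uffdio_api api;

    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    api.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP;
    return api;
}

/* Hypothetical helper: test the feature mask that the kernel wrote
   back into the structure after a successful UFFDIO_API. */
static int
api_has_wp(const struct uffdio_api *api)
{
    assert(api != NULL);
    return (api->features & UFFD_FEATURE_PAGEFAULT_FLAG_WP) != 0;
}

/* Returns 1 if write-protect mode is available on uffd, 0 if it is
   not, and -1 if the UFFDIO_API ioctl itself failed (errno is set). */
static int
uffd_check_wp(int uffd)
{
    struct uffdio_api api = make_api_request();

    if (ioctl(uffd, UFFDIO_API, &api) == -1)
        return -1;
    return api_has_wp(&api);
}
```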
.PP
To register with userfaultfd write-protect mode, the user needs to initiate the
.B UFFDIO_REGISTER
ioctl with mode
.B UFFDIO_REGISTER_MODE_WP
set.
Note that it's legal to monitor the same memory range with multiple modes.
For example, the user can do
.B UFFDIO_REGISTER
with the mode set to
.BR "UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_WP" .
When only
.B UFFDIO_REGISTER_MODE_WP
is registered, user space will
.I not
receive any message when a missing page is written.
Instead, user space will only receive a write-protect page-fault message
when an existing but write-protected page is written.
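.PP
As a rough sketch of the registration step, the following C fragment
fills in a struct uffdio_register for write-protect mode, optionally
combined with missing mode.
The helper names make_register_request() and uffd_register_wp() and
the want_missing parameter are assumptions made for illustration;
only the structure, the mode bits, and the UFFDIO_REGISTER ioctl come
from the userfaultfd API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: describe the range [start, start + len) and
   select write-protect mode, plus missing mode if requested. */
static struct uffdio_register
make_register_request(uint64_t start, uint64_t len, int want_missing)
{
    struct uffdio_register reg;

    assert(len > 0);
    memset(&reg, 0, sizeof(reg));
    reg.range.start = start;
    reg.range.len = len;
    reg.mode = UFFDIO_REGISTER_MODE_WP;
    if (want_missing)
        reg.mode |= UFFDIO_REGISTER_MODE_MISSING;
    return reg;
}

/* Hypothetical wrapper: returns the ioctl result, that is,
   0 on success or -1 with errno set. */
static int
uffd_register_wp(int uffd, void *addr, uint64_t len, int want_missing)
{
    struct uffdio_register reg =
        make_register_request((uint64_t) (uintptr_t) addr, len,
                              want_missing);

    return ioctl(uffd, UFFDIO_REGISTER, &reg);
}
```

Passing a nonzero want_missing gives the combined mode, so that the
monitor sees both missing-page and write-protect faults for the range.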
.PP
After the
.B UFFDIO_REGISTER
ioctl has completed with
.B UFFDIO_REGISTER_MODE_WP
mode set,
the user can write-protect any existing memory within the range using the ioctl
.B UFFDIO_WRITEPROTECT
where
.I uffdio_writeprotect.mode
should be set to
.BR UFFDIO_WRITEPROTECT_MODE_WP .
.PP
When a write-protect event happens,
user space will receive a page-fault message whose
.I uffd_msg.pagefault.flags
field has the
.B UFFD_PAGEFAULT_FLAG_WP
flag set.
Note: since only writes can trigger this kind of fault,
write-protect messages will always have the
.B UFFD_PAGEFAULT_FLAG_WRITE
bit set along with the
.B UFFD_PAGEFAULT_FLAG_WP
bit.
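.PP
A monitor might tell the two kinds of fault apart along the following
lines.
This is only a sketch: the enum fault_kind type and the
classify_fault() helper are hypothetical names, and the message is
assumed to have been obtained with read(2) from the userfaultfd
descriptor.
In the C structure, the flags are reached through the arg union,
as msg\->arg.pagefault.flags.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/userfaultfd.h>

/* Hypothetical classification of a message read from the userfaultfd. */
enum fault_kind {
    FAULT_NOT_PAGEFAULT,    /* Some other event (fork, remap, ...) */
    FAULT_MISSING,          /* Access to a missing page */
    FAULT_WP                /* Write to a write-protected page */
};

static enum fault_kind
classify_fault(const struct uffd_msg *msg)
{
    assert(msg != NULL);

    if (msg->event != UFFD_EVENT_PAGEFAULT)
        return FAULT_NOT_PAGEFAULT;

    /* The WP bit alone decides; for such faults the WRITE bit is
       set as well. */
    if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP)
        return FAULT_WP;

    return FAULT_MISSING;
}
```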
.PP
To resolve a write-protection page fault, the user should initiate another
.B UFFDIO_WRITEPROTECT
ioctl, whose
.I uffdio_writeprotect.mode
should have the flag
.B UFFDIO_WRITEPROTECT_MODE_WP
cleared for the faulted page or range.
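.PP
Both directions of the UFFDIO_WRITEPROTECT ioctl can be sketched with
one small helper.
The names make_wp_request() and uffd_resolve_wp_fault() are
hypothetical, the sketch requires kernel headers from Linux 5.7 or
later, and it assumes that the caller supplies the page size (for
example, from sysconf(_SC_PAGESIZE)) and that it is a power of two.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: protect != 0 write-protects the range;
   protect == 0 leaves UFFDIO_WRITEPROTECT_MODE_WP cleared, which
   removes the protection. */
static struct uffdio_writeprotect
make_wp_request(uint64_t start, uint64_t len, int protect)
{
    struct uffdio_writeprotect wp;

    assert(len > 0);
    memset(&wp, 0, sizeof(wp));
    wp.range.start = start;
    wp.range.len = len;
    wp.mode = protect ? UFFDIO_WRITEPROTECT_MODE_WP : 0;
    return wp;
}

/* Hypothetical wrapper: unprotect the single page that contains
   fault_addr.  Returns 0 on success or -1 with errno set. */
static int
uffd_resolve_wp_fault(int uffd, uint64_t fault_addr, uint64_t page_size)
{
    struct uffdio_writeprotect wp;

    /* The page size must be a nonzero power of two. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    wp = make_wp_request(fault_addr & ~(page_size - 1), page_size, 0);
    return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
}
```

Because the sketch does not set any other mode bit, resolving the
fault this way is also expected to let the faulted thread continue.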
.PP
Write-protect mode only supports private anonymous memory.
.SS Reading from the userfaultfd structure
Each
.BR read (2)
@@ -364,8 +460,12 @@ flag (see
.BR ioctl_userfaultfd (2))
and this flag is set, this is a write fault;
otherwise it is a read fault.
.TP
.B UFFD_PAGEFAULT_FLAG_WP
If the address is in a range that was registered with the
.B UFFDIO_REGISTER_MODE_WP
flag, this bit being set means that the fault is a write-protect fault;
otherwise, it is a page-missing fault.
.RE
.TP
.I pagefault.feat.pid