.\" Copyright (c) 2016, IBM Corporation.
.\" Written by Mike Rapoport <rppt@linux.vnet.ibm.com>
.\" and Copyright (C) 2017 Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.\" FIXME Need to describe close(2) semantics for userfaultfd file descriptor
.\"
.TH USERFAULTFD 2 2016-12-12 "Linux" "Linux Programmer's Manual"
.SH NAME
userfaultfd \- create a file descriptor for handling page faults in user space
.SH SYNOPSIS
.nf
.B #include <sys/types.h>
.B #include <linux/userfaultfd.h>
.sp
.BI "int userfaultfd(int " flags );
.fi
.PP
.IR Note :
There is no glibc wrapper for this system call; see NOTES.
.SH DESCRIPTION
.BR userfaultfd ()
creates a new userfaultfd object that can be used for delegation of page-fault
handling to a user-space application,
and returns a file descriptor that refers to the new object.
The new userfaultfd object is configured using
.BR ioctl (2).
.PP
Once the userfaultfd object is configured, the application can use
.BR read (2)
to receive userfaultfd notifications.
Reads from the userfaultfd file descriptor may be blocking or non-blocking,
depending on whether the
.B O_NONBLOCK
flag was specified in
.I flags
when the object was created or was later set using
.BR fcntl (2).
.PP
The following values may be bitwise ORed in
.I flags
to change the behavior of
.BR userfaultfd ():
.TP
.B O_CLOEXEC
Enable the close-on-exec flag for the new userfaultfd file descriptor.
See the description of the
.B O_CLOEXEC
flag in
.BR open (2).
.TP
.B O_NONBLOCK
Enable non-blocking operation for the userfaultfd object.
See the description of the
.B O_NONBLOCK
flag in
.BR open (2).
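.PP
As an illustration only, the following C sketch creates a userfaultfd with both
.B O_CLOEXEC
and
.B O_NONBLOCK
and then inspects the descriptor with
.BR fcntl (2).
It assumes that the C library headers define
.B SYS_userfaultfd
(Linux 4.3 or later);
the function name
.I uffd_check_creation_flags
is hypothetical.
.PP
.nf
.in +4n
```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: create a userfaultfd with O_CLOEXEC | O_NONBLOCK
   and check, via fcntl(2), that both flags took effect.
   Returns 1 if they did, 0 if not, or -1 if userfaultfd() failed. */
static int uffd_check_creation_flags(void)
{
    int fd = (int) syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (fd == -1)
        return -1;

    int fd_flags = fcntl(fd, F_GETFD);    /* descriptor flags */
    int fl_flags = fcntl(fd, F_GETFL);    /* file status flags */
    close(fd);

    if (fd_flags == -1 || fl_flags == -1)
        return 0;
    return (fd_flags & FD_CLOEXEC) && (fl_flags & O_NONBLOCK);
}
```
.in
.fi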
.\"
.SS Usage
The userfaultfd mechanism is designed to allow a thread in a multithreaded
program to perform user-space paging for the other threads in the process.
When a page fault occurs for one of the regions registered
to the userfaultfd object,
the faulting thread is put to sleep and
an event is generated that can be read via the userfaultfd file descriptor.
The fault-handling thread reads events from this file descriptor and services
them using the operations described in
.BR ioctl_userfaultfd (2).
When servicing the page-fault events,
the fault-handling thread can trigger a wake-up for the sleeping thread.
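.PP
The thread model above can be sketched, under stated assumptions, as one
iteration of the fault-handling thread's loop.
The sketch assumes that the range was registered in
.B UFFDIO_REGISTER_MODE_MISSING
mode, that
.I src
points to
.I page_size
bytes of page contents, and that the descriptor was created with
.BR O_NONBLOCK ;
.I uffd_service_one
is a hypothetical name, not part of any library.
Because the
.B UFFDIO_COPY_MODE_DONTWAKE
flag is not given, the copy also wakes the sleeping thread.
.PP
.nf
.in +4n
```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for an event, read one
   message, and resolve a missing-page fault by copying page_size bytes
   from src into the faulting page.  Returns 1 if a fault was serviced,
   0 on timeout or for a non-fault event, and -1 on error. */
static int uffd_service_one(int uffd, int timeout_ms,
                            uint64_t page_size, const void *src)
{
    struct pollfd pfd = { .fd = uffd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready <= 0)
        return ready;                   /* 0: timeout; -1: poll failed */

    struct uffd_msg msg;
    ssize_t n = read(uffd, &msg, sizeof(msg));
    if (n == -1)
        return (errno == EAGAIN) ? 0 : -1;   /* no event pending after all */
    if (n != (ssize_t) sizeof(msg)) {
        errno = EIO;                    /* short read: should not happen */
        return -1;
    }
    if (msg.event != UFFD_EVENT_PAGEFAULT)
        return 0;

    struct uffdio_copy copy = {
        .dst  = msg.arg.pagefault.address & ~(page_size - 1),
        .src  = (uint64_t) (uintptr_t) src,
        .len  = page_size,
        .mode = 0,                      /* no DONTWAKE: wake the faulter */
    };
    return ioctl(uffd, UFFDIO_COPY, &copy) == -1 ? -1 : 1;
}
```
.in
.fi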
.\"
.SS Userfaultfd operation
After the userfaultfd object is created with
.BR userfaultfd (),
the application must enable it using the
.B UFFDIO_API
.BR ioctl (2)
operation.
This operation allows a handshake between the kernel and user space
to determine the API version and supported features.
This operation must be performed before any of the other
.BR ioctl (2)
operations described below (or those operations fail with the
.B EINVAL
error).
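.PP
A minimal C sketch of this handshake follows.
It assumes headers from Linux 4.3 or later and requests no optional
features;
.I uffd_open_enabled
is a hypothetical helper name.
On success, the kernel fills in the
.I ioctls
field of
.I struct uffdio_api
with a bit mask of the
.BR ioctl (2)
operations that are now available.
.PP
.nf
.in +4n
```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: create a userfaultfd and perform the UFFDIO_API
   handshake.  Returns the enabled descriptor, or -1 with errno set. */
static int uffd_open_enabled(int flags)
{
    int fd = (int) syscall(SYS_userfaultfd, flags);
    if (fd == -1)
        return -1;

    struct uffdio_api api = {
        .api      = UFFD_API,   /* API version this code was written for */
        .features = 0,          /* request no optional features */
    };
    if (ioctl(fd, UFFDIO_API, &api) == -1) {
        int saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }
    /* api.ioctls now has bit (1ULL << _UFFDIO_REGISTER) set, among others. */
    return fd;
}
```
.in
.fi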
.PP
After a successful
.B UFFDIO_API
operation,
the application then registers memory address ranges using the
.B UFFDIO_REGISTER
.BR ioctl (2)
operation.
After successful completion of a
.B UFFDIO_REGISTER
operation,
a page fault occurring in the requested memory range, and satisfying
the mode defined at the registration time, will be forwarded by the kernel to
the user-space application.
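.PP
A corresponding sketch of the registration step, again with a hypothetical
helper name, is shown below.
It assumes that
.I addr
and
.I len
are multiples of the page size and that the range is an anonymous private
mapping.
After a successful call, the
.I ioctls
field of
.I struct uffdio_register
reports which resolving operations (for example,
.BR UFFDIO_COPY )
can be used on the range.
.PP
.nf
.in +4n
```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>

/* Hypothetical helper: register [addr, addr + len) for missing-page
   faults.  On success, stores the available ioctl bit mask in
   *ioctls_out (if not NULL) and returns 0; on error returns -1. */
static int uffd_register_missing(int uffd, void *addr, uint64_t len,
                                 uint64_t *ioctls_out)
{
    struct uffdio_register reg = {
        .range = { .start = (uint64_t) (uintptr_t) addr, .len = len },
        .mode  = UFFDIO_REGISTER_MODE_MISSING,
    };
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1)
        return -1;
    if (ioctls_out != NULL)
        *ioctls_out = reg.ioctls;   /* e.g. test (1ULL << _UFFDIO_COPY) */
    return 0;
}
```
.in
.fi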
The application can then use the
.B UFFDIO_COPY
or
.B UFFDIO_ZEROPAGE
.BR ioctl (2)
operations to resolve the page fault.
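.PP
Both resolving operations can be sketched as follows.
The helper names are hypothetical; the sketch assumes a range registered in
.B UFFDIO_REGISTER_MODE_MISSING
mode and a page size that is a power of two, and it uses the header's
.B UFFDIO_ZEROPAGE
operation for the zero-fill case.
A
.I mode
of 0 means that the faulting thread is woken once the page is in place.
.PP
.nf
.in +4n
```c
#define _GNU_SOURCE
#include <stdint.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>

/* Hypothetical helper: fill the page containing fault_addr with
   page_size bytes copied from src.  Returns 0 on success, -1 on error. */
static int uffd_resolve_copy(int uffd, uint64_t fault_addr,
                             uint64_t page_size, const void *src)
{
    struct uffdio_copy copy = {
        .dst  = fault_addr & ~(page_size - 1),    /* round down to page */
        .src  = (uint64_t) (uintptr_t) src,
        .len  = page_size,
        .mode = 0,
    };
    return ioctl(uffd, UFFDIO_COPY, &copy) == -1 ? -1 : 0;
}

/* Hypothetical helper: map a zero-filled page at the page containing
   fault_addr.  Returns 0 on success, -1 on error. */
static int uffd_resolve_zero(int uffd, uint64_t fault_addr,
                             uint64_t page_size)
{
    struct uffdio_zeropage zp = {
        .range = { .start = fault_addr & ~(page_size - 1),
                   .len   = page_size },
        .mode  = 0,
    };
    return ioctl(uffd, UFFDIO_ZEROPAGE, &zp) == -1 ? -1 : 0;
}
```
.in
.fi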
.PP
Details of the various
.BR ioctl (2)
operations can be found in
.BR ioctl_userfaultfd (2).
.PP
Currently, userfaultfd can be used only with anonymous private memory
mappings.
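.PP
A region suitable for registration can be obtained with
.BR mmap (2),
as in the following sketch (the helper name is hypothetical).
The pages should not be touched before they are registered: a page that is
already populated is no longer missing, so accessing it does not generate a
missing-page event.
.PP
.nf
.in +4n
```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: allocate npages of anonymous private memory.
   mmap(2) returns page-aligned memory and the length is a whole number
   of pages, as UFFDIO_REGISTER requires.  Returns NULL on failure. */
static void *uffd_alloc_region(size_t npages, size_t *len_out)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0 || npages == 0)
        return NULL;

    size_t len = npages * (size_t) page_size;
    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED)
        return NULL;

    *len_out = len;
    return addr;        /* do not touch the pages before registering */
}
```
.in
.fi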
.\"
.SS Reading from the userfaultfd structure
.\" FIXME are the details below correct
Each
.BR read (2)
from the userfaultfd file descriptor returns one or more
.I uffd_msg
structures, each of which describes a page-fault event:
.PP
.nf
.in +4n
struct uffd_msg {
    __u8  event;            /* Type of event */
    ...
    union {
        struct {
            __u64 flags;    /* Flags describing fault */
            __u64 address;  /* Faulting address */
        } pagefault;
        ...
    } arg;

    /* Padding fields omitted */
} __packed;
.in
.fi
.PP
If multiple events are available and the supplied buffer is large enough,
.BR read (2)
returns as many events as will fit in the supplied buffer.
If the buffer supplied to
.BR read (2)
is smaller than the size of the
.I uffd_msg
structure, the
.BR read (2)
fails with the error
.BR EINVAL .
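.PP
The following C sketch reads several events with a single
.BR read (2)
call.
The buffer is sized as a whole number of
.I uffd_msg
structures, so (for a
.I max
of at least 1) it is never too small; the helper name is hypothetical.
.PP
.nf
.in +4n
```c
#define _GNU_SOURCE
#include <stddef.h>
#include <linux/userfaultfd.h>
#include <unistd.h>

/* Hypothetical helper: read up to max messages in one read(2).
   Returns the number of messages read, or -1 with errno set (for
   example, EAGAIN on a non-blocking descriptor with no pending events). */
static ssize_t uffd_read_msgs(int uffd, struct uffd_msg *msgs, size_t max)
{
    ssize_t n = read(uffd, msgs, max * sizeof(struct uffd_msg));
    if (n == -1)
        return -1;
    return n / (ssize_t) sizeof(struct uffd_msg);
}
```
.in
.fi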
.PP
The fields set in the
.I uffd_msg
structure are as follows:
.TP
.I event
The type of event.
Currently, only one value can appear in this field:
.BR UFFD_EVENT_PAGEFAULT ,
which indicates a page-fault event.
.TP
.I address
The address that triggered the page fault.
.TP
.I flags
A bit mask of flags that describe the event.
For
.BR UFFD_EVENT_PAGEFAULT ,
the following flag may appear:
.RS
.TP
.B UFFD_PAGEFAULT_FLAG_WRITE
If the address is in a range that was registered with the
.B UFFDIO_REGISTER_MODE_MISSING
flag (see
.BR ioctl_userfaultfd (2))
and this flag is set, this is a write fault;
otherwise it is a read fault.
.\"
.\" UFFD_PAGEFAULT_FLAG_WP is not yet supported.
.RE
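.PP
Decoding a message can be sketched as follows (the helper name is
hypothetical).
As described above, the write flag is meaningful for ranges registered with
.BR UFFDIO_REGISTER_MODE_MISSING .
.PP
.nf
.in +4n
```c
#include <stdint.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: classify one message returned by read(2).
   Stores the faulting address in *addr and returns 1 for a write
   fault, 0 for a read fault, or -1 if it is not a page-fault event. */
static int uffd_classify_fault(const struct uffd_msg *msg, uint64_t *addr)
{
    if (msg->event != UFFD_EVENT_PAGEFAULT)
        return -1;
    *addr = msg->arg.pagefault.address;
    return (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WRITE) ? 1 : 0;
}
```
.in
.fi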
.PP
A
.BR read (2)
on a userfaultfd file descriptor can fail with the following errors:
.TP
.B EINVAL
The userfaultfd object has not yet been enabled using the
.B UFFDIO_API
.BR ioctl (2)
operation.
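.PP
This error can be observed with a sketch such as the following, which reads
from a freshly created descriptor before any
.B UFFDIO_API
call.
The helper name is hypothetical, and the descriptor is created with
.B O_NONBLOCK
as a precaution so that the sketch cannot block.
.PP
.nf
.in +4n
```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: check that read(2) fails with EINVAL before the
   UFFDIO_API handshake.  Returns 1 if it does, 0 if not, or -1 if no
   userfaultfd could be created. */
static int uffd_read_before_api_fails(void)
{
    int fd = (int) syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (fd == -1)
        return -1;

    struct uffd_msg msg;
    ssize_t n = read(fd, &msg, sizeof(msg));
    int result = (n == -1 && errno == EINVAL) ? 1 : 0;
    close(fd);
    return result;
}
```
.in
.fi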
.PP
The userfaultfd file descriptor can be monitored with
.BR poll (2),
.BR select (2),
and
.BR epoll (7).
When events are available, the file descriptor is reported as readable.
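.PP
A minimal readiness check with
.BR poll (2)
might look like this (hypothetical helper name).
It assumes that the descriptor was created with
.BR O_NONBLOCK ,
and it treats a result without
.B POLLIN
(for example, only
.BR POLLERR )
as an error.
.PP
.nf
.in +4n
```c
#include <poll.h>

/* Hypothetical helper: wait up to timeout_ms for the descriptor to
   become readable.  Returns 1 if an event is pending, 0 on timeout,
   and -1 if poll(2) failed or reported a condition without POLLIN. */
static int uffd_wait_readable(int uffd, int timeout_ms)
{
    struct pollfd pfd = { .fd = uffd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready <= 0)
        return ready;
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```
.in
.fi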
.\" FIXME But, it seems, the object must be created with O_NONBLOCK.
.\" What is the rationale for this requirement?
.SH RETURN VALUE
On success,
.BR userfaultfd ()
returns a new file descriptor that refers to the userfaultfd object.
On error, \-1 is returned, and
.I errno
is set appropriately.
.SH ERRORS
.TP
.B EINVAL
An unsupported value was specified in
.IR flags .
.TP
.B EMFILE
The per-process limit on the number of open file descriptors has been
reached.
.TP
.B ENFILE
The system-wide limit on the total number of open files has been
reached.
.TP
.B ENOMEM
Insufficient kernel memory was available.
.SH VERSIONS
The
.BR userfaultfd ()
system call first appeared in Linux 4.3.
.SH CONFORMING TO
.BR userfaultfd ()
is Linux-specific and should not be used in programs intended to be
portable.
.SH NOTES
Glibc does not provide a wrapper for this system call; call it using
.BR syscall (2).
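.PP
For example, a thin wrapper might look like the following sketch;
the name
.I uffd_create
is hypothetical, and the sketch assumes that the C library headers define
.BR SYS_userfaultfd .
.PP
.nf
.in +4n
```c
#define _GNU_SOURCE
#include <fcntl.h>          /* O_CLOEXEC, O_NONBLOCK for callers */
#include <sys/syscall.h>    /* SYS_userfaultfd */
#include <unistd.h>         /* syscall() */

/* Hypothetical wrapper; like the raw system call, it returns a new
   file descriptor, or -1 with errno set on error. */
static int uffd_create(int flags)
{
    return (int) syscall(SYS_userfaultfd, flags);
}
```
.in
.fi
.PP
A caller would then write, for example,
.IR "uffd_create(O_CLOEXEC | O_NONBLOCK)" .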
.SH SEE ALSO
.BR fcntl (2),
.BR ioctl (2),
.BR ioctl_userfaultfd (2),
.BR mmap (2)
.PP
.I Documentation/vm/userfaultfd.txt
in the Linux kernel source tree