.\" Copyright (c) 2016, IBM Corporation.
.\" Written by Mike Rapoport <rppt@linux.vnet.ibm.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.\" FIXME Need to mention poll/select/epoll
.\"
.TH USERFAULTFD 2 2016-12-12 "Linux" "Linux Programmer's Manual"
.SH NAME
userfaultfd \- create a file descriptor for handling page faults in user
space
.SH SYNOPSIS
.nf
.B #include <sys/types.h>
.sp
.BI "int userfaultfd(int " flags );
.fi
.PP
.IR Note :
There is no glibc wrapper for this system call; see NOTES.
.SH DESCRIPTION
.BR userfaultfd ()
creates a new userfaultfd object that can be used for delegation of page-fault
handling to a user-space application,
and returns a file descriptor that refers to the new object.
The new userfaultfd object is configured using
.BR ioctl (2).
.PP
Once the userfaultfd object is configured, the application can use
.BR read (2)
to receive userfaultfd notifications.
Reads from the userfaultfd may be blocking or non-blocking,
depending on the value of
.I flags
used for the creation of the userfaultfd or subsequent calls to
.BR fcntl (2).
.PP
The following values may be bitwise ORed in
.I flags
to change the behavior of
.BR userfaultfd ():
.TP
.B O_CLOEXEC
Enable the close-on-exec flag for the new userfaultfd file descriptor.
See the description of the
.B O_CLOEXEC
flag in
.BR open (2).
.TP
.B O_NONBLOCK
Enable non-blocking operation for the userfaultfd object.
See the description of the
.B O_NONBLOCK
flag in
.BR open (2).
.\"
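.PP
As a minimal sketch of the non-blocking behavior described above,
the following hypothetical helper creates a userfaultfd with
.BR O_NONBLOCK ,
performs the
.B UFFDIO_API
handshake that is required before the descriptor can be used,
and then calls
.BR read (2)
while no notification can be pending.
The helper name and its return convention are assumptions made for
illustration only;
.I struct uffd_msg
and
.B UFFD_API
are taken from
.IR <linux/userfaultfd.h> .
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (not part of any library).
   Returns 1 if a non-blocking read(2) on a freshly enabled userfaultfd
   fails with EAGAIN, 0 if the read behaves differently, and -1 if the
   userfaultfd could not be created or enabled. */
int
nonblocking_read_reports_eagain(void)
{
    struct uffdio_api api;
    struct uffd_msg msg;
    ssize_t n;
    int uffd, saved_errno;

    /* There is no glibc wrapper, so the call goes through syscall(2). */
    uffd = (int) syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (uffd == -1)
        return -1;

    /* The object must be enabled with UFFDIO_API before it is read. */
    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    if (ioctl(uffd, UFFDIO_API, &api) == -1) {
        close(uffd);
        return -1;
    }

    /* No range is registered, so no notification can be pending. */
    n = read(uffd, &msg, sizeof(msg));
    saved_errno = errno;
    close(uffd);

    return n == -1 && saved_errno == EAGAIN;
}
```
.fi
.in
.PP
Without
.BR O_NONBLOCK ,
the same
.BR read (2)
would be expected to block until a notification arrives.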
.SS Userfaultfd operation
After the userfaultfd object is created with
.BR userfaultfd (),
the application must enable it using the
.B UFFDIO_API
.BR ioctl (2)
operation.
This operation allows a handshake between the kernel and user space
to determine the API version and supported features.
After a successful
.B UFFDIO_API
operation,
the application then registers memory address ranges using the
.B UFFDIO_REGISTER
.BR ioctl (2)
operation.
After successful completion of a
.B UFFDIO_REGISTER
operation,
a page fault occurring in the requested memory range, and satisfying
the mode defined at registration time, will be forwarded by the kernel to
the user-space application.
The application can then use the
.B UFFDIO_COPY
or
.B UFFDIO_ZEROPAGE
.BR ioctl (2)
operations to resolve the page fault.
.PP
Currently, userfaultfd can be used only with anonymous private memory
mappings.
.\"
.SS Configuration ioctl(2) operations
The
.BR ioctl (2)
operations described below are used to configure userfaultfd behavior.
They allow the caller to choose what features will be enabled and
what kinds of events will be delivered to the application.
.TP
.BI "UFFDIO_API struct uffdio_api *" argp
Enable operation of the userfaultfd and perform API handshake.
The
.I uffdio_api
structure is defined as:
.in +4n
.nf

struct uffdio_api {
    __u64 api;
    __u64 features;
    __u64 ioctls;
};

.fi
.in
The
.I api
field denotes the API version requested by the application.
The kernel verifies that it can support the requested version, and sets the
.I features
and
.I ioctls
fields to bit masks representing all the available features and the generic
.BR ioctl (2)
operations available.
.\" FIXME We need to say more about the list of bits that can appear in
.\" these two fields.
.\"
.TP
.BI "UFFDIO_REGISTER struct uffdio_register *" argp
Register a memory address range with the userfaultfd object.
The
.I uffdio_register
structure is defined as:
.in +4n
.nf

struct uffdio_range {
    __u64 start;
    __u64 len;
};

struct uffdio_register {
    struct uffdio_range range;
    __u64 mode;
    __u64 ioctls;
};

.fi
.in
.IP
The
.I range
field defines a memory range starting at
.I start
and continuing for
.I len
bytes that should be handled by the userfaultfd.
.IP
The
.I mode
field defines the mode of operation desired for this memory region.
The following values may be bitwise ORed to set the userfaultfd mode for
the specified range:
.RS
.TP
.B UFFDIO_REGISTER_MODE_MISSING
Track page faults on missing pages.
.TP
.B UFFDIO_REGISTER_MODE_WP
Track page faults on write-protected pages.
Currently, the only supported mode is
.BR UFFDIO_REGISTER_MODE_MISSING .
.RE
.IP
.\" FIXME In the following, what does "answers" mean, and what are the bits?
.\" (we need a list of the bits here).
On return, the kernel sets the
.I ioctls
field to a bit mask indicating which
.BR ioctl (2)
operations are available for the requested range.
.\"
.TP
.BI "UFFDIO_UNREGISTER struct uffdio_range *" argp
Unregister a memory range from userfaultfd.
.\"
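.PP
The configuration operations above can be sketched as follows.
This is a minimal illustration, not a reference implementation:
the function names
.IR uffd_handshake (),
.IR uffd_register_missing (),
and
.IR configure_demo ()
are hypothetical, the handshake requests no optional features,
and the range layout follows the definitions in
.IR <linux/userfaultfd.h> .
The demo registers one anonymous private page, checks that the returned
.I ioctls
mask advertises
.BR UFFDIO_COPY ,
and then unregisters the page.
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: UFFDIO_API handshake with no optional features.
   Returns 0 on success, -1 on error. */
int
uffd_handshake(int uffd)
{
    struct uffdio_api api;

    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    return ioctl(uffd, UFFDIO_API, &api);
}

/* Hypothetical helper: register [addr, addr + len) for missing-page
   faults. On success, *ioctls receives the bit mask set by the kernel. */
int
uffd_register_missing(int uffd, void *addr, size_t len, __u64 *ioctls)
{
    struct uffdio_register reg;

    memset(&reg, 0, sizeof(reg));
    reg.range.start = (__u64) (unsigned long) addr;
    reg.range.len = len;
    reg.mode = UFFDIO_REGISTER_MODE_MISSING;
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1)
        return -1;
    *ioctls = reg.ioctls;
    return 0;
}

/* Returns 0 if every step succeeded, 1 if the userfaultfd could not be
   created (for example, ENOSYS or EPERM), and -1 if a later step failed. */
int
configure_demo(void)
{
    struct uffdio_range range;
    __u64 ioctls = 0;
    size_t len = (size_t) sysconf(_SC_PAGESIZE);
    void *addr;
    int uffd, ret = -1;

    uffd = (int) syscall(SYS_userfaultfd, O_CLOEXEC);
    if (uffd == -1)
        return 1;

    /* Only anonymous private mappings are supported. */
    addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED)
        goto out_close;

    if (uffd_handshake(uffd) == 0 &&
        uffd_register_missing(uffd, addr, len, &ioctls) == 0 &&
        (ioctls & ((__u64) 1 << _UFFDIO_COPY)) != 0) {
        /* UFFDIO_UNREGISTER takes the bare range. */
        range.start = (__u64) (unsigned long) addr;
        range.len = len;
        ret = ioctl(uffd, UFFDIO_UNREGISTER, &range);
    }
    munmap(addr, len);
out_close:
    close(uffd);
    return ret;
}
```
.fi
.in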
.SS Range ioctl(2) operations
The range
.BR ioctl (2)
operations enable the calling application to resolve page fault
events in a consistent way.
.\" FIXME What does "consistent" mean?
.TP
.BI "UFFDIO_COPY struct uffdio_copy *" argp
Atomically copy a contiguous memory chunk into the userfault registered
range and optionally wake up the blocked thread.
The source and destination addresses and the number of bytes to copy are
specified by the
.IR src ", " dst ", and " len
fields of
.IR "struct uffdio_copy" :
.in +4n
.nf

struct uffdio_copy {
    __u64 dst;
    __u64 src;
    __u64 len;
    __u64 mode;
    __s64 copy;
};

.fi
.in
.IP
The following values may be bitwise ORed in
.I mode
to change the behavior of the
.B UFFDIO_COPY
operation:
.RS
.TP
.B UFFDIO_COPY_MODE_DONTWAKE
Do not wake up the thread that waits for page-fault resolution.
.RE
.IP
The
.I copy
field of the
.I uffdio_copy
structure is used by the kernel to return the number of bytes
that were actually copied, or an error.
If
.I uffdio_copy.copy
does not match the
.I uffdio_copy.len
passed as input to
.BR UFFDIO_COPY ,
the operation will return
.\" FIXME In the 'copy' field? (This isn't clear.)
.BR \-EAGAIN .
If
.BR ioctl (2)
returns zero, the operation succeeded: no error was reported and
the entire area was copied.
If an invalid fault happens while writing to the
.I uffdio_copy.copy
field, the system call will return
.\" FIXME In the 'copy' field? (This isn't clear.)
.BR \-EFAULT .
.I uffdio_copy.copy
is an output-only field;
it is not read by the
.B UFFDIO_COPY
operation.
.\"
.TP
.BI "UFFDIO_ZEROPAGE struct uffdio_zeropage *" argp
Zero out a part of a memory range registered with userfaultfd.
The requested range is specified by the
.I range
field of the
.I uffdio_zeropage
structure:
.in +4n
.nf

struct uffdio_zeropage {
    struct uffdio_range range;
    __u64 mode;
    __s64 zeropage;
};

.fi
.in
.IP
The following values may be bitwise ORed in
.I mode
to change the behavior of the
.B UFFDIO_ZEROPAGE
operation:
.RS
.TP
.B UFFDIO_ZEROPAGE_MODE_DONTWAKE
Do not wake up the thread that waits for page-fault resolution.
.RE
.IP
The
.I zeropage
field of the
.I uffdio_zeropage
structure is used by the kernel to return the number of bytes
that were actually zeroed,
or an error in the same manner as
.IR uffdio_copy.copy .
.\"
.TP
.BI "UFFDIO_WAKE struct uffdio_range *" argp
Wake up the thread waiting for page-fault resolution.
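.PP
The range operations can be exercised without waiting for a fault,
because they may be applied to pages of a registered range that have
not yet been touched.
The following minimal sketch relies on that:
it registers two pages, fills the first one with
.B UFFDIO_COPY
and the second one with
.BR UFFDIO_ZEROPAGE ,
and only then reads the memory.
The function name
.IR populate_demo ()
and its return convention are assumptions made for illustration;
a real handler would normally issue these operations in response to a
notification obtained with
.BR read (2).
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical demo. Returns 0 on success, 1 if the userfaultfd could
   not be created, and -1 if a later step failed. */
int
populate_demo(void)
{
    struct uffdio_api api;
    struct uffdio_register reg;
    struct uffdio_copy copy;
    struct uffdio_zeropage zero;
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    char *dst, *src;
    int uffd, ret = -1;

    uffd = (int) syscall(SYS_userfaultfd, O_CLOEXEC);
    if (uffd == -1)
        return 1;

    /* Two untouched destination pages and one filled source page. */
    dst = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    src = mmap(NULL, page, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (dst == MAP_FAILED || src == MAP_FAILED)
        goto out;
    memset(src, 'A', page);

    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    memset(&reg, 0, sizeof(reg));
    reg.range.start = (__u64) (unsigned long) dst;
    reg.range.len = 2 * page;
    reg.mode = UFFDIO_REGISTER_MODE_MISSING;
    if (ioctl(uffd, UFFDIO_API, &api) == -1 ||
        ioctl(uffd, UFFDIO_REGISTER, &reg) == -1)
        goto out;

    /* Fill the first page from src; the copy field reports the bytes. */
    memset(&copy, 0, sizeof(copy));
    copy.dst = (__u64) (unsigned long) dst;
    copy.src = (__u64) (unsigned long) src;
    copy.len = page;
    if (ioctl(uffd, UFFDIO_COPY, &copy) == -1 ||
        copy.copy != (__s64) page)
        goto out;

    /* Map zero-filled memory at the second page. */
    memset(&zero, 0, sizeof(zero));
    zero.range.start = (__u64) (unsigned long) (dst + page);
    zero.range.len = page;
    if (ioctl(uffd, UFFDIO_ZEROPAGE, &zero) == -1 ||
        zero.zeropage != (__s64) page)
        goto out;

    /* Both pages are now present, so these reads do not fault. */
    if (dst[0] == 'A' && dst[page - 1] == 'A' && dst[page] == 0)
        ret = 0;
out:
    if (dst != MAP_FAILED)
        munmap(dst, 2 * page);
    if (src != MAP_FAILED)
        munmap(src, page);
    close(uffd);
    return ret;
}
```
.fi
.in
.PP
Both operations are issued with
.I mode
set to zero, so a wake-up is requested even though no thread is
waiting in this sketch.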
.SH RETURN VALUE
On success,
.BR userfaultfd ()
returns a new file descriptor that refers to the userfaultfd object.
On error, \-1 is returned, and
.I errno
is set appropriately.
.SH ERRORS
.TP
.B EINVAL
An unsupported value was specified in
.IR flags .
.TP
.B EMFILE
The per-process limit on the number of open file descriptors has been
reached.
.TP
.B ENFILE
The system-wide limit on the total number of open files has been
reached.
.TP
.B ENOMEM
Insufficient kernel memory was available.
.SH CONFORMING TO
.BR userfaultfd ()
is Linux-specific and should not be used in programs intended to be
portable.
.SH NOTES
Glibc does not provide a wrapper for this system call; call it using
.BR syscall (2).
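.PP
A wrapper along the following lines is one way to do that.
It is only a sketch: the name
.IR uffd_create ()
is hypothetical, and the code assumes that
.B SYS_userfaultfd
is provided by
.IR <sys/syscall.h> ,
which requires reasonably recent kernel headers.
.PP
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper with the prototype shown in SYNOPSIS.
   Returns a new file descriptor, or -1 with errno set. */
int
uffd_create(int flags)
{
    return (int) syscall(SYS_userfaultfd, flags);
}
```
.fi
.in
.PP
A typical call would be
.IR "uffd_create(O_CLOEXEC | O_NONBLOCK)" ;
passing a flag other than those listed in DESCRIPTION, such as
.BR O_APPEND ,
is expected to fail.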
.SH SEE ALSO
.BR fcntl (2),
.BR ioctl (2),
.BR mmap (2)
.PP
.I Documentation/vm/userfaultfd.txt
in the Linux kernel source tree