.\" Copyright (c) 2016, IBM Corporation.
.\" Written by Mike Rapoport <rppt@linux.vnet.ibm.com>
.\" and Copyright (C) 2016 Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.\"
.TH IOCTL_USERFAULTFD 2 2016-12-12 "Linux" "Linux Programmer's Manual"
.SH NAME
ioctl_userfaultfd \- create a file descriptor for handling page faults in user
space
.SH SYNOPSIS
.nf
.B #include <sys/ioctl.h>

.BI "int ioctl(int " fd ", int " cmd ", ...);"
.fi
.SH DESCRIPTION
Various
.BR ioctl (2)
operations can be performed on a userfaultfd object (created by a call to
.BR userfaultfd (2))
using calls of the form:

    ioctl(fd, cmd, argp);

In the above,
.I fd
is a file descriptor referring to a userfaultfd object,
.I cmd
is one of the commands listed below, and
.I argp
is a pointer to a data structure that is specific to
.IR cmd .

The various
.BR ioctl (2)
operations are described below.
The
.BR UFFDIO_API ,
.BR UFFDIO_REGISTER ,
and
.BR UFFDIO_UNREGISTER
operations are used to
.I configure
userfaultfd behavior.
These operations allow the caller to choose what features will be enabled and
what kinds of events will be delivered to the application.
The remaining operations are
.IR range
operations.
These operations enable the calling application to resolve page-fault
events in a consistent way.
.\" FIXME What does "consistent" mean?
.\"
.SS UFFDIO_API
(Since Linux 4.3.)
Enable operation of the userfaultfd and perform API handshake.
The
.I argp
argument is a pointer to a
.IR uffdio_api
structure, defined as:
.in +4n
.nf

struct uffdio_api {
    __u64 api;
    __u64 features;
    __u64 ioctls;
};

.fi
.in
The
.I api
field denotes the API version requested by the application.
Before the call, the
.I features
field must be initialized to zero.
.\" FIXME Why must the 'features' field be initialized to zero?

The kernel verifies that it can support the requested API version,
and sets the
.I features
and
.I ioctls
fields to bit masks representing all the available features and the generic
.BR ioctl (2)
operations available.
Currently, zero (i.e., no feature bits) is placed in the
.I features
field.
The returned
.I ioctls
field can contain the following bits:
.\" FIXME This user-space API seems not fully polished. Why are there
.\" not constants defined for each of the bit-mask values listed here?
.TP
.B 1 << _UFFDIO_API
The
.B UFFDIO_API
operation is supported.
.TP
.B 1 << _UFFDIO_REGISTER
The
.B UFFDIO_REGISTER
operation is supported.
.TP
.B 1 << _UFFDIO_UNREGISTER
The
.B UFFDIO_UNREGISTER
operation is supported.
.\" FIXME Is the above description of the 'ioctls' field correct.
.\" Does more need to be said?
.\"
.PP
This
.BR ioctl (2)
operation returns 0 on success.
On error, \-1 is returned and
.I errno
is set to indicate the cause of the error.
Possible errors include:
.\" FIXME Is the following error list correct?
.\"
.TP
.B EINVAL
The userfaultfd has already been enabled by a previous
.BR UFFDIO_API
operation.
.TP
.B EINVAL
The API version requested in the
.I api
field is not supported by this kernel, or the
.I features
field was not zero.
.\" FIXME In this error case, the returned 'uffdio_api' structure
.\" zeroed out. Why is this done?
.\"
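.PP
As a rough illustration of the handshake described above,
the following sketch zeroes the structure, requests an API version,
and tests a bit in the returned
.I ioctls
mask.
The helper names
.IR uffd_handshake ()
and
.IR uffd_ioctl_supported ()
are hypothetical, and using the
.B UFFD_API
constant from
.I <linux/userfaultfd.h>
as the requested version is an assumption,
since this page does not name a value.
.PP
.in +4n
.nf
```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: perform the UFFDIO_API handshake on uffd.
   Returns 0 on success; on error, returns -1 with errno set. */
int
uffd_handshake(int uffd, struct uffdio_api *api)
{
    memset(api, 0, sizeof(*api));   /* 'features' must start as zero */
    api->api = UFFD_API;            /* assumed: version from the header */
    return ioctl(uffd, UFFDIO_API, api);
}

/* Hypothetical helper: return nonzero if the 'ioctls' bit mask
   advertises operation number nr (for example, _UFFDIO_REGISTER). */
int
uffd_ioctl_supported(__u64 ioctls, unsigned int nr)
{
    return (ioctls & ((__u64) 1 << nr)) != 0;
}
```
.fi
.in
.PP
A caller might treat a failed handshake as fatal and check, for example,
.I "uffd_ioctl_supported(api.ioctls, _UFFDIO_REGISTER)"
before going on to register a range.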
.SS UFFDIO_REGISTER
(Since Linux 4.3.)
Register a memory address range with the userfaultfd object.
The
.I argp
argument is a pointer to a
.I uffdio_register
structure, defined as:
.in +4n
.nf

struct uffdio_range {
    __u64 start;
    __u64 len;
};

struct uffdio_register {
    struct uffdio_range range;
    __u64 mode;
    __u64 ioctls;
};

.fi
.in

The
.I range
field defines a memory range starting at
.I start
and continuing for
.I len
bytes that should be handled by the userfaultfd.

The
.I mode
field defines the mode of operation desired for this memory region.
The following values may be bitwise ORed to set the userfaultfd mode for
the specified range:
.TP
.B UFFDIO_REGISTER_MODE_MISSING
Track page faults on missing pages.
.TP
.B UFFDIO_REGISTER_MODE_WP
Track page faults on write-protected pages.
.PP
Currently, the only supported mode is
.BR UFFDIO_REGISTER_MODE_MISSING .
.PP
If the operation is successful, the kernel modifies the
.I ioctls
bit-mask field to indicate which
.BR ioctl (2)
operations are available for the specified range.
This returned bit mask is as for
.BR UFFDIO_API .

This
.BR ioctl (2)
operation returns 0 on success.
On error, \-1 is returned and
.I errno
is set to indicate the cause of the error.
Possible errors include:
.\" FIXME Is the following error list correct?
.\"
.TP
.B EBUSY
A mapping in the specified range is registered with another
userfaultfd object.
.TP
.B EINVAL
An invalid or unsupported bit was specified in the
.I mode
field; or the
.I mode
field was zero.
.TP
.B EINVAL
There is no mapping in the specified address range.
.TP
.B EINVAL
There was an incompatible mapping in the specified address range.
.\" FIXME What does "incompatible" mean?
.\"
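.PP
The registration step described above might look roughly as follows.
This is a minimal sketch rather than a definitive implementation;
the helper names
.IR uffd_fill_register ()
and
.IR uffd_register_missing ()
are hypothetical, and the caller is assumed to pass a range
that lies within an existing mapping.
.PP
.in +4n
.nf
```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: fill in a uffdio_register structure that asks
   for missing-page tracking on the len bytes starting at addr. */
void
uffd_fill_register(struct uffdio_register *reg, void *addr, __u64 len)
{
    memset(reg, 0, sizeof(*reg));
    reg->range.start = (__u64) (unsigned long) addr;
    reg->range.len = len;
    reg->mode = UFFDIO_REGISTER_MODE_MISSING;
}

/* Hypothetical helper: register the range with uffd.  On success,
   returns 0 and stores the returned 'ioctls' bit mask in *ioctls;
   on error, returns -1 with errno set by ioctl(2). */
int
uffd_register_missing(int uffd, void *addr, __u64 len, __u64 *ioctls)
{
    struct uffdio_register reg;

    uffd_fill_register(&reg, addr, len);
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1)
        return -1;
    *ioctls = reg.ioctls;
    return 0;
}
```
.fi
.in
.PP
Keeping the structure setup separate from the
.BR ioctl (2)
call is only a design choice made here,
so that the input fields can be inspected without a userfaultfd object.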
.SS UFFDIO_UNREGISTER
(Since Linux 4.3.)
Unregister a memory address range from userfaultfd.
The address range to unregister is specified in the
.IR uffdio_range
structure pointed to by
.IR argp .

This
.BR ioctl (2)
operation returns 0 on success.
On error, \-1 is returned and
.I errno
is set to indicate the cause of the error.
Possible errors include:
.TP
.B EINVAL
Either the
.I start
or the
.I len
field of the
.I uffdio_range
structure was not a multiple of the system page size.
.TP
.B EINVAL
There was an incompatible mapping in the specified address range.
.\" FIXME What does "incompatible" mean?
.TP
.B EINVAL
There was no mapping in the specified address range.
.\"
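.PP
The page-size requirement noted in the error list above
can be checked in user space before making the call.
The following sketch shows one way to do so;
the helper names
.IR uffd_range_aligned ()
and
.IR uffd_unregister ()
are hypothetical, and failing early with
.B EINVAL
is a choice made for this example, not something the kernel requires.
.PP
.in +4n
.nf
```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: return nonzero if both fields of the range
   are multiples of page_size. */
int
uffd_range_aligned(const struct uffdio_range *r, __u64 page_size)
{
    return page_size != 0 &&
           r->start % page_size == 0 && r->len % page_size == 0;
}

/* Hypothetical helper: unregister a range, rejecting misaligned
   input locally with EINVAL before calling ioctl(2). */
int
uffd_unregister(int uffd, __u64 start, __u64 len)
{
    struct uffdio_range r = { .start = start, .len = len };

    if (!uffd_range_aligned(&r, (__u64) sysconf(_SC_PAGESIZE))) {
        errno = EINVAL;
        return -1;
    }
    return ioctl(uffd, UFFDIO_UNREGISTER, &r);
}
```
.fi
.in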
.SS UFFDIO_COPY
(Since Linux 4.3.)
Atomically copy a contiguous memory chunk into the userfault registered
range and optionally wake up the blocked thread.
The source and destination addresses and the number of bytes to copy are
specified by the
.IR src ", " dst ", and " len
fields of the
.I uffdio_copy
structure pointed to by
.IR argp :

.in +4n
.nf
struct uffdio_copy {
    __u64 dst;
    __u64 src;
    __u64 len;
    __u64 mode;
    __s64 copy;
};
.fi
.in
.PP
The following values may be bitwise ORed in
.IR mode
to change the behavior of the
.B UFFDIO_COPY
operation:
.TP
.B UFFDIO_COPY_MODE_DONTWAKE
Do not wake up the thread that waits for page-fault resolution.
.PP
The
.I copy
field of the
.I uffdio_copy
structure is used by the kernel to return the number of bytes
that were actually copied, or an error.
If
.I uffdio_copy.copy
doesn't match the
.I uffdio_copy.len
value passed as input to
.BR UFFDIO_COPY ,
the operation will return
.\" FIXME In the 'copy' field? (This isn't clear.)
.BR \-EAGAIN .
If
.BR ioctl (2)
returns zero, the operation succeeded:
no error was reported and the entire area was copied.
If an invalid fault happens while writing to the
.I uffdio_copy.copy
field, the system call will return
.\" FIXME In the 'copy' field? (This isn't clear.)
.BR \-EFAULT .
.I uffdio_copy.copy
is an output-only field;
it is not read by the
.B UFFDIO_COPY
operation.
.\"
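.PP
A fault handler might wrap the copy operation described above
along the following lines.
This is only a sketch: the helper name
.IR uffd_copy ()
is hypothetical, and the way a short copy is reported
(a failed call with
.I errno
set to
.B EAGAIN
and a positive count in the
.I copy
field) is an assumption, since the text above leaves this point open.
.PP
.in +4n
.nf
```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: resolve a fault by copying len bytes from src
   to dst.  Returns the number of bytes copied, or a negated errno
   value if nothing is known to have been copied. */
long long
uffd_copy(int uffd, void *dst, const void *src, __u64 len, int dontwake)
{
    struct uffdio_copy c;

    memset(&c, 0, sizeof(c));
    c.dst = (__u64) (unsigned long) dst;
    c.src = (__u64) (unsigned long) src;
    c.len = len;
    c.mode = dontwake ? UFFDIO_COPY_MODE_DONTWAKE : 0;

    if (ioctl(uffd, UFFDIO_COPY, &c) == 0)
        return (long long) len;     /* entire area was copied */
    if (errno == EAGAIN && c.copy > 0)
        return c.copy;              /* assumed: short copy count */
    return -errno;
}
```
.fi
.in
.PP
A caller that receives fewer bytes than requested could advance
.I dst
and
.I src
by the returned count and retry with the remaining length.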
.SS UFFDIO_ZEROPAGE
(Since Linux 4.3.)
Zero out a part of a memory range registered with userfaultfd.
The requested range is specified by the
.I range
field of the
.I uffdio_zeropage
structure pointed to by
.IR argp :

.in +4n
.nf
struct uffdio_zeropage {
    struct uffdio_range range;
    __u64 mode;
    __s64 zeropage;
};
.fi
.in
.PP
The following values may be bitwise ORed in
.IR mode
to change the behavior of the
.B UFFDIO_ZEROPAGE
operation:
.TP
.B UFFDIO_ZEROPAGE_MODE_DONTWAKE
Do not wake up the thread that waits for page-fault resolution.
.PP
The
.I zeropage
field of the
.I uffdio_zeropage
structure is used by the kernel to return the number of bytes
that were actually zeroed,
or an error in the same manner as
.IR uffdio_copy.copy .
.\"
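.PP
The zeroing operation described above could be wrapped as in the
following sketch, which uses the request name
.B UFFDIO_ZEROPAGE
as defined in
.IR <linux/userfaultfd.h> .
The helper name
.IR uffd_zero ()
is hypothetical, and passing the output field back through a pointer
is merely one possible interface.
.PP
.in +4n
.nf
```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: zero len bytes starting at address start.
   Returns the ioctl(2) result; *zeroed receives the 'zeropage'
   output field as left by the kernel. */
int
uffd_zero(int uffd, __u64 start, __u64 len, int dontwake, __s64 *zeroed)
{
    struct uffdio_zeropage z;
    int ret;

    memset(&z, 0, sizeof(z));
    z.range.start = start;
    z.range.len = len;
    z.mode = dontwake ? UFFDIO_ZEROPAGE_MODE_DONTWAKE : 0;

    ret = ioctl(uffd, UFFDIO_ZEROPAGE, &z);
    *zeroed = z.zeropage;
    return ret;
}
```
.fi
.in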
.SS UFFDIO_WAKE
(Since Linux 4.3.)
Wake up the thread waiting for page-fault resolution.
The
.I argp
argument is a pointer to a
.I uffdio_range
structure (shown above).
.\" FIXME: Need more detail here. What is the purpose of the
.\" 'struct uffdio_range *' argument?

This
.BR ioctl (2)
operation returns 0 on success.
On error, \-1 is returned and
.I errno
is set to indicate the cause of the error.
Possible errors include:
.TP
.B EINVAL
Either the
.I start
or the
.I len
field of the
.I uffdio_range
structure was not a multiple of the system page size.
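.PP
A wake-up request might be issued as in the sketch below.
The helper name
.IR uffd_wake ()
is hypothetical, and the reading that the range selects the region
whose waiting thread should be woken is an assumption,
since the description above does not spell out the role of the argument.
.PP
.in +4n
.nf
```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>

/* Hypothetical helper: request a wake-up for the given range.
   Returns 0 on success, or -1 with errno set by ioctl(2). */
int
uffd_wake(int uffd, __u64 start, __u64 len)
{
    struct uffdio_range r = { .start = start, .len = len };

    return ioctl(uffd, UFFDIO_WAKE, &r);
}
```
.fi
.in
.PP
One plausible use is after one or more
.B UFFDIO_COPY
operations that specified
.BR UFFDIO_COPY_MODE_DONTWAKE ,
so that the faulting thread is woken only once the data is in place.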
.SH RETURN VALUE
See descriptions of the individual operations, above.
.SH ERRORS
See descriptions of the individual operations, above.
In addition, the following general errors can occur for all of the
operations described above:
.TP
.B EFAULT
.I argp
does not point to a valid memory address.
.TP
.B EINVAL
(For all operations except
.BR UFFDIO_API .)
The userfaultfd object has not yet been enabled (via the
.BR UFFDIO_API
operation).
.SH CONFORMING TO
These
.BR ioctl (2)
operations are Linux-specific.
.SH SEE ALSO
.BR ioctl (2),
.BR mmap (2),
.BR userfaultfd (2)

.IR Documentation/vm/userfaultfd.txt
in the Linux kernel source tree