prctl.2: Document Syscall User Dispatch

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Gabriel Krisman Bertazi 2021-02-07 18:47:57 -05:00 committed by Michael Kerrisk
parent 8a6b6cb878
commit 131ee1e1de
1 changed files with 154 additions and 0 deletions

@ -1533,6 +1533,130 @@ For more information, see the kernel source file
(or
.I Documentation/arm64/sve.txt
before Linux 5.3).
.TP
.\" prctl PR_SET_SYSCALL_USER_DISPATCH
.\" commit 1446e1df9eb183fdf81c3f0715402f1d7595d4
.BR PR_SET_SYSCALL_USER_DISPATCH " (since Linux 5.11, x86 only)"
.IP
Configure the Syscall User Dispatch mechanism
for the calling thread.
This mechanism allows an application
to selectively intercept system calls
so that they can be handled within the application itself.
Interception takes the form of a thread-directed
.B SIGSYS
signal that is delivered to the thread
when it makes a system call.
If intercepted,
the system call is not executed by the kernel.
.IP
To enable this mechanism,
.I arg2
should be set to
.BR PR_SYS_DISPATCH_ON .
Once enabled, further system calls will be selectively intercepted,
depending on a control variable provided by user space.
In this case,
.I arg3
and
.I arg4
respectively identify the
.I offset
and
.I length
of a single contiguous memory region in the process address space
from which system calls are always allowed to be executed,
regardless of the control variable.
(Typically, this area would include the area of memory
containing the C library.)
.IP
.I arg5
points to a char-sized variable
that acts as a fast switch to allow or block system call execution
without the overhead of making another system call
to reconfigure Syscall User Dispatch.
This control variable can either be set to
.B SYSCALL_DISPATCH_FILTER_BLOCK
to block system calls from executing
or to
.B SYSCALL_DISPATCH_FILTER_ALLOW
to temporarily allow them to be executed.
This value is checked by the kernel
on every system call entry,
and any unexpected value will raise
an uncatchable
.B SIGSYS
at that time,
killing the application.
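.IP
As a minimal sketch of the setup described above
(not taken from the kernel sources),
the following C fragment enables the mechanism
with the selector initially set to allow system calls.
The names
sud_selector, sud_enable, sud_block, and sud_allow
are hypothetical,
and the fallback constant definitions assume the values in
linux/prctl.h
for systems whose headers predate Linux 5.11.

```c
#include <errno.h>      /* errno is set when prctl() fails */
#include <sys/prctl.h>

/* Fallbacks for older headers; assumed to mirror <linux/prctl.h>. */
#ifndef PR_SET_SYSCALL_USER_DISPATCH
#define PR_SET_SYSCALL_USER_DISPATCH 59
#endif
#ifndef PR_SYS_DISPATCH_ON
#define PR_SYS_DISPATCH_OFF 0
#define PR_SYS_DISPATCH_ON 1
#endif
#ifndef SYSCALL_DISPATCH_FILTER_ALLOW
#define SYSCALL_DISPATCH_FILTER_ALLOW 0
#define SYSCALL_DISPATCH_FILTER_BLOCK 1
#endif

/* Hypothetical control variable; the kernel reads it on every
   system call entry once dispatch is enabled. */
volatile char sud_selector = SYSCALL_DISPATCH_FILTER_ALLOW;

/* Enable dispatch for the calling thread.  System calls issued from
   the region starting at 'offset' and spanning 'length' bytes are
   never intercepted.  Returns 0 on success, or -1 with errno set. */
int
sud_enable(unsigned long offset, unsigned long length)
{
    return prctl(PR_SET_SYSCALL_USER_DISPATCH, PR_SYS_DISPATCH_ON,
                 offset, length, &sud_selector);
}

/* Flip the switch without entering the kernel. */
void
sud_block(void)
{
    sud_selector = SYSCALL_DISPATCH_FILTER_BLOCK;
}

void
sud_allow(void)
{
    sud_selector = SYSCALL_DISPATCH_FILTER_ALLOW;
}
```

A caller would normally install a
.B SIGSYS
handler before calling sud_block(),
since the default action for
.B SIGSYS
terminates the process.
The sketch uses one global selector for brevity;
because the setting is per thread,
a multithreaded program would likely want one selector per thread.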
.IP
When a system call is intercepted,
the kernel sends a thread-directed
.B SIGSYS
signal to the triggering thread.
Various fields will be set in the
.I siginfo_t
structure (see
.BR sigaction (2))
associated with the signal:
.RS
.IP * 3
.I si_signo
will contain
.BR SIGSYS .
.IP *
.I si_call_addr
will show the address of the system call instruction.
.IP *
.I si_syscall
and
.I si_arch
will indicate which system call was attempted.
.IP *
.I si_code
will contain
.BR SYS_USER_DISPATCH .
.IP *
.I si_errno
will be set to 0.
.RE
.IP
The program counter will be as though the system call happened
(i.e., the program counter will not point to the system call instruction).
.IP
When the signal handler returns to the kernel,
the system call completes immediately
and returns to the calling thread,
without actually being executed.
If necessary
(i.e., when emulating the system call in user space),
the signal handler should set the system call return value
to a sane value,
by modifying the register context stored in the
.I ucontext
argument of the signal handler.
See
.BR sigaction (2),
.BR sigreturn (2),
and
.BR getcontext (3)
for more information.
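.IP
The handler behavior described above might look roughly like the
following sketch, which assumes x86-64,
where the system call return value is carried in the RAX register.
The names
sud_selector, last_intercepted, sigsys_handler, and install_sigsys_handler
are hypothetical,
and returning ENOSYS is only an illustrative choice.
The handler also resets the selector to allow system calls before returning;
this is an assumption of the sketch,
made so that the
.BR sigreturn (2)
issued by the C library trampoline is not itself intercepted
when the C library lies outside the always-allowed region.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* exposes REG_RAX in <ucontext.h> */
#endif
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/prctl.h>
#include <ucontext.h>

/* Fallbacks for older headers; assumed to mirror the kernel values. */
#ifndef SYS_USER_DISPATCH
#define SYS_USER_DISPATCH 2
#endif
#ifndef SYSCALL_DISPATCH_FILTER_ALLOW
#define SYSCALL_DISPATCH_FILTER_ALLOW 0
#define SYSCALL_DISPATCH_FILTER_BLOCK 1
#endif

/* Hypothetical control variable registered earlier via prctl(). */
volatile char sud_selector = SYSCALL_DISPATCH_FILTER_ALLOW;

/* Number of the most recently intercepted system call, or -1. */
volatile sig_atomic_t last_intercepted = -1;

void
sigsys_handler(int sig, siginfo_t *info, void *ctx)
{
    ucontext_t *uc = ctx;

    (void) sig;
    if (info->si_code != SYS_USER_DISPATCH)
        return;                 /* SIGSYS from some other source */

    /* Allow system calls again so that returning from the handler
       (and anything the handler itself calls) is not intercepted. */
    sud_selector = SYSCALL_DISPATCH_FILTER_ALLOW;
    last_intercepted = info->si_syscall;

#if defined(__x86_64__) && defined(REG_RAX)
    /* Emulate a failed call: the interrupted code sees -ENOSYS
       as the raw system call return value. */
    uc->uc_mcontext.gregs[REG_RAX] = -ENOSYS;
#else
    (void) uc;                  /* other ABIs are not covered here */
#endif
}

/* Returns 0 on success, or -1 with errno set. */
int
install_sigsys_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigsys_handler;
    sa.sa_flags = SA_SIGINFO;   /* needed to receive siginfo_t and ucontext */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSYS, &sa, NULL);
}
```

A real emulator would typically switch on
.I si_syscall
and compute a per-call result
instead of failing every intercepted call.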
.IP
If
.I arg2
is set to
.BR PR_SYS_DISPATCH_OFF ,
Syscall User Dispatch is disabled for that thread.
The remaining arguments must be set to 0.
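.IP
Disabling the mechanism can be sketched as follows.
The function name sud_disable is hypothetical,
and the fallback constants assume the values in
linux/prctl.h.
The zero arguments are written as unsigned long
because
.BR prctl ()
is variadic and the kernel receives each argument as a full-width value.

```c
#include <errno.h>      /* errno is set when prctl() fails */
#include <sys/prctl.h>

/* Fallbacks for older headers; assumed to mirror <linux/prctl.h>. */
#ifndef PR_SET_SYSCALL_USER_DISPATCH
#define PR_SET_SYSCALL_USER_DISPATCH 59
#endif
#ifndef PR_SYS_DISPATCH_OFF
#define PR_SYS_DISPATCH_OFF 0
#endif

/* Disable dispatch for the calling thread.
   Returns 0 on success, or -1 with errno set. */
int
sud_disable(void)
{
    return prctl(PR_SET_SYSCALL_USER_DISPATCH, PR_SYS_DISPATCH_OFF,
                 0UL, 0UL, 0UL);
}
```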
.IP
The setting is not preserved across
.BR fork (2),
.BR clone (2),
or
.BR execve (2).
.IP
For more information,
see the kernel source file
.IR Documentation/admin-guide/syscall-user-dispatch.rst .
.\" prctl PR_SET_TAGGED_ADDR_CTRL
.\" commit 63f0c60379650d82250f22e4cf4137ef3dc4f43d
.TP
@ -2000,6 +2124,14 @@ and
.I arg3
is an invalid address.
.TP
.B EFAULT
.I option
is
.B PR_SET_SYSCALL_USER_DISPATCH
and
.I arg5
is an invalid address.
.TP
.B EINVAL
The value of
.I option
@ -2231,6 +2363,28 @@ and SVE is not available on this platform.
.B EINVAL
.I option
is
.B PR_SET_SYSCALL_USER_DISPATCH
and one of the following is true:
.RS
.IP * 3
.I arg2
is
.B PR_SYS_DISPATCH_OFF
and the remaining arguments are not 0;
.IP * 3
.I arg2
is
.B PR_SYS_DISPATCH_ON
and the memory range specified is outside the
address space of the process;
.IP * 3
.I arg2
is invalid.
.RE
.TP
.B EINVAL
.I option
is
.BR PR_SET_TAGGED_ADDR_CTRL
and the arguments are invalid or unsupported.
See the description of