- Change some instances of "-" to "\-"
- Use C99 style (declare variables nearer use in code)
- Add a bit of white space
- Remove one 'const...const' added by Alex that caused
compiler warnings
- Use "reverse Christmas tree" form for declarations in main()
- Other minor changes
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
We don't really need ext4(5) and xfs(5) here. They provide
no further info that is directly relevant to the reader of
mount_setattr(2).
clone3(2) isn't necessary because it is the same page as clone(2).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
- Fix SYNOPSIS to fit in 78 columns
Also, we don't show when an include is included for a specific type,
unless that header is included _only_ for the type,
or there might be confusion (e.g., termios).
Instead, that type should be documented in system_data_types(7),
with a link page mount_attr-struct(3).
- Fix references to mount_setattr(). See man-pages(7):
Any reference to the subject of the current manual page should be writ‐
ten with the name in bold followed by a pair of parentheses in Roman
(normal) font. For example, in the fcntl(2) man page, references to
the subject of the page would be written as: fcntl(). The preferred
way to write this in the source file is:
.BR fcntl ()
- Fix line breaks according to semantic newline rules (and add some commas)
- Fix wrong usage of .IR when .RI should have been used
- Fix formatting of variable part in FOO<number>:
- Make italic the variable part (as groff_man(7) recommends)
- Remove <>
- Use syntax recommended by G. Branden Robinson (groff)
- Fix unnecessary uses of .BR or .IR when .B or .I would suffice
- Fix formatting of punctuation
In some cases, it was in italics or bold, and it should always be in roman.
- Use uppercase to begin text, even in bullet points, since those were
multi-sentence.
- Simplify usage of .RS/.RE in combination with .IP
- s/fat/FAT/ as fs(7) does
- Slightly reword some sentences for consistency
- Use Linux-specific for consistency with other pages (in VERSIONS)
- EXAMPLES: Place the return type in a line of its own (as in other pages)
- Fix alignment of code
- Replace unnecessary use of the GNU extension ({}) by do {} while (0)
In that case, there was no return value (moreover, it's a noreturn).
- Break complex declaration lines into a line for each variable
The variables were being initialized, some to non-zero values,
so for clarity, a line for each one seems more appropriate.
- Add const to pointers when possible
- s/\\/\e/
- Remove unmatched groff commands
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note the use of FUTEX_CLOCK_REALTIME for selecting the clock,
and eliminate repetition of details already covered in the
description of FUTEX_LOCK_PI.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
FUTEX_LOCK_PI2 is a new futex operation which was recently introduced into the
Linux kernel. It works exactly like FUTEX_LOCK_PI. However, it has support for
selectable clocks for timeouts. By default CLOCK_MONOTONIC is used. If
FUTEX_CLOCK_REALTIME is specified then the timeout is measured against
CLOCK_REALTIME.
This new operation addresses an inconsistency in the futex interface:
FUTEX_LOCK_PI only works with timeouts based on CLOCK_REALTIME in contrast to
all the other PI operations.
Document the FUTEX_LOCK_PI2 command.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
- Move example program to a new EXAMPLES section
- Invert logic in the handler to have the failure in the
conditional path, and the success out of any conditionals.
- Use NULL, EXIT_SUCCESS, and EXIT_FAILURE instead of magic numbers
- Separate declarations from code
- Put function return type on its own line
- Put function opening brace on its line
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Do not include unused (and incompatible) header file termios.h and
include required header files for puts() and close() functions.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After a patch proposal from наб triggered by concerns that, when
talking about PIPE_BUF, pipe(7) explicitly mentions write(2) but
not writev(2), I've concluded that the reference in writev(2) to
pipe(7) is not needed (mea culpa; I added that text), and I think
the text in pipe(7) could be written to be closer to the POSIX
spec, which doesn't talk about "write() calls", but simply about
"writes".
Reported-by: наб <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Mainly: I generally don't want us to include URLs to mailing
list discussions in a manual page. Either the issue in the
discussion is worth writing up in the manual page (so that
the reader doesn't have to look elsewhere), or the details
are less important, in which case it is sufficient to note the
existence of the bug. I think this is an example of the latter.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As pointed out by Nora, the example shown in the manual
page already demonstrates that the pathname is not absolute!
Reported-by: Nora Platiel <nplatiel@gmx.us>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove duplicated word.
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Fix a typo in the documentation of using fallocate to allocate shared
blocks. The flag FALLOC_FL_UNSHARE should instead be documented as
FALLOC_FL_UNSHARE_RANGE.
Fixes: 63a599c657 ("man2/fallocate.2: Document behavior with shared blocks")
Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Correct function signature by adding missing parenthesis.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
recv.2 misspelled `SO_EE_OFFENDER` as `SOCK_EE_OFFENDER`.
This patch fixes the typo.
Signed-off-by: kXuan <kxuanobj@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Using mount flag `MS_NOSUID` also affects SELinux domain transitions but
this has not been documented well.
Signed-off-by: Topi Miettinen <toiwoton@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Don't document includes that provide types; only those that
provide prototypes and constants.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The types that need <sys/types.h> are better documented in
system_data_types(7). Let's keep only the includes for the
prototypes and the constants.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
'struct iovec' is defined in <bits/types/struct_iovec.h>,
which is included by <sys/uio.h>, but it is also included by
<bits/fcntl-linux.h>, which is in the end included by <fcntl.h>.
Given that we already include <fcntl.h>, we don't need any more
includes.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
'struct utimbuf' is provided by <utime.h>.
There's no need for <sys/types.h>.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
<sys/types.h> makes no sense for a function that only uses 'int'.
The flags used by this function are provided by <fcntl.h>
(or others), but not by <linux/userfaultfd.h>.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
'mode_t', which is the only reason this might ever have been
needed, is provided by <sys/stat.h> since POSIX.1-2001.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
'off_t', which is the only reason this might ever have been
needed, is provided by <unistd.h> since POSIX.1-2001.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There seems to be no reason to include <unistd.h>.
<sys/swap.h> already provides both the function prototypes and the
SWAP_* constants.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
<unistd.h> doesn't seem to be needed:
AT_* constants come from <fcntl.h>
STATX_* constants come from <sys/stat.h>
'struct statx' comes from <sys/stat.h>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove <sys/types.h>; ffix too
<sys/types.h> is only needed for 'struct stat'.
That is better documented in system_data_types(7).
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A function declarator with empty parentheses, which is not a
prototype, is an obsolescent feature of C (See C17 6.11.6.1), and
doesn't mean 0 parameters, but instead that no information about
the parameters is provided (See C17 6.5.2.2).
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It's only needed for getting 'mode_t'.
But that type is better documented in system_data_types(7).
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Doing so decreases the degree to which text is indented, and
thus avoids short, poorly wrapped lines.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Quoting Jann Horn:
[[
As discussed at
<https://lore.kernel.org/r/CAG48ez0m4Y24ZBZCh+Tf4ORMm9_q4n7VOzpGjwGF7_Fe8EQH=Q@mail.gmail.com>,
we need to re-check checkNotificationIdIsValid() after reading remote
memory but before using the read value in any way. Otherwise, the
syscall could in the meantime get interrupted by a signal handler, the
signal handler could return, and then the function that performed the
syscall could free() allocations or return (thereby freeing buffers on
the stack).
In essence, this pread() is (unavoidably) a potential use-after-free
read; and to make that not have any security impact, we need to check
whether UAF read occurred before using the read value. This should
probably be called out elsewhere in the manpage, too...
Now, of course, **reading** is the easy case. The difficult case is if
we have to **write** to the remote process... because then we can't
play games like that. If we write data to a freed pointer, we're
screwed, that's it. (And for somewhat unrelated bonus fun, consider
that /proc/$pid/mem is originally intended for process debugging,
including installing breakpoints, and will therefore happily write
over "readonly" private mappings, such as typical mappings of
executable code.)
So, uuuuh... I guess if anyone wants to actually write memory back to
the target process, we'd better come up with some dedicated API for
that, using an ioctl on the seccomp fd that magically freezes the
target process inside the syscall while writing to its memory, or
something like that? And until then, the manpage should have a big fat
warning that writing to the target's memory is simply not possible
(safely).
]]
and
<https://lore.kernel.org/r/CAG48ez0m4Y24ZBZCh+Tf4ORMm9_q4n7VOzpGjwGF7_Fe8EQH=Q@mail.gmail.com>:
[[
The second bit of trouble is that if the supervisor is so oblivious
that it doesn't realize that syscalls can be interrupted, it'll run
into other problems. Let's say the target process does something like
this:
    int func(void) {
        char pathbuf[4096];
        sprintf(pathbuf, "/tmp/blah.%d", some_number);
        mount("foo", pathbuf, ...);
    }
and mount() is handled with a notification. If the supervisor just
reads the path string and immediately passes it into the real mount()
syscall, something like this can happen:
    target: starts mount()
    target: receives signal, aborts mount()
    target: runs signal handler, returns from signal handler
    target: returns out of func()
    supervisor: receives notification
    supervisor: reads path from remote buffer
    supervisor: calls mount()
but because the stack allocation has already been freed by the time
the supervisor reads it, the supervisor just reads random garbage, and
beautiful fireworks ensue.
So the supervisor *fundamentally* has to be written to expect that at
*any* time, the target can abandon a syscall. And every read of remote
memory has to be separated from uses of that remote memory by a
notification ID recheck.
And at that point, I think it's reasonable to expect the supervisor to
also be able to handle that a syscall can be aborted before the
notification is delivered.
]]
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>