Document the seccomp /proc interfaces in Linux 4.14:
/proc/sys/kernel/seccomp/actions_avail and
/proc/sys/kernel/seccomp/actions_logged.
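
A minimal sketch of how a program might read these files. The helper
name read_proc_line() is hypothetical and not part of the patch; the
function returns -1 on kernels that do not provide the files.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of a /proc file into buf, stripping the newline.
   Returns the length of the line, or -1 if the file cannot be read
   (for example, on kernels older than 4.14). */
int read_proc_line(const char *path, char *buf, size_t len)
{
    FILE *fp;

    if (len == 0)
        return -1;

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    if (fgets(buf, (int) len, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    buf[strcspn(buf, "\n")] = '\0';
    return (int) strlen(buf);
}
```

Calling read_proc_line("/proc/sys/kernel/seccomp/actions_avail", buf,
sizeof(buf)) should leave the list of available actions in buf.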
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Point the reader at strace(1) as a way of discovering system calls
that might need to be filtered.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From a conversation with Walter Harms:
> i am confused, i understand that:
> ss.ss_sp = malloc(SIGSTKSZ);
>
> ss.ss_size = SIGSTKSZ;
> ss.ss_flags = 0;
> if (sigaltstack(&ss, NULL) == -1)
>
> is equivalent to:
> ss.ss_sp = malloc(SIGSTKSZ);
>
> ss.ss_size = SIGSTKSZ;
> ss.ss_flags = SS_ONSTACK ;
> if (sigaltstack(&ss, NULL) == -1)
>
> but also to
> ss.ss_sp = malloc(SIGSTKSZ);
>
> ss.ss_size = SIGSTKSZ;
> ss.ss_flags = SS_ONSTACK | SOMETHING_FLAG ;
> if (sigaltstack(&ss, NULL) == -1)
>
> so the use of SS_ONSTACK would result in ss.ss_flags = 0 no matter what.
> OR
> SS_ONSTACK is a no-op in Linux
I see what you mean. The point is that, back then, SS_ONSTACK was
the only flag that could (on Linux) be specified in ss.ss_flags,
so that "SS_ONSTACK | SOMETHING_FLAG" was a nonexistent case.
These days, it's possible to specify the new SS_AUTODISARM
flag in ss.ss_flags, which I think is why you are doubtful
about the new page text.
Reported-by: Walter Harms <wharms@bfs.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
[mtk: The raw system calls use "unsigned int", but the glibc
wrappers have "int" for the 'flags' argument.]
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
People seem to be using "cf." ("confer"), which means "compare",
to mean "see" instead, for which the Latin abbreviation would be
"q.v." ("quod vide" -> "which see").
In some cases "cf." might actually be the correct term but it's
still not clear what specific aspects of a function/system call
one is supposed to be comparing.
I left one use in place in hope of obtaining clarification,
because it looks like it might be useful there, if contextualized.
Migrate these uses to English and add them to the list of
abbreviations to be avoided.
If the patch to vfork(2) is not accepted, then the cf. still needs
an \& after it because it is at the end of the line but not the
end of a sentence.
Signed-off-by: G. Branden Robinson <g.branden.robinson@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It's more logical to use lstat() in the example code,
since one can then experiment with symbolic links, and
the S_IFLNK case can then actually occur.
Reported-by: Richard Knutsson <richard.knutsson@abelko.se>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Userfaultfd feature UFFD_FEATURE_SIGBUS was merged recently and
should be available in the Linux 4.14 release. This patch is for
the man page changes documenting this API.
Documents the following commit:
commit 2d6d6f5a09a96cc1fec7ed992b825e05f64cb50e
Author: Prakash Sangappa <prakash.sangappa@oracle.com>
Date: Wed Sep 6 16:23:39 2017 -0700
mm: userfaultfd: add feature to request for a signal delivery
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
When referring to the architecture, consistently use "x86-64",
not "x86_64". Hitherto, there was a mixture of usages, with
"x86-64" predominant.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Combine two redundant paragraphs (one of which I recently
added) describing child_stack==NULL for the raw system call.
Also, make sure this text is in a more obvious place than
its previous location.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The current man page for shmat(2)[1] does not mention whether
the memory address returned by shmat(2) is page-size aligned.
As that is quite important to many applications (e.g., those that
use locks heavily and would like to avoid some locks by relying on
atomicity guarantees provided by the CPU), it would be great to
specify that for Linux.
I walked down the current implementation of shmat(2) in the latest
kernel src and found that shmat(2) does return a page size aligned
memory address:
SYSCALL_DEFINE3(shmat, int, shmid, char __user *, shmaddr, int, shmflg)
-> do_shmat(...)
-> do_mmap_pgoff(...)
-> do_mmap(...)
-> get_unmapped_area(...)
-> get_area(...) -> offset_in_page(addr)
There is an `offset_in_page(addr)' check at the end, and if it is
true, -EINVAL is returned, so we can be sure that shmat(2)
returns a page-size-aligned memory address on success[2].
[1]: http://man7.org/linux/man-pages/man2/shmat.2.html
[2]: there is also an `offset_in_page(2)' in get_unmapped_area(...),
but that doesn't lead to -EINVAL... I am not sure whether the logic of
that code is right.
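
The alignment can also be checked from user space. The following is a
sketch only; the function name shmat_is_page_aligned() is made up, and
it lets the kernel choose the address (shmaddr == NULL).

```c
#include <stdint.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <unistd.h>

/* Attach a fresh one-page segment at a kernel-chosen address.
   Returns 1 if the address is page aligned, 0 if it is not,
   and -1 if the segment could not be created or attached. */
int shmat_is_page_aligned(void)
{
    long page = sysconf(_SC_PAGESIZE);
    int id, aligned;
    void *addr;

    if (page <= 0)
        return -1;

    id = shmget(IPC_PRIVATE, (size_t) page, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    addr = shmat(id, NULL, 0);

    /* Mark the segment for removal now; it is destroyed once the
       last attachment goes away. */
    shmctl(id, IPC_RMID, NULL);

    if (addr == (void *) -1)
        return -1;

    aligned = ((uintptr_t) addr % (uintptr_t) page) == 0;
    shmdt(addr);
    return aligned;
}
```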
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Expand and rework the text a little, in particular adding
a reference to sigreturn(2) as a source of further
information about the ucontext argument.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since 4.13, errors from writeback are more reliably reported
to all file descriptors that might be relevant.
Add notes to this effect, and also add detail about ENOSPC and
EDQUOT, which can be delayed in a similar manner to EIO, for NFS
in particular.
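
For applications, the practical consequence is that the results of
fsync() and close() must be checked, not just that of write(). A
sketch, with a hypothetical helper name:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write len bytes to path and flush them to storage.
   Returns 0 on success, -1 on error with errno set. */
int write_durably(const char *path, const void *data, size_t len)
{
    const char *p = data;
    int saved;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd == -1)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n == -1) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += n;
        len -= (size_t) n;
    }

    /* A successful write() may only have reached the page cache;
       delayed EIO, ENOSPC, or EDQUOT can be reported here ... */
    if (fsync(fd) == -1)
        goto fail;

    /* ... or by close(). */
    return close(fd);

fail:
    saved = errno;
    close(fd);
    errno = saved;
    return -1;
}
```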
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The current text was confused (mea culpa). No signal is sent to
the init() process. Rather, depending on the 'cmd' given to
reboot(), the 'group_exit_code' value will be set to either SIGHUP
or SIGINT, with the effect that one of those signals is reported to
wait() in the parent process.
See https://bugzilla.kernel.org/show_bug.cgi?id=195899
Reported-by: Michał Zegan <webczat_200@poczta.onet.pl>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
hugetlbfs support for memfd_create() was recently merged by Linus
and should be in the Linux 4.14 release. To request hugetlbfs
support, a new memfd_create() flag (MFD_HUGETLB) was added.
This patch documents the following commit:
commit 749df87bd7bee5a79cef073f5d032ddb2b211de8
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date: Wed Sep 6 16:24:16 2017 -0700
mm/shmem: add hugetlbfs support to memfd_create()
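
A sketch of how a caller might request the new flag. It assumes a
glibc that provides the memfd_create() wrapper; the fallback
definition of MFD_HUGETLB and the helper name create_memfd() are
additions for illustration only.

```c
#define _GNU_SOURCE
#include <sys/mman.h>

#ifndef MFD_HUGETLB
#define MFD_HUGETLB 0x0004U     /* value from the kernel's <linux/memfd.h> */
#endif

/* Try to create a hugetlbfs-backed memfd and fall back to an
   ordinary one if that fails (old kernel, no huge pages configured).
   Sets *is_huge to 1 or 0; returns the descriptor, or -1 on error. */
int create_memfd(const char *name, int *is_huge)
{
    int fd = memfd_create(name, MFD_CLOEXEC | MFD_HUGETLB);

    if (fd != -1) {
        *is_huge = 1;
        return fd;
    }

    *is_huge = 0;
    return memfd_create(name, MFD_CLOEXEC);
}
```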
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Explain the older behavior, and why it changed. This is a
follow-up to Mike Kravetz's patch documenting the behavior
for old_size==0 with shared mappings.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since at least the 2.6 time frame, mremap() would create a new
mapping of the same pages if 'old_size == 0'. It would also leave
the original mapping. This was used to create a 'duplicate
mapping'.
A recent change was made to mremap() so that an attempt to create a
duplicate of a private mapping will fail.
Document the 'old_size == 0' behavior and the new return code from
the commit below.
commit dba58d3b8c5045ad89c1c95d33d01451e3964db7
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date: Wed Sep 6 16:20:55 2017 -0700
mm/mremap: fail map duplication attempts for private mappings
v2: Fix incorrect wording noticed by Jann Horn.
Remove deprecated and memfd_create() discussion as
suggested by Florian Weimer.
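
The behavior can be observed with a sketch like the following. The
function name try_duplicate() is hypothetical; the argument is either
MAP_SHARED or MAP_PRIVATE.

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

/* Create a one-page anonymous mapping, then ask mremap() for a
   duplicate by passing old_size == 0.
   Returns 1 if the duplicate shows the same page contents, 0 if it
   does not (including the case where mremap() refuses), and -1 if
   the initial mapping could not be created. */
int try_duplicate(int sharing)
{
    long page = sysconf(_SC_PAGESIZE);
    char *orig, *dup;
    int same;

    if (page <= 0)
        return -1;

    orig = mmap(NULL, page, PROT_READ | PROT_WRITE,
                sharing | MAP_ANONYMOUS, -1, 0);
    if (orig == MAP_FAILED)
        return -1;

    orig[0] = 'x';

    dup = mremap(orig, 0, page, MREMAP_MAYMOVE);
    if (dup == MAP_FAILED) {    /* expected for private mappings */
        munmap(orig, page);
        return 0;
    }

    same = (dup[0] == 'x');

    /* The original mapping is left in place, so unmap both. */
    munmap(dup, page);
    munmap(orig, page);
    return same;
}
```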
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Jann Horn <jannh@google.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add MADV_WIPEONFORK and MADV_KEEPONFORK documentation,
which has been merged for the 4.14 kernel.
While documenting what EINVAL means for MADV_WIPEONFORK,
I realized that MADV_FREE has the same thing going on,
so I documented EINVAL for both in the ERRORS section.
This patch documents the following kernel commit:
commit d2cd9ede6e193dd7d88b6d27399e96229a551b19
Author: Rik van Riel <riel@redhat.com>
Date: Wed Sep 6 16:25:15 2017 -0700
mm,fork: introduce MADV_WIPEONFORK
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Colm MacCárthaigh <colm@allcosts.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The existing example specifically states that it focuses on
x86 (TSO memory model), but gives a read-read vs write-write
ordering example, even though this scenario does not require
explicit barriers on TSO.
So either we change the example architecture to a weakly-ordered
architecture, or we change the example to a scenario requiring
barriers on x86.
Let's stay on x86, but provide a Dekker as an example instead.
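
The shape of such a Dekker pattern, as a rough sketch rather than the
exact text added to the page: the frequently executed side uses only
a compiler barrier, and the rarely executed side pays for
membarrier(). The wrapper name sys_membarrier() is made up here,
since glibc provides no wrapper.

```c
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

static volatile int a, b;

static int sys_membarrier(int cmd, int flags)
{
    return (int) syscall(__NR_membarrier, cmd, flags);
}

/* Frequently executed side: store, compiler barrier, load. */
void fast_path(int *read_b)
{
    a = 1;
    __asm__ volatile ("" : : : "memory");
    *read_b = b;
}

/* Rarely executed side: store, membarrier(), load.
   Returns 0 on success, or -1 if membarrier() failed, in which case
   the ordering guarantee described below does not hold. */
int slow_path(int *read_a)
{
    int ret;

    b = 1;
    ret = sys_membarrier(MEMBARRIER_CMD_SHARED, 0);
    *read_a = a;
    return ret;
}
```

When fast_path() and slow_path() run concurrently in different
threads and membarrier() succeeds, the outcome read_a == 0 and
read_b == 0 should not be observable.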
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: https://stackoverflow.com/questions/45970525/is-the-example-in-the-membarrier-man-page-pointless-in-x86
Link: https://lwn.net/Articles/573436/
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note one of the significant advantages of O_PATH: many of the
operations applied to O_PATH file descriptors don't require
read permission, so there's no reason why the open() itself
should require read permission.
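
As an illustration, a sketch that obtains file metadata through an
O_PATH descriptor. The helper name stat_via_opath() is hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Fetch metadata for path through an O_PATH descriptor.  The open()
   does not require read permission on the file itself.
   Returns 0 on success, -1 on error. */
int stat_via_opath(const char *path, struct stat *sb)
{
    int ret;
    int fd = open(path, O_PATH | O_CLOEXEC);

    if (fd == -1)
        return -1;

    ret = fstat(fd, sb);
    close(fd);
    return ret;
}
```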
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the discussion of O_PATH, make it completely obvious that
the file descriptor passed to fchdir() must refer to a directory.
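
A sketch of that usage, with a hypothetical helper name:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Change the working directory through an O_PATH descriptor.
   Returns 0 on success, -1 on error with errno set; fchdir() fails
   with ENOTDIR if path does not refer to a directory. */
int chdir_via_opath(const char *path)
{
    int ret, saved;
    int fd = open(path, O_PATH | O_CLOEXEC);

    if (fd == -1)
        return -1;

    ret = fchdir(fd);
    saved = errno;
    close(fd);
    errno = saved;
    return ret;
}
```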
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>