After comments from Miklos, and further digging in the kernel
source that showed that chroot() can also result in "hidden"
parent-IDs in mountinfo, I've revised the description of
mountinfo.
In fs/proc_namespace.c::show_mountinfo() there is:

    /* mountpoints outside of chroot jail will give SEQ_SKIP on this */
    err = seq_path_root(m, &mnt_path, &p->root, " \t\n\\");
    if (err)
            goto out;

I instrumented the 'if (err)' code path with printk()
to show that there is indeed a record corresponding to the
parent-ID for the process root that is being skipped.
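
The effect can also be observed from user space. The following is a
minimal sketch, not part of the man page change: it scans text in
mountinfo format and collects parent IDs that have no record of their
own. The function name find_hidden_parents() and the MAX_MOUNTS bound
are hypothetical, chosen for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_MOUNTS 4096   /* arbitrary bound for this sketch */

/*
 * Scan mountinfo-formatted text and collect the parent IDs that do not
 * appear as the mount ID of any record. Stores up to 'cap' distinct
 * IDs in 'missing' and returns how many were stored.
 */
int find_hidden_parents(const char *text, int *missing, int cap)
{
    static int ids[MAX_MOUNTS], parents[MAX_MOUNTS];
    int n = 0, nmissing = 0;

    assert(text != NULL && missing != NULL);

    /* The first two fields of each record are mount ID and parent ID. */
    for (const char *p = text; *p != '\0' && n < MAX_MOUNTS; ) {
        if (sscanf(p, "%d %d", &ids[n], &parents[n]) == 2)
            n++;
        const char *nl = strchr(p, '\n');
        if (nl == NULL)
            break;
        p = nl + 1;
    }

    for (int i = 0; i < n; i++) {
        int found = 0;
        for (int j = 0; j < n && !found; j++)
            found = (ids[j] == parents[i]);
        for (int k = 0; k < nmissing && !found; k++)
            found = (missing[k] == parents[i]);   /* already reported */
        if (!found && nmissing < cap)
            missing[nmissing++] = parents[i];
    }
    return nmissing;
}
```

Feeding it the contents of /proc/self/mountinfo from inside a chroot
jail should report the parent ID of the process root, since that
parent's record is the one being skipped.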
Reported-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I do not have an exact handle on the details, but I can see
roughly what is going on. Internally, there seems to be one
("hidden") mount ID reserved for each mount namespace, and that ID
is the parent of the root mount point.
Looking through the (4.14) kernel source, mount IDs are allocated
by a kernel function called mnt_alloc_id() (in fs/namespace.c),
which is in turn called by alloc_vfsmnt() which is in turn called
by clone_mnt().
A new mount namespace is created by the kernel function
copy_mnt_ns() (in fs/namespace.c, called by
create_new_namespaces() in kernel/nsproxy.c). The copy_mnt_ns()
function calls copy_tree() (in fs/namespace.c), and copy_tree()
calls clone_mnt() in *two* places. The first of these is the call
that creates the "hidden" mount ID that becomes the parent of the
root mount point. (I verified this by instrumenting the kernel
with a few printk() calls to display the IDs.) The second place
where copy_tree() calls clone_mnt() is in a loop that replicates
each of the mount points (including the root mount point) in the
source mount namespace.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since Linux 2.6.36, the heuristic calculation of oom_score
considers only used memory and CAP_SYS_ADMIN.
See kernel commit a63d83f427fbce97a6cea0db2e64b0eb8435cd10.
Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add documentation for these new membarrier() commands:

    MEMBARRIER_CMD_PRIVATE_EXPEDITED
    MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED

Adapt the MEMBARRIER_CMD_SHARED return value documentation to reflect
that it now returns -EINVAL when issued on a system configured for
nohz_full.
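
As an illustration of how the new commands might be used, here is a
minimal sketch, not taken from the man page. It assumes the command
values from <linux/membarrier.h> in Linux 4.14 (copied locally with a
CMD_ prefix so that it builds against older headers) and assumes that
a process must register before issuing the private expedited command.
The helper names sys_membarrier(), cmd_supported() and
private_expedited_barrier() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local copies of the Linux 4.14 command values (assumption). */
enum {
    CMD_QUERY                      = 0,
    CMD_SHARED                     = 1 << 0,
    CMD_PRIVATE_EXPEDITED          = 1 << 3,
    CMD_REGISTER_PRIVATE_EXPEDITED = 1 << 4,
};

/* Thin wrapper around the raw system call. */
int sys_membarrier(int cmd, int flags)
{
    return (int) syscall(__NR_membarrier, cmd, flags);
}

/* Return 1 if every bit of 'cmd' is set in a MEMBARRIER_CMD_QUERY mask. */
int cmd_supported(int query_mask, int cmd)
{
    assert(cmd > 0);
    return query_mask >= 0 && (query_mask & cmd) == cmd;
}

/*
 * Query, register, then issue a private expedited barrier.
 * Returns 0 on success, or -1 with errno set on failure.
 */
int private_expedited_barrier(void)
{
    int mask = sys_membarrier(CMD_QUERY, 0);

    if (!cmd_supported(mask, CMD_PRIVATE_EXPEDITED |
                             CMD_REGISTER_PRIVATE_EXPEDITED)) {
        if (mask >= 0)
            errno = EINVAL;   /* kernel lacks the new commands */
        return -1;
    }
    if (sys_membarrier(CMD_REGISTER_PRIVATE_EXPEDITED, 0) == -1)
        return -1;
    return sys_membarrier(CMD_PRIVATE_EXPEDITED, 0);
}
```

Querying first means the caller does not have to rely on a command
that the running kernel would reject.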
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul Turner <pjt@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Andrew Hunter <ahh@google.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The kernel defaults to either SECCOMP_RET_KILL_PROCESS
or SECCOMP_RET_KILL_THREAD for unrecognized filter
return action values.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From linux/v4.14-rc6/source/net/ipv4/tcp.c:

    if (tp->fastopen_req)
            return -EALREADY; /* Another Fast Open is in progress */

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Christian Brauner's patch added the Linux 4.15 details,
but we need to retain the historical details.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This patch documents the following kernel commit:

    commit 6397fac4915ab3002dc15aae751455da1a852f25
    Author: Christian Brauner <christian.brauner@ubuntu.com>
    Date:   Wed Oct 25 00:04:41 2017 +0200

        userns: bump idmap limits to 340

Since Linux 4.15 the number of idmap lines has been bumped to 340.
The patch also removes the "(arbitrary)" in "There is an
(arbitrary) limit on the number of lines in the file." since the
340 line limit is well-explained by the current implementation.
The struct recording the idmaps is 12 bytes, and quite a few proc
files only allow writes up to the size of a single page, which is
4096 bytes. This leaves room for 340 idmappings (340 * 12 = 4080
bytes). The struct layout itself has been chosen very carefully
to allow for an implementation that limits the time-complexity for
the idmap codepaths to O(log n). However, I think it's unnecessary
to expose this much implementation detail to users in the man
page. So only mention this in the commit message. Furthermore,
the comment about the page size restriction is misleading. The
kernel sources show that >= page size is considered an error.
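
The arithmetic above can be checked with a short sketch. The struct
below is a hypothetical stand-in for the kernel's 12-byte idmap
record, assumed here to hold three 32-bit fields; the names
idmap_extent, PAGE_BYTES, IDMAP_LINES and idmap_bytes() are
illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the 12-byte idmap record (field layout is an assumption). */
struct idmap_extent {
    uint32_t first;
    uint32_t lower_first;
    uint32_t count;
};

enum { PAGE_BYTES = 4096, IDMAP_LINES = 340 };

/* The record size quoted in the commit message. */
static_assert(sizeof(struct idmap_extent) == 12,
              "idmap record is expected to be 12 bytes");

/* Number of bytes needed to store 'lines' idmap records. */
size_t idmap_bytes(size_t lines)
{
    return lines * sizeof(struct idmap_extent);
}
```

With these definitions, idmap_bytes(IDMAP_LINES) is 4080, which is
below PAGE_BYTES.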
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In Linux 4.14, the action component of the return value
switched from being 15 bits to being 16 bits. A new macro,
SECCOMP_RET_ACTION_FULL, that masks the 16 bits was added,
to replace the older SECCOMP_RET_ACTION.
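
The difference between the two masks can be sketched as follows. The
constants are local copies (with an EX_ prefix) of the values that I
believe <linux/seccomp.h> defines in Linux 4.14; treat them as
assumptions and prefer the real header where it is available. The
helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the Linux 4.14 values (assumption). */
#define EX_RET_KILL_PROCESS 0x80000000U
#define EX_RET_KILL_THREAD  0x00000000U
#define EX_RET_ERRNO        0x00050000U
#define EX_RET_ACTION_FULL  0xffff0000U   /* 16-bit action mask */
#define EX_RET_ACTION       0x7fff0000U   /* older 15-bit action mask */
#define EX_RET_DATA         0x0000ffffU

static_assert((EX_RET_ACTION_FULL & EX_RET_DATA) == 0,
              "action and data bits must not overlap");

/* Action component using the full 16-bit mask. */
uint32_t action_full(uint32_t ret)
{
    return ret & EX_RET_ACTION_FULL;
}

/* Action component using the older 15-bit mask. */
uint32_t action_old(uint32_t ret)
{
    return ret & EX_RET_ACTION;
}

/* Data component of a filter return value. */
uint32_t ret_data(uint32_t ret)
{
    return ret & EX_RET_DATA;
}
```

Under these values, the older mask drops the top bit, so it cannot
tell EX_RET_KILL_PROCESS apart from EX_RET_KILL_THREAD, whereas the
full mask can.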
Reported-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linux 4.14 added SECCOMP_RET_KILL_THREAD as a synonym for
SECCOMP_RET_KILL. Also remove the discussion of multithreaded
processes, since that will be addressed in the documentation
of SECCOMP_RET_KILL_PROCESS.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This patch contains the initial submission of the
smartpqi man page.
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
vfork(2), getpid(2) and others which return pid_t already do this.
mtk: Additional info from Ahmad: <unistd.h> defines 'pid_t',
but only when certain FTMs are defined.
Cc: linux-man@vger.kernel.org
Signed-off-by: Ahmad Fatoum <ahmad@a3f.at>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>