When the user creates an unprivileged mount namespace, the Linux
kernel sets the MNT_LOCKED flag [1] on any submounts to prevent
such mounts from being unmounted inside the mount namespace. Such
an unmount would reveal the filesystem tree behind the mount,
which is not otherwise possible from an unprivileged vantage
point.
Attempting to unmount such a mount will fail with EINVAL. However,
a less obvious implication is that attempting a bind mount without
MS_REC, where the tree being bound contains locked submounts,
will also fail with EINVAL, because, without MS_REC, such
submounts are effectively being unmounted.
Cursory googling shows several instances of people running into
this problem, so I felt it worthwhile to document it in
the man page.
[1] 4fbd8d194f/fs/namespace.c (L1110-L1113)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
If an advisory lock is lost, then read/write requests on any
affected file descriptor can fail with EIO (at least on NFSv4).
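A minimal sketch of how a caller might react, assuming the file is protected by an fcntl() write lock. The helpers lock_whole_file() and write_checked() are hypothetical. EIO is not specific to lost locks, so the flag only tells the caller that the lock may have been lost and should be re-acquired (and the file re-validated) before continuing.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: take an exclusive advisory lock on the whole
   file, waiting if another process holds a conflicting lock. */
int lock_whole_file(int fd)
{
    struct flock fl = {
        .l_type = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,              /* 0 means "to end of file" */
    };
    return fcntl(fd, F_SETLKW, &fl);
}

/* Hypothetical helper: write all of buf. On EIO, set *lock_lost,
   since on NFSv4 this can indicate that the advisory lock was lost. */
ssize_t write_checked(int fd, const void *buf, size_t len, bool *lock_lost)
{
    size_t done = 0;

    *lock_lost = false;
    while (done < len) {
        ssize_t n = write(fd, (const char *)buf + done, len - done);
        if (n == -1) {
            if (errno == EINTR)
                continue;        /* interrupted: retry */
            if (errno == EIO)
                *lock_lost = true;
            return -1;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```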
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
One last thing: reading through this, I think it might need a
wording fix (this is my fault), in order to avoid implying that
brk() or malloc() use dlopen().
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
-- Expand the documentation to discuss the hazards in
enough detail to allow avoiding them.
-- Mention the upcoming MAP_FIXED_SAFE flag.
-- Enhance the alignment requirement slightly.
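A minimal sketch of the main hazard, assuming Linux with anonymous mappings: MAP_FIXED does not fail when the target range is already mapped; it silently discards the existing pages. The function name clobber_demo() is hypothetical. It returns the first byte of the region after the MAP_FIXED call, which is 0 because the original contents (0xAB) were replaced by fresh zero-filled memory.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo: returns the first byte after remapping, or -1. */
int clobber_demo(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    memset(p, 0xAB, len);        /* data the program still cares about */

    /* No error is reported: the old pages at p are simply replaced. */
    unsigned char *q = mmap(p, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (q == MAP_FAILED) {
        munmap(p, len);
        return -1;
    }

    int first = q[0];            /* fresh anonymous memory is zero-filled */
    munmap(q, len);
    return first;
}
```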
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Jann Horn <jannh@google.com>
CC: Matthew Wilcox <willy@infradead.org>
CC: Michal Hocko <mhocko@kernel.org>
CC: Mike Rapoport <rppt@linux.vnet.ibm.com>
CC: Cyril Hrubis <chrubis@suse.cz>
CC: Michal Hocko <mhocko@suse.com>
CC: Pavel Machek <pavel@ucw.cz>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
MAP_FIXED has been widely used for a very long time, yet the man
page still claims that "the use of this option is discouraged".
The documentation assumes that "less portable" == "must be discouraged".
Instead of discouraging something that is so useful and widely used,
change the documentation to explain its limitations better.
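One common way to use MAP_FIXED without the clobbering risk is to first reserve an address range with PROT_NONE and then place mappings only inside that reservation. A minimal sketch, with the hypothetical helper reserve_and_place(); the MAP_FIXED address is page-aligned because the reservation base is page-aligned and the offset is a whole number of pages.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: reserve `pages` pages of address space, then make
   page `slot` readable and writable with MAP_FIXED. Since the target lies
   inside our own reservation, no unrelated mapping can be replaced.
   Returns the reservation base (caller munmaps pages * page size bytes),
   or NULL on error. */
char *reserve_and_place(size_t pages, size_t slot)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);

    if (slot >= pages)
        return NULL;

    char *base = mmap(NULL, pages * pg, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* A misaligned address would make MAP_FIXED fail with EINVAL. */
    char *want = base + slot * pg;
    if (mmap(want, pg, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED) {
        munmap(base, pages * pg);
        return NULL;
    }
    return base;
}
```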
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It makes no sense to describe this flag in two different
manual pages, so consolidate the description to one page.
Furthermore, the following statement that was in the prctl(2)
page is not correct:
A thread's effective capability set is always cleared
when such a credential change is made, regardless of
the setting of the "keep capabilities" flag.
The effective set is not cleared if, for example, the
credential sets were [ruid != 0, euid != 0, suid == 0]
and suid is switched to a nonzero value while the "keep capabilities"
flag is set.
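The rules from capabilities(7) that decide whether a UID change clears the effective set can be modeled as a small pure function. This is a hypothetical model for illustration, not kernel code: keep_caps stands for the "keep capabilities" flag, and the function reports only whether the effective set is cleared.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical model of a thread's real, effective, and saved UIDs. */
struct uids {
    uid_t r, e, s;
};

/* Returns true if the effective capability set is cleared by the
   transition before -> after, following the rules in capabilities(7). */
bool effective_cleared(struct uids before, struct uids after, bool keep_caps)
{
    bool had_zero = before.r == 0 || before.e == 0 || before.s == 0;
    bool all_nonzero = after.r != 0 && after.e != 0 && after.s != 0;

    /* All UIDs become nonzero: permitted and effective are cleared,
       unless the "keep capabilities" flag is set. */
    if (had_zero && all_nonzero && !keep_caps)
        return true;

    /* euid changes from 0 to nonzero: effective is always cleared. */
    if (before.e == 0 && after.e != 0)
        return true;

    return false;
}
```

With before = {1000, 1000, 0} and after = {1000, 1000, 1000}, the model returns false when keep_caps is true: the euid never changes from 0, so the effective set survives.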
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As the kernel source suggests, MAX_HANDLE_SZ is a hint
rather than a promise:
/* limit the handle size to NFSv4 handle size now */
#define MAX_HANDLE_SZ 128
Note the "now" (probably should be "for now").
So change the description to make this clear.
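Rather than sizing the buffer from MAX_HANDLE_SZ, a caller can ask the kernel for the required size: a name_to_handle_at() call with handle_bytes set to 0 fails with EOVERFLOW and reports the needed size in handle_bytes. A minimal sketch; the helper name get_handle() is hypothetical, and a handle whose size changes between the two calls is not handled.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>

/* Hypothetical helper: return a malloc'd handle for path, or NULL with
   errno set. Does not assume MAX_HANDLE_SZ is an upper bound. */
struct file_handle *get_handle(const char *path, int *mount_id)
{
    struct file_handle *fh = malloc(sizeof(*fh));
    if (fh == NULL)
        return NULL;
    fh->handle_bytes = 0;

    /* Probe: expected to fail with EOVERFLOW, reporting the size. */
    if (name_to_handle_at(AT_FDCWD, path, fh, mount_id, 0) == 0)
        return fh;               /* zero-length handle: nothing to fetch */
    if (errno != EOVERFLOW) {
        int saved = errno;
        free(fh);
        errno = saved;
        return NULL;
    }

    unsigned int need = fh->handle_bytes;
    struct file_handle *big = realloc(fh, sizeof(*big) + need);
    if (big == NULL) {
        free(fh);
        errno = ENOMEM;
        return NULL;
    }
    big->handle_bytes = need;

    if (name_to_handle_at(AT_FDCWD, path, big, mount_id, 0) == -1) {
        int saved = errno;
        free(big);
        errno = saved;
        return NULL;
    }
    return big;
}
```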
Reported-by: Lennart Poettering <lennart@poettering.net>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>