It makes sense to have the description of this file
in the general discussion of user namespaces.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Files with access permissions such as rwx---rwx give fewer
permissions to their group than they do to everyone else, which
means that dropping groups with setgroups(0, NULL) actually grants
a process privileges.
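The effect can be illustrated with a simplified model of the classic
owner/group/other check. This is a sketch only: may_access() is a
hypothetical helper, not the kernel's implementation, and it ignores
capabilities, ACLs, and the filesystem UID/GID.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Simplified model of the owner/group/other permission check.
   Hypothetical helper for illustration; not kernel code.
   'want' is a bit mask of 4 (read), 2 (write), 1 (execute). */
bool
may_access(mode_t mode, uid_t file_uid, gid_t file_gid,
           uid_t uid, gid_t gid, const gid_t *groups, size_t ngroups,
           mode_t want)
{
    mode_t granted;
    bool in_group = (gid == file_gid);

    /* Supplementary groups also place the process in the group class. */
    for (size_t i = 0; i < ngroups && !in_group; i++)
        in_group = (groups[i] == file_gid);

    if (uid == file_uid)
        granted = (mode >> 6) & 7;      /* owner bits */
    else if (in_group)
        granted = (mode >> 3) & 7;      /* group bits win over "other" */
    else
        granted = mode & 7;             /* "other" bits */

    return (granted & want) == want;
}
```

With mode 0707 and a file group that appears only in the caller's
supplementary group list, the model denies read access; once the list
is emptied, the same call falls through to the "other" bits and
succeeds.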
The unprivileged setting of gid_map turned out not to be safe
after this change. Privileged setting of gid_map can be
interpreted as meaning "yes, it is OK to drop groups". [ Eric
additionally noted: Setting of gid_map with privilege has been
clarified to mean that dropping groups is ok. This allows
existing programs that set gid_map with privilege to work
without changes. That is, newgidmap(1) continues to work
unchanged.]
To prevent this problem and future problems, user namespaces were
changed in such a way as to guarantee that, without privilege, a
user cannot obtain credentials that they could not obtain without
the help of user namespaces.
This meant testing the effective user ID and not the filesystem
user ID, as setresuid(2) and setregid(2) allow setting any process
UID or GID (except the supplementary groups) to the effective ID.
Furthermore, to preserve in some form the useful applications
that have been setting gid_map without privilege, the file
/proc/[pid]/setgroups was added to allow disabling setgroups(2).
With setgroups(2) permanently disabled in a user namespace, it
again becomes safe to allow writes to gid_map without privilege.
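The resulting order of operations can be sketched as follows. This is
a minimal sketch, assuming the caller has already entered a new user
namespace (for example via unshare(CLONE_NEWUSER)) and recorded its
effective GID beforehand; write_string(), format_gid_map(), and
map_gid_unprivileged() are hypothetical helper names.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a whole string to an existing file.
   Returns 0 on success, -1 on error. */
int
write_string(const char *path, const char *s)
{
    int fd = open(path, O_WRONLY);
    if (fd == -1)
        return -1;

    size_t len = strlen(s);
    ssize_t n = write(fd, s, len);
    int ok = (n == (ssize_t) len);

    if (close(fd) == -1)
        ok = 0;
    return ok ? 0 : -1;
}

/* Format a one-line map "<inside> <outside> 1".
   Returns 0 on success, -1 if buf is too small. */
int
format_gid_map(char *buf, size_t size, gid_t inside, gid_t outside)
{
    int n = snprintf(buf, size, "%lu %lu 1",
                     (unsigned long) inside, (unsigned long) outside);
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/* To be called inside a freshly created user namespace. outside_gid
   is assumed to be the caller's effective GID in the parent
   namespace, obtained before the namespace was created. */
int
map_gid_unprivileged(gid_t inside_gid, gid_t outside_gid)
{
    char map[64];

    /* Step 1: permanently disable setgroups(2) in this namespace. */
    if (write_string("/proc/self/setgroups", "deny") == -1)
        return -1;

    /* Step 2: write gid_map without privilege. */
    if (format_gid_map(map, sizeof(map), inside_gid, outside_gid) == -1)
        return -1;
    return write_string("/proc/self/gid_map", map);
}
```

Reversing the two steps is expected to make the gid_map write fail,
which is the safety property described above.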
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I recently realized that I had been reasoning improperly about
what umount(MNT_DETACH) did based on an insufficient description
in the umount.2 man page, one that matched my intuition but not
the implementation.
When there are no submounts, MNT_DETACH is essentially harmless to
applications. Where there are submounts, MNT_DETACH changes what
is visible to applications using the detached directories.
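For reference, a lazy unmount is requested as in the sketch below.
lazy_unmount() is a hypothetical wrapper name. The sketch shows only
the call itself and does not demonstrate the submount behavior
discussed here.

```c
#include <sys/mount.h>

/* Request a lazy unmount of 'target' with MNT_DETACH.
   Hypothetical wrapper for illustration.
   Returns 0 on success, or -1 with errno set by umount2(2). */
int
lazy_unmount(const char *target)
{
    return umount2(target, MNT_DETACH);
}
```

The call normally requires privilege (CAP_SYS_ADMIN), so an
unprivileged caller should expect it to fail.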
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As of Linux 3.18, the limit is still five lines, so mention the
more recent kernel version in the text.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The include file linux/ext2_fs.h no longer contains any ioctl
definitions.
The EXT2_IOC_* request codes have been replaced by FS_IOC_* in
linux/fs.h.
Some definitions of FS_IOC_* use long* but the actual code expects
int* (see fs/ext2/ioctl.c).
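A minimal sketch of a caller that follows this advice, passing a
pointer to int regardless of how the request code is declared.
get_inode_flags() is a hypothetical helper name, and whether the ioctl
is supported depends on the filesystem.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Fetch the inode flags of 'path' into *flags.
   Hypothetical helper for illustration.
   Returns 0 on success, or -1 with errno set. */
int
get_inode_flags(const char *path, int *flags)
{
    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd == -1)
        return -1;

    *flags = 0;
    /* Pass an int *, since that is what the kernel code reads and
       writes, even where the macro is declared with long. */
    int ret = ioctl(fd, FS_IOC_GETFLAGS, flags);

    int saved_errno = errno;    /* close() may overwrite errno */
    close(fd);
    errno = saved_errno;
    return ret == -1 ? -1 : 0;
}
```

On success, the result can be tested against the FS_*_FL constants
from linux/fs.h, for example FS_IMMUTABLE_FL.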
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This behavior is implementation-defined by POSIX. If the name
doesn't start with a '/', glibc returns EINVAL without attempting
the syscall.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
clone() has so many effects that it's an oversimplification to say
that the *main* use of clone() is to create a thread. (In fact,
the use of clone() to create new processes may well be more
common, since glibc's fork() is a wrapper that calls clone().)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
An undocumented escape sequence in drivers/tty/vt/vt.c brings the
previously accessed virtual terminal to the foreground.
Signed-off-by: Scot Doyle <lkml14@scotdoyle.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Normally, system calls return EINVAL for flags they don't support.
Explicitly document that clone() does *not* produce an error for
these two obsolete flags.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>