mirror of https://github.com/mkerrisk/man-pages
user_namespaces.7: New page splitting user namespace material out of namespaces(7)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
parent 9552196ecb
commit 046de6a7d7
@@ -0,0 +1,352 @@
.\" Copyright (c) 2013 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" and Copyright (c) 2012 by Eric W. Biederman <ebiederm@xmission.com>
.\"
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\"
.\"
.TH USER_NAMESPACES 7 2013-01-14 "Linux" "Linux Programmer's Manual"
.SH NAME
user_namespaces \- overview of Linux user namespaces
.SH DESCRIPTION
For an overview of namespaces, see
.BR namespaces (7).

User namespaces isolate security-related identifiers, in particular,
user IDs, group IDs, keys (see
.BR keyctl (2)),
and capabilities.
A process's user and group IDs can be different
inside and outside a user namespace.
In particular,
a process can have a normal unprivileged user ID outside a user namespace
while at the same time having a user ID of 0 inside the namespace;
in other words,
the process has full privileges for operations inside the user namespace,
but is unprivileged for operations outside the namespace.

User namespaces can be nested;
that is, each user namespace has a parent user namespace,
and can have zero or more child user namespaces.
The parent of a user namespace is the user namespace
of the process that creates the user namespace via a call to
.BR unshare (2)
or
.BR clone (2)
with the
.BR CLONE_NEWUSER
flag.

When a user namespace is created,
it starts out without a mapping of user IDs (group IDs)
to the parent user namespace.
The desired mapping of user IDs (group IDs) to the parent user namespace
may be set by writing into
.IR /proc/[pid]/uid_map
.RI ( /proc/[pid]/gid_map );
see below.

The first process in a user namespace starts out with a complete set
of capabilities with respect to the new user namespace.

System calls that return user IDs (group IDs) will return
either the user ID (group ID) mapped into the current
user namespace if there is a mapping, or the overflow user ID (group ID);
the default value for the overflow user ID (group ID) is 65534.
See the descriptions of
.IR /proc/sys/kernel/overflowuid
and
.IR /proc/sys/kernel/overflowgid
in
.BR proc (5).
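.PP
As a rough illustration of the fallback just described,
the following Python sketch translates an ID from the parent user namespace
into the ID seen inside a namespace,
returning the overflow ID when no mapping covers it.
The function name and the tuple layout of the map are illustrative
assumptions, not a kernel interface.

```python
# Default overflow ID quoted in the text; the real value is read from
# /proc/sys/kernel/overflowuid (or overflowgid).
OVERFLOW_ID = 65534


def id_seen_in_namespace(outer_id, id_map, overflow=OVERFLOW_ID):
    """Return the ID that a process inside the namespace would see.

    id_map is a list of (inside_start, outside_start, length) tuples,
    an assumed in-memory form of the namespace's ID mapping.
    """
    for inside_start, outside_start, length in id_map:
        if outside_start <= outer_id < outside_start + length:
            return inside_start + (outer_id - outside_start)
    # No mapping covers outer_id: fall back to the overflow ID.
    return overflow
```

With the map [(0, 1000, 10)], the outer ID 1000 is seen as 0,
while an unmapped outer ID such as 0 is seen as 65534.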
.PP
Starting in Linux 3.8, unprivileged processes can create user namespaces,
and mount, PID, IPC, network, and UTS namespaces can be created with just the
.B CAP_SYS_ADMIN
capability in the caller's user namespace.

If
.BR CLONE_NEWUSER
is specified along with other
.B CLONE_NEW*
flags in a single
.BR clone (2)
or
.BR unshare (2)
call, the user namespace is guaranteed to be created first,
giving the caller privileges over the remaining
namespaces created by the call.
Thus, it is possible for an unprivileged caller to specify this combination
of flags.
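.PP
The combination of flags can be tried out with a short Python sketch
that calls unshare(2) through ctypes in a forked child,
so that the calling process itself stays in its original namespaces.
The helper name is hypothetical;
the flag values are those defined in <sched.h> on Linux,
and the sketch assumes a Linux C library that exports unshare().

```python
import ctypes
import os

# Flag values as defined in <sched.h> on Linux.
CLONE_NEWNS = 0x00020000
CLONE_NEWUTS = 0x04000000
CLONE_NEWIPC = 0x08000000
CLONE_NEWUSER = 0x10000000
CLONE_NEWPID = 0x20000000
CLONE_NEWNET = 0x40000000


def unshare_in_child(flags):
    """Fork, call unshare(flags) in the child, and report the result.

    Returns 0 on success, the child's errno value on failure,
    or -1 if the child did not exit normally.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    unshare = libc.unshare  # resolve the symbol before forking
    pid = os.fork()
    if pid == 0:
        code = 255
        try:
            code = 0 if unshare(flags) == 0 else ctypes.get_errno()
        finally:
            # The child must never return into the caller's code.
            os._exit(code & 0xFF)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

Called from an unprivileged process as
unshare_in_child(CLONE_NEWUSER | CLONE_NEWUTS),
the helper is expected to return 0 on a kernel that supports
unprivileged user namespaces, although system policy may still refuse the call.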
.PP
When a new IPC, mount, network, PID, or UTS namespace is created via
.BR clone (2)
or
.BR unshare (2),
the kernel records the user namespace of the creating process against
the new namespace.
When a process in the new namespace subsequently performs
privileged operations that operate on global
resources isolated by the namespace,
the permission checks are performed according to the process's capabilities
in the user namespace that the kernel associated with the new namespace.

The following rules apply with respect to the capabilities granted
to a process:
.\" In the 3.8 sources, see security/commoncap.c::cap_capable():
.IP 1. 3
If a process has a capability in a parent user namespace,
then it has that capability in all child (and further removed descendant)
namespaces as well.
.IP 2.
.\" * The owner of the user namespace in the parent of the
.\" * user namespace has all caps.
When a user namespace is created, the kernel records the effective
user ID of the creating process as being the "owner" of the namespace,
and likewise associates the effective group ID of the creating process
with the namespace.
A process whose effective user ID matches that of the
owner of a user namespace and which is a member of the parent namespace
(or a further removed namespace that is a direct ancestor)
has all capabilities in the user namespace.
.\" As a rough approximation, this means that
.\" the user who creates a user namespace
.\" has all capabilities inside that namespace and its descendants.
.PP
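The two capability rules above can be modeled with a small Python sketch.
It is a toy model of the rules as worded on this page, not kernel code:
the class and function names are invented for illustration,
and user IDs are treated as kernel-wide values.

```python
class UserNamespace:
    def __init__(self, owner_uid, parent=None):
        self.owner_uid = owner_uid  # effective UID of the creating process
        self.parent = parent        # None for the initial namespace


class Process:
    def __init__(self, userns, euid, caps=()):
        self.userns = userns
        self.euid = euid
        self.caps = frozenset(caps)  # capabilities held in self.userns


def has_capability(proc, target, cap):
    """Apply rules 1 and 2 to decide if proc has cap in namespace target."""
    # Collect target and all of its ancestors, nearest first.
    chain = []
    ns = target
    while ns is not None:
        chain.append(ns)
        ns = ns.parent
    # The process must be in target itself or in one of its ancestors.
    if proc.userns not in chain:
        return False
    for ns in chain:
        if ns is proc.userns:
            # Rule 1: a capability held here extends to all descendants.
            return cap in proc.caps
        if ns.owner_uid == proc.euid:
            # Rule 2: the owner, seen from an ancestor, has all capabilities.
            return True
    return False
```

For example, a process with effective user ID 1000 and no capabilities
in the initial namespace still passes the check for any capability
in a child namespace whose owner is 1000.
.PP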
Use of user namespaces requires a kernel that is configured with the
.B CONFIG_USER_NS
option.

Over the years, many features have been added to the Linux kernel
that are available only to privileged users
because of their potential to confuse set-user-ID-root applications.
In general, it becomes safe to allow the root user in a user namespace to
use those features because it is impossible, while in a user namespace,
to gain more privilege than the root user of a user namespace has.
.PP
The
.IR /proc/[pid]/uid_map
and
.IR /proc/[pid]/gid_map
files (available since Linux 3.5)
.\" commit 22d917d80e842829d0ca0a561967d728eb1d6303
expose the mappings for user and group IDs
inside the user namespace for the process
.IR pid .
The description here explains the details for
.IR uid_map ;
.IR gid_map
is exactly the same,
but each instance of "user ID" is replaced by "group ID".

The
.I uid_map
file exposes the mapping of user IDs from the user namespace
of the process
.IR pid
to the user namespace of the process that opened
.IR uid_map
(but see a qualification to this point below).
In other words, processes that are in different user namespaces
will potentially see different values when reading from a particular
.I uid_map
file, depending on the user ID mappings for the user namespaces
of the reading processes.

Each line in the
.I uid_map
file specifies a 1-to-1 mapping of a range of contiguous
user IDs between two user namespaces.
(When a user namespace is first created, this file is empty.)
The specification in each line takes the form of
three numbers delimited by white space.
The first two numbers specify the starting user ID in
each user namespace.
The third number specifies the length of the mapped range.
In detail, the fields are interpreted as follows:
.IP (1) 4
The start of the range of user IDs in
the user namespace of the process
.IR pid .
.IP (2)
The start of the range of user
IDs to which the user IDs specified by field one map.
How field two is interpreted depends on whether the process that opened
.I uid_map
and the process
.IR pid
are in the same user namespace, as follows:
.RS
.IP a) 3
If the two processes are in different user namespaces:
field two is the start of a range of
user IDs in the user namespace of the process that opened
.IR uid_map .
.IP b)
If the two processes are in the same user namespace:
field two is the start of the range of
user IDs in the parent user namespace of the process
.IR pid .
This case enables the opener of
.I uid_map
(the common case here is opening
.IR /proc/self/uid_map )
to see the mapping of user IDs into the user namespace of the process
that created this user namespace.
.RE
.IP (3)
The length of the range of user IDs that is mapped between the two
user namespaces.
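.PP
The three-field format described above is simple to parse.
The following Python sketch reads the text of a uid_map (or gid_map) file
into tuples and uses them to translate an ID from field one's namespace
into field two's namespace.
The function names are illustrative assumptions.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map text into (field1, field2, length) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Three numbers delimited by white space.
        first, second, length = (int(field) for field in line.split())
        entries.append((first, second, length))
    return entries


def map_to_outer(inside_id, entries):
    """Translate an ID in the namespace of 'pid' to the other namespace.

    Returns None if no line of the map covers inside_id.
    """
    for first, second, length in entries:
        if first <= inside_id < first + length:
            return second + (inside_id - first)
    return None
```

For instance, a file containing the single line "0 1000 1" maps ID 0
in the namespace of the process pid to ID 1000 in the other namespace,
and leaves every other ID unmapped.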
.PP
After the creation of a new user namespace, the
.I uid_map
file of
.I one
of the processes in the namespace may be written to
.I once
to define the mapping of user IDs in the new user namespace.
(An attempt to write more than once to a
.I uid_map
file in a user namespace fails with the error
.BR EPERM .)

The lines written to
.IR uid_map
must conform to the following rules:
.IP * 3
The three fields must be valid numbers,
and the last field must be greater than 0.
.IP *
Lines are terminated by newline characters.
.IP *
There is an (arbitrary) limit on the number of lines in the file.
As of Linux 3.8, the limit is five lines.
In addition, the number of bytes written to
the file must be less than the system page size,
.\" FIXME(Eric): the restriction "less than" rather than "less than or equal"
.\" seems strangely arbitrary. Furthermore, the comment does not agree
.\" with the code in kernel/user_namespace.c. Which is correct?
and the write must be performed at the start of the file (i.e.,
.BR lseek (2)
and
.BR pwrite (2)
can't be used to write to nonzero offsets in the file).
.IP *
The range of user IDs specified in each line cannot overlap with the ranges
in any other lines.
In the current implementation (Linux 3.8), this requirement is
satisfied by a simplistic implementation that imposes the further
requirement that
the values in both field 1 and field 2 of successive lines must be
in ascending numerical order.
.IP *
At least one line must be written to the file.
.PP
Writes that violate the above rules fail with the error
.BR EINVAL .
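.PP
A program can check its data against the rules above before writing,
as in the following Python sketch.
This is a user-space approximation only:
the function name, the default page size of 4096 bytes,
and the returned messages are assumptions,
and the Linux 3.8 ascending-order restriction is not modeled.

```python
MAX_MAP_LINES = 5  # limit quoted in the text for Linux 3.8
PAGE_SIZE = 4096   # assumed page size; query the system for the real value


def check_map_text(text, page_size=PAGE_SIZE, max_lines=MAX_MAP_LINES):
    """Return None if text obeys the rules, else a short reason string."""
    if len(text.encode()) >= page_size:
        return "data must be smaller than the page size"
    lines = text.split("\n")
    if lines[-1] == "":
        lines.pop()  # the final newline terminates the last line
    if not lines:
        return "at least one line is required"
    if len(lines) > max_lines:
        return "too many lines"
    ranges = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or not all(f.isascii() and f.isdigit() for f in fields):
            return "each line needs three valid numbers"
        first, second, length = (int(f) for f in fields)
        if length == 0:
            return "the length must be greater than 0"
        ranges.append((first, second, length))
    # Ranges may not overlap on either side of the mapping.
    for i, (a1, a2, alen) in enumerate(ranges):
        for b1, b2, blen in ranges[i + 1:]:
            if a1 < b1 + blen and b1 < a1 + alen:
                return "ranges overlap in field 1"
            if a2 < b2 + blen and b2 < a2 + alen:
                return "ranges overlap in field 2"
    return None
```

A return value of None only means that the data looks well formed;
the kernel remains the final authority and may still reject the write.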
.PP
In order for a process to write to the
.I /proc/[pid]/uid_map
.RI ( /proc/[pid]/gid_map )
file, all of the following requirements must be met:
.IP 1. 3
The writing process must have the
.BR CAP_SETUID
.RB ( CAP_SETGID )
capability in the user namespace of the process
.IR pid .
.\" FIXME(Eric):
.\" Something isn't quite right in the description here.
.\" Suppose UID 1000 creates a user namespace. At this point, UID 0 in
.\" the parent namespace can write a map of (say) '0 1000 10' to uid_map.
.\" That succeeds. But how is that case covered in the three rules here?
.\" In other words, how does UID 0 in the parent namespace have any
.\" capabilities in the new child namespace? Somewhere on the page,
.\" I think there needs to be a statement about the privileges of
.\" UID 0 when no mapping has yet been defined, right?
.\" Or is it simply the case that UID 0 in the parent namespace
.\" always has all capabilities in the child namespace?
.\"
.IP 2.
The writing process must be in either the user namespace of the process
.I pid
or inside the parent user namespace of the process
.IR pid .
.IP 3.
One of the following is true:
.RS
.IP * 3
The data written to
.I uid_map
.RI ( gid_map )
consists of a single line that maps the writing process's file system user ID
(group ID) in the parent user namespace to a user ID (group ID)
in the user namespace.
The usual case here is that this single line provides a mapping for the user ID
of the process that created the namespace.
.IP * 3
The process has the
.BR CAP_SETUID
.RB ( CAP_SETGID )
capability in the parent user namespace.
Thus, a privileged process can make mappings to arbitrary user IDs (group IDs)
in the parent user namespace.
.RE
.PP
Writes that violate the above rules fail with the error
.BR EPERM .
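.PP
The single-line case in requirement 3 might be exercised as in the
following Python sketch, which builds a one-ID mapping line and writes it
with a single write at the start of the file.
Both helper names are hypothetical,
and whether the write succeeds depends on the requirements listed above
(and on any further conditions imposed by kernels later than the
Linux 3.8 described here).

```python
import os


def single_id_line(inside_id, outside_id):
    """Build a one-line map: inside_id maps to outside_id, length 1."""
    return f"{inside_id} {outside_id} 1\n"


def write_id_map(pid, map_name, text):
    """Write text to /proc/<pid>/<map_name> in one write at offset 0.

    map_name is "uid_map" or "gid_map".  An OSError carrying EPERM or
    EINVAL propagates to the caller if the kernel rejects the write.
    """
    data = text.encode("ascii")
    fd = os.open(f"/proc/{pid}/{map_name}", os.O_WRONLY)
    try:
        written = os.write(fd, data)
    finally:
        os.close(fd)
    if written != len(data):
        raise OSError("short write to " + map_name)
```

A process with user ID 1000 in the parent namespace that wants to appear
as user ID 0 inside the new namespace would write the line produced by
single_id_line(0, 1000).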
.PP
In order to create a new user namespace,
there must exist a mapping of the caller's effective
user and group IDs into the parent namespace.
If such a mapping does not exist, then
.BR clone (2)
and
.BR unshare (2)
fail with the error
.BR EPERM .
.PP
When a process inside a user namespace executes
a set-user-ID (set-group-ID) program,
the process's effective user (group) ID inside the namespace is changed
to whatever value is mapped for the user (group) ID of the file.
However, if either the user
.I or
the group ID of the file has no mapping inside the namespace,
the set-user-ID (set-group-ID) bit is silently ignored:
the new program is executed,
but the process's effective user (group) ID is left unchanged.
(This mirrors the semantics of executing a set-user-ID or set-group-ID
program that resides on a file system that was mounted with the
.BR MS_NOSUID
flag (see
.BR mount (2)).)
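.PP
The set-user-ID and set-group-ID behavior just described can be summarized
by a small Python model.
It is a sketch of the rule as stated on this page, not of the kernel's code:
the function names are invented,
file ownership is given as IDs in the parent namespace,
and each map is a list of (inside_start, outside_start, length) tuples.

```python
def to_inside(outer_id, id_map):
    """Return the ID seen inside the namespace, or None if unmapped."""
    for inside_start, outside_start, length in id_map:
        if outside_start <= outer_id < outside_start + length:
            return inside_start + (outer_id - outside_start)
    return None


def effective_ids_after_exec(euid, egid, file_uid, file_gid,
                             set_uid_bit, set_gid_bit, uid_map, gid_map):
    """Return the (euid, egid) seen inside the namespace after exec."""
    new_uid = to_inside(file_uid, uid_map)
    new_gid = to_inside(file_gid, gid_map)
    if new_uid is None or new_gid is None:
        # Either ID of the file is unmapped: both bits are ignored.
        return euid, egid
    if set_uid_bit:
        euid = new_uid
    if set_gid_bit:
        egid = new_gid
    return euid, egid
```

With both maps set to [(0, 1000, 10)], executing a set-user-ID file owned
by outer user ID 1000 changes the effective user ID to 0,
whereas a file owned by outer user ID 0 leaves the IDs unchanged.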
.SH CONFORMING TO
Namespaces are a Linux-specific feature.
.SH SEE ALSO
.BR unshare (1),
.BR clone (2),
.BR setns (2),
.BR unshare (2),
.BR proc (5),
.BR capabilities (7),
.BR credentials (7),
.BR namespaces (7)