mirror of https://github.com/mkerrisk/man-pages
namespaces.7: Remove userns material shifted to user_namespaces(7)
The user namespaces section was getting long and unwieldy.
Split it into its own page, so that it can be better structured with subtitles, etc.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
parent 046de6a7d7
commit 67d1131fd9
|
@ -486,316 +486,8 @@ option.
|
||||||
.\" ==================== User namespaces ====================
|
.\" ==================== User namespaces ====================
|
||||||
.\"
|
.\"
|
||||||
.SS User namespaces (CLONE_NEWUSER)
|
.SS User namespaces (CLONE_NEWUSER)
|
||||||
User namespaces isolate security-related identifiers, in particular,
|
See
|
||||||
user IDs, group IDs, keys (see
|
.BR user_namespaces (7).
|
||||||
.BR keyctl (2)),
|
|
||||||
and capabilities.
|
|
||||||
A process's user and group IDs can be different
|
|
||||||
inside and outside a user namespace.
|
|
||||||
In particular,
|
|
||||||
a process can have a normal unprivileged user ID outside a user namespace
|
|
||||||
while at the same time having a user ID of 0 inside the namespace;
|
|
||||||
in other words,
|
|
||||||
the process has full privileges for operations inside the user namespace,
|
|
||||||
but is unprivileged for operations outside the namespace.
|
|
||||||
|
|
||||||
User namespaces can be nested;
|
|
||||||
that is, each user namespace has a parent user namespace,
|
|
||||||
and can have zero or more child user namespaces.
|
|
||||||
The parent of a user namespace is the user namespace
|
|
||||||
of the process that creates the user namespace via a call to
|
|
||||||
.BR unshare (2)
|
|
||||||
or
|
|
||||||
.BR clone (2)
|
|
||||||
with the
|
|
||||||
.BR CLONE_NEWUSER
|
|
||||||
flag.
|
|
||||||
|
|
||||||
When a user namespace is created,
|
|
||||||
it starts out without a mapping of user IDs (group IDs)
|
|
||||||
to the parent user namespace.
|
|
||||||
The desired mapping of user IDs (group IDs) to the parent user namespace
|
|
||||||
may be set by writing into
|
|
||||||
.IR /proc/[pid]/uid_map
|
|
||||||
.RI ( /proc/[pid]/gid_map );
|
|
||||||
see below.
|
|
||||||
|
|
||||||
The first process in a user namespace starts out with a complete set
|
|
||||||
of capabilities with respect to the new user namespace.
|
|
||||||
|
|
||||||
System calls that return user IDs (group IDs) will return
|
|
||||||
either the user ID (group ID) mapped into the current
|
|
||||||
user namespace if there is a mapping, or the overflow user ID (group ID);
|
|
||||||
the default value for the overflow user ID (group ID) is 65534.
|
|
||||||
See the descriptions of
|
|
||||||
.IR /proc/sys/kernel/overflowuid
|
|
||||||
and
|
|
||||||
.IR /proc/sys/kernel/overflowgid
|
|
||||||
in
|
|
||||||
.BR proc (5).
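The fallback to the overflow ID described in the removed text can be illustrated with a small model. This is only a sketch, not kernel code: the function name `id_seen_in_namespace` and the representation of a mapping as `(inside_start, outside_start, length)` triples are assumptions made for illustration, and the default of 65534 is the value quoted in the text.

```python
from typing import List, Tuple

# Hypothetical representation of one mapping range:
# (first ID inside the namespace, first ID outside it, number of IDs).
Extent = Tuple[int, int, int]

DEFAULT_OVERFLOW_ID = 65534  # default overflow user/group ID named in the text


def id_seen_in_namespace(extents: List[Extent], outside_id: int,
                         overflow_id: int = DEFAULT_OVERFLOW_ID) -> int:
    """Return the ID a process in the namespace would be shown.

    If some range covers outside_id, return the corresponding inside ID;
    otherwise fall back to the overflow ID.
    """
    for inside_start, outside_start, length in extents:
        if outside_start <= outside_id < outside_start + length:
            return inside_start + (outside_id - outside_start)
    return overflow_id
```

With a single range `(0, 1000, 10)`, ID 1000 outside the namespace appears as 0 inside it, while an ID outside the range, such as 2000, appears as the overflow ID.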
|
|
||||||
|
|
||||||
Starting in Linux 3.8, unprivileged processes can create user namespaces,
|
|
||||||
and mount, PID, IPC, network, and UTS namespaces can be created with just the
|
|
||||||
.B CAP_SYS_ADMIN
|
|
||||||
capability in the caller's user namespace.
|
|
||||||
|
|
||||||
If
|
|
||||||
.BR CLONE_NEWUSER
|
|
||||||
is specified along with other
|
|
||||||
.B CLONE_NEW*
|
|
||||||
flags in a single
|
|
||||||
.BR clone (2)
|
|
||||||
or
|
|
||||||
.BR unshare (2)
|
|
||||||
call, the user namespace is guaranteed to be created first,
|
|
||||||
giving the caller privileges over the remaining
|
|
||||||
namespaces created by the call.
|
|
||||||
Thus, it is possible for an unprivileged caller to specify this combination
|
|
||||||
of flags.
|
|
||||||
|
|
||||||
When a new IPC, mount, network, PID, or UTS namespace is created via
|
|
||||||
.BR clone (2)
|
|
||||||
or
|
|
||||||
.BR unshare (2),
|
|
||||||
the kernel records the user namespace of the creating process against
|
|
||||||
the new namespace.
|
|
||||||
When a process in the new namespace subsequently performs
|
|
||||||
privileged operations that operate on global
|
|
||||||
resources isolated by the namespace,
|
|
||||||
the permission checks are performed according to the process's capabilities
|
|
||||||
in the user namespace that the kernel associated with the new namespace.
|
|
||||||
|
|
||||||
|
|
||||||
The following rules apply with respect to the capabilities granted
|
|
||||||
to a process:
|
|
||||||
.\" In the 3.8 sources, see security/commoncap.c::cap_capable():
|
|
||||||
.IP 1. 3
|
|
||||||
If a process has a capability in a parent user namespace,
|
|
||||||
then it has that capability in all child (and further removed descendant)
|
|
||||||
namespaces as well.
|
|
||||||
.IP 2.
|
|
||||||
.\" * The owner of the user namespace in the parent of the
|
|
||||||
.\" * user namespace has all caps.
|
|
||||||
When a user namespace is created, the kernel records the effective
|
|
||||||
user ID of the creating process as being the "owner" of the namespace,
|
|
||||||
and likewise associates the effective group ID of the creating process
|
|
||||||
with the namespace.
|
|
||||||
A process whose effective user ID matches that of the
|
|
||||||
owner of a user namespace and which is a member of the parent namespace
|
|
||||||
(or a further removed namespace that is a direct ancestor)
|
|
||||||
has all capabilities in the user namespace.
|
|
||||||
.\" As a rough approximation, this means that
|
|
||||||
.\" the user who creates a user namespace
|
|
||||||
.\" has all capabilities inside that namespace and its descendants.
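The two capability rules in the removed text can be modeled as a walk up the chain of parent namespaces. The following is a toy sketch under stated assumptions: the class names `UserNamespace` and `Process`, the function `has_capability`, and the use of plain strings for capability names are all hypothetical and exist only for this illustration; the real check lives in the kernel (the removed comment points at `cap_capable()`).

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass(eq=False)
class UserNamespace:
    parent: Optional["UserNamespace"]  # None for the initial namespace
    owner_euid: int                    # effective UID of the creator


@dataclass
class Process:
    user_ns: UserNamespace
    euid: int
    caps: Set[str] = field(default_factory=set)  # held in its own namespace


def has_capability(proc: Process, target: UserNamespace, cap: str) -> bool:
    """Model of rules 1 and 2: walk from the target namespace toward the root."""
    ns = target
    while ns is not None:
        if ns is proc.user_ns:
            # Rule 1: a capability held in an ancestor (or the namespace
            # itself) also applies in every descendant namespace.
            return cap in proc.caps
        if ns.parent is proc.user_ns and ns.owner_euid == proc.euid:
            # Rule 2: the owner, acting from the parent namespace,
            # has all capabilities in the namespace it owns.
            return True
        ns = ns.parent
    # The process's namespace is not an ancestor of the target.
    return False
```

In this model, a process with effective UID 1000 in the initial namespace has every capability in a child namespace owned by UID 1000 and in that child's descendants, but gains nothing in the initial namespace itself.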
|
|
||||||
.PP
|
|
||||||
Use of user namespaces requires a kernel that is configured with the
|
|
||||||
.B CONFIG_USER_NS
|
|
||||||
option.
|
|
||||||
|
|
||||||
Over the years, there have been a lot of features that have been added
|
|
||||||
to the Linux kernel that are only available to privileged users
|
|
||||||
because of their potential to confuse set-user-ID-root applications.
|
|
||||||
In general, it becomes safe to allow the root user in a user namespace to
|
|
||||||
use those features because it is impossible, while in a user namespace,
|
|
||||||
to gain more privilege than the root user of a user namespace has.
|
|
||||||
|
|
||||||
The
|
|
||||||
.IR /proc/[pid]/uid_map
|
|
||||||
and
|
|
||||||
.IR /proc/[pid]/gid_map
|
|
||||||
files (available since Linux 3.5)
|
|
||||||
.\" commit 22d917d80e842829d0ca0a561967d728eb1d6303
|
|
||||||
expose the mappings for user and group IDs
|
|
||||||
inside the user namespace for the process
|
|
||||||
.IR pid .
|
|
||||||
The description here explains the details for
|
|
||||||
.IR uid_map ;
|
|
||||||
.IR gid_map
|
|
||||||
is exactly the same,
|
|
||||||
but each instance of "user ID" is replaced by "group ID".
|
|
||||||
|
|
||||||
The
|
|
||||||
.I uid_map
|
|
||||||
file exposes the mapping of user IDs from the user namespace
|
|
||||||
of the process
|
|
||||||
.IR pid
|
|
||||||
to the user namespace of the process that opened
|
|
||||||
.IR uid_map
|
|
||||||
(but see a qualification to this point below).
|
|
||||||
In other words, processes that are in different user namespaces
|
|
||||||
will potentially see different values when reading from a particular
|
|
||||||
.I uid_map
|
|
||||||
file, depending on the user ID mappings for the user namespaces
|
|
||||||
of the reading processes.
|
|
||||||
|
|
||||||
Each line in the
|
|
||||||
.I uid_map
|
|
||||||
file specifies a 1-to-1 mapping of a range of contiguous
|
|
||||||
user IDs between two user namespaces.
|
|
||||||
(When a user namespace is first created, this file is empty.)
|
|
||||||
The specification in each line takes the form of
|
|
||||||
three numbers delimited by white space.
|
|
||||||
The first two numbers specify the starting user ID in
|
|
||||||
each user namespace.
|
|
||||||
The third number specifies the length of the mapped range.
|
|
||||||
In detail, the fields are interpreted as follows:
|
|
||||||
.IP (1) 4
|
|
||||||
The start of the range of user IDs in
|
|
||||||
the user namespace of the process
|
|
||||||
.IR pid .
|
|
||||||
.IP (2)
|
|
||||||
The start of the range of user
|
|
||||||
IDs to which the user IDs specified by field one map.
|
|
||||||
How field two is interpreted depends on whether the process that opened
|
|
||||||
.I uid_map
|
|
||||||
and the process
|
|
||||||
.IR pid
|
|
||||||
are in the same user namespace, as follows:
|
|
||||||
.RS
|
|
||||||
.IP a) 3
|
|
||||||
If the two processes are in different user namespaces:
|
|
||||||
field two is the start of a range of
|
|
||||||
user IDs in the user namespace of the process that opened
|
|
||||||
.IR uid_map .
|
|
||||||
.IP b)
|
|
||||||
If the two processes are in the same user namespace:
|
|
||||||
field two is the start of the range of
|
|
||||||
user IDs in the parent user namespace of the process
|
|
||||||
.IR pid .
|
|
||||||
This case enables the opener of
|
|
||||||
.I uid_map
|
|
||||||
(the common case here is opening
|
|
||||||
.IR /proc/self/uid_map )
|
|
||||||
to see the mapping of user IDs into the user namespace of the process
|
|
||||||
that created this user namespace.
|
|
||||||
.RE
|
|
||||||
.IP (3)
|
|
||||||
The length of the range of user IDs that is mapped between the two
|
|
||||||
user namespaces.
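The three-field format described above can be made concrete with a short parser. This is a minimal sketch, assuming the text has already been read from a `uid_map` file; the helper names `parse_uid_map`, `map_id_out`, and `map_id_in` are invented for this illustration and are not part of any library.

```python
from typing import List, Optional, Tuple

# One uid_map line: (field 1, field 2, field 3), that is,
# (start ID inside the namespace, start ID it maps to, range length).
Extent = Tuple[int, int, int]


def parse_uid_map(text: str) -> List[Extent]:
    """Parse uid_map-style text: three whitespace-delimited numbers per line."""
    extents = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        inside, outside, length = (int(f) for f in line.split())
        extents.append((inside, outside, length))
    return extents


def map_id_out(extents: List[Extent], inside_id: int) -> Optional[int]:
    """Translate an ID from the field-1 side to the field-2 side."""
    for inside, outside, length in extents:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None  # no range covers this ID


def map_id_in(extents: List[Extent], outside_id: int) -> Optional[int]:
    """Translate an ID from the field-2 side to the field-1 side."""
    for inside, outside, length in extents:
        if outside <= outside_id < outside + length:
            return inside + (outside_id - outside)
    return None  # no range covers this ID
```

For the map `0 1000 10` (the example used in a comment elsewhere in this page source), `map_id_out` sends 0 to 1000 and 9 to 1009, and returns `None` for 10 because the range has length 10.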
|
|
||||||
.PP
|
|
||||||
After the creation of a new user namespace, the
|
|
||||||
.I uid_map
|
|
||||||
file of
|
|
||||||
.I one
|
|
||||||
of the processes in the namespace may be written to
|
|
||||||
.I once
|
|
||||||
to define the mapping of user IDs in the new user namespace.
|
|
||||||
(An attempt to write more than once to a
|
|
||||||
.I uid_map
|
|
||||||
file in a user namespace fails with the error
|
|
||||||
.BR EPERM .)
|
|
||||||
|
|
||||||
The lines written to
|
|
||||||
.IR uid_map
|
|
||||||
must conform to the following rules:
|
|
||||||
.IP * 3
|
|
||||||
The three fields must be valid numbers,
|
|
||||||
and the last field must be greater than 0.
|
|
||||||
.IP *
|
|
||||||
Lines are terminated by newline characters.
|
|
||||||
.IP *
|
|
||||||
There is an (arbitrary) limit on the number of lines in the file.
|
|
||||||
As at Linux 3.8, the limit is five lines.
|
|
||||||
In addition, the number of bytes written to
|
|
||||||
the file must be less than the system page size,
|
|
||||||
.\" FIXME(Eric): the restriction "less than" rather than "less than or equal"
|
|
||||||
.\" seems strangely arbitrary. Furthermore, the comment does not agree
|
|
||||||
.\" with the code in kernel/user_namespace.c. Which is correct.
|
|
||||||
and the write must be performed at the start of the file (i.e.,
|
|
||||||
.BR lseek (2)
|
|
||||||
and
|
|
||||||
.BR pwrite (2)
|
|
||||||
can't be used to write to nonzero offsets in the file).
|
|
||||||
.IP *
|
|
||||||
The range of user IDs specified in each line cannot overlap with the ranges
|
|
||||||
in any other lines.
|
|
||||||
In the current implementation (Linux 3.8), this requirement is
|
|
||||||
satisfied by a simplistic implementation that imposes the further
|
|
||||||
requirement that
|
|
||||||
the values in both field 1 and field 2 of successive lines must be
|
|
||||||
in ascending numerical order.
|
|
||||||
.IP *
|
|
||||||
At least one line must be written to the file.
|
|
||||||
.PP
|
|
||||||
Writes that violate the above rules fail with the error
|
|
||||||
.BR EINVAL .
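The format rules listed above can be expressed as a pre-flight check on the text about to be written. This is a sketch of those rules as stated for Linux 3.8, not a reproduction of the kernel's parser: the function name `check_uid_map_text` is hypothetical, the page size of 4096 is an assumed typical value, and the kernel may accept or reject number formats differently.

```python
from typing import Optional

MAX_LINES = 5     # limit stated in the text for Linux 3.8
PAGE_SIZE = 4096  # assumption: a typical page size, not queried from the system


def check_uid_map_text(text: str, page_size: int = PAGE_SIZE) -> Optional[str]:
    """Return None if text passes the rules above, else a short reason."""
    if len(text.encode()) >= page_size:
        return "write must be smaller than the page size"
    if not text:
        return "at least one line is required"
    if not text.endswith("\n"):
        return "lines must be newline-terminated"
    lines = text[:-1].split("\n")
    if len(lines) > MAX_LINES:
        return "too many lines"
    previous = None
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or not all(f.isascii() and f.isdigit() for f in fields):
            return "each line needs three numeric fields"
        inside, outside, length = (int(f) for f in fields)
        if length <= 0:
            return "range length must be greater than 0"
        if previous is not None:
            p_inside, p_outside, p_length = previous
            # Both fields must start at or beyond the end of the previous
            # range, which gives ascending order and rules out overlap.
            if inside < p_inside + p_length or outside < p_outside + p_length:
                return "ranges must ascend without overlapping"
        previous = (inside, outside, length)
    return None
```

A caller could run this check before writing to the file, so that a malformed map is reported with a specific reason instead of a bare `EINVAL`.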
|
|
||||||
|
|
||||||
In order for a process to write to the
|
|
||||||
.I /proc/[pid]/uid_map
|
|
||||||
.RI ( /proc/[pid]/gid_map )
|
|
||||||
file, all of the following requirements must be met:
|
|
||||||
.IP 1. 3
|
|
||||||
The writing process must have the
|
|
||||||
.BR CAP_SETUID
|
|
||||||
.RB ( CAP_SETGID )
|
|
||||||
capability in the user namespace of the process
|
|
||||||
.IR pid .
|
|
||||||
.\" FIXME(Eric):
|
|
||||||
.\" Something isn't quite right in the description here.
|
|
||||||
.\" Suppose UID 1000 creates a user namespace. At this point, UID 0 in
|
|
||||||
.\" the parent namespace can write a map of (say) '0 1000 10' to uid_map.
|
|
||||||
.\" That succeeds. But how is that case covered in the three rules here?
|
|
||||||
.\" In other words, how does UID 0 in the parent namespace have any
|
|
||||||
.\" capabilities in the new child namespace? Somewhere on the page,
|
|
||||||
.\" I think there needs to be a statement about the privileges of
|
|
||||||
.\" UID 0 when no mapping has yet been defined, right?
|
|
||||||
.\" Or is it simply the case that UID 0 in the parent namespace
|
|
||||||
.\" always has all capabilities in the child namespace?
|
|
||||||
.\"
|
|
||||||
.IP 2.
|
|
||||||
The writing process must be in either the user namespace of the process
|
|
||||||
.I pid
|
|
||||||
or inside the parent user namespace of the process
|
|
||||||
.IR pid .
|
|
||||||
.IP 3.
|
|
||||||
One of the following is true:
|
|
||||||
.RS
|
|
||||||
.IP * 3
|
|
||||||
The data written to
|
|
||||||
.I uid_map
|
|
||||||
.RI ( gid_map )
|
|
||||||
consists of a single line that maps the writing process's file system user ID
|
|
||||||
(group ID) in the parent user namespace to a user ID (group ID)
|
|
||||||
in the user namespace.
|
|
||||||
The usual case here is that this single line provides a mapping for the user ID
|
|
||||||
of the process that created the namespace.
|
|
||||||
.IP * 3
|
|
||||||
The process has the
|
|
||||||
.BR CAP_SETUID
|
|
||||||
.RB ( CAP_SETGID )
|
|
||||||
capability in the parent user namespace.
|
|
||||||
Thus, a privileged process can make mappings to arbitrary user IDs (group IDs)
|
|
||||||
in the parent user namespace.
|
|
||||||
.RE
|
|
||||||
.PP
|
|
||||||
Writes that violate the above rules fail with the error
|
|
||||||
.BR EPERM .
|
|
||||||
.PP
|
|
||||||
In order to create a new user namespace,
|
|
||||||
there must exist a mapping of the caller's effective
|
|
||||||
user and group IDs into the parent namespace.
|
|
||||||
If such a mapping does not exist, then
|
|
||||||
.BR clone (2)
|
|
||||||
and
|
|
||||||
.BR unshare (2)
|
|
||||||
fail with the error
|
|
||||||
.BR EPERM .
|
|
||||||
.PP
|
|
||||||
When a process inside a user namespace executes
|
|
||||||
a set-user-ID (set-group-ID) program,
|
|
||||||
the process's effective user (group) ID inside the namespace is changed
|
|
||||||
to whatever value is mapped for the user (group) ID of the file.
|
|
||||||
However, if either the user
|
|
||||||
.I or
|
|
||||||
the group ID of the file has no mapping inside the namespace,
|
|
||||||
the set-user-ID (set-group-ID) bit is silently ignored:
|
|
||||||
the new program is executed,
|
|
||||||
but the process's effective user (group) ID is left unchanged.
|
|
||||||
(This mirrors the semantics of executing a set-user-ID or set-group-ID
|
|
||||||
program that resides on a file system that was mounted with the
|
|
||||||
.BR MS_NOSUID
|
|
||||||
flag (see
|
|
||||||
.BR mount (2)).)
|
|
||||||
.\"
|
.\"
|
||||||
.\" ==================== UTS namespaces ====================
|
.\" ==================== UTS namespaces ====================
|
||||||
.\"
|
.\"
|
||||||
|
@ -827,4 +519,5 @@ Namespaces are a Linux-specific feature.
|
||||||
.BR proc (5),
|
.BR proc (5),
|
||||||
.BR credentials (7),
|
.BR credentials (7),
|
||||||
.BR capabilities (7),
|
.BR capabilities (7),
|
||||||
|
.BR user_namespaces (7),
|
||||||
.BR switch_root (8)
|
.BR switch_root (8)
|
||||||
|
|