.\" Copyright (c) 2013 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" and Copyright (c) 2012 by Eric W. Biederman <ebiederm@xmission.com>
.\"
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\"
.\"
.TH USER_NAMESPACES 7 2013-01-14 "Linux" "Linux Programmer's Manual"
.SH NAME
user_namespaces \- overview of Linux user namespaces
.SH DESCRIPTION
For an overview of namespaces, see
.BR namespaces (7).
.PP
User namespaces isolate security-related identifiers, in particular,
user IDs and group IDs (see
.BR credentials (7)),
keys (see
.BR keyctl (2)),
and capabilities (see
.BR capabilities (7)).
A process's user and group IDs can be different
inside and outside a user namespace.
In particular,
a process can have a normal unprivileged user ID outside a user namespace
while at the same time having a user ID of 0 inside the namespace;
in other words,
the process has full privileges for operations inside the user namespace,
but is unprivileged for operations outside the namespace.
.PP
User namespaces can be nested;
that is, each user namespace (other than the initial user namespace)
has a parent user namespace,
and can have zero or more child user namespaces.
The parent user namespace is the user namespace
of the process that creates the user namespace via a call to
.BR unshare (2)
or
.BR clone (2)
with the
.BR CLONE_NEWUSER
flag.
.PP
The first process in a user namespace starts out with a complete set
of capabilities with respect to the new user namespace.
On the other hand, that process has no capabilities outside
the new user namespace,
even if the namespace is created by the root user.
(However, that process will be able to access resources such as
files that are owned by user ID 0,
and will be able to do things such as sending signals
to processes belonging to user ID 0.)
.PP
When a user namespace is created,
it starts out without a mapping of user IDs (group IDs)
to the parent user namespace.
The desired mapping of user IDs (group IDs) to the parent user namespace
may be set by writing into
.IR /proc/[pid]/uid_map
.RI ( /proc/[pid]/gid_map );
see below.
.PP
In order to create a new user namespace,
there must exist a mapping of the caller's effective
user and group IDs into the parent namespace.
If such a mapping does not exist, then
.BR clone (2)
and
.BR unshare (2)
fail with the error
.BR EPERM .
.PP
System calls that return user IDs (group IDs)\(emfor example,
.BR getuid (2),
.BR getgid (2),
and the credential fields in the structure returned by
.BR stat (2)\(emwill
return either the user ID (group ID) mapped into the current
user namespace if there is a mapping, or the overflow user ID (group ID);
the default value for the overflow user ID (group ID) is 65534.
See the descriptions of
.IR /proc/sys/kernel/overflowuid
and
.IR /proc/sys/kernel/overflowgid
in
.BR proc (5).
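.PP
The lookup described above can be illustrated with a small model.
This is only a sketch, not kernel code:
the function name, the tuple layout, and the sample range are assumptions
chosen for illustration,
and the overflow value is simply the default of 65534 quoted above.

```python
OVERFLOW_ID = 65534  # default overflow ID quoted above (see proc(5))


def id_as_seen_inside(outer_id, extents, overflow=OVERFLOW_ID):
    """Return the ID that a process inside the namespace would be shown.

    extents is a list of (inside_start, outside_start, length) tuples,
    a hypothetical in-memory form of the ID mapping.
    """
    for inside_start, outside_start, length in extents:
        if outside_start <= outer_id < outside_start + length:
            return inside_start + (outer_id - outside_start)
    return overflow  # no mapping exists: report the overflow ID


# Hypothetical mapping: IDs 0..999 inside correspond to 100000..100999 outside.
extents = [(0, 100000, 1000)]
print(id_as_seen_inside(100000, extents))  # mapped ID
print(id_as_seen_inside(0, extents))       # unmapped, so the overflow ID
```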
.PP
When a process accesses a file, its user and group IDs
are mapped into the initial user namespace for the purpose of permission
checking and assigning IDs when creating a file.
When a process retrieves file user and group IDs via
.BR stat (2),
the IDs are mapped in the opposite direction,
to produce values relative to the process's user and group ID mappings.
.PP
When a process's user and group IDs are passed over a UNIX domain socket
to a process in a different user namespace (see the description of
.B SCM_CREDENTIALS
in
.BR unix (7)),
they are translated into the corresponding values as per the
receiving process's user and group ID mappings.
.PP
Use of user namespaces requires a kernel that is configured with the
.B CONFIG_USER_NS
option.
.\"
.\" ============================================================
.\"
.SS Interaction of user namespaces and other types of namespaces
Starting in Linux 3.8, unprivileged processes can create user namespaces,
and mount, PID, IPC, network, and UTS namespaces can be created with just the
.B CAP_SYS_ADMIN
capability in the caller's user namespace.
.PP
If
.BR CLONE_NEWUSER
is specified along with other
.B CLONE_NEW*
flags in a single
.BR clone (2)
or
.BR unshare (2)
call, the user namespace is guaranteed to be created first,
giving the caller privileges over the remaining
namespaces created by the call.
Thus, it is possible for an unprivileged caller to specify this combination
of flags.
.PP
When a new IPC, mount, network, PID, or UTS namespace is created via
.BR clone (2)
or
.BR unshare (2),
the kernel records the user namespace of the creating process against
the new namespace.
When a process in the new namespace subsequently performs
privileged operations that operate on global
resources isolated by the namespace,
the permission checks are performed according to the process's capabilities
in the user namespace that the kernel associated with the new namespace.
.\"
.\" ============================================================
.\"
.SS Capabilities
A process may have a capability either
because that capability is present in its effective capability set,
or because it inherits the capability from a parent user namespace
according to the following rules:
.\" In the 3.8 sources, see security/commoncap.c::cap_capable():
.IP 1. 3
If a process has a capability in a user namespace,
then it has that capability in all child (and further removed descendant)
namespaces as well.
.IP 2.
.\" * The owner of the user namespace in the parent of the
.\" * user namespace has all caps.
When a user namespace is created, the kernel records the effective
user ID of the creating process as being the "owner" of the namespace
(and likewise associates the effective group ID of the creating process
with the namespace).
.IP
A process whose effective user ID matches that of the
owner of a user namespace and which is a member of the parent namespace
has all capabilities in the user namespace.
By virtue of the first rule,
this means that the process has all capabilities in all
further removed descendant user namespaces as well.
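.PP
The two rules can be read as a walk from the target user namespace
up through its ancestors.
The following is a minimal sketch of that reading, not the kernel's code:
the UserNS class and has_capability() are hypothetical names,
and, as a simplification, all user IDs are assumed to be expressed
in one common frame of reference.

```python
class UserNS:
    """Toy model of a user namespace: a parent link and an owner user ID."""

    def __init__(self, parent=None, owner_uid=0):
        self.parent = parent
        self.owner_uid = owner_uid


def has_capability(proc_ns, proc_euid, proc_caps, cap, target_ns):
    """Decide whether a process holds cap with respect to target_ns.

    proc_ns is the user namespace the process is a member of,
    proc_caps its effective capability set (a set of names).
    """
    ns = target_ns
    while ns is not None:
        if ns is proc_ns:
            # Rule 1: a capability held in a namespace also applies
            # to every descendant namespace.
            return cap in proc_caps
        if ns.parent is proc_ns and ns.owner_uid == proc_euid:
            # Rule 2: the owner, as a member of the parent namespace,
            # has all capabilities in the namespace (and its descendants).
            return True
        ns = ns.parent
    return False  # target_ns is not proc_ns or one of its descendants


init_ns = UserNS()
child = UserNS(parent=init_ns, owner_uid=1000)
grandchild = UserNS(parent=child, owner_uid=0)
print(has_capability(init_ns, 1000, set(), "CAP_SYS_ADMIN", grandchild))
```
.PP
In this example the process has an empty effective set in its own namespace,
yet it is treated as capable in the grandchild
because it owns the intermediate namespace.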
.\" As a rough approximation, this means that
.\" the user who creates a user namespace
.\" has all capabilities inside that namespace and its descendants.
.\"
.\" ============================================================
.\"
.SS User and group ID mappings: uid_map and gid_map
The
.IR /proc/[pid]/uid_map
and
.IR /proc/[pid]/gid_map
files (available since Linux 3.5)
.\" commit 22d917d80e842829d0ca0a561967d728eb1d6303
expose the mappings for user and group IDs
inside the user namespace for the process
.IR pid .
These files can be read to view the mappings in a user namespace and
written to (once) to define the mappings.
.PP
The description in the following paragraphs explains the details for
.IR uid_map ;
.IR gid_map
is exactly the same,
but each instance of "user ID" is replaced by "group ID".
.PP
The
.I uid_map
file exposes the mapping of user IDs from the user namespace
of the process
.IR pid
to the user namespace of the process that opened
.IR uid_map
(but see a qualification to this point below).
In other words, processes that are in different user namespaces
will potentially see different values when reading from a particular
.I uid_map
file, depending on the user ID mappings for the user namespaces
of the reading processes.
.PP
Each line in the
.I uid_map
file specifies a 1-to-1 mapping of a range of contiguous
user IDs between two user namespaces.
(When a user namespace is first created, this file is empty.)
The specification in each line takes the form of
three numbers delimited by white space.
The first two numbers specify the starting user ID in
each user namespace.
The third number specifies the length of the mapped range.
In detail, the fields are interpreted as follows:
.IP (1) 4
The start of the range of user IDs in
the user namespace of the process
.IR pid .
.IP (2)
The start of the range of user
IDs to which the user IDs specified by field one map.
How field two is interpreted depends on whether the process that opened
.I uid_map
and the process
.IR pid
are in the same user namespace, as follows:
.RS
.IP a) 3
If the two processes are in different user namespaces:
field two is the start of a range of
user IDs in the user namespace of the process that opened
.IR uid_map .
.IP b)
If the two processes are in the same user namespace:
field two is the start of the range of
user IDs in the parent user namespace of the process
.IR pid .
This case enables the opener of
.I uid_map
(the common case here is opening
.IR /proc/self/uid_map )
to see the mapping of user IDs into the user namespace of the process
that created this user namespace.
.RE
.IP (3)
The length of the range of user IDs that is mapped between the two
user namespaces.
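.PP
As an illustration of the three-field format,
the following sketch parses the text of a
.I uid_map
file and translates an ID from field one's namespace to field two's.
The function names and the sample contents are hypothetical;
real contents depend on how the namespace was set up.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map text into (field1, field2, length) tuples."""
    extents = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the sample text
        field1, field2, length = (int(field) for field in line.split())
        extents.append((field1, field2, length))
    return extents


def map_to_outside(inside_id, extents):
    """Translate an ID in the namespace of pid to the other namespace.

    Returns None when inside_id falls in no mapped range.
    """
    for field1, field2, length in extents:
        if field1 <= inside_id < field1 + length:
            return field2 + (inside_id - field1)
    return None


# Hypothetical file contents: ID 0 maps to 1000, IDs 1..65535 to 100000 onward.
sample = "         0       1000          1\n         1     100000      65535\n"
extents = parse_id_map(sample)
print(extents)
print(map_to_outside(0, extents))
```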
.\"
.\" ============================================================
.\"
.SS Defining user and group ID mappings: writing to uid_map and gid_map
.PP
After the creation of a new user namespace, the
.I uid_map
file of
.I one
of the processes in the namespace may be written to
.I once
to define the mapping of user IDs in the new user namespace.
(An attempt to write more than once to a
.I uid_map
file in a user namespace fails with the error
.BR EPERM .)
.PP
The lines written to
.IR uid_map
must conform to the following rules:
.IP * 3
The three fields must be valid numbers,
and the last field must be greater than 0.
.IP *
Lines are terminated by newline characters.
.IP *
There is an (arbitrary) limit on the number of lines in the file.
As of Linux 3.8, the limit is five lines.
In addition, the number of bytes written to
the file must be less than the system page size,
.\" FIXME(Eric): the restriction "less than" rather than "less than or equal"
.\" seems strangely arbitrary. Furthermore, the comment does not agree
.\" with the code in kernel/user_namespace.c. Which is correct?
and the write must be performed at the start of the file (i.e.,
.BR lseek (2)
and
.BR pwrite (2)
can't be used to write to nonzero offsets in the file).
.IP *
The range of user IDs specified in each line cannot overlap with the ranges
in any other lines.
In the current implementation (Linux 3.8), this requirement is
satisfied by a simplistic implementation that imposes the further
requirement that
the values in both field 1 and field 2 of successive lines must be
in ascending numerical order.
.IP *
At least one line must be written to the file.
.PP
Writes that violate the above rules fail with the error
.BR EINVAL .
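.PP
A program can check most of these rules before attempting the write.
The sketch below is one possible pre-check, not the kernel's validation:
check_id_map() is a hypothetical helper,
the page size of 4096 is an assumption (the real value is system-dependent),
and the overlap test follows the ascending-order reading given above.

```python
MAX_LINES = 5      # line limit quoted above for Linux 3.8
PAGE_SIZE = 4096   # assumption: the real page size is system-dependent


def check_id_map(data, max_lines=MAX_LINES, page_size=PAGE_SIZE):
    """Return a list of rule violations for text destined for uid_map."""
    problems = []
    if len(data.encode()) >= page_size:
        problems.append("data is not smaller than the page size")
    lines = data.splitlines()
    if not lines:
        problems.append("at least one line is required")
    if len(lines) > max_lines:
        problems.append(f"more than {max_lines} lines")
    extents = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) != 3 or not all(f.isascii() and f.isdigit() for f in fields):
            problems.append(f"line {number}: expected three numbers")
            continue
        first, second, length = (int(f) for f in fields)
        if length == 0:
            problems.append(f"line {number}: length must be greater than 0")
            continue
        extents.append((first, second, length))
    # Successive lines must start beyond the end of the previous range
    # in both field 1 and field 2.
    for (f1, s1, l1), (f2, s2, _) in zip(extents, extents[1:]):
        if f2 < f1 + l1 or s2 < s1 + l1:
            problems.append("ranges overlap or are not in ascending order")
    return problems


print(check_id_map("0 1000 1\n1 100000 65535\n"))
print(check_id_map("0 1000 10\n5 2000 10\n"))
```
.PP
An empty list means that none of the modeled rules is violated;
the kernel remains the final authority on whether a write is accepted.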
.PP
In order for a process to write to the
.I /proc/[pid]/uid_map
.RI ( /proc/[pid]/gid_map )
file, all of the following requirements must be met:
.IP 1. 3
The writing process must have the
.BR CAP_SETUID
.RB ( CAP_SETGID )
capability in the user namespace of the process
.IR pid .
.IP 2.
The writing process must be in either the user namespace of the process
.I pid
or inside the parent user namespace of the process
.IR pid .
.IP 3.
One of the following is true:
.RS
.IP * 3
The data written to
.I uid_map
.RI ( gid_map )
consists of a single line that maps the writing process's file system user ID
(group ID) in the parent user namespace to a user ID (group ID)
in the user namespace.
The usual case here is that this single line provides a mapping for the user ID
of the process that created the namespace.
.IP * 3
The process has the
.BR CAP_SETUID
.RB ( CAP_SETGID )
capability in the parent user namespace.
Thus, a privileged process can make mappings to arbitrary user IDs (group IDs)
in the parent user namespace.
.RE
.PP
Writes that violate the above rules fail with the error
.BR EPERM .
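.PP
The three requirements combine as a conjunction whose last term is a choice.
The sketch below models that decision for
.IR uid_map ;
it is an illustration only.
The function and parameter names are hypothetical,
and treating "a single line that maps the writing process's ID"
as a range of length 1 is an assumption of this sketch.

```python
def may_write_uid_map(*, cap_setuid_in_target_ns, in_target_ns, in_parent_ns,
                      cap_setuid_in_parent_ns, writer_fsuid, extents):
    """Model the permission decision for a write to /proc/[pid]/uid_map.

    extents is a list of (inside_start, parent_start, length) tuples
    describing the lines about to be written.
    """
    if not cap_setuid_in_target_ns:          # requirement 1
        return False
    if not (in_target_ns or in_parent_ns):   # requirement 2
        return False
    maps_only_own_id = (
        len(extents) == 1
        and extents[0][1] == writer_fsuid
        and extents[0][2] == 1   # assumption: exactly one ID is mapped
    )
    # Requirement 3: either alternative is sufficient.
    return maps_only_own_id or cap_setuid_in_parent_ns


# An unprivileged creator mapping its own ID (hypothetically 1000) to root.
print(may_write_uid_map(cap_setuid_in_target_ns=True, in_target_ns=True,
                        in_parent_ns=False, cap_setuid_in_parent_ns=False,
                        writer_fsuid=1000, extents=[(0, 1000, 1)]))
```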
.\"
.\" ============================================================
.\"
.SS Set-user-ID and set-group-ID programs
.PP
When a process inside a user namespace executes
a set-user-ID (set-group-ID) program,
the process's effective user (group) ID inside the namespace is changed
to whatever value is mapped for the user (group) ID of the file.
However, if either the user
.I or
the group ID of the file has no mapping inside the namespace,
the set-user-ID (set-group-ID) bit is silently ignored:
the new program is executed,
but the process's effective user (group) ID is left unchanged.
(This mirrors the semantics of executing a set-user-ID or set-group-ID
program that resides on a file system that was mounted with the
.BR MS_NOSUID
flag (see
.BR mount (2)).)
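.PP
The effect on the effective IDs can be modeled as follows.
This is a sketch of the behavior described in this section,
with hypothetical names;
the mappings are given as plain dictionaries
from the file's owner IDs to the IDs seen inside the namespace.

```python
def ids_after_exec(cur_euid, cur_egid, file_uid, file_gid,
                   setuid_bit, setgid_bit, uid_map, gid_map):
    """Return the (effective UID, effective GID) seen inside the namespace.

    uid_map and gid_map are dicts from the file's owner IDs to the
    corresponding IDs inside the namespace (a simplified stand-in for
    the range-based mappings).
    """
    if file_uid not in uid_map or file_gid not in gid_map:
        # Either ID of the file is unmapped: both bits are ignored.
        return cur_euid, cur_egid
    euid = uid_map[file_uid] if setuid_bit else cur_euid
    egid = gid_map[file_gid] if setgid_bit else cur_egid
    return euid, egid


# Hypothetical mappings: outside ID 100000 appears as 0 inside the namespace.
uid_map = {100000: 0, 101000: 1000}
gid_map = {100000: 0, 101000: 1000}
print(ids_after_exec(1000, 1000, 100000, 100000, True, False, uid_map, gid_map))
print(ids_after_exec(1000, 1000, 0, 0, True, True, uid_map, gid_map))
```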
.SH CONFORMING TO
Namespaces are a Linux-specific feature.
.SH NOTES
Over the years, many features have been added
to the Linux kernel that are available only to privileged users
because of their potential to confuse set-user-ID-root applications.
In general, it becomes safe to allow the root user in a user namespace to
use those features because it is impossible, while in a user namespace,
to gain more privilege than the root user of a user namespace has.
.SH SEE ALSO
.BR unshare (1),
.BR clone (2),
.BR setns (2),
.BR unshare (2),
.BR proc (5),
.BR credentials (7),
.BR capabilities (7),
.BR namespaces (7)