mirror of https://github.com/mkerrisk/man-pages
clone.2, namespaces.7: Move some CLONE_NEWUSER text from clone.2 to namespaces.7
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
parent 3dd2331ce7
commit 9d005472a8
179
man2/clone.2
@@ -379,90 +379,6 @@ in the same
 .BR clone ()
 call.
 
-.TP
-.BR CLONE_NEWUSER
-(This flag first became meaningful for
-.BR clone ()
-in Linux 2.6.23,
-the current
-.BR clone()
-semantics were merged in Linux 3.5,
-and the final pieces to make the user namespaces completely usable were
-merged in Linux 3.8.)
-
-If
-.B CLONE_NEWUSER
-is set, then create the process in a new user namespace.
-If this flag is not set, then (as with
-.BR fork (2))
-the process is created in the same user namespace as the calling process.
-
-A user namespace provides an isolated environment for
-security related identifiers, in particular,
-user IDs, group IDs, keys (see
-.BR keyctl (2)),
-and capabilities.
-
-When a user namespace is created,
-it starts out without a mapping of user IDs (group IDs)
-to the parent user namespace.
-The desired mapping of user IDs (group IDs) to the parent user namespace
-may be set by writing into
-.IR /proc/[pid]/uid_map
-.RI ( /proc/[pid]/gid_map );
-see
-.BR proc (5).
-
-The first process in a user namespace starts out with a complete set
-of capabilities with respect to the new user namespace.
-
-System calls that return user IDs (group IDs) will return
-either the user ID (group ID) mapped into the current
-user namespace if there is a mapping, or the overflow user ID (group ID);
-the default value for the overflow user ID (group ID) is 65534.
-See the descriptions of
-.IR /proc/sys/kernel/overflowuid
-and
-.IR /proc/sys/kernel/overflowgid
-in
-.BR proc (5).
-
-Use of this flag requires a kernel configured with the
-.BR CONFIG_USER_NS
-option.
-Before Linux 3.8, use of
-.BR CLONE_NEWUSER
-required that the caller have three capabilities:
-.BR CAP_SYS_ADMIN ,
-.BR CAP_SETUID ,
-and
-.BR CAP_SETGID .
-.\" Before Linux 2.6.29, it appears that only CAP_SYS_ADMIN was needed
-Starting with Linux 3.8,
-no privileges are needed to create a user namespace,
-and mount, PID, IPC, network, and UTS namespaces can be created with just the
-.B CAP_SYS_ADMIN
-capability in the caller's user namespace.
-
-If
-.BR CLONE_NEWUSER
-is specified along with other
-.B CLONE_NEW*
-flags in a single
-.BR clone()
-call, the user namespace is guaranteed to be created first,
-giving the caller privileges over the remaining
-namespaces created by the call.
-Thus, it possible for an unprivileged caller to specify this combination
-of flags.
-
-Over the years, there have been a lot of features that have been added
-to the Linux kernel that are only available to privileged users
-because of their potential to confuse set-user-ID-root applications.
-In general, it becomes safe to allow the root user in a user namespace to
-use those features because it is impossible, while in a user namespace,
-to gain more privilege than the root user of a user namespace has.
-
 .TP
 .BR CLONE_NEWPID " (since Linux 2.6.24)"
 .\" This explanation draws a lot of details from
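The CLONE_NEWUSER text moved out of clone.2 says that, until a mapping is written, system calls that return user IDs in a new user namespace report the overflow user ID (65534 by default). A minimal C sketch of that behavior, assuming Linux 3.8 or later with CONFIG_USER_NS and a host or container policy that permits unprivileged user namespaces. It uses unshare(2) in a forked child instead of clone() so that no child stack is needed, and the helper name uid_after_unshare_newuser is hypothetical:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: in a forked child, move into a new user namespace
 * with unshare(CLONE_NEWUSER) and report what getuid() returns there.
 * No uid_map has been written, so the kernel reports the overflow UID.
 * Returns the UID seen by the child, or -errno if pipe(), fork() or
 * unshare() failed. */
static long uid_after_unshare_newuser(void)
{
    int pipefd[2];
    if (pipe(pipefd) == -1)
        return -errno;

    pid_t pid = fork();
    if (pid == -1) {
        int saved = errno;
        close(pipefd[0]);
        close(pipefd[1]);
        return -saved;
    }

    if (pid == 0) {                     /* child: single-threaded after fork() */
        long result;
        close(pipefd[0]);
        if (unshare(CLONE_NEWUSER) == -1)
            result = -errno;            /* e.g. EPERM in a restricted sandbox */
        else
            result = (long) getuid();   /* unmapped, so the overflow UID */
        _exit(write(pipefd[1], &result, sizeof result) == (ssize_t) sizeof result ? 0 : 1);
    }

    /* parent: collect the child's answer, then reap it */
    long result = -EIO;
    close(pipefd[1]);
    if (read(pipefd[0], &result, sizeof result) != (ssize_t) sizeof result)
        result = -EIO;
    close(pipefd[0]);
    waitpid(pid, NULL, 0);
    return result;
}
```

Where the environment forbids new user namespaces, the helper returns a negative errno value (typically -EPERM) instead of a UID.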
@@ -481,68 +397,47 @@ the process is created in the same PID namespace as
 the calling process.
 This flag is intended for the implementation of containers.
 
-A PID namespace provides an isolated environment for PIDs:
-PIDs in a new namespace start at 1,
-somewhat like a standalone system, and calls to
-.BR fork (2),
-.BR vfork (2),
-or
-.BR clone ()
-will produce processes with PIDs that are unique within the namespace.
+For further information on PID namespaces, see
+.BR namespaces (7).
 
-The first process created in a new namespace
-(i.e., the process created using the
-.BR CLONE_NEWPID
-flag) has the PID 1, and is the "init" process for the namespace.
-Children that are orphaned within the namespace will be reparented
-to this process rather than
-.BR init (8).
-Unlike the traditional
-.B init
-process, the "init" process of a PID namespace can terminate,
-and if it does, all of the processes in the namespace are terminated.
-
-PID namespaces form a hierarchy.
-When a new PID namespace is created,
-the processes in that namespace are visible
-in the PID namespace of the process that created the new namespace;
-analogously, if the parent PID namespace is itself
-the child of another PID namespace,
-then processes in the child and parent PID namespaces will both be
-visible in the grandparent PID namespace.
-Conversely, the processes in the "child" PID namespace do not see
-the processes in the parent namespace.
-The existence of a namespace hierarchy means that each process
-may now have multiple PIDs:
-one for each namespace in which it is visible;
-each of these PIDs is unique within the corresponding namespace.
-(A call to
-.BR getpid (2)
-always returns the PID associated with the namespace in which
-the process lives.)
-
-After creating the new namespace,
-it is useful for the child to change its root directory
-and mount a new procfs instance at
-.I /proc
-so that tools such as
-.BR ps (1)
-work correctly.
-.\" mount -t proc proc /proc
-(If
-.BR CLONE_NEWNS
-is also included in
-.IR flags ,
-then it isn't necessary to change the root directory:
-a new procfs instance can be mounted directly over
-.IR /proc .)
-
-Use of this flag requires: a kernel configured with the
-.B CONFIG_PID_NS
-option and that the process be privileged
+Use of this flag requires
+that the process be privileged
 .RB ( CAP_SYS_ADMIN ).
 This flag can't be specified in conjunction with
 .BR CLONE_THREAD .
+
+.TP
+.BR CLONE_NEWUSER
+(This flag first became meaningful for
+.BR clone ()
+in Linux 2.6.23,
+the current
+.BR clone()
+semantics were merged in Linux 3.5,
+and the final pieces to make the user namespaces completely usable were
+merged in Linux 3.8.)
+
+If
+.B CLONE_NEWUSER
+is set, then create the process in a new user namespace.
+If this flag is not set, then (as with
+.BR fork (2))
+the process is created in the same user namespace as the calling process.
+
+For further information on user namespaces, see
+.BR namespaces (7).
+
+Before Linux 3.8, use of
+.BR CLONE_NEWUSER
+required that the caller have three capabilities:
+.BR CAP_SYS_ADMIN ,
+.BR CAP_SETUID ,
+and
+.BR CAP_SETGID .
+.\" Before Linux 2.6.29, it appears that only CAP_SYS_ADMIN was needed
+Starting with Linux 3.8,
+no privileges are needed to create a user namespace.
+
 .TP
 .BR CLONE_NEWUTS " (since Linux 2.6.19)"
 If
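clone.2 keeps the rule that CLONE_NEWPID needs CAP_SYS_ADMIN, while the relocated user-namespace text notes that combining CLONE_NEWUSER with other CLONE_NEW* flags in a single call lets an unprivileged caller proceed, because the user namespace is created first. A sketch that combines the two and checks that the child is PID 1 in its new PID namespace. It assumes Linux 3.8 or later, glibc 2.25 or later (earlier versions cached the PID returned by getpid()), and a downward-growing stack; child_is_pid1_in_new_ns is a hypothetical name:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (1024 * 1024)

/* Runs in the child: exit status 0 means "I am PID 1 in my namespace". */
static int child_main(void *arg)
{
    (void) arg;
    return getpid() == 1 ? 0 : 1;
}

/* Hypothetical helper: clone a child into new user and PID namespaces.
 * CLONE_NEWUSER is included so that an unprivileged caller may also
 * request CLONE_NEWPID.  Returns 1 if the child saw itself as PID 1,
 * 0 if it did not, or -errno if the clone or wait failed. */
static int child_is_pid1_in_new_ns(void)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -ENOMEM;

    /* Pass the top of the region: the stack grows downward. */
    pid_t pid = clone(child_main, stack + CHILD_STACK_SIZE,
                      CLONE_NEWUSER | CLONE_NEWPID | SIGCHLD, NULL);
    if (pid == -1) {
        int saved = errno;
        free(stack);
        return -saved;
    }

    int status;
    if (waitpid(pid, &status, 0) == -1) {
        int saved = errno;
        free(stack);
        return -saved;
    }
    free(stack);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Because the child does not share the parent's memory (no CLONE_VM), the parent can free the stack once the child has been reaped.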
@@ -292,27 +292,88 @@ PID namespaces isolate the process ID number space,
 meaning that processes in different PID namespaces can have the same PID.
 PID namespaces allow containers to migrate to a new hosts
 while the processes inside the container maintain the same PIDs.
-Each PID namespace has its own init (PID 1, see
-.BR init (1)),
-the "ancestor of all processes" that
-manages various system initialization tasks and
-reaps orphaned child processes when they terminate.
 
-From the point of view of a particular PID namespace instance,
-a process has two PIDs: the PID inside the namespace,
-and the PID outside the namespace on the host system.
-PID namespaces can be nested:
-a process will have one PID for each of the layers of the hierarchy
+PIDs in a new PID namespace start at 1,
+somewhat like a standalone system, and calls to
+.BR fork (2),
+.BR vfork (2),
+or
+.BR clone (2)
+will produce processes with PIDs that are unique within the namespace.
+
+The first process created in a new namespace
+(i.e., the process created using
+.BR clone (2)
+with the
+.BR CLONE_NEWPID
+flag, or the first child created by a process after a call to
+.BR unshare (2)
+using the
+.BR CLONE_NEWPID
+flag) has the PID 1, and is the "init" process for the namespace (see
+.BR init (1)).
+Children that are orphaned within the namespace will be reparented
+to this process rather than
+.BR init (8).
+Unlike the traditional
+.B init
+process, the "init" process of a PID namespace can terminate,
+and if it does, all of the processes in the namespace are terminated.
+
+PID namespaces can be nested.
+When a new PID namespace is created,
+the processes in that namespace are visible
+in the PID namespace of the process that created the new namespace;
+analogously, if the parent PID namespace is itself
+the child of another PID namespace,
+then processes in the child and parent PID namespaces will both be
+visible in the grandparent PID namespace.
+Conversely, the processes in the "child" PID namespace do not see
+the processes in the parent namespace.
+More succinctly: a process can see (e.g., send signals with
+.BR kill(2))
+only to processes contained in its own PID namespace
+and the namespaces nested below that PID namespace.
+
+A process will have one PID for each of the layers of the hierarchy
 starting from the PID namespace in which it resides
 through to the root PID namespace.
-A process can see (e.g., send signals with
-.BR kill(2))
-only processes contained in its own PID namespace
-and the namespaces nested below that PID namespace.
+A call to
+.BR getpid (2)
+always returns the PID associated with the namespace in which
+the process resides.
 
+After creating a new PID namespace,
+it is useful for the child to change its root directory
+and mount a new procfs instance at
+.I /proc
+so that tools such as
+.BR ps (1)
+work correctly.
+.\" mount -t proc proc /proc
+(If
+.BR CLONE_NEWNS
+is also included in the
+.IR flags
+argument of
+.BR clone (2)
+or
+.BR unshare (2)),
+then it isn't necessary to change the root directory:
+a new procfs instance can be mounted directly over
+.IR /proc .)
+
+Use of PID namespaces requires a kernel that is configured with the
+.B CONFIG_PID_NS
+option.
+
 .SS User namespaces (CLONE_NEWUSER)
 
-User namespaces isolate the user and group ID number spaces.
+User namespaces isolate
+security related identifiers, in particular,
+user IDs, group IDs, keys (see
+.BR keyctl (2)),
+and capabilities.
 In other words, a process's user and group IDs can be different
 inside and outside a user namespace.
 A process can have a normal unprivileged user ID outside a user namespace
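The new namespaces.7 text says a process has one PID for each layer of the PID-namespace hierarchy and that getpid(2) returns the PID in the namespace where the process resides. On Linux 4.1 and later that list is exposed in the NSpid: field of /proc/[pid]/status, whose rightmost entry belongs to the innermost (the process's own) namespace. A sketch that reads it, with read_nspids as a hypothetical helper name; it returns -1 on older kernels or where /proc is unavailable:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse the "NSpid:" line of /proc/self/status.
 * The entries run from the PID namespace of the procfs mount down to the
 * process's own namespace.  Stores up to max PIDs in pids[] and returns
 * how many were found, or -1 if the file or the line is missing. */
static int read_nspids(long *pids, int max)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;

    char line[512];
    int n = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "NSpid:", 6) != 0)
            continue;
        n = 0;
        char *p = line + 6;
        char *end;
        while (n < max) {
            long v = strtol(p, &end, 10);   /* skips leading tabs/spaces */
            if (end == p)
                break;                      /* no more numbers on the line */
            pids[n++] = v;
            p = end;
        }
        break;
    }
    fclose(f);
    return n;
}
```

In a process nested two PID namespaces deep, the line would carry three numbers, and only the last one matches what getpid() returns inside that process.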
@@ -321,7 +382,58 @@ in other words,
 the process has full privileges for operations inside the user namespace,
 but is unprivileged for operations outside the namespace.
 
-Starting in Linux 3.8, unprivileged processes can create user namespaces.
+When a user namespace is created,
+it starts out without a mapping of user IDs (group IDs)
+to the parent user namespace.
+The desired mapping of user IDs (group IDs) to the parent user namespace
+may be set by writing into
+.IR /proc/[pid]/uid_map
+.RI ( /proc/[pid]/gid_map );
+see below.
+
+The first process in a user namespace starts out with a complete set
+of capabilities with respect to the new user namespace.
+
+System calls that return user IDs (group IDs) will return
+either the user ID (group ID) mapped into the current
+user namespace if there is a mapping, or the overflow user ID (group ID);
+the default value for the overflow user ID (group ID) is 65534.
+See the descriptions of
+.IR /proc/sys/kernel/overflowuid
+and
+.IR /proc/sys/kernel/overflowgid
+in
+.BR proc (5).
+
+Starting in Linux 3.8, unprivileged processes can create user namespaces,
+and mount, PID, IPC, network, and UTS namespaces can be created with just the
+.B CAP_SYS_ADMIN
+capability in the caller's user namespace.
+
+If
+.BR CLONE_NEWUSER
+is specified along with other
+.B CLONE_NEW*
+flags in a single
+.BR clone (2)
+or
+.BR unshare (2)
+call, the user namespace is guaranteed to be created first,
+giving the caller privileges over the remaining
+namespaces created by the call.
+Thus, it possible for an unprivileged caller to specify this combination
+of flags.
+
+Use of user namespaces requires a kernel that is configured with the
+.B CONFIG_USER_NS
+option.
+
+Over the years, there have been a lot of features that have been added
+to the Linux kernel that are only available to privileged users
+because of their potential to confuse set-user-ID-root applications.
+In general, it becomes safe to allow the root user in a user namespace to
+use those features because it is impossible, while in a user namespace,
+to gain more privilege than the root user of a user namespace has.
 
 The
 .IR /proc/[pid]/uid_map
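The added namespaces.7 text says a new user namespace starts with no UID (GID) mapping and that one is set by writing to /proc/[pid]/uid_map (/proc/[pid]/gid_map). A sketch of that parent-side step, assuming the three-field line format "ID-inside-ns ID-outside-ns length" described in the user_namespaces(7) manual page; both helper names are hypothetical, and the write succeeds only when the kernel's permission rules for these files are met:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: format one mapping line: first ID inside the
 * child user namespace, corresponding first ID in the parent namespace,
 * and the length of the range.  Returns the string length, or -1 if the
 * buffer is too small. */
int format_id_map_line(char *buf, size_t size, unsigned long inside,
                       unsigned long outside, unsigned long count)
{
    int len = snprintf(buf, size, "%lu %lu %lu\n", inside, outside, count);
    if (len < 0 || (size_t) len >= size)
        return -1;
    return len;
}

/* Hypothetical helper: write a map into /proc/<pid>/<file>, where file is
 * "uid_map" or "gid_map".  The whole map goes out in a single write(2),
 * since the kernel accepts only one write to each of these files.
 * Returns 0 on success or -errno on failure. */
int write_id_map(pid_t pid, const char *file, const char *map)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/%s", (long) pid, file);

    int fd = open(path, O_WRONLY);
    if (fd == -1)
        return -errno;

    size_t len = strlen(map);
    ssize_t n = write(fd, map, len);
    int saved = errno;                  /* save before close() can change it */
    close(fd);
    if (n == -1)
        return -saved;
    return (size_t) n == len ? 0 : -EIO;
}
```

For example, after creating a child with CLONE_NEWUSER, a parent could format "0 <its own UID> 1" and pass the result to write_id_map(child_pid, "uid_map", buf), so that the child appears as UID 0 inside the namespace.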