mirror of https://github.com/mkerrisk/man-pages
clone.2, namespaces.7: Move some CLONE_NEWUSER text from clone.2 to namespaces.7
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
parent 3dd2331ce7
commit 9d005472a8
179
man2/clone.2
@@ -379,90 +379,6 @@ in the same
.BR clone ()
call.

.TP
.BR CLONE_NEWUSER
(This flag first became meaningful for
.BR clone ()
in Linux 2.6.23,
the current
.BR clone ()
semantics were merged in Linux 3.5,
and the final pieces to make the user namespaces completely usable were
merged in Linux 3.8.)

If
.B CLONE_NEWUSER
is set, then create the process in a new user namespace.
If this flag is not set, then (as with
.BR fork (2))
the process is created in the same user namespace as the calling process.

A user namespace provides an isolated environment for
security-related identifiers, in particular,
user IDs, group IDs, keys (see
.BR keyctl (2)),
and capabilities.

When a user namespace is created,
it starts out without a mapping of user IDs (group IDs)
to the parent user namespace.
The desired mapping of user IDs (group IDs) to the parent user namespace
may be set by writing into
.IR /proc/[pid]/uid_map
.RI ( /proc/[pid]/gid_map );
see
.BR proc (5).

The first process in a user namespace starts out with a complete set
of capabilities with respect to the new user namespace.

System calls that return user IDs (group IDs) will return
either the user ID (group ID) mapped into the current
user namespace if there is a mapping, or the overflow user ID (group ID);
the default value for the overflow user ID (group ID) is 65534.
See the descriptions of
.IR /proc/sys/kernel/overflowuid
and
.IR /proc/sys/kernel/overflowgid
in
.BR proc (5).

Use of this flag requires a kernel configured with the
.BR CONFIG_USER_NS
option.
Before Linux 3.8, use of
.BR CLONE_NEWUSER
required that the caller have three capabilities:
.BR CAP_SYS_ADMIN ,
.BR CAP_SETUID ,
and
.BR CAP_SETGID .
.\" Before Linux 2.6.29, it appears that only CAP_SYS_ADMIN was needed
Starting with Linux 3.8,
no privileges are needed to create a user namespace,
and mount, PID, IPC, network, and UTS namespaces can be created with just the
.B CAP_SYS_ADMIN
capability in the caller's user namespace.

If
.BR CLONE_NEWUSER
is specified along with other
.B CLONE_NEW*
flags in a single
.BR clone ()
call, the user namespace is guaranteed to be created first,
giving the caller privileges over the remaining
namespaces created by the call.
Thus, it is possible for an unprivileged caller to specify this combination
of flags.

Over the years, there have been a lot of features that have been added
to the Linux kernel that are only available to privileged users
because of their potential to confuse set-user-ID-root applications.
In general, it becomes safe to allow the root user in a user namespace to
use those features because it is impossible, while in a user namespace,
to gain more privilege than the root user of a user namespace has.

.TP
.BR CLONE_NEWPID " (since Linux 2.6.24)"
.\" This explanation draws a lot of details from
@@ -481,68 +397,47 @@ the process is created in the same PID namespace as
the calling process.
This flag is intended for the implementation of containers.

A PID namespace provides an isolated environment for PIDs:
PIDs in a new namespace start at 1,
somewhat like a standalone system, and calls to
.BR fork (2),
.BR vfork (2),
or
.BR clone ()
will produce processes with PIDs that are unique within the namespace.
For further information on PID namespaces, see
.BR namespaces (7).

The first process created in a new namespace
(i.e., the process created using the
.BR CLONE_NEWPID
flag) has the PID 1, and is the "init" process for the namespace.
Children that are orphaned within the namespace will be reparented
to this process rather than
.BR init (8).
Unlike the traditional
.B init
process, the "init" process of a PID namespace can terminate,
and if it does, all of the processes in the namespace are terminated.

PID namespaces form a hierarchy.
When a new PID namespace is created,
the processes in that namespace are visible
in the PID namespace of the process that created the new namespace;
analogously, if the parent PID namespace is itself
the child of another PID namespace,
then processes in the child and parent PID namespaces will both be
visible in the grandparent PID namespace.
Conversely, the processes in the "child" PID namespace do not see
the processes in the parent namespace.
The existence of a namespace hierarchy means that each process
may now have multiple PIDs:
one for each namespace in which it is visible;
each of these PIDs is unique within the corresponding namespace.
(A call to
.BR getpid (2)
always returns the PID associated with the namespace in which
the process lives.)

After creating the new namespace,
it is useful for the child to change its root directory
and mount a new procfs instance at
.I /proc
so that tools such as
.BR ps (1)
work correctly.
.\" mount -t proc proc /proc
(If
.BR CLONE_NEWNS
is also included in
.IR flags ,
then it isn't necessary to change the root directory:
a new procfs instance can be mounted directly over
.IR /proc .)

Use of this flag requires: a kernel configured with the
.B CONFIG_PID_NS
option and that the process be privileged
Use of this flag requires
that the process be privileged
.RB ( CAP_SYS_ADMIN ).
This flag can't be specified in conjunction with
.BR CLONE_THREAD .

.TP
.BR CLONE_NEWUSER
(This flag first became meaningful for
.BR clone ()
in Linux 2.6.23,
the current
.BR clone ()
semantics were merged in Linux 3.5,
and the final pieces to make the user namespaces completely usable were
merged in Linux 3.8.)

If
.B CLONE_NEWUSER
is set, then create the process in a new user namespace.
If this flag is not set, then (as with
.BR fork (2))
the process is created in the same user namespace as the calling process.
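Which user namespace a child ends up in can be checked by comparing the target of the /proc/[pid]/ns/user symbolic link in parent and child. The following is a minimal sketch, assuming a Linux system with /proc mounted; the helper names (report_userns, child_in_new_userns) and the 1 MiB stack size are illustrative choices, not part of the clone() interface. Without CLONE_NEWUSER the two links are expected to match; with it they should differ, or clone() fails (for example with EPERM) where unprivileged user namespaces are disabled.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)   /* illustrative child stack size */

/* Child: write the target of /proc/self/ns/user (e.g. "user:[4026531837]")
   to the pipe whose write end is passed in arg. */
static int report_userns(void *arg)
{
    int fd = *(int *) arg;
    char link[64] = {0};
    ssize_t n = readlink("/proc/self/ns/user", link, sizeof(link) - 1);
    if (n <= 0 || write(fd, link, (size_t) n) != n)
        return 1;
    return 0;
}

/* Hypothetical helper: returns 1 if a child created by clone() with
   extra_flags lives in a different user namespace from the caller,
   0 if it shares the caller's, -1 on failure. */
static int child_in_new_userns(int extra_flags)
{
    char self[64] = {0}, child[64] = {0};
    if (readlink("/proc/self/ns/user", self, sizeof(self) - 1) <= 0)
        return -1;

    int pfd[2];
    if (pipe(pfd) == -1)
        return -1;
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    /* The stack grows downward on most architectures: pass its top. */
    pid_t pid = clone(report_userns, stack + STACK_SIZE,
                      extra_flags | SIGCHLD, &pfd[1]);
    close(pfd[1]);
    ssize_t n = -1;
    if (pid != -1) {
        n = read(pfd[0], child, sizeof(child) - 1);
        waitpid(pid, NULL, 0);
    }
    close(pfd[0]);
    free(stack);
    if (n <= 0)
        return -1;
    return strcmp(self, child) != 0;
}
```

Calling child_in_new_userns(0) behaves like fork(2) with respect to namespaces; child_in_new_userns(CLONE_NEWUSER) exercises the flag described above.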

For further information on user namespaces, see
.BR namespaces (7).

Before Linux 3.8, use of
.BR CLONE_NEWUSER
required that the caller have three capabilities:
.BR CAP_SYS_ADMIN ,
.BR CAP_SETUID ,
and
.BR CAP_SETGID .
.\" Before Linux 2.6.29, it appears that only CAP_SYS_ADMIN was needed
Starting with Linux 3.8,
no privileges are needed to create a user namespace.

.TP
.BR CLONE_NEWUTS " (since Linux 2.6.19)"
If
@@ -292,27 +292,88 @@ PID namespaces isolate the process ID number space,
meaning that processes in different PID namespaces can have the same PID.
PID namespaces allow containers to migrate to a new host
while the processes inside the container maintain the same PIDs.
Each PID namespace has its own init (PID 1, see
.BR init (1)),
the "ancestor of all processes" that
manages various system initialization tasks and
reaps orphaned child processes when they terminate.

From the point of view of a particular PID namespace instance,
a process has two PIDs: the PID inside the namespace,
and the PID outside the namespace on the host system.
PID namespaces can be nested:
a process will have one PID for each of the layers of the hierarchy
PIDs in a new PID namespace start at 1,
somewhat like a standalone system, and calls to
.BR fork (2),
.BR vfork (2),
or
.BR clone (2)
will produce processes with PIDs that are unique within the namespace.

The first process created in a new namespace
(i.e., the process created using
.BR clone (2)
with the
.BR CLONE_NEWPID
flag, or the first child created by a process after a call to
.BR unshare (2)
using the
.BR CLONE_NEWPID
flag) has the PID 1, and is the "init" process for the namespace (see
.BR init (1)).
Children that are orphaned within the namespace will be reparented
to this process rather than
.BR init (8).
Unlike the traditional
.B init
process, the "init" process of a PID namespace can terminate,
and if it does, all of the processes in the namespace are terminated.
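The termination rule can be observed from user space. The sketch below assumes a kernel that permits unprivileged user namespaces: CLONE_NEWUSER is added only so that CLONE_NEWPID does not require CAP_SYS_ADMIN, and unshare(2) is used instead of clone() so that the caller's next child becomes PID 1. The function name and the 5-second sleep are illustrative. A process left inside the namespace holds the write end of a pipe; when the namespace's init exits, the reader should see end-of-file at once rather than after the sleep.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a process left in a PID namespace
   was killed when that namespace's init exited, 0 if it survived,
   and -1 if the namespaces could not be created. */
static int init_exit_kills_namespace(void)
{
    int pfd[2];
    if (pipe(pfd) == -1)
        return -1;

    pid_t helper = fork();
    if (helper == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (helper == 0) {
        close(pfd[0]);
        /* unshare(CLONE_NEWPID) does not move the caller itself;
           its next child becomes PID 1 of the new namespace. */
        if (unshare(CLONE_NEWUSER | CLONE_NEWPID) == -1)
            _exit(2);
        pid_t init = fork();
        if (init == -1)
            _exit(2);
        if (init == 0) {                  /* PID 1 in the new namespace */
            pid_t victim = fork();
            if (victim == -1)
                _exit(2);
            if (victim == 0) {            /* ordinary member of the namespace */
                sleep(5);                 /* reached only if not killed */
                if (write(pfd[1], "x", 1) != 1)
                    _exit(1);
                _exit(0);
            }
            _exit(0);                     /* init terminates right away */
        }
        close(pfd[1]);
        int st;
        if (waitpid(init, &st, 0) == -1 ||
            (WIFEXITED(st) && WEXITSTATUS(st) == 2))
            _exit(2);
        _exit(0);
    }

    close(pfd[1]);
    char c;
    /* EOF: every holder of the write end, including the victim, is gone. */
    ssize_t n = read(pfd[0], &c, 1);
    close(pfd[0]);
    int status;
    if (waitpid(helper, &status, 0) == -1 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return n == 0 ? 1 : 0;
}
```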

PID namespaces can be nested.
When a new PID namespace is created,
the processes in that namespace are visible
in the PID namespace of the process that created the new namespace;
analogously, if the parent PID namespace is itself
the child of another PID namespace,
then processes in the child and parent PID namespaces will both be
visible in the grandparent PID namespace.
Conversely, the processes in the "child" PID namespace do not see
the processes in the parent namespace.
More succinctly: a process can see (e.g., send signals with
.BR kill (2))
only processes contained in its own PID namespace
and the namespaces nested below that PID namespace.

A process will have one PID for each of the layers of the hierarchy
starting from the PID namespace in which it resides
through to the root PID namespace.
A process can see (e.g., send signals with
.BR kill (2))
only processes contained in its own PID namespace
and the namespaces nested below that PID namespace.
A call to
.BR getpid (2)
always returns the PID associated with the namespace in which
the process resides.
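The per-namespace PIDs, and the restricted view from inside, can be seen by comparing what a parent and its child report. This is a minimal sketch, again assuming unprivileged user namespaces are permitted and using unshare(2) so that the caller's next child is PID 1 of the new namespace; struct pid_view and probe_pid_views() are hypothetical names. The parent sees the child under an ordinary PID (the value fork(2) returned), the child's getpid(2) should return 1, its getppid(2) should return 0 because its parent lies outside its PID namespace, and kill(2) on the parent's PID should fail with ESRCH.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct pid_view {
    pid_t outer_pid;   /* child's PID in its creator's namespace (fork() result) */
    pid_t inner_pid;   /* child's getpid() inside the new PID namespace */
    pid_t inner_ppid;  /* child's getppid(); the parent lies outside */
    int kill_errno;    /* errno from kill(parent's PID, 0) issued by the child */
};

/* Hypothetical helper: fills *v and returns 0, or returns -1 if the
   namespaces could not be created. */
static int probe_pid_views(struct pid_view *v)
{
    int pfd[2];
    if (pipe(pfd) == -1)
        return -1;

    pid_t helper = fork();
    if (helper == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (helper == 0) {
        close(pfd[0]);
        pid_t self = getpid();
        if (unshare(CLONE_NEWUSER | CLONE_NEWPID) == -1)
            _exit(1);
        pid_t child = fork();
        if (child == -1)
            _exit(1);
        if (child == 0) {
            struct pid_view w = {0};
            w.inner_pid = getpid();
            w.inner_ppid = getppid();
            /* The parent's PID names nothing in this namespace. */
            w.kill_errno = (kill(self, 0) == -1) ? errno : 0;
            _exit(write(pfd[1], &w, sizeof(w)) == (ssize_t) sizeof(w) ? 0 : 1);
        }
        waitpid(child, NULL, 0);
        /* Append the PID under which the parent knows the same child. */
        _exit(write(pfd[1], &child, sizeof(child)) == (ssize_t) sizeof(child) ? 0 : 1);
    }

    close(pfd[1]);
    struct pid_view w;
    pid_t outer;
    ssize_t n1 = read(pfd[0], &w, sizeof(w));
    ssize_t n2 = read(pfd[0], &outer, sizeof(outer));
    close(pfd[0]);
    waitpid(helper, NULL, 0);
    if (n1 != (ssize_t) sizeof(w) || n2 != (ssize_t) sizeof(outer))
        return -1;
    *v = w;
    v->outer_pid = outer;
    return 0;
}
```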

After creating a new PID namespace,
it is useful for the child to change its root directory
and mount a new procfs instance at
.I /proc
so that tools such as
.BR ps (1)
work correctly.
.\" mount -t proc proc /proc
(If
.BR CLONE_NEWNS
is also included in the
.IR flags
argument of
.BR clone (2)
or
.BR unshare (2)),
then it isn't necessary to change the root directory:
a new procfs instance can be mounted directly over
.IR /proc .)
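The effect of the fresh procfs mount can be demonstrated: in a child that is PID 1 of a new PID namespace and also has its own mount namespace, /proc/self on a newly mounted procfs should resolve to "1", whereas the inherited /proc still describes the creator's PID namespace. The sketch assumes unprivileged user namespaces are allowed; where the existing /proc is partly overmounted (as in some containers) the kernel may refuse the new procfs mount, and the function then reports failure. The function name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the PID that /proc/self names after a new
   procfs is mounted over /proc inside new mount and PID namespaces, or
   -1 if any step fails. */
static int pid_via_fresh_proc(void)
{
    int pfd[2];
    if (pipe(pfd) == -1)
        return -1;

    pid_t helper = fork();
    if (helper == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (helper == 0) {
        close(pfd[0]);
        if (unshare(CLONE_NEWUSER | CLONE_NEWNS | CLONE_NEWPID) == -1)
            _exit(1);
        pid_t child = fork();             /* PID 1 of the new PID namespace */
        if (child == -1)
            _exit(1);
        if (child == 0) {
            char buf[32] = {0};
            /* Make mounts private so the new /proc does not propagate
               back into the original mount namespace. */
            if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1 ||
                mount("proc", "/proc", "proc", 0, NULL) == -1 ||
                readlink("/proc/self", buf, sizeof(buf) - 1) <= 0)
                _exit(1);
            _exit(write(pfd[1], buf, strlen(buf)) > 0 ? 0 : 1);
        }
        close(pfd[1]);
        waitpid(child, NULL, 0);
        _exit(0);
    }

    close(pfd[1]);
    char buf[32] = {0};
    ssize_t n = read(pfd[0], buf, sizeof(buf) - 1);
    close(pfd[0]);
    waitpid(helper, NULL, 0);
    return n > 0 ? atoi(buf) : -1;
}
```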

Use of PID namespaces requires a kernel that is configured with the
.B CONFIG_PID_NS
option.

.SS User namespaces (CLONE_NEWUSER)

User namespaces isolate the user and group ID number spaces.
User namespaces isolate
security-related identifiers, in particular,
user IDs, group IDs, keys (see
.BR keyctl (2)),
and capabilities.
In other words, a process's user and group IDs can be different
inside and outside a user namespace.
A process can have a normal unprivileged user ID outside a user namespace
@@ -321,7 +382,58 @@ in other words,
the process has full privileges for operations inside the user namespace,
but is unprivileged for operations outside the namespace.

Starting in Linux 3.8, unprivileged processes can create user namespaces.
When a user namespace is created,
it starts out without a mapping of user IDs (group IDs)
to the parent user namespace.
The desired mapping of user IDs (group IDs) to the parent user namespace
may be set by writing into
.IR /proc/[pid]/uid_map
.RI ( /proc/[pid]/gid_map );
see below.
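A minimal sketch of the mapping step, assuming Linux 3.8 or later with unprivileged user namespaces enabled: a child enters a new user namespace with unshare(2) and writes a single line to its own /proc/self/uid_map that maps user ID 0 inside the namespace to its effective user ID outside; geteuid(2) should then return 0. Each line has the form "ID-inside-ns ID-outside-ns length". The gid_map file is left alone here because, since Linux 3.19, an unprivileged process must first write "deny" to /proc/[pid]/setgroups before it may write gid_map. The function name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the effective UID seen inside a new user
   namespace after mapping 0 to the caller's effective UID, or -1. */
static long uid_after_root_mapping(void)
{
    int pfd[2];
    if (pipe(pfd) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (pid == 0) {
        close(pfd[0]);
        uid_t outer = geteuid();          /* UID in the parent namespace */
        if (unshare(CLONE_NEWUSER) == -1)
            _exit(1);

        /* One line: <ID inside> <ID outside> <length of range>. */
        char line[64];
        int len = snprintf(line, sizeof(line), "0 %u 1\n", (unsigned) outer);
        int fd = open("/proc/self/uid_map", O_WRONLY);
        if (fd == -1)
            _exit(1);
        if (write(fd, line, (size_t) len) != len) {
            close(fd);
            _exit(1);
        }
        close(fd);

        long uid = (long) geteuid();      /* now mapped: expect 0 */
        _exit(write(pfd[1], &uid, sizeof(uid)) == (ssize_t) sizeof(uid) ? 0 : 1);
    }

    close(pfd[1]);
    long uid = -1;
    if (read(pfd[0], &uid, sizeof(uid)) != (ssize_t) sizeof(uid))
        uid = -1;
    close(pfd[0]);
    waitpid(pid, NULL, 0);
    return uid;
}
```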

The first process in a user namespace starts out with a complete set
of capabilities with respect to the new user namespace.
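One way to observe this is to read the CapEff field of /proc/self/status (the effective capability set, printed as a hexadecimal mask) before and after unshare(CLONE_NEWUSER). The sketch assumes unprivileged user namespaces are enabled; the function names are hypothetical. For an unprivileged caller the mask is expected to go from zero to all capability bits the kernel knows about, although those capabilities apply only to resources governed by the new user namespace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parse the CapEff line of /proc/self/status; returns -1 if not found. */
static long long read_capeff(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;
    char line[256];
    unsigned long long mask;
    long long result = -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        if (sscanf(line, "CapEff: %llx", &mask) == 1) {
            result = (long long) mask;
            break;
        }
    }
    fclose(f);
    return result;
}

/* Hypothetical helper: CapEff of a child right after it creates a new
   user namespace, or -1 if the namespace could not be created. */
static long long capeff_in_new_userns(void)
{
    int pfd[2];
    if (pipe(pfd) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (pid == 0) {
        close(pfd[0]);
        long long caps = -1;
        if (unshare(CLONE_NEWUSER) == 0)
            caps = read_capeff();
        _exit(write(pfd[1], &caps, sizeof(caps)) == (ssize_t) sizeof(caps) ? 0 : 1);
    }
    close(pfd[1]);
    long long caps = -1;
    if (read(pfd[0], &caps, sizeof(caps)) != (ssize_t) sizeof(caps))
        caps = -1;
    close(pfd[0]);
    waitpid(pid, NULL, 0);
    return caps;
}
```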

System calls that return user IDs (group IDs) will return
either the user ID (group ID) mapped into the current
user namespace if there is a mapping, or the overflow user ID (group ID);
the default value for the overflow user ID (group ID) is 65534.
See the descriptions of
.IR /proc/sys/kernel/overflowuid
and
.IR /proc/sys/kernel/overflowgid
in
.BR proc (5).
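This can be checked by creating a user namespace and calling getuid(2) before any mapping has been written: the result should equal the value held in /proc/sys/kernel/overflowuid. A minimal sketch, assuming unprivileged user namespaces are enabled; the helper names read_long and uid_in_unmapped_userns are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read a single integer from a file such as /proc/sys/kernel/overflowuid. */
static long read_long(const char *path)
{
    long v = -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}

/* Hypothetical helper: UID reported by getuid() in a child that has just
   entered a new, unmapped user namespace, or -1 if unshare() failed. */
static long uid_in_unmapped_userns(void)
{
    int pfd[2];
    if (pipe(pfd) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (pid == 0) {
        close(pfd[0]);
        if (unshare(CLONE_NEWUSER) == -1)
            _exit(1);
        long uid = (long) getuid();       /* no mapping yet: overflow UID */
        _exit(write(pfd[1], &uid, sizeof(uid)) == (ssize_t) sizeof(uid) ? 0 : 1);
    }
    close(pfd[1]);
    long uid = -1;
    if (read(pfd[0], &uid, sizeof(uid)) != (ssize_t) sizeof(uid))
        uid = -1;
    close(pfd[0]);
    waitpid(pid, NULL, 0);
    return uid;
}
```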

Starting in Linux 3.8, unprivileged processes can create user namespaces,
and mount, PID, IPC, network, and UTS namespaces can be created with just the
.B CAP_SYS_ADMIN
capability in the caller's user namespace.

If
.BR CLONE_NEWUSER
is specified along with other
.B CLONE_NEW*
flags in a single
.BR clone (2)
or
.BR unshare (2)
call, the user namespace is guaranteed to be created first,
giving the caller privileges over the remaining
namespaces created by the call.
Thus, it is possible for an unprivileged caller to specify this combination
of flags.
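For example, an unprivileged process can pass CLONE_NEWUSER together with CLONE_NEWUTS in one call and then change the hostname, an operation that needs CAP_SYS_ADMIN in the user namespace that owns the UTS namespace. The sketch below calls unshare(2) in a forked child so that the caller's own namespaces are untouched; it assumes unprivileged user namespaces are enabled, and the function name and the hostname "ns-demo" are arbitrary choices for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <sched.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define DEMO_HOSTNAME "ns-demo"   /* arbitrary name for the new UTS namespace */

/* Hypothetical helper: returns 1 if the child saw DEMO_HOSTNAME and the
   caller's hostname is unchanged, 0 if not, -1 if the namespaces could
   not be created. */
static int hostname_isolated(void)
{
    char before[HOST_NAME_MAX + 1] = {0};
    char after[HOST_NAME_MAX + 1] = {0};
    char inside[HOST_NAME_MAX + 1] = {0};
    if (gethostname(before, sizeof(before) - 1) == -1)
        return -1;

    int pfd[2];
    if (pipe(pfd) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (pid == 0) {
        close(pfd[0]);
        /* One call: the user namespace is created first, so the child
           holds CAP_SYS_ADMIN over the new UTS namespace. */
        if (unshare(CLONE_NEWUSER | CLONE_NEWUTS) == -1 ||
            sethostname(DEMO_HOSTNAME, strlen(DEMO_HOSTNAME)) == -1 ||
            gethostname(inside, sizeof(inside) - 1) == -1)
            _exit(1);
        _exit(write(pfd[1], inside, strlen(inside)) > 0 ? 0 : 1);
    }
    close(pfd[1]);
    ssize_t n = read(pfd[0], inside, sizeof(inside) - 1);
    close(pfd[0]);
    waitpid(pid, NULL, 0);
    if (n <= 0 || gethostname(after, sizeof(after) - 1) == -1)
        return -1;
    return strcmp(inside, DEMO_HOSTNAME) == 0 && strcmp(before, after) == 0;
}
```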

Use of user namespaces requires a kernel that is configured with the
.B CONFIG_USER_NS
option.

Over the years, there have been a lot of features that have been added
to the Linux kernel that are only available to privileged users
because of their potential to confuse set-user-ID-root applications.
In general, it becomes safe to allow the root user in a user namespace to
use those features because it is impossible, while in a user namespace,
to gain more privilege than the root user of a user namespace has.

The
.IR /proc/[pid]/uid_map