mirror of https://github.com/mkerrisk/man-pages
228 lines
7.4 KiB
Groff
.\" Copyright (c) 2013 by Michael Kerrisk <mtk.manpages@gmail.com>
.\" and Copyright (c) 2012 by Eric W. Biederman <ebiederm@xmission.com>
.\"
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\"
.\"
.TH PID_NAMESPACES 7 2013-01-14 "Linux" "Linux Programmer's Manual"
.SH NAME
pid_namespaces \- overview of Linux PID namespaces
.SH DESCRIPTION
For an overview of namespaces, see
.BR namespaces (7).
.SS PID namespaces (CLONE_NEWPID)
PID namespaces isolate the process ID number space,
meaning that processes in different PID namespaces can have the same PID.
PID namespaces allow containers to migrate to a new host
while the processes inside the container maintain the same PIDs.
.PP
PIDs in a new PID namespace start at 1,
somewhat like a standalone system, and calls to
.BR fork (2),
.BR vfork (2),
or
.BR clone (2)
will produce processes with PIDs that are unique within the namespace.
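.PP
The following C fragment is a minimal sketch of this behavior,
not part of any library interface.
A child created with
.BR clone (2)
and
.B CLONE_NEWPID
sees itself as PID 1,
while the value that
.BR clone (2)
returns to the parent is the child's PID in the parent's namespace.
The helper names
.I child_sees_pid_1
and
.I report_pid
are hypothetical.
The sketch assumes that the caller has the
.B CAP_SYS_ADMIN
capability; otherwise
.BR clone (2)
fails (normally with
.BR EPERM )
and the helper returns \-1.
.PP
.nf
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

/* Hypothetical helper, run in the new PID namespace:
   exit status 0 means "getpid() returned 1". */
static int report_pid(void *arg)
{
    (void) arg;
    return getpid() == 1 ? 0 : 1;
}

/* Returns 1 if the child saw itself as PID 1, 0 if it did not, and -1 if
   the child could not be created (for example, EPERM without CAP_SYS_ADMIN). */
int child_sees_pid_1(void)
{
    char *stack = malloc(STACK_SIZE);
    int status;
    pid_t pid;

    if (stack == NULL)
        return -1;
    /* The stack grows downward on most architectures, so pass its top.
       The returned PID is the child's PID in the caller's namespace. */
    pid = clone(report_pid, stack + STACK_SIZE, CLONE_NEWPID | SIGCHLD, NULL);
    if (pid == -1 || waitpid(pid, &status, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
.fi
.PP
A program would call
.I child_sees_pid_1
from
.IR main ();
passing
.B SIGCHLD
in the flags makes the child's termination reportable through
.BR waitpid (2)
in the usual way.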
.PP
The first process created in a new namespace
(i.e., the process created using
.BR clone (2)
with the
.BR CLONE_NEWPID
flag, or the first child created by a process after a call to
.BR unshare (2)
using the
.BR CLONE_NEWPID
flag) has PID 1 and is the "init" process for the namespace (see
.BR init (1)).
Children that are orphaned within the namespace will be reparented
to this process rather than to
.BR init (1).
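.PP
As an illustrative sketch (the names
.I orphan_adopted
and
.I ns_init
are hypothetical, and
.B CAP_SYS_ADMIN
is assumed),
the following C code lets the namespace's "init" fork a child,
which forks a grandchild and exits at once.
The orphaned grandchild then polls
.BR getppid (2)
until it returns 1, that is, until it has been adopted by the namespace's "init".
The one-second polling limit is an arbitrary choice.
.PP
.nf
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

/* Hypothetical helper: PID 1 of the new namespace.
   Returns 0 if the orphaned grandchild saw PID 1 as its new parent. */
static int ns_init(void *arg)
{
    int status, result = 1;
    pid_t mid, w;

    (void) arg;
    mid = fork();
    if (mid == -1)
        return 1;
    if (mid == 0) {
        pid_t grandchild = fork();
        if (grandchild == 0) {
            /* Poll for up to about one second until our parent has exited. */
            for (int i = 0; i < 1000 && getppid() != 1; i++)
                usleep(1000);
            _exit(getppid() == 1 ? 0 : 1);
        }
        _exit(0);               /* orphan the grandchild */
    }
    /* As "init", reap the middle child and then the adopted orphan. */
    while ((w = wait(&status)) != -1) {
        if (w != mid)
            result = (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
    }
    return result;
}

/* Returns 1 if the orphan was adopted by the namespace's "init",
   0 if it was not, and -1 if the namespace could not be created. */
int orphan_adopted(void)
{
    char *stack = malloc(STACK_SIZE);
    int status;
    pid_t pid;

    if (stack == NULL)
        return -1;
    pid = clone(ns_init, stack + STACK_SIZE, CLONE_NEWPID | SIGCHLD, NULL);
    if (pid == -1 || waitpid(pid, &status, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
.fi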
.PP
If the "init" process of a PID namespace terminates,
the kernel terminates all of the processes in the namespace via a
.BR SIGKILL
signal.
This behavior reflects the fact that the "init" process
is essential for the correct operation of a PID namespace.
In this case, a subsequent
.BR fork (2)
into this PID namespace (e.g., from a process that has done a
.BR setns (2)
into the namespace using an open file descriptor for a
.I /proc/[pid]/ns/pid
file corresponding to a process that was in the namespace)
will fail with the error
.BR ENOMEM ;
it is not possible to create a new process in a PID namespace whose "init"
process has terminated.
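.PP
A minimal C sketch of this failure mode follows; it is illustrative only.
A helper process keeps the
.I /proc/[pid]/ns/pid
file of a namespace's "init" open, lets that "init" exit,
joins the namespace with
.BR setns (2),
and checks that
.BR fork (2)
then fails with
.BR ENOMEM .
The work happens in a throwaway child process, because a
.BR setns (2)
call with a PID namespace file descriptor permanently changes
where the caller's later children are placed.
The function names are hypothetical, and the sketch assumes
.B CAP_SYS_ADMIN
and a mounted
.IR /proc .
.PP
.nf
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

/* Hypothetical helper: the namespace's "init" waits until the
   pipe's write end is closed, then exits. */
static int ns_init(void *arg)
{
    int *fds = arg;
    char c;

    close(fds[1]);
    while (read(fds[0], &c, 1) > 0)
        continue;
    return 0;
}

/* Exit status: 0 = fork() failed with ENOMEM, 1 = it did not, 2 = setup error. */
static int try_fork_into_dead_ns(void)
{
    int fds[2], nsfd;
    char path[64];
    char *stack = malloc(STACK_SIZE);
    pid_t init, pid;

    if (stack == NULL || pipe(fds) == -1)
        return 2;
    init = clone(ns_init, stack + STACK_SIZE, CLONE_NEWPID | SIGCHLD, fds);
    if (init == -1)
        return 2;
    snprintf(path, sizeof path, "/proc/%ld/ns/pid", (long) init);
    nsfd = open(path, O_RDONLY);    /* keeps the namespace alive */
    close(fds[1]);                  /* "init" now sees end-of-file and exits */
    waitpid(init, NULL, 0);
    if (nsfd == -1 || setns(nsfd, CLONE_NEWPID) == -1)
        return 2;

    pid = fork();                   /* would go into the dead namespace */
    if (pid == 0)
        _exit(0);
    if (pid > 0) {
        waitpid(pid, NULL, 0);
        return 1;
    }
    return errno == ENOMEM ? 0 : 1;
}

/* Returns 1 if fork() failed with ENOMEM, 0 if not, -1 on setup errors. */
int fork_into_dead_ns_fails(void)
{
    int status;
    pid_t helper = fork();

    if (helper == -1)
        return -1;
    if (helper == 0)
        _exit(try_fork_into_dead_ns());
    if (waitpid(helper, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : WEXITSTATUS(status) == 1 ? 0 : -1;
}
.fi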
.PP
Only signals for which the "init" process has established a signal handler
can be sent to the "init" process by other members of the PID namespace.
This restriction applies even to privileged processes,
and prevents other members of the PID namespace from
accidentally killing the "init" process.
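.PP
A sketch of this restriction follows.
The names are hypothetical, and
.B CAP_SYS_ADMIN
is assumed.
The namespace's "init" leaves
.B SIGTERM
at its default disposition, and one of its children sends it that signal.
Because "init" has no handler, the signal is discarded,
so "init" goes on to exit normally with status 0.
The sketch explicitly resets and unblocks
.B SIGTERM
in "init" so that an inherited disposition or signal mask cannot skew the result.
.PP
.nf
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

/* Hypothetical helper: PID 1 of the new namespace. */
static int ns_init(void *arg)
{
    sigset_t set;
    pid_t child;

    (void) arg;
    /* Default disposition, unblocked: no handler is established. */
    signal(SIGTERM, SIG_DFL);
    sigemptyset(&set);
    sigaddset(&set, SIGTERM);
    sigprocmask(SIG_UNBLOCK, &set, NULL);

    child = fork();
    if (child == -1)
        return 1;
    if (child == 0) {
        kill(1, SIGTERM);       /* discarded: "init" has no handler */
        _exit(0);
    }
    waitpid(child, NULL, 0);
    return 0;                   /* reached only if SIGTERM was not delivered */
}

/* Returns 1 if "init" survived the SIGTERM, 0 if not, -1 on setup errors. */
int init_ignores_sigterm_from_member(void)
{
    char *stack = malloc(STACK_SIZE);
    int status;
    pid_t pid;

    if (stack == NULL)
        return -1;
    pid = clone(ns_init, stack + STACK_SIZE, CLONE_NEWPID | SIGCHLD, NULL);
    if (pid == -1 || waitpid(pid, &status, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
.fi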
.PP
Likewise, a process in an ancestor namespace
can\(emsubject to the usual permission checks described in
.BR kill (2)\(emsend
signals to the "init" process of a child PID namespace only
if the "init" process has established a handler for that signal.
(Within the handler, the
.I siginfo_t
.I si_pid
field described in
.BR sigaction (2)
will be zero.)
.B SIGKILL
and
.B SIGSTOP
are treated exceptionally:
these signals are forcibly delivered when sent from an ancestor PID namespace.
Neither of these signals can be caught by the "init" process,
and so they result in the usual actions associated with those signals
(respectively, terminating and stopping the process).
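.PP
The following sketch (hypothetical names;
.B CAP_SYS_ADMIN
assumed) sends
.B SIGTERM
and then
.B SIGKILL
from the parent namespace to an "init" process that has no handlers.
The first signal should be discarded and the second should kill the process.
The 100-millisecond pause is a crude timing assumption,
used only to give a wrongly delivered
.B SIGTERM
time to take effect.
.PP
.nf
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

/* Hypothetical helper: PID 1 of the new namespace, with no handlers. */
static int ns_init(void *arg)
{
    sigset_t set;

    (void) arg;
    signal(SIGTERM, SIG_DFL);
    sigemptyset(&set);
    sigaddset(&set, SIGTERM);
    sigprocmask(SIG_UNBLOCK, &set, NULL);
    for (;;)
        pause();
}

/* Returns 1 if "init" ignored SIGTERM but was killed by SIGKILL,
   0 if it behaved otherwise, and -1 on setup errors. */
int ancestor_signals_init(void)
{
    char *stack = malloc(STACK_SIZE);
    int status;
    pid_t pid, w;

    if (stack == NULL)
        return -1;
    pid = clone(ns_init, stack + STACK_SIZE, CLONE_NEWPID | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }
    kill(pid, SIGTERM);         /* no handler in "init": discarded */
    usleep(100000);             /* time for a wrongly delivered signal to act */
    w = waitpid(pid, &status, WNOHANG);
    if (w != 0) {               /* "init" already exited, or waitpid() failed */
        free(stack);
        return w == -1 ? -1 : 0;
    }
    kill(pid, SIGKILL);         /* forcibly delivered from an ancestor namespace */
    w = waitpid(pid, &status, 0);
    free(stack);
    if (w == -1)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
.fi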
.PP
PID namespaces can be nested.
When a new PID namespace is created,
the processes in that namespace are visible
in the PID namespace of the process that created the new namespace;
analogously, if the parent PID namespace is itself
the child of another PID namespace,
then processes in the child and parent PID namespaces will both be
visible in the grandparent PID namespace.
Conversely, the processes in the "child" PID namespace do not see
the processes in the parent namespace.
More succinctly: a process can see (e.g., send signals with
.BR kill (2))
only processes contained in its own PID namespace
and the namespaces nested below that PID namespace.
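.PP
The one-way visibility can be sketched as follows
(the names are hypothetical, and
.B CAP_SYS_ADMIN
is assumed).
The parent passes its own PID to a child in a new PID namespace.
The parent can probe the child with a null signal
.RB ( kill (2)
with a signal number of 0),
but the same probe of the parent's PID from inside the child should fail with
.BR ESRCH ,
because the parent is not visible there.
.PP
.nf
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

/* Hypothetical helper, run in the new namespace;
   arg points to the parent's PID in the outer namespace. */
static int probe_parent(void *arg)
{
    pid_t outer_parent = *(pid_t *) arg;

    if (outer_parent == 1)      /* PID 1 here would name ourselves */
        return 2;
    /* The parent is invisible from here, so this should fail with ESRCH. */
    return (kill(outer_parent, 0) == -1 && errno == ESRCH) ? 0 : 1;
}

/* Returns 1 if the parent can see the child but not vice versa,
   0 if visibility differs from that, and -1 on setup errors. */
int visibility_is_one_way(void)
{
    char *stack = malloc(STACK_SIZE);
    pid_t self = getpid(), pid;
    int status, parent_sees;

    if (stack == NULL)
        return -1;
    pid = clone(probe_parent, stack + STACK_SIZE, CLONE_NEWPID | SIGCHLD, &self);
    if (pid == -1) {
        free(stack);
        return -1;
    }
    parent_sees = kill(pid, 0) == 0;    /* the child is visible from here */
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)) {
        free(stack);
        return -1;
    }
    free(stack);
    if (WEXITSTATUS(status) == 2)
        return -1;
    return parent_sees && WEXITSTATUS(status) == 0;
}
.fi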
.PP
A process has one PID in each of the layers of the hierarchy,
starting from the PID namespace in which it resides
through to the root PID namespace.
A call to
.BR getpid (2)
always returns the PID associated with the namespace in which
the process resides.
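.PP
Kernels newer than this page expose the full list of PIDs:
since Linux 4.1, the
.I NSpid
line of
.I /proc/[pid]/status
(see
.BR proc (5))
lists the PID in each namespace, from the namespace of the procfs instance
being read down to the process's own namespace.
The following sketch reads that line;
.I nspid_levels
is a hypothetical helper, and it returns \-1 on kernels without the field.
.PP
.nf
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: counts the PIDs on the "NSpid:" line of
   /proc/[pid]/status and stores the last (innermost) one in *innermost.
   Returns -1 if the file cannot be read or has no NSpid line. */
int nspid_levels(pid_t pid, long *innermost)
{
    char path[64], line[512];
    int count = -1;
    FILE *f;

    snprintf(path, sizeof path, "/proc/%ld/status", (long) pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "NSpid:", 6) != 0)
            continue;
        char *p = line + 6, *end;
        count = 0;
        for (;;) {
            long v = strtol(p, &end, 10);   /* skips leading whitespace */
            if (end == p)                   /* no more numbers on the line */
                break;
            *innermost = v;
            count++;
            p = end;
        }
        break;
    }
    fclose(f);
    return count;
}
.fi
.PP
For example, a call such as
.I "nspid_levels(getpid(), &pid)"
should store the same value that
.BR getpid (2)
returns, since the last entry is the PID in the process's own namespace.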
.PP
Some processes in a PID namespace may have parents
that are outside of the namespace.
For example, the parent of the initial process in the namespace
(i.e., the
.BR init (1)
process with PID 1) is necessarily in another namespace.
Likewise, the direct children of a process that uses
.BR setns (2)
to cause its children to join a PID namespace are in a different
PID namespace from the caller of
.BR setns (2).
Calls to
.BR getppid (2)
for such processes return 0.
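.PP
A minimal sketch of the
.BR getppid (2)
case for a namespace's "init" follows (hypothetical names;
.B CAP_SYS_ADMIN
assumed).
.PP
.nf
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

/* Hypothetical helper: runs as PID 1 of the new namespace,
   whose parent is outside that namespace. */
static int check_ppid(void *arg)
{
    (void) arg;
    return getppid() == 0 ? 0 : 1;
}

/* Returns 1 if getppid() returned 0 in the child, 0 if it did not,
   and -1 if the namespace could not be created. */
int ns_init_sees_ppid_0(void)
{
    char *stack = malloc(STACK_SIZE);
    int status;
    pid_t pid;

    if (stack == NULL)
        return -1;
    pid = clone(check_ppid, stack + STACK_SIZE, CLONE_NEWPID | SIGCHLD, NULL);
    if (pid == -1 || waitpid(pid, &status, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
.fi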
.PP
After creating a new PID namespace,
it is useful for the child to change its root directory
and mount a new procfs instance at
.I /proc
so that tools such as
.BR ps (1)
work correctly.
.\" mount -t proc proc /proc
(If
.BR CLONE_NEWNS
is also included in the
.IR flags
argument of
.BR clone (2)
or
.BR unshare (2),
then it isn't necessary to change the root directory:
a new procfs instance can be mounted directly over
.IR /proc .)
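.PP
The following C sketch (hypothetical names;
.B CAP_SYS_ADMIN
assumed) follows the second approach.
The child is created with both
.B CLONE_NEWPID
and
.BR CLONE_NEWNS ,
mounts a new procfs instance over
.IR /proc ,
and checks that
.I /proc/self
now refers to PID 1.
Before mounting, it marks all of its mounts private using the
.B MS_REC
and
.B MS_PRIVATE
flags of
.BR mount (2).
That extra step is not described above;
on systems where
.I /
is a shared mount, it stops the new
.I /proc
mount from propagating back into the parent's mount namespace.
.PP
.nf
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

/* Hypothetical helper, run in the new PID and mount namespaces.
   Exit status: 0 = /proc/self is "1", 1 = it is not, 2 = setup error. */
static int mount_proc(void *arg)
{
    char buf[32];
    ssize_t n;

    (void) arg;
    /* Keep the following mount out of the parent's mount namespace. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1)
        return 2;
    /* Equivalent of: mount -t proc proc /proc */
    if (mount("proc", "/proc", "proc", 0, NULL) == -1)
        return 2;
    n = readlink("/proc/self", buf, sizeof buf - 1);
    if (n == -1)
        return 2;
    buf[n] = 0;                 /* readlink() does not terminate the string */
    return strcmp(buf, "1") == 0 ? 0 : 1;
}

/* Returns 1 if the new procfs reports PID 1 for the child, 0 if not,
   and -1 on setup errors (for example, EPERM without CAP_SYS_ADMIN). */
int proc_self_is_1(void)
{
    char *stack = malloc(STACK_SIZE);
    int status;
    pid_t pid;

    if (stack == NULL)
        return -1;
    pid = clone(mount_proc, stack + STACK_SIZE,
                CLONE_NEWPID | CLONE_NEWNS | SIGCHLD, NULL);
    if (pid == -1 || waitpid(pid, &status, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);
    if (!WIFEXITED(status) || WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0;
}
.fi
.PP
The new mount namespace, and with it the new
.I /proc
mount, disappears when the child exits.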
.PP
Calls to
.BR setns (2)
that specify a PID namespace file descriptor
and calls to
.BR unshare (2)
with the
.BR CLONE_NEWPID
flag cause children subsequently created
by the caller to be placed in a different PID namespace from the caller.
These calls do not, however,
change the PID namespace of the calling process,
because doing so would change the caller's idea of its own PID
(as reported by
.BR getpid ()),
which would break many applications and libraries.
To put things another way:
a process's PID namespace membership is determined when the process is created
and cannot be changed thereafter.
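.PP
The following sketch (hypothetical names;
.B CAP_SYS_ADMIN
assumed) checks both halves of this statement for
.BR unshare (2):
the caller's own PID is unchanged, while its first child is PID 1 in the new namespace.
It runs in a throwaway child process, because the effect of
.BR unshare (2)
on the caller's later children cannot be undone.
.PP
.nf
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, run in a throwaway process.
   Exit status: 0 = as described, 1 = not as described, 2 = setup error. */
static int unshare_then_fork(void)
{
    pid_t before = getpid(), child;
    int status;

    if (unshare(CLONE_NEWPID) == -1)
        return 2;
    if (getpid() != before)     /* the caller's own PID must not change */
        return 1;
    child = fork();             /* first child: "init" of the new namespace */
    if (child == -1)
        return 2;
    if (child == 0)
        _exit(getpid() == 1 ? 0 : 1);
    if (waitpid(child, &status, 0) == -1)
        return 2;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}

/* Returns 1 if only the child moved to the new namespace, 0 if not,
   and -1 on setup errors (for example, EPERM without CAP_SYS_ADMIN). */
int unshare_moves_only_children(void)
{
    int status;
    pid_t helper = fork();

    if (helper == -1)
        return -1;
    if (helper == 0)
        _exit(unshare_then_fork());
    if (waitpid(helper, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0;
}
.fi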
.PP
Every thread in a process must be in the same PID namespace.
For this reason, the following two call sequences will fail:
.PP
.nf
unshare(CLONE_NEWPID);
clone(..., CLONE_VM, ...); /* Fails */

setns(fd, CLONE_NEWPID);
clone(..., CLONE_VM, ...); /* Fails */
.fi
.PP
Because the above
.BR unshare (2)
and
.BR setns (2)
calls only change the PID namespace for created children, the
.BR clone (2)
calls necessarily put the new thread in a different PID namespace from
the calling thread.
.PP
When a process ID is passed over a UNIX domain socket to a
process in a different PID namespace (see the description of
.B SCM_CREDENTIALS
in
.BR unix (7)),
it is translated into the corresponding PID value in
the receiving process's PID namespace.
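.PP
The following sketch (hypothetical helper name) shows the credential-passing
mechanism itself.
The receiving socket enables the
.B SO_PASSCRED
option (see
.BR unix (7)),
and the
.B SCM_CREDENTIALS
message then carries the sender's PID.
Both ends are in the same process and namespace here,
so the PID received equals the caller's own;
to observe the translation, the receiving end would have to belong to
a process in a different PID namespace.
This sketch needs no special privileges.
.PP
.nf
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: sends one byte across a UNIX domain socket pair and
   returns the sender PID that arrived as SCM_CREDENTIALS, or -1 on error. */
pid_t received_sender_pid(void)
{
    int sv[2], on = 1;
    pid_t result = -1;
    char data;
    union {                     /* correctly aligned control buffer */
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;
    } control;
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    struct msghdr msg = { 0 };
    struct cmsghdr *c;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;
    /* The receiver must enable SO_PASSCRED to get the credentials. */
    if (setsockopt(sv[1], SOL_SOCKET, SO_PASSCRED, &on, sizeof on) == 0 &&
        write(sv[0], "x", 1) == 1) {
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = control.buf;
        msg.msg_controllen = sizeof control.buf;
        if (recvmsg(sv[1], &msg, 0) == 1) {
            for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
                if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_CREDENTIALS) {
                    struct ucred cred;
                    memcpy(&cred, CMSG_DATA(c), sizeof cred);
                    result = cred.pid;  /* PID in the receiver's namespace */
                }
            }
        }
    }
    close(sv[0]);
    close(sv[1]);
    return result;
}
.fi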
.\" FIXME Presumably, a similar thing happens with the UID and GID passed
.\" via a UNIX domain socket. That needs to be confirmed and documented
.\" under the "User namespaces" section.
.PP
Use of PID namespaces requires a kernel that is configured with the
.B CONFIG_PID_NS
option.
.SH CONFORMING TO
Namespaces are a Linux-specific feature.
.SH SEE ALSO
.BR unshare (1),
.BR clone (2),
.BR setns (2),
.BR unshare (2),
.BR proc (5),
.BR credentials (7),
.BR capabilities (7),
.BR user_namespaces (7),
.BR switch_root (8)