mirror of https://github.com/mkerrisk/man-pages
438 lines
14 KiB
Groff
438 lines
14 KiB
Groff
.\" Copyright (c) 2013, 2016, 2017 by Michael Kerrisk <mtk.manpages@gmail.com>
|
|
.\" and Copyright (c) 2012 by Eric W. Biederman <ebiederm@xmission.com>
|
|
.\"
|
|
.\" %%%LICENSE_START(VERBATIM)
|
|
.\" Permission is granted to make and distribute verbatim copies of this
|
|
.\" manual provided the copyright notice and this permission notice are
|
|
.\" preserved on all copies.
|
|
.\"
|
|
.\" Permission is granted to copy and distribute modified versions of this
|
|
.\" manual under the conditions for verbatim copying, provided that the
|
|
.\" entire resulting derived work is distributed under the terms of a
|
|
.\" permission notice identical to this one.
|
|
.\"
|
|
.\" Since the Linux kernel and libraries are constantly changing, this
|
|
.\" manual page may be incorrect or out-of-date. The author(s) assume no
|
|
.\" responsibility for errors or omissions, or for damages resulting from
|
|
.\" the use of the information contained herein. The author(s) may not
|
|
.\" have taken the same level of care in the production of this manual,
|
|
.\" which is licensed free of charge, as they might when working
|
|
.\" professionally.
|
|
.\"
|
|
.\" Formatted or processed versions of this manual, if unaccompanied by
|
|
.\" the source, must acknowledge the copyright and authors of this work.
|
|
.\" %%%LICENSE_END
|
|
.\"
|
|
.\"
|
|
.TH NAMESPACES 7 2020-04-11 "Linux" "Linux Programmer's Manual"
|
|
.SH NAME
|
|
namespaces \- overview of Linux namespaces
|
|
.SH DESCRIPTION
|
|
A namespace wraps a global system resource in an abstraction that
|
|
makes it appear to the processes within the namespace that they
|
|
have their own isolated instance of the global resource.
|
|
Changes to the global resource are visible to other processes
|
|
that are members of the namespace, but are invisible to other processes.
|
|
One use of namespaces is to implement containers.
|
|
.PP
|
|
This page provides pointers to information on the various namespace types,
|
|
describes the associated
|
|
.I /proc
|
|
files, and summarizes the APIs for working with namespaces.
|
|
.\"
|
|
.SS Namespace types
|
|
The following table shows the namespace types available on Linux.
|
|
The second column of the table shows the flag value that is used to specify
|
|
the namespace type in various APIs.
|
|
The third column identifies the manual page that provides details
|
|
on the namespace type.
|
|
The last column is a summary of the resources that are isolated by
|
|
the namespace type.
|
|
.TS
|
|
lB lB lB lB
|
|
l1 lB1 l1 l.
|
|
Namespace Flag Page Isolates
|
|
Cgroup CLONE_NEWCGROUP \fBcgroup_namespaces\fP(7) Cgroup root directory
|
|
IPC CLONE_NEWIPC \fBipc_namespaces\fP(7) T{
|
|
System V IPC,
|
|
.br
|
|
POSIX message queues
|
|
T}
|
|
Network CLONE_NEWNET \fBnetwork_namespaces\fP(7) T{
|
|
Network devices,
|
|
.br
|
|
stacks, ports, etc.
|
|
T}
|
|
Mount CLONE_NEWNS \fBmount_namespaces\fP(7) Mount points
|
|
PID CLONE_NEWPID \fBpid_namespaces\fP(7) Process IDs
|
|
Time CLONE_NEWTIME \fBtime_namespaces\fP(7) T{
|
|
Boot and monotonic
|
|
.br
|
|
clocks
|
|
T}
|
|
User CLONE_NEWUSER \fBuser_namespaces\fP(7) User and group IDs
|
|
UTS CLONE_NEWUTS \fButs_namespaces\fP(7) T{
|
|
Hostname and NIS
|
|
.br
|
|
domain name
|
|
T}
|
|
.TE
|
|
.\"
|
|
.\" ==================== The namespaces API ====================
|
|
.\"
|
|
.SS The namespaces API
|
|
As well as various
|
|
.I /proc
|
|
files described below,
|
|
the namespaces API includes the following system calls:
|
|
.TP
|
|
.BR clone (2)
|
|
The
|
|
.BR clone (2)
|
|
system call creates a new process.
|
|
If the
|
|
.I flags
|
|
argument of the call specifies one or more of the
|
|
.B CLONE_NEW*
|
|
flags listed below, then new namespaces are created for each flag,
|
|
and the child process is made a member of those namespaces.
|
|
(This system call also implements a number of features
|
|
unrelated to namespaces.)
|
|
.TP
|
|
.BR setns (2)
|
|
The
|
|
.BR setns (2)
|
|
system call allows the calling process to join an existing namespace.
|
|
The namespace to join is specified via a file descriptor that refers to
|
|
one of the
|
|
.IR /proc/[pid]/ns
|
|
files described below.
|
|
.TP
|
|
.BR unshare (2)
|
|
The
|
|
.BR unshare (2)
|
|
system call moves the calling process to a new namespace.
|
|
If the
|
|
.I flags
|
|
argument of the call specifies one or more of the
|
|
.B CLONE_NEW*
|
|
flags listed below, then new namespaces are created for each flag,
|
|
and the calling process is made a member of those namespaces.
|
|
(This system call also implements a number of features
|
|
unrelated to namespaces.)
|
|
.TP
|
|
.BR ioctl (2)
|
|
Various
|
|
.BR ioctl (2)
|
|
operations can be used to discover information about namespaces.
|
|
These operations are described in
|
|
.BR ioctl_ns (2).
|
|
.PP
|
|
Creation of new namespaces using
|
|
.BR clone (2)
|
|
and
|
|
.BR unshare (2)
|
|
in most cases requires the
|
|
.BR CAP_SYS_ADMIN
|
|
capability, since, in the new namespace,
|
|
the creator will have the power to change global resources
|
|
that are visible to other processes that are subsequently created in,
|
|
or join the namespace.
|
|
User namespaces are the exception: since Linux 3.8,
|
|
no privilege is required to create a user namespace.
|
|
.\"
|
|
.\" ==================== The /proc/[pid]/ns/ directory ====================
|
|
.\"
|
|
.SS The /proc/[pid]/ns/ directory
|
|
Each process has a
|
|
.IR /proc/[pid]/ns/
|
|
.\" See commit 6b4e306aa3dc94a0545eb9279475b1ab6209a31f
|
|
subdirectory containing one entry for each namespace that
|
|
supports being manipulated by
|
|
.BR setns (2):
|
|
.PP
|
|
.in +4n
|
|
.EX
|
|
$ \fBls \-l /proc/$$/ns | awk \(aq{print $1, $9, $10, $11}\(aq\fP
|
|
total 0
|
|
lrwxrwxrwx. cgroup \-> cgroup:[4026531835]
|
|
lrwxrwxrwx. ipc \-> ipc:[4026531839]
|
|
lrwxrwxrwx. mnt \-> mnt:[4026531840]
|
|
lrwxrwxrwx. net \-> net:[4026531969]
|
|
lrwxrwxrwx. pid \-> pid:[4026531836]
|
|
lrwxrwxrwx. pid_for_children \-> pid:[4026531834]
|
|
lrwxrwxrwx. time -> time:[4026531834]
|
|
lrwxrwxrwx. time_for_children -> time:[4026531834]
|
|
lrwxrwxrwx. user \-> user:[4026531837]
|
|
lrwxrwxrwx. uts \-> uts:[4026531838]
|
|
.EE
|
|
.in
|
|
.PP
|
|
Bind mounting (see
|
|
.BR mount (2))
|
|
one of the files in this directory
|
|
to somewhere else in the filesystem keeps
|
|
the corresponding namespace of the process specified by
|
|
.I pid
|
|
alive even if all processes currently in the namespace terminate.
|
|
.PP
|
|
Opening one of the files in this directory
|
|
(or a file that is bind mounted to one of these files)
|
|
returns a file handle for
|
|
the corresponding namespace of the process specified by
|
|
.IR pid .
|
|
As long as this file descriptor remains open,
|
|
the namespace will remain alive,
|
|
even if all processes in the namespace terminate.
|
|
The file descriptor can be passed to
|
|
.BR setns (2).
|
|
.PP
|
|
In Linux 3.7 and earlier, these files were visible as hard links.
|
|
Since Linux 3.8,
|
|
.\" commit bf056bfa80596a5d14b26b17276a56a0dcb080e5
|
|
they appear as symbolic links.
|
|
If two processes are in the same namespace,
|
|
then the device IDs and inode numbers of their
|
|
.IR /proc/[pid]/ns/xxx
|
|
symbolic links will be the same; an application can check this using the
|
|
.I stat.st_dev
|
|
.\" Eric Biederman: "I reserve the right for st_dev to be significant
|
|
.\" when comparing namespaces."
|
|
.\" https://lore.kernel.org/lkml/87poky5ca9.fsf@xmission.com/
|
|
.\" Re: Documenting the ioctl interfaces to discover relationships...
|
|
.\" Date: Mon, 12 Dec 2016 11:30:38 +1300
|
|
and
|
|
.I stat.st_ino
|
|
fields returned by
|
|
.BR stat (2).
|
|
The content of this symbolic link is a string containing
|
|
the namespace type and inode number as in the following example:
|
|
.PP
|
|
.in +4n
|
|
.EX
|
|
$ \fBreadlink /proc/$$/ns/uts\fP
|
|
uts:[4026531838]
|
|
.EE
|
|
.in
|
|
.PP
|
|
The symbolic links in this subdirectory are as follows:
|
|
.TP
|
|
.IR /proc/[pid]/ns/cgroup " (since Linux 4.6)"
|
|
This file is a handle for the cgroup namespace of the process.
|
|
.TP
|
|
.IR /proc/[pid]/ns/ipc " (since Linux 3.0)"
|
|
This file is a handle for the IPC namespace of the process.
|
|
.TP
|
|
.IR /proc/[pid]/ns/mnt " (since Linux 3.8)"
|
|
.\" commit 8823c079ba7136dc1948d6f6dcb5f8022bde438e
|
|
This file is a handle for the mount namespace of the process.
|
|
.TP
|
|
.IR /proc/[pid]/ns/net " (since Linux 3.0)"
|
|
This file is a handle for the network namespace of the process.
|
|
.TP
|
|
.IR /proc/[pid]/ns/pid " (since Linux 3.8)"
|
|
.\" commit 57e8391d327609cbf12d843259c968b9e5c1838f
|
|
This file is a handle for the PID namespace of the process.
|
|
This handle is permanent for the lifetime of the process
|
|
(i.e., a process's PID namespace membership never changes).
|
|
.TP
|
|
.IR /proc/[pid]/ns/pid_for_children " (since Linux 4.12)"
|
|
.\" commit eaa0d190bfe1ed891b814a52712dcd852554cb08
|
|
This file is a handle for the PID namespace of
|
|
child processes created by this process.
|
|
This can change as a consequence of calls to
|
|
.BR unshare (2)
|
|
and
|
|
.BR setns (2)
|
|
(see
|
|
.BR pid_namespaces (7)),
|
|
so the file may differ from
|
|
.IR /proc/[pid]/ns/pid .
|
|
The symbolic link gains a value only after the first child process
|
|
is created in the namespace.
|
|
(Beforehand,
|
|
.BR readlink (2)
|
|
of the symbolic link will return an empty buffer.)
|
|
.TP
|
|
.IR /proc/[pid]/ns/time " (since Linux 5.6)"
|
|
This file is a handle for the time namespace of the process.
|
|
.TP
|
|
.IR /proc/[pid]/ns/time_for_children " (since Linux 5.6)"
|
|
This file is a handle for the time namespace of
|
|
child processes created by this process.
|
|
This can change as a consequence of calls to
|
|
.BR unshare (2)
|
|
and
|
|
.BR setns (2)
|
|
(see
|
|
.BR time_namespaces (7)),
|
|
so the file may differ from
|
|
.IR /proc/[pid]/ns/time .
|
|
.TP
|
|
.IR /proc/[pid]/ns/user " (since Linux 3.8)"
|
|
.\" commit cde1975bc242f3e1072bde623ef378e547b73f91
|
|
This file is a handle for the user namespace of the process.
|
|
.TP
|
|
.IR /proc/[pid]/ns/uts " (since Linux 3.0)"
|
|
This file is a handle for the UTS namespace of the process.
|
|
.PP
|
|
Permission to dereference or read
|
|
.RB ( readlink (2))
|
|
these symbolic links is governed by a ptrace access mode
|
|
.B PTRACE_MODE_READ_FSCREDS
|
|
check; see
|
|
.BR ptrace (2).
|
|
.\"
|
|
.\" ==================== The /proc/sys/user directory ====================
|
|
.\"
|
|
.SS The /proc/sys/user directory
|
|
The files in the
|
|
.I /proc/sys/user
|
|
directory (which is present since Linux 4.9) expose limits
|
|
on the number of namespaces of various types that can be created.
|
|
The files are as follows:
|
|
.TP
|
|
.IR max_cgroup_namespaces
|
|
The value in this file defines a per-user limit on the number of
|
|
cgroup namespaces that may be created in the user namespace.
|
|
.TP
|
|
.IR max_ipc_namespaces
|
|
The value in this file defines a per-user limit on the number of
|
|
ipc namespaces that may be created in the user namespace.
|
|
.TP
|
|
.IR max_mnt_namespaces
|
|
The value in this file defines a per-user limit on the number of
|
|
mount namespaces that may be created in the user namespace.
|
|
.TP
|
|
.IR max_net_namespaces
|
|
The value in this file defines a per-user limit on the number of
|
|
network namespaces that may be created in the user namespace.
|
|
.TP
|
|
.IR max_pid_namespaces
|
|
The value in this file defines a per-user limit on the number of
|
|
PID namespaces that may be created in the user namespace.
|
|
.TP
|
|
.IR max_time_namespaces " (since Linux 5.7)"
|
|
.\" commit eeec26d5da8248ea4e240b8795bb4364213d3247
|
|
The value in this file defines a per-user limit on the number of
|
|
time namespaces that may be created in the user namespace.
|
|
.TP
|
|
.IR max_user_namespaces
|
|
The value in this file defines a per-user limit on the number of
|
|
user namespaces that may be created in the user namespace.
|
|
.TP
|
|
.IR max_uts_namespaces
|
|
The value in this file defines a per-user limit on the number of
|
|
uts namespaces that may be created in the user namespace.
|
|
.PP
|
|
Note the following details about these files:
|
|
.IP * 3
|
|
The values in these files are modifiable by privileged processes.
|
|
.IP *
|
|
The values exposed by these files are the limits for the user namespace
|
|
in which the opening process resides.
|
|
.IP *
|
|
The limits are per-user.
|
|
Each user in the same user namespace
|
|
can create namespaces up to the defined limit.
|
|
.IP *
|
|
The limits apply to all users, including UID 0.
|
|
.IP *
|
|
These limits apply in addition to any other per-namespace
|
|
limits (such as those for PID and user namespaces) that may be enforced.
|
|
.IP *
|
|
Upon encountering these limits,
|
|
.BR clone (2)
|
|
and
|
|
.BR unshare (2)
|
|
fail with the error
|
|
.BR ENOSPC .
|
|
.IP *
|
|
For the initial user namespace,
|
|
the default value in each of these files is half the limit on the number
|
|
of threads that may be created
|
|
.RI ( /proc/sys/kernel/threads-max ).
|
|
In all descendant user namespaces, the default value in each file is
|
|
.BR MAXINT .
|
|
.IP *
|
|
When a namespace is created, the object is also accounted
|
|
against ancestor namespaces.
|
|
More precisely:
|
|
.RS
|
|
.IP + 3
|
|
Each user namespace has a creator UID.
|
|
.IP +
|
|
When a namespace is created,
|
|
it is accounted against the creator UIDs in each of the
|
|
ancestor user namespaces,
|
|
and the kernel ensures that the corresponding namespace limit
|
|
for the creator UID in the ancestor namespace is not exceeded.
|
|
.IP +
|
|
The aforementioned point ensures that creating a new user namespace
|
|
cannot be used as a means to escape the limits in force
|
|
in the current user namespace.
|
|
.RE
|
|
.\"
|
|
.SS Namespace lifetime
|
|
Absent any other factors,
|
|
a namespace is automatically torn down when the last process in
|
|
the namespace terminates or leaves the namespace.
|
|
However, there are a number of other factors that may pin
|
|
a namespace into existence even though it has no member processes.
|
|
These factors include the following:
|
|
.IP * 3
|
|
An open file descriptor or a bind mount exists for the corresponding
|
|
.IR /proc/[pid]/ns/*
|
|
file.
|
|
.IP *
|
|
The namespace is hierarchical (i.e., a PID or user namespace),
|
|
and has a child namespace.
|
|
.IP *
|
|
It is a user namespace that owns one or more nonuser namespaces.
|
|
.IP *
|
|
It is a PID namespace,
|
|
and there is a process that refers to the namespace via a
|
|
.IR /proc/[pid]/ns/pid_for_children
|
|
symbolic link.
|
|
.IP *
|
|
It is a time namespace,
|
|
and there is a process that refers to the namespace via a
|
|
.IR /proc/[pid]/ns/time_for_children
|
|
symbolic link.
|
|
.IP *
|
|
It is an IPC namespace, and a corresponding mount of an
|
|
.I mqueue
|
|
filesystem (see
|
|
.BR mq_overview (7))
|
|
refers to this namespace.
|
|
.IP *
|
|
It is a PID namespace, and a corresponding mount of a
|
|
.BR proc (5)
|
|
filesystem refers to this namespace.
|
|
.SH EXAMPLES
|
|
See
|
|
.BR clone (2)
|
|
and
|
|
.BR user_namespaces (7).
|
|
.SH SEE ALSO
|
|
.BR nsenter (1),
|
|
.BR readlink (1),
|
|
.BR unshare (1),
|
|
.BR clone (2),
|
|
.BR ioctl_ns (2),
|
|
.BR setns (2),
|
|
.BR unshare (2),
|
|
.BR proc (5),
|
|
.BR capabilities (7),
|
|
.BR cgroup_namespaces (7),
|
|
.BR cgroups (7),
|
|
.BR credentials (7),
|
|
.BR ipc_namespaces (7),
|
|
.BR network_namespaces (7),
|
|
.BR pid_namespaces (7),
|
|
.BR user_namespaces (7),
|
|
.BR uts_namespaces (7),
|
|
.BR lsns (8),
|
|
.BR pam_namespace (8),
|
|
.BR switch_root (8)
|