2013-01-13 23:45:09 +00:00
|
|
|
.\" Copyright (c) 2013 by Michael Kerrisk <mtk.manpages@gmail.com>
|
|
|
|
.\"
|
|
|
|
.\" Permission is granted to make and distribute verbatim copies of this
|
|
|
|
.\" manual provided the copyright notice and this permission notice are
|
|
|
|
.\" preserved on all copies.
|
|
|
|
.\"
|
|
|
|
.\" Permission is granted to copy and distribute modified versions of this
|
|
|
|
.\" manual under the conditions for verbatim copying, provided that the
|
|
|
|
.\" entire resulting derived work is distributed under the terms of a
|
|
|
|
.\" permission notice identical to this one.
|
|
|
|
.\"
|
|
|
|
.\" Since the Linux kernel and libraries are constantly changing, this
|
|
|
|
.\" manual page may be incorrect or out-of-date. The author(s) assume no
|
|
|
|
.\" responsibility for errors or omissions, or for damages resulting from
|
|
|
|
.\" the use of the information contained herein. The author(s) may not
|
|
|
|
.\" have taken the same level of care in the production of this manual,
|
|
|
|
.\" which is licensed free of charge, as they might when working
|
|
|
|
.\" professionally.
|
|
|
|
.\"
|
|
|
|
.\" Formatted or processed versions of this manual, if unaccompanied by
|
|
|
|
.\" the source, must acknowledge the copyright and authors of this work.
|
|
|
|
.\"
|
|
|
|
.\"
|
|
|
|
.TH NAMESPACES 7 2013-01-14 "Linux" "Linux Programmer's Manual"
|
|
|
|
.SH NAME
|
|
|
|
namespaces \- overview of Linux namespaces
|
|
|
|
.SH DESCRIPTION
|
|
|
|
A namespace wraps a global system resource in an abstraction that
|
|
|
|
makes it appear to the processes within the namespace that they
|
|
|
|
have their own isolated instance of the global resource.
|
|
|
|
Changes to the global resource are visible to other processes
|
|
|
|
that are members of the namespace, but are invisible to other processes.
|
|
|
|
One use of namespaces is to implement containers.
|
|
|
|
|
|
|
|
This page describes the various namespaces and the associated
|
|
|
|
.I /proc
|
|
|
|
files, and summarizes the APIs for working with namespaces.
|
|
|
|
|
|
|
|
.SS The namespaces API
|
|
|
|
|
|
|
|
As well as various
|
|
|
|
.I /proc
|
|
|
|
files described below,
|
|
|
|
the namespaces API comprises the following system calls:
|
|
|
|
|
|
|
|
.TP
|
|
|
|
.BR clone (2)
|
|
|
|
The
|
|
|
|
.BR clone (2)
|
|
|
|
system call creates a new process.
|
|
|
|
If the
|
|
|
|
.I flags
|
|
|
|
argument of the call specifies one or more of the
|
|
|
|
.B CLONE_NEW*
|
|
|
|
flags listed below, then new namespaces are created for each flag,
|
|
|
|
and the child process is made a member of those namespaces.
|
|
|
|
(This system call also implements a number of features
|
|
|
|
unrelated to namespaces.)
|
|
|
|
|
|
|
|
.TP
|
|
|
|
.BR setns (2)
|
|
|
|
The
|
|
|
|
.BR setns (2)
|
|
|
|
system call allows the calling process to join an existing namespace.
|
|
|
|
The namespace to join is specified via a file descriptor that refers to
|
|
|
|
one of the
|
|
|
|
.IR /proc/[pid]/ns
|
|
|
|
files described below.
|
|
|
|
|
|
|
|
.TP
|
|
|
|
.BR unshare (2)
|
|
|
|
The
|
|
|
|
.BR unshare (2)
|
|
|
|
system call moves the calling process to a new namespace.
|
|
|
|
If the
|
|
|
|
.I flags
|
|
|
|
argument of the call specifies one or more of the
|
|
|
|
.B CLONE_NEW*
|
|
|
|
flags listed below, then new namespaces are created for each flag,
|
|
|
|
and the calling process is made a member of those namespaces.
|
|
|
|
(This system call also implements a number of features
|
|
|
|
unrelated to namespaces.)
|
|
|
|
|
|
|
|
Leaving aside the other effects of the
|
|
|
|
.BR clone (2)
|
|
|
|
system call, the following call:
|
|
|
|
|
|
|
|
    clone(..., CLONE_NEWXXX, ...);
|
|
|
|
|
|
|
|
is equivalent in namespace terms to:
|
|
|
|
|
|
|
|
    if (fork() == 0)    /* if child */
        unshare(CLONE_NEWXXX);
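.PP
The equivalence above can be tried out with a small helper.
The following is a minimal sketch, not a definitive implementation:
the function name child_unshare is hypothetical,
and the call is expected to fail with EPERM when the caller lacks
the privilege needed to create the requested namespace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that moves itself into a new
   namespace of the kind selected by 'flag' (one of the CLONE_NEW*
   constants). Returns 0 if unshare(2) succeeded in the child, the
   child's errno value if unshare(2) failed, or -1 if fork(2) or
   waitpid(2) failed. */
int
child_unshare(int flag)
{
    pid_t pid;
    int status;

    assert(flag != 0);

    pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {                     /* child */
        if (unshare(flag) == -1)
            _exit(errno);               /* report the error to the parent */
        _exit(0);                       /* namespace is freed on exit */
    }

    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A caller might invoke child_unshare(CLONE_NEWUTS) and compare a nonzero
result against EPERM to detect insufficient privilege.
The parent's own namespaces are unaffected in either case.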
|
|
|
|
|
|
|
|
.SS The /proc/[pid]/ns/ directory
|
|
|
|
|
|
|
|
Each process has a
|
|
|
|
.IR /proc/[pid]/ns/
|
|
|
|
.\" See commit 6b4e306aa3dc94a0545eb9279475b1ab6209a31f
|
|
|
|
subdirectory containing one entry for each namespace that
|
|
|
|
supports being manipulated by
|
|
|
|
.BR setns (2).
|
|
|
|
|
|
|
|
Bind mounting (see
|
|
|
|
.BR mount (2))
|
|
|
|
one of the files in this directory
|
|
|
|
to somewhere else in the file system keeps
|
|
|
|
the corresponding namespace of the process specified by
|
|
|
|
.I pid
|
|
|
|
alive even if all processes currently in the namespace terminate.
|
|
|
|
|
|
|
|
Opening one of the files in this directory
|
|
|
|
(or a file that is bind mounted to one of these files)
|
|
|
|
returns a file handle for
|
|
|
|
the corresponding namespace of the process specified by
|
|
|
|
.IR pid .
|
|
|
|
As long as this file descriptor remains open,
|
|
|
|
the namespace will remain alive,
|
|
|
|
even if all processes in the namespace terminate.
|
|
|
|
The file descriptor can be passed to
|
|
|
|
.BR setns (2).
|
|
|
|
|
|
|
|
In Linux 3.7 and earlier, these files were visible as hard links.
|
|
|
|
Since Linux 3.8, they appear as symbolic links.
|
|
|
|
If two processes are in the same namespace, then the inode numbers of their
|
|
|
|
.IR /proc/[pid]/ns/xxx
|
|
|
|
symbolic links will be the same; an application can check this using the
|
|
|
|
.I stat.st_ino
|
|
|
|
field returned by
|
|
|
|
.BR stat (2).
|
|
|
|
The content of this symbolic link is a string containing
|
|
|
|
the namespace type and inode number as in the following example:
|
|
|
|
|
|
|
|
.in +4n
|
|
|
|
.nf
|
|
|
|
$ \fBreadlink /proc/$$/ns/uts\fP
|
|
|
|
uts:[4026531838]
|
|
|
|
.fi
|
|
|
|
.in
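.PP
The inode comparison and the link format described above
can be sketched in C as follows.
This is an illustration under stated assumptions:
the function names parse_ns_link and same_namespace are hypothetical,
and comparing the device number in addition to the inode number
is an extra precaution that goes beyond what the text requires.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: parse link text such as "uts:[4026531838]".
   On success, store the namespace type in 'type' and the inode
   number in '*ino' and return 0; return -1 on malformed input. */
int
parse_ns_link(const char *text, char *type, size_t type_size,
              unsigned long long *ino)
{
    const char *colon;
    size_t len;
    char close;

    assert(text != NULL && type != NULL && ino != NULL);

    colon = strchr(text, ':');
    if (colon == NULL)
        return -1;
    len = (size_t) (colon - text);
    if (len == 0 || len >= type_size)
        return -1;
    if (sscanf(colon, ":[%llu%c", ino, &close) != 2 || close != ']')
        return -1;
    memcpy(type, text, len);
    type[len] = '\0';
    return 0;
}

/* Hypothetical helper: return 1 if the two processes are in the same
   namespace of the given type (e.g., "uts"), 0 if they are not,
   or -1 if stat(2) fails (e.g., because of insufficient permission). */
int
same_namespace(pid_t pid1, pid_t pid2, const char *nstype)
{
    char path1[64], path2[64];
    struct stat sb1, sb2;

    snprintf(path1, sizeof(path1), "/proc/%ld/ns/%s", (long) pid1, nstype);
    snprintf(path2, sizeof(path2), "/proc/%ld/ns/%s", (long) pid2, nstype);
    if (stat(path1, &sb1) == -1 || stat(path2, &sb2) == -1)
        return -1;
    return sb1.st_dev == sb2.st_dev && sb1.st_ino == sb2.st_ino;
}
```

The text passed to parse_ns_link would typically be obtained with readlink(2).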
|
|
|
|
|
|
|
|
The files in this subdirectory are as follows:
|
|
|
|
.TP
|
|
|
|
.IR /proc/[pid]/ns/ipc " (since Linux 3.0)"
|
|
|
|
This file is a handle for the IPC namespace of the process.
|
|
|
|
|
|
|
|
.TP
|
|
|
|
.IR /proc/[pid]/ns/mnt " (since Linux 3.8)"
|
|
|
|
This file is a handle for the mount namespace of the process.
|
|
|
|
|
|
|
|
.TP
|
|
|
|
.IR /proc/[pid]/ns/net " (since Linux 3.0)"
|
|
|
|
This file is a handle for the network namespace of the process.
|
|
|
|
|
|
|
|
.TP
|
|
|
|
.IR /proc/[pid]/ns/pid " (since Linux 3.8)"
|
|
|
|
This file is a handle for the PID namespace of the process.
|
|
|
|
|
|
|
|
.TP
|
|
|
|
.IR /proc/[pid]/ns/user " (since Linux 3.8)"
|
|
|
|
This file is a handle for the user namespace of the process.
|
|
|
|
|
|
|
|
.TP
|
|
|
|
.IR /proc/[pid]/ns/uts " (since Linux 3.0)"
|
|
|
|
This file is a handle for the UTS namespace of the process.
|
|
|
|
|
|
|
|
|
|
|
|
.SS IPC namespaces (CLONE_NEWIPC)
|
|
|
|
|
|
|
|
IPC namespaces isolate certain IPC resources,
|
|
|
|
namely, System V IPC objects (see
|
|
|
|
.BR svipc (7))
|
|
|
|
and (since Linux 2.6.30)
|
|
|
|
.\" commit 7eafd7c74c3f2e67c27621b987b28397110d643f
|
|
|
|
.\" https://lwn.net/Articles/312232/
|
|
|
|
POSIX message queues (see
.BR mq_overview (7)).
|
|
|
|
The common characteristic of these IPC mechanisms is that IPC
|
|
|
|
objects are identified by mechanisms other than filesystem
|
|
|
|
pathnames.
|
|
|
|
|
|
|
|
Each IPC namespace has its own set of System V IPC identifiers and
|
|
|
|
its own POSIX message queue file system.
|
|
|
|
Objects created in an IPC namespace are visible to all other processes
|
|
|
|
that are members of that namespace,
|
|
|
|
but are not visible to processes in other IPC namespaces.
|
|
|
|
|
|
|
|
When an IPC namespace is destroyed
|
|
|
|
(i.e., when the last process that is a member of the namespace terminates),
|
|
|
|
all IPC objects in the namespace are automatically destroyed.
|
|
|
|
|
|
|
|
Use of IPC namespaces requires a kernel that is configured with the
|
|
|
|
.B CONFIG_IPC_NS
|
|
|
|
option.
|
|
|
|
|
|
|
|
.SS Network namespaces (CLONE_NEWNET)
|
|
|
|
|
|
|
|
Network namespaces provide isolation of the system resources associated
|
|
|
|
with networking: network devices, IP addresses, IP routing tables,
|
|
|
|
.I /proc/net
|
|
|
|
directory,
|
|
|
|
.I /sys/class/net
|
|
|
|
directory, port numbers, and so on.
|
|
|
|
|
|
|
|
A network namespace provides an isolated view of the networking stack
|
|
|
|
(network device interfaces, IPv4 and IPv6 protocol stacks,
|
|
|
|
IP routing tables, firewall rules, the
|
|
|
|
.I /proc/net
|
|
|
|
and
|
|
|
|
.I /sys/class/net
|
|
|
|
directory trees, sockets, etc.).
|
|
|
|
A physical network device can live in exactly one
|
|
|
|
network namespace.
|
|
|
|
A virtual network device ("veth") pair provides a pipe-like abstraction
|
|
|
|
.\" FIXME Add pointer to veth(4) page when it is eventually completed
|
|
|
|
that can be used to create tunnels between network namespaces,
|
|
|
|
and can be used to create a bridge to a physical network device
|
|
|
|
in another namespace.
|
|
|
|
|
|
|
|
When a network namespace is freed
|
|
|
|
(i.e., when the last process in the namespace terminates),
|
|
|
|
its physical network devices are moved back to the
|
|
|
|
initial network namespace (not to the parent of the process).
|
|
|
|
|
|
|
|
Use of network namespaces requires a kernel that is configured with the
|
|
|
|
.B CONFIG_NET_NS
|
|
|
|
option.
|
|
|
|
|
|
|
|
.SS Mount namespaces (CLONE_NEWNS)
|
|
|
|
|
|
|
|
Mount namespaces isolate the set of file system mount points,
|
|
|
|
meaning that processes in different mount namespaces can
|
|
|
|
have different views of the file system hierarchy.
|
|
|
|
The set of mounts in a mount namespace is modified using
|
|
|
|
.BR mount (2)
|
|
|
|
and
|
|
|
|
.BR umount (2).
|
|
|
|
|
|
|
|
The
|
|
|
|
.IR /proc/[pid]/mounts
|
|
|
|
file (present since Linux 2.4.19)
|
|
|
|
lists all the file systems currently mounted in the
|
|
|
|
process's mount namespace.
|
|
|
|
The format of this file is documented in
|
|
|
|
.BR fstab (5).
|
|
|
|
Since kernel version 2.6.15, this file is pollable:
|
|
|
|
after opening the file for reading, a change in this file
|
|
|
|
(i.e., a file system mount or unmount) causes
|
|
|
|
.BR select (2)
|
|
|
|
to mark the file descriptor as readable, and
|
|
|
|
.BR poll (2)
|
|
|
|
and
|
|
|
|
.BR epoll_wait (2)
|
|
|
|
mark the file as having an error condition.
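.PP
The polling behavior described above might be used as follows.
This is a minimal sketch:
the function names open_self_mounts and mounts_changed are hypothetical,
and the code simply checks for the error or priority condition
that the text says is reported after a mount or unmount.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>

/* Hypothetical helper: open the mounts file of the calling process. */
int
open_self_mounts(void)
{
    return open("/proc/self/mounts", O_RDONLY);
}

/* Hypothetical helper: wait up to 'timeout_ms' milliseconds (-1 means
   wait indefinitely) for a change in the mount table seen through 'fd'.
   Returns 1 if a change was reported, 0 if the timeout expired first,
   or -1 if poll(2) failed. */
int
mounts_changed(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int n;

    assert(timeout_ms >= -1);

    pfd.fd = fd;
    pfd.events = POLLPRI;       /* POLLERR is always reported */
    pfd.revents = 0;

    n = poll(&pfd, 1, timeout_ms);
    if (n == -1)
        return -1;
    return (pfd.revents & (POLLERR | POLLPRI)) != 0;
}
```

A monitoring loop would call mounts_changed with a timeout of -1
and reread the file each time the function returns 1.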
|
|
|
|
|
|
|
|
The
|
|
|
|
.IR /proc/[pid]/mountstats
|
|
|
|
file (present since Linux 2.6.17)
|
|
|
|
exports information (statistics, configuration information)
|
|
|
|
about the mount points in the process's mount namespace.
|
|
|
|
This file is only readable by the owner of the process.
|
|
|
|
Lines in this file have the form:
|
|
|
|
.RS
|
|
|
|
.in 12
|
|
|
|
.nf
|
|
|
|
|
|
|
|
device /dev/sda7 mounted on /home with fstype ext3 [statistics]
(       1      )            ( 2 )             (3 ) (4)
|
|
|
|
.fi
|
|
|
|
.in
|
|
|
|
|
|
|
|
The fields in each line are:
|
|
|
|
.TP 5
|
|
|
|
(1)
|
|
|
|
The name of the mounted device
|
|
|
|
(or "nodevice" if there is no corresponding device).
|
|
|
|
.TP
|
|
|
|
(2)
|
|
|
|
The mount point within the file system tree.
|
|
|
|
.TP
|
|
|
|
(3)
|
|
|
|
The file system type.
|
|
|
|
.TP
|
|
|
|
(4)
|
|
|
|
Optional statistics and configuration information.
|
|
|
|
Currently (as at Linux 2.6.26), only NFS file systems export
|
|
|
|
information via this field.
|
|
|
|
.RE
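.PP
The first three fields of such a line can be extracted with sscanf(3).
The following is a sketch only:
the structure mountstat and the function parse_mountstats_line
are hypothetical names, the buffer sizes are arbitrary assumptions,
and the optional statistics field is ignored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for fields (1) to (3) of a mountstats line. */
struct mountstat {
    char device[256];       /* field (1) */
    char mount_point[256];  /* field (2) */
    char fstype[64];        /* field (3) */
};

/* Hypothetical helper: parse one line of the form shown above.
   Returns 0 on success, or -1 if the line does not match. */
int
parse_mountstats_line(const char *line, struct mountstat *ms)
{
    assert(line != NULL && ms != NULL);

    memset(ms, 0, sizeof(*ms));
    if (sscanf(line, "device %255s mounted on %255s with fstype %63s",
               ms->device, ms->mount_point, ms->fstype) != 3)
        return -1;
    return 0;
}
```

Applied to the example line above, the sketch yields the device /dev/sda7,
the mount point /home, and the file system type ext3.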
|
|
|
|
|
|
|
|
.SS PID namespaces (CLONE_NEWPID)
|
|
|
|
|
|
|
|
PID namespaces isolate the process ID number space,
|
|
|
|
meaning that processes in different PID namespaces can have the same PID.
|
|
|
|
PID namespaces allow containers to migrate to a new host
|
|
|
|
while the processes inside the container maintain the same PIDs.
|
|
|
|
Each PID namespace has its own init (PID 1, see
|
|
|
|
.BR init (1)),
|
|
|
|
the "ancestor of all processes" that
|
|
|
|
manages various system initialization tasks and
|
|
|
|
reaps orphaned child processes when they terminate.
|
|
|
|
|
|
|
|
From the point of view of a particular PID namespace instance,
|
|
|
|
a process has two PIDs: the PID inside the namespace,
|
|
|
|
and the PID outside the namespace on the host system.
|
|
|
|
PID namespaces can be nested:
|
|
|
|
a process will have one PID for each of the layers of the hierarchy
|
|
|
|
starting from the PID namespace in which it resides
|
|
|
|
through to the root PID namespace.
|
|
|
|
A process can see (e.g., send signals with
|
|
|
|
.BR kill (2))
|
|
|
|
only processes contained in its own PID namespace
|
|
|
|
and the namespaces nested below that PID namespace.
|
|
|
|
|
|
|
|
.SS User namespaces (CLONE_NEWUSER)
|
|
|
|
|
|
|
|
User namespaces isolate the user and group ID number spaces.
|
|
|
|
In other words, a process's user and group IDs can be different
|
|
|
|
inside and outside a user namespace.
|
|
|
|
A process can have a normal unprivileged user ID outside a user namespace
|
|
|
|
while at the same time having a user ID of 0 inside the namespace;
|
|
|
|
in other words,
|
|
|
|
the process has full privileges for operations inside the user namespace,
|
|
|
|
but is unprivileged for operations outside the namespace.
|
|
|
|
|
|
|
|
Starting in Linux 3.8, unprivileged processes can create user namespaces.
|
|
|
|
|
|
|
|
The
|
|
|
|
.IR /proc/[pid]/uid_map
|
|
|
|
and
|
|
|
|
.IR /proc/[pid]/gid_map
|
|
|
|
files (available since Linux 3.5)
|
|
|
|
.\" commit 22d917d80e842829d0ca0a561967d728eb1d6303
|
|
|
|
expose the mappings for user and group IDs
|
|
|
|
inside the user namespace for the process
|
|
|
|
.IR pid .
|
|
|
|
The description here explains the details for
|
|
|
|
.IR uid_map ;
|
|
|
|
.IR gid_map
|
|
|
|
is exactly the same,
|
|
|
|
but each instance of "user ID" is replaced by "group ID".
|
|
|
|
|
|
|
|
The
|
|
|
|
.I uid_map
|
|
|
|
file exposes the mapping of user IDs from the user namespace
|
|
|
|
of the process
|
|
|
|
.IR pid
|
|
|
|
to the user namespace of the process that opened
|
|
|
|
.IR uid_map
|
|
|
|
(but see a qualification to this point below).
|
|
|
|
In other words, processes that are in different user namespaces
|
|
|
|
will potentially see different values when reading from a particular
|
|
|
|
.I uid_map
|
|
|
|
file, depending on the user ID mappings for the user namespaces
|
|
|
|
of the reading processes.
|
|
|
|
|
|
|
|
Each line in the file specifies a 1-to-1 mapping of a range of contiguous
user IDs between two user namespaces.
|
|
|
|
The specification in each line takes the form of
|
|
|
|
three numbers delimited by white space.
|
|
|
|
The first two numbers specify the starting user ID in
|
|
|
|
each user namespace.
|
|
|
|
The third number specifies the length of the mapped range.
|
|
|
|
In detail, the fields are interpreted as follows:
|
|
|
|
.IP (1) 4
|
|
|
|
The start of the range of user IDs in
|
|
|
|
the user namespace of the process
|
|
|
|
.IR pid .
|
|
|
|
.IP (2)
|
|
|
|
The start of the range of user
|
|
|
|
IDs to which the user IDs specified by field one map.
|
|
|
|
How field two is interpreted depends on whether the process that opened
|
|
|
|
.I uid_map
|
|
|
|
and the process
|
|
|
|
.IR pid
|
|
|
|
are in the same user namespace, as follows:
|
|
|
|
.RS
|
|
|
|
.IP a) 3
|
|
|
|
If the two processes are in different user namespaces:
|
|
|
|
field two is the start of a range of
|
|
|
|
user IDs in the user namespace of the process that opened
|
|
|
|
.IR uid_map .
|
|
|
|
.IP b)
|
|
|
|
If the two processes are in the same user namespace:
|
|
|
|
field two is the start of the range of
|
|
|
|
user IDs in the parent user namespace of the process
|
|
|
|
.IR pid .
|
|
|
|
(The "parent user namespace"
|
|
|
|
is the user namespace of the process that created a user namespace
|
|
|
|
via a call to
|
|
|
|
.BR unshare (2)
|
|
|
|
or
|
|
|
|
.BR clone (2)
|
|
|
|
with the
|
|
|
|
.BR CLONE_NEWUSER
|
|
|
|
flag.)
|
|
|
|
This case enables the opener of
|
|
|
|
.I uid_map
|
|
|
|
(the common case here is opening
|
|
|
|
.IR /proc/self/uid_map )
|
|
|
|
to see the mapping of user IDs into the user namespace of the process
|
|
|
|
that created this user namespace.
|
|
|
|
.RE
|
|
|
|
.IP (3)
|
|
|
|
The length of the range of user IDs that is mapped between the two
|
|
|
|
user namespaces.
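.PP
The three fields can be used to translate an individual user ID.
The following sketch assumes hypothetical names
(the structure id_range and the functions parse_map_line and map_id)
and translates an ID from field one's namespace
to the namespace addressed by field two.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical representation of one line of a uid_map or gid_map file. */
struct id_range {
    unsigned long inside;   /* field 1: start of range in the namespace
                               of the process pid */
    unsigned long outside;  /* field 2: start of range that field 1
                               maps to */
    unsigned long length;   /* field 3: length of the range */
};

/* Parse one map line; return 0 on success, -1 on malformed input. */
int
parse_map_line(const char *line, struct id_range *r)
{
    assert(line != NULL && r != NULL);

    if (sscanf(line, "%lu %lu %lu",
               &r->inside, &r->outside, &r->length) != 3)
        return -1;
    return 0;
}

/* Translate 'id' using the first range that contains it.
   Returns 0 and stores the result in '*mapped', or returns -1
   if no range covers 'id' (that is, the ID is unmapped). */
int
map_id(const struct id_range *ranges, size_t n, unsigned long id,
       unsigned long *mapped)
{
    size_t i;

    assert(ranges != NULL && mapped != NULL);

    for (i = 0; i < n; i++) {
        if (id >= ranges[i].inside &&
                id - ranges[i].inside < ranges[i].length) {
            *mapped = ranges[i].outside + (id - ranges[i].inside);
            return 0;
        }
    }
    return -1;
}
```

For instance, given the hypothetical line "0 1000 1",
user ID 0 translates to 1000 and every other user ID is unmapped.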
|
|
|
|
.PP
|
|
|
|
After the creation of a new user namespace, the
|
|
|
|
.I uid_map
|
|
|
|
file may be written to exactly once to specify
|
|
|
|
the mapping of user IDs in the new user namespace.
|
|
|
|
(An attempt to write more than once to the file fails with the error
|
|
|
|
.BR EPERM .)
|
|
|
|
|
|
|
|
The lines written to
|
|
|
|
.IR uid_map
|
|
|
|
must conform to the following rules:
|
|
|
|
.IP * 3
|
|
|
|
The three fields must be valid numbers,
|
|
|
|
and the last field must be greater than 0.
|
|
|
|
.IP *
|
|
|
|
Lines are terminated by newline characters.
|
|
|
|
.IP *
|
|
|
|
There is an (arbitrary) limit on the number of lines in the file.
|
|
|
|
As at Linux 3.8, the limit is five lines.
|
|
|
|
.IP *
|
|
|
|
The range of user IDs specified in each line cannot overlap with the ranges
|
|
|
|
in any other lines.
|
|
|
|
In the current implementation (Linux 3.8), this requirement is
|
|
|
|
satisfied by a simplistic implementation that imposes the further
|
|
|
|
requirement that
|
|
|
|
the values in both field 1 and field 2 of successive lines must be
|
|
|
|
in ascending numerical order.
|
|
|
|
.PP
|
|
|
|
Writes that violate the above rules fail with the error
|
|
|
|
.BR EINVAL .
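.PP
An application can check a mapping against the rules above before writing it.
The following is a sketch of one possible check, not the kernel's own logic:
the function name valid_id_map is hypothetical,
the sketch is stricter than the rules in rejecting trailing white space,
it interprets the ordering rule as requiring each range to start
at or after the end of the previous range in both fields,
and it does not guard against arithmetic overflow.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_MAP_LINES 5     /* limit stated above for Linux 3.8 */

/* Hypothetical helper: return 1 if 'text' (the complete data to be
   written to a uid_map or gid_map file) satisfies the rules listed
   above, or 0 if it does not. */
int
valid_id_map(const char *text)
{
    unsigned long in, out, len;
    unsigned long prev_in_end = 0, prev_out_end = 0;
    int nlines = 0;
    const char *p, *q, *nl;

    assert(text != NULL);

    for (p = text; *p != '\0'; p = nl + 1) {
        int consumed = 0;

        nl = strchr(p, '\n');
        if (nl == NULL)                 /* line is not newline-terminated */
            return 0;
        if (++nlines > MAX_MAP_LINES)   /* too many lines */
            return 0;

        /* Only digits and white space may appear within a line. */
        for (q = p; q < nl; q++)
            if (*q != ' ' && *q != '\t' && (*q < '0' || *q > '9'))
                return 0;

        /* Exactly three numbers must end exactly at the newline. */
        if (sscanf(p, "%lu %lu %lu%n", &in, &out, &len, &consumed) != 3
                || p + consumed != nl)
            return 0;
        if (len == 0)                   /* last field must be > 0 */
            return 0;

        /* Ranges must ascend in both fields without overlapping. */
        if (nlines > 1 && (in < prev_in_end || out < prev_out_end))
            return 0;
        prev_in_end = in + len;
        prev_out_end = out + len;
    }
    return nlines > 0;
}
```

A mapping that passes this check can still be rejected by the kernel,
for example because of the capability requirements described below.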
|
|
|
|
|
|
|
|
In order for a process to write to the
|
|
|
|
.I /proc/[pid]/uid_map
|
|
|
|
.RI ( /proc/[pid]/gid_map )
|
|
|
|
file, the following requirements must be met:
|
|
|
|
.IP * 3
|
|
|
|
The process must have the
|
|
|
|
.BR CAP_SETUID
|
|
|
|
.RB ( CAP_SETGID )
|
|
|
|
capability in the user namespace of the process
|
|
|
|
.IR pid .
|
|
|
|
.IP *
|
|
|
|
The process must have the
|
|
|
|
.BR CAP_SETUID
|
|
|
|
.RB ( CAP_SETGID )
|
|
|
|
capability in the parent user namespace.
|
|
|
|
.IP *
|
|
|
|
The process must be in either the user namespace of the process
|
|
|
|
.I pid
|
|
|
|
or inside the parent user namespace of the process
|
|
|
|
.IR pid .
|
|
|
|
|
|
|
|
.SS UTS namespaces (CLONE_NEWUTS)
|
|
|
|
|
|
|
|
UTS namespaces provide isolation of two system identifiers:
|
|
|
|
the hostname and the NIS domain name.
|
|
|
|
These identifiers are set using
|
|
|
|
.BR sethostname (2)
|
|
|
|
and
|
|
|
|
.BR setdomainname (2),
|
|
|
|
and can be retrieved using
|
|
|
|
.BR uname (2),
|
|
|
|
.BR gethostname (2),
|
|
|
|
and
|
|
|
|
.BR getdomainname (2).
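.PP
The two identifiers can be retrieved as shown in the following sketch.
The structure uts_ids and the function get_uts_ids are hypothetical names,
and the buffer sizes are arbitrary assumptions.
The values returned are those of the UTS namespace of the calling process.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/utsname.h>
#include <unistd.h>

/* Hypothetical container for the identifiers isolated by a UTS namespace. */
struct uts_ids {
    char hostname[256];     /* from gethostname(2) */
    char domainname[256];   /* from getdomainname(2) */
    char nodename[256];     /* hostname as reported by uname(2) */
};

/* Fill 'ids' with the identifiers visible to the calling process.
   Returns 0 on success, or -1 if any of the calls fails. */
int
get_uts_ids(struct uts_ids *ids)
{
    struct utsname u;

    assert(ids != NULL);

    /* Zero the buffers so that every string is null-terminated. */
    memset(ids, 0, sizeof(*ids));

    if (gethostname(ids->hostname, sizeof(ids->hostname) - 1) == -1)
        return -1;
    if (getdomainname(ids->domainname, sizeof(ids->domainname) - 1) == -1)
        return -1;
    if (uname(&u) == -1)
        return -1;
    strncpy(ids->nodename, u.nodename, sizeof(ids->nodename) - 1);
    return 0;
}
```

The hostname and nodename members are expected to hold the same string,
since both calls report the hostname of the same UTS namespace.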
|
|
|
|
|
|
|
|
.SH CONFORMING TO
|
|
|
|
Namespaces are a Linux-specific feature.
|
|
|
|
.SH SEE ALSO
|
|
|
|
.BR readlink (1),
|
|
|
|
.BR clone (2),
|
|
|
|
.BR setns (2),
|
|
|
|
.BR unshare (2),
|
|
|
|
.BR proc (5),
|
|
|
|
.BR credentials (7),
|
|
|
|
.BR capabilities (7)
|