namespaces.7: Remove cgroup namespaces content to a separate page

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk 2016-05-06 16:08:33 +02:00
parent 434aadd5d3
commit a2ee61a38a
1 changed files with 3 additions and 169 deletions


@@ -193,175 +193,8 @@ This file is a handle for the UTS namespace of the process.
.\" ==================== Cgroup namespaces ====================
.\"
.SS Cgroup namespaces (CLONE_NEWCGROUP)
Cgroup namespaces virtualize the view of a process's cgroups (see
.BR cgroups (7))
as seen via
.IR /proc/[pid]/cgroup
and
.IR /proc/[pid]/mountinfo .
Each cgroup namespace has its own set of cgroup root directories,
which are the base points for the relative locations displayed in
.IR /proc/[pid]/cgroup .
When a process creates a new cgroup namespace using
.BR clone (2)
or
.BR unshare (2)
with the
.BR CLONE_NEWCGROUP
flag, then its current cgroups directories become its cgroup root directories.
(This applies both for the cgroups version 1 hierarchies
and the cgroups version 2 unified hierarchy.)
When viewing
.IR /proc/[pid]/cgroup ,
the pathname shown in the third field of each record will be
relative to the reading process's cgroup root directory.
If the cgroup directory of the target process lies outside
the root directory of the reading process's cgroup namespace,
then the pathname will show
.I ../
entries for each ancestor level in the cgroup hierarchy.
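The relative-pathname rule described above can be modeled in a few lines.
This is only a minimal sketch, not the kernel's implementation: the function
name cgroup_path_as_seen is hypothetical, and it assumes that both arguments
are absolute cgroup paths within the same hierarchy.

```python
import posixpath


def cgroup_path_as_seen(target_cgroup, reader_root):
    """Model the third field of /proc/[pid]/cgroup as seen by a reader.

    Hypothetical helper: both arguments are absolute cgroup paths in the
    same hierarchy; reader_root is the root of the reader's cgroup namespace.
    """
    rel = posixpath.relpath(target_cgroup, start=reader_root)
    # relpath() returns "." when the two paths are identical; the namespace
    # root itself is displayed as "/".
    return "/" if rel == "." else "/" + rel


# Paths taken from the shell session in this section: the reader's root is /sub.
print(cgroup_path_as_seen("/sub", "/sub"))    # target is the reader's own root
print(cgroup_path_as_seen("/", "/sub"))       # target is one level above the root
print(cgroup_path_as_seen("/sub2", "/sub"))   # target is in a sibling cgroup
```

Each level that the target lies above the reader's root contributes one ".."
component, which is the behavior the text describes.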
The following shell session demonstrates the effect of creating
a new cgroup namespace.
First (as superuser), we create a child cgroup in the
.I freezer
hierarchy, and put the shell into that cgroup:
.nf
.in +4n
# \fBmkdir \-p /sys/fs/cgroup/freezer/sub\fP
# \fBecho $$\fP # Show PID of this shell
30655
# \fBsh \-c 'echo 30655 > /sys/fs/cgroup/freezer/sub/cgroup.procs'\fP
# \fBcat /proc/self/cgroup | grep freezer\fP
7:freezer:/sub
.in
.fi
Next, we use
.BR unshare (1)
to create a process running a new shell in new cgroup and mount namespaces:
.nf
.in +4n
# \fBunshare \-Cm bash\fP
.in
.fi
We then inspect the
.IR /proc/[pid]/cgroup
files of, respectively, the new shell process started by the
.BR unshare (1)
command, a process that is in the original cgroup namespace
.RI ( init ,
with PID 1), and a process in a sibling cgroup:
.nf
.in +4n
$ \fBcat /proc/self/cgroup | grep freezer\fP
7:freezer:/
$ \fBcat /proc/1/cgroup | grep freezer\fP
7:freezer:/..
$ \fBcat /proc/20124/cgroup | grep freezer\fP
7:freezer:/../sub2
.in
.fi
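Each record in these files has three colon-separated fields: hierarchy ID,
controller list, and cgroup pathname (see cgroups(7)).
As a rough sketch, a record can be split as follows; the function name
parse_cgroup_record is hypothetical, and the sample strings are the records
shown above.

```python
def parse_cgroup_record(line):
    """Split one /proc/[pid]/cgroup record into its three fields."""
    # Split at most twice so that a pathname containing ':' stays intact.
    hierarchy_id, controllers, path = line.rstrip("\n").split(":", 2)
    # The controller list is comma-separated; it is empty for the
    # cgroups version 2 unified hierarchy.
    controller_list = controllers.split(",") if controllers else []
    return int(hierarchy_id), controller_list, path


for record in ("7:freezer:/", "7:freezer:/..", "7:freezer:/../sub2"):
    print(parse_cgroup_record(record))
```

The third element of each result is the pathname that is interpreted relative
to the reading process's cgroup root directory.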
However, when we look in
.IR /proc/self/mountinfo
we see the following anomaly:
.nf
.in +4n
# \fBcat /proc/self/mountinfo | grep freezer\fP
155 145 0:32 /.. /sys/fs/cgroup/freezer ...
.in
.fi
The fourth field of this line should show the
directory in the cgroup filesystem which forms the root of this mount.
Since by the definition of cgroup namespaces, the process's current
freezer cgroup directory became its root freezer cgroup directory,
we should see \(aq/\(aq in this field.
The problem here is that we are seeing a mount entry for the cgroup
filesystem corresponding to our initial shell process's cgroup namespace
(whose cgroup filesystem is indeed rooted in the parent directory of
.IR sub ).
We need to remount the freezer cgroup filesystem
inside this cgroup namespace, after which we see the expected results:
.nf
.in +4n
# mount \-\-make\-rprivate /  # Don't propagate mount events
                           # to other namespaces
# umount /sys/fs/cgroup/freezer
# mount \-t cgroup \-o freezer freezer /sys/fs/cgroup/freezer
# cat /proc/self/mountinfo | grep freezer
155 145 0:32 / /sys/fs/cgroup/freezer rw,relatime ...
.in
.fi
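To check the mount root programmatically, the fourth and fifth fields of a
mountinfo record (the root and the mount point; see proc(5)) can be extracted.
The following is a minimal sketch: mount_root_and_point is a hypothetical
name, and the sample lines are the truncated records shown in this section.

```python
def mount_root_and_point(mountinfo_line):
    """Return (root, mount point) from one /proc/[pid]/mountinfo record."""
    # Fields are separated by spaces; spaces inside pathnames are
    # octal-escaped in this file, so a plain whitespace split is enough.
    fields = mountinfo_line.split()
    return fields[3], fields[4]


before_remount = "155 145 0:32 /.. /sys/fs/cgroup/freezer ..."
after_remount = "155 145 0:32 / /sys/fs/cgroup/freezer rw,relatime ..."
print(mount_root_and_point(before_remount))
print(mount_root_and_point(after_remount))
```

A root other than "/" for a cgroup mount suggests that the mount was inherited
from a different cgroup namespace, as in the anomaly described above.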
Use of cgroup namespaces requires a kernel that is configured with the
.B CONFIG_CGROUPS
option.
Among the purposes served by the
virtualization provided by cgroup namespaces are the following:
.IP * 2
It prevents information leaks whereby cgroup directory paths outside of
a container would otherwise be visible to processes in the container.
Such leakages could, for example,
reveal information about the container framework
to containerized applications.
.IP *
It allows easier and more flexible
confinement of container root tasks, because they can mount
their own cgroup filesystems without gaining access to ancestor
cgroup directories.
Consider, for example, the following scenario:
.RS 4
.IP \(bu 2
We have a cgroup directory,
.IR /cg/1 ,
that is owned by user ID 9000.
.IP \(bu
We have a process,
.IR X ,
also owned by user ID 9000,
that is namespaced under the cgroup
.IR /cg/1/2
(i.e.,
.I X
was placed in a new cgroup namespace via
.BR clone (2)
or
.BR unshare (2)
with the
.BR CLONE_NEWCGROUP
flag).
.RE
.IP
In the absence of cgroup namespacing, because the cgroup directory
.IR /cg/1
is owned (and writable) by UID 9000 and process X is also owned
by user ID 9000, process X would be able to modify the contents
of cgroup files (i.e., change cgroup settings) not only in
.IR /cg/1/2
but also in the ancestor cgroup directory
.IR /cg/1 .
Namespacing process
.IR X
under the cgroup directory
.IR /cg/1/2
prevents it modifying files in
.IR /cg/1 ,
since it cannot even see the contents of that directory
(or of further removed cgroup ancestor directories).
Combined with correct enforcement of hierarchical limits,
this prevents process X from escaping the limits imposed
by ancestor cgroups.
See
.BR cgroup_namespaces (7).
.\"
.\" ==================== IPC namespaces ====================
.\"
@@ -549,6 +382,7 @@ See
.BR unshare (2),
.BR proc (5),
.BR capabilities (7),
.BR cgroup_namespaces (7),
.BR cgroups (7),
.BR credentials (7),
.BR pid_namespaces (7),