namespaces.7: Rework discussion of cgroup namespaces

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk 2016-05-06 15:01:11 +02:00
parent 99ef85aba8
commit 8079aefa6f
1 changed files with 68 additions and 29 deletions


@@ -193,10 +193,10 @@ This file is a handle for the UTS namespace of the process.
.\" ==================== Cgroup namespaces ====================
.\"
.SS Cgroup namespaces (CLONE_NEWCGROUP)
Cgroup namespaces virtualize the view of a process's cgroups as seen via
.IR /proc/[pid]/cgroup
(see
.BR cgroups (7)).
Cgroup namespaces virtualize the view of a process's cgroups (see
.BR cgroups (7))
as seen via
.IR /proc/[pid]/cgroup .
Each cgroup namespace has its own set of cgroup root directories,
which are the base points for the relative locations displayed in
@@ -209,7 +209,7 @@ with the
.BR CLONE_NEWCGROUP
flag, then its current cgroups directories become its cgroup root directories.
(This applies both for the cgroups version 1 hierarchies
as well as the cgroups version 2 unified hierarchy.)
and the cgroups version 2 unified hierarchy.)
When viewing
.IR /proc/[pid]/cgroup ,
@@ -223,28 +223,28 @@ entries for each ancestor level in the cgroup hierarchy.
The following shell session demonstrates the effect of creating
a new cgroup namespace.
First, we create child cgroup in the
First, (as superuser) we create a child cgroup in the
.I freezer
hierarchy, and put the shell into that cgroup:
.nf
.in +4n
$ \fBsudo mkdir \-p /sys/fs/cgroup/freezer/sub\fP
$ \fBecho $$\fP # Show PID of this shell
# \fBmkdir \-p /sys/fs/cgroup/freezer/sub\fP
# \fBecho $$\fP # Show PID of this shell
30655
$ \fBsudo sh \-c 'echo 30655 > /sys/fs/cgroup/sub'\fP
$ \fBcat /proc/self/cgroup | grep freezer\fP
# \fBsh \-c 'echo 30655 > /sys/fs/cgroup/freezer/sub/cgroup.procs'\fP
# \fBcat /proc/self/cgroup | grep freezer\fP
7:freezer:/sub
.in
.fi
Next, we use
.BR unshare (1)
to create a process running a shell in new user and cgroup namespaces:
to create a process running a shell in a new cgroup namespace:
.nf
.in +4n
$ \fBunshare -U -C bash\fP
# \fBunshare \-C bash\fP
.in
.fi
@@ -267,26 +267,65 @@ $ \fBcat /proc/20124/cgroup | grep freezer\fP
.in
.fi
The virtualization provided by cgroup namespaces serves at least two purposes.
First, it can be used to prevent
information leaks whereby cgroup directory paths outside of
a container would otherwise be visible to processes in the container.
More importantly, this allows easier and more flexible
confinement of container root tasks, because they can mount
their own cgroup filesystems without needing to gain access to ancestor
cgroup directories.
So, for example, even if
.I /cg/1
is owned by uid 100000, a task namespaced under
.I /cg/1/2
owned by UID 100000 can mount that cgroup but not change settings in
.IR /cg/1 .
Combined with correct enforcement of hierarchical limits,
this prevents that task from escaping its limits.
Use of cgroup namespaces requires a kernel that is configured with the
.B CONFIG_CGROUPS
option.
Among the purposes served by the
virtualization provided by cgroup namespaces are the following:
.IP * 2
It prevents information leaks whereby cgroup directory paths outside of
a container would otherwise be visible to processes in the container.
Such leaks could, for example,
reveal information about the container framework
to containerized applications.
.IP *
It allows easier and more flexible
confinement of container root tasks, because they can mount
their own cgroup filesystems without gaining access to ancestor
cgroup directories.
Consider, for example, the following scenario:
.RS 4
.IP \(bu 2
We have a cgroup directory,
.IR /cg/1 ,
that is owned by user ID 9000.
.IP \(bu
We have a process,
.IR X ,
also owned by user ID 9000,
that is namespaced under the cgroup
.IR /cg/1/2
(i.e.,
.I X
was placed in a new cgroup namespace via
.BR clone (2)
or
.BR unshare (2)
with the
.BR CLONE_NEWCGROUP
flag).
.RE
.IP
In the absence of cgroup namespacing, because the cgroup directory
.IR /cg/1
is owned (and writable) by UID 9000 and process X is also owned
by user ID 9000, process X would be able to modify the contents
of cgroup files (i.e., change cgroup settings) not only in
.IR /cg/1/2
but also in the ancestor cgroup directory
.IR /cg/1 .
Namespacing process
.IR X
under the cgroup directory
.IR /cg/1/2
prevents it from modifying files in
.IR /cg/1 ,
since it cannot even see the contents of that directory
(or of further removed cgroup ancestor directories).
Combined with correct enforcement of hierarchical limits,
this prevents process X from escaping the limits imposed
by ancestor cgroups.
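
As a rough illustration of the virtualization discussed in this change, the following Python sketch parses one /proc/[pid]/cgroup record and computes how a cgroup path would be displayed relative to a cgroup namespace root. The helper names parse_cgroup_line and path_seen_from are hypothetical; they are not part of the man page or of any kernel interface. The sketch models only the path arithmetic and does not create a namespace.

```python
import posixpath


def parse_cgroup_line(line):
    """Split one /proc/[pid]/cgroup record into (hierarchy_id, controllers, path).

    Records have the form "hierarchy-ID:controller-list:cgroup-path".
    The controller list is empty for the cgroups version 2 unified hierarchy.
    """
    hierarchy_id, controller_field, path = line.rstrip("\n").split(":", 2)
    controllers = controller_field.split(",") if controller_field else []
    return int(hierarchy_id), controllers, path


def path_seen_from(ns_root, cgroup_path):
    """Return cgroup_path as it would be displayed relative to ns_root.

    Assumption: both arguments are absolute paths within the same hierarchy,
    expressed relative to that hierarchy's real root. A target outside the
    namespace root is shown with one ".." per ancestor level.
    """
    rel = posixpath.relpath(cgroup_path, start=ns_root)
    return "/" if rel == "." else "/" + rel


if __name__ == "__main__":
    # Sample record taken from the shell session in the man page text.
    _, controllers, path = parse_cgroup_line("7:freezer:/sub")
    print(controllers, path)            # ['freezer'] /sub
    print(path_seen_from("/sub", path)) # /
    print(path_seen_from("/sub", "/"))  # /..
```

In this model, a process whose cgroup is the namespace root sees "/", while a process in the hierarchy's real root, viewed from a namespace rooted at /sub, is shown as "/..".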
.\"
.\" ==================== IPC namespaces ====================
.\"