mirror of https://github.com/mkerrisk/man-pages
namespaces.7: Remove cgroup namespaces content to a separate page
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
parent
434aadd5d3
commit
a2ee61a38a
|
@ -193,175 +193,8 @@ This file is a handle for the UTS namespace of the process.
|
|||
.\" ==================== Cgroup namespaces ====================
|
||||
.\"
|
||||
.SS Cgroup namespaces (CLONE_NEWCGROUP)
|
||||
Cgroup namespaces virtualize the view of a process's cgroups (see
|
||||
.BR cgroups (7))
|
||||
as seen via
|
||||
.IR /proc/[pid]/cgroup
|
||||
and
|
||||
.IR /proc/[pid]/mountinfo .
|
||||
|
||||
Each cgroup namespace has its own set of cgroup root directories,
|
||||
which are the base points for the relative locations displayed in
|
||||
.IR /proc/[pid]/cgroup .
|
||||
When a process creates a new cgroup namespace using
|
||||
.BR clone (2)
|
||||
or
|
||||
.BR unshare (2)
|
||||
with the
|
||||
.BR CLONE_NEWCGROUP
|
||||
flag, then its current cgroups directories become its cgroup root directories.
|
||||
(This applies both for the cgroups version 1 hierarchies
|
||||
and the cgroups version 2 unified hierarchy.)
|
||||
|
||||
When viewing
|
||||
.IR /proc/[pid]/cgroup ,
|
||||
the pathname shown in the third field of each record will be
|
||||
relative to the reading process's cgroup root directory.
|
||||
If the cgroup directory of the target process lies outside
|
||||
the root directory of the reading process's cgroup namespace,
|
||||
then the pathname will show
|
||||
.I ../
|
||||
entries for each ancestor level in the cgroup hierarchy.
|
||||
|
||||
The following shell session demonstrates the effect of creating
|
||||
a new cgroup namespace.
|
||||
First, (as superuser) we create a child cgroup in the
|
||||
.I freezer
|
||||
hierarchy, and put the shell into that cgroup:
|
||||
|
||||
.nf
|
||||
.in +4n
|
||||
# \fBmkdir \-p /sys/fs/cgroup/freezer/sub\fP
|
||||
# \fBecho $$\fP # Show PID of this shell
|
||||
30655
|
||||
# \fBsh \-c 'echo 30655 > /sys/fs/cgroup/freezer/sub/cgroup.procs'\fP
|
||||
# \fBcat /proc/self/cgroup | grep freezer\fP
|
||||
7:freezer:/sub
|
||||
.in
|
||||
.fi
|
||||
|
||||
Next, we use
|
||||
.BR unshare (1)
|
||||
to create a process running a new shell in new cgroup and mount namespaces:
|
||||
|
||||
.nf
|
||||
.in +4n
|
||||
# \fBunshare \-Cm bash\fP
|
||||
.in
|
||||
.fi
|
||||
|
||||
We then inspect the
|
||||
.IR /proc/[pid]/cgroup
|
||||
files of, respectively, the new shell process started by the
|
||||
.BR unshare (1)
|
||||
command, a process that is in the original cgroup namespace
|
||||
.RI ( init ,
|
||||
with PID 1), and a process in a sibling cgroup:
|
||||
|
||||
.nf
|
||||
.in +4n
|
||||
$ \fBcat /proc/self/cgroup | grep freezer\fP
|
||||
7:freezer:/
|
||||
$ \fBcat /proc/1/cgroup | grep freezer\fP
|
||||
7:freezer:/..
|
||||
$ \fBcat /proc/20124/cgroup | grep freezer\fP
|
||||
7:freezer:/../sub2
|
||||
.in
|
||||
.fi
|
||||
|
||||
However, when we look in
|
||||
.IR /proc/self/mountinfo
|
||||
we see the following anomaly:
|
||||
|
||||
.nf
|
||||
.in +4n
|
||||
# \fBcat /proc/self/mountinfo | grep freezer\fP
|
||||
155 145 0:32 /.. /sys/fs/cgroup/freezer ...
|
||||
.in
|
||||
.fi
|
||||
|
||||
The fourth field of this file should show the
|
||||
directory in the cgroup filesystem which forms the root of this mount.
|
||||
Since, by the definition of cgroup namespaces, the process's current
|
||||
freezer cgroup directory became its root freezer cgroup directory,
|
||||
we should see \(aq/\(aq in this field.
|
||||
The problem here is that we are seeing a mount entry for the cgroup
|
||||
filesystem corresponding to our initial shell process's cgroup namespace
|
||||
(whose cgroup filesystem is indeed rooted in the parent directory of
|
||||
.IR sub ).
|
||||
We need to remount the freezer cgroup filesystem
|
||||
inside this cgroup namespace, after which we see the expected results:
|
||||
|
||||
.nf
|
||||
.in +4n
|
||||
# mount \-\-make\-rprivate /     # Don't propagate mount events
|
||||
# to other namespaces
|
||||
# umount /sys/fs/cgroup/freezer
|
||||
# mount \-t cgroup \-o freezer freezer /sys/fs/cgroup/freezer
|
||||
# cat /proc/self/mountinfo | grep freezer
|
||||
155 145 0:32 / /sys/fs/cgroup/freezer rw,relatime ...
|
||||
.in
|
||||
.fi
|
||||
|
||||
Use of cgroup namespaces requires a kernel that is configured with the
|
||||
.B CONFIG_CGROUPS
|
||||
option.
|
||||
|
||||
Among the purposes served by the
|
||||
virtualization provided by cgroup namespaces are the following:
|
||||
.IP * 2
|
||||
It prevents information leaks whereby cgroup directory paths outside of
|
||||
a container would otherwise be visible to processes in the container.
|
||||
Such leakages could, for example,
|
||||
reveal information about the container framework
|
||||
to containerized applications.
|
||||
.IP *
|
||||
It allows easier and more flexible
|
||||
confinement of container root tasks, because they can mount
|
||||
their own cgroup filesystems without gaining access to ancestor
|
||||
cgroup directories.
|
||||
Consider, for example, the following scenario:
|
||||
.RS 4
|
||||
.IP \(bu 2
|
||||
We have a cgroup directory,
|
||||
.IR /cg/1 ,
|
||||
that is owned by user ID 9000.
|
||||
.IP \(bu
|
||||
We have a process,
|
||||
.IR X ,
|
||||
also owned by user ID 9000,
|
||||
that is namespaced under the cgroup
|
||||
.IR /cg/1/2
|
||||
(i.e.,
|
||||
.I X
|
||||
was placed in a new cgroup namespace via
|
||||
.BR clone (2)
|
||||
or
|
||||
.BR unshare (2)
|
||||
with the
|
||||
.BR CLONE_NEWCGROUP
|
||||
flag).
|
||||
.RE
|
||||
.IP
|
||||
In the absence of cgroup namespacing, because the cgroup directory
|
||||
.IR /cg/1
|
||||
is owned (and writable) by UID 9000 and process X is also owned
|
||||
by user ID 9000, then process X would be able to modify the contents
|
||||
of cgroups files (i.e., change cgroup settings) not only in
|
||||
.IR /cg/1/2
|
||||
but also in the ancestor cgroup directory
|
||||
.IR /cg/1 .
|
||||
Namespacing process
|
||||
.IR X
|
||||
under the cgroup directory
|
||||
.IR /cg/1/2
|
||||
prevents it modifying files in
|
||||
.IR /cg/1 ,
|
||||
since it cannot even see the contents of that directory
|
||||
(or of further removed cgroup ancestor directories).
|
||||
Combined with correct enforcement of hierarchical limits,
|
||||
this prevents process X from escaping the limits imposed
|
||||
by ancestor cgroups.
|
||||
See
|
||||
.BR cgroup_namespaces (7).
|
||||
.\"
|
||||
.\" ==================== IPC namespaces ====================
|
||||
.\"
|
||||
|
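The pathname rule described in the moved text (the third field of /proc/[pid]/cgroup is shown relative to the reading process's cgroup root directory, with ../ entries for ancestors) can be modelled in a few lines. The following is only an illustrative sketch, not the kernel's implementation; the function name cgroup_path_as_seen and its parameters are hypothetical and appear nowhere in the man page.

```python
import posixpath


def cgroup_path_as_seen(target_cgroup: str, reader_root: str) -> str:
    """Model the third field of /proc/[pid]/cgroup.

    target_cgroup: absolute cgroup path of the target process, as seen
                   from the initial cgroup namespace (assumed input).
    reader_root:   absolute cgroup root directory of the reading
                   process's cgroup namespace (assumed input).
    """
    rel = posixpath.relpath(target_cgroup, reader_root)
    # relpath() yields "." when both paths name the same directory;
    # the session in the text shows "/" in that case.
    if rel == ".":
        return "/"
    # Otherwise prefix "/" so the result has the form "/..", "/../sub2".
    return "/" + rel


if __name__ == "__main__":
    # Reader's namespace is rooted at /sub, as in the shell session.
    for target in ("/sub", "/", "/sub2"):
        print(target, "->", cgroup_path_as_seen(target, "/sub"))
```

With a reader rooted at /sub, the three targets correspond to the new shell, init, and the process in the sibling cgroup from the session in the text.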
@ -549,6 +382,7 @@ See
|
|||
.BR unshare (2),
|
||||
.BR proc (5),
|
||||
.BR capabilities (7),
|
||||
.BR cgroup_namespaces (7),
|
||||
.BR cgroups (7),
|
||||
.BR credentials (7),
|
||||
.BR pid_namespaces (7),
|
||||
|
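The mountinfo check discussed in the moved text (the fourth field should read / once the freezer hierarchy has been remounted inside the new cgroup namespace) can be scripted. This is a minimal sketch that assumes whitespace-separated records with the fields in the order shown in the session; the helper names mount_root_and_point and is_rooted_at_namespace_root are hypothetical.

```python
from typing import Tuple


def mount_root_and_point(mountinfo_line: str) -> Tuple[str, str]:
    """Return (root, mount point): the fourth and fifth fields of a
    /proc/[pid]/mountinfo record."""
    fields = mountinfo_line.split()
    return fields[3], fields[4]


def is_rooted_at_namespace_root(mountinfo_line: str) -> bool:
    """True if the mount's root field is "/", which is what the text
    expects to see after remounting inside the cgroup namespace."""
    root, _ = mount_root_and_point(mountinfo_line)
    return root == "/"


if __name__ == "__main__":
    # Sample records copied from the session in the text.
    before = "155 145 0:32 /.. /sys/fs/cgroup/freezer ..."
    after = "155 145 0:32 / /sys/fs/cgroup/freezer rw,relatime ..."
    print(mount_root_and_point(before), is_rooted_at_namespace_root(before))
    print(mount_root_and_point(after), is_rooted_at_namespace_root(after))
```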
|