mirror of https://github.com/mkerrisk/man-pages
232 lines
7.0 KiB
Groff
.\" Copyright (c) 2016 by Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.\"
.TH CGROUP_NAMESPACES 7 2016-07-17 "Linux" "Linux Programmer's Manual"
.SH NAME
cgroup_namespaces \- overview of Linux cgroup namespaces
.SH DESCRIPTION
For an overview of namespaces, see
.BR namespaces (7).

Cgroup namespaces virtualize the view of a process's cgroups (see
.BR cgroups (7))
as seen via
.IR /proc/[pid]/cgroup
and
.IR /proc/[pid]/mountinfo .

Each cgroup namespace has its own set of cgroup root directories,
which are the base points for the relative locations displayed in
.IR /proc/[pid]/cgroup .
When a process creates a new cgroup namespace using
.BR clone (2)
or
.BR unshare (2)
with the
.BR CLONE_NEWCGROUP
flag, it enters a new cgroup namespace in which its current
cgroup directories become the cgroup root directories
of the new namespace.
(This applies to both the cgroups version 1 hierarchies
and the cgroups version 2 unified hierarchy.)

When viewing
.IR /proc/[pid]/cgroup ,
the pathname shown in the third field of each record will be
relative to the reading process's cgroup root directory.
If the cgroup directory of the target process lies outside
the root directory of the reading process's cgroup namespace,
then the pathname will show
.I ../
entries for each ancestor level in the cgroup hierarchy.
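.PP
The rule above can be sketched in shell.
This is an illustration under stated assumptions,
not the kernel's implementation:
the function name
.I ns_relative_path
and its sample arguments are hypothetical.
Given the cgroup root directory of the reader's namespace and the
cgroup directory of a target process
(both as absolute paths within one hierarchy),
it prints the pathname that the reader would be expected to see.
.PP
.nf
.in +4n
```shell
# ns_relative_path ROOT TARGET   (hypothetical helper)
# ROOT:   cgroup root directory of the reader's cgroup namespace
# TARGET: cgroup directory of the target process
# Both are absolute paths in the same hierarchy.
# The body runs in a subshell, so its variables do not leak out.
ns_relative_path() (
    root=${1%/}        # "/" becomes the empty string
    target=${2%/}
    up=
    # Climb from ROOT until it is an ancestor of (or equal to) TARGET,
    # recording one "/.." per level climbed.
    while :; do
        case $target/ in
            "$root"/*) break ;;
        esac
        root=${root%/*}
        up=$up/..
    done
    result=$up${target#"$root"}
    printf '%s' "${result:-/}"
    echo
)

ns_relative_path /sub /sub       # target is the namespace root
ns_relative_path /sub /          # target is the parent of the root
ns_relative_path /sub /sub2      # target is in a sibling cgroup
```
.in
.fi
.PP
With these sample arguments the three calls print
.IR / ,
.IR /..
and
.IR /../sub2 ,
the same pathnames that appear in the shell session in this page.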
.PP
The following shell session demonstrates the effect of creating
a new cgroup namespace.
First (as superuser), we create a child cgroup in the
.I freezer
hierarchy, and put the shell into that cgroup:

.nf
.in +4n
# \fBmkdir \-p /sys/fs/cgroup/freezer/sub\fP
# \fBecho $$\fP                      # Show PID of this shell
30655
# \fBsh \-c 'echo 30655 > /sys/fs/cgroup/freezer/sub/cgroup.procs'\fP
# \fBcat /proc/self/cgroup | grep freezer\fP
7:freezer:/sub
.in
.fi

Next, we use
.BR unshare (1)
to create a process running a new shell in new cgroup and mount namespaces:

.nf
.in +4n
# \fBunshare \-Cm bash\fP
.in
.fi

We then inspect the
.IR /proc/[pid]/cgroup
files of, respectively, the new shell process started by the
.BR unshare (1)
command, a process that is in the original cgroup namespace
.RI ( init ,
with PID 1), and a process in a sibling cgroup
.RI ( sub2 ):

.nf
.in +4n
# \fBcat /proc/self/cgroup | grep freezer\fP
7:freezer:/
# \fBcat /proc/1/cgroup | grep freezer\fP
7:freezer:/..
# \fBcat /proc/20124/cgroup | grep freezer\fP
7:freezer:/../sub2
.in
.fi

However, when we look in
.IR /proc/self/mountinfo
we see the following anomaly:

.nf
.in +4n
# \fBcat /proc/self/mountinfo | grep freezer\fP
155 145 0:32 /.. /sys/fs/cgroup/freezer ...
.in
.fi

The fourth field of this file should show the
directory in the cgroup filesystem which forms the root of this mount.
Since, by the definition of cgroup namespaces, the process's current
freezer cgroup directory became its root freezer cgroup directory,
we should see \(aq/\(aq in this field.
The problem here is that we are seeing a mount entry for the cgroup
filesystem corresponding to our initial shell process's cgroup namespace
(whose cgroup filesystem is indeed rooted in the parent directory of
.IR sub ).
We need to remount the freezer cgroup filesystem
inside this cgroup namespace, after which we see the expected results:

.nf
.in +4n
# \fBmount \-\-make\-rslave /\fP     # Don't propagate mount events
                            # to other namespaces
# \fBumount /sys/fs/cgroup/freezer\fP
# \fBmount \-t cgroup \-o freezer freezer /sys/fs/cgroup/freezer\fP
# \fBcat /proc/self/mountinfo | grep freezer\fP
155 145 0:32 / /sys/fs/cgroup/freezer rw,relatime ...
.in
.fi
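.PP
The check made by eye in the sessions above can also be sketched in shell.
This is an illustration only:
the function name
.I mountinfo_root
is hypothetical, and the two sample records are abbreviated copies of
the output shown above.
The function prints the fourth field of one
.IR /proc/[pid]/mountinfo
record, that is, the root of the mount within its filesystem.
.PP
.nf
.in +4n
```shell
# mountinfo_root RECORD   (hypothetical helper)
# Print the fourth whitespace-separated field of one mountinfo
# record: the root of the mount within its filesystem.
mountinfo_root() {
    printf '%s' "$1" | awk '{ print $4 }'
}

# Sample records modelled on the sessions above; "..." stands for
# the fields that the sessions omit.
before='155 145 0:32 /.. /sys/fs/cgroup/freezer ...'
after='155 145 0:32 / /sys/fs/cgroup/freezer rw,relatime ...'

mountinfo_root "$before"    # mount made in the original namespace
mountinfo_root "$after"     # mount made inside the new namespace
```
.in
.fi
.PP
For these two records the calls print
.IR /..
and
.IR / ,
respectively; only the second is rooted at the
cgroup root directory of the new namespace.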
.PP
Use of cgroup namespaces requires a kernel that is configured with the
.B CONFIG_CGROUPS
option.
.\"
.SH CONFORMING TO
Namespaces are a Linux-specific feature.
.SH NOTES
Among the purposes served by the
virtualization provided by cgroup namespaces are the following:
.IP * 2
It prevents information leaks whereby cgroup directory paths outside of
a container would otherwise be visible to processes in the container.
Such leakages could, for example,
reveal information about the container framework
to containerized applications.
.IP *
It eases tasks such as container migration.
The virtualization provided by cgroup namespaces
allows containers to be isolated from knowledge of
the pathnames of ancestor cgroups.
Without such isolation, the full cgroup pathnames (displayed in
.IR /proc/self/cgroup )
would need to be replicated on the target system when migrating a container;
those pathnames would also need to be unique,
so that they don't conflict with other pathnames on the target system.
.IP *
It allows better confinement of containerized processes,
because it is possible to mount the container's cgroup filesystems such that
the container processes can't gain access to ancestor cgroup directories.
Consider, for example, the following scenario:
.RS 4
.IP \(bu 2
We have a cgroup directory,
.IR /cg/1 ,
that is owned by user ID 9000.
.IP \(bu
We have a process,
.IR X ,
also owned by user ID 9000,
that is namespaced under the cgroup
.IR /cg/1/2
(i.e.,
.I X
was placed in a new cgroup namespace via
.BR clone (2)
or
.BR unshare (2)
with the
.BR CLONE_NEWCGROUP
flag).
.RE
.IP
In the absence of cgroup namespacing, because the cgroup directory
.IR /cg/1
is owned (and writable) by user ID 9000 and process
.I X
is also owned by user ID 9000, process
.I X
would be able to modify the contents of cgroups files
(i.e., change cgroup settings) not only in
.IR /cg/1/2
but also in the ancestor cgroup directory
.IR /cg/1 .
Namespacing process
.IR X
under the cgroup directory
.IR /cg/1/2 ,
in combination with suitable mount operations
for the cgroup filesystem (as shown above),
prevents it from modifying files in
.IR /cg/1 ,
since it cannot even see the contents of that directory
(or of further removed cgroup ancestor directories).
Combined with correct enforcement of hierarchical limits,
this prevents process
.I X
from escaping the limits imposed by ancestor cgroups.
.SH SEE ALSO
.BR unshare (1),
.BR clone (2),
.BR setns (2),
.BR unshare (2),
.BR proc (5),
.BR cgroups (7),
.BR credentials (7),
.BR namespaces (7),
.BR user_namespaces (7)