.\" Copyright (C) 2015 Serge Hallyn <serge@hallyn.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.TH CGROUPS 7 2016-04-24 "Linux" "Linux Programmer's Manual"
.SH NAME
cgroups \- Linux control groups
.SH DESCRIPTION
Control groups, usually referred to as cgroups,
are a Linux kernel feature which provides for grouping of tasks and
for tracking and limiting the resources used by those groups.
While several systems have been introduced to help in configuring and
managing cgroups, the kernel's cgroup interface is provided through
a pseudo-filesystem called cgroupfs.
Task grouping is implemented in the core cgroup kernel code,
while resource tracking and limits are implemented in
a set of per-resource-type subsystems (memory, CPU, and so on) which may be
enabled as separate hierarchies, or joined into co-mounted hierarchies.
.PP
Each hierarchy constitutes a separate mount of the cgroup filesystem,
with the subsystems enabled in that hierarchy listed in the mount options.
For each mounted hierarchy,
the directory tree mirrors the control group hierarchy.
Each control group is represented by a directory, with each of its child
control groups represented as a child directory.
For instance,
.I /user/joe/1.session
represents control group
.IR 1.session ,
which is a child of cgroup
.IR joe ,
which is a child of
.IR /user .
Under each cgroup directory is a set of files which can be read or
written to, reflecting resource limits and a few general cgroup
properties.
.PP
In general, cgroup limits are hierarchical, meaning that the limits placed on
.I /user/joe
cannot be exceeded by
.IR /user/joe/1.session .
There are currently exceptions to this rule,
but stricter adherence is a goal as cgroups are being largely reworked.
.PP
The existing subsystems include:
.PD 0
.IP * 2
.I cpu
.IP *
.I cpuacct
.IP *
.I cpuset
.IP *
.I memory
.IP *
.I devices
.IP *
.I freezer
.IP *
.I net_cls
.IP *
.I blkio
.IP *
.I perf_event
.IP *
.I net_prio
.IP *
.I hugetlb
.IP *
.I pids
.PD
.PP
In addition, cgroups can be mounted with no bound subsystem, in which case
they serve only to track processes.
An example of this is the
.I name=systemd
cgroup which is used by
.BR systemd (1)
to track services and user sessions.
.\"
.SS Mounting
To be available, a given cgroup subsystem must be compiled into the
kernel.
Since they are exposed through a virtual filesystem, subsystems
must be mounted before they can be controlled.
The usual place for this is under
.IR /sys/fs/cgroup .
If all the desired subsystems can be co-mounted,
then they can all be mounted at once:
.PP
.nf
.in +4n
mount -t cgroup cgroup /sys/fs/cgroup
.in
.fi
.PP
If multiple, separately mounted subsystems are desired, then this is
usually done in per-subsystem subdirectories.
This requires first mounting a tmpfs under
.I /sys/fs/cgroup
so that subdirectories can be created.
For instance, one could mount
.IR cpu ,
.IR memory ,
and
.I devices
cgroups as follows:
.PP
.nf
.in +4n
# A tmpfs provides a writable place for the per-subsystem mount points.
mount -t tmpfs -o size=100000,mode=755 cgroups /sys/fs/cgroup
# Mount each subsystem as its own hierarchy.
for s in cpu memory devices; do
    mkdir /sys/fs/cgroup/$s
    mount -t cgroup -o $s $s /sys/fs/cgroup/$s
done
.in
.fi
.PP
Co-mounting subsystems has the effect that a task is in the same cgroup for
all co-mounted subsystems.
Separately mounting subsystems allows a task to
be in cgroup
.I /foo1
for one subsystem while being in
.I /foo2/foo3
for another.
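.PP
The difference between a co-mount and separate mounts can be made concrete
with a small dry-run helper.
The following is only a sketch:
the function name
.I cgroup_mount_cmd
is hypothetical,
it assumes that a tmpfs is already mounted at
.I /sys/fs/cgroup
as described above,
and it prints the mount command rather than running it.
.PP
.nf
.in +4n
```sh
# Hypothetical helper (not part of any cgroup tool): print the mount
# command for one hierarchy containing the listed subsystems.
# One argument gives a separate mount; several give a co-mount.
cgroup_mount_cmd() {
    # Join the arguments with commas: "cpu cpuacct" becomes "cpu,cpuacct".
    list=$(IFS=,; echo "$*")
    echo "mount -t cgroup -o $list $list /sys/fs/cgroup/$list"
}
```
.in
.fi
.PP
Called with the arguments cpu and cpuacct, the helper prints a single
command that places both subsystems in one hierarchy,
so that every task has the same cgroup for both of them;
the mount point directory must be created before the command is run.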
.\"
.SS Introspection
The list of subsystems compiled into the kernel can be seen in the file
.IR /proc/cgroups .
The file
.I /proc/[pid]/cgroup
lists the task's current cgroup
membership for each mounted hierarchy.
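.PP
As an illustration of reading this file,
the sketch below extracts the cgroup path for one controller.
It assumes that each line has the colon-separated form
hierarchy-ID:controller-list:cgroup-path,
with the controller list itself comma-separated;
the function name
.I cgroup_path_for
is hypothetical.
.PP
.nf
.in +4n
```sh
# Hypothetical helper: read /proc/PID/cgroup lines on standard input and
# print the cgroup path of the hierarchy that includes controller $1.
cgroup_path_for() {
    awk -F: -v want="$1" '
        {
            # Field 2 is the comma-separated controller list.
            n = split($2, ctl, ",")
            for (i = 1; i <= n; i++)
                if (ctl[i] == want) {
                    # The path may contain colons; rejoin fields 3..NF.
                    path = $3
                    for (j = 4; j <= NF; j++)
                        path = path ":" $j
                    print path
                    exit
                }
        }'
}
```
.in
.fi
.PP
For example, running the function with the argument memory and with
.I /proc/self/cgroup
as standard input prints the calling shell's cgroup in the memory hierarchy.
Nothing is printed if the controller is not part of any mounted hierarchy.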
.\"
.SS Creating cgroups and moving tasks
Each hierarchy begins with a single root cgroup, '/',
to which all tasks initially belong.
A new cgroup is created by creating a directory in the cgroup filesystem:
.PP
.nf
.in +4n
mkdir /sys/fs/cgroup/cpu/cg1
.in
.fi
.PP
This creates a new empty cgroup.
Tasks may be moved to this cgroup by writing
their PIDs into the cgroup's
.I cgroup.procs
or (deprecated)
.I tasks
file:
.PP
.nf
.in +4n
echo $$ > /sys/fs/cgroup/cpu/cg1/cgroup.procs
.in
.fi
.PP
The same file can be read to obtain a list of the processes currently in
.IR cg1 .
By using the
.I cgroup.procs
file instead of the
.I tasks
file, all tasks in the
thread group are moved into the new cgroup at once.
.PP
On
.BR fork (2),
the new child is created as a member of the parent's cgroup,
leading to implicit grouping of process hierarchies.
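.PP
The two steps of creating a cgroup and moving a process into it
can be combined as in the following sketch.
The function name
.I cgroup_enter
and its argument order are hypothetical;
the hierarchy mount point is passed in rather than assumed.
.PP
.nf
.in +4n
```sh
# Hypothetical helper: create cgroup NAME under the hierarchy mounted at
# ROOT and move process PID (default: the calling shell) into it.
cgroup_enter() {
    root=$1 name=$2 pid=${3:-$$}
    mkdir -p "$root/$name" || return 1
    # Writing to cgroup.procs moves the whole thread group.
    echo "$pid" > "$root/$name/cgroup.procs"
}
```
.in
.fi
.PP
After a call with the arguments /sys/fs/cgroup/cpu and cg1,
commands started from the shell are created in
.I cg1
as well, because children inherit the cgroup of their parent.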
.PP
Note: in the upcoming unified hierarchy, a new restriction is imposed such
that tasks may only exist in leaf cgroups.
For instance, if cgroup
.I /cg1/cg2
exists, then a task may exist in
.IR /cg1/cg2 ,
but not in
.IR /cg1 .
This is to avoid the current ambiguity in the delegation of resources
between tasks in
.I /cg1
and its child cgroups.
The recommended workaround is to create a subdirectory called
.I leaf
for any non-leaf cgroup which should contain tasks, and make sure not to
create child cgroups of it.
In the above example, tasks which previously would have gone into
.I /cg1
would now go into
.IR /cg1/leaf .
This has the advantage of making explicit the relationship between tasks in
.I /cg1/leaf
and
.IR /cg1 's
other children.
.\"
.SS Removing cgroups
To remove a cgroup, it must first have no child cgroups and contain no tasks.
So long as that is the case,
the cgroup can be removed by removing the corresponding directory.
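.PP
The two preconditions can be checked from a script before attempting removal.
The following is a minimal sketch under stated assumptions:
the function name
.I cgroup_is_removable
is hypothetical,
and every subdirectory of a cgroup directory is taken to be a child cgroup.
.PP
.nf
.in +4n
```sh
# Hypothetical helper: succeed if the cgroup directory $1 has no child
# cgroups and no member processes.
cgroup_is_removable() {
    dir=$1
    # Every subdirectory of a cgroup directory is a child cgroup.
    for child in "$dir"/*/; do
        if [ -d "$child" ]; then
            return 1
        fi
    done
    # Read the file instead of testing its size, since files in a
    # pseudo-filesystem may report a size of zero.
    if [ -n "$(cat "$dir/cgroup.procs" 2>/dev/null)" ]; then
        return 1
    fi
    return 0
}
```
.in
.fi
.PP
A caller would typically follow a successful check with rmdir on the
same directory.
The check is only advisory:
a task may join the cgroup between the check and the rmdir,
in which case the rmdir simply fails.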
.PP
A special file in each cgroup hierarchy,
.IR release_agent ,
can be used to register a program to handle cgroups which become newly empty.
The program will be called each time a cgroup marked for
autoremove becomes empty and childless.
The cgroup path will be provided as the first command-line argument.
The cgroup must be marked as eligible for autoremove by writing '1' into its
.I notify_on_release
file;
this value is inherited by newly created child cgroups.
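.PP
A release agent can be as small as the following sketch,
which removes the cgroup it is told about.
The function name
.I release_agent_remove
is hypothetical,
and the sketch assumes that the path passed by the kernel is relative to
the root of the hierarchy,
so the agent has to know the mount point in advance.
.PP
.nf
.in +4n
```sh
# Hypothetical release agent body.  ROOT is the mount point of the
# hierarchy (an assumption: the agent must know it in advance) and
# CGPATH is the cgroup path received as the first command-line
# argument, taken here to be relative to that mount point.
release_agent_remove() {
    root=$1 cgpath=$2
    # rmdir fails harmlessly if the cgroup has been repopulated.
    rmdir "$root/${cgpath#/}" 2>/dev/null
}
```
.in
.fi
.PP
A script registered in the
.I release_agent
file would call the function with its hierarchy's mount point and its own
first argument.
Only the named cgroup is removed; parent cgroups are left in place.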
.PP
A new feature in 3.15 (?) is the
.I cgroup.populated
file.
This reads 0 if there are no tasks in the cgroup or its descendants,
and 1 otherwise.
It can be watched for changes using
.BR inotify (7).
This allows user-space applications to efficiently watch cgroups
for autoremove conditions.
.\"
.SS Unified Hierarchy
In order to address a number of shortcomings in the original Control Groups
design, new semantics are being gradually introduced.
In order not to break existing applications,
the new semantics are hidden behind a mount option
(subject to change):
.PP
.nf
.in +4n
mount -t cgroup -o __DEVEL__sane_behavior cgroup /sys/fs/cgroup
.in
.fi
.PP
By default, all controllers are co-mounted in the unified hierarchy.
While controllers may be mounted under the legacy hierarchy,
they may not be mounted at the same time in legacy and unified hierarchies.
.PP
The new behaviors are summarized below:
.TP 3
1. Tasks only in leaf nodes
With the exception of the root cgroup, tasks may only reside in leaf nodes.
This avoids the need to decide how to partition resources between tasks which
are members of cgroup A and tasks in child cgroups of A.
.TP
2. Active cgroups must be specified
The unified hierarchy presents two new files,
.I cgroup.controllers
and
.IR cgroup.subtree_control .
When a cgroup
.I /A/B
is created, its
.I cgroup.controllers
file contains the list of controllers which were active in its parent,
.IR /A .
This is the list of controllers which are available to this cgroup.
No controllers are active until they are enabled through the
.I cgroup.subtree_control
file, by writing a space-separated list of controller names,
each preceded by '+' (to enable) or '-' (to disable).
If the
.I freezer
controller is not enabled in
.IR /A/B ,
then it cannot be enabled in
.IR /A/B/C .
.TP
3. No "tasks" or "cgroup.clone_children" files
.TP
4. Empty cgroup notification
A new file,
.IR cgroup.populated ,
under each cgroup contains '0' when the
cgroup is empty, and '1' when it is populated.
It therefore may be watched to detect when a cgroup becomes (non-)empty.
This replaces the original notify-on-release mechanism.
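.PP
The string written to
.I cgroup.subtree_control
under item 2 above can be assembled as in the following sketch.
The function name
.I subtree_control_spec
is hypothetical;
the function only builds the string and leaves the write to the caller.
.PP
.nf
.in +4n
```sh
# Hypothetical helper: build a cgroup.subtree_control request.
# $1 is "+" (enable) or "-" (disable); the remaining arguments are
# controller names.
subtree_control_spec() {
    sign=$1
    shift
    out=
    for c in "$@"; do
        # Separate entries with a single space, with no leading space.
        out="${out:+$out }$sign$c"
    done
    echo "$out"
}
```
.in
.fi
.PP
With the arguments +, cpu, and memory, the function prints "+cpu +memory".
Redirecting that output into the
.I cgroup.subtree_control
file of a cgroup in the unified hierarchy would request that both
controllers be enabled for its children.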
.PP
For more changes, please see the
.I Documentation/cgroups/unified-hierarchy
file in the kernel source.
.\"
.SS Subsystems
.TP
.I cpu
Cgroups can be guaranteed a minimum number of "cpu shares"
when a system is busy.
This does not limit a cgroup's CPU usage if the CPUs are not busy.
.TP
.I cpuacct
This provides accounting for CPU usage by groups of tasks.
.TP
.I cpuset
This controller can be used to bind the tasks in a cgroup to
a specified set of CPUs and NUMA nodes.
.TP
.I memory
The memory controller supports reporting and limiting of process memory, kernel
memory, and swap used by cgroups.
.TP
.I devices
This supports controlling which tasks may create (mknod) devices as
well as open them for reading or writing.
The policies may be specified as whitelists and blacklists.
Hierarchy is enforced, so new rules must not
violate existing rules for the target or ancestor cgroups.
.TP
.I freezer
The
.I freezer
cgroup can suspend and restore (resume) all tasks in a cgroup.
Freezing a cgroup
.I /A
also causes its children, for example, tasks in
.IR /A/B ,
to be frozen.
.TP
.I net_cls
This places a classid, specified for the cgroup, on network packets
created by tasks in a cgroup.
These classids can then be used in firewall rules,
as well as used to shape traffic using
.BR tc (8).
This only applies to packets
leaving the cgroup, not to traffic arriving at the cgroup.
.TP
.I blkio
The
.I blkio
cgroup controls and limits access to specified block devices by
applying I/O control in the form of throttling and upper limits against leaf
nodes and intermediate nodes in the storage hierarchy.
.IP
Two policies are available.
The first is a proportional-weight time-based division
of disk implemented with CFQ.
This is in effect for leaf nodes using CFQ.
The second is a throttling policy which specifies
upper I/O rate limits on a device.
.TP
.I perf_event
.TP
.I net_prio
This allows priorities to be specified, per network interface, for cgroups.
.TP
.I hugetlb
This supports limiting the use of huge pages by cgroups.
.TP
.I pids
This controller permits limiting the number of processes that may be created
in a cgroup (and its descendants).
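.PP
The "cpu shares" mentioned under
.I cpu
above are relative weights, which a short calculation can illustrate.
The sketch below assumes that CPU time is divided among busy sibling
cgroups in proportion to their shares;
the function name
.I cpu_share_percent
is hypothetical.
.PP
.nf
.in +4n
```sh
# Hypothetical helper: integer percentage of CPU time available to a
# cgroup when it and all of its siblings are busy.
# Arguments: the cgroup's own shares, then the shares of each sibling.
cpu_share_percent() {
    mine=$1
    total=0
    # The total includes the cgroup's own shares (the first argument).
    for s in "$@"; do
        total=$((total + s))
    done
    echo $((100 * mine / total))
}
```
.in
.fi
.PP
With the arguments 1024, 512, and 512 the function prints 50:
a cgroup holding 1024 shares alongside two busy siblings with 512 shares each
is entitled to half of the CPU time.
When the siblings are idle, the cgroup may use more than that.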
.SH SEE ALSO
.BR cpuset (7),
.BR namespaces (7)