mirror of https://github.com/mkerrisk/man-pages
cgroups.7: Formatting and wording fixes
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
parent
014cb63b3c
commit
21f0d132f3
442
man7/cgroups.7
|
@@ -22,232 +22,332 @@
|
|||
.\" the source, must acknowledge the copyright and authors of this work.
|
||||
.\" %%%LICENSE_END
|
||||
.\"
|
||||
|
||||
Name: cgroups - linux process control groups
|
||||
|
||||
Description
|
||||
|
||||
Control cgroups, usually referred to as cgroups, are a Linux kernel feature
|
||||
which provides for grouping of tasks and resource tracking and limitations for those group.
|
||||
.TH CGROUPS 7 2016-04-24 "Linux" "Linux Programmer's Manual"
|
||||
.SH NAME
|
||||
cgroups \- Linux control groups
|
||||
.SH DESCRIPTION
|
||||
Control groups, usually referred to as cgroups,
|
||||
are a Linux kernel feature which provides for grouping of tasks and
|
||||
resource tracking and limitations for those groups.
|
||||
While several systems have been introduced to help in configuring and
|
||||
managing cgroups, the kernel's cgroup interface is provided through
|
||||
a pseudo-filesystem called cgroupfs. Task grouping is implemented in the
|
||||
core cgroup kernel code, while resource tracking and limits are implemented in
|
||||
a set of per-resource-type subsystems - memory, cpu, etc - which may be
|
||||
a pseudo-filesystem called cgroupfs.
|
||||
Task grouping is implemented in the core cgroup kernel code,
|
||||
while resource tracking and limits are implemented in
|
||||
a set of per-resource-type subsystems (memory, CPU, and so on) which may be
|
||||
enabled as separate hierarchies, or joined into comounted hierarchies.
|
||||
Each hierarchy constitutes a separate mount of the cgroupfs filesystem,
|
||||
|
||||
Each hierarchy constitutes a separate mount of the cgroup filesystem,
|
||||
with the subsystems enabled in that hierarchy listed in the mount options.
|
||||
For each mounted hierarchy, the directory tree mirrors the control group hierarchy.
|
||||
For each mounted hierarchy,
|
||||
the directory tree mirrors the control group hierarchy.
|
||||
Each control group is represented by a directory, with each of its child
|
||||
control groups represented as a child directory.
|
||||
For instance, /user/joe/1.session represents control group
|
||||
1.session, which is a child of cgroup joe, which is a child of /user.
|
||||
Under each cgroup directory are a set of files which can be read or
|
||||
For instance,
|
||||
.IR /user/joe/1.session
|
||||
represents control group
|
||||
.IR 1.session ,
|
||||
which is a child of cgroup
|
||||
.IR joe ,
|
||||
which is a child of
|
||||
.IR /user .
|
||||
Under each cgroup directory is a set of files which can be read or
|
||||
written to, reflecting resource limits and a few general cgroup
|
||||
properties.
|
||||
|
||||
In general, cgroup limits are hierarchical, meaning that the limits placed
|
||||
on /user/joe cannot be exceeded by /user/joe/1.session. There are currently
|
||||
exceptions to this, but stricter adherence is a goal as cgroups are being
|
||||
largely reworked.
|
||||
In general, cgroup limits are hierarchical, meaning that the limits placed on
|
||||
.IR /user/joe
|
||||
cannot be exceeded by
|
||||
.IR /user/joe/1.session .
|
||||
There are currently exceptions to this rule,
|
||||
but stricter adherence is a goal as cgroups are being largely reworked.
|
||||
|
||||
The existing subsystems include
|
||||
|
||||
. cpusets
|
||||
. blkio
|
||||
. cpuacct
|
||||
. devices
|
||||
. freezer
|
||||
. hugetlb
|
||||
. memory
|
||||
. net_cls
|
||||
. net_pri
|
||||
. cpu
|
||||
. perf_event
|
||||
The existing subsystems include:
|
||||
|
||||
.PD 0
|
||||
.IP * 2
|
||||
.I cpusets
|
||||
.IP *
|
||||
.I blkio
|
||||
.IP *
|
||||
.I cpuacct
|
||||
.IP *
|
||||
.I devices
|
||||
.IP *
|
||||
.I freezer
|
||||
.IP *
|
||||
.I hugetlb
|
||||
.IP *
|
||||
.I memory
|
||||
.IP *
|
||||
.I net_cls
|
||||
.IP *
|
||||
.I net_prio
|
||||
.IP *
|
||||
.I cpu
|
||||
.IP *
|
||||
.I perf_event
|
||||
.PD
|
||||
.PP
|
||||
In addition, cgroups can be mounted with no bound subsystem, in which case
|
||||
they serve only to track processes. An example of this is the name=systemd
|
||||
cgroup which is used by systemd to track services and user sessions.
|
||||
|
||||
Mounting
|
||||
|
||||
they serve only to track processes.
|
||||
An example of this is the
|
||||
.I name=systemd
|
||||
cgroup which is used by
|
||||
.BR systemd (1)
|
||||
to track services and user sessions.
|
||||
.\"
|
||||
.SS Mounting
|
||||
To be available, a given cgroup subsystem must be compiled into the
|
||||
kernel. Since they are exposed through a virtual filesystem, subsystems
|
||||
must be mounted before they can be controlled. The usual place for this
|
||||
is under /sys/fs/cgroup. If all the desired subsystems can be co-mounted,
|
||||
kernel.
|
||||
Since they are exposed through a virtual filesystem, subsystems
|
||||
must be mounted before they can be controlled.
|
||||
The usual place for this is under
|
||||
.IR /sys/fs/cgroup .
|
||||
If all the desired subsystems can be co-mounted,
|
||||
then the system may simply
|
||||
|
||||
mount -t cgroup cgroup /sys/fs/cgroup
|
||||
mount -t cgroup cgroup /sys/fs/cgroup
|
||||
|
||||
If multiple, separately mounted subsystems are desired, then this is
|
||||
usually done in per-subsystem subdirectories. This requires first mounting
|
||||
a tmpfs under /sys/fs/cgroup so that subdirectories can be created. For
|
||||
instance, to mount cpu, memory and devices cgroups, you could
|
||||
usually done in per-subsystem subdirectories.
|
||||
This requires first mounting a tmpfs under
|
||||
.I /sys/fs/cgroup
|
||||
so that subdirectories can be created.
|
||||
For instance, one could mount
|
||||
.IR cpu ,
|
||||
.IR memory ,
|
||||
and
|
||||
.I devices
|
||||
cgroups as follows:
|
||||
|
||||
mount -t tmpfs -o size=100000,mode=755 cgroups /sys/fs/cgroup
|
||||
for s in cpu memory devices; do
|
||||
mkdir /sys/fs/cgroup/$s
|
||||
mount -t cgroup -o $s $s /sys/fs/cgroup/$s
|
||||
done
|
||||
.nf
|
||||
.in +4n
|
||||
mount -t tmpfs -o size=100000,mode=755 cgroups /sys/fs/cgroup
|
||||
for s in cpu memory devices; do
|
||||
mkdir /sys/fs/cgroup/$s
|
||||
mount -t cgroup -o $s $s /sys/fs/cgroup/$s
|
||||
done
|
||||
.in
|
||||
.fi
|
||||
|
||||
Co-mounting subsystems has the effect that a task is in the same cgroup for
|
||||
all co-mounted subsystems. Separately mounting subsystems allows a task to
|
||||
be in cgroup /foo1 for one subsystem while being in /foo2/foo3 for another.
|
||||
|
||||
Introspection
|
||||
|
||||
all co-mounted subsystems.
|
||||
Separately mounting subsystems allows a task to
|
||||
be in cgroup
|
||||
.I /foo1
|
||||
for one subsystem while being in
|
||||
.I /foo2/foo3
|
||||
for another.
|
||||
.\"
|
||||
.SS Introspection
|
||||
The list of subsystems compiled into the kernel can be seen in the file
|
||||
/proc/cgroups. The file /proc/pid/cgroup lists the task's current cgroup
|
||||
.IR /proc/cgroups .
|
||||
The file
|
||||
.I /proc/pid/cgroup
|
||||
lists the task's current cgroup
|
||||
membership for each mounted hierarchy.
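
The per-hierarchy membership listing can be read with a few lines of shell. The following is a minimal sketch, assuming each line of the file has the form "hierarchy-ID:controller-list:cgroup-path"; the function name cgroup_membership is hypothetical and not part of any tool.

```shell
# Print "controller-list<TAB>cgroup-path" for each hierarchy.
# Input: text in the format of /proc/<pid>/cgroup on stdin, where each
# line is assumed to look like "hierarchy-ID:controller-list:cgroup-path".
cgroup_membership() {
    while IFS=: read -r id controllers path; do
        # A hierarchy with no bound controller has an empty list field;
        # show it as "(none)" so the columns stay aligned.
        printf '%s\t%s\n' "${controllers:-(none)}" "$path"
    done
}

# Example, on a Linux system:
#   cgroup_membership < /proc/self/cgroup
```

The function only reads standard input, so it can be tried on saved copies of the file as well as on the live one.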
|
||||
|
||||
Creating cgroups and moving tasks
|
||||
|
||||
.\"
|
||||
.SS Creating cgroups and moving tasks
|
||||
The system begins with a single root cgroup (per hierarchy), '/', which all tasks belong to.
|
||||
A new cgroup is created using mkdir(2):
|
||||
A new cgroup is created by creating a directory in the cgroup filesystem:
|
||||
|
||||
mkdir /sys/fs/cgroup/cpu/cg1
|
||||
mkdir /sys/fs/cgroup/cpu/cg1
|
||||
|
||||
This creates a new empty cgroup. Tasks may be moved to this cgroup by writing
|
||||
their pids into the cgroup's "cgroup.procs" (deprecated) "tasks" file:
|
||||
This creates a new empty cgroup.
|
||||
Tasks may be moved to this cgroup by writing
|
||||
their PIDs into the cgroup's
|
||||
.I cgroup.procs
|
||||
or (deprecated)
|
||||
.I tasks
|
||||
file:
|
||||
|
||||
echo $$ > /sys/fs/cgroup/cpu/cg1/cgroup.procs
|
||||
echo $$ > /sys/fs/cgroup/cpu/cg1/cgroup.procs
|
||||
|
||||
The same file can be read to obtain a list of the processes currently in cg1.
|
||||
By using the cgroup.procs file instead of the tasks file, all tasks in the
|
||||
threadgroup are moved into the new cgroup at once.
|
||||
The same file can be read to obtain a list of the processes currently in
|
||||
.IR cg1 .
|
||||
By using the
|
||||
.I cgroup.procs
|
||||
file instead of the
|
||||
.I tasks
|
||||
file, all tasks in the
|
||||
thread group are moved into the new cgroup at once.
|
||||
|
||||
At fork(2), the new child is created as a member of the parent's cgroup, leading
|
||||
to implicit grouping of process hierarchies.
|
||||
On
|
||||
.BR fork (2),
|
||||
the new child is created as a member of the parent's cgroup,
|
||||
leading to implicit grouping of process hierarchies.
|
||||
|
||||
Note: in the upcoming unified hierarchy, a new restriction is imposed such
|
||||
that tasks may only exist in leaf cgroups. For instance, if cgroup /cg1/cg2
|
||||
exists, then a task may exist in /cg1/cg2, but not in /cg1. This is to
|
||||
avoid the current ambiguity in the delegation of resources between tasks in /cg1
|
||||
and its children. The recommended workaround is to create a subdirectory called
|
||||
leaf for any non-leaf cgroup which should contain tasks, and make sure not to
|
||||
create child cgroups of it. In the above example, tasks which previously would
|
||||
have gone into /cg1 would now go into /cg1/leaf. This has the advantage of
|
||||
making explicit the relationship between tasks in /cg1/leaf and /cg1's other
|
||||
children.
|
||||
that tasks may only exist in leaf cgroups.
|
||||
For instance, if cgroup
|
||||
.I /cg1/cg2
|
||||
exists, then a task may exist in
|
||||
.IR /cg1/cg2 ,
|
||||
but not in
|
||||
.IR /cg1 .
|
||||
This is to avoid the current ambiguity in the delegation of resources
|
||||
between tasks in
|
||||
.I /cg1
|
||||
and its child cgroups.
|
||||
The recommended workaround is to create a subdirectory called
|
||||
.I leaf
|
||||
for any non-leaf cgroup which should contain tasks, and make sure not to
|
||||
create child cgroups of it.
|
||||
In the above example, tasks which previously would have gone into
|
||||
.I /cg1
|
||||
would now go into
|
||||
.IR /cg1/leaf .
|
||||
This has the advantage of making explicit the relationship between tasks in
|
||||
.I /cg1/leaf
|
||||
and
|
||||
.IR /cg1 's
|
||||
other children.
|
||||
.\"
|
||||
.SS Removing cgroups
|
||||
To remove a cgroup, it must first have no child cgroups and contain no tasks.
|
||||
So long as that is the case,
|
||||
the cgroup may be removed by removing the corresponding directory pathname.
|
||||
|
||||
Removing cgroups
|
||||
|
||||
To remove a cgroup, it must first have no child cgroups and no tasks. So long
|
||||
as that is the case, the cgroup is removed using rmdir(2).
|
||||
|
||||
A special file in each cgroup hierarchy, called 'release_agent', can be used
|
||||
to register a program to handle cgroups which become newly empty. The program
|
||||
will be called each time a cgroup marked for autoremove becomes empty and childless.
|
||||
The cgroup path will be listed as the first argument. The cgroup must be marked
|
||||
as eligible for autoremove by writing '1' into its notify_on_release file, and
|
||||
A special file in each cgroup hierarchy,
|
||||
.IR release_agent ,
|
||||
can be used to register a program to handle cgroups which become newly empty.
|
||||
The program will be called each time a cgroup marked for
|
||||
autoremove becomes empty and childless.
|
||||
The cgroup path will be provided as the first command-line argument.
|
||||
The cgroup must be marked as eligible for autoremove by writing '1' into its
|
||||
.IR notify_on_release
|
||||
file;
|
||||
this value is inherited by newly created child cgroups.
|
||||
|
||||
A new feature in 3.15 (?) is the 'cgroup.populated' file. This reads 0 if
|
||||
there are no tasks in the cgroup or its descendants, and 1 otherwise. It
|
||||
can be watched for changes using inotify. This allows userspace to efficiently
|
||||
watch cgroups for autoremove conditions.
|
||||
|
||||
Unified Hierarchy
|
||||
|
||||
A new feature in 3.15 (?) is the
|
||||
.I cgroup.populated
|
||||
file.
|
||||
This reads 0 if there are no tasks in the cgroup or its descendants,
|
||||
and 1 otherwise.
|
||||
It can be watched for changes using
|
||||
.BR inotify (7).
|
||||
This allows user-space applications to efficiently watch cgroups
|
||||
for autoremove conditions.
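
The release-agent mechanism described above can be illustrated with a small handler. This is only a sketch under stated assumptions: the function name release_agent and the CGROUP_ROOT variable are hypothetical, and the default mount point is assumed to match the per-subsystem layout used in the mounting example. It relies on the cgroup path being passed as the first argument.

```shell
# Hypothetical release agent body: remove the cgroup that became empty.
# $1 is the cgroup path relative to the hierarchy root, as passed by
# the kernel. CGROUP_ROOT is an assumed mount point of that hierarchy.
release_agent() {
    root=${CGROUP_ROOT:-/sys/fs/cgroup/cpu}
    # Refuse to act on an empty path so the hierarchy root is never targeted.
    [ -n "$1" ] || return 1
    # rmdir only succeeds if the cgroup has no tasks and no child cgroups.
    rmdir "$root$1"
}

# A standalone agent script would end with:
#   release_agent "$1"
```

Using rmdir rather than a recursive removal keeps the handler safe if the cgroup has been repopulated between the notification and the removal.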
|
||||
.\"
|
||||
.SS Unified Hierarchy
|
||||
In order to address a number of shortcomings in the original Control Groups
|
||||
design, new semantics are being gradually introduced. In order not to break
|
||||
existing applications, the new semantics are hidden behind a mount option
|
||||
design, new semantics are being gradually introduced.
|
||||
In order not to break existing applications,
|
||||
the new semantics are hidden behind a mount option
|
||||
(subject to change):
|
||||
|
||||
mount -t cgroup -o __DEVEL__sane_behavior cgroup /sys/fs/cgroup
|
||||
mount -t cgroup -o __DEVEL__sane_behavior cgroup /sys/fs/cgroup
|
||||
|
||||
By default all controllers are co-mounted in the unified hierarchy. While
|
||||
controllers may be mounted under the legacy hierarchy, they may not be
|
||||
mounted at the same time in legacy and unified hierarchies.
|
||||
By default, all controllers are co-mounted in the unified hierarchy.
|
||||
While controllers may be mounted under the legacy hierarchy,
|
||||
they may not be mounted at the same time in legacy and unified hierarchies.
|
||||
|
||||
The new behaviors are summarized below:
|
||||
|
||||
1 Tasks only in leave nodes
|
||||
|
||||
With the exception of the root cgroup, tasks may only belong in leaf nodes.
|
||||
.TP 3
|
||||
1. Tasks only in leaf nodes
|
||||
With the exception of the root cgroup, tasks may only reside in leaf nodes.
|
||||
This avoids the need to decide how to partition resources between tasks which
|
||||
are members of cgroup A and tasks in child cgroups of A.
|
||||
|
||||
.TP
|
||||
2. Active cgroups must be specified
|
||||
|
||||
The unified hierarchy presents two new files, "cgroup.controllers" and
|
||||
"cgroup.subtree_control". When a cgroup A/b is created, its 'cgroup.controllers"
|
||||
The unified hierarchy presents two new files,
|
||||
.IR cgroup.controllers
|
||||
and
|
||||
.IR cgroup.subtree_control .
|
||||
When a cgroup
|
||||
.I A/b
|
||||
is created, its
|
||||
.IR cgroup.controllers
|
||||
file contains the list of controllers which were active in its parent, A.
|
||||
This is the list of controllers which are available to this cgroup. No
|
||||
controllers are active until they are enabled through the "cgroup.subtree_control"
|
||||
file, by writing the name of the space-separate list of controllers, each preceded
|
||||
by '+' (to enable) or '-' (to disable). If the freezer controller is not
|
||||
enabled in /A/B, then it cannot be enabled in /A/B/C.
|
||||
|
||||
This is the list of controllers which are available to this cgroup.
|
||||
No controllers are active until they are enabled through the
|
||||
.IR cgroup.subtree_control
|
||||
file, by writing a space-separated list of controller names,
|
||||
each preceded by '+' (to enable) or '-' (to disable).
|
||||
If the
|
||||
.I freezer
|
||||
controller is not enabled in
|
||||
.IR /A/B ,
|
||||
then it cannot be enabled in
|
||||
.IR /A/B/C .
|
||||
.TP
|
||||
3. No "tasks" or "cgroup.clone_children" files
|
||||
|
||||
.TP
|
||||
4. Empty cgroup notification
|
||||
A new file,
|
||||
.IR cgroup.populated ,
|
||||
under each cgroup contains '0' when the
|
||||
cgroup is empty, and 1 when it is populated.
|
||||
It therefore may be watched to detect when a cgroup becomes (non-)empty.
|
||||
This replaces the original notify-on-release mechanism.
|
||||
|
||||
A new file "cgroup.populated" under each cgroup contains '0' when the
|
||||
cgroup is empty, and 1 when it is populated. It therefore may be
|
||||
watched to detect when a cgroup becomes (non-)empty. This replaces
|
||||
the original notify-on-release mechanism.
|
||||
|
||||
For more changes, please see the Documentation/cgroups/unified-hierarchy
|
||||
For more changes, please see the
|
||||
.I Documentation/cgroups/unified-hierarchy
|
||||
file in the kernel source.
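
The format written to cgroup.subtree_control (item 2 above) can be sketched as follows. The helper name subtree_control_request is hypothetical; it only builds the space-separated string of controller names, each preceded by '+' or '-', and does not touch the filesystem.

```shell
# Build the string to write to cgroup.subtree_control.
# $1 is "+" (enable) or "-" (disable); the remaining arguments are
# controller names. Output: one line such as "+cpu +memory".
subtree_control_request() {
    op=$1
    shift
    out=
    for c in "$@"; do
        # Separate entries with a single space, no leading space.
        out="${out:+$out }$op$c"
    done
    printf '%s\n' "$out"
}

# Example (needs root and a mounted unified hierarchy; the path below
# is an assumption for illustration):
#   subtree_control_request + cpu memory > /sys/fs/cgroup/A/cgroup.subtree_control
```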
|
||||
|
||||
Subsystems # give details on each subsystem
|
||||
|
||||
. cpusets
|
||||
|
||||
Cpusets bind the tasks in a cgroup to a specified set of cpus and
|
||||
numa nodes.
|
||||
|
||||
. blkio
|
||||
|
||||
The blkio cgroup controls and limits access to specified block devices by
|
||||
.\"
|
||||
.SS Subsystems
|
||||
.TP
|
||||
.I cpusets
|
||||
This cgroup can be used to bind the tasks in a cgroup to
|
||||
a specified set of CPUs and NUMA nodes.
|
||||
.TP
|
||||
.I blkio
|
||||
The
|
||||
.I blkio
|
||||
cgroup controls and limits access to specified block devices by
|
||||
applying IO control in the form of throttling and upper limits against leaf
|
||||
nodes and intermediate nodes in the storage hierarchy.
|
||||
|
||||
Two policies are available. The first is a proportional weight time based division
|
||||
of disk implemented with CFQ. This is in effect for leaf nodes using CFQ. The
|
||||
second is a throttling policy which specifies upper IO rate limits on a device.
|
||||
|
||||
. cpuacct
|
||||
|
||||
This provides accounting for cpu usage by groups of tasks.
|
||||
|
||||
. devices
|
||||
|
||||
Two policies are available.
|
||||
The first is a proportional-weight time-based division
|
||||
of disk implemented with CFQ.
|
||||
This is in effect for leaf nodes using CFQ.
|
||||
The second is a throttling policy which specifies
|
||||
upper I/O rate limits on a device.
|
||||
.TP
|
||||
.I cpuacct
|
||||
This provides accounting for CPU usage by groups of tasks.
|
||||
.TP
|
||||
.I devices
|
||||
This supports controlling which tasks may create (mknod) devices as
|
||||
well as open them for reading or writing. The policies may be specified
|
||||
as whitelists and blacklists. Hierarchy is enforced, so new rules must not
|
||||
well as open them for reading or writing.
|
||||
The policies may be specified as whitelists and blacklists.
|
||||
Hierarchy is enforced, so new rules must not
|
||||
violate existing rules for the target or ancestor cgroups.
|
||||
|
||||
. freezer
|
||||
|
||||
The freezer cgroup can suspend and un-suspend all tasks in a cgroup.
|
||||
Freezing a cgroup /A also causes its children, i.e. tasks in /A/B,
|
||||
.TP
|
||||
.I freezer
|
||||
The
|
||||
.I freezer
|
||||
cgroup can suspend and restore (resume) all tasks in a cgroup.
|
||||
Freezing a cgroup
|
||||
.I /A
|
||||
also causes its children, for example, tasks in
|
||||
.IR /A/B ,
|
||||
to be frozen.
|
||||
|
||||
. hugetlb
|
||||
|
||||
.TP
|
||||
.I hugetlb
|
||||
This supports limiting the use of huge pages by cgroups.
|
||||
|
||||
. memory
|
||||
|
||||
.TP
|
||||
.I memory
|
||||
The memory controller supports reporting and limiting of process memory, kernel
|
||||
memory, and swap used by cgroups.
|
||||
|
||||
. net_cls
|
||||
|
||||
.TP
|
||||
.I net_cls
|
||||
This places a classid, specified for the cgroup, on network packets
|
||||
created by a cgroup. These classids can then be used in firewall rules,
|
||||
as well as used to shape traffic using tc. This only applies to packets
|
||||
created by a cgroup.
|
||||
These classids can then be used in firewall rules,
|
||||
as well as used to shape traffic using
|
||||
.BR tc (8).
|
||||
This only applies to packets
|
||||
leaving the cgroup, not to traffic arriving at the cgroup.
|
||||
|
||||
. net_prio
|
||||
|
||||
This allows priorities to be specified, per network interfaces, for cgroups.
|
||||
|
||||
. cpu
|
||||
|
||||
Cgroups can be guaranteed a minimum number of "cpu shares" when a system is
|
||||
busy. This does not limit a cgroup's cpu usage if the cpus are not busy.
|
||||
|
||||
. perf_event
|
||||
.TP
|
||||
.I net_prio
|
||||
This allows priorities to be specified, per network interface, for cgroups.
|
||||
.TP
|
||||
.I cpu
|
||||
Cgroups can be guaranteed a minimum number of "cpu shares"
|
||||
when a system is busy.
|
||||
This does not limit a cgroup's CPU usage if the CPUs are not busy.
|
||||
.TP
|
||||
.I perf_event
|
||||
|