.\" Copyright (c) 2016 by Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date.  The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein.  The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.\"
.TH CGROUP_NAMESPACES 7 2019-08-02 "Linux" "Linux Programmer's Manual"
.SH NAME
cgroup_namespaces \- overview of Linux cgroup namespaces
.SH DESCRIPTION
For an overview of namespaces, see
.BR namespaces (7).
.PP
Cgroup namespaces virtualize the view of a process's cgroups (see
.BR cgroups (7))
as seen via
.IR /proc/[pid]/cgroup
and
.IR /proc/[pid]/mountinfo .
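.PP
Each record in
.IR /proc/[pid]/cgroup
consists of three colon-separated fields:
a hierarchy ID, a list of controllers, and a cgroup pathname (see
.BR proc (5)).
As a minimal sketch, the following shell fragment splits one such record.
The function name
.I split_cgroup_record
is hypothetical and is not part of any standard tool.
.PP
.in +4n
.EX
```shell
# Hypothetical helper: split one /proc/[pid]/cgroup record into its
# three colon-separated fields (hierarchy ID, controller list, pathname).
split_cgroup_record() (
    # The pathname is read last, so any colons inside it are preserved.
    IFS=: read -r id controllers path <<EOF
$1
EOF
    echo "id=$id controllers=$controllers path=$path"
)

split_cgroup_record "7:freezer:/sub"
```
.EE
.in
.PP
The pathname field is the one that cgroup namespaces virtualize;
the other two fields are unaffected.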
.PP
Each cgroup namespace has its own set of cgroup root directories.
These root directories are the base points for the relative
locations displayed in the corresponding records in the
.IR /proc/[pid]/cgroup
file.
When a process creates a new cgroup namespace using
.BR clone (2)
or
.BR unshare (2)
with the
.BR CLONE_NEWCGROUP
flag, its current
cgroup directories become the cgroup root directories
of the new namespace.
(This applies to both the cgroups version 1 hierarchies
and the cgroups version 2 unified hierarchy.)
.PP
When reading the cgroup memberships of a "target" process from
.IR /proc/[pid]/cgroup ,
the pathname shown in the third field of each record will be
relative to the reading process's root directory
for the corresponding cgroup hierarchy.
If the cgroup directory of the target process lies outside
the root directory of the reading process's cgroup namespace,
then the pathname will show
.I ../
entries for each ancestor level in the cgroup hierarchy.
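.PP
The relative-pathname rule described above can be sketched in shell.
This is an illustration only, not the kernel's implementation:
the function name
.I cgroup_view
is hypothetical, and both arguments are assumed to be absolute cgroup
paths in the same hierarchy, as seen from the initial cgroup namespace.
.PP
.in +4n
.EX
```shell
# Hypothetical helper: print the pathname that a reader whose cgroup
# namespace root is $1 would see for a process in cgroup $2.
# Both arguments are assumed to be absolute paths in one hierarchy.
cgroup_view() (
    root=${1%/}                 # "/" becomes the empty string
    target=${2%/}
    up=
    # Walk the root upward until it is an ancestor of (or equal to)
    # the target, emitting one "/.." per level climbed.
    while [ -n "$root" ]; do
        case "$target/" in
            "$root"/*) break ;;
        esac
        root=${root%/*}
        up="$up/.."
    done
    rest=${target#"$root"}      # part of the target below the common ancestor
    out=$up$rest
    echo "${out:-/}"            # an empty result means "the root itself"
)

cgroup_view /sub /sub      # the reader's own cgroup
cgroup_view /sub /         # a process in the root of the hierarchy
cgroup_view /sub /sub2     # a process in a sibling cgroup
```
.EE
.in
.PP
With a namespace root of
.IR /sub ,
the three calls print
.IR / ,
.IR /.. ,
and
.IR /../sub2 ,
respectively.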
.PP
The following shell session demonstrates the effect of creating
a new cgroup namespace.
.PP
First, (as superuser) in a shell in the initial cgroup namespace,
we create a child cgroup in the
.I freezer
hierarchy, and place a process in that cgroup that we will
use as part of the demonstration below:
.PP
.in +4n
.EX
# \fBmkdir \-p /sys/fs/cgroup/freezer/sub2\fP
# \fBsleep 10000 &\fP     # Create a process that lives for a while
[1] 20124
# \fBecho 20124 > /sys/fs/cgroup/freezer/sub2/cgroup.procs\fP
.EE
.in
.PP
We then create another child cgroup in the
.I freezer
hierarchy and put the shell into that cgroup:
.PP
.in +4n
.EX
# \fBmkdir \-p /sys/fs/cgroup/freezer/sub\fP
# \fBecho $$\fP                      # Show PID of this shell
30655
# \fBecho 30655 > /sys/fs/cgroup/freezer/sub/cgroup.procs\fP
# \fBcat /proc/self/cgroup | grep freezer\fP
7:freezer:/sub
.EE
.in
.PP
Next, we use
.BR unshare (1)
to create a process running a new shell in new cgroup and mount namespaces:
.PP
.in +4n
.EX
# \fBPS1="sh2# " unshare \-Cm bash\fP
.EE
.in
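.PP
One way to check whether two processes are in the same cgroup namespace
is to compare their
.IR /proc/[pid]/ns/cgroup
symbolic links (see
.BR namespaces (7)).
The following is a minimal sketch under stated assumptions:
the function name
.I same_cgroup_ns
and the
.I PROC
variable are hypothetical;
.I PROC
exists only so that the function can be pointed at a test directory.
.PP
.in +4n
.EX
```shell
# Hypothetical helper: succeed (status 0) if processes $1 and $2 are in
# the same cgroup namespace, fail with status 1 if they differ, and
# fail with status 2 if either link cannot be read.
# PROC is an assumption of this sketch; it defaults to /proc.
same_cgroup_ns() (
    proc=${PROC:-/proc}
    a=$(readlink "$proc/$1/ns/cgroup") || exit 2
    b=$(readlink "$proc/$2/ns/cgroup") || exit 2
    # Each link reads like "cgroup:[4026531835]"; equal text means
    # the same namespace.
    [ "$a" = "$b" ]
)

# Typical use from a shell: same_cgroup_ns $$ 1
```
.EE
.in
.PP
Reading the link of another user's process may require privilege,
so a status of 2 does not by itself mean that the namespaces differ.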
.PP
From the new shell started by
.BR unshare (1),
we then inspect the
.IR /proc/[pid]/cgroup
files of, respectively, the new shell,
a process that is in the initial cgroup namespace
.RI ( init ,
with PID 1), and the process in the sibling cgroup
.RI ( sub2 ):
.PP
.in +4n
.EX
sh2# \fBcat /proc/self/cgroup | grep freezer\fP
7:freezer:/
sh2# \fBcat /proc/1/cgroup | grep freezer\fP
7:freezer:/..
sh2# \fBcat /proc/20124/cgroup | grep freezer\fP
7:freezer:/../sub2
.EE
.in
.PP
From the output of the first command,
we see that the freezer cgroup membership of the new shell
(which is in the same cgroup as the initial shell)
is shown relative to the freezer cgroup root directory
that was established when the new cgroup namespace was created.
(In absolute terms,
the new shell is in the
.I /sub
freezer cgroup,
and the root directory of the freezer cgroup hierarchy
in the new cgroup namespace is also
.IR /sub .
Thus, the new shell's cgroup membership is displayed as \(aq/\(aq.)
.PP
However, when we look in
.IR /proc/self/mountinfo ,
we see the following anomaly:
.PP
.in +4n
.EX
sh2# \fBcat /proc/self/mountinfo | grep freezer\fP
155 145 0:32 /.. /sys/fs/cgroup/freezer ...
.EE
.in
.PP
The fourth field of this line
.RI ( /.. )
should show the
directory in the cgroup filesystem which forms the root of this mount.
Since, by the definition of cgroup namespaces, the process's current
freezer cgroup directory became its root freezer cgroup directory,
we should see \(aq/\(aq in this field.
The problem here is that we are seeing a mount entry for the cgroup
filesystem corresponding to the initial cgroup namespace
(whose cgroup filesystem is indeed rooted at the parent directory of
.IR sub ).
To fix this problem, we must remount the freezer cgroup filesystem
from the new shell (i.e., perform the mount from a process that is in the
new cgroup namespace), after which we see the expected results:
.PP
.in +4n
.EX
sh2# \fBmount \-\-make\-rslave /\fP     # Don\(aqt propagate mount events
                               # to other namespaces
sh2# \fBumount /sys/fs/cgroup/freezer\fP
sh2# \fBmount \-t cgroup \-o freezer freezer /sys/fs/cgroup/freezer\fP
sh2# \fBcat /proc/self/mountinfo | grep freezer\fP
155 145 0:32 / /sys/fs/cgroup/freezer rw,relatime ...
.EE
.in
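.PP
The check on the fourth field can be scripted.
As a minimal sketch, the hypothetical function
.I mount_root
below prints the root field of the
.IR /proc/[pid]/mountinfo
record for a given mount point, reading records from standard input.
It relies only on the field order documented in
.BR proc (5).
.PP
.in +4n
.EX
```shell
# Hypothetical helper: print the root field (fourth field) of each
# mountinfo record whose mount point (fifth field) is $1.
# Reads records in /proc/[pid]/mountinfo format from standard input.
mount_root() {
    awk -v mp="$1" '$5 == mp { print $4 }'
}

# Typical use: mount_root /sys/fs/cgroup/freezer < /proc/self/mountinfo
```
.EE
.in
.PP
A result of \(aq/\(aq indicates that the mount is rooted at the
cgroup root directory of the namespace in which the mount was performed;
a result such as
.I /..
indicates a mount inherited from an ancestor cgroup namespace.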
.\"
.SH CONFORMING TO
Namespaces are a Linux-specific feature.
.SH NOTES
Use of cgroup namespaces requires a kernel that is configured with the
.B CONFIG_CGROUPS
option.
.PP
The virtualization provided by cgroup namespaces serves a number of purposes:
.IP * 2
It prevents information leaks whereby cgroup directory paths outside of
a container would otherwise be visible to processes in the container.
Such leakages could, for example,
reveal information about the container framework
to containerized applications.
.IP *
It eases tasks such as container migration.
The virtualization provided by cgroup namespaces
allows containers to be isolated from knowledge of
the pathnames of ancestor cgroups.
Without such isolation, the full cgroup pathnames (displayed in
.IR /proc/self/cgroup )
would need to be replicated on the target system when migrating a container;
those pathnames would also need to be unique,
so that they don't conflict with other pathnames on the target system.
.IP *
It allows better confinement of containerized processes,
because it is possible to mount the container's cgroup filesystems such that
the container processes can't gain access to ancestor cgroup directories.
Consider, for example, the following scenario:
.RS 4
.IP \(bu 2
We have a cgroup directory,
.IR /cg/1 ,
that is owned by user ID 9000.
.IP \(bu
We have a process,
.IR X ,
also owned by user ID 9000,
that is namespaced under the cgroup
.IR /cg/1/2
(i.e.,
.I X
was placed in a new cgroup namespace via
.BR clone (2)
or
.BR unshare (2)
with the
.BR CLONE_NEWCGROUP
flag).
.RE
.IP
In the absence of cgroup namespacing, because the cgroup directory
.IR /cg/1
is owned (and writable) by user ID 9000 and process
.I X
is also owned by user ID 9000, process
.I X
would be able to modify the contents of cgroup files
(i.e., change cgroup settings) not only in
.IR /cg/1/2
but also in the ancestor cgroup directory
.IR /cg/1 .
Namespacing process
.IR X
under the cgroup directory
.IR /cg/1/2 ,
in combination with suitable mount operations
for the cgroup filesystem (as shown above),
prevents it from modifying files in
.IR /cg/1 ,
since it cannot even see the contents of that directory
(or of further removed cgroup ancestor directories).
Combined with correct enforcement of hierarchical limits,
this prevents process
.I X
from escaping the limits imposed by ancestor cgroups.
.SH SEE ALSO
.BR unshare (1),
.BR clone (2),
.BR setns (2),
.BR unshare (2),
.BR proc (5),
.BR cgroups (7),
.BR credentials (7),
.BR namespaces (7),
.BR user_namespaces (7)