From a721e8b25faf1a33d32961f5b22fdfa0f4a82515 Mon Sep 17 00:00:00 2001
From: Michael Kerrisk
Date: Fri, 18 Aug 2017 00:59:04 +0200
Subject: [PATCH] aio.7, arp.7, attributes.7, boot.7, cgroups.7, cpuset.7, credentials.7, fanotify.7, fifo.7, glob.7, hier.7, hostname.7, icmp.7, inode.7, inotify.7, keyrings.7, libc.7, mailaddr.7, mount_namespaces.7, mq_overview.7, nptl.7, numa.7, path_resolution.7, persistent-keyring.7, pid_namespaces.7, pipe.7, pkeys.7, process-keyring.7, pthreads.7, pty.7, random.7, sched.7, sem_overview.7, session-keyring.7, shm_overview.7, signal-safety.7, signal.7, spufs.7, standards.7, symlink.7, termio.7, thread-keyring.7, time.7, unicode.7, user-keyring.7, user-session-keyring.7, user_namespaces.7, utf-8.7, xattr.7: ffix

Signed-off-by: Michael Kerrisk
---
 man7/aio.7                  |  14 +--
 man7/arp.7                  |  22 ++---
 man7/attributes.7           |  28 +++---
 man7/boot.7                 |  28 +++---
 man7/cgroups.7              | 106 ++++++++++---------
 man7/cpuset.7               |   8 +-
 man7/credentials.7          |  22 ++---
 man7/fanotify.7             |  22 ++---
 man7/fifo.7                 |   2 +-
 man7/glob.7                 |  38 ++++----
 man7/hier.7                 |   4 +-
 man7/hostname.7             |  14 +--
 man7/icmp.7                 |  10 +-
 man7/inode.7                |  48 +++++-----
 man7/inotify.7              |  66 ++++++-------
 man7/keyrings.7             |  62 ++++++------
 man7/libc.7                 |   6 +-
 man7/mailaddr.7             |   2 +-
 man7/mount_namespaces.7     | 182 ++++++++++++++++++------------------
 man7/mq_overview.7          |  20 ++--
 man7/nptl.7                 |   6 +-
 man7/numa.7                 |  12 +--
 man7/path_resolution.7      |  36 +++----
 man7/persistent-keyring.7   |  16 ++--
 man7/pid_namespaces.7       |  32 +++----
 man7/pipe.7                 |  42 ++++-----
 man7/pkeys.7                |  24 ++---
 man7/process-keyring.7      |   6 +-
 man7/pthreads.7             |  26 +++---
 man7/pty.7                  |  10 +-
 man7/random.7               |   6 +-
 man7/sched.7                | 104 ++++++++++---------
 man7/sem_overview.7         |  10 +-
 man7/session-keyring.7      |  14 +--
 man7/shm_overview.7         |   4 +-
 man7/signal-safety.7        |  14 +--
 man7/signal.7               |  42 ++++-----
 man7/spufs.7                |  56 +++++------
 man7/standards.7            |  30 +++---
 man7/symlink.7              |  48 +++++-----
 man7/termio.7               |   4 +-
 man7/thread-keyring.7       |   8 +-
 man7/time.7                 |  14 +--
 man7/unicode.7              |  10 +-
 man7/user-keyring.7         |  12 +--
 man7/user-session-keyring.7 |  16 ++--
 man7/user_namespaces.7      |  94 +++++++++----------
 man7/utf-8.7                |   2 +-
 man7/xattr.7                |   8 +-
 49 files changed, 705 insertions(+), 705 deletions(-)

diff --git a/man7/aio.7 b/man7/aio.7
index 76706cd98..053172db5 100644
--- a/man7/aio.7
+++ b/man7/aio.7
@@ -34,7 +34,7 @@ The application can elect to be notified of completion of the I/O operation in a variety of ways: by delivery of a signal, by instantiation of a thread, or no notification at all. - +.PP The POSIX AIO interface consists of the following functions: .TP 16 .BR aio_read (3)
@@ -171,11 +171,11 @@ The control block buffer and the buffer pointed to by .I aio_buf must not be changed while the I/O operation is in progress. These buffers must remain valid until the I/O operation completes. - +.PP Simultaneous asynchronous read or write operations using the same .I aiocb structure yield undefined results. - +.PP The current Linux POSIX AIO implementation is provided in user space by glibc. This has a number of limitations, most notably that maintaining multiple threads to perform I/O operations is expensive and scales poorly.
@@ -206,18 +206,18 @@ of a signal. After all I/O requests have completed, the program retrieves their status using .BR aio_return (3). - +.PP The .B SIGQUIT signal (generated by typing control-\\) causes the program to request cancellation of each of the outstanding requests using .BR aio_cancel (3). - +.PP Here is an example of what we might see when running this program. In this example, the program queues two requests to standard input, and these are satisfied by two lines of input containing "abc" and "x". 
- +.PP .in +4n .nf $ \fB./a.out /dev/stdin /dev/stdin\fP @@ -462,7 +462,7 @@ main(int argc, char *argv[]) .BR aio_return (3), .BR aio_write (3), .BR lio_listio (3) - +.PP "Asynchronous I/O Support in Linux 2.5", Bhattacharya, Pratt, Pulavarty, and Morgan, Proceedings of the Linux Symposium, 2003, diff --git a/man7/arp.7 b/man7/arp.7 index aed59ce08..2b5f76cb2 100644 --- a/man7/arp.7 +++ b/man7/arp.7 @@ -21,7 +21,7 @@ and IPv4 protocol addresses on directly connected networks. The user normally doesn't interact directly with this module except to configure it; instead it provides a service for other protocols in the kernel. - +.PP A user process can receive ARP packets by using .BR packet (7) sockets. @@ -34,7 +34,7 @@ The ARP table can also be controlled via on any .B AF_INET socket. - +.PP The ARP module maintains a cache of mappings between hardware addresses and protocol addresses. The cache has a limited size so old and less @@ -46,7 +46,7 @@ be directly manipulated by the use of ioctls and its behavior can be tuned by the .I /proc interfaces described below. - +.PP When there is no positive feedback for an existing mapping after some time (see the .I /proc @@ -69,7 +69,7 @@ If that fails too, it will broadcast a new ARP request to the network. Requests are sent only when there is data queued for sending. - +.PP Linux will automatically add a nonpermanent proxy arp entry when it receives a request for an address it forwards to and proxy arp is enabled on the receiving interface. @@ -81,7 +81,7 @@ sockets. They take a pointer to a .I struct arpreq as their argument. - +.PP .in +4n .nf struct arpreq { @@ -93,14 +93,14 @@ struct arpreq { }; .fi .in - +.PP .BR SIOCSARP ", " SIOCDARP " and " SIOCGARP respectively set, delete and get an ARP mapping. Setting and deleting ARP maps are privileged operations and may be performed only by a process with the .B CAP_NET_ADMIN capability or an effective UID of 0. 
- +.PP .I arp_pa must be an .B AF_INET @@ -276,13 +276,13 @@ changed in Linux 2.0 to include the .I arp_dev member and the ioctl numbers changed at the same time. Support for the old ioctls was dropped in Linux 2.2. - +.PP Support for proxy arp entries for networks (netmask not equal 0xffffffff) was dropped in Linux 2.2. It is replaced by automatic proxy arp setup by the kernel for all reachable hosts on other interfaces (when forwarding and proxy arp is enabled for the interface). - +.PP The .I neigh/* interfaces did not exist before Linux 2.2. @@ -290,13 +290,13 @@ interfaces did not exist before Linux 2.2. Some timer settings are specified in jiffies, which is architecture- and kernel version-dependent; see .BR time (7). - +.PP There is no way to signal positive feedback from user space. This means connection-oriented protocols implemented in user space will generate excessive ARP traffic, because ndisc will regularly reprobe the MAC address. The same problem applies for some kernel protocols (e.g., NFS over UDP). - +.PP This man page mashes together functionality that is IPv4-specific with functionality that is shared between IPv4 and IPv6. .SH SEE ALSO diff --git a/man7/attributes.7 b/man7/attributes.7 index d53252bf1..52bea817b 100644 --- a/man7/attributes.7 +++ b/man7/attributes.7 @@ -32,7 +32,7 @@ the text of this man page is based on the material taken from the "POSIX Safety Concepts" section of the GNU C Library manual. Further details on the topics described here can be found in that manual. - +.PP Various function manual pages include a section ATTRIBUTES that describes the safety of calling the function in various contexts. This section annotates functions with the following safety markings: @@ -43,7 +43,7 @@ or Thread-Safe functions are safe to call in the presence of other threads. MT, in MT-Safe, stands for Multi Thread. 
- +.IP Being MT-Safe does not imply a function is atomic, nor that it uses any of the memory synchronization mechanisms POSIX exposes to users. It is even possible that calling MT-Safe functions in sequence @@ -52,7 +52,7 @@ For example, having a thread call two MT-Safe functions one right after the other does not guarantee behavior equivalent to atomic execution of a combination of both functions, since concurrent calls in other threads may interfere in a destructive way. - +.IP Whole-program optimizations that could inline functions across library interfaces may expose unsafe reordering, and so performing inlining across the GNU C Library interface is not recommended. @@ -340,7 +340,7 @@ Functions marked with .I init as an MT-Unsafe feature perform MT-Unsafe initialization when they are first called. - +.IP Calling such a function at least once in single-threaded mode removes this specific cause for the function to be regarded as MT-Unsafe. If no other cause for that remains, @@ -517,7 +517,7 @@ modify enables readers to be regarded as MT-Safe \" and AS-Safe (as long as no other reasons for them to be unsafe remain), since the lack of synchronization is not a problem when the objects are effectively constant. - +.IP The identifier that follows the .I const mark will appear by itself as a safety note in readers. @@ -556,7 +556,7 @@ as a MT-Safety issue may temporarily install a signal handler for internal purposes, which may interfere with other uses of the signal, identified after a colon. - +.IP This safety problem can be worked around by ensuring that no other uses of the signal will take place for the duration of the call. Holding a non-recursive mutex while calling all functions that use the same @@ -594,7 +594,7 @@ are MT-Unsafe. .\" The same window enables changes made by asynchronous signals to be lost. .\" These functions are also AS-Unsafe, .\" but the corresponding mark is omitted as redundant. 
- +.IP It is thus advisable for applications using the terminal to avoid concurrent and reentrant interactions with it, by not using it in signal handlers or blocking signals that might use it, @@ -645,7 +645,7 @@ annotated with called concurrently with locale changes may behave in ways that do not correspond to any of the locales active during their execution, but an unpredictable mix thereof. - +.IP We do not mark these functions as MT-Unsafe, \" or AS-Unsafe, however, because functions that modify the locale object are marked with @@ -677,7 +677,7 @@ environment with .BR getenv (3) or similar, without any guards to ensure safety in the presence of concurrent modifications. - +.IP We do not mark these functions as MT-Unsafe, \" or AS-Unsafe, however, because functions that modify the environment are all marked with @@ -716,7 +716,7 @@ GNU C Library .I _sigintr internal data structure without any guards to ensure safety in the presence of concurrent modifications. - +.IP We do not mark these functions as MT-Unsafe, \" or AS-Unsafe, however, because functions that modify this data structure are all marked with @@ -797,7 +797,7 @@ as an MT-Safety issue may temporarily change the current working directory during their execution, which may cause relative pathnames to be resolved in unexpected ways in other threads or within asynchronous signal or cancellation handlers. - +.IP This is not enough of a reason to mark so-marked functions as MT-Unsafe, .\" or AS-Unsafe, but when this behavior is optional (e.g., @@ -836,7 +836,7 @@ It is envisioned that it may be applied to and .I corrupt as well in the future. - +.IP In most cases, the identifier will name a set of functions, but it may name global objects or function arguments, or identifiable properties or logical components associated with them, @@ -848,7 +848,7 @@ or .I :tcattr(fd) to denote the terminal attributes of a file descriptor .IR fd . 
- +.IP The most common use for identifiers is to provide logical groups of functions and arguments that need to be protected by the same synchronization primitive in order to ensure safe operation in a given @@ -874,7 +874,7 @@ indicate the preceding marker only applies when argument is NULL, or global variable .I one_per_line is nonzero. - +.IP When all marks that render a function unsafe are adorned with such conditions, and none of the named conditions hold, diff --git a/man7/boot.7 b/man7/boot.7 index 4700c7211..5142c7bde 100644 --- a/man7/boot.7 +++ b/man7/boot.7 @@ -37,7 +37,7 @@ After power-on or hard reset, control is given to a program stored in read-only memory (normally PROM); for historical reasons involving the personal computer, this program is often called "the \fBBIOS\fR". - +.PP This program normally performs a basic self-test of the machine and accesses nonvolatile memory to read further parameters. @@ -46,7 +46,7 @@ battery-backed CMOS memory, so most people refer to it as "the \fBCMOS\fR"; outside of the PC world, it is usually called "the \fBNVRAM\fR" (nonvolatile RAM). - +.PP The parameters stored in the NVRAM vary among systems, but as a minimum, they should specify which device can supply an OS loader, or at least which @@ -67,11 +67,11 @@ interactive use, in order to enable specification of an alternative kernel (maybe a backup in case the one last compiled isn't functioning) and to pass optional parameters to the kernel. - +.PP In a traditional PC, the OS loader is located in the initial 512-byte block of the boot device; this block is known as "the \fBMBR\fR" (Master Boot Record). - +.PP In most systems, the OS loader is very limited due to various constraints. Even on non-PC systems, @@ -79,12 +79,12 @@ there are some limitations on the size and complexity of this loader, but the size limitation of the PC MBR (512 bytes, including the partition table) makes it almost impossible to squeeze much functionality into it. 
- +.PP Therefore, most systems split the role of loading the OS between a primary OS loader and a secondary OS loader; this secondary OS loader may be located within a larger portion of persistent storage, such as a disk partition. - +.PP In Linux, the OS loader is often either .BR lilo (8) or @@ -98,13 +98,13 @@ The kernel starts the virtual memory swapper (it is a kernel process, called "kswapd" in a modern Linux kernel), and mounts some filesystem at the root path, .IR / . - +.PP Some of the parameters that may be passed to the kernel relate to these activities (for example, the default root filesystem can be overridden); for further information on Linux kernel parameters, read .BR bootparam (7). - +.PP Only then does the kernel create the initial userland process, which is given the number 1 as its .B PID @@ -136,13 +136,13 @@ the administrator an easy way to establish an environment for some usage; each run-level is associated with a set of services (for example, run-level \fBS\fR is \fIsingle-user\fR mode, and run-level \fB2\fR entails running most network services). - +.PP The administrator may change the current run-level via .BR init (1), and query the current run-level via .BR runlevel (8). - +.PP However, since it is not convenient to manage individual services by editing this file, .I /etc/inittab @@ -174,7 +174,7 @@ of the form \fI/etc/rc[0\-6S].d\fR. In each of these directories, there are links (usually symbolic) to the scripts in the \fI/etc/init.d\fR directory. - +.PP A primary script (usually \fI/etc/rc\fR) is called from .BR inittab (5); this primary script calls each service's script via a link in the @@ -183,7 +183,7 @@ Each link whose name begins with \(aqS\(aq is called with the argument "start" (thereby starting the service). Each link whose name begins with \(aqK\(aq is called with the argument "stop" (thereby stopping the service). 
- +.PP To define the starting or stopping order within the same run-level, the name of a link contains an \fBorder-number\fR. Also, for clarity, the name of a link usually @@ -193,7 +193,7 @@ the link \fI/etc/rc2.d/S80sendmail\fR starts the sendmail service on runlevel 2. This happens after \fI/etc/rc2.d/S12syslog\fR is run but before \fI/etc/rc2.d/S90xfs\fR is run. - +.PP To manage these links is to manage the boot order and run-levels; under many systems, there are tools to help with this task (e.g., @@ -207,7 +207,7 @@ inputs without editing an entire boot script, some separate configuration file is used, and is located in a specific directory where an associated boot script may find it (\fI/etc/sysconfig\fR on older Red Hat systems). - +.PP In older UNIX systems, such a file contained the actual command line options for a daemon, but in modern Linux systems (and also in HP-UX), it just contains shell variables. diff --git a/man7/cgroups.7 b/man7/cgroups.7 index c2dbcec6b..0ad417d09 100644 --- a/man7/cgroups.7 +++ b/man7/cgroups.7 @@ -42,7 +42,7 @@ A .I cgroup is a collection of processes that are bound to a set of limits or parameters defined via the cgroup filesystem. - +.PP A .I subsystem is a kernel component that modifies the behavior of @@ -54,7 +54,7 @@ and freezing and resuming execution of the processes in a cgroup. Subsystems are sometimes also known as .IR "resource controllers" (or simply, controllers). - +.PP The cgroups for a controller are arranged in a .IR hierarchy . This hierarchy is defined by creating, removing, and @@ -77,7 +77,7 @@ and management of the cgroup hierarchies became rather complex. (A longer description of these problems can be found in the kernel source file .IR Documentation/cgroup\-v2.txt .) 
- +.PP Because of the problems with the initial cgroups implementation (cgroups version 1), starting in Linux 3.10, work began on a new, @@ -87,7 +87,7 @@ Initially marked experimental, and hidden behind the mount option, the new version (cgroups version 2) was eventually made official with the release of Linux 4.5. Differences between the two versions are described in the text below. - +.PP Although cgroups v2 is intended as a replacement for cgroups v1, the older system continues to exist (and for compatibility reasons is unlikely to be removed). @@ -109,7 +109,7 @@ processes on the system. It is also possible comount multiple (or even all) cgroups v1 controllers against the same cgroup filesystem, meaning that the comounted controllers manage the same hierarchical organization of processes. - +.PP For each mounted hierarchy, the directory tree mirrors the control group hierarchy. Each control group is represented by a directory, with each of its child @@ -125,7 +125,7 @@ which is a child of Under each cgroup directory is a set of files which can be read or written to, reflecting resource limits and a few general cgroup properties. - +.PP In addition, in cgroups v1, cgroups can be mounted with no bound controller, in which case they serve only to track processes. @@ -160,7 +160,7 @@ The use of cgroups requires a kernel built with the option. In addition, each of the v1 controllers has an associated configuration option that must be set in order to employ that controller. - +.PP In order to use a v1 controller, it must be mounted against a cgroup filesystem. The usual place for such mounts is under a @@ -170,26 +170,26 @@ filesystem mounted at Thus, one might mount the .I cpu controller as follows: - +.PP .nf .in +4n mount \-t cgroup \-o cpu none /sys/fs/cgroup/cpu .in .fi - +.PP It is possible to comount multiple controllers against the same hierarchy. 
For example, here the .IR cpu and .IR cpuacct controllers are comounted against a single hierarchy: - +.PP .nf .in +4n mount \-t cgroup \-o cpu,cpuacct none /sys/fs/cgroup/cpu,cpuacct .in .fi - +.PP Comounting controllers has the effect that a process is in the same cgroup for all of the comounted controllers. Separately mounting controllers allows a process to @@ -198,19 +198,19 @@ be in cgroup for one controller while being in .I /foo2/foo3 for another. - +.PP It is possible to comount all v1 controllers against the same hierarchy: - +.PP .nf .in +4n mount \-t cgroup \-o all cgroup /sys/fs/cgroup .in .fi - +.PP (One can achieve the same result by omitting .IR "\-o all" , since it is the default if no controllers are explicitly specified.) - +.PP It is not possible to mount the same controller against multiple cgroup hierarchies. For example, it is not possible to mount both the @@ -224,7 +224,7 @@ It is possible to create multiple mount points with exactly the same set of comounted controllers. However, in this case all that results is multiple mount points providing a view of the same hierarchy. - +.PP Note that on many systems, the v1 controllers are automatically mounted under .IR /sys/fs/cgroup ; in particular, @@ -244,7 +244,7 @@ when a system is busy. This does not limit a cgroup's CPU usage if the CPUs are not busy. For further information, see .IR Documentation/scheduler/sched-design-CFS.txt . - +.IP In Linux 3.2, this controller was extended to provide CPU "bandwidth" control. If the kernel is configured with @@ -258,21 +258,21 @@ Further information can be found in the kernel source file .TP .IR cpuacct " (since Linux 2.6.24; " \fBCONFIG_CGROUP_CPUACCT\fP ) This provides accounting for CPU usage by groups of processes. - +.IP Further information can be found in the kernel source file .IR Documentation/cgroup\-v1/cpuacct.txt . 
.TP .IR cpuset " (since Linux 2.6.24; " \fBCONFIG_CPUSETS\fP ) This cgroup can be used to bind the processes in a cgroup to a specified set of CPUs and NUMA nodes. - +.IP Further information can be found in the kernel source file .IR Documentation/cgroup\-v1/cpusets.txt . .TP .IR memory " (since Linux 2.6.25; " \fBCONFIG_MEMCG\fP ) The memory controller supports reporting and limiting of process memory, kernel memory, and swap used by cgroups. - +.IP Further information can be found in the kernel source file .IR Documentation/cgroup\-v1/memory.txt . .TP @@ -282,7 +282,7 @@ well as open them for reading or writing. The policies may be specified as whitelists and blacklists. Hierarchy is enforced, so new rules must not violate existing rules for the target or ancestor cgroups. - +.IP Further information can be found in the kernel source file .IR Documentation/cgroup-v1/devices.txt . .TP @@ -295,7 +295,7 @@ Freezing a cgroup also causes its children, for example, processes in .IR /A/B , to be frozen. - +.IP Further information can be found in the kernel source file .IR Documentation/cgroup-v1/freezer-subsystem.txt . .TP @@ -307,7 +307,7 @@ as well as used to shape traffic using .BR tc (8). This applies only to packets leaving the cgroup, not to traffic arriving at the cgroup. - +.IP Further information can be found in the kernel source file .IR Documentation/cgroup-v1/net_cls.txt . .TP @@ -317,14 +317,14 @@ The cgroup controls and limits access to specified block devices by applying IO control in the form of throttling and upper limits against leaf nodes and intermediate nodes in the storage hierarchy. - +.IP Two policies are available. The first is a proportional-weight time-based division of disk implemented with CFQ. This is in effect for leaf nodes using CFQ. The second is a throttling policy which specifies upper I/O rate limits on a device. - +.IP Further information can be found in the kernel source file .IR Documentation/cgroup-v1/blkio-controller.txt . 
.TP @@ -332,26 +332,26 @@ Further information can be found in the kernel source file This controller allows .I perf monitoring of the set of processes grouped in a cgroup. - +.IP Further information can be found in the kernel source file .IR tools/perf/Documentation/perf-record.txt . .TP .IR net_prio " (since Linux 3.3; " \fBCONFIG_CGROUP_NET_PRIO\fP ) This allows priorities to be specified, per network interface, for cgroups. - +.IP Further information can be found in the kernel source file .IR Documentation/cgroup-v1/net_prio.txt . .TP .IR hugetlb " (since Linux 3.5; " \fBCONFIG_CGROUP_HUGETLB\fP ) This supports limiting the use of huge pages by cgroups. - +.IP Further information can be found in the kernel source file .IR Documentation/cgroup-v1/hugetlb.txt . .TP .IR pids " (since Linux 4.3; " \fBCONFIG_CGROUP_PIDS\fP ) This controller permits limiting the number of process that may be created in a cgroup (and its descendants). - +.IP Further information can be found in the kernel source file .IR Documentation/cgroup-v1/pids.txt . .\" @@ -359,33 +359,33 @@ Further information can be found in the kernel source file A cgroup filesystem initially contains a single root cgroup, '/', which all processes belong to. A new cgroup is created by creating a directory in the cgroup filesystem: - +.PP mkdir /sys/fs/cgroup/cpu/cg1 - +.PP This creates a new empty cgroup. - +.PP A process may be moved to this cgroup by writing its PID into the cgroup's .I cgroup.procs file: - +.PP echo $$ > /sys/fs/cgroup/cpu/cg1/cgroup.procs - +.PP Only one PID at a time should be written to this file. - +.PP Writing the value 0 to a .IR cgroup.procs file causes the writing process to be moved to the corresponding cgroup. - +.PP When writing a PID into the .IR cgroup.procs , all threads in the process are moved into the new cgroup at once. - +.PP Within a hierarchy, a process can be a member of exactly one cgroup. 
Writing a process's PID to a .IR cgroup.procs file automatically removes it from the cgroup of which it was previously a member. - +.PP The .I cgroup.procs file can be read to obtain a list of the processes that are @@ -393,7 +393,7 @@ members of a cgroup. The returned list of PIDs is not guaranteed to be in order. Nor is it guaranteed to be free of duplicates. (For example, a PID may be recycled while reading from the list.) - +.PP In cgroups v1 (but not cgroups v2), an individual thread can be moved to another cgroup by writing its thread ID (i.e., the kernel thread ID returned by @@ -420,7 +420,7 @@ Two files can be used to determine whether the kernel provides notifications when a cgroup becomes empty. A cgroup is considered to be empty when it contains no child cgroups and no member processes. - +.PP A special file in the root directory of each cgroup hierarchy, .IR release_agent , can be used to register the pathname of a program that may be invoked when @@ -433,11 +433,11 @@ The .IR release_agent program might remove the cgroup directory, or perhaps repopulate with a process. - +.PP The default value of the .IR release_agent file is empty, meaning that no release agent is invoked. - +.PP Whether or not the .IR release_agent program is invoked when a particular cgroup becomes empty is determined @@ -462,7 +462,7 @@ While (different) controllers may be simultaneously mounted under the v1 and v2 hierarchies, it is not possible to mount the same controller simultaneously under both the v1 and the v2 hierarchies. - +.PP The new behaviors in cgroups v2 are summarized here, and in some cases elaborated in the following subsections. .IP 1. 3 @@ -506,9 +506,9 @@ all available controllers are mounted against a single hierarchy. 
The available controllers are automatically mounted, meaning that it is not necessary (or possible) to specify the controllers when mounting the cgroup v2 filesystem using a command such as the following: - +.PP mount -t cgroup2 none /mnt/cgroup2 - +.PP A cgroup v2 controller is available only if it is not currently in use via a mount against a cgroup v1 hierarchy. Or, to put things another way, it is not possible to employ @@ -519,7 +519,7 @@ With the exception of the root cgroup, processes may reside only in leaf nodes (cgroups that do not themselves contain child cgroups). This avoids the need to decide how to partition resources between processes which are members of cgroup A and processes in child cgroups of A. - +.PP For instance, if cgroup .I /cg1/cg2 exists, then a process may reside in @@ -580,7 +580,7 @@ which has either the value 0, meaning that the cgroup (and its descendants) contain no (nonzombie) processes, or 1, meaning that the cgroup contains member processes. - +.PP The .IR cgroup.events file can be monitored, in order to receive notification when a cgroup @@ -594,7 +594,7 @@ events, and when monitoring the file using transitions generate .B POLLPRI events. - +.PP The cgroups v2 .IR notify_on_release mechanism offers at least two advantages over the cgroups v1 @@ -616,7 +616,7 @@ This file contains information about the controllers that are compiled into the kernel. An example of the contents of this file (reformatted for readability) is the following: - +.IP .nf .in +4n #subsys_name hierarchy num_cgroups enabled @@ -634,7 +634,7 @@ hugetlb 0 1 0 pids 2 1 1 .in .fi - +.IP The fields in this file are, from left to right: .RS .IP 1. 3 @@ -666,13 +666,13 @@ This file describes control groups to which the process with the corresponding PID belongs. The displayed information differs for cgroups version 1 and version 2 hierarchies. 
- +.IP For each cgroup hierarchy of which the process is a member, there is one entry containing three colon-separated fields of the form: - +.IP hierarchy-ID:controller-list:cgroup-path - +.IP For example: .IP .in +4n diff --git a/man7/cpuset.7 b/man7/cpuset.7 index 347b92902..5dd2df7cc 100644 --- a/man7/cpuset.7 +++ b/man7/cpuset.7 @@ -175,7 +175,7 @@ it from the cpuset that previously contained it) by writing its PID to that cpuset's .I tasks file (with or without a trailing newline). - +.IP .B Warning: only one PID may be written to the .I tasks @@ -199,7 +199,7 @@ in that cpuset are allowed to execute. See \fBList Format\fR below for a description of the format of .IR cpus . - +.IP The CPUs allowed to a cpuset may be changed by writing a new list to its .I cpus @@ -212,7 +212,7 @@ If set (1), the cpuset has exclusive use of its CPUs (no sibling or cousin cpuset may overlap CPUs). By default, this is off (0). Newly created cpusets also initially default this to off (0). - +.IP Two cpusets are .I sibling cpusets if they share the same parent cpuset in the @@ -250,7 +250,7 @@ its memory nodes (no sibling or cousin may overlap). Also if set (1), the cpuset is a \fBHardwall\fR cpuset (see below). By default, this is off (0). Newly created cpusets also initially default this to off (0). - +.IP Regardless of the .I mem_exclusive setting, if one cpuset is the ancestor of another, diff --git a/man7/credentials.7 b/man7/credentials.7 index f74cf195a..cf7764bf6 100644 --- a/man7/credentials.7 +++ b/man7/credentials.7 @@ -38,7 +38,7 @@ A PID is represented using the type .I pid_t (defined in .IR ). - +.PP PIDs are used in a range of system calls to identify the process affected by the call, for example: .BR kill (2), @@ -59,7 +59,7 @@ and .BR waitpid (2). .\" .BR waitid (2), .\" .BR wait4 (2), - +.PP A process's PID is preserved across an .BR execve (2). .SS Parent process ID (PPID) @@ -70,7 +70,7 @@ A process can obtain its PPID using .BR getppid (2). 
A PPID is represented using the type .IR pid_t . - +.PP A process's PPID is preserved across an .BR execve (2). .SS Process group ID and session ID @@ -81,13 +81,13 @@ A process can obtain its session ID using .BR getsid (2), and its process group ID using .BR getpgrp (2). - +.PP A child created by .BR fork (2) inherits its parent's session ID and process group ID. A process's session ID and process group ID are preserved across an .BR execve (2). - +.PP Sessions and process groups are abstractions devised to support shell job control. A process group (sometimes called a "job") is a collection of @@ -100,7 +100,7 @@ A process's group membership can be set using .BR setpgid (2). The process whose process ID is the same as its process group ID is the \fIprocess group leader\fP for that group. - +.PP A session is a collection of processes that share the same session ID. All of the members of a process group also have the same session ID (i.e., all of the members of a process group always belong to the @@ -112,7 +112,7 @@ which creates a new session whose session ID is the same as the PID of the process that called .BR setsid (2). The creator of the session is called the \fIsession leader\fP. - +.PP All of the processes in a session share a .IR "controlling terminal" . The controlling terminal is established when the session leader @@ -121,7 +121,7 @@ first opens a terminal (unless the flag is specified when calling .BR open (2)). A terminal may be the controlling terminal of at most one session. - +.PP At most one of the jobs in a session may be the .IR "foreground job" ; other jobs in the session are @@ -143,7 +143,7 @@ When terminal keys that generate a signal (such as the .I interrupt key, normally control-C) are pressed, the signal is sent to the processes in the foreground job. - +.PP Various system calls and library functions may operate on all members of a process group, including @@ -172,7 +172,7 @@ and .I gid_t (defined in .IR ). 
- +.PP On Linux, each process has the following user and group identifiers: .IP * 3 Real user ID and real group ID. @@ -260,7 +260,7 @@ a process's real user and group ID and supplementary group IDs are preserved; the effective and saved set IDs may be changed, as described in .BR execve (2). - +.PP Aside from the purposes noted above, a process's user IDs are also employed in a number of other contexts: .IP * 3 diff --git a/man7/fanotify.7 b/man7/fanotify.7 index db906d8d2..ae5c5a6a7 100644 --- a/man7/fanotify.7 +++ b/man7/fanotify.7 @@ -34,14 +34,14 @@ In particular, there is no support for create, delete, and move events. (See .BR inotify (7) for details of an API that does notify those events.) - +.PP Additional capabilities compared to the .BR inotify (7) API include the ability to monitor all of the objects in a mounted filesystem, the ability to make access permission decisions, and the possibility to read or modify files before access by other applications. - +.PP The following system calls are used with this API: .BR fanotify_init (2), .BR fanotify_mark (2), @@ -104,7 +104,7 @@ or similar) from the fanotify file descriptor returned by .BR fanotify_init (2). - +.PP Two types of events are generated: .I notification events and @@ -118,7 +118,7 @@ Permission events are requests to the receiving application to decide whether permission for a file access shall be granted. For these events, the recipient must write a response which decides whether access is granted or not. - +.PP An event is removed from the event queue of the fanotify group when it has been read. Permission events that have been read are kept in an internal list of the @@ -137,11 +137,11 @@ is not specified in the call to until either a file event occurs or the call is interrupted by a signal (see .BR signal (7)). 
- +.PP After a successful .BR read (2), the read buffer contains one or more of the following structures: - +.PP .in +4n .nf struct fanotify_event_metadata { @@ -160,12 +160,12 @@ For performance reasons, it is recommended to use a large buffer size (for example, 4096 bytes), so that multiple events can be retrieved by a single .BR read (2). - +.PP The return value of .BR read (2) is the number of bytes placed in the buffer, or \-1 in case of an error (but see BUGS). - +.PP The fields of the .I fanotify_event_metadata structure are as follows: @@ -291,7 +291,7 @@ To check for any close event, the following bit mask may be used: .B FAN_CLOSE A file was closed. This is a synonym for: - +.IP FAN_CLOSE_WRITE | FAN_CLOSE_NOWRITE .PP The following macros are provided to iterate over a buffer containing @@ -346,7 +346,7 @@ For permission events, the application must .BR write (2) a structure of the following form to the fanotify file descriptor: - +.PP .in +4n .nf struct fanotify_response { @@ -495,7 +495,7 @@ calls to generate .B FAN_MODIFY events. - +.PP As of Linux 3.17, the following bugs exist: .IP * 3 diff --git a/man7/fifo.7 b/man7/fifo.7 index daa4121a2..d7dde7ca5 100644 --- a/man7/fifo.7 +++ b/man7/fifo.7 @@ -55,7 +55,7 @@ When a process tries to write to a FIFO that is not opened for read on the other side, the process is sent a .B SIGPIPE signal. - +.PP FIFO special files can be created by .BR mkfifo (3), and are indicated by diff --git a/man7/glob.7 b/man7/glob.7 index 4ea43670c..706325bb2 100644 --- a/man7/glob.7 +++ b/man7/glob.7 @@ -31,11 +31,11 @@ Long ago, in UNIX\ V6, there was a program .I /etc/glob that would expand wildcard patterns. Soon afterward this became a shell built-in. - +.PP These days there is also a library routine .BR glob (3) that will perform this function for a user program. - +.PP The rules are as follows (POSIX.2, 3.13). 
.SS Wildcard matching A string is a wildcard pattern if it contains one of the @@ -44,9 +44,9 @@ Globbing is the operation that expands a wildcard pattern into the list of pathnames matching the pattern. Matching is defined by: - +.PP A \(aq?\(aq (not between brackets) matches any single character. - +.PP A \(aq*\(aq (not between brackets) matches any string, including the empty string. .PP @@ -81,7 +81,7 @@ any character that is not matched by the expression obtained by removing the first \(aq!\(aq from it. (Thus, "\fI[!]a\-]\fP" matches any single character except \(aq]\(aq, \(aqa\(aq and \(aq\-\(aq.) - +.PP One can remove the special meaning of \(aq?\(aq, \(aq*\(aq and \(aq[\(aq by preceding them by a backslash, or, in case this is part of a shell command line, enclosing them in quotes. @@ -95,7 +95,7 @@ A \(aq/\(aq in a pathname cannot be matched by a \(aq?\(aq or \(aq*\(aq wildcard, or by a range like "\fI[.\-0]\fP". A range containing an explicit \(aq/\(aq character is syntactically incorrect. (POSIX requires that syntactically incorrect patterns are left unchanged.) - +.PP If a filename starts with a \(aq.\(aq, this character must be matched explicitly. (Thus, \fIrm\ *\fP will not remove .profile, and \fItar\ c\ *\fP will not @@ -106,11 +106,11 @@ into the list of matching pathnames" was the original UNIX definition. It allowed one to have patterns that expand into an empty list, as in - +.PP .nf xv \-wait 0 *.gif *.jpg .fi - +.PP where perhaps no *.gif files are present (and this is not an error). However, POSIX requires that a wildcard pattern is left @@ -119,23 +119,23 @@ matching pathnames is empty. With .I bash one can force the classical behavior using this command: - +.PP shopt \-s nullglob .\" In Bash v1, by setting allow_null_glob_expansion=true - +.PP (Similar problems occur elsewhere. For example, where old scripts have - +.PP .nf rm \`find . \-name "*~"\` .fi - +.PP new scripts require - +.PP .nf rm \-f nosuchfile \`find . 
\-name "*~"\` .fi - +.PP to avoid error messages from .I rm called with an empty argument list.) @@ -147,7 +147,7 @@ First of all, they match filenames, rather than text, and secondly, the conventions are not the same: for example, in a regular expression \(aq*\(aq means zero or more copies of the preceding thing. - +.PP Now that regular expressions have bracket expressions where the negation is indicated by a \(aq^\(aq, POSIX has declared the effect of a wildcard pattern "\fI[^...]\fP" to be undefined. @@ -169,13 +169,13 @@ expression: namely (i) the negation, (ii) explicit single characters, and (iii) ranges. POSIX specifies ranges in an internationally more useful way and adds three more types: - +.PP (iii) Ranges X\-Y comprise all characters that fall between X and Y (inclusive) in the current collating sequence as defined by the .B LC_COLLATE category in the current locale. - +.PP (iv) Named character classes, like .nf @@ -191,13 +191,13 @@ These character classes are defined by the .B LC_CTYPE category in the current locale. - +.PP (v) Collating symbols, like "\fI[.ch.]\fP" or "\fI[.a-acute.]\fP", where the string between "\fI[.\fP" and "\fI.]\fP" is a collating element defined for the current locale. Note that this may be a multicharacter element. - +.PP (vi) Equivalence class expressions, like "\fI[=a=]\fP", where the string between "\fI[=\fP" and "\fI=]\fP" is any collating element from its equivalence class, as defined for the diff --git a/man7/hier.7 b/man7/hier.7 index 3c6f7fcfe..a035b101b 100644 --- a/man7/hier.7 +++ b/man7/hier.7 @@ -271,7 +271,7 @@ This contains information which may change from system release to system release and used to be a symbolic link to .I /usr/src/linux/include/linux to get at operating-system-specific information. - +.IP (Note that one should have include files there that work correctly with the current libc and in user space. However, Linux kernel source is not @@ -646,5 +646,5 @@ differently. 
.BR ln (1), .BR proc (5), .BR mount (8) - +.PP The Filesystem Hierarchy Standard diff --git a/man7/hostname.7 b/man7/hostname.7 index c2196cbcf..4c07485c7 100644 --- a/man7/hostname.7 +++ b/man7/hostname.7 @@ -43,7 +43,7 @@ hostname \- hostname resolution description Hostnames are domains, where a domain is a hierarchical, dot-separated list of subdomains; for example, the machine "monet", in the "example" subdomain of the "com" domain would be represented as "monet.example.com". - +.PP Each element of the hostname must be from 1 to 63 characters long and the entire hostname, including the dots, can be at most 253 characters long. Valid characters for hostnames are @@ -58,7 +58,7 @@ to .IR 9 , and the hyphen (\-). A hostname may not start with a hyphen. - +.PP Hostnames are often used with network client and server programs, which must generally translate the name to an address for use. (This task is generally performed by either @@ -67,7 +67,7 @@ or the obsolete .BR gethostbyname (3).) Hostnames are resolved by the Internet name resolver in the following fashion. - +.PP If the name consists of a single component, that is, contains no dot, and if the environment variable .B HOSTALIASES @@ -80,11 +80,11 @@ to be substituted for that alias. If a case-insensitive match is found between the hostname to be resolved and the first field of a line in the file, the substituted name is looked up with no further processing. - +.PP If the input name ends with a trailing dot, the trailing dot is removed, and the remaining name is looked up with no further processing. - +.PP If the input name does not end with a trailing dot, it is looked up by searching through a list of domains until a match is found. 
The default search list includes first the local domain, @@ -103,11 +103,11 @@ by a system-wide configuration file (see .BR resolver (5), .BR mailaddr (7), .BR named (8) - +.PP .UR http://www.ietf.org\:/rfc\:/rfc1123.txt IETF RFC\ 1123 .UE - +.PP .UR http://www.ietf.org\:/rfc\:/rfc1178.txt IETF RFC\ 1178 .UE diff --git a/man7/icmp.7 b/man7/icmp.7 index 9db5ddfd2..e1c2a4e19 100644 --- a/man7/icmp.7 +++ b/man7/icmp.7 @@ -85,13 +85,13 @@ packets. .\" The following taken from 2.6.28-rc4 Documentation/networking/ip-sysctl.txt If disabled, ICMP error messages are sent with the primary address of the exiting interface. - +.IP If enabled, the message will be sent with the primary address of the interface that received the packet that caused the ICMP error. This is the behavior that many network administrators will expect from a router. And it can make debugging complicated network layouts much easier. - +.IP Note that if no primary address exists for the interface selected, then the primary address of the first non-loopback interface that has one will be used regardless of this setting. @@ -122,11 +122,11 @@ otherwise the minimum space between responses in milliseconds. .IR icmp_ratemask " (integer; default: see below; since Linux 2.4.10)" .\" The following taken from 2.6.28-rc4 Documentation/networking/ip-sysctl.txt Mask made of ICMP types for which rates are being limited. - +.IP Significant bits: IHGFEDCBA9876543210 .br Default mask: 0000001100000011000 (0x1818) - +.IP Bit definitions (see the Linux kernel source file .IR include/linux/icmp.h ): .RS 12 @@ -147,7 +147,7 @@ H Address Mask Request I Address Mask Reply .TE .RE - +.PP The bits marked with an asterisk are rate limited by default (see the default mask above). .TP diff --git a/man7/inode.7 b/man7/inode.7 index 64d543542..9e176b161 100644 --- a/man7/inode.7 +++ b/man7/inode.7 @@ -37,7 +37,7 @@ structure, or which returns a .I statx structure. 
- +.PP The following is a list of the information typically found in, or associated with, the file inode, with the names of the corresponding structure fields returned by @@ -47,7 +47,7 @@ and .TP Device where inode resides \fIstat.st_dev\fP; \fIstatx.stx_dev_minor\fP and \fIstatx.stx_dev_major\fP - +.IP Each inode (as well as the associated file) resides in a filesystem that is hosted on a device. That device is identified by the combination of its major ID @@ -56,7 +56,7 @@ and minor ID (which identifies a specific instance in the general class). .TP Inode number \fIstat.st_ino\fP; \fIstatx.stx_ino\fP - +.IP Each file in a filesystem has a unique inode number. Inode numbers are guaranteed to be unique only within a filesystem (i.e., the same inode numbers may be used by different filesystems, @@ -65,12 +65,12 @@ This field contains the file's inode number. .TP File type and mode \fIstat.st_mode\fP; \fIstatx.stx_mode\fP - +.IP See the discussion of file type and mode, below. .TP Link count \fIstat.st_nlink\fP; \fIstatx.stx_nlink\fP - +.IP This field contains the number of hard links to the file. Additional links to an existing file are created using .BR link (2). @@ -78,7 +78,7 @@ Additional links to an existing file are created using User ID .I st_uid \fIstat.st_uid\fP; \fIstatx.stx_uid\fP - +.IP This field records the user ID of the owner of the file. For newly created files, the file user ID is the effective user ID of the creating process. @@ -87,7 +87,7 @@ The user ID of a file can be changed using .TP Group ID \fIstat.st_gid\fP; \fIstatx.stx_gid\fP - +.IP The inode records the ID of the group owner of the file. 
For newly created files, the file group ID is either the group ID of the parent directory or @@ -99,13 +99,13 @@ The group ID of a file can be changed using .TP Device represented by this inode \fIstat.st_rdev\fP; \fIstatx.stx_rdev_minor\fP and \fIstatx.stx_rdev_major\fP - +.IP If this file (inode) represents a device, then the inode records the major and minor ID of that device. .TP File size \fIstat.st_size\fP; \fIstatx.stx_size\fP - +.IP This field gives the size of the file (if it is a regular file or a symbolic link) in bytes. The size of a symbolic link is the length of the pathname @@ -113,20 +113,20 @@ it contains, without a terminating null byte. .TP Preferred block size for I/O \fIstat.st_blksize\fP; \fIstatx.stx_blksize\fP - +.IP This field gives the "preferred" blocksize for efficient filesystem I/O. (Writing to a file in smaller chunks may cause an inefficient read-modify-rewrite.) .TP Number of blocks allocated to the file \fIstat.st_blocks\fP; \fIstatx.stx_blocks\fP - +.IP This field indicates the number of blocks allocated to the file, in 512-byte units. (This may be smaller than .IR st_size /512 when the file has holes.) - +.IP The POSIX.1 standard notes .\" Rationale for sys/stat.h in POSIX.1-2008 that the unit for the @@ -140,7 +140,7 @@ Furthermore, the unit may differ on a per-filesystem basis. .TP Last access timestamp (atime) \fIstat.st_atime\fP; \fIstatx.stx_atime\fP - +.IP This is the file's last access timestamp. It is changed by file accesses, for example, by .BR execve (2), @@ -153,7 +153,7 @@ and Other interfaces, such as .BR mmap (2), may or may not update the atime timestamp. - +.IP Some filesystem types allow mounting in such a way that file and/or directory accesses do not cause an update of the atime timestamp. (See @@ -173,17 +173,17 @@ flag; see .TP File creation (birth) timestamp (btime) (not returned in the \fIstat\fP structure); \fIstatx.stx_btime\fP - +.IP The file's creation timestamp.
This is set on file creation and not changed subsequently. - +.IP The btime timestamp was not historically present on UNIX systems and is not currently supported by most Linux filesystems. .\" FIXME Is it supported on ext4 and XFS? .TP Last modification timestamp (mtime) \fIstat.st_mtime\fP; \fIstatx.stx_mtime\fP - +.IP This is the file's last modification timestamp. It is changed by file modifications, for example, by .BR mknod (2), @@ -201,7 +201,7 @@ changed for changes in owner, group, hard link count, or mode. .TP Last status change timestamp (ctime) \fIstat.st_ctime\fP; \fIstatx.stx_ctime\fP - +.IP This is the file's last status change timestamp. It is changed by writing or by setting inode information (i.e., owner, group, link count, mode, etc.). @@ -225,7 +225,7 @@ field (for the .I statx.stx_mode field) contains the file type and mode. - +.PP POSIX refers to the .I stat.st_mode bits corresponding to the mask @@ -254,7 +254,7 @@ S_IFIFO 0010000 FIFO .in .PP Thus, to test for a regular file (for example), one could write: - +.PP .nf .in +4n stat(pathname, &sb); @@ -293,7 +293,7 @@ socket? (Not in POSIX.1-1996.) .RE .PP The preceding code snippet could thus be rewritten as: - +.PP .nf .in +4n stat(pathname, &sb); @@ -319,7 +319,7 @@ and are provided if .BR _XOPEN_SOURCE is defined. - +.PP The definition of .BR S_IFSOCK can also be exposed either by defining @@ -328,7 +328,7 @@ with a value of 500 or greater or (since glibc 2.24) by defining both .BR _XOPEN_SOURCE and .BR _XOPEN_SOURCE_EXTENDED . - +.PP The definition of .BR S_ISSOCK () is exposed if any of the following feature test macros is defined: @@ -424,7 +424,7 @@ and so on. The .BR S_IF* constants are present in POSIX.1-2001 and later. - +.PP The .BR S_ISLNK () and diff --git a/man7/inotify.7 b/man7/inotify.7 index a2d86cf97..524782969 100644 --- a/man7/inotify.7 +++ b/man7/inotify.7 @@ -35,7 +35,7 @@ Inotify can be used to monitor individual files, or to monitor directories.
When a directory is monitored, inotify will return events for the directory itself, and for files inside the directory. - +.PP The following system calls are used with this API: .IP * 3 .BR inotify_init (2) @@ -99,7 +99,7 @@ in which case the call fails with the error .BR EINTR ; see .BR signal (7)). - +.PP Each successful .BR read (2) returns a buffer containing one or more of the following structures: @@ -120,15 +120,15 @@ struct inotify_event { }; .fi .in - +.PP .I wd identifies the watch for which this event occurs. It is one of the watch descriptors returned by a previous call to .BR inotify_add_watch (2). - +.PP .I mask contains bits that describe the event that occurred (see below). - +.PP .I cookie is a unique integer that connects related events. Currently, this is used only for rename events, and @@ -140,7 +140,7 @@ events to be connected by the application. For all other event types, .I cookie is set to 0. - +.PP The .I name field is present only when an event is returned @@ -149,7 +149,7 @@ it identifies the filename within the watched directory. This filename is null-terminated, and may include further null bytes (\(aq\\0\(aq) to align subsequent reads to a suitable address boundary. - +.PP The .I len field counts all of the bytes in @@ -159,7 +159,7 @@ the length of each .I inotify_event structure is thus .IR "sizeof(struct inotify_event)+len" . - +.PP The behavior when the buffer given to .BR read (2) is too small to return information about the next event depends @@ -170,9 +170,9 @@ returns 0; since kernel 2.6.21, fails with the error .BR EINVAL . Specifying a buffer of size - +.PP sizeof(struct inotify_event) + NAME_MAX + 1 - +.PP will be sufficient to read at least one event. .SS inotify events The @@ -274,7 +274,7 @@ Inotify monitoring is inode-based: when monitoring a file (but not when monitoring the directory containing a file), an event can be generated for activity on any link to the file (in the same or a different directory).
- +.PP When monitoring a directory: .IP * 3 the events marked above with an asterisk (*) can occur both @@ -288,7 +288,7 @@ when monitoring a directory, events are not generated for the files inside the directory when the events are performed via a pathname (i.e., a link) that lies outside the monitored directory. - +.PP When events are generated for objects inside a watched directory, the .I name field in the returned @@ -302,7 +302,7 @@ This macro can be used as the .I mask argument when calling .BR inotify_add_watch (2). - +.PP Two additional convenience macros are defined: .RS 4 .TP @@ -582,7 +582,7 @@ Inotify file descriptors can be monitored using and .BR epoll (7). When an event is available, the file descriptor indicates as readable. - +.PP Since Linux 2.6.25, signal-driven I/O notification is available for inotify file descriptors; see the discussion of @@ -611,7 +611,7 @@ and .B POLLIN is set in .IR si_band . - +.PP If successive output inotify events produced on the inotify file descriptor are identical (same .IR wd , @@ -624,13 +624,13 @@ older event has not yet been read (but see BUGS). This reduces the amount of kernel memory required for the event queue, but also means that an application can't use inotify to reliably count file events. - +.PP The events returned by reading from an inotify file descriptor form an ordered queue. Thus, for example, it is guaranteed that when renaming from one directory to another, events will be produced in the correct order on the inotify file descriptor. - +.PP The set of watch descriptors that is being monitored via an inotify file descriptor can be viewed via the entry for the inotify file descriptor in the process's @@ -651,7 +651,7 @@ In particular, there is no easy way for a process that is monitoring events via inotify to distinguish events that it triggers itself from those that are triggered by other processes. - +.PP Inotify reports only events that a user-space program triggers through the filesystem API. 
As a result, it does not catch remote events that occur @@ -664,28 +664,28 @@ Furthermore, various pseudo-filesystems such as and .IR /dev/pts are not monitorable with inotify. - +.PP The inotify API does not report file accesses and modifications that may occur because of .BR mmap (2), .BR msync (2), and .BR munmap (2). - +.PP The inotify API identifies affected files by filename. However, by the time an application processes an inotify event, the filename may already have been deleted or renamed. - +.PP The inotify API identifies events via watch descriptors. It is the application's responsibility to cache a mapping (if one is needed) between watch descriptors and pathnames. Be aware that directory renamings may affect multiple cached pathnames. - +.PP Inotify monitoring of directories is not recursive: to monitor subdirectories under a directory, additional watches must be created. This can take a significant amount of time for large directory trees. - +.PP If monitoring an entire directory subtree, and a new subdirectory is created in that tree or an existing directory is renamed into that tree, @@ -694,7 +694,7 @@ new files (and subdirectories) may already exist inside the subdirectory. Therefore, you may want to scan the contents of the subdirectory immediately after adding the watch (and, if desired, recursively add watches for any subdirectories that it contains). - +.PP Note that the event queue can overflow. In this case, events are lost. Robust applications should handle the possibility of @@ -706,7 +706,7 @@ approach is to close the inotify file descriptor, empty the cache, create a new inotify file descriptor, and then re-create watches and cache entries for the objects to be monitored.) - +.PP If a filesystem is mounted on top of a monitored directory, no event is generated, and no events are generated for objects immediately under the new mount point.
@@ -723,7 +723,7 @@ event pair that is generated by .BR rename (2) can be matched up via their shared cookie value. However, the task of matching has some challenges. - +.PP These two events are usually consecutive in the event stream available when reading from the inotify file descriptor. However, this is not guaranteed. @@ -740,7 +740,7 @@ inserted into the queue: there may be a brief interval where the has appeared, but the .B IN_MOVED_TO has not. - +.PP Matching up the .B IN_MOVED_FROM and @@ -765,7 +765,7 @@ then those watch descriptors will be inconsistent with the watch descriptors in any pending events. (Re-creating the inotify file descriptor and rebuilding the cache may be useful to deal with this scenario.) - +.PP Applications should also allow for the possibility that the .B IN_MOVED_FROM event was the last event that could fit in the buffer @@ -793,7 +793,7 @@ calls to generate .B IN_MODIFY events. - +.PP .\" FIXME . kernel commit 611da04f7a31b2208e838be55a42c7a1310ae321 .\" implies that unmount events were buggy 2.6.11 to 2.6.36 .\" @@ -801,7 +801,7 @@ In kernels before 2.6.16, the .B IN_ONESHOT .I mask flag does not work. - +.PP As originally designed and implemented, the .B IN_ONESHOT flag did not cause an @@ -811,7 +811,7 @@ However, as an unintended effect of other changes, since Linux 2.6.36, an .B IN_IGNORED event is generated in this case. - +.PP Before kernel 2.6.25, .\" commit 1c17d18e3775485bf1e0ce79575eb637a94494a2 the kernel code that was intended to coalesce successive identical events @@ -820,7 +820,7 @@ if the older had not yet been read) instead checked if the most recent event could be coalesced with the .I oldest unread event. 
- +.PP When a watch descriptor is removed by calling .BR inotify_rm_watch (2) (or because a watch file is deleted or the filesystem @@ -1089,6 +1089,6 @@ main(int argc, char* argv[]) .BR read (2), .BR stat (2), .BR fanotify (7) - +.PP .IR Documentation/filesystems/inotify.txt in the Linux kernel source tree diff --git a/man7/keyrings.7 b/man7/keyrings.7 index de25b2806..2ff911ae6 100644 --- a/man7/keyrings.7 +++ b/man7/keyrings.7 @@ -25,7 +25,7 @@ those objects and also use the facility for their own purposes; see .BR request_key (2), and .BR keyctl (2). - +.PP A library and some user-space utilities are provided to allow access to the facility. See @@ -48,7 +48,7 @@ Type A key's type defines what sort of data can be held in the key, how the proposed content of the key will be parsed, and how the payload will be used. - +.IP There are a number of general-purpose types available, plus some specialist types defined by specific kernel components. .TP @@ -65,7 +65,7 @@ instantiation of a key if that key wasn't already known to the kernel when it was requested. For further details, see .BR request_key (2). - +.IP A key's payload can be read and updated if the key type supports it and if suitable permission is granted to the caller. .TP @@ -78,7 +78,7 @@ and there is an additional category\(empossessor\(embeyond the usual user, group, and other (see .IR Possession , below). - +.IP Note that keys are quota controlled, since they require unswappable kernel memory. The owning user ID specifies whose quota is to be debited. @@ -113,7 +113,7 @@ to other keys (including other keyrings), analogous to a directory holding links to files. The main purpose of a keyring is to prevent other keys from being garbage collected because nothing refers to them. - +.IP Keyrings with descriptions (names) that begin with a period (\(aq.\(aq) are reserved to the implementation. .TP @@ -121,10 +121,10 @@ that begin with a period (\(aq.\(aq) are reserved to the implementation. 
This is a general-purpose key type. The key is kept entirely within kernel memory. The payload may be read and updated by user-space applications. - +.IP The payload for keys of this type is a blob of arbitrary data of up to 32,767 bytes. - +.IP The description may be any valid string, though it is preferred that it start with a colon-delimited prefix representing the service to which the key is of interest @@ -149,7 +149,7 @@ This key type is similar to the .I """user""" key type, but it may hold a payload of up to 1 MiB in size. This key type is useful for purposes such as holding Kerberos ticket caches. - +.IP The payload data may be stored in a tmpfs filesystem, rather than in kernel memory, if the data size exceeds the overhead of storing the data in the filesystem. @@ -165,7 +165,7 @@ thereby preventing it from being written unencrypted into swap space. There are more specialized key types available also, but they aren't discussed here because they aren't intended for normal user-space use. - +.PP Key type names that begin with a period (\(aq.\(aq) are reserved to the implementation. .\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" @@ -208,13 +208,13 @@ for more information. To prevent a key from being garbage collected, it must be anchored to keep its reference count elevated when it is not in active use by the kernel. - +.PP Keyrings are used to anchor other keys: each link is a reference on a key. Note that keyrings themselves are just keys and are also subject to the same anchoring requirement to prevent them being garbage collected. - +.PP The kernel makes available a number of anchor keyrings. Note that some of these keyrings will be created only when first accessed. .TP @@ -233,7 +233,7 @@ the the .BR thread-keyring (7) (specific to a particular thread).
- +.IP As an alternative to using the actual keyring IDs, in calls to .BR add_key (2), @@ -253,7 +253,7 @@ Each UID known to the kernel has a record that contains two keyrings: the and the .BR user-session-keyring (7). These exist for as long as the UID record in the kernel exists. - +.IP As an alternative to using the actual keyring IDs, in calls to .BR add_key (2), @@ -265,7 +265,7 @@ the special keyring values and .BR KEY_SPEC_USER_SESSION_KEYRING can be used to refer to the caller's own instances of these keyrings. - +.IP A link to the user keyring is placed in a new session keyring by .BR pam_keyinit (8) when a new login session is initiated. @@ -528,18 +528,18 @@ The thread need not possess the key for it to be visible in this file. .\" .\"Possibly it shouldn't be, but for now it is. .\" - +.IP The only keys included in the list are those that grant .I view permission to the reading process (regardless of whether or not it possesses them). LSM security checks are still performed, and may filter out further keys that the process is not authorized to view. - +.IP An example of the data that one might see in this file (with the columns numbered for easy reference below) is the following: - +.IP .nf .in 0n (1) (2) (3)(4) (5) (6) (7) (8) (9) @@ -554,7 +554,7 @@ is the following: 3ce56aea I--Q--- 5 perm 3f030000 1000 1000 keyring _ses: 1 .in .fi - +.IP The fields shown in each line of this file are as follows: .RS .TP @@ -612,7 +612,7 @@ Permissions (5) The key permissions, expressed as four hexadecimal bytes containing, from left to right, the possessor, user, group, and other permissions. Within each byte, the permission bits are as follows: - +.IP .PD 0 .RS 12 .TP @@ -651,9 +651,9 @@ Description (9) The key description (name). This field contains descriptive information about the key. For most key types, it has the form - +.IP name[: extra\-info] - +.IP The .I name subfield is the key's description (name). 
@@ -690,9 +690,9 @@ key type (authorization key; see .BR request_key (2)), the description field has the form shown in the following example: - +.IP key:c9a9b19 pid:28880 ci:10 - +.IP The three subfields are as follows: .RS .TP 5 @@ -713,7 +713,7 @@ be instantiated This file lists various information for each user ID that has at least one key on the system. An example of the data that one might see in this file is the following: - +.IP .nf .in +4n 0: 10 9/9 2/1000000 22/25000000 @@ -721,7 +721,7 @@ An example of the data that one might see in this file is the following: 1000: 11 11/11 10/200 271/20000 .in .fi - +.IP The fields shown in each line are as follows: .RS .TP @@ -755,7 +755,7 @@ of time where user space can see an error (respectively and .BR EKEYEXPIRED ) that indicates what happened to the key. - +.IP The default value in this file is 300 (i.e., 5 minutes). .TP .IR /proc/sys/kernel/keys/persistent_keyring_expiry " (since Linux 3.13)" @@ -768,7 +768,7 @@ or the .BR keyctl (2) .B KEYCTL_GET_PERSISTENT operation.) - +.IP The default value in this file is 259200 (i.e., 3 days). .PP The following files (which are writable by privileged processes) @@ -780,21 +780,21 @@ and number of bytes of data that can be stored in key payloads: .\" Previously: KEYQUOTA_MAX_BYTES 10000 This is the maximum number of bytes of data that a nonroot user can hold in the payloads of the keys owned by the user. - +.IP The default value in this file is 20,000. .TP .IR /proc/sys/kernel/keys/maxkeys " (since Linux 2.6.26)" .\" commit 0b77f5bfb45c13e1e5142374f9d6ca75292252a4 .\" Previously: KEYQUOTA_MAX_KEYS 100 This is the maximum number of keys that a nonroot user may own. - +.IP The default value in this file is 200. .TP .IR /proc/sys/kernel/keys/root_maxbytes " (since Linux 2.6.26)" This is the maximum number of bytes of data that the root user (UID 0 in the root user namespace) can hold in the payloads of the keys owned by root. 
- +.IP .\"738c5d190f6540539a04baf36ce21d46b5da04bd The default value in this file is 25,000,000 (20,000 before Linux 3.17). .\" commit 0b77f5bfb45c13e1e5142374f9d6ca75292252a4 @@ -804,7 +804,7 @@ The default value in this file is 25,000,000 (20,000 before Linux 3.17). This is the maximum number of keys that the root user (UID 0 in the root user namespace) may own. - +.IP .\"738c5d190f6540539a04baf36ce21d46b5da04bd The default value in this file is 1,000,000 (200 before Linux 3.17). .PP diff --git a/man7/libc.7 b/man7/libc.7 index e0f74a20a..66b06a267 100644 --- a/man7/libc.7 +++ b/man7/libc.7 @@ -51,7 +51,7 @@ available via the command Release 1.0 of glibc was made in September 1992. (There were earlier 0.x releases.) The next major release of glibc was 2.0, at the beginning of 1997. - +.PP The pathname .I /lib/libc.so.6 (or something similar) is normally a symbolic link that @@ -73,7 +73,7 @@ this version used the shared library soname .IR libc.so.5 . For a while, Linux libc was the standard C library in many Linux distributions. - +.PP However, notwithstanding the original motivations of the Linux libc effort, by the time glibc 2.0 was released (in 1997), it was clearly superior to Linux libc, @@ -82,7 +82,7 @@ soon switched back to glibc. To avoid any confusion with Linux libc versions, glibc 2.0 and later used the shared library soname .IR libc.so.6 . - +.PP Since the switch from Linux libc to glibc 2.0 occurred long ago, .I man-pages no longer takes care to document Linux libc details. diff --git a/man7/mailaddr.7 b/man7/mailaddr.7 index a26a45ea9..b8542cd06 100644 --- a/man7/mailaddr.7 +++ b/man7/mailaddr.7 @@ -116,7 +116,7 @@ The "postmaster" address is not case sensitive. 
.BR aliases (5), .BR forward (5), .BR sendmail (8) - +.PP .UR http://www.ietf.org\:/rfc\:/rfc5322.txt IETF RFC\ 5322 .UE diff --git a/man7/mount_namespaces.7 b/man7/mount_namespaces.7 index cbc237585..f5f2616e2 100644 --- a/man7/mount_namespaces.7 +++ b/man7/mount_namespaces.7 @@ -29,12 +29,12 @@ mount_namespaces \- overview of Linux mount namespaces .SH DESCRIPTION For an overview of namespaces, see .BR namespaces (7). - +.PP Mount namespaces provide isolation of the list of mount points seen by the processes in each namespace instance. Thus, the processes in each of the mount namespace instances will see distinct single-directory hierarchies. - +.PP The views provided by the .IR /proc/[pid]/mounts , .IR /proc/[pid]/mountinfo , @@ -47,7 +47,7 @@ correspond to the mount namespace in which the process with the PID resides. (All of the processes that reside in the same mount namespace will see the same view in these files.) - +.PP When a process creates a new mount namespace using .BR clone (2) or @@ -146,7 +146,7 @@ between namespaces (or, more precisely, between the members of a .IR "peer group" that are propagating events to one another). - +.PP Each mount point is marked (via .BR mount (2)) as having one of the following @@ -170,7 +170,7 @@ Mount and unmount events do not propagate into or out of this mount point. Mount and unmount events propagate into this mount point from a (master) shared peer group. Mount and unmount events under this mount point do not propagate to any peer. - +.IP Note that a mount point can be the slave of another peer group while at the same time sharing mount and unmount events with a peer group of which it is a member. @@ -184,7 +184,7 @@ Attempts to bind mount this mount with the .BR MS_BIND flag) will fail. - +.IP When a recursive bind mount .RB ( mount (2) with the @@ -198,13 +198,13 @@ when replicating that subtree to produce the target subtree. .PP For a discussion of the propagation type assigned to a new mount, see NOTES. 
- +.PP The propagation type is a per-mount-point setting; some mount points may be marked as shared (with each shared mount point being a member of a distinct peer group), while others are private (or slaved or unbindable). - +.PP Note that a mount's propagation type determines whether mounts and unmounts of mount points .I "immediately under" @@ -215,7 +215,7 @@ What happens if the mount point itself is unmounted is determined by the propagation type that is in effect for the .I parent of the mount point. - +.PP Members are added to a .IR "peer group" when a mount point is marked as shared and either: @@ -230,7 +230,7 @@ A mount ceases to be a member of a peer group when either the mount is explicitly unmounted, or when the mount is implicitly unmounted because a mount namespace is removed (because it has no more member processes). - +.PP The propagation type of the mount points in a mount namespace can be discovered via the "optional fields" exposed in .IR /proc/[pid]/mountinfo . @@ -283,7 +283,7 @@ Suppose that on a terminal in the initial mount namespace, we mark one mount point as shared and another as private, and then view the mounts in .IR /proc/self/mountinfo : - +.PP .nf .in +4n sh1# \fBmount \-\-make\-shared /mntS\fP @@ -293,7 +293,7 @@ sh1# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP 83 61 8:15 / /mntP rw,relatime .in .fi - +.PP From the .IR /proc/self/mountinfo output, we see that @@ -310,18 +310,18 @@ and is the root directory, .IR / , which is mounted as private: - +.PP .nf .in +4n sh1# \fBcat /proc/self/mountinfo | awk \(aq$1 == 61\(aq | sed \(aqs/ \- .*//\(aq\fP 61 0 8:2 / / rw,relatime .in .fi - +.PP On a second terminal, we create a new mount namespace where we run a second shell and inspect the mounts: - +.PP .nf .in +4n $ \fBPS1=\(aqsh2# \(aq sudo unshare \-m \-\-propagation unchanged sh\fP @@ -330,7 +330,7 @@ sh2# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP 225 145 8:15 / /mntP 
rw,relatime .in .fi - +.PP The new mount namespace received a copy of the initial mount namespace's mount points. These new mount points maintain the same propagation types, @@ -342,13 +342,13 @@ option prevents from marking all mounts as private when creating a new mount namespace, .\" Since util-linux 2.27 which it does by default.) - +.PP In the second terminal, we then create submounts under each of .IR /mntS and .IR /mntP and inspect the set-up: - +.PP .nf .in +4n sh2# \fBmkdir /mntS/a\fP @@ -362,13 +362,13 @@ sh2# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP 230 225 8:23 / /mntP/b rw,relatime .in .fi - +.PP From the above, it can be seen that .IR /mntS/a was created as shared (inheriting this setting from its parent mount) and .IR /mntP/b was created as a private mount. - +.PP Returning to the first terminal and inspecting the set-up, we see that the new mount created under the shared mount point .IR /mntS @@ -376,7 +376,7 @@ propagated to its peer mount (in the initial mount namespace), but the new mount created under the private mount point .IR /mntP did not propagate: - +.PP .nf .in +4n sh1# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP @@ -395,10 +395,10 @@ an optical disk is mounted in the master shared peer group (in another mount namespace), but want to prevent mount and unmount events under the slave mount from having side effects in other namespaces. 
- +.PP We can demonstrate the effect of slaving by first marking two mount points as shared in the initial mount namespace: - +.PP .nf .in +4n sh1# \fBmount \-\-make\-shared /mntX\fP @@ -408,10 +408,10 @@ sh1# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP 133 83 8:22 / /mntY rw,relatime shared:2 .in .fi - +.PP On a second terminal, we create a new mount namespace and inspect the mount points: - +.PP .nf .in +4n sh2# \fBunshare \-m \-\-propagation unchanged sh\fP @@ -420,9 +420,9 @@ sh2# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP 169 167 8:22 / /mntY rw,relatime shared:2 .in .fi - +.PP In the new mount namespace, we then mark one of the mount points as a slave: - +.PP .nf .in +4n sh2# \fBmount \-\-make\-slave /mntY\fP @@ -431,17 +431,17 @@ sh2# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP 169 167 8:22 / /mntY rw,relatime master:2 .in .fi - +.PP From the above output, we see that .IR /mntY is now a slave mount that is receiving propagation events from the shared peer group with the ID 2. 
- +.PP Continuing in the new namespace, we create submounts under each of .IR /mntX and .IR /mntY : - +.PP .nf .in +4n sh2# \fBmkdir /mntX/a\fP @@ -450,7 +450,7 @@ sh2# \fBmkdir /mntY/b\fP sh2# \fBmount /dev/sda5 /mntY/b\fP .in .fi - +.PP When we inspect the state of the mount points in the new mount namespace, we see that .IR /mntX/a @@ -458,7 +458,7 @@ was created as a new shared mount (inheriting the "shared" setting from its parent mount) and .IR /mntY/b was created as a private mount: - +.PP .nf .in +4n sh2# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP @@ -468,7 +468,7 @@ sh2# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP 175 169 8:5 / /mntY/b rw,relatime .in .fi - +.PP Returning to the first terminal (in the initial mount namespace), we see that the mount .IR /mntX/a @@ -477,7 +477,7 @@ propagated to the peer (the shared but the mount .IR /mntY/b was not propagated: - +.PP .nf .in +4n sh1# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP @@ -486,11 +486,11 @@ sh1# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP 174 132 8:3 / /mntX/a rw,relatime shared:3 .in .fi - +.PP Now we create a new mount point under .IR /mntY in the first shell: - +.PP .nf .in +4n sh1# \fBmkdir /mntY/c\fP @@ -502,12 +502,12 @@ sh1# \fBcat /proc/self/mountinfo | grep '/mnt' | sed 's/ \- .*//'\fP 178 133 8:1 / /mntY/c rw,relatime shared:4 .in .fi - +.PP When we examine the mount points in the second mount namespace, we see that in this case the new mount has been propagated to the slave mount point, and that the new mount is itself a slave mount (to peer group 4): - +.PP .nf .in +4n sh2# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP @@ -524,9 +524,9 @@ One of the primary purposes of unbindable mounts is to avoid the "mount point explosion" problem when repeatedly performing bind mounts of a higher-level subtree at a lower-level mount point. 
The problem is illustrated by the following shell session. - +.PP Suppose we have a system with the following mount points: - +.PP .nf .in +4n # \fBmount | awk \(aq{print $1, $2, $3}\(aq\fP @@ -535,11 +535,11 @@ Suppose we have a system with the following mount points: /dev/sdb7 on /mntY .in .fi - +.PP Suppose furthermore that we wish to recursively bind mount the root directory under several users' home directories. We do this for the first user, and inspect the mount points: - +.PP .nf .in +4n # \fBmount \-\-rbind / /home/cecilia/\fP @@ -552,10 +552,10 @@ We do this for the first user, and inspect the mount points: /dev/sdb7 on /home/cecilia/mntY .in .fi - +.PP When we repeat this operation for the second user, we start to see the explosion problem: - +.PP .nf .in +4n # \fBmount \-\-rbind / /home/henry\fP @@ -574,7 +574,7 @@ we start to see the explosion problem: /dev/sdb7 on /home/henry/home/cecilia/mntY .in .fi - +.PP Under .IR /home/henry , we have not only recursively added the @@ -586,7 +586,7 @@ mounts, but also the recursive mounts of those directories under that were created in the previous step. Upon repeating the step for a third user, it becomes obvious that the explosion is exponential in nature: - +.PP .nf .in +4n # \fBmount \-\-rbind / /home/otto\fP @@ -617,21 +617,21 @@ it becomes obvious that the explosion is exponential in nature: /dev/sdb7 on /home/otto/home/henry/home/cecilia/mntY .in .fi - +.PP The mount explosion problem in the above scenario can be avoided by making each of the new mounts unbindable. The effect of doing this is that recursive mounts of the root directory will not replicate the unbindable mounts. 
We make such a mount for the first user: - +.PP .nf .in +4n # \fBmount \-\-rbind \-\-make\-unbindable / /home/cecilia\fP .in .fi - +.PP Before going further, we show that unbindable mounts are indeed unbindable: - +.PP .nf .in +4n # \fBmkdir /mntZ\fP @@ -643,21 +643,21 @@ mount: wrong fs type, bad option, bad superblock on /home/cecilia, dmesg | tail or so. .in .fi - +.PP Now we create unbindable recursive bind mounts for the other two users: - +.PP .nf .in +4n # \fBmount \-\-rbind \-\-make\-unbindable / /home/henry\fP # \fBmount \-\-rbind \-\-make\-unbindable / /home/otto\fP .in .fi - +.PP Upon examining the list of mount points, we see there has been no explosion of mount points, because the unbindable mounts were not replicated under each user's directory: - +.PP .nf .in +4n # \fBmount | awk \(aq{print $1, $2, $3}\(aq\fP @@ -695,7 +695,7 @@ slave+shared slave+shared slave priv unbind private shared priv [2] priv unbind unbindable shared unbind [2] priv unbind .TE - +.sp 1 Note the following details to the table: .IP [1] 4 If a shared mount is the only mount in its peer group, @@ -705,9 +705,9 @@ Slaving a nonshared mount has no effect on the mount. .\" .SS Bind (MS_BIND) semantics Suppose that the following command is performed: - +.PP mount \-\-bind A/a B/b - +.PP Here, .I A is the source mount point, @@ -727,7 +727,7 @@ depends on the propagation types of the mount points and .IR B , and is summarized in the following table. - +.PP .TS lb2 lb1 lb2 lb2 lb2 lb0 lb2 lb1 lb2 lb2 lb2 lb0 @@ -738,20 +738,20 @@ _ dest(B) shared | shared shared slave+shared invalid nonshared | shared private slave invalid .TE - +.sp 1 Note that a recursive bind of a subtree follows the same semantics as for a bind operation on each mount in the subtree. (Unbindable mounts are automatically pruned at the target mount point.) - +.PP For further details, see .I Documentation/filesystems/sharedsubtree.txt in the kernel source tree. 
.\" .SS Move (MS_MOVE) semantics Suppose that the following command is performed: - +.PP mount \-\-move A B/b - +.PP Here, .I A is the source mount point, @@ -767,7 +767,7 @@ depends on the propagation types of the mount points and .IR B , and is summarized in the following table. - +.PP .TS lb2 lb1 lb2 lb2 lb2 lb0 lb2 lb1 lb2 lb2 lb2 lb0 @@ -778,18 +778,18 @@ _ dest(B) shared | shared shared slave+shared invalid nonshared | shared private slave unbindable .TE - +.sp 1 Note: moving a mount that resides under a shared mount is invalid. - +.PP For further details, see .I Documentation/filesystems/sharedsubtree.txt in the kernel source tree. .\" .SS Mount semantics Suppose that we use the following command to create a mount point: - +.PP mount device B/b - +.PP Here, .I B is the destination mount point, and @@ -804,9 +804,9 @@ is considered always to be private. .\" .SS Unmount semantics Suppose that we use the following command to tear down a mount point: - +.PP unmount A - +.PP Here, .I A is a mount point on @@ -835,7 +835,7 @@ record in cases where a process can't see a slave's immediate master the filesystem root directory) and so cannot determine the chain of propagation between the mounts it can see. - +.PP In the following example, we first create a two-link master-slave chain between the mounts .IR /mnt , @@ -850,7 +850,7 @@ mount point unreachable from the root directory, creating a situation where the master of .IR /mnt/tmp/etc is not reachable from the (new) root directory of the process. - +.PP First, we bind mount the root directory onto .IR /mnt and then bind mount @@ -863,7 +863,7 @@ the .BR proc (5) filesystem remains visible at the correct location in the chroot-ed environment. - +.PP .nf .in +4n # \fBmkdir \-p /mnt/proc\fP @@ -871,11 +871,11 @@ in the chroot-ed environment. 
# \fBmount \-\-bind /proc /mnt/proc\fP .in .fi - +.PP Next, we ensure that the .IR /mnt mount is a shared mount in a new peer group (with no peers): - +.PP .nf .in +4n # \fBmount \-\-make\-private /mnt\fP # Isolate from any previous peer group @@ -885,12 +885,12 @@ mount is a shared mount in a new peer group (with no peers): 248 239 0:4 / /mnt/proc ... shared:5 .in .fi - +.PP Next, we bind mount .IR /mnt/etc onto .IR /tmp/etc : - +.PP .nf .in +4n # \fBmkdir \-p /tmp/etc\fP @@ -901,7 +901,7 @@ onto 267 40 8:2 /etc /tmp/etc ... shared:102 .in .fi - +.PP Initially, these two mount points are in the same peer group, but we then make the .IR /tmp/etc @@ -911,7 +911,7 @@ and then make .IR /tmp/etc shared as well, so that it can propagate events to the next slave in the chain: - +.PP .nf .in +4n # \fBmount \-\-make\-slave /tmp/etc\fP @@ -922,7 +922,7 @@ so that it can propagate events to the next slave in the chain: 267 40 8:2 /etc /tmp/etc ... shared:105 master:102 .in .fi - +.PP Then we bind mount .IR /tmp/etc onto @@ -932,7 +932,7 @@ but we then make .IR /mnt/tmp/etc a slave of .IR /tmp/etc : - +.PP .nf .in +4n # \fBmkdir \-p /mnt/tmp/etc\fP @@ -952,23 +952,23 @@ is the master of the slave .IR /tmp/etc , which in turn is the master of the slave .IR /mnt/tmp/etc . - +.PP We then .BR chroot (1) to the .IR /mnt directory, which renders the mount with ID 267 unreachable from the (new) root directory: - +.PP .nf .in +4n # \fBchroot /mnt\fP .in .fi - +.PP When we examine the state of the mounts inside the chroot-ed environment, we see the following: - +.PP .nf .in +4n # \fBcat /proc/self/mountinfo | sed \(aqs/ \- .*//\(aq\fP @@ -977,7 +977,7 @@ we see the following: 273 239 8:2 /etc /tmp/etc ... master:105 propagate_from:102 .in .fi - +.PP Above, we see that the mount with ID 273 is a slave whose master is the peer group 105. 
The mount point for that master is unreachable, and so a @@ -1006,7 +1006,7 @@ then the propagation type of the new mount is also Otherwise, the propagation type of the new mount is .BR MS_PRIVATE . But see also NOTES. - +.PP Notwithstanding the fact that the default propagation type for new mount points is in many cases .BR MS_PRIVATE , @@ -1019,7 +1019,7 @@ automatically remounts all mount points as on system startup. Thus, on most modern systems, the default propagation type is in practice .BR MS_SHARED . - +.PP Since, when one uses .BR unshare (1) to create a mount namespace, @@ -1034,14 +1034,14 @@ by making all mount points private in the new namespace. That is, .BR unshare (1) performs the equivalent of the following in the new mount namespace: - +.PP mount \-\-make\-rprivate / - +.PP To prevent this, one can use the .IR "\-\-propagation\ unchanged" option to .BR unshare (1). - +.PP For a discussion of propagation types when moving mounts .RB ( MS_MOVE ) and creating bind mounts @@ -1058,6 +1058,6 @@ see .BR proc (5), .BR namespaces (7), .BR user_namespaces (7) - +.PP .IR Documentation/filesystems/sharedsubtree.txt in the kernel source tree. diff --git a/man7/mq_overview.7 b/man7/mq_overview.7 index 741e9609d..9713d3598 100644 --- a/man7/mq_overview.7 +++ b/man7/mq_overview.7 @@ -34,7 +34,7 @@ This API is distinct from that provided by System V message queues .BR msgsnd (2), .BR msgrcv (2), etc.), but provides similar functionality. - +.PP Message queues are created and opened using .BR mq_open (3); this function returns a @@ -49,7 +49,7 @@ that is, a null-terminated string of up to followed by one or more characters, none of which are slashes. Two processes can operate on the same queue by passing the same name to .BR mq_open (3). - +.PP Messages are transferred to and from a queue using .BR mq_send (3) and @@ -65,7 +65,7 @@ and A process can request asynchronous notification of the arrival of a message on a previously empty queue using .BR mq_notify (3). 
- +.PP A message queue descriptor is a reference to an .I "open message queue description" (cf. @@ -78,7 +78,7 @@ as the corresponding message queue descriptors in the parent. Corresponding message queue descriptors in the two processes share the flags .RI ( mq_flags ) that are associated with the open message queue description. - +.PP Each message has an associated .IR priority , and messages are always delivered to the receiving process @@ -184,7 +184,7 @@ limit is ignored for privileged processes but the .BR HARD_MSGMAX ceiling is nevertheless imposed. - +.IP The definition of .BR HARD_MSGMAX has changed across kernel versions: @@ -294,14 +294,14 @@ commands: .fi .in The sticky bit is automatically enabled on the mount directory. - +.PP After the filesystem has been mounted, the message queues on the system can be viewed and manipulated using the commands usually used for files (e.g., .BR ls (1) and .BR rm (1)). - +.PP The contents of each file in the directory consist of a single line containing information about the queue: .in +4n @@ -345,7 +345,7 @@ This means that a message queue descriptor can be monitored using or .BR epoll (7). This is not portable. - +.PP The close-on-exec flag (see .BR open (2)) is automatically set on the file descriptor returned by @@ -364,7 +364,7 @@ POSIX message queues provide a better designed interface than System V message queues; on the other hand POSIX message queues are less widely available (especially on older systems) than System V message queues. - +.PP Linux does not currently (2.6.26) support the use of access control lists (ACLs) for POSIX message queues. .SH BUGS @@ -376,7 +376,7 @@ limit could be raised, and the ceiling was enforced even for privileged processes. This ceiling value was removed in Linux 3.14, and patches to stable kernels 3.5.x to 3.13.x also removed the ceiling. 
- +.PP As originally implemented (and documented), the QSIZE field displayed the total number of (user-supplied) bytes in all messages in the message queue. diff --git a/man7/nptl.7 b/man7/nptl.7 index c3da60770..0133c0a10 100644 --- a/man7/nptl.7 +++ b/man7/nptl.7 @@ -40,7 +40,7 @@ One of these signals is used to support thread cancellation and POSIX timers the other is used as part of a mechanism that ensures all threads in a process always have the same UIDs and GIDs, as required by POSIX. These signals cannot be used in applications. - +.PP To prevent accidental use of these signals in applications, which might interfere with the operation of the NPTL implementation, various glibc library functions and system call wrapper functions @@ -86,7 +86,7 @@ the NPTL implementation wraps all of the system calls that change process credentials with functions that, in addition to invoking the underlying system call, arrange for all other threads in the process to also change their credentials. - +.PP The implementation of each of these system calls involves the use of a real-time signal that is sent (using .BR tgkill (2)) @@ -96,7 +96,7 @@ saves the new credential(s) and records the system call being employed in a global buffer. A signal handler in the receiving thread(s) fetches this information and then uses the same system call to change its credentials. - +.PP Wrapper functions employing this technique are provided for .BR setgid (2), .BR setuid (2), diff --git a/man7/numa.7 b/man7/numa.7 index e3c250b93..b6d81029d 100644 --- a/man7/numa.7 +++ b/man7/numa.7 @@ -55,11 +55,11 @@ see "Library Support" below. .\" See also Changelog-2.6.14 This file displays information about a process's NUMA memory policy and allocation. - +.PP Each line contains information about a memory range used by the process, displaying\(emamong other information\(emthe effective memory policy for that memory range and on which nodes the pages have been allocated. 
- +.PP .I numa_maps is a read-only file. When @@ -67,14 +67,14 @@ When is read, the kernel will scan the virtual address space of the process and report how memory is used. One line is displayed for each unique memory range of the process. - +.PP The first field of each line shows the starting address of the memory range. This field allows a correlation with the contents of the .I /proc//maps file, which contains the end address of the range and other information, such as the access permissions and sharing. - +.PP The second field shows the memory policy currently in effect for the memory range. Note that the effective policy is not necessarily the policy @@ -82,7 +82,7 @@ installed by the process for that memory range. Specifically, if the process installed a "default" policy for that range, the effective policy for that range will be the process policy, which may or may not be "default". - +.PP The rest of the line contains information about the pages allocated in the memory range, as follows: .TP @@ -163,7 +163,7 @@ and the required header are available in the .I numactl package. - +.PP However, applications should not use these system calls directly. Instead, the higher level interface provided by the .BR numa (3) diff --git a/man7/path_resolution.7 b/man7/path_resolution.7 index 4139e2366..9edcc2b97 100644 --- a/man7/path_resolution.7 +++ b/man7/path_resolution.7 @@ -47,7 +47,7 @@ system call that had the .B CLONE_NEWNS flag set.) This handles the \(aq/\(aq part of the pathname. - +.PP If the pathname does not start with the \(aq/\(aq character, the starting lookup directory of the resolution process is the current working directory of the process. @@ -55,7 +55,7 @@ directory of the process. It can be changed by use of the .BR chdir (2) system call.) - +.PP Pathnames starting with a \(aq/\(aq character are called absolute pathnames. Pathnames not starting with a \(aq/\(aq are called relative pathnames. 
.SS Step 2: walk along the path @@ -63,27 +63,27 @@ Set the current lookup directory to the starting lookup directory. Now, for each nonfinal component of the pathname, where a component is a substring delimited by \(aq/\(aq characters, this component is looked up in the current lookup directory. - +.PP If the process does not have search permission on the current lookup directory, an .B EACCES error is returned ("Permission denied"). - +.PP If the component is not found, an .B ENOENT error is returned ("No such file or directory"). - +.PP If the component is found, but is neither a directory nor a symbolic link, an .B ENOTDIR error is returned ("Not a directory"). - +.PP If the component is found and is a directory, we set the current lookup directory to that directory, and go to the next component. - +.PP If the component is found and is a symbolic link (symlink), we first resolve this symbolic link (with the current lookup directory as starting lookup directory). @@ -106,7 +106,7 @@ An .B ELOOP error is returned when the maximum is exceeded ("Too many levels of symbolic links"). - +.PP .\" .\" presently: max recursion depth during symlink resolution: 5 .\" max total number of symbolic links followed: 40 @@ -140,17 +140,17 @@ system calls. By convention, every directory has the entries "." and "..", which refer to the directory itself and to its parent directory, respectively. - +.PP The path resolution process will assume that these entries have their conventional meanings, regardless of whether they are actually present in the physical filesystem. - +.PP One cannot walk down past the root: "/.." is the same as "/". .SS Mount points After a "mount dev path" command, the pathname "path" refers to the root of the filesystem hierarchy on the device "dev", and no longer to whatever it referred to earlier. - +.PP One can walk out of a mounted filesystem: "path/.." refers to the parent directory of "path", outside of the filesystem hierarchy on "dev". 
@@ -196,16 +196,16 @@ effective group ID of the calling process, or is one of the supplementary group IDs of the calling process (as set by .BR setgroups (2)). When neither holds, the third group is used. - +.PP Of the three bits used, the first bit determines read permission, the second write permission, and the last execute permission in case of ordinary files, or search permission in case of directories. - +.PP Linux uses the fsuid instead of the effective user ID in permission checks. Ordinarily the fsuid will equal the effective user ID, but the fsuid can be changed by the system call .BR setfsuid (2). - +.PP (Here "fsuid" stands for something like "filesystem user ID". The concept was required for the implementation of a user space NFS server at a time when processes could send a signal to a process @@ -213,7 +213,7 @@ with the same effective user ID. It is obsolete now. Nobody should use .BR setfsuid (2).) - +.PP Similarly, Linux uses the fsgid ("filesystem group ID") instead of the effective group ID. See @@ -230,7 +230,7 @@ when accessing files. .\" on some implementations (e.g., Solaris, FreeBSD), .\" access(X_OK) by superuser will report success, regardless .\" of the file's execute permission bits. -- MTK (Oct 05) - +.PP On Linux, superuser privileges are divided into capabilities (see .BR capabilities (7)). Two capabilities are relevant for file permissions checks: @@ -238,13 +238,13 @@ Two capabilities are relevant for file permissions checks: and .BR CAP_DAC_READ_SEARCH . (A process has these capabilities if its fsuid is 0.) - +.PP The .B CAP_DAC_OVERRIDE capability overrides all permission checking, but grants execute permission only when at least one of the file's three execute permission bits is set. 
- +.PP The .B CAP_DAC_READ_SEARCH capability grants read and search permission diff --git a/man7/persistent-keyring.7 b/man7/persistent-keyring.7 index e910a3921..48e219b0c 100644 --- a/man7/persistent-keyring.7 +++ b/man7/persistent-keyring.7 @@ -21,7 +21,7 @@ The persistent keyring has a name (description) of the form where .I is the user ID of the corresponding user. - +.PP The persistent keyring may not be accessed directly, even by processes with the appropriate UID. .\" FIXME The meaning of the preceding sentence isn't clear. What is meant? @@ -31,30 +31,30 @@ by virtue of its possessor permits. This linking is done with the .BR keyctl_get_persistent (3) function. - +.PP If a persistent keyring does not exist when it is accessed by the .BR keyctl_get_persistent (3) operation, it will be automatically created. - +.PP Each time the .BR keyctl_get_persistent (3) operation is performed, the persistent key's expiration timer is reset to the value in: - +.PP /proc/sys/kernel/keys/persistent_keyring_expiry - +.PP Should the timeout be reached, the persistent keyring will be removed and everything it pins can then be garbage collected. The key will then be re-created on a subsequent call to .BR keyctl_get_persistent (3). - +.PP The persistent keyring is not directly searched by .BR request_key (2); it is searched only if it is linked into one of the keyrings that is searched by .BR request_key (2). - +.PP The persistent keyring is independent of .BR clone (2), .BR fork (2), @@ -74,7 +74,7 @@ The persistent keyring can thus be used to hold authentication tokens for processes that run without user interaction, such as programs started by .BR cron (8). - +.PP The persistent keyring is used to store UID-specific objects that themselves have limited lifetimes (e.g., kerberos tokens). 
If those tokens cease to be used diff --git a/man7/pid_namespaces.7 b/man7/pid_namespaces.7 index 3458a8021..794246fa6 100644 --- a/man7/pid_namespaces.7 +++ b/man7/pid_namespaces.7 @@ -30,14 +30,14 @@ pid_namespaces \- overview of Linux PID namespaces .SH DESCRIPTION For an overview of namespaces, see .BR namespaces (7). - +.PP PID namespaces isolate the process ID number space, meaning that processes in different PID namespaces can have the same PID. PID namespaces allow containers to provide functionality such as suspending/resuming the set of processes in the container and migrating the container to a new host while the processes inside the container maintain the same PIDs. - +.PP PIDs in a new PID namespace start at 1, somewhat like a standalone system, and calls to .BR fork (2), @@ -45,7 +45,7 @@ somewhat like a standalone system, and calls to or .BR clone (2) will produce processes with PIDs that are unique within the namespace. - +.PP Use of PID namespaces requires a kernel that is configured with the .B CONFIG_PID_NS option. @@ -72,7 +72,7 @@ in the same PID namespace employed the .BR prctl (2) .B PR_SET_CHILD_SUBREAPER command to mark itself as the reaper of orphaned descendant processes). - +.PP If the "init" process of a PID namespace terminates, the kernel terminates all of the processes in the namespace via a .BR SIGKILL @@ -99,13 +99,13 @@ terminates, then subsequent calls to .BR fork (2) will fail with .BR ENOMEM . - +.PP Only signals for which the "init" process has established a signal handler can be sent to the "init" process by other members of the PID namespace. This restriction applies even to privileged processes, and prevents other members of the PID namespace from accidentally killing the "init" process. - +.PP Likewise, a process in an ancestor namespace can\(emsubject to the usual permission checks described in .BR kill (2)\(emsend @@ -125,7 +125,7 @@ these signals are forcibly delivered when sent from an ancestor PID namespace. 
Neither of these signals can be caught by the "init" process, and so will result in the usual actions associated with those signals (respectively, terminating and stopping the process). - +.PP Starting with Linux 3.4, the .BR reboot (2) system call causes a signal to be sent to the namespace "init" process. @@ -150,7 +150,7 @@ Since Linux 3.7, .\" commit f2302505775fd13ba93f034206f1e2a587017929 .\" The kernel constant MAX_PID_NS_LEVEL the kernel limits the maximum nesting depth for PID namespaces to 32. - +.PP A process is visible to other processes in its PID namespace, and to the processes in each direct ancestor PID namespace going back to the root PID namespace. @@ -165,7 +165,7 @@ set nice values with .BR setpriority (2), etc.) only processes contained in its own PID namespace and in descendants of that namespace. - +.PP A process has one process ID in each of the layers of the PID namespace hierarchy in which is visible, and walking back though each direct ancestor namespace @@ -177,7 +177,7 @@ A call to .BR getpid (2) always returns the PID associated with the namespace in which the process was created. - +.PP Some processes in a PID namespace may have parents that are outside of the namespace. For example, the parent of the initial process in the namespace @@ -192,7 +192,7 @@ PID namespace from the caller of Calls to .BR getppid (2) for such processes return 0. - +.PP While processes may freely descend into child PID namespaces (e.g., using .BR setns (2) @@ -201,7 +201,7 @@ they may not move in the other direction. That is to say, processes may not enter any ancestor namespaces (parent, grandparent, etc.). Changing PID namespaces is a one-way operation. - +.PP The .BR NS_GET_PARENT .BR ioctl (2) @@ -231,7 +231,7 @@ because doing so would change the caller's idea of its own PID (as reported by .BR getpid ()), which would break many applications and libraries. 
- +.PP To put things another way: a process's PID namespace membership is determined when the process is created and cannot be changed thereafter. @@ -260,7 +260,7 @@ type in Since this is computed when a signal is enqueued, a signal queue shared by processes in multiple PID namespaces would defeat that. - +.PP .\" Note these restrictions were all introduced in .\" 8382fcac1b813ad0a4e68a838fc7ae93fa39eda0 .\" when CLONE_NEWPID|CLONE_VM was disallowed @@ -289,7 +289,7 @@ directories) only processes visible in the PID namespace of the process that performed the mount, even if the .I /proc filesystem is viewed from processes in other namespaces. - +.PP After creating a new PID namespace, it is useful for the child to change its root directory and mount a new procfs instance at @@ -308,7 +308,7 @@ or then it isn't necessary to change the root directory: a new procfs instance can be mounted directly over .IR /proc . - +.PP From a shell, the command to mount .I /proc is: diff --git a/man7/pipe.7 b/man7/pipe.7 index 73e343154..94f91e8a1 100644 --- a/man7/pipe.7 +++ b/man7/pipe.7 @@ -34,7 +34,7 @@ and a .IR "write end" . Data written to the write end of a pipe can be read from the read end of the pipe. - +.PP A pipe is created using .BR pipe (2), which creates a new pipe and returns two file descriptors, @@ -44,7 +44,7 @@ Pipes can be used to create a communication channel between related processes; see .BR pipe (2) for an example. - +.PP A FIFO (short for First In First Out) has a name within the filesystem (created using .BR mkfifo (3)), @@ -68,7 +68,7 @@ The only difference between pipes and FIFOs is the manner in which they are created and opened. Once these tasks have been accomplished, I/O on pipes and FIFOs has exactly the same semantics. - +.PP If a process attempts to read from an empty pipe, then .BR read (2) will block until data is available. 
@@ -82,11 +82,11 @@ Nonblocking I/O is possible by using the operation to enable the .B O_NONBLOCK open file status flag. - +.PP The communication channel provided by a pipe is a .IR "byte stream" : there is no concept of message boundaries. - +.PP If all file descriptors referring to the write end of a pipe have been closed, then an attempt to .BR read (2) @@ -113,7 +113,7 @@ calls to close unnecessary duplicate file descriptors; this ensures that end-of-file and .BR SIGPIPE / EPIPE are delivered when appropriate. - +.PP It is not possible to apply .BR lseek (2) to a pipe. @@ -129,7 +129,7 @@ Applications should not rely on a particular capacity: an application should be designed so that a reading process consumes data as soon as it is available, so that a writing process does not remain blocked. - +.PP In Linux versions before 2.6.11, the capacity of a pipe was the same as the system page size (e.g., 4096 bytes on i386). Since Linux 2.6.11, the pipe capacity is 16 pages @@ -144,7 +144,7 @@ operations. See .BR fcntl (2) for more information. - +.PP The following .BR ioctl (2) operation, which can be applied to a file descriptor @@ -152,9 +152,9 @@ that refers to either end of a pipe, places a count of the number of unread bytes in the pipe in the .I int buffer pointed to by the final argument of the call: - +.PP ioctl(fd, FIONREAD, &nbytes); - +.PP The .B FIONREAD operation is not specified in any standard, @@ -170,10 +170,10 @@ An upper limit, in pages, on the capacity that an unprivileged user .BR CAP_SYS_RESOURCE capability) can set for a pipe. - +.IP The default value for this limit is 16 times the default pipe capacity (see above); the lower limit is two pages. - +.IP This interface was removed in Linux 2.6.35, in favor of .IR /proc/sys/fs/pipe-max-size . .TP @@ -189,14 +189,14 @@ The value assigned to this file may be rounded upward, to reflect the value actually employed for a convenient implementation. 
To determine the rounded-up value, display the contents of this file after assigning a value to it. - +.IP The default value for this file is 1048576 (1 MiB). The minimum value that can be assigned to this file is the system page size. Attempts to set a limit less than the page size cause .BR write (2) to fail with the error .BR EINVAL . - +.IP Since Linux 4.9, .\" commit 086e774a57fba4695f14383c0818994c0b31da7c the value on this file also acts as a ceiling on the default capacity @@ -214,7 +214,7 @@ So long as the total number of pages allocated to pipe buffers for this user is at this limit, attempts to create new pipes will be denied, and attempts to increase a pipe's capacity will be denied. - +.IP When the value of this limit is zero (which is the default), no hard limit is applied. .\" The default was chosen to avoid breaking existing applications that @@ -232,7 +232,7 @@ So long as the total number of pages allocated to pipe buffers for this user is at this limit, individual pipes created by a user will be limited to one page, and attempts to increase a pipe's capacity will be denied. - +.IP When the value of this limit is zero, no soft limit is applied. The default value for this file is 16384, which permits creating up to 1024 pipes with the default capacity. @@ -321,7 +321,7 @@ a pipe or FIFO are .B O_NONBLOCK and .BR O_ASYNC . - +.PP Setting the .B O_ASYNC flag for the read end of a pipe causes a signal @@ -359,7 +359,7 @@ and excluded the memory required for the increased pipe capacity. The new increase in pipe capacity could then push the total memory used by the user for pipes (possibly far) over a limit. (This could also trigger the problem described next.) - +.IP Starting with Linux 4.9, the limit checking includes the memory required for the new pipe capacity. .IP (2) @@ -368,13 +368,13 @@ less than the existing pipe capacity. 
This could lead to problems if a user set a large pipe capacity, and then the limits were lowered, with the result that the user could no longer decrease the pipe capacity. - +.IP Starting with Linux 4.9, checks against the limits are performed only when increasing a pipe's capacity; an unprivileged user can always decrease a pipe's capacity. .IP (3) The accounting and checking against the limits were done as follows: - +.IP .RS .PD 0 .IP (a) 4 @@ -391,7 +391,7 @@ Multiple processes could pass point (a) simultaneously, and then allocate pipe buffers that were accounted for only in step (c), with the result that the user's pipe buffer allocation could be pushed over the limit. - +.IP Starting with Linux 4.9, the accounting step is performed before doing the allocation, and the operation fails if the limit would be exceeded. diff --git a/man7/pkeys.7 b/man7/pkeys.7 index f33925ae2..c0f412d2b 100644 --- a/man7/pkeys.7 +++ b/man7/pkeys.7 @@ -34,13 +34,13 @@ when changing permissions. Memory Protection Keys provide a mechanism for changing protections without requiring modification of the page tables on every permission change. - +.PP To use pkeys, software must first "tag" a page in the page tables with a pkey. After this tag is in place, an application only has to change the contents of a register in order to remove write access, or all access to a tagged page. - +.PP Protection keys work in conjunction with the existing .BR PROT_READ / .BR PROT_WRITE / @@ -51,7 +51,7 @@ and .BR mmap (2), but always act to further restrict these traditional permission mechanisms. - +.PP If a process performs an access that violates pkey restrictions, it receives a .BR SIGSEGV @@ -59,7 +59,7 @@ signal. See .BR sigaction (2) for details of the information available with that signal. - +.PP To use the pkeys feature, the processor must support it, and the kernel must contain support for the feature on a given processor. 
As of early 2016 only future Intel x86 processors are supported, @@ -69,7 +69,7 @@ are available for actual application use. The default key is assigned to any memory region for which a pkey has not been explicitly assigned via .BR pkey_mprotect (2). - +.PP Protection keys have the potential to add a layer of security and reliability to applications. But they have not been primarily designed as @@ -77,7 +77,7 @@ a security feature. For instance, WRPKRU is a completely unprivileged instruction, so pkeys are useless in any case that an attacker controls the PKRU register or can execute arbitrary instructions. - +.PP Applications should be very careful to ensure that they do not "leak" protection keys. For instance, before calling @@ -96,7 +96,7 @@ Applications may implement these checks by searching the file for memory regions with the pkey assigned. Further details can be found in .BR proc (5). - +.PP Any application wanting to use protection keys needs to be able to function without them. They might be unavailable because the hardware that the @@ -110,7 +110,7 @@ keys should simply call and test whether the call succeeds, instead of attempting to detect support for the feature in any other way. - +.PP Although unnecessary, hardware support for protection keys may be enumerated with the .I cpuid @@ -123,7 +123,7 @@ under the "flags" field. The string "pku" in this field indicates hardware support for protection keys and the string "ospke" indicates that the kernel contains and has enabled protection keys support. - +.PP Applications using threads and protection keys should be especially careful. Threads inherit the protection key rights of the parent at the time @@ -145,7 +145,7 @@ key rights upon entering a signal handler if the desired rights differ from the defaults. The rights of any interrupted context are restored when the signal handler returns. 
- +.PP This signal behavior is unusual and is due to the fact that the x86 PKRU register (which stores protection key access rights) is managed with the same hardware mechanism (XSAVE) that manages floating-point registers. @@ -157,7 +157,7 @@ The Linux kernel implements the following pkey-related system calls: .BR pkey_alloc (2), and .BR pkey_free (2). - +.PP The Linux pkey system calls are available only if the kernel was configured and built with the .BR CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS @@ -171,7 +171,7 @@ After that, it attempts to allocate a protection key and disallows access to the page by using the WRPKRU instruction. It then tries to access the page, which we now expect to cause a fatal signal to the application. - +.PP .in +4n .nf .RB "$" " ./a.out" diff --git a/man7/process-keyring.7 b/man7/process-keyring.7 index c19d0f884..1b5000067 100644 --- a/man7/process-keyring.7 +++ b/man7/process-keyring.7 @@ -22,14 +22,14 @@ A special serial number value, .BR KEY_SPEC_PROCESS_KEYRING , is defined that can be used in lieu of the actual serial number of the calling process's process keyring. - +.PP From the .BR keyctl (1) utility, '\fB@p\fP' can be used instead of a numeric key ID in much the same way, but since .BR keyctl (1) is a program run after forking, this is of no utility. - +.PP A thread created using the .BR clone (2) .B CLONE_THREAD @@ -42,7 +42,7 @@ A process's process keyring is cleared on .BR execve (2). The process keyring is destroyed when the last thread that refers to it terminates. - +.PP If a process doesn't have a process keyring when it is accessed, then the process keyring will be created if the keyring is to be modified; otherwise, the error diff --git a/man7/pthreads.7 b/man7/pthreads.7 index 011a029f8..ec0fa79ef 100644 --- a/man7/pthreads.7 +++ b/man7/pthreads.7 @@ -33,7 +33,7 @@ A single process can contain multiple threads, all of which are executing the same program. 
These threads share the same global memory (data and heap segments), but each thread has its own stack (automatic variables). - +.PP POSIX.1 also requires that threads share a range of other attributes (i.e., these attributes are process-wide rather than per-thread): .IP \- 3 @@ -121,12 +121,12 @@ This identifier is returned to the caller of .BR pthread_create (3), and a thread can obtain its own thread identifier using .BR pthread_self (3). - +.PP Thread IDs are guaranteed to be unique only within a process. (In all pthreads functions that accept a thread ID as an argument, that ID by definition refers to a thread in the same process as the caller.) - +.PP The system may reuse a thread ID after a terminated thread has been joined, or a detached thread has terminated. POSIX says: "If an application attempts to use a thread ID whose @@ -135,7 +135,7 @@ lifetime has ended, the behavior is undefined." A thread-safe function is one that can be safely (i.e., it will deliver the same results regardless of whether it is) called from multiple threads at the same time. - +.PP POSIX.1-2001 and POSIX.1-2008 require that all functions specified in the standard shall be thread-safe, except for the following functions: @@ -239,7 +239,7 @@ wctomb() An async-cancel-safe function is one that can be safely called in an application where asynchronous cancelability is enabled (see .BR pthread_setcancelstate (3)). - +.PP Only the following functions are required to be async-cancel-safe by POSIX.1-2001 and POSIX.1-2008: .in +4n @@ -257,10 +257,10 @@ If a thread is cancelable, its cancelability type is deferred, and a cancellation request is pending for the thread, then the thread is canceled when it calls a function that is a cancellation point. 
- +.PP The following functions are required to be cancellation points by POSIX.1-2001 and/or POSIX.1-2008: - +.PP .\" FIXME .\" Document the list of all functions that are cancellation points in glibc .in +4n @@ -325,10 +325,10 @@ write() writev() .fi .in - +.PP The following functions may be cancellation points according to POSIX.1-2001 and/or POSIX.1-2008: - +.PP .in +4n .nf access() @@ -558,7 +558,7 @@ wprintf() wscanf() .fi .in - +.PP An implementation may also mark other functions not specified in the standard as cancellation points. In particular, an implementation is likely to mark @@ -792,13 +792,13 @@ With NPTL, all of the threads in a process are placed in the same thread group; all members of a thread group share the same PID. NPTL does not employ a manager thread. - +.PP NPTL makes internal use of the first two real-time signals; these signals cannot be used in applications. See .BR nptl (7) for further details. - +.PP NPTL still has at least one nonconformance with POSIX.1: .IP \- 3 Threads do not share a common nice value. @@ -909,7 +909,7 @@ bash$ $( LD_ASSUME_KERNEL=2.2.5 ldd /bin/ls | grep libc.so | \\ .BR nptl (7), .BR sigevent (7), .BR signal (7) - +.PP Various Pthreads manual pages, for example: .BR pthread_attr_init (3), .BR pthread_atfork (3), diff --git a/man7/pty.7 b/man7/pty.7 index 25c29bff6..f63dec717 100644 --- a/man7/pty.7 +++ b/man7/pty.7 @@ -58,19 +58,19 @@ terminal emulators such as .BR unbuffer (1), and .BR expect (1). - +.PP Data flow between master and slave is handled asynchronously, much like data flow with a physical terminal. Data written to the slave will be available at the master promptly, but may not be available immediately. Similarly, there may be a small processing delay between a write to the master, and the effect being visible at the slave. - +.PP Historically, two pseudoterminal APIs have evolved: BSD and System V. 
SUSv1 standardized a pseudoterminal API based on the System V API, and this API should be employed in all new programs that use pseudoterminals. - +.PP Linux provides both BSD-style and (standardized) System V-style pseudoterminals. System V-style terminals are commonly called UNIX 98 pseudoterminals @@ -95,7 +95,7 @@ the name returned by .BR ptsname (3) in a call to .BR open (2). - +.PP The Linux kernel imposes a limit on the number of available UNIX 98 pseudoterminals. In kernels up to and including 2.6.3, this limit is configured @@ -149,7 +149,7 @@ A description of the .BR ioctl (2), which controls packet mode operation, can be found in .BR ioctl_tty (2). - +.PP The BSD .BR ioctl (2) operations diff --git a/man7/random.7 b/man7/random.7 index 853648fde..a5c0ef972 100644 --- a/man7/random.7 +++ b/man7/random.7 @@ -36,7 +36,7 @@ The kernel random-number generator relies on entropy gathered from device drivers and other sources of environmental noise to seed a cryptographically secure pseudorandom number generator (CSPRNG). It is designed for security, rather than speed. - +.PP The following interfaces provide access to output from the kernel CSPRNG: .IP * 3 The @@ -96,7 +96,7 @@ flag. The cryptographic algorithms used for the .IR urandom source are quite conservative, and so should be sufficient for all purposes. - +.PP The disadvantage of .B GRND_RANDOM and reads from @@ -213,7 +213,7 @@ or Diffie-Hellman private key has an effective key size of 128 bits (it requires about 2^128 operations to break) so a key generator needs only 128 bits (16 bytes) of seed material from .IR /dev/random . 
- +.PP While some safety margin above that minimum is reasonable, as a guard against flaws in the CSPRNG algorithm, no cryptographic primitive available today can hope to promise more than 256 bits of security, diff --git a/man7/sched.7 b/man7/sched.7 index b918cb688..c246001aa 100644 --- a/man7/sched.7 +++ b/man7/sched.7 @@ -110,12 +110,12 @@ scheduling priority, .IR sched_priority . The scheduler makes its decisions based on knowledge of the scheduling policy and static priority of all threads on the system. - +.PP For threads scheduled under one of the normal scheduling policies (\fBSCHED_OTHER\fP, \fBSCHED_IDLE\fP, \fBSCHED_BATCH\fP), \fIsched_priority\fP is not used in scheduling decisions (it must be specified as 0). - +.PP Processes scheduled under one of the real-time policies (\fBSCHED_FIFO\fP, \fBSCHED_RR\fP) have a \fIsched_priority\fP value in the range 1 (low) to 99 (high). @@ -129,17 +129,17 @@ Portable programs should use and .BR sched_get_priority_max (2) to find the range of priorities supported for a particular policy. - +.PP Conceptually, the scheduler maintains a list of runnable threads for each possible \fIsched_priority\fP value. In order to determine which thread runs next, the scheduler looks for the nonempty list with the highest static priority and selects the thread at the head of this list. - +.PP A thread's scheduling policy determines where it will be inserted into the list of threads with equal static priority and how it will move inside this list. - +.PP All scheduling is preemptive: if a thread with a higher static priority becomes ready to run, the currently running thread will be preempted and @@ -187,7 +187,7 @@ will be put at the end of the list. No other events will move a thread scheduled under the \fBSCHED_FIFO\fP policy in the wait list of runnable threads with equal static priority. 
- +.PP A \fBSCHED_FIFO\fP thread runs until either it is blocked by an I/O request, it is preempted by a higher priority thread, or it calls @@ -223,7 +223,7 @@ one must use the Linux-specific and .BR sched_getattr (2) system calls. - +.PP A sporadic task is one that has a sequence of jobs, where each job is activated at most once per period. Each job also has a @@ -241,9 +241,9 @@ is the time at which a task starts its execution. The .I "absolute deadline" is thus obtained by adding the relative deadline to the arrival time. - +.PP The following diagram clarifies these terms: - +.PP .in +4n .nf arrival/wakeup absolute deadline @@ -256,7 +256,7 @@ arrival/wakeup absolute deadline |<-------------- period ------------------->| .fi .in - +.PP When setting a .B SCHED_DEADLINE policy for a thread using @@ -273,7 +273,7 @@ Deadline to the relative deadline, and Period to the period of the task. Thus, for .BR SCHED_DEADLINE scheduling, we have: - +.PP .in +4n .nf arrival/wakeup absolute deadline @@ -286,7 +286,7 @@ arrival/wakeup absolute deadline |<-------------- Period ------------------->| .fi .in - +.PP The three deadline-scheduling parameters correspond to the .IR sched_runtime , .IR sched_deadline , @@ -304,11 +304,11 @@ If .IR sched_period is specified as 0, then it is made the same as .IR sched_deadline . - +.PP The kernel requires that: - +.PP sched_runtime <= sched_deadline <= sched_period - +.PP .\" See __checkparam_dl in kernel/sched/core.c In addition, under the current implementation, all of the parameter values must be at least 1024 @@ -318,10 +318,10 @@ If any of these checks fails, .BR sched_setattr (2) fails with the error .BR EINVAL . - +.PP The CBS guarantees non-interference between tasks, by throttling threads that attempt to over-run their specified Runtime. 
- +.PP To ensure deadline scheduling guarantees, the kernel must prevent situations where the set of .B SCHED_DEADLINE @@ -334,13 +334,13 @@ if it is not, .BR sched_setattr (2) fails with the error .BR EBUSY . - +.PP For example, it is required (but not necessarily sufficient) for the total utilization to be less than or equal to the total number of CPUs available, where, since each thread can maximally run for Runtime per Period, that thread's utilization is its Runtime divided by its Period. - +.PP In order to fulfill the guarantees that are made when a thread is admitted to the .BR SCHED_DEADLINE @@ -351,7 +351,7 @@ system; if any .BR SCHED_DEADLINE thread is runnable, it will preempt any thread scheduled under one of the other policies. - +.PP A call to .BR fork (2) by a thread scheduled under the @@ -359,7 +359,7 @@ by a thread scheduled under the policy will fail with the error .BR EAGAIN , unless the thread has its reset-on-fork flag set (see below). - +.PP A .B SCHED_DEADLINE thread that calls @@ -378,7 +378,7 @@ processes). \fBSCHED_OTHER\fP is the standard Linux time-sharing scheduler that is intended for all threads that do not require the special real-time mechanisms. - +.PP The thread to run is chosen from the static priority 0 list based on a \fIdynamic\fP priority that is determined only inside this list. @@ -401,12 +401,12 @@ The nice value can be modified using .BR setpriority (2), or .BR sched_setattr (2). - +.PP According to POSIX.1, the nice value is a per-process attribute; that is, the threads in a process should share a nice value. However, on Linux, the nice value is a per-thread attribute: different threads in the same process may have different nice values. - +.PP The range of the nice value varies across UNIX systems. On modern Linux, the range is \-20 (high priority) to +19 (low priority). @@ -414,12 +414,12 @@ On some other systems, the range is \-20..20. Very early Linux kernels (Before Linux 2.0) had the range \-infinity..15. 
.\" Linux before 1.3.36 had \-infinity..15. .\" Since kernel 1.3.43, Linux has the range \-20..19. - +.PP The degree to which the nice value affects the relative scheduling of .BR SCHED_OTHER processes likewise varies across UNIX systems and across Linux kernel versions. - +.PP With the advent of the CFS scheduler in kernel 2.6.23, Linux adopted an algorithm that causes relative differences in nice values to have a much stronger effect. @@ -431,14 +431,14 @@ to a process whenever there is any other higher priority load on the system, and makes high nice values (\-20) deliver most of the CPU to applications that require it (e.g., some audio applications). - +.PP On Linux, the .BR RLIMIT_NICE resource limit can be used to define a limit to which an unprivileged process's nice value can be raised; see .BR setrlimit (2) for details. - +.PP For further details on the nice value, see the subsections on the autogroup feature and group scheduling, below. .\" @@ -454,7 +454,7 @@ that the thread is CPU-intensive. Consequently, the scheduler will apply a small scheduling penalty with respect to wakeup behavior, so that this thread is mildly disfavored in scheduling decisions. - +.PP .\" The following paragraph is drawn largely from the text that .\" accompanied Ingo Molnar's patch for the implementation of .\" SCHED_BATCH. @@ -468,7 +468,7 @@ interactivity causing extra preemptions (between the workload's tasks). (Since Linux 2.6.23.) \fBSCHED_IDLE\fP can be used only at static priority 0; the process nice value has no influence for this policy. - +.PP This policy is intended for running jobs at extremely low priority (lower even than a +19 nice value with the .B SCHED_OTHER @@ -504,14 +504,14 @@ The state of the reset-on-fork flag can analogously be retrieved using .BR sched_getscheduler (2) and .BR sched_getattr (2). 
- +.PP The reset-on-fork feature is intended for media-playback applications, and can be used to prevent applications evading the .BR RLIMIT_RTTIME resource limit (see .BR getrlimit (2)) by creating multiple child processes. - +.PP More precisely, if the reset-on-fork flag is set, the following rules apply for subsequently created children: .IP * 3 @@ -545,13 +545,13 @@ matches the real or effective user ID of the target thread (i.e., the thread specified by .IR pid ) whose policy is being changed. - +.PP A thread must be privileged .RB ( CAP_SYS_NICE ) in order to set or modify a .BR SCHED_DEADLINE policy. - +.PP Since Linux 2.6.12, the .B RLIMIT_RTPRIO resource limit defines a ceiling on an unprivileged thread's @@ -622,7 +622,7 @@ process from freezing the system was to run (at the console) a shell scheduled under a higher static priority than the tested application. This allows an emergency kill of tested real-time applications that do not block or terminate as expected. - +.PP Since Linux 2.6.25, there are other techniques for dealing with runaway real-time and deadline processes. One of these is to use the @@ -632,7 +632,7 @@ a real-time process may consume. See .BR getrlimit (2) for details. - +.PP Since version 2.6.25, Linux also provides two .I /proc files that can be used to reserve a certain amount of CPU time @@ -675,7 +675,7 @@ Child processes inherit the scheduling policy and parameters across a .BR fork (2). The scheduling policy and parameters are preserved across .BR execve (2). - +.PP Memory locking is usually needed for real-time processes to avoid paging delays; this can be done with .BR mlock (2) @@ -692,7 +692,7 @@ parallel build processes (i.e., the .BR make (1) .BR \-j flag). - +.PP This feature operates in conjunction with the CFS scheduler and requires a kernel that is configured with .BR CONFIG_SCHED_AUTOGROUP . @@ -702,7 +702,7 @@ a value of 0 disables the feature, while a value of 1 enables it. 
The default value in this file is 1, unless the kernel was booted with the .IR noautogroup parameter. - +.PP A new autogroup is created when a new session is created via .BR setsid (2); this happens, for example, when a new terminal window is started. @@ -712,14 +712,14 @@ inherits its parent's autogroup membership. Thus, all of the processes in a session are members of the same autogroup. An autogroup is automatically destroyed when the last process in the group terminates. - +.PP When autogrouping is enabled, all of the members of an autogroup are placed in the same kernel scheduler "task group". The CFS scheduler employs an algorithm that equalizes the distribution of CPU cycles across task groups. The benefits of this for interactive desktop performance can be described via the following example. - +.PP Suppose that there are two autogroups competing for the same CPU (i.e., presume either a single CPU system or the use of .BR taskset (1) @@ -750,17 +750,17 @@ the scheduler distributes CPU cycles across task groups such that an autogroup that contains a large number of CPU-bound processes does not end up hogging CPU cycles at the expense of the other jobs on the system. - +.PP A process's autogroup (task group) membership can be viewed via the file .IR /proc/[pid]/autogroup : - +.PP .nf .in +4n $ \fBcat /proc/1/autogroup\fP /autogroup-1 nice 0 .in .fi - +.PP This file can also be used to modify the CPU bandwidth allocated to an autogroup. This is done by writing a number in the "nice" range to the file @@ -782,7 +782,7 @@ to fail with the error .\" A patch was posted on 23 Nov 2016 .\" ("sched/autogroup: Fix 64bit kernel nice adjustment"; .\" check later to see in which kernel version it lands. - +.PP The autogroup nice setting has the same meaning as the process nice value, but applies to distribution of CPU cycles to the autogroup as a whole, based on the relative nice values of other autogroups. 
@@ -791,12 +791,12 @@ will be a product of the autogroup's nice value (compared to other autogroups) and the process's nice value (compared to other processes in the same autogroup. - +.PP The use of the .BR cgroups (7) CPU controller to place processes in cgroups other than the root CPU cgroup overrides the effect of autogrouping. - +.PP The autogroup feature groups only processes scheduled under non-real-time policies .RB ( SCHED_OTHER , @@ -817,7 +817,7 @@ policies), the CFS scheduler employs a technique known as "group scheduling", if the kernel was configured with the .BR CONFIG_FAIR_GROUP_SCHED option (which is typical). - +.PP Under group scheduling, threads are scheduled in "task groups". Task groups have a hierarchical relationship, rooted under the initial task group on the system, @@ -861,7 +861,7 @@ or on a process has an effect only for scheduling relative to other processes executed in the same session (typically: the same terminal window). - +.PP Conversely, for two processes that are (for example) the sole CPU-bound processes in different sessions (e.g., different terminal windows, @@ -877,7 +877,7 @@ A possibly useful workaround here is to use a command such as the following to modify the autogroup nice value for .I all of the processes in a terminal session: - +.PP .nf .in +4n $ \fBecho 10 > /proc/self/autogroup\fP @@ -905,7 +905,7 @@ patch-\fIkernelversion\fP-rt\fIpatchversion\fP and can be downloaded from .UR http://www.kernel.org\:/pub\:/linux\:/kernel\:/projects\:/rt/ .UE . - +.PP Without the patches and prior to their full inclusion into the mainline kernel, the kernel configuration offers only the three preemption classes .BR CONFIG_PREEMPT_NONE , @@ -914,7 +914,7 @@ and .B CONFIG_PREEMPT_DESKTOP which respectively provide no, some, and considerable reduction of the worst-case scheduling latency. 
- +.PP With the patches applied or after their full inclusion into the mainline kernel, the additional configuration item .B CONFIG_PREEMPT_RT diff --git a/man7/sem_overview.7 b/man7/sem_overview.7 index 3fb98112f..cdf7958c5 100644 --- a/man7/sem_overview.7 +++ b/man7/sem_overview.7 @@ -28,7 +28,7 @@ sem_overview \- overview of POSIX semaphores .SH DESCRIPTION POSIX semaphores allow processes and threads to synchronize their actions. - +.PP A semaphore is an integer whose value is never allowed to fall below zero. Two operations can be performed on semaphores: increment the semaphore value by one @@ -38,7 +38,7 @@ and decrement the semaphore value by one If the value of a semaphore is currently zero, then a .BR sem_wait (3) operation will block until the value becomes greater than zero. - +.PP POSIX semaphores come in two forms: named semaphores and unnamed semaphores. .TP @@ -61,7 +61,7 @@ followed by one or more characters, none of which are slashes. Two processes can operate on the same named semaphore by passing the same name to .BR sem_open (3). - +.IP The .BR sem_open (3) function creates a new named semaphore or opens an existing @@ -91,7 +91,7 @@ A process-shared semaphore must be placed in a shared memory region .BR shmget (2), or a POSIX shared memory object built created using .BR shm_open (3)). - +.IP Before being used, an unnamed semaphore must be initialized using .BR sem_init (3). It can then be operated on using @@ -132,7 +132,7 @@ with names of the form rather than .B NAME_MAX characters.) - +.PP Since Linux 2.6.19, ACLs can be placed on files under this directory, to control object permissions on a per-user and per-group basis. .SH NOTES diff --git a/man7/session-keyring.7 b/man7/session-keyring.7 index 11de5fcda..070235f5d 100644 --- a/man7/session-keyring.7 +++ b/man7/session-keyring.7 @@ -22,17 +22,17 @@ Optionally, PAM may revoke the session keyring on logout. (In typical configurations, PAM does do this revocation.) 
The session keyring has the name (description) .IR _ses . - +.PP A special serial number value, .BR KEY_SPEC_SESSION_KEYRING , is defined that can be used in lieu of the actual serial number of the calling process's session keyring. - +.PP From the .BR keyctl (1) utility, '\fB@s\fP' can be used instead of a numeric key ID in much the same way. - +.PP A process's session keyring is inherited across .BR clone (2), .BR fork (2), @@ -44,7 +44,7 @@ is preserved across even when the executable is set-user-ID or set-group-ID or has capabilities. The session keyring is destroyed when the last process that refers to it exits. - +.PP If a process doesn't have a session keyring when it is accessed, then, under certain circumstances, the .BR user-session-keyring (7) @@ -84,7 +84,7 @@ operation.) These operations are also exposed through the .BR keyctl (1) utility as: - +.PP .nf .in +4n keyctl session @@ -92,9 +92,9 @@ keyctl session - [ ...] keyctl session [ ...] .in .fi - +.PP and: - +.PP .nf .in +4n keyctl new_session diff --git a/man7/shm_overview.7 b/man7/shm_overview.7 index 3ad4ffdea..3afdeb41a 100644 --- a/man7/shm_overview.7 +++ b/man7/shm_overview.7 @@ -30,7 +30,7 @@ shm_overview \- overview of POSIX shared memory .SH DESCRIPTION The POSIX shared memory API allows processes to communicate information by sharing a region of memory. - +.PP The interfaces employed in the API are: .TP 15 .BR shm_open (3) @@ -101,7 +101,7 @@ to control the permissions of objects in the virtual filesystem. .SH NOTES Typically, processes must synchronize their access to a shared memory object, using, for example, POSIX semaphores. - +.PP System V shared memory .RB ( shmget (2), .BR shmop (2), diff --git a/man7/signal-safety.7 b/man7/signal-safety.7 index 0e76ebff9..3879a5aef 100644 --- a/man7/signal-safety.7 +++ b/man7/signal-safety.7 @@ -34,13 +34,13 @@ Many functions are async-signal-safe. In particular, nonreentrant functions are generally unsafe to call from a signal handler. 
- +.PP The kinds of issues that render a function unsafe can be quickly understood when one considers the implementation of the .I stdio library, all of whose functions are not async-signal-safe. - +.PP When performing buffered I/O on a file, the .I stdio functions must maintain a statically allocated data buffer @@ -57,7 +57,7 @@ the program is interrupted by a signal handler that also calls then the second call to .BR printf (3) will operate on inconsistent data, with unpredictable results. - +.PP To avoid problems with unsafe functions, there are two possible choices: .IP 1. 3 Ensure that @@ -72,7 +72,7 @@ by the signal handler. .PP Generally, the second choice is difficult in programs of any complexity, so the first choice is taken. - +.PP POSIX.1 specifies a set of functions that an implementation must make async-signal-safe. (An implementation may provide safe implementations of additional functions, @@ -81,13 +81,13 @@ may not provide the same guarantees.) In general, a function is async-signal-safe either because it is reentrant or because it is atomic with respect to signals (i.e., its execution can't be interrupted by a signal handler). - +.PP The set of functions required to be async-signal-safe by POSIX.1 is shown in the following table. The functions not otherwise noted were required to be async-signal-safe in POSIX.1-2001; the table details changes in the subsequent standards. - +.PP .TS lb lb l l. @@ -284,7 +284,7 @@ Function Notes \fBwmemset\fP(3) Added in POSIX.1-2016 \fBwrite\fP(2) .TE - +.sp 1 Notes: .IP * 3 POSIX.1-2001 and POSIX.1-2004 required the functions diff --git a/man7/signal.7 b/man7/signal.7 index ea8cf51c6..8244dbbc7 100644 --- a/man7/signal.7 +++ b/man7/signal.7 @@ -54,7 +54,7 @@ Each signal has a current .IR disposition , which determines how the process behaves when it is delivered the signal. 
- +.PP The entries in the "Action" column of the tables below specify the default disposition for each signal, as follows: .IP Term @@ -90,11 +90,11 @@ It is possible to arrange that the signal handler uses an alternate stack; see .BR sigaltstack (2) for a discussion of how to do this and when it might be useful.) - +.PP The signal disposition is a per-process attribute: in a multithreaded application, the disposition of a particular signal is the same for all threads. - +.PP A child created via .BR fork (2) inherits a copy of its parent's signal dispositions. @@ -174,7 +174,7 @@ which means that it will not be delivered until it is later unblocked. Between the time when it is generated and when it is delivered a signal is said to be .IR pending . - +.PP Each thread in a process has an independent .IR "signal mask" , which indicates the set of signals that the thread is currently blocking. @@ -183,13 +183,13 @@ A thread can manipulate its signal mask using In a traditional single-threaded application, .BR sigprocmask (2) can be used to manipulate the signal mask. - +.PP A child created via .BR fork (2) inherits a copy of its parent's signal mask; the signal mask is preserved across .BR execve (2). - +.PP A signal may be generated (and thus pending) for a process as a whole (e.g., when sent using .BR kill (2)) @@ -206,14 +206,14 @@ A process-directed signal may be delivered to any one of the threads that does not currently have the signal blocked. If more than one of the threads has the signal unblocked, then the kernel chooses an arbitrary thread to which to deliver the signal. - +.PP A thread can obtain the set of signals that it currently has pending using .BR sigpending (2). This set will consist of the union of the set of pending process-directed signals and the set of signals pending for the calling thread. - +.PP A child created via .BR fork (2) initially has an empty pending signal set; @@ -231,7 +231,7 @@ and the last one for mips. 
.I not shown; see the Linux kernel source for signal numbering on that architecture.) A dash (\-) denotes that a signal is absent on the corresponding architecture. - +.PP First the signals described in the original POSIX.1-1990 standard. .TS l c c l @@ -260,13 +260,13 @@ SIGTSTP 18,20,24 Stop Stop typed at terminal SIGTTIN 21,21,26 Stop Terminal input for background process SIGTTOU 22,22,27 Stop Terminal output for background process .TE - +.sp 1 The signals .B SIGKILL and .B SIGSTOP cannot be caught, blocked, or ignored. - +.PP Next the signals not in the POSIX.1-1990 standard but described in SUSv2 and POSIX.1-2001. .TS @@ -288,7 +288,7 @@ SIGXCPU 24,24,30 Core CPU time limit exceeded (4.2BSD); SIGXFSZ 25,25,31 Core File size limit exceeded (4.2BSD); see \fBsetrlimit\fP(2) .TE - +.sp 1 Up to and including Linux 2.2, the default behavior for .BR SIGSYS ", " SIGXCPU ", " SIGXFSZ ", " and (on architectures other than SPARC and MIPS) @@ -299,7 +299,7 @@ was to terminate the process (without a core dump). is to terminate the process without a core dump.) Linux 2.4 conforms to the POSIX.1-2001 requirements for these signals, terminating the process with a core dump. - +.PP Next various other signals. .TS l c c l @@ -317,7 +317,7 @@ SIGLOST \-,\-,\- Term File lock lost (unused) SIGWINCH 28,28,20 Ign Window resize signal (4.3BSD, Sun) SIGUNUSED \-,31,\- Core Synonymous with \fBSIGSYS\fP .TE - +.sp 1 (Signal 29 is .B SIGINFO / @@ -325,21 +325,21 @@ SIGUNUSED \-,31,\- Core Synonymous with \fBSIGSYS\fP on an alpha but .B SIGLOST on a sparc.) - +.PP .B SIGEMT is not specified in POSIX.1-2001, but nevertheless appears on most other UNIX systems, where its default action is typically to terminate the process with a core dump. - +.PP .B SIGPWR (which is not specified in POSIX.1-2001) is typically ignored by default on those other UNIX systems where it appears. - +.PP .B SIGIO (which is not specified in POSIX.1-2001) is ignored by default on several other UNIX systems. 
- +.PP Where defined, .B SIGUNUSED is synonymous with @@ -452,7 +452,7 @@ resource limit, which specifies a per-user limit for queued signals; see .BR setrlimit (2) for further details. - +.PP The addition of real-time signals required the widening of the signal set structure .RI ( sigset_t ) @@ -488,7 +488,7 @@ flag (see .BR sigaction (2)). The details vary across UNIX systems; below, the details for Linux. - +.PP If a blocked call to one of the following interfaces is interrupted by a signal handler, then the call will be automatically restarted after the signal handler returns if the @@ -674,7 +674,7 @@ and then resumed via .BR SIGCONT . This behavior is not sanctioned by POSIX.1, and doesn't occur on other systems. - +.PP The Linux interfaces that display this behavior are: .IP * 2 "Input" socket interfaces, when a timeout diff --git a/man7/spufs.7 b/man7/spufs.7 index c8384a012..675e02762 100644 --- a/man7/spufs.7 +++ b/man7/spufs.7 @@ -31,7 +31,7 @@ spufs \- SPU filesystem The SPU filesystem is used on PowerPC machines that implement the Cell Broadband Engine Architecture in order to access Synergistic Processor Units (SPUs). - +.PP The filesystem provides a name space similar to POSIX shared memory or message queues. Users that have write permissions @@ -40,7 +40,7 @@ on the filesystem can use to establish SPU contexts under the .B spufs root directory. - +.PP Every SPU context is represented by a directory containing a predefined set of files. These files can be @@ -72,7 +72,7 @@ supported on regular filesystems. This list details the supported operations and the deviations from the standard behavior described in the respective man pages. - +.PP All files that support the .BR read (2) operation also support @@ -94,7 +94,7 @@ structure that contain reliable information are .IR st_uid , and .IR st_gid . 
- +.PP All files support the .BR chmod (2)/ fchmod (2) and @@ -103,7 +103,7 @@ operations, but will not be able to grant permissions that contradict the possible operations (e.g., read access on the .I wbox file). - +.PP The current set of files is: .TP .I /capabilities @@ -158,11 +158,11 @@ This file contains the 128-bit values of each register, from register 0 to register 127, in order. This allows the general-purpose registers to be inspected for debugging. - +.IP Reading to or writing from this file requires that the context is scheduled out, so use of this file is not recommended in normal program operation. - +.IP The .I regs file is not present on contexts that have been created with the @@ -214,7 +214,7 @@ Also, .BR poll (2) and similar system calls can be used to monitor for the presence of mailbox data. - +.IP The possible operations on an open .I ibox file are: @@ -236,7 +236,7 @@ the return value is set to \-1 and .I errno is set to .BR EAGAIN . - +.IP If there is no data available in the mailbox and the file descriptor has been opened without .BR O_NONBLOCK , @@ -283,7 +283,7 @@ value is set to \-1 and .I errno is set to .BR EAGAIN . - +.IP If there is no space available in the mailbox and the file descriptor has been opened without .BR O_NONBLOCK , @@ -385,7 +385,7 @@ If the register value is larger than the buffer passed to the .BR read (2) system call, subsequent reads will continue reading from the same buffer, until the end of the buffer is reached. - +.IP When a complete string has been read, all subsequent read operations will return zero bytes and a new file descriptor needs to be opened to read a new value. @@ -399,7 +399,7 @@ The string is parsed from the beginning until the first nonnumeric character or the end of the buffer. Subsequent writes to the same file descriptor overwrite the previous setting. 
- +.IP Except for the .I npc file, these files are not present on contexts that have been created with @@ -554,7 +554,7 @@ The and .I wbox_stat files contain the available message count. - +.IP The .I wbox_info file contains an array of four-byte mailbox messages, which have been @@ -563,12 +563,12 @@ With current CBEA machines, the array is four items in length, so up to 4 * 4 = 16 bytes can be read from this file. If any mailbox queue entry is empty, then the bytes read at the corresponding location are undefined. - +.IP The .I dma_info file contains the contents of the SPU MFC DMA queue, represented as the following structure: - +.IP .in +4n .nf struct spu_dma_info { @@ -581,13 +581,13 @@ struct spu_dma_info { }; .fi .in - +.IP The last member of this data structure is the actual DMA queue, containing 16 entries. The .I mfc_cq_sr structure is defined as: - +.IP .in +4n .nf struct mfc_cq_sr { @@ -598,13 +598,13 @@ struct mfc_cq_sr { }; .fi .in - +.IP The .I proxydma_info file contains similar information, but describes the proxy DMA queue (i.e., DMAs initiated by entities outside the SPU) instead. The file is in the following format: - +.IP .in +4n .nf struct spu_proxydma_info { @@ -615,11 +615,11 @@ struct spu_proxydma_info { }; .fi .in - +.IP Accessing these files requires that the SPU context is scheduled out - frequent use can be inefficient. These files should not be used for normal program operation. - +.IP These files are not present on contexts that have been created with the .B SPU_CREATE_NOSCHED flag. @@ -653,7 +653,7 @@ The following operations are supported: .BR write (2) Writes to this file need to be in the format of a MFC DMA command, defined as follows: - +.IP .in +4n .nf struct mfc_dma_command { @@ -667,7 +667,7 @@ struct mfc_dma_command { }; .fi .in - +.IP Writes are required to be exactly .I sizeof(struct mfc_dma_command) bytes in size. @@ -695,13 +695,13 @@ or until a previously started DMA (by checking for .BR POLLIN ) has been completed. 
- +.IP .I /mss Provides access to the MFC MultiSource Synchronization (MSS) facility. By .BR mmap (2)-ing this file, processes can access the MSS area of the SPU. - +.IP The following operations are supported: .TP .BR mmap (2) @@ -719,7 +719,7 @@ Provides access to the whole problem-state mapping of the SPU. Applications can use this area to interface to the SPU, rather than writing to individual register files in .BR spufs . - +.IP The following operations are supported: .RS .TP @@ -737,7 +737,7 @@ Read-only file containing the physical SPU number that the SPU context is running on. When the context is not running, this file contains the string "\-1". - +.IP The physical SPU number is given by an ASCII hex string. .TP .I /object-id @@ -768,5 +768,5 @@ none /spu spufs gid=spu 0 0 .BR spu_create (2), .BR spu_run (2), .BR capabilities (7) - +.PP .I The Cell Broadband Engine Architecture (CBEA) specification diff --git a/man7/standards.7 b/man7/standards.7 index 981263fe1..5ab913518 100644 --- a/man7/standards.7 +++ b/man7/standards.7 @@ -43,7 +43,7 @@ released by the University of California at Berkeley. This was the first Berkeley release that contained a TCP/IP stack and the sockets API. 4.2BSD was released in 1983. - +.IP Earlier major BSD releases included .IR 3BSD (1980), @@ -200,7 +200,7 @@ The standard is available online at .UE , and the interfaces that it describes are also available in the Linux manual pages package under sections 1p and 3p (e.g., "man 3p open"). - +.IP The standard defines two levels of conformance: .IR "POSIX conformance" , which is a baseline set of interfaces required of a conforming system; @@ -213,27 +213,27 @@ XSI-conformant systems can be branded (XSI conformance constitutes the .I "Single UNIX Specification version 3" .RI ( SUSv3 ).) - +.IP The POSIX.1-2001 document is broken into four parts: - +.IP .BR XBD : Definitions, terms and concepts, header file specifications. 
- +.IP .BR XSH : Specifications of functions (i.e., system calls and library functions in actual implementations). - +.IP .BR XCU : Specifications of commands and utilities (i.e., the area formerly described by POSIX.2). - +.IP .BR XRAT : Informative text on the other parts of the standard. - +.IP POSIX.1-2001 is aligned with C99, so that all of the library functions standardized in C99 are also standardized in POSIX.1-2001. - +.IP Two Technical Corrigenda (minor fixes and improvements) of the original 2001 standard have occurred: TC1 in 2003 (also known as @@ -244,7 +244,7 @@ and TC2 in 2004 (also known as .B POSIX.1-2008, SUSv4 Work on the next revision of POSIX.1/SUS was completed and ratified in 2008. - +.IP The changes in this revision are not as large as those that occurred for POSIX.1-2001/SUSv3, but a number of new interfaces are added @@ -253,7 +253,7 @@ Many of the interfaces that were optional in POSIX.1-2001 become mandatory in the 2008 revision of the standard. A few interfaces that are present in POSIX.1-2001 are marked as obsolete in POSIX.1-2008, or removed from the standard altogether. - +.IP The revised standard is broken into the same four parts as POSIX.1-2001, and again there are two levels of conformance: the baseline .IR "POSIX Conformance" , @@ -261,20 +261,20 @@ and .IR "XSI Conformance" , which mandates an additional set of interfaces beyond those in the base specification. - +.IP In general, where the CONFORMING TO section of a manual page lists POSIX.1-2001, it can be assumed that the interface also conforms to POSIX.1-2008, unless otherwise noted. - +.IP Technical Corrigendum 1 (minor fixes and improvements) of this standard was released in 2013 (also known as .IR POSIX.1-2013 ). - +.IP Technical Corrigendum 2 of this standard was released in 2016 (also known as .IR POSIX.1-2016 ). - +.IP Further information can be found on the Austin Group web site, .UR http://www.opengroup.org\:/austin/ .UE . 
diff --git a/man7/symlink.7 b/man7/symlink.7 index b7016cc1d..9f5bddd5d 100644 --- a/man7/symlink.7 +++ b/man7/symlink.7 @@ -41,7 +41,7 @@ symlink \- symbolic link handling Symbolic links are files that act as pointers to other files. To understand their behavior, you must first understand how hard links work. - +.PP A hard link to a file is indistinguishable from the original file because it is a reference to the object underlying the original filename. (To be precise: each of the hard links to a file is a reference to @@ -57,7 +57,7 @@ Hard links may not refer to directories which would confuse many programs) and may not refer to files on different filesystems (because inode numbers are not unique across filesystems). - +.PP A symbolic link is a special type of file whose contents are a string that is the pathname of another file, the file to which the link refers. (The contents of a symbolic link can be read using @@ -66,13 +66,13 @@ In other words, a symbolic link is a pointer to another name, and not to an underlying object. For this reason, symbolic links may refer to directories and may cross filesystem boundaries. - +.PP There is no requirement that the pathname referred to by a symbolic link should exist. A symbolic link that refers to a pathname that does not exist is said to be a .IR "dangling link" . - +.PP Because a symbolic link and its referenced object coexist in the filesystem name space, confusion can arise in distinguishing between the link itself and the referenced object. @@ -92,13 +92,13 @@ The only time that the ownership of a symbolic link matters is when the link is being removed or renamed in a directory that has the sticky bit set (see .BR stat (2)). - +.PP The last access and last modification timestamps of a symbolic link can be changed using .BR utimensat (2) or .BR lutimes (3). 
- +.PP On Linux, the permissions of a symbolic link are not used in any operations; the permissions are always 0777 (read, write, and execute for all user categories), @@ -140,7 +140,7 @@ and .BR readlinkat (2), in order to operate on the symbolic link itself (rather than the file to which it refers). - +.PP By default (i.e., if the .BR AT_SYMLINK_FOLLOW @@ -171,7 +171,7 @@ or a loop is detected. (Loop detection is done by placing an upper limit on the number of links that may be followed, and an error results if this limit is exceeded.) - +.PP There are three separate areas that need to be discussed. They are as follows: .IP 1. 3 @@ -186,7 +186,7 @@ file hierarchy walk). .SS System calls The first area is symbolic links used as filename arguments for system calls. - +.PP Except as noted below, all system calls follow symbolic links. For example, if there were a symbolic link .I slink @@ -196,7 +196,7 @@ the system call .I "open(""slink"" ...\&)" would return a file descriptor referring to the file .IR afile . - +.PP Various system calls do not follow links, and operate on the symbolic link itself. They are: @@ -211,7 +211,7 @@ They are: .BR rmdir (2), and .BR unlink (2). - +.PP Certain other system calls optionally follow symbolic links. They are: .BR faccessat (2), @@ -235,7 +235,7 @@ When .BR rmdir (2) is applied to a symbolic link, it fails with the error .BR ENOTDIR . - +.PP .BR link (2) warrants special discussion. POSIX.1-2001 specifies that @@ -252,7 +252,7 @@ either behavior in an implementation. .SS Commands not traversing a file tree The second area is symbolic links, specified as command-line filename arguments, to commands which are not traversing a file tree. - +.PP Except as noted below, commands follow symbolic links named as command-line arguments. For example, if there were a symbolic link @@ -263,7 +263,7 @@ the command .I "cat slink" would display the contents of the file .IR afile . 
- +.PP It is important to realize that this rule includes commands which may optionally traverse file trees; for example, the command .I "chown file" @@ -271,7 +271,7 @@ is included in this rule, while the command .IR "chown\ \-R file" , which performs a tree traversal, is not. (The latter is described in the third area, below.) - +.PP If it is explicitly intended that the command operate on the symbolic link instead of following the symbolic link\(emfor example, it is desired that .I "chown slink" @@ -289,7 +289,7 @@ while would change the ownership of .I slink itself. - +.PP There are some exceptions to this rule: .IP * 2 The @@ -362,16 +362,16 @@ The following commands either optionally or always traverse file trees: .BR rm (1), and .BR tar (1). - +.PP It is important to realize that the following rules apply equally to symbolic links encountered during the file tree traversal and symbolic links listed as command-line arguments. - +.PP The \fIfirst rule\fP applies to symbolic links that reference files other than directories. Operations that apply to symbolic links are performed on the links themselves, but otherwise the links are ignored. - +.PP The command .I "rm\ \-r slink directory" will remove @@ -383,12 +383,12 @@ In no case will .BR rm (1) affect the file referred to by .IR slink . - +.PP The \fIsecond rule\fP applies to symbolic links that refer to directories. Symbolic links that refer to directories are never followed by default. This is often referred to as a "physical" walk, as opposed to a "logical" walk (where symbolic links that refer to directories are followed). - +.PP Certain conventions are (should be) followed as consistently as possible by commands that perform file tree walks: .IP * 2 @@ -404,7 +404,7 @@ like the logical name space. flag will be ignored if the .I \-R flag is not also specified.) 
- +.IP For example, the command .I "chown\ \-HR user slink" will traverse the file hierarchy rooted in the file pointed to by @@ -434,7 +434,7 @@ the logical name space. flag will be ignored if the .I \-R flag is not also specified.) - +.IP For example, the command .I "chown\ \-LR user slink" will change the owner of the file referred to by @@ -474,7 +474,7 @@ options more than once; the last one specified determines the command's behavior. This is intended to permit you to alias commands to behave one way or the other, and then override that behavior on the command line. - +.PP The .BR ls (1) and diff --git a/man7/termio.7 b/man7/termio.7 index 373c971c6..bbc0204c8 100644 --- a/man7/termio.7 +++ b/man7/termio.7 @@ -35,7 +35,7 @@ This interface defined a structure used to store terminal settings, and a range of .BR ioctl (2) operations to get and set terminal attributes. - +.PP The .B termio interface is now obsolete: POSIX.1-1990 standardized a modified @@ -50,7 +50,7 @@ operations that existed in System V. .BR ioctl (2) was unstandardized, and its variadic third argument does not allow argument type checking.) - +.PP If you're looking for a page called "termio", then you can probably find most of the information that you seek in either .BR termios (3) diff --git a/man7/thread-keyring.7 b/man7/thread-keyring.7 index 93f0630a4..14ac95895 100644 --- a/man7/thread-keyring.7 +++ b/man7/thread-keyring.7 @@ -17,19 +17,19 @@ The thread keyring is a keyring used to anchor keys on behalf of a process. It is created only when a thread requests it. The thread keyring has the name (description) .IR _tid . - +.PP A special serial number value, .BR KEY_SPEC_THREAD_KEYRING , is defined that can be used in lieu of the actual serial number of the calling thread's thread keyring. - +.PP From the .BR keyctl (1) utility, '\fB@t\fP' can be used instead of a numeric key ID in much the same way, but as .BR keyctl (1) is a program run after forking, this is of no utility. 
- +.PP Thread keyrings are not inherited across .BR clone (2) and @@ -37,7 +37,7 @@ and and are cleared by .BR execve (2). A thread keyring is destroyed when the thread that refers to it terminates. - +.PP Initially, a thread does not have a thread keyring. If a thread doesn't have a thread keyring when it is accessed, then it will be created if it is to be modified; diff --git a/man7/time.7 b/man7/time.7 index 0cf3dcc12..a9282c45b 100644 --- a/man7/time.7 +++ b/man7/time.7 @@ -36,7 +36,7 @@ either from a standard point in the past (see the description of the Epoch and calendar time below), or from some point (e.g., the start) in the life of a process .RI ( "elapsed time" ). - +.PP .I "Process time" is defined as the amount of CPU time used by a process. This is sometimes divided into @@ -78,7 +78,7 @@ a clock maintained by the kernel which measures time in .IR jiffies . The size of a jiffy is determined by the value of the kernel constant .IR HZ . - +.PP The value of .I HZ varies across kernel versions and hardware platforms. @@ -93,7 +93,7 @@ yielding a jiffies value of, respectively, 0.01, 0.004, or 0.001 seconds. Since kernel 2.6.20, a further frequency is available: 300, a number that divides evenly for the common video frame rates (PAL, 25 HZ; NTSC, 30 HZ). - +.PP The .BR times (2) system call is a special case. @@ -107,7 +107,7 @@ User-space applications can determine the value of this constant using .SS High-resolution timers Before Linux 2.6.21, the accuracy of timer and sleep system calls (see below) was also limited by the size of the jiffy. - +.PP Since Linux 2.6.21, Linux supports high-resolution timers (HRTs), optionally configurable via .BR CONFIG_HIGH_RES_TIMERS . @@ -120,14 +120,14 @@ checking the resolution returned by a call to .BR clock_getres (2) or looking at the "resolution" entries in .IR /proc/timer_list . - +.PP HRTs are not supported on all hardware architectures. (Support is provided on x86, arm, and powerpc, among others.) 
.SS The Epoch UNIX systems represent time in seconds since the .IR Epoch , 1970-01-01 00:00:00 +0000 (UTC). - +.PP A program can determine the .I "calendar time" using @@ -164,7 +164,7 @@ Various system calls and functions allow a program to sleep .BR clock_nanosleep (2), and .BR sleep (3). - +.PP Various system calls allow a process to set a timer that expires at some point in the future, and optionally at repeated intervals; see diff --git a/man7/unicode.7 b/man7/unicode.7 index 371cd227f..934f8aeb5 100644 --- a/man7/unicode.7 +++ b/man7/unicode.7 @@ -37,7 +37,7 @@ It also guarantees "round-trip compatibility"; in other words, conversion tables can be built such that no information is lost when a string is converted from any other encoding to UCS and back. - +.PP UCS contains the characters required to represent practically all known languages. This includes not only the Latin, Greek, Cyrillic, @@ -59,7 +59,7 @@ graphical, typographical, mathematical, and scientific symbols, including those provided by TeX, Postscript, APL, MS-DOS, MS-Windows, Macintosh, OCR fonts, as well as many word processing and publishing systems, and more are being added. - +.PP The UCS standard (ISO 10646) describes a 31-bit character set architecture consisting of 128 24-bit @@ -166,7 +166,7 @@ code values (in all locales), a convention that is signaled by the GNU C library to applications by defining the constant .B __STDC_ISO_10646__ as specified in the ISO C99 standard. - +.PP UCS/Unicode can be used just like ASCII in input/output streams, terminal communication, plaintext files, filenames, and environment variables in the ASCII compatible UTF-8 multibyte encoding. @@ -216,7 +216,7 @@ Information technology \(em Universal Multiple-Octet Coded Character Set (UCS) \(em Part 1: Architecture and Basic Multilingual Plane. International Standard ISO/IEC 10646-1, International Organization for Standardization, Geneva, 2000. - +.IP This is the official specification of UCS . 
Available from .UR http://www.iso.ch/ @@ -228,7 +228,7 @@ Reading, MA, 2000, ISBN 0-201-61633-5. .IP * S. Harbison, G. Steele. C: A Reference Manual. Fourth edition, Prentice Hall, Englewood Cliffs, 1995, ISBN 0-13-326224-3. - +.IP A good reference book about the C programming language. The fourth edition covers the 1994 Amendment 1 to the ISO C90 standard, which diff --git a/man7/user-keyring.7 b/man7/user-keyring.7 index 298c319d6..29de53b76 100644 --- a/man7/user-keyring.7 +++ b/man7/user-keyring.7 @@ -21,7 +21,7 @@ The user keyring has a name (description) of the form where .I <UID> is the user ID of the corresponding user. - +.PP The user keyring is associated with the record that the kernel maintains for the UID. It comes into existence upon the first attempt to access either the @@ -33,28 +33,28 @@ The keyring remains pinned in existence so long as there are processes running with that real UID or files opened by those processes remain open. (The keyring can also be pinned indefinitely by linking it into another keyring.) - +.PP Typically, the user keyring is created by .BR pam_keyinit (8) when a user logs in. - +.PP The user keyring is not searched by default by .BR request_key (2). When .BR pam_keyinit (8) creates a session keyring, it adds to it a link to the user keyring so that the user keyring will be searched when the session keyring is. - +.PP A special serial number value, .BR KEY_SPEC_USER_KEYRING , is defined that can be used in lieu of the actual serial number of the calling process's user keyring. - +.PP From the .BR keyctl (1) utility, '\fB@u\fP' can be used instead of a numeric key ID in much the same way.
- +.PP User keyrings are independent of .BR clone (2), .BR fork (2), diff --git a/man7/user-session-keyring.7 b/man7/user-session-keyring.7 index 76c9eafa0..5824a0f92 100644 --- a/man7/user-session-keyring.7 +++ b/man7/user-session-keyring.7 @@ -21,7 +21,7 @@ The user session keyring has a name (description) of the form where .I <UID> is the user ID of the corresponding user. - +.PP The user session keyring is associated with the record that the kernel maintains for the UID. It comes into existence upon the first attempt to access either the @@ -34,7 +34,7 @@ The keyring remains pinned in existence so long as there are processes running with that real UID or files opened by those processes remain open. (The keyring can also be pinned indefinitely by linking it into another keyring.) - +.PP The user session keyring is created on demand when a thread requests it or when a thread asks for its .BR session-keyring (7) @@ -42,22 +42,22 @@ and that keyring doesn't exist. In the latter case, a user session keyring will be created and, if the session keyring wasn't to be created, the user session keyring will be set as the process's actual session keyring. - +.PP The user session keyring is searched by .BR request_key (2) if the actual session keyring does not exist and is ignored otherwise. - +.PP A special serial number value, .BR KEY_SPEC_USER_SESSION_KEYRING , is defined that can be used in lieu of the actual serial number of the calling process's user session keyring. - +.PP From the .BR keyctl (1) utility, '\fB@us\fP' can be used instead of a numeric key ID in much the same way. - +.PP User session keyrings are independent of .BR clone (2), .BR fork (2), @@ -67,10 +67,10 @@ and .BR _exit (2) excepting that the keyring is destroyed when the UID record is destroyed when the last process pinning it exits. - +.PP If a user session keyring does not exist when it is accessed, it will be created.
- +.PP Rather than relying on the user session keyring, it is strongly recommended\(emespecially if the process is running as root\(emthat a diff --git a/man7/user_namespaces.7 b/man7/user_namespaces.7 index 3314789dd..7af73639c 100644 --- a/man7/user_namespaces.7 +++ b/man7/user_namespaces.7 @@ -30,7 +30,7 @@ user_namespaces \- overview of Linux user namespaces .SH DESCRIPTION For an overview of namespaces, see .BR namespaces (7). - +.PP User namespaces isolate security-related identifiers and attributes, in particular, user IDs and group IDs (see @@ -66,7 +66,7 @@ or with the .BR CLONE_NEWUSER flag. - +.PP The kernel imposes (since version 3.11) a limit of 32 nested levels of .\" commit 8742f229b635bf1c1c84a3dfe5e47c814c20b5c8 user namespaces. @@ -77,7 +77,7 @@ or .BR clone (2) that would cause this limit to be exceeded fail with the error .BR EUSERS . - +.PP Each process is a member of exactly one user namespace. A process created via .BR fork (2) @@ -92,7 +92,7 @@ if it has the .BR CAP_SYS_ADMIN in that namespace; upon doing so, it gains a full set of capabilities in that namespace. - +.PP A call to .BR clone (2) or @@ -104,7 +104,7 @@ flag makes the new child process (for or the caller (for .BR unshare (2)) a member of the new user namespace created by the call. - +.PP The .BR NS_GET_PARENT .BR ioctl (2) @@ -136,7 +136,7 @@ and user namespace, even if the new namespace is created or joined by the root user (i.e., a process with user ID 0 in the root namespace). - +.PP Note that a call to .BR execve (2) will cause a process's capabilities to be recalculated in the usual way (see @@ -146,7 +146,7 @@ unless the process has a user ID of 0 within the namespace, or the executable file has a nonempty inheritable capabilities mask, the process will lose all capabilities. See the discussion of user and group ID mappings, below. 
- +.PP A call to .BR clone (2), .BR unshare (2), @@ -171,7 +171,7 @@ retaining its user namespace membership by using a pair of .BR setns (2) calls to move to another user namespace and then return to its original user namespace. - +.PP The rules for determining whether or not a process has a capability in a particular user namespace are as follows: .IP 1. 3 @@ -222,7 +222,7 @@ only on resources governed by that namespace. In other words, having a capability in a user namespace permits a process to perform privileged operations on resources that are governed by (nonuser) namespaces associated with the user namespace (see the next subsection). - +.PP On the other hand, there are many privileged operations that affect resources that are not associated with any namespace type, for example, changing the system time (governed by @@ -234,14 +234,14 @@ and creating a device (governed by Only a process with privileges in the .I initial user namespace can perform such operations. - +.PP Holding .B CAP_SYS_ADMIN within the user namespace associated with a process's mount namespace allows that process to create bind mounts and mount the following types of filesystems: .\" fs_flags = FS_USERNS_MOUNT in kernel sources - +.PP .RS 4 .PD 0 .IP * 2 @@ -278,7 +278,7 @@ cgroup version 1 named hierarchies (i.e., cgroup filesystems mounted with the .BR """none,name=""" option). - +.PP Holding .B CAP_SYS_ADMIN within the user namespace associated with a process's PID namespace @@ -286,7 +286,7 @@ allows (since Linux 3.8) that process to mount .I /proc filesystems. - +.PP Note however, that mounting block-based filesystems can be done only by a process that holds .BR CAP_SYS_ADMIN @@ -299,13 +299,13 @@ Starting in Linux 3.8, unprivileged processes can create user namespaces, and other the other types of namespaces can be created with just the .B CAP_SYS_ADMIN capability in the caller's user namespace. 
- +.PP When a non-user-namespace is created, it is owned by the user namespace in which the creating process was a member at the time of the creation of the namespace. Actions on the non-user-namespace require capabilities in the corresponding user namespace. - +.PP If .BR CLONE_NEWUSER is specified along with other @@ -322,7 +322,7 @@ or caller privileges over the remaining namespaces created by the call. Thus, it is possible for an unprivileged caller to specify this combination of flags. - +.PP When a new namespace (other than a user namespace) is created via .BR clone (2) or @@ -344,7 +344,7 @@ the process's UTS namespace, and check whether the process has the required capability .RB ( CAP_SYS_ADMIN ) in that user namespace. - +.PP The .BR NS_GET_USERNS .BR ioctl (2) @@ -369,13 +369,13 @@ inside the user namespace for the process .IR pid . These files can be read to view the mappings in a user namespace and written to (once) to define the mappings. - +.PP The description in the following paragraphs explains the details for .IR uid_map ; .IR gid_map is exactly the same, but each instance of "user ID" is replaced by "group ID". - +.PP The .I uid_map file exposes the mapping of user IDs from the user namespace @@ -389,7 +389,7 @@ will potentially see different values when reading from a particular .I uid_map file, depending on the user ID mappings for the user namespaces of the reading processes. - +.PP Each line in the .I uid_map file specifies a 1-to-1 mapping of a range of contiguous @@ -441,7 +441,7 @@ System calls that return user IDs (group IDs)\(emfor example, and the credential fields in the structure returned by .BR stat (2)\(emreturn the user ID (group ID) mapped into the caller's user namespace. - +.PP When a process accesses a file, its user and group IDs are mapped into the initial user namespace for the purpose of permission checking and assigning IDs when creating a file. 
@@ -449,7 +449,7 @@ When a process retrieves file user and group IDs via .BR stat (2), the IDs are mapped in the opposite direction, to produce values relative to the process user and group ID mappings. - +.PP The initial user namespace has no parent namespace, but, for consistency, the kernel provides dummy user and group ID mapping files for this namespace. @@ -458,14 +458,14 @@ Looking at the file .RI ( gid_map is the same) from a shell in the initial namespace shows: - +.PP .in +4n .nf $ \fBcat /proc/$$/uid_map\fP 0 0 4294967295 .fi .in - +.PP This mapping tells us that the range starting at user ID 0 in this namespace maps to a range starting at 0 in the (nonexistent) parent namespace, @@ -499,7 +499,7 @@ file in a user namespace fails with the error Similar rules apply for .I gid_map files. - +.PP The lines written to .IR uid_map .RI ( gid_map ) @@ -540,7 +540,7 @@ At least one line must be written to the file. .PP Writes that violate the above rules fail with the error .BR EINVAL . - +.PP In order for a process to write to the .I /proc/[pid]/uid_map .RI ( /proc/[pid]/gid_map ) @@ -623,7 +623,7 @@ and .I gid_map files have been written, only the mapped values may be used in system calls that change user and group IDs. - +.PP For user IDs, the relevant system calls include .BR setuid (2), .BR setfsuid (2), @@ -637,7 +637,7 @@ For group IDs, the relevant system calls include .BR setresgid (2), and .BR setgroups (2). - +.PP Writing .RI \(dq deny \(dq to the @@ -685,7 +685,7 @@ file (and regardless of the process's capabilities), calls to are also not permitted if .IR /proc/[pid]/gid_map has not yet been set. - +.PP A privileged process (one with the .BR CAP_SYS_ADMIN capability in the namespace) may write either of the strings @@ -701,7 +701,7 @@ Writing the string .RI \(dq deny \(dq prevents any process in the user namespace from employing .BR setgroups (2). 
- +.PP The essence of the restrictions described in the preceding paragraph is that it is permitted to write to .I /proc/[pid]/setgroups @@ -720,10 +720,10 @@ a process can transition only from being disallowed to .BR setgroups (2) being allowed. - +.PP The default value of this file in the initial user namespace is .RI \(dq allow \(dq. - +.PP Once .IR /proc/[pid]/gid_map has been written to @@ -738,11 +738,11 @@ to .IR /proc/[pid]/setgroups (the write fails with the error .BR EPERM ). - +.PP A child user namespace inherits the .IR /proc/[pid]/setgroups setting from its parent. - +.PP If the .I setgroups file has the value @@ -756,7 +756,7 @@ to the file) in this user namespace. .BR EPERM .) This restriction also propagates down to all child user namespaces of this user namespace. - +.PP The .I /proc/[pid]/setgroups file was added in Linux 3.19, @@ -815,7 +815,7 @@ and .IR /proc/sys/kernel/overflowgid in .BR proc (5). - +.PP The cases where unmapped IDs are mapped in this fashion include system calls that return user IDs .RB ( getuid (2), @@ -843,7 +843,7 @@ credentials written to the process accounting file (see .BR acct (5)), and credentials returned with POSIX message queue notifications (see .BR mq_notify (3)). - +.PP There is one notable case where unmapped user and group IDs are .I not .\" from_kuid(), from_kgid() @@ -909,7 +909,7 @@ User namespaces require support in a range of subsystems across the kernel. When an unsupported subsystem is configured into the kernel, it is not possible to configure user namespaces support. - +.PP As at Linux 3.8, most relevant subsystems supported user namespaces, but a number of filesystems did not have the infrastructure needed to map user and group IDs between user namespaces. @@ -929,9 +929,9 @@ The comments and .I usage() function inside the program provide a full explanation of the program. The following shell session demonstrates its use. 
- +.PP First, we look at the run-time environment: - +.PP .in +4n .nf $ \fBuname \-rs\fP # Need Linux 3.8 or later @@ -942,7 +942,7 @@ $ \fBid \-g\fP 1000 .fi .in - +.PP Now start a new shell in new user .RI ( \-U ), mount @@ -954,16 +954,16 @@ namespaces, with user ID and group ID .RI ( \-G ) 1000 mapped to 0 inside the user namespace: - +.PP .in +4n .nf $ \fB./userns_child_exec \-p \-m \-U \-M '0 1000 1' \-G '0 1000 1' bash\fP .fi .in - +.PP The shell has PID 1, because it is the first process in the new PID namespace: - +.PP .in +4n .nf bash$ \fBecho $$\fP @@ -975,7 +975,7 @@ Mounting a new filesystem and listing all of the processes visible in the new PID namespace shows that the shell can't see any processes outside the PID namespace: - +.PP .in +4n .nf bash$ \fBmount \-t proc proc /proc\fP @@ -985,10 +985,10 @@ bash$ \fBps ax\fP 22 pts/3 R+ 0:00 ps ax .fi .in - +.PP Inside the user namespace, the shell has user and group ID 0, and a full set of permitted and effective capabilities: - +.PP .in +4n .nf bash$ \fBcat /proc/$$/status | egrep '^[UG]id'\fP diff --git a/man7/utf-8.7 b/man7/utf-8.7 index 556a92b9a..f591347a1 100644 --- a/man7/utf-8.7 +++ b/man7/utf-8.7 @@ -46,7 +46,7 @@ The ISO 10646 Universal Character Set (UCS), a superset of Unicode, occupies an even larger code space\(em31\ bits\(emand the obvious UCS-4 encoding for it (a sequence of 32-bit words) has the same problems. - +.PP The UTF-8 encoding of Unicode and UCS does not have these problems and is the common way in which Unicode is used on UNIX-style operating systems. diff --git a/man7/xattr.7 b/man7/xattr.7 index 59b8a1ef1..512dc6dfb 100644 --- a/man7/xattr.7 +++ b/man7/xattr.7 @@ -144,7 +144,7 @@ The list of attribute names that can be returned is also limited to 64 kB (see BUGS in .BR listxattr (2)). 
- +.PP Some filesystems, such as Reiserfs (and, historically, ext2 and ext3), require the filesystem to be mounted with the .B user_xattr @@ -160,10 +160,10 @@ In the Btrfs, XFS, and Reiserfs filesystem implementations, there is no practical limit on the number of extended attributes associated with a file, and the algorithms used to store extended attribute information on disk are scalable. - +.PP In the JFS, XFS, and Reiserfs filesystem implementations, the limit on bytes used in an EA value is the ceiling imposed by the VFS. - +.PP In the Btrfs filesystem implementation, the total bytes used for the name, value, and implementation overhead bytes is limited to the filesystem @@ -177,7 +177,7 @@ Since the filesystems on which extended attributes are stored might also be used on architectures with a different byte order and machine word size, care should be taken to store attribute values in an architecture-independent format. - +.PP This page was formerly named .BR attr (5). .\" .SH AUTHORS