aio.7, arp.7, attributes.7, boot.7, cgroups.7, cpuset.7, credentials.7, fanotify.7, fifo.7, glob.7, hier.7, hostname.7, icmp.7, inode.7, inotify.7, keyrings.7, libc.7, mailaddr.7, mount_namespaces.7, mq_overview.7, nptl.7, numa.7, path_resolution.7, persistent-keyring.7, pid_namespaces.7, pipe.7, pkeys.7, process-keyring.7, pthreads.7, pty.7, random.7, sched.7, sem_overview.7, session-keyring.7, shm_overview.7, signal-safety.7, signal.7, spufs.7, standards.7, symlink.7, termio.7, thread-keyring.7, time.7, unicode.7, user-keyring.7, user-session-keyring.7, user_namespaces.7, utf-8.7, xattr.7: ffix

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk 2017-08-18 00:59:04 +02:00
parent 38db2ef4d0
commit a721e8b25f
49 changed files with 705 additions and 705 deletions

aio.7

@ -34,7 +34,7 @@ The application can elect to be notified of completion of
the I/O operation in a variety of ways:
by delivery of a signal, by instantiation of a thread,
or no notification at all.
.PP
The POSIX AIO interface consists of the following functions:
.TP 16
.BR aio_read (3)
@ -171,11 +171,11 @@ The control block buffer and the buffer pointed to by
.I aio_buf
must not be changed while the I/O operation is in progress.
These buffers must remain valid until the I/O operation completes.
.PP
Simultaneous asynchronous read or write operations using the same
.I aiocb
structure yield undefined results.
.PP
The current Linux POSIX AIO implementation is provided in user space by glibc.
This has a number of limitations, most notably that maintaining multiple
threads to perform I/O operations is expensive and scales poorly.
@ -206,18 +206,18 @@ of a signal.
After all I/O requests have completed,
the program retrieves their status using
.BR aio_return (3).
.PP
The
.B SIGQUIT
signal (generated by typing control-\\) causes the program to request
cancellation of each of the outstanding requests using
.BR aio_cancel (3).
.PP
Here is an example of what we might see when running this program.
In this example, the program queues two requests to standard input,
and these are satisfied by two lines of input containing
"abc" and "x".
.PP
.in +4n
.nf
$ \fB./a.out /dev/stdin /dev/stdin\fP
@ -462,7 +462,7 @@ main(int argc, char *argv[])
.BR aio_return (3),
.BR aio_write (3),
.BR lio_listio (3)
.PP
"Asynchronous I/O Support in Linux 2.5",
Bhattacharya, Pratt, Pulavarty, and Morgan,
Proceedings of the Linux Symposium, 2003,

arp.7

@ -21,7 +21,7 @@ and IPv4 protocol addresses on directly connected networks.
The user normally doesn't interact directly with this module except to
configure it;
instead it provides a service for other protocols in the kernel.
.PP
A user process can receive ARP packets by using
.BR packet (7)
sockets.
@ -34,7 +34,7 @@ The ARP table can also be controlled via
on any
.B AF_INET
socket.
.PP
The ARP module maintains a cache of mappings between hardware addresses
and protocol addresses.
The cache has a limited size so old and less
@ -46,7 +46,7 @@ be directly manipulated by the use of ioctls and its behavior can be
tuned by the
.I /proc
interfaces described below.
.PP
When there is no positive feedback for an existing mapping after some
time (see the
.I /proc
@ -69,7 +69,7 @@ If that fails too, it will broadcast a new ARP
request to the network.
Requests are sent only when there is data queued
for sending.
.PP
Linux will automatically add a nonpermanent proxy arp entry when it
receives a request for an address it forwards to and proxy arp is
enabled on the receiving interface.
@ -81,7 +81,7 @@ sockets.
They take a pointer to a
.I struct arpreq
as their argument.
.PP
.in +4n
.nf
struct arpreq {
@ -93,14 +93,14 @@ struct arpreq {
};
.fi
.in
.PP
.BR SIOCSARP ", " SIOCDARP " and " SIOCGARP
respectively set, delete and get an ARP mapping.
Setting and deleting ARP maps are privileged operations and may
be performed only by a process with the
.B CAP_NET_ADMIN
capability or an effective UID of 0.
.PP
.I arp_pa
must be an
.B AF_INET
@ -276,13 +276,13 @@ changed in Linux 2.0 to include the
.I arp_dev
member and the ioctl numbers changed at the same time.
Support for the old ioctls was dropped in Linux 2.2.
.PP
Support for proxy arp entries for networks (netmask not equal to 0xffffffff)
was dropped in Linux 2.2.
It is replaced by automatic proxy arp setup by
the kernel for all reachable hosts on other interfaces (when
forwarding and proxy arp are enabled for the interface).
.PP
The
.I neigh/*
interfaces did not exist before Linux 2.2.
@ -290,13 +290,13 @@ interfaces did not exist before Linux 2.2.
Some timer settings are specified in jiffies, which is architecture-
and kernel version-dependent; see
.BR time (7).
.PP
There is no way to signal positive feedback from user space.
This means connection-oriented protocols implemented in user space
will generate excessive ARP traffic, because ndisc will regularly
reprobe the MAC address.
The same problem applies for some kernel protocols (e.g., NFS over UDP).
.PP
This man page mashes together functionality that is IPv4-specific
with functionality that is shared between IPv4 and IPv6.
.SH SEE ALSO

attributes.7

@ -32,7 +32,7 @@ the text of this man page is based on the material taken from
the "POSIX Safety Concepts" section of the GNU C Library manual.
Further details on the topics described here can be found in that
manual.
.PP
Various function manual pages include a section ATTRIBUTES
that describes the safety of calling the function in various contexts.
This section annotates functions with the following safety markings:
@ -43,7 +43,7 @@ or
Thread-Safe functions are safe to call in the presence
of other threads.
MT, in MT-Safe, stands for Multi Thread.
.IP
Being MT-Safe does not imply a function is atomic, nor that it uses any
of the memory synchronization mechanisms POSIX exposes to users.
It is even possible that calling MT-Safe functions in sequence
@ -52,7 +52,7 @@ For example, having a thread call two MT-Safe
functions one right after the other does not guarantee behavior
equivalent to atomic execution of a combination of both functions,
since concurrent calls in other threads may interfere in a destructive way.
.IP
Whole-program optimizations that could inline functions across library
interfaces may expose unsafe reordering, and so performing inlining
across the GNU C Library interface is not recommended.
@ -340,7 +340,7 @@ Functions marked with
.I init
as an MT-Unsafe feature perform
MT-Unsafe initialization when they are first called.
.IP
Calling such a function at least once in single-threaded mode removes
this specific cause for the function to be regarded as MT-Unsafe.
If no other cause for that remains,
@ -517,7 +517,7 @@ modify enables readers to be regarded as MT-Safe \" and AS-Safe
(as long as no other reasons for them to be unsafe remain),
since the lack of synchronization is not a problem when the
objects are effectively constant.
.IP
The identifier that follows the
.I const
mark will appear by itself as a safety note in readers.
@ -556,7 +556,7 @@ as a MT-Safety issue
may temporarily install a signal handler for internal purposes,
which may interfere with other uses of the signal,
identified after a colon.
.IP
This safety problem can be worked around by ensuring that no other uses
of the signal will take place for the duration of the call.
Holding a non-recursive mutex while calling all functions that use the same
@ -594,7 +594,7 @@ are MT-Unsafe.
.\" The same window enables changes made by asynchronous signals to be lost.
.\" These functions are also AS-Unsafe,
.\" but the corresponding mark is omitted as redundant.
.IP
It is thus advisable for applications using the terminal to avoid
concurrent and reentrant interactions with it,
by not using it in signal handlers or blocking signals that might use it,
@ -645,7 +645,7 @@ annotated with
called concurrently with locale changes may
behave in ways that do not correspond to any of the locales active
during their execution, but an unpredictable mix thereof.
.IP
We do not mark these functions as MT-Unsafe, \" or AS-Unsafe,
however,
because functions that modify the locale object are marked with
@ -677,7 +677,7 @@ environment with
.BR getenv (3)
or similar, without any guards to ensure
safety in the presence of concurrent modifications.
.IP
We do not mark these functions as MT-Unsafe, \" or AS-Unsafe,
however,
because functions that modify the environment are all marked with
@ -716,7 +716,7 @@ GNU C Library
.I _sigintr
internal data structure without any guards to ensure
safety in the presence of concurrent modifications.
.IP
We do not mark these functions as MT-Unsafe, \" or AS-Unsafe,
however,
because functions that modify this data structure are all marked with
@ -797,7 +797,7 @@ as an MT-Safety issue may temporarily
change the current working directory during their execution,
which may cause relative pathnames to be resolved in unexpected ways in
other threads or within asynchronous signal or cancellation handlers.
.IP
This is not enough of a reason to mark so-marked functions as MT-Unsafe,
.\" or AS-Unsafe,
but when this behavior is optional (e.g.,
@ -836,7 +836,7 @@ It is envisioned that it may be applied to
and
.I corrupt
as well in the future.
.IP
In most cases, the identifier will name a set of functions,
but it may name global objects or function arguments,
or identifiable properties or logical components associated with them,
@ -848,7 +848,7 @@ or
.I :tcattr(fd)
to denote the terminal attributes of a file descriptor
.IR fd .
.IP
The most common use for identifiers is to provide logical groups of
functions and arguments that need to be protected by the same
synchronization primitive in order to ensure safe operation in a given
@ -874,7 +874,7 @@ indicate the preceding marker only applies when argument
is NULL, or global variable
.I one_per_line
is nonzero.
.IP
When all marks that render a function unsafe are
adorned with such conditions,
and none of the named conditions hold,

boot.7

@ -37,7 +37,7 @@ After power-on or hard reset, control is given
to a program stored in read-only memory (normally
PROM); for historical reasons involving the personal
computer, this program is often called "the \fBBIOS\fR".
.PP
This program normally performs a basic self-test of the
machine and accesses nonvolatile memory to read
further parameters.
@ -46,7 +46,7 @@ battery-backed CMOS memory, so most people
refer to it as "the \fBCMOS\fR"; outside
of the PC world, it is usually called "the \fBNVRAM\fR"
(nonvolatile RAM).
.PP
The parameters stored in the NVRAM vary among
systems, but as a minimum, they should specify
which device can supply an OS loader, or at least which
@ -67,11 +67,11 @@ interactive use, in order to enable specification of an alternative
kernel (maybe a backup in case the one last compiled
isn't functioning) and to pass optional parameters
to the kernel.
.PP
In a traditional PC, the OS loader is located in the initial 512-byte block
of the boot device; this block is known as "the \fBMBR\fR"
(Master Boot Record).
.PP
In most systems, the OS loader is very
limited due to various constraints.
Even on non-PC systems,
@ -79,12 +79,12 @@ there are some limitations on the size and complexity
of this loader, but the size limitation of the PC MBR
(512 bytes, including the partition table) makes it
almost impossible to squeeze much functionality into it.
.PP
Therefore, most systems split the role of loading the OS between
a primary OS loader and a secondary OS loader; this secondary
OS loader may be located within a larger portion of persistent
storage, such as a disk partition.
.PP
In Linux, the OS loader is often either
.BR lilo (8)
or
@ -98,13 +98,13 @@ The kernel starts the virtual memory
swapper (it is a kernel process, called "kswapd" in a modern Linux
kernel), and mounts some filesystem at the root path,
.IR / .
.PP
Some of the parameters that may be passed to the kernel
relate to these activities (for example, the default root filesystem
can be overridden); for further information
on Linux kernel parameters, read
.BR bootparam (7).
.PP
Only then does the kernel create the initial userland
process, which is given the number 1 as its
.B PID
@ -136,13 +136,13 @@ the administrator an easy way to establish an environment
for some usage; each run-level is associated with a set of services
(for example, run-level \fBS\fR is \fIsingle-user\fR mode,
and run-level \fB2\fR entails running most network services).
.PP
The administrator may change the current
run-level via
.BR init (1),
and query the current run-level via
.BR runlevel (8).
.PP
However, since it is not convenient to manage individual services
by editing this file,
.I /etc/inittab
@ -174,7 +174,7 @@ of the form \fI/etc/rc[0\-6S].d\fR.
In each of these directories,
there are links (usually symbolic) to the scripts in the \fI/etc/init.d\fR
directory.
.PP
A primary script (usually \fI/etc/rc\fR) is called from
.BR inittab (5);
this primary script calls each service's script via a link in the
@ -183,7 +183,7 @@ Each link whose name begins with \(aqS\(aq is called with
the argument "start" (thereby starting the service).
Each link whose name begins with \(aqK\(aq is called with
the argument "stop" (thereby stopping the service).
.PP
To define the starting or stopping order within the same run-level,
the name of a link contains an \fBorder-number\fR.
Also, for clarity, the name of a link usually
@ -193,7 +193,7 @@ the link \fI/etc/rc2.d/S80sendmail\fR starts the sendmail service on
runlevel 2.
This happens after \fI/etc/rc2.d/S12syslog\fR is run
but before \fI/etc/rc2.d/S90xfs\fR is run.
.PP
To manage these links is to manage the boot order and run-levels;
under many systems, there are tools to help with this task
(e.g.,
@ -207,7 +207,7 @@ inputs without editing an entire boot script,
some separate configuration file is used, and is located in a specific
directory where an associated boot script may find it
(\fI/etc/sysconfig\fR on older Red Hat systems).
.PP
In older UNIX systems, such a file contained the actual command line
options for a daemon, but in modern Linux systems (and also
in HP-UX), it just contains shell variables.

cgroups.7

@ -42,7 +42,7 @@ A
.I cgroup
is a collection of processes that are bound to a set of
limits or parameters defined via the cgroup filesystem.
.PP
A
.I subsystem
is a kernel component that modifies the behavior of
@ -54,7 +54,7 @@ and freezing and resuming execution of the processes in a cgroup.
Subsystems are sometimes also known as
.IR "resource controllers"
(or simply, controllers).
.PP
The cgroups for a controller are arranged in a
.IR hierarchy .
This hierarchy is defined by creating, removing, and
@ -77,7 +77,7 @@ and management of the cgroup hierarchies became rather complex.
(A longer description of these problems can be found in
the kernel source file
.IR Documentation/cgroup\-v2.txt .)
.PP
Because of the problems with the initial cgroups implementation
(cgroups version 1),
starting in Linux 3.10, work began on a new,
@ -87,7 +87,7 @@ Initially marked experimental, and hidden behind the
mount option, the new version (cgroups version 2)
was eventually made official with the release of Linux 4.5.
Differences between the two versions are described in the text below.
.PP
Although cgroups v2 is intended as a replacement for cgroups v1,
the older system continues to exist
(and for compatibility reasons is unlikely to be removed).
@ -109,7 +109,7 @@ processes on the system.
It is also possible to comount multiple (or even all) cgroups v1 controllers
against the same cgroup filesystem, meaning that the comounted controllers
manage the same hierarchical organization of processes.
.PP
For each mounted hierarchy,
the directory tree mirrors the control group hierarchy.
Each control group is represented by a directory, with each of its child
@ -125,7 +125,7 @@ which is a child of
Under each cgroup directory is a set of files which can be read or
written to, reflecting resource limits and a few general cgroup
properties.
.PP
In addition, in cgroups v1,
cgroups can be mounted with no bound controller, in which case
they serve only to track processes.
@ -160,7 +160,7 @@ The use of cgroups requires a kernel built with the
option.
In addition, each of the v1 controllers has an associated
configuration option that must be set in order to employ that controller.
.PP
In order to use a v1 controller,
it must be mounted against a cgroup filesystem.
The usual place for such mounts is under a
@ -170,26 +170,26 @@ filesystem mounted at
Thus, one might mount the
.I cpu
controller as follows:
.PP
.nf
.in +4n
mount \-t cgroup \-o cpu none /sys/fs/cgroup/cpu
.in
.fi
.PP
It is possible to comount multiple controllers against the same hierarchy.
For example, here the
.IR cpu
and
.IR cpuacct
controllers are comounted against a single hierarchy:
.PP
.nf
.in +4n
mount \-t cgroup \-o cpu,cpuacct none /sys/fs/cgroup/cpu,cpuacct
.in
.fi
.PP
Comounting controllers has the effect that a process is in the same cgroup for
all of the comounted controllers.
Separately mounting controllers allows a process to
@ -198,19 +198,19 @@ be in cgroup
for one controller while being in
.I /foo2/foo3
for another.
.PP
It is possible to comount all v1 controllers against the same hierarchy:
.PP
.nf
.in +4n
mount \-t cgroup \-o all cgroup /sys/fs/cgroup
.in
.fi
.PP
(One can achieve the same result by omitting
.IR "\-o all" ,
since it is the default if no controllers are explicitly specified.)
.PP
It is not possible to mount the same controller
against multiple cgroup hierarchies.
For example, it is not possible to mount both the
@ -224,7 +224,7 @@ It is possible to create multiple mount points with exactly
the same set of comounted controllers.
However, in this case all that results is multiple mount points
providing a view of the same hierarchy.
.PP
Note that on many systems, the v1 controllers are automatically mounted under
.IR /sys/fs/cgroup ;
in particular,
@ -244,7 +244,7 @@ when a system is busy.
This does not limit a cgroup's CPU usage if the CPUs are not busy.
For further information, see
.IR Documentation/scheduler/sched-design-CFS.txt .
.IP
In Linux 3.2,
this controller was extended to provide CPU "bandwidth" control.
If the kernel is configured with
@ -258,21 +258,21 @@ Further information can be found in the kernel source file
.TP
.IR cpuacct " (since Linux 2.6.24; " \fBCONFIG_CGROUP_CPUACCT\fP )
This provides accounting for CPU usage by groups of processes.
.IP
Further information can be found in the kernel source file
.IR Documentation/cgroup\-v1/cpuacct.txt .
.TP
.IR cpuset " (since Linux 2.6.24; " \fBCONFIG_CPUSETS\fP )
This cgroup can be used to bind the processes in a cgroup to
a specified set of CPUs and NUMA nodes.
.IP
Further information can be found in the kernel source file
.IR Documentation/cgroup\-v1/cpusets.txt .
.TP
.IR memory " (since Linux 2.6.25; " \fBCONFIG_MEMCG\fP )
The memory controller supports reporting and limiting of process memory, kernel
memory, and swap used by cgroups.
.IP
Further information can be found in the kernel source file
.IR Documentation/cgroup\-v1/memory.txt .
.TP
@ -282,7 +282,7 @@ well as open them for reading or writing.
The policies may be specified as whitelists and blacklists.
Hierarchy is enforced, so new rules must not
violate existing rules for the target or ancestor cgroups.
.IP
Further information can be found in the kernel source file
.IR Documentation/cgroup-v1/devices.txt .
.TP
@ -295,7 +295,7 @@ Freezing a cgroup
also causes its children, for example, processes in
.IR /A/B ,
to be frozen.
.IP
Further information can be found in the kernel source file
.IR Documentation/cgroup-v1/freezer-subsystem.txt .
.TP
@ -307,7 +307,7 @@ as well as used to shape traffic using
.BR tc (8).
This applies only to packets
leaving the cgroup, not to traffic arriving at the cgroup.
.IP
Further information can be found in the kernel source file
.IR Documentation/cgroup-v1/net_cls.txt .
.TP
@ -317,14 +317,14 @@ The
cgroup controls and limits access to specified block devices by
applying IO control in the form of throttling and upper limits against leaf
nodes and intermediate nodes in the storage hierarchy.
.IP
Two policies are available.
The first is a proportional-weight time-based division
of disk implemented with CFQ.
This is in effect for leaf nodes using CFQ.
The second is a throttling policy which specifies
upper I/O rate limits on a device.
.IP
Further information can be found in the kernel source file
.IR Documentation/cgroup-v1/blkio-controller.txt .
.TP
@ -332,26 +332,26 @@ Further information can be found in the kernel source file
This controller allows
.I perf
monitoring of the set of processes grouped in a cgroup.
.IP
Further information can be found in the kernel source file
.IR tools/perf/Documentation/perf-record.txt .
.TP
.IR net_prio " (since Linux 3.3; " \fBCONFIG_CGROUP_NET_PRIO\fP )
This allows priorities to be specified, per network interface, for cgroups.
.IP
Further information can be found in the kernel source file
.IR Documentation/cgroup-v1/net_prio.txt .
.TP
.IR hugetlb " (since Linux 3.5; " \fBCONFIG_CGROUP_HUGETLB\fP )
This supports limiting the use of huge pages by cgroups.
.IP
Further information can be found in the kernel source file
.IR Documentation/cgroup-v1/hugetlb.txt .
.TP
.IR pids " (since Linux 4.3; " \fBCONFIG_CGROUP_PIDS\fP )
This controller permits limiting the number of processes that may be created
in a cgroup (and its descendants).
.IP
Further information can be found in the kernel source file
.IR Documentation/cgroup-v1/pids.txt .
.\"
@ -359,33 +359,33 @@ Further information can be found in the kernel source file
A cgroup filesystem initially contains a single root cgroup, '/',
which all processes belong to.
A new cgroup is created by creating a directory in the cgroup filesystem:
.PP
mkdir /sys/fs/cgroup/cpu/cg1
.PP
This creates a new empty cgroup.
.PP
A process may be moved to this cgroup by writing its PID into the cgroup's
.I cgroup.procs
file:
.PP
echo $$ > /sys/fs/cgroup/cpu/cg1/cgroup.procs
.PP
Only one PID at a time should be written to this file.
.PP
Writing the value 0 to a
.IR cgroup.procs
file causes the writing process to be moved to the corresponding cgroup.
.PP
When writing a PID into the
.I cgroup.procs
file, all threads in the process are moved into the new cgroup at once.
.PP
Within a hierarchy, a process can be a member of exactly one cgroup.
Writing a process's PID to a
.IR cgroup.procs
file automatically removes it from the cgroup of
which it was previously a member.
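The write described above can be sketched in C as follows.
This is a minimal illustration, not part of the manual page: the function name move_to_cgroup is hypothetical, and the caller is assumed to supply the path of an existing cgroup.procs file (for example, /sys/fs/cgroup/cpu/cg1/cgroup.procs) and to have permission to write to it.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Write a single PID to the cgroup.procs file named by procs_path.
   Passing 0 as pid asks the kernel to move the writing process itself.
   Returns 0 on success, or -1 if the file cannot be opened or written.
   (move_to_cgroup is a hypothetical helper name.) */
int
move_to_cgroup(const char *procs_path, pid_t pid)
{
    assert(procs_path != NULL);

    FILE *fp = fopen(procs_path, "w");
    if (fp == NULL)
        return -1;

    /* Exactly one PID per write, as the text requires. */
    int ok = fprintf(fp, "%ld\n", (long) pid) > 0;

    /* The data is buffered, so a write error may surface only here. */
    if (fclose(fp) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Calling move_to_cgroup(path, 0) corresponds to the "write the value 0" case; calling it once per PID respects the one-PID-at-a-time rule.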
.PP
The
.I cgroup.procs
file can be read to obtain a list of the processes that are
@ -393,7 +393,7 @@ members of a cgroup.
The returned list of PIDs is not guaranteed to be in order.
Nor is it guaranteed to be free of duplicates.
(For example, a PID may be recycled while reading from the list.)
.PP
In cgroups v1 (but not cgroups v2), an individual thread can be moved to
another cgroup by writing its thread ID
(i.e., the kernel thread ID returned by
@ -420,7 +420,7 @@ Two files can be used to determine whether the kernel provides
notifications when a cgroup becomes empty.
A cgroup is considered to be empty when it contains no child
cgroups and no member processes.
.PP
A special file in the root directory of each cgroup hierarchy,
.IR release_agent ,
can be used to register the pathname of a program that may be invoked when
@ -433,11 +433,11 @@ The
.IR release_agent
program might remove the cgroup directory,
or perhaps repopulate it with a process.
.PP
The default value of the
.IR release_agent
file is empty, meaning that no release agent is invoked.
.PP
Whether or not the
.IR release_agent
program is invoked when a particular cgroup becomes empty is determined
@ -462,7 +462,7 @@ While (different) controllers may be simultaneously
mounted under the v1 and v2 hierarchies,
it is not possible to mount the same controller simultaneously
under both the v1 and the v2 hierarchies.
.PP
The new behaviors in cgroups v2 are summarized here,
and in some cases elaborated in the following subsections.
.IP 1. 3
@ -506,9 +506,9 @@ all available controllers are mounted against a single hierarchy.
The available controllers are automatically mounted,
meaning that it is not necessary (or possible) to specify the controllers
when mounting the cgroup v2 filesystem using a command such as the following:
.PP
mount -t cgroup2 none /mnt/cgroup2
.PP
A cgroup v2 controller is available only if it is not currently in use
via a mount against a cgroup v1 hierarchy.
Or, to put things another way, it is not possible to employ
@ -519,7 +519,7 @@ With the exception of the root cgroup, processes may reside
only in leaf nodes (cgroups that do not themselves contain child cgroups).
This avoids the need to decide how to partition resources between
processes which are members of cgroup A and processes in child cgroups of A.
.PP
For instance, if cgroup
.I /cg1/cg2
exists, then a process may reside in
@ -580,7 +580,7 @@ which has either the value 0,
meaning that the cgroup (and its descendants)
contain no (nonzombie) processes,
or 1, meaning that the cgroup contains member processes.
.PP
The
.IR cgroup.events
file can be monitored, in order to receive notification when a cgroup
@ -594,7 +594,7 @@ events, and when monitoring the file using
transitions generate
.B POLLPRI
events.
.PP
The cgroups v2
.IR notify_on_release
mechanism offers at least two advantages over the cgroups v1
@ -616,7 +616,7 @@ This file contains information about the controllers
that are compiled into the kernel.
An example of the contents of this file (reformatted for readability)
is the following:
.IP
.nf
.in +4n
#subsys_name hierarchy num_cgroups enabled
@ -634,7 +634,7 @@ hugetlb 0 1 0
pids 2 1 1
.in
.fi
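A line in the format shown above can be split into its four columns with sscanf(3).
The following is a sketch only: the structure and function names are hypothetical, the member names are simply taken from the header line of the example, and a 63-character limit on the controller name is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdio.h>

/* One row of /proc/cgroups; members follow the header line
   "#subsys_name hierarchy num_cgroups enabled".
   (struct cgroup_row is a hypothetical name.) */
struct cgroup_row {
    char subsys_name[64];
    int  hierarchy;
    int  num_cgroups;
    int  enabled;
};

/* Parse one line into *row.
   Returns 1 on success, 0 for the '#' header line or a malformed line. */
int
parse_cgroups_line(const char *line, struct cgroup_row *row)
{
    assert(line != NULL && row != NULL);

    if (line[0] == '#')
        return 0;

    /* %63s leaves room for the terminating null byte. */
    return sscanf(line, "%63s %d %d %d", row->subsys_name,
                  &row->hierarchy, &row->num_cgroups, &row->enabled) == 4;
}
```

A caller would typically read the file line by line with fgets(3) and skip the lines for which the function returns 0.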
.IP
The fields in this file are, from left to right:
.RS
.IP 1. 3
@ -666,13 +666,13 @@ This file describes control groups to which the process
with the corresponding PID belongs.
The displayed information differs for
cgroups version 1 and version 2 hierarchies.
.IP
For each cgroup hierarchy of which the process is a member,
there is one entry containing three
colon-separated fields of the form:
.IP
hierarchy-ID:controller-list:cgroup-path
.IP
For example:
.IP
.in +4n

cpuset.7

@ -175,7 +175,7 @@ it from the cpuset that previously contained it) by writing its
PID to that cpuset's
.I tasks
file (with or without a trailing newline).
.IP
.B Warning:
only one PID may be written to the
.I tasks
@ -199,7 +199,7 @@ in that cpuset are allowed to execute.
See \fBList Format\fR below for a description of the
format of
.IR cpus .
.IP
The CPUs allowed to a cpuset may be changed by
writing a new list to its
.I cpus
@ -212,7 +212,7 @@ If set (1), the cpuset has exclusive use of
its CPUs (no sibling or cousin cpuset may overlap CPUs).
By default, this is off (0).
Newly created cpusets also initially default this to off (0).
.IP
Two cpusets are
.I sibling
cpusets if they share the same parent cpuset in the
@ -250,7 +250,7 @@ its memory nodes (no sibling or cousin may overlap).
Also if set (1), the cpuset is a \fBHardwall\fR cpuset (see below).
By default, this is off (0).
Newly created cpusets also initially default this to off (0).
.IP
Regardless of the
.I mem_exclusive
setting, if one cpuset is the ancestor of another,

credentials.7

@ -38,7 +38,7 @@ A PID is represented using the type
.I pid_t
(defined in
.IR <sys/types.h> ).
.PP
PIDs are used in a range of system calls to identify the process
affected by the call, for example:
.BR kill (2),
@ -59,7 +59,7 @@ and
.BR waitpid (2).
.\" .BR waitid (2),
.\" .BR wait4 (2),
.PP
A process's PID is preserved across an
.BR execve (2).
.SS Parent process ID (PPID)
@ -70,7 +70,7 @@ A process can obtain its PPID using
.BR getppid (2).
A PPID is represented using the type
.IR pid_t .
.PP
A process's PPID is preserved across an
.BR execve (2).
.SS Process group ID and session ID
@ -81,13 +81,13 @@ A process can obtain its session ID using
.BR getsid (2),
and its process group ID using
.BR getpgrp (2).
.PP
A child created by
.BR fork (2)
inherits its parent's session ID and process group ID.
A process's session ID and process group ID are preserved across an
.BR execve (2).
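The inheritance rule for fork(2) can be checked with a short test.
This is a sketch under simple assumptions: the function name is hypothetical, and the child reports its result through its exit status.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a child created by fork(2) sees the same session ID and
   process group ID as its parent, 0 if it does not, and -1 on error.
   (The function name is hypothetical.) */
int
child_inherits_session_and_group(void)
{
    pid_t parent_sid = getsid(0);
    pid_t parent_pgid = getpgrp();

    /* Querying the caller's own session is not expected to fail. */
    assert(parent_sid != (pid_t) -1);

    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        /* Child: compare its IDs with the values the parent recorded. */
        int same = getsid(0) == parent_sid && getpgrp() == parent_pgid;
        _exit(same ? 0 : 1);
    }

    int status;
    if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status))
        return -1;

    return WEXITSTATUS(status) == 0;
}
```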
.PP
Sessions and process groups are abstractions devised to support shell
job control.
A process group (sometimes called a "job") is a collection of
@ -100,7 +100,7 @@ A process's group membership can be set using
.BR setpgid (2).
The process whose process ID is the same as its process group ID is the
\fIprocess group leader\fP for that group.
.PP
A session is a collection of processes that share the same session ID.
All of the members of a process group also have the same session ID
(i.e., all of the members of a process group always belong to the
@ -112,7 +112,7 @@ which creates a new session whose session ID is the same
as the PID of the process that called
.BR setsid (2).
The creator of the session is called the \fIsession leader\fP.
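The effect of setsid(2) can be observed in the same way.
In the following sketch (the function name is hypothetical), the call is made in a forked child, on the assumption that a newly forked child is not a process group leader and is therefore allowed to create a session.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a child that calls setsid(2) ends up with a session ID
   equal to its own PID, 0 if not, and -1 on error.
   (The function name is hypothetical.) */
int
setsid_makes_session_leader(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        /* Child: create a new session, then compare session ID and PID. */
        if (setsid() == (pid_t) -1)
            _exit(2);
        _exit(getsid(0) == getpid() ? 0 : 1);
    }

    /* Parent: fork() returned the child's PID. */
    assert(child > 0);

    int status;
    if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 2)
        return -1;          /* setsid() itself failed in the child */

    return WEXITSTATUS(status) == 0;
}
```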
.PP
All of the processes in a session share a
.IR "controlling terminal" .
The controlling terminal is established when the session leader
@ -121,7 +121,7 @@ first opens a terminal (unless the
flag is specified when calling
.BR open (2)).
A terminal may be the controlling terminal of at most one session.
.PP
At most one of the jobs in a session may be the
.IR "foreground job" ;
other jobs in the session are
@ -143,7 +143,7 @@ When terminal keys that generate a signal (such as the
.I interrupt
key, normally control-C)
are pressed, the signal is sent to the processes in the foreground job.
.PP
Various system calls and library functions
may operate on all members of a process group,
including
@ -172,7 +172,7 @@ and
.I gid_t
(defined in
.IR <sys/types.h> ).
.PP
On Linux, each process has the following user and group identifiers:
.IP * 3
Real user ID and real group ID.
@ -260,7 +260,7 @@ a process's real user and group ID and supplementary
group IDs are preserved;
the effective and saved set IDs may be changed, as described in
.BR execve (2).
.PP
Aside from the purposes noted above,
a process's user IDs are also employed in a number of other contexts:
.IP * 3

fanotify.7

@ -34,14 +34,14 @@ In particular, there is no support for create, delete, and move events.
(See
.BR inotify (7)
for details of an API that does notify those events.)
.PP
Additional capabilities compared to the
.BR inotify (7)
API include the ability to monitor all of the objects
in a mounted filesystem,
the ability to make access permission decisions, and the
possibility to read or modify files before access by other applications.
.PP
The following system calls are used with this API:
.BR fanotify_init (2),
.BR fanotify_mark (2),
@ -104,7 +104,7 @@ or similar)
from the fanotify file descriptor
returned by
.BR fanotify_init (2).
.PP
Two types of events are generated:
.I notification
events and
@ -118,7 +118,7 @@ Permission events are requests to the receiving application to decide
whether permission for a file access shall be granted.
For these events, the recipient must write a response which decides whether
access is granted or not.
.PP
An event is removed from the event queue of the fanotify group
when it has been read.
Permission events that have been read are kept in an internal list of the
@ -137,11 +137,11 @@ is not specified in the call to
until either a file event occurs or the call is interrupted by a signal
(see
.BR signal (7)).
.PP
After a successful
.BR read (2),
the read buffer contains one or more of the following structures:
.PP
.in +4n
.nf
struct fanotify_event_metadata {
@ -160,12 +160,12 @@ For performance reasons, it is recommended to use a large
buffer size (for example, 4096 bytes),
so that multiple events can be retrieved by a single
.BR read (2).
.PP
The return value of
.BR read (2)
is the number of bytes placed in the buffer,
or \-1 in case of an error (but see BUGS).
.PP
The fields of the
.I fanotify_event_metadata
structure are as follows:
@ -291,7 +291,7 @@ To check for any close event, the following bit mask may be used:
.B FAN_CLOSE
A file was closed.
This is a synonym for:
.IP
FAN_CLOSE_WRITE | FAN_CLOSE_NOWRITE
.PP
The following macros are provided to iterate over a buffer containing
@ -346,7 +346,7 @@ For permission events, the application must
.BR write (2)
a structure of the following form to the
fanotify file descriptor:
.PP
.in +4n
.nf
struct fanotify_response {
@ -495,7 +495,7 @@ calls to
generate
.B FAN_MODIFY
events.
.PP
As of Linux 3.17,
the following bugs exist:
.IP * 3

View File

@ -55,7 +55,7 @@ When a process tries to write to a FIFO that is not opened
for read on the other side, the process is sent a
.B SIGPIPE
signal.
.PP
FIFO special files can be created by
.BR mkfifo (3),
and are indicated by

View File

@ -31,11 +31,11 @@ Long ago, in UNIX\ V6, there was a program
.I /etc/glob
that would expand wildcard patterns.
Soon afterward this became a shell built-in.
.PP
These days there is also a library routine
.BR glob (3)
that will perform this function for a user program.
.PP
The rules are as follows (POSIX.2, 3.13).
.SS Wildcard matching
A string is a wildcard pattern if it contains one of the
@ -44,9 +44,9 @@ Globbing is the operation
that expands a wildcard pattern into the list of pathnames
matching the pattern.
Matching is defined by:
.PP
A \(aq?\(aq (not between brackets) matches any single character.
.PP
A \(aq*\(aq (not between brackets) matches any string,
including the empty string.
.PP
@ -81,7 +81,7 @@ any character that is not matched by the expression obtained
by removing the first \(aq!\(aq from it.
(Thus, "\fI[!]a\-]\fP" matches any
single character except \(aq]\(aq, \(aqa\(aq and \(aq\-\(aq.)
.PP
One can remove the special meaning of \(aq?\(aq, \(aq*\(aq and \(aq[\(aq by
preceding them by a backslash, or, in case this is part of
a shell command line, enclosing them in quotes.
@ -95,7 +95,7 @@ A \(aq/\(aq in a pathname cannot be matched by a \(aq?\(aq or \(aq*\(aq
wildcard, or by a range like "\fI[.\-0]\fP".
A range containing an explicit \(aq/\(aq character is syntactically incorrect.
(POSIX requires that syntactically incorrect patterns are left unchanged.)
.PP
If a filename starts with a \(aq.\(aq,
this character must be matched explicitly.
(Thus, \fIrm\ *\fP will not remove .profile, and \fItar\ c\ *\fP will not
@ -106,11 +106,11 @@ into the list of matching pathnames" was the original UNIX
definition.
It allowed one to have patterns that expand into
an empty list, as in
.PP
.nf
xv \-wait 0 *.gif *.jpg
.fi
.PP
where perhaps no *.gif files are present (and this is not
an error).
However, POSIX requires that a wildcard pattern is left
@ -119,23 +119,23 @@ matching pathnames is empty.
With
.I bash
one can force the classical behavior using this command:
.PP
shopt \-s nullglob
.\" In Bash v1, by setting allow_null_glob_expansion=true
.PP
(Similar problems occur elsewhere.
For example, where old scripts have
.PP
.nf
rm \`find . \-name "*~"\`
.fi
.PP
new scripts require
.PP
.nf
rm \-f nosuchfile \`find . \-name "*~"\`
.fi
.PP
to avoid error messages from
.I rm
called with an empty argument list.)
@ -147,7 +147,7 @@ First of all, they match
filenames, rather than text, and secondly, the conventions
are not the same: for example, in a regular expression \(aq*\(aq means zero or
more copies of the preceding thing.
.PP
Now that regular expressions have bracket expressions where
the negation is indicated by a \(aq^\(aq, POSIX has declared the
effect of a wildcard pattern "\fI[^...]\fP" to be undefined.
@ -169,13 +169,13 @@ expression: namely (i) the negation, (ii) explicit single characters,
and (iii) ranges.
POSIX specifies ranges in an internationally
more useful way and adds three more types:
.PP
(iii) Ranges X\-Y comprise all characters that fall between X
and Y (inclusive) in the current collating sequence as defined
by the
.B LC_COLLATE
category in the current locale.
.PP
(iv) Named character classes, like
.nf
@ -191,13 +191,13 @@ These character classes are defined by the
.B LC_CTYPE
category
in the current locale.
.PP
(v) Collating symbols, like "\fI[.ch.]\fP" or "\fI[.a-acute.]\fP",
where the string between "\fI[.\fP" and "\fI.]\fP" is a collating
element defined for the current locale.
Note that this may
be a multicharacter element.
.PP
(vi) Equivalence class expressions, like "\fI[=a=]\fP",
where the string between "\fI[=\fP" and "\fI=]\fP" is any collating
element from its equivalence class, as defined for the

View File

@ -271,7 +271,7 @@ This contains information which may change from system release to
system release and used to be a symbolic link to
.I /usr/src/linux/include/linux
to get at operating-system-specific information.
.IP
(Note that one should have include files there that work correctly with
the current libc and in user space.
However, Linux kernel source is not
@ -646,5 +646,5 @@ differently.
.BR ln (1),
.BR proc (5),
.BR mount (8)
.PP
The Filesystem Hierarchy Standard

View File

@ -43,7 +43,7 @@ hostname \- hostname resolution description
Hostnames are domains, where a domain is a hierarchical, dot-separated
list of subdomains; for example, the machine "monet", in the "example"
subdomain of the "com" domain would be represented as "monet.example.com".
.PP
Each element of the hostname must be from 1 to 63 characters long and the
entire hostname, including the dots, can be at most 253 characters long.
Valid characters for hostnames are
@ -58,7 +58,7 @@ to
.IR 9 ,
and the hyphen (\-).
A hostname may not start with a hyphen.
.PP
Hostnames are often used with network client and server programs,
which must generally translate the name to an address for use.
(This task is generally performed by either
@ -67,7 +67,7 @@ or the obsolete
.BR gethostbyname (3).)
Hostnames are resolved by the Internet name resolver in the following
fashion.
.PP
If the name consists of a single component, that is, contains no dot,
and if the environment variable
.B HOSTALIASES
@ -80,11 +80,11 @@ to be substituted for that alias.
If a case-insensitive match is found between the hostname to be resolved
and the first field of a line in the file, the substituted name is looked
up with no further processing.
.PP
If the input name ends with a trailing dot,
the trailing dot is removed,
and the remaining name is looked up with no further processing.
.PP
If the input name does not end with a trailing dot, it is looked up
by searching through a list of domains until a match is found.
The default search list includes first the local domain,
@ -103,11 +103,11 @@ by a system-wide configuration file (see
.BR resolver (5),
.BR mailaddr (7),
.BR named (8)
.PP
.UR http://www.ietf.org\:/rfc\:/rfc1123.txt
IETF RFC\ 1123
.UE
.PP
.UR http://www.ietf.org\:/rfc\:/rfc1178.txt
IETF RFC\ 1178
.UE
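The syntax rules stated in the hostname text above (elements of 1 to 63 characters, at most 253 characters overall, letters, digits and hyphens only, no leading hyphen) can be checked mechanically. The following is a minimal sketch, not a resolver-grade validator. The function name hostname_is_valid is hypothetical, and rejecting a trailing dot is an assumption of this sketch (the resolver itself accepts one on input).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True for the characters the text lists as valid in a hostname element */
static bool
is_hostname_char(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '-';
}

/* Hypothetical helper: check name against the length and character rules.
   A trailing dot is treated as an empty element and rejected
   (an assumption of this sketch). */
static bool
hostname_is_valid(const char *name)
{
    assert(name != NULL);

    size_t total = strlen(name);
    size_t label_len = 0;

    if (total == 0 || total > 253 || name[0] == '-')
        return false;

    for (size_t i = 0; i < total; i++) {
        if (name[i] == '.') {
            if (label_len == 0)
                return false;           /* empty element */
            label_len = 0;
        } else if (is_hostname_char(name[i])) {
            if (++label_len > 63)
                return false;           /* element too long */
        } else {
            return false;               /* invalid character */
        }
    }
    return label_len > 0;
}
```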

View File

@ -85,13 +85,13 @@ packets.
.\" The following taken from 2.6.28-rc4 Documentation/networking/ip-sysctl.txt
If disabled, ICMP error messages are sent with the primary address of
the exiting interface.
.IP
If enabled, the message will be sent with the primary address of
the interface that received the packet that caused the ICMP error.
This is the behavior that many network administrators will expect from
a router.
And it can make debugging complicated network layouts much easier.
.IP
Note that if no primary address exists for the interface selected,
then the primary address of the first non-loopback interface that
has one will be used regardless of this setting.
@ -122,11 +122,11 @@ otherwise the minimum space between responses in milliseconds.
.IR icmp_ratemask " (integer; default: see below; since Linux 2.4.10)"
.\" The following taken from 2.6.28-rc4 Documentation/networking/ip-sysctl.txt
Mask made of ICMP types for which rates are being limited.
.IP
Significant bits: IHGFEDCBA9876543210
.br
Default mask: 0000001100000011000 (0x1818)
.IP
Bit definitions (see the Linux kernel source file
.IR include/linux/icmp.h ):
.RS 12
@ -147,7 +147,7 @@ H Address Mask Request
I Address Mask Reply
.TE
.RE
.PP
The bits marked with an asterisk are rate limited by default
(see the default mask above).
.TP
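The default icmp_ratemask value shown above (0x1818) is a bit mask indexed by ICMP type number. As a sketch, the value can be rebuilt from the type constants in <netinet/ip_icmp.h>, assuming the four types limited by default are Destination Unreachable, Source Quench, Time Exceeded, and Parameter Problem (bits 3, 4, B, and C of the mask). Both function names are hypothetical.

```c
#include <assert.h>
#include <netinet/ip_icmp.h>
#include <stdbool.h>

/* Hypothetical helper: rebuild the default mask from ICMP type numbers */
static unsigned int
icmp_ratemask_default(void)
{
    return (1u << ICMP_DEST_UNREACH)  |     /* bit 3 */
           (1u << ICMP_SOURCE_QUENCH) |     /* bit 4 */
           (1u << ICMP_TIME_EXCEEDED) |     /* bit B (11) */
           (1u << ICMP_PARAMETERPROB);      /* bit C (12) */
}

/* Hypothetical helper: is the given ICMP type rate limited under mask? */
static bool
icmp_type_is_rate_limited(unsigned int mask, unsigned int type)
{
    assert(type < 32);
    return (mask >> type) & 1u;
}
```

The same test can be applied to a mask read from the icmp_ratemask file to see which message types the running system limits.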

View File

@ -37,7 +37,7 @@ structure, or
which returns a
.I statx
structure.
.PP
The following is a list of the information typically found in,
or associated with, the file inode,
with the names of the corresponding structure fields returned by
@ -47,7 +47,7 @@ and
.TP
Device where inode resides
\fIstat.st_dev\fP; \fIstatx.stx_dev_minor\fP and \fIstatx.stx_dev_major\fP
.IP
Each inode (as well as the associated file) resides in a filesystem
that is hosted on a device.
That device is identified by the combination of its major ID
@ -56,7 +56,7 @@ and minor ID (which identifies a specific instance in the general class).
.TP
Inode number
\fIstat.st_ino\fP; \fIstatx.stx_ino\fP
.IP
Each file in a filesystem has a unique inode number.
Inode numbers are guaranteed to be unique only within a filesystem
(i.e., the same inode numbers may be used by different filesystems,
@ -65,12 +65,12 @@ This field contains the file's inode number.
.TP
File type and mode
\fIstat.st_mode\fP; \fIstatx.stx_mode\fP
.IP
See the discussion of file type and mode, below.
.TP
Link count
\fIstat.st_nlink\fP; \fIstatx.stx_nlink\fP
.IP
This field contains the number of hard links to the file.
Additional links to an existing file are created using
.BR link (2).
@ -78,7 +78,7 @@ Additional links to an existing file are created using
User ID
\fIstat.st_uid\fP; \fIstatx.stx_uid\fP
.IP
This field records the user ID of the owner of the file.
For newly created files,
the file user ID is the effective user ID of the creating process.
@ -87,7 +87,7 @@ The user ID of a file can be changed using
.TP
Group ID
\fIstat.st_gid\fP; \fIstatx.stx_gid\fP
.IP
The inode records the ID of the group owner of the file.
For newly created files,
the file group ID is either the group ID of the parent directory or
@ -99,13 +99,13 @@ The group ID of a file can be changed using
.TP
Device represented by this inode
\fIstat.st_rdev\fP; \fIstatx.stx_rdev_minor\fP and \fIstatx.stx_rdev_major\fP
.IP
If this file (inode) represents a device,
then the inode records the major and minor ID of that device.
.TP
File size
\fIstat.st_size\fP; \fIstatx.stx_size\fP
.IP
This field gives the size of the file (if it is a regular
file or a symbolic link) in bytes.
The size of a symbolic link is the length of the pathname
@ -113,20 +113,20 @@ it contains, without a terminating null byte.
.TP
Preferred block size for I/O
\fIstat.st_blksize\fP; \fIstatx.stx_blksize\fP
.IP
This field gives the "preferred" blocksize for efficient filesystem I/O.
(Writing to a file in smaller chunks may cause
an inefficient read-modify-rewrite.)
.TP
Number of blocks allocated to the file
\fIstat.st_blocks\fP; \fIstatx.stx_blocks\fP
.IP
This field indicates the number of blocks allocated to the file,
in 512-byte units.
(This may be smaller than
.IR st_size /512
when the file has holes.)
.IP
The POSIX.1 standard notes
.\" Rationale for sys/stat.h in POSIX.1-2008
that the unit for the
@ -140,7 +140,7 @@ Furthermore, the unit may differ on a per-filesystem basis.
.TP
Last access timestamp (atime)
\fIstat.st_atime\fP; \fIstatx.stx_atime\fP
.IP
This is the file's last access timestamp.
It is changed by file accesses, for example, by
.BR execve (2),
@ -153,7 +153,7 @@ and
Other interfaces, such as
.BR mmap (2),
may or may not update the atime timestamp.
.IP
Some filesystem types allow mounting in such a way that file
and/or directory accesses do not cause an update of the atime timestamp.
(See
@ -173,17 +173,17 @@ flag; see
.TP
File creation (birth) timestamp (btime)
(not returned in the \fIstat\fP structure); \fIstatx.stx_btime\fP
.IP
The file's creation timestamp.
This is set on file creation and not changed subsequently.
.IP
The btime timestamp was not historically present on UNIX systems
and is not currently supported by most Linux filesystems.
.\" FIXME Is it supported on ext4 and XFS?
.TP
Last modification timestamp (mtime)
\fIstat.st_mtime\fP; \fIstatx.stx_mtime\fP
.IP
This is the file's last modification timestamp.
It is changed by file modifications, for example, by
.BR mknod (2),
@ -201,7 +201,7 @@ changed for changes in owner, group, hard link count, or mode.
.TP
Last status change timestamp (ctime)
\fIstat.st_ctime\fP; \fIstatx.stx_ctime\fP
.IP
This is the file's last status change timestamp.
It is changed by writing or by setting inode information
(i.e., owner, group, link count, mode, etc.).
@ -225,7 +225,7 @@ field (for
the
.I statx.stx_mode
field) contains the file type and mode.
.PP
POSIX refers to the
.I stat.st_mode
bits corresponding to the mask
@ -254,7 +254,7 @@ S_IFIFO 0010000 FIFO
.in
.PP
Thus, to test for a regular file (for example), one could write:
.PP
.nf
.in +4n
stat(pathname, &sb);
@ -293,7 +293,7 @@ socket? (Not in POSIX.1-1996.)
.RE
.PP
The preceding code snippet could thus be rewritten as:
.PP
.nf
.in +4n
stat(pathname, &sb);
@ -319,7 +319,7 @@ and
are provided if
.BR _XOPEN_SOURCE
is defined.
.PP
The definition of
.BR S_IFSOCK
can also be exposed either by defining
@ -328,7 +328,7 @@ with a value of 500 or greater or (since glibc 2.24) by defining both
.BR _XOPEN_SOURCE
and
.BR _XOPEN_SOURCE_EXTENDED .
.PP
The definition of
.BR S_ISSOCK ()
is exposed if any of the following feature test macros is defined:
@ -424,7 +424,7 @@ and so on.
The
.BR S_IF*
constants are present in POSIX.1-2001 and later.
.PP
The
.BR S_ISLNK ()
and

View File

@ -35,7 +35,7 @@ Inotify can be used to monitor individual files,
or to monitor directories.
When a directory is monitored, inotify will return events
for the directory itself, and for files inside the directory.
.PP
The following system calls are used with this API:
.IP * 3
.BR inotify_init (2)
@ -99,7 +99,7 @@ in which case the call fails with the error
.BR EINTR ;
see
.BR signal (7)).
.PP
Each successful
.BR read (2)
returns a buffer containing one or more of the following structures:
@ -120,15 +120,15 @@ struct inotify_event {
};
.fi
.in
.PP
.I wd
identifies the watch for which this event occurs.
It is one of the watch descriptors returned by a previous call to
.BR inotify_add_watch (2).
.PP
.I mask
contains bits that describe the event that occurred (see below).
.PP
.I cookie
is a unique integer that connects related events.
Currently, this is used only for rename events, and
@ -140,7 +140,7 @@ events to be connected by the application.
For all other event types,
.I cookie
is set to 0.
.PP
The
.I name
field is present only when an event is returned
@ -149,7 +149,7 @@ it identifies the filename within to the watched directory.
This filename is null-terminated,
and may include further null bytes (\(aq\\0\(aq) to align subsequent reads to a
suitable address boundary.
.PP
The
.I len
field counts all of the bytes in
@ -159,7 +159,7 @@ the length of each
.I inotify_event
structure is thus
.IR "sizeof(struct inotify_event)+len" .
.PP
The behavior when the buffer given to
.BR read (2)
is too small to return information about the next event depends
@ -170,9 +170,9 @@ returns 0; since kernel 2.6.21,
fails with the error
.BR EINVAL .
Specifying a buffer of size
.PP
sizeof(struct inotify_event) + NAME_MAX + 1
.PP
will be sufficient to read at least one event.
.SS inotify events
The
@ -274,7 +274,7 @@ Inotify monitoring is inode-based: when monitoring a file
(but not when monitoring the directory containing a file),
an event can be generated for activity on any link to the file
(in the same or a different directory).
.PP
When monitoring a directory:
.IP * 3
the events marked above with an asterisk (*) can occur both
@ -288,7 +288,7 @@ when monitoring a directory,
events are not generated for the files inside the directory
when the events are performed via a pathname (i.e., a link)
that lies outside the monitored directory.
.PP
When events are generated for objects inside a watched directory, the
.I name
field in the returned
@ -302,7 +302,7 @@ This macro can be used as the
.I mask
argument when calling
.BR inotify_add_watch (2).
.PP
Two additional convenience macros are defined:
.RS 4
.TP
@ -582,7 +582,7 @@ Inotify file descriptors can be monitored using
and
.BR epoll (7).
When an event is available, the file descriptor is reported as readable.
.PP
Since Linux 2.6.25,
signal-driven I/O notification is available for inotify file descriptors;
see the discussion of
@ -611,7 +611,7 @@ and
.B POLLIN
is set in
.IR si_band .
.PP
If successive output inotify events produced on the
inotify file descriptor are identical (same
.IR wd ,
@ -624,13 +624,13 @@ older event has not yet been read (but see BUGS).
This reduces the amount of kernel memory required for the event queue,
but also means that an application can't use inotify to reliably count
file events.
.PP
The events returned by reading from an inotify file descriptor
form an ordered queue.
Thus, for example, it is guaranteed that when renaming from
one directory to another, events will be produced in the
correct order on the inotify file descriptor.
.PP
The set of watch descriptors that is being monitored via
an inotify file descriptor can be viewed via the entry for
the inotify file descriptor in the process's
@ -651,7 +651,7 @@ In particular, there is no easy
way for a process that is monitoring events via inotify
to distinguish events that it triggers
itself from those that are triggered by other processes.
.PP
Inotify reports only events that a user-space program triggers through
the filesystem API.
As a result, it does not catch remote events that occur
@ -664,28 +664,28 @@ Furthermore, various pseudo-filesystems such as
and
.IR /dev/pts
are not monitorable with inotify.
.PP
The inotify API does not report file accesses and modifications that
may occur because of
.BR mmap (2),
.BR msync (2),
and
.BR munmap (2).
.PP
The inotify API identifies affected files by filename.
However, by the time an application processes an inotify event,
the filename may already have been deleted or renamed.
.PP
The inotify API identifies events via watch descriptors.
It is the application's responsibility to cache a mapping
(if one is needed) between watch descriptors and pathnames.
Be aware that directory renamings may affect multiple cached pathnames.
.PP
Inotify monitoring of directories is not recursive:
to monitor subdirectories under a directory,
additional watches must be created.
This can take a significant amount of time for large directory trees.
.PP
If monitoring an entire directory subtree,
and a new subdirectory is created in that tree or an existing directory
is renamed into that tree,
@ -694,7 +694,7 @@ new files (and subdirectories) may already exist inside the subdirectory.
Therefore, you may want to scan the contents of the subdirectory
immediately after adding the watch (and, if desired,
recursively add watches for any subdirectories that it contains).
.PP
Note that the event queue can overflow.
In this case, events are lost.
Robust applications should handle the possibility of
@ -706,7 +706,7 @@ approach is to close the inotify file descriptor, empty the cache,
create a new inotify file descriptor,
and then re-create watches and cache entries
for the objects to be monitored.)
.PP
If a filesystem is mounted on top of a monitored directory,
no event is generated, and no events are generated
for objects immediately under the new mount point.
@ -723,7 +723,7 @@ event pair that is generated by
.BR rename (2)
can be matched up via their shared cookie value.
However, the task of matching has some challenges.
.PP
These two events are usually consecutive in the event stream available
when reading from the inotify file descriptor.
However, this is not guaranteed.
@ -740,7 +740,7 @@ inserted into the queue: there may be a brief interval where the
has appeared, but the
.B IN_MOVED_TO
has not.
.PP
Matching up the
.B IN_MOVED_FROM
and
@ -765,7 +765,7 @@ then those watch descriptors will be inconsistent with
the watch descriptors in any pending events.
(Re-creating the inotify file descriptor and rebuilding the cache may
be useful to deal with this scenario.)
.PP
Applications should also allow for the possibility that the
.B IN_MOVED_FROM
event was the last event that could fit in the buffer
@ -793,7 +793,7 @@ calls to
generate
.B IN_MODIFY
events.
.PP
.\" FIXME . kernel commit 611da04f7a31b2208e838be55a42c7a1310ae321
.\" implies that unmount events were buggy 2.6.11 to 2.6.36
.\"
@ -801,7 +801,7 @@ In kernels before 2.6.16, the
.B IN_ONESHOT
.I mask
flag does not work.
.PP
As originally designed and implemented, the
.B IN_ONESHOT
flag did not cause an
@ -811,7 +811,7 @@ However, as an unintended effect of other changes,
since Linux 2.6.36, an
.B IN_IGNORED
event is generated in this case.
.PP
Before kernel 2.6.25,
.\" commit 1c17d18e3775485bf1e0ce79575eb637a94494a2
the kernel code that was intended to coalesce successive identical events
@ -820,7 +820,7 @@ if the older had not yet been read)
instead checked if the most recent event could be coalesced with the
.I oldest
unread event.
.PP
When a watch descriptor is removed by calling
.BR inotify_rm_watch (2)
(or because a watched file is deleted or the filesystem
@ -1089,6 +1089,6 @@ main(int argc, char* argv[])
.BR read (2),
.BR stat (2),
.BR fanotify (7)
.PP
.IR Documentation/filesystems/inotify.txt
in the Linux kernel source tree
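As the inotify text above explains, each record returned by read(2) occupies sizeof(struct inotify_event)+len bytes, where len covers the name and any null padding. The following is a minimal sketch of walking such a buffer. The function name count_inotify_events is hypothetical, and the header is copied out with memcpy(3) so that the sketch makes no assumption about the alignment of the caller's buffer.

```c
#include <assert.h>
#include <string.h>
#include <sys/inotify.h>
#include <sys/types.h>

/* Hypothetical helper: return the number of complete events in a buffer
   filled by read(2) on an inotify file descriptor. */
static int
count_inotify_events(const char *buf, ssize_t nread)
{
    assert(buf != NULL && nread >= 0);

    size_t remaining = (size_t) nread;
    const char *p = buf;
    int count = 0;

    while (remaining >= sizeof(struct inotify_event)) {
        struct inotify_event ev;

        /* Copy the fixed-size header; the name (if any) follows it */
        memcpy(&ev, p, sizeof(ev));

        size_t record_len = sizeof(struct inotify_event) + ev.len;
        if (record_len > remaining)
            break;                  /* truncated record; stop */

        count++;
        p += record_len;
        remaining -= record_len;
    }
    return count;
}
```

A buffer of sizeof(struct inotify_event) + NAME_MAX + 1 bytes, as noted above, is enough to hold at least one event; a larger buffer lets one read(2) return several.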

View File

@ -25,7 +25,7 @@ those objects and also use the facility for their own purposes; see
.BR request_key (2),
and
.BR keyctl (2).
.PP
A library and some user-space utilities are provided to allow access to the
facility.
See
@ -48,7 +48,7 @@ Type
A key's type defines what sort of data can be held in the key,
how the proposed content of the key will be parsed,
and how the payload will be used.
.IP
There are a number of general-purpose types available, plus some specialist
types defined by specific kernel components.
.TP
@ -65,7 +65,7 @@ instantiation of a key if that key wasn't already known to the kernel
when it was requested.
For further details, see
.BR request_key (2).
.IP
A key's payload can be read and updated if the key type supports it and if
suitable permission is granted to the caller.
.TP
@ -78,7 +78,7 @@ and there is an additional category\(empossessor\(embeyond the usual user,
group, and other (see
.IR Possession ,
below).
.IP
Note that keys are quota controlled, since they require unswappable kernel
memory.
The owning user ID specifies whose quota is to be debited.
@ -113,7 +113,7 @@ to other keys (including other keyrings),
analogous to a directory holding links to files.
The main purpose of a keyring is to prevent other keys from
being garbage collected because nothing refers to them.
.IP
Keyrings with descriptions (names)
that begin with a period (\(aq.\(aq) are reserved to the implementation.
.TP
@ -121,10 +121,10 @@ that begin with a period (\(aq.\(aq) are reserved to the implementation.
This is a general-purpose key type.
The key is kept entirely within kernel memory.
The payload may be read and updated by user-space applications.
.IP
The payload for keys of this type is a blob of arbitrary data
of up to 32,767 bytes.
.IP
The description may be any valid string, though it is preferred that it
start with a colon-delimited prefix representing the service
to which the key is of interest
@ -149,7 +149,7 @@ This key type is similar to the
.I """user"""
key type, but it may hold a payload of up to 1 MiB in size.
This key type is useful for purposes such as holding Kerberos ticket caches.
.IP
The payload data may be stored in a tmpfs filesystem,
rather than in kernel memory,
if the data size exceeds the overhead of storing the data in the filesystem.
@ -165,7 +165,7 @@ thereby preventing it from being written unencrypted into swap space.
There are more specialized key types available also,
but they aren't discussed here
because they aren't intended for normal user-space use.
.PP
Key type names
that begin with a period (\(aq.\(aq) are reserved to the implementation.
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
@ -208,13 +208,13 @@ for more information.
To prevent a key from being garbage collected,
it must be anchored to keep its reference count elevated
when it is not in active use by the kernel.
.PP
Keyrings are used to anchor other keys:
each link is a reference on a key.
Note that keyrings themselves are just keys and
are also subject to the same anchoring requirement to prevent
them being garbage collected.
.PP
The kernel makes available a number of anchor keyrings.
Note that some of these keyrings will be created only when first accessed.
.TP
@ -233,7 +233,7 @@ the
the
.BR thread-keyring (7)
(specific to a particular thread).
.IP
As an alternative to using the actual keyring IDs,
in calls to
.BR add_key (2),
@ -253,7 +253,7 @@ Each UID known to the kernel has a record that contains two keyrings: the
and the
.BR user-session-keyring (7).
These exist for as long as the UID record in the kernel exists.
.IP
As an alternative to using the actual keyring IDs,
in calls to
.BR add_key (2),
@ -265,7 +265,7 @@ the special keyring values
and
.BR KEY_SPEC_USER_SESSION_KEYRING
can be used to refer to the caller's own instances of these keyrings.
.IP
A link to the user keyring is placed in a new session keyring by
.BR pam_keyinit (8)
when a new login session is initiated.
@ -528,18 +528,18 @@ The thread need not possess the key for it to be visible in this file.
.\"
.\"Possibly it shouldn't be, but for now it is.
.\"
.IP
The only keys included in the list are those that grant
.I view
permission to the reading process
(regardless of whether or not it possesses them).
LSM security checks are still performed,
and may filter out further keys that the process is not authorized to view.
.IP
An example of the data that one might see in this file
(with the columns numbered for easy reference below)
is the following:
.IP
.nf
.in 0n
(1) (2) (3)(4) (5) (6) (7) (8) (9)
@ -554,7 +554,7 @@ is the following:
3ce56aea I--Q--- 5 perm 3f030000 1000 1000 keyring _ses: 1
.in
.fi
.IP
The fields shown in each line of this file are as follows:
.RS
.TP
@ -612,7 +612,7 @@ Permissions (5)
The key permissions, expressed as four hexadecimal bytes containing,
from left to right, the possessor, user, group, and other permissions.
Within each byte, the permission bits are as follows:
.IP
.PD 0
.RS 12
.TP
@ -651,9 +651,9 @@ Description (9)
The key description (name).
This field contains descriptive information about the key.
For most key types, it has the form
.IP
name[: extra\-info]
.IP
The
.I name
subfield is the key's description (name).
@ -690,9 +690,9 @@ key type
(authorization key; see
.BR request_key (2)),
the description field has the form shown in the following example:
.IP
key:c9a9b19 pid:28880 ci:10
.IP
The three subfields are as follows:
.RS
.TP 5
@ -713,7 +713,7 @@ be instantiated
This file lists various information for each user ID that
has at least one key on the system.
An example of the data that one might see in this file is the following:
.IP
.nf
.in +4n
0: 10 9/9 2/1000000 22/25000000
@ -721,7 +721,7 @@ An example of the data that one might see in this file is the following:
1000: 11 11/11 10/200 271/20000
.in
.fi
.IP
The fields shown in each line are as follows:
.RS
.TP
@ -755,7 +755,7 @@ of time where user space can see an error (respectively
and
.BR EKEYEXPIRED )
that indicates what happened to the key.
.IP
The default value in this file is 300 (i.e., 5 minutes).
.TP
.IR /proc/sys/kernel/keys/persistent_keyring_expiry " (since Linux 3.13)"
@ -768,7 +768,7 @@ or the
.BR keyctl (2)
.B KEYCTL_GET_PERSISTENT
operation.)
.IP
The default value in this file is 259200 (i.e., 3 days).
.PP
The following files (which are writable by privileged processes)
@ -780,21 +780,21 @@ and number of bytes of data that can be stored in key payloads:
.\" Previously: KEYQUOTA_MAX_BYTES 10000
This is the maximum number of bytes of data that a nonroot user
can hold in the payloads of the keys owned by the user.
.IP
The default value in this file is 20,000.
.TP
.IR /proc/sys/kernel/keys/maxkeys " (since Linux 2.6.26)"
.\" commit 0b77f5bfb45c13e1e5142374f9d6ca75292252a4
.\" Previously: KEYQUOTA_MAX_KEYS 100
This is the maximum number of keys that a nonroot user may own.
.IP
The default value in this file is 200.
.TP
.IR /proc/sys/kernel/keys/root_maxbytes " (since Linux 2.6.26)"
This is the maximum number of bytes of data that the root user
(UID 0 in the root user namespace)
can hold in the payloads of the keys owned by root.
.IP
.\"738c5d190f6540539a04baf36ce21d46b5da04bd
The default value in this file is 25,000,000 (20,000 before Linux 3.17).
.\" commit 0b77f5bfb45c13e1e5142374f9d6ca75292252a4
@ -804,7 +804,7 @@ The default value in this file is 25,000,000 (20,000 before Linux 3.17).
This is the maximum number of keys that the root user
(UID 0 in the root user namespace)
may own.
.IP
.\"738c5d190f6540539a04baf36ce21d46b5da04bd
The default value in this file is 1,000,000 (200 before Linux 3.17).
.PP
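The permissions field described in the keyrings text above packs four bytes into one hexadecimal value: possessor, user, group, and other, from left to right. The following sketch splits such a value, for example the 3f030000 shown in the sample listing, into its four bytes. The structure and function names are hypothetical, and the meaning of the individual bits within each byte is not decoded here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container for the four permission bytes */
struct key_perms {
    uint8_t possessor;
    uint8_t user;
    uint8_t group;
    uint8_t other;
};

/* Hypothetical helper: split a hexadecimal permissions string, as seen
   in the permissions column, into its four bytes. */
static struct key_perms
parse_key_perms(const char *hex)
{
    assert(hex != NULL);

    uint32_t v = (uint32_t) strtoul(hex, NULL, 16);
    struct key_perms p;

    p.possessor = (v >> 24) & 0xff;     /* leftmost byte */
    p.user      = (v >> 16) & 0xff;
    p.group     = (v >> 8) & 0xff;
    p.other     = v & 0xff;             /* rightmost byte */
    return p;
}
```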

View File

@ -51,7 +51,7 @@ available via the command
Release 1.0 of glibc was made in September 1992.
(There were earlier 0.x releases.)
The next major release of glibc was 2.0, at the beginning of 1997.
.PP
The pathname
.I /lib/libc.so.6
(or something similar) is normally a symbolic link that
@ -73,7 +73,7 @@ this version used the shared library soname
.IR libc.so.5 .
For a while,
Linux libc was the standard C library in many Linux distributions.
.PP
However, notwithstanding the original motivations of the Linux libc effort,
by the time glibc 2.0 was released (in 1997),
it was clearly superior to Linux libc,
@ -82,7 +82,7 @@ soon switched back to glibc.
To avoid any confusion with Linux libc versions,
glibc 2.0 and later used the shared library soname
.IR libc.so.6 .
.PP
Since the switch from Linux libc to glibc 2.0 occurred long ago,
.I man-pages
no longer takes care to document Linux libc details.

View File

@ -116,7 +116,7 @@ The "postmaster" address is not case sensitive.
.BR aliases (5),
.BR forward (5),
.BR sendmail (8)
.PP
.UR http://www.ietf.org\:/rfc\:/rfc5322.txt
IETF RFC\ 5322
.UE

View File

@ -29,12 +29,12 @@ mount_namespaces \- overview of Linux mount namespaces
.SH DESCRIPTION
For an overview of namespaces, see
.BR namespaces (7).
.PP
Mount namespaces provide isolation of the list of mount points seen
by the processes in each namespace instance.
Thus, the processes in each of the mount namespace instances
will see distinct single-directory hierarchies.
.PP
The views provided by the
.IR /proc/[pid]/mounts ,
.IR /proc/[pid]/mountinfo ,
@ -47,7 +47,7 @@ correspond to the mount namespace in which the process with the PID
resides.
(All of the processes that reside in the same mount namespace
will see the same view in these files.)
.PP
When a process creates a new mount namespace using
.BR clone (2)
or
@ -146,7 +146,7 @@ between namespaces
(or, more precisely, between the members of a
.IR "peer group"
that are propagating events to one another).
.PP
Each mount point is marked (via
.BR mount (2))
as having one of the following
@ -170,7 +170,7 @@ Mount and unmount events do not propagate into or out of this mount point.
Mount and unmount events propagate into this mount point from
a (master) shared peer group.
Mount and unmount events under this mount point do not propagate to any peer.
.IP
Note that a mount point can be the slave of another peer group
while at the same time sharing mount and unmount events
with a peer group of which it is a member.
@ -184,7 +184,7 @@ Attempts to bind mount this mount
with the
.BR MS_BIND
flag) will fail.
.IP
When a recursive bind mount
.RB ( mount (2)
with the
@ -198,13 +198,13 @@ when replicating that subtree to produce the target subtree.
.PP
For a discussion of the propagation type assigned to a new mount,
see NOTES.
.PP
The propagation type is a per-mount-point setting;
some mount points may be marked as shared
(with each shared mount point being a member of a distinct peer group),
while others are private
(or slaved or unbindable).
.PP
Note that a mount's propagation type determines whether
mounts and unmounts of mount points
.I "immediately under"
@ -215,7 +215,7 @@ What happens if the mount point itself is unmounted is determined by
the propagation type that is in effect for the
.I parent
of the mount point.
.PP
Members are added to a
.IR "peer group"
when a mount point is marked as shared and either:
@ -230,7 +230,7 @@ A mount ceases to be a member of a peer group when either
the mount is explicitly unmounted,
or when the mount is implicitly unmounted because a mount namespace is removed
(because it has no more member processes).
.PP
The propagation type of the mount points in a mount namespace
can be discovered via the "optional fields" exposed in
.IR /proc/[pid]/mountinfo .
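.PP
As a minimal sketch of reading those optional fields,
the following shell function prints each mount point
together with its propagation tags.
It assumes the usual
.I mountinfo
layout, in which the optional fields begin at field 7
and end at the "\-" separator;
the function name
.I show_propagation
and the output layout are illustrative only.

```shell
# Sketch: list each mount point with its propagation tags.
# Optional fields start at field 7 and run up to the "-" separator.
# A mount with no optional fields is reported as private.
show_propagation() {
    awk '{
        tags = ""
        for (i = 7; i <= NF && $i != "-"; i++)
            tags = (tags == "" ? $i : tags " " $i)
        print $5, (tags == "" ? "private" : tags)
    }' "${1:-/proc/self/mountinfo}"
}
```

Called with no argument, the function reads
.IR /proc/self/mountinfo ;
a pathname argument lets it be tried on a saved copy of that file.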
@ -283,7 +283,7 @@ Suppose that on a terminal in the initial mount namespace,
we mark one mount point as shared and another as private,
and then view the mounts in
.IR /proc/self/mountinfo :
.PP
.nf
.in +4n
sh1# \fBmount \-\-make\-shared /mntS\fP
@ -293,7 +293,7 @@ sh1# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
83 61 8:15 / /mntP rw,relatime
.in
.fi
.PP
From the
.IR /proc/self/mountinfo
output, we see that
@ -310,18 +310,18 @@ and
is the root directory,
.IR / ,
which is mounted as private:
.PP
.nf
.in +4n
sh1# \fBcat /proc/self/mountinfo | awk \(aq$1 == 61\(aq | sed \(aqs/ \- .*//\(aq\fP
61 0 8:2 / / rw,relatime
.in
.fi
.PP
On a second terminal,
we create a new mount namespace where we run a second shell
and inspect the mounts:
.PP
.nf
.in +4n
$ \fBPS1=\(aqsh2# \(aq sudo unshare \-m \-\-propagation unchanged sh\fP
@ -330,7 +330,7 @@ sh2# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
225 145 8:15 / /mntP rw,relatime
.in
.fi
.PP
The new mount namespace received a copy of the initial mount namespace's
mount points.
These new mount points maintain the same propagation types,
@ -342,13 +342,13 @@ option prevents
from marking all mounts as private when creating a new mount namespace,
.\" Since util-linux 2.27
which it does by default.)
.PP
In the second terminal, we then create submounts under each of
.IR /mntS
and
.IR /mntP
and inspect the set-up:
.PP
.nf
.in +4n
sh2# \fBmkdir /mntS/a\fP
@ -362,13 +362,13 @@ sh2# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
230 225 8:23 / /mntP/b rw,relatime
.in
.fi
.PP
From the above, it can be seen that
.IR /mntS/a
was created as shared (inheriting this setting from its parent mount) and
.IR /mntP/b
was created as a private mount.
.PP
Returning to the first terminal and inspecting the set-up,
we see that the new mount created under the shared mount point
.IR /mntS
@ -376,7 +376,7 @@ propagated to its peer mount (in the initial mount namespace),
but the new mount created under the private mount point
.IR /mntP
did not propagate:
.PP
.nf
.in +4n
sh1# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
@ -395,10 +395,10 @@ an optical disk is mounted in the master shared peer group
(in another mount namespace),
but want to prevent mount and unmount events under the slave mount
from having side effects in other namespaces.
.PP
We can demonstrate the effect of slaving by first marking
two mount points as shared in the initial mount namespace:
.PP
.nf
.in +4n
sh1# \fBmount \-\-make\-shared /mntX\fP
@ -408,10 +408,10 @@ sh1# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
133 83 8:22 / /mntY rw,relatime shared:2
.in
.fi
.PP
On a second terminal,
we create a new mount namespace and inspect the mount points:
.PP
.nf
.in +4n
sh2# \fBunshare \-m \-\-propagation unchanged sh\fP
@ -420,9 +420,9 @@ sh2# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
169 167 8:22 / /mntY rw,relatime shared:2
.in
.fi
.PP
In the new mount namespace, we then mark one of the mount points as a slave:
.PP
.nf
.in +4n
sh2# \fBmount \-\-make\-slave /mntY\fP
@ -431,17 +431,17 @@ sh2# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
169 167 8:22 / /mntY rw,relatime master:2
.in
.fi
.PP
From the above output, we see that
.IR /mntY
is now a slave mount that is receiving propagation events from
the shared peer group with the ID 2.
.PP
Continuing in the new namespace, we create submounts under each of
.IR /mntX
and
.IR /mntY :
.PP
.nf
.in +4n
sh2# \fBmkdir /mntX/a\fP
@ -450,7 +450,7 @@ sh2# \fBmkdir /mntY/b\fP
sh2# \fBmount /dev/sda5 /mntY/b\fP
.in
.fi
.PP
When we inspect the state of the mount points in the new mount namespace,
we see that
.IR /mntX/a
@ -458,7 +458,7 @@ was created as a new shared mount
(inheriting the "shared" setting from its parent mount) and
.IR /mntY/b
was created as a private mount:
.PP
.nf
.in +4n
sh2# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
@ -468,7 +468,7 @@ sh2# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
175 169 8:5 / /mntY/b rw,relatime
.in
.fi
.PP
Returning to the first terminal (in the initial mount namespace),
we see that the mount
.IR /mntX/a
@ -477,7 +477,7 @@ propagated to the peer (the shared
but the mount
.IR /mntY/b
was not propagated:
.PP
.nf
.in +4n
sh1# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
@ -486,11 +486,11 @@ sh1# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
174 132 8:3 / /mntX/a rw,relatime shared:3
.in
.fi
.PP
Now we create a new mount point under
.IR /mntY
in the first shell:
.PP
.nf
.in +4n
sh1# \fBmkdir /mntY/c\fP
@ -502,12 +502,12 @@ sh1# \fBcat /proc/self/mountinfo | grep '/mnt' | sed 's/ \- .*//'\fP
178 133 8:1 / /mntY/c rw,relatime shared:4
.in
.fi
.PP
When we examine the mount points in the second mount namespace,
we see that in this case the new mount has been propagated
to the slave mount point,
and that the new mount is itself a slave mount (to peer group 4):
.PP
.nf
.in +4n
sh2# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
@ -524,9 +524,9 @@ One of the primary purposes of unbindable mounts is to avoid
the "mount point explosion" problem when repeatedly performing bind mounts
of a higher-level subtree at a lower-level mount point.
The problem is illustrated by the following shell session.
.PP
Suppose we have a system with the following mount points:
.PP
.nf
.in +4n
# \fBmount | awk \(aq{print $1, $2, $3}\(aq\fP
@ -535,11 +535,11 @@ Suppose we have a system with the following mount points:
/dev/sdb7 on /mntY
.in
.fi
.PP
Suppose furthermore that we wish to recursively bind mount
the root directory under several users' home directories.
We do this for the first user, and inspect the mount points:
.PP
.nf
.in +4n
# \fBmount \-\-rbind / /home/cecilia/\fP
@ -552,10 +552,10 @@ We do this for the first user, and inspect the mount points:
/dev/sdb7 on /home/cecilia/mntY
.in
.fi
.PP
When we repeat this operation for the second user,
we start to see the explosion problem:
.PP
.nf
.in +4n
# \fBmount \-\-rbind / /home/henry\fP
@ -574,7 +574,7 @@ we start to see the explosion problem:
/dev/sdb7 on /home/henry/home/cecilia/mntY
.in
.fi
.PP
Under
.IR /home/henry ,
we have not only recursively added the
@ -586,7 +586,7 @@ mounts, but also the recursive mounts of those directories under
that were created in the previous step.
Upon repeating the step for a third user,
it becomes obvious that the explosion is exponential in nature:
.PP
.nf
.in +4n
# \fBmount \-\-rbind / /home/otto\fP
@ -617,21 +617,21 @@ it becomes obvious that the explosion is exponential in nature:
/dev/sdb7 on /home/otto/home/henry/home/cecilia/mntY
.in
.fi
.PP
The mount explosion problem in the above scenario can be avoided
by making each of the new mounts unbindable.
The effect of doing this is that recursive mounts of the root
directory will not replicate the unbindable mounts.
We make such a mount for the first user:
.PP
.nf
.in +4n
# \fBmount \-\-rbind \-\-make\-unbindable / /home/cecilia\fP
.in
.fi
.PP
Before going further, we show that unbindable mounts are indeed unbindable:
.PP
.nf
.in +4n
# \fBmkdir /mntZ\fP
@ -643,21 +643,21 @@ mount: wrong fs type, bad option, bad superblock on /home/cecilia,
dmesg | tail or so.
.in
.fi
.PP
Now we create unbindable recursive bind mounts for the other two users:
.PP
.nf
.in +4n
# \fBmount \-\-rbind \-\-make\-unbindable / /home/henry\fP
# \fBmount \-\-rbind \-\-make\-unbindable / /home/otto\fP
.in
.fi
.PP
Upon examining the list of mount points,
we see there has been no explosion of mount points,
because the unbindable mounts were not replicated
under each user's directory:
.PP
.nf
.in +4n
# \fBmount | awk \(aq{print $1, $2, $3}\(aq\fP
@ -695,7 +695,7 @@ slave+shared slave+shared slave priv unbind
private shared priv [2] priv unbind
unbindable shared unbind [2] priv unbind
.TE
.sp 1
Note the following details about the table:
.IP [1] 4
If a shared mount is the only mount in its peer group,
@ -705,9 +705,9 @@ Slaving a nonshared mount has no effect on the mount.
.\"
.SS Bind (MS_BIND) semantics
Suppose that the following command is performed:
.PP
mount \-\-bind A/a B/b
.PP
Here,
.I A
is the source mount point,
@ -727,7 +727,7 @@ depends on the propagation types of the mount points
and
.IR B ,
and is summarized in the following table.
.PP
.TS
lb2 lb1 lb2 lb2 lb2 lb0
lb2 lb1 lb2 lb2 lb2 lb0
@ -738,20 +738,20 @@ _
dest(B) shared | shared shared slave+shared invalid
nonshared | shared private slave invalid
.TE
.sp 1
Note that a recursive bind of a subtree follows the same semantics
as for a bind operation on each mount in the subtree.
(Unbindable mounts are automatically pruned at the target mount point.)
.PP
For further details, see
.I Documentation/filesystems/sharedsubtree.txt
in the kernel source tree.
.\"
.SS Move (MS_MOVE) semantics
Suppose that the following command is performed:
.PP
mount \-\-move A B/b
.PP
Here,
.I A
is the source mount point,
@ -767,7 +767,7 @@ depends on the propagation types of the mount points
and
.IR B ,
and is summarized in the following table.
.PP
.TS
lb2 lb1 lb2 lb2 lb2 lb0
lb2 lb1 lb2 lb2 lb2 lb0
@ -778,18 +778,18 @@ _
dest(B) shared | shared shared slave+shared invalid
nonshared | shared private slave unbindable
.TE
.sp 1
Note: moving a mount that resides under a shared mount is invalid.
.PP
For further details, see
.I Documentation/filesystems/sharedsubtree.txt
in the kernel source tree.
.\"
.SS Mount semantics
Suppose that we use the following command to create a mount point:
.PP
mount device B/b
.PP
Here,
.I B
is the destination mount point, and
@ -804,9 +804,9 @@ is considered always to be private.
.\"
.SS Unmount semantics
Suppose that we use the following command to tear down a mount point:
.PP
umount A
.PP
Here,
.I A
is a mount point on
@ -835,7 +835,7 @@ record in cases where a process can't see a slave's immediate master
the filesystem root directory)
and so cannot determine the
chain of propagation between the mounts it can see.
.PP
In the following example, we first create a two-link master-slave chain
between the mounts
.IR /mnt ,
@ -850,7 +850,7 @@ mount point unreachable from the root directory,
creating a situation where the master of
.IR /mnt/tmp/etc
is not reachable from the (new) root directory of the process.
.PP
First, we bind mount the root directory onto
.IR /mnt
and then bind mount
@ -863,7 +863,7 @@ the
.BR proc (5)
filesystem remains visible at the correct location
in the chroot-ed environment.
.PP
.nf
.in +4n
# \fBmkdir \-p /mnt/proc\fP
@ -871,11 +871,11 @@ in the chroot-ed environment.
# \fBmount \-\-bind /proc /mnt/proc\fP
.in
.fi
.PP
Next, we ensure that the
.IR /mnt
mount is a shared mount in a new peer group (with no peers):
.PP
.nf
.in +4n
# \fBmount \-\-make\-private /mnt\fP # Isolate from any previous peer group
@ -885,12 +885,12 @@ mount is a shared mount in a new peer group (with no peers):
248 239 0:4 / /mnt/proc ... shared:5
.in
.fi
.PP
Next, we bind mount
.IR /mnt/etc
onto
.IR /tmp/etc :
.PP
.nf
.in +4n
# \fBmkdir \-p /tmp/etc\fP
@ -901,7 +901,7 @@ onto
267 40 8:2 /etc /tmp/etc ... shared:102
.in
.fi
.PP
Initially, these two mount points are in the same peer group,
but we then make the
.IR /tmp/etc
@ -911,7 +911,7 @@ and then make
.IR /tmp/etc
shared as well,
so that it can propagate events to the next slave in the chain:
.PP
.nf
.in +4n
# \fBmount \-\-make\-slave /tmp/etc\fP
@ -922,7 +922,7 @@ so that it can propagate events to the next slave in the chain:
267 40 8:2 /etc /tmp/etc ... shared:105 master:102
.in
.fi
.PP
Then we bind mount
.IR /tmp/etc
onto
@ -932,7 +932,7 @@ but we then make
.IR /mnt/tmp/etc
a slave of
.IR /tmp/etc :
.PP
.nf
.in +4n
# \fBmkdir \-p /mnt/tmp/etc\fP
@ -952,23 +952,23 @@ is the master of the slave
.IR /tmp/etc ,
which in turn is the master of the slave
.IR /mnt/tmp/etc .
.PP
We then
.BR chroot (1)
to the
.IR /mnt
directory, which renders the mount with ID 267 unreachable
from the (new) root directory:
.PP
.nf
.in +4n
# \fBchroot /mnt\fP
.in
.fi
.PP
When we examine the state of the mounts inside the chroot-ed environment,
we see the following:
.PP
.nf
.in +4n
# \fBcat /proc/self/mountinfo | sed \(aqs/ \- .*//\(aq\fP
@ -977,7 +977,7 @@ we see the following:
273 239 8:2 /etc /tmp/etc ... master:105 propagate_from:102
.in
.fi
.PP
Above, we see that the mount with ID 273
is a slave whose master is the peer group 105.
The mount point for that master is unreachable, and so a
@ -1006,7 +1006,7 @@ then the propagation type of the new mount is also
Otherwise, the propagation type of the new mount is
.BR MS_PRIVATE .
But see also NOTES.
.PP
Notwithstanding the fact that the default propagation type
for new mount points is in many cases
.BR MS_PRIVATE ,
@ -1019,7 +1019,7 @@ automatically remounts all mount points as
on system startup.
Thus, on most modern systems, the default propagation type is in practice
.BR MS_SHARED .
.PP
Since, when one uses
.BR unshare (1)
to create a mount namespace,
@ -1034,14 +1034,14 @@ by making all mount points private in the new namespace.
That is,
.BR unshare (1)
performs the equivalent of the following in the new mount namespace:
.PP
mount \-\-make\-rprivate /
.PP
To prevent this, one can use the
.IR "\-\-propagation\ unchanged"
option to
.BR unshare (1).
.PP
For a discussion of propagation types when moving mounts
.RB ( MS_MOVE )
and creating bind mounts
@ -1058,6 +1058,6 @@ see
.BR proc (5),
.BR namespaces (7),
.BR user_namespaces (7)
.PP
.IR Documentation/filesystems/sharedsubtree.txt
in the kernel source tree.

@ -34,7 +34,7 @@ This API is distinct from that provided by System V message queues
.BR msgsnd (2),
.BR msgrcv (2),
etc.), but provides similar functionality.
.PP
Message queues are created and opened using
.BR mq_open (3);
this function returns a
@ -49,7 +49,7 @@ that is, a null-terminated string of up to
followed by one or more characters, none of which are slashes.
Two processes can operate on the same queue by passing the same name to
.BR mq_open (3).
.PP
Messages are transferred to and from a queue using
.BR mq_send (3)
and
@ -65,7 +65,7 @@ and
A process can request asynchronous notification
of the arrival of a message on a previously empty queue using
.BR mq_notify (3).
.PP
A message queue descriptor is a reference to an
.I "open message queue description"
(cf.
@ -78,7 +78,7 @@ as the corresponding message queue descriptors in the parent.
Corresponding message queue descriptors in the two processes share the flags
.RI ( mq_flags )
that are associated with the open message queue description.
.PP
Each message has an associated
.IR priority ,
and messages are always delivered to the receiving process
@ -184,7 +184,7 @@ limit is ignored for privileged processes
but the
.BR HARD_MSGMAX
ceiling is nevertheless imposed.
.IP
The definition of
.BR HARD_MSGMAX
has changed across kernel versions:
@ -294,14 +294,14 @@ commands:
.fi
.in
The sticky bit is automatically enabled on the mount directory.
.PP
After the filesystem has been mounted, the message queues on the system
can be viewed and manipulated using the commands usually used for files
(e.g.,
.BR ls (1)
and
.BR rm (1)).
.PP
The contents of each file in the directory consist of a single line
containing information about the queue:
.in +4n
@ -345,7 +345,7 @@ This means that a message queue descriptor can be monitored using
or
.BR epoll (7).
This is not portable.
.PP
The close-on-exec flag (see
.BR open (2))
is automatically set on the file descriptor returned by
@ -364,7 +364,7 @@ POSIX message queues provide a better designed interface than
System V message queues;
on the other hand, POSIX message queues are less widely available
(especially on older systems) than System V message queues.
.PP
Linux does not currently (2.6.26) support the use of access control
lists (ACLs) for POSIX message queues.
.SH BUGS
@ -376,7 +376,7 @@ limit could be raised,
and the ceiling was enforced even for privileged processes.
This ceiling value was removed in Linux 3.14,
and patches to stable kernels 3.5.x to 3.13.x also removed the ceiling.
.PP
As originally implemented (and documented),
the QSIZE field displayed the total number of (user-supplied)
bytes in all messages in the message queue.

@ -40,7 +40,7 @@ One of these signals is used to support thread cancellation and POSIX timers
the other is used as part of a mechanism that ensures all threads in
a process always have the same UIDs and GIDs, as required by POSIX.
These signals cannot be used in applications.
.PP
To prevent accidental use of these signals in applications,
which might interfere with the operation of the NPTL implementation,
various glibc library functions and system call wrapper functions
@ -86,7 +86,7 @@ the NPTL implementation wraps all of the system calls that
change process credentials with functions that,
in addition to invoking the underlying system call,
arrange for all other threads in the process to also change their credentials.
.PP
The implementation of each of these system calls involves the use of
a real-time signal that is sent (using
.BR tgkill (2))
@ -96,7 +96,7 @@ saves the new credential(s) and records the system call being employed
in a global buffer.
A signal handler in the receiving thread(s) fetches this information and
then uses the same system call to change its credentials.
.PP
Wrapper functions employing this technique are provided for
.BR setgid (2),
.BR setuid (2),

@ -55,11 +55,11 @@ see "Library Support" below.
.\" See also Changelog-2.6.14
This file displays information about a process's
NUMA memory policy and allocation.
.PP
Each line contains information about a memory range used by the process,
displaying\(emamong other information\(emthe effective memory policy for
that memory range and on which nodes the pages have been allocated.
.PP
.I numa_maps
is a read-only file.
When
@ -67,14 +67,14 @@ When
is read, the kernel will scan the virtual address space of the
process and report how memory is used.
One line is displayed for each unique memory range of the process.
.PP
The first field of each line shows the starting address of the memory range.
This field allows a correlation with the contents of the
.I /proc/<pid>/maps
file,
which contains the end address of the range and other information,
such as the access permissions and sharing.
.PP
The second field shows the memory policy currently in effect for the
memory range.
Note that the effective policy is not necessarily the policy
@ -82,7 +82,7 @@ installed by the process for that memory range.
Specifically, if the process installed a "default" policy for that range,
the effective policy for that range will be the process policy,
which may or may not be "default".
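.PP
The two leading fields can be picked out with a short
.BR awk (1)
command.
The following is only a sketch:
the function name
.I numa_policies
is hypothetical, and
.I numa_maps
exists only on kernels built with NUMA support.

```shell
# Sketch: print the start address and effective policy of each memory range.
# Field 1 is the starting address; field 2 is the effective memory policy.
numa_policies() {
    awk '{ print $1, $2 }' "${1:-/proc/self/numa_maps}"
}
```

With no argument the function reads the caller's own
.I numa_maps
file;
a pathname such as
.I /proc/<pid>/numa_maps
can be given to inspect another process.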
.PP
The rest of the line contains information about the pages allocated in
the memory range, as follows:
.TP
@ -163,7 +163,7 @@ and the required
header are available in the
.I numactl
package.
.PP
However, applications should not use these system calls directly.
Instead, the higher level interface provided by the
.BR numa (3)

@ -47,7 +47,7 @@ system call that had the
.B CLONE_NEWNS
flag set.)
This handles the \(aq/\(aq part of the pathname.
.PP
If the pathname does not start with the \(aq/\(aq character, the
starting lookup directory of the resolution process is the current working
directory of the process.
@ -55,7 +55,7 @@ directory of the process.
It can be changed by use of the
.BR chdir (2)
system call.)
.PP
Pathnames starting with a \(aq/\(aq character are called absolute pathnames.
Pathnames not starting with a \(aq/\(aq are called relative pathnames.
.SS Step 2: walk along the path
@ -63,27 +63,27 @@ Set the current lookup directory to the starting lookup directory.
Now, for each nonfinal component of the pathname, where a component
is a substring delimited by \(aq/\(aq characters, this component is looked up
in the current lookup directory.
.PP
If the process does not have search permission on
the current lookup directory,
an
.B EACCES
error is returned ("Permission denied").
.PP
If the component is not found, an
.B ENOENT
error is returned
("No such file or directory").
.PP
If the component is found, but is neither a directory nor a symbolic link,
an
.B ENOTDIR
error is returned ("Not a directory").
.PP
If the component is found and is a directory, we set the
current lookup directory to that directory, and go to the
next component.
.PP
If the component is found and is a symbolic link (symlink), we first
resolve this symbolic link (with the current lookup directory
as starting lookup directory).
@ -106,7 +106,7 @@ An
.B ELOOP
error is returned when the maximum is
exceeded ("Too many levels of symbolic links").
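.PP
Several of the errors described in this step
can be provoked from a shell.
The following sketch builds a scratch directory and reports the error text that
.BR cat (1)
prints for three pathnames.
The helper name
.I resolve_error
is hypothetical, and the exact message wording comes from the C library,
so it may differ on non-glibc systems.
.B EACCES
is not shown, because a privileged user would bypass that check.

```shell
# Print only the error text that cat(1) reports for a pathname.
# LC_ALL=C keeps the message in English; sed strips the leading prefix.
resolve_error() {
    LC_ALL=C cat "$1" 2>&1 >/dev/null | sed 's/.*: //'
}

dir=$(mktemp -d)
: > "$dir/file"            # a regular file
ln -s loop "$dir/loop"     # a symbolic link that points to itself

resolve_error "$dir/missing/x"   # component not found (ENOENT)
resolve_error "$dir/file/x"      # nonfinal component is a regular file (ENOTDIR)
resolve_error "$dir/loop"        # symbolic link limit exceeded (ELOOP)
rm -rf "$dir"
```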
.PP
.\"
.\" presently: max recursion depth during symlink resolution: 5
.\" max total number of symbolic links followed: 40
@ -140,17 +140,17 @@ system calls.
By convention, every directory has the entries "." and "..",
which refer to the directory itself and to its parent directory,
respectively.
.PP
The path resolution process will assume that these entries have
their conventional meanings, regardless of whether they are
actually present in the physical filesystem.
.PP
One cannot walk down past the root: "/.." is the same as "/".
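.PP
This can be checked from a shell by comparing inode numbers.
The sketch below uses a hypothetical helper,
.IR same_dir ,
that compares only inode numbers;
a stricter test would also compare device IDs.

```shell
# True if both pathnames have the same inode number.
same_dir() {
    [ "$(ls -di "$1" | awk '{ print $1 }')" = "$(ls -di "$2" | awk '{ print $1 }')" ]
}

same_dir / /.. && echo '"/.." resolves to "/"'
```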
.SS Mount points
After a "mount dev path" command, the pathname "path" refers to
the root of the filesystem hierarchy on the device "dev", and no
longer to whatever it referred to earlier.
.PP
One can walk out of a mounted filesystem: "path/.." refers to
the parent directory of "path",
outside of the filesystem hierarchy on "dev".
@ -196,16 +196,16 @@ effective group ID of the calling process, or is one of the
supplementary group IDs of the calling process (as set by
.BR setgroups (2)).
When neither holds, the third group is used.
.PP
Of the three bits used, the first bit determines read permission,
the second write permission, and the last execute permission
in case of ordinary files, or search permission in case of directories.
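.PP
In the usual octal notation, the read bit has the value 4,
the write bit 2, and the execute (or search) bit 1.
The following sketch decodes one such octal digit;
the function name
.I perm_bits
is illustrative only.

```shell
# Decode one octal permission digit (0-7) into its rwx form.
perm_bits() {
    r=-; w=-; x=-
    [ $(( $1 & 4 )) -ne 0 ] && r=r    # first bit: read
    [ $(( $1 & 2 )) -ne 0 ] && w=w    # second bit: write
    [ $(( $1 & 1 )) -ne 0 ] && x=x    # last bit: execute or search
    printf '%s%s%s\n' "$r" "$w" "$x"
}
```

For a file with mode 0754, for example, the three digits 7, 5, and 4
would be decoded separately for the owner, the group, and others.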
.PP
Linux uses the fsuid instead of the effective user ID in permission checks.
Ordinarily the fsuid will equal the effective user ID, but the fsuid can be
changed by the system call
.BR setfsuid (2).
.PP
(Here "fsuid" stands for something like "filesystem user ID".
The concept was required for the implementation of a user space
NFS server at a time when processes could send a signal to a process
@ -213,7 +213,7 @@ with the same effective user ID.
It is obsolete now.
Nobody should use
.BR setfsuid (2).)
.PP
Similarly, Linux uses the fsgid ("filesystem group ID")
instead of the effective group ID.
See
@ -230,7 +230,7 @@ when accessing files.
.\" on some implementations (e.g., Solaris, FreeBSD),
.\" access(X_OK) by superuser will report success, regardless
.\" of the file's execute permission bits. -- MTK (Oct 05)
.PP
On Linux, superuser privileges are divided into capabilities (see
.BR capabilities (7)).
Two capabilities are relevant for file permissions checks:
@ -238,13 +238,13 @@ Two capabilities are relevant for file permissions checks:
and
.BR CAP_DAC_READ_SEARCH .
(A process has these capabilities if its fsuid is 0.)
.PP
The
.B CAP_DAC_OVERRIDE
capability overrides all permission checking,
but grants execute permission only when at least one
of the file's three execute permission bits is set.
.PP
The
.B CAP_DAC_READ_SEARCH
capability grants read and search permission

@ -21,7 +21,7 @@ The persistent keyring has a name (description) of the form
where
.I <UID>
is the user ID of the corresponding user.
.PP
The persistent keyring may not be accessed directly,
even by processes with the appropriate UID.
.\" FIXME The meaning of the preceding sentence isn't clear. What is meant?
@ -31,30 +31,30 @@ by virtue of its possessor permits.
This linking is done with the
.BR keyctl_get_persistent (3)
function.
.PP
If a persistent keyring does not exist when it is accessed by the
.BR keyctl_get_persistent (3)
operation, it will be automatically created.
.PP
Each time the
.BR keyctl_get_persistent (3)
operation is performed,
the persistent keyring's expiration timer is reset to the value in:
.PP
/proc/sys/kernel/keys/persistent_keyring_expiry
.PP
Should the timeout be reached,
the persistent keyring will be removed and
everything it pins can then be garbage collected.
The keyring will then be re-created on a subsequent call to
.BR keyctl_get_persistent (3).
.PP
The persistent keyring is not directly searched by
.BR request_key (2);
it is searched only if it is linked into one of the keyrings
that is searched by
.BR request_key (2).
.PP
The persistent keyring is independent of
.BR clone (2),
.BR fork (2),
@ -74,7 +74,7 @@ The persistent keyring can thus be used to
hold authentication tokens for processes that run without user interaction,
such as programs started by
.BR cron (8).
.PP
The persistent keyring is used to store UID-specific objects that
themselves have limited lifetimes (e.g., Kerberos tokens).
If those tokens cease to be used

@ -30,14 +30,14 @@ pid_namespaces \- overview of Linux PID namespaces
.SH DESCRIPTION
For an overview of namespaces, see
.BR namespaces (7).
.PP
PID namespaces isolate the process ID number space,
meaning that processes in different PID namespaces can have the same PID.
PID namespaces allow containers to provide functionality
such as suspending/resuming the set of processes in the container and
migrating the container to a new host
while the processes inside the container maintain the same PIDs.
.PP
PIDs in a new PID namespace start at 1,
somewhat like a standalone system, and calls to
.BR fork (2),
@ -45,7 +45,7 @@ somewhat like a standalone system, and calls to
or
.BR clone (2)
will produce processes with PIDs that are unique within the namespace.
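.PP
The PID numbering can be observed from a shell.
The following is a sketch only:
it assumes the
.BR unshare (1)
program from util-linux and a kernel that permits
unprivileged user namespaces;
where that is not allowed, the command simply fails.
The function name
.I first_pid_in_new_ns
is hypothetical.

```shell
# Sketch: start a shell as the first process in a new PID namespace
# and print the PID that the shell sees for itself.
# --user and --map-root-user avoid the need for real root privileges;
# --fork makes the shell a child of unshare, so that it lands in the new namespace.
first_pid_in_new_ns() {
    unshare --user --map-root-user --pid --fork sh -c 'echo $$'
}
```

On a system where the command is permitted,
the shell is the first process in the new namespace,
so the PID it reports should be 1.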
.PP
Use of PID namespaces requires a kernel that is configured with the
.B CONFIG_PID_NS
option.
@ -72,7 +72,7 @@ in the same PID namespace employed the
.BR prctl (2)
.B PR_SET_CHILD_SUBREAPER
command to mark itself as the reaper of orphaned descendant processes).
.PP
If the "init" process of a PID namespace terminates,
the kernel terminates all of the processes in the namespace via a
.BR SIGKILL
@ -99,13 +99,13 @@ terminates, then subsequent calls to
.BR fork (2)
will fail with
.BR ENOMEM .
.PP
Only signals for which the "init" process has established a signal handler
can be sent to the "init" process by other members of the PID namespace.
This restriction applies even to privileged processes,
and prevents other members of the PID namespace from
accidentally killing the "init" process.
.PP
Likewise, a process in an ancestor namespace
can\(emsubject to the usual permission checks described in
.BR kill (2)\(emsend
@ -125,7 +125,7 @@ these signals are forcibly delivered when sent from an ancestor PID namespace.
Neither of these signals can be caught by the "init" process,
and so will result in the usual actions associated with those signals
(respectively, terminating and stopping the process).
.PP
Starting with Linux 3.4, the
.BR reboot (2)
system call causes a signal to be sent to the namespace "init" process.
@ -150,7 +150,7 @@ Since Linux 3.7,
.\" commit f2302505775fd13ba93f034206f1e2a587017929
.\" The kernel constant MAX_PID_NS_LEVEL
the kernel limits the maximum nesting depth for PID namespaces to 32.
.PP
A process is visible to other processes in its PID namespace,
and to the processes in each direct ancestor PID namespace
going back to the root PID namespace.
@ -165,7 +165,7 @@ set nice values with
.BR setpriority (2),
etc.) only processes contained in its own PID namespace
and in descendants of that namespace.
.PP
A process has one process ID in each of the layers of the PID
namespace hierarchy in which it is visible,
and walking back through each direct ancestor namespace
@ -177,7 +177,7 @@ A call to
.BR getpid (2)
always returns the PID associated with the namespace in which
the process was created.
.PP
Some processes in a PID namespace may have parents
that are outside of the namespace.
For example, the parent of the initial process in the namespace
@ -192,7 +192,7 @@ PID namespace from the caller of
Calls to
.BR getppid (2)
for such processes return 0.
.PP
While processes may freely descend into child PID namespaces
(e.g., using
.BR setns (2)
@ -201,7 +201,7 @@ they may not move in the other direction.
That is to say, processes may not enter any ancestor namespaces
(parent, grandparent, etc.).
Changing PID namespaces is a one-way operation.
.PP
The
.BR NS_GET_PARENT
.BR ioctl (2)
@ -231,7 +231,7 @@ because doing so would change the caller's idea of its own PID
(as reported by
.BR getpid ()),
which would break many applications and libraries.
.PP
To put things another way:
a process's PID namespace membership is determined when the process is created
and cannot be changed thereafter.
@ -260,7 +260,7 @@ type in
Since this is computed when a signal is enqueued,
a signal queue shared by processes in multiple PID namespaces
would defeat that.
.PP
.\" Note these restrictions were all introduced in
.\" 8382fcac1b813ad0a4e68a838fc7ae93fa39eda0
.\" when CLONE_NEWPID|CLONE_VM was disallowed
@ -289,7 +289,7 @@ directories) only processes visible in the PID namespace
of the process that performed the mount, even if the
.I /proc
filesystem is viewed from processes in other namespaces.
.PP
After creating a new PID namespace,
it is useful for the child to change its root directory
and mount a new procfs instance at
@ -308,7 +308,7 @@ or
then it isn't necessary to change the root directory:
a new procfs instance can be mounted directly over
.IR /proc .
.PP
From a shell, the command to mount
.I /proc
is:

@ -34,7 +34,7 @@ and a
.IR "write end" .
Data written to the write end of a pipe can be read
from the read end of the pipe.
.PP
A pipe is created using
.BR pipe (2),
which creates a new pipe and returns two file descriptors,
@ -44,7 +44,7 @@ Pipes can be used to create a communication channel between related
processes; see
.BR pipe (2)
for an example.
.PP
A FIFO (short for First In First Out) has a name within the filesystem
(created using
.BR mkfifo (3)),
@ -68,7 +68,7 @@ The only difference between pipes and FIFOs is the manner in which
they are created and opened.
Once these tasks have been accomplished,
I/O on pipes and FIFOs has exactly the same semantics.
.PP
If a process attempts to read from an empty pipe, then
.BR read (2)
will block until data is available.
@ -82,11 +82,11 @@ Nonblocking I/O is possible by using the
operation to enable the
.B O_NONBLOCK
open file status flag.
.PP
The communication channel provided by a pipe is a
.IR "byte stream" :
there is no concept of message boundaries.
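.PP
A shell pipeline gives a small illustration of this.
In the sketch below (the function name
.I two_writes
is illustrative), two separate
.BR printf (1)
commands write to the same pipe,
and the reader receives the bytes as one undivided stream.

```shell
# Two separate writes into the same pipe; the reader sees a single
# run of bytes and cannot tell where the first write ended.
two_writes() {
    { printf 'ab'; printf 'cd'; } | cat
}
```

An application that needs message boundaries must add its own framing,
for example a length prefix or a delimiter.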
.PP
If all file descriptors referring to the write end of a pipe
have been closed, then an attempt to
.BR read (2)
@ -113,7 +113,7 @@ calls to close unnecessary duplicate file descriptors;
this ensures that end-of-file and
.BR SIGPIPE / EPIPE
are delivered when appropriate.
.PP
It is not possible to apply
.BR lseek (2)
to a pipe.
@ -129,7 +129,7 @@ Applications should not rely on a particular capacity:
an application should be designed so that a reading process consumes data
as soon as it is available,
so that a writing process does not remain blocked.
.PP
In Linux versions before 2.6.11, the capacity of a pipe was the same as
the system page size (e.g., 4096 bytes on i386).
Since Linux 2.6.11, the pipe capacity is 16 pages
@ -144,7 +144,7 @@ operations.
See
.BR fcntl (2)
for more information.
.PP
The following
.BR ioctl (2)
operation, which can be applied to a file descriptor
@ -152,9 +152,9 @@ that refers to either end of a pipe,
places a count of the number of unread bytes in the pipe in the
.I int
buffer pointed to by the final argument of the call:
.PP
ioctl(fd, FIONREAD, &nbytes);
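.PP
A slightly fuller sketch of the same call is shown below.
The function name
.I unread_after_write
is hypothetical; it writes a small buffer into a new pipe and then asks
how many bytes are waiting to be read.
The sketch assumes that the buffer is smaller than the pipe capacity,
so that the
.BR write (2)
does not block.

```c
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes from buf into a new pipe and return the number of
   unread bytes reported by FIONREAD, or -1 on error.
   len is assumed to be well below the pipe capacity. */
int unread_after_write(const void *buf, size_t len)
{
    int pfd[2];
    int nbytes = -1;

    if (pipe(pfd) == -1)
        return -1;
    if (write(pfd[1], buf, len) != (ssize_t) len ||
        ioctl(pfd[0], FIONREAD, &nbytes) == -1)
        nbytes = -1;
    close(pfd[0]);
    close(pfd[1]);
    return nbytes;
}
```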
.PP
The
.B FIONREAD
operation is not specified in any standard,
@ -170,10 +170,10 @@ An upper limit, in pages, on the capacity that an unprivileged user
.BR CAP_SYS_RESOURCE
capability)
can set for a pipe.
.IP
The default value for this limit is 16 times the default pipe capacity
(see above); the lower limit is two pages.
.IP
This interface was removed in Linux 2.6.35, in favor of
.IR /proc/sys/fs/pipe-max-size .
.TP
@ -189,14 +189,14 @@ The value assigned to this file may be rounded upward,
to reflect the value actually employed for a convenient implementation.
To determine the rounded-up value,
display the contents of this file after assigning a value to it.
.IP
The default value for this file is 1048576 (1 MiB).
The minimum value that can be assigned to this file is the system page size.
Attempts to set a limit less than the page size cause
.BR write (2)
to fail with the error
.BR EINVAL .
.IP
Since Linux 4.9,
.\" commit 086e774a57fba4695f14383c0818994c0b31da7c
the value in this file also acts as a ceiling on the default capacity
@ -214,7 +214,7 @@ So long as the total number of pages allocated to pipe buffers
for this user is at this limit,
attempts to create new pipes will be denied,
and attempts to increase a pipe's capacity will be denied.
.IP
When the value of this limit is zero (which is the default),
no hard limit is applied.
.\" The default was chosen to avoid breaking existing applications that
@ -232,7 +232,7 @@ So long as the total number of pages allocated to pipe buffers
for this user is at this limit,
individual pipes created by a user will be limited to one page,
and attempts to increase a pipe's capacity will be denied.
.IP
When the value of this limit is zero, no soft limit is applied.
The default value for this file is 16384,
which permits creating up to 1024 pipes with the default capacity.
@ -321,7 +321,7 @@ a pipe or FIFO are
.B O_NONBLOCK
and
.BR O_ASYNC .
.PP
Setting the
.B O_ASYNC
flag for the read end of a pipe causes a signal
@ -359,7 +359,7 @@ and excluded the memory required for the increased pipe capacity.
The new increase in pipe capacity could then push the total
memory used by the user for pipes (possibly far) over a limit.
(This could also trigger the problem described next.)
.IP
Starting with Linux 4.9,
the limit checking includes the memory required for the new pipe capacity.
.IP (2)
@ -368,13 +368,13 @@ less than the existing pipe capacity.
This could lead to problems if a user set a large pipe capacity,
and then the limits were lowered, with the result that the user could
no longer decrease the pipe capacity.
.IP
Starting with Linux 4.9, checks against the limits
are performed only when increasing a pipe's capacity;
an unprivileged user can always decrease a pipe's capacity.
.IP (3)
The accounting and checking against the limits were done as follows:
.IP
.RS
.PD 0
.IP (a) 4
@ -391,7 +391,7 @@ Multiple processes could pass point (a) simultaneously,
and then allocate pipe buffers that were accounted for only in step (c),
with the result that the user's pipe buffer
allocation could be pushed over the limit.
.IP
Starting with Linux 4.9,
the accounting step is performed before doing the allocation,
and the operation fails if the limit would be exceeded.

@ -34,13 +34,13 @@ when changing permissions.
Memory Protection Keys provide a mechanism for changing
protections without requiring modification of the page tables on
every permission change.
.PP
To use pkeys, software must first "tag" a page in the page tables
with a pkey.
After this tag is in place, an application only has
to change the contents of a register in order to remove write
access, or all access to a tagged page.
.PP
Protection keys work in conjunction with the existing
.BR PROT_READ /
.BR PROT_WRITE /
@ -51,7 +51,7 @@ and
.BR mmap (2),
but always act to further restrict these traditional permission
mechanisms.
.PP
If a process performs an access that violates pkey
restrictions, it receives a
.BR SIGSEGV
@ -59,7 +59,7 @@ signal.
See
.BR sigaction (2)
for details of the information available with that signal.
.PP
To use the pkeys feature, the processor must support it, and the kernel
must contain support for the feature on a given processor.
As of early 2016 only future Intel x86 processors are supported,
@ -69,7 +69,7 @@ are available for actual application use.
The default key is assigned to any memory region for which a
pkey has not been explicitly assigned via
.BR pkey_mprotect (2).
.PP
Protection keys have the potential to add a layer of security and
reliability to applications.
But they have not been primarily designed as
@ -77,7 +77,7 @@ a security feature.
For instance, WRPKRU is a completely unprivileged
instruction, so pkeys are useless in any case where an attacker controls
the PKRU register or can execute arbitrary instructions.
.PP
Applications should be very careful to ensure that they do not "leak"
protection keys.
For instance, before calling
@ -96,7 +96,7 @@ Applications may implement these checks by searching the
file for memory regions with the pkey assigned.
Further details can be found in
.BR proc (5).
.PP
Any application wanting to use protection keys needs to be able
to function without them.
They might be unavailable because the hardware that the
@ -110,7 +110,7 @@ keys should simply call
and test whether the call succeeds,
instead of attempting to detect support for the
feature in any other way.
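.PP
One way to probe for support in this manner is sketched below.
The function name
.I pkeys_usable
is hypothetical, and the sketch assumes that the probe is made with the
.BR pkey_alloc (2)
system call, invoked through
.BR syscall (2)
so that it does not depend on a C library wrapper.
The key is freed again immediately so that the probe does not leak it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/syscall.h>
#include <unistd.h>

/* Return 1 if a protection key can be allocated, 0 otherwise.
   A result of 0 covers every reason for failure: no hardware
   support, no kernel support, or no free keys. */
int pkeys_usable(void)
{
#ifdef SYS_pkey_alloc
    long pkey = syscall(SYS_pkey_alloc, 0UL, 0UL);

    if (pkey == -1)
        return 0;
    syscall(SYS_pkey_free, pkey);   /* do not leak the probe key */
    return 1;
#else
    /* Kernel headers predate the pkey system calls. */
    return 0;
#endif
}
```

An application would typically call such a probe once at startup and
fall back to plain
.BR mprotect (2)
when it returns 0.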
.PP
Although unnecessary, hardware support for protection keys may be
enumerated with the
.I cpuid
@ -123,7 +123,7 @@ under the "flags" field.
The string "pku" in this field indicates hardware support for protection
keys and the string "ospke" indicates that the kernel contains and has
enabled protection keys support.
.PP
Applications using threads and protection keys should be especially
careful.
Threads inherit the protection key rights of the parent at the time
@ -145,7 +145,7 @@ key rights upon entering a signal handler if the desired rights differ
from the defaults.
The rights of any interrupted context are restored when the signal
handler returns.
.PP
This signal behavior is unusual and is due to the fact that the x86 PKRU
register (which stores protection key access rights) is managed with the
same hardware mechanism (XSAVE) that manages floating-point registers.
@ -157,7 +157,7 @@ The Linux kernel implements the following pkey-related system calls:
.BR pkey_alloc (2),
and
.BR pkey_free (2).
.PP
The Linux pkey system calls are available only if the kernel was
configured and built with the
.BR CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
@ -171,7 +171,7 @@ After that, it attempts to allocate a protection key and
disallows access to the page by using the WRPKRU instruction.
It then tries to access the page,
which we now expect to cause a fatal signal to the application.
.PP
.in +4n
.nf
.RB "$" " ./a.out"

@ -22,14 +22,14 @@ A special serial number value,
.BR KEY_SPEC_PROCESS_KEYRING ,
is defined that can be used in lieu of the actual serial number of
the calling process's process keyring.
.PP
From the
.BR keyctl (1)
utility, '\fB@p\fP' can be used instead of a numeric key ID in
much the same way, but since
.BR keyctl (1)
is a program run after forking, this is of no utility.
.PP
A thread created using the
.BR clone (2)
.B CLONE_THREAD
@ -42,7 +42,7 @@ A process's process keyring is cleared on
.BR execve (2).
The process keyring is destroyed when the last
thread that refers to it terminates.
.PP
If a process doesn't have a process keyring when it is accessed,
then the process keyring will be created if the keyring is to be modified;
otherwise, the error

@ -33,7 +33,7 @@ A single process can contain multiple threads,
all of which are executing the same program.
These threads share the same global memory (data and heap segments),
but each thread has its own stack (automatic variables).
.PP
POSIX.1 also requires that threads share a range of other attributes
(i.e., these attributes are process-wide rather than per-thread):
.IP \- 3
@ -121,12 +121,12 @@ This identifier is returned to the caller of
.BR pthread_create (3),
and a thread can obtain its own thread identifier using
.BR pthread_self (3).
.PP
Thread IDs are guaranteed to be unique only within a process.
(In all pthreads functions that accept a thread ID as an argument,
that ID by definition refers to a thread in
the same process as the caller.)
.PP
The system may reuse a thread ID after a terminated thread has been joined,
or a detached thread has terminated.
POSIX says: "If an application attempts to use a thread ID whose
@ -135,7 +135,7 @@ lifetime has ended, the behavior is undefined."
A thread-safe function is one that can be safely
(i.e., it will deliver the same results regardless of whether it is)
called from multiple threads at the same time.
.PP
POSIX.1-2001 and POSIX.1-2008 require that all functions specified
in the standard shall be thread-safe,
except for the following functions:
@ -239,7 +239,7 @@ wctomb()
An async-cancel-safe function is one that can be safely called
in an application where asynchronous cancelability is enabled (see
.BR pthread_setcancelstate (3)).
.PP
Only the following functions are required to be async-cancel-safe by
POSIX.1-2001 and POSIX.1-2008:
.in +4n
@ -257,10 +257,10 @@ If a thread is cancelable, its cancelability type is deferred,
and a cancellation request is pending for the thread,
then the thread is canceled when it calls a function
that is a cancellation point.
.PP
The following functions are required to be cancellation points by
POSIX.1-2001 and/or POSIX.1-2008:
.PP
.\" FIXME
.\" Document the list of all functions that are cancellation points in glibc
.in +4n
@ -325,10 +325,10 @@ write()
writev()
.fi
.in
.PP
The following functions may be cancellation points according to
POSIX.1-2001 and/or POSIX.1-2008:
.PP
.in +4n
.nf
access()
@ -558,7 +558,7 @@ wprintf()
wscanf()
.fi
.in
.PP
An implementation may also mark other functions
not specified in the standard as cancellation points.
In particular, an implementation is likely to mark
@ -792,13 +792,13 @@ With NPTL, all of the threads in a process are placed
in the same thread group;
all members of a thread group share the same PID.
NPTL does not employ a manager thread.
.PP
NPTL makes internal use of the first two real-time signals;
these signals cannot be used in applications.
See
.BR nptl (7)
for further details.
.PP
NPTL still has at least one nonconformance with POSIX.1:
.IP \- 3
Threads do not share a common nice value.
@ -909,7 +909,7 @@ bash$ $( LD_ASSUME_KERNEL=2.2.5 ldd /bin/ls | grep libc.so | \\
.BR nptl (7),
.BR sigevent (7),
.BR signal (7)
.PP
Various Pthreads manual pages, for example:
.BR pthread_attr_init (3),
.BR pthread_atfork (3),

@ -58,19 +58,19 @@ terminal emulators such as
.BR unbuffer (1),
and
.BR expect (1).
.PP
Data flow between master and slave is handled asynchronously,
much like data flow with a physical terminal.
Data written to the slave will be available at the master promptly,
but may not be available immediately.
Similarly, there may be a small processing delay between
a write to the master, and the effect being visible at the slave.
.PP
Historically, two pseudoterminal APIs have evolved: BSD and System V.
SUSv1 standardized a pseudoterminal API based on the System V API,
and this API should be employed in all new programs that use
pseudoterminals.
.PP
Linux provides both BSD-style and (standardized) System V-style
pseudoterminals.
System V-style terminals are commonly called UNIX 98 pseudoterminals
@ -95,7 +95,7 @@ the name returned by
.BR ptsname (3)
in a call to
.BR open (2).
.PP
The Linux kernel imposes a limit on the number of available
UNIX 98 pseudoterminals.
In kernels up to and including 2.6.3, this limit is configured
@ -149,7 +149,7 @@ A description of the
.BR ioctl (2),
which controls packet mode operation, can be found in
.BR ioctl_tty (2).
.PP
The BSD
.BR ioctl (2)
operations

@ -36,7 +36,7 @@ The kernel random-number generator relies on entropy gathered from
device drivers and other sources of environmental noise to seed
a cryptographically secure pseudorandom number generator (CSPRNG).
It is designed for security, rather than speed.
.PP
The following interfaces provide access to output from the kernel CSPRNG:
.IP * 3
The
@ -96,7 +96,7 @@ flag.
The cryptographic algorithms used for the
.IR urandom
source are quite conservative, and so should be sufficient for all purposes.
.PP
The disadvantage of
.B GRND_RANDOM
and reads from
@ -213,7 +213,7 @@ or Diffie-Hellman private key has an effective key size of 128 bits
(it requires about 2^128 operations to break) so a key generator
needs only 128 bits (16 bytes) of seed material from
.IR /dev/random .
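.PP
Obtaining 16 bytes of seed material might look like the following sketch.
The function name
.I fill_seed
is hypothetical, and the sketch assumes that
.BR getrandom (2)
with a flags argument of 0 is an acceptable source for the application;
it is not the only possible choice.

```c
#include <errno.h>
#include <sys/random.h>
#include <sys/types.h>

/* Fill buf with len bytes from the kernel CSPRNG using getrandom(2).
   Returns 0 on success, -1 on error (errno is set by getrandom). */
int fill_seed(unsigned char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = getrandom(buf + done, len - done, 0);

        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        done += (size_t) n;
    }
    return 0;
}
```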
.PP
While some safety margin above that minimum is reasonable, as a guard
against flaws in the CSPRNG algorithm, no cryptographic primitive
available today can hope to promise more than 256 bits of security,

@ -110,12 +110,12 @@ scheduling priority,
.IR sched_priority .
The scheduler makes its decisions based on knowledge of the scheduling
policy and static priority of all threads on the system.
.PP
For threads scheduled under one of the normal scheduling policies
(\fBSCHED_OTHER\fP, \fBSCHED_IDLE\fP, \fBSCHED_BATCH\fP),
\fIsched_priority\fP is not used in scheduling
decisions (it must be specified as 0).
.PP
Processes scheduled under one of the real-time policies
(\fBSCHED_FIFO\fP, \fBSCHED_RR\fP) have a
\fIsched_priority\fP value in the range 1 (low) to 99 (high).
@ -129,17 +129,17 @@ Portable programs should use
and
.BR sched_get_priority_max (2)
to find the range of priorities supported for a particular policy.
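.PP
A portable range check along these lines could be sketched as follows
(the function name
.I priority_in_range
is hypothetical):

```c
#include <sched.h>

/* Return 1 if prio is within the range supported for policy,
   0 if it is outside that range, or -1 if the range could not be
   determined (the query functions return -1 on error). */
int priority_in_range(int policy, int prio)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return -1;
    return prio >= lo && prio <= hi;
}
```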
.PP
Conceptually, the scheduler maintains a list of runnable
threads for each possible \fIsched_priority\fP value.
In order to determine which thread runs next, the scheduler looks for
the nonempty list with the highest static priority and selects the
thread at the head of this list.
.PP
A thread's scheduling policy determines
where it will be inserted into the list of threads
with equal static priority and how it will move inside this list.
.PP
All scheduling is preemptive: if a thread with a higher static
priority becomes ready to run, the currently running thread
will be preempted and
@ -187,7 +187,7 @@ will be put at the end of the list.
No other events will move a thread
scheduled under the \fBSCHED_FIFO\fP policy in the wait list of
runnable threads with equal static priority.
.PP
A \fBSCHED_FIFO\fP
thread runs until either it is blocked by an I/O request, it is
preempted by a higher priority thread, or it calls
@ -223,7 +223,7 @@ one must use the Linux-specific
and
.BR sched_getattr (2)
system calls.
.PP
A sporadic task is one that has a sequence of jobs, where each
job is activated at most once per period.
Each job also has a
@ -241,9 +241,9 @@ is the time at which a task starts its execution.
The
.I "absolute deadline"
is thus obtained by adding the relative deadline to the arrival time.
.PP
The following diagram clarifies these terms:
.PP
.in +4n
.nf
arrival/wakeup absolute deadline
@ -256,7 +256,7 @@ arrival/wakeup absolute deadline
|<-------------- period ------------------->|
.fi
.in
.PP
When setting a
.B SCHED_DEADLINE
policy for a thread using
@ -273,7 +273,7 @@ Deadline to the relative deadline, and Period to the period of the task.
Thus, for
.BR SCHED_DEADLINE
scheduling, we have:
.PP
.in +4n
.nf
arrival/wakeup absolute deadline
@ -286,7 +286,7 @@ arrival/wakeup absolute deadline
|<-------------- Period ------------------->|
.fi
.in
.PP
The three deadline-scheduling parameters correspond to the
.IR sched_runtime ,
.IR sched_deadline ,
@ -304,11 +304,11 @@ If
.IR sched_period
is specified as 0, then it is made the same as
.IR sched_deadline .
.PP
The kernel requires that:
.PP
sched_runtime <= sched_deadline <= sched_period
.PP
.\" See __checkparam_dl in kernel/sched/core.c
In addition, under the current implementation,
all of the parameter values must be at least 1024
@ -318,10 +318,10 @@ If any of these checks fails,
.BR sched_setattr (2)
fails with the error
.BR EINVAL .
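.PP
The checks described above can be approximated in user space before calling
.BR sched_setattr (2).
The following is a sketch only: the function name
.I dl_params_plausible
and the constant
.I DL_MIN_NS
are hypothetical, all values are taken to be in nanoseconds,
and the kernel may apply further checks that are not reproduced here.

```c
#include <stdint.h>

#define DL_MIN_NS 1024ULL   /* lower bound described in the text */

/* Return 1 if the three deadline parameters satisfy
   runtime <= deadline <= period and the lower bound, else 0.
   A period of 0 is treated as equal to the deadline. */
int dl_params_plausible(uint64_t runtime, uint64_t deadline,
                        uint64_t period)
{
    if (period == 0)
        period = deadline;
    if (runtime < DL_MIN_NS || deadline < DL_MIN_NS ||
        period < DL_MIN_NS)
        return 0;
    return runtime <= deadline && deadline <= period;
}
```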
.PP
The CBS guarantees non-interference between tasks, by throttling
threads that attempt to over-run their specified Runtime.
.PP
To ensure deadline scheduling guarantees,
the kernel must prevent situations where the set of
.B SCHED_DEADLINE
@ -334,13 +334,13 @@ if it is not,
.BR sched_setattr (2)
fails with the error
.BR EBUSY .
.PP
For example, it is required (but not necessarily sufficient) for
the total utilization to be less than or equal to the total number of
CPUs available, where, since each thread can maximally run for
Runtime per Period, that thread's utilization is its
Runtime divided by its Period.
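.PP
The necessary condition in the preceding paragraph can be written out
as a small calculation.
This sketch is not the kernel's admission test; the function name
.I dl_utilization_fits
is hypothetical, and it checks only that the sum of Runtime/Period
over all threads does not exceed the number of CPUs.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if the total utilization of n threads, each running for
   runtime[i] out of every period[i], is at most ncpus; 0 otherwise.
   A period of 0 is rejected. This is a necessary condition only. */
int dl_utilization_fits(const uint64_t *runtime, const uint64_t *period,
                        size_t n, unsigned int ncpus)
{
    double total = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (period[i] == 0)
            return 0;
        total += (double) runtime[i] / (double) period[i];
    }
    return total <= (double) ncpus;
}
```

For example, two threads that each need half of every period
fit on one CPU, while two threads that each need three quarters do not.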
.PP
In order to fulfill the guarantees that are made when
a thread is admitted to the
.BR SCHED_DEADLINE
@ -351,7 +351,7 @@ system; if any
.BR SCHED_DEADLINE
thread is runnable,
it will preempt any thread scheduled under one of the other policies.
.PP
A call to
.BR fork (2)
by a thread scheduled under the
@ -359,7 +359,7 @@ by a thread scheduled under the
policy will fail with the error
.BR EAGAIN ,
unless the thread has its reset-on-fork flag set (see below).
.PP
A
.B SCHED_DEADLINE
thread that calls
@ -378,7 +378,7 @@ processes).
\fBSCHED_OTHER\fP is the standard Linux time-sharing scheduler that is
intended for all threads that do not require the special
real-time mechanisms.
.PP
The thread to run is chosen from the static
priority 0 list based on a \fIdynamic\fP priority that is determined only
inside this list.
@ -401,12 +401,12 @@ The nice value can be modified using
.BR setpriority (2),
or
.BR sched_setattr (2).
.PP
According to POSIX.1, the nice value is a per-process attribute;
that is, the threads in a process should share a nice value.
However, on Linux, the nice value is a per-thread attribute:
different threads in the same process may have different nice values.
.PP
The range of the nice value
varies across UNIX systems.
On modern Linux, the range is \-20 (high priority) to +19 (low priority).
@ -414,12 +414,12 @@ On some other systems, the range is \-20..20.
Very early Linux kernels (before Linux 2.0) had the range \-infinity..15.
.\" Linux before 1.3.36 had \-infinity..15.
.\" Since kernel 1.3.43, Linux has the range \-20..19.
.PP
The degree to which the nice value affects the relative scheduling of
.BR SCHED_OTHER
processes likewise varies across UNIX systems and
across Linux kernel versions.
.PP
With the advent of the CFS scheduler in kernel 2.6.23,
Linux adopted an algorithm that causes
relative differences in nice values to have a much stronger effect.
@ -431,14 +431,14 @@ to a process whenever there is any other
higher priority load on the system,
and makes high nice values (\-20) deliver most of the CPU to applications
that require it (e.g., some audio applications).
.PP
On Linux, the
.BR RLIMIT_NICE
resource limit can be used to define a limit to which
an unprivileged process's nice value can be raised; see
.BR setrlimit (2)
for details.
.PP
For further details on the nice value, see the subsections on
the autogroup feature and group scheduling, below.
.\"
@ -454,7 +454,7 @@ that the thread is CPU-intensive.
Consequently, the scheduler will apply a small scheduling
penalty with respect to wakeup behavior,
so that this thread is mildly disfavored in scheduling decisions.
.PP
.\" The following paragraph is drawn largely from the text that
.\" accompanied Ingo Molnar's patch for the implementation of
.\" SCHED_BATCH.
@ -468,7 +468,7 @@ interactivity causing extra preemptions (between the workload's tasks).
(Since Linux 2.6.23.)
\fBSCHED_IDLE\fP can be used only at static priority 0;
the process nice value has no influence for this policy.
.PP
This policy is intended for running jobs at extremely low
priority (lower even than a +19 nice value with the
.B SCHED_OTHER
@ -504,14 +504,14 @@ The state of the reset-on-fork flag can analogously be retrieved using
.BR sched_getscheduler (2)
and
.BR sched_getattr (2).
.PP
The reset-on-fork feature is intended for media-playback applications,
and can be used to prevent applications evading the
.BR RLIMIT_RTTIME
resource limit (see
.BR getrlimit (2))
by creating multiple child processes.
.PP
More precisely, if the reset-on-fork flag is set,
the following rules apply for subsequently created children:
.IP * 3
@ -545,13 +545,13 @@ matches the real or effective user ID of the target thread
(i.e., the thread specified by
.IR pid )
whose policy is being changed.
.PP
A thread must be privileged
.RB ( CAP_SYS_NICE )
in order to set or modify a
.BR SCHED_DEADLINE
policy.
.PP
Since Linux 2.6.12, the
.B RLIMIT_RTPRIO
resource limit defines a ceiling on an unprivileged thread's
@ -622,7 +622,7 @@ process from freezing the system was to run (at the console)
a shell scheduled under a higher static priority than the tested application.
This allows an emergency kill of tested
real-time applications that do not block or terminate as expected.
.PP
Since Linux 2.6.25, there are other techniques for dealing with runaway
real-time and deadline processes.
One of these is to use the
@ -632,7 +632,7 @@ a real-time process may consume.
See
.BR getrlimit (2)
for details.
.PP
Since version 2.6.25, Linux also provides two
.I /proc
files that can be used to reserve a certain amount of CPU time
@ -675,7 +675,7 @@ Child processes inherit the scheduling policy and parameters across a
.BR fork (2).
The scheduling policy and parameters are preserved across
.BR execve (2).
.PP
Memory locking is usually needed for real-time processes to avoid
paging delays; this can be done with
.BR mlock (2)
@ -692,7 +692,7 @@ parallel build processes (i.e., the
.BR make (1)
.BR \-j
flag).
.PP
This feature operates in conjunction with the
CFS scheduler and requires a kernel that is configured with
.BR CONFIG_SCHED_AUTOGROUP .
@ -702,7 +702,7 @@ a value of 0 disables the feature, while a value of 1 enables it.
The default value in this file is 1, unless the kernel was booted with the
.IR noautogroup
parameter.
.PP
A new autogroup is created when a new session is created via
.BR setsid (2);
this happens, for example, when a new terminal window is started.
@ -712,14 +712,14 @@ inherits its parent's autogroup membership.
Thus, all of the processes in a session are members of the same autogroup.
An autogroup is automatically destroyed when the last process
in the group terminates.
.PP
When autogrouping is enabled, all of the members of an autogroup
are placed in the same kernel scheduler "task group".
The CFS scheduler employs an algorithm that equalizes the
distribution of CPU cycles across task groups.
The benefits of this for interactive desktop performance
can be described via the following example.
.PP
Suppose that there are two autogroups competing for the same CPU
(i.e., presume either a single CPU system or the use of
.BR taskset (1)
@ -750,17 +750,17 @@ the scheduler distributes CPU cycles across task groups such that
an autogroup that contains a large number of CPU-bound processes
does not end up hogging CPU cycles at the expense of the other
jobs on the system.
.PP
A process's autogroup (task group) membership can be viewed via the file
.IR /proc/[pid]/autogroup :
.PP
.nf
.in +4n
$ \fBcat /proc/1/autogroup\fP
/autogroup-1 nice 0
.in
.fi
.PP
This file can also be used to modify the CPU bandwidth allocated
to an autogroup.
This is done by writing a number in the "nice" range to the file
@ -782,7 +782,7 @@ to fail with the error
.\" A patch was posted on 23 Nov 2016
.\" ("sched/autogroup: Fix 64bit kernel nice adjustment";
.\" check later to see in which kernel version it lands.
.PP
The autogroup nice setting has the same meaning as the process nice value,
but applies to distribution of CPU cycles to the autogroup as a whole,
based on the relative nice values of other autogroups.
@ -791,12 +791,12 @@ will be a product of the autogroup's nice value
(compared to other autogroups)
and the process's nice value
(compared to other processes in the same autogroup).
.PP
The use of the
.BR cgroups (7)
CPU controller to place processes in cgroups other than the
root CPU cgroup overrides the effect of autogrouping.
.PP
The autogroup feature groups only processes scheduled under
non-real-time policies
.RB ( SCHED_OTHER ,
@ -817,7 +817,7 @@ policies), the CFS scheduler employs a technique known as "group scheduling",
if the kernel was configured with the
.BR CONFIG_FAIR_GROUP_SCHED
option (which is typical).
.PP
Under group scheduling, threads are scheduled in "task groups".
Task groups have a hierarchical relationship,
rooted under the initial task group on the system,
@ -861,7 +861,7 @@ or
on a process has an effect only for scheduling relative
to other processes executed in the same session
(typically: the same terminal window).
.PP
Conversely, for two processes that are (for example)
the sole CPU-bound processes in different sessions
(e.g., different terminal windows,
@ -877,7 +877,7 @@ A possibly useful workaround here is to use a command such as
the following to modify the autogroup nice value for
.I all
of the processes in a terminal session:
.PP
.nf
.in +4n
$ \fBecho 10 > /proc/self/autogroup\fP
@ -905,7 +905,7 @@ patch-\fIkernelversion\fP-rt\fIpatchversion\fP
and can be downloaded from
.UR http://www.kernel.org\:/pub\:/linux\:/kernel\:/projects\:/rt/
.UE .
.PP
Without the patches and prior to their full inclusion into the mainline
kernel, the kernel configuration offers only the three preemption classes
.BR CONFIG_PREEMPT_NONE ,
@ -914,7 +914,7 @@ and
.B CONFIG_PREEMPT_DESKTOP
which respectively provide no, some, and considerable
reduction of the worst-case scheduling latency.
.PP
With the patches applied or after their full inclusion into the mainline
kernel, the additional configuration item
.B CONFIG_PREEMPT_RT

@ -28,7 +28,7 @@
sem_overview \- overview of POSIX semaphores
.SH DESCRIPTION
POSIX semaphores allow processes and threads to synchronize their actions.
.PP
A semaphore is an integer whose value is never allowed to fall below zero.
Two operations can be performed on semaphores:
increment the semaphore value by one
@ -38,7 +38,7 @@ and decrement the semaphore value by one
If the value of a semaphore is currently zero, then a
.BR sem_wait (3)
operation will block until the value becomes greater than zero.
.PP
POSIX semaphores come in two forms: named semaphores and
unnamed semaphores.
.TP
@ -61,7 +61,7 @@ followed by one or more characters, none of which are slashes.
Two processes can operate on the same named semaphore by passing
the same name to
.BR sem_open (3).
.IP
The
.BR sem_open (3)
function creates a new named semaphore or opens an existing
@ -91,7 +91,7 @@ A process-shared semaphore must be placed in a shared memory region
.BR shmget (2),
or a POSIX shared memory object created using
.BR shm_open (3)).
.IP
Before being used, an unnamed semaphore must be initialized using
.BR sem_init (3).
It can then be operated on using
@ -132,7 +132,7 @@ with names of the form
rather than
.B NAME_MAX
characters.)
.PP
Since Linux 2.6.19, ACLs can be placed on files under this directory,
to control object permissions on a per-user and per-group basis.
.SH NOTES

@ -22,17 +22,17 @@ Optionally, PAM may revoke the session keyring on logout.
(In typical configurations, PAM does do this revocation.)
The session keyring has the name (description)
.IR _ses .
.PP
A special serial number value,
.BR KEY_SPEC_SESSION_KEYRING ,
is defined that can be used in lieu of the actual serial number of
the calling process's session keyring.
.PP
From the
.BR keyctl (1)
utility, '\fB@s\fP' can be used instead of a numeric key ID in
much the same way.
.PP
A process's session keyring is inherited across
.BR clone (2),
.BR fork (2),
@ -44,7 +44,7 @@ is preserved across
even when the executable is set-user-ID or set-group-ID or has capabilities.
The session keyring is destroyed when the last process that
refers to it exits.
.PP
If a process doesn't have a session keyring when it is accessed, then,
under certain circumstances, the
.BR user-session-keyring (7)
@ -84,7 +84,7 @@ operation.)
These operations are also exposed through the
.BR keyctl (1)
utility as:
.PP
.nf
.in +4n
keyctl session
@ -92,9 +92,9 @@ keyctl session - [<prog> <arg1> <arg2> ...]
keyctl session <name> [<prog> <arg1> <arg2> ...]
.in
.fi
.PP
and:
.PP
.nf
.in +4n
keyctl new_session

@ -30,7 +30,7 @@ shm_overview \- overview of POSIX shared memory
.SH DESCRIPTION
The POSIX shared memory API allows processes to communicate information
by sharing a region of memory.
.PP
The interfaces employed in the API are:
.TP 15
.BR shm_open (3)
@ -101,7 +101,7 @@ to control the permissions of objects in the virtual filesystem.
.SH NOTES
Typically, processes must synchronize their access to a shared
memory object, using, for example, POSIX semaphores.
.PP
System V shared memory
.RB ( shmget (2),
.BR shmop (2),

@ -34,13 +34,13 @@ Many functions are
async-signal-safe.
In particular,
nonreentrant functions are generally unsafe to call from a signal handler.
.PP
The kinds of issues that render a function
unsafe can be quickly understood when one considers
the implementation of the
.I stdio
library, none of whose functions are async-signal-safe.
.PP
When performing buffered I/O on a file, the
.I stdio
functions must maintain a statically allocated data buffer
@ -57,7 +57,7 @@ the program is interrupted by a signal handler that also calls
then the second call to
.BR printf (3)
will operate on inconsistent data, with unpredictable results.
.PP
To avoid problems with unsafe functions, there are two possible choices:
.IP 1. 3
Ensure that
@ -72,7 +72,7 @@ by the signal handler.
.PP
Generally, the second choice is difficult in programs of any complexity,
so the first choice is taken.
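.PP
A handler written in the spirit of the first choice might look like
the following sketch.
The names
.IR got_signal ,
.IR on_signal ,
and
.I install_and_raise
are hypothetical.
The handler limits itself to
.BR write (2),
which is on the list of async-signal-safe functions,
sets a flag of type
.IR "volatile sig_atomic_t" ,
and preserves
.IR errno .

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

/* Handler that avoids stdio: it only sets a flag and calls write(2). */
static void on_signal(int sig)
{
    static const char msg[] = "signal caught\n";
    int saved_errno = errno;    /* write(2) may change errno */
    ssize_t ignored;

    (void) sig;
    got_signal = 1;
    ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void) ignored;
    errno = saved_errno;
}

/* Install the handler for sig, raise sig in the calling thread, and
   return the flag value (1 if the handler ran), or -1 on error. */
int install_and_raise(int sig)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) == -1)
        return -1;
    got_signal = 0;
    if (raise(sig) != 0)
        return -1;
    return got_signal;
}
```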
.PP
POSIX.1 specifies a set of functions that an implementation
must make async-signal-safe.
(An implementation may provide safe implementations of additional functions,
@ -81,13 +81,13 @@ may not provide the same guarantees.)
In general, a function is async-signal-safe either because it is reentrant
or because it is atomic with respect to signals
(i.e., its execution can't be interrupted by a signal handler).
.PP
The set of functions required to be async-signal-safe by POSIX.1
is shown in the following table.
The functions not otherwise noted were required to be async-signal-safe
in POSIX.1-2001;
the table details changes in the subsequent standards.
.PP
.TS
lb lb
l l.
@ -284,7 +284,7 @@ Function Notes
\fBwmemset\fP(3) Added in POSIX.1-2016
\fBwrite\fP(2)
.TE
.sp 1
Notes:
.IP * 3
POSIX.1-2001 and POSIX.1-2004 required the functions

@ -54,7 +54,7 @@ Each signal has a current
.IR disposition ,
which determines how the process behaves when it is delivered
the signal.
.PP
The entries in the "Action" column of the tables below specify
the default disposition for each signal, as follows:
.IP Term
@ -90,11 +90,11 @@ It is possible to arrange that the signal handler
uses an alternate stack; see
.BR sigaltstack (2)
for a discussion of how to do this and when it might be useful.)
.PP
The signal disposition is a per-process attribute:
in a multithreaded application, the disposition of a
particular signal is the same for all threads.
.PP
A child created via
.BR fork (2)
inherits a copy of its parent's signal dispositions.
@ -174,7 +174,7 @@ which means that it will not be delivered until it is later unblocked.
Between the time when it is generated and when it is delivered
a signal is said to be
.IR pending .
.PP
Each thread in a process has an independent
.IR "signal mask" ,
which indicates the set of signals that the thread is currently blocking.
@ -183,13 +183,13 @@ A thread can manipulate its signal mask using
In a traditional single-threaded application,
.BR sigprocmask (2)
can be used to manipulate the signal mask.
.PP
A child created via
.BR fork (2)
inherits a copy of its parent's signal mask;
the signal mask is preserved across
.BR execve (2).
.PP
A signal may be generated (and thus pending)
for a process as a whole (e.g., when sent using
.BR kill (2))
@ -206,14 +206,14 @@ A process-directed signal may be delivered to any one of the
threads that does not currently have the signal blocked.
If more than one of the threads has the signal unblocked, then the
kernel chooses an arbitrary thread to which to deliver the signal.
.PP
A thread can obtain the set of signals that it currently has pending
using
.BR sigpending (2).
This set will consist of the union of the set of pending
process-directed signals and the set of signals pending for
the calling thread.
.PP
A child created via
.BR fork (2)
initially has an empty pending signal set;
@ -231,7 +231,7 @@ and the last one for mips.
.I not
shown; see the Linux kernel source for signal numbering on that architecture.)
A dash (\-) denotes that a signal is absent on the corresponding architecture.
.PP
First the signals described in the original POSIX.1-1990 standard.
.TS
l c c l
@ -260,13 +260,13 @@ SIGTSTP 18,20,24 Stop Stop typed at terminal
SIGTTIN 21,21,26 Stop Terminal input for background process
SIGTTOU 22,22,27 Stop Terminal output for background process
.TE
.sp 1
The signals
.B SIGKILL
and
.B SIGSTOP
cannot be caught, blocked, or ignored.
.PP
Next the signals not in the POSIX.1-1990 standard but described in
SUSv2 and POSIX.1-2001.
.TS
@ -288,7 +288,7 @@ SIGXCPU 24,24,30 Core CPU time limit exceeded (4.2BSD);
SIGXFSZ 25,25,31 Core File size limit exceeded (4.2BSD);
see \fBsetrlimit\fP(2)
.TE
.sp 1
Up to and including Linux 2.2, the default behavior for
.BR SIGSYS ", " SIGXCPU ", " SIGXFSZ ", "
and (on architectures other than SPARC and MIPS)
@ -299,7 +299,7 @@ was to terminate the process (without a core dump).
is to terminate the process without a core dump.)
Linux 2.4 conforms to the POSIX.1-2001 requirements for these signals,
terminating the process with a core dump.
.PP
Next various other signals.
.TS
l c c l
@ -317,7 +317,7 @@ SIGLOST \-,\-,\- Term File lock lost (unused)
SIGWINCH 28,28,20 Ign Window resize signal (4.3BSD, Sun)
SIGUNUSED \-,31,\- Core Synonymous with \fBSIGSYS\fP
.TE
.sp 1
(Signal 29 is
.B SIGINFO
/
@ -325,21 +325,21 @@ SIGUNUSED \-,31,\- Core Synonymous with \fBSIGSYS\fP
on an alpha but
.B SIGLOST
on a sparc.)
.PP
.B SIGEMT
is not specified in POSIX.1-2001, but nevertheless appears
on most other UNIX systems,
where its default action is typically to terminate
the process with a core dump.
.PP
.B SIGPWR
(which is not specified in POSIX.1-2001) is typically ignored
by default on those other UNIX systems where it appears.
.PP
.B SIGIO
(which is not specified in POSIX.1-2001) is ignored by default
on several other UNIX systems.
.PP
Where defined,
.B SIGUNUSED
is synonymous with
@ -452,7 +452,7 @@ resource limit, which specifies a per-user limit for queued
signals; see
.BR setrlimit (2)
for further details.
.PP
The addition of real-time signals required the widening
of the signal set structure
.RI ( sigset_t )
@ -488,7 +488,7 @@ flag (see
.BR sigaction (2)).
The details vary across UNIX systems;
below, the details for Linux.
.PP
If a blocked call to one of the following interfaces is interrupted
by a signal handler, then the call will be automatically restarted
after the signal handler returns if the
@ -674,7 +674,7 @@ and then resumed via
.BR SIGCONT .
This behavior is not sanctioned by POSIX.1, and doesn't occur
on other systems.
.PP
The Linux interfaces that display this behavior are:
.IP * 2
"Input" socket interfaces, when a timeout

@ -31,7 +31,7 @@ spufs \- SPU filesystem
The SPU filesystem is used on PowerPC machines that implement the
Cell Broadband Engine Architecture in order to access Synergistic
Processor Units (SPUs).
.PP
The filesystem provides a name space similar to POSIX shared
memory or message queues.
Users that have write permissions
@ -40,7 +40,7 @@ on the filesystem can use
to establish SPU contexts under the
.B spufs
root directory.
.PP
Every SPU context is represented by a directory containing
a predefined set of files.
These files can be
@ -72,7 +72,7 @@ supported on regular filesystems.
This list details the supported
operations and the deviations from the standard behavior described
in the respective man pages.
.PP
All files that support the
.BR read (2)
operation also support
@ -94,7 +94,7 @@ structure that contain reliable information are
.IR st_uid ,
and
.IR st_gid .
.PP
All files support the
.BR chmod (2)/ fchmod (2)
and
@ -103,7 +103,7 @@ operations, but will not be able to grant permissions that contradict
the possible operations (e.g., read access on the
.I wbox
file).
.PP
The current set of files is:
.TP
.I /capabilities
@ -158,11 +158,11 @@ This file contains the 128-bit values of each register,
from register 0 to register 127, in order.
This allows the general-purpose registers to be
inspected for debugging.
.IP
Reading from or writing to this file requires that the context is
scheduled out, so use of this file is not recommended in normal
program operation.
.IP
The
.I regs
file is not present on contexts that have been created with the
@ -214,7 +214,7 @@ Also,
.BR poll (2)
and similar system calls can be used to monitor for the presence
of mailbox data.
.IP
The possible operations on an open
.I ibox
file are:
@ -236,7 +236,7 @@ the return value is set to \-1 and
.I errno
is set to
.BR EAGAIN .
.IP
If there is no data available in the mailbox and the file
descriptor has been opened without
.BR O_NONBLOCK ,
@ -283,7 +283,7 @@ value is set to \-1 and
.I errno
is set to
.BR EAGAIN .
.IP
If there is no space available in the mailbox and the file
descriptor has been opened without
.BR O_NONBLOCK ,
@ -385,7 +385,7 @@ If the register value is larger than the buffer passed to the
.BR read (2)
system call, subsequent reads will continue reading from the same
buffer, until the end of the buffer is reached.
.IP
When a complete string has been read, all subsequent read operations
will return zero bytes and a new file descriptor needs to be opened
to read a new value.
@ -399,7 +399,7 @@ The string is parsed from the beginning
until the first nonnumeric character or the end of the buffer.
Subsequent writes to the same file descriptor overwrite the
previous setting.
.IP
Except for the
.I npc
file, these files are not present on contexts that have been created with
@ -554,7 +554,7 @@ The
and
.I wbox_stat
files contain the available message count.
.IP
The
.I wbox_info
file contains an array of four-byte mailbox messages, which have been
@ -563,12 +563,12 @@ With current CBEA machines, the array is four items in
length, so up to 4 * 4 = 16 bytes can be read from this file.
If any mailbox queue entry is empty,
then the bytes read at the corresponding location are undefined.
.IP
The
.I dma_info
file contains the contents of the SPU MFC DMA queue, represented as the
following structure:
.IP
.in +4n
.nf
struct spu_dma_info {
@ -581,13 +581,13 @@ struct spu_dma_info {
};
.fi
.in
.IP
The last member of this data structure is the actual DMA queue,
containing 16 entries.
The
.I mfc_cq_sr
structure is defined as:
.IP
.in +4n
.nf
struct mfc_cq_sr {
@ -598,13 +598,13 @@ struct mfc_cq_sr {
};
.fi
.in
.IP
The
.I proxydma_info
file contains similar information, but describes the proxy DMA queue
(i.e., DMAs initiated by entities outside the SPU) instead.
The file is in the following format:
.IP
.in +4n
.nf
struct spu_proxydma_info {
@ -615,11 +615,11 @@ struct spu_proxydma_info {
};
.fi
.in
.IP
Accessing these files requires that the SPU context is scheduled out -
frequent use can be inefficient.
These files should not be used for normal program operation.
.IP
These files are not present on contexts that have been created with the
.B SPU_CREATE_NOSCHED
flag.
@ -653,7 +653,7 @@ The following operations are supported:
.BR write (2)
Writes to this file need to be in the format of a MFC DMA command,
defined as follows:
.IP
.in +4n
.nf
struct mfc_dma_command {
@ -667,7 +667,7 @@ struct mfc_dma_command {
};
.fi
.in
.IP
Writes are required to be exactly
.I sizeof(struct mfc_dma_command)
bytes in size.
@ -695,13 +695,13 @@ or until a previously started DMA
(by checking for
.BR POLLIN )
has been completed.
.IP
.I /mss
Provides access to the MFC MultiSource Synchronization (MSS) facility.
By
.BR mmap (2)-ing
this file, processes can access the MSS area of the SPU.
.IP
The following operations are supported:
.TP
.BR mmap (2)
@ -719,7 +719,7 @@ Provides access to the whole problem-state mapping of the SPU.
Applications can use this area to interface to the SPU, rather than
writing to individual register files in
.BR spufs .
.IP
The following operations are supported:
.RS
.TP
@ -737,7 +737,7 @@ Read-only file containing the physical SPU number that the SPU context
is running on.
When the context is not running, this file contains the
string "\-1".
.IP
The physical SPU number is given by an ASCII hex string.
.TP
.I /object-id
@ -768,5 +768,5 @@ none /spu spufs gid=spu 0 0
.BR spu_create (2),
.BR spu_run (2),
.BR capabilities (7)
.PP
.I The Cell Broadband Engine Architecture (CBEA) specification

@ -43,7 +43,7 @@ released by the University of California at Berkeley.
This was the first Berkeley release that contained a TCP/IP
stack and the sockets API.
4.2BSD was released in 1983.
.IP
Earlier major BSD releases included
.IR 3BSD
(1980),
@ -200,7 +200,7 @@ The standard is available online at
.UE ,
and the interfaces that it describes are also available in the Linux
manual pages package under sections 1p and 3p (e.g., "man 3p open").
.IP
The standard defines two levels of conformance:
.IR "POSIX conformance" ,
which is a baseline set of interfaces required of a conforming system;
@ -213,27 +213,27 @@ XSI-conformant systems can be branded
(XSI conformance constitutes the
.I "Single UNIX Specification version 3"
.RI ( SUSv3 ).)
.IP
The POSIX.1-2001 document is broken into four parts:
.IP
.BR XBD :
Definitions, terms and concepts, header file specifications.
.IP
.BR XSH :
Specifications of functions (i.e., system calls and library
functions in actual implementations).
.IP
.BR XCU :
Specifications of commands and utilities
(i.e., the area formerly described by POSIX.2).
.IP
.BR XRAT :
Informative text on the other parts of the standard.
.IP
POSIX.1-2001 is aligned with C99, so that all of the
library functions standardized in C99 are also
standardized in POSIX.1-2001.
.IP
Two Technical Corrigenda (minor fixes and improvements)
of the original 2001 standard have occurred:
TC1 in 2003 (also known as
@ -244,7 +244,7 @@ and TC2 in 2004 (also known as
.B POSIX.1-2008, SUSv4
Work on the next revision of POSIX.1/SUS was completed and
ratified in 2008.
.IP
The changes in this revision are not as large as those
that occurred for POSIX.1-2001/SUSv3,
but a number of new interfaces are added
@ -253,7 +253,7 @@ Many of the interfaces that were optional in
POSIX.1-2001 become mandatory in the 2008 revision of the standard.
A few interfaces that are present in POSIX.1-2001 are marked
as obsolete in POSIX.1-2008, or removed from the standard altogether.
.IP
The revised standard is broken into the same four parts as POSIX.1-2001,
and again there are two levels of conformance: the baseline
.IR "POSIX Conformance" ,
@ -261,20 +261,20 @@ and
.IR "XSI Conformance" ,
which mandates an additional set of interfaces
beyond those in the base specification.
.IP
In general, where the CONFORMING TO section of a manual page
lists POSIX.1-2001, it can be assumed that the interface also
conforms to POSIX.1-2008, unless otherwise noted.
.IP
Technical Corrigendum 1 (minor fixes and improvements)
of this standard was released in 2013
(also known as
.IR POSIX.1-2013 ).
.IP
Technical Corrigendum 2 of this standard was released in 2016
(also known as
.IR POSIX.1-2016 ).
.IP
Further information can be found on the Austin Group web site,
.UR http://www.opengroup.org\:/austin/
.UE .

@ -41,7 +41,7 @@ symlink \- symbolic link handling
Symbolic links are files that act as pointers to other files.
To understand their behavior, you must first understand how hard links
work.
.PP
A hard link to a file is indistinguishable from the original file because
it is a reference to the object underlying the original filename.
(To be precise: each of the hard links to a file is a reference to
@ -57,7 +57,7 @@ Hard links may not refer to directories
which would confuse many programs)
and may not refer to files on different filesystems
(because inode numbers are not unique across filesystems).
.PP
A symbolic link is a special type of file whose contents are a string
that is the pathname of another file, the file to which the link refers.
(The contents of a symbolic link can be read using
@ -66,13 +66,13 @@ In other words, a symbolic link is a pointer to another name,
and not to an underlying object.
For this reason, symbolic links may refer to directories and may cross
filesystem boundaries.
.PP
There is no requirement that the pathname referred to by a symbolic link
should exist.
A symbolic link that refers to a pathname that does not exist is said
to be a
.IR "dangling link" .
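A dangling link can be detected by comparing the results of lstat(2), which examines the link itself, and stat(2), which follows it. The following is a minimal sketch under that assumption; is_dangling_link() is a hypothetical helper name, not an interface described by this page.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 if `path` is a symbolic link whose
 * target does not exist, 0 if it is not a dangling link, and -1 if
 * the path cannot be examined.
 */
int is_dangling_link(const char *path)
{
    struct stat sb;

    /* lstat() does not follow the link, so it sees the link itself. */
    if (lstat(path, &sb) == -1)
        return -1;
    if (!S_ISLNK(sb.st_mode))
        return 0;

    /* stat() follows the link; success means the target exists. */
    if (stat(path, &sb) == 0)
        return 0;

    return errno == ENOENT ? 1 : -1;
}
```

For a link created with symlink("no-such-target", "slink"), this helper would be expected to return 1 for "slink" as long as no file named no-such-target exists in the same directory.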
.PP
Because a symbolic link and its referenced object coexist in the filesystem
name space, confusion can arise in distinguishing between the link itself
and the referenced object.
@ -92,13 +92,13 @@ The only time that the ownership of a symbolic link matters is
when the link is being removed or renamed in a directory that
has the sticky bit set (see
.BR stat (2)).
.PP
The last access and last modification timestamps
of a symbolic link can be changed using
.BR utimensat (2)
or
.BR lutimes (3).
.PP
On Linux, the permissions of a symbolic link are not used
in any operations; the permissions are always
0777 (read, write, and execute for all user categories),
@ -140,7 +140,7 @@ and
.BR readlinkat (2),
in order to operate on the symbolic link itself
(rather than the file to which it refers).
.PP
By default
(i.e., if the
.BR AT_SYMLINK_FOLLOW
@ -171,7 +171,7 @@ or a loop is detected.
(Loop detection is done by placing an upper limit on the number of
links that may be followed, and an error results if this limit is
exceeded.)
.PP
There are three separate areas that need to be discussed.
They are as follows:
.IP 1. 3
@ -186,7 +186,7 @@ file hierarchy walk).
.SS System calls
The first area is symbolic links used as filename arguments for
system calls.
.PP
Except as noted below, all system calls follow symbolic links.
For example, if there were a symbolic link
.I slink
@ -196,7 +196,7 @@ the system call
.I "open(""slink"" ...\&)"
would return a file descriptor referring to the file
.IR afile .
.PP
Various system calls do not follow links, and operate
on the symbolic link itself.
They are:
@ -211,7 +211,7 @@ They are:
.BR rmdir (2),
and
.BR unlink (2).
.PP
Certain other system calls optionally follow symbolic links.
They are:
.BR faccessat (2),
@ -235,7 +235,7 @@ When
.BR rmdir (2)
is applied to a symbolic link, it fails with the error
.BR ENOTDIR .
.PP
.BR link (2)
warrants special discussion.
POSIX.1-2001 specifies that
@ -252,7 +252,7 @@ either behavior in an implementation.
.SS Commands not traversing a file tree
The second area is symbolic links, specified as command-line
filename arguments, to commands which are not traversing a file tree.
.PP
Except as noted below, commands follow symbolic links named as
command-line arguments.
For example, if there were a symbolic link
@ -263,7 +263,7 @@ the command
.I "cat slink"
would display the contents of the file
.IR afile .
.PP
It is important to realize that this rule includes commands which may
optionally traverse file trees; for example, the command
.I "chown file"
@ -271,7 +271,7 @@ is included in this rule, while the command
.IR "chown\ \-R file" ,
which performs a tree traversal, is not.
(The latter is described in the third area, below.)
.PP
If it is explicitly intended that the command operate on the symbolic
link instead of following the symbolic link\(emfor example, it is desired that
.I "chown slink"
@ -289,7 +289,7 @@ while
would change the ownership of
.I slink
itself.
.PP
There are some exceptions to this rule:
.IP * 2
The
@ -362,16 +362,16 @@ The following commands either optionally or always traverse file trees:
.BR rm (1),
and
.BR tar (1).
.PP
It is important to realize that the following rules apply equally to
symbolic links encountered during the file tree traversal and symbolic
links listed as command-line arguments.
.PP
The \fIfirst rule\fP applies to symbolic links that reference files other
than directories.
Operations that apply to symbolic links are performed on the links
themselves, but otherwise the links are ignored.
.PP
The command
.I "rm\ \-r slink directory"
will remove
@ -383,12 +383,12 @@ In no case will
.BR rm (1)
affect the file referred to by
.IR slink .
.PP
The \fIsecond rule\fP applies to symbolic links that refer to directories.
Symbolic links that refer to directories are never followed by default.
This is often referred to as a "physical" walk, as opposed to a "logical"
walk (where symbolic links that refer to directories are followed).
.PP
Certain conventions are (should be) followed as consistently as
possible by commands that perform file tree walks:
.IP * 2
@ -404,7 +404,7 @@ like the logical name space.
flag will be ignored if the
.I \-R
flag is not also specified.)
.IP
For example, the command
.I "chown\ \-HR user slink"
will traverse the file hierarchy rooted in the file pointed to by
@ -434,7 +434,7 @@ the logical name space.
flag will be ignored if the
.I \-R
flag is not also specified.)
.IP
For example, the command
.I "chown\ \-LR user slink"
will change the owner of the file referred to by
@ -474,7 +474,7 @@ options more than once;
the last one specified determines the command's behavior.
This is intended to permit you to alias commands to behave one way
or the other, and then override that behavior on the command line.
.PP
The
.BR ls (1)
and

@ -35,7 +35,7 @@ This interface defined a
structure used to store terminal settings, and a range of
.BR ioctl (2)
operations to get and set terminal attributes.
.PP
The
.B termio
interface is now obsolete: POSIX.1-1990 standardized a modified
@ -50,7 +50,7 @@ operations that existed in System V.
.BR ioctl (2)
was unstandardized, and its variadic third argument
does not allow argument type checking.)
.PP
If you're looking for a page called "termio", then you can probably
find most of the information that you seek in either
.BR termios (3)

@ -17,19 +17,19 @@ The thread keyring is a keyring used to anchor keys on behalf of a process.
It is created only when a thread requests it.
The thread keyring has the name (description)
.IR _tid .
.PP
A special serial number value,
.BR KEY_SPEC_THREAD_KEYRING ,
is defined that can be used in lieu of the actual serial number of
the calling thread's thread keyring.
.PP
From the
.BR keyctl (1)
utility, '\fB@t\fP' can be used instead of a numeric key ID in
much the same way, but as
.BR keyctl (1)
is a program run after forking, this is of no utility.
.PP
Thread keyrings are not inherited across
.BR clone (2)
and
@ -37,7 +37,7 @@ and
and are cleared by
.BR execve (2).
A thread keyring is destroyed when the thread that refers to it terminates.
.PP
Initially, a thread does not have a thread keyring.
If a thread doesn't have a thread keyring when it is accessed,
then it will be created if it is to be modified;

@ -36,7 +36,7 @@ either from a standard point in the past
(see the description of the Epoch and calendar time below),
or from some point (e.g., the start) in the life of a process
.RI ( "elapsed time" ).
.PP
.I "Process time"
is defined as the amount of CPU time used by a process.
This is sometimes divided into
@ -78,7 +78,7 @@ a clock maintained by the kernel which measures time in
.IR jiffies .
The size of a jiffy is determined by the value of the kernel constant
.IR HZ .
.PP
The value of
.I HZ
varies across kernel versions and hardware platforms.
@ -93,7 +93,7 @@ yielding a jiffies value of, respectively, 0.01, 0.004, or 0.001 seconds.
Since kernel 2.6.20, a further frequency is available:
300, a number that divides evenly for the common video
frame rates (PAL, 25 Hz; NTSC, 30 Hz).
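The arithmetic behind these figures can be checked with a short sketch. The two function names below are hypothetical, chosen only for illustration; the frequency passed in stands for the kernel's HZ value, which user-space code cannot query through these functions.

```c
/*
 * Hypothetical helper: length of one jiffy in nanoseconds for a tick
 * frequency of `hz`.  Returns -1 if `hz` is not positive.
 */
long jiffy_nanoseconds(long hz)
{
    if (hz <= 0)
        return -1;
    return 1000000000L / hz;
}

/*
 * Hypothetical helper: number of ticks per video frame if `hz` is an
 * exact multiple of the frame rate, or 0 if it is not.
 */
long ticks_per_frame(long hz, long frames_per_second)
{
    if (hz <= 0 || frames_per_second <= 0)
        return 0;
    if (hz % frames_per_second != 0)
        return 0;
    return hz / frames_per_second;
}
```

Under these definitions a frequency of 250 gives a 4,000,000 ns (0.004 s) jiffy, and 300 is the only one of the listed frequencies that yields a whole number of ticks per frame for both 25 and 30 frames per second.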
.PP
The
.BR times (2)
system call is a special case.
@ -107,7 +107,7 @@ User-space applications can determine the value of this constant using
.SS High-resolution timers
Before Linux 2.6.21, the accuracy of timer and sleep system calls
(see below) was also limited by the size of the jiffy.
.PP
Since Linux 2.6.21, Linux supports high-resolution timers (HRTs),
optionally configurable via
.BR CONFIG_HIGH_RES_TIMERS .
@ -120,14 +120,14 @@ checking the resolution returned by a call to
.BR clock_getres (2)
or looking at the "resolution" entries in
.IR /proc/timer_list .
.PP
HRTs are not supported on all hardware architectures.
(Support is provided on x86, arm, and powerpc, among others.)
.SS The Epoch
UNIX systems represent time in seconds since the
.IR Epoch ,
1970-01-01 00:00:00 +0000 (UTC).
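As an illustration of this representation, the sketch below converts a seconds-since-the-Epoch value back into a broken-down UTC time. The helper name format_utc() is hypothetical; it relies only on the standard gmtime_r(3) and strftime(3) functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: format `t` (seconds since the Epoch) as a UTC
 * string of the form "YYYY-MM-DD HH:MM:SS".
 * Returns 0 on success, -1 on failure.
 */
int format_utc(time_t t, char *buf, size_t size)
{
    struct tm tm;

    /* gmtime_r() interprets the value as UTC, not local time. */
    if (gmtime_r(&t, &tm) == NULL)
        return -1;

    return strftime(buf, size, "%Y-%m-%d %H:%M:%S", &tm) > 0 ? 0 : -1;
}
```

A value of 0 corresponds to the Epoch itself, and a value of 86400 to exactly one day later.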
.PP
A program can determine the
.I "calendar time"
using
@ -164,7 +164,7 @@ Various system calls and functions allow a program to sleep
.BR clock_nanosleep (2),
and
.BR sleep (3).
.PP
Various system calls allow a process to set a timer that expires
at some point in the future, and optionally at repeated intervals;
see

@ -37,7 +37,7 @@ It also guarantees "round-trip compatibility";
in other words,
conversion tables can be built such that no information is lost
when a string is converted from any other encoding to UCS and back.
.PP
UCS contains the characters required to represent practically all
known languages.
This includes not only the Latin, Greek, Cyrillic,
@ -59,7 +59,7 @@ graphical, typographical, mathematical, and scientific symbols,
including those provided by TeX, Postscript, APL, MS-DOS, MS-Windows,
Macintosh, OCR fonts, as well as many word processing and publishing
systems, and more are being added.
.PP
The UCS standard (ISO 10646) describes a
31-bit character set architecture
consisting of 128 24-bit
@ -166,7 +166,7 @@ code values (in all locales), a convention that is signaled by the GNU
C library to applications by defining the constant
.B __STDC_ISO_10646__
as specified in the ISO C99 standard.
.PP
UCS/Unicode can be used just like ASCII in input/output streams,
terminal communication, plaintext files, filenames, and environment
variables in the ASCII compatible UTF-8 multibyte encoding.
@ -216,7 +216,7 @@ Information technology \(em Universal Multiple-Octet Coded Character
Set (UCS) \(em Part 1: Architecture and Basic Multilingual Plane.
International Standard ISO/IEC 10646-1, International Organization
for Standardization, Geneva, 2000.
.IP
This is the official specification of UCS.
Available from
.UR http://www.iso.ch/
@ -228,7 +228,7 @@ Reading, MA, 2000, ISBN 0-201-61633-5.
.IP *
S. Harbison, G. Steele. C: A Reference Manual. Fourth edition,
Prentice Hall, Englewood Cliffs, 1995, ISBN 0-13-326224-3.
.IP
A good reference book about the C programming language.
The fourth
edition covers the 1994 Amendment 1 to the ISO C90 standard, which

@ -21,7 +21,7 @@ The user keyring has a name (description) of the form
where
.I <UID>
is the user ID of the corresponding user.
.PP
The user keyring is associated with the record that the kernel maintains
for the UID.
It comes into existence upon the first attempt to access either the
@ -33,28 +33,28 @@ The keyring remains pinned in existence so long as there are processes
running with that real UID or files opened by those processes remain open.
(The keyring can also be pinned indefinitely by linking it
into another keyring.)
.PP
Typically, the user keyring is created by
.BR pam_keyinit (8)
when a user logs in.
.PP
The user keyring is not searched by default by
.BR request_key (2).
When
.BR pam_keyinit (8)
creates a session keyring, it adds to it a link to the user
keyring so that the user keyring will be searched when the session keyring is.
.PP
A special serial number value,
.BR KEY_SPEC_USER_KEYRING ,
is defined that can be used in lieu of the actual serial number of
the calling process's user keyring.
.PP
From the
.BR keyctl (1)
utility, '\fB@u\fP' can be used instead of a numeric key ID in
much the same way.
.PP
User keyrings are independent of
.BR clone (2),
.BR fork (2),

@ -21,7 +21,7 @@ The user session keyring has a name (description) of the form
where
.I <UID>
is the user ID of the corresponding user.
.PP
The user session keyring is associated with the record that
the kernel maintains for the UID.
It comes into existence upon the first attempt to access either the
@ -34,7 +34,7 @@ The keyring remains pinned in existence so long as there are processes
running with that real UID or files opened by those processes remain open.
(The keyring can also be pinned indefinitely by linking it
into another keyring.)
.PP
The user session keyring is created on demand when a thread requests it
or when a thread asks for its
.BR session-keyring (7)
@ -42,22 +42,22 @@ and that keyring doesn't exist.
In the latter case, a user session keyring will be created and,
if the session keyring wasn't to be created,
the user session keyring will be set as the process's actual session keyring.
.PP
The user session keyring is searched by
.BR request_key (2)
if the actual session keyring does not exist and is ignored otherwise.
.PP
A special serial number value,
.BR KEY_SPEC_USER_SESSION_KEYRING ,
is defined
that can be used in lieu of the actual serial number of
the calling process's user session keyring.
.PP
From the
.BR keyctl (1)
utility, '\fB@us\fP' can be used instead of a numeric key ID in
much the same way.
.PP
User session keyrings are independent of
.BR clone (2),
.BR fork (2),
@ -67,10 +67,10 @@ and
.BR _exit (2)
excepting that the keyring is destroyed when the UID record is destroyed
when the last process pinning it exits.
.PP
If a user session keyring does not exist when it is accessed,
it will be created.
.PP
Rather than relying on the user session keyring,
it is strongly recommended\(emespecially if the process
is running as root\(emthat a

View File

@ -30,7 +30,7 @@ user_namespaces \- overview of Linux user namespaces
.SH DESCRIPTION
For an overview of namespaces, see
.BR namespaces (7).
.PP
User namespaces isolate security-related identifiers and attributes,
in particular,
user IDs and group IDs (see
@ -66,7 +66,7 @@ or
with the
.BR CLONE_NEWUSER
flag.
.PP
The kernel imposes (since version 3.11) a limit of 32 nested levels of
.\" commit 8742f229b635bf1c1c84a3dfe5e47c814c20b5c8
user namespaces.
@ -77,7 +77,7 @@ or
.BR clone (2)
that would cause this limit to be exceeded fail with the error
.BR EUSERS .
.PP
Each process is a member of exactly one user namespace.
A process created via
.BR fork (2)
@ -92,7 +92,7 @@ if it has the
.BR CAP_SYS_ADMIN
in that namespace;
upon doing so, it gains a full set of capabilities in that namespace.
.PP
A call to
.BR clone (2)
or
@ -104,7 +104,7 @@ flag makes the new child process (for
or the caller (for
.BR unshare (2))
a member of the new user namespace created by the call.
.PP
The
.BR NS_GET_PARENT
.BR ioctl (2)
@ -136,7 +136,7 @@ and
user namespace,
even if the new namespace is created or joined by the root user
(i.e., a process with user ID 0 in the root namespace).
.PP
Note that a call to
.BR execve (2)
will cause a process's capabilities to be recalculated in the usual way (see
@ -146,7 +146,7 @@ unless the process has a user ID of 0 within the namespace,
or the executable file has a nonempty inheritable capabilities mask,
the process will lose all capabilities.
See the discussion of user and group ID mappings, below.
.PP
A call to
.BR clone (2),
.BR unshare (2),
@ -171,7 +171,7 @@ retaining its user namespace membership by using a pair of
.BR setns (2)
calls to move to another user namespace and then return to
its original user namespace.
.PP
The rules for determining whether or not a process has a capability
in a particular user namespace are as follows:
.IP 1. 3
@ -222,7 +222,7 @@ only on resources governed by that namespace.
In other words, having a capability in a user namespace permits a process
to perform privileged operations on resources that are governed by (nonuser)
namespaces associated with the user namespace (see the next subsection).
.PP
On the other hand, there are many privileged operations that affect
resources that are not associated with any namespace type,
for example, changing the system time (governed by
@ -234,14 +234,14 @@ and creating a device (governed by
Only a process with privileges in the
.I initial
user namespace can perform such operations.
.PP
Holding
.B CAP_SYS_ADMIN
within the user namespace associated with a process's mount namespace
allows that process to create bind mounts
and mount the following types of filesystems:
.\" fs_flags = FS_USERNS_MOUNT in kernel sources
.PP
.RS 4
.PD 0
.IP * 2
@ -278,7 +278,7 @@ cgroup version 1 named hierarchies
(i.e., cgroup filesystems mounted with the
.BR """none,name="""
option).
.PP
Holding
.B CAP_SYS_ADMIN
within the user namespace associated with a process's PID namespace
@ -286,7 +286,7 @@ allows (since Linux 3.8)
that process to mount
.I /proc
filesystems.
.PP
Note however, that mounting block-based filesystems can be done
only by a process that holds
.BR CAP_SYS_ADMIN
@ -299,13 +299,13 @@ Starting in Linux 3.8, unprivileged processes can create user namespaces,
and the other types of namespaces can be created with just the
.B CAP_SYS_ADMIN
capability in the caller's user namespace.
.PP
When a non-user-namespace is created,
it is owned by the user namespace in which the creating process
was a member at the time of the creation of the namespace.
Actions on the non-user-namespace
require capabilities in the corresponding user namespace.
.PP
If
.BR CLONE_NEWUSER
is specified along with other
@ -322,7 +322,7 @@ or caller
privileges over the remaining namespaces created by the call.
Thus, it is possible for an unprivileged caller to specify this combination
of flags.
.PP
When a new namespace (other than a user namespace) is created via
.BR clone (2)
or
@ -344,7 +344,7 @@ the process's UTS namespace, and check whether the process has the
required capability
.RB ( CAP_SYS_ADMIN )
in that user namespace.
.PP
The
.BR NS_GET_USERNS
.BR ioctl (2)
@ -369,13 +369,13 @@ inside the user namespace for the process
.IR pid .
These files can be read to view the mappings in a user namespace and
written to (once) to define the mappings.
.PP
The description in the following paragraphs explains the details for
.IR uid_map ;
.IR gid_map
is exactly the same,
but each instance of "user ID" is replaced by "group ID".
.PP
The
.I uid_map
file exposes the mapping of user IDs from the user namespace
@ -389,7 +389,7 @@ will potentially see different values when reading from a particular
.I uid_map
file, depending on the user ID mappings for the user namespaces
of the reading processes.
.PP
Each line in the
.I uid_map
file specifies a 1-to-1 mapping of a range of contiguous
@ -441,7 +441,7 @@ System calls that return user IDs (group IDs)\(emfor example,
and the credential fields in the structure returned by
.BR stat (2)\(emreturn
the user ID (group ID) mapped into the caller's user namespace.
.PP
When a process accesses a file, its user and group IDs
are mapped into the initial user namespace for the purpose of permission
checking and assigning IDs when creating a file.
@ -449,7 +449,7 @@ When a process retrieves file user and group IDs via
.BR stat (2),
the IDs are mapped in the opposite direction,
to produce values relative to the process user and group ID mappings.
.PP
The initial user namespace has no parent namespace,
but, for consistency, the kernel provides dummy user and group
ID mapping files for this namespace.
@ -458,14 +458,14 @@ Looking at the
file
.RI ( gid_map
is the same) from a shell in the initial namespace shows:
.PP
.in +4n
.nf
$ \fBcat /proc/$$/uid_map\fP
0 0 4294967295
.fi
.in
.PP
This mapping tells us
that the range starting at user ID 0 in this namespace
maps to a range starting at 0 in the (nonexistent) parent namespace,
@ -499,7 +499,7 @@ file in a user namespace fails with the error
Similar rules apply for
.I gid_map
files.
.PP
The lines written to
.IR uid_map
.RI ( gid_map )
@ -540,7 +540,7 @@ At least one line must be written to the file.
.PP
Writes that violate the above rules fail with the error
.BR EINVAL .
.PP
In order for a process to write to the
.I /proc/[pid]/uid_map
.RI ( /proc/[pid]/gid_map )
@ -623,7 +623,7 @@ and
.I gid_map
files have been written, only the mapped values may be used in
system calls that change user and group IDs.
.PP
For user IDs, the relevant system calls include
.BR setuid (2),
.BR setfsuid (2),
@ -637,7 +637,7 @@ For group IDs, the relevant system calls include
.BR setresgid (2),
and
.BR setgroups (2).
.PP
Writing
.RI \(dq deny \(dq
to the
@ -685,7 +685,7 @@ file (and regardless of the process's capabilities), calls to
are also not permitted if
.IR /proc/[pid]/gid_map
has not yet been set.
.PP
A privileged process (one with the
.BR CAP_SYS_ADMIN
capability in the namespace) may write either of the strings
@ -701,7 +701,7 @@ Writing the string
.RI \(dq deny \(dq
prevents any process in the user namespace from employing
.BR setgroups (2).
.PP
The essence of the restrictions described in the preceding
paragraph is that it is permitted to write to
.I /proc/[pid]/setgroups
@ -720,10 +720,10 @@ a process can transition only from
being disallowed to
.BR setgroups (2)
being allowed.
.PP
The default value of this file in the initial user namespace is
.RI \(dq allow \(dq.
.PP
Once
.IR /proc/[pid]/gid_map
has been written to
@ -738,11 +738,11 @@ to
.IR /proc/[pid]/setgroups
(the write fails with the error
.BR EPERM ).
.PP
A child user namespace inherits the
.IR /proc/[pid]/setgroups
setting from its parent.
.PP
If the
.I setgroups
file has the value
@ -756,7 +756,7 @@ to the file) in this user namespace.
.BR EPERM .)
This restriction also propagates down to all child user namespaces of
this user namespace.
.PP
The
.I /proc/[pid]/setgroups
file was added in Linux 3.19,
@ -815,7 +815,7 @@ and
.IR /proc/sys/kernel/overflowgid
in
.BR proc (5).
.PP
The cases where unmapped IDs are mapped in this fashion include
system calls that return user IDs
.RB ( getuid (2),
@ -843,7 +843,7 @@ credentials written to the process accounting file (see
.BR acct (5)),
and credentials returned with POSIX message queue notifications (see
.BR mq_notify (3)).
.PP
There is one notable case where unmapped user and group IDs are
.I not
.\" from_kuid(), from_kgid()
@ -909,7 +909,7 @@ User namespaces require support in a range of subsystems across
the kernel.
When an unsupported subsystem is configured into the kernel,
it is not possible to configure user namespaces support.
.PP
As at Linux 3.8, most relevant subsystems supported user namespaces,
but a number of filesystems did not have the infrastructure needed
to map user and group IDs between user namespaces.
@ -929,9 +929,9 @@ The comments and
.I usage()
function inside the program provide a full explanation of the program.
The following shell session demonstrates its use.
.PP
First, we look at the run-time environment:
.PP
.in +4n
.nf
$ \fBuname \-rs\fP # Need Linux 3.8 or later
@ -942,7 +942,7 @@ $ \fBid \-g\fP
1000
.fi
.in
.PP
Now start a new shell in new user
.RI ( \-U ),
mount
@ -954,16 +954,16 @@ namespaces, with user ID
and group ID
.RI ( \-G )
1000 mapped to 0 inside the user namespace:
.PP
.in +4n
.nf
$ \fB./userns_child_exec \-p \-m \-U \-M '0 1000 1' \-G '0 1000 1' bash\fP
.fi
.in
.PP
The shell has PID 1, because it is the first process in the new
PID namespace:
.PP
.in +4n
.nf
bash$ \fBecho $$\fP
@ -975,7 +975,7 @@ Mounting a new
filesystem and listing all of the processes visible
in the new PID namespace shows that the shell can't see
any processes outside the PID namespace:
.PP
.in +4n
.nf
bash$ \fBmount \-t proc proc /proc\fP
@ -985,10 +985,10 @@ bash$ \fBps ax\fP
22 pts/3 R+ 0:00 ps ax
.fi
.in
.PP
Inside the user namespace, the shell has user and group ID 0,
and a full set of permitted and effective capabilities:
.PP
.in +4n
.nf
bash$ \fBcat /proc/$$/status | egrep '^[UG]id'\fP
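
Reviewer's illustration, not part of the commit: the user_namespaces.7 hunks above describe each uid_map line as three numbers (first ID inside the namespace, first ID outside it, range length). The sketch below shows one way that translation could work in both directions. The function names are hypothetical, and the fallback value 65534 is assumed to be the default of /proc/sys/kernel/overflowuid. It is a user-space model of the rule, not the kernel's implementation.

```python
# Sketch: translate IDs through uid_map-style lines.
# Line format per user_namespaces(7): "ID-inside-ns ID-outside-ns length".
OVERFLOW_ID = 65534  # assumption: default value of /proc/sys/kernel/overflowuid


def parse_id_map(text):
    """Parse uid_map/gid_map text into a list of (inside, outside, length)."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError("expected three fields: %r" % line)
        inside, outside, length = (int(f) for f in fields)
        ranges.append((inside, outside, length))
    return ranges


def map_to_outside(ranges, inside_id):
    """Map an ID seen inside the namespace to the outside ID, or None."""
    for inside, outside, length in ranges:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None  # no mapping covers this ID


def map_to_inside(ranges, outside_id):
    """Map an outside ID to the value a process inside would see.

    Unmapped IDs are reported as the overflow ID (assumed default above).
    """
    for inside, outside, length in ranges:
        if outside <= outside_id < outside + length:
            return inside + (outside_id - outside)
    return OVERFLOW_ID


if __name__ == "__main__":
    # Same mapping as the "-M '0 1000 1'" option in the shell session above.
    ranges = parse_id_map("0 1000 1\n")
    print(map_to_outside(ranges, 0))
    print(map_to_inside(ranges, 1000))
    print(map_to_inside(ranges, 1001))
```

With the single-line map `0 1000 1`, only outside ID 1000 has a mapping; every other outside ID falls back to the overflow value.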

@ -46,7 +46,7 @@ The ISO 10646 Universal Character Set (UCS),
a superset of Unicode, occupies an even larger code
space\(em31\ bits\(emand the obvious
UCS-4 encoding for it (a sequence of 32-bit words) has the same problems.
.PP
The UTF-8 encoding of Unicode and UCS
does not have these problems and is the common way in which
Unicode is used on UNIX-style operating systems.

@ -144,7 +144,7 @@ The list of attribute names that
can be returned is also limited to 64 kB
(see BUGS in
.BR listxattr (2)).
.PP
Some filesystems, such as Reiserfs (and, historically, ext2 and ext3),
require the filesystem to be mounted with the
.B user_xattr
@ -160,10 +160,10 @@ In the Btrfs, XFS, and Reiserfs filesystem implementations, there is no
practical limit on the number of extended attributes
associated with a file, and the algorithms used to store extended
attribute information on disk are scalable.
.PP
In the JFS, XFS, and Reiserfs filesystem implementations,
the limit on bytes used in an EA value is the ceiling imposed by the VFS.
.PP
In the Btrfs filesystem implementation,
the total bytes used for the name, value, and implementation overhead bytes
is limited to the filesystem
@ -177,7 +177,7 @@ Since the filesystems on which extended attributes are stored might also
be used on architectures with a different byte order and machine word
size, care should be taken to store attribute values in an
architecture-independent format.
.PP
This page was formerly named
.BR attr (5).
.\" .SH AUTHORS