mirror of https://github.com/mkerrisk/man-pages
aio.7, arp.7, attributes.7, boot.7, cgroups.7, cpuset.7, credentials.7, fanotify.7, fifo.7, glob.7, hier.7, hostname.7, icmp.7, inode.7, inotify.7, keyrings.7, libc.7, mailaddr.7, mount_namespaces.7, mq_overview.7, nptl.7, numa.7, path_resolution.7, persistent-keyring.7, pid_namespaces.7, pipe.7, pkeys.7, process-keyring.7, pthreads.7, pty.7, random.7, sched.7, sem_overview.7, session-keyring.7, shm_overview.7, signal-safety.7, signal.7, spufs.7, standards.7, symlink.7, termio.7, thread-keyring.7, time.7, unicode.7, user-keyring.7, user-session-keyring.7, user_namespaces.7, utf-8.7, xattr.7: ffix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
parent 38db2ef4d0
commit a721e8b25f
14
man7/aio.7
|
@ -34,7 +34,7 @@ The application can elect to be notified of completion of
|
|||
the I/O operation in a variety of ways:
|
||||
by delivery of a signal, by instantiation of a thread,
|
||||
or no notification at all.
|
||||
|
||||
.PP
|
||||
The POSIX AIO interface consists of the following functions:
|
||||
.TP 16
|
||||
.BR aio_read (3)
|
||||
|
@ -171,11 +171,11 @@ The control block buffer and the buffer pointed to by
|
|||
.I aio_buf
|
||||
must not be changed while the I/O operation is in progress.
|
||||
These buffers must remain valid until the I/O operation completes.
|
||||
|
||||
.PP
|
||||
Simultaneous asynchronous read or write operations using the same
|
||||
.I aiocb
|
||||
structure yield undefined results.
|
||||
|
||||
.PP
|
||||
The current Linux POSIX AIO implementation is provided in user space by glibc.
|
||||
This has a number of limitations, most notably that maintaining multiple
|
||||
threads to perform I/O operations is expensive and scales poorly.
|
||||
|
@ -206,18 +206,18 @@ of a signal.
|
|||
After all I/O requests have completed,
|
||||
the program retrieves their status using
|
||||
.BR aio_return (3).
|
||||
|
||||
.PP
|
||||
The
|
||||
.B SIGQUIT
|
||||
signal (generated by typing control-\\) causes the program to request
|
||||
cancellation of each of the outstanding requests using
|
||||
.BR aio_cancel (3).
|
||||
|
||||
.PP
|
||||
Here is an example of what we might see when running this program.
|
||||
In this example, the program queues two requests to standard input,
|
||||
and these are satisfied by two lines of input containing
|
||||
"abc" and "x".
|
||||
|
||||
.PP
|
||||
.in +4n
|
||||
.nf
|
||||
$ \fB./a.out /dev/stdin /dev/stdin\fP
|
||||
|
@ -462,7 +462,7 @@ main(int argc, char *argv[])
|
|||
.BR aio_return (3),
|
||||
.BR aio_write (3),
|
||||
.BR lio_listio (3)
|
||||
|
||||
.PP
|
||||
"Asynchronous I/O Support in Linux 2.5",
|
||||
Bhattacharya, Pratt, Pulavarty, and Morgan,
|
||||
Proceedings of the Linux Symposium, 2003,
|
||||
|
|
22
man7/arp.7
|
@ -21,7 +21,7 @@ and IPv4 protocol addresses on directly connected networks.
|
|||
The user normally doesn't interact directly with this module except to
|
||||
configure it;
|
||||
instead it provides a service for other protocols in the kernel.
|
||||
|
||||
.PP
|
||||
A user process can receive ARP packets by using
|
||||
.BR packet (7)
|
||||
sockets.
|
||||
|
@ -34,7 +34,7 @@ The ARP table can also be controlled via
|
|||
on any
|
||||
.B AF_INET
|
||||
socket.
|
||||
|
||||
.PP
|
||||
The ARP module maintains a cache of mappings between hardware addresses
|
||||
and protocol addresses.
|
||||
The cache has a limited size so old and less
|
||||
|
@ -46,7 +46,7 @@ be directly manipulated by the use of ioctls and its behavior can be
|
|||
tuned by the
|
||||
.I /proc
|
||||
interfaces described below.
|
||||
|
||||
.PP
|
||||
When there is no positive feedback for an existing mapping after some
|
||||
time (see the
|
||||
.I /proc
|
||||
|
@ -69,7 +69,7 @@ If that fails too, it will broadcast a new ARP
|
|||
request to the network.
|
||||
Requests are sent only when there is data queued
|
||||
for sending.
|
||||
|
||||
.PP
|
||||
Linux will automatically add a nonpermanent proxy arp entry when it
|
||||
receives a request for an address it forwards to and proxy arp is
|
||||
enabled on the receiving interface.
|
||||
|
@ -81,7 +81,7 @@ sockets.
|
|||
They take a pointer to a
|
||||
.I struct arpreq
|
||||
as their argument.
|
||||
|
||||
.PP
|
||||
.in +4n
|
||||
.nf
|
||||
struct arpreq {
|
||||
|
@ -93,14 +93,14 @@ struct arpreq {
|
|||
};
|
||||
.fi
|
||||
.in
|
||||
|
||||
.PP
|
||||
.BR SIOCSARP ", " SIOCDARP " and " SIOCGARP
|
||||
respectively set, delete and get an ARP mapping.
|
||||
Setting and deleting ARP maps are privileged operations and may
|
||||
be performed only by a process with the
|
||||
.B CAP_NET_ADMIN
|
||||
capability or an effective UID of 0.
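As a rough illustration of the ioctl interface described above, the following C sketch prepares a struct arpreq and issues SIOCGARP on an AF_INET socket. It assumes a Linux system with the glibc headers; the helper names fill_arpreq() and get_arp_entry() are hypothetical and are not part of any library.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <net/if_arp.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: prepare *req for a lookup of the dotted-quad
   address `ipv4` on the interface `dev`.  Returns 0 on success, or -1
   if the address does not parse or the device name is too long. */
int
fill_arpreq(struct arpreq *req, const char *ipv4, const char *dev)
{
    struct sockaddr_in sin;

    assert(req != NULL && ipv4 != NULL && dev != NULL);

    memset(req, 0, sizeof(*req));
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;            /* arp_pa must be AF_INET */
    if (inet_pton(AF_INET, ipv4, &sin.sin_addr) != 1)
        return -1;
    memcpy(&req->arp_pa, &sin, sizeof(sin));

    if (strlen(dev) >= sizeof(req->arp_dev))
        return -1;
    strcpy(req->arp_dev, dev);
    return 0;
}

/* Hypothetical helper: ask the kernel for the cache entry described by
   *req.  On success the hardware address is returned in req->arp_ha.
   Returns the result of ioctl(2): 0 on success, -1 with errno set. */
int
get_arp_entry(struct arpreq *req)
{
    int fd, rc, saved_errno;

    assert(req != NULL);

    fd = socket(AF_INET, SOCK_DGRAM, 0); /* any AF_INET socket will do */
    if (fd == -1)
        return -1;
    rc = ioctl(fd, SIOCGARP, req);
    saved_errno = errno;
    close(fd);
    errno = saved_errno;                 /* keep the ioctl's errno */
    return rc;
}
```

Reading an entry with SIOCGARP needs no privilege in this sketch; the same structure passed to SIOCSARP or SIOCDARP would require CAP_NET_ADMIN, as stated above.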
|
||||
|
||||
.PP
|
||||
.I arp_pa
|
||||
must be an
|
||||
.B AF_INET
|
||||
|
@ -276,13 +276,13 @@ changed in Linux 2.0 to include the
|
|||
.I arp_dev
|
||||
member and the ioctl numbers changed at the same time.
|
||||
Support for the old ioctls was dropped in Linux 2.2.
|
||||
|
||||
.PP
|
||||
Support for proxy arp entries for networks (netmask not equal to 0xffffffff)
|
||||
was dropped in Linux 2.2.
|
||||
It is replaced by automatic proxy arp setup by
|
||||
the kernel for all reachable hosts on other interfaces (when
|
||||
forwarding and proxy arp are enabled for the interface).
|
||||
|
||||
.PP
|
||||
The
|
||||
.I neigh/*
|
||||
interfaces did not exist before Linux 2.2.
|
||||
|
@ -290,13 +290,13 @@ interfaces did not exist before Linux 2.2.
|
|||
Some timer settings are specified in jiffies, which is architecture-
|
||||
and kernel version-dependent; see
|
||||
.BR time (7).
|
||||
|
||||
.PP
|
||||
There is no way to signal positive feedback from user space.
|
||||
This means connection-oriented protocols implemented in user space
|
||||
will generate excessive ARP traffic, because ndisc will regularly
|
||||
reprobe the MAC address.
|
||||
The same problem applies for some kernel protocols (e.g., NFS over UDP).
|
||||
|
||||
.PP
|
||||
This man page mashes together functionality that is IPv4-specific
|
||||
with functionality that is shared between IPv4 and IPv6.
|
||||
.SH SEE ALSO
|
||||
|
|
|
@ -32,7 +32,7 @@ the text of this man page is based on the material taken from
|
|||
the "POSIX Safety Concepts" section of the GNU C Library manual.
|
||||
Further details on the topics described here can be found in that
|
||||
manual.
|
||||
|
||||
.PP
|
||||
Various function manual pages include a section ATTRIBUTES
|
||||
that describes the safety of calling the function in various contexts.
|
||||
This section annotates functions with the following safety markings:
|
||||
|
@ -43,7 +43,7 @@ or
|
|||
Thread-Safe functions are safe to call in the presence
|
||||
of other threads.
|
||||
MT, in MT-Safe, stands for Multi Thread.
|
||||
|
||||
.IP
|
||||
Being MT-Safe does not imply a function is atomic, nor that it uses any
|
||||
of the memory synchronization mechanisms POSIX exposes to users.
|
||||
It is even possible that calling MT-Safe functions in sequence
|
||||
|
@ -52,7 +52,7 @@ For example, having a thread call two MT-Safe
|
|||
functions one right after the other does not guarantee behavior
|
||||
equivalent to atomic execution of a combination of both functions,
|
||||
since concurrent calls in other threads may interfere in a destructive way.
|
||||
|
||||
.IP
|
||||
Whole-program optimizations that could inline functions across library
|
||||
interfaces may expose unsafe reordering, and so performing inlining
|
||||
across the GNU C Library interface is not recommended.
|
||||
|
@ -340,7 +340,7 @@ Functions marked with
|
|||
.I init
|
||||
as an MT-Unsafe feature perform
|
||||
MT-Unsafe initialization when they are first called.
|
||||
|
||||
.IP
|
||||
Calling such a function at least once in single-threaded mode removes
|
||||
this specific cause for the function to be regarded as MT-Unsafe.
|
||||
If no other cause for that remains,
|
||||
|
@ -517,7 +517,7 @@ modify enables readers to be regarded as MT-Safe \" and AS-Safe
|
|||
(as long as no other reasons for them to be unsafe remain),
|
||||
since the lack of synchronization is not a problem when the
|
||||
objects are effectively constant.
|
||||
|
||||
.IP
|
||||
The identifier that follows the
|
||||
.I const
|
||||
mark will appear by itself as a safety note in readers.
|
||||
|
@ -556,7 +556,7 @@ as a MT-Safety issue
|
|||
may temporarily install a signal handler for internal purposes,
|
||||
which may interfere with other uses of the signal,
|
||||
identified after a colon.
|
||||
|
||||
.IP
|
||||
This safety problem can be worked around by ensuring that no other uses
|
||||
of the signal will take place for the duration of the call.
|
||||
Holding a non-recursive mutex while calling all functions that use the same
|
||||
|
@ -594,7 +594,7 @@ are MT-Unsafe.
|
|||
.\" The same window enables changes made by asynchronous signals to be lost.
|
||||
.\" These functions are also AS-Unsafe,
|
||||
.\" but the corresponding mark is omitted as redundant.
|
||||
|
||||
.IP
|
||||
It is thus advisable for applications using the terminal to avoid
|
||||
concurrent and reentrant interactions with it,
|
||||
by not using it in signal handlers or blocking signals that might use it,
|
||||
|
@ -645,7 +645,7 @@ annotated with
|
|||
called concurrently with locale changes may
|
||||
behave in ways that do not correspond to any of the locales active
|
||||
during their execution, but an unpredictable mix thereof.
|
||||
|
||||
.IP
|
||||
We do not mark these functions as MT-Unsafe, \" or AS-Unsafe,
|
||||
however,
|
||||
because functions that modify the locale object are marked with
|
||||
|
@ -677,7 +677,7 @@ environment with
|
|||
.BR getenv (3)
|
||||
or similar, without any guards to ensure
|
||||
safety in the presence of concurrent modifications.
|
||||
|
||||
.IP
|
||||
We do not mark these functions as MT-Unsafe, \" or AS-Unsafe,
|
||||
however,
|
||||
because functions that modify the environment are all marked with
|
||||
|
@ -716,7 +716,7 @@ GNU C Library
|
|||
.I _sigintr
|
||||
internal data structure without any guards to ensure
|
||||
safety in the presence of concurrent modifications.
|
||||
|
||||
.IP
|
||||
We do not mark these functions as MT-Unsafe, \" or AS-Unsafe,
|
||||
however,
|
||||
because functions that modify this data structure are all marked with
|
||||
|
@ -797,7 +797,7 @@ as an MT-Safety issue may temporarily
|
|||
change the current working directory during their execution,
|
||||
which may cause relative pathnames to be resolved in unexpected ways in
|
||||
other threads or within asynchronous signal or cancellation handlers.
|
||||
|
||||
.IP
|
||||
This is not enough of a reason to mark so-marked functions as MT-Unsafe,
|
||||
.\" or AS-Unsafe,
|
||||
but when this behavior is optional (e.g.,
|
||||
|
@ -836,7 +836,7 @@ It is envisioned that it may be applied to
|
|||
and
|
||||
.I corrupt
|
||||
as well in the future.
|
||||
|
||||
.IP
|
||||
In most cases, the identifier will name a set of functions,
|
||||
but it may name global objects or function arguments,
|
||||
or identifiable properties or logical components associated with them,
|
||||
|
@ -848,7 +848,7 @@ or
|
|||
.I :tcattr(fd)
|
||||
to denote the terminal attributes of a file descriptor
|
||||
.IR fd .
|
||||
|
||||
.IP
|
||||
The most common use for identifiers is to provide logical groups of
|
||||
functions and arguments that need to be protected by the same
|
||||
synchronization primitive in order to ensure safe operation in a given
|
||||
|
@ -874,7 +874,7 @@ indicate the preceding marker only applies when argument
|
|||
is NULL, or global variable
|
||||
.I one_per_line
|
||||
is nonzero.
|
||||
|
||||
.IP
|
||||
When all marks that render a function unsafe are
|
||||
adorned with such conditions,
|
||||
and none of the named conditions hold,
|
||||
|
|
28
man7/boot.7
|
@ -37,7 +37,7 @@ After power-on or hard reset, control is given
|
|||
to a program stored in read-only memory (normally
|
||||
PROM); for historical reasons involving the personal
|
||||
computer, this program is often called "the \fBBIOS\fR".
|
||||
|
||||
.PP
|
||||
This program normally performs a basic self-test of the
|
||||
machine and accesses nonvolatile memory to read
|
||||
further parameters.
|
||||
|
@ -46,7 +46,7 @@ battery-backed CMOS memory, so most people
|
|||
refer to it as "the \fBCMOS\fR"; outside
|
||||
of the PC world, it is usually called "the \fBNVRAM\fR"
|
||||
(nonvolatile RAM).
|
||||
|
||||
.PP
|
||||
The parameters stored in the NVRAM vary among
|
||||
systems, but as a minimum, they should specify
|
||||
which device can supply an OS loader, or at least which
|
||||
|
@ -67,11 +67,11 @@ interactive use, in order to enable specification of an alternative
|
|||
kernel (maybe a backup in case the one last compiled
|
||||
isn't functioning) and to pass optional parameters
|
||||
to the kernel.
|
||||
|
||||
.PP
|
||||
In a traditional PC, the OS loader is located in the initial 512-byte block
|
||||
of the boot device; this block is known as "the \fBMBR\fR"
|
||||
(Master Boot Record).
|
||||
|
||||
.PP
|
||||
In most systems, the OS loader is very
|
||||
limited due to various constraints.
|
||||
Even on non-PC systems,
|
||||
|
@ -79,12 +79,12 @@ there are some limitations on the size and complexity
|
|||
of this loader, but the size limitation of the PC MBR
|
||||
(512 bytes, including the partition table) makes it
|
||||
almost impossible to squeeze much functionality into it.
|
||||
|
||||
.PP
|
||||
Therefore, most systems split the role of loading the OS between
|
||||
a primary OS loader and a secondary OS loader; this secondary
|
||||
OS loader may be located within a larger portion of persistent
|
||||
storage, such as a disk partition.
|
||||
|
||||
.PP
|
||||
In Linux, the OS loader is often either
|
||||
.BR lilo (8)
|
||||
or
|
||||
|
@ -98,13 +98,13 @@ The kernel starts the virtual memory
|
|||
swapper (it is a kernel process, called "kswapd" in a modern Linux
|
||||
kernel), and mounts some filesystem at the root path,
|
||||
.IR / .
|
||||
|
||||
.PP
|
||||
Some of the parameters that may be passed to the kernel
|
||||
relate to these activities (for example, the default root filesystem
|
||||
can be overridden); for further information
|
||||
on Linux kernel parameters, read
|
||||
.BR bootparam (7).
|
||||
|
||||
.PP
|
||||
Only then does the kernel create the initial userland
|
||||
process, which is given the number 1 as its
|
||||
.B PID
|
||||
|
@ -136,13 +136,13 @@ the administrator an easy way to establish an environment
|
|||
for some usage; each run-level is associated with a set of services
|
||||
(for example, run-level \fBS\fR is \fIsingle-user\fR mode,
|
||||
and run-level \fB2\fR entails running most network services).
|
||||
|
||||
.PP
|
||||
The administrator may change the current
|
||||
run-level via
|
||||
.BR init (1),
|
||||
and query the current run-level via
|
||||
.BR runlevel (8).
|
||||
|
||||
.PP
|
||||
However, since it is not convenient to manage individual services
|
||||
by editing this file,
|
||||
.I /etc/inittab
|
||||
|
@ -174,7 +174,7 @@ of the form \fI/etc/rc[0\-6S].d\fR.
|
|||
In each of these directories,
|
||||
there are links (usually symbolic) to the scripts in the \fI/etc/init.d\fR
|
||||
directory.
|
||||
|
||||
.PP
|
||||
A primary script (usually \fI/etc/rc\fR) is called from
|
||||
.BR inittab (5);
|
||||
this primary script calls each service's script via a link in the
|
||||
|
@ -183,7 +183,7 @@ Each link whose name begins with \(aqS\(aq is called with
|
|||
the argument "start" (thereby starting the service).
|
||||
Each link whose name begins with \(aqK\(aq is called with
|
||||
the argument "stop" (thereby stopping the service).
|
||||
|
||||
.PP
|
||||
To define the starting or stopping order within the same run-level,
|
||||
the name of a link contains an \fBorder-number\fR.
|
||||
Also, for clarity, the name of a link usually
|
||||
|
@ -193,7 +193,7 @@ the link \fI/etc/rc2.d/S80sendmail\fR starts the sendmail service on
|
|||
runlevel 2.
|
||||
This happens after \fI/etc/rc2.d/S12syslog\fR is run
|
||||
but before \fI/etc/rc2.d/S90xfs\fR is run.
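The link-naming convention described above can be sketched in C as follows. This is only an illustration: it assumes a two-digit order-number directly after the 'S' or 'K' (as in the S12syslog, S80sendmail, and S90xfs examples), and the type and function names are hypothetical. Breaking ties between equal order-numbers by service name is likewise an assumption of the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decomposition of a link name such as "S80sendmail". */
struct rc_link {
    char action;          /* 'S' (start) or 'K' (stop) */
    int order;            /* order-number, 0..99 */
    const char *service;  /* points into the name that was parsed */
};

/* Parse `name` into *out.  Returns 0 on success, or -1 if the name does
   not follow the assumed <S|K><two digits><service> pattern. */
int
parse_rc_link(const char *name, struct rc_link *out)
{
    assert(name != NULL && out != NULL);

    if (name[0] != 'S' && name[0] != 'K')
        return -1;
    if (!isdigit((unsigned char) name[1]) ||
        !isdigit((unsigned char) name[2]))
        return -1;
    if (name[3] == '\0')
        return -1;

    out->action = name[0];
    out->order = (name[1] - '0') * 10 + (name[2] - '0');
    out->service = name + 3;
    return 0;
}

/* The argument passed to the service script for this link. */
const char *
rc_link_argument(const struct rc_link *link)
{
    assert(link != NULL);
    return link->action == 'S' ? "start" : "stop";
}

/* Order two links of the same run-level: lower order-number first;
   ties are broken by service name (an assumption of this sketch). */
int
rc_link_compare(const struct rc_link *a, const struct rc_link *b)
{
    assert(a != NULL && b != NULL);
    if (a->order != b->order)
        return (a->order > b->order) - (a->order < b->order);
    return strcmp(a->service, b->service);
}
```

Sorting the parsed links of one directory with rc_link_compare() reproduces the order given in the text: syslog, then sendmail, then xfs.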
|
||||
|
||||
.PP
|
||||
To manage these links is to manage the boot order and run-levels;
|
||||
under many systems, there are tools to help with this task
|
||||
(e.g.,
|
||||
|
@ -207,7 +207,7 @@ inputs without editing an entire boot script,
|
|||
some separate configuration file is used, and is located in a specific
|
||||
directory where an associated boot script may find it
|
||||
(\fI/etc/sysconfig\fR on older Red Hat systems).
|
||||
|
||||
.PP
|
||||
In older UNIX systems, such a file contained the actual command line
|
||||
options for a daemon, but in modern Linux systems (and also
|
||||
in HP-UX), it just contains shell variables.
|
||||
|
|
106
man7/cgroups.7
|
@ -42,7 +42,7 @@ A
|
|||
.I cgroup
|
||||
is a collection of processes that are bound to a set of
|
||||
limits or parameters defined via the cgroup filesystem.
|
||||
|
||||
.PP
|
||||
A
|
||||
.I subsystem
|
||||
is a kernel component that modifies the behavior of
|
||||
|
@ -54,7 +54,7 @@ and freezing and resuming execution of the processes in a cgroup.
|
|||
Subsystems are sometimes also known as
|
||||
.IR "resource controllers"
|
||||
(or simply, controllers).
|
||||
|
||||
.PP
|
||||
The cgroups for a controller are arranged in a
|
||||
.IR hierarchy .
|
||||
This hierarchy is defined by creating, removing, and
|
||||
|
@ -77,7 +77,7 @@ and management of the cgroup hierarchies became rather complex.
|
|||
(A longer description of these problems can be found in
|
||||
the kernel source file
|
||||
.IR Documentation/cgroup\-v2.txt .)
|
||||
|
||||
.PP
|
||||
Because of the problems with the initial cgroups implementation
|
||||
(cgroups version 1),
|
||||
starting in Linux 3.10, work began on a new,
|
||||
|
@ -87,7 +87,7 @@ Initially marked experimental, and hidden behind the
|
|||
mount option, the new version (cgroups version 2)
|
||||
was eventually made official with the release of Linux 4.5.
|
||||
Differences between the two versions are described in the text below.
|
||||
|
||||
.PP
|
||||
Although cgroups v2 is intended as a replacement for cgroups v1,
|
||||
the older system continues to exist
|
||||
(and for compatibility reasons is unlikely to be removed).
|
||||
|
@ -109,7 +109,7 @@ processes on the system.
|
|||
It is also possible to comount multiple (or even all) cgroups v1 controllers
|
||||
against the same cgroup filesystem, meaning that the comounted controllers
|
||||
manage the same hierarchical organization of processes.
|
||||
|
||||
.PP
|
||||
For each mounted hierarchy,
|
||||
the directory tree mirrors the control group hierarchy.
|
||||
Each control group is represented by a directory, with each of its child
|
||||
|
@ -125,7 +125,7 @@ which is a child of
|
|||
Under each cgroup directory is a set of files which can be read or
|
||||
written to, reflecting resource limits and a few general cgroup
|
||||
properties.
|
||||
|
||||
.PP
|
||||
In addition, in cgroups v1,
|
||||
cgroups can be mounted with no bound controller, in which case
|
||||
they serve only to track processes.
|
||||
|
@ -160,7 +160,7 @@ The use of cgroups requires a kernel built with the
|
|||
option.
|
||||
In addition, each of the v1 controllers has an associated
|
||||
configuration option that must be set in order to employ that controller.
|
||||
|
||||
.PP
|
||||
In order to use a v1 controller,
|
||||
it must be mounted against a cgroup filesystem.
|
||||
The usual place for such mounts is under a
|
||||
|
@ -170,26 +170,26 @@ filesystem mounted at
|
|||
Thus, one might mount the
|
||||
.I cpu
|
||||
controller as follows:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
mount \-t cgroup \-o cpu none /sys/fs/cgroup/cpu
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
It is possible to comount multiple controllers against the same hierarchy.
|
||||
For example, here the
|
||||
.IR cpu
|
||||
and
|
||||
.IR cpuacct
|
||||
controllers are comounted against a single hierarchy:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
mount \-t cgroup \-o cpu,cpuacct none /sys/fs/cgroup/cpu,cpuacct
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
Comounting controllers has the effect that a process is in the same cgroup for
|
||||
all of the comounted controllers.
|
||||
Separately mounting controllers allows a process to
|
||||
|
@ -198,19 +198,19 @@ be in cgroup
|
|||
for one controller while being in
|
||||
.I /foo2/foo3
|
||||
for another.
|
||||
|
||||
.PP
|
||||
It is possible to comount all v1 controllers against the same hierarchy:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
mount \-t cgroup \-o all cgroup /sys/fs/cgroup
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
(One can achieve the same result by omitting
|
||||
.IR "\-o all" ,
|
||||
since it is the default if no controllers are explicitly specified.)
|
||||
|
||||
.PP
|
||||
It is not possible to mount the same controller
|
||||
against multiple cgroup hierarchies.
|
||||
For example, it is not possible to mount both the
|
||||
|
@ -224,7 +224,7 @@ It is possible to create multiple mount points with exactly
|
|||
the same set of comounted controllers.
|
||||
However, in this case all that results is multiple mount points
|
||||
providing a view of the same hierarchy.
|
||||
|
||||
.PP
|
||||
Note that on many systems, the v1 controllers are automatically mounted under
|
||||
.IR /sys/fs/cgroup ;
|
||||
in particular,
|
||||
|
@ -244,7 +244,7 @@ when a system is busy.
|
|||
This does not limit a cgroup's CPU usage if the CPUs are not busy.
|
||||
For further information, see
|
||||
.IR Documentation/scheduler/sched-design-CFS.txt .
|
||||
|
||||
.IP
|
||||
In Linux 3.2,
|
||||
this controller was extended to provide CPU "bandwidth" control.
|
||||
If the kernel is configured with
|
||||
|
@ -258,21 +258,21 @@ Further information can be found in the kernel source file
|
|||
.TP
|
||||
.IR cpuacct " (since Linux 2.6.24; " \fBCONFIG_CGROUP_CPUACCT\fP )
|
||||
This provides accounting for CPU usage by groups of processes.
|
||||
|
||||
.IP
|
||||
Further information can be found in the kernel source file
|
||||
.IR Documentation/cgroup\-v1/cpuacct.txt .
|
||||
.TP
|
||||
.IR cpuset " (since Linux 2.6.24; " \fBCONFIG_CPUSETS\fP )
|
||||
This cgroup can be used to bind the processes in a cgroup to
|
||||
a specified set of CPUs and NUMA nodes.
|
||||
|
||||
.IP
|
||||
Further information can be found in the kernel source file
|
||||
.IR Documentation/cgroup\-v1/cpusets.txt .
|
||||
.TP
|
||||
.IR memory " (since Linux 2.6.25; " \fBCONFIG_MEMCG\fP )
|
||||
The memory controller supports reporting and limiting of process memory, kernel
|
||||
memory, and swap used by cgroups.
|
||||
|
||||
.IP
|
||||
Further information can be found in the kernel source file
|
||||
.IR Documentation/cgroup\-v1/memory.txt .
|
||||
.TP
|
||||
|
@ -282,7 +282,7 @@ well as open them for reading or writing.
|
|||
The policies may be specified as whitelists and blacklists.
|
||||
Hierarchy is enforced, so new rules must not
|
||||
violate existing rules for the target or ancestor cgroups.
|
||||
|
||||
.IP
|
||||
Further information can be found in the kernel source file
|
||||
.IR Documentation/cgroup-v1/devices.txt .
|
||||
.TP
|
||||
|
@ -295,7 +295,7 @@ Freezing a cgroup
|
|||
also causes its children, for example, processes in
|
||||
.IR /A/B ,
|
||||
to be frozen.
|
||||
|
||||
.IP
|
||||
Further information can be found in the kernel source file
|
||||
.IR Documentation/cgroup-v1/freezer-subsystem.txt .
|
||||
.TP
|
||||
|
@ -307,7 +307,7 @@ as well as used to shape traffic using
|
|||
.BR tc (8).
|
||||
This applies only to packets
|
||||
leaving the cgroup, not to traffic arriving at the cgroup.
|
||||
|
||||
.IP
|
||||
Further information can be found in the kernel source file
|
||||
.IR Documentation/cgroup-v1/net_cls.txt .
|
||||
.TP
|
||||
|
@ -317,14 +317,14 @@ The
|
|||
cgroup controls and limits access to specified block devices by
|
||||
applying IO control in the form of throttling and upper limits against leaf
|
||||
nodes and intermediate nodes in the storage hierarchy.
|
||||
|
||||
.IP
|
||||
Two policies are available.
|
||||
The first is a proportional-weight time-based division
|
||||
of disk implemented with CFQ.
|
||||
This is in effect for leaf nodes using CFQ.
|
||||
The second is a throttling policy which specifies
|
||||
upper I/O rate limits on a device.
|
||||
|
||||
.IP
|
||||
Further information can be found in the kernel source file
|
||||
.IR Documentation/cgroup-v1/blkio-controller.txt .
|
||||
.TP
|
||||
|
@ -332,26 +332,26 @@ Further information can be found in the kernel source file
|
|||
This controller allows
|
||||
.I perf
|
||||
monitoring of the set of processes grouped in a cgroup.
|
||||
|
||||
.IP
|
||||
Further information can be found in the kernel source file
|
||||
.IR tools/perf/Documentation/perf-record.txt .
|
||||
.TP
|
||||
.IR net_prio " (since Linux 3.3; " \fBCONFIG_CGROUP_NET_PRIO\fP )
|
||||
This allows priorities to be specified, per network interface, for cgroups.
|
||||
|
||||
.IP
|
||||
Further information can be found in the kernel source file
|
||||
.IR Documentation/cgroup-v1/net_prio.txt .
|
||||
.TP
|
||||
.IR hugetlb " (since Linux 3.5; " \fBCONFIG_CGROUP_HUGETLB\fP )
|
||||
This supports limiting the use of huge pages by cgroups.
|
||||
|
||||
.IP
|
||||
Further information can be found in the kernel source file
|
||||
.IR Documentation/cgroup-v1/hugetlb.txt .
|
||||
.TP
|
||||
.IR pids " (since Linux 4.3; " \fBCONFIG_CGROUP_PIDS\fP )
|
||||
This controller permits limiting the number of processes that may be created
|
||||
in a cgroup (and its descendants).
|
||||
|
||||
.IP
|
||||
Further information can be found in the kernel source file
|
||||
.IR Documentation/cgroup-v1/pids.txt .
|
||||
.\"
|
||||
|
@ -359,33 +359,33 @@ Further information can be found in the kernel source file
|
|||
A cgroup filesystem initially contains a single root cgroup, '/',
|
||||
which all processes belong to.
|
||||
A new cgroup is created by creating a directory in the cgroup filesystem:
|
||||
|
||||
.PP
|
||||
mkdir /sys/fs/cgroup/cpu/cg1
|
||||
|
||||
.PP
|
||||
This creates a new empty cgroup.
|
||||
|
||||
.PP
|
||||
A process may be moved to this cgroup by writing its PID into the cgroup's
|
||||
.I cgroup.procs
|
||||
file:
|
||||
|
||||
.PP
|
||||
echo $$ > /sys/fs/cgroup/cpu/cg1/cgroup.procs
|
||||
|
||||
.PP
|
||||
Only one PID at a time should be written to this file.
|
||||
|
||||
.PP
|
||||
Writing the value 0 to a
|
||||
.IR cgroup.procs
|
||||
file causes the writing process to be moved to the corresponding cgroup.
|
||||
|
||||
.PP
|
||||
When writing a PID into the
|
||||
.IR cgroup.procs ,
|
||||
all threads in the process are moved into the new cgroup at once.
|
||||
|
||||
.PP
|
||||
Within a hierarchy, a process can be a member of exactly one cgroup.
|
||||
Writing a process's PID to a
|
||||
.IR cgroup.procs
|
||||
file automatically removes it from the cgroup of
|
||||
which it was previously a member.
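The shell command shown above has a straightforward C counterpart. The sketch below writes a single PID to the cgroup.procs file of a given cgroup directory; the function name cgroup_move_process() and the fixed buffer sizes are assumptions made for illustration, not an established API.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: move process `pid` into the cgroup whose
   directory is `cgroup_dir` by writing the PID to its cgroup.procs
   file.  A `pid` of 0 moves the calling process.  Exactly one PID is
   written per call.  Returns 0 on success, -1 on error. */
int
cgroup_move_process(const char *cgroup_dir, pid_t pid)
{
    char path[4096], buf[32];
    int fd, len;
    ssize_t n;

    assert(cgroup_dir != NULL && pid >= 0);

    len = snprintf(path, sizeof(path), "%s/cgroup.procs", cgroup_dir);
    if (len < 0 || (size_t) len >= sizeof(path))
        return -1;

    fd = open(path, O_WRONLY);
    if (fd == -1)
        return -1;

    len = snprintf(buf, sizeof(buf), "%ld\n", (long) pid);
    n = write(fd, buf, (size_t) len);
    close(fd);

    return n == (ssize_t) len ? 0 : -1;
}
```

For instance, cgroup_move_process("/sys/fs/cgroup/cpu/cg1", 0) would correspond to the echo command above, assuming that cgroup exists and the caller has permission to write to the file.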
|
||||
|
||||
.PP
|
||||
The
|
||||
.I cgroup.procs
|
||||
file can be read to obtain a list of the processes that are
|
||||
|
@ -393,7 +393,7 @@ members of a cgroup.
|
|||
The returned list of PIDs is not guaranteed to be in order.
|
||||
Nor is it guaranteed to be free of duplicates.
|
||||
(For example, a PID may be recycled while reading from the list.)
|
||||
|
||||
.PP
|
||||
In cgroups v1 (but not cgroups v2), an individual thread can be moved to
|
||||
another cgroup by writing its thread ID
|
||||
(i.e., the kernel thread ID returned by
|
||||
|
@ -420,7 +420,7 @@ Two files can be used to determine whether the kernel provides
|
|||
notifications when a cgroup becomes empty.
|
||||
A cgroup is considered to be empty when it contains no child
|
||||
cgroups and no member processes.
|
||||
|
||||
.PP
|
||||
A special file in the root directory of each cgroup hierarchy,
|
||||
.IR release_agent ,
|
||||
can be used to register the pathname of a program that may be invoked when
|
||||
|
@ -433,11 +433,11 @@ The
|
|||
.IR release_agent
|
||||
program might remove the cgroup directory,
|
||||
or perhaps repopulate it with a process.
|
||||
|
||||
.PP
|
||||
The default value of the
|
||||
.IR release_agent
|
||||
file is empty, meaning that no release agent is invoked.
|
||||
|
||||
.PP
|
||||
Whether or not the
|
||||
.IR release_agent
|
||||
program is invoked when a particular cgroup becomes empty is determined
|
||||
|
@ -462,7 +462,7 @@ While (different) controllers may be simultaneously
|
|||
mounted under the v1 and v2 hierarchies,
|
||||
it is not possible to mount the same controller simultaneously
|
||||
under both the v1 and the v2 hierarchies.
|
||||
|
||||
.PP
|
||||
The new behaviors in cgroups v2 are summarized here,
|
||||
and in some cases elaborated in the following subsections.
|
||||
.IP 1. 3
|
||||
|
@ -506,9 +506,9 @@ all available controllers are mounted against a single hierarchy.
|
|||
The available controllers are automatically mounted,
|
||||
meaning that it is not necessary (or possible) to specify the controllers
|
||||
when mounting the cgroup v2 filesystem using a command such as the following:
|
||||
|
||||
.PP
|
||||
mount -t cgroup2 none /mnt/cgroup2
|
||||
|
||||
.PP
|
||||
A cgroup v2 controller is available only if it is not currently in use
|
||||
via a mount against a cgroup v1 hierarchy.
|
||||
Or, to put things another way, it is not possible to employ
|
||||
|
@ -519,7 +519,7 @@ With the exception of the root cgroup, processes may reside
|
|||
only in leaf nodes (cgroups that do not themselves contain child cgroups).
|
||||
This avoids the need to decide how to partition resources between
|
||||
processes which are members of cgroup A and processes in child cgroups of A.
|
||||
|
||||
.PP
|
||||
For instance, if cgroup
|
||||
.I /cg1/cg2
|
||||
exists, then a process may reside in
|
||||
|
@ -580,7 +580,7 @@ which has either the value 0,
|
|||
meaning that the cgroup (and its descendants)
|
||||
contain no (nonzombie) processes,
|
||||
or 1, meaning that the cgroup contains member processes.
|
||||
|
||||
.PP
|
||||
The
|
||||
.IR cgroup.events
|
||||
file can be monitored, in order to receive notification when a cgroup
|
||||
|
@ -594,7 +594,7 @@ events, and when monitoring the file using
|
|||
transitions generate
|
||||
.B POLLPRI
|
||||
events.
|
||||
|
||||
.PP
|
||||
The cgroups v2
|
||||
.IR notify_on_release
|
||||
mechanism offers at least two advantages over the cgroups v1
|
||||
|
@ -616,7 +616,7 @@ This file contains information about the controllers
|
|||
that are compiled into the kernel.
|
||||
An example of the contents of this file (reformatted for readability)
|
||||
is the following:
|
||||
|
||||
.IP
|
||||
.nf
|
||||
.in +4n
|
||||
#subsys_name hierarchy num_cgroups enabled
|
||||
|
@ -634,7 +634,7 @@ hugetlb 0 1 0
|
|||
pids 2 1 1
|
||||
.in
|
||||
.fi
|
||||
|
||||
.IP
|
||||
The fields in this file are, from left to right:
|
||||
.RS
|
||||
.IP 1. 3
|
||||
|
@ -666,13 +666,13 @@ This file describes control groups to which the process
|
|||
with the corresponding PID belongs.
|
||||
The displayed information differs for
|
||||
cgroups version 1 and version 2 hierarchies.
|
||||
|
||||
.IP
|
||||
For each cgroup hierarchy of which the process is a member,
|
||||
there is one entry containing three
|
||||
colon-separated fields of the form:
|
||||
|
||||
.IP
|
||||
hierarchy-ID:controller-list:cgroup-path
|
||||
|
||||
.IP
|
||||
For example:
|
||||
.IP
|
||||
.in +4n
|
||||
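To make the three-field format concrete, here is a minimal C sketch that splits one entry of the form hierarchy-ID:controller-list:cgroup-path. The structure and function names are hypothetical, the buffer sizes are arbitrary, and the sample lines mentioned afterwards are invented for illustration rather than taken from a real system.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for one parsed entry. */
struct cgroup_entry {
    int hierarchy_id;
    char controllers[256];
    char path[4096];
};

/* Parse one line into *out.  Only the first two colons are treated as
   separators, so a colon inside the cgroup path is preserved.  A
   trailing newline, if present, is dropped.  Returns 0 on success, or
   -1 if the line is malformed or a field is too long. */
int
parse_cgroup_entry(const char *line, struct cgroup_entry *out)
{
    const char *c1, *c2;
    char *end;
    long id;
    size_t n;

    assert(line != NULL && out != NULL);

    c1 = strchr(line, ':');
    if (c1 == NULL)
        return -1;
    c2 = strchr(c1 + 1, ':');
    if (c2 == NULL)
        return -1;

    errno = 0;
    id = strtol(line, &end, 10);
    if (errno != 0 || end == line || end != c1 || id < 0 || id > INT_MAX)
        return -1;
    out->hierarchy_id = (int) id;

    n = (size_t) (c2 - c1 - 1);
    if (n >= sizeof(out->controllers))
        return -1;
    memcpy(out->controllers, c1 + 1, n);
    out->controllers[n] = '\0';

    n = strcspn(c2 + 1, "\n");
    if (n >= sizeof(out->path))
        return -1;
    memcpy(out->path, c2 + 1, n);
    out->path[n] = '\0';

    return 0;
}
```

Given the invented line "5:cpu,cpuacct:/user.slice", the sketch yields hierarchy ID 5, controller list "cpu,cpuacct", and path "/user.slice".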
|
|
|
@ -175,7 +175,7 @@ it from the cpuset that previously contained it) by writing its
|
|||
PID to that cpuset's
|
||||
.I tasks
|
||||
file (with or without a trailing newline).
|
||||
|
||||
.IP
|
||||
.B Warning:
|
||||
only one PID may be written to the
|
||||
.I tasks
|
||||
|
@ -199,7 +199,7 @@ in that cpuset are allowed to execute.
|
|||
See \fBList Format\fR below for a description of the
|
||||
format of
|
||||
.IR cpus .
|
||||
|
||||
.IP
|
||||
The CPUs allowed to a cpuset may be changed by
|
||||
writing a new list to its
|
||||
.I cpus
|
||||
|
@ -212,7 +212,7 @@ If set (1), the cpuset has exclusive use of
|
|||
its CPUs (no sibling or cousin cpuset may overlap CPUs).
|
||||
By default, this is off (0).
|
||||
Newly created cpusets also initially default this to off (0).
|
||||
|
||||
.IP
|
||||
Two cpusets are
|
||||
.I sibling
|
||||
cpusets if they share the same parent cpuset in the
|
||||
|
@ -250,7 +250,7 @@ its memory nodes (no sibling or cousin may overlap).
|
|||
Also if set (1), the cpuset is a \fBHardwall\fR cpuset (see below).
|
||||
By default, this is off (0).
|
||||
Newly created cpusets also initially default this to off (0).
|
||||
|
||||
.IP
|
||||
Regardless of the
|
||||
.I mem_exclusive
|
||||
setting, if one cpuset is the ancestor of another,
|
||||
|
|
|
@ -38,7 +38,7 @@ A PID is represented using the type
|
|||
.I pid_t
|
||||
(defined in
|
||||
.IR <sys/types.h> ).
|
||||
|
||||
.PP
|
||||
PIDs are used in a range of system calls to identify the process
|
||||
affected by the call, for example:
|
||||
.BR kill (2),
|
||||
|
@ -59,7 +59,7 @@ and
|
|||
.BR waitpid (2).
|
||||
.\" .BR waitid (2),
|
||||
.\" .BR wait4 (2),
|
||||
|
||||
.PP
|
||||
A process's PID is preserved across an
|
||||
.BR execve (2).
|
||||
.SS Parent process ID (PPID)
|
||||
|
@ -70,7 +70,7 @@ A process can obtain its PPID using
|
|||
.BR getppid (2).
|
||||
A PPID is represented using the type
|
||||
.IR pid_t .
|
||||
|
||||
.PP
|
||||
A process's PPID is preserved across an
|
||||
.BR execve (2).
|
||||
.SS Process group ID and session ID
|
||||
|
@ -81,13 +81,13 @@ A process can obtain its session ID using
|
|||
.BR getsid (2),
|
||||
and its process group ID using
|
||||
.BR getpgrp (2).
|
||||
|
||||
.PP
|
||||
A child created by
|
||||
.BR fork (2)
|
||||
inherits its parent's session ID and process group ID.
|
||||
A process's session ID and process group ID are preserved across an
|
||||
.BR execve (2).
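The inheritance rule for fork(2) described above can be checked with a short sketch like the following. The helper name child_inherits_ids() is hypothetical and chosen only for illustration; the sketch assumes a POSIX system where fork(2), getsid(2), and getpgrp(2) are available.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a forked child sees the same
   session ID and process group ID as its parent, 0 if it does not,
   and -1 if fork() or waitpid() fails. */
static int
child_inherits_ids(void)
{
    pid_t sid = getsid(0);      /* session ID of the caller */
    pid_t pgid = getpgrp();     /* process group ID of the caller */
    pid_t child;
    int status;

    child = fork();
    if (child == -1)
        return -1;

    if (child == 0)             /* child: compare against parent's IDs */
        _exit(getsid(0) == sid && getpgrp() == pgid ? 0 : 1);

    if (waitpid(child, &status, 0) == -1)
        return -1;

    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The child reports the comparison through its exit status, so the parent needs no pipe or shared memory to learn the result.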
|
||||
|
||||
.PP
|
||||
Sessions and process groups are abstractions devised to support shell
|
||||
job control.
|
||||
A process group (sometimes called a "job") is a collection of
|
||||
|
@ -100,7 +100,7 @@ A process's group membership can be set using
|
|||
.BR setpgid (2).
|
||||
The process whose process ID is the same as its process group ID is the
|
||||
\fIprocess group leader\fP for that group.
|
||||
|
||||
.PP
|
||||
A session is a collection of processes that share the same session ID.
|
||||
All of the members of a process group also have the same session ID
|
||||
(i.e., all of the members of a process group always belong to the
|
||||
|
@ -112,7 +112,7 @@ which creates a new session whose session ID is the same
|
|||
as the PID of the process that called
|
||||
.BR setsid (2).
|
||||
The creator of the session is called the \fIsession leader\fP.
|
||||
|
||||
.PP
|
||||
All of the processes in a session share a
|
||||
.IR "controlling terminal" .
|
||||
The controlling terminal is established when the session leader
|
||||
|
@ -121,7 +121,7 @@ first opens a terminal (unless the
|
|||
flag is specified when calling
|
||||
.BR open (2)).
|
||||
A terminal may be the controlling terminal of at most one session.
|
||||
|
||||
.PP
|
||||
At most one of the jobs in a session may be the
|
||||
.IR "foreground job" ;
|
||||
other jobs in the session are
|
||||
|
@ -143,7 +143,7 @@ When terminal keys that generate a signal (such as the
|
|||
.I interrupt
|
||||
key, normally control-C)
|
||||
are pressed, the signal is sent to the processes in the foreground job.
|
||||
|
||||
.PP
|
||||
Various system calls and library functions
|
||||
may operate on all members of a process group,
|
||||
including
|
||||
|
@ -172,7 +172,7 @@ and
|
|||
.I gid_t
|
||||
(defined in
|
||||
.IR <sys/types.h> ).
|
||||
|
||||
.PP
|
||||
On Linux, each process has the following user and group identifiers:
|
||||
.IP * 3
|
||||
Real user ID and real group ID.
|
||||
|
@ -260,7 +260,7 @@ a process's real user and group ID and supplementary
|
|||
group IDs are preserved;
|
||||
the effective and saved set IDs may be changed, as described in
|
||||
.BR execve (2).
|
||||
|
||||
.PP
|
||||
Aside from the purposes noted above,
|
||||
a process's user IDs are also employed in a number of other contexts:
|
||||
.IP * 3
|
||||
|
|
|
@ -34,14 +34,14 @@ In particular, there is no support for create, delete, and move events.
|
|||
(See
|
||||
.BR inotify (7)
|
||||
for details of an API that does notify those events.)
|
||||
|
||||
.PP
|
||||
Additional capabilities compared to the
|
||||
.BR inotify (7)
|
||||
API include the ability to monitor all of the objects
|
||||
in a mounted filesystem,
|
||||
the ability to make access permission decisions, and the
|
||||
possibility to read or modify files before access by other applications.
|
||||
|
||||
.PP
|
||||
The following system calls are used with this API:
|
||||
.BR fanotify_init (2),
|
||||
.BR fanotify_mark (2),
|
||||
|
@ -104,7 +104,7 @@ or similar)
|
|||
from the fanotify file descriptor
|
||||
returned by
|
||||
.BR fanotify_init (2).
|
||||
|
||||
.PP
|
||||
Two types of events are generated:
|
||||
.I notification
|
||||
events and
|
||||
|
@ -118,7 +118,7 @@ Permission events are requests to the receiving application to decide
|
|||
whether permission for a file access shall be granted.
|
||||
For these events, the recipient must write a response which decides whether
|
||||
access is granted or not.
|
||||
|
||||
.PP
|
||||
An event is removed from the event queue of the fanotify group
|
||||
when it has been read.
|
||||
Permission events that have been read are kept in an internal list of the
|
||||
|
@ -137,11 +137,11 @@ is not specified in the call to
|
|||
until either a file event occurs or the call is interrupted by a signal
|
||||
(see
|
||||
.BR signal (7)).
|
||||
|
||||
.PP
|
||||
After a successful
|
||||
.BR read (2),
|
||||
the read buffer contains one or more of the following structures:
|
||||
|
||||
.PP
|
||||
.in +4n
|
||||
.nf
|
||||
struct fanotify_event_metadata {
|
||||
|
@ -160,12 +160,12 @@ For performance reasons, it is recommended to use a large
|
|||
buffer size (for example, 4096 bytes),
|
||||
so that multiple events can be retrieved by a single
|
||||
.BR read (2).
|
||||
|
||||
.PP
|
||||
The return value of
|
||||
.BR read (2)
|
||||
is the number of bytes placed in the buffer,
|
||||
or \-1 in case of an error (but see BUGS).
|
||||
|
||||
.PP
|
||||
The fields of the
|
||||
.I fanotify_event_metadata
|
||||
structure are as follows:
|
||||
|
@ -291,7 +291,7 @@ To check for any close event, the following bit mask may be used:
|
|||
.B FAN_CLOSE
|
||||
A file was closed.
|
||||
This is a synonym for:
|
||||
|
||||
.IP
|
||||
FAN_CLOSE_WRITE | FAN_CLOSE_NOWRITE
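As a small illustration of the synonym above, a mask test against FAN_CLOSE catches both kinds of close event. The helper name is_close_event() is hypothetical; the sketch assumes a Linux system that provides <sys/fanotify.h>.

```c
#include <sys/fanotify.h>

/* Hypothetical helper: returns 1 if the event mask reports a close of
   either kind (FAN_CLOSE_WRITE or FAN_CLOSE_NOWRITE), otherwise 0. */
static int
is_close_event(unsigned long long mask)
{
    return (mask & FAN_CLOSE) != 0;
}
```

An application that needs to tell the two cases apart would test FAN_CLOSE_WRITE and FAN_CLOSE_NOWRITE individually instead.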
|
||||
.PP
|
||||
The following macros are provided to iterate over a buffer containing
|
||||
|
@ -346,7 +346,7 @@ For permission events, the application must
|
|||
.BR write (2)
|
||||
a structure of the following form to the
|
||||
fanotify file descriptor:
|
||||
|
||||
.PP
|
||||
.in +4n
|
||||
.nf
|
||||
struct fanotify_response {
|
||||
|
@ -495,7 +495,7 @@ calls to
|
|||
generate
|
||||
.B FAN_MODIFY
|
||||
events.
|
||||
|
||||
.PP
|
||||
As of Linux 3.17,
|
||||
the following bugs exist:
|
||||
.IP * 3
|
||||
|
|
|
@ -55,7 +55,7 @@ When a process tries to write to a FIFO that is not opened
|
|||
for read on the other side, the process is sent a
|
||||
.B SIGPIPE
|
||||
signal.
|
||||
|
||||
.PP
|
||||
FIFO special files can be created by
|
||||
.BR mkfifo (3),
|
||||
and are indicated by
|
||||
|
|
38
man7/glob.7
|
@ -31,11 +31,11 @@ Long ago, in UNIX\ V6, there was a program
|
|||
.I /etc/glob
|
||||
that would expand wildcard patterns.
|
||||
Soon afterward this became a shell built-in.
|
||||
|
||||
.PP
|
||||
These days there is also a library routine
|
||||
.BR glob (3)
|
||||
that will perform this function for a user program.
|
||||
|
||||
.PP
|
||||
The rules are as follows (POSIX.2, 3.13).
|
||||
.SS Wildcard matching
|
||||
A string is a wildcard pattern if it contains one of the
|
||||
|
@ -44,9 +44,9 @@ Globbing is the operation
|
|||
that expands a wildcard pattern into the list of pathnames
|
||||
matching the pattern.
|
||||
Matching is defined by:
|
||||
|
||||
.PP
|
||||
A \(aq?\(aq (not between brackets) matches any single character.
|
||||
|
||||
.PP
|
||||
A \(aq*\(aq (not between brackets) matches any string,
|
||||
including the empty string.
|
||||
.PP
|
||||
|
@ -81,7 +81,7 @@ any character that is not matched by the expression obtained
|
|||
by removing the first \(aq!\(aq from it.
|
||||
(Thus, "\fI[!]a\-]\fP" matches any
|
||||
single character except \(aq]\(aq, \(aqa\(aq and \(aq\-\(aq.)
|
||||
|
||||
.PP
|
||||
One can remove the special meaning of \(aq?\(aq, \(aq*\(aq and \(aq[\(aq by
|
||||
preceding them by a backslash, or, in case this is part of
|
||||
a shell command line, enclosing them in quotes.
|
||||
|
@ -95,7 +95,7 @@ A \(aq/\(aq in a pathname cannot be matched by a \(aq?\(aq or \(aq*\(aq
|
|||
wildcard, or by a range like "\fI[.\-0]\fP".
|
||||
A range containing an explicit \(aq/\(aq character is syntactically incorrect.
|
||||
(POSIX requires that syntactically incorrect patterns are left unchanged.)
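The slash rule can be observed from C with fnmatch(3), which applies it when the FNM_PATHNAME flag is given. This is only a sketch of the matching rule, not of how any particular shell implements globbing; the helper name pathname_matches() is hypothetical.

```c
#include <fnmatch.h>

/* Hypothetical helper: returns 1 if pathname matches pattern under the
   pathname rules (a '/' must be matched by an explicit '/'), else 0. */
static int
pathname_matches(const char *pattern, const char *pathname)
{
    return fnmatch(pattern, pathname, FNM_PATHNAME) == 0;
}
```

With this helper, "*/*" matches "src/main.c", whereas "*", "src?main.c", and "src[.-0]main.c" do not, because none of them contains an explicit slash.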
|
||||
|
||||
.PP
|
||||
If a filename starts with a \(aq.\(aq,
|
||||
this character must be matched explicitly.
|
||||
(Thus, \fIrm\ *\fP will not remove .profile, and \fItar\ c\ *\fP will not
|
||||
|
@ -106,11 +106,11 @@ into the list of matching pathnames" was the original UNIX
|
|||
definition.
|
||||
It allowed one to have patterns that expand into
|
||||
an empty list, as in
|
||||
|
||||
.PP
|
||||
.nf
|
||||
xv \-wait 0 *.gif *.jpg
|
||||
.fi
|
||||
|
||||
.PP
|
||||
where perhaps no *.gif files are present (and this is not
|
||||
an error).
|
||||
However, POSIX requires that a wildcard pattern is left
|
||||
|
@ -119,23 +119,23 @@ matching pathnames is empty.
|
|||
With
|
||||
.I bash
|
||||
one can force the classical behavior using this command:
|
||||
|
||||
.PP
|
||||
shopt \-s nullglob
|
||||
.\" In Bash v1, by setting allow_null_glob_expansion=true
|
||||
|
||||
.PP
|
||||
(Similar problems occur elsewhere.
|
||||
For example, where old scripts have
|
||||
|
||||
.PP
|
||||
.nf
|
||||
rm \`find . \-name "*~"\`
|
||||
.fi
|
||||
|
||||
.PP
|
||||
new scripts require
|
||||
|
||||
.PP
|
||||
.nf
|
||||
rm \-f nosuchfile \`find . \-name "*~"\`
|
||||
.fi
|
||||
|
||||
.PP
|
||||
to avoid error messages from
|
||||
.I rm
|
||||
called with an empty argument list.)
|
||||
|
@ -147,7 +147,7 @@ First of all, they match
|
|||
filenames, rather than text, and secondly, the conventions
|
||||
are not the same: for example, in a regular expression \(aq*\(aq means zero or
|
||||
more copies of the preceding thing.
|
||||
|
||||
.PP
|
||||
Now that regular expressions have bracket expressions where
|
||||
the negation is indicated by a \(aq^\(aq, POSIX has declared the
|
||||
effect of a wildcard pattern "\fI[^...]\fP" to be undefined.
|
||||
|
@ -169,13 +169,13 @@ expression: namely (i) the negation, (ii) explicit single characters,
|
|||
and (iii) ranges.
|
||||
POSIX specifies ranges in an internationally
|
||||
more useful way and adds three more types:
|
||||
|
||||
.PP
|
||||
(iii) Ranges X\-Y comprise all characters that fall between X
|
||||
and Y (inclusive) in the current collating sequence as defined
|
||||
by the
|
||||
.B LC_COLLATE
|
||||
category in the current locale.
|
||||
|
||||
.PP
|
||||
(iv) Named character classes, like
|
||||
.nf
|
||||
|
||||
|
@ -191,13 +191,13 @@ These character classes are defined by the
|
|||
.B LC_CTYPE
|
||||
category
|
||||
in the current locale.
|
||||
|
||||
.PP
|
||||
(v) Collating symbols, like "\fI[.ch.]\fP" or "\fI[.a-acute.]\fP",
|
||||
where the string between "\fI[.\fP" and "\fI.]\fP" is a collating
|
||||
element defined for the current locale.
|
||||
Note that this may
|
||||
be a multicharacter element.
|
||||
|
||||
.PP
|
||||
(vi) Equivalence class expressions, like "\fI[=a=]\fP",
|
||||
where the string between "\fI[=\fP" and "\fI=]\fP" is any collating
|
||||
element from its equivalence class, as defined for the
|
||||
|
|
|
@ -271,7 +271,7 @@ This contains information which may change from system release to
|
|||
system release and used to be a symbolic link to
|
||||
.I /usr/src/linux/include/linux
|
||||
to get at operating-system-specific information.
|
||||
|
||||
.IP
|
||||
(Note that one should have include files there that work correctly with
|
||||
the current libc and in user space.
|
||||
However, Linux kernel source is not
|
||||
|
@ -646,5 +646,5 @@ differently.
|
|||
.BR ln (1),
|
||||
.BR proc (5),
|
||||
.BR mount (8)
|
||||
|
||||
.PP
|
||||
The Filesystem Hierarchy Standard
|
||||
|
|
|
@ -43,7 +43,7 @@ hostname \- hostname resolution description
|
|||
Hostnames are domains, where a domain is a hierarchical, dot-separated
|
||||
list of subdomains; for example, the machine "monet", in the "example"
|
||||
subdomain of the "com" domain would be represented as "monet.example.com".
|
||||
|
||||
.PP
|
||||
Each element of the hostname must be from 1 to 63 characters long and the
|
||||
entire hostname, including the dots, can be at most 253 characters long.
|
||||
Valid characters for hostnames are
|
||||
|
@ -58,7 +58,7 @@ to
|
|||
.IR 9 ,
|
||||
and the hyphen (\-).
|
||||
A hostname may not start with a hyphen.
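The length and character rules above can be sketched as a small validator. The function name hostname_is_valid() is hypothetical. The sketch assumes that the permitted letters are the ASCII letters in either case, and it rejects empty elements, so a name with a trailing dot must have the dot removed before the check.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if name satisfies the rules above
   (elements of 1 to 63 characters, at most 253 characters in total,
   only letters, digits, and hyphens, no leading hyphen), else 0. */
static int
hostname_is_valid(const char *name)
{
    size_t total = strlen(name);
    size_t label_len = 0;
    size_t i;

    if (total == 0 || total > 253 || name[0] == '-')
        return 0;

    for (i = 0; i < total; i++) {
        unsigned char c = (unsigned char) name[i];

        if (c == '.') {
            if (label_len == 0)     /* empty element */
                return 0;
            label_len = 0;
        } else if (c < 128 && (isalnum(c) || c == '-')) {
            if (++label_len > 63)   /* element too long */
                return 0;
        } else {
            return 0;               /* character not permitted */
        }
    }

    return label_len > 0;
}
```

For example, "monet.example.com" is accepted, while "-monet.example.com" and "monet..example.com" are rejected.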
|
||||
|
||||
.PP
|
||||
Hostnames are often used with network client and server programs,
|
||||
which must generally translate the name to an address for use.
|
||||
(This task is generally performed by either
|
||||
|
@ -67,7 +67,7 @@ or the obsolete
|
|||
.BR gethostbyname (3).)
|
||||
Hostnames are resolved by the Internet name resolver in the following
|
||||
fashion.
|
||||
|
||||
.PP
|
||||
If the name consists of a single component, that is, contains no dot,
|
||||
and if the environment variable
|
||||
.B HOSTALIASES
|
||||
|
@ -80,11 +80,11 @@ to be substituted for that alias.
|
|||
If a case-insensitive match is found between the hostname to be resolved
|
||||
and the first field of a line in the file, the substituted name is looked
|
||||
up with no further processing.
|
||||
|
||||
.PP
|
||||
If the input name ends with a trailing dot,
|
||||
the trailing dot is removed,
|
||||
and the remaining name is looked up with no further processing.
|
||||
|
||||
.PP
|
||||
If the input name does not end with a trailing dot, it is looked up
|
||||
by searching through a list of domains until a match is found.
|
||||
The default search list includes first the local domain,
|
||||
|
@ -103,11 +103,11 @@ by a system-wide configuration file (see
|
|||
.BR resolver (5),
|
||||
.BR mailaddr (7),
|
||||
.BR named (8)
|
||||
|
||||
.PP
|
||||
.UR http://www.ietf.org\:/rfc\:/rfc1123.txt
|
||||
IETF RFC\ 1123
|
||||
.UE
|
||||
|
||||
.PP
|
||||
.UR http://www.ietf.org\:/rfc\:/rfc1178.txt
|
||||
IETF RFC\ 1178
|
||||
.UE
|
||||
|
|
10
man7/icmp.7
|
@ -85,13 +85,13 @@ packets.
|
|||
.\" The following taken from 2.6.28-rc4 Documentation/networking/ip-sysctl.txt
|
||||
If disabled, ICMP error messages are sent with the primary address of
|
||||
the exiting interface.
|
||||
|
||||
.IP
|
||||
If enabled, the message will be sent with the primary address of
|
||||
the interface that received the packet that caused the ICMP error.
|
||||
This is the behavior that many network administrators will expect from
|
||||
a router.
|
||||
And it can make debugging complicated network layouts much easier.
|
||||
|
||||
.IP
|
||||
Note that if no primary address exists for the interface selected,
|
||||
then the primary address of the first non-loopback interface that
|
||||
has one will be used regardless of this setting.
|
||||
|
@ -122,11 +122,11 @@ otherwise the minimum space between responses in milliseconds.
|
|||
.IR icmp_ratemask " (integer; default: see below; since Linux 2.4.10)"
|
||||
.\" The following taken from 2.6.28-rc4 Documentation/networking/ip-sysctl.txt
|
||||
Mask made of ICMP types for which rates are being limited.
|
||||
|
||||
.IP
|
||||
Significant bits: IHGFEDCBA9876543210
|
||||
.br
|
||||
Default mask: 0000001100000011000 (0x1818)
|
||||
|
||||
.IP
|
||||
Bit definitions (see the Linux kernel source file
|
||||
.IR include/linux/icmp.h ):
|
||||
.RS 12
|
||||
|
@ -147,7 +147,7 @@ H Address Mask Request
|
|||
I Address Mask Reply
|
||||
.TE
|
||||
.RE
|
||||
|
||||
.PP
|
||||
The bits marked with an asterisk are rate limited by default
|
||||
(see the default mask above).
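To make the mask arithmetic concrete: bit N of the mask corresponds to ICMP type N, so the default value 0x1818 has bits 3, 4, 11, and 12 set (Destination Unreachable, Source Quench, Time Exceeded, and Parameter Problem). The following sketch tests a single bit; the macro and function names are hypothetical and are not kernel identifiers.

```c
#include <stdint.h>

/* Default icmp_ratemask quoted above: bits 3, 4, 11, and 12 are set. */
#define ICMP_RATEMASK_DEFAULT 0x1818u

/* Hypothetical helper: returns 1 if the bit for the given ICMP type
   (0 to 18, the significant bits listed above) is set in mask. */
static int
icmp_type_rate_limited(uint32_t mask, unsigned int type)
{
    if (type > 18)
        return 0;
    return (int) ((mask >> type) & 1u);
}
```

With the default mask, type 8 (Echo Request) is not rate limited, whereas type 3 (Destination Unreachable) is.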
|
||||
.TP
|
||||
|
|
48
man7/inode.7
|
@ -37,7 +37,7 @@ structure, or
|
|||
which returns a
|
||||
.I statx
|
||||
structure.
|
||||
|
||||
.PP
|
||||
The following is a list of the information typically found in,
|
||||
or associated with, the file inode,
|
||||
with the names of the corresponding structure fields returned by
|
||||
|
@ -47,7 +47,7 @@ and
|
|||
.TP
|
||||
Device where inode resides
|
||||
\fIstat.st_dev\fP; \fIstatx.stx_dev_minor\fP and \fIstatx.stx_dev_major\fP
|
||||
|
||||
.IP
|
||||
Each inode (as well as the associated file) resides in a filesystem
|
||||
that is hosted on a device.
|
||||
That device is identified by the combination of its major ID
|
||||
|
@ -56,7 +56,7 @@ and minor ID (which identifies a specific instance in the general class).
|
|||
.TP
|
||||
Inode number
|
||||
\fIstat.st_ino\fP; \fIstatx.stx_ino\fP
|
||||
|
||||
.IP
|
||||
Each file in a filesystem has a unique inode number.
|
||||
Inode numbers are guaranteed to be unique only within a filesystem
|
||||
(i.e., the same inode numbers may be used by different filesystems,
|
||||
|
@ -65,12 +65,12 @@ This field contains the file's inode number.
|
|||
.TP
|
||||
File type and mode
|
||||
\fIstat.st_mode\fP; \fIstatx.stx_mode\fP
|
||||
|
||||
.IP
|
||||
See the discussion of file type and mode, below.
|
||||
.TP
|
||||
Link count
|
||||
\fIstat.st_nlink\fP; \fIstatx.stx_nlink\fP
|
||||
|
||||
.IP
|
||||
This field contains the number of hard links to the file.
|
||||
Additional links to an existing file are created using
|
||||
.BR link (2).
|
||||
|
@ -78,7 +78,7 @@ Additional links to an existing file are created using
|
|||
User ID
|
||||
.I st_uid
|
||||
\fIstat.st_uid\fP; \fIstatx.stx_uid\fP
|
||||
|
||||
.IP
|
||||
This field records the user ID of the owner of the file.
|
||||
For newly created files,
|
||||
the file user ID is the effective user ID of the creating process.
|
||||
|
@ -87,7 +87,7 @@ The user ID of a file can be changed using
|
|||
.TP
|
||||
Group ID
|
||||
\fIstat.st_gid\fP; \fIstatx.stx_gid\fP
|
||||
|
||||
.IP
|
||||
The inode records the ID of the group owner of the file.
|
||||
For newly created files,
|
||||
the file group ID is either the group ID of the parent directory or
|
||||
|
@ -99,13 +99,13 @@ The group ID of a file can be changed using
|
|||
.TP
|
||||
Device represented by this inode
|
||||
\fIstat.st_rdev\fP; \fIstatx.stx_rdev_minor\fP and \fIstatx.stx_rdev_major\fP
|
||||
|
||||
.IP
|
||||
If this file (inode) represents a device,
|
||||
then the inode records the major and minor ID of that device.
|
||||
.TP
|
||||
File size
|
||||
\fIstat.st_size\fP; \fIstatx.stx_size\fP
|
||||
|
||||
.IP
|
||||
This field gives the size of the file (if it is a regular
|
||||
file or a symbolic link) in bytes.
|
||||
The size of a symbolic link is the length of the pathname
|
||||
|
@ -113,20 +113,20 @@ it contains, without a terminating null byte.
|
|||
.TP
|
||||
Preferred block size for I/O
|
||||
\fIstat.st_blksize\fP; \fIstatx.stx_blksize\fP
|
||||
|
||||
.IP
|
||||
This field gives the "preferred" blocksize for efficient filesystem I/O.
|
||||
(Writing to a file in smaller chunks may cause
|
||||
an inefficient read-modify-rewrite.)
|
||||
.TP
|
||||
Number of blocks allocated to the file
|
||||
\fIstat.st_blocks\fP; \fIstatx.stx_blocks\fP
|
||||
|
||||
.IP
|
||||
This field indicates the number of blocks allocated to the file,
|
||||
in 512-byte units.
|
||||
(This may be smaller than
|
||||
.IR st_size /512
|
||||
when the file has holes.)
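The comparison mentioned above can be sketched as follows. The function names are hypothetical, and the sketch assumes the 512-byte unit used on Linux. A positive result is only a hint: a filesystem that compresses data or stores small files inline may also report fewer blocks than the size suggests.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if fewer 512-byte blocks are
   allocated than size/512, which suggests that the file has holes. */
static int
fewer_blocks_than_size(off_t size, blkcnt_t blocks)
{
    return blocks < size / 512;
}

/* Hypothetical wrapper: applies the check to a pathname.
   Returns -1 if stat(2) fails. */
static int
file_may_have_holes(const char *pathname)
{
    struct stat sb;

    if (stat(pathname, &sb) == -1)
        return -1;
    return fewer_blocks_than_size(sb.st_size, sb.st_blocks);
}
```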
|
||||
|
||||
.IP
|
||||
The POSIX.1 standard notes
|
||||
.\" Rationale for sys/stat.h in POSIX.1-2008
|
||||
that the unit for the
|
||||
|
@ -140,7 +140,7 @@ Furthermore, the unit may differ on a per-filesystem basis.
|
|||
.TP
|
||||
Last access timestamp (atime)
|
||||
\fIstat.st_atime\fP; \fIstatx.stx_atime\fP
|
||||
|
||||
.IP
|
||||
This is the file's last access timestamp.
|
||||
It is changed by file accesses, for example, by
|
||||
.BR execve (2),
|
||||
|
@ -153,7 +153,7 @@ and
|
|||
Other interfaces, such as
|
||||
.BR mmap (2),
|
||||
may or may not update the atime timestamp.
|
||||
|
||||
.IP
|
||||
Some filesystem types allow mounting in such a way that file
|
||||
and/or directory accesses do not cause an update of the atime timestamp.
|
||||
(See
|
||||
|
@ -173,17 +173,17 @@ flag; see
|
|||
.TP
|
||||
File creation (birth) timestamp (btime)
|
||||
(not returned in the \fIstat\fP structure); \fIstatx.stx_btime\fP
|
||||
|
||||
.IP
|
||||
The file's creation timestamp.
|
||||
This is set on file creation and not changed subsequently.
|
||||
|
||||
.IP
|
||||
The btime timestamp was not historically present on UNIX systems
|
||||
and is not currently supported by most Linux filesystems.
|
||||
.\" FIXME Is it supported on ext4 and XFS?
|
||||
.TP
|
||||
Last modification timestamp (mtime)
|
||||
\fIstat.st_mtime\fP; \fIstatx.stx_mtime\fP
|
||||
|
||||
.IP
|
||||
This is the file's last modification timestamp.
|
||||
It is changed by file modifications, for example, by
|
||||
.BR mknod (2),
|
||||
|
@ -201,7 +201,7 @@ changed for changes in owner, group, hard link count, or mode.
|
|||
.TP
|
||||
Last status change timestamp (ctime)
|
||||
\fIstat.st_ctime\fP; \fIstatx.stx_ctime\fP
|
||||
|
||||
.IP
|
||||
This is the file's last status change timestamp.
|
||||
It is changed by writing or by setting inode information
|
||||
(i.e., owner, group, link count, mode, etc.).
|
||||
|
@ -225,7 +225,7 @@ field (for
|
|||
the
|
||||
.I statx.stx_mode
|
||||
field) contains the file type and mode.
|
||||
|
||||
.PP
|
||||
POSIX refers to the
|
||||
.I stat.st_mode
|
||||
bits corresponding to the mask
|
||||
|
@ -254,7 +254,7 @@ S_IFIFO 0010000 FIFO
|
|||
.in
|
||||
.PP
|
||||
Thus, to test for a regular file (for example), one could write:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
stat(pathname, &sb);
|
||||
|
@ -293,7 +293,7 @@ socket? (Not in POSIX.1-1996.)
|
|||
.RE
|
||||
.PP
|
||||
The preceding code snippet could thus be rewritten as:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
stat(pathname, &sb);
|
||||
|
@ -319,7 +319,7 @@ and
|
|||
are provided if
|
||||
.BR _XOPEN_SOURCE
|
||||
is defined.
|
||||
|
||||
.PP
|
||||
The definition of
|
||||
.BR S_IFSOCK
|
||||
can also be exposed either by defining
|
||||
|
@ -328,7 +328,7 @@ with a value of 500 or greater or (since glibc 2.24) by defining both
|
|||
.BR _XOPEN_SOURCE
|
||||
and
|
||||
.BR _XOPEN_SOURCE_EXTENDED .
|
||||
|
||||
.PP
|
||||
The definition of
|
||||
.BR S_ISSOCK ()
|
||||
is exposed if any of the following feature test macros is defined:
|
||||
|
@ -424,7 +424,7 @@ and so on.
|
|||
The
|
||||
.BR S_IF*
|
||||
constants are present in POSIX.1-2001 and later.
|
||||
|
||||
.PP
|
||||
The
|
||||
.BR S_ISLNK ()
|
||||
and
|
||||
|
|
|
@ -35,7 +35,7 @@ Inotify can be used to monitor individual files,
|
|||
or to monitor directories.
|
||||
When a directory is monitored, inotify will return events
|
||||
for the directory itself, and for files inside the directory.
|
||||
|
||||
.PP
|
||||
The following system calls are used with this API:
|
||||
.IP * 3
|
||||
.BR inotify_init (2)
|
||||
|
@ -99,7 +99,7 @@ in which case the call fails with the error
|
|||
.BR EINTR ;
|
||||
see
|
||||
.BR signal (7)).
|
||||
|
||||
.PP
|
||||
Each successful
|
||||
.BR read (2)
|
||||
returns a buffer containing one or more of the following structures:
|
||||
|
@ -120,15 +120,15 @@ struct inotify_event {
|
|||
};
|
||||
.fi
|
||||
.in
|
||||
|
||||
.PP
|
||||
.I wd
|
||||
identifies the watch for which this event occurs.
|
||||
It is one of the watch descriptors returned by a previous call to
|
||||
.BR inotify_add_watch (2).
|
||||
|
||||
.PP
|
||||
.I mask
|
||||
contains bits that describe the event that occurred (see below).
|
||||
|
||||
.PP
|
||||
.I cookie
|
||||
is a unique integer that connects related events.
|
||||
Currently, this is used only for rename events, and
|
||||
|
@ -140,7 +140,7 @@ events to be connected by the application.
|
|||
For all other event types,
|
||||
.I cookie
|
||||
is set to 0.
|
||||
|
||||
.PP
|
||||
The
|
||||
.I name
|
||||
field is present only when an event is returned
|
||||
|
@ -149,7 +149,7 @@ it identifies the filename within to the watched directory.
|
|||
This filename is null-terminated,
|
||||
and may include further null bytes (\(aq\\0\(aq) to align subsequent reads to a
|
||||
suitable address boundary.
|
||||
|
||||
.PP
|
||||
The
|
||||
.I len
|
||||
field counts all of the bytes in
|
||||
|
@ -159,7 +159,7 @@ the length of each
|
|||
.I inotify_event
|
||||
structure is thus
|
||||
.IR "sizeof(struct inotify_event)+len" .
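The length rule above is what allows a program to step through a buffer that holds several events. The following sketch walks such a buffer and counts the events; the function name count_inotify_events() is hypothetical, and the buffer is assumed to have been filled by a successful read(2) from an inotify file descriptor.

```c
#include <stddef.h>
#include <string.h>
#include <sys/inotify.h>

/* Hypothetical helper: counts the events in the first nbytes bytes of
   buf.  Returns -1 if the buffer ends in the middle of an event. */
static int
count_inotify_events(const char *buf, size_t nbytes)
{
    size_t off = 0;
    int count = 0;

    while (off < nbytes) {
        struct inotify_event ev;

        if (nbytes - off < sizeof(ev))
            return -1;

        /* Copy the fixed-size header so that no assumption is
           made about the alignment of buf. */
        memcpy(&ev, buf + off, sizeof(ev));

        if (nbytes - off - sizeof(ev) < ev.len)
            return -1;

        /* Each event occupies sizeof(struct inotify_event) + len. */
        off += sizeof(ev) + ev.len;
        count++;
    }

    return count;
}
```

A real event loop would handle each event inside the loop (inspecting wd, mask, and, when len is nonzero, the name that follows the header) rather than only counting.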
|
||||
|
||||
.PP
|
||||
The behavior when the buffer given to
|
||||
.BR read (2)
|
||||
is too small to return information about the next event depends
|
||||
|
@ -170,9 +170,9 @@ returns 0; since kernel 2.6.21,
|
|||
fails with the error
|
||||
.BR EINVAL .
|
||||
Specifying a buffer of size
|
||||
|
||||
.PP
|
||||
sizeof(struct inotify_event) + NAME_MAX + 1
|
||||
|
||||
.PP
|
||||
will be sufficient to read at least one event.
|
||||
.SS inotify events
|
||||
The
|
||||
|
@ -274,7 +274,7 @@ Inotify monitoring is inode-based: when monitoring a file
|
|||
(but not when monitoring the directory containing a file),
|
||||
an event can be generated for activity on any link to the file
|
||||
(in the same or a different directory).
|
||||
|
||||
.PP
|
||||
When monitoring a directory:
|
||||
.IP * 3
|
||||
the events marked above with an asterisk (*) can occur both
|
||||
|
@ -288,7 +288,7 @@ when monitoring a directory,
|
|||
events are not generated for the files inside the directory
|
||||
when the events are performed via a pathname (i.e., a link)
|
||||
that lies outside the monitored directory.
|
||||
|
||||
.PP
|
||||
When events are generated for objects inside a watched directory, the
|
||||
.I name
|
||||
field in the returned
|
||||
|
@ -302,7 +302,7 @@ This macro can be used as the
|
|||
.I mask
|
||||
argument when calling
|
||||
.BR inotify_add_watch (2).
|
||||
|
||||
.PP
|
||||
Two additional convenience macros are defined:
|
||||
.RS 4
|
||||
.TP
|
||||
|
@ -582,7 +582,7 @@ Inotify file descriptors can be monitored using
|
|||
and
|
||||
.BR epoll (7).
|
||||
When an event is available, the file descriptor indicates as readable.
|
||||
|
||||
.PP
|
||||
Since Linux 2.6.25,
|
||||
signal-driven I/O notification is available for inotify file descriptors;
|
||||
see the discussion of
|
||||
|
@ -611,7 +611,7 @@ and
|
|||
.B POLLIN
|
||||
is set in
|
||||
.IR si_band .
|
||||
|
||||
.PP
|
||||
If successive output inotify events produced on the
|
||||
inotify file descriptor are identical (same
|
||||
.IR wd ,
|
||||
|
@ -624,13 +624,13 @@ older event has not yet been read (but see BUGS).
|
|||
This reduces the amount of kernel memory required for the event queue,
|
||||
but also means that an application can't use inotify to reliably count
|
||||
file events.
|
||||
|
||||
.PP
|
||||
The events returned by reading from an inotify file descriptor
|
||||
form an ordered queue.
|
||||
Thus, for example, it is guaranteed that when renaming from
|
||||
one directory to another, events will be produced in the
|
||||
correct order on the inotify file descriptor.
|
||||
|
||||
.PP
|
||||
The set of watch descriptors that is being monitored via
|
||||
an inotify file descriptor can be viewed via the entry for
|
||||
the inotify file descriptor in the process's
|
||||
|
@ -651,7 +651,7 @@ In particular, there is no easy
|
|||
way for a process that is monitoring events via inotify
|
||||
to distinguish events that it triggers
|
||||
itself from those that are triggered by other processes.
|
||||
|
||||
.PP
|
||||
Inotify reports only events that a user-space program triggers through
|
||||
the filesystem API.
|
||||
As a result, it does not catch remote events that occur
|
||||
|
@ -664,28 +664,28 @@ Furthermore, various pseudo-filesystems such as
|
|||
and
|
||||
.IR /dev/pts
|
||||
are not monitorable with inotify.
|
||||
|
||||
.PP
|
||||
The inotify API does not report file accesses and modifications that
|
||||
may occur because of
|
||||
.BR mmap (2),
|
||||
.BR msync (2),
|
||||
and
|
||||
.BR munmap (2).
|
||||
|
||||
.PP
|
||||
The inotify API identifies affected files by filename.
|
||||
However, by the time an application processes an inotify event,
|
||||
the filename may already have been deleted or renamed.
|
||||
|
||||
.PP
|
||||
The inotify API identifies events via watch descriptors.
|
||||
It is the application's responsibility to cache a mapping
|
||||
(if one is needed) between watch descriptors and pathnames.
|
||||
Be aware that directory renamings may affect multiple cached pathnames.
|
||||
|
||||
.PP
|
||||
Inotify monitoring of directories is not recursive:
|
||||
to monitor subdirectories under a directory,
|
||||
additional watches must be created.
|
||||
This can take a significant amount of time for large directory trees.
|
||||
|
||||
.PP
|
||||
If monitoring an entire directory subtree,
|
||||
and a new subdirectory is created in that tree or an existing directory
|
||||
is renamed into that tree,
|
||||
|
@ -694,7 +694,7 @@ new files (and subdirectories) may already exist inside the subdirectory.
|
|||
Therefore, you may want to scan the contents of the subdirectory
|
||||
immediately after adding the watch (and, if desired,
|
||||
recursively add watches for any subdirectories that it contains).
|
||||
|
||||
.PP
|
||||
Note that the event queue can overflow.
|
||||
In this case, events are lost.
|
||||
Robust applications should handle the possibility of
|
||||
|
@ -706,7 +706,7 @@ approach is to close the inotify file descriptor, empty the cache,
|
|||
create a new inotify file descriptor,
|
||||
and then re-create watches and cache entries
|
||||
for the objects to be monitored.)
|
||||
|
||||
.PP
|
||||
If a filesystem is mounted on top of a monitored directory,
|
||||
no event is generated, and no events are generated
|
||||
for objects immediately under the new mount point.
|
||||
|
@ -723,7 +723,7 @@ event pair that is generated by
|
|||
.BR rename (2)
|
||||
can be matched up via their shared cookie value.
|
||||
However, the task of matching has some challenges.
|
||||
|
||||
.PP
|
||||
These two events are usually consecutive in the event stream available
|
||||
when reading from the inotify file descriptor.
|
||||
However, this is not guaranteed.
|
||||
|
@ -740,7 +740,7 @@ inserted into the queue: there may be a brief interval where the
|
|||
has appeared, but the
|
||||
.B IN_MOVED_TO
|
||||
has not.
|
||||
|
||||
.PP
|
||||
Matching up the
|
||||
.B IN_MOVED_FROM
|
||||
and
|
||||
|
@ -765,7 +765,7 @@ then those watch descriptors will be inconsistent with
|
|||
the watch descriptors in any pending events.
|
||||
(Re-creating the inotify file descriptor and rebuilding the cache may
|
||||
be useful to deal with this scenario.)
|
||||
|
||||
.PP
|
||||
Applications should also allow for the possibility that the
|
||||
.B IN_MOVED_FROM
|
||||
event was the last event that could fit in the buffer
|
||||
|
@ -793,7 +793,7 @@ calls to
|
|||
generate
|
||||
.B IN_MODIFY
|
||||
events.
|
||||
|
||||
.PP
|
||||
.\" FIXME . kernel commit 611da04f7a31b2208e838be55a42c7a1310ae321
|
||||
.\" implies that unmount events were buggy 2.6.11 to 2.6.36
|
||||
.\"
|
||||
|
@ -801,7 +801,7 @@ In kernels before 2.6.16, the
|
|||
.B IN_ONESHOT
|
||||
.I mask
|
||||
flag does not work.
|
||||
|
||||
.PP
|
||||
As originally designed and implemented, the
|
||||
.B IN_ONESHOT
|
||||
flag did not cause an
|
||||
|
@ -811,7 +811,7 @@ However, as an unintended effect of other changes,
|
|||
since Linux 2.6.36, an
|
||||
.B IN_IGNORED
|
||||
event is generated in this case.
|
||||
|
||||
.PP
|
||||
Before kernel 2.6.25,
|
||||
.\" commit 1c17d18e3775485bf1e0ce79575eb637a94494a2
|
||||
the kernel code that was intended to coalesce successive identical events
|
||||
|
@ -820,7 +820,7 @@ if the older had not yet been read)
|
|||
instead checked if the most recent event could be coalesced with the
|
||||
.I oldest
|
||||
unread event.
|
||||
|
||||
.PP
|
||||
When a watch descriptor is removed by calling
|
||||
.BR inotify_rm_watch (2)
|
||||
(or because a watch file is deleted or the filesystem
|
||||
|
@ -1089,6 +1089,6 @@ main(int argc, char* argv[])
|
|||
.BR read (2),
|
||||
.BR stat (2),
|
||||
.BR fanotify (7)
|
||||
|
||||
.PP
|
||||
.IR Documentation/filesystems/inotify.txt
|
||||
in the Linux kernel source tree
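The notes above on queue overflow and on pairing rename events by cookie can be illustrated with a short sketch. It scans one buffer as returned by read(2) on an inotify file descriptor, flags IN_Q_OVERFLOW, and pairs each IN_MOVED_FROM with a later IN_MOVED_TO that carries the same cookie. The names scan_events and struct scan_result are hypothetical, and tracking only one pending IN_MOVED_FROM at a time is a simplifying assumption, not something the page prescribes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/inotify.h>

/* Hypothetical summary of one buffer read from an inotify file descriptor. */
struct scan_result {
    int overflowed;       /* 1 if an IN_Q_OVERFLOW event was seen */
    int matched_renames;  /* IN_MOVED_FROM/IN_MOVED_TO pairs with equal cookies */
    int unmatched_from;   /* IN_MOVED_FROM with no partner in this buffer */
};

/* Walk the variable-length events in buf[0..len). */
struct scan_result
scan_events(const char *buf, size_t len)
{
    struct scan_result res = {0, 0, 0};
    uint32_t pending_cookie = 0;
    int have_pending = 0;
    size_t off = 0;

    while (off + sizeof(struct inotify_event) <= len) {
        struct inotify_event ev;

        /* Copy the fixed-size header out to avoid unaligned access. */
        memcpy(&ev, buf + off, sizeof(ev));

        if (ev.mask & IN_Q_OVERFLOW)
            res.overflowed = 1;   /* events were lost; caches are stale */

        if (ev.mask & IN_MOVED_FROM) {
            if (have_pending)
                res.unmatched_from++;   /* earlier one never got a partner */
            pending_cookie = ev.cookie;
            have_pending = 1;
        } else if (ev.mask & IN_MOVED_TO) {
            if (have_pending && ev.cookie == pending_cookie) {
                res.matched_renames++;
                have_pending = 0;
            }
            /* Otherwise: a move into the watched set from elsewhere. */
        }

        /* Skip the header plus the (possibly padded) name that follows. */
        off += sizeof(struct inotify_event) + ev.len;
    }

    /* A trailing IN_MOVED_FROM may be completed by the next read(2). */
    if (have_pending)
        res.unmatched_from++;

    return res;
}
```

A caller that sees overflowed set would typically rebuild its cache as described above. A nonzero unmatched_from count at the end of a buffer suggests making one more read, possibly with a short timeout, before treating the object as moved out of the watched tree.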
|
||||
|
|
|
@ -25,7 +25,7 @@ those objects and also use the facility for their own purposes; see
|
|||
.BR request_key (2),
|
||||
and
|
||||
.BR keyctl (2).
|
||||
|
||||
.PP
|
||||
A library and some user-space utilities are provided to allow access to the
|
||||
facility.
|
||||
See
|
||||
|
@ -48,7 +48,7 @@ Type
|
|||
A key's type defines what sort of data can be held in the key,
|
||||
how the proposed content of the key will be parsed,
|
||||
and how the payload will be used.
|
||||
|
||||
.IP
|
||||
There are a number of general-purpose types available, plus some specialist
|
||||
types defined by specific kernel components.
|
||||
.TP
|
||||
|
@ -65,7 +65,7 @@ instantiation of a key if that key wasn't already known to the kernel
|
|||
when it was requested.
|
||||
For further details, see
|
||||
.BR request_key (2).
|
||||
|
||||
.IP
|
||||
A key's payload can be read and updated if the key type supports it and if
|
||||
suitable permission is granted to the caller.
|
||||
.TP
|
||||
|
@ -78,7 +78,7 @@ and there is an additional category\(empossessor\(embeyond the usual user,
|
|||
group, and other (see
|
||||
.IR Possession ,
|
||||
below).
|
||||
|
||||
.IP
|
||||
Note that keys are quota controlled, since they require unswappable kernel
|
||||
memory.
|
||||
The owning user ID specifies whose quota is to be debited.
|
||||
|
@ -113,7 +113,7 @@ to other keys (including other keyrings),
|
|||
analogous to a directory holding links to files.
|
||||
The main purpose of a keyring is to prevent other keys from
|
||||
being garbage collected because nothing refers to them.
|
||||
|
||||
.IP
|
||||
Keyrings with descriptions (names)
|
||||
that begin with a period (\(aq.\(aq) are reserved to the implementation.
|
||||
.TP
|
||||
|
@ -121,10 +121,10 @@ that begin with a period (\(aq.\(aq) are reserved to the implementation.
|
|||
This is a general-purpose key type.
|
||||
The key is kept entirely within kernel memory.
|
||||
The payload may be read and updated by user-space applications.
|
||||
|
||||
.IP
|
||||
The payload for keys of this type is a blob of arbitrary data
|
||||
of up to 32,767 bytes.
|
||||
|
||||
.IP
|
||||
The description may be any valid string, though it is preferred that it
|
||||
start with a colon-delimited prefix representing the service
|
||||
to which the key is of interest
|
||||
|
@ -149,7 +149,7 @@ This key type is similar to the
|
|||
.I """user"""
|
||||
key type, but it may hold a payload of up to 1 MiB in size.
|
||||
This key type is useful for purposes such as holding Kerberos ticket caches.
|
||||
|
||||
.IP
|
||||
The payload data may be stored in a tmpfs filesystem,
|
||||
rather than in kernel memory,
|
||||
if the data size exceeds the overhead of storing the data in the filesystem.
|
||||
|
@ -165,7 +165,7 @@ thereby preventing it from being written unencrypted into swap space.
|
|||
There are more specialized key types available also,
|
||||
but they aren't discussed here
|
||||
because they aren't intended for normal user-space use.
|
||||
|
||||
.PP
|
||||
Key type names
|
||||
that begin with a period (\(aq.\(aq) are reserved to the implementation.
|
||||
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
|
||||
|
@ -208,13 +208,13 @@ for more information.
|
|||
To prevent a key from being garbage collected,
|
||||
it must be anchored to keep its reference count elevated
|
||||
when it is not in active use by the kernel.
|
||||
|
||||
.PP
|
||||
Keyrings are used to anchor other keys:
|
||||
each link is a reference on a key.
|
||||
Note that keyrings themselves are just keys and
|
||||
are also subject to the same anchoring requirement to prevent
|
||||
them being garbage collected.
|
||||
|
||||
.PP
|
||||
The kernel makes available a number of anchor keyrings.
|
||||
Note that some of these keyrings will be created only when first accessed.
|
||||
.TP
|
||||
|
@ -233,7 +233,7 @@ the
|
|||
the
|
||||
.BR thread-keyring (7)
|
||||
(specific to a particular thread).
|
||||
|
||||
.IP
|
||||
As an alternative to using the actual keyring IDs,
|
||||
in calls to
|
||||
.BR add_key (2),
|
||||
|
@ -253,7 +253,7 @@ Each UID known to the kernel has a record that contains two keyrings: the
|
|||
and the
|
||||
.BR user-session-keyring (7).
|
||||
These exist for as long as the UID record in the kernel exists.
|
||||
|
||||
.IP
|
||||
As an alternative to using the actual keyring IDs,
|
||||
in calls to
|
||||
.BR add_key (2),
|
||||
|
@ -265,7 +265,7 @@ the special keyring values
|
|||
and
|
||||
.BR KEY_SPEC_USER_SESSION_KEYRING
|
||||
can be used to refer to the caller's own instances of these keyrings.
|
||||
|
||||
.IP
|
||||
A link to the user keyring is placed in a new session keyring by
|
||||
.BR pam_keyinit (8)
|
||||
when a new login session is initiated.
|
||||
|
@ -528,18 +528,18 @@ The thread need not possess the key for it to be visible in this file.
|
|||
.\"
|
||||
.\"Possibly it shouldn't be, but for now it is.
|
||||
.\"
|
||||
|
||||
.IP
|
||||
The only keys included in the list are those that grant
|
||||
.I view
|
||||
permission to the reading process
|
||||
(regardless of whether or not it possesses them).
|
||||
LSM security checks are still performed,
|
||||
and may filter out further keys that the process is not authorized to view.
|
||||
|
||||
.IP
|
||||
An example of the data that one might see in this file
|
||||
(with the columns numbered for easy reference below)
|
||||
is the following:
|
||||
|
||||
.IP
|
||||
.nf
|
||||
.in 0n
|
||||
(1) (2) (3)(4) (5) (6) (7) (8) (9)
|
||||
|
@ -554,7 +554,7 @@ is the following:
|
|||
3ce56aea I--Q--- 5 perm 3f030000 1000 1000 keyring _ses: 1
|
||||
.in
|
||||
.fi
|
||||
|
||||
.IP
|
||||
The fields shown in each line of this file are as follows:
|
||||
.RS
|
||||
.TP
|
||||
|
@ -612,7 +612,7 @@ Permissions (5)
|
|||
The key permissions, expressed as four hexadecimal bytes containing,
|
||||
from left to right, the possessor, user, group, and other permissions.
|
||||
Within each byte, the permission bits are as follows:
|
||||
|
||||
.IP
|
||||
.PD 0
|
||||
.RS 12
|
||||
.TP
|
||||
|
@ -651,9 +651,9 @@ Description (9)
|
|||
The key description (name).
|
||||
This field contains descriptive information about the key.
|
||||
For most key types, it has the form
|
||||
|
||||
.IP
|
||||
name[: extra\-info]
|
||||
|
||||
.IP
|
||||
The
|
||||
.I name
|
||||
subfield is the key's description (name).
|
||||
|
@ -690,9 +690,9 @@ key type
|
|||
(authorization key; see
|
||||
.BR request_key (2)),
|
||||
the description field has the form shown in the following example:
|
||||
|
||||
.IP
|
||||
key:c9a9b19 pid:28880 ci:10
|
||||
|
||||
.IP
|
||||
The three subfields are as follows:
|
||||
.RS
|
||||
.TP 5
|
||||
|
@ -713,7 +713,7 @@ be instantiated
|
|||
This file lists various information for each user ID that
|
||||
has at least one key on the system.
|
||||
An example of the data that one might see in this file is the following:
|
||||
|
||||
.IP
|
||||
.nf
|
||||
.in +4n
|
||||
0: 10 9/9 2/1000000 22/25000000
|
||||
|
@ -721,7 +721,7 @@ An example of the data that one might see in this file is the following:
|
|||
1000: 11 11/11 10/200 271/20000
|
||||
.in
|
||||
.fi
|
||||
|
||||
.IP
|
||||
The fields shown in each line are as follows:
|
||||
.RS
|
||||
.TP
|
||||
|
@ -755,7 +755,7 @@ of time where user space can see an error (respectively
|
|||
and
|
||||
.BR EKEYEXPIRED )
|
||||
that indicates what happened to the key.
|
||||
|
||||
.IP
|
||||
The default value in this file is 300 (i.e., 5 minutes).
|
||||
.TP
|
||||
.IR /proc/sys/kernel/keys/persistent_keyring_expiry " (since Linux 3.13)"
|
||||
|
@ -768,7 +768,7 @@ or the
|
|||
.BR keyctl (2)
|
||||
.B KEYCTL_GET_PERSISTENT
|
||||
operation.)
|
||||
|
||||
.IP
|
||||
The default value in this file is 259200 (i.e., 3 days).
|
||||
.PP
|
||||
The following files (which are writable by privileged processes)
|
||||
|
@ -780,21 +780,21 @@ and number of bytes of data that can be stored in key payloads:
|
|||
.\" Previously: KEYQUOTA_MAX_BYTES 10000
|
||||
This is the maximum number of bytes of data that a nonroot user
|
||||
can hold in the payloads of the keys owned by the user.
|
||||
|
||||
.IP
|
||||
The default value in this file is 20,000.
|
||||
.TP
|
||||
.IR /proc/sys/kernel/keys/maxkeys " (since Linux 2.6.26)"
|
||||
.\" commit 0b77f5bfb45c13e1e5142374f9d6ca75292252a4
|
||||
.\" Previously: KEYQUOTA_MAX_KEYS 100
|
||||
This is the maximum number of keys that a nonroot user may own.
|
||||
|
||||
.IP
|
||||
The default value in this file is 200.
|
||||
.TP
|
||||
.IR /proc/sys/kernel/keys/root_maxbytes " (since Linux 2.6.26)"
|
||||
This is the maximum number of bytes of data that the root user
|
||||
(UID 0 in the root user namespace)
|
||||
can hold in the payloads of the keys owned by root.
|
||||
|
||||
.IP
|
||||
.\"738c5d190f6540539a04baf36ce21d46b5da04bd
|
||||
The default value in this file is 25,000,000 (20,000 before Linux 3.17).
|
||||
.\" commit 0b77f5bfb45c13e1e5142374f9d6ca75292252a4
|
||||
|
@ -804,7 +804,7 @@ The default value in this file is 25,000,000 (20,000 before Linux 3.17).
|
|||
This is the maximum number of keys that the root user
|
||||
(UID 0 in the root user namespace)
|
||||
may own.
|
||||
|
||||
.IP
|
||||
.\"738c5d190f6540539a04baf36ce21d46b5da04bd
|
||||
The default value in this file is 1,000,000 (200 before Linux 3.17).
|
||||
.PP
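As an illustration of the permissions column (5) of /proc/keys described above, the following sketch splits a value such as 3f030000 into its possessor, user, group, and other bytes. The bit values (0x01 view, 0x02 read, 0x04 write, 0x08 search, 0x10 link, 0x20 setattr) are the ones documented for key permissions. The function names and the six-letter display string are hypothetical conveniences, not part of any kernel or keyutils interface.

```c
#include <stdint.h>
#include <stdlib.h>

/* Permission bits within each byte of the key permission mask. */
#define KEY_PERM_VIEW    0x01
#define KEY_PERM_READ    0x02
#define KEY_PERM_WRITE   0x04
#define KEY_PERM_SEARCH  0x08
#define KEY_PERM_LINK    0x10
#define KEY_PERM_SETATTR 0x20

/* Categories in left-to-right order of the hexadecimal field. */
enum key_category { KEY_POSSESSOR = 0, KEY_USER, KEY_GROUP, KEY_OTHER };

/* Parse the hexadecimal permissions column; returns 0 on success, -1 on error. */
int
parse_key_perm(const char *hex, uint32_t *perm)
{
    char *end;
    unsigned long v = strtoul(hex, &end, 16);

    if (end == hex || *end != '\0' || v > 0xffffffffUL)
        return -1;
    *perm = (uint32_t) v;
    return 0;
}

/* Extract the permission byte for one category from the 32-bit mask. */
unsigned
key_perm_byte(uint32_t perm, enum key_category cat)
{
    return (perm >> (24 - 8 * (int) cat)) & 0xff;
}

/* Render one byte as six characters (view, read, write, search, link,
   setattr), using '-' for a clear bit; buf must hold 7 bytes. */
void
key_perm_string(unsigned byte, char buf[7])
{
    static const char letters[] = "vrwsla";   /* hypothetical notation */

    for (int i = 0; i < 6; i++)
        buf[i] = (byte & (1u << i)) ? letters[i] : '-';
    buf[6] = '\0';
}
```

For the sample line shown earlier, the possessor byte is 0x3f (all six permissions), the user byte is 0x03 (view and read), and the group and other bytes are zero.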
|
||||
|
|
|
@ -51,7 +51,7 @@ available via the command
|
|||
Release 1.0 of glibc was made in September 1992.
|
||||
(There were earlier 0.x releases.)
|
||||
The next major release of glibc was 2.0, at the beginning of 1997.
|
||||
|
||||
.PP
|
||||
The pathname
|
||||
.I /lib/libc.so.6
|
||||
(or something similar) is normally a symbolic link that
|
||||
|
@ -73,7 +73,7 @@ this version used the shared library soname
|
|||
.IR libc.so.5 .
|
||||
For a while,
|
||||
Linux libc was the standard C library in many Linux distributions.
|
||||
|
||||
.PP
|
||||
However, notwithstanding the original motivations of the Linux libc effort,
|
||||
by the time glibc 2.0 was released (in 1997),
|
||||
it was clearly superior to Linux libc,
|
||||
|
@ -82,7 +82,7 @@ soon switched back to glibc.
|
|||
To avoid any confusion with Linux libc versions,
|
||||
glibc 2.0 and later used the shared library soname
|
||||
.IR libc.so.6 .
|
||||
|
||||
.PP
|
||||
Since the switch from Linux libc to glibc 2.0 occurred long ago,
|
||||
.I man-pages
|
||||
no longer takes care to document Linux libc details.
|
||||
|
|
|
@ -116,7 +116,7 @@ The "postmaster" address is not case sensitive.
|
|||
.BR aliases (5),
|
||||
.BR forward (5),
|
||||
.BR sendmail (8)
|
||||
|
||||
.PP
|
||||
.UR http://www.ietf.org\:/rfc\:/rfc5322.txt
|
||||
IETF RFC\ 5322
|
||||
.UE
|
||||
|
|
|
@ -29,12 +29,12 @@ mount_namespaces \- overview of Linux mount namespaces
|
|||
.SH DESCRIPTION
|
||||
For an overview of namespaces, see
|
||||
.BR namespaces (7).
|
||||
|
||||
.PP
|
||||
Mount namespaces provide isolation of the list of mount points seen
|
||||
by the processes in each namespace instance.
|
||||
Thus, the processes in each of the mount namespace instances
|
||||
will see distinct single-directory hierarchies.
|
||||
|
||||
.PP
|
||||
The views provided by the
|
||||
.IR /proc/[pid]/mounts ,
|
||||
.IR /proc/[pid]/mountinfo ,
|
||||
|
@ -47,7 +47,7 @@ correspond to the mount namespace in which the process with the PID
|
|||
resides.
|
||||
(All of the processes that reside in the same mount namespace
|
||||
will see the same view in these files.)
|
||||
|
||||
.PP
|
||||
When a process creates a new mount namespace using
|
||||
.BR clone (2)
|
||||
or
|
||||
|
@ -146,7 +146,7 @@ between namespaces
|
|||
(or, more precisely, between the members of a
|
||||
.IR "peer group"
|
||||
that are propagating events to one another).
|
||||
|
||||
.PP
|
||||
Each mount point is marked (via
|
||||
.BR mount (2))
|
||||
as having one of the following
|
||||
|
@ -170,7 +170,7 @@ Mount and unmount events do not propagate into or out of this mount point.
|
|||
Mount and unmount events propagate into this mount point from
|
||||
a (master) shared peer group.
|
||||
Mount and unmount events under this mount point do not propagate to any peer.
|
||||
|
||||
.IP
|
||||
Note that a mount point can be the slave of another peer group
|
||||
while at the same time sharing mount and unmount events
|
||||
with a peer group of which it is a member.
|
||||
|
@ -184,7 +184,7 @@ Attempts to bind mount this mount
|
|||
with the
|
||||
.BR MS_BIND
|
||||
flag) will fail.
|
||||
|
||||
.IP
|
||||
When a recursive bind mount
|
||||
.RB ( mount (2)
|
||||
with the
|
||||
|
@ -198,13 +198,13 @@ when replicating that subtree to produce the target subtree.
|
|||
.PP
|
||||
For a discussion of the propagation type assigned to a new mount,
|
||||
see NOTES.
|
||||
|
||||
.PP
|
||||
The propagation type is a per-mount-point setting;
|
||||
some mount points may be marked as shared
|
||||
(with each shared mount point being a member of a distinct peer group),
|
||||
while others are private
|
||||
(or slaved or unbindable).
|
||||
|
||||
.PP
|
||||
Note that a mount's propagation type determines whether
|
||||
mounts and unmounts of mount points
|
||||
.I "immediately under"
|
||||
|
@ -215,7 +215,7 @@ What happens if the mount point itself is unmounted is determined by
|
|||
the propagation type that is in effect for the
|
||||
.I parent
|
||||
of the mount point.
|
||||
|
||||
.PP
|
||||
Members are added to a
|
||||
.IR "peer group"
|
||||
when a mount point is marked as shared and either:
|
||||
|
@ -230,7 +230,7 @@ A mount ceases to be a member of a peer group when either
|
|||
the mount is explicitly unmounted,
|
||||
or when the mount is implicitly unmounted because a mount namespace is removed
|
||||
(because it has no more member processes).
|
||||
|
||||
.PP
|
||||
The propagation type of the mount points in a mount namespace
|
||||
can be discovered via the "optional fields" exposed in
|
||||
.IR /proc/[pid]/mountinfo .
|
||||
|
@ -283,7 +283,7 @@ Suppose that on a terminal in the initial mount namespace,
|
|||
we mark one mount point as shared and another as private,
|
||||
and then view the mounts in
|
||||
.IR /proc/self/mountinfo :
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
sh1# \fBmount \-\-make\-shared /mntS\fP
|
||||
|
@ -293,7 +293,7 @@ sh1# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
|
|||
83 61 8:15 / /mntP rw,relatime
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
From the
|
||||
.IR /proc/self/mountinfo
|
||||
output, we see that
|
||||
|
@ -310,18 +310,18 @@ and
|
|||
is the root directory,
|
||||
.IR / ,
|
||||
which is mounted as private:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
sh1# \fBcat /proc/self/mountinfo | awk \(aq$1 == 61\(aq | sed \(aqs/ \- .*//\(aq\fP
|
||||
61 0 8:2 / / rw,relatime
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
On a second terminal,
|
||||
we create a new mount namespace where we run a second shell
|
||||
and inspect the mounts:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
$ \fBPS1=\(aqsh2# \(aq sudo unshare \-m \-\-propagation unchanged sh\fP
|
||||
|
@ -330,7 +330,7 @@ sh2# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
|
|||
225 145 8:15 / /mntP rw,relatime
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
The new mount namespace received a copy of the initial mount namespace's
|
||||
mount points.
|
||||
These new mount points maintain the same propagation types,
|
||||
|
@ -342,13 +342,13 @@ option prevents
|
|||
from marking all mounts as private when creating a new mount namespace,
|
||||
.\" Since util-linux 2.27
|
||||
which it does by default.)
|
||||
|
||||
.PP
|
||||
In the second terminal, we then create submounts under each of
|
||||
.IR /mntS
|
||||
and
|
||||
.IR /mntP
|
||||
and inspect the set-up:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
sh2# \fBmkdir /mntS/a\fP
|
||||
|
@ -362,13 +362,13 @@ sh2# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
|
|||
230 225 8:23 / /mntP/b rw,relatime
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
From the above, it can be seen that
|
||||
.IR /mntS/a
|
||||
was created as shared (inheriting this setting from its parent mount) and
|
||||
.IR /mntP/b
|
||||
was created as a private mount.
|
||||
|
||||
.PP
|
||||
Returning to the first terminal and inspecting the set-up,
|
||||
we see that the new mount created under the shared mount point
|
||||
.IR /mntS
|
||||
|
@ -376,7 +376,7 @@ propagated to its peer mount (in the initial mount namespace),
|
|||
but the new mount created under the private mount point
|
||||
.IR /mntP
|
||||
did not propagate:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
sh1# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
|
||||
|
@ -395,10 +395,10 @@ an optical disk is mounted in the master shared peer group
|
|||
(in another mount namespace),
|
||||
but want to prevent mount and unmount events under the slave mount
|
||||
from having side effects in other namespaces.
|
||||
|
||||
.PP
|
||||
We can demonstrate the effect of slaving by first marking
|
||||
two mount points as shared in the initial mount namespace:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
sh1# \fBmount \-\-make\-shared /mntX\fP
|
||||
|
@ -408,10 +408,10 @@ sh1# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
|
|||
133 83 8:22 / /mntY rw,relatime shared:2
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
On a second terminal,
|
||||
we create a new mount namespace and inspect the mount points:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
sh2# \fBunshare \-m \-\-propagation unchanged sh\fP
|
||||
|
@ -420,9 +420,9 @@ sh2# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
|
|||
169 167 8:22 / /mntY rw,relatime shared:2
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
In the new mount namespace, we then mark one of the mount points as a slave:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
sh2# \fBmount \-\-make\-slave /mntY\fP
|
||||
|
@ -431,17 +431,17 @@ sh2# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
|
|||
169 167 8:22 / /mntY rw,relatime master:2
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
From the above output, we see that
|
||||
.IR /mntY
|
||||
is now a slave mount that is receiving propagation events from
|
||||
the shared peer group with the ID 2.
|
||||
|
||||
.PP
|
||||
Continuing in the new namespace, we create submounts under each of
|
||||
.IR /mntX
|
||||
and
|
||||
.IR /mntY :
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
sh2# \fBmkdir /mntX/a\fP
|
||||
|
@ -450,7 +450,7 @@ sh2# \fBmkdir /mntY/b\fP
|
|||
sh2# \fBmount /dev/sda5 /mntY/b\fP
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
When we inspect the state of the mount points in the new mount namespace,
|
||||
we see that
|
||||
.IR /mntX/a
|
||||
|
@ -458,7 +458,7 @@ was created as a new shared mount
|
|||
(inheriting the "shared" setting from its parent mount) and
|
||||
.IR /mntY/b
|
||||
was created as a private mount:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
sh2# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
|
||||
|
@ -468,7 +468,7 @@ sh2# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
|
|||
175 169 8:5 / /mntY/b rw,relatime
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
Returning to the first terminal (in the initial mount namespace),
|
||||
we see that the mount
|
||||
.IR /mntX/a
|
||||
|
@ -477,7 +477,7 @@ propagated to the peer (the shared
|
|||
but the mount
|
||||
.IR /mntY/b
|
||||
was not propagated:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
sh1# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
|
||||
|
@ -486,11 +486,11 @@ sh1# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
|
|||
174 132 8:3 / /mntX/a rw,relatime shared:3
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
Now we create a new mount point under
|
||||
.IR /mntY
|
||||
in the first shell:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
sh1# \fBmkdir /mntY/c\fP
|
||||
|
@ -502,12 +502,12 @@ sh1# \fBcat /proc/self/mountinfo | grep '/mnt' | sed 's/ \- .*//'\fP
|
|||
178 133 8:1 / /mntY/c rw,relatime shared:4
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
When we examine the mount points in the second mount namespace,
|
||||
we see that in this case the new mount has been propagated
|
||||
to the slave mount point,
|
||||
and that the new mount is itself a slave mount (to peer group 4):
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
sh2# \fBcat /proc/self/mountinfo | grep \(aq/mnt\(aq | sed \(aqs/ \- .*//\(aq\fP
|
||||
|
@ -524,9 +524,9 @@ One of the primary purposes of unbindable mounts is to avoid
|
|||
the "mount point explosion" problem when repeatedly performing bind mounts
|
||||
of a higher-level subtree at a lower-level mount point.
|
||||
The problem is illustrated by the following shell session.
|
||||
|
||||
.PP
|
||||
Suppose we have a system with the following mount points:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
# \fBmount | awk \(aq{print $1, $2, $3}\(aq\fP
|
||||
|
@ -535,11 +535,11 @@ Suppose we have a system with the following mount points:
|
|||
/dev/sdb7 on /mntY
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
Suppose furthermore that we wish to recursively bind mount
|
||||
the root directory under several users' home directories.
|
||||
We do this for the first user, and inspect the mount points:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
# \fBmount \-\-rbind / /home/cecilia/\fP
|
||||
|
@ -552,10 +552,10 @@ We do this for the first user, and inspect the mount points:
|
|||
/dev/sdb7 on /home/cecilia/mntY
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
When we repeat this operation for the second user,
|
||||
we start to see the explosion problem:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
# \fBmount \-\-rbind / /home/henry\fP
|
||||
|
@ -574,7 +574,7 @@ we start to see the explosion problem:
|
|||
/dev/sdb7 on /home/henry/home/cecilia/mntY
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
Under
|
||||
.IR /home/henry ,
|
||||
we have not only recursively added the
|
||||
|
@ -586,7 +586,7 @@ mounts, but also the recursive mounts of those directories under
|
|||
that were created in the previous step.
|
||||
Upon repeating the step for a third user,
|
||||
it becomes obvious that the explosion is exponential in nature:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
# \fBmount \-\-rbind / /home/otto\fP
|
||||
|
@ -617,21 +617,21 @@ it becomes obvious that the explosion is exponential in nature:
|
|||
/dev/sdb7 on /home/otto/home/henry/home/cecilia/mntY
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
The mount explosion problem in the above scenario can be avoided
|
||||
by making each of the new mounts unbindable.
|
||||
The effect of doing this is that recursive mounts of the root
|
||||
directory will not replicate the unbindable mounts.
|
||||
We make such a mount for the first user:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
# \fBmount \-\-rbind \-\-make\-unbindable / /home/cecilia\fP
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
Before going further, we show that unbindable mounts are indeed unbindable:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
# \fBmkdir /mntZ\fP
|
||||
|
@ -643,21 +643,21 @@ mount: wrong fs type, bad option, bad superblock on /home/cecilia,
|
|||
dmesg | tail or so.
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
Now we create unbindable recursive bind mounts for the other two users:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
# \fBmount \-\-rbind \-\-make\-unbindable / /home/henry\fP
|
||||
# \fBmount \-\-rbind \-\-make\-unbindable / /home/otto\fP
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
Upon examining the list of mount points,
|
||||
we see there has been no explosion of mount points,
|
||||
because the unbindable mounts were not replicated
|
||||
under each user's directory:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
# \fBmount | awk \(aq{print $1, $2, $3}\(aq\fP
|
||||
|
@ -695,7 +695,7 @@ slave+shared slave+shared slave priv unbind
|
|||
private shared priv [2] priv unbind
|
||||
unbindable shared unbind [2] priv unbind
|
||||
.TE
|
||||
|
||||
.sp 1
|
||||
Note the following details about the table:
|
||||
.IP [1] 4
|
||||
If a shared mount is the only mount in its peer group,
|
||||
|
@ -705,9 +705,9 @@ Slaving a nonshared mount has no effect on the mount.
|
|||
.\"
|
||||
.SS Bind (MS_BIND) semantics
|
||||
Suppose that the following command is performed:
|
||||
|
||||
.PP
|
||||
mount \-\-bind A/a B/b
|
||||
|
||||
.PP
|
||||
Here,
|
||||
.I A
|
||||
is the source mount point,
|
||||
|
@ -727,7 +727,7 @@ depends on the propagation types of the mount points
|
|||
and
|
||||
.IR B ,
|
||||
and is summarized in the following table.
|
||||
|
||||
.PP
|
||||
.TS
|
||||
lb2 lb1 lb2 lb2 lb2 lb0
|
||||
lb2 lb1 lb2 lb2 lb2 lb0
|
||||
|
@ -738,20 +738,20 @@ _
|
|||
dest(B) shared | shared shared slave+shared invalid
|
||||
nonshared | shared private slave invalid
|
||||
.TE
|
||||
|
||||
.sp 1
|
||||
Note that a recursive bind of a subtree follows the same semantics
|
||||
as for a bind operation on each mount in the subtree.
|
||||
(Unbindable mounts are automatically pruned at the target mount point.)
|
||||
|
||||
.PP
|
||||
For further details, see
|
||||
.I Documentation/filesystems/sharedsubtree.txt
|
||||
in the kernel source tree.
|
||||
.\"
|
||||
.SS Move (MS_MOVE) semantics
|
||||
Suppose that the following command is performed:
|
||||
|
||||
.PP
|
||||
mount \-\-move A B/b
|
||||
|
||||
.PP
|
||||
Here,
|
||||
.I A
|
||||
is the source mount point,
|
||||
|
@ -767,7 +767,7 @@ depends on the propagation types of the mount points
|
|||
and
|
||||
.IR B ,
|
||||
and is summarized in the following table.
|
||||
|
||||
.PP
|
||||
.TS
|
||||
lb2 lb1 lb2 lb2 lb2 lb0
|
||||
lb2 lb1 lb2 lb2 lb2 lb0
|
||||
|
@ -778,18 +778,18 @@ _
|
|||
dest(B) shared | shared shared slave+shared invalid
|
||||
nonshared | shared private slave unbindable
|
||||
.TE
|
||||
|
||||
.sp 1
|
||||
Note: moving a mount that resides under a shared mount is invalid.
|
||||
|
||||
.PP
|
||||
For further details, see
|
||||
.I Documentation/filesystems/sharedsubtree.txt
|
||||
in the kernel source tree.
|
||||
.\"
|
||||
.SS Mount semantics
|
||||
Suppose that we use the following command to create a mount point:
|
||||
|
||||
.PP
|
||||
mount device B/b
|
||||
|
||||
.PP
|
||||
Here,
|
||||
.I B
|
||||
is the destination mount point, and
|
||||
|
@ -804,9 +804,9 @@ is considered always to be private.
|
|||
.\"
|
||||
.SS Unmount semantics
|
||||
Suppose that we use the following command to tear down a mount point:
|
||||
|
||||
.PP
|
||||
umount A
|
||||
|
||||
.PP
|
||||
Here,
|
||||
.I A
|
||||
is a mount point on
|
||||
|
@ -835,7 +835,7 @@ record in cases where a process can't see a slave's immediate master
|
|||
the filesystem root directory)
|
||||
and so cannot determine the
|
||||
chain of propagation between the mounts it can see.
|
||||
|
||||
.PP
|
||||
In the following example, we first create a two-link master-slave chain
|
||||
between the mounts
|
||||
.IR /mnt ,
|
||||
|
@ -850,7 +850,7 @@ mount point unreachable from the root directory,
|
|||
creating a situation where the master of
|
||||
.IR /mnt/tmp/etc
|
||||
is not reachable from the (new) root directory of the process.
|
||||
|
||||
.PP
|
||||
First, we bind mount the root directory onto
|
||||
.IR /mnt
|
||||
and then bind mount
|
||||
|
@ -863,7 +863,7 @@ the
|
|||
.BR proc (5)
|
||||
filesystem remains visible at the correct location
|
||||
in the chroot-ed environment.
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
# \fBmkdir \-p /mnt/proc\fP
|
||||
|
@ -871,11 +871,11 @@ in the chroot-ed environment.
|
|||
# \fBmount \-\-bind /proc /mnt/proc\fP
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
Next, we ensure that the
|
||||
.IR /mnt
|
||||
mount is a shared mount in a new peer group (with no peers):
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
# \fBmount \-\-make\-private /mnt\fP # Isolate from any previous peer group
|
||||
|
@ -885,12 +885,12 @@ mount is a shared mount in a new peer group (with no peers):
|
|||
248 239 0:4 / /mnt/proc ... shared:5
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
Next, we bind mount
|
||||
.IR /mnt/etc
|
||||
onto
|
||||
.IR /tmp/etc :
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
# \fBmkdir \-p /tmp/etc\fP
|
||||
|
@ -901,7 +901,7 @@ onto
|
|||
267 40 8:2 /etc /tmp/etc ... shared:102
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
Initially, these two mount points are in the same peer group,
|
||||
but we then make the
|
||||
.IR /tmp/etc
|
||||
|
@ -911,7 +911,7 @@ and then make
|
|||
.IR /tmp/etc
|
||||
shared as well,
|
||||
so that it can propagate events to the next slave in the chain:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
# \fBmount \-\-make\-slave /tmp/etc\fP
|
||||
|
@ -922,7 +922,7 @@ so that it can propagate events to the next slave in the chain:
|
|||
267 40 8:2 /etc /tmp/etc ... shared:105 master:102
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
Then we bind mount
|
||||
.IR /tmp/etc
|
||||
onto
|
||||
|
@ -932,7 +932,7 @@ but we then make
|
|||
.IR /mnt/tmp/etc
|
||||
a slave of
|
||||
.IR /tmp/etc :
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
# \fBmkdir \-p /mnt/tmp/etc\fP
|
||||
|
@ -952,23 +952,23 @@ is the master of the slave
|
|||
.IR /tmp/etc ,
|
||||
which in turn is the master of the slave
|
||||
.IR /mnt/tmp/etc .
|
||||
|
||||
.PP
|
||||
We then
|
||||
.BR chroot (1)
|
||||
to the
|
||||
.IR /mnt
|
||||
directory, which renders the mount with ID 267 unreachable
|
||||
from the (new) root directory:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
# \fBchroot /mnt\fP
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
When we examine the state of the mounts inside the chroot-ed environment,
|
||||
we see the following:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
# \fBcat /proc/self/mountinfo | sed \(aqs/ \- .*//\(aq\fP
|
||||
|
@ -977,7 +977,7 @@ we see the following:
|
|||
273 239 8:2 /etc /tmp/etc ... master:105 propagate_from:102
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
Above, we see that the mount with ID 273
|
||||
is a slave whose master is the peer group 105.
|
||||
The mount point for that master is unreachable, and so a
|
||||
|
@ -1006,7 +1006,7 @@ then the propagation type of the new mount is also
|
|||
Otherwise, the propagation type of the new mount is
|
||||
.BR MS_PRIVATE .
|
||||
But see also NOTES.
|
||||
|
||||
.PP
|
||||
Notwithstanding the fact that the default propagation type
|
||||
for new mount points is in many cases
|
||||
.BR MS_PRIVATE ,
|
||||
|
@ -1019,7 +1019,7 @@ automatically remounts all mount points as
|
|||
on system startup.
|
||||
Thus, on most modern systems, the default propagation type is in practice
|
||||
.BR MS_SHARED .
|
||||
|
||||
.PP
|
||||
Since, when one uses
|
||||
.BR unshare (1)
|
||||
to create a mount namespace,
|
||||
|
@ -1034,14 +1034,14 @@ by making all mount points private in the new namespace.
|
|||
That is,
|
||||
.BR unshare (1)
|
||||
performs the equivalent of the following in the new mount namespace:
|
||||
|
||||
.PP
|
||||
mount \-\-make\-rprivate /
|
||||
|
||||
.PP
|
||||
To prevent this, one can use the
|
||||
.IR "\-\-propagation\ unchanged"
|
||||
option to
|
||||
.BR unshare (1).
|
||||
|
||||
.PP
|
||||
For a discussion of propagation types when moving mounts
|
||||
.RB ( MS_MOVE )
|
||||
and creating bind mounts
|
||||
|
@ -1058,6 +1058,6 @@ see
|
|||
.BR proc (5),
|
||||
.BR namespaces (7),
|
||||
.BR user_namespaces (7)
|
||||
|
||||
.PP
|
||||
.IR Documentation/filesystems/sharedsubtree.txt
|
||||
in the kernel source tree.
|
||||
|
|
|
@ -34,7 +34,7 @@ This API is distinct from that provided by System V message queues
|
|||
.BR msgsnd (2),
|
||||
.BR msgrcv (2),
|
||||
etc.), but provides similar functionality.
|
||||
|
||||
.PP
|
||||
Message queues are created and opened using
|
||||
.BR mq_open (3);
|
||||
this function returns a
|
||||
|
@ -49,7 +49,7 @@ that is, a null-terminated string of up to
|
|||
followed by one or more characters, none of which are slashes.
|
||||
Two processes can operate on the same queue by passing the same name to
|
||||
.BR mq_open (3).
|
||||
|
||||
.PP
|
||||
Messages are transferred to and from a queue using
|
||||
.BR mq_send (3)
|
||||
and
|
||||
|
@ -65,7 +65,7 @@ and
|
|||
A process can request asynchronous notification
|
||||
of the arrival of a message on a previously empty queue using
|
||||
.BR mq_notify (3).
|
||||
|
||||
.PP
|
||||
A message queue descriptor is a reference to an
|
||||
.I "open message queue description"
|
||||
(cf.
|
||||
|
@ -78,7 +78,7 @@ as the corresponding message queue descriptors in the parent.
|
|||
Corresponding message queue descriptors in the two processes share the flags
|
||||
.RI ( mq_flags )
|
||||
that are associated with the open message queue description.
|
||||
|
||||
.PP
|
||||
Each message has an associated
|
||||
.IR priority ,
|
||||
and messages are always delivered to the receiving process
|
||||
|
@ -184,7 +184,7 @@ limit is ignored for privileged processes
|
|||
but the
|
||||
.BR HARD_MSGMAX
|
||||
ceiling is nevertheless imposed.
|
||||
|
||||
.IP
|
||||
The definition of
|
||||
.BR HARD_MSGMAX
|
||||
has changed across kernel versions:
|
||||
|
@ -294,14 +294,14 @@ commands:
|
|||
.fi
|
||||
.in
|
||||
The sticky bit is automatically enabled on the mount directory.
|
||||
|
||||
.PP
|
||||
After the filesystem has been mounted, the message queues on the system
|
||||
can be viewed and manipulated using the commands usually used for files
|
||||
(e.g.,
|
||||
.BR ls (1)
|
||||
and
|
||||
.BR rm (1)).
|
||||
|
||||
.PP
|
||||
The contents of each file in the directory consist of a single line
|
||||
containing information about the queue:
|
||||
.in +4n
|
||||
|
@ -345,7 +345,7 @@ This means that a message queue descriptor can be monitored using
|
|||
or
|
||||
.BR epoll (7).
|
||||
This is not portable.
|
||||
|
||||
.PP
|
||||
The close-on-exec flag (see
|
||||
.BR open (2))
|
||||
is automatically set on the file descriptor returned by
|
||||
|
@ -364,7 +364,7 @@ POSIX message queues provide a better designed interface than
|
|||
System V message queues;
|
||||
on the other hand, POSIX message queues are less widely available
|
||||
(especially on older systems) than System V message queues.
|
||||
|
||||
.PP
|
||||
Linux does not currently (2.6.26) support the use of access control
|
||||
lists (ACLs) for POSIX message queues.
|
||||
.SH BUGS
|
||||
|
@ -376,7 +376,7 @@ limit could be raised,
|
|||
and the ceiling was enforced even for privileged processes.
|
||||
This ceiling value was removed in Linux 3.14,
|
||||
and patches to stable kernels 3.5.x to 3.13.x also removed the ceiling.
|
||||
|
||||
.PP
|
||||
As originally implemented (and documented),
|
||||
the QSIZE field displayed the total number of (user-supplied)
|
||||
bytes in all messages in the message queue.
|
||||
|
|
|
@ -40,7 +40,7 @@ One of these signals is used to support thread cancellation and POSIX timers
|
|||
the other is used as part of a mechanism that ensures all threads in
|
||||
a process always have the same UIDs and GIDs, as required by POSIX.
|
||||
These signals cannot be used in applications.
|
||||
|
||||
.PP
|
||||
To prevent accidental use of these signals in applications,
|
||||
which might interfere with the operation of the NPTL implementation,
|
||||
various glibc library functions and system call wrapper functions
|
||||
|
@ -86,7 +86,7 @@ the NPTL implementation wraps all of the system calls that
|
|||
change process credentials with functions that,
|
||||
in addition to invoking the underlying system call,
|
||||
arrange for all other threads in the process to also change their credentials.
|
||||
|
||||
.PP
|
||||
The implementation of each of these system calls involves the use of
|
||||
a real-time signal that is sent (using
|
||||
.BR tgkill (2))
|
||||
|
@ -96,7 +96,7 @@ saves the new credential(s) and records the system call being employed
|
|||
in a global buffer.
|
||||
A signal handler in the receiving thread(s) fetches this information and
|
||||
then uses the same system call to change its credentials.
|
||||
|
||||
.PP
|
||||
Wrapper functions employing this technique are provided for
|
||||
.BR setgid (2),
|
||||
.BR setuid (2),
|
||||
|
|
12
man7/numa.7
|
@ -55,11 +55,11 @@ see "Library Support" below.
|
|||
.\" See also Changelog-2.6.14
|
||||
This file displays information about a process's
|
||||
NUMA memory policy and allocation.
|
||||
|
||||
.PP
|
||||
Each line contains information about a memory range used by the process,
|
||||
displaying\(emamong other information\(emthe effective memory policy for
|
||||
that memory range and on which nodes the pages have been allocated.
|
||||
|
||||
.PP
|
||||
.I numa_maps
|
||||
is a read-only file.
|
||||
When
|
||||
|
@ -67,14 +67,14 @@ When
|
|||
is read, the kernel will scan the virtual address space of the
|
||||
process and report how memory is used.
|
||||
One line is displayed for each unique memory range of the process.
|
||||
|
||||
.PP
|
||||
The first field of each line shows the starting address of the memory range.
|
||||
This field allows a correlation with the contents of the
|
||||
.I /proc/<pid>/maps
|
||||
file,
|
||||
which contains the end address of the range and other information,
|
||||
such as the access permissions and sharing.
|
||||
|
||||
.PP
|
||||
The second field shows the memory policy currently in effect for the
|
||||
memory range.
|
||||
Note that the effective policy is not necessarily the policy
|
||||
|
@ -82,7 +82,7 @@ installed by the process for that memory range.
|
|||
Specifically, if the process installed a "default" policy for that range,
|
||||
the effective policy for that range will be the process policy,
|
||||
which may or may not be "default".
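The field layout described above (start address, then policy, then further items) lends itself to simple splitting. The sketch below is illustrative only: the function name parse_numa_maps_line is hypothetical, and any sample line fed to it should be treated as an assumption rather than real kernel output.

```python
def parse_numa_maps_line(line):
    """Split one numa_maps line into (start_address, policy, details)."""
    fields = line.split()
    start = int(fields[0], 16)        # first field: start of the memory range
    policy = fields[1]                # second field: effective memory policy
    details = {}
    for item in fields[2:]:
        key, sep, value = item.partition("=")
        # Items without "=" are bare keywords; record them as flags.
        details[key] = value if sep else True
    return start, policy, details
```

The start address returned here is the value to match against the first column of /proc/<pid>/maps.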
|
||||
|
||||
.PP
|
||||
The rest of the line contains information about the pages allocated in
|
||||
the memory range, as follows:
|
||||
.TP
|
||||
|
@ -163,7 +163,7 @@ and the required
|
|||
header are available in the
|
||||
.I numactl
|
||||
package.
|
||||
|
||||
.PP
|
||||
However, applications should not use these system calls directly.
|
||||
Instead, the higher level interface provided by the
|
||||
.BR numa (3)
|
||||
|
|
|
@ -47,7 +47,7 @@ system call that had the
|
|||
.B CLONE_NEWNS
|
||||
flag set.)
|
||||
This handles the \(aq/\(aq part of the pathname.
|
||||
|
||||
.PP
|
||||
If the pathname does not start with the \(aq/\(aq character, the
|
||||
starting lookup directory of the resolution process is the current working
|
||||
directory of the process.
|
||||
|
@ -55,7 +55,7 @@ directory of the process.
|
|||
It can be changed by use of the
|
||||
.BR chdir (2)
|
||||
system call.)
|
||||
|
||||
.PP
|
||||
Pathnames starting with a \(aq/\(aq character are called absolute pathnames.
|
||||
Pathnames not starting with a \(aq/\(aq are called relative pathnames.
|
||||
.SS Step 2: walk along the path
|
||||
|
@ -63,27 +63,27 @@ Set the current lookup directory to the starting lookup directory.
|
|||
Now, for each nonfinal component of the pathname, where a component
|
||||
is a substring delimited by \(aq/\(aq characters, this component is looked up
|
||||
in the current lookup directory.
|
||||
|
||||
.PP
|
||||
If the process does not have search permission on
|
||||
the current lookup directory,
|
||||
an
|
||||
.B EACCES
|
||||
error is returned ("Permission denied").
|
||||
|
||||
.PP
|
||||
If the component is not found, an
|
||||
.B ENOENT
|
||||
error is returned
|
||||
("No such file or directory").
|
||||
|
||||
.PP
|
||||
If the component is found, but is neither a directory nor a symbolic link,
|
||||
an
|
||||
.B ENOTDIR
|
||||
error is returned ("Not a directory").
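Two of the errors above are easy to provoke from user space. This is a small sketch, not a definitive test: the helper names are hypothetical, and EACCES is left out because a privileged caller would bypass the search-permission check.

```python
import os
import tempfile

def lookup_errno(path):
    """Return the errno produced by resolving path, or 0 on success."""
    try:
        os.stat(path)
    except OSError as exc:
        return exc.errno
    return 0

def demo_lookup_errors():
    """Return (errno for a missing component, errno for a non-directory component)."""
    with tempfile.TemporaryDirectory() as top:
        regular = os.path.join(top, "file")
        open(regular, "w").close()
        missing = lookup_errno(os.path.join(top, "missing", "x"))
        not_a_dir = lookup_errno(os.path.join(regular, "x"))
        return missing, not_a_dir
```

On Linux the two values should correspond to errno.ENOENT and errno.ENOTDIR respectively.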
|
||||
|
||||
.PP
|
||||
If the component is found and is a directory, we set the
|
||||
current lookup directory to that directory, and go to the
|
||||
next component.
|
||||
|
||||
.PP
|
||||
If the component is found and is a symbolic link (symlink), we first
|
||||
resolve this symbolic link (with the current lookup directory
|
||||
as starting lookup directory).
|
||||
|
@ -106,7 +106,7 @@ An
|
|||
.B ELOOP
|
||||
error is returned when the maximum is
|
||||
exceeded ("Too many levels of symbolic links").
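A pair of symbolic links that point at each other is a simple way to hit this limit. The following sketch assumes a writable temporary directory; the function name symlink_loop_errno is hypothetical.

```python
import os
import tempfile

def symlink_loop_errno():
    """Resolve a two-link symlink cycle and return the resulting errno."""
    with tempfile.TemporaryDirectory() as top:
        a = os.path.join(top, "a")
        b = os.path.join(top, "b")
        os.symlink(b, a)              # a -> b
        os.symlink(a, b)              # b -> a, closing the cycle
        try:
            os.stat(a)                # follows links until the kernel gives up
        except OSError as exc:
            return exc.errno
    return 0
```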
|
||||
|
||||
.PP
|
||||
.\"
|
||||
.\" presently: max recursion depth during symlink resolution: 5
|
||||
.\" max total number of symbolic links followed: 40
|
||||
|
@ -140,17 +140,17 @@ system calls.
|
|||
By convention, every directory has the entries "." and "..",
|
||||
which refer to the directory itself and to its parent directory,
|
||||
respectively.
|
||||
|
||||
.PP
|
||||
The path resolution process will assume that these entries have
|
||||
their conventional meanings, regardless of whether they are
|
||||
actually present in the physical filesystem.
|
||||
|
||||
.PP
|
||||
One cannot walk down past the root: "/.." is the same as "/".
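The equivalence of "/.." and "/" can be checked by comparing device and inode numbers. A minimal sketch; the helper name same_directory is hypothetical.

```python
import os

def same_directory(first, second):
    """Return True if both pathnames resolve to the same inode on the same device."""
    a = os.stat(first)
    b = os.stat(second)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
```

Calling same_directory("/..", "/") is expected to return True, including inside a chroot-ed environment, since the check is made relative to the caller's root directory.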
|
||||
.SS Mount points
|
||||
After a "mount dev path" command, the pathname "path" refers to
|
||||
the root of the filesystem hierarchy on the device "dev", and no
|
||||
longer to whatever it referred to earlier.
|
||||
|
||||
.PP
|
||||
One can walk out of a mounted filesystem: "path/.." refers to
|
||||
the parent directory of "path",
|
||||
outside of the filesystem hierarchy on "dev".
|
||||
|
@ -196,16 +196,16 @@ effective group ID of the calling process, or is one of the
|
|||
supplementary group IDs of the calling process (as set by
|
||||
.BR setgroups (2)).
|
||||
When neither holds, the third group is used.
|
||||
|
||||
.PP
|
||||
Of the three bits used, the first bit determines read permission,
|
||||
the second write permission, and the last execute permission
|
||||
in case of ordinary files, or search permission in case of directories.
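The layout of the three groups of three bits can be made concrete with a little bit arithmetic. This sketch only decodes a mode value; it does not reproduce the kernel's choice of which group applies to a given process, and the function name permission_bits is hypothetical.

```python
import stat

def permission_bits(mode, who):
    """Return (read, write, execute_or_search) for 'owner', 'group' or 'other'."""
    shift = {"owner": 6, "group": 3, "other": 0}[who]
    bits = (stat.S_IMODE(mode) >> shift) & 0o7
    # Within each group: first bit read, second write, last execute/search.
    return bool(bits & 4), bool(bits & 2), bool(bits & 1)
```

For example, a mode of 0o754 gives the owner all three permissions, the group read and execute, and others read only.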
|
||||
|
||||
.PP
|
||||
Linux uses the fsuid instead of the effective user ID in permission checks.
|
||||
Ordinarily the fsuid will equal the effective user ID, but the fsuid can be
|
||||
changed by the system call
|
||||
.BR setfsuid (2).
|
||||
|
||||
.PP
|
||||
(Here "fsuid" stands for something like "filesystem user ID".
|
||||
The concept was required for the implementation of a user space
|
||||
NFS server at a time when processes could send a signal to a process
|
||||
|
@ -213,7 +213,7 @@ with the same effective user ID.
|
|||
It is obsolete now.
|
||||
Nobody should use
|
||||
.BR setfsuid (2).)
|
||||
|
||||
.PP
|
||||
Similarly, Linux uses the fsgid ("filesystem group ID")
|
||||
instead of the effective group ID.
|
||||
See
|
||||
|
@ -230,7 +230,7 @@ when accessing files.
|
|||
.\" on some implementations (e.g., Solaris, FreeBSD),
|
||||
.\" access(X_OK) by superuser will report success, regardless
|
||||
.\" of the file's execute permission bits. -- MTK (Oct 05)
|
||||
|
||||
.PP
|
||||
On Linux, superuser privileges are divided into capabilities (see
|
||||
.BR capabilities (7)).
|
||||
Two capabilities are relevant for file permissions checks:
|
||||
|
@ -238,13 +238,13 @@ Two capabilities are relevant for file permissions checks:
|
|||
and
|
||||
.BR CAP_DAC_READ_SEARCH .
|
||||
(A process has these capabilities if its fsuid is 0.)
|
||||
|
||||
.PP
|
||||
The
|
||||
.B CAP_DAC_OVERRIDE
|
||||
capability overrides all permission checking,
|
||||
but grants execute permission only when at least one
|
||||
of the file's three execute permission bits is set.
|
||||
|
||||
.PP
|
||||
The
|
||||
.B CAP_DAC_READ_SEARCH
|
||||
capability grants read and search permission
|
||||
|
|
|
@ -21,7 +21,7 @@ The persistent keyring has a name (description) of the form
|
|||
where
|
||||
.I <UID>
|
||||
is the user ID of the corresponding user.
|
||||
|
||||
.PP
|
||||
The persistent keyring may not be accessed directly,
|
||||
even by processes with the appropriate UID.
|
||||
.\" FIXME The meaning of the preceding sentence isn't clear. What is meant?
|
||||
|
@ -31,30 +31,30 @@ by virtue of its possessor permits.
|
|||
This linking is done with the
|
||||
.BR keyctl_get_persistent (3)
|
||||
function.
|
||||
|
||||
.PP
|
||||
If a persistent keyring does not exist when it is accessed by the
|
||||
.BR keyctl_get_persistent (3)
|
||||
operation, it will be automatically created.
|
||||
|
||||
.PP
|
||||
Each time the
|
||||
.BR keyctl_get_persistent (3)
|
||||
operation is performed,
|
||||
the persistent key's expiration timer is reset to the value in:
|
||||
|
||||
.PP
|
||||
/proc/sys/kernel/keys/persistent_keyring_expiry
|
||||
|
||||
.PP
|
||||
Should the timeout be reached,
|
||||
the persistent keyring will be removed and
|
||||
everything it pins can then be garbage collected.
|
||||
The key will then be re-created on a subsequent call to
|
||||
.BR keyctl_get_persistent (3).
|
||||
|
||||
.PP
|
||||
The persistent keyring is not directly searched by
|
||||
.BR request_key (2);
|
||||
it is searched only if it is linked into one of the keyrings
|
||||
that is searched by
|
||||
.BR request_key (2).
|
||||
|
||||
.PP
|
||||
The persistent keyring is independent of
|
||||
.BR clone (2),
|
||||
.BR fork (2),
|
||||
|
@ -74,7 +74,7 @@ The persistent keyring can thus be used to
|
|||
hold authentication tokens for processes that run without user interaction,
|
||||
such as programs started by
|
||||
.BR cron (8).
|
||||
|
||||
.PP
|
||||
The persistent keyring is used to store UID-specific objects that
|
||||
themselves have limited lifetimes (e.g., Kerberos tokens).
|
||||
If those tokens cease to be used
|
||||
|
|
|
@ -30,14 +30,14 @@ pid_namespaces \- overview of Linux PID namespaces
|
|||
.SH DESCRIPTION
|
||||
For an overview of namespaces, see
|
||||
.BR namespaces (7).
|
||||
|
||||
.PP
|
||||
PID namespaces isolate the process ID number space,
|
||||
meaning that processes in different PID namespaces can have the same PID.
|
||||
PID namespaces allow containers to provide functionality
|
||||
such as suspending/resuming the set of processes in the container and
|
||||
migrating the container to a new host
|
||||
while the processes inside the container maintain the same PIDs.
|
||||
|
||||
.PP
|
||||
PIDs in a new PID namespace start at 1,
|
||||
somewhat like a standalone system, and calls to
|
||||
.BR fork (2),
|
||||
|
@ -45,7 +45,7 @@ somewhat like a standalone system, and calls to
|
|||
or
|
||||
.BR clone (2)
|
||||
will produce processes with PIDs that are unique within the namespace.
|
||||
|
||||
.PP
|
||||
Use of PID namespaces requires a kernel that is configured with the
|
||||
.B CONFIG_PID_NS
|
||||
option.
|
||||
|
@ -72,7 +72,7 @@ in the same PID namespace employed the
|
|||
.BR prctl (2)
|
||||
.B PR_SET_CHILD_SUBREAPER
|
||||
command to mark itself as the reaper of orphaned descendant processes).
|
||||
|
||||
.PP
|
||||
If the "init" process of a PID namespace terminates,
|
||||
the kernel terminates all of the processes in the namespace via a
|
||||
.BR SIGKILL
|
||||
|
@ -99,13 +99,13 @@ terminates, then subsequent calls to
|
|||
.BR fork (2)
|
||||
will fail with
|
||||
.BR ENOMEM .
|
||||
|
||||
.PP
|
||||
Only signals for which the "init" process has established a signal handler
|
||||
can be sent to the "init" process by other members of the PID namespace.
|
||||
This restriction applies even to privileged processes,
|
||||
and prevents other members of the PID namespace from
|
||||
accidentally killing the "init" process.
|
||||
|
||||
.PP
|
||||
Likewise, a process in an ancestor namespace
|
||||
can\(emsubject to the usual permission checks described in
|
||||
.BR kill (2)\(emsend
|
||||
|
@ -125,7 +125,7 @@ these signals are forcibly delivered when sent from an ancestor PID namespace.
|
|||
Neither of these signals can be caught by the "init" process,
|
||||
and so will result in the usual actions associated with those signals
|
||||
(respectively, terminating and stopping the process).
|
||||
|
||||
.PP
|
||||
Starting with Linux 3.4, the
|
||||
.BR reboot (2)
|
||||
system call causes a signal to be sent to the namespace "init" process.
|
||||
|
@ -150,7 +150,7 @@ Since Linux 3.7,
|
|||
.\" commit f2302505775fd13ba93f034206f1e2a587017929
|
||||
.\" The kernel constant MAX_PID_NS_LEVEL
|
||||
the kernel limits the maximum nesting depth for PID namespaces to 32.
|
||||
|
||||
.PP
|
||||
A process is visible to other processes in its PID namespace,
|
||||
and to the processes in each direct ancestor PID namespace
|
||||
going back to the root PID namespace.
|
||||
|
@ -165,7 +165,7 @@ set nice values with
|
|||
.BR setpriority (2),
|
||||
etc.) only processes contained in its own PID namespace
|
||||
and in descendants of that namespace.
|
||||
|
||||
.PP
|
||||
A process has one process ID in each of the layers of the PID
|
||||
namespace hierarchy in which it is visible,
|
||||
and walking back through each direct ancestor namespace
|
||||
|
@ -177,7 +177,7 @@ A call to
|
|||
.BR getpid (2)
|
||||
always returns the PID associated with the namespace in which
|
||||
the process was created.
|
||||
|
||||
.PP
|
||||
Some processes in a PID namespace may have parents
|
||||
that are outside of the namespace.
|
||||
For example, the parent of the initial process in the namespace
|
||||
|
@ -192,7 +192,7 @@ PID namespace from the caller of
|
|||
Calls to
|
||||
.BR getppid (2)
|
||||
for such processes return 0.
|
||||
|
||||
.PP
|
||||
While processes may freely descend into child PID namespaces
|
||||
(e.g., using
|
||||
.BR setns (2)
|
||||
|
@ -201,7 +201,7 @@ they may not move in the other direction.
|
|||
That is to say, processes may not enter any ancestor namespaces
|
||||
(parent, grandparent, etc.).
|
||||
Changing PID namespaces is a one-way operation.
|
||||
|
||||
.PP
|
||||
The
|
||||
.BR NS_GET_PARENT
|
||||
.BR ioctl (2)
|
||||
|
@ -231,7 +231,7 @@ because doing so would change the caller's idea of its own PID
|
|||
(as reported by
|
||||
.BR getpid ()),
|
||||
which would break many applications and libraries.
|
||||
|
||||
.PP
|
||||
To put things another way:
|
||||
a process's PID namespace membership is determined when the process is created
|
||||
and cannot be changed thereafter.
|
||||
|
@ -260,7 +260,7 @@ type in
|
|||
Since this is computed when a signal is enqueued,
|
||||
a signal queue shared by processes in multiple PID namespaces
|
||||
would defeat that.
|
||||
|
||||
.PP
|
||||
.\" Note these restrictions were all introduced in
|
||||
.\" 8382fcac1b813ad0a4e68a838fc7ae93fa39eda0
|
||||
.\" when CLONE_NEWPID|CLONE_VM was disallowed
|
||||
|
@ -289,7 +289,7 @@ directories) only processes visible in the PID namespace
|
|||
of the process that performed the mount, even if the
|
||||
.I /proc
|
||||
filesystem is viewed from processes in other namespaces.
|
||||
|
||||
.PP
|
||||
After creating a new PID namespace,
|
||||
it is useful for the child to change its root directory
|
||||
and mount a new procfs instance at
|
||||
|
@ -308,7 +308,7 @@ or
|
|||
then it isn't necessary to change the root directory:
|
||||
a new procfs instance can be mounted directly over
|
||||
.IR /proc .
|
||||
|
||||
.PP
|
||||
From a shell, the command to mount
|
||||
.I /proc
|
||||
is:
|
||||
|
|
42
man7/pipe.7
|
@ -34,7 +34,7 @@ and a
|
|||
.IR "write end" .
|
||||
Data written to the write end of a pipe can be read
|
||||
from the read end of the pipe.
|
||||
|
||||
.PP
|
||||
A pipe is created using
|
||||
.BR pipe (2),
|
||||
which creates a new pipe and returns two file descriptors,
|
||||
|
@ -44,7 +44,7 @@ Pipes can be used to create a communication channel between related
|
|||
processes; see
|
||||
.BR pipe (2)
|
||||
for an example.
|
||||
|
||||
.PP
|
||||
A FIFO (short for First In First Out) has a name within the filesystem
|
||||
(created using
|
||||
.BR mkfifo (3)),
|
||||
|
@ -68,7 +68,7 @@ The only difference between pipes and FIFOs is the manner in which
|
|||
they are created and opened.
|
||||
Once these tasks have been accomplished,
|
||||
I/O on pipes and FIFOs has exactly the same semantics.
|
||||
|
||||
.PP
|
||||
If a process attempts to read from an empty pipe, then
|
||||
.BR read (2)
|
||||
will block until data is available.
|
||||
|
@ -82,11 +82,11 @@ Nonblocking I/O is possible by using the
|
|||
operation to enable the
|
||||
.B O_NONBLOCK
|
||||
open file status flag.
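Enabling nonblocking mode with fcntl might look like the sketch below, written in Python, whose fcntl module wraps the same operations. The helper names set_nonblocking and try_read are hypothetical.

```python
import fcntl
import os

def set_nonblocking(fd):
    """Add O_NONBLOCK to the open file status flags of fd."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

def try_read(fd, count=4096):
    """Read without blocking; return None if no data is available yet."""
    try:
        return os.read(fd, count)
    except BlockingIOError:           # read(2) failed with EAGAIN
        return None
```

With the flag set, reading from an empty pipe returns immediately instead of blocking.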
|
||||
|
||||
.PP
|
||||
The communication channel provided by a pipe is a
|
||||
.IR "byte stream" :
|
||||
there is no concept of message boundaries.
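The absence of message boundaries shows up as soon as two writes are followed by one read. A minimal sketch; the function name two_writes_one_read is hypothetical.

```python
import os

def two_writes_one_read():
    """Write two separate chunks to a pipe and read them back in one call."""
    r, w = os.pipe()
    try:
        os.write(w, b"first")
        os.write(w, b"second")
        # The reader sees one run of bytes, not two messages.
        return os.read(r, 4096)
    finally:
        os.close(r)
        os.close(w)
```

An application that needs message framing has to add it itself, for example with length prefixes or delimiters.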
|
||||
|
||||
.PP
|
||||
If all file descriptors referring to the write end of a pipe
|
||||
have been closed, then an attempt to
|
||||
.BR read (2)
|
||||
|
@ -113,7 +113,7 @@ calls to close unnecessary duplicate file descriptors;
|
|||
this ensures that end-of-file and
|
||||
.BR SIGPIPE / EPIPE
|
||||
are delivered when appropriate.
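Both conditions can be observed within a single process once the relevant descriptor is closed. This sketch assumes the Python interpreter's default of ignoring SIGPIPE, so the failed write surfaces as an EPIPE error rather than a signal; the function names are hypothetical.

```python
import os

def read_after_writer_closed():
    """Return what read(2) yields once every write-end descriptor is closed."""
    r, w = os.pipe()
    os.close(w)
    try:
        return os.read(r, 1)          # empty bytes object signals end-of-file
    finally:
        os.close(r)

def write_after_reader_closed():
    """Return the errno from writing once every read-end descriptor is closed."""
    r, w = os.pipe()
    os.close(r)
    try:
        os.write(w, b"x")
    except BrokenPipeError as exc:    # SIGPIPE is ignored, so EPIPE is reported
        return exc.errno
    finally:
        os.close(w)
    return 0
```

If a stray duplicate of either descriptor were still open in another process, neither condition would be reported, which is why the unnecessary duplicates must be closed.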
|
||||
|
||||
.PP
|
||||
It is not possible to apply
|
||||
.BR lseek (2)
|
||||
to a pipe.
|
||||
|
@ -129,7 +129,7 @@ Applications should not rely on a particular capacity:
|
|||
an application should be designed so that a reading process consumes data
|
||||
as soon as it is available,
|
||||
so that a writing process does not remain blocked.
|
||||
|
||||
.PP
|
||||
In Linux versions before 2.6.11, the capacity of a pipe was the same as
|
||||
the system page size (e.g., 4096 bytes on i386).
|
||||
Since Linux 2.6.11, the pipe capacity is 16 pages
|
||||
|
@ -144,7 +144,7 @@ operations.
|
|||
See
|
||||
.BR fcntl (2)
|
||||
for more information.
|
||||
|
||||
.PP
|
||||
The following
|
||||
.BR ioctl (2)
|
||||
operation, which can be applied to a file descriptor
|
||||
|
@ -152,9 +152,9 @@ that refers to either end of a pipe,
|
|||
places a count of the number of unread bytes in the pipe in the
|
||||
.I int
|
||||
buffer pointed to by the final argument of the call:
|
||||
|
||||
.PP
|
||||
ioctl(fd, FIONREAD, &nbytes);
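The same operation can be sketched in Python, where fcntl.ioctl fills a packed int buffer in place of the C pointer argument. The function name unread_bytes is hypothetical, and the sketch assumes a platform where termios.FIONREAD is defined.

```python
import fcntl
import struct
import termios

def unread_bytes(fd):
    """Return the number of unread bytes in the pipe referred to by fd."""
    buf = fcntl.ioctl(fd, termios.FIONREAD, struct.pack("i", 0))
    return struct.unpack("i", buf)[0]
```

After writing five bytes to the write end of a fresh pipe, unread_bytes on the read end is expected to report 5.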
|
||||
|
||||
.PP
|
||||
The
|
||||
.B FIONREAD
|
||||
operation is not specified in any standard,
|
||||
|
@ -170,10 +170,10 @@ An upper limit, in pages, on the capacity that an unprivileged user
|
|||
.BR CAP_SYS_RESOURCE
|
||||
capability)
|
||||
can set for a pipe.
|
||||
|
||||
.IP
|
||||
The default value for this limit is 16 times the default pipe capacity
|
||||
(see above); the lower limit is two pages.
|
||||
|
||||
.IP
|
||||
This interface was removed in Linux 2.6.35, in favor of
|
||||
.IR /proc/sys/fs/pipe-max-size .
|
||||
.TP
|
||||
|
@ -189,14 +189,14 @@ The value assigned to this file may be rounded upward,
|
|||
to reflect the value actually employed for a convenient implementation.
|
||||
To determine the rounded-up value,
|
||||
display the contents of this file after assigning a value to it.
|
||||
|
||||
.IP
|
||||
The default value for this file is 1048576 (1 MiB).
|
||||
The minimum value that can be assigned to this file is the system page size.
|
||||
Attempts to set a limit less than the page size cause
|
||||
.BR write (2)
|
||||
to fail with the error
|
||||
.BR EINVAL .
|
||||
|
||||
.IP
|
||||
Since Linux 4.9,
|
||||
.\" commit 086e774a57fba4695f14383c0818994c0b31da7c
|
||||
the value on this file also acts as a ceiling on the default capacity
|
||||
|
@ -214,7 +214,7 @@ So long as the total number of pages allocated to pipe buffers
|
|||
for this user is at this limit,
|
||||
attempts to create new pipes will be denied,
|
||||
and attempts to increase a pipe's capacity will be denied.
|
||||
|
||||
.IP
|
||||
When the value of this limit is zero (which is the default),
|
||||
no hard limit is applied.
|
||||
.\" The default was chosen to avoid breaking existing applications that
|
||||
|
@ -232,7 +232,7 @@ So long as the total number of pages allocated to pipe buffers
|
|||
for this user is at this limit,
|
||||
individual pipes created by a user will be limited to one page,
|
||||
and attempts to increase a pipe's capacity will be denied.
|
||||
|
||||
.IP
|
||||
When the value of this limit is zero, no soft limit is applied.
|
||||
The default value for this file is 16384,
|
||||
which permits creating up to 1024 pipes with the default capacity.
|
||||
|
@ -321,7 +321,7 @@ a pipe or FIFO are
|
|||
.B O_NONBLOCK
|
||||
and
|
||||
.BR O_ASYNC .
|
||||
|
||||
.PP
|
||||
Setting the
|
||||
.B O_ASYNC
|
||||
flag for the read end of a pipe causes a signal
|
||||
|
@ -359,7 +359,7 @@ and excluded the memory required for the increased pipe capacity.
|
|||
The new increase in pipe capacity could then push the total
|
||||
memory used by the user for pipes (possibly far) over a limit.
|
||||
(This could also trigger the problem described next.)
|
||||
|
||||
.IP
|
||||
Starting with Linux 4.9,
|
||||
the limit checking includes the memory required for the new pipe capacity.
|
||||
.IP (2)
|
||||
|
@ -368,13 +368,13 @@ less than the existing pipe capacity.
|
|||
This could lead to problems if a user set a large pipe capacity,
|
||||
and then the limits were lowered, with the result that the user could
|
||||
no longer decrease the pipe capacity.
|
||||
|
||||
.IP
|
||||
Starting with Linux 4.9, checks against the limits
|
||||
are performed only when increasing a pipe's capacity;
|
||||
an unprivileged user can always decrease a pipe's capacity.
|
||||
.IP (3)
|
||||
The accounting and checking against the limits were done as follows:
|
||||
|
||||
.IP
|
||||
.RS
|
||||
.PD 0
|
||||
.IP (a) 4
|
||||
|
@ -391,7 +391,7 @@ Multiple processes could pass point (a) simultaneously,
|
|||
and then allocate pipe buffers that were accounted for only in step (c),
|
||||
with the result that the user's pipe buffer
|
||||
allocation could be pushed over the limit.
|
||||
|
||||
.IP
|
||||
Starting with Linux 4.9,
|
||||
the accounting step is performed before doing the allocation,
|
||||
and the operation fails if the limit would be exceeded.
|
||||
|
|
24
man7/pkeys.7
|
@ -34,13 +34,13 @@ when changing permissions.
|
|||
Memory Protection Keys provide a mechanism for changing
|
||||
protections without requiring modification of the page tables on
|
||||
every permission change.
|
||||
|
||||
.PP
|
||||
To use pkeys, software must first "tag" a page in the page tables
|
||||
with a pkey.
|
||||
After this tag is in place, an application only has
|
||||
to change the contents of a register in order to remove write
|
||||
access, or all access to a tagged page.
|
||||
|
||||
.PP
|
||||
Protection keys work in conjunction with the existing
|
||||
.BR PROT_READ /
|
||||
.BR PROT_WRITE /
|
||||
|
@ -51,7 +51,7 @@ and
|
|||
.BR mmap (2),
|
||||
but always act to further restrict these traditional permission
|
||||
mechanisms.
|
||||
|
||||
.PP
|
||||
If a process performs an access that violates pkey
|
||||
restrictions, it receives a
|
||||
.BR SIGSEGV
|
||||
|
@ -59,7 +59,7 @@ signal.
|
|||
See
|
||||
.BR sigaction (2)
|
||||
for details of the information available with that signal.
|
||||
|
||||
.PP
|
||||
To use the pkeys feature, the processor must support it, and the kernel
|
||||
must contain support for the feature on a given processor.
|
||||
As of early 2016 only future Intel x86 processors are supported,
|
||||
|
@ -69,7 +69,7 @@ are available for actual application use.
|
|||
The default key is assigned to any memory region for which a
|
||||
pkey has not been explicitly assigned via
|
||||
.BR pkey_mprotect (2).
|
||||
|
||||
.PP
|
||||
Protection keys have the potential to add a layer of security and
|
||||
reliability to applications.
|
||||
But they have not been primarily designed as
|
||||
|
@ -77,7 +77,7 @@ a security feature.
|
|||
For instance, WRPKRU is a completely unprivileged
|
||||
instruction, so pkeys are useless in any case where an attacker controls
|
||||
the PKRU register or can execute arbitrary instructions.
|
||||
|
||||
.PP
|
||||
Applications should be very careful to ensure that they do not "leak"
|
||||
protection keys.
|
||||
For instance, before calling
|
||||
|
@ -96,7 +96,7 @@ Applications may implement these checks by searching the
|
|||
file for memory regions with the pkey assigned.
|
||||
Further details can be found in
|
||||
.BR proc (5).
|
||||
|
||||
.PP
|
||||
Any application wanting to use protection keys needs to be able
|
||||
to function without them.
|
||||
They might be unavailable because the hardware that the
|
||||
|
@ -110,7 +110,7 @@ keys should simply call
|
|||
and test whether the call succeeds,
|
||||
instead of attempting to detect support for the
|
||||
feature in any other way.
|
||||
|
||||
.PP
|
||||
Although unnecessary, hardware support for protection keys may be
|
||||
enumerated with the
|
||||
.I cpuid
|
||||
|
@ -123,7 +123,7 @@ under the "flags" field.
|
|||
The string "pku" in this field indicates hardware support for protection
|
||||
keys and the string "ospke" indicates that the kernel contains and has
|
||||
enabled protection keys support.
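Checking for these two strings amounts to parsing the "flags" line. The following sketch takes the file contents as text so that it can be tried on any input; the function name pkey_flags is hypothetical, and calling pkey_alloc(2) remains the recommended way to detect usable support.

```python
def pkey_flags(cpuinfo_text):
    """Return (has_pku, has_ospke) from the text of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "flags":
            flags = set(value.split())
            return "pku" in flags, "ospke" in flags
    return False, False
```

A typical call would pass the result of reading /proc/cpuinfo; only the first "flags" line is examined, on the assumption that all processors report the same set.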
|
||||
|
||||
.PP
|
||||
Applications using threads and protection keys should be especially
|
||||
careful.
|
||||
Threads inherit the protection key rights of the parent at the time
|
||||
|
@ -145,7 +145,7 @@ key rights upon entering a signal handler if the desired rights differ
|
|||
from the defaults.
|
||||
The rights of any interrupted context are restored when the signal
|
||||
handler returns.
|
||||
|
||||
.PP
|
||||
This signal behavior is unusual and is due to the fact that the x86 PKRU
|
||||
register (which stores protection key access rights) is managed with the
|
||||
same hardware mechanism (XSAVE) that manages floating-point registers.
|
||||
|
@ -157,7 +157,7 @@ The Linux kernel implements the following pkey-related system calls:
|
|||
.BR pkey_alloc (2),
|
||||
and
|
||||
.BR pkey_free (2).
|
||||
|
||||
.PP
|
||||
The Linux pkey system calls are available only if the kernel was
|
||||
configured and built with the
|
||||
.BR CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
|
||||
|
@ -171,7 +171,7 @@ After that, it attempts to allocate a protection key and
|
|||
disallows access to the page by using the WRPKRU instruction.
|
||||
It then tries to access the page,
|
||||
which we now expect to cause a fatal signal to the application.
|
||||
|
||||
.PP
|
||||
.in +4n
|
||||
.nf
|
||||
.RB "$" " ./a.out"
|
||||
|
|
|
@ -22,14 +22,14 @@ A special serial number value,
|
|||
.BR KEY_SPEC_PROCESS_KEYRING ,
|
||||
is defined that can be used in lieu of the actual serial number of
|
||||
the calling process's process keyring.
|
||||
|
||||
.PP
|
||||
From the
|
||||
.BR keyctl (1)
|
||||
utility, '\fB@p\fP' can be used instead of a numeric key ID in
|
||||
much the same way, but since
|
||||
.BR keyctl (1)
|
||||
is a program run after forking, this is of no utility.
|
||||
|
||||
.PP
|
||||
A thread created using the
|
||||
.BR clone (2)
|
||||
.B CLONE_THREAD
|
||||
|
@ -42,7 +42,7 @@ A process's process keyring is cleared on
|
|||
.BR execve (2).
|
||||
The process keyring is destroyed when the last
|
||||
thread that refers to it terminates.
|
||||
|
||||
.PP
|
||||
If a process doesn't have a process keyring when it is accessed,
|
||||
then the process keyring will be created if the keyring is to be modified;
|
||||
otherwise, the error
|
||||
|
|
|
@ -33,7 +33,7 @@ A single process can contain multiple threads,
|
|||
all of which are executing the same program.
|
||||
These threads share the same global memory (data and heap segments),
|
||||
but each thread has its own stack (automatic variables).
|
||||
|
||||
.PP
|
||||
POSIX.1 also requires that threads share a range of other attributes
|
||||
(i.e., these attributes are process-wide rather than per-thread):
|
||||
.IP \- 3
|
||||
|
@ -121,12 +121,12 @@ This identifier is returned to the caller of
|
|||
.BR pthread_create (3),
|
||||
and a thread can obtain its own thread identifier using
|
||||
.BR pthread_self (3).
|
||||
|
||||
.PP
|
||||
Thread IDs are guaranteed to be unique only within a process.
|
||||
(In all pthreads functions that accept a thread ID as an argument,
|
||||
that ID by definition refers to a thread in
|
||||
the same process as the caller.)
|
||||
|
||||
.PP
|
||||
The system may reuse a thread ID after a terminated thread has been joined,
|
||||
or a detached thread has terminated.
|
||||
POSIX says: "If an application attempts to use a thread ID whose
|
||||
|
@ -135,7 +135,7 @@ lifetime has ended, the behavior is undefined."
|
|||
A thread-safe function is one that can be safely
|
||||
(i.e., it will deliver the same results regardless of whether it is)
|
||||
called from multiple threads at the same time.
|
||||
|
||||
.PP
|
||||
POSIX.1-2001 and POSIX.1-2008 require that all functions specified
|
||||
in the standard shall be thread-safe,
|
||||
except for the following functions:
|
||||
|
@ -239,7 +239,7 @@ wctomb()
|
|||
An async-cancel-safe function is one that can be safely called
|
||||
in an application where asynchronous cancelability is enabled (see
|
||||
.BR pthread_setcancelstate (3)).
|
||||
|
||||
.PP
|
||||
Only the following functions are required to be async-cancel-safe by
|
||||
POSIX.1-2001 and POSIX.1-2008:
|
||||
.in +4n
|
||||
|
@ -257,10 +257,10 @@ If a thread is cancelable, its cancelability type is deferred,
|
|||
and a cancellation request is pending for the thread,
|
||||
then the thread is canceled when it calls a function
|
||||
that is a cancellation point.
|
||||
|
||||
.PP
|
||||
The following functions are required to be cancellation points by
|
||||
POSIX.1-2001 and/or POSIX.1-2008:
|
||||
|
||||
.PP
|
||||
.\" FIXME
|
||||
.\" Document the list of all functions that are cancellation points in glibc
|
||||
.in +4n
|
||||
|
@ -325,10 +325,10 @@ write()
|
|||
writev()
|
||||
.fi
|
||||
.in
|
||||
|
||||
.PP
|
||||
The following functions may be cancellation points according to
|
||||
POSIX.1-2001 and/or POSIX.1-2008:
|
||||
|
||||
.PP
|
||||
.in +4n
|
||||
.nf
|
||||
access()
|
||||
|
@ -558,7 +558,7 @@ wprintf()
|
|||
wscanf()
|
||||
.fi
|
||||
.in
|
||||
|
||||
.PP
|
||||
An implementation may also mark other functions
|
||||
not specified in the standard as cancellation points.
|
||||
In particular, an implementation is likely to mark
|
||||
|
@ -792,13 +792,13 @@ With NPTL, all of the threads in a process are placed
|
|||
in the same thread group;
|
||||
all members of a thread group share the same PID.
|
||||
NPTL does not employ a manager thread.
|
||||
|
||||
.PP
|
||||
NPTL makes internal use of the first two real-time signals;
|
||||
these signals cannot be used in applications.
|
||||
See
|
||||
.BR nptl (7)
|
||||
for further details.
|
||||
|
||||
.PP
|
||||
NPTL still has at least one nonconformance with POSIX.1:
|
||||
.IP \- 3
|
||||
Threads do not share a common nice value.
|
||||
|
@ -909,7 +909,7 @@ bash$ $( LD_ASSUME_KERNEL=2.2.5 ldd /bin/ls | grep libc.so | \\
|
|||
.BR nptl (7),
|
||||
.BR sigevent (7),
|
||||
.BR signal (7)
|
||||
|
||||
.PP
|
||||
Various Pthreads manual pages, for example:
|
||||
.BR pthread_attr_init (3),
|
||||
.BR pthread_atfork (3),
|
||||
|
|
10
man7/pty.7
|
@ -58,19 +58,19 @@ terminal emulators such as
|
|||
.BR unbuffer (1),
|
||||
and
|
||||
.BR expect (1).
|
||||
|
||||
.PP
|
||||
Data flow between master and slave is handled asynchronously,
|
||||
much like data flow with a physical terminal.
|
||||
Data written to the slave will be available at the master promptly,
|
||||
but may not be available immediately.
|
||||
Similarly, there may be a small processing delay between
|
||||
a write to the master, and the effect being visible at the slave.
|
||||
|
||||
.PP
|
||||
Historically, two pseudoterminal APIs have evolved: BSD and System V.
|
||||
SUSv1 standardized a pseudoterminal API based on the System V API,
|
||||
and this API should be employed in all new programs that use
|
||||
pseudoterminals.
|
||||
|
||||
.PP
|
||||
Linux provides both BSD-style and (standardized) System V-style
|
||||
pseudoterminals.
|
||||
System V-style terminals are commonly called UNIX 98 pseudoterminals
|
||||
|
@ -95,7 +95,7 @@ the name returned by
|
|||
.BR ptsname (3)
|
||||
in a call to
|
||||
.BR open (2).
|
||||
|
||||
.PP
|
||||
The Linux kernel imposes a limit on the number of available
|
||||
UNIX 98 pseudoterminals.
|
||||
In kernels up to and including 2.6.3, this limit is configured
|
||||
|
@ -149,7 +149,7 @@ A description of the
|
|||
.BR ioctl (2),
|
||||
which controls packet mode operation, can be found in
|
||||
.BR ioctl_tty (2).
|
||||
|
||||
.PP
|
||||
The BSD
|
||||
.BR ioctl (2)
|
||||
operations
|
||||
|
|
|
@ -36,7 +36,7 @@ The kernel random-number generator relies on entropy gathered from
|
|||
device drivers and other sources of environmental noise to seed
|
||||
a cryptographically secure pseudorandom number generator (CSPRNG).
|
||||
It is designed for security, rather than speed.
|
||||
|
||||
.PP
|
||||
The following interfaces provide access to output from the kernel CSPRNG:
|
||||
.IP * 3
|
||||
The
|
||||
|
@ -96,7 +96,7 @@ flag.
|
|||
The cryptographic algorithms used for the
|
||||
.IR urandom
|
||||
source are quite conservative, and so should be sufficient for all purposes.
|
||||
|
||||
.PP
|
||||
The disadvantage of
|
||||
.B GRND_RANDOM
|
||||
and reads from
|
||||
|
@ -213,7 +213,7 @@ or Diffie-Hellman private key has an effective key size of 128 bits
|
|||
(it requires about 2^128 operations to break) so a key generator
|
||||
needs only 128 bits (16 bytes) of seed material from
|
||||
.IR /dev/random .
|
||||
|
||||
.PP
|
||||
While some safety margin above that minimum is reasonable, as a guard
|
||||
against flaws in the CSPRNG algorithm, no cryptographic primitive
|
||||
available today can hope to promise more than 256 bits of security,
|
||||
|
|
104
man7/sched.7
|
@ -110,12 +110,12 @@ scheduling priority,
|
|||
.IR sched_priority .
|
||||
The scheduler makes its decisions based on knowledge of the scheduling
|
||||
policy and static priority of all threads on the system.
|
||||
|
||||
.PP
|
||||
For threads scheduled under one of the normal scheduling policies
|
||||
(\fBSCHED_OTHER\fP, \fBSCHED_IDLE\fP, \fBSCHED_BATCH\fP),
|
||||
\fIsched_priority\fP is not used in scheduling
|
||||
decisions (it must be specified as 0).
|
||||
|
||||
.PP
|
||||
Processes scheduled under one of the real-time policies
|
||||
(\fBSCHED_FIFO\fP, \fBSCHED_RR\fP) have a
|
||||
\fIsched_priority\fP value in the range 1 (low) to 99 (high).
|
||||
|
@ -129,17 +129,17 @@ Portable programs should use
|
|||
and
|
||||
.BR sched_get_priority_max (2)
|
||||
to find the range of priorities supported for a particular policy.
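As an illustration of the portable approach described above, the following sketch queries the priority range at run time instead of hard-coding 1..99. The helper name priority_range is hypothetical.

```c
#include <sched.h>

/*
 * Hypothetical helper: look up the static-priority range for a
 * scheduling policy. Returns 0 and stores the bounds on success,
 * or -1 if either query fails (for example, for an unknown policy).
 */
int priority_range(int policy, int *min_out, int *max_out)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return -1;

    *min_out = lo;
    *max_out = hi;
    return 0;
}
```

For example, priority_range(SCHED_FIFO, &lo, &hi) would let a program pick a priority relative to lo and hi rather than assuming a fixed numeric range.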
|
||||
|
||||
.PP
|
||||
Conceptually, the scheduler maintains a list of runnable
|
||||
threads for each possible \fIsched_priority\fP value.
|
||||
In order to determine which thread runs next, the scheduler looks for
|
||||
the nonempty list with the highest static priority and selects the
|
||||
thread at the head of this list.
|
||||
|
||||
.PP
|
||||
A thread's scheduling policy determines
|
||||
where it will be inserted into the list of threads
|
||||
with equal static priority and how it will move inside this list.
|
||||
|
||||
.PP
|
||||
All scheduling is preemptive: if a thread with a higher static
|
||||
priority becomes ready to run, the currently running thread
|
||||
will be preempted and
|
||||
|
@ -187,7 +187,7 @@ will be put at the end of the list.
|
|||
No other events will move a thread
|
||||
scheduled under the \fBSCHED_FIFO\fP policy in the wait list of
|
||||
runnable threads with equal static priority.
|
||||
|
||||
.PP
|
||||
A \fBSCHED_FIFO\fP
|
||||
thread runs until either it is blocked by an I/O request, it is
|
||||
preempted by a higher priority thread, or it calls
|
||||
|
@ -223,7 +223,7 @@ one must use the Linux-specific
|
|||
and
|
||||
.BR sched_getattr (2)
|
||||
system calls.
|
||||
|
||||
.PP
|
||||
A sporadic task is one that has a sequence of jobs, where each
|
||||
job is activated at most once per period.
|
||||
Each job also has a
|
||||
|
@ -241,9 +241,9 @@ is the time at which a task starts its execution.
|
|||
The
|
||||
.I "absolute deadline"
|
||||
is thus obtained by adding the relative deadline to the arrival time.
|
||||
|
||||
.PP
|
||||
The following diagram clarifies these terms:
|
||||
|
||||
.PP
|
||||
.in +4n
|
||||
.nf
|
||||
arrival/wakeup absolute deadline
|
||||
|
@ -256,7 +256,7 @@ arrival/wakeup absolute deadline
|
|||
|<-------------- period ------------------->|
|
||||
.fi
|
||||
.in
|
||||
|
||||
.PP
|
||||
When setting a
|
||||
.B SCHED_DEADLINE
|
||||
policy for a thread using
|
||||
|
@ -273,7 +273,7 @@ Deadline to the relative deadline, and Period to the period of the task.
|
|||
Thus, for
|
||||
.BR SCHED_DEADLINE
|
||||
scheduling, we have:
|
||||
|
||||
.PP
|
||||
.in +4n
|
||||
.nf
|
||||
arrival/wakeup absolute deadline
|
||||
|
@ -286,7 +286,7 @@ arrival/wakeup absolute deadline
|
|||
|<-------------- Period ------------------->|
|
||||
.fi
|
||||
.in
|
||||
|
||||
.PP
|
||||
The three deadline-scheduling parameters correspond to the
|
||||
.IR sched_runtime ,
|
||||
.IR sched_deadline ,
|
||||
|
@ -304,11 +304,11 @@ If
|
|||
.IR sched_period
|
||||
is specified as 0, then it is made the same as
|
||||
.IR sched_deadline .
|
||||
|
||||
.PP
|
||||
The kernel requires that:
|
||||
|
||||
.PP
|
||||
sched_runtime <= sched_deadline <= sched_period
|
||||
|
||||
.PP
|
||||
.\" See __checkparam_dl in kernel/sched/core.c
|
||||
In addition, under the current implementation,
|
||||
all of the parameter values must be at least 1024
|
||||
|
@ -318,10 +318,10 @@ If any of these checks fails,
|
|||
.BR sched_setattr (2)
|
||||
fails with the error
|
||||
.BR EINVAL .
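The ordering and lower-bound checks described above can be mirrored in user space before calling sched_setattr(2). The following is only a sketch under stated assumptions: the function name dl_params_plausible and the constant DL_MIN_NS are hypothetical, values are taken to be in nanoseconds, and the kernel may apply further checks that are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Lower bound quoted in the text above (assumed to be nanoseconds). */
#define DL_MIN_NS 1024ULL

/*
 * Hypothetical pre-check for SCHED_DEADLINE parameters.
 * Returns true if runtime <= deadline <= period holds and every
 * value is at least DL_MIN_NS. This is not the kernel's own check.
 */
bool dl_params_plausible(uint64_t runtime, uint64_t deadline,
                         uint64_t period)
{
    /* A period of 0 is treated as "same as the deadline". */
    if (period == 0)
        period = deadline;

    if (runtime < DL_MIN_NS || deadline < DL_MIN_NS || period < DL_MIN_NS)
        return false;

    return runtime <= deadline && deadline <= period;
}
```

A false result here suggests that sched_setattr(2) would fail with EINVAL; a true result does not guarantee that the call will succeed.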
|
||||
|
||||
.PP
|
||||
The CBS guarantees non-interference between tasks, by throttling
|
||||
threads that attempt to over-run their specified Runtime.
|
||||
|
||||
.PP
|
||||
To ensure deadline scheduling guarantees,
|
||||
the kernel must prevent situations where the set of
|
||||
.B SCHED_DEADLINE
|
||||
|
@ -334,13 +334,13 @@ if it is not,
|
|||
.BR sched_setattr (2)
|
||||
fails with the error
|
||||
.BR EBUSY .
|
||||
|
||||
.PP
|
||||
For example, it is required (but not necessarily sufficient) for
|
||||
the total utilization to be less than or equal to the total number of
|
||||
CPUs available, where, since each thread can maximally run for
|
||||
Runtime per Period, that thread's utilization is its
|
||||
Runtime divided by its Period.
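The necessary condition above is simple arithmetic and can be sketched as follows. The structure dl_task and both function names are hypothetical, and the kernel's actual admission test may be stricter than this sum.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one SCHED_DEADLINE thread. */
struct dl_task {
    uint64_t runtime;   /* Runtime per period */
    uint64_t period;    /* Period; must be nonzero */
};

/* Sum of Runtime/Period over all tasks. */
double total_utilization(const struct dl_task *tasks, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += (double) tasks[i].runtime / (double) tasks[i].period;
    return sum;
}

/*
 * Necessary (but not necessarily sufficient) condition:
 * total utilization must not exceed the number of CPUs.
 */
bool passes_necessary_test(const struct dl_task *tasks, size_t n,
                           unsigned int ncpus)
{
    return total_utilization(tasks, n) <= (double) ncpus;
}
```

For instance, three threads that each need 3 ms out of every 4 ms have a total utilization of 2.25, so they cannot all be admitted on a two-CPU system.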
|
||||
|
||||
.PP
|
||||
In order to fulfill the guarantees that are made when
|
||||
a thread is admitted to the
|
||||
.BR SCHED_DEADLINE
|
||||
|
@ -351,7 +351,7 @@ system; if any
|
|||
.BR SCHED_DEADLINE
|
||||
thread is runnable,
|
||||
it will preempt any thread scheduled under one of the other policies.
|
||||
|
||||
.PP
|
||||
A call to
|
||||
.BR fork (2)
|
||||
by a thread scheduled under the
|
||||
|
@ -359,7 +359,7 @@ by a thread scheduled under the
|
|||
policy will fail with the error
|
||||
.BR EAGAIN ,
|
||||
unless the thread has its reset-on-fork flag set (see below).
|
||||
|
||||
.PP
|
||||
A
|
||||
.B SCHED_DEADLINE
|
||||
thread that calls
|
||||
|
@ -378,7 +378,7 @@ processes).
|
|||
\fBSCHED_OTHER\fP is the standard Linux time-sharing scheduler that is
|
||||
intended for all threads that do not require the special
|
||||
real-time mechanisms.
|
||||
|
||||
.PP
|
||||
The thread to run is chosen from the static
|
||||
priority 0 list based on a \fIdynamic\fP priority that is determined only
|
||||
inside this list.
|
||||
|
@ -401,12 +401,12 @@ The nice value can be modified using
|
|||
.BR setpriority (2),
|
||||
or
|
||||
.BR sched_setattr (2).
|
||||
|
||||
.PP
|
||||
According to POSIX.1, the nice value is a per-process attribute;
|
||||
that is, the threads in a process should share a nice value.
|
||||
However, on Linux, the nice value is a per-thread attribute:
|
||||
different threads in the same process may have different nice values.
|
||||
|
||||
.PP
|
||||
The range of the nice value
|
||||
varies across UNIX systems.
|
||||
On modern Linux, the range is \-20 (high priority) to +19 (low priority).
|
||||
|
@ -414,12 +414,12 @@ On some other systems, the range is \-20..20.
|
|||
Very early Linux kernels (before Linux 2.0) had the range \-infinity..15.
|
||||
.\" Linux before 1.3.36 had \-infinity..15.
|
||||
.\" Since kernel 1.3.43, Linux has the range \-20..19.
|
||||
|
||||
.PP
|
||||
The degree to which the nice value affects the relative scheduling of
|
||||
.BR SCHED_OTHER
|
||||
processes likewise varies across UNIX systems and
|
||||
across Linux kernel versions.
|
||||
|
||||
.PP
|
||||
With the advent of the CFS scheduler in kernel 2.6.23,
|
||||
Linux adopted an algorithm that causes
|
||||
relative differences in nice values to have a much stronger effect.
|
||||
|
@ -431,14 +431,14 @@ to a process whenever there is any other
|
|||
higher priority load on the system,
|
||||
and makes high nice values (\-20) deliver most of the CPU to applications
|
||||
that require it (e.g., some audio applications).
|
||||
|
||||
.PP
|
||||
On Linux, the
|
||||
.BR RLIMIT_NICE
|
||||
resource limit can be used to define a limit to which
|
||||
an unprivileged process's nice value can be raised; see
|
||||
.BR setrlimit (2)
|
||||
for details.
|
||||
|
||||
.PP
|
||||
For further details on the nice value, see the subsections on
|
||||
the autogroup feature and group scheduling, below.
|
||||
.\"
|
||||
|
@ -454,7 +454,7 @@ that the thread is CPU-intensive.
|
|||
Consequently, the scheduler will apply a small scheduling
|
||||
penalty with respect to wakeup behavior,
|
||||
so that this thread is mildly disfavored in scheduling decisions.
|
||||
|
||||
.PP
|
||||
.\" The following paragraph is drawn largely from the text that
|
||||
.\" accompanied Ingo Molnar's patch for the implementation of
|
||||
.\" SCHED_BATCH.
|
||||
|
@ -468,7 +468,7 @@ interactivity causing extra preemptions (between the workload's tasks).
|
|||
(Since Linux 2.6.23.)
|
||||
\fBSCHED_IDLE\fP can be used only at static priority 0;
|
||||
the process nice value has no influence for this policy.
|
||||
|
||||
.PP
|
||||
This policy is intended for running jobs at extremely low
|
||||
priority (lower even than a +19 nice value with the
|
||||
.B SCHED_OTHER
|
||||
|
@ -504,14 +504,14 @@ The state of the reset-on-fork flag can analogously be retrieved using
|
|||
.BR sched_getscheduler (2)
|
||||
and
|
||||
.BR sched_getattr (2).
|
||||
|
||||
.PP
|
||||
The reset-on-fork feature is intended for media-playback applications,
|
||||
and can be used to prevent applications evading the
|
||||
.BR RLIMIT_RTTIME
|
||||
resource limit (see
|
||||
.BR getrlimit (2))
|
||||
by creating multiple child processes.
|
||||
|
||||
.PP
|
||||
More precisely, if the reset-on-fork flag is set,
|
||||
the following rules apply for subsequently created children:
|
||||
.IP * 3
|
||||
|
@ -545,13 +545,13 @@ matches the real or effective user ID of the target thread
|
|||
(i.e., the thread specified by
|
||||
.IR pid )
|
||||
whose policy is being changed.
|
||||
|
||||
.PP
|
||||
A thread must be privileged
|
||||
.RB ( CAP_SYS_NICE )
|
||||
in order to set or modify a
|
||||
.BR SCHED_DEADLINE
|
||||
policy.
|
||||
|
||||
.PP
|
||||
Since Linux 2.6.12, the
|
||||
.B RLIMIT_RTPRIO
|
||||
resource limit defines a ceiling on an unprivileged thread's
|
||||
|
@ -622,7 +622,7 @@ process from freezing the system was to run (at the console)
|
|||
a shell scheduled under a higher static priority than the tested application.
|
||||
This allows an emergency kill of tested
|
||||
real-time applications that do not block or terminate as expected.
|
||||
|
||||
.PP
|
||||
Since Linux 2.6.25, there are other techniques for dealing with runaway
|
||||
real-time and deadline processes.
|
||||
One of these is to use the
|
||||
|
@ -632,7 +632,7 @@ a real-time process may consume.
|
|||
See
|
||||
.BR getrlimit (2)
|
||||
for details.
|
||||
|
||||
.PP
|
||||
Since version 2.6.25, Linux also provides two
|
||||
.I /proc
|
||||
files that can be used to reserve a certain amount of CPU time
|
||||
|
@ -675,7 +675,7 @@ Child processes inherit the scheduling policy and parameters across a
|
|||
.BR fork (2).
|
||||
The scheduling policy and parameters are preserved across
|
||||
.BR execve (2).
|
||||
|
||||
.PP
|
||||
Memory locking is usually needed for real-time processes to avoid
|
||||
paging delays; this can be done with
|
||||
.BR mlock (2)
|
||||
|
@ -692,7 +692,7 @@ parallel build processes (i.e., the
|
|||
.BR make (1)
|
||||
.BR \-j
|
||||
flag).
|
||||
|
||||
.PP
|
||||
This feature operates in conjunction with the
|
||||
CFS scheduler and requires a kernel that is configured with
|
||||
.BR CONFIG_SCHED_AUTOGROUP .
|
||||
|
@ -702,7 +702,7 @@ a value of 0 disables the feature, while a value of 1 enables it.
|
|||
The default value in this file is 1, unless the kernel was booted with the
|
||||
.IR noautogroup
|
||||
parameter.
|
||||
|
||||
.PP
|
||||
A new autogroup is created when a new session is created via
|
||||
.BR setsid (2);
|
||||
this happens, for example, when a new terminal window is started.
|
||||
|
@ -712,14 +712,14 @@ inherits its parent's autogroup membership.
|
|||
Thus, all of the processes in a session are members of the same autogroup.
|
||||
An autogroup is automatically destroyed when the last process
|
||||
in the group terminates.
|
||||
|
||||
.PP
|
||||
When autogrouping is enabled, all of the members of an autogroup
|
||||
are placed in the same kernel scheduler "task group".
|
||||
The CFS scheduler employs an algorithm that equalizes the
|
||||
distribution of CPU cycles across task groups.
|
||||
The benefits of this for interactive desktop performance
|
||||
can be described via the following example.
|
||||
|
||||
.PP
|
||||
Suppose that there are two autogroups competing for the same CPU
|
||||
(i.e., presume either a single CPU system or the use of
|
||||
.BR taskset (1)
|
||||
|
@ -750,17 +750,17 @@ the scheduler distributes CPU cycles across task groups such that
|
|||
an autogroup that contains a large number of CPU-bound processes
|
||||
does not end up hogging CPU cycles at the expense of the other
|
||||
jobs on the system.
|
||||
|
||||
.PP
|
||||
A process's autogroup (task group) membership can be viewed via the file
|
||||
.IR /proc/[pid]/autogroup :
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
$ \fBcat /proc/1/autogroup\fP
|
||||
/autogroup-1 nice 0
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
This file can also be used to modify the CPU bandwidth allocated
|
||||
to an autogroup.
|
||||
This is done by writing a number in the "nice" range to the file
|
||||
|
@ -782,7 +782,7 @@ to fail with the error
|
|||
.\" A patch was posted on 23 Nov 2016
|
||||
.\" ("sched/autogroup: Fix 64bit kernel nice adjustment";
|
||||
.\" check later to see in which kernel version it lands.
|
||||
|
||||
.PP
|
||||
The autogroup nice setting has the same meaning as the process nice value,
|
||||
but applies to distribution of CPU cycles to the autogroup as a whole,
|
||||
based on the relative nice values of other autogroups.
|
||||
|
@ -791,12 +791,12 @@ will be a product of the autogroup's nice value
|
|||
(compared to other autogroups)
|
||||
and the process's nice value
|
||||
(compared to other processes in the same autogroup).
|
||||
|
||||
.PP
|
||||
The use of the
|
||||
.BR cgroups (7)
|
||||
CPU controller to place processes in cgroups other than the
|
||||
root CPU cgroup overrides the effect of autogrouping.
|
||||
|
||||
.PP
|
||||
The autogroup feature groups only processes scheduled under
|
||||
non-real-time policies
|
||||
.RB ( SCHED_OTHER ,
|
||||
|
@ -817,7 +817,7 @@ policies), the CFS scheduler employs a technique known as "group scheduling",
|
|||
if the kernel was configured with the
|
||||
.BR CONFIG_FAIR_GROUP_SCHED
|
||||
option (which is typical).
|
||||
|
||||
.PP
|
||||
Under group scheduling, threads are scheduled in "task groups".
|
||||
Task groups have a hierarchical relationship,
|
||||
rooted under the initial task group on the system,
|
||||
|
@ -861,7 +861,7 @@ or
|
|||
on a process has an effect only for scheduling relative
|
||||
to other processes executed in the same session
|
||||
(typically: the same terminal window).
|
||||
|
||||
.PP
|
||||
Conversely, for two processes that are (for example)
|
||||
the sole CPU-bound processes in different sessions
|
||||
(e.g., different terminal windows,
|
||||
|
@ -877,7 +877,7 @@ A possibly useful workaround here is to use a command such as
|
|||
the following to modify the autogroup nice value for
|
||||
.I all
|
||||
of the processes in a terminal session:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
$ \fBecho 10 > /proc/self/autogroup\fP
|
||||
|
@ -905,7 +905,7 @@ patch-\fIkernelversion\fP-rt\fIpatchversion\fP
|
|||
and can be downloaded from
|
||||
.UR http://www.kernel.org\:/pub\:/linux\:/kernel\:/projects\:/rt/
|
||||
.UE .
|
||||
|
||||
.PP
|
||||
Without the patches and prior to their full inclusion into the mainline
|
||||
kernel, the kernel configuration offers only the three preemption classes
|
||||
.BR CONFIG_PREEMPT_NONE ,
|
||||
|
@ -914,7 +914,7 @@ and
|
|||
.B CONFIG_PREEMPT_DESKTOP
|
||||
which respectively provide no, some, and considerable
|
||||
reduction of the worst-case scheduling latency.
|
||||
|
||||
.PP
|
||||
With the patches applied or after their full inclusion into the mainline
|
||||
kernel, the additional configuration item
|
||||
.B CONFIG_PREEMPT_RT
|
||||
|
|
|
@ -28,7 +28,7 @@
|
|||
sem_overview \- overview of POSIX semaphores
|
||||
.SH DESCRIPTION
|
||||
POSIX semaphores allow processes and threads to synchronize their actions.
|
||||
|
||||
.PP
|
||||
A semaphore is an integer whose value is never allowed to fall below zero.
|
||||
Two operations can be performed on semaphores:
|
||||
increment the semaphore value by one
|
||||
|
@ -38,7 +38,7 @@ and decrement the semaphore value by one
|
|||
If the value of a semaphore is currently zero, then a
|
||||
.BR sem_wait (3)
|
||||
operation will block until the value becomes greater than zero.
|
||||
|
||||
.PP
|
||||
POSIX semaphores come in two forms: named semaphores and
|
||||
unnamed semaphores.
|
||||
.TP
|
||||
|
@ -61,7 +61,7 @@ followed by one or more characters, none of which are slashes.
|
|||
Two processes can operate on the same named semaphore by passing
|
||||
the same name to
|
||||
.BR sem_open (3).
|
||||
|
||||
.IP
|
||||
The
|
||||
.BR sem_open (3)
|
||||
function creates a new named semaphore or opens an existing
|
||||
|
@ -91,7 +91,7 @@ A process-shared semaphore must be placed in a shared memory region
|
|||
.BR shmget (2),
|
||||
or a POSIX shared memory object created using
|
||||
.BR shm_open (3)).
|
||||
|
||||
.IP
|
||||
Before being used, an unnamed semaphore must be initialized using
|
||||
.BR sem_init (3).
|
||||
It can then be operated on using
|
||||
|
@ -132,7 +132,7 @@ with names of the form
|
|||
rather than
|
||||
.B NAME_MAX
|
||||
characters.)
|
||||
|
||||
.PP
|
||||
Since Linux 2.6.19, ACLs can be placed on files under this directory,
|
||||
to control object permissions on a per-user and per-group basis.
|
||||
.SH NOTES
|
||||
|
|
|
@ -22,17 +22,17 @@ Optionally, PAM may revoke the session keyring on logout.
|
|||
(In typical configurations, PAM does do this revocation.)
|
||||
The session keyring has the name (description)
|
||||
.IR _ses .
|
||||
|
||||
.PP
|
||||
A special serial number value,
|
||||
.BR KEY_SPEC_SESSION_KEYRING ,
|
||||
is defined that can be used in lieu of the actual serial number of
|
||||
the calling process's session keyring.
|
||||
|
||||
.PP
|
||||
From the
|
||||
.BR keyctl (1)
|
||||
utility, '\fB@s\fP' can be used instead of a numeric key ID in
|
||||
much the same way.
|
||||
|
||||
.PP
|
||||
A process's session keyring is inherited across
|
||||
.BR clone (2),
|
||||
.BR fork (2),
|
||||
|
@ -44,7 +44,7 @@ is preserved across
|
|||
even when the executable is set-user-ID or set-group-ID or has capabilities.
|
||||
The session keyring is destroyed when the last process that
|
||||
refers to it exits.
|
||||
|
||||
.PP
|
||||
If a process doesn't have a session keyring when it is accessed, then,
|
||||
under certain circumstances, the
|
||||
.BR user-session-keyring (7)
|
||||
|
@ -84,7 +84,7 @@ operation.)
|
|||
These operations are also exposed through the
|
||||
.BR keyctl (1)
|
||||
utility as:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
keyctl session
|
||||
|
@ -92,9 +92,9 @@ keyctl session - [<prog> <arg1> <arg2> ...]
|
|||
keyctl session <name> [<prog> <arg1> <arg2> ...]
|
||||
.in
|
||||
.fi
|
||||
|
||||
.PP
|
||||
and:
|
||||
|
||||
.PP
|
||||
.nf
|
||||
.in +4n
|
||||
keyctl new_session
|
||||
|
|
|
@ -30,7 +30,7 @@ shm_overview \- overview of POSIX shared memory
|
|||
.SH DESCRIPTION
|
||||
The POSIX shared memory API allows processes to communicate information
|
||||
by sharing a region of memory.
|
||||
|
||||
.PP
|
||||
The interfaces employed in the API are:
|
||||
.TP 15
|
||||
.BR shm_open (3)
|
||||
|
@ -101,7 +101,7 @@ to control the permissions of objects in the virtual filesystem.
|
|||
.SH NOTES
|
||||
Typically, processes must synchronize their access to a shared
|
||||
memory object, using, for example, POSIX semaphores.
|
||||
|
||||
.PP
|
||||
System V shared memory
|
||||
.RB ( shmget (2),
|
||||
.BR shmop (2),
|
||||
|
|
|
@ -34,13 +34,13 @@ Many functions are
|
|||
async-signal-safe.
|
||||
In particular,
|
||||
nonreentrant functions are generally unsafe to call from a signal handler.
|
||||
|
||||
.PP
|
||||
The kinds of issues that render a function
|
||||
unsafe can be quickly understood when one considers
|
||||
the implementation of the
|
||||
.I stdio
|
||||
library, all of whose functions are not async-signal-safe.
|
||||
|
||||
.PP
|
||||
When performing buffered I/O on a file, the
|
||||
.I stdio
|
||||
functions must maintain a statically allocated data buffer
|
||||
|
@ -57,7 +57,7 @@ the program is interrupted by a signal handler that also calls
|
|||
then the second call to
|
||||
.BR printf (3)
|
||||
will operate on inconsistent data, with unpredictable results.
|
||||
|
||||
.PP
|
||||
To avoid problems with unsafe functions, there are two possible choices:
|
||||
.IP 1. 3
|
||||
Ensure that
|
||||
|
@ -72,7 +72,7 @@ by the signal handler.
|
|||
.PP
|
||||
Generally, the second choice is difficult in programs of any complexity,
|
||||
so the first choice is taken.
|
||||
|
||||
.PP
|
||||
POSIX.1 specifies a set of functions that an implementation
|
||||
must make async-signal-safe.
|
||||
(An implementation may provide safe implementations of additional functions,
|
||||
|
@ -81,13 +81,13 @@ may not provide the same guarantees.)
|
|||
In general, a function is async-signal-safe either because it is reentrant
|
||||
or because it is atomic with respect to signals
|
||||
(i.e., its execution can't be interrupted by a signal handler).
|
||||
|
||||
.PP
|
||||
The set of functions required to be async-signal-safe by POSIX.1
|
||||
is shown in the following table.
|
||||
The functions not otherwise noted were required to be async-signal-safe
|
||||
in POSIX.1-2001;
|
||||
the table details changes in the subsequent standards.
|
||||
|
||||
.PP
|
||||
.TS
|
||||
lb lb
|
||||
l l.
|
||||
|
@ -284,7 +284,7 @@ Function Notes
|
|||
\fBwmemset\fP(3) Added in POSIX.1-2016
|
||||
\fBwrite\fP(2)
|
||||
.TE
|
||||
|
||||
.sp 1
|
||||
Notes:
|
||||
.IP * 3
|
||||
POSIX.1-2001 and POSIX.1-2004 required the functions
|
||||
|
|
|
@ -54,7 +54,7 @@ Each signal has a current
|
|||
.IR disposition ,
|
||||
which determines how the process behaves when it is delivered
|
||||
the signal.
|
||||
|
||||
.PP
|
||||
The entries in the "Action" column of the tables below specify
|
||||
the default disposition for each signal, as follows:
|
||||
.IP Term
|
||||
|
@ -90,11 +90,11 @@ It is possible to arrange that the signal handler
|
|||
uses an alternate stack; see
|
||||
.BR sigaltstack (2)
|
||||
for a discussion of how to do this and when it might be useful.)
|
||||
|
||||
.PP
|
||||
The signal disposition is a per-process attribute:
|
||||
in a multithreaded application, the disposition of a
|
||||
particular signal is the same for all threads.
|
||||
|
||||
.PP
|
||||
A child created via
|
||||
.BR fork (2)
|
||||
inherits a copy of its parent's signal dispositions.
|
||||
|
@ -174,7 +174,7 @@ which means that it will not be delivered until it is later unblocked.
|
|||
Between the time when it is generated and when it is delivered
|
||||
a signal is said to be
|
||||
.IR pending .
|
||||
|
||||
.PP
|
||||
Each thread in a process has an independent
|
||||
.IR "signal mask" ,
|
||||
which indicates the set of signals that the thread is currently blocking.
|
||||
|
@ -183,13 +183,13 @@ A thread can manipulate its signal mask using
|
|||
In a traditional single-threaded application,
|
||||
.BR sigprocmask (2)
|
||||
can be used to manipulate the signal mask.
|
||||
|
||||
.PP
|
||||
A child created via
|
||||
.BR fork (2)
|
||||
inherits a copy of its parent's signal mask;
|
||||
the signal mask is preserved across
|
||||
.BR execve (2).
|
||||
|
||||
.PP
|
||||
A signal may be generated (and thus pending)
|
||||
for a process as a whole (e.g., when sent using
|
||||
.BR kill (2))
|
||||
|
@ -206,14 +206,14 @@ A process-directed signal may be delivered to any one of the
|
|||
threads that does not currently have the signal blocked.
|
||||
If more than one of the threads has the signal unblocked, then the
|
||||
kernel chooses an arbitrary thread to which to deliver the signal.
|
||||
|
||||
.PP
|
||||
A thread can obtain the set of signals that it currently has pending
|
||||
using
|
||||
.BR sigpending (2).
|
||||
This set will consist of the union of the set of pending
|
||||
process-directed signals and the set of signals pending for
|
||||
the calling thread.
|
||||
|
||||
.PP
|
||||
A child created via
|
||||
.BR fork (2)
|
||||
initially has an empty pending signal set;
|
||||
|
@ -231,7 +231,7 @@ and the last one for mips.
|
|||
.I not
|
||||
shown; see the Linux kernel source for signal numbering on that architecture.)
|
||||
A dash (\-) denotes that a signal is absent on the corresponding architecture.
|
||||
|
||||
.PP
|
||||
First the signals described in the original POSIX.1-1990 standard.
|
||||
.TS
|
||||
l c c l
|
||||
|
@ -260,13 +260,13 @@ SIGTSTP 18,20,24 Stop Stop typed at terminal
|
|||
SIGTTIN 21,21,26 Stop Terminal input for background process
|
||||
SIGTTOU 22,22,27 Stop Terminal output for background process
|
||||
.TE
|
||||
|
||||
.sp 1
|
||||
The signals
|
||||
.B SIGKILL
|
||||
and
|
||||
.B SIGSTOP
|
||||
cannot be caught, blocked, or ignored.
|
||||
|
||||
.PP
|
||||
Next the signals not in the POSIX.1-1990 standard but described in
|
||||
SUSv2 and POSIX.1-2001.
|
||||
.TS
|
||||
|
@ -288,7 +288,7 @@ SIGXCPU 24,24,30 Core CPU time limit exceeded (4.2BSD);
|
|||
SIGXFSZ 25,25,31 Core File size limit exceeded (4.2BSD);
|
||||
see \fBsetrlimit\fP(2)
|
||||
.TE
|
||||
|
||||
.sp 1
|
||||
Up to and including Linux 2.2, the default behavior for
|
||||
.BR SIGSYS ", " SIGXCPU ", " SIGXFSZ ", "
|
||||
and (on architectures other than SPARC and MIPS)
|
||||
|
@ -299,7 +299,7 @@ was to terminate the process (without a core dump).
|
|||
is to terminate the process without a core dump.)
|
||||
Linux 2.4 conforms to the POSIX.1-2001 requirements for these signals,
|
||||
terminating the process with a core dump.
|
||||
|
||||
.PP
|
||||
Next various other signals.
|
||||
.TS
|
||||
l c c l
|
||||
|
@ -317,7 +317,7 @@ SIGLOST \-,\-,\- Term File lock lost (unused)
|
|||
SIGWINCH 28,28,20 Ign Window resize signal (4.3BSD, Sun)
|
||||
SIGUNUSED \-,31,\- Core Synonymous with \fBSIGSYS\fP
|
||||
.TE
|
||||
|
||||
.sp 1
|
||||
(Signal 29 is
|
||||
.B SIGINFO
|
||||
/
|
||||
|
@ -325,21 +325,21 @@ SIGUNUSED \-,31,\- Core Synonymous with \fBSIGSYS\fP
|
|||
on an alpha but
|
||||
.B SIGLOST
|
||||
on a sparc.)
|
||||
|
||||
.PP
|
||||
.B SIGEMT
|
||||
is not specified in POSIX.1-2001, but nevertheless appears
|
||||
on most other UNIX systems,
|
||||
where its default action is typically to terminate
|
||||
the process with a core dump.
|
||||
|
||||
.PP
|
||||
.B SIGPWR
|
||||
(which is not specified in POSIX.1-2001) is typically ignored
|
||||
by default on those other UNIX systems where it appears.
|
||||
|
||||
.PP
|
||||
.B SIGIO
|
||||
(which is not specified in POSIX.1-2001) is ignored by default
|
||||
on several other UNIX systems.
|
||||
|
||||
.PP
|
||||
Where defined,
|
||||
.B SIGUNUSED
|
||||
is synonymous with
|
||||
|
@ -452,7 +452,7 @@ resource limit, which specifies a per-user limit for queued
|
|||
signals; see
|
||||
.BR setrlimit (2)
|
||||
for further details.
|
||||
|
||||
.PP
|
||||
The addition of real-time signals required the widening
|
||||
of the signal set structure
|
||||
.RI ( sigset_t )
|
||||
|
@ -488,7 +488,7 @@ flag (see
|
|||
.BR sigaction (2)).
|
||||
The details vary across UNIX systems;
|
||||
below, the details for Linux.
|
||||
|
||||
.PP
|
||||
If a blocked call to one of the following interfaces is interrupted
|
||||
by a signal handler, then the call will be automatically restarted
|
||||
after the signal handler returns if the
|
||||
|
@ -674,7 +674,7 @@ and then resumed via
|
|||
.BR SIGCONT .
|
||||
This behavior is not sanctioned by POSIX.1, and doesn't occur
|
||||
on other systems.
|
||||
|
||||
.PP
|
||||
The Linux interfaces that display this behavior are:
|
||||
.IP * 2
|
||||
"Input" socket interfaces, when a timeout
|
||||
|
|
56
man7/spufs.7
|
@ -31,7 +31,7 @@ spufs \- SPU filesystem
|
|||
The SPU filesystem is used on PowerPC machines that implement the
|
||||
Cell Broadband Engine Architecture in order to access Synergistic
|
||||
Processor Units (SPUs).
|
||||
|
||||
.PP
|
||||
The filesystem provides a name space similar to POSIX shared
|
||||
memory or message queues.
|
||||
Users that have write permissions
|
||||
|
@ -40,7 +40,7 @@ on the filesystem can use
|
|||
to establish SPU contexts under the
|
||||
.B spufs
|
||||
root directory.
|
||||
|
||||
.PP
|
||||
Every SPU context is represented by a directory containing
|
||||
a predefined set of files.
|
||||
These files can be
|
||||
|
@ -72,7 +72,7 @@ supported on regular filesystems.
|
|||
This list details the supported
|
||||
operations and the deviations from the standard behavior described
|
||||
in the respective man pages.
|
||||
|
||||
.PP
|
||||
All files that support the
|
||||
.BR read (2)
|
||||
operation also support
|
||||
|
@ -94,7 +94,7 @@ structure that contain reliable information are
|
|||
.IR st_uid ,
|
||||
and
|
||||
.IR st_gid .
|
||||
|
||||
.PP
|
||||
All files support the
|
||||
.BR chmod (2)/ fchmod (2)
|
||||
and
|
||||
|
@ -103,7 +103,7 @@ operations, but will not be able to grant permissions that contradict
|
|||
the possible operations (e.g., read access on the
|
||||
.I wbox
|
||||
file).
|
||||
|
||||
.PP
|
||||
The current set of files is:
|
||||
.TP
|
||||
.I /capabilities
|
||||
|
@ -158,11 +158,11 @@ This file contains the 128-bit values of each register,
|
|||
from register 0 to register 127, in order.
|
||||
This allows the general-purpose registers to be
|
||||
inspected for debugging.
|
||||
|
||||
.IP
|
||||
Reading to or writing from this file requires that the context is
|
||||
scheduled out, so use of this file is not recommended in normal
|
||||
program operation.
|
||||
|
||||
.IP
|
||||
The
|
||||
.I regs
|
||||
file is not present on contexts that have been created with the
|
||||
|
@ -214,7 +214,7 @@ Also,
|
|||
.BR poll (2)
|
||||
and similar system calls can be used to monitor for the presence
|
||||
of mailbox data.
|
||||
|
||||
.IP
|
||||
The possible operations on an open
|
||||
.I ibox
|
||||
file are:
|
||||
|
@ -236,7 +236,7 @@ the return value is set to \-1 and
|
|||
.I errno
|
||||
is set to
|
||||
.BR EAGAIN .
|
||||
|
||||
.IP
|
||||
If there is no data available in the mailbox and the file
|
||||
descriptor has been opened without
|
||||
.BR O_NONBLOCK ,
|
||||
|
@ -283,7 +283,7 @@ value is set to \-1 and
|
|||
.I errno
|
||||
is set to
|
||||
.BR EAGAIN .
|
||||
|
||||
.IP
|
||||
If there is no space available in the mailbox and the file
|
||||
descriptor has been opened without
|
||||
.BR O_NONBLOCK ,
|
||||
|
@ -385,7 +385,7 @@ If the register value is larger than the buffer passed to the
|
|||
.BR read (2)
|
||||
system call, subsequent reads will continue reading from the same
|
||||
buffer, until the end of the buffer is reached.
|
||||
|
||||
.IP
|
||||
When a complete string has been read, all subsequent read operations
|
||||
will return zero bytes and a new file descriptor needs to be opened
|
||||
to read a new value.
|
||||
|
@ -399,7 +399,7 @@ The string is parsed from the beginning
|
|||
until the first nonnumeric character or the end of the buffer.
|
||||
Subsequent writes to the same file descriptor overwrite the
|
||||
previous setting.
|
||||
|
||||
.IP
|
||||
Except for the
|
||||
.I npc
|
||||
file, these files are not present on contexts that have been created with
|
||||
|
@ -554,7 +554,7 @@ The
|
|||
and
|
||||
.I wbox_stat
|
||||
files contain the available message count.
|
||||
|
||||
.IP
|
||||
The
|
||||
.I wbox_info
|
||||
file contains an array of four-byte mailbox messages, which have been
|
||||
|
@ -563,12 +563,12 @@ With current CBEA machines, the array is four items in
|
|||
length, so up to 4 * 4 = 16 bytes can be read from this file.
|
||||
If any mailbox queue entry is empty,
|
||||
then the bytes read at the corresponding location are undefined.
|
||||
|
||||
.IP
|
||||
The
|
||||
.I dma_info
|
||||
file contains the contents of the SPU MFC DMA queue, represented as the
|
||||
following structure:
|
||||
|
||||
.IP
|
||||
.in +4n
|
||||
.nf
|
||||
struct spu_dma_info {
|
||||
|
@ -581,13 +581,13 @@ struct spu_dma_info {
|
|||
};
|
||||
.fi
|
||||
.in
|
||||
|
||||
.IP
|
||||
The last member of this data structure is the actual DMA queue,
|
||||
containing 16 entries.
|
||||
The
|
||||
.I mfc_cq_sr
|
||||
structure is defined as:
|
||||
|
||||
.IP
|
||||
.in +4n
|
||||
.nf
|
||||
struct mfc_cq_sr {
|
||||
|
@ -598,13 +598,13 @@ struct mfc_cq_sr {
|
|||
};
|
||||
.fi
|
||||
.in
|
||||
|
||||
.IP
|
||||
The
|
||||
.I proxydma_info
|
||||
file contains similar information, but describes the proxy DMA queue
|
||||
(i.e., DMAs initiated by entities outside the SPU) instead.
|
||||
The file is in the following format:
|
||||
|
||||
.IP
|
||||
.in +4n
|
||||
.nf
|
||||
struct spu_proxydma_info {
|
||||
|
@ -615,11 +615,11 @@ struct spu_proxydma_info {
|
|||
};
|
||||
.fi
|
||||
.in
|
||||
|
||||
.IP
|
||||
Accessing these files requires that the SPU context is scheduled out -
|
||||
frequent use can be inefficient.
|
||||
These files should not be used for normal program operation.
|
||||
|
||||
.IP
|
||||
These files are not present on contexts that have been created with the
|
||||
.B SPU_CREATE_NOSCHED
|
||||
flag.
|
||||
|
@ -653,7 +653,7 @@ The following operations are supported:
|
|||
.BR write (2)
|
||||
Writes to this file need to be in the format of a MFC DMA command,
|
||||
defined as follows:
|
||||
|
||||
.IP
|
||||
.in +4n
|
||||
.nf
|
||||
struct mfc_dma_command {
|
||||
|
@ -667,7 +667,7 @@ struct mfc_dma_command {
|
|||
};
|
||||
.fi
|
||||
.in
|
||||
|
||||
.IP
|
||||
Writes are required to be exactly
|
||||
.I sizeof(struct mfc_dma_command)
|
||||
bytes in size.
|
||||
|
@ -695,13 +695,13 @@ or until a previously started DMA
|
|||
(by checking for
|
||||
.BR POLLIN )
|
||||
has been completed.
|
||||
|
||||
.IP
|
||||
.I /mss
|
||||
Provides access to the MFC MultiSource Synchronization (MSS) facility.
|
||||
By
|
||||
.BR mmap (2)-ing
|
||||
this file, processes can access the MSS area of the SPU.
|
||||
|
||||
.IP
|
||||
The following operations are supported:
|
||||
.TP
|
||||
.BR mmap (2)
|
||||
|
@ -719,7 +719,7 @@ Provides access to the whole problem-state mapping of the SPU.
|
|||
Applications can use this area to interface to the SPU, rather than
|
||||
writing to individual register files in
|
||||
.BR spufs .
|
||||
|
||||
.IP
|
||||
The following operations are supported:
|
||||
.RS
|
||||
.TP
|
||||
|
@ -737,7 +737,7 @@ Read-only file containing the physical SPU number that the SPU context
|
|||
is running on.
|
||||
When the context is not running, this file contains the
|
||||
string "\-1".
|
||||
|
||||
.IP
|
||||
The physical SPU number is given by an ASCII hex string.
|
||||
.TP
|
||||
.I /object-id
|
||||
|
@ -768,5 +768,5 @@ none /spu spufs gid=spu 0 0
|
|||
.BR spu_create (2),
|
||||
.BR spu_run (2),
|
||||
.BR capabilities (7)
|
||||
|
||||
.PP
|
||||
.I The Cell Broadband Engine Architecture (CBEA) specification
|
||||
|
|
|
@ -43,7 +43,7 @@ released by the University of California at Berkeley.
|
|||
This was the first Berkeley release that contained a TCP/IP
|
||||
stack and the sockets API.
|
||||
4.2BSD was released in 1983.
|
||||
|
||||
.IP
|
||||
Earlier major BSD releases included
|
||||
.IR 3BSD
|
||||
(1980),
|
||||
|
@ -200,7 +200,7 @@ The standard is available online at
|
|||
.UE ,
|
||||
and the interfaces that it describes are also available in the Linux
|
||||
manual pages package under sections 1p and 3p (e.g., "man 3p open").
|
||||
|
||||
.IP
|
||||
The standard defines two levels of conformance:
|
||||
.IR "POSIX conformance" ,
|
||||
which is a baseline set of interfaces required of a conforming system;
|
||||
|
@ -213,27 +213,27 @@ XSI-conformant systems can be branded
|
|||
(XSI conformance constitutes the
|
||||
.I "Single UNIX Specification version 3"
|
||||
.RI ( SUSv3 ).)
|
||||
|
||||
.IP
|
||||
The POSIX.1-2001 document is broken into four parts:
|
||||
|
||||
.IP
|
||||
.BR XBD :
|
||||
Definitions, terms and concepts, header file specifications.
|
||||
|
||||
.IP
|
||||
.BR XSH :
|
||||
Specifications of functions (i.e., system calls and library
|
||||
functions in actual implementations).
|
||||
|
||||
.IP
|
||||
.BR XCU :
|
||||
Specifications of commands and utilities
|
||||
(i.e., the area formerly described by POSIX.2).
|
||||
|
||||
.IP
|
||||
.BR XRAT :
|
||||
Informative text on the other parts of the standard.
|
||||
|
||||
.IP
|
||||
POSIX.1-2001 is aligned with C99, so that all of the
|
||||
library functions standardized in C99 are also
|
||||
standardized in POSIX.1-2001.
|
||||
|
||||
.IP
|
||||
Two Technical Corrigenda (minor fixes and improvements)
|
||||
of the original 2001 standard have occurred:
|
||||
TC1 in 2003 (also known as
|
||||
|
@ -244,7 +244,7 @@ and TC2 in 2004 (also known as
|
|||
.B POSIX.1-2008, SUSv4
|
||||
Work on the next revision of POSIX.1/SUS was completed and
|
||||
ratified in 2008.
|
||||
|
||||
.IP
|
||||
The changes in this revision are not as large as those
|
||||
that occurred for POSIX.1-2001/SUSv3,
|
||||
but a number of new interfaces are added
|
||||
|
@ -253,7 +253,7 @@ Many of the interfaces that were optional in
|
|||
POSIX.1-2001 become mandatory in the 2008 revision of the standard.
|
||||
A few interfaces that are present in POSIX.1-2001 are marked
|
||||
as obsolete in POSIX.1-2008, or removed from the standard altogether.
|
||||
|
||||
.IP
|
||||
The revised standard is broken into the same four parts as POSIX.1-2001,
|
||||
and again there are two levels of conformance: the baseline
|
||||
.IR "POSIX Conformance" ,
|
||||
|
@ -261,20 +261,20 @@ and
|
|||
.IR "XSI Conformance" ,
|
||||
which mandates an additional set of interfaces
|
||||
beyond those in the base specification.
|
||||
|
||||
.IP
|
||||
In general, where the CONFORMING TO section of a manual page
|
||||
lists POSIX.1-2001, it can be assumed that the interface also
|
||||
conforms to POSIX.1-2008, unless otherwise noted.
|
||||
|
||||
.IP
|
||||
Technical Corrigendum 1 (minor fixes and improvements)
|
||||
of this standard was released in 2013
|
||||
(also known as
|
||||
.IR POSIX.1-2013 ).
|
||||
|
||||
.IP
|
||||
Technical Corrigendum 2 of this standard was released in 2016
|
||||
(also known as
|
||||
.IR POSIX.1-2016 ).
|
||||
|
||||
.IP
|
||||
Further information can be found on the Austin Group web site,
|
||||
.UR http://www.opengroup.org\:/austin/
|
||||
.UE .
|
||||
|
|
|
@ -41,7 +41,7 @@ symlink \- symbolic link handling
|
|||
Symbolic links are files that act as pointers to other files.
|
||||
To understand their behavior, you must first understand how hard links
|
||||
work.
|
||||
|
||||
.PP
|
||||
A hard link to a file is indistinguishable from the original file because
|
||||
it is a reference to the object underlying the original filename.
|
||||
(To be precise: each of the hard links to a file is a reference to
|
||||
|
@ -57,7 +57,7 @@ Hard links may not refer to directories
|
|||
which would confuse many programs)
|
||||
and may not refer to files on different filesystems
|
||||
(because inode numbers are not unique across filesystems).
|
||||
|
||||
.PP
|
||||
A symbolic link is a special type of file whose contents are a string
|
||||
that is the pathname of another file, the file to which the link refers.
|
||||
(The contents of a symbolic link can be read using
|
||||
|
@ -66,13 +66,13 @@ In other words, a symbolic link is a pointer to another name,
|
|||
and not to an underlying object.
|
||||
For this reason, symbolic links may refer to directories and may cross
|
||||
filesystem boundaries.
|
||||
|
||||
.PP
|
||||
There is no requirement that the pathname referred to by a symbolic link
|
||||
should exist.
|
||||
A symbolic link that refers to a pathname that does not exist is said
|
||||
to be a
|
||||
.IR "dangling link" .
|
||||
|
||||
.PP
|
||||
Because a symbolic link and its referenced object coexist in the filesystem
|
||||
name space, confusion can arise in distinguishing between the link itself
|
||||
and the referenced object.
|
||||
|
@ -92,13 +92,13 @@ The only time that the ownership of a symbolic link matters is
|
|||
when the link is being removed or renamed in a directory that
|
||||
has the sticky bit set (see
|
||||
.BR stat (2)).
|
||||
|
||||
.PP
|
||||
The last access and last modification timestamps
|
||||
of a symbolic link can be changed using
|
||||
.BR utimensat (2)
|
||||
or
|
||||
.BR lutimes (3).
|
||||
|
||||
.PP
|
||||
On Linux, the permissions of a symbolic link are not used
|
||||
in any operations; the permissions are always
|
||||
0777 (read, write, and execute for all user categories),
|
||||
|
@ -140,7 +140,7 @@ and
|
|||
.BR readlinkat (2),
|
||||
in order to operate on the symbolic link itself
|
||||
(rather than the file to which it refers).
|
||||
|
||||
.PP
|
||||
By default
|
||||
(i.e., if the
|
||||
.BR AT_SYMLINK_FOLLOW
|
||||
|
@ -171,7 +171,7 @@ or a loop is detected.
|
|||
(Loop detection is done by placing an upper limit on the number of
|
||||
links that may be followed, and an error results if this limit is
|
||||
exceeded.)
|
||||
|
||||
.PP
|
||||
There are three separate areas that need to be discussed.
|
||||
They are as follows:
|
||||
.IP 1. 3
|
||||
|
@ -186,7 +186,7 @@ file hierarchy walk).
|
|||
.SS System calls
|
||||
The first area is symbolic links used as filename arguments for
|
||||
system calls.
|
||||
|
||||
.PP
|
||||
Except as noted below, all system calls follow symbolic links.
|
||||
For example, if there were a symbolic link
|
||||
.I slink
|
||||
|
@ -196,7 +196,7 @@ the system call
|
|||
.I "open(""slink"" ...\&)"
|
||||
would return a file descriptor referring to the file
|
||||
.IR afile .
|
||||
|
||||
.PP
|
||||
Various system calls do not follow links, and operate
|
||||
on the symbolic link itself.
|
||||
They are:
|
||||
|
@ -211,7 +211,7 @@ They are:
|
|||
.BR rmdir (2),
|
||||
and
|
||||
.BR unlink (2).
|
||||
|
||||
.PP
|
||||
Certain other system calls optionally follow symbolic links.
|
||||
They are:
|
||||
.BR faccessat (2),
|
||||
|
@ -235,7 +235,7 @@ When
|
|||
.BR rmdir (2)
|
||||
is applied to a symbolic link, it fails with the error
|
||||
.BR ENOTDIR .
|
||||
|
||||
.PP
|
||||
.BR link (2)
|
||||
warrants special discussion.
|
||||
POSIX.1-2001 specifies that
|
||||
|
@ -252,7 +252,7 @@ either behavior in an implementation.
|
|||
.SS Commands not traversing a file tree
|
||||
The second area is symbolic links, specified as command-line
|
||||
filename arguments, to commands which are not traversing a file tree.
|
||||
|
||||
.PP
|
||||
Except as noted below, commands follow symbolic links named as
|
||||
command-line arguments.
|
||||
For example, if there were a symbolic link
|
||||
|
@ -263,7 +263,7 @@ the command
|
|||
.I "cat slink"
|
||||
would display the contents of the file
|
||||
.IR afile .
|
||||
|
||||
.PP
|
||||
It is important to realize that this rule includes commands which may
|
||||
optionally traverse file trees; for example, the command
|
||||
.I "chown file"
|
||||
|
@ -271,7 +271,7 @@ is included in this rule, while the command
|
|||
.IR "chown\ \-R file" ,
|
||||
which performs a tree traversal, is not.
|
||||
(The latter is described in the third area, below.)
|
||||
|
||||
.PP
|
||||
If it is explicitly intended that the command operate on the symbolic
|
||||
link instead of following the symbolic link\(emfor example, it is desired that
|
||||
.I "chown slink"
|
||||
|
@ -289,7 +289,7 @@ while
|
|||
would change the ownership of
|
||||
.I slink
|
||||
itself.
|
||||
|
||||
.PP
|
||||
There are some exceptions to this rule:
|
||||
.IP * 2
|
||||
The
|
||||
|
@ -362,16 +362,16 @@ The following commands either optionally or always traverse file trees:
|
|||
.BR rm (1),
|
||||
and
|
||||
.BR tar (1).
|
||||
|
||||
.PP
|
||||
It is important to realize that the following rules apply equally to
|
||||
symbolic links encountered during the file tree traversal and symbolic
|
||||
links listed as command-line arguments.
|
||||
|
||||
.PP
|
||||
The \fIfirst rule\fP applies to symbolic links that reference files other
|
||||
than directories.
|
||||
Operations that apply to symbolic links are performed on the links
|
||||
themselves, but otherwise the links are ignored.
|
||||
|
||||
.PP
|
||||
The command
|
||||
.I "rm\ \-r slink directory"
|
||||
will remove
|
||||
|
@ -383,12 +383,12 @@ In no case will
|
|||
.BR rm (1)
|
||||
affect the file referred to by
|
||||
.IR slink .
|
||||
|
||||
.PP
|
||||
The \fIsecond rule\fP applies to symbolic links that refer to directories.
|
||||
Symbolic links that refer to directories are never followed by default.
|
||||
This is often referred to as a "physical" walk, as opposed to a "logical"
|
||||
walk (where symbolic links that refer to directories are followed).
|
||||
|
||||
.PP
|
||||
Certain conventions are (should be) followed as consistently as
|
||||
possible by commands that perform file tree walks:
|
||||
.IP * 2
|
||||
|
@ -404,7 +404,7 @@ like the logical name space.
|
|||
flag will be ignored if the
|
||||
.I \-R
|
||||
flag is not also specified.)
|
||||
|
||||
.IP
|
||||
For example, the command
|
||||
.I "chown\ \-HR user slink"
|
||||
will traverse the file hierarchy rooted in the file pointed to by
|
||||
|
@ -434,7 +434,7 @@ the logical name space.
|
|||
flag will be ignored if the
|
||||
.I \-R
|
||||
flag is not also specified.)
|
||||
|
||||
.IP
|
||||
For example, the command
|
||||
.I "chown\ \-LR user slink"
|
||||
will change the owner of the file referred to by
|
||||
|
@ -474,7 +474,7 @@ options more than once;
|
|||
the last one specified determines the command's behavior.
|
||||
This is intended to permit you to alias commands to behave one way
|
||||
or the other, and then override that behavior on the command line.
|
||||
|
||||
.PP
|
||||
The
|
||||
.BR ls (1)
|
||||
and
|
||||
|
|
|
@ -35,7 +35,7 @@ This interface defined a
|
|||
structure used to store terminal settings, and a range of
|
||||
.BR ioctl (2)
|
||||
operations to get and set terminal attributes.
|
||||
|
||||
.PP
|
||||
The
|
||||
.B termio
|
||||
interface is now obsolete: POSIX.1-1990 standardized a modified
|
||||
|
@ -50,7 +50,7 @@ operations that existed in System V.
|
|||
.BR ioctl (2)
|
||||
was unstandardized, and its variadic third argument
|
||||
does not allow argument type checking.)
|
||||
|
||||
.PP
|
||||
If you're looking for a page called "termio", then you can probably
|
||||
find most of the information that you seek in either
|
||||
.BR termios (3)
|
||||
|
|
|
@ -17,19 +17,19 @@ The thread keyring is a keyring used to anchor keys on behalf of a process.
|
|||
It is created only when a thread requests it.
|
||||
The thread keyring has the name (description)
|
||||
.IR _tid .
|
||||
|
||||
.PP
|
||||
A special serial number value,
|
||||
.BR KEY_SPEC_THREAD_KEYRING ,
|
||||
is defined that can be used in lieu of the actual serial number of
|
||||
the calling thread's thread keyring.
|
||||
|
||||
.PP
|
||||
From the
|
||||
.BR keyctl (1)
|
||||
utility, '\fB@t\fP' can be used instead of a numeric key ID in
|
||||
much the same way, but as
|
||||
.BR keyctl (1)
|
||||
is a program run after forking, this is of no utility.
|
||||
|
||||
.PP
|
||||
Thread keyrings are not inherited across
|
||||
.BR clone (2)
|
||||
and
|
||||
|
@ -37,7 +37,7 @@ and
|
|||
and are cleared by
|
||||
.BR execve (2).
|
||||
A thread keyring is destroyed when the thread that refers to it terminates.
|
||||
|
||||
.PP
|
||||
Initially, a thread does not have a thread keyring.
|
||||
If a thread doesn't have a thread keyring when it is accessed,
|
||||
then it will be created if it is to be modified;
|
||||
|
|
14
man7/time.7
|
@ -36,7 +36,7 @@ either from a standard point in the past
|
|||
(see the description of the Epoch and calendar time below),
|
||||
or from some point (e.g., the start) in the life of a process
|
||||
.RI ( "elapsed time" ).
|
||||
|
||||
.PP
|
||||
.I "Process time"
|
||||
is defined as the amount of CPU time used by a process.
|
||||
This is sometimes divided into
|
||||
|
@ -78,7 +78,7 @@ a clock maintained by the kernel which measures time in
|
|||
.IR jiffies .
|
||||
The size of a jiffy is determined by the value of the kernel constant
|
||||
.IR HZ .
|
||||
|
||||
.PP
|
||||
The value of
|
||||
.I HZ
|
||||
varies across kernel versions and hardware platforms.
|
||||
|
@ -93,7 +93,7 @@ yielding a jiffies value of, respectively, 0.01, 0.004, or 0.001 seconds.
|
|||
Since kernel 2.6.20, a further frequency is available:
|
||||
300, a number that divides evenly for the common video
|
||||
frame rates (PAL, 25 HZ; NTSC, 30 HZ).
|
||||
|
||||
.PP
|
||||
The
|
||||
.BR times (2)
|
||||
system call is a special case.
|
||||
|
@ -107,7 +107,7 @@ User-space applications can determine the value of this constant using
|
|||
.SS High-resolution timers
|
||||
Before Linux 2.6.21, the accuracy of timer and sleep system calls
|
||||
(see below) was also limited by the size of the jiffy.
|
||||
|
||||
.PP
|
||||
Since Linux 2.6.21, Linux supports high-resolution timers (HRTs),
|
||||
optionally configurable via
|
||||
.BR CONFIG_HIGH_RES_TIMERS .
|
||||
|
@ -120,14 +120,14 @@ checking the resolution returned by a call to
|
|||
.BR clock_getres (2)
|
||||
or looking at the "resolution" entries in
|
||||
.IR /proc/timer_list .
|
||||
|
||||
.PP
|
||||
HRTs are not supported on all hardware architectures.
|
||||
(Support is provided on x86, arm, and powerpc, among others.)
|
||||
.SS The Epoch
|
||||
UNIX systems represent time in seconds since the
|
||||
.IR Epoch ,
|
||||
1970-01-01 00:00:00 +0000 (UTC).
|
||||
|
||||
.PP
|
||||
A program can determine the
|
||||
.I "calendar time"
|
||||
using
|
||||
|
@ -164,7 +164,7 @@ Various system calls and functions allow a program to sleep
|
|||
.BR clock_nanosleep (2),
|
||||
and
|
||||
.BR sleep (3).
|
||||
|
||||
.PP
|
||||
Various system calls allow a process to set a timer that expires
|
||||
at some point in the future, and optionally at repeated intervals;
|
||||
see
|
||||
|
|
|
@ -37,7 +37,7 @@ It also guarantees "round-trip compatibility";
|
|||
in other words,
|
||||
conversion tables can be built such that no information is lost
|
||||
when a string is converted from any other encoding to UCS and back.
|
||||
|
||||
.PP
|
||||
UCS contains the characters required to represent practically all
|
||||
known languages.
|
||||
This includes not only the Latin, Greek, Cyrillic,
|
||||
|
@ -59,7 +59,7 @@ graphical, typographical, mathematical, and scientific symbols,
|
|||
including those provided by TeX, Postscript, APL, MS-DOS, MS-Windows,
|
||||
Macintosh, OCR fonts, as well as many word processing and publishing
|
||||
systems, and more are being added.
|
||||
|
||||
.PP
|
||||
The UCS standard (ISO 10646) describes a
|
||||
31-bit character set architecture
|
||||
consisting of 128 24-bit
|
||||
|
@ -166,7 +166,7 @@ code values (in all locales), a convention that is signaled by the GNU
|
|||
C library to applications by defining the constant
|
||||
.B __STDC_ISO_10646__
|
||||
as specified in the ISO C99 standard.
|
||||
|
||||
.PP
|
||||
UCS/Unicode can be used just like ASCII in input/output streams,
|
||||
terminal communication, plaintext files, filenames, and environment
|
||||
variables in the ASCII compatible UTF-8 multibyte encoding.
|
||||
|
@ -216,7 +216,7 @@ Information technology \(em Universal Multiple-Octet Coded Character
|
|||
Set (UCS) \(em Part 1: Architecture and Basic Multilingual Plane.
|
||||
International Standard ISO/IEC 10646-1, International Organization
|
||||
for Standardization, Geneva, 2000.
|
||||
|
||||
.IP
|
||||
This is the official specification of UCS .
|
||||
Available from
|
||||
.UR http://www.iso.ch/
|
||||
|
@ -228,7 +228,7 @@ Reading, MA, 2000, ISBN 0-201-61633-5.
|
|||
.IP *
|
||||
S. Harbison, G. Steele. C: A Reference Manual. Fourth edition,
|
||||
Prentice Hall, Englewood Cliffs, 1995, ISBN 0-13-326224-3.
|
||||
|
||||
.IP
|
||||
A good reference book about the C programming language.
|
||||
The fourth
|
||||
edition covers the 1994 Amendment 1 to the ISO C90 standard, which
|
||||
|
|
|
@ -21,7 +21,7 @@ The user keyring has a name (description) of the form
|
|||
where
|
||||
.I <UID>
|
||||
is the user ID of the corresponding user.
|
||||
|
||||
.PP
|
||||
The user keyring is associated with the record that the kernel maintains
|
||||
for the UID.
|
||||
It comes into existence upon the first attempt to access either the
|
||||
|
@ -33,28 +33,28 @@ The keyring remains pinned in existence so long as there are processes
|
|||
running with that real UID or files opened by those processes remain open.
|
||||
(The keyring can also be pinned indefinitely by linking it
|
||||
into another keyring.)
|
||||
|
||||
.PP
|
||||
Typically, the user keyring is created by
|
||||
.BR pam_keyinit (8)
|
||||
when a user logs in.
|
||||
|
||||
.PP
|
||||
The user keyring is not searched by default by
|
||||
.BR request_key (2).
|
||||
When
|
||||
.BR pam_keyinit (8)
|
||||
creates a session keyring, it adds to it a link to the user
|
||||
keyring so that the user keyring will be searched when the session keyring is.
|
||||
|
||||
.PP
|
||||
A special serial number value,
|
||||
.BR KEY_SPEC_USER_KEYRING ,
|
||||
is defined that can be used in lieu of the actual serial number of
|
||||
the calling process's user keyring.
|
||||
|
||||
.PP
|
||||
From the
|
||||
.BR keyctl (1)
|
||||
utility, '\fB@u\fP' can be used instead of a numeric key ID in
|
||||
much the same way.
|
||||
|
||||
.PP
|
||||
User keyrings are independent of
|
||||
.BR clone (2),
|
||||
.BR fork (2),
|
||||
|
|
|
@ -21,7 +21,7 @@ The user session keyring has a name (description) of the form
|
|||
where
|
||||
.I <UID>
|
||||
is the user ID of the corresponding user.
|
||||
|
||||
.PP
|
||||
The user session keyring is associated with the record that
|
||||
the kernel maintains for the UID.
|
||||
It comes into existence upon the first attempt to access either the
|
||||
|
@ -34,7 +34,7 @@ The keyring remains pinned in existence so long as there are processes
|
|||
running with that real UID or files opened by those processes remain open.
|
||||
(The keyring can also be pinned indefinitely by linking it
|
||||
into another keyring.)
|
||||
|
||||
.PP
|
||||
The user session keyring is created on demand when a thread requests it
|
||||
or when a thread asks for its
|
||||
.BR session-keyring (7)
|
||||
|
@ -42,22 +42,22 @@ and that keyring doesn't exist.
|
|||
In the latter case, a user session keyring will be created and,
|
||||
if the session keyring wasn't to be created,
|
||||
the user session keyring will be set as the process's actual session keyring.
|
||||
|
||||
.PP
|
||||
The user session keyring is searched by
|
||||
.BR request_key (2)
|
||||
if the actual session keyring does not exist and is ignored otherwise.
|
||||
|
||||
.PP
|
||||
A special serial number value,
|
||||
.BR KEY_SPEC_USER_SESSION_KEYRING ,
|
||||
is defined
|
||||
that can be used in lieu of the actual serial number of
|
||||
the calling process's user session keyring.
|
||||
|
||||
.PP
|
||||
From the
|
||||
.BR keyctl (1)
|
||||
utility, '\fB@us\fP' can be used instead of a numeric key ID in
|
||||
much the same way.
|
||||
|
||||
.PP
|
||||
User session keyrings are independent of
|
||||
.BR clone (2),
|
||||
.BR fork (2),
|
||||
|
@ -67,10 +67,10 @@ and
|
|||
.BR _exit (2)
|
||||
excepting that the keyring is destroyed when the UID record is destroyed
|
||||
when the last process pinning it exits.
|
||||
|
||||
.PP
|
||||
If a user session keyring does not exist when it is accessed,
|
||||
it will be created.
|
||||
|
||||
.PP
|
||||
Rather than relying on the user session keyring,
|
||||
it is strongly recommended\(emespecially if the process
|
||||
is running as root\(emthat a
|
||||
|
|
|
@ -30,7 +30,7 @@ user_namespaces \- overview of Linux user namespaces
|
|||
.SH DESCRIPTION
|
||||
For an overview of namespaces, see
|
||||
.BR namespaces (7).
|
||||
|
||||
.PP
|
||||
User namespaces isolate security-related identifiers and attributes,
|
||||
in particular,
|
||||
user IDs and group IDs (see
|
||||
|
@ -66,7 +66,7 @@ or
|
|||
with the
|
||||
.BR CLONE_NEWUSER
|
||||
flag.
|
||||
|
||||
.PP
|
||||
The kernel imposes (since version 3.11) a limit of 32 nested levels of
|
||||
.\" commit 8742f229b635bf1c1c84a3dfe5e47c814c20b5c8
|
||||
user namespaces.
|
||||
|
@ -77,7 +77,7 @@ or
|
|||
.BR clone (2)
|
||||
that would cause this limit to be exceeded fail with the error
|
||||
.BR EUSERS .
|
||||
|
||||
.PP
|
||||
Each process is a member of exactly one user namespace.
|
||||
A process created via
|
||||
.BR fork (2)
|
||||
|
@ -92,7 +92,7 @@ if it has the
|
|||
.BR CAP_SYS_ADMIN
|
||||
in that namespace;
|
||||
upon doing so, it gains a full set of capabilities in that namespace.
|
||||
|
||||
.PP
|
||||
A call to
|
||||
.BR clone (2)
|
||||
or
|
||||
|
@ -104,7 +104,7 @@ flag makes the new child process (for
|
|||
or the caller (for
|
||||
.BR unshare (2))
|
||||
a member of the new user namespace created by the call.
|
||||
|
||||
.PP
|
||||
The
|
||||
.BR NS_GET_PARENT
|
||||
.BR ioctl (2)
|
||||
|
@ -136,7 +136,7 @@ and
|
|||
user namespace,
|
||||
even if the new namespace is created or joined by the root user
|
||||
(i.e., a process with user ID 0 in the root namespace).
|
||||
|
||||
.PP
|
||||
Note that a call to
|
||||
.BR execve (2)
|
||||
will cause a process's capabilities to be recalculated in the usual way (see
|
||||
|
@ -146,7 +146,7 @@ unless the process has a user ID of 0 within the namespace,
|
|||
or the executable file has a nonempty inheritable capabilities mask,
|
||||
the process will lose all capabilities.
|
||||
See the discussion of user and group ID mappings, below.
|
||||
|
||||
.PP
|
||||
A call to
|
||||
.BR clone (2),
|
||||
.BR unshare (2),
|
||||
|
@ -171,7 +171,7 @@ retaining its user namespace membership by using a pair of
|
|||
.BR setns (2)
|
||||
calls to move to another user namespace and then return to
|
||||
its original user namespace.
|
||||
|
||||
.PP
|
||||
The rules for determining whether or not a process has a capability
|
||||
in a particular user namespace are as follows:
|
||||
.IP 1. 3
|
||||
|
@ -222,7 +222,7 @@ only on resources governed by that namespace.
|
|||
In other words, having a capability in a user namespace permits a process
|
||||
to perform privileged operations on resources that are governed by (nonuser)
|
||||
namespaces associated with the user namespace (see the next subsection).
|
||||
|
||||
.PP
|
||||
On the other hand, there are many privileged operations that affect
|
||||
resources that are not associated with any namespace type,
|
||||
for example, changing the system time (governed by
|
||||
|
@ -234,14 +234,14 @@ and creating a device (governed by
|
|||
Only a process with privileges in the
|
||||
.I initial
|
||||
user namespace can perform such operations.
|
||||
|
||||
.PP
|
||||
Holding
|
||||
.B CAP_SYS_ADMIN
|
||||
within the user namespace associated with a process's mount namespace
|
||||
allows that process to create bind mounts
|
||||
and mount the following types of filesystems:
|
||||
.\" fs_flags = FS_USERNS_MOUNT in kernel sources
|
||||
|
||||
.PP
|
||||
.RS 4
|
||||
.PD 0
|
||||
.IP * 2
|
||||
|
@ -278,7 +278,7 @@ cgroup version 1 named hierarchies
|
|||
(i.e., cgroup filesystems mounted with the
|
||||
.BR """none,name="""
|
||||
option).
|
||||
|
||||
.PP
|
||||
Holding
|
||||
.B CAP_SYS_ADMIN
|
||||
within the user namespace associated with a process's PID namespace
|
||||
|
@ -286,7 +286,7 @@ allows (since Linux 3.8)
|
|||
that process to mount
|
||||
.I /proc
|
||||
filesystems.
|
||||
|
||||
.PP
|
||||
Note however, that mounting block-based filesystems can be done
|
||||
only by a process that holds
|
||||
.BR CAP_SYS_ADMIN
|
||||
|
@ -299,13 +299,13 @@ Starting in Linux 3.8, unprivileged processes can create user namespaces,
|
|||
and the other types of namespaces can be created with just the
|
||||
.B CAP_SYS_ADMIN
|
||||
capability in the caller's user namespace.
|
||||
|
||||
.PP
|
||||
When a non-user-namespace is created,
|
||||
it is owned by the user namespace in which the creating process
|
||||
was a member at the time of the creation of the namespace.
|
||||
Actions on the non-user-namespace
|
||||
require capabilities in the corresponding user namespace.
|
||||
|
||||
.PP
|
||||
If
|
||||
.BR CLONE_NEWUSER
|
||||
is specified along with other
|
||||
|
@ -322,7 +322,7 @@ or caller
|
|||
privileges over the remaining namespaces created by the call.
|
||||
Thus, it is possible for an unprivileged caller to specify this combination
|
||||
of flags.
|
||||
|
||||
.PP
|
||||
When a new namespace (other than a user namespace) is created via
|
||||
.BR clone (2)
|
||||
or
|
||||
|
@ -344,7 +344,7 @@ the process's UTS namespace, and check whether the process has the
|
|||
required capability
|
||||
.RB ( CAP_SYS_ADMIN )
|
||||
in that user namespace.
|
||||
|
||||
.PP
|
||||
The
|
||||
.BR NS_GET_USERNS
|
||||
.BR ioctl (2)
|
||||
|
@ -369,13 +369,13 @@ inside the user namespace for the process
|
|||
.IR pid .
|
||||
These files can be read to view the mappings in a user namespace and
|
||||
written to (once) to define the mappings.
|
||||
|
||||
.PP
|
||||
The description in the following paragraphs explains the details for
|
||||
.IR uid_map ;
|
||||
.IR gid_map
|
||||
is exactly the same,
|
||||
but each instance of "user ID" is replaced by "group ID".
|
||||
|
||||
.PP
|
||||
The
|
||||
.I uid_map
|
||||
file exposes the mapping of user IDs from the user namespace
|
||||
|
@ -389,7 +389,7 @@ will potentially see different values when reading from a particular
|
|||
.I uid_map
|
||||
file, depending on the user ID mappings for the user namespaces
|
||||
of the reading processes.
|
||||
|
||||
.PP
|
||||
Each line in the
|
||||
.I uid_map
|
||||
file specifies a 1-to-1 mapping of a range of contiguous
|
||||
|
@ -441,7 +441,7 @@ System calls that return user IDs (group IDs)\(emfor example,
|
|||
and the credential fields in the structure returned by
|
||||
.BR stat (2)\(emreturn
|
||||
the user ID (group ID) mapped into the caller's user namespace.
|
||||
|
||||
.PP
|
||||
When a process accesses a file, its user and group IDs
|
||||
are mapped into the initial user namespace for the purpose of permission
|
||||
checking and assigning IDs when creating a file.
|
||||
|
@ -449,7 +449,7 @@ When a process retrieves file user and group IDs via
|
|||
.BR stat (2),
|
||||
the IDs are mapped in the opposite direction,
|
||||
to produce values relative to the process user and group ID mappings.
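The two mapping directions described here can be sketched in a few lines of Python. This is only an illustration, not kernel code: it assumes the three-field line format seen in the examples on this page (first ID inside the namespace, first ID outside it, length of the range), and the helper names `parse_id_map`, `to_outside`, and `to_inside` are made up for the sketch.

```python
from typing import List, Optional, Tuple

# One map line: (first ID inside the namespace,
#                first ID outside the namespace,
#                number of IDs in the range)
Extent = Tuple[int, int, int]


def parse_id_map(text: str) -> List[Extent]:
    """Parse the text of a uid_map or gid_map file into extents."""
    extents = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        inside, outside, length = (int(field) for field in fields)
        extents.append((inside, outside, length))
    return extents


def to_outside(extents: List[Extent], inside_id: int) -> Optional[int]:
    """Map an ID used inside the namespace to the ID outside it."""
    for inside, outside, length in extents:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None  # unmapped


def to_inside(extents: List[Extent], outside_id: int) -> Optional[int]:
    """Map in the opposite direction, as stat(2) results are."""
    for inside, outside, length in extents:
        if outside <= outside_id < outside + length:
            return inside + (outside_id - outside)
    return None  # unmapped


# The mapping used in the shell session on this page: '0 1000 1'.
demo = parse_id_map("0 1000 1\n")
print(to_outside(demo, 0))     # ID 0 inside corresponds to 1000 outside
print(to_inside(demo, 1000))   # and 1000 outside appears as 0 inside
print(to_inside(demo, 1001))   # no extent covers 1001
```

An unmapped ID is reported as `None` here to keep the sketch simple; how the kernel presents unmapped IDs is described elsewhere on the page.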
|
||||
|
||||
.PP
|
||||
The initial user namespace has no parent namespace,
|
||||
but, for consistency, the kernel provides dummy user and group
|
||||
ID mapping files for this namespace.
|
||||
|
@ -458,14 +458,14 @@ Looking at the
|
|||
file
|
||||
.RI ( gid_map
|
||||
is the same) from a shell in the initial namespace shows:
|
||||
|
||||
.PP
|
||||
.in +4n
|
||||
.nf
|
||||
$ \fBcat /proc/$$/uid_map\fP
|
||||
0 0 4294967295
|
||||
.fi
|
||||
.in
|
||||
|
||||
.PP
|
||||
This mapping tells us
|
||||
that the range starting at user ID 0 in this namespace
|
||||
maps to a range starting at 0 in the (nonexistent) parent namespace,
|
||||
|
@ -499,7 +499,7 @@ file in a user namespace fails with the error
|
|||
Similar rules apply for
|
||||
.I gid_map
|
||||
files.
|
||||
|
||||
.PP
|
||||
The lines written to
|
||||
.IR uid_map
|
||||
.RI ( gid_map )
|
||||
|
@ -540,7 +540,7 @@ At least one line must be written to the file.
|
|||
.PP
|
||||
Writes that violate the above rules fail with the error
|
||||
.BR EINVAL .
|
||||
|
||||
.PP
|
||||
In order for a process to write to the
|
||||
.I /proc/[pid]/uid_map
|
||||
.RI ( /proc/[pid]/gid_map )
|
||||
|
@ -623,7 +623,7 @@ and
|
|||
.I gid_map
|
||||
files have been written, only the mapped values may be used in
|
||||
system calls that change user and group IDs.
|
||||
|
||||
.PP
|
||||
For user IDs, the relevant system calls include
|
||||
.BR setuid (2),
|
||||
.BR setfsuid (2),
|
||||
|
@ -637,7 +637,7 @@ For group IDs, the relevant system calls include
|
|||
.BR setresgid (2),
|
||||
and
|
||||
.BR setgroups (2).
|
||||
|
||||
.PP
|
||||
Writing
|
||||
.RI \(dq deny \(dq
|
||||
to the
|
||||
|
@ -685,7 +685,7 @@ file (and regardless of the process's capabilities), calls to
|
|||
are also not permitted if
|
||||
.IR /proc/[pid]/gid_map
|
||||
has not yet been set.
|
||||
|
||||
.PP
|
||||
A privileged process (one with the
|
||||
.BR CAP_SYS_ADMIN
|
||||
capability in the namespace) may write either of the strings
|
||||
|
@ -701,7 +701,7 @@ Writing the string
|
|||
.RI \(dq deny \(dq
|
||||
prevents any process in the user namespace from employing
|
||||
.BR setgroups (2).
|
||||
|
||||
.PP
|
||||
The essence of the restrictions described in the preceding
|
||||
paragraph is that it is permitted to write to
|
||||
.I /proc/[pid]/setgroups
|
||||
|
@ -720,10 +720,10 @@ a process can transition only from
|
|||
being disallowed to
|
||||
.BR setgroups (2)
|
||||
being allowed.
|
||||
|
||||
.PP
|
||||
The default value of this file in the initial user namespace is
|
||||
.RI \(dq allow \(dq.
|
||||
|
||||
.PP
|
||||
Once
|
||||
.IR /proc/[pid]/gid_map
|
||||
has been written to
|
||||
|
@ -738,11 +738,11 @@ to
|
|||
.IR /proc/[pid]/setgroups
|
||||
(the write fails with the error
|
||||
.BR EPERM ).
|
||||
|
||||
.PP
|
||||
A child user namespace inherits the
|
||||
.IR /proc/[pid]/setgroups
|
||||
setting from its parent.
|
||||
|
||||
.PP
|
||||
If the
|
||||
.I setgroups
|
||||
file has the value
|
||||
|
@ -756,7 +756,7 @@ to the file) in this user namespace.
|
|||
.BR EPERM .)
|
||||
This restriction also propagates down to all child user namespaces of
|
||||
this user namespace.
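The rules for the setgroups file can be summarised as a small state model. The Python sketch below is a toy model for reasoning about the transitions, not an interface to the kernel; the class and method names are hypothetical, and it ignores the capability checks. Treating a redundant write of "allow" after gid_map has been set as a harmless no-op is an assumption of the sketch.

```python
import errno
from typing import Optional


class UserNamespaceModel:
    """Toy model of the setgroups/gid_map rules for one user namespace."""

    def __init__(self, parent: Optional["UserNamespaceModel"] = None):
        # A child namespace inherits the setting from its parent;
        # without a parent, the sketch starts from "allow".
        self.setgroups = parent.setgroups if parent else "allow"
        self.gid_map_written = False

    def write_setgroups(self, value: str) -> None:
        if value not in ("allow", "deny"):
            raise ValueError("expected 'allow' or 'deny'")
        if value == "allow" and self.setgroups == "deny":
            # Once denied, setgroups(2) cannot be reenabled.
            raise PermissionError(errno.EPERM, "cannot reenable setgroups")
        if value == "deny" and self.gid_map_written:
            # After gid_map is written, "deny" is no longer accepted.
            raise PermissionError(errno.EPERM, "gid_map already written")
        self.setgroups = value

    def write_gid_map(self) -> None:
        self.gid_map_written = True

    def may_call_setgroups(self) -> bool:
        # setgroups(2) needs both a gid_map and the "allow" setting.
        return self.gid_map_written and self.setgroups == "allow"


ns = UserNamespaceModel()
ns.write_setgroups("deny")
ns.write_gid_map()
print(ns.may_call_setgroups())          # denied before the map was written
child = UserNamespaceModel(parent=ns)
print(child.setgroups)                  # the child inherits the setting
```

In this model the only possible change to the answer of `may_call_setgroups()` is from disallowed to allowed, which is the property the text describes.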
|
||||
|
||||
.PP
|
||||
The
|
||||
.I /proc/[pid]/setgroups
|
||||
file was added in Linux 3.19,
|
||||
|
@ -815,7 +815,7 @@ and
|
|||
.IR /proc/sys/kernel/overflowgid
|
||||
in
|
||||
.BR proc (5).
|
||||
|
||||
.PP
|
||||
The cases where unmapped IDs are mapped in this fashion include
|
||||
system calls that return user IDs
|
||||
.RB ( getuid (2),
|
||||
|
@ -843,7 +843,7 @@ credentials written to the process accounting file (see
|
|||
.BR acct (5)),
|
||||
and credentials returned with POSIX message queue notifications (see
|
||||
.BR mq_notify (3)).
|
||||
|
||||
.PP
|
||||
There is one notable case where unmapped user and group IDs are
|
||||
.I not
|
||||
.\" from_kuid(), from_kgid()
|
||||
|
@ -909,7 +909,7 @@ User namespaces require support in a range of subsystems across
|
|||
the kernel.
|
||||
When an unsupported subsystem is configured into the kernel,
|
||||
it is not possible to configure user namespaces support.
|
||||
|
||||
.PP
|
||||
As at Linux 3.8, most relevant subsystems supported user namespaces,
|
||||
but a number of filesystems did not have the infrastructure needed
|
||||
to map user and group IDs between user namespaces.
|
||||
|
@ -929,9 +929,9 @@ The comments and
|
|||
.I usage()
|
||||
function inside the program provide a full explanation of the program.
|
||||
The following shell session demonstrates its use.
|
||||
|
||||
.PP
|
||||
First, we look at the run-time environment:
|
||||
|
||||
.PP
|
||||
.in +4n
|
||||
.nf
|
||||
$ \fBuname \-rs\fP # Need Linux 3.8 or later
|
||||
|
@ -942,7 +942,7 @@ $ \fBid \-g\fP
|
|||
1000
|
||||
.fi
|
||||
.in
|
||||
|
||||
.PP
|
||||
Now start a new shell in new user
|
||||
.RI ( \-U ),
|
||||
mount
|
||||
|
@ -954,16 +954,16 @@ namespaces, with user ID
|
|||
and group ID
|
||||
.RI ( \-G )
|
||||
1000 mapped to 0 inside the user namespace:
|
||||
|
||||
.PP
|
||||
.in +4n
|
||||
.nf
|
||||
$ \fB./userns_child_exec \-p \-m \-U \-M '0 1000 1' \-G '0 1000 1' bash\fP
|
||||
.fi
|
||||
.in
|
||||
|
||||
.PP
|
||||
The shell has PID 1, because it is the first process in the new
|
||||
PID namespace:
|
||||
|
||||
.PP
|
||||
.in +4n
|
||||
.nf
|
||||
bash$ \fBecho $$\fP
|
||||
|
@ -975,7 +975,7 @@ Mounting a new
|
|||
filesystem and listing all of the processes visible
|
||||
in the new PID namespace shows that the shell can't see
|
||||
any processes outside the PID namespace:
|
||||
|
||||
.PP
|
||||
.in +4n
|
||||
.nf
|
||||
bash$ \fBmount \-t proc proc /proc\fP
|
||||
|
@ -985,10 +985,10 @@ bash$ \fBps ax\fP
|
|||
22 pts/3 R+ 0:00 ps ax
|
||||
.fi
|
||||
.in
|
||||
|
||||
.PP
|
||||
Inside the user namespace, the shell has user and group ID 0,
|
||||
and a full set of permitted and effective capabilities:
|
||||
|
||||
.PP
|
||||
.in +4n
|
||||
.nf
|
||||
bash$ \fBcat /proc/$$/status | egrep '^[UG]id'\fP
|
||||
|
|
|
@ -46,7 +46,7 @@ The ISO 10646 Universal Character Set (UCS),
|
|||
a superset of Unicode, occupies an even larger code
|
||||
space\(em31\ bits\(emand the obvious
|
||||
UCS-4 encoding for it (a sequence of 32-bit words) has the same problems.
|
||||
|
||||
.PP
|
||||
The UTF-8 encoding of Unicode and UCS
|
||||
does not have these problems and is the common way in which
|
||||
Unicode is used on UNIX-style operating systems.
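A short Python comparison may make the contrast concrete. The problems referred to are listed in a part of the page that this hunk does not show; the sketch assumes they include embedded NUL bytes and the loss of ASCII compatibility in the fixed-width encoding. Python's `utf-32-be` codec stands in for UCS-4 here.

```python
# An ASCII-only string that includes '/', a byte with special
# meaning in pathnames.
text = "a/b"

ucs4 = text.encode("utf-32-be")   # fixed 32-bit words per character
utf8 = text.encode("utf-8")

# Four bytes per character versus one byte per ASCII character.
print(len(ucs4), len(utf8))

# The 32-bit form is full of NUL bytes, which C string functions
# treat as terminators; the UTF-8 form contains none.
print(b"\x00" in ucs4, b"\x00" in utf8)

# For ASCII text, the UTF-8 bytes are identical to the ASCII bytes.
print(utf8 == text.encode("ascii"))

# A non-ASCII character becomes a multibyte sequence whose bytes
# all have the high bit set, so none can be mistaken for ASCII.
e_acute = "\u00e9".encode("utf-8")
print(all(byte >= 0x80 for byte in e_acute))
```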
|
||||
|
|
|
@ -144,7 +144,7 @@ The list of attribute names that
|
|||
can be returned is also limited to 64 kB
|
||||
(see BUGS in
|
||||
.BR listxattr (2)).
|
||||
|
||||
.PP
|
||||
Some filesystems, such as Reiserfs (and, historically, ext2 and ext3),
|
||||
require the filesystem to be mounted with the
|
||||
.B user_xattr
|
||||
|
@ -160,10 +160,10 @@ In the Btrfs, XFS, and Reiserfs filesystem implementations, there is no
|
|||
practical limit on the number of extended attributes
|
||||
associated with a file, and the algorithms used to store extended
|
||||
attribute information on disk are scalable.
|
||||
|
||||
.PP
|
||||
In the JFS, XFS, and Reiserfs filesystem implementations,
|
||||
the limit on bytes used in an EA value is the ceiling imposed by the VFS.
|
||||
|
||||
.PP
|
||||
In the Btrfs filesystem implementation,
|
||||
the total bytes used for the name, value, and implementation overhead bytes
|
||||
is limited to the filesystem
|
||||
|
@ -177,7 +177,7 @@ Since the filesystems on which extended attributes are stored might also
|
|||
be used on architectures with a different byte order and machine word
|
||||
size, care should be taken to store attribute values in an
|
||||
architecture-independent format.
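One way to follow this advice is to serialise numeric values with an explicit byte order and width rather than dumping a native in-memory representation. The Python sketch below packs a counter as a 32-bit little-endian integer; the helper names and the attribute name in the comment are hypothetical, and the choice of little-endian is arbitrary as long as readers and writers agree.

```python
import struct

# "<I" means: little-endian, unsigned, exactly 4 bytes, no padding.
# The result does not depend on the byte order or word size of the
# machine that runs the code.
COUNTER_FORMAT = "<I"


def encode_counter(value: int) -> bytes:
    """Serialise a counter for storage as an attribute value."""
    return struct.pack(COUNTER_FORMAT, value)


def decode_counter(raw: bytes) -> int:
    """Recover the counter from the stored attribute value."""
    (value,) = struct.unpack(COUNTER_FORMAT, raw)
    return value


raw = encode_counter(0x12345678)
print(raw.hex())             # least significant byte comes first
print(decode_counter(raw))

# On Linux the bytes could then be stored with, for example,
#   os.setxattr(path, "user.example.counter", raw)
# where the attribute name is made up for this sketch.
```

A textual representation, such as a decimal string, would be another architecture-independent choice.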
|
||||
|
||||
.PP
|
||||
This page was formerly named
|
||||
.BR attr (5).
|
||||
.\" .SH AUTHORS
|
||||
|
|