3682 Commits

Author SHA1 Message Date
Jakub Wilk b784b9d50f user_namespaces.7: tfix
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-11-09 16:02:07 +01:00
Michael Kerrisk a13b92e5da signal.7: tfix
Reported-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-11-09 04:48:59 +01:00
Michael Kerrisk 4a501601a6 signal.7: Reorder the architectures in the signal number lists
x86 and ARM are the most common architectures, but currently
are in the second subfield in the signal number lists.
Instead, swap that info with subfield 1, so the most
common architectures are first in the list.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-11-07 22:35:50 +01:00
Helge Deller a42f9c51cb signal.7: Add signal numbers for parisc
This patch adds the signal numbers for parisc to the signal(7) man page.

Those parisc-specific values for the various signals are valid since the
Linux kernel upstream commit ("parisc: Reduce SIGRTMIN from 37 to 32 to
behave like other Linux architectures") during development of kernel 3.18:
http://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f25df2eff5b25f52c139d3ff31bc883eee9a0ab

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-11-07 22:35:45 +01:00
Michael Kerrisk aa2c362324 cgroups.7: Minor fix: bump kernel version to 4.19 in a couple of points
The stated points still hold true as at Linux 4.19.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-11-07 21:30:33 +01:00
Jakub Wilk 587ff4d5af vdso.7: tfix
Escape hyphens; use \(aq for ASCII apostrophes.

Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-11-05 17:00:05 +01:00
Michael Kerrisk 77eefc59bd cgroups.7: tfix
Reported-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-11-04 11:29:06 +01:00
Michael Kerrisk c6c28d527d user_namespaces.7: ffix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-11-02 13:52:24 +01:00
Michael Kerrisk 2c1608c23b namespaces.7: tfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-11-02 13:32:25 +01:00
Michael Kerrisk 2eb89baa0e capabilities.7: Minor fixes to Marcus Gelderie's patch
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-11-01 20:55:13 +01:00
Marcus Gelderie 35ecd12dd9 capabilities.7: Mention header for SECBIT constants
Mention that the named constants (SECBIT_KEEP_CAPS and others)
are available only if the linux/securebits.h user-space header
is included.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-11-01 20:55:13 +01:00
Michael Kerrisk 53666f6c30 bpf-helpers.7: Add new man page for eBPF helper functions
The eBPF subsystem on Linux can use "helper functions": functions
implemented in the kernel that can be called from within an eBPF program
injected by a user on Linux. The kernel already supports a long list of
such helpers (sixty-seven at this time, new ones are under review).
Therefore, it is proposed to create a new manual page, separate from
bpf(2), to document those helpers for people willing to develop new eBPF
programs.

Additionally, in an effort to keep this documentation in synchronisation
with what is implemented in the kernel, it is further proposed to keep
the documentation itself in the kernel sources, as comments in file
"include/uapi/linux/bpf.h", and to generate the man page from there.

This patch adds the new man page, generated from kernel sources, to the
man-pages repository. For each eBPF helper function, a description of
the helper, of its arguments and of the return value is provided. The
idea is that all future changes for this page should be redirected to
the kernel file "include/uapi/linux/bpf.h", and the modified page
generated from there.

Generating the page itself is a two-step process. First, the
documentation is extracted from include/uapi/linux/bpf.h, and converted
to a RST (reStructuredText-formatted) page, with the relevant script
from Linux sources:

      $ ./scripts/bpf_helpers_doc.py > /tmp/bpf-helpers.rst

The second step consists in turning the RST document into the final man
page, with rst2man:

      $ rst2man /tmp/bpf-helpers.rst > bpf-helpers.7

The bpf.h file was taken as at kernel 4.19.

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-11-01 14:57:49 +01:00
Michael Kerrisk dd63e15948 capabilities.7: Correct the description of SECBIT_KEEP_CAPS
This just adds to the point made by Marcus Gelderie's patch.  Note
also that SECBIT_KEEP_CAPS provides the same functionality as the
prctl() PR_SET_KEEPCAPS flag, and the prctl(2) manual page has the
correct description of the semantics (i.e., that the flag affects
the treatment of only the permitted capability set).

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-11-01 14:40:49 +01:00
Michael Kerrisk ab7ef2a882 capabilities.7: Minor tweaks to the text added by Marcus Gelderie's patch
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-11-01 14:40:49 +01:00
Marcus Gelderie 7d32b135d6 capabilities.7: Add details about SECBIT_KEEP_CAPS
The description of SECBIT_KEEP_CAPS is misleading about the
effects on the effective capabilities of a process during a
switch to nonzero UIDs.  The effective set is cleared based on
the effective UID switching to a nonzero value, even if
SECBIT_KEEP_CAPS is set. However, with this bit set, the
effective and permitted sets are not cleared if the real and
saved set-user-ID are set to nonzero values.

This was tested using the following C code and reading the kernel
source at security/commoncap.c: cap_emulate_setxuid.

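#define _GNU_SOURCE             /* for getresuid() */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/prctl.h>
#include <sys/capability.h>     /* libcap; link with -lcap */
#include <linux/securebits.h>   /* SECBIT_KEEP_CAPS */

void print_caps(void) {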
    cap_t current = cap_get_proc();
    if (!current) {
        perror("Current caps");
        return;
    }
    char *text = cap_to_text(current, NULL);
    if (!text) {
        perror("Converting caps to text");
        goto free_caps;
    }
    printf("Capabilities: %s\n", text);
    cap_free(text);
free_caps:
    cap_free(current);
}

void print_creds(void) {
    uid_t ruid, suid, euid;
    if (getresuid(&ruid, &euid, &suid)) {
        perror("Error getting UIDs");
        return;
    }
    printf("real = %d, effective = %d, saved set-user-ID = %d\n", ruid, euid, suid);
}

void set_caps(int size, const cap_value_t *caps) {
    cap_t current = cap_init();
    if (!current) {
        perror("Error getting current caps");
        return;
    }
    if (cap_clear(current)) {
        perror("Error clearing caps");
        goto free_caps;
    }
    if (cap_set_flag(current, CAP_INHERITABLE, size, caps, CAP_SET)) {
        perror("setting caps");
        goto free_caps;
    }
    if (cap_set_flag(current, CAP_EFFECTIVE, size, caps, CAP_SET)) {
        perror("setting caps");
        goto free_caps;
    }
    if (cap_set_flag(current, CAP_PERMITTED, size, caps, CAP_SET)) {
        perror("setting caps");
        goto free_caps;
    }
    if (cap_set_proc(current)) {
        perror("Committing caps");
        goto free_caps;
    }
free_caps:
    cap_free(current);
}

const cap_value_t caps[] = {CAP_SETUID, CAP_SETPCAP};
const size_t num_caps = sizeof(caps) / sizeof(cap_value_t);

int main(int argc, char **argv) {
    puts("[+] Dropping most capabilities to reduce amount of console output...");
    set_caps(num_caps, caps);
    puts("[+] Dropped capabilities. Starting with these credentials and capabilities:");

    print_caps();
    print_creds();

    if (argc >= 2 && 0 == strncmp(argv[1], "keep", 4)) {
        puts("[+] Setting SECBIT_KEEP_CAPS bit");
        if (prctl(PR_SET_SECUREBITS, SECBIT_KEEP_CAPS, 0, 0, 0)) {
            perror("Setting secure bits");
            return 1;
        }
    }

    puts("[+] Setting effective UID to 1000");
    if (seteuid(1000)) {
        perror("Error setting effective UID");
        return 2;
    }
    print_caps();
    print_creds();

    puts("[+] Raising caps again");
    set_caps(num_caps, caps);
    print_caps();
    print_creds();

    puts("[+] Setting all remaining UIDs to nonzero values");
    if (setreuid(1000, 1000)) {
        perror("Error setting all UIDs to 1000");
        return 3;
    }
    print_caps();
    print_creds();

    return 0;
}

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-11-01 14:39:25 +01:00
Michael Kerrisk 6e8a3b421b user_namespaces.7: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-31 08:47:02 +01:00
Michael Kerrisk 043aaa9427 namespaces.7: f
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-31 08:40:21 +01:00
Michael Kerrisk d45e85a94b namespaces.7: Briefly explain why CAP_SYS_ADMIN is needed to create nonuser namespaces
Reported-by: Tycho Kirchner <tychokirchner@mail.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-31 08:39:02 +01:00
Michael Kerrisk 29af6f1a59 user_namespaces.7: Rework terminology describing ownership of nonuser namespaces
Prefer the word "owns" rather than "associated with" when
describing the relationship between user namespaces and non-user
namespaces. The existing text used a mix of the two terms, with
"associated with" being predominant, but to my ear, describing the
relationship as "ownership" is more comprehensible.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-31 08:31:47 +01:00
Josh Triplett d63618d564 precedence.7: Add as a redirect to operator.7
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-28 10:10:20 +01:00
Michael Kerrisk d7d7c8ea04 namespaces.7: SEE ALSO: add pam_namespace(8)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-25 10:19:45 +02:00
Jakub Wilk 29c8d172fd address_families.7: tfix
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-21 19:58:12 +02:00
Michael Kerrisk e1b1b8985c inode.7: tfix
Reported-by: Burkhard Lück <lueck@hube-lueck.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-17 08:19:39 +02:00
Michael Kerrisk a5409af7ec socket.7: SEE ALSO: add address_families(7)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-16 10:46:49 +02:00
Michael Kerrisk a88c75c24b address_families.7: New page that contains details of socket address families
There is too much detail in socket(2). Move most of it into
a new page instead.

Cowritten-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-16 10:46:16 +02:00
Michael Kerrisk a970e1f920 sched.7: In the kernel source SCHED_OTHER is actually called SCHED_NORMAL
Reported-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-14 16:15:50 +02:00
Michael Kerrisk c9a35b01a1 cgroup_namespaces.7: Clarify
Clarify the example by making an implied detail more explicit.

Quoting Troy Engel on the problem with the original text:

    The problem is "and a process in a sibling cgroup (sub2)"
    (shown as PID 20124 here) - how did this get here? How do I
    recreate this? Following this example, there's no mention of
    how, it's out of place when following the instructions.
    There is nothing in any of the cgroup files which contain
    this (# grep freezer /proc/*/cgroup) while at this stage.

    The intent is understood, however the man page seems to skip
    a step to create this in the teaching example. We should add
    whatever simple steps are needed to create the "process in a
    sibling cgroup" as outlined so it makes sense - as written,
    I have no clue where "sibling cgroup (sub2)" came from, it
    just appeared out of the blue in that step. Thanks!

See https://bugzilla.kernel.org/show_bug.cgi?id=201047

Reported-by: Troy Engel <troyengel@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-14 13:56:27 +02:00
Michael Kerrisk d190902bc2 cgroup_namespaces.7: Move a sentence from DESCRIPTION to NOTES
This sentence fits better in NOTES.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-14 13:40:47 +02:00
Michael Kerrisk e39f614f9f cgroup_namespaces.7: Remove redundant use of 'sh -c' in shell session
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-14 13:37:02 +02:00
Michael Kerrisk 4d9b3039d6 cgroup_namespaces.7: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-14 11:41:57 +02:00
Michael Kerrisk 44084d19bb cgroups.7: Complete partial sentence re kernel boot options and 'nsdelegate'
The intended text was hidden elsewhere in the source of the
page as a comment.

https://bugzilla.kernel.org/show_bug.cgi?id=201029

Reported-by: Mike Weilgart <mike.weilgart@verticalsysadmin.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-14 10:10:02 +02:00
Michael Kerrisk 2b3c0042d1 sched.7: SEE ALSO: add ps(1) and top(1)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-09 12:53:13 +02:00
Michael Kerrisk 17094a28ff cgroups.7: Minor wording fix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-09 11:48:45 +02:00
Michael Kerrisk edc90967b9 cgroups.7: wfix: use "threads" consistently
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-09 11:48:03 +02:00
Michael Kerrisk 0bef253ec5 cgroups.7: Add more detail on v2 'cpu' controller and realtime threads
Explicitly note the scheduling policies that are relevant for the
v2 'cpu' controller.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-09 11:45:43 +02:00
Michael Kerrisk 4644794c1e cgroups.7: Minor wording fix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-05 08:49:15 +02:00
Michael Kerrisk 6c9aa5ad5f cgroups.7: Rework discussion of writing to cgroup.type file
In particular, it is possible to write "threaded" to a
cgroup.type file if the current type is "domain threaded".
Previously, the text had implied that this was not possible.
Verified by experiment on Linux 4.15 and 4.19-rc.

Reported-by: Leah Hanson <lhanson@pivotal.io>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-05 08:22:10 +02:00
Michael Kerrisk df0a41dfe3 pid_namespaces.7: Note a detail of /proc/PID/ns/pid_for_children behavior
After clone(CLONE_NEWPID), /proc/PID/ns/pid_for_children is empty
until the first child is created. Verified by experiment.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-01 14:49:08 +02:00
Michael Kerrisk e5cd406d8e pid_namespaces.7: Note that a process can do unshare(CLONE_NEWPID) only once
(See the recent commit to the unshare(2) manual page.)

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-10-01 14:42:07 +02:00
Michael Kerrisk 3acd70581d capabilities.7: Update URL for location of POSIX.1e draft standard
Reported-by: Allison Randal <allison@lohutok.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-09-29 00:02:44 +02:00
Michael Kerrisk 37894e514e sched.7: SEE ALSO: add chcpu(1), lscpu(1)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-09-28 18:38:48 +02:00
Michael Kerrisk 396761eee3 cgroups.7: Minor clarification to remove possible ambiguity
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-09-20 12:25:00 +02:00
Michael Kerrisk 5367a9aba9 capabilities.7: Ambient capabilities do not trigger secure-execution mode
Reported-by: Pierre Chifflier <pollux@debian.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-09-13 11:41:08 +02:00
Michael Kerrisk 96123f413d signal.7: SEE ALSO: add clone(2)
Because of the discussion of threads and signals in clone(2).

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-09-10 11:18:06 +02:00
Michael Kerrisk c2df769494 cgroups.7: tfix
Reported-by: Mike Weilgart <mike.weilgart@verticalsysadmin.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-09-06 23:19:36 +02:00
Lucas Werkmeister 8bd6881ea9 user_namespaces.7: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-08-18 09:45:06 +02:00
Jakub Wilk 68bd4ad98c namespaces.7: tfix
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-08-12 14:08:19 +02:00
Tobias Klauser 5a2ed9eebe namespaces.7: tfix
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-08-06 21:42:42 +02:00
Michael Kerrisk 0d59d0c8bf capabilities.7: tfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-08-03 16:07:59 +02:00
Michael Kerrisk 50c7074665 posixoptions.7: ffix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-08-03 15:53:03 +02:00
Michael Kerrisk 3426f62cea namespaces.7: Mention ioctl(2) in discussion of namespaces APIs
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-08-03 07:36:48 +02:00
Michael Kerrisk 9a6d888cb6 namespaces.7: List factors that may pin a namespace into existence
Various factors may pin a namespace into existence, even when it
has no member processes.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-08-03 07:30:17 +02:00
Michael Kerrisk 7df0e773c7 unix.7: wfix: s/foreign process/peer process/
The more common parlance these days is, I think, "peer".

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-28 12:30:44 +02:00
Michael Kerrisk 94950b9a68 socket.7, unix.7: Move text describing SO_PEERCRED from socket(7) to unix(7)
This is, AFAIK, an option specific to UNIX domain sockets, so
place it in unix(7).

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-28 12:30:44 +02:00
Michael Kerrisk ffab8460c6 unix.7: Refer reader to socket(7) for information about SO_PEEK_OFF
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-28 12:30:44 +02:00
Michael Kerrisk 2fc7c74cc5 socket.7: Refer reader to unix(7) for information on SO_PASSSEC
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-28 12:30:44 +02:00
Michael Kerrisk 48c2b7065d tcp.7, udp.7: Add a reference to socket(7) noting existence of further socket options
Some other socket options that are applicable for TCP and UDP sockets
are documented in socket(7), so help the reader by pointing them at
that page.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-28 12:30:44 +02:00
Michael Kerrisk 670387c122 udp.7: srcfix: add FIXME
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-28 12:30:44 +02:00
Michael Kerrisk 1221abb60e unix.7: tfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-28 12:30:44 +02:00
Michael Kerrisk ffad6a017f unix.7: Document SCM_SECURITY ancillary data
And fix a wording error in the description of SO_PASSSEC.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-28 12:30:44 +02:00
Michael Kerrisk 366a9bffc8 unix.7: Document SO_PASSSEC
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-28 11:50:11 +02:00
Michael Kerrisk 5af0f223d1 unix.7: Ancillary data forms a barrier when receiving on a stream socket
Thanks to a tip from Keith Packard:
https://keithp.com/blogs/fd-passing/
(Also verified by experiment.)

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-17 09:39:56 +02:00
Michael Kerrisk 5219daec26 unix.7: One must send at least one byte of real data with ancillary data
When sending ancillary data, at least one byte of real data should
also be sent.  This is strictly necessary for stream sockets
(verified by experiment). It is not required for datagram sockets
on Linux (verified by experiment), but portable applications
should do so.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-15 10:33:42 +02:00
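The rule described in commit 5219daec26 can be sketched in C as follows. This is a minimal illustration, not text from unix.7: the helper names send_fd() and recv_fd() are hypothetical, and the code assumes a connected UNIX domain socket. The sender passes one file descriptor as SCM_RIGHTS ancillary data together with a single byte of real data; the receiver supplies a control buffer and checks MSG_CTRUNC before trusting the result.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send 'fd' over 'sock' as SCM_RIGHTS ancillary
   data, accompanied by one byte of real data.
   Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char data = 'x';                    /* the one byte of real data */
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };

    /* The union ensures the control buffer is suitably aligned */
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    memset(&ctl, 0, sizeof(ctl));

    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one file descriptor sent by send_fd().
   Returns the new descriptor, or -1 on error. */
int recv_fd(int sock)
{
    char data;
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };

    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    memset(&ctl, 0, sizeof(ctl));

    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;          /* without this, the data is lost */
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    if (msg.msg_flags & MSG_CTRUNC)     /* control buffer was too small */
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

A caller could create a pair of sockets with socketpair(AF_UNIX, SOCK_STREAM, 0, sv), call send_fd(sv[0], some_fd) on one end and recv_fd(sv[1]) on the other. The received descriptor is a new number referring to the same open file description.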
Michael Kerrisk c0e56ed687 unix.7: Clarify treatment of incoming ancillary data if 'msg_control' is NULL
If no buffer is supplied for incoming ancillary data, then
the data is lost.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-15 10:33:32 +02:00
Michael Kerrisk 4564dd1fee unix.7: If the buffer to receive SCM_RIGHTS FDs is too small, FDs are closed
If the ancillary data buffer for receiving SCM_RIGHTS file
descriptors is too small, then the excess file descriptors are
automatically closed in the receiving process. Verified by
experiment.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-15 10:16:49 +02:00
Michael Kerrisk b65f4c691d unix.7: tfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-15 10:16:49 +02:00
Michael Kerrisk 879962006f unix.7: ffix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-15 09:50:30 +02:00
Michael Kerrisk 93f5b0f8f4 mount_namespaces.7: SEE ALSO: add findmnt(8)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-13 07:08:28 +02:00
Michael Kerrisk 5b5cb19580 unix.7: When sending ancillary data, only one item of each type may be sent
Verified by experiment and reading the source code (although
the SCM_RIGHTS case is not so clear to me in the source code).

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-10 07:14:50 +02:00
Michael Kerrisk 52900faab3 unix.7: tfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-10 07:14:50 +02:00
Michael Kerrisk 311bf2f694 unix.7: Minor wording fixes
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-10 07:14:50 +02:00
Michael Kerrisk 05bf3361a6 unix.7: grfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-10 07:14:50 +02:00
Michael Kerrisk c87721467e unix.7: Note behavior if buffer to receive ancillary data is too small
If the buffer supplied to recvmsg() to receive ancillary data is
too small, then the data is truncated and the MSG_CTRUNC flag is
set.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-08 21:13:08 +02:00
Michael Kerrisk 13600496d3 unix.7: Enhance the description of SCM_RIGHTS
The existing description is rather thin. More can be said.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-08 10:57:27 +02:00
Michael Kerrisk 8bdcf4bf81 unix.7: There is a limit on the size of the file descriptor array for SCM_RIGHTS
The limit is defined in the kernel as SCM_MAX_FD (253).

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-08 10:38:44 +02:00
Michael Kerrisk f1081bdc42 unix.7: Fix a minor imprecision in description of SCM_CREDENTIALS
To spoof credentials requires privilege (i.e., capabilities),
not UID 0.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-08 10:21:43 +02:00
Michael Kerrisk b66d5714b1 unix.7: grfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-08 10:20:52 +02:00
Michael Kerrisk bdef802116 unix.7: ffix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-08 10:20:32 +02:00
Michael Kerrisk 2c77e8de08 capabilities.7: Note that v3 security.attributes are transparently created/retrieved
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-02 09:59:21 +02:00
Michael Kerrisk 00ae99b028 capabilities.7: Fix some imprecisions in discussion of namespaced file capabilities
The file UID does not come into play when creating a v3
security.capability extended attribute.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-01 11:42:13 +02:00
Michael Kerrisk 9b2c207a33 capabilities.7: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-01 11:42:13 +02:00
Michael Kerrisk c281d0505d capabilities.7: wfix
Fix some confusion between "mask" and "extended attribute"

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-01 11:42:13 +02:00
Michael Kerrisk 54254ef33a capabilities.7: srcfix: Removed FIXME
No credential match of file UID and namespace creator UID
is needed to create a v3 security extended attribute.

Verified by experiment using my userns_child_exec.c and
show_creds.c programs (available on http://man7.org/tlpi/code):

    $ sudo setcap cap_setuid,cap_dac_override=pe \
            ./userns_child_exec
    $ ./userns_child_exec -U -r setcap cap_kill=pe show_creds
    $ ./userns_child_exec -U -M '0 1000 10' -G '0 1000 1' \
            -s 1 ./show_creds
    eUID = 1;  eGID = 0;  capabilities: = cap_kill+ep

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-07-01 11:42:07 +02:00
Michael Kerrisk ffea2c14f2 capabilities.7: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-06-24 08:54:17 +02:00
Michael Kerrisk a607673bb8 epoll.7: Consistently use the term "interest list" rather than "epoll set"
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-06-22 12:21:56 +02:00
Michael Kerrisk d1d90ea54d epoll.7: Expand the discussion of the implications of file descriptor duplication
In particular, note that it may be difficult for an application
to know about the existence of duplicate file descriptors.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-06-22 12:20:25 +02:00
Michael Kerrisk a3961b2fd5 epoll.7: Note that edge-triggered notification wakes up only one waiter
Note a useful performance benefit of EPOLLET: ensuring that
only one of multiple waiters (in epoll_wait()) is woken
up when a file descriptor becomes ready.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-06-22 12:20:25 +02:00
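The property behind commit a3961b2fd5 can be illustrated with a small single-threaded C sketch. It is an assumption-laden illustration rather than text from epoll.7, and the function name edge_triggered_demo() is hypothetical. It does not start multiple waiters; it only shows that, with EPOLLET, a readiness event is reported once and not repeated until something new happens on the file descriptor, which is why a second epoll_wait() call (for example, in another waiter) is not woken for the same event.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo: register the read end of a pipe with EPOLLET,
   make it readable once, then call epoll_wait() twice without
   reading the data. The two return values are stored in *first and
   *second. Returns 0 on success, -1 on setup failure. */
int edge_triggered_demo(int *first, int *second)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    int ep = epoll_create1(0);
    if (ep == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN | EPOLLET };
    ev.data.fd = p[0];

    int rc = -1;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1) {
        struct epoll_event out;

        /* Timeout 0: report what is ready right now, never block */
        *first = epoll_wait(ep, &out, 1, 0);   /* the edge is reported */
        *second = epoll_wait(ep, &out, 1, 0);  /* no new edge since then */
        rc = 0;
    }

    close(ep);
    close(p[0]);
    close(p[1]);
    return rc;
}
```

On Linux, the first call is expected to report one event and the second to report none, even though the byte is still unread. With level-triggered notification (EPOLLET omitted), both calls would report the descriptor as ready.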
Michael Kerrisk 0409116028 epoll.7: Introduce the terms "interest list" and "ready list"
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-06-22 12:20:25 +02:00
Michael Kerrisk 4524285a71 epoll.7: ffix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-06-22 09:41:16 +02:00
Michael Kerrisk 1e79ad8cd8 epoll.7: ffix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-06-22 09:30:02 +02:00
Michael Kerrisk b4ebb4ee79 epoll.7: tfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-06-22 09:27:46 +02:00
Michael Kerrisk 6832efaf3c epoll.7: Reformat Q&A list
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-06-22 09:27:24 +02:00
Helge Deller 0201f48246 vdso.7: Fix parisc gateway page description
The parisc gateway page currently only exports 3 functions:
The lws_entry for CAS operations (at 0xb0), the set_thread_pointer
function for usage in glibc (at 0xe0) and the Linux syscall entry
(at 0x100).

All other symbols in the manpage are internal labels and
shouldn't be used directly by userspace or glibc, so drop them
from the man page documentation.

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-05-28 11:04:33 +02:00
Michael Kerrisk 0cec24722b signal.7: Clarify that sigsuspend() and pause() suspend the calling *thread*
Reported-by: Robin Kuzmin <kuzmin.robin@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-05-18 10:04:37 +02:00
Michael Kerrisk 390795d76a inotify.7: Note ENOTDIR error that can occur for IN_ONLYDIR
Note ENOTDIR error that occurs when requesting a watch on a
nondirectory with IN_ONLYDIR.

Reported-by: Paul Millar <paul.millar@desy.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-05-06 10:22:13 +02:00
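The ENOTDIR case noted in commit 390795d76a can be reproduced with a short C sketch. The helper name onlydir_rejects() is hypothetical and the choice of IN_CREATE as the watched event is arbitrary; only the use of IN_ONLYDIR matters here.

```c
#include <errno.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical helper: try to watch 'path' with IN_ONLYDIR.
   Returns 1 if the request fails with ENOTDIR,
           0 if the watch was added (path is a directory),
          -1 on any other error. */
int onlydir_rejects(const char *path)
{
    int fd = inotify_init1(IN_CLOEXEC);
    if (fd == -1)
        return -1;

    int wd = inotify_add_watch(fd, path, IN_CREATE | IN_ONLYDIR);
    int saved_errno = errno;    /* close() below may change errno */

    close(fd);                  /* also removes any watch that was added */

    if (wd != -1)
        return 0;
    return saved_errno == ENOTDIR ? 1 : -1;
}
```

For example, onlydir_rejects("/dev/null") should return 1, because /dev/null is not a directory, while onlydir_rejects("/") should return 0 on a system where inotify is available.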
Michael Kerrisk 0a719e9411 capabilities.7: tfix
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-05-02 21:16:20 +02:00
Michael Kerrisk c87cbea10f capabilities.7: ffix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-05-02 11:37:29 +02:00
Michael Kerrisk c2b279afb7 capabilities.7: ffix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-05-01 13:55:37 +02:00
Michael Kerrisk ddc1ad3079 capabilities.7: Add background details on capability transformations during execve(2)
Add background details on ambient and bounding set when
discussing capability transformations during execve(2).

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-05-01 13:55:37 +02:00
Michael Kerrisk 7c957134f1 capabilities.7: Minor rewording
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-05-01 13:55:37 +02:00
Michael Kerrisk bb1f24fab8 capabilities.7: Reorder text on capability bounding set
Reverse order of text blocks describing pre- and
post-2.6.25 bounding set. No content changes.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-05-01 13:55:37 +02:00
Michael Kerrisk 2e87ced3b5 capabilities.7: Rework bounding set as per-thread set in transformation rules
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-05-01 13:55:37 +02:00
Michael Kerrisk 36de80b984 capabilities.7: Add text introducing bounding set along with other thread capability sets
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-05-01 13:55:37 +02:00
Michael Kerrisk daf8312704 capabilities.7: Clarify which capability sets capset(2) and capget(2) apply to
capset(2) and capget(2) operate only on the permitted,
effective, and inheritable process capability sets.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-05-01 12:46:48 +02:00
Michael Kerrisk 1db1d36d82 capabilities.7: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-05-01 12:40:14 +02:00
Michael Kerrisk 09b8afdc04 execve.2, fallocate.2, getrlimit.2, io_submit.2, membarrier.2, mmap.2, msgget.2, open.2, ptrace.2, readv.2, semget.2, shmget.2, shutdown.2, syscall.2, wait.2, wait4.2, crypt.3, encrypt.3, fseek.3, getcwd.3, makedev.3, pthread_create.3, puts.3, tsearch.3, elf.5, filesystems.5, group.5, passwd.5, sysfs.5, mount_namespaces.7, posixoptions.7, time.7, unix.7, vdso.7, xattr.7, ld.so.8: tstamp
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-30 17:41:31 +02:00
Michael Kerrisk 29c0586f51 bpf.2, sched_setattr.2, crypt.3, elf.5, proc.5, fanotify.7, feature_test_macros.7, sched.7: spfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-27 14:48:33 +02:00
Michael Kerrisk 075f5e6592 namespaces.7: Mention that device ID should also be checked when comparing NS symlinks
When comparing two namespaces symlinks to see if they refer to
the same namespace, both the inode number and the device ID
should be compared. This point was already made clear in
ioctl_ns(2), but was missing from this page.

Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-27 14:10:32 +02:00
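The comparison described in commit 075f5e6592 amounts to checking two stat(2) fields. The following C sketch shows one way to do it; the function name same_namespace() is hypothetical, and the paths passed to it are assumed to be namespace symlinks such as /proc/PID/ns/net.

```c
#include <sys/stat.h>

/* Hypothetical helper: decide whether two /proc/PID/ns/* symlinks
   refer to the same namespace by comparing BOTH the device ID and
   the inode number of the files they point to.
   Returns 1 if they match, 0 if they differ, -1 if stat() fails. */
int same_namespace(const char *path_a, const char *path_b)
{
    struct stat a, b;

    /* stat() follows the symlink, so the fields describe the
       namespace object itself rather than the link */
    if (stat(path_a, &a) == -1 || stat(path_b, &b) == -1)
        return -1;

    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

A typical call would be same_namespace("/proc/self/ns/net", "/proc/1/ns/net"); note that examining another process's namespace links may require suitable privilege. Comparing only st_ino would miss the device ID check that the commit message calls for.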
Jakub Wilk 3eb078c52f unix.7: tfix
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-27 14:01:50 +02:00
Jakub Wilk 90ef0f7bf8 capabilities.7: tfix
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-27 14:01:43 +02:00
Michael Kerrisk 314d88f611 vdso.7: VDSO symbols (system calls) are not visible to seccomp(2) filters
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-24 18:25:44 +02:00
Michael Kerrisk 115c1eb46c capabilities.7: tfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-19 11:18:31 +02:00
Michael Kerrisk 690e62da71 capabilities.7: srcfix: FIXME
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-13 21:23:28 +02:00
Michael Kerrisk bcaa30c985 capabilities.7: Rework file capability versioning and namespaced file caps text
There was some confused mixing of concepts between the
two subsections, and some other details that needed fixing up.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-13 21:23:28 +02:00
Michael Kerrisk 6442c03b68 capabilities.7: Explain when VFS_CAP_REVISION_3 file capabilities have effect
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-13 21:23:28 +02:00
Michael Kerrisk 7b45f4b2ad capabilities.7: Explain rules that determine version of security.capability xattr
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-13 21:23:28 +02:00
Michael Kerrisk 7da0c87a78 capabilities.7: Explain term "namespace root user ID"
Confirmed with Serge Hallyn that: "nsroot" means the UID 0
in the namespace as it would be mapped into the initial userns.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-13 21:23:28 +02:00
Michael Kerrisk 12dce73121 capabilities.7: Document namespaced-file capabilities
Cowritten-by: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-13 21:23:28 +02:00
Michael Kerrisk b684870410 capabilities.7: Describe file capability versioning
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-13 21:23:28 +02:00
Michael Kerrisk 873727f44a posixoptions.7: tfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-13 17:02:28 +02:00
Michael Kerrisk 11e9d8f890 posixoptions.7: Use a more consistent, less cluttered layout for option lists
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-13 17:02:18 +02:00
Michael Kerrisk 17282a589f posixoptions.7: Make function lists more consistent and less cluttered
Use more consistent layout for lists of functions, and
remove punctuation from the lists to make them less cluttered.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-13 10:44:01 +02:00
Michael Kerrisk 5a9ef49145 posixoptions.7: tfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-13 10:25:11 +02:00
Michael Kerrisk 6f131a899a posixoptions.7: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-13 10:25:11 +02:00
Michael Kerrisk 45adee316b posixoptions.7: ffix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-13 10:25:11 +02:00
Michael Kerrisk 742ce8ddec posixoptions.7: tfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-13 10:25:11 +02:00
Michael Kerrisk 6b2300a2f3 posixoptions.7: ffix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-13 09:42:26 +02:00
Carlos O'Donell 233b0395d8 posixoptions.7: Expand XSI Options groups
We define in detail the X/Open System Interfaces i.e. _XOPEN_UNIX
and all of the X/Open System Interfaces (XSI) Options Groups.

The XSI options groups include encryption, realtime, advanced
realtime, realtime threads, advanced realtime threads, tracing,
streams, and legacy interfaces.

Signed-off-by: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-13 09:39:10 +02:00
Michael Kerrisk 7934bcdfdc unix.7: ERRORS: add EBADF for sending closed file descriptor with SCM_RIGHTS
As noted by Rusty Russell:

I was really surprised that sendmsg() returned EBADF on a valid fd;
turns out I was using sendmsg with SCM_RIGHTS to send a closed fd,
which gives EBADF (see test program below).

But this is only obliquely referenced in unix(7):

       SCM_RIGHTS
              Send or receive a set  of  open  file  descriptors
              from  another  process.  The data portion contains
              an integer array of  the  file  descriptors.   The
              passed file descriptors behave as though they have
              been created with dup(2).

EBADF is not mentioned in the unix(7) ERRORS (it's mentioned in
dup(2)).

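#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int fdpass_send(int sockout, int fd)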
{
	/* From the cmsg(3) manpage: */
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	struct iovec iov;
	char c = 0;
	union {         /* Ancillary data buffer, wrapped in a union
			   in order to ensure it is suitably aligned */
		char buf[CMSG_SPACE(sizeof(fd))];
		struct cmsghdr align;
	} u;

	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);
	memset(&u, 0, sizeof(u));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

	msg.msg_name = NULL;
	msg.msg_namelen = 0;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_flags = 0;

	/* Keith Packard reports that 0-length sends don't work, so we
	 * always send 1 byte. */
	iov.iov_base = &c;
	iov.iov_len = 1;

	return sendmsg(sockout, &msg, 0);
}

int fdpass_recv(int sockin)
{
	/* From the cmsg(3) manpage: */
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	struct iovec iov;
	int fd;
	char c;
	union {         /* Ancillary data buffer, wrapped in a union
			   in order to ensure it is suitably aligned */
		char buf[CMSG_SPACE(sizeof(fd))];
		struct cmsghdr align;
	} u;

	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	msg.msg_name = NULL;
	msg.msg_namelen = 0;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_flags = 0;

	iov.iov_base = &c;
	iov.iov_len = 1;

	if (recvmsg(sockin, &msg, 0) < 0)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg
	    || cmsg->cmsg_len != CMSG_LEN(sizeof(fd))
	    || cmsg->cmsg_level != SOL_SOCKET
	    || cmsg->cmsg_type != SCM_RIGHTS) {
		errno = EINVAL;
		return -1;
	}

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
	return fd;
}

static void child(int sockfd)
{
	int newfd = fdpass_recv(sockfd);
	assert(newfd < 0);
	exit(0);
}

int main(void)
{
	int sv[2];
	int pid, ret;

	assert(socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == 0);

	pid = fork();
	if (pid == 0) {
		close(sv[1]);
		child(sv[0]);
	}

	close(sv[0]);
	ret = fdpass_send(sv[1], sv[0]);
	printf("fdpass of bad fd return %i (%s)\n", ret, strerror(errno));
	return 0;
}

Reported-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-12 10:55:29 +02:00
Michael Kerrisk d3e7786def unix.7: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-12 10:42:34 +02:00
Konstantin Grinemayer 04c8a02088 keyring.7: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-04-12 08:46:42 +02:00
Michael Kerrisk 3f6061d025 socket.7: Fix error in SO_INCOMING_CPU code snippet
The last argument is passed by value, not reference.
Reported-by: Tomi Salminen <tsalminen@forcepoint.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-03-27 22:06:52 +02:00
Michael Kerrisk d8c64e25f8 network_namespaces.7: Add cross reference to unix(7)
For further information on UNIX domain abstract sockets.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-03-16 08:50:36 +01:00
Michael Kerrisk 39ad46695f time.7: Mention clock_gettime()/clock_settime() rather than [gs]ettimeofday()
gettimeofday() is declared obsolete by POSIX. Mention instead
the modern APIs for working with the realtime clock.

See https://bugzilla.kernel.org/show_bug.cgi?id=199049

Reported-by: Enrique Garcia <cquike@arcor.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-03-16 08:50:36 +01:00
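
As a rough illustration of the API that the time.7 change above (39ad46695f)
recommends, the following sketch reads the realtime clock with
clock_gettime(2). The helper name realtime_ns() is hypothetical, and the
nanosecond return format is merely one convenient choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical helper: return the wall-clock time in nanoseconds
   since the Epoch, or -1 on error. */
long long realtime_ns(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_REALTIME, &ts) == -1)
		return -1;

	/* tv_nsec is always in the range 0 to 999999999 */
	assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);

	return (long long) ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

Because CLOCK_REALTIME can be stepped by an administrator or by NTP, two
successive calls are not guaranteed to return increasing values;
CLOCK_MONOTONIC is the usual choice for measuring intervals.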
Michael Kerrisk 6b49df2229 mount_namespaces.7: Note another case where shared "peer groups" are formed
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-02-25 16:42:16 +01:00
Michael Kerrisk 46af719866 mount_namespaces.7: ffix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-02-25 16:37:08 +01:00
Michael Kerrisk a21658aad3 network_namespaces.7: Network namespaces isolate the UNIX domain abstract socket namespace
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-02-24 23:04:53 +01:00
Michael Kerrisk aeeb48005e user_namespaces.7: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-02-23 10:38:47 +01:00
Michael Kerrisk 1a7e08e367 namespaces.7: Note an idiosyncrasy of /proc/[pid]/ns/pid_for_children
/proc/[pid]/ns/pid_for_children has a value only after the first
child is created in the PID namespace. Verified by experiment.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-02-21 17:31:48 +01:00
Michael Kerrisk 0813749503 capabilities.7: remove redundant mention of PTRACE_SECCOMP_GET_FILTER
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-02-21 10:38:17 +01:00
Michael Kerrisk 9863b9acfe xattr.7: SEE ALSO: add selinux(8)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-02-21 08:43:14 +01:00
Michael Kerrisk 7747ed9789 cgroups.7: cgroup.events transitions generate POLLERR as well as POLLPRI
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-02-10 09:46:14 +01:00
Michael Kerrisk 2cd9bbfa48 Removed trailing white space at end of lines
2018-02-02 07:48:33 +01:00
Michael Kerrisk 8538a62b4c iconv.1, bpf.2, copy_file_range.2, fcntl.2, memfd_create.2, mlock.2, mount.2, mprotect.2, perf_event_open.2, pkey_alloc.2, prctl.2, read.2, recvmmsg.2, s390_sthyi.2, seccomp.2, sendmmsg.2, syscalls.2, unshare.2, write.2, errno.3, fgetpwent.3, fts.3, pthread_rwlockattr_setkind_np.3, fuse.4, veth.4, capabilities.7, cgroups.7, ip.7, man-pages.7, namespaces.7, network_namespaces.7, sched.7, socket.7, user_namespaces.7, iconvconfig.8: tstamp
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-02-02 07:38:54 +01:00
Michael Kerrisk 93b96116f0 vsock.7: Add license and copyright
Stefan noted on the mailing list that selection of the
verbatim license was fine.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-02-01 22:23:28 +01:00
Jakub Wilk 7a1cddd289 cgroups.7: tfix
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-26 19:58:40 +01:00
Michael Kerrisk 42dfc34c33 capabilities.7: spfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-13 20:58:58 +01:00
Michael Kerrisk cd7f4c4958 cgroups.7: Add a detail on delegation of cgroup.threads
Some notes from a conversation with Tejun Heo:

    Subject: Re: cgroups(7): documenting cgroups v2 delegation
    Date: Wed, 10 Jan 2018 14:27:26 -0800
    From: Tejun Heo <tj@kernel.org>

    > > 1. When delegating, cgroup.threads should be delegated.  Doing that
    > >    selectively doesn't achieve anything meaningful.
    >
    > Understood. But surely delegating cgroup.threads is effectively
    > meaningless when delegating a "domain" cgroup tree? (Obviously it's
    > not harmful to delegate the cgroup.threads file in this case;
    > it's just not useful to do so.)

    Yeap, unless we can somehow support non-root mixed domains.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-11 00:52:26 +01:00
Michael Kerrisk 6dc513cd38 cgroups.7: Subhierarchy under delegated subtree will be owned by delegatee
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-11 00:47:12 +01:00
Michael Kerrisk 7b327dd5f3 cgroups.7: Add a detail on delegation of cgroup.threads
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-11 00:47:12 +01:00
Michael Kerrisk d84e558ef3 cgroups.7: Define containment rules for cgroup.threads
Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-11 00:47:12 +01:00
Michael Kerrisk 446d164326 cgroups.7: Minor wording fixes
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-11 00:47:12 +01:00
Michael Kerrisk c7913617f7 cgroups.7: cgroup.threads should appear in /sys/kernel/cgroup/delegate
As discussed with Tejun Heo and Roman Gushchin, the
omission of this file from the list is a bug, and
is about to be fixed by a kernel patch from Roman.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-11 00:47:12 +01:00
Michael Kerrisk 6125483529 cgroups.7: Add some rationale for the existence of the "domain invalid" cgroup type
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-11 00:47:12 +01:00
Michael Kerrisk dc581e07a4 cgroups.7: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-11 00:47:12 +01:00
Michael Kerrisk 0736182888 cgroups.7: Point out that 'nsdelegate' can also be applied on a remount
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-11 00:47:12 +01:00
Michael Kerrisk 277559a45c cgroups.7: Clarify that cgroup.controllers is read-only
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-11 00:47:12 +01:00
Michael Kerrisk 639b6c8c57 cgroups.7: cgroup.threads is also delegated if delegating a threaded subtree
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-11 00:46:56 +01:00
Michael Kerrisk 4178f13224 cgroups.7: cgroup.threads is writable only inside a threaded subtree
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:47 +01:00
Michael Kerrisk b2c3e72073 cgroups.7: ffix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:47 +01:00
Michael Kerrisk 2e69ff536c cgroups.7: ffix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:47 +01:00
Michael Kerrisk d311c798b7 cgroups.7: Add a more complete description of cgroup v1 named hierarchies
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:47 +01:00
Michael Kerrisk 218eadf4ae cgroups.7: srcfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:47 +01:00
Michael Kerrisk a76748a0e1 cgroups.7: Remove accidentally duplicated NOTES and ERRORS sections
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:47 +01:00
Michael Kerrisk c56ec51ba6 cgroups.7: Elaborate a little on problems of splitting threads across cgroups in v1
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:47 +01:00
Michael Kerrisk 7b574df5c6 cgroups.7: tfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:47 +01:00
Michael Kerrisk 59af05147e cgroups.7: Document 'release_agent' mount option
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:47 +01:00
Michael Kerrisk 56769384da cgroups.7: Rework text on threads and cgroups v2
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:47 +01:00
Michael Kerrisk 980f1827b0 cgroups.7: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:47 +01:00
Michael Kerrisk fcf115f54f cgroups.7: wfix
Reported-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:47 +01:00
Michael Kerrisk d1d4f69503 cgroups.7: srcfix: remove FIXME
Tejun noted that his statement wasn't correct.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:47 +01:00
Michael Kerrisk ed3f4f34fc cgroups.7: Document cgroup v2 delegation via the 'nsdelegate' mount option
Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:47 +01:00
Michael Kerrisk 148e0800eb cgroups.7: Modify cgroup v2 delegation subheading
We are about to add description of a different kind
of delegation (nsdelegate) with its own subheading.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:47 +01:00
Michael Kerrisk 27b086e998 cgroups.7: Add a subheading for delegation containment rules
This is useful in preparation for adding discussion of the
'nsdelegate' mount option.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:47 +01:00
Michael Kerrisk 6413d78493 cgroups.7: Document /sys/kernel/cgroup/features
Reviewed-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:47 +01:00
Michael Kerrisk 668ef76586 cgroups.7: Document /sys/kernel/cgroup/delegate
Reviewed-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:47 +01:00
Michael Kerrisk 28f612ea3d cgroups.7: Note Linux 4.11 changes to cgroup v2 delegation containment rules
See kernel commit 576dd464505fc53d501bb94569db76f220104d28

Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:47 +01:00
Michael Kerrisk 896305ece8 cgroups.7: srcfix: Remove FIXME
Tejun Heo confirmed that the existing text is correct.

Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:46 +01:00
Michael Kerrisk e5936eb62f cgroups.7: Tweak the description of delegation of cgroup.subtree_control
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:46 +01:00
Michael Kerrisk 00c2709250 cgroups.7: Remove bogus "constraint" relating to thread mode
Existing cgroups under threaded root *must*, by definition,
be either domain or part of threaded subtrees, so this is not
a constraint on the creation of threaded subtrees.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:46 +01:00
Michael Kerrisk c7f63e7434 cgroups.7: Minor tweaks to text on cgroup.stat
Reported-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:35:43 +01:00
Michael Kerrisk 06dadef809 cgroups.7: srcfix: FIXME (nsdelegate)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:32:19 +01:00
Michael Kerrisk 75e83bc270 cgroups.7: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:32:19 +01:00
Michael Kerrisk d0dd7b8844 cgroups.7: srcfix FIXME
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:32:18 +01:00
Michael Kerrisk 1de5994653 cgroups.7: srcfix: FIXME
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:32:18 +01:00
Michael Kerrisk b59229e4f9 cgroups.7: srcfix FIXME
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:32:18 +01:00
Michael Kerrisk 0735069bf3 cgroups.7: Minor tweak to text on v2 delegation
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:32:18 +01:00
Michael Kerrisk e5bd7e6598 cgroups.7: Minor fix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:32:18 +01:00
Michael Kerrisk 5714ccee0a cgroups.7: Add some section (SH) headings
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:32:18 +01:00
Michael Kerrisk c8902e25cc cgroups.7: Document cgroups v2 "thread mode"
Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:32:18 +01:00
Michael Kerrisk e91d4f9ee7 cgroups.7: Mention the existence of "thread mode" in Linux 4.14
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:32:18 +01:00
Michael Kerrisk 5845e10bdb cgroups.7: Document the cgroup.max.depth and cgroup.max.descendants files
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:32:18 +01:00
Michael Kerrisk 5e071499bb cgroups.7: Document cgroups v2 cgroup.stat file
Based on the text in Documentation/cgroup-v2.txt.

Reviewed-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:32:18 +01:00
Michael Kerrisk f7286edcde cgroups.7: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-10 00:32:18 +01:00
Michael Kerrisk eaf4a2607b sched.7: Correctly describe effect of priority changes for RT threads
The placement of a thread in the run queue for its new
priority depends on the direction of movement in priority.
(This appears to contradict POSIX, except in the case of
pthread_setschedprio().)

As reported by Andrea, and followed up by me:

> I point out that the semantics of sched_setscheduler(2) for RT threads
> indicated in sched(7) and, in particular, in
>
>    "A call to sched_setscheduler(2), sched_setparam(2), or
>     sched_setattr(2) will put the SCHED_FIFO (or SCHED_RR) thread
>     identified by pid at the start of the list if it was runnable."
>
> does not "reflect" the current implementation of this syscall(s) that, in
> turn; based on the source, I think a more appropriate description of this
> semantics would be:
>
>    "... the effect on its position in the thread list depends on the
>     direction  of the modification, as follows:
>
>       a. if the priority is raised, the thread becomes the tail of the
>          thread list.
>       b. if the priority is unchanged, the thread does not change position
>          in the thread list.
>       c. if the priority is lowered, the thread becomes the head of the
>          thread list."
>
> (copied from
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_08_04_01
> ).

So, I did some testing, and can confirm that the above is the behavior
on Linux for changes to scheduling priorities for RT processes.
(My tests consisted of creating a multithreaded process where all
threads are confined to the same CPU with taskset(), and each thread
is in a CPU-bound loop. I then manipulated their priorities with
chrt(1) and watched the CPU time being consumed with ps(1).)

Back in SUSv2 there was this text:

[[
6. If a thread whose policy or priority has been modified is a running
thread or is runnable, it then becomes the tail of the thread list for
its new priority.
]]

And certainly Linux used to behave this way. I remember testing it,
and when one looks at the Linux 2.2 source code for example, one can
see that there is a call to move_first_runqueue() in this case. At some
point, things changed, and I have not investigated exactly where that
change occurred (but I imagine it was quite a long time ago).

Looking at SUSv4, let's expand the range of your quote, since
point 7 is interesting. Here's text from Section 2.8.4
"Process Scheduling" in POSIX.1-2008/SUSv4 TC2:

[[
7. If a thread whose policy or priority has been modified other
   than by pthread_setschedprio() is a running thread or is runnable,
   it then becomes the tail of the thread list for its new priority.
8. If a thread whose priority has been modified by pthread_setschedprio()
   is a running thread or is runnable, the effect on its position in the
   thread list depends on the direction of the modification, as follows:
   a. If the priority is raised, the thread becomes the tail of the
      thread list.
   b. If the priority is unchanged, the thread does not change position
      in the thread list.
   c. If the priority is lowered, the thread becomes the head of the
      thread list.
]]

(Note that the preceding points mention variously sched_setscheduler(),
sched_setparam(), and pthread_setschedprio(), so that the mention of
just pthread_setschedprio() in points 7 and 8 is significant.)

Now, since chrt(1) uses sched_setscheduler() rather than
pthread_setschedprio(), the Linux behavior is arguably a
violation of POSIX. Indeed, buried in the man-pages source, I find
that many years ago I wrote the comment:

    In 2.2.x and 2.4.x, the thread is placed at the front of the queue
    In 2.0.x, the Right Thing happened: the thread went to the back -- MTK

But the Linux behavior seems reasonable to me and I'm inclined
to just document it (see the patch below).

Reported-by: Andrea Parri <parri.andrea@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-09 19:46:28 +01:00
Michael Kerrisk ffbfb5abd4 udplite.7: ffix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-06 23:01:45 +01:00
Michael Kerrisk 6f9c4ef241 pty.7: ffix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2018-01-06 23:00:52 +01:00
Michael Kerrisk 2468f14e4b cgroups.7: Relocate the 'Cgroups v2 "no internal processes" rule' subsection
Logically, this section should follow the section that
describes cgroup.subtree_control.

No content changes in this patch.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-27 06:32:48 +01:00
Michael Kerrisk 4f017a682c cgroups.7: Elaborate on the "no internal processes" rule
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-27 06:32:48 +01:00
Michael Kerrisk c9b101d1a2 cgroups.7: Mention ENOENT error that can occur when writing to subtree_control file
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-27 06:32:48 +01:00
Michael Kerrisk 4242dfbe4f cgroups.7: Add subsection describing cgroups v2 subtree delegation
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-27 06:32:48 +01:00
Michael Kerrisk ccb1a2621b cgroups.7: Minor rewording
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-23 14:09:00 +01:00
Michael Kerrisk 8d5f42dc46 cgroups.7: Rewrite the description of cgroup v2 subtree control
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-23 13:41:39 +01:00
Michael Kerrisk 57cbb0dbb0 cgroups.7: One may need to unmount v1 controllers before they can be used in v2
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-23 10:58:08 +01:00
Michael Kerrisk 75a12bb537 cgroups.7: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-23 10:39:36 +01:00
Michael Kerrisk 7409b54bdd cgroups.7: Add a section on unmounting cgroup v1 filesystems
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-23 10:27:11 +01:00
Michael Kerrisk 783a40b677 cgroups.7: tfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-23 09:34:14 +01:00
Michael Kerrisk 03bb1264cd cgroups.7: Note that systemd(1) nowadays automatically mounts the cgroup2 filesystem
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-23 09:29:45 +01:00
Michael Kerrisk 2e33b59ee3 cgroups.7: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-23 09:09:41 +01:00
Michael Kerrisk 4769a77817 cgroups.7: ffix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-23 09:09:26 +01:00
Michael Kerrisk 44c429ed45 cgroups.7: Add list of currently available version 2 controllers
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-23 09:07:00 +01:00
Michael Kerrisk d5034243fa sched.7: ffix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-19 10:22:09 +01:00
Michael Kerrisk 286bdd7ca2 sched.7: Remove a mention of SCHED_RR in discussion of priority changes
Later in the page it is stated that SCHED_RR is the same as SCHED_FIFO.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-19 10:19:39 +01:00
Michael Kerrisk 329c0e77d1 sched.7: Minor clarifications
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-19 09:04:26 +01:00
Michael Kerrisk cb57fbc284 ip.7: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-18 17:43:54 +01:00
Michael Kerrisk 5d0ea688e3 ip.7: s/INADDR_ANY/INADDR_LOOPBACK/ in discussion of htonl()
INADDR_LOOPBACK is a better example, since it is not
byte-order neutral.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-18 17:39:02 +01:00
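
The reasoning in the ip.7 change above (5d0ea688e3) can be sketched as
follows. The helper name make_loopback_addr() is hypothetical. Since
INADDR_LOOPBACK (127.0.0.1) is not byte-order neutral, omitting htonl()
would yield a different address on little-endian hosts.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: fill 'addr' with the IPv4 loopback address
   and the given port (port is in host byte order). */
void make_loopback_addr(struct sockaddr_in *addr, unsigned short port)
{
	assert(addr != NULL);

	memset(addr, 0, sizeof(*addr));
	addr->sin_family = AF_INET;
	addr->sin_port = htons(port);

	/* INADDR_LOOPBACK is defined in host byte order, but s_addr
	   must hold network byte order, so convert with htonl() */
	addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
}
```

After the call, the four bytes of sin_addr.s_addr are 127, 0, 0, 1 in
memory order regardless of the host's endianness, which is what bind(2)
and connect(2) expect.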
Ricardo Biehl Pasquali c0a0e532ae ip.7: INADDR_* values cannot be assigned directly to 's_addr'
According to The Open Group Base Specifications Issue 7, RATIONALE
section of
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/netinet_in.h.html
some INADDR_* values must be converted using htonl().

INADDR_ANY and INADDR_BROADCAST are byte-order neutral, so they do
not require htonl(); however, I mention this fact only in NOTES.
In the text, I recommend using htonl(), "even if for some subset
it's not necessary".

Signed-off-by: Ricardo Biehl Pasquali <pasqualirb@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-18 17:36:57 +01:00
Michael Kerrisk bd05436994 fifo.7: wfix
Reported-by: Adam Liddell <ml+kernel.org@aliddell.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-18 17:16:34 +01:00
Michael Kerrisk d145c0250b cgroups.7: Minor rewording
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-18 17:14:49 +01:00
Nikolay Borisov cfec905ed7 cgroups.7: Add information about RDMA controller
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-18 17:11:47 +01:00
Michael Kerrisk b8cee784b3 capabilities.7: Clarify effect of CAP_SETFCAP
Make it clear that CAP_SETFCAP allows setting arbitrary
capabilities on a file.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-16 00:09:25 +01:00
Stefan Hajnoczi ba294a0ee6 vsock.7: Clarify send(2)/recv(2) families of system calls
Sockets support both read(2)/write(2) and send(2)/recv(2) system
calls.  Each of these is actually a family of multiple system
calls such as send(2), sendfile(2), sendmsg(2), sendmmsg(2), and
sendto(2).

This patch clarifies which families of system calls can be used.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-12 19:12:07 +01:00
Michael Kerrisk 308a16d989 vsock.7: Place SEE ALSO and ERRORS in alphabetical order
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-11 20:30:38 +01:00
Michael Kerrisk 2472922151 vsock.7: Minor fixes
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-11 20:30:37 +01:00
Michael Kerrisk 4a70bb07bc vsock.7: srcfix: rewrap source lines
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-11 20:19:03 +01:00
Stefan Hajnoczi 29598b2f2d vsock.7: Document the VSOCK socket address family
The AF_VSOCK address family has been available since Linux 3.9.

This patch adds vsock.7 and describes its use along the same lines as
existing ip.7, unix.7, and netlink.7 man pages.

CC: Jorgen Hansen <jhansen@vmware.com>
CC: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-11 20:11:12 +01:00
Michael Kerrisk 46010ab917 socket.7: tfix
Reported-by: Joel Williamson <jwilliamson@carnegietechnologies.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-11 18:40:14 +01:00
Michael Kerrisk ec9612a19f network_namespaces.7: Minor adjustments to list of resources governed by network namespaces
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-10 23:19:17 +01:00
Michael Kerrisk f9ecf99e59 network_namespaces.7: When a NW namespace is freed, veth devices are destroyed
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-10 23:19:17 +01:00
Michael Kerrisk f051ce24ac network_namespaces.7: Reorganize text
No content changes...

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-10 23:19:17 +01:00
Michael Kerrisk 2685b303e3 namespaces.7, network_namespaces.7: Move content from namespaces(7) to network_namespaces(7)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-10 23:19:17 +01:00
Michael Kerrisk 9f7ce0c2e8 network_namespaces.7: New page describing network namespaces
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-10 23:19:17 +01:00
Michael Kerrisk 4bf43ba523 pid_namespaces.7: SEE ALSO: add mount_namespaces(7)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-08 10:13:42 +01:00
Michael Kerrisk 54b9d7bf87 user_namespaces.7: tfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-06 15:05:15 +01:00
Michael Kerrisk e62172cbd9 capabilities.7: Rephrase CAP_SETPCAP description
* Mention kernel versions.
* Place current kernel behavior first

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-05 22:27:27 +01:00
G. Branden Robinson 777411ae61 iconv.1, pthread_rwlockattr_setkind_np.3, man-pages.7, socket.7, iconvconfig.8: Standardize on "nonzero"
Also add this term to the style guide in man-pages(7).

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-05 22:27:13 +01:00
Michael Kerrisk e93e59f97b capabilities.7: SECBIT_KEEP_CAPS is ignored if SECBIT_NO_SETUID_FIXUP is set
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-03 11:16:32 +01:00
Michael Kerrisk e43d2a6013 capabilities.7: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-03 11:16:32 +01:00
Michael Kerrisk 02ff4f27c2 capabilities.7: Note which capability sets are affected by SECBIT_NO_SETUID_FIXUP
Note explicitly that SECBIT_NO_SETUID_FIXUP is relevant for
the permitted, effective, and ambient capability sets.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-03 11:16:19 +01:00
Michael Kerrisk 7c8eb8f7cf capabilities.7: Deemphasize the ancient prctl(2) PR_SET_KEEPCAPS command
The modern approach is SECBIT_KEEP_CAPS.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-02 16:21:37 +01:00
Michael Kerrisk f7dbc40ee7 capabilities.7: Minor wording fix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-02 16:21:37 +01:00
Michael Kerrisk 705a8f33f1 capabilities.7: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-02 15:43:02 +01:00
Michael Kerrisk bbb186d403 capabilities.7: Clarify which capability sets are affected by SECBIT_KEEP_CAPS
This flag has relevance only for the process permitted and
effective sets.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-02 15:40:39 +01:00
Michael Kerrisk e67ac266c8 capabilities.7: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-02 15:08:52 +01:00
Michael Kerrisk f6b60423bd capabilities.7: Ambient set is also cleared when UIDs are set to nonzero value
See cap_emulate_setxuid():

        kuid_t root_uid = make_kuid(old->user_ns, 0);

        if ((uid_eq(old->uid, root_uid) ||
             uid_eq(old->euid, root_uid) ||
             uid_eq(old->suid, root_uid)) &&
            (!uid_eq(new->uid, root_uid) &&
             !uid_eq(new->euid, root_uid) &&
             !uid_eq(new->suid, root_uid))) {
                if (!issecure(SECURE_KEEP_CAPS)) {
                        cap_clear(new->cap_permitted);
                        cap_clear(new->cap_effective);
                }

                /*
                 * Pre-ambient programs expect setresuid to nonroot followed
                 * by exec to drop capabilities.  We should make sure that
                 * this remains the case.
                 */
                cap_clear(new->cap_ambient);
        }

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-02 11:08:40 +01:00
Michael Kerrisk 8e821c3aa8 user_namespaces.7: Mention NS_GET_OWNER_UID ioctl() operation
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-02 09:22:40 +01:00
Michael Kerrisk a563b19b70 capabilities.7: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-12-02 09:12:07 +01:00
Michael Kerrisk 1c6f59c276 getpid.2, pipe.2, abort.3, daemon.3, pthread_yield.3, stdio.3, sysconf.3, tty.4, shells.5, sysfs.5, fifo.7, hier.7, icmp.7, path_resolution.7, pid_namespaces.7, standards.7: tstamp
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-11-26 12:38:46 +01:00
Michael Kerrisk 8466189293 fifo.7: Refer reader to pipe(7) for details of I/O semantics of FIFOs
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-11-23 13:36:00 +01:00
Michael Kerrisk 4cee582147 socket.7: Correct the description of SO_RXQ_OVFL
Two reports noted that the description of SO_RXQ_OVFL was wrong.

======

Commentary from Tobias:

This bug pertains to the manpage as visible on man7.org right
now.

The socket(7) man page has this paragraph:

       SO_RXQ_OVFL (since Linux 2.6.33)
              Indicates that an unsigned 32-bit value ancillary
              message (cmsg) should be attached to received skbs
              indicating the number of packets dropped by the
              socket between the last received packet and this
              received packet.

The second half is wrong: the counter (internally,
SOCK_SKB_CB(skb)->dropcount) is *not* reset after every packet.
That is, it is a proper counter, not a gauge, in monitoring
parlance.

A better version of that paragraph:

       SO_RXQ_OVFL (since Linux 2.6.33)
              Indicates that an unsigned 32-bit value ancillary
              message (cmsg) should be attached to received skbs
              indicating the number of packets dropped by the
              socket since its creation.

======

Commentary from Petr:

The generic SO_RXQ_OVFL helpers sock_skb_set_dropcount() and
sock_recv_drops() return sk->sk_drops (the total number of
dropped packets), although the documentation says that the
number of packets dropped since the last received one should be
returned (quoting the current socket.7):

  SO_RXQ_OVFL (since Linux 2.6.33)
  Indicates that an unsigned 32-bit value ancillary message (cmsg)
  should be attached to received skbs indicating the number of packets
  dropped by the socket between the last received packet and this
  received packet.

I assume the documentation needs to be updated, as fixing this in
the code could break programs depending on the current behavior,
although the formerly planned functionality seems to be more
useful.

The problem can be revealed with the following program:

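#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int extract_drop(struct msghdr *msg)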
{
        struct cmsghdr *cmsg;
        int rtn;

        for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg,cmsg)) {
                if (cmsg->cmsg_level == SOL_SOCKET &&
                    cmsg->cmsg_type == SO_RXQ_OVFL) {
                        memcpy(&rtn, CMSG_DATA(cmsg), sizeof rtn);
                        return rtn;
                }
        }
        return -1;
}

int main(int argc, char *argv[])
{
        struct sockaddr_in addr = { .sin_family = AF_INET };
        char msg[48*1024], cmsgbuf[256];
        struct iovec iov = { .iov_base = msg, .iov_len = sizeof msg };
        int sk1, sk2, i, one = 1;

        sk1 = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
        sk2 = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);

        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
        addr.sin_port = htons(53333);

        bind(sk1, (struct sockaddr*)&addr, sizeof addr);
        connect(sk2, (struct sockaddr*)&addr, sizeof addr);

        // The kernel doubles this limit, but it also accounts for the SKB
        // overhead; it keeps receiving as long as at least 1 byte is free.
        i = sizeof msg;
        setsockopt(sk1, SOL_SOCKET, SO_RCVBUF, &i, sizeof i);
        setsockopt(sk1, SOL_SOCKET, SO_RXQ_OVFL, &one, sizeof one);

        for (i = 0; i < 4; i++) {
                int rtn;

                send(sk2, msg, sizeof msg, 0);
                send(sk2, msg, sizeof msg, 0);
                send(sk2, msg, sizeof msg, 0);

                do {
                        struct msghdr msghdr = {
                                        .msg_iov = &iov, .msg_iovlen = 1,
                                        .msg_control = &cmsgbuf,
                                        .msg_controllen = sizeof cmsgbuf };
                        rtn = recvmsg(sk1, &msghdr, MSG_DONTWAIT);
                        if (rtn > 0) {
                                printf("rtn: %d drop %d\n", rtn,
                                                extract_drop(&msghdr));
                        } else {
                                printf("rtn: %d\n", rtn);
                        }
                } while (rtn > 0);
        }

        return 0;
}

which prints
  rtn: 49152 drop -1
  rtn: 49152 drop -1
  rtn: -1
  rtn: 49152 drop 1
  rtn: 49152 drop 1
  rtn: -1
  rtn: 49152 drop 2
  rtn: 49152 drop 2
  rtn: -1
  rtn: 49152 drop 3
  rtn: 49152 drop 3
  rtn: -1
although it should print (according to the documentation):
  rtn: 49152 drop 0
  rtn: 49152 drop 0
  rtn: -1
  rtn: 49152 drop 1
  rtn: 49152 drop 0
  rtn: -1
  rtn: 49152 drop 1
  rtn: 49152 drop 0
  rtn: -1
  rtn: 49152 drop 1
  rtn: 49152 drop 0
  rtn: -1
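
Since the value reported is a running total, a program that wants the
per-packet figure described by the old text can difference successive
values itself. The following is a minimal sketch only; the helper name
drops_since_last() and the caller-held "last" variable are hypothetical
and are not part of any kernel or libc interface. It assumes the
caller starts with last set to 0 and passes in each unsigned 32-bit
total taken from the cmsg:

```c
#include <stdint.h>

/*
 * Hypothetical helper: turn the cumulative SO_RXQ_OVFL total into the
 * number of drops since the previously seen total.
 *
 * 'last' is caller-owned state, assumed to be initialized to 0.
 */
static uint32_t drops_since_last(uint32_t *last, uint32_t total)
{
        /* Truncating to 32 bits keeps the difference correct even
         * if the cumulative counter wraps around. */
        uint32_t delta = total - *last;

        *last = total;
        return delta;
}
```

In the output shown above, the first messages carry no SO_RXQ_OVFL
cmsg at all (extract_drop() returns -1); a caller using this sketch
would presumably skip the update in that case and treat the interval
as having no drops.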

Reported-by: Petr Malat <oss@malat.biz>
Reported-by: Tobias Klausmann <klausman@schwarzvogel.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2017-11-20 13:54:28 +01:00