The Blackfin port was removed in Linux 4.17. Mention this in the
section concerning Blackfin vDSO functions.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Improved the readability of a sentence that describes the use of
FAN_REPORT_FID and how this particular flag influences what data
structures a listening application could expect to receive when
describing an event.
Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Document the symbols exported by the RISCV vDSO which is present
from kernel 4.15 onwards.
See kernel source files in arch/riscv/kernel/vdso.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Details relating to the new initialization flag FAN_REPORT_FID has been
added. As part of the FAN_REPORT_FID feature, a new set of event masks are
available and have been documented accordingly.
A simple example program has been added to also support the understanding
and use of FAN_REPORT_FID and directory modification events.
Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Give the shell in the second cgroup namespace a different prompt,
so as to clearly distinguish the two namespaces.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The section "Example Programs ..." was renamed to "Example programs ..."
(with lowercase p) in c634028ab5, but the reference was not
updated.
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
groff_mdoc(7) from the groff project provides a better
equivalent of mdoc.samples(7) and the 'mandoc' project
provides a better mdoc(7). And nowadays, there are virtually
no pages in "man-pages" that use mdoc markup.
So, drop these pages.
From a conversation on linux-man with Ingo Schwarz:
[[
Subject: Re: [groff] [PATCH] man7/mdoc_samples.7: srcfix: Avoid a warning about a wrong section
Date: Wed, 27 Feb 2019 15:28:19 +0100
> The two actual problems are both within the Linux man-pages project,
> not within groff:
>
> 1. While back in the early 1990ies, Cynthia Livingston's
> mdoc.samples(7) manual page was an important document and the
> de-facto language definition of the mdoc(7) language, it has
> been outdated for a long time now. The current groff_mdoc(7)
> manual page is based on it but contains large numbers of important
> improvements by Werner Lemberg and others. As an alternative
> language definition that is slightly more concise without being
> less precise and complete, the mdoc(7) manual page is available
> from the mandoc(1) distribution (mandoc.bsd.lv). If there are
> any contradictions between groff_mdoc(7) and mdoc(7), those are
> unintended and i ought to fix them.
>
> So i really believe that the Linux man-pages project ought to
> stop distributing the woefully outdated mdoc.samples(7) manual
> page. If you want to include documentation for the mdoc language,
> i suggest that you either include a copy of the current version
> of the groff_mdoc(7) manual from the groff(1) distribution or
> of the mdoc(7) manual from the mandoc(1) distribution, whichever
> you think harmonizes better with the Linux man-pages project.
> Both are BSD-style licensed, so there should be no licensing
> issues.
>
> I'm not sure whether it is better for you to include or not
> include it. There is probably value in having mdoc(7) documentation
> out of the box with the Linux man-pages project. Then again,
> having groff_mdoc(7) in both the Linux man-pages package and
> in the groff package - or having mdoc(7) in both the Linux
> man-pages project and the mandoc(1) package - might cause
> packaging conflicts for some distributions. I don't rightly
> know how such conflicts are typically handled by Linux
> distributions. Not being able to install the Linux man-pages
> pages project, groff(1) and mandoc(1) all together on the same
> Linux machine would certainly be a bad situation...
>
> By the way, the mdoc(7) manual page distributed by the Linux
> man-pages project also makes very little sense. It is a partial
> repetition of information from groff_mdoc(7)/[mandoc-]mdoc(7),
> but so compressed that it is mostly unintelligible. Besides,
> it is incomplete: e.g. .Lk, .Mt, .Dx, .Ox, .Nx, .Ta, .%U, .Bk,
> .Ek, .Lb, .In, .Ft, .Ms, .Brq, .Bro, .Brc, .Ex are missing -
> it seems outdated by at lest 25 years. Also, some claims are
> outright wrong - for example, you *cannot* use .UR/.UE in an
> mdoc(7) document, and i cannot remember ever having seen an
> implementation of a .UN macro anywhere. Some macros descriptions
> are also wrong, e.g. .Fd is *not* intended for "function
> declarations", and .Vt is *not* "Fortran only". And so on.
>
> 2. I don't recommend keeping the old mdoc.samples(7) and mdoc(7)
> manual pages, but if you think you must do that for some reason,
> then you must at least revert this bogus commit:
I am *not at all* attached to keeping to these pages. Their
presence in the project has always felt a bit anomalous to me.
Back when I took over maintainership in 2004, there were a small
number of pages that used mdoc markup, and so it seemed wise
to keep these pages. Over time, most of those few pages were
converted to 'man' markup, and today the only other page in the
project that still uses mdoc markup is in queue(3). So, there is
just about zero value in having 'mdoc' documentation come with
the "Linux man-pages" box.
Since I seldom use mdoc markup myself, I've had no reason to
monitor pages such as groff_mdoc(7) or the mdoc(7) page
provided my ther 'mandoc' project and compare them with
the pages provided by "Linux man-pages". Now I've had a
closer look. It's sad.
I've removed mdoc(7) and mdoc.samples(7) from "Linux -man-pages".
]]
Reported-by: Ingo Schwarze <schwarze@usta.de>
Quoting Branden:
*roff escape sequences may sometimes look like C escapes, but that
is misleading. *roff is in part a macro language and that means
recursive expansion to arbitrary depths.
You can get away with "\\" in a context where no macro expansion
is taking place, but try to spell a literal backslash this way in
the argument to a macro and you will likely be unhappy with
results.
Try viewing the attached file with "man -l".
"\e" is the preferred and portable way to get a portable "escape
literal" going back to CSTR #54, the original Bell Labs troff
paper.
groff(7) discusses the issue:
\\ reduces to a single backslash; useful to delay its
interpretation as escape character in copy mode. For a
printable backslash, use \e, or even better \[rs], to be
independent from the current escape character.
As of groff 1.22.4, groff_man(7) does as well:
\e Widely used in man pages to represent a backslash output
glyph. It works reliably as long as the .ec request is
not used, which should never happen in man pages, and it
is slightly more portable than the more exact ‘\(rs’
(“reverse solidus”) escape sequence.
People not concerned with portability to extremely old troffs should
probably just use \(rs (or \[rs]), as it means "the backslash
glyph", not "the glyph corresponding to whatever the current escape
character is".
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Quoting Branden:
*roff systems will interpret the period in the unpatched
page as sentence-ending punctuation and put inter-sentence
spacing after it. (This might not be visible on
nroff/terminal devices, but it is more likely to be on
typesetter/PostScript/PDF output).
groff_man(7) in groff 1.22.4 attempts to throw man page
writers a bone here:
\& Zero‐width space. Append to an input line to prevent
an end‐of‐ sentence punctuation sequence from being
recognized as such, or insert at the beginning of an
input line to prevent a dot or apostrophe from being
interpreted as the beginning of a roff request.
Reported-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Reported-by: G. Branden Robinson <g.branden.robinson@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
fanotify_init.2: add new flag FAN_REPORT_TID
fanotify.7: update description of member pid in
struct fanotify_event_metadata
Signed-off-by: nixiaoming <nixiaoming@huawei.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Monitor fanotify events on the entire filesystem.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
New event masks have been added to the fanotify API. Documentation to
support the use and behaviour of these new masks has been added
accordingly.
Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add documentation for new flag IN_MASK_CREATE for inotify_add_watch()
which is used to only allow new watches to be created.
Information obtained from a patch I submitted to the linux kernel
https://marc.info/?l=linux-fsdevel&m=152775980422847&w=2
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
* man2/socket.2 (.SH DESCRIPTION): Mention that the list of
address families is Linux-specific.
* man7/address_families.7 (.SH DESCRIPTION): Likewise.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I need to get the TTL of UDP datagrams from userspace, so I set
the IP_RECVTTL socket option. And as promised by ip.7, I then get
IP_TTL messages from recvfrom. However, unlike what the manpage
promises, the TTL field gets passed as a 32 bit integer.
The following userspace code works:
uint32_t ttl32;
for (cmsg = CMSG_FIRSTHDR(msgh); cmsg != NULL; cmsg = CMSG_NXTHDR(msgh,cmsg)) {
if ((cmsg->cmsg_level == IPPROTO_IP) && (cmsg->cmsg_type == IP_TTL) &&
CMSG_LEN(sizeof(ttl32)) == cmsg->cmsg_len) {
memcpy(&ttl32, CMSG_DATA(cmsg), sizeof(ttl32));
*ttl=ttl32;
return true;
}
else
cerr<<"Saw something else "<<(cmsg->cmsg_type == IP_TTL) <<
", "<<(int)cmsg->cmsg_level<<", "<<cmsg->cmsg_len<<", "<<
CMSG_LEN(1)<<endl;
}
The 'else' field was used to figure out I go the length wrong.
Note from mtk:
Reading the source code also seems to confirm this, from
net/ipv4/ip_sockglue.c:
[[
static void ip_cmsg_recv_ttl(struct msghdr *msg, struct sk_buff *skb)
{
int ttl = ip_hdr(skb)->ttl;
put_cmsg(msg, SOL_IP, IP_TTL, sizeof(int), &ttl);
}
]]
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This best belongs at the end of the page, after the subsections
that already make some mention of user namespaces.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The text stated that the execve() capability transitions are not
performed for the same reasons that setuid and setgid mode bits
may be ignored (as described in execve(2)). But, that's not quite
correct: rather, the file capability sets are treated as empty
for the purpose of the capability transition calculations.
Also merge the new 'no_file_caps' kernel option text into the
same paragraph.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Clarify the "Capabilities and execution of programs by root"
section, and correct a couple of details:
* If a process with rUID == 0 && eUID != 0 does an exec,
the process will nevertheless gain effective capabilities
if the file effective bit is set.
* Set-UID-root programs only confer a full set of capabilities
if the binary does not also have attached capabilities.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Balbir pointed out that v1 delegation was not an accidental
feature.
Reported-by: Balbir Singh <bsingharora@gmail.com>
Reported-by: Marcus Gelderie <redmnic@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use \(aq for ASCII apostrophes and \(ga for backtick,
as recommended by groff_man(7).
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Clarify that SO_PASSCRED results in SCM_CREDENTIALS data in each
subsequently received message.
See https://bugzilla.kernel.org/show_bug.cgi?id=201805
Reported-by: Felipe Gasper <felipe@felipegasper.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use "FAN_OPEN_PERM" consistently rather than "FAN_PERM_OPEN".
Signed-off-by: Anthony Iliopoulos <ailiopoulos@suse.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Because of setns() semantics, the parent of a process may reside
in the outer PID namespace. If that parent terminates, then the
child is adopted by the "init" in the outer PID namespace (rather
than the "init" of the PID namespace of the child).
Thus, in a scenario such as the following, if process M
terminates, P is adopted by the init process in the initial
PID namespace, and if P terminates, Q is adopted by the init
process in the inner PID namespace.
+---------------------------------------------+
| Initial PID NS |
| +---------------+ |
| +-+ | inner PID NS | |
| |1| | | |
| +-+ | +-+ | |
| | |1| | |
| | +-+ | |
| | | |
| +-+ setns(), fork() | +-+ | |
| |M|----------------------+--> |P| | |
| +-+ | +-+ | |
| | | fork() | |
| | v | |
| | +-+ | |
| | |Q| | |
| | +-+ | |
| +---------------+ |
+---------------------------------------------+
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Having the signals listed in three different tables reduces
readability, and would require more table splits if future
standards specify other signals.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The current tables of signal information are unwieldy,
as they try to cram in too much information.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The previous location does not seem to be getting updated.
(For example, at the time of this commit, libcap-2.26
had been out for two months, but was not present at
http://www.kernel.org/pub/linux/libs/security/linux-privs.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
x86 and ARM are the most common architectures, but currently
are in the second subfield in the signal number lists.
Instead, swap that info with subfield 1, so the most
common architectures are first in the list.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This patch adds the signal numbers for parisc to the signal(7) man page.
Those parisc-specific values for the various signals are valid since the
Linux kernel upstream commit ("parisc: Reduce SIGRTMIN from 37 to 32 to
behave like other Linux architectures") during development of kernel 3.18:
http://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f25df2eff5b25f52c139d3ff31bc883eee9a0ab
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Mention that the named constants (SECBIT_KEEP_CAPS and others)
are available only if the linux/securebits.h user-space header
is included.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
eBPF sub-system on Linux can use "helper functions", functions
implemented in the kernel that can be called from within a eBPF program
injected by a user on Linux. The kernel already supports a long list of
such helpers (sixty-seven at this time, new ones are under review).
Therefore, it is proposed to create a new manual page, separate from
bpf(2), to document those helpers for people willing to develop new eBPF
programs.
Additionally, in an effort to keep this documentation in synchronisation
with what is implemented in the kernel, it is further proposed to keep
the documentation itself in the kernel sources, as comments in file
"include/uapi/linux/bpf.h", and to generate the man page from there.
This patch adds the new man page, generated from kernel sources, to the
man-pages repository. For each eBPF helper function, a description of
the helper, of its arguments and of the return value is provided. The
idea is that all future changes for this page should be redirected to
the kernel file "include/uapi/linux/bpf.h", and the modified page
generated from there.
Generating the page itself is a two-step process. First, the
documentation is extracted from include/uapi/linux/bpf.h, and converted
to a RST (reStructuredText-formatted) page, with the relevant script
from Linux sources:
$ ./scripts/bpf_helpers_doc.py > /tmp/bpf-helpers.rst
The second step consists in turning the RST document into the final man
page, with rst2man:
$ rst2man /tmp/bpf-helpers.rst > bpf-helpers.7
The bpf.h file was taken as at kernel 4.19
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This just adds to the point made by Marcus Gelderie's patch. Note
also that SECBIT_KEEP_CAPS provides the same functionality as the
prctl() PR_SET_KEEPCAPS flag, and the prctl(2) manual page has the
correct description of the semantics (i.e., that the flag affects
the treatment of onlt the permitted capability set).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The description of SECBIT_KEEP_CAPS is misleading about the
effects on the effective capabilities of a process during a
switch to nonzero UIDs. The effective set is cleared based on
the effective UID switching to a nonzero value, even if
SECBIT_KEEP_CAPS is set. However, with this bit set, the
effective and permitted sets are not cleared if the real and
saved set-user-ID are set to nonzero values.
This was tested using the following C code and reading the kernel
source at security/commoncap.c: cap_emulate_setxuid.
void print_caps(void) {
cap_t current = cap_get_proc();
if (!current) {
perror("Current caps");
return;
}
char *text = cap_to_text(current, NULL);
if (!text) {
perror("Converting caps to text");
goto free_caps;
}
printf("Capabilities: %s\n", text);
cap_free(text);
free_caps:
cap_free(current);
}
void print_creds(void) {
uid_t ruid, suid, euid;
if (getresuid(&ruid, &euid, &suid)) {
perror("Error getting UIDs");
return;
}
printf("real = %d, effective = %d, saved set-user-ID = %d\n", ruid, euid, suid);
}
void set_caps(int size, const cap_value_t *caps) {
cap_t current = cap_init();
if (!current) {
perror("Error getting current caps");
return;
}
if (cap_clear(current)) {
perror("Error clearing caps");
}
if (cap_set_flag(current, CAP_INHERITABLE, size, caps, CAP_SET)) {
perror("setting caps");
goto free_caps;
}
if (cap_set_flag(current, CAP_EFFECTIVE, size, caps, CAP_SET)) {
perror("setting caps");
goto free_caps;
}
if (cap_set_flag(current, CAP_PERMITTED, size, caps, CAP_SET)) {
perror("setting caps");
goto free_caps;
}
if (cap_set_proc(current)) {
perror("Comitting caps");
goto free_caps;
}
free_caps:
cap_free(current);
}
const cap_value_t caps[] = {CAP_SETUID, CAP_SETPCAP};
const size_t num_caps = sizeof(caps) / sizeof(cap_value_t);
int main(int argc, char **argv) {
puts("[+] Dropping most capabilities to reduce amount of console output...");
set_caps(num_caps, caps);
puts("[+] Dropped capabilities. Starting with these credentials and capabilities:");
print_caps();
print_creds();
if (argc >= 2 && 0 == strncmp(argv[1], "keep", 4)) {
puts("[+] Setting SECBIT_KEEP_CAPS bit");
if (prctl(PR_SET_SECUREBITS, SECBIT_KEEP_CAPS, 0, 0, 0)) {
perror("Setting secure bits");
return 1;
}
}
puts("[+] Setting effective UID to 1000");
if (seteuid(1000)) {
perror("Error setting effective UID");
return 2;
}
print_caps();
print_creds();
puts("[+] Raising caps again");
set_caps(num_caps, caps);
print_caps();
print_creds();
puts("[+] Setting all remaining UIDs to nonzero values");
if (setreuid(1000, 1000)) {
perror("Error setting all UIDs to 1000");
return 3;
}
print_caps();
print_creds();
return 0;
}
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Prefer the word "owns" rather than "associated with" when
describing the relationship between user namespaces and non-user
namespaces. The existing text used a mix of the two terms, with
"associated with" being predominant, but to my ear, describing the
relationship as "ownership" is more comprehensible.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There is too much detail in socket(2). Move most of it into
a new page instead.
Cowritten-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Clarify the example by making an implied detail more explicit.
Quoting the Troy Engel on the problem with the original text:
The problem is "and a process in a sibling cgroup (sub2)"
(shown as PID 20124 here) - how did this get here? How do I
recreate this? Following this example, there's no mention of
how, it's out of place when following the instructions.
There is nothing in any of the cgroup files which contain
this (# grep freezer /proc/*/cgroup) while at this stage.
The intent is understood, however the man page seems to skip
a step to create this in the teaching example. We should add
whatever simple steps are needed to create the "process in a
sibling cgroup" as outlined so it makes sense - as written,
I have no clue where "sibling cgroup (sub2)" came from, it
just appeared out of the blue in that step. Thanks!
See https://bugzilla.kernel.org/show_bug.cgi?id=201047
Reported-by: Troy Engel <troyengel@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The intended text was hidden elsewhere in the source of the
page as a comment.
https://bugzilla.kernel.org/show_bug.cgi?id=201029
Reported-by: Mike Weilgart <mike.weilgart@verticalsysadmin.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In particular, it is possible to write "threaded" to a
cgroup.type file if the current type is "domain threaded".
Previously, the text had implied that this was not possible.
Verified by experiment on Linux 4.15 and 4.19-rc.
Reported-by: Leah Hanson <lhanson@pivotal.io>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After clone(CLONE_NEWPID), /proc/PID/ns/pid_for_children is empty
until the first child is created. Verified by experiment.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Some other socket options that are applicable for TCP and UDP sockets
are documented in socket(7), so help the reader by pointing them at
that page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Thanks to a tip from Keith Packard:
https://keithp.com/blogs/fd-passing/
(Also verified by experiment.)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
When sending ancillary data, at least one byte of real data should
also be sent. This is strictly necessary for stream sockets
(verified by experiment). It is not required for datagram sockets
on Linux (verified by experiment), but portable applications
should do so.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
If the ancillary data buffer for receiving SCM_RIGHTS file
descriptors is too small, then the excess file descriptors are
automatically closed in the receiving process. Verified by
experiment.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Verified by experiment and reading the source code (although
the SCM_RIGHTS case is not so clear to me in the source code).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
If the buffer supplied to recvmsg() to receive ancillary data is
too small, then the data is truncated and the MSG_CTRUNC flag is
set.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The file UID does not come into play when creating a v3
security.capability extended attribute.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In particular, note that it may be difficult for an application
to know about the existence of duplicate file descriptors.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note a useful performance benefit of EPOLLET: ensuring that
only one of multiple waiters (in epoll_wait()) is woken
up when a file descriptor becomes ready.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The parisc gateway page currently only exports 3 functions:
The lws_entry for CAS operations (at 0xb0), the set_thread_pointer
function for usage in glibc (at 0xe0) and the Linux syscall entry
(at 0x100).
All other symbols in the manpage are internal labels and
shouldn't be used directly by userspace or glibc, so drop them
from the man page documentation.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note ENOTDIR error that occurs when requesting a watch on a
nondirectory with IN_ONLYDIR.
Reported-by: Paul Millar <paul.millar@desy.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add background details on ambient and bounding set when
discussing capability transformations during execve(2).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
capset(2) and capget(2) apply operate only on the permitted,
effective, and inheritable process capability sets.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
When comparing two namespaces symlinks to see if they refer to
the same namespace, both the inode number and the device ID
should be compared. This point was already made clear in
ioctl_ns(2), but was missing from this page.
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There was some confused missing of concepts between the
two subsections, and some other details that needed fixing up.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Confirmed with Serge Hallyn that: "nsroot" means the UID 0
in the namespace as it would be mapped into the initial userns.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use more consistent layout for lists of functions, and
remove punctuation from the lists to make them less cluttered.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
We define in detail the X/Open System Interfaces i.e. _XOPEN_UNIX
and all of the X/Open System Interfaces (XSI) Options Groups.
The XSI options groups include encryption, realtime, advanced
realtime, realtime threads, advanced realtime threads, tracing,
streams, and legacy interfaces.
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As noted by Rusty Russell:
I was really surprised that sendmsg() returned EBADF on a valid fd;
turns out I was using sendmsg with SCM_RIGHTS to send a closed fd,
which gives EBADF (see test program below).
But this is only obliquely referenced in unix(7):
SCM_RIGHTS
Send or receive a set of open file descriptors
from another process. The data portion contains
an integer array of the file descriptors. The
passed file descriptors behave as though they have
been created with dup(2).
EBADF is not mentioned in the unix(7) ERRORS (it's mentioned in
dup(2)).
int fdpass_send(int sockout, int fd)
{
/* From the cmsg(3) manpage: */
struct msghdr msg = { 0 };
struct cmsghdr *cmsg;
struct iovec iov;
char c = 0;
union { /* Ancillary data buffer, wrapped in a union
in order to ensure it is suitably aligned */
char buf[CMSG_SPACE(sizeof(fd))];
struct cmsghdr align;
} u;
msg.msg_control = u.buf;
msg.msg_controllen = sizeof(u.buf);
memset(&u, 0, sizeof(u));
cmsg = CMSG_FIRSTHDR(&msg);
cmsg->cmsg_level = SOL_SOCKET;
cmsg->cmsg_type = SCM_RIGHTS;
cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));
msg.msg_name = NULL;
msg.msg_namelen = 0;
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_flags = 0;
/* Keith Packard reports that 0-length sends don't work, so we
* always send 1 byte. */
iov.iov_base = &c;
iov.iov_len = 1;
return sendmsg(sockout, &msg, 0);
}
int fdpass_recv(int sockin)
{
/* From the cmsg(3) manpage: */
struct msghdr msg = { 0 };
struct cmsghdr *cmsg;
struct iovec iov;
int fd;
char c;
union { /* Ancillary data buffer, wrapped in a union
in order to ensure it is suitably aligned */
char buf[CMSG_SPACE(sizeof(fd))];
struct cmsghdr align;
} u;
msg.msg_control = u.buf;
msg.msg_controllen = sizeof(u.buf);
msg.msg_name = NULL;
msg.msg_namelen = 0;
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_flags = 0;
iov.iov_base = &c;
iov.iov_len = 1;
if (recvmsg(sockin, &msg, 0) < 0)
return -1;
cmsg = CMSG_FIRSTHDR(&msg);
if (!cmsg
|| cmsg->cmsg_len != CMSG_LEN(sizeof(fd))
|| cmsg->cmsg_level != SOL_SOCKET
|| cmsg->cmsg_type != SCM_RIGHTS) {
errno = -EINVAL;
return -1;
}
memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
return fd;
}
static void child(int sockfd)
{
int newfd = fdpass_recv(sockfd);
assert(newfd < 0);
exit(0);
}
int main(void)
{
int sv[2];
int pid, ret;
assert(socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == 0);
pid = fork();
if (pid == 0) {
close(sv[1]);
child(sv[0]);
}
close(sv[0]);
ret = fdpass_send(sv[1], sv[0]);
printf("fdpass of bad fd return %i (%s)\n", ret, strerror(errno));
return 0;
}
Reported-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The last argument is passed by value, not reference.
Reported-by: Tomi Salminen <tsalminen@forcepoint.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
gettimeofday() is declared obsolete by POSIX. Mention instead
the modern APIs for working with the realtime clock.
See https://bugzilla.kernel.org/show_bug.cgi?id=199049
Reported-by: Enrique Garcia <cquike@arcor.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
/proc/[pid]/ns/pid_for_children has a value only after first
child is created in PID namespace. Verified by experiment.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>