Recast the advice against manually declaring 'errno' to
a more modern perspective. It's 13 years since the original
text was added, and even then it was describing old behavior.
Cast the description to be about behavior further away in
time, and note more clearly that manual declaration will
cause problems with modern C libraries.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Added after a patch from Wesley Aptekar-Cassels that proposed
to add error numbers to the text.
Reported-by: Wesley Aptekar-Cassels <w.aptekar@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX.1-2008 explicitly noted the change (to align with
the C standards) that error numbers are positive, rather
than nonzero.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Restructure the text and add some subheadings for better
readability. No (intentional) content changes.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Based on comparing the filtered content of the two main
kernel errno files:
cat include/uapi/asm-generic/errno.h \
include/uapi/asm-generic/errno-base.h | grep define | \
grep -v 'define _' | awk '{print $2}' | sort -u
to see what is absent from this page, and used in either kernel
or glibc.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The recent addition of NFS re-export and the possibility of using
name_to_handle_at() on an NFS filesystem raises issues with
name_to_handle_at() which have not been properly documented.
Getting the file handle for an untriggered automount point is
arguably meaningless and is certainly not supported by NFS.
name_to_handle_at() will return -EOVERFLOW even though the
requested "handle_bytes" is large enough. This is an unfortunate
overloading of the error code, but is manageable.
So clarify this and also note that the mount_id is returned when
EOVERFLOW is reported.
Thought: it would be nice if mount_id were returned in the
EOPNOTSUPP case too. I guess it is too late to fix that (?).
Link: https://github.com/systemd/systemd/issues/7082
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Document the glibc 2.24 change that dropped CWD from the default
search path employed by execlp(), execvp() and execvpe() when
PATH is not defined.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add more information about the iocb structure. It explains the
fields of the I/O control block structure which is passed to the
io_submit call.
The work also includes the flags for the nowait feature, which is currently
posted at http://marc.info/?l=linux-fsdevel&m=149664103900715&w=2
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Two reports that the description of SO_RXQ_OVFL was wrong.
======
Commentary from Tobias:
This bug pertains to the manpage as visible on man7.org right
now.
The socket(7) man page has this paragraph:
SO_RXQ_OVFL (since Linux 2.6.33)
Indicates that an unsigned 32-bit value ancillary
message (cmsg) should be attached to received skbs
indicating the number of packets dropped by the
socket between the last received packet and this
received packet.
The second half is wrong: the counter (internally,
SOCK_SKB_CB(skb)->dropcount) is *not* reset after every packet.
That is, it is a proper counter, not a gauge, in monitoring
parlance.
A better version of that paragraph:
SO_RXQ_OVFL (since Linux 2.6.33)
Indicates that an unsigned 32-bit value ancillary
message (cmsg) should be attached to received skbs
indicating the number of packets dropped by the
socket since its creation.
======
Commentary from Petr:
Generic SO_RXQ_OVFL helpers sock_skb_set_dropcount() and
sock_recv_drops() implement returning of sk->sk_drops (the total
number of dropped packets), although the documentation says the
number of dropped packets since the last received one should be
returned (quoting the current socket.7):
SO_RXQ_OVFL (since Linux 2.6.33)
Indicates that an unsigned 32-bit value ancillary message (cmsg)
should be attached to received skbs indicating the number of packets
dropped by the socket between the last received packet and this
received packet.
I assume the documentation needs to be updated, as fixing this in
the code could break programs depending on the current behavior,
although the formerly planned functionality seems to be more
useful.
The problem can be revealed with the following program:
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Return the SO_RXQ_OVFL value attached to 'msg', or -1 if none. */
int extract_drop(struct msghdr *msg)
{
    struct cmsghdr *cmsg;
    int rtn;

    for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SO_RXQ_OVFL) {
            memcpy(&rtn, CMSG_DATA(cmsg), sizeof rtn);
            return rtn;
        }
    }
    return -1;
}

int main(int argc, char *argv[])
{
    struct sockaddr_in addr = { .sin_family = AF_INET };
    char msg[48*1024], cmsgbuf[256];
    struct iovec iov = { .iov_base = msg, .iov_len = sizeof msg };
    int sk1, sk2, i, one = 1;

    sk1 = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
    sk2 = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
    addr.sin_port = htons(53333);
    bind(sk1, (struct sockaddr*)&addr, sizeof addr);
    connect(sk2, (struct sockaddr*)&addr, sizeof addr);

    // The kernel doubles this limit, but it also accounts for the SKB
    // overhead; it keeps receiving as long as at least 1 byte is free.
    i = sizeof msg;
    setsockopt(sk1, SOL_SOCKET, SO_RCVBUF, &i, sizeof i);
    setsockopt(sk1, SOL_SOCKET, SO_RXQ_OVFL, &one, sizeof one);

    for (i = 0; i < 4; i++) {
        int rtn;

        // Send three datagrams, then drain the receive queue.
        send(sk2, msg, sizeof msg, 0);
        send(sk2, msg, sizeof msg, 0);
        send(sk2, msg, sizeof msg, 0);
        do {
            struct msghdr msghdr = {
                .msg_iov = &iov, .msg_iovlen = 1,
                .msg_control = &cmsgbuf,
                .msg_controllen = sizeof cmsgbuf };

            rtn = recvmsg(sk1, &msghdr, MSG_DONTWAIT);
            if (rtn > 0) {
                printf("rtn: %d drop %d\n", rtn,
                       extract_drop(&msghdr));
            } else {
                printf("rtn: %d\n", rtn);
            }
        } while (rtn > 0);
    }
    return 0;
}
which prints
rtn: 49152 drop -1
rtn: 49152 drop -1
rtn: -1
rtn: 49152 drop 1
rtn: 49152 drop 1
rtn: -1
rtn: 49152 drop 2
rtn: 49152 drop 2
rtn: -1
rtn: 49152 drop 3
rtn: 49152 drop 3
rtn: -1
although it should print (according to the documentation):
rtn: 49152 drop 0
rtn: 49152 drop 0
rtn: -1
rtn: 49152 drop 1
rtn: 49152 drop 0
rtn: -1
rtn: 49152 drop 1
rtn: 49152 drop 0
rtn: -1
rtn: 49152 drop 1
rtn: 49152 drop 0
rtn: -1
Reported-by: Petr Malat <oss@malat.biz>
Reported-by: Tobias Klausmann <klausman@schwarzvogel.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Currently, the pkey_alloc() syscall has two arguments. The
first argument is still not supported as of kernel 4.14-rc8 and
should be set to zero, as shown in the following syscall
implementation:
SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, ...)
{
int pkey;
int ret;
/* No flags supported yet. */
if (flags)
return -EINVAL;
This behaviour is also documented correctly in the kernel
documentation in Documentation/x86/protection-keys.txt.
The second argument is the one that should specify the page
access rights.
This patch fixes the manpage to describe how the code behaves.
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After comments from Miklos, and further digging in the kernel
source that showed that chroot() can also result in "hidden"
parent-IDs in mountinfo, I've revised the description of
mountinfo.
In fs/proc_namespace.c::show_mountinfo() there is:
/* mountpoints outside of chroot jail will give SEQ_SKIP on this */
err = seq_path_root(m, &mnt_path, &p->root, " \t\n\\");
if (err)
goto out;
I instrumented the 'if (err)' code path with printk()
to show that there is indeed a record corresponding to the
parent-ID for the process root that is being skipped.
Reported-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I do not have an exact handle on the details, but I can see
roughly what is going on. Internally, there seems to be one
("hidden") mount ID reserved for each mount namespace, and that ID
is the parent of the root mount point.
Looking through the (4.14) kernel source, mount IDs are allocated
by a kernel function called mnt_alloc_id() (in fs/namespace.c),
which is in turn called by alloc_vfsmnt() which is in turn called
by clone_mnt().
A new mount namespace is created by the kernel function
copy_mnt_ns() (in fs/namespace.c, called by
create_new_namespaces() in kernel/nsproxy.c). The copy_mnt_ns()
function calls copy_tree() (in fs/namespace.c), and copy_tree()
calls clone_mnt() in *two* places. The first of these is the call
that creates the "hidden" mount ID that becomes the parent of the
root mount point. (I verified this by instrumenting the kernel
with a few printk() calls to display the IDs.) The second place
where copy_tree() calls clone_mnt() is in a loop that replicates
each of the mount points (including the root mount point) in the
source mount namespace.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After Linux 2.6.36, the heuristic calculation of oom_score
has changed to only consider used memory and CAP_SYS_ADMIN.
See kernel commit a63d83f427fbce97a6cea0db2e64b0eb8435cd10.
Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add documentation for these new membarrier() commands:
MEMBARRIER_CMD_PRIVATE_EXPEDITED
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED
Adapt the MEMBARRIER_CMD_SHARED return value documentation to reflect
that it now returns -EINVAL when issued on a system configured for
nohz_full.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul Turner <pjt@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Andrew Hunter <ahh@google.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: Dave Watson <davejwatson@fb.com>
CC: Chris Lameter <cl@linux.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ben Maurer <bmaurer@fb.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Russell King <linux@arm.linux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: linux-api@vger.kernel.org
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>