O_RSYNC is defined in <asm/fcntl.h> on HP PA-RISC, but is not
used anyway.
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add a note regarding other implementations of whiteout inodes
and update filesystem support information.
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This information is already summarized in syscall(2), so there's
no need to repeat it in each page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Some architectures (ab)use second return value register for additional
return value in some system calls. Let's describe this.
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As reported by Nadav Har'El in
https://bugzilla.kernel.org/show_bug.cgi?id=197961
The write(2) manual page has this paragraph:
"On success, the number of bytes written is returned
(zero indicates nothing was written). It is not an error
if this number is smaller than the number of bytes
requested; this may happen for example because the disk
device was filled. See also NOTES."
I find a few problems with this paragraph:
1. It's not clear what "See also NOTES." refers to (does it
refer to anything?). What in the NOTES is relevant here?
2. The paragraph seems to suggest that write(2) of a
non-empty buffer may sometimes return even 0 in case of an
error like the device being filled. I think this is wrong
- if there was an error after already writing some number
of bytes, this non-zero number is returned. But if there's
an error before writing any bytes, -1 will be returned
(and the error reason in errno) - 0 will not be returned
unless the given count is 0 (that case is explained in the
following paragraph).
3. The paragraph doesn't explain what a user should do
after a short write (i.e., write(2) returning less than
count). How would the user know why there was an error, or
if there even was one? I think users should be told what
to do next because this information is part of how to use
this API correctly. I think users should be told to retry
the rest of the write (i.e., write(fd, buf+ret, count-ret)
and this will either succeed in writing some more data if
the error reason was solved, or the second write will
return -1 and the error reason in errno.
Reported-by: Nadav Har'El <nyh@math.technion.ac.il>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
ENOATTR is not a standard error code, but rather one that is
defined in 'libattr' as a synonym for ENODATA. The manual pages
should use the error code actually returned by the kernel APIs.
See also https://bugzilla.kernel.org/show_bug.cgi?id=201995
Reported-by: Enrico Scholz <enrico.scholz@sigma-chemnitz.de>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
alpha use v0 e.g. $0 as the return value register both in
syscall ABI and C ABI.
see also
https://github.com/torvalds/linux/blob/master/arch/alpha/kernel/entry.S#L479
The normal Alpha C ABI use a0~a5 to pass arguments and use v0 as
the return value register. See here
https://www2.cs.arizona.edu/projects/alto/Doc/local/alpha.register.html
The syscall ABI use v0 as the trap number, a0~a5 to pass arguments
and use a3 as a indicator (bool type) whether has a error occurred.
We can also see the libc's syscall wrapper implements at
https://code.woboq.org/userspace/glibc/sysdeps/unix/sysv/linux/alpha/syscall.S.html
The v0 is the normal used as return register, and we can see the
return processing doesn't do anything about a0 which is the wrong
register of currently syscall(2) description.
p.s. I found this wrong description because I'm porting Go gc to
a new CPU architecture which is similar to Alpha, And I use the
wrong register at first, then I have inspect the kernel code and
objdump to ensure the right syscall ABI.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Back in 2014 (37bee118ad) the text
describing when multiplexing happens was changed in a confusing way.
This is an attempt to clarify things a bit.
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since 93e06c7a6453 ("mm: enable MADV_FREE for swapless system") we
handle MADV_FREE on a swapless system the same way as with the
swap available. Clarify that fact in the man page.
Reported-by: Niklas Hambüchen <mail@nh2.me>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove the old statement that PTRACE_O_TRACESYSGOOD may not work
on all architectures. As far as I can tell, all kernel code
properly tests PT_TRACESYSGOOD flag and sets the 7th bit in the
exit code passed to ptrace_notify().
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
To test the behavior documented by this patch, the following
demos employ the program shown at the foot of this commit message.
First, show that the pdeath signal is sent when the parent
terminates:
$ ./pdeath_signal 0 10 4
Parent (18595) about to sleep for 4 seconds
Child about to set PR_SET_PDEATHSIG
Child about to sleep
Parent (18595) terminating
*********** Child (18596) got signal; si_pid = 18595; si_uid = 1000
Parent PID is now 1403
$ Child about to exit
But the signal is not sent if the parent terminates before the
child uses PR_SET_PDEATHSIG:
$ ./pdeath_signal 2 10 0
Parent (18707) about to sleep for 0 seconds
Parent (18707) terminating
Child about to sleep 2 seconds before setting PR_SET_PDEATHSIG
$ Child about to set PR_SET_PDEATHSIG
Child about to sleep
Child about to exit
Demonstrate that the pdeath signal is sent on termination of each
ancestor subreaper process:
$ ./pdeath_signal 2 10 3 7 6 5
18786 marked itself as a subreaper
18786 subreaper about to sleep 7 seconds
18787 marked itself as a subreaper
18787 subreaper about to sleep 6 seconds
18788 marked itself as a subreaper
18788 subreaper about to sleep 5 seconds
Parent (18789) about to sleep for 3 seconds
Child about to sleep 2 seconds before setting PR_SET_PDEATHSIG
Child about to set PR_SET_PDEATHSIG
Child about to sleep
Parent (18789) terminating
*********** Child (18790) got signal; si_pid = 18789; si_uid = 1000
Parent PID is now 18788
18788 subreaper about to terminate
*********** Child (18790) got signal; si_pid = 18788; si_uid = 1000
Parent PID is now 18787
18787 subreaper about to terminate
*********** Child (18790) got signal; si_pid = 18787; si_uid = 1000
Parent PID is now 18786
18786 subreaper about to terminate
*********** Child (18790) got signal; si_pid = 18786; si_uid = 1000
Parent PID is now 1403
$ Child about to exit
But in the case where some subreapers terminate before they
have a chance to adopt the child, the terminations of those
subreapers do not result in a signal for the child:
$ ./pdeath_signal 2 10 3 5 6 7
18836 marked itself as a subreaper
18836 subreaper about to sleep 5 seconds
18837 marked itself as a subreaper
18837 subreaper about to sleep 6 seconds
18838 marked itself as a subreaper
18838 subreaper about to sleep 7 seconds
Parent (18839) about to sleep for 3 seconds
Child about to sleep 2 seconds before setting PR_SET_PDEATHSIG
Child about to set PR_SET_PDEATHSIG
Child about to sleep
Parent (18839) terminating
*********** Child (18840) got signal; si_pid = 18839; si_uid = 1000
Parent PID is now 18838
18836 subreaper about to terminate
$ 18837 subreaper about to terminate
18838 subreaper about to terminate
*********** Child (18840) got signal; si_pid = 18838; si_uid = 1000
Parent PID is now 1403
Child about to exit
============================
/* pdeath_signal.c */
} while (0)
static void
handler(int sig, siginfo_t *si, void *ucontext)
{
printf("*********** Child (%ld) got signal; si_pid = %d; si_uid = %d\n",
(long) getpid(), si->si_pid, si->si_uid);
printf(" Parent PID is now %ld\n", (long) getppid());
}
int
main(int argc, char *argv[])
{
struct sigaction sa;
int childPreSleep, childPostSleep, parentSleep;
if (argc < 2) {
fprintf(stderr, "Usage: %s child-pre-sleep "
"[child-post-sleep [parent-sleep [subreaper-sleep...]]]\n",
argv[0]);
exit(EXIT_FAILURE);
}
childPreSleep = atoi(argv[1]);
if (argc > 2)
childPostSleep = atoi(argv[2]);
if (argc > 3)
parentSleep = atoi(argv[3]);
/* Optionally create a series of subreapers */
if (argc > 4) {
for (int sr = 4; sr < argc; sr++) {
if (prctl(PR_SET_CHILD_SUBREAPER, 1) == -1)
errExit("prctl");
printf("%ld marked itself as a subreaper\n", (long) getpid());
switch (fork()) {
case -1:
errExit("fork");
case 0:
break;
default:
printf("%ld subreaper about to sleep %s seconds\n",
(long) getpid(), argv[sr]);
sleep(atoi(argv[sr]));
printf("%ld subreaper about to terminate\n", (long) getpid());
exit(EXIT_SUCCESS);
}
}
}
switch (fork()) {
case -1:
errExit("fork");
case 0:
sa.sa_flags = SA_SIGINFO;
sigemptyset(&sa.sa_mask);
sa.sa_sigaction = handler;
if (sigaction(SIGUSR1, &sa, NULL) == -1)
errExit("sigaction");
if (childPreSleep > 0) {
printf("Child about to sleep %d seconds before setting "
"PR_SET_PDEATHSIG\n", childPreSleep);
sleep(childPreSleep);
}
printf("Child about to set PR_SET_PDEATHSIG\n");
if (prctl(PR_SET_PDEATHSIG, SIGUSR1) == -1)
errExit("prctl");
printf("Child about to sleep\n");
for (int j = 0; j < childPostSleep; j++)
sleep(1);
printf("Child about to exit\n");
exit(EXIT_SUCCESS);
default:
printf("Parent (%ld) about to sleep for %d seconds\n",
(long) getpid(), parentSleep);
sleep(parentSleep);
printf("Parent (%ld) terminating\n", (long) getpid());
exit(EXIT_SUCCESS);
}
}
Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The signal is process directed and the siginfo_t->si_pid
filed contains the PID of the terminating parent.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
ptrace() with requests PTRACE_PEEKTEXT, PTRACE_PEEKDATA and
PTRACE_PEEKUSER can set errno to zero. AFAICS this is for a good
reason (so that you can tell the difference between a successful
PEEK with a result of -1 and a failed PEEK, even if you forget to
clear errno yourself), but it technically violates the rules
described in the errno.3 manpage.
glibc snippet from sysdeps/unix/sysv/linux/ptrace.c:
res = INLINE_SYSCALL (ptrace, 4, request, pid, addr, data);
if (res >= 0 && request > 0 && request < 4)
{
__set_errno (0);
return ret;
}
reproducer:
$ cat ptrace_test.c
char foobar_data[4] = "ABCD";
int main(void) {
pid_t child = fork();
if (child == -1) err(1, "fork");
if (child == 0) {
if (prctl(PR_SET_PDEATHSIG, SIGKILL)) err(1, "prctl");
while (1) sleep(1);
}
int status;
if (ptrace(PTRACE_ATTACH, child, NULL, NULL)) err(1, "attach");
if (waitpid(child, &status, 0) != child) err(1, "wait");
errno = EINVAL;
unsigned int res = ptrace(PTRACE_PEEKDATA, child, foobar_data, NULL);
printf("errno after PEEKDATA: %d\n", errno);
printf("PEEKDATA result: 0x%x\n", res);
}
$ gcc -o ptrace_test ptrace_test.c -Wall
$ ./ptrace_test
errno after PEEKDATA: 0
PEEKDATA result: 0x44434241
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See copy_process() in kernel/fork.c:
if (clone_flags & CLONE_THREAD) {
if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) ||
(task_active_pid_ns(current) !=
current->nsproxy->pid_ns_for_children))
return ERR_PTR(-EINVAL);
}
current->nsproxy->pid_ns_for_children is where unshare(CLONE_NEWPID)
stashes the pending namespace.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The extra detail has little of noting with -test 2.6.0
added a particular feature has little value these days,
and is likely to confuse some readers who don't know
(and probably don't care) about the historical details.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Checking the FreeBSD source code, there's explicit support for
this to accommodate non-BSD systems (such as Linux).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use "UFFDIO_ZEROPAGE" consistently rather than "UFFDIO_ZERO".
Signed-off-by: Anthony Iliopoulos <ailiopoulos@suse.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Based on text from Documentation/filesystems/ramfs-rootfs-initramfs.txt.
Signed-off-by: Elvira Khabirova <lineprinter@altlinux.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Initially it was planned that the parisc linux port would natively
support 32-bit HP-UX binaries, but this compatibility was never
reached and finally dropped with Linux kernel 3.14.
With that background, drop parisc from the list of of platforms
which supports it's proprietary operating-system.
Additional notes from mtk:
The most relevant commit from the Linux 3.14 change log was:
[[
commit f5a408d53edef3af07ac7697b8bc54a755628450
Author: Guy Martin <gmsoft@tuxicoman.be>
Date: Thu Jan 16 17:17:53 2014 +0100
parisc: Make EWOULDBLOCK be equal to EAGAIN on parisc
On Linux, only parisc uses a different value for EWOULDBLOCK which
causes a lot of troubles for applications not checking for both values.
Since the hpux compat is long dead, make EWOULDBLOCK behave the same as
all other architectures.
]]
Additional notes from Helge:
The patch above is the initial and most important one with which
we stopped the HP-UX compatibility.
Then, with this commit in kernel 3.18 there is no way back:
"parisc: Reduce SIGRTMIN from 37 to 32 to behave like
other Linux architectures"
commit 1f25df2eff5b25f52c139d3ff31bc883eee9a0ab
And in kernel 4.0 we finally dropped the HP-UX compat layer
from Linux kernel source code with the commit series
"parisc: hpux - Drop support for HP-UX binaries":
commit 04c1614977168fb8f002e2d81f704eeabe0c5ebd
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
On parisc one needs to take care of the 32-bit calling conventions
with 64-bit syscall parameters on a 32-bit kernel. So on parisc we
suffer from the same issues like ARM, PowerPC and Xtensa.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There was probably a little too much detail in
Lukas Werkmeister's patch. Simplify, by removing a few
file systems, and arrange the information as a bulleted
list for easier readability.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The RENAME_NOREPLACE flag was added with the initial release of the
renameat2 syscall in Linux 3.15, but support for most filesystems was
only added in later versions, and some may still not support it.
Signed-off-by: Lucas Werkmeister <mail@lucaswerkmeister.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The original implementation of PR_SET_MM_EXE_FILE only allowed it
to be used once in a process's lifetime. This restriction was
lifted in Linux commit 3fb4afd9a504c2386b8435028d43283216bf588e
("prctl: remove one-shot limitation for changing exe link").
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
[I got two patches for this; the other from Florian Weimer]
According to the following kernel code, preadv2(2)/pwritev2(2) with
an unknown flag actually returned EOPNOTSUPP instead of EINVAL:
----------------------------------------------------------------
static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
{
if (unlikely(flags & ~RWF_SUPPORTED)) {
return -EOPNOTSUPP;
}
...
}
static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
loff_t *ppos, int type, rwf_t flags)
{
...
if (flags & ~RWF_HIPRI)
return -EOPNOTSUPP;
...
}
Reported-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This fixes three typos of EACCES (one "S" is the correct errno
name).
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Introduced by Linux commit v4.12-rc1~64^3~304^2~1.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The list of address families in this page is still
overwhelmingly long. So let's shorten it.
The removed entries are all in address_families(7).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add some information about some other address families present in
<linux/socket.h>.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As truncate(3) should dispatch between truncate/truncate64,
as noted later in the page.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note that clone() definition on IA-64 is the same as on
SH/Tile/Alpha, align __clone2 declarations in line with the
previous ones, add clone2 syscall prototype.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The kernel doesn't allow unsharing a pid NS if it has previously been
unshared, per this check in copy_pid_ns:
if (task_active_pid_ns(current) != old_ns)
return ERR_PTR(-EINVAL);
so let's note that.
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The number of bytes that can be written to the pipe may be less
(sometimes substantially less) than the nominal capacity. This
was confirmed with some testing. For example, when writing
4097-byte blocks to a pipe with 65536 byte capacity, only 45066
bytes could be written (i.e., 20470 bytes less than 65536).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The discussion is phrased in terms of signals send using kill(2),
but applies equally to a signal sent by the kernel.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The memfd_create function and its corresponding constants have
required _GNU_SOURCE for as long as they've been in glibc.
Signed-off-by: Joseph C. Sible <josephcsible@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
connect(2) on a nonblocking UNIX domain socket when the receive
queue is full results in EAGAIN [1]. This is unlike other
connection-based socket families that return EINPROGRESS as
already documented.
mtk: confirmed with some light testing. And in
net/unix/af_unix.c::unix_stream_connect(), we have:
if (unix_recvq_full(other)) {
err = -EAGAIN;
if (!timeo)
goto out_unlock;
Signed-off-by: Benjamin Peterson <benjamin@python.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I regularly see excessive fd usage bugs (or even leaks) caused by
people who think they need to keep the fd open as long as the
mapping exists.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Per the comment on the pivot_root syscall in fs/namespace.c:
Also, the current root cannot be on the 'rootfs'
(initial ramfs) filesystem. See
Documentation/filesystems/ramfs-rootfs-initramfs.txt
for alternatives in this situation.
Signed-off-by: Joseph C. Sible <josephcsible@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
MINSTKSZ is not defined anywhere, MINSIGSTKSZ seems valid instead.
Signed-off-by: Hiroya Ito <hiroyan@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
MS_SILENT can be specified when changing propagation type,
but is ignored, as far as I can see from reading the code.
(The flags are passed to do_change_type(), which, as well
as the propagation flags, allows MS_REC and MS_SILENT
(in flags_to_propagation_type()), but does noting with
MS_SILENT. (Linux 4.17 source)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This sentence is left over from an earlier rewrite of the text,
and the relevant details are covered a few paragraphs later.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The glibc wrapper for renameat2() was added in glibc 2.28 and
requires _GNU_SOURCE.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The newly added IOCB_FLAG_IOPRIO aio_flag introduces new behaviors
and return values.
The details of this new feature are posted here:
https://lkml.org/lkml/2018/5/22/809
Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The PERF_EVENT_IOC_QUERY_BPF ioctl was introduced in Linux 4.16.
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The PERF_EVENT_IOC_MODIFY_ATTRIBUTES ioctl was introduced in
Linux 4.17. It currently only works on breakpoint events.
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The PERF_EVENT_IOC_PAUSE_OUTPUT ioctl was introduced in Linux 4.7.
I've have this patch for a long time, I apologize for the delay
in getting it submitted. I've made some minor changes to the
original patch proposed by Wang Nan.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Reviewed-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It turns out no one is really sure what the perf_event_open.2
exclude_idle field is supposed to do, and a recent thread on the
linux-kernel list:
[RFC] perf/core: what is exclude_idle supposed to do
did not really clarify things.
I think the following adjustment to the page clarifies things
at least a little.
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Some discussion on the linux-perf-users list has turned up that
the perf_event_open.2 description of how
PR_TASK_PERF_EVENTS_ENABLE / PR_TASK_PERF_EVENTS_DISABLE prctl()
works is misleading.
The descriptions were based on the tools/perf/design.txt document
which describes behavior that was removed in 082ff5a2767a06 (prior
to 2.6.31, the first release with perf_event_open support).
I have written some tests in my perf_event_tests testsuite that
verifies the behavior of prctl() in this case.
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
JIT support for x86-32 was during the Linux 4.18 release cycle.
Also correct the entry for MIPS (only MIPS64 is supported).
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The man page notes that vmsplice() can splice pages from memory
to a pipe, but it can work in the other direction as well.
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
sys/memfd.h doesn't exist. memfd_create() is declared in
sys/mman.h and some flags are available only in linux/memfd.h.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The EINVAL errors that can occur for clone(2) when it is called
with various CLONE_NEW* flags and the kernel was not configured
with support for the corresponding namespace can also occur for
unshare(2). (As far as I can see, these errors don't occur for
either clone(2) or unshare(2) when it comes to CLONE_NEWNS and
CLONE_NEWCGROUP.)
Reported-by: Shawn Landden <shawn@git.icu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note that EINVAL can occur with CLONE_NEWUSER if the kernel was
not configured with CONFIG_USER_NS.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
When it fails, inet_aton() does not set errno, so using
perror() is not appropriate.
Reported-by: Antonio Chirizzi <antonio.chirizzi@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the kernel, the type of the arguments to pkey_alloc() is
"unsigned long" and that's what the page documented until now.
Now that glibc support is added for pkey_alloc(), switch to the
glibc prototype, which uses "unsigned int".
Reported-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Direct writes can perform partial writes because large writes
can be broken into smaller chunks by the block layer. Part of
the I/O submitted can fail and the failure is returned to write
as an error in the return value. However, part of the write can
be successful which means that data at the offset is inconsistent.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>