The documentation for set_thread_area was very vague. This
improves it, accounts for recent kernel changes, and merges
it with get_thread_area.2.
get_thread_area.2 now becomes a link.
While I'm at it, clarify the related arch_prctl.2 man page.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This clarifies the behavior and documents all four functions.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Currently the PERF_EVENT_IOC_REFRESH ioctl, when applied to a group
leader, will refresh all children. Also if a refresh value of 0
is chosen then the refresh becomes infinite (never runs out).
Back in 2011 PAPI was relying on these behaviors but I was told
that both were unsupported and subject to being removed at any time.
(See https://lkml.org/lkml/2011/5/24/337 )
However the behavior has not been changed.
This patch updates the manpage to still list the behavior as
unsupported, but removes the inaccurate description of it
only being a problem with 2.6 kernels.
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
fork.2 should clearly point out that the child and parent
processes run in separate memory spaces.
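A minimal sketch of what "separate memory spaces" means in practice.
The helper name fork_keeps_memory_separate() is made up for this
illustration; it is not part of the man page.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if a write made by the child is
 * invisible to the parent, 0 if not, and -1 on error.
 */
int fork_keeps_memory_separate(void)
{
    int value = 1;
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        value = 2;          /* modifies only the child's copy */
        _exit(value);       /* report the child's view via exit status */
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    /* The child saw 2; the parent's copy must still be 1 */
    return WEXITSTATUS(status) == 2 && value == 1;
}
```

Calling fork_keeps_memory_separate() from main() is expected to
return 1 on any system where fork() succeeds.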
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Extend the description of PTRACE_SEIZE with a short summary of its
differences from PTRACE_ATTACH.
The following paragraph:
PTRACE_EVENT_STOP
Stop induced by PTRACE_INTERRUPT command, or group-stop, or ini-
tial ptrace-stop when a new child is attached (only if attached
using PTRACE_SEIZE), or PTRACE_EVENT_STOP if PTRACE_SEIZE was used.
has an editing error (the part after the last comma makes no sense).
Removing it.
Mention that the legacy post-execve SIGTRAP is disabled by PTRACE_SEIZE.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This behaviour was verified by reading the kernel source and
confirming the behaviour using a test program.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The following program illustrates the difference between TCP
and Unix stream sockets doing sendfile. Since TCP implements
zero-copy, modifications to the transferred file are
seen upon reading, even though they happen after
sendfile was last called.
Unix stream sockets do not implement zero-copy (as of
Linux 3.15), so readers continue to see the contents of the
file at the time it was sent, not as they are at the time of
reading.
----------------- sendfile-mod.c ---------------
#define _GNU_SOURCE
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/sendfile.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <assert.h>
#include <fcntl.h>
static void tcp_socketpair(int sv[2])
{
    struct sockaddr_in addr;
    socklen_t addrlen = sizeof(addr);
    int l = socket(PF_INET, SOCK_STREAM, 0);
    int c = socket(PF_INET, SOCK_STREAM, 0);
    int a;
    int val = 1;
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = 0;
    assert(0 == bind(l, (struct sockaddr*)&addr, addrlen));
    assert(0 == listen(l, 1024));
    assert(0 == getsockname(l, (struct sockaddr *)&addr, &addrlen));
    assert(0 == connect(c, (struct sockaddr *)&addr, addrlen));
    a = accept4(l, NULL, NULL, SOCK_NONBLOCK);
    assert(a >= 0);
    close(l);
    assert(0 == ioctl(c, FIONBIO, &val));
    sv[0] = a;
    sv[1] = c;
}
int main(int argc, char *argv[])
{
    int pair[2];
    FILE *tmp = tmpfile();
    int tfd;
    char buf[16384];
    ssize_t w, r;
    size_t i;
    const size_t n = 2048;
    off_t off = 0;
    char expect[4096];
    int flags = SOCK_STREAM|SOCK_NONBLOCK;
    tfd = fileno(tmp);
    assert(tfd >= 0);
    /* prepare the tempfile */
    memset(buf, 'a', sizeof(buf));
    for (i = 0; i < n; i++)
        assert(sizeof(buf) == write(tfd, buf, sizeof(buf)));
    if (argc == 2 && strcmp(argv[1], "unix") == 0)
        assert(0 == socketpair(AF_UNIX, flags, 0, pair));
    else if (argc == 2 && strcmp(argv[1], "pipe") == 0)
        assert(0 == pipe2(pair, O_NONBLOCK));
    else
        tcp_socketpair(pair);
    /* fill up the socket buffer */
    for (;;) {
        w = sendfile(pair[1], tfd, &off, n);
        if (w > 0)
            continue;
        if (w < 0 && errno == EAGAIN)
            break;
        assert(0 && "unhandled error" && w && errno);
    }
    printf("wrote off=%lld\n", (long long)off);
    /* rewrite the tempfile */
    memset(buf, 'A', sizeof(buf));
    assert(0 == lseek(tfd, 0, SEEK_SET));
    for (i = 0; i < n; i++)
        assert(sizeof(buf) == write(tfd, buf, sizeof(buf)));
    /* we should be reading 'a's, not 'A's */
    memset(expect, 'a', sizeof(expect));
    do {
        r = read(pair[0], buf, sizeof(expect));
        /* TCP fails here since it is zero copy (on Linux 3.15.5) */
        if (r > 0)
            assert(memcmp(buf, expect, r) == 0);
    } while (r > 0);
    return 0;
}
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
CLONE_PARENT_SETTID stores the child thread ID only in the parent's memory.
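A rough sketch of how the parent-side store can be observed, using
the glibc clone() wrapper. The names child_fn() and
parent_settid_demo() and the 64 kB stack size are assumptions made
for this example only.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child entry point: does nothing and exits with status 0 */
static int child_fn(void *arg)
{
    (void)arg;
    return 0;
}

/*
 * Hypothetical helper: returns 1 if the kernel stored the child's
 * thread ID in the parent's ptid variable, 0 if not, -1 on error.
 */
int parent_settid_demo(void)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    pid_t ptid = -1;
    pid_t tid;
    int ok;

    if (stack == NULL)
        return -1;
    /* The stack grows down on most architectures, so pass its top */
    tid = clone(child_fn, stack + stack_size,
                CLONE_PARENT_SETTID | SIGCHLD, NULL, &ptid);
    if (tid == -1) {
        free(stack);
        return -1;
    }
    /* ptid was filled in before clone() returned in the parent */
    ok = (ptid == tid);
    waitpid(tid, NULL, 0);
    free(stack);
    return ok;
}
```

Because CLONE_VM is not used here, the child runs on a copy of the
address space; the sketch checks only the parent's copy of ptid.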
Signed-off-by: Peng Haitao <penght@cn.fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This patch documents the fact that a successful execve(2) in a process that
is sharing a file descriptor table results in unsharing the table.
I discovered this through testing and verified it by source
inspection - there is a call to unshare_files() early in
do_execve_common().
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I encountered these errors while writing a test case for the migrate_pages
syscall for LTP (Linux Test Project).
I checked stable kernel tree 3.5 to see which paths return these.
Both can be returned from get_nodes(), which is called from:
SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
const unsigned long __user *, old_nodes,
const unsigned long __user *, new_nodes)
The test case does the following:
EFAULT
a) old_nodes/new_nodes is an area mmapped with PROT_NONE
b) old_nodes/new_nodes is an area not mmapped in the process address
space, -1, or an area that has just been munmapped
EINVAL
a) maxnode overflows the kernel limit
b) new_nodes contains a node that has no memory, does not exist,
or is not returned by get_mempolicy(MPOL_F_MEMS_ALLOWED).
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I puzzled over mprotect()'s effect on /proc/*/maps for a while
yesterday -- it was setting "x" without PROT_EXEC being specified.
Here is a patch to add some explanation.
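To observe this, a small sketch like the following reads back the
permissions that /proc/self/maps reports for a page after
mprotect(). The helper names maps_perms_for() and
perms_after_mprotect_read() are made up for illustration. Whether
"x" appears depends on the architecture and on personality flags
such as READ_IMPLIES_EXEC, so the sketch makes no assumption about
that field.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the permission field (e.g. "r--p") of the
 * /proc/self/maps entry covering addr into perms. Returns 0 if found.
 */
int maps_perms_for(const void *addr, char perms[5])
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[512];
    unsigned long start, end, target = (unsigned long)addr;
    int found = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        char p[5];

        /* Each entry starts with "start-end perms ..." */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, p) == 3 &&
            target >= start && target < end) {
            memcpy(perms, p, sizeof(p));
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}

/* Map one page, restrict it to PROT_READ, and report what maps shows */
int perms_after_mprotect_read(char perms[5])
{
    long page = sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int rc;

    if (p == MAP_FAILED)
        return -1;
    if (mprotect(p, page, PROT_READ) != 0) {
        munmap(p, page);
        return -1;
    }
    rc = maps_perms_for(p, perms);
    munmap(p, page);
    return rc;
}
```

Printing the resulting string from main() shows whether the third
character is "x" or "-" on the system at hand.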
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
We have users who are terribly confused why their binaries
with the CAP_DAC_OVERRIDE capability see EACCES from access() calls,
but are able to read the file.
The reason is that access() does not answer the "can I
read/write/execute this file?" question. It answers the "(assuming
that I'm a setuid binary,) can *the user who invoked me*
read/write/execute this file?" question.
That's why it uses real UIDs as documented, and why it ignores
capabilities when capability-endorsed binaries are run by non-root
(this patch adds this information).
To make users more likely to notice this less-known detail,
the patch expands the explanation with rationale for this logic
into a separate paragraph.
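The two questions can be contrasted with a sketch along these
lines. The helper names invoker_can_read() and process_can_read()
are hypothetical, and attempting open() is just one way to ask what
the process itself may do.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * "Can the user who invoked me read this file?"
 * access() checks with the real UID/GID.
 */
int invoker_can_read(const char *path)
{
    return access(path, R_OK) == 0;
}

/*
 * "Can I read this file?"
 * Actually opening the file uses the effective IDs and capabilities
 * of the process. O_NONBLOCK avoids blocking on FIFOs.
 */
int process_can_read(const char *path)
{
    int fd = open(path, O_RDONLY | O_NONBLOCK);

    if (fd < 0)
        return 0;
    close(fd);
    return 1;
}
```

For an ordinary process whose real and effective IDs match and that
holds no extra capabilities, both helpers are expected to agree; they
can differ for setuid or capability-endorsed binaries.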
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: linux-man@vger.kernel.org
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I am not sure why we have:
"EAGAIN fork() cannot allocate sufficient memory to copy
the parent's page tables and allocate a task structure
for the child."
The text seems to be there from the time when man-pages
were moved to git so there is no history for it.
And it doesn't reflect reality: the kernel reports both
dup_task_struct and dup_mm failures as ENOMEM to
userspace. This seems to have been the case since early 2.x times,
so let's simply remove this part.
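For callers, the practical consequence might be sketched as
follows. The helper fork_failure_reason() and its message strings
are invented for this example; they only restate the distinction
described above.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: map an errno value from a failed fork() to a
 * short explanation. Memory allocation failures are reported as
 * ENOMEM; EAGAIN is left for limits on the number of processes
 * and threads.
 */
const char *fork_failure_reason(int err)
{
    switch (err) {
    case ENOMEM:
        return "kernel could not allocate memory for the child";
    case EAGAIN:
        return "limit on the number of processes or threads reached";
    default:
        return strerror(err);
    }
}
```

A caller would pass errno right after fork() returns -1.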
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>