The ioctl(2) system call may be used to retrieve information about
the FAT file system and to set file attributes.
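As an illustration of the attribute query, here is a minimal sketch,
assuming the FAT_IOCTL_GET_ATTRIBUTES request from <linux/msdos_fs.h>.
The helper name fat_get_attributes() is hypothetical and exists only
for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/msdos_fs.h>

/*
 * Hypothetical helper: read the FAT attribute bits (ATTR_RO,
 * ATTR_HIDDEN, ...) of 'path' into *attr.
 * Returns 0 on success and -1 on failure, with errno set. On a file
 * that does not live on a FAT file system the ioctl is expected to fail.
 */
int fat_get_attributes(const char *path, uint32_t *attr)
{
    int fd, ret, saved_errno;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ret = ioctl(fd, FAT_IOCTL_GET_ATTRIBUTES, attr);

    /* Preserve the ioctl's errno across close(). */
    saved_errno = errno;
    close(fd);
    errno = saved_errno;

    return ret == 0 ? 0 : -1;
}
```

A caller would test the returned bits against the ATTR_* constants,
for example (attr & ATTR_RO) for the read-only flag.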
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The lm bit should never have existed in the first place. Sigh.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The documentation for set_thread_area was very vague. This
improves it, accounts for recent kernel changes, and merges
it with get_thread_area.2.
get_thread_area.2 now becomes a link.
While I'm at it, clarify the related arch_prctl.2 man page.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This clarifies the behavior and documents all four functions.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Currently the PERF_EVENT_IOC_REFRESH ioctl, when applied to a group
leader, will refresh all children. Also, if a refresh value of 0
is chosen, then the refresh becomes infinite (never runs out).
Back in 2011, PAPI was relying on these behaviors, but I was told
that both were unsupported and subject to being removed at any time.
(See https://lkml.org/lkml/2011/5/24/337 )
However, the behavior has not been changed.
This patch updates the manpage to still list the behavior as
unsupported, but removes the inaccurate description of it
only being a problem with 2.6 kernels.
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
fork.2 should clearly point out that the child and parent
processes run in separate memory spaces.
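A minimal sketch of what "separate memory spaces" means in practice;
the function name value_after_child_write() is made up for this
illustration. The child overwrites a variable, and the parent's copy
is unaffected.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that overwrites 'value' in its own address space, then
 * return the parent's copy of 'value'.
 * Returns -1 if fork() or waitpid() fails or the child misbehaves.
 */
int value_after_child_write(void)
{
    volatile int value = 1;
    int status;
    pid_t pid;

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* This store lands in the child's memory only. */
        value = 2;
        _exit(value == 2 ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* The parent's copy was never touched by the child. */
    return value;
}
```

After the child has exited, the parent still sees the value it had at
the time of the fork.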
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Extend the description of PTRACE_SEIZE with a short summary of its
differences from PTRACE_ATTACH.
The following paragraph:
PTRACE_EVENT_STOP
Stop induced by PTRACE_INTERRUPT command, or group-stop, or ini-
tial ptrace-stop when a new child is attached (only if attached
using PTRACE_SEIZE), or PTRACE_EVENT_STOP if PTRACE_SEIZE was used.
has an editing error (the part after the last comma makes no sense).
Remove that part.
Mention that legacy post-execve SIGTRAP is disabled by PTRACE_SEIZE.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This behaviour was verified by reading the kernel source and
confirming the behaviour using a test program.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The following program illustrates the difference between TCP
and Unix stream sockets when used with sendfile. Since TCP
implements zero-copy, modifications made to the transferred file
are seen upon reading, even though they happen after
sendfile was last called.
Unix stream sockets do not implement zero-copy (as of
Linux 3.15), so readers continue to see the contents of the
file as they were at the time it was sent, not as they are at
the time of reading.
----------------- sendfile-mod.c ---------------
#define _GNU_SOURCE
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/sendfile.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <assert.h>
#include <fcntl.h>

static void tcp_socketpair(int sv[2])
{
    struct sockaddr_in addr;
    socklen_t addrlen = sizeof(addr);
    int l = socket(PF_INET, SOCK_STREAM, 0);
    int c = socket(PF_INET, SOCK_STREAM, 0);
    int a;
    int val = 1;

    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = 0;
    assert(0 == bind(l, (struct sockaddr *)&addr, addrlen));
    assert(0 == listen(l, 1024));
    assert(0 == getsockname(l, (struct sockaddr *)&addr, &addrlen));
    assert(0 == connect(c, (struct sockaddr *)&addr, addrlen));
    a = accept4(l, NULL, NULL, SOCK_NONBLOCK);
    assert(a >= 0);
    close(l);
    assert(0 == ioctl(c, FIONBIO, &val));
    sv[0] = a;
    sv[1] = c;
}

int main(int argc, char *argv[])
{
    int pair[2];
    FILE *tmp = tmpfile();
    int tfd;
    char buf[16384];
    ssize_t w, r;
    size_t i;
    const size_t n = 2048;
    off_t off = 0;
    char expect[4096];
    int flags = SOCK_STREAM|SOCK_NONBLOCK;

    tfd = fileno(tmp);
    assert(tfd >= 0);

    /* prepare the tempfile */
    memset(buf, 'a', sizeof(buf));
    for (i = 0; i < n; i++)
        assert(sizeof(buf) == write(tfd, buf, sizeof(buf)));

    if (argc == 2 && strcmp(argv[1], "unix") == 0)
        assert(0 == socketpair(AF_UNIX, flags, 0, pair));
    else if (argc == 2 && strcmp(argv[1], "pipe") == 0)
        assert(0 == pipe2(pair, O_NONBLOCK));
    else
        tcp_socketpair(pair);

    /* fill up the socket buffer */
    for (;;) {
        w = sendfile(pair[1], tfd, &off, n);
        if (w > 0)
            continue;
        if (w < 0 && errno == EAGAIN)
            break;
        assert(0 && "unhandled error" && w && errno);
    }
    printf("wrote off=%lld\n", (long long)off);

    /* rewrite the tempfile */
    memset(buf, 'A', sizeof(buf));
    assert(0 == lseek(tfd, 0, SEEK_SET));
    for (i = 0; i < n; i++)
        assert(sizeof(buf) == write(tfd, buf, sizeof(buf)));

    /* we should be reading 'a's, not 'A's */
    memset(expect, 'a', sizeof(expect));
    do {
        r = read(pair[0], buf, sizeof(expect));
        /* TCP fails here since it is zero copy (on Linux 3.15.5) */
        if (r > 0)
            assert(memcmp(buf, expect, r) == 0);
    } while (r > 0);

    return 0;
}
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
CLONE_PARENT_SETTID stores the child thread ID only in the parent's memory.
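A short sketch of the parent-side behaviour, using the glibc clone()
wrapper. The names child_fn() and parent_settid_matches() and the
64 kB stack size are choices made for this example, not taken from
the man page.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Trivial child: exit immediately with status 0. */
static int child_fn(void *arg)
{
    (void)arg;
    return 0;
}

/*
 * Create a child with CLONE_PARENT_SETTID and check that the thread ID
 * the kernel stored at &ptid (parent memory) equals clone()'s return
 * value. Returns 1 on match, 0 on mismatch, -1 on error.
 */
int parent_settid_matches(void)
{
    const size_t stack_size = 64 * 1024;   /* arbitrary for this sketch */
    char *stack = malloc(stack_size);
    pid_t ptid = -1;
    pid_t tid;

    if (stack == NULL)
        return -1;

    /* The stack grows downward on common architectures: pass its top. */
    tid = clone(child_fn, stack + stack_size,
                CLONE_PARENT_SETTID | SIGCHLD, NULL, &ptid);
    if (tid == -1) {
        free(stack);
        return -1;
    }

    waitpid(tid, NULL, 0);
    free(stack);
    return ptid == tid;
}
```

The sketch checks only the value seen by the parent; it makes no
claim about what the child observes at the same address.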
Signed-off-by: Peng Haitao <penght@cn.fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This patch documents the fact that a successful execve(2) in a
process that is sharing a file descriptor table results in
unsharing the table.
I discovered this through testing and verified it by source
inspection: there is a call to unshare_files() early in
do_execve_common().
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I encountered these errors while writing a test case for the
migrate_pages syscall for LTP (Linux Test Project).
I checked stable kernel tree 3.5 to see which paths return these.
Both can be returned from get_nodes(), which is called from:
SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
const unsigned long __user *, old_nodes,
const unsigned long __user *, new_nodes)
The test case does the following:
EFAULT
a) old_nodes/new_nodes is an area mmapped with PROT_NONE
b) old_nodes/new_nodes is an area not mmapped in the process address
space, -1, or an area that has just been munmapped
EINVAL
a) maxnode overflows the kernel limit
b) new_nodes contains a node that has no memory, does not exist,
or is not returned by get_mempolicy(MPOL_F_MEMS_ALLOWED).
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I puzzled over mprotect()'s effect on /proc/*/maps for a while
yesterday -- it was setting "x" without PROT_EXEC being specified.
Here is a patch to add some explanation.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
We have users who are terribly confused about why their binaries
with the CAP_DAC_OVERRIDE capability see EACCES from access() calls,
but are able to read the file.
The reason is access() isn't the "can I read/write/execute this
file?" question, it is the "(assuming that I'm a setuid binary,)
can *the user who invoked me* read/write/execute this file?"
question.
That's why it uses real UIDs as documented, and why it ignores
capabilities when capability-endorsed binaries are run by non-root
(this patch adds this information).
To make users more likely to notice this lesser-known detail,
the patch expands the explanation, with the rationale for this
logic, into a separate paragraph.
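The distinction can be sketched with two small helpers. Both names
are hypothetical. The first asks the access() question (real IDs,
capabilities of a non-root invoker ignored); the second simply tries
the operation, which is the reliable way to learn what the process
itself may do.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * "Could the user who invoked me read this file?"
 * access() checks with the real UID and GID.
 * Returns 1 if yes, 0 otherwise.
 */
int invoking_user_can_read(const char *path)
{
    return access(path, R_OK) == 0;
}

/*
 * "Can this process read the file right now?"
 * Attempting the open() uses the effective IDs and capabilities.
 * Returns 1 if the open succeeds, 0 otherwise.
 */
int process_can_read(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return 0;
    close(fd);
    return 1;
}
```

For an ordinary unprivileged program the two answers agree. They can
differ for set-user-ID programs and for binaries carrying file
capabilities, which is the case described above.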
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: linux-man@vger.kernel.org
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I am not sure why we have:
"EAGAIN fork() cannot allocate sufficient memory to copy
the parent's page tables and allocate a task structure
for the child."
The text seems to be there from the time when man-pages
were moved to git so there is no history for it.
And it doesn't reflect reality: the kernel reports both
dup_task_struct and dup_mm failures as ENOMEM to the
userspace. This seems to be the case from early 2.x times
so let's simply remove this part.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Verified by experiment on Linux 3.15 and 3.19-rc4.
Acked-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Let's assume Michael's email address did not change.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add a reference to the AF_ALG protocol accessible via socket(2).
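For readers unfamiliar with AF_ALG, a rough sketch of hashing through
such a socket follows. It assumes the kernel offers a "hash" type
named "sha256" and that <linux/if_alg.h> is installed; the function
name af_alg_sha256() is invented for this example.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/if_alg.h>

/*
 * Compute the SHA-256 digest of a small buffer via the kernel crypto API.
 * Returns 0 on success, -1 on any failure (for example, a kernel built
 * without AF_ALG support).
 */
int af_alg_sha256(const void *data, size_t len, unsigned char digest[32])
{
    struct sockaddr_alg sa;
    int tfm, op;
    int ret = -1;

    memset(&sa, 0, sizeof(sa));
    sa.salg_family = AF_ALG;
    strcpy((char *)sa.salg_type, "hash");     /* assumed algorithm type */
    strcpy((char *)sa.salg_name, "sha256");   /* assumed algorithm name */

    /* The first socket selects the transformation... */
    tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
    if (tfm < 0)
        return -1;

    if (bind(tfm, (struct sockaddr *)&sa, sizeof(sa)) == 0) {
        /* ...and accept() yields the socket used for the operation. */
        op = accept(tfm, NULL, NULL);
        if (op >= 0) {
            if (send(op, data, len, 0) == (ssize_t)len &&
                read(op, digest, 32) == 32)
                ret = 0;
            close(op);
        }
    }

    close(tfm);
    return ret;
}
```

The sketch sends the input in a single send() call, so it is meant
for small buffers only.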
Signed-off-by: Stephan Mueller <stephan.mueller@atsec.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
With his last patches for getrandom.2 Michael Kerrisk posed a few
questions and left some comments in the man-page. This patch
seeks to clarify the open issues.
    For example, if the call is interrupted by a signal handler,
    it may return a partially filled buffer, or fail with the error
    .BR EINTR .
    .\" Tested with buffer sizes > 256 bytes: both partial reads
    .\" and EINTR can occur, with the former being more frequent.
    .\"
Michael's observation agrees with the code.
For buffer size > 256: If the buffer is still empty EINTR occurs.
If any number of bytes has been read to the buffer, that number
is returned. The comment can be removed.
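Given these semantics, a caller that needs a completely filled buffer
has to loop. A minimal sketch, assuming the glibc getrandom() wrapper
from <sys/random.h>; the helper name fill_random() is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Fill 'buf' with 'len' random bytes, retrying after EINTR and
 * continuing after partial reads.
 * Returns 0 on success, -1 on any other error (errno is set).
 */
int fill_random(void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = getrandom(p, len, 0);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any byte was read */
            return -1;
        }

        /* Partial read: advance and ask for the remainder. */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```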
    .\" mtk: In the absence of signals, in my testing, even very large reads
    .\" return full buffers. I found that reads of up to 33554431 always
    .\" returned a filled buffer. Specifying 'buflen' > 33554431 always
    .\" returned just 33554431 bytes. (I'm not sure where that number comes
    .\" from.
The maximum number of bytes transferred is limited for
/dev/urandom to:
nbytes = min_t(size_t, nbytes, INT_MAX >> (ENTROPY_SHIFT + 3));
// <= 0x1ffffff
and for /dev/random to
nbytes = min_t(size_t, nbytes, SEC_XFER_SIZE); // <= 0x200
Let's put this into the NOTES section.
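The arithmetic behind the two limits can be checked with a few lines
of C. The values ENTROPY_SHIFT = 3 and SEC_XFER_SIZE = 512 are
assumptions taken from drivers/char/random.c of that era, and the
function names are made up for this sketch. With a 32-bit int,
INT_MAX >> 6 is 33554431 (0x1ffffff), which is the number observed in
the testing quoted above.

```c
#include <limits.h>
#include <stddef.h>

/* Assumed values from drivers/char/random.c at the time of writing. */
#define ENTROPY_SHIFT 3
#define SEC_XFER_SIZE 512

/* Largest byte count a single /dev/urandom-style read can return. */
size_t urandom_read_limit(void)
{
    return (size_t)(INT_MAX >> (ENTROPY_SHIFT + 3));
}

/* Largest byte count a single /dev/random-style read can return. */
size_t random_read_limit(void)
{
    return (size_t)SEC_XFER_SIZE;
}
```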
    When reading from
    .IR /dev/random ,
    blocking requests of any size can be interrupted by a signal
    (the call fails with the error
    .BR EINTR ).
That's OK.
    If the pool has not yet been initialized, then the call blocks, unless
    .B GRND_RANDOM
    is specified in
    .IR flags .
    .\" FIXME We need a bit more information here.
    .\" The reader will ask: when is /dev/urandom initialized?
    .\" There should be some text here to explain that.
Entropy is collected from different sources, e.g.
- time of reaping a thread
- MAC addresses of network interfaces
- Allwinner security ID
- ROM content of a firewire device
- ...
When more than 128 bits have been collected, the pool is marked
as initialized.
I suggest that detailed information about the initialization
should be provided on the random.4 page.
I added a paragraph in the NOTES section.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>