Back in 2014 (37bee118ad) the text
describing when multiplexing happens was changed in a confusing way.
This is an attempt to clarify things a bit.
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since 93e06c7a6453 ("mm: enable MADV_FREE for swapless system") we
handle MADV_FREE on a swapless system the same way as with the
swap available. Clarify that fact in the man page.
Reported-by: Niklas Hambüchen <mail@nh2.me>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove the old statement that PTRACE_O_TRACESYSGOOD may not work
on all architectures. As far as I can tell, all kernel code
properly tests PT_TRACESYSGOOD flag and sets the 7th bit in the
exit code passed to ptrace_notify().
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
To test the behavior documented by this patch, the following
demos employ the program shown at the foot of this commit message.
First, show that the pdeath signal is sent when the parent
terminates:
$ ./pdeath_signal 0 10 4
Parent (18595) about to sleep for 4 seconds
Child about to set PR_SET_PDEATHSIG
Child about to sleep
Parent (18595) terminating
*********** Child (18596) got signal; si_pid = 18595; si_uid = 1000
Parent PID is now 1403
$ Child about to exit
But the signal is not sent if the parent terminates before the
child uses PR_SET_PDEATHSIG:
$ ./pdeath_signal 2 10 0
Parent (18707) about to sleep for 0 seconds
Parent (18707) terminating
Child about to sleep 2 seconds before setting PR_SET_PDEATHSIG
$ Child about to set PR_SET_PDEATHSIG
Child about to sleep
Child about to exit
Demonstrate that the pdeath signal is sent on termination of each
ancestor subreaper process:
$ ./pdeath_signal 2 10 3 7 6 5
18786 marked itself as a subreaper
18786 subreaper about to sleep 7 seconds
18787 marked itself as a subreaper
18787 subreaper about to sleep 6 seconds
18788 marked itself as a subreaper
18788 subreaper about to sleep 5 seconds
Parent (18789) about to sleep for 3 seconds
Child about to sleep 2 seconds before setting PR_SET_PDEATHSIG
Child about to set PR_SET_PDEATHSIG
Child about to sleep
Parent (18789) terminating
*********** Child (18790) got signal; si_pid = 18789; si_uid = 1000
Parent PID is now 18788
18788 subreaper about to terminate
*********** Child (18790) got signal; si_pid = 18788; si_uid = 1000
Parent PID is now 18787
18787 subreaper about to terminate
*********** Child (18790) got signal; si_pid = 18787; si_uid = 1000
Parent PID is now 18786
18786 subreaper about to terminate
*********** Child (18790) got signal; si_pid = 18786; si_uid = 1000
Parent PID is now 1403
$ Child about to exit
But in the case where some subreapers terminate before they
have a chance to adopt the child, the terminations of those
subreapers do not result in a signal for the child:
$ ./pdeath_signal 2 10 3 5 6 7
18836 marked itself as a subreaper
18836 subreaper about to sleep 5 seconds
18837 marked itself as a subreaper
18837 subreaper about to sleep 6 seconds
18838 marked itself as a subreaper
18838 subreaper about to sleep 7 seconds
Parent (18839) about to sleep for 3 seconds
Child about to sleep 2 seconds before setting PR_SET_PDEATHSIG
Child about to set PR_SET_PDEATHSIG
Child about to sleep
Parent (18839) terminating
*********** Child (18840) got signal; si_pid = 18839; si_uid = 1000
Parent PID is now 18838
18836 subreaper about to terminate
$ 18837 subreaper about to terminate
18838 subreaper about to terminate
*********** Child (18840) got signal; si_pid = 18838; si_uid = 1000
Parent PID is now 1403
Child about to exit
============================
/* pdeath_signal.c */
} while (0)
static void
handler(int sig, siginfo_t *si, void *ucontext)
{
printf("*********** Child (%ld) got signal; si_pid = %d; si_uid = %d\n",
(long) getpid(), si->si_pid, si->si_uid);
printf(" Parent PID is now %ld\n", (long) getppid());
}
int
main(int argc, char *argv[])
{
struct sigaction sa;
int childPreSleep, childPostSleep, parentSleep;
if (argc < 2) {
fprintf(stderr, "Usage: %s child-pre-sleep "
"[child-post-sleep [parent-sleep [subreaper-sleep...]]]\n",
argv[0]);
exit(EXIT_FAILURE);
}
childPreSleep = atoi(argv[1]);
if (argc > 2)
childPostSleep = atoi(argv[2]);
if (argc > 3)
parentSleep = atoi(argv[3]);
/* Optionally create a series of subreapers */
if (argc > 4) {
for (int sr = 4; sr < argc; sr++) {
if (prctl(PR_SET_CHILD_SUBREAPER, 1) == -1)
errExit("prctl");
printf("%ld marked itself as a subreaper\n", (long) getpid());
switch (fork()) {
case -1:
errExit("fork");
case 0:
break;
default:
printf("%ld subreaper about to sleep %s seconds\n",
(long) getpid(), argv[sr]);
sleep(atoi(argv[sr]));
printf("%ld subreaper about to terminate\n", (long) getpid());
exit(EXIT_SUCCESS);
}
}
}
switch (fork()) {
case -1:
errExit("fork");
case 0:
sa.sa_flags = SA_SIGINFO;
sigemptyset(&sa.sa_mask);
sa.sa_sigaction = handler;
if (sigaction(SIGUSR1, &sa, NULL) == -1)
errExit("sigaction");
if (childPreSleep > 0) {
printf("Child about to sleep %d seconds before setting "
"PR_SET_PDEATHSIG\n", childPreSleep);
sleep(childPreSleep);
}
printf("Child about to set PR_SET_PDEATHSIG\n");
if (prctl(PR_SET_PDEATHSIG, SIGUSR1) == -1)
errExit("prctl");
printf("Child about to sleep\n");
for (int j = 0; j < childPostSleep; j++)
sleep(1);
printf("Child about to exit\n");
exit(EXIT_SUCCESS);
default:
printf("Parent (%ld) about to sleep for %d seconds\n",
(long) getpid(), parentSleep);
sleep(parentSleep);
printf("Parent (%ld) terminating\n", (long) getpid());
exit(EXIT_SUCCESS);
}
}
Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The signal is process directed and the siginfo_t->si_pid
filed contains the PID of the terminating parent.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
ptrace() with requests PTRACE_PEEKTEXT, PTRACE_PEEKDATA and
PTRACE_PEEKUSER can set errno to zero. AFAICS this is for a good
reason (so that you can tell the difference between a successful
PEEK with a result of -1 and a failed PEEK, even if you forget to
clear errno yourself), but it technically violates the rules
described in the errno.3 manpage.
glibc snippet from sysdeps/unix/sysv/linux/ptrace.c:
res = INLINE_SYSCALL (ptrace, 4, request, pid, addr, data);
if (res >= 0 && request > 0 && request < 4)
{
__set_errno (0);
return ret;
}
reproducer:
$ cat ptrace_test.c
char foobar_data[4] = "ABCD";
int main(void) {
pid_t child = fork();
if (child == -1) err(1, "fork");
if (child == 0) {
if (prctl(PR_SET_PDEATHSIG, SIGKILL)) err(1, "prctl");
while (1) sleep(1);
}
int status;
if (ptrace(PTRACE_ATTACH, child, NULL, NULL)) err(1, "attach");
if (waitpid(child, &status, 0) != child) err(1, "wait");
errno = EINVAL;
unsigned int res = ptrace(PTRACE_PEEKDATA, child, foobar_data, NULL);
printf("errno after PEEKDATA: %d\n", errno);
printf("PEEKDATA result: 0x%x\n", res);
}
$ gcc -o ptrace_test ptrace_test.c -Wall
$ ./ptrace_test
errno after PEEKDATA: 0
PEEKDATA result: 0x44434241
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See copy_process() in kernel/fork.c:
if (clone_flags & CLONE_THREAD) {
if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) ||
(task_active_pid_ns(current) !=
current->nsproxy->pid_ns_for_children))
return ERR_PTR(-EINVAL);
}
current->nsproxy->pid_ns_for_children is where unshare(CLONE_NEWPID)
stashes the pending namespace.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use "FAN_OPEN_PERM" consistently rather than "FAN_PERM_OPEN".
Signed-off-by: Anthony Iliopoulos <ailiopoulos@suse.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See Linux commit a5ad88ce8c7fae7ddc72ee49a11a75aa837788e0,
"mm: get rid of 'vmalloc_info' from /proc/meminfo".
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Because of setns() semantics, the parent of a process may reside
in the outer PID namespace. If that parent terminates, then the
child is adopted by the "init" in the outer PID namespace (rather
than the "init" of the PID namespace of the child).
Thus, in a scenario such as the following, if process M
terminates, P is adopted by the init process in the initial
PID namespace, and if P terminates, Q is adopted by the init
process in the inner PID namespace.
+---------------------------------------------+
| Initial PID NS |
| +---------------+ |
| +-+ | inner PID NS | |
| |1| | | |
| +-+ | +-+ | |
| | |1| | |
| | +-+ | |
| | | |
| +-+ setns(), fork() | +-+ | |
| |M|----------------------+--> |P| | |
| +-+ | +-+ | |
| | | fork() | |
| | v | |
| | +-+ | |
| | |Q| | |
| | +-+ | |
| +---------------+ |
+---------------------------------------------+
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The extra detail has little of noting with -test 2.6.0
added a particular feature has little value these days,
and is likely to confuse some readers who don't know
(and probably don't care) about the historical details.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Checking the FreeBSD source code, there's explicit support for
this to accommodate non-BSD systems (such as Linux).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Having the signals listed in three different tables reduces
readability, and would require more table splits if future
standards specify other signals.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The current tables of signal information are unwieldy,
as they try to cram in too much information.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The notes in pthread_rwlockattr_setkind_np.3 imply there is a bug
in glibc's implementation of PTHREAD_RWLOCK_PREFER_WRITER_NP (a
non-portable constant anyway), but this is not true. The
implementation of PTHREAD_RWLOCK_PREFER_WRITER_NP is made almost
impossible by the POSIX standard requirement that reader locks be
allowed to be recursive, and that requirement makes writer
preference deadlock without an impossibly complex requirement that
we track all reader locks. Therefore the only sensible solution
was to add PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP and
disallow recursive reader locks if you want writer preference.
This patch removes the bug description and documents the current
state and recommendations for glibc. I have also updated bug 7057
with this information, answering Steven Munroe's almost 10 year
old question :-) I hope Steven is enjoying his much earned
retirement.
Should we move the glibc discussion to some footnote? Some libc
may be able to implement the requirement to avoid deadlocks in the
future, but I doubt it (fundamental CS stuff).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The previous location does not seem to be getting updated.
(For example, at the time of this commit, libcap-2.26
had been out for two months, but was not present at
http://www.kernel.org/pub/linux/libs/security/linux-privs.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use "UFFDIO_ZEROPAGE" consistently rather than "UFFDIO_ZERO".
Signed-off-by: Anthony Iliopoulos <ailiopoulos@suse.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Based on text from Documentation/filesystems/ramfs-rootfs-initramfs.txt.
Signed-off-by: Elvira Khabirova <lineprinter@altlinux.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>