man-pages

Commit Graph

Author	SHA1	Message	Date
Michael Kerrisk	441632abcc	pidfd_open.2: Remove a redundant sentence clone() CLONE_PIDFD is already mentioned elsewhere in NOTES. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-12 23:39:46 +02:00
Michael Kerrisk	23c167c6a5	select.2: POLLIN_SET/POLLOUT_SET/POLLEX_SET are now defined in terms of EPOLL* Since kernel commit a9a08845e9acbd224e4ee466f5c1275ed50054e8, the equivalence between select() and poll()/epoll is defined in terms of the EPOLL* constants, rather than the POLL* constants. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-12 13:49:27 +02:00
Michael Kerrisk	867c9b3408	localedef.1, close.2, copy_file_range.2, execve.2, get_robust_list.2, getdomainname.2, gethostname.2, inotify_add_watch.2, io_submit.2, ioctl_fideduperange.2, kcmp.2, kill.2, mmap.2, move_pages.2, perf_event_open.2, ptrace.2, rt_sigqueueinfo.2, sched_setaffinity.2, sched_setparam.2, setns.2, sigaction.2, signalfd.2, statx.2, syscall.2, syscalls.2, uname.2, write.2, errno.3, fexecve.3, getauxval.3, printf.3, pthread_mutex_consistent.3, pthread_mutexattr_init.3, pthread_mutexattr_setrobust.3, pthread_setcancelstate.3, regex.3, strtok.3, strtol.3, ttyname.3, smartpqi.4, core.5, resolv.conf.5, man-pages.7, mq_overview.7, operator.7, pthreads.7, signal-safety.7, sysvipc.7: Update timestamp Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-11 10:45:02 +02:00
Michael Kerrisk	20a3713221	pidfd_open.2: Further enhancements to fork() + pidfd_open() text Christian noted that SA_NOCLDWAIT also matters in this scenario. Reported-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	b869edcbc9	pidfd_open.2: wfix Reported-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	8f57d60f5c	pidfd_open.2: tfix Reported-by: Florian Weimer <fw@deneb.enyo.de> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	13ad736507	pidfd_open.2: Enhance the discussion of usage of fork() + pidfd_open() After review comments from Christian and Daniel. Reported-by: Christian Brauner <christian.brauner@ubuntu.com> Reported-by: Daniel Colascione <dancol@google.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	59341b5269	pidfd_open.2: Explain how pidfd_open() can be used to with fork() Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	6059228b26	pidfd_send_signal.2: Minor wording improvement Reported-by: Daniel Colascione <dancol@google.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	6ff9a0d85e	pidfd_send_signal.2: tfix Reported-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	d651cc6015	pidfd_send_signal.2: Fixes after review comments from Florian Weimer Reported-by: Florian Weimer <fw@deneb.enyo.de> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	f831492d2b	pidfd_send_signal.2: wfix Reported-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	1a5adccc69	pidfd_open.2: Add some missing errors Reported-by: Florian Weimer <fw@deneb.enyo.de> Reported-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	57a436eb4c	pidfd_open.2: Improve description in example Reported-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	4e547536bb	pidfd_open.2: Add a comment on system call number in example code Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	ad0434de58	pidfd_open.2: read(2) of a PID file descriptor fails with EINVAL Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	9e9bf5383a	pidfd_open.2: wfix Reported-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	1523b08d3b	pidfd_open.2: wfix Reported-by: Florian Weimer <fw@deneb.enyo.de> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	465f610c4c	pidfd_open.2: Add <sys/types.h> to SYNOPSIS Reported-by: Florian Weimer <fw@deneb.enyo.de> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	4654d63c35	pidfd_send_signal.2: tfix Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	690fbab2ee	pidfd_open.2: tfix Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	d883766832	clone.2: SEE ALSO: add pidfd_open(2) Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	30d0d39a4f	pidfd_open.2: Opening /proc/PID doesn't yield a pollable file descriptor Thus, pidfd_open() is the preferred way of obtaining a PID file descriptor. Notes from a conversation with Christian Brauner: [[ > A further question... We now have three ways of getting a > process file descriptor [*]: > > open() of /proc/PID > pidfd_open() > clone()/clone3() with CLONE_PIDFD > > I thought the FD was supposed to be equivalent in all three cases. > However, if I try (on kernel 5.3) poll() an FD returned by opening > /proc/PID, poll() tells me POLLNVAL for the FD. Is that difference > intentional? (I am guessing it is not.) It's intentional. The short answer is that /proc/<pid> is a convenience for sending signals. The longer answer is that this stems from a heavy debate about what a process file descriptor was supposed to be and some people pushing for at least being able to use /proc/<pid> dirfds while ignoring security problems as soon as you're talking about returning those fds from clone(); not to mention the additional problems discovered when trying to implementing this. A "real" pidfd is one from CLONE_PIDFD or pidfd_open() and all features such as exit notification, read, and other future extensions will only be implemented on top of them. As much as we'd have liked to get rid of two different file descriptor types it doesn't hurt us much and is not that much different from what we will e.g. see with fsinfo() in the new mount api which needs to work on regular fds gotten via open()/openat() and mountfds gotten from fsopen() and fspick(). The mountfds will also allow for advanced operations that the other ones will not. There's even an argument to be made that fds you will get from open()/openat() and openat2() are different types since they have very different behavior; openat2() returning fds that are non arbitrarily upgradable etc. ]] Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	4b5f60c597	sigaction.2: SEE ALSO: add pidfd_send_signal(2) Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	8c2bb83d9b	rt_sigqueueinfo.2: SEE ALSO: add pidfd_send_signal(2) Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	86314949ad	kill.2: SEE ALSO: add pidfd_send_signal(2) Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	9517cf56fc	pidfd_send_signal.2: New page documenting pidfd_send_signal(2) Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	f110d8350c	pidfd_open.2: New page documenting pidfd_open(2) Notes from a conversation on linux-man@ with Christian Brauner: [[ > [} By the way, going forward, can we call these things > "process FDs", rather than "PID FDs"? The API names are what > they are, an that's okay, but these just as we have socket > FDs that refer to sockets, directory FDs that refer to > directories, and timer FDs that refer to timers, and so on, > these are FDs that refer to processes*, not "process IDs". > It's a little thing, but I think the naming better, and > it's what I propose to use in the manual pages. The naming was another debate and we ended with this compromise. I would just clarify that a pidfd is a process file descriptor. I wouldn't make too much of a deal of hiding the shortcut "pidfd". People are already using it out there in the wild and it's never proven a good idea to go against accepted practice. ]] Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	34a975f8ae	clone.2: Refer to pidfd_open(2) for the purpose of PID file descriptors Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	b97cc7ae40	clone.2: Minor wording improvements Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	7d7dc1877f	clone.2: Remove a CLONE_PIDFD detail that wasn't true in the final implementation Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	b4ebffb230	clone.2: The close-on-exec flag is set on the new FD returned by CLONE_PIDFD In the kernel source (kernel/fork.c::copy_process()), there is: pidfile = anon_inode_getfile("[pidfd]", &pidfd_fops, pid, O_RDWR | O_CLOEXEC); Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	0eec009fb3	clone.2: ffix (split a paragraph) Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	4e98b07476	clone.2: Minor tweaks to Christian Brauner's patch Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	99f6c1d734	clone.2: srcfix: wrap source at sentence boundaries Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Christian Brauner	9f93898154	clone.2: Document CLONE_PIDFD Add an entry for CLONE_PIDFD. This flag is available starting with kernel 5.2. If specified, a process file descriptor ("pidfd") referring to the child process will be returned in the ptid argument. Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	a2dd6388a7	pivot_root.2: Update the copyright and license After my rewriting, almost nothing of the original page remains, so update the copyright. As the author, I'm relicensing to the "verbatim" license most commonly used in man pages. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	875298005d	pivot_root.2: Minor wording tweaks Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	ba4b07c30f	pivot_root.2: Another couple of s/filesystem/mount/ This is consistent with some earlier changes suggested by Eric Biederman. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	542175d8e4	pivot_root.2: Tweak text of an EINVAL error to correspond to DESCRIPTION Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	01c64c3b4b	pivot_root.2: Relegate text about what pivot_root() may or may not do to NOTES The text stating that "pivot_root() may or may not change the current root and the current working directory of any processes or threads which use the old root directory" was written 19 years ago, before the system call itself was even finalized in the kernel. The implementation has never changed, and it won't change in the future, since that would cause user-space breakage. The existence of that text in DESCRIPTION, followed by qualifying text stating what the implementation actually does (and has always done) makes for confusing reading. Therefore, relegate this text to a historical note in NOTES (so that readers with long memories can see why the manual page was changed) and rework the text in DESCRIPTION accordingly. Reported-by: Philipp Wendler <ml@philippwendler.de> Reported-by: Eric W. Biederman <ebiederm@xmission.com> Reported-by: Reid Priedhorsky <reidpr@lanl.gov> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	3db820fe18	pivot_root.2: Add a subsection header for the pivot_root(".", ".") discussion Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	97076c5a0b	pivot_root.2: Minor change: relocate a paragraph in NOTES Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 12:24:28 +02:00
Michael Kerrisk	0843016c9b	pivot_root.2: s/root filesystem/root mount/ As suggested by Eric Biederman. Reported-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-10 11:37:02 +02:00
Michael Kerrisk	666373fc08	pivot_root.2: Reword one of the restrictions on 'new_root' A suggested by Eric Biederman Reported-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-09 23:26:54 +02:00
Michael Kerrisk	33313a260c	pivot_root.2: Change "filesystem" to "mount" in various places Quoting Eric: If we are going to be pedantic "filesystem" is really the wrong concept here. The section about bind mount clarifies it, but I wonder if there is a better term. I think I would say: "new_root and put_old must not be on the same mount as the current root." I think using "mount" instead of "filesystem" keeps the concepts less confusing. As I am reading through this email and seeing text that is trying to be precise and clear then hitting the term "filesystem" is a bit jarring. pivot_root doesn't care a thing for file systems. pivot_root only cares about mounts. And by a "mount" I mean the thing that you get when you create a bind mount or you call mount normally. Reported-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-09 12:14:35 +02:00
Michael Kerrisk	9f3af6b8c8	pivot_root.2: Simplify discussion of restrictions for 'new_root' Philipp Wendler noted that the text on the restrictions for 'new_root' was slightly contradictory, and things could be clarified and simplified by describing the restrictions on 'new_root' in one place. Reported-by: Philipp Wendler <ml@philippwendler.de> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-09 09:40:15 +02:00
Michael Kerrisk	b27d444f34	pivot_root.2: Remove an imprecision in description Remove the text that suggests that pivot_root() changes the root directory and CWD of process that have directory and CWD on the old root filesystem. Change "filesystem" to "directory". Reported-by: Philipp Wendler <ml@philippwendler.de> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-09 09:40:09 +02:00
Michael Kerrisk	47b69a37cf	pivot_root.2: srcfix: FIXME Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-08 21:29:28 +02:00
Michael Kerrisk	8f2a9129e6	pivot_root.2: Remove the term 'old_root' Reid noted a confusion between 'old_root' (my attempt at a shorthand for the old root point) and 'put_old. Eliminate the confusion by replacing the shorthand with "old root mount point". Reported-by: Reid Priedhorsky <reidpr@lanl.gov> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-08 20:57:55 +02:00
Michael Kerrisk	459fe99546	mount.2: Describe the concept of "parent mounts" Reported-by: Reid Priedhorsky <reidpr@lanl.gov> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-08 17:26:51 +02:00
Michael Kerrisk	e0e0ba7d01	mount.2: tfix Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-08 17:23:20 +02:00
Michael Kerrisk	dd858bfd5e	mount.2: Rework the text on mount namespaces a little Eliminate the term "Per-process namespaces" and add a reference to mount_namespaces(7). Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-08 16:44:58 +02:00
Michael Kerrisk	5d3bcce72d	mount.2: wfix Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-08 16:44:58 +02:00
Michael Kerrisk	632940d96d	mount.2: NOTES: add subsection heading for /proc/[pid]/{mounts,mountinfo} Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-08 16:44:58 +02:00
Michael Kerrisk	93cc3b3827	pivot_root.2: Simplify pivot_root(".", ".") example Eric Biederman notes that the change in commit `f646ac88ef` was not strictly necessary for this example, since one of the already documented requirements is that various mount points must not have shared propagation, or else pivot_root() will fail. So, simplify the example. Reported-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-07 14:02:42 +03:00
Michael Kerrisk	c6ed23c5da	perf_event_open.2: SEE ALSO: add Documentation/admin-guide/perf-security.rst Reported-by: Alexey Budankov <alexey.budankov@linux.intel.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-05 11:30:42 +03:00
Michael Kerrisk	1ff5960b23	prctl.2: Clarify that PR_MCE_KILL_GET returns value via function result Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-02 07:20:45 +03:00
Michael Kerrisk	035a7bf179	prctl.2: wfix (for consistency) Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-02 07:19:53 +03:00
Michael Kerrisk	7f5d84426c	prctl.2: RETURN VALUE: add some missing entries Note success return for PR_GET_SPECULATION_CTRL and PR_GET_FP_MODE. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-02 07:09:38 +03:00
Michael Kerrisk	1cea09b38b	prctl.2: Clarify that PR_GET_SPECULATION_CTRL returns value as function result Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-02 06:56:50 +03:00
Michael Kerrisk	f1bb579885	prctl.2: grfix Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-02 06:11:02 +03:00
Michael Kerrisk	f1ba3ad272	prctl.2: wfix (for consistency with usage in rest of this page) Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-02 06:07:52 +03:00
Michael Kerrisk	3946602978	prctl.2: Clarify that PR_GET_FP_MODE returns value as function result Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-10-02 06:05:50 +03:00
Michael Kerrisk	27f942adbc	sched_setparam.2, pthread_mutexattr_init.3, pthread_mutexattr_setrobust.3, pthread_mutex_consistent.3, strtol.3, sched.7, uts_namespaces.7: SEE ALSO: correct list order Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-27 14:18:46 +02:00
Michael Kerrisk	549597a85f	close.2, execve.2, io_submit.2, prctl.2, write.2: Remove section number from references to function in its own page Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-27 14:18:46 +02:00
Michael Kerrisk	49a2a1052b	copy_file_range.2, fanotify_mark.2, inotify_add_watch.2, ioctl_fideduperange.2, kcmp.2, prctl.2, get_robust_list.2, tkill.2, ttyname.3: ERRORS: correct alphabetical order Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-27 14:18:08 +02:00
Amir Goldstein	88e75e2c56	copy_file_range.2: Kernel v5.3 updates Update with all the missing errors the syscall can return, the behaviour the syscall should have w.r.t. to copies within single files, etc. [Amir] updates for final released version. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-27 13:26:03 +02:00
Michael Kerrisk	4985364098	epoll_wait.2: tfix Reported-by: nilsocket <nilsocket@gmail.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-27 08:36:52 +02:00
Michael Kerrisk	362310a7bd	signalfd.2: tfix Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-25 23:20:08 +02:00
Jakub Wilk	bf421740d4	pivot_root.2: tfix Remove duplicated words. Signed-off-by: Jakub Wilk <jwilk@jwilk.net> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-25 20:41:48 +02:00
Michael Kerrisk	d703afe9a6	sched_setaffinity.2: RETURN VALUE: sched_getaffinity() syscall differs from the wrapper In RETURN VALUE, point reader at subsection noting that the return value of the raw sched_setaffinity() system call differs from the wrapper function in glibc. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-25 14:35:27 +02:00
Michael Kerrisk	f3fdbe2812	open.2: tfix Reported-by: Дилян Палаузов <dilyan.palauzov@aegee.org> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-24 12:16:07 +02:00
Michael Kerrisk	b892d64f4f	signalfd.2: Rewrite the text on epoll semantics I also verified the behavior reported by Andrew Clayton with the program below. $ ./epoll_signalfd PID of parent: 5661 PID of child: 5662 epoll_wait() returned 0 PID 5662: got signal 10 Successfully read signal, even though epoll_wait() didn't say FD was ready! 8x----8x----8x----8x----8x----8x----8x----8x----8x----8x----8x----8x---- /* epoll_signalfd.c */ #include <sys/signalfd.h> #include <signal.h> #include <sys/epoll.h> #include <sys/wait.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \ } while (0) static void signalTest(int sfd, int epfd) { struct signalfd_siginfo fdsi; struct epoll_event rev; int ready; ssize_t s; usleep(50000); ready = epoll_wait(epfd, &rev, 1, 0); if (ready == -1) errExit("epoll_wait"); printf("epoll_wait() returned %d\n", ready); s = read(sfd, &fdsi, sizeof(struct signalfd_siginfo)); if (s != sizeof(struct signalfd_siginfo)) errExit("read"); printf("PID %ld: got signal %d\n", (long) getpid(), fdsi.ssi_signo); if (ready == 0 && s > 0) printf("Successfully read signal, even though epoll_wait() " "didn't say FD was ready!\n"); } int main(int argc, char *argv[]) { struct epoll_event ev; sigset_t mask; int sfd, epfd; sigfillset(&mask); sigdelset(&mask, SIGINT); if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1) errExit("sigprocmask"); sfd = signalfd(-1, &mask, SFD_NONBLOCK); if (sfd == -1) errExit("signalfd"); epfd = epoll_create(5); if (epfd == -1) errExit("epoll_create"); ev.data.fd = sfd; ev.events = EPOLLIN; if (epoll_ctl(epfd, EPOLL_CTL_ADD, sfd, &ev) == -1) errExit("epoll_ctl"); switch (fork()) { case -1: errExit("fork"); case 0: printf("PID of child: %ld\n", (long) getpid()); raise(SIGUSR1); signalTest(sfd, epfd); break; default: printf("PID of parent: %ld\n", (long) getpid()); wait(NULL); break; } exit(EXIT_SUCCESS); } 8x----8x----8x----8x----8x----8x----8x----8x----8x----8x----8x----8x---- Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 16:48:36 +02:00
Andrew Clayton	e95f6bf482	signalfd.2: Note about interactions with epoll & fork Using signalfd(2) with epoll(7) and fork(2) can lead to some head scratching. It seems that when a signalfd file descriptor is added to epoll you will only get notifications for signals sent to the process that added the file descriptor to epoll. So if you have a signalfd fd registered with epoll and then call fork(2), perhaps by way of daemon(3) for example. Then you will find that you no longer get notifications for signals sent to the newly forked process. User kentonv on ycombinator[0] explained it thus "One place where the inconsistency gets weird is when you use signalfd with epoll. The epoll will flag events on the signalfd based on the process where the signalfd was registered with epoll, not the process where the epoll is being used. One case where this can be surprising is if you set up a signalfd and an epoll and then fork() for the purpose of daemonizing -- now you will find that your epoll mysteriously doesn't deliver any events for the signalfd despite the signalfd otherwise appearing to function as expected." And another post from the same person[1]. And then there is this snippet from this kernel commit message[2] "If you share epoll fd which contains our sigfd with another process you should blame yourself. signalfd is "really special"." So add a note to the man page that points this out where people will hopefully find it sooner rather than later! [0]: https://news.ycombinator.com/item?id=9564975 [1]: https://stackoverflow.com/questions/26701159/sending-signalfd-to-another-process/29751604#29751604 [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d80e731ecab420ddcb79ee9d0ac427acbc187b4b Signed-off-by: Andrew Clayton <andrew@digital-domain.net> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 15:57:21 +02:00
Michael Kerrisk	9d33e03b95	pivot_root.2: Explain why various mount points can't have shared propagation Reported-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	d4b2104ae5	pivot_root.2: Correct the list of mount points that can't be MS_SHARED Eric Biederman noted that my list of directories that could not have shared propagation was incorrect. I had written that new_root could not be shared; rather it should be: the parent of the current root mount point. Reported-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	f646ac88ef	pivot_root.2: Tweak pivot_root(".", ".") example Quoting Eric Biederman: The concern from our conversation at the container mini-summit was that there is a pathology if in your initial mount namespace all of the mounts are marked MS_SHARED like systemd does (and is almost necessary if you are going to use mount propagation), that if new_root itself is MS_SHARED then unmounting the old_root could propagate. So I believe the desired sequence is: >>> chdir(new_root); +++ mount("", ".", MS_SLAVE | MS_REC, NULL); >>> pivot_root(".", "."); >>> umount2(".", MNT_DETACH); The change to new new_root could be either MS_SLAVE or MS_PRIVATE. So long as it is not MS_SHARED the mount won't propagate back to the parent mount namespace. Reported-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	57bab66a92	pivot_root.2: pivot_root(".", ".") really is a thing LXC uses this [1]. I tested, to double-check, and it works. The fchdir() dance done by LXC is not needed though: fchdir(old_root); umount(".", MNT_DETACH); fchdir(new_root); As far as I can see, just the umount() is sufficient, since, after pivot_root(), oldi_root is at the top of the stack of mounts at "/" and thus (so long as CWD is at "/") the umount will remove the mount at the top of the stack. Eric Biederman confirmed my understanding by mail, and Philipp Wendler verified my results by experiment. [1] See the following commit in LXC: commit 2d489f9e87fa0cccd8a1762680a43eeff2fe1b6e Author: Serge Hallyn <serge.hallyn@ubuntu.com> Date: Sat Sep 20 03:15:44 2014 +0000 pivot_root: switch to a new mechanism (v2) Helped-by: Eric W. Biederman <ebiederm@xmission.com> Helped-by: Philipp Wendler <ml@philippwendler.de> Helped-by: Aleksa Sarai <asarai@suse.de> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	682e1329f9	pivot_root.2: Eliminate text suggesting that behavior may change in the future After around 19 years, the behavior of pivot_root() has not been changed, and will almost certainly not change in the future. So, reword to remove the suggestion that the behavior may change. Also, more clearly document the effect of pivot_root() on the calling process's current working directory. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	4a8b7d7b13	pivot_root.2: Rework a "hanging" description into an earlier paragraph The reference of "Note that this also applies" was vague. So combine this paragraph with an earlier one to make the linkage clearer. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	aff78c76f7	pivot_root.2: Remove a note about a historical idea/expectation The idea that there might one day be a mechanism for kernel threads to explicitly relinquish access to the filesystem never came to pass (after 20 years), and the presence of text describing this idea is, IMO, a distraction. So, remove it. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	c4bf33331b	pivot_root.2: ffix (break up a paragraph) Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	eb9078a7a9	pivot_root.2: Remove text describing case where current root is not a mount point One kernel printk() later, my suspicions seem confirmed: the text describing the situation where the current root is not a mount point (because of a chroot()) seems to be bogus. (Perhaps it was true once upon a time.) In my testing, if the current root is not a mount point, an EINVAL error results. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	fc17fc6502	pivot_root.2: srcfix: FIXME Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	d761305516	pivot_root.2: Fix a technical detail In this text: If the current root is not a mount point (e.g., after an earlier chroot(2) or pivot_root())... mention of pivot_root() makes no sense, since (as noted in an earlier commit message for this page) 'new_root' in a previous pivot_root() must (since Linux 2.4.5) have been a mount point. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	14caaed2c1	pivot_root.2: Minor change: rewrite the reference to pivot_root(8) Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	bbae63c580	pivot_root.2: Remove BUGS section One of these "bugs" is a philosophical point already covered elsewhere in the page, while the other is a somewhat obscure joke. Both pieces are a bit of a distraction, really. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	41d4557c09	pivot_root.2: Minor wording fix Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	fc2f474d77	pivot_root.2: Relocate details about kernel threads to NOTES This text is a side point that somewhat distracts from the flow in DESCRIPTION. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	b647c4c93a	pivot_root.2: Add some more detail to the remaining EBUSY error Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	071505e9fb	pivot_root.2: Remove bogus a bogus EBUSY error case The note that EBUSY is given if a filesystem is already mounted on 'Iput_old' was never really true. That restriction was in Linux 2.3.14, but removed in Linux 2.3.99-pre6 so it never made it to mainline. The relevant diff in pivot_root() was: error = -EBUSY; - if (d_new_root->d_sb == root->d_sb || d_put_old->d_sb == root->d_sb) + if (new_nd.mnt == root_mnt || old_nd.mnt == root_mnt) goto out2; /* loop */ - if (d_put_old != d_put_old->d_covers) - goto out2; /* mount point is busy */ error = -EINVAL; Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	2f2e1a2296	pivot_root.2: Add an example program Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	0c2329cdbe	pivot_root.2: Minor fix: add a reference to a relevant piece in NOTES Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	422e36b7f2	pivot_root.2: Relocate text on use cases and add text on purpose of pivot_root(2) Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	a94f69d6db	pivot_root.2: Rework the text on "future changes" to reflect that 20 years have passed Some of the text was written long ago, and hinted that things might change in the future. However, 20 years have passed and these details have not changed, so rework the text to hint at that fact. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	3afc97b20b	pivot_root.2: Mention containers as a use case for pivot_root() Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	0ac6f9008e	pivot_root.2: srcfix Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	b16dd3037d	pivot_root.2: There is no restriction against 'put_old' being a mount point As far as I can see from the source code, the statement that "No other filesystem may be mounted on 'put_old'" is incorrect. Even looking at the 2.4.0 source code, there I can't see such a restriction. In addition, some testing on a 5.0 kernel (mounting 'put_old' in the new mount namespace just before pivot_root()) did not result in an error for this case when calling pivot_root(). Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:19 +02:00
Michael Kerrisk	83cc245d6d	pivot_root.2: srcfix: add self to copyright Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>	2019-09-23 13:11:18 +02:00

8147 Commits