8199 Commits

Author SHA1 Message Date
Christian Brauner a17b9d28c3 clone.2: Mention that CLONE_PARENT is off-limits for inits
The CLONE_PARENT flag cannot be used by init processes. Let's mention
this in the manpages to prevent surprises.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-21 10:52:14 +01:00
Michael Kerrisk a10c5a33de clone.2: Note that CLONE_THREAD causes similar behavior to CLONE_PARENT
The introductory paragraphs note that "the calling process" is
normally synonymous with "the parent process", except in the
case of CLONE_PARENT. The same is also true of CLONE_THREAD.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-21 10:52:14 +01:00
Michael Kerrisk 324f6154f4 Removed trailing white space at end of lines 2019-11-19 15:31:20 +01:00
Michael Kerrisk a5409de92c clone.2, fallocate.2, ioctl_iflags.2, ioctl_list.2, pidfd_open.2, pivot_root.2, quotactl.2, seccomp.2, select.2, wait.2, proc.5, cgroups.7, netdevice.7, uts_namespaces.7: tstamp
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-19 15:31:20 +01:00
Christian Brauner be66dbc7a7 clone.2: Use pid_t for clone3() {child,parent}_tid
Advertise to userspace that they should use proper pid_t types
for arguments returning a pid.

The kernel-internal struct kernel_clone_args currently uses int
as the type. Since POSIX mandates that pid_t is a signed integer
type, and glibc and friends use int, this is not an issue. After
the merge window for v5.5 closes we can switch struct
kernel_clone_args over to using pid_t as well without any danger
of regressing current userspace.

Also note that the new set tid feature which will be merged for
v5.5 uses pid_t types as well.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-17 18:58:24 +01:00
Christian Brauner 8eea66b8bb clone.2: Check for MAP_FAILED not NULL on mmap()
If mmap() fails it will return MAP_FAILED which according to the manpage
is (void *)-1 not NULL.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-17 18:56:07 +01:00
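
The point of the commit above can be illustrated with a minimal sketch. The helper name failed_mmap_yields_map_failed() is hypothetical (it is not from clone.2), and the sketch assumes Linux, where a zero-length mapping is rejected with EINVAL:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: returns 1 if a deliberately failing mmap()
 * call reports the failure as MAP_FAILED, 0 otherwise. */
int failed_mmap_yields_map_failed(void)
{
    /* MAP_FAILED is (void *) -1, so it never compares equal to NULL. */
    assert(MAP_FAILED != NULL);

    /* Assumption: Linux rejects a zero-length mapping with EINVAL. */
    void *p = mmap(NULL, 0, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* The correct failure test compares against MAP_FAILED, not NULL. */
    if (p == MAP_FAILED)
        return errno == EINVAL;

    return 0;   /* Unexpected: the call did not fail */
}
```

A check written as "if (p == NULL)" would never fire here, which is the mistake the commit corrects.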
Christian Brauner 225f5da8ac clone.2: tfix
Fix two spelling mistakes in manpage describing the clone{2,3}()
syscalls/syscall wrappers.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-17 18:55:46 +01:00
Michael Kerrisk efc7fb935e mmap.2: tfix
Reported-by: Marko Myllynen <myllynen@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-16 23:35:14 +01:00
Michael Kerrisk 91243dad42 mmap.2: Some rewording of the description of MAP_STACK
Reword a little to allow for the fact that there are now
*two* reasons to consider using this flag.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-14 22:24:52 +01:00
Michael Kerrisk d3d881232b mmap.2: Note that MAP_STACK exists on some other systems
As noted in man-pages commit 99c3a00027,
MAP_STACK exists on at least OpenBSD and FreeBSD.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-14 22:24:52 +01:00
Michael Kerrisk 1b54731692 pivot_root.2: EXAMPLE: allocate stack using mmap() MAP_STACK rather than malloc()
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-14 22:24:45 +01:00
Michael Kerrisk 99c3a00027 clone.2: Allocate child's stack using mmap(2) rather than malloc(3)
Christian Brauner suggested mmap(MAP_STACK), rather than
malloc(), as the canonical way of allocating a stack for the
child of clone(), and Jann Horn noted some reasons why:

    Not on Linux, but on OpenBSD, they do use MAP_STACK now
    AFAIK; this was announced here:
    <http://openbsd-archive.7691.n7.nabble.com/stack-register-checking-td338238.html>.
    Basically they periodically check whether the userspace
    stack pointer points into a MAP_STACK region, and if not,
    they kill the process. So even if it's a no-op on Linux, it
    might make sense to advise people to use the flag to improve
    portability? I'm not sure if that's something that belongs
    in Linux manpages.

    Another reason against malloc() is that when setting up
    thread stacks in proper, reliable software, you'll probably
    want to place a guard page (in other words, a 4K PROT_NONE
    VMA) at the bottom of the stack to reliably catch stack
    overflows; and you probably don't want to do that with
    malloc, in particular with non-page-aligned allocations.

And the OpenBSD 6.5 manual page says:

    MAP_STACK
        Indicate that the mapping is used as a stack. This
        flag must be used in combination with MAP_ANON and
        MAP_PRIVATE.

And I then noticed that MAP_STACK seems to have been available on
FreeBSD for a long time:

    MAP_STACK
        Map the area as a stack.  MAP_ANON is implied.
        Offset should be 0, fd must be -1, and prot should
        include at least PROT_READ and PROT_WRITE.  This
        option creates a memory region that grows to at
        most len bytes in size, starting from the stack
        top and growing down.  The stack top is the starting
        address returned by the call, plus len bytes.
        The bottom of the stack at maximum growth is the
        starting address returned by the call.

        The entire area is reserved from the point of view
        of other mmap() calls, even if not faulted in yet.

Reported-by: Jann Horn <jannh@google.com>
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-14 12:19:21 +01:00
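
The approach discussed in this commit might look roughly like the following sketch. It is not the example from clone.2: the names run_child_on_mmap_stack(), child_fn(), and STACK_SIZE are hypothetical, and the guard page follows Jann Horn's suggestion rather than anything the manual page is known to do.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)    /* Hypothetical stack size */

/* Child entry point: its return value becomes the exit status. */
static int child_fn(void *arg)
{
    return *(int *) arg;
}

/* Hypothetical helper: run child_fn() on an mmap()ed stack and
 * return the child's exit status, or -1 on error. */
int run_child_on_mmap_stack(int exit_code)
{
    assert(exit_code >= 0 && exit_code <= 255);

    /* Page-aligned anonymous mapping; MAP_STACK is a no-op on Linux
     * but documents intent and helps portability. */
    char *stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    /* Guard page at the bottom of the stack to catch overflows. */
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || mprotect(stack, (size_t) page, PROT_NONE) == -1) {
        munmap(stack, STACK_SIZE);
        return -1;
    }

    /* The stack grows downward, so pass the top of the mapping. */
    int result = -1;
    int status;
    pid_t pid = clone(child_fn, stack + STACK_SIZE, SIGCHLD, &exit_code);
    if (pid != -1 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);

    munmap(stack, STACK_SIZE);
    return result;
}
```

Because mmap() returns page-aligned memory, the mprotect() call on the lowest page is well defined, which is one of the reasons given above for preferring mmap() over malloc().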
Michael Kerrisk 8dd6b0bcd2 clone.2: Minor tweaks after feedback from Christian Brauner
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-10 20:39:17 +01:00
Jakub Wilk edf93e146d clone.2: tfix
Remove duplicated word.

Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-09 12:54:41 +01:00
Michael Kerrisk baa435c66c clone.2: Tidy up the description of CLONE_DETACHED
The obsolete CLONE_DETACHED flag has never been properly
documented, but now the discussion of CLONE_PIDFD also requires
mention of CLONE_DETACHED. So, properly document CLONE_DETACHED,
and mention its interactions with CLONE_PIDFD.

Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-09 09:09:18 +01:00
Michael Kerrisk f6183e5b21 clone.2: tfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-09 09:09:18 +01:00
Michael Kerrisk 981eda4aa5 clone.2: Consistently order paragraphs for CLONE_NEW* flags
Sometimes the descriptions of these flags mentioned the
corresponding section 7 namespace manual page and then the
required capabilities, and sometimes the order was the
reverse. Make it consistent.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-09 09:09:18 +01:00
Michael Kerrisk d2799a466c clone.2: Remove various details that are already covered in namespaces pages
Remove details of UTS, IPC, and network namespaces that are
already covered in the corresponding namespaces pages in
section 7. This change is for consistency, since corresponding
details were not provided for other namespace types in clone(2)
and these details do not appear in unshare(2).

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-09 09:09:18 +01:00
Michael Kerrisk 1270276bc3 clone.2: Remove wording that suggests CLONE_NEW* flags are for containers
These flags are used for implementing many other interesting
things by now.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-09 09:09:18 +01:00
Michael Kerrisk f5d5180f5c clone.2: Adjustments to clone3() text as well as some other details in the page
After feedback from Christian Brauner [1], I've adjusted a few pieces
of the clone3() text, and also adjusted some of the older text in
the page.

[1] https://lore.kernel.org/linux-man/20191107151941.dw4gtul5lrtax4se@wittgenstein/

Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-09 09:09:02 +01:00
Michael Kerrisk 1033756742 clone.2: Give the introductory paragraph a new coat of paint
Change the text in the introductory paragraph (which was written
20 years ago) to reflect the fact that clone*() does more things
nowadays.

Cowritten-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-08 16:32:38 +01:00
Michael Kerrisk 1373b98190 ioctl_iflags.2: Emphasize that FS_IOC_GETFLAGS and FS_IOC_SETFLAGS argument is 'int *'
Reported-by: Robert Edmonds <edmonds@debian.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-05 10:31:39 +01:00
Michael Kerrisk 556e715a8a ioctl_list.2: Add reference to ioctl(2) SEE ALSO section
The referenced section lists various pages that document ioctls.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-05 10:06:03 +01:00
Andrew Price 3bf86e7d53 fallocate.2: Add gfs2 to the list of punch hole-capable filesystems
Also remove a stray " from the previous item.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-11-01 09:35:12 +01:00
Michael Kerrisk 75e28ebad4 clone.2: ffix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-31 14:22:13 +01:00
Michael Kerrisk 400027959e clone.2, proc.5: Adjust references to namespaces(7)
Adjust references to namespaces(7) to be references to pages
describing specific namespace types.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-31 11:34:08 +01:00
Michael Kerrisk 96e60ae500 quotactl.2: wfix: consistently use 'operation', rather than 'command'
A mix of the two words was being used, with 'operation' being
more common.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-31 07:02:10 +01:00
Michael Kerrisk a5394cba1c quotactl.2: tfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-31 07:02:10 +01:00
Michael Kerrisk f5fd82cc4e quotactl.2: Minor tweaks to Yang Xu's patch
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-31 07:02:10 +01:00
Yang Xu ae848b1d80 quotactl.2: Add some details about Q_QUOTAON
For Q_QUOTAON, on old kernels we can use quotacheck -ug to generate
quota files. On current kernels, we can also hide them in
system inodes and indicate this by using the "quota" or "project"
feature.

For user or group quota, we can do as below (e.g., on ext4):

mkfs.ext4 -F -O quota /dev/sda5
mount /dev/sda5 /mnt
quotactl(QCMD(Q_QUOTAON, USRQUOTA), /dev/sda5, QFMT_VFS_V0, NULL);

For project quota, we can do as below (e.g., on ext4):

mkfs.ext4 -F -O quota,project /dev/sda5
mount /dev/sda5 /mnt
quotactl(QCMD(Q_QUOTAON, PRJQUOTA), /dev/sda5, QFMT_VFS_V0, NULL);

Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-31 07:01:59 +01:00
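
The quotactl() calls quoted in this commit message are shorthand. As a minimal sketch in C, assuming the glibc <sys/quota.h> wrapper and a filesystem that keeps quota information in hidden system inodes, the call could be wrapped as follows. The helper name enable_hidden_quota() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/quota.h>

/* Hypothetical helper: turn on quotas of the given type (for example
 * USRQUOTA or GRPQUOTA) on a block device whose quota information is
 * stored in hidden system inodes. Returns 0 on success, or -1 with
 * errno set on failure. */
int enable_hidden_quota(const char *device, int type)
{
    assert(device != NULL);

    /* As in the commit message, no quota file path is supplied,
     * so the last argument is NULL. */
    return quotactl(QCMD(Q_QUOTAON, type), device, QFMT_VFS_V0, NULL);
}
```

A caller would typically need privilege (CAP_SYS_ADMIN) and a mounted filesystem, for example enable_hidden_quota("/dev/sda5", USRQUOTA) after the mkfs and mount steps shown above.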
Yang Xu 13a07cc485 copy_file_range.2: tfix
Signed-off-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-31 06:24:58 +01:00
Jakub Wilk a9e52b437f clone.2: Include clone3 in NAME section.
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-31 06:22:15 +01:00
Michael Kerrisk 4e4e9e83b6 pidfd_open.2: tfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-30 10:23:10 +01:00
Michael Kerrisk 462ce23d49 seccomp.2: Switch to "considerate language"
Thanks-to: https://twitter.com/expensivestevie
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-28 12:40:36 +01:00
Michael Kerrisk 16853a31ee clone.2: Introduce "flags mask" as a generic term for clone()/clone3()
Use "flags mask" as a generic term to refer to the clone()
'flags' argument and the clone3() 'cl_args.flags' field.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-25 21:51:04 +02:00
Michael Kerrisk 5261b0fe75 clone.2: Minor improvements following clone3() additions
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-25 21:20:38 +02:00
Michael Kerrisk fb1fa92b0a clone.2: srcfix: update copyright
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-25 20:25:53 +02:00
Michael Kerrisk d89d14246a clone3.2: New link to clone(2)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-25 16:32:33 +02:00
Michael Kerrisk faa0e55ae9 clone.2: Document clone3()
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-25 15:39:04 +02:00
Michael Kerrisk e2bf12346d clone.2: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-24 15:34:14 +02:00
Michael Kerrisk b7cf324fd8 clone.2: Minor change: move a paragraph from DESCRIPTION to NOTES
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-24 09:26:26 +02:00
Michael Kerrisk 5fbce8f22b clone.2: Add some subsection headings
Again, in preparation for adding clone3() documentation.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-24 09:25:14 +02:00
Michael Kerrisk 81c2368f46 clone.2: Rename arguments for consistency with clone3()
Sometime soon, we'll have to add documentation of clone3() to this
page. As a preparatory step, make the names of the clone()
arguments the same as the fields in the clone3() 'args' struct:

    ctid        ==> child_tid
    ptid        ==> parent_tid
    newtls      ==> tls
    child_stack ==> stack

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-24 09:13:44 +02:00
Michael Kerrisk f83eb6bf0d mount.2, pidfd_open.2, fuse.4: Minor fix: s/file system/filesystem/
Reported-by: Marko Myllynen <myllynen@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-13 21:04:32 +02:00
Michael Kerrisk ff8ddd1121 wait.2: Clarify semantics of waitpid(0, ...)
As noted in kernel commit 821cc7b0b205c0df64cce59aacc330af251fa8f7,
threads create an ambiguity: what if the calling process's PGID
is changed by another thread while waitpid(0, ...) is blocked?
So, clarify that waitpid(0, ...) means wait for children whose
PGID matches the caller's PGID at the time of the call to
waitpid().

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-13 00:20:46 +02:00
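
A minimal sketch of the waitpid(0, ...) case described above: the child inherits the caller's PGID at fork() time, so it is eligible to be reaped by a wait on "any child in my process group". The helper name wait_for_group_child() is hypothetical, and the sketch assumes the caller has no other children.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with exit_code, reap
 * it with waitpid(0, ...), and return the observed exit status
 * (or -1 on error). */
int wait_for_group_child(int exit_code)
{
    assert(exit_code >= 0 && exit_code <= 255);

    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0)
        _exit(exit_code);   /* Child: same process group as the parent */

    /* pid == 0: wait for any child whose PGID matches the caller's
     * PGID at the time of this call. */
    int status;
    pid_t reaped = waitpid(0, &status, 0);
    if (reaped != child || !WIFEXITED(status))
        return -1;

    return WEXITSTATUS(status);
}
```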
Michael Kerrisk bc4a678613 wait.2: waitid() can be used to wait on children in the same process group as the caller
Since Linux 5.4, idtype == P_PGID && id == 0 can be used to wait
on children in the same process group as the caller.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-12 23:41:40 +02:00
Michael Kerrisk f3ea12fb84 wait.2: Add P_PIDFD for waiting on a child referred to by a PID file descriptor
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-12 23:41:40 +02:00
Michael Kerrisk 9e1b1cd286 pidfd_open.2: Note the waitid() use case for PID file descriptors
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-12 23:41:40 +02:00
Michael Kerrisk d069725512 pidfd_open.2: Minor fix: add some structure to text on use cases
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-12 23:41:40 +02:00
Michael Kerrisk ecefd5997f pidfd_open.2: Add a subsection header "Use cases for PID file descriptors"
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-12 23:41:35 +02:00
Michael Kerrisk ecf77dbc1c pidfd_open.2: Make it a little more explicit that CLONE_PIDFD returns a PID FD
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-12 23:39:53 +02:00
Michael Kerrisk 6c7331a414 pidfd_open.2: Minor wording improvement
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-12 23:39:53 +02:00
Michael Kerrisk 441632abcc pidfd_open.2: Remove a redundant sentence
clone() CLONE_PIDFD is already mentioned elsewhere in NOTES.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-12 23:39:46 +02:00
Michael Kerrisk 23c167c6a5 select.2: POLLIN_SET/POLLOUT_SET/POLLEX_SET are now defined in terms of EPOLL*
Since kernel commit a9a08845e9acbd224e4ee466f5c1275ed50054e8, the
equivalence between select() and poll()/epoll is defined in terms
of the EPOLL* constants, rather than the POLL* constants.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-12 13:49:27 +02:00
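
As a rough illustration of the equivalence mentioned above, the following sketch defines the three masks as they are described in select(2) and classifies an epoll event mask into the select() sets it would mark ready. The macro bodies are reproduced from memory of the manual page and should be treated as an assumption. The helper select_sets_for() and the SEL_* constants are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Assumed to mirror the kernel definitions quoted in select(2). */
#define POLLIN_SET  (EPOLLRDNORM | EPOLLRDBAND | EPOLLIN | \
                     EPOLLHUP | EPOLLERR)    /* Ready for reading */
#define POLLOUT_SET (EPOLLWRBAND | EPOLLWRNORM | EPOLLOUT | \
                     EPOLLERR)               /* Ready for writing */
#define POLLEX_SET  (EPOLLPRI)               /* Exceptional condition */

/* Hypothetical bit flags naming the three select() sets. */
enum { SEL_READ = 1, SEL_WRITE = 2, SEL_EXCEPT = 4 };

/* Hypothetical helper: which select() sets would report a file
 * descriptor as ready, given its epoll event mask? */
int select_sets_for(uint32_t events)
{
    int sets = 0;

    if (events & POLLIN_SET)
        sets |= SEL_READ;
    if (events & POLLOUT_SET)
        sets |= SEL_WRITE;
    if (events & POLLEX_SET)
        sets |= SEL_EXCEPT;

    return sets;
}
```

Under these definitions, an error condition (EPOLLERR) marks a descriptor as both readable and writable, while a hangup (EPOLLHUP) marks it as readable only.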
Michael Kerrisk 867c9b3408 localedef.1, close.2, copy_file_range.2, execve.2, get_robust_list.2, getdomainname.2, gethostname.2, inotify_add_watch.2, io_submit.2, ioctl_fideduperange.2, kcmp.2, kill.2, mmap.2, move_pages.2, perf_event_open.2, ptrace.2, rt_sigqueueinfo.2, sched_setaffinity.2, sched_setparam.2, setns.2, sigaction.2, signalfd.2, statx.2, syscall.2, syscalls.2, uname.2, write.2, errno.3, fexecve.3, getauxval.3, printf.3, pthread_mutex_consistent.3, pthread_mutexattr_init.3, pthread_mutexattr_setrobust.3, pthread_setcancelstate.3, regex.3, strtok.3, strtol.3, ttyname.3, smartpqi.4, core.5, resolv.conf.5, man-pages.7, mq_overview.7, operator.7, pthreads.7, signal-safety.7, sysvipc.7: Update timestamp
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-11 10:45:02 +02:00
Michael Kerrisk 20a3713221 pidfd_open.2: Further enhancements to fork() + pidfd_open() text
Christian noted that SA_NOCLDWAIT also matters in this scenario.

Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk b869edcbc9 pidfd_open.2: wfix
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 8f57d60f5c pidfd_open.2: tfix
Reported-by: Florian Weimer <fw@deneb.enyo.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 13ad736507 pidfd_open.2: Enhance the discussion of usage of fork() + pidfd_open()
After review comments from Christian and Daniel.

Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Reported-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 59341b5269 pidfd_open.2: Explain how pidfd_open() can be used with fork()
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 6059228b26 pidfd_send_signal.2: Minor wording improvement
Reported-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 6ff9a0d85e pidfd_send_signal.2: tfix
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk d651cc6015 pidfd_send_signal.2: Fixes after review comments from Florian Weimer
Reported-by: Florian Weimer <fw@deneb.enyo.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk f831492d2b pidfd_send_signal.2: wfix
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 1a5adccc69 pidfd_open.2: Add some missing errors
Reported-by: Florian Weimer <fw@deneb.enyo.de>
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 57a436eb4c pidfd_open.2: Improve description in example
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 4e547536bb pidfd_open.2: Add a comment on system call number in example code
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk ad0434de58 pidfd_open.2: read(2) of a PID file descriptor fails with EINVAL
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 9e9bf5383a pidfd_open.2: wfix
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 1523b08d3b pidfd_open.2: wfix
Reported-by: Florian Weimer <fw@deneb.enyo.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 465f610c4c pidfd_open.2: Add <sys/types.h> to SYNOPSIS
Reported-by: Florian Weimer <fw@deneb.enyo.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 4654d63c35 pidfd_send_signal.2: tfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 690fbab2ee pidfd_open.2: tfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk d883766832 clone.2: SEE ALSO: add pidfd_open(2)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 30d0d39a4f pidfd_open.2: Opening /proc/PID doesn't yield a pollable file descriptor
Thus, pidfd_open() is the preferred way of obtaining a PID
file descriptor.

Notes from a conversation with Christian Brauner:

[[
> A further question... We now have three ways of getting a
> process file descriptor [*]:
>
> open() of /proc/PID
> pidfd_open()
> clone()/clone3() with CLONE_PIDFD
>
> I thought the FD was supposed to be equivalent in all three cases.
> However, if I try (on kernel 5.3) poll() an FD returned by opening
> /proc/PID, poll() tells me POLLNVAL for the FD. Is that difference
> intentional? (I am guessing it is not.)

It's intentional.
The short answer is that /proc/<pid> is a convenience for sending
signals.
The longer answer is that this stems from a heavy debate about what a
process file descriptor was supposed to be and some people pushing for
at least being able to use /proc/<pid> dirfds while ignoring security
problems as soon as you're talking about returning those fds from
clone(); not to mention the additional problems discovered when trying
to implement this.
A "real" pidfd is one from CLONE_PIDFD or pidfd_open() and all features
such as exit notification, read, and other future extensions will only
be implemented on top of them.
As much as we'd have liked to get rid of two different file descriptor
types it doesn't hurt us much and is not that much different from what
we will e.g. see with fsinfo() in the new mount api which needs to work
on regular fds gotten via open()/openat() and mountfds gotten from
fsopen() and fspick(). The mountfds will also allow for advanced
operations that the other ones will not. There's even an argument to be
made that fds you will get from open()/openat() and openat2() are
different types since they have very different behavior; openat2()
returning fds that are not arbitrarily upgradable etc.
]]

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
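
The pollable behaviour of a "real" pidfd can be sketched as follows. This is not the example from pidfd_open.2: the helper name child_exit_is_pollable() is hypothetical, the raw system call is used on the assumption that the C library provides no wrapper, and the fallback number 434 is the system call number on most architectures.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434   /* System call number on most architectures */
#endif

/* Hypothetical helper: returns 1 if poll() reports POLLIN on a pidfd
 * once the child has terminated, 0 if it does not, and -1 if fork()
 * or pidfd_open() fails (for example, on a kernel older than 5.3). */
int child_exit_is_pollable(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0)
        _exit(0);            /* Child terminates immediately */

    int result = -1;
    int pidfd = (int) syscall(SYS_pidfd_open, child, 0);
    if (pidfd != -1) {
        /* A pidfd becomes readable when the process terminates. */
        struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
        int ready = poll(&pfd, 1, 5000);

        result = (ready == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
        close(pidfd);
    }

    waitpid(child, NULL, 0); /* Reap the child in all cases */
    return result;
}
```

By contrast, per the conversation quoted above, a file descriptor obtained by opening /proc/PID yields POLLNVAL when passed to poll().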
Michael Kerrisk 4b5f60c597 sigaction.2: SEE ALSO: add pidfd_send_signal(2)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 8c2bb83d9b rt_sigqueueinfo.2: SEE ALSO: add pidfd_send_signal(2)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 86314949ad kill.2: SEE ALSO: add pidfd_send_signal(2)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 9517cf56fc pidfd_send_signal.2: New page documenting pidfd_send_signal(2)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk f110d8350c pidfd_open.2: New page documenting pidfd_open(2)
Notes from a conversation on linux-man@ with Christian Brauner:

[[
> [*] By the way, going forward, can we call these things
> "process FDs", rather than "PID FDs"? The API names are what
> they are, and that's okay, but just as we have socket
> FDs that refer to sockets, directory FDs that refer to
> directories, and timer FDs that refer to timers, and so on,
> these are FDs that refer to *processes*, not "process IDs".
> It's a little thing, but I think the naming is better, and
> it's what I propose to use in the manual pages.

The naming was another debate and we ended with this compromise.
I would just clarify that a pidfd is a process file descriptor. I
wouldn't make too much of a deal of hiding the shortcut "pidfd".
People are already using it out there in the wild and it's never
proven a good idea to go against accepted practice.
]]

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 34a975f8ae clone.2: Refer to pidfd_open(2) for the purpose of PID file descriptors
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk b97cc7ae40 clone.2: Minor wording improvements
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 7d7dc1877f clone.2: Remove a CLONE_PIDFD detail that wasn't true in the final implementation
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk b4ebffb230 clone.2: The close-on-exec flag is set on the new FD returned by CLONE_PIDFD
In the kernel source (kernel/fork.c::copy_process()), there is:

        pidfile = anon_inode_getfile("[pidfd]", &pidfd_fops, pid,
                                      O_RDWR | O_CLOEXEC);

Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 0eec009fb3 clone.2: ffix (split a paragraph)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 4e98b07476 clone.2: Minor tweaks to Christian Brauner's patch
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 99f6c1d734 clone.2: srcfix: wrap source at sentence boundaries
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Christian Brauner 9f93898154 clone.2: Document CLONE_PIDFD
Add an entry for CLONE_PIDFD. This flag is available starting
with kernel 5.2. If specified, a process file descriptor
("pidfd") referring to the child process will be returned in
the ptid argument.

Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
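
A minimal sketch of CLONE_PIDFD with the glibc clone() wrapper, where the pidfd is returned through the parent_tid (formerly ptid) argument. The sketch also checks the close-on-exec flag on the returned descriptor. The names clone_pidfd_is_cloexec(), child_fn(), and STACK_SIZE are hypothetical, and the fallback value for CLONE_PIDFD is an assumption for C libraries whose headers predate the flag.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef CLONE_PIDFD
#define CLONE_PIDFD 0x00001000   /* Assumed value for older headers */
#endif

#define STACK_SIZE (64 * 1024)   /* Hypothetical stack size */

static int child_fn(void *arg)
{
    (void) arg;
    return 0;                    /* Child exits immediately */
}

/* Hypothetical helper: returns 1 if the pidfd produced by CLONE_PIDFD
 * has FD_CLOEXEC set, 0 if not, and -1 if no pidfd was obtained
 * (for example, on a kernel without CLONE_PIDFD support). */
int clone_pidfd_is_cloexec(void)
{
    char *stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    int result = -1;
    int pidfd = -1;              /* Filled in by the kernel on success */
    pid_t child = clone(child_fn, stack + STACK_SIZE,
                        CLONE_PIDFD | SIGCHLD, NULL, &pidfd);
    if (child != -1) {
        if (pidfd >= 0) {
            int fdflags = fcntl(pidfd, F_GETFD);

            result = (fdflags != -1 && (fdflags & FD_CLOEXEC)) ? 1 : 0;
            close(pidfd);
        }
        waitpid(child, NULL, 0); /* Reap the child */
    }

    munmap(stack, STACK_SIZE);
    return result;
}
```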
Michael Kerrisk a2dd6388a7 pivot_root.2: Update the copyright and license
After my rewriting, almost nothing of the original page remains,
so update the copyright. As the author, I'm relicensing to the
"verbatim" license most commonly used in man pages.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 875298005d pivot_root.2: Minor wording tweaks
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk ba4b07c30f pivot_root.2: Another couple of s/filesystem/mount/
This is consistent with some earlier changes suggested by
Eric Biederman.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 542175d8e4 pivot_root.2: Tweak text of an EINVAL error to correspond to DESCRIPTION
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 01c64c3b4b pivot_root.2: Relegate text about what pivot_root() may or may not do to NOTES
The text stating that "pivot_root() may or may not change the
current root and the current working directory of any processes
or threads which use the old root directory" was written 19 years
ago, before the system call itself was even finalized in the
kernel. The implementation has never changed, and it won't
change in the future, since that would cause user-space breakage.
The existence of that text in DESCRIPTION, followed by qualifying
text stating what the implementation actually does (and has always
done) makes for confusing reading. Therefore, relegate this text
to a historical note in NOTES (so that readers with long memories
can see why the manual page was changed) and rework the text in
DESCRIPTION accordingly.

Reported-by: Philipp Wendler <ml@philippwendler.de>
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Reported-by: Reid Priedhorsky <reidpr@lanl.gov>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 3db820fe18 pivot_root.2: Add a subsection header for the pivot_root(".", ".") discussion
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 97076c5a0b pivot_root.2: Minor change: relocate a paragraph in NOTES
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 0843016c9b pivot_root.2: s/root filesystem/root mount/
As suggested by Eric Biederman.

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 11:37:02 +02:00
Michael Kerrisk 666373fc08 pivot_root.2: Reword one of the restrictions on 'new_root'
As suggested by Eric Biederman.

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-09 23:26:54 +02:00
Michael Kerrisk 33313a260c pivot_root.2: Change "filesystem" to "mount" in various places
Quoting Eric:

    If we are going to be pedantic "filesystem" is really the
    wrong concept here.  The section about bind mount clarifies
    it, but I wonder if there is a better term.

    I think I would say: "new_root and put_old must not be on
    the same mount as the current root."

    I think using "mount" instead of "filesystem" keeps the
    concepts less confusing.

    As I am reading through this email and seeing text that is
    trying to be precise and clear then hitting the term
    "filesystem" is a bit jarring.  pivot_root doesn't care a
    thing for file systems.  pivot_root only cares about mounts.

    And by a "mount" I mean the thing that you get when you
    create a bind mount or you call mount normally.

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-09 12:14:35 +02:00
Michael Kerrisk 9f3af6b8c8 pivot_root.2: Simplify discussion of restrictions for 'new_root'
Philipp Wendler noted that the text on the restrictions for
'new_root' was slightly contradictory, and things could be
clarified and simplified by describing the restrictions on
'new_root' in one place.

Reported-by: Philipp Wendler <ml@philippwendler.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-09 09:40:15 +02:00
Michael Kerrisk b27d444f34 pivot_root.2: Remove an imprecision in description
Remove the text that suggests that pivot_root() changes the root
directory and CWD of processes that have their root directory and
CWD on the old root *filesystem*. Change "filesystem" to "directory".

Reported-by: Philipp Wendler <ml@philippwendler.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-09 09:40:09 +02:00