Commit Graph

20306 Commits

Author SHA1 Message Date
Michael Kerrisk 30d0d39a4f pidfd_open.2: Opening /proc/PID doesn't yield a pollable file descriptor
Thus, pidfd_open() is the preferred way of obtaining a PID
file descriptor.

Notes from a conversation with Christian Brauner:

[[
> A further question... We now have three ways of getting a
> process file descriptor [*]:
>
> open() of /proc/PID
> pidfd_open()
> clone()/clone3() with CLONE_PIDFD
>
> I thought the FD was supposed to be equivalent in all three cases.
> However, if I try (on kernel 5.3) poll() an FD returned by opening
> /proc/PID, poll() tells me POLLNVAL for the FD. Is that difference
> intentional? (I am guessing it is not.)

It's intentional.
The short answer is that /proc/<pid> is a convenience for sending
signals.
The longer answer is that this stems from a heavy debate about what a
process file descriptor was supposed to be and some people pushing for
at least being able to use /proc/<pid> dirfds while ignoring security
problems as soon as you're talking about returning those fds from
clone(); not to mention the additional problems discovered when trying
to implementing this.
A "real" pidfd is one from CLONE_PIDFD or pidfd_open() and all features
such as exit notification, read, and other future extensions will only
be implemented on top of them.
As much as we'd have liked to get rid of two different file descriptor
types it doesn't hurt us much and is not that much different from what
we will e.g. see with fsinfo() in the new mount api which needs to work
on regular fds gotten via open()/openat() and mountfds gotten from
fsopen() and fspick(). The mountfds will also allow for advanced
operations that the other ones will not. There's even an argument to be
made that fds you will get from open()/openat() and openat2() are
different types since they have very different behavior; openat2()
returning fds that are non arbitrarily upgradable etc.
]]

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 4b5f60c597 sigaction.2: SEE ALSO: add pidfd_send_signal(2)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 8c2bb83d9b rt_sigqueueinfo.2: SEE ALSO: add pidfd_send_signal(2)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 86fd6bad0a signal.7: SEE ALSO: add pidfd_send_signal(2)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 86314949ad kill.2: SEE ALSO: add pidfd_send_signal(2)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 9517cf56fc pidfd_send_signal.2: New page documenting pidfd_send_signal(2)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk f110d8350c pidfd_open.2: New page documenting pidfd_open(2)
Notes from a conversation on linux-man@ with Christian Brauner:

[[
> [*} By the way, going forward, can we call these things
> "process FDs", rather than "PID FDs"? The API names are what
> they are, an that's okay, but these just as we have socket
> FDs that refer to sockets, directory FDs that refer to
> directories, and timer FDs that refer to timers, and so on,
> these are FDs that refer to *processes*, not "process IDs".
> It's a little thing, but I think the naming better, and
> it's what I propose to use in the manual pages.

The naming was another debate and we ended with this compromise.
I would just clarify that a pidfd is a process file descriptor. I
wouldn't make too much of a deal of hiding the shortcut "pidfd".
People are already using it out there in the wild and it's never
proven a good idea to go against accepted practice.
]]

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 34a975f8ae clone.2: Refer to pidfd_open(2) for the purpose of PID file descriptors
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk b97cc7ae40 clone.2: Minor wording improvements
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 7d7dc1877f clone.2: Remove a CLONE_PIDFD detail that wasn't true in the final implementation
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk b4ebffb230 clone.2: The close-on-exec flag is set on the new FD returned by CLONE_PIDFD
In the kernel source (kernel/fork.c::copy_process()), there is:

        pidfile = anon_inode_getfile("[pidfd]", &pidfd_fops, pid,
                                      O_RDWR | O_CLOEXEC);

Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 0eec009fb3 clone.2: ffix (split a paragraph)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 4e98b07476 clone.2: Minor tweaks to Christian Brauner's patch
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 99f6c1d734 clone.2: srcfix: wrap source at sentence boundaries
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Christian Brauner 9f93898154 clone.2: Document CLONE_PIDFD
Add an entry for CLONE_PIDFD. This flag is available starting
with kernel 5.2. If specified, a process file descriptor
("pidfd") referring to the child process will be returned in
the ptid argument.

Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk a2dd6388a7 pivot_root.2: Update the copyright and license
After my rewriting, almost nothing of the original page remains,
so update the copyright. As the author, I'm relicensing to the
"verbatim" license most commonly used in man pages.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 875298005d pivot_root.2: Minor wording tweaks
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk ba4b07c30f pivot_root.2: Another couple of s/filesystem/mount/
This is consistent with some earlier changes suggested by
Eric Biederman.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 542175d8e4 pivot_root.2: Tweak text of an EINVAL error to correspond to DESCRIPTION
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 01c64c3b4b pivot_root.2: Relegate text about what pivot_root() may or may not do to NOTES
The text stating that "pivot_root() may or may not change the
current root and the current working directory of any processes
or threads which use the old root directory" was written 19 years
ago, before the system call itself was even finalized in the
kernel. The implementation has never changed, and it won't
change in the future, since that would cause user-space breakage.
The existence of that text in DESCRIPTION, followed by qualifying
text stating what the implementation actually does (and has always
done) makes for confusing reading. Therefore, relegate this text
to a historical note in NOTES (so that readers with long memories
can see why the manual page was changed) and rework the text in
DESCRIPTION accordingly.

Reported-by: Philipp Wendler <ml@philippwendler.de>
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Reported-by: Reid Priedhorsky <reidpr@lanl.gov>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 3db820fe18 pivot_root.2: Add a subsection header for the pivot_root(".", ".") discussion
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 97076c5a0b pivot_root.2: Minor change: relocate a paragraph in NOTES
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 12:24:28 +02:00
Michael Kerrisk 0843016c9b pivot_root.2: s/root filesystem/root mount/
As suggested by Eric Biederman.

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-10 11:37:02 +02:00
Michael Kerrisk 666373fc08 pivot_root.2: Reword one of the restrictions on 'new_root'
A suggested by Eric Biederman

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-09 23:26:54 +02:00
Michael Kerrisk 33313a260c pivot_root.2: Change "filesystem" to "mount" in various places
Quoting Eric:

    If we are going to be pedantic "filesystem" is really the
    wrong concept here.  The section about bind mount clarifies
    it, but I wonder if there is a better term.

    I think I would say: "new_root and put_old must not be on
    the same mount as the current root."

    I think using "mount" instead of "filesystem" keeps the
    concepts less confusing.

    As I am reading through this email and seeing text that is
    trying to be precise and clear then hitting the term
    "filesystem" is a bit jarring.  pivot_root doesn't care a
    thing for file systems.  pivot_root only cares about mounts.

    And by a "mount" I mean the thing that you get when you
    create a bind mount or you call mount normally.

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-09 12:14:35 +02:00
Michael Kerrisk 9f3af6b8c8 pivot_root.2: Simplify discussion of restrictions for 'new_root'
Philipp Wendler noted that the text on the restrictions for
'new_root' was slightly contradictory, and things could be
clarified and simplified by describing the restrictions on
'new_root' in one place.

Reported-by: Philipp Wendler <ml@philippwendler.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-09 09:40:15 +02:00
Michael Kerrisk b27d444f34 pivot_root.2: Remove an imprecision in description
Remove the text that suggests that pivot_root() changes the root
directory and CWD of process that have directory and CWD on the
old root *filesystem*. Change "filesystem" to "directory".

Reported-by: Philipp Wendler <ml@philippwendler.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-09 09:40:09 +02:00
Michael Kerrisk ee81d7e418 namespaces.7: Include manual page references in the summary table of namespace types
Make the page more compact by removing the stub subsections that
list the manual pages for the namespace types. And while we're
here, add an explanation of the table columns.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-09 08:59:22 +02:00
Michael Kerrisk 4d75df3711 mount_namespaces.7: tfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-09 08:25:04 +02:00
Michael Kerrisk 19416046c5 mount_namespaces.7: Tweak discussion of "less privileged" mount namespace
Eric Biederman:

    I hate to nitpick, but I am going to say that when I read
    the text above the phrase "mount namespace of the process
    that created the new mount namespace" feels wrong.

    Either you use unshare(2) and the mount namespace of the
    process that created the mount namespace changes.

    Or you use clone(2) and you could argue it is the new child
    that created the mount namespace.

    Having a different mount namespace at the end of the
    creation operation feels like it makes your phrase confusing
    about what the starting mount namespace is.  I hate to use
    references that are ambiguous when things are changing.

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-08 23:30:55 +02:00
Michael Kerrisk 534755eed9 mount_namespaces.7: Explain how a namespace's mount point list is initialized
Provide a more detailed explanation of the initialization of
the mount point list in a new mount namespace.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-08 22:51:59 +02:00
Michael Kerrisk 47b69a37cf pivot_root.2: srcfix: FIXME
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-08 21:29:28 +02:00
Michael Kerrisk 8f2a9129e6 pivot_root.2: Remove the term 'old_root'
Reid noted a confusion between 'old_root' (my attempt at a
shorthand for the old root point) and 'put_old. Eliminate the
confusion by replacing the shorthand with "old root mount point".

Reported-by: Reid Priedhorsky <reidpr@lanl.gov>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-08 20:57:55 +02:00
Michael Kerrisk 459fe99546 mount.2: Describe the concept of "parent mounts"
Reported-by: Reid Priedhorsky <reidpr@lanl.gov>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-08 17:26:51 +02:00
Michael Kerrisk e0e0ba7d01 mount.2: tfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-08 17:23:20 +02:00
Michael Kerrisk dd858bfd5e mount.2: Rework the text on mount namespaces a little
Eliminate the term "Per-process namespaces" and add a reference
to mount_namespaces(7).

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-08 16:44:58 +02:00
Michael Kerrisk 5d3bcce72d mount.2: wfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-08 16:44:58 +02:00
Michael Kerrisk 632940d96d mount.2: NOTES: add subsection heading for /proc/[pid]/{mounts,mountinfo}
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-08 16:44:58 +02:00
Michael Kerrisk ed425459c5 mount_namespaces.7: tfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-08 16:26:15 +02:00
Michael Kerrisk a0c9733194 mount_namespaces.7: Clarify description of "less privileged" mount namespaces
The current text talks about "parent mount namespaces", but there
is no such concept. As confirmed by Eric Biederman, what is mean
here is "the mount namespace this mount namespace started as a
copy of". So, this change writes up Eric's description in a more
detailed way.

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-08 16:20:59 +02:00
Michael Kerrisk 93cc3b3827 pivot_root.2: Simplify pivot_root(".", ".") example
Eric Biederman notes that the change in commit f646ac88ef was
not strictly necessary for this example, since one of the already
documented requirements is that various mount points must not have
shared propagation, or else pivot_root() will fail. So, simplify
the example.

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-07 14:02:42 +03:00
Michael Kerrisk a2fc45a9f8 mount_namespaces.7: It may be desirable to disable propagation after creating a namespace
After creating a new mount namespace, it may be desirable to
disable mount propagation. Give the reader a more explicit
hint about this.

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-07 12:11:30 +03:00
Michael Kerrisk 0b6cf5d26e pthreads.7: Minor tweaks to Carlos O'Donell's patch
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-05 14:54:24 +03:00
Michael Kerrisk 50639a2a18 pthread_setcancelstate.3, pthreads.7: srcfix: wrap source lines at sentence boundaries
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-05 14:54:15 +03:00
Carlos O'Donell dbb01cbbdb pthread_setcancelstate.3, pthreads.7, signal-safety.7: Describe issues with cancellation points in signal handlers
In a recent conversation with Mathieu Desnoyers I was reminded
that we haven't written up anything about how deferred
cancellation and asynchronous signal handlers interact. Mathieu
ran into some of this behaviour and I promised to improve the
documentation in this area to point out the potential pitfall.

Thoughts?

8< --- 8< --- 8<
In pthread_setcancelstate.3, pthreads.7, and signal-safety.7 we
describe that if you have an asynchronous signal nesting over a
deferred cancellation region that any cancellation point in the
signal handler may trigger a cancellation that will behave
as-if it was an asynchronous cancellation. This asynchronous
cancellation may have unexpected effects on the consistency of
the application. Therefore care should be taken with asynchronous
signals and deferred cancellation.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-05 14:54:02 +03:00
Michael Kerrisk c6ed23c5da perf_event_open.2: SEE ALSO: add Documentation/admin-guide/perf-security.rst
Reported-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-05 11:30:42 +03:00
Michael Kerrisk 1ff5960b23 prctl.2: Clarify that PR_MCE_KILL_GET returns value via function result
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-02 07:20:45 +03:00
Michael Kerrisk 035a7bf179 prctl.2: wfix (for consistency)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-02 07:19:53 +03:00
Michael Kerrisk 7f5d84426c prctl.2: RETURN VALUE: add some missing entries
Note success return for PR_GET_SPECULATION_CTRL and PR_GET_FP_MODE.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-02 07:09:38 +03:00
Michael Kerrisk 1cea09b38b prctl.2: Clarify that PR_GET_SPECULATION_CTRL returns value as function result
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2019-10-02 06:56:50 +03:00