pidfd_open.2: Opening /proc/PID doesn't yield a pollable file descriptor

Thus, pidfd_open() is the preferred way of obtaining a PID
file descriptor.

Notes from a conversation with Christian Brauner:

[[
> A further question... We now have three ways of getting a
> process file descriptor [*]:
>
> open() of /proc/PID
> pidfd_open()
> clone()/clone3() with CLONE_PIDFD
>
> I thought the FD was supposed to be equivalent in all three cases.
> However, if I try (on kernel 5.3) poll() an FD returned by opening
> /proc/PID, poll() tells me POLLNVAL for the FD. Is that difference
> intentional? (I am guessing it is not.)

It's intentional.
The short answer is that /proc/<pid> is a convenience for sending
signals.
The longer answer is that this stems from a heavy debate about what a
process file descriptor was supposed to be and some people pushing for
at least being able to use /proc/<pid> dirfds while ignoring security
problems as soon as you're talking about returning those fds from
clone(); not to mention the additional problems discovered when trying
to implement this.
A "real" pidfd is one from CLONE_PIDFD or pidfd_open() and all features
such as exit notification, read, and other future extensions will only
be implemented on top of them.
As much as we'd have liked to get rid of two different file descriptor
types it doesn't hurt us much and is not that much different from what
we will e.g. see with fsinfo() in the new mount api which needs to work
on regular fds gotten via open()/openat() and mountfds gotten from
fsopen() and fspick(). The mountfds will also allow for advanced
operations that the other ones will not. There's even an argument to be
made that fds you will get from open()/openat() and openat2() are
different types since they have very different behavior; openat2()
returning fds that are not arbitrarily upgradable etc.
]]
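
The exit notification mentioned above can be observed with a short
C sketch. This is an illustration only, not code from the manual page:
the helper names pidfd_open_wrapper() and poll_pidfd() are made up
here, and the fallback syscall number 434 is an assumption that holds
on most architectures but not all. It assumes Linux 5.3 or later.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434  /* assumption: value used on most architectures */
#endif

/* Hypothetical thin wrapper: invoke the raw pidfd_open() system call. */
static int
pidfd_open_wrapper(pid_t pid, unsigned int flags)
{
    return (int) syscall(SYS_pidfd_open, pid, flags);
}

/*
 * Hypothetical helper: obtain a pidfd for 'pid' and poll it for up to
 * 'timeout_ms' milliseconds. Returns the revents mask reported by
 * poll() (POLLIN is set once the process has terminated), 0 on
 * timeout, or -1 with errno set if pidfd_open() or poll() failed.
 */
static int
poll_pidfd(pid_t pid, int timeout_ms)
{
    int pidfd = pidfd_open_wrapper(pid, 0);
    if (pidfd == -1)
        return -1;

    struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);

    int saved_errno = errno;    /* close() must not clobber errno */
    close(pidfd);
    errno = saved_errno;

    if (ready == -1)
        return -1;
    return (ready == 0) ? 0 : pfd.revents;
}
```

A caller would typically fork() a child, pass the child's PID to
poll_pidfd(), and then reap the child with waitpid(); a result with
POLLIN set means the child has terminated.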

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk 2019-09-23 10:00:19 +02:00
parent 4b5f60c597
commit 30d0d39a4f
1 changed files with 13 additions and 0 deletions

@@ -88,6 +88,19 @@ the file descriptor indicates as readable.
Note, however, that in the current implementation,
nothing can be read from the file descriptor.
.PP
The
.BR pidfd_open ()
system call is the preferred way of obtaining a PID file descriptor.
The alternative is to obtain a file descriptor by opening a
.I /proc/[pid]
directory.
However, the latter technique is possible only if the
.BR proc (5)
file system is mounted;
furthermore, the file descriptor obtained in this way is
.I not
pollable.
.PP
See also the discussion of the
.BR CLONE_PIDFD
flag in