seccomp_unotify.2: A cookie check is also required after reading target's memory

Quoting Jann Horn:

[[
As discussed at
<https://lore.kernel.org/r/CAG48ez0m4Y24ZBZCh+Tf4ORMm9_q4n7VOzpGjwGF7_Fe8EQH=Q@mail.gmail.com>,
we need to re-check checkNotificationIdIsValid() after reading remote
memory but before using the read value in any way. Otherwise, the
syscall could in the meantime get interrupted by a signal handler, the
signal handler could return, and then the function that performed the
syscall could free() allocations or return (thereby freeing buffers on
the stack).

In essence, this pread() is (unavoidably) a potential use-after-free
read; and to make that not have any security impact, we need to check
whether UAF read occurred before using the read value. This should
probably be called out elsewhere in the manpage, too...

Now, of course, **reading** is the easy case. The difficult case is if
we have to **write** to the remote process... because then we can't
play games like that. If we write data to a freed pointer, we're
screwed, that's it. (And for somewhat unrelated bonus fun, consider
that /proc/$pid/mem is originally intended for process debugging,
including installing breakpoints, and will therefore happily write
over "readonly" private mappings, such as typical mappings of
executable code.)

So, uuuuh... I guess if anyone wants to actually write memory back to
the target process, we'd better come up with some dedicated API for
that, using an ioctl on the seccomp fd that magically freezes the
target process inside the syscall while writing to its memory, or
something like that? And until then, the manpage should have a big fat
warning that writing to the target's memory is simply not possible
(safely).
]]

and
<https://lore.kernel.org/r/CAG48ez0m4Y24ZBZCh+Tf4ORMm9_q4n7VOzpGjwGF7_Fe8EQH=Q@mail.gmail.com>:

[[
The second bit of trouble is that if the supervisor is so oblivious
that it doesn't realize that syscalls can be interrupted, it'll run
into other problems. Let's say the target process does something like
this:

int func(void) {
  char pathbuf[4096];
  sprintf(pathbuf, "/tmp/blah.%d", some_number);
  mount("foo", pathbuf, ...);
}

and mount() is handled with a notification. If the supervisor just
reads the path string and immediately passes it into the real mount()
syscall, something like this can happen:

target: starts mount()
target: receives signal, aborts mount()
target: runs signal handler, returns from signal handler
target: returns out of func()
supervisor: receives notification
supervisor: reads path from remote buffer
supervisor: calls mount()

but because the stack allocation has already been freed by the time
the supervisor reads it, the supervisor just reads random garbage, and
beautiful fireworks ensue.

So the supervisor *fundamentally* has to be written to expect that at
*any* time, the target can abandon a syscall. And every read of remote
memory has to be separated from uses of that remote memory by a
notification ID recheck.

And at that point, I think it's reasonable to expect the supervisor to
also be able to handle that a syscall can be aborted before the
notification is delivered.
]]

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This commit is contained in:
Michael Kerrisk 2020-10-29 19:41:22 +01:00
parent 8742c19c9f
commit a46a1879c5
1 changed files with 92 additions and 1 deletions

View File

@ -476,6 +476,10 @@ from the file descriptor may return 0, indicating end of file.)
.\" with the "struct pid", which is not reused, instead of the
.\" numeric PID.
.IP
See NOTES for a discussion of other cases where
.B SECCOMP_IOCTL_NOTIF_ID_VALID
checks must be performed.
.IP
On success (i.e., the notification ID is still valid),
this operation returns 0.
On failure (i.e., the notification ID is no longer valid),
@ -742,6 +746,80 @@ in order to continue a system call, the supervisor should be sure that
another security mechanism or the kernel itself will sufficiently block
the system call if its arguments are rewritten to something unsafe.
.\"
.SS Caveats regarding the use of /proc/[tid]/mem
The discussion above noted the need to use the
.BR SECCOMP_IOCTL_NOTIF_ID_VALID
.BR ioctl (2)
when opening the
.IR /proc/[tid]/mem
file of the target
to avoid the possibility of accessing the memory of the wrong process
in the event that the target terminates and its ID
is recycled by another (unrelated) thread.
However, the use of this
.BR ioctl (2)
operation is also necessary in other situations,
as explained in the following paragraphs.
.PP
Consider the following scenario, where the supervisor
tries to read the pathname argument of a target's blocked
.BR mount (2)
system call:
.IP \(bu 2
From one of its functions
.RI ( func() ),
the target calls
.BR mount (2),
which triggers a user-space notification and causes the target to block.
.IP \(bu
The supervisor receives the notification, opens
.IR /proc/[tid]/mem ,
and (successfully) performs the
.BR SECCOMP_IOCTL_NOTIF_ID_VALID
check.
.IP \(bu
The target receives a signal, which causes the
.BR mount (2)
to abort.
.IP \(bu
The signal handler executes in the target, and returns.
.IP \(bu
Upon return from the handler, the execution of
.I func()
resumes, and it returns (and perhaps other functions are called,
overwriting the memory that had been used for the stack frame of
.IR func() ).
.IP \(bu
Using the address provided in the notification information,
the supervisor reads from the target's memory location that used to
contain the pathname.
.IP \(bu
The supervisor now calls
.BR mount (2)
with some arbitrary bytes obtained in the previous step.
.PP
The conclusion from the above scenario is this:
since the target's blocked system call may be interrupted by a signal handler,
the supervisor must be written to expect that the
target may abandon its system call at
.B any
time;
in such an event, any information that the supervisor obtained from
the target's memory must be considered invalid.
.PP
To prevent such scenarios,
every read from the target's memory must be separated from use of
the bytes so obtained by a
.BR SECCOMP_IOCTL_NOTIF_ID_VALID
check.
In the above example, the check would be placed between the two final steps.
An example of such a check is shown in EXAMPLES.
.PP
Following on from the above, it should be clear that
a write by the supervisor into the target's memory can
.B never
be considered safe.
.\"
.SS Interaction with SA_RESTART signal handlers
Consider the following scenario:
.IP \(bu 2
@ -1383,7 +1461,20 @@ getTargetPathname(struct seccomp_notif *req, int notifyFd,
if (close(procMemFd) == \-1)
errExit("Supervisor: close\-/proc/PID/mem");
/* We have no guarantees about what was in the memory of the target
/* Once again check that the notification ID is still valid. The
case we are particularly concerned about here is that just
before we fetched the pathname, the target\(aqs blocked system
call was interrupted by a signal handler, and after the handler
returned, the target carried on execution (past the interrupted
system call). In that case, we have no guarantees about what we
are reading, since the target\(aqs memory may have been arbitrarily
changed by subsequent operations. */
if (!notificationIdIsValid(notifyFd, req\->id))
return false;
/* Even if the target\(aqs system call was not interrupted by a signal,
we have no guarantees about what was in the memory of the target
process. (The memory may have been modified by another thread, or
even by an external attacking process.) We therefore treat the
buffer returned by pread() as untrusted input. The buffer should