Commit Graph

22796 Commits

Author SHA1 Message Date
Michael Kerrisk d1774d6af8 seccomp_unotify.2: Better handling of invalid target pathname
After some discussions with Jann Horn, perhaps a better way of
dealing with an invalid target pathname is to trigger an
error for the system call.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk 47056412d7 seccomp_unotify.2: EXAMPLE: rename a variable
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk 2f37aeb620 seccomp_unotify.2: EXAMPLE: Improve allocation of response buffer
From a conversation with Jann Horn:

[[
>>>>            struct seccomp_notif_resp *resp = malloc(sizes.seccomp_notif_resp);
>>>
>>> This should probably do something like max(sizes.seccomp_notif_resp,
>>> sizeof(struct seccomp_notif_resp)) in case the program was built
>>> against new UAPI headers that make struct seccomp_notif_resp big, but
>>> is running under an old kernel where that struct is still smaller?
>>
>> I'm confused. Why? I mean, if the running kernel says that it expects
>> a buffer of a certain size, and we allocate a buffer of that size,
>> what's the problem?
>
> Because in userspace, we cast the result of malloc() to a "struct
> seccomp_notif_resp *". If the kernel tells us that it expects a size
> smaller than sizeof(struct seccomp_notif_resp), then we end up with a
> pointer to a struct that consists partly of allocated memory, partly
> of out-of-bounds memory, which is generally a bad idea - I'm not sure
> whether the C standard permits that. And if userspace then e.g.
> decides to access some member of that struct that is beyond what the
> kernel thinks is the struct size, we get actual OOB memory accesses.
Got it. (But gosh, this seems like a fragile API mess.)

I added the following to the code:

    /* When allocating the response buffer, we must allow for the fact
       that the user-space binary may have been built with user-space
       headers where 'struct seccomp_notif_resp' is bigger than the
       response buffer expected by the (older) kernel. Therefore, we
       allocate a buffer that is the maximum of the two sizes. This
       ensures that if the supervisor places bytes into the response
       structure that are past the response size that the kernel expects,
       then the supervisor is not touching an invalid memory location. */

    size_t resp_size = sizes.seccomp_notif_resp;
    if (sizeof(struct seccomp_notif_resp) > resp_size)
        resp_size = sizeof(struct seccomp_notif_resp);

    struct seccomp_notif_resp *resp = malloc(resp_size);
]]

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk bf892a6527 seccomp_unotify.2: EXAMPLE: ensure path read() by the supervisor is null-terminated
From a conversation with Jann Horn:

    >> We should probably make sure here that the value we read is actually
    >> NUL-terminated?
    >
    > So, I was curious about that point also. But, (why) are we not
    > guaranteed that it will be NUL-terminated?

    Because it's random memory filled by another process, which we don't
    necessarily trust. While seccomp notifiers aren't usable for applying
    *extra* security restrictions, the supervisor will still often be more
    privileged than the supervised process.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk e4db7ae69d seccomp_unotify.2: wfix in example program
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk 5c12cebdf2 seccomp_unotify.2: Small wording fix
Change "read(2) will return 0" to "read(2) may return 0".

Quoting Jann Horn:

    Maybe make that "may return 0" instead of "will return 0" -
    reading from /proc/$pid/mem can only return 0 in the
    following cases AFAICS:

    1. task->mm was already gone at open() time
    2. mm->mm_users has dropped to zero (the mm only has lazytlb
       users; page tables and VMAs are being blown away or have
       been blown away)
    3. the syscall was called with length 0

    When a process has gone away, normally mm->mm_users will
    drop to zero, but someone else could theoretically still be
    holding a reference to the mm (e.g. someone else in the
    middle of accessing /proc/$pid/mem).  (Such references
    should normally not be very long-lived though.)

    Additionally, in the unlikely case that the OOM killer just
    chomped through the page tables of the target process, I
    think the read will return -EIO (same error as if the
    address was simply unmapped) if the address is within a
    non-shared mapping. (Maybe that's something procfs could do
    better...)

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk e06808b4b1 seccomp_unotify.2: Minor wording change + add a FIXME
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk bcfeed7d4e seccomp_unotify.2: User-space notification can't be used to implement security policy
Add some strongly worded text warning the reader about the correct
uses of seccomp user-space notification.

Reported-by: Jann Horn <jannh@google.com>
Cowritten-by: Christian Brauner <christian@brauner.io>
Cowritten-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk 03e4237409 seccomp_unotify.2: Fixes after review comments from Christian Brauner
Reported-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk fd376c6b2a seccomp.2, seccomp_unotify.2: Clarify that there can be only one SECCOMP_FILTER_FLAG_NEW_LISTENER
Reported-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk cd3224b7df seccomp_unotify.2: Note when FD indicates EOF/(E)POLLHUP in (e)poll/select
Verified by experiment.

Reported-by: Christian Brauner <christian.brauner@canonical.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk 6048506c77 seccomp_unotify.2: Note when notification FD indicates as writable by select/poll/epoll
Reported-by: Tycho Andersen <tycho@tycho.pizza>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk ea4d03e6b0 seccomp_unotify.2: Minor fixes
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk a08715b41e seccomp_unotify.2: Fixes after review comments by Jann Horn
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk d85217eff7 seccomp_unotify.2: Add BUGS section describing SECCOMP_IOCTL_NOTIF_RECV bug
Tycho Andersen confirmed that this issue is present.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk 72a8602617 seccomp_unotify.2: srcfix: remove bogus FIXME
Pathname arguments are limited to PATH_MAX bytes.

Reported-by: Tycho Andersen <tycho@tycho.pizza>
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk 391194cd52 seccomp_unotify.2: Changes after feed back from Tycho Andersen
Reported-by: Tycho Andersen <tycho@tycho.pizza>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk a9a8e35644 seccomp_unotify.2: Document the seccomp user-space notification mechanism
The APIs used by this mechanism comprise not only seccomp(2), but
also a number of ioctl(2) operations. And any useful example
demonstrating these APIs is will necessarily be rather long.
Trying to cram all of this into the seccomp(2) page would make
that page unmanageably long. Therefore, let's document this
mechanism in a separate page.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk 0a86ac9c9b seccomp.2: Note that SECCOMP_RET_USER_NOTIF can be overridden
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk 8459e46597 seccomp.2: wfix: mention term "supervisor" in description of SECCOMP_RET_USER_NOTIF
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk 1741f7fc2e seccomp.2: SEE ALSO: add seccomp_unotify(2)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk 2bbe9bd9ae seccomp.2: Rework SECCOMP_GET_NOTIF_SIZES somewhat
The existing text says the structures (plural!) contain a 'struct
seccomp_data'. But this is only true for the received notification
structure (seccomp_notif). So, reword the sentence to be more
general, noting simply that the structures may evolve over time.

Add some comments to the structure definition.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk b723c6d8dd seccomp.2: Add some details for SECCOMP_FILTER_FLAG_NEW_LISTENER
Rework the description a little, and note that the close-on-exec
flag is set for the returned file descriptor.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:17 +12:00
Michael Kerrisk d7a3918456 seccomp.2: Minor edits to Tycho's SECCOMP_FILTER_FLAG_NEW_LISTENER patch
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:16 +12:00
Tycho Andersen b9395f4a3e seccomp.2: Document SECCOMP_FILTER_FLAG_NEW_LISTENER
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:16 +12:00
Michael Kerrisk 8fa47f3ae4 seccomp.2: Reorder list of SECCOMP_SET_MODE_FILTER flags alphabetically
(No content changes.)

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:16 +12:00
Michael Kerrisk 3bed246e7e seccomp.2: Some reworking of Tycho's SECCOMP_RET_USER_NOTIF patch
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:16 +12:00
Tycho Andersen c734bbd265 seccomp.2: Document SECCOMP_RET_USER_NOTIF
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:16 +12:00
Michael Kerrisk 6fc8b8a0a1 seccomp.2: Minor edits to Tycho Andersen's patch
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:16 +12:00
Tycho Andersen 9bc48145a6 seccomp.2: Document SECCOMP_GET_NOTIF_SIZES
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
CC: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:40:16 +12:00
Michael Kerrisk 408483bd31 socketcall.2: srcfix
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:37:46 +12:00
Alejandro Colomar b6687e3971 socketcall.2: Use syscall(SYS_...); for system calls without a wrapper
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:37:46 +12:00
Alejandro Colomar 1b4d275a0e sigprocmask.2: Use syscall(SYS_...); for raw system calls
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:37:46 +12:00
Alejandro Colomar aa03a4e732 shmop.2: Remove unused include
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:37:46 +12:00
Alejandro Colomar 1cd36d9dea sgetmask.2: Use syscall(SYS_...); for system calls without a wrapper
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:37:46 +12:00
Alejandro Colomar 18e21e1e4c set_tid_address.2: Use syscall(SYS_...); for system calls without a wrapper
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:37:46 +12:00
Alejandro Colomar ba4d34a16d set_thread_area.2: Use syscall(SYS_...); for system calls without a wrapper
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:37:46 +12:00
Alejandro Colomar 9202a1eb8e rt_sigqueueinfo.2: Use syscall(SYS_...); for system calls without a wrapper
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:37:46 +12:00
Alejandro Colomar 5e9623f3b9 open.2: Remove unused <sys/stat.h>
I can't see a reason to include it.  <fcntl.h> provides O_*
constants for 'flags', S_* constants for 'mode', and mode_t.

Probably a long time ago, some of those weren't defined in
<fcntl.h>, and both headers needed to be included, or maybe it's
a historical error.

Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:37:46 +12:00
Michael Kerrisk 0ba6b2966c system_data_types.7: Minor enhancement of description of mode_t
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:37:46 +12:00
Alejandro Colomar 0ace616cf8 mode_t.3: New link to system_data_types(7)
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:37:46 +12:00
Alejandro Colomar e0b6220511 system_data_types.7: Add 'mode_t'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:37:46 +12:00
Alejandro Colomar 6c2508dc6f blksize_t.3: New link to system_data_types(7)
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:37:46 +12:00
Alejandro Colomar 111ad1edd5 system_data_types.7: Add 'blksize_t'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:37:46 +12:00
Alejandro Colomar acb5994605 cc_t.3: New link to system_data_types(7)
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:37:46 +12:00
Alejandro Colomar f71cb14dcb system_data_types.7: Add 'cc_t'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:37:46 +12:00
Alejandro Colomar d9e9879139 blkcnt_t.3: New link to system_data_types(7)
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:37:46 +12:00
Alejandro Colomar 8d1df7f260 system_data_types.7: Add 'blkcnt_t'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:37:46 +12:00
dann frazier 9d39058523 kernel_lockdown.7: Remove additional text alluding to lifting via SysRq
My previous patch intended to drop the docs for the lockdown lift
SysRq, but it missed this other section that refers to lifting it
via a keyboard - an allusion to that same SysRq.

Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:37:17 +12:00
dann frazier a989677777 kernel_lockdown.7: Remove description of lifting via SysRq (not upstream)
The patch that implemented lockdown lifting via SysRq ended up
getting dropped[*] before the feature was merged upstream. Having
the feature documented but unsupported has caused some confusion
for our users.

[*] http://archive.lwn.net:8080/linux-kernel/CACdnJuuxAM06TcnczOA6NwxhnmQUeqqm3Ma8btukZpuCS+dOqg@mail.gmail.com/

Signed-off-by: dann frazier <dann.frazier@canonical.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Pedro Principeza <pedro.principeza@canonical.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Kyle McMartin <kyle@redhat.com>
Cc: Matthew Garrett <mjg59@google.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2021-06-10 10:33:07 +12:00