clone.2: Allocate child's stack using mmap(2) rather than malloc(3)

Christian Brauner suggested mmap(MAP_STACK), rather than
malloc(), as the canonical way of allocating a stack for the
child of clone(), and Jann Horn noted some reasons why:

    Not on Linux, but on OpenBSD, they do use MAP_STACK now
    AFAIK; this was announced here:
    <http://openbsd-archive.7691.n7.nabble.com/stack-register-checking-td338238.html>.
    Basically they periodically check whether the userspace
    stack pointer points into a MAP_STACK region, and if not,
    they kill the process. So even if it's a no-op on Linux, it
    might make sense to advise people to use the flag to improve
    portability? I'm not sure if that's something that belongs
    in Linux manpages.

    Another reason against malloc() is that when setting up
    thread stacks in proper, reliable software, you'll probably
    want to place a guard page (in other words, a 4K PROT_NONE
    VMA) at the bottom of the stack to reliably catch stack
    overflows; and you probably don't want to do that with
    malloc, in particular with non-page-aligned allocations.
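
The guard-page arrangement Jann describes can be sketched roughly as
follows. This is only an illustration, not code from the man page:
alloc_guarded_stack() is a hypothetical helper, and the choice of a
single guard page is an assumption. It relies on mmap(2) returning
page-aligned memory, which is what makes the mprotect(2) call valid.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not part of clone.2): map 'usable' bytes of
   stack plus one extra page below it that serves as a guard page.
   Returns the lowest usable address, or NULL on failure. */
char *
alloc_guarded_stack(size_t usable)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    char *base;

    /* Assumption: the caller asks for a whole number of pages */
    assert(usable > 0 && usable % page == 0);

    base = mmap(NULL, usable + page, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (base == MAP_FAILED)     /* mmap() signals failure with
                                   MAP_FAILED, not NULL */
        return NULL;

    /* The stack grows downward, so an overflow runs off the lowest
       address; make that page inaccessible so the overflow faults */
    if (mprotect(base, page, PROT_NONE) == -1) {
        munmap(base, usable + page);
        return NULL;
    }

    return base + page;         /* First byte above the guard page */
}
```

With this layout, the stack top to hand to clone() would be the
returned address plus 'usable', and the whole mapping (guard page
included) starts one page below the returned address.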

And the OpenBSD 6.5 manual page says:

    MAP_STACK
        Indicate that the mapping is used as a stack. This
        flag must be used in combination with MAP_ANON and
        MAP_PRIVATE.

I then noticed that MAP_STACK seems to have been available on
FreeBSD for a long time:

    MAP_STACK
        Map the area as a stack.  MAP_ANON is implied.
        Offset should be 0, fd must be -1, and prot should
        include at least PROT_READ and PROT_WRITE.  This
        option creates a memory region that grows to at
        most len bytes in size, starting from the stack
        top and growing down.  The stack top is the start‐
        ing address returned by the call, plus len bytes.
        The bottom of the stack at maximum growth is the
        starting address returned by the call.

        The entire area is reserved from the point of view
        of other mmap() calls, even if not faulted in yet.
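
The "starting address plus len" rule for the stack top is the same
arithmetic a clone() caller performs on Linux. A minimal sketch,
assuming a 1 MiB stack and a child that only returns an exit code;
run_child_on_mmap_stack() and child_fn() are hypothetical names used
for illustration. Note that mmap(2) reports failure by returning
MAP_FAILED, so the result is compared against that value rather than
against NULL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)    /* Assumed stack size for the child */

/* Hypothetical child function: exit with the code passed in 'arg' */
static int
child_fn(void *arg)
{
    return *(int *) arg;
}

/* Hypothetical helper: run child_fn() on an mmap()ed stack and
   return the child's exit status, or -1 on error */
int
run_child_on_mmap_stack(int code)
{
    char *stack, *stackTop;
    int status;
    pid_t pid;

    assert(code >= 0 && code <= 255);   /* Exit statuses are 8 bits */

    stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    stackTop = stack + STACK_SIZE;      /* Assume stack grows downward */

    /* No CLONE_VM: the child gets its own copy of the address space,
       so '&code' is valid in the child */
    pid = clone(child_fn, stackTop, SIGCHLD, &code);
    if (pid == -1) {
        munmap(stack, STACK_SIZE);
        return -1;
    }

    if (waitpid(pid, &status, 0) == -1)
        status = -1;

    munmap(stack, STACK_SIZE);          /* Parent's mapping only */

    return (status != -1 && WIFEXITED(status)) ?
            WEXITSTATUS(status) : -1;
}
```

For example, run_child_on_mmap_stack(7) would be expected to return 7
when clone() is permitted in the calling environment.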

Reported-by: Jann Horn <jannh@google.com>
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Michael Kerrisk 2019-11-14 12:19:21 +01:00
parent 0f2b59f5ca
commit 99c3a00027
1 changed file with 29 additions and 3 deletions


@@ -1547,6 +1547,28 @@ making it possible to see that the hostname
differs in the UTS namespaces of the parent and child.
For an example of the use of this program, see
.BR setns (2).
.PP
Within the sample program, we allocate the memory that is to
be used for the child's stack using
.BR mmap (2)
rather than
.BR malloc (3)
for the following reasons:
.IP * 3
.BR mmap (2)
allocates a block of memory that starts on a page
boundary and is a multiple of the page size.
This is useful if we want to establish a guard page (a page with protection
.BR PROT_NONE )
at the end of the stack using
.BR mprotect (2).
.IP *
We can specify the
.BR MAP_STACK
flag to request a mapping that is suitable for a stack.
For the moment, this flag is a no-op on Linux,
but it exists and has effect on some other systems,
so we should include it for portability.
.SS Program source
.EX
#define _GNU_SOURCE
@@ -1557,6 +1579,7 @@ For an example of the use of this program, see
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \e
} while (0)
@@ -1601,11 +1624,13 @@ main(int argc, char *argv[])
exit(EXIT_SUCCESS);
}
/* Allocate stack for child */
/* Allocate memory to be used for the stack of the child */
stack = malloc(STACK_SIZE);
stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, \-1, 0);
if (stack == MAP_FAILED)
errExit("malloc");
errExit("mmap");
stackTop = stack + STACK_SIZE; /* Assume stack grows downward */
/* Create child that has its own UTS namespace;
@@ -1640,6 +1665,7 @@ main(int argc, char *argv[])
.BR getpid (2),
.BR gettid (2),
.BR kcmp (2),
.BR mmap (2),
.BR pidfd_open (2),
.BR set_thread_area (2),
.BR set_tid_address (2),