After comments from Florian Weimer, who pointed out various
confusions in the earlier text.
Reported-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The attr argument of sched_setattr was documented as const but the
kernel will modify the size field of this struct if it contains an
invalid value. See the documentation of the size field for details.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This patch relates to the exclude_host and exclude_guest bits added
by the following commit:
exclude_host, exclude_guest; Linux 3.2
commit a240f76165e6255384d4bdb8139895fac7988799
Author: Joerg Roedel <joerg.roedel@amd.com>
Date: Wed Oct 5 14:01:16 2011 +0200
perf, core: Introduce attrs to count in either host or guest mode
The updated manpage text clarifies that the "exclude_host" and
"exclude_guest" perf_event_open() attr bits only apply in the
context of a KVM environment and are currently x86 only.
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Acked-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This patch relates to the PERF_SAMPLE_REGS_INTR support added in
the following commit:
perf_sample_regs_intr; Linux 3.19
commit 60e2364e60e86e81bc6377f49779779e6120977f
Author: Stephane Eranian <eranian@google.com>
perf: Add ability to sample machine state on interrupt
The primary difference between PERF_SAMPLE_REGS_INTR and the
existing PERF_SAMPLE_REGS_USER is that the new support will
return kernel register values. Also, if precise_ip is set higher
than 0, then the PEBS register state will be returned rather than
the saved interrupt state.
This patch incorporates feedback from Stephane Eranian and
Andi Kleen.
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
ENOMEM can occur if locking/unlocking in the middle of a region
would increase the number of VMAs beyond the system limit (64k).
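A minimal sketch of the case described above, assuming a three-page
anonymous mapping whose middle page is locked, so that the kernel has
to split the mapping into separate VMAs. The helper name
lock_middle_page() is hypothetical and not part of the page. Note that
mlock(2) can also report ENOMEM for other reasons, such as exceeding
RLIMIT_MEMLOCK.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: lock only the middle page of a three-page
   mapping. Returns 0 on success, or the errno value reported by
   mmap() or mlock(). */
int lock_middle_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    char *base = mmap(NULL, 3 * (size_t)page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return errno;

    int err = 0;
    /* Locking a range in the middle of a mapping splits it into
       several VMAs; if that would exceed the VMA limit, mlock()
       fails with ENOMEM. */
    if (mlock(base + page, (size_t)page) == -1)
        err = errno;

    munmap(base, 3 * (size_t)page);
    return err;
}
```

A caller would typically treat a nonzero return as "not locked" and
inspect the value to decide whether retrying makes sense.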
Reported-by: Mehdi Aqadjani Memar <m.aqadjanimemar@student.vu.nl>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
These days, glibc implements _exit() as a wrapper around
exit_group(2). (When seccomp was originally introduced, this was
not the case.) Give the reader a clue that, despite what glibc is
doing, what SECCOMP_SET_MODE_STRICT permits is the true _exit(2)
system call, and not exit_group(2).
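A minimal sketch of the point above, assuming a forked child that
enters strict mode with prctl(2) and then terminates through the raw
exit system call via syscall(2). The helper name and the exit status 2
used when strict mode cannot be enabled are choices made for this
illustration only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run a child in strict mode and terminate it
   with the raw exit system call. Returns the child's exit status,
   or -1 if the child did not exit normally. */
int exit_under_strict_seccomp(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) == -1)
            syscall(SYS_exit, 2);   /* strict mode not available */

        /* Calling _exit(0) here would go through exit_group(2) in
           current glibc, which strict mode does not permit; the
           child would be killed instead of exiting. */
        syscall(SYS_exit, 0);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A return value of 0 means the child left strict mode through the
permitted system call.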
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It is unfortunate that this discourages this use of chroot(2)
without pointing out alternative solutions - for example,
OpenSSH and vsftpd both still rely on chroot(2) for security.
Bind mounts should theoretically be usable as a replacement, but
currently, they have a similar problem (CVE-2015-2925) that hasn't
been fixed in ~6 months, so I'd rather not add it to the manpage
as a solution before a fix lands.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Many years ago, text was added to the page saying that it is
implementation-dependent whether stdio streams are flushed and
whether temporary files are removed. In part, this change appears
to have been made because POSIX.1-2001 added text related to this
point.
However, that seems to have been an error in POSIX, and the
text was subsequently removed for POSIX.1-2008. See
https://collaboration.opengroup.org/austin/interps/documents/9984/AI-085.txt
Austin Group Interpretation reference 1003.1-2001 #085
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After research, we think utimensat() and futimens() are thread-safe.
However, the glibc documentation has no markings for utimensat()
and futimens().
Signed-off-by: Zeng Linggang <zenglg.jy@cn.fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After research, we think eventfd() is thread-safe. However, the
glibc documentation has no marking for eventfd().
Signed-off-by: Zeng Linggang <zenglg.jy@cn.fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After research, we think clock_getres(), clock_gettime(), and
clock_settime() are thread-safe. However, the glibc documentation
has no markings for clock_getres(), clock_gettime(), and
clock_settime().
Signed-off-by: Zeng Linggang <zenglg.jy@cn.fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
David Rientjes has noticed that the MAP_POPULATE wording might promise
much more than the kernel actually provides or intends to provide.
The primary usage of the flag is to pre-fault the range. There is
no guarantee that no major faults will happen later on. The pages
might have been reclaimed by the time the process tries to access
them.
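A minimal sketch of the usage described above, assuming an anonymous
private mapping. The helper name map_prefaulted() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `len` bytes of anonymous memory and ask
   the kernel to pre-fault the range. Returns NULL on failure. */
void *map_prefaulted(size_t len)
{
    assert(len > 0);

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);

    /* Success means only that the mapping exists. It is not a
       promise that the pages stay resident: they may be reclaimed
       before the first access. */
    return p == MAP_FAILED ? NULL : p;
}
```

Callers that need a residency guarantee cannot rely on this flag.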
Reviewed-by: Eric B Munson <emunson@akamai.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
MAP_LOCKED has had subtly different semantics from mmap(2)+mlock(2)
since it was introduced.
mlock(2) fails if the memory range cannot get populated to
guarantee that no future major faults will happen on the range.
mmap(MAP_LOCKED) on the other hand silently succeeds even if
the range was populated only partially.
Fixing this subtle difference in the kernel is rather awkward
because the memory population happens after mm locks have been
dropped, and so the cleanup before returning failure (munlock)
could operate on something other than the originally mapped area.
For example, a speculative user-space page fault handler catching
SEGV and doing mmap(fault_addr, MAP_FIXED|MAP_LOCKED) might discard
a portion of a racing mmap and lead to lost data. Although it is
not clear whether such a usage would be valid, the mmap page doesn't
explicitly describe requirements for threaded applications, so we
cannot exclude this possibility.
This patch makes the semantics of MAP_LOCKED explicit and suggests
using mmap + mlock as the only way to guarantee no later major
page faults.
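A minimal sketch of the mmap + mlock approach, assuming an anonymous
private mapping. The helper name map_and_lock() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map anonymous memory, then lock it with
   mlock(2). Returns NULL, with errno set, if either step fails. */
void *map_and_lock(size_t len)
{
    assert(len > 0);

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* Unlike mmap(MAP_LOCKED), mlock() reports an error if the
       range cannot be populated. */
    if (mlock(p, len) == -1) {
        int saved = errno;
        munmap(p, len);
        errno = saved;
        return NULL;
    }
    return p;
}
```

The point of the two-step form is that a partially populated range
shows up as an error the caller can see.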
Reviewed-by: Eric B Munson <emunson@akamai.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The marking matches glibc marking.
The marking of functions in glibc is:
- sigaltstack: MT-Safe
Signed-off-by: Zeng Linggang <zenglg.jy@cn.fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The marking matches glibc marking.
The marking of functions in glibc is:
- getrusage: MT-Safe
Signed-off-by: Zeng Linggang <zenglg.jy@cn.fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After research, we think prlimit() is thread-safe. However, the
glibc documentation has no marking for prlimit().
getrlimit() and setrlimit() match glibc markings.
- getrlimit: MT-Safe
- setrlimit: MT-Safe
- prlimit: MT-Safe
Signed-off-by: Zeng Linggang <zenglg.jy@cn.fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The text on mixing I/O syscalls and stdio is a general point
of behavior. It's not a bug as such.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The "ABI" doesn't really convey anything significant in
the title. These subsections are about describing differences
between the kernel and (g)libc interfaces.
Reported-by: Andries E. Brouwer <Andries.Brouwer@cwi.nl>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
set/get_mempolicy manpages say that the memory allocation
policy is per process while reading the code and testing shows
that it's actually per thread. Here's a quick fix, which may
need to be improved to better explain that we're allocating
in the context of a thread within a process address space.
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See https://bugzilla.kernel.org/show_bug.cgi?id=43300
Reported-by: David Wilcox <davidvsthegiant@gmail.com>
Reported-by: Filipe Brandenburger <filbranden@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The existing text in this page:
MAP_SHARED Share this mapping. Updates to the mapping
are visible to other processes that map this
file, and are carried through to the underly‐
ing file. The file may not actually be
updated until msync(2) or munmap() is called.
implies that munmap() will sync the mapping to the underlying
file. POSIX doesn't require this, and some light reading of the
code and some light testing (fsync() after munmap() of a large
file) also indicates that Linux doesn't do this.
See also this mail thread:
Subject: munmap, msync: synchronization
Newsgroups: gmane.linux.man
Date: 2014-04-20 10:28:40 GMT
http://thread.gmane.org/gmane.linux.man/5548
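A minimal sketch of what this means for callers, assuming they want
the data on the underlying file before the mapping goes away: call
msync(2) explicitly rather than relying on munmap(). The helper name
write_file_via_mmap() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes to `path` through a
   MAP_SHARED mapping and flush them with msync() before unmapping.
   Returns 0 on success, -1 on error. */
int write_file_via_mmap(const char *path, const void *data, size_t len)
{
    assert(len > 0);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    int rc = -1;
    if (ftruncate(fd, (off_t)len) == 0) {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            memcpy(p, data, len);
            /* Request the write-back explicitly; munmap() alone
               is not required to do it. */
            rc = msync(p, len, MS_SYNC);
            if (munmap(p, len) == -1)
                rc = -1;
        }
    }
    if (close(fd) == -1)
        rc = -1;
    return rc;
}
```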
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Rewrite the text somewhat, for easier comprehension.
No (intentional) changes to factual content.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The same thing was fixed for execve() in kernel commit
8b01fc86b9f425899f8a3a8fc1c47d73c2c20543, but for performance
reasons, that simple patch won't work for stat().
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Explain the effect that default ACLs have (instead of the umask)
in umask.2. Mention that default ACLs can have an effect in
open.2, mknod.2, and mkdir.2.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
According to POSIX, the 9 UGO*RWX bits are permissions, and
'mode' is used to refer collectively to those bits plus the sticky,
set-UID, and set-GID bits.
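A minimal sketch of that terminology in terms of the <sys/stat.h>
masks. The helper names permission_bits() and special_bits() are
hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* The nine user/group/other read/write/execute permission bits. */
mode_t permission_bits(mode_t mode)
{
    return mode & (S_IRWXU | S_IRWXG | S_IRWXO);
}

/* The set-UID, set-GID, and sticky bits, which together with the
   permission bits make up the file mode bits. */
mode_t special_bits(mode_t mode)
{
    return mode & (S_ISUID | S_ISGID | S_ISVTX);
}
```

For a set-UID executable with mode 04755, permission_bits() yields
0755 and special_bits() yields S_ISUID.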
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Based on Ted Ts'o's commit message 0ae45f63d4e
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Cowritten-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
timerfd_create.2 mentions TFD_IOC_SET_TICKS. We should add it to
ioctl_list.2, too.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Really just a marker to record the reporters of bugs
that stemmed from the fact that the page did not
document getdents64(). I'll fix things up in the changelog.
See https://bugzilla.kernel.org/show_bug.cgi?id=14795
Reported-by: Dima Tisnek <dimaqq@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Three versions of "stat" appeared on 32-bit systems,
dealing with structures of different (increasing) sizes.
Explain some of the details, and also note that the
situation is simpler on modern 64-bit architectures.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The multiple-system-call-version phenomenon is particularly a
feature of older 32-bit platforms. Hint at that fact in the text.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the discussion of the new *64() system calls added in
Linux 2.4, use truncate64() rather than ftruncate64(),
since the text goes on to say "and their analogs that work with
file descriptors or symbolic links".
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
madvise(2) actually returns with error EINVAL for MADV_REMOVE
when used for hugetlb VMAs, not EOPNOTSUPP, and this has been
the case since MADV_REMOVE was introduced in commit f6b3ec238d12
("madvise(MADV_REMOVE): remove pages from tmpfs shm backing
store").
Specify the exact behavior.
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The glibc wrapper gives an EINVAL error on attempts to change the
disposition of either of the two real-time signals used by NPTL.
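A minimal sketch of how this shows up for a caller, assuming the
wrapper in question is sigaction() and that the two signals are
numbers 32 and 33 (the values nptl(7) gives for Linux; treat them as
an assumption here). The helper name try_ignore_signal() is
hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Hypothetical helper: try to ignore signal `sig` through the C
   library wrapper. Returns 0 on success, or the errno value on
   failure. */
int try_ignore_signal(int sig)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);

    return sigaction(sig, &sa, NULL) == -1 ? errno : 0;
}
```

Under that assumption, try_ignore_signal(32) and try_ignore_signal(33)
both return EINVAL.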
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
At the kernel level, credentials (UIDs and GIDs) are a per-thread
attribute. NPTL uses a signal-based mechanism to ensure that
when one thread changes its credentials, all other threads change
credentials to the same values. By this means, the NPTL
implementation conforms to the POSIX requirement that the threads
in a process share credentials.
Reported-by: Shawn Landden <shawn@churchofgit.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
At the kernel level, credentials (UIDs and GIDs) are a per-thread
attribute. NPTL uses a signal-based mechanism to ensure that
when one thread changes its credentials, all other threads change
credentials to the same values. By this means, the NPTL
implementation conforms to the POSIX requirement that the threads
in a process share credentials.
Reported-by: Shawn Landden <shawn@churchofgit.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
At the kernel level, credentials (UIDs and GIDs) are a per-thread
attribute. NPTL uses a signal-based mechanism to ensure that
when one thread changes its credentials, all other threads change
credentials to the same values. By this means, the NPTL
implementation conforms to the POSIX requirement that the threads
in a process share credentials.
Reported-by: Shawn Landden <shawn@churchofgit.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
At the kernel level, credentials (UIDs and GIDs) are a per-thread
attribute. NPTL uses a signal-based mechanism to ensure that
when one thread changes its credentials, all other threads change
credentials to the same values. By this means, the NPTL
implementation conforms to the POSIX requirement that the threads
in a process share credentials.
Reported-by: Shawn Landden <shawn@churchofgit.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
At the kernel level, credentials (UIDs and GIDs) are a per-thread
attribute. NPTL uses a signal-based mechanism to ensure that
when one thread changes its credentials, all other threads change
credentials to the same values. By this means, the NPTL
implementation conforms to the POSIX requirement that the threads
in a process share credentials.
Reported-by: Shawn Landden <shawn@churchofgit.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
On Wed, Mar 11, 2015 at 10:43:50PM +0100, Mikael Pettersson wrote:
> Jann Horn writes:
> > Or should I throw this patch away and write a patch
> > for the prctl() manpage instead that documents that
> > being able to call sigreturn() implies being able to
> > effectively call sigprocmask(), at least on some
> > architectures like X86?
>
> Well, that is the semantics of sigreturn(). It is essentially
> setcontext() [which includes the actions of sigprocmask()], but
> with restrictions on parameter placement (at least on x86).
>
> You could introduce some setting to restrict that aspect for
> seccomp processes, but you can't change this for normal processes
> without breaking things.
Then I think it's probably better and easier to just document the
existing behavior? If a new setting would have to be introduced
and developers would need to be aware of that, it's probably
easier to just tell everyone to use SIGKILL.
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Mikael Pettersson <mikpelinux@gmail.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>