There is a lot of unnecessary duplication of content of the seccomp
material in prctl(2) and seccomp(2). Trevor Woerner also noted that
there is an error in prctl(2), where it says that the filters
"are run in order until the first non-allow result is seen", which
contradicts the correct statement in seccomp(2) that *all* filters
are executed.
So, rewrite the seccomp material in prctl(2) to strip out most of
the content duplicated in seccomp(2), and replace the removed
text with statements deferring to seccomp(2).
Reported-by: Trevor Woerner <twoerner@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add some subsection (.SS) headings and paragraph breaks in
DESCRIPTION, to make the page more easily readable.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
glibc has tightened up its rules for replacing the memory
allocator. I went through the malloc man page and looked for how
it documented malloc() and related functions, and fixed
discrepancies with glibc malloc() documentation and/or
implementation. I also reorganized the portability discussion so
that portability issues can be seen more clearly.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Documentation/filesystems/sharedsubtree.txt has changed to
Documentation/filesystems/sharedsubtree.rst.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Especially the change to .rst format in the kernel Documentation/
tree has rendered many of the references in this manual page
obsolete. Fix them.
Reported-by: Vito Caputo <vcaputo@pengaru.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This makes it easier to compare this page to the standard,
to get more details about the rules between operators.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Unary operators are mentioned in C11::6.5.3, and casts are in
C11::6.5.4 (they are mentioned in order of precedence).
And from note 85 (in section 6.5) in that same C11 standard, major
subsections 6.5.X are sorted by precedence.
As an example (from Jakub), `sizeof(int)+1` is interpreted as
`(sizeof(int))+1`, and not `sizeof((int)+1)`.
I used C11 and not C18 (the latest) because at least in the draft
copy of C18 that I have, there are a few important typos in that
section, while the draft copy of C11 that I have is free of those
typos. And C11 and C18 are almost identical, with no major
changes to the language.
Reported-by: David Sletten <david.paul.sletten@gmail.com>
Cc: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Zero in this case refers to the literal constant 0, not the symbolic
constant B0.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As noted by Jakub:
BTW, the exit_group.2 man page could use an update (possibly
by merging it into exit.2): it says that the "system
call is is equivalent to _exit(2) except that it terminates
not only the calling thread, but all threads in the calling
process's thread group", which isn't helpful these days.
Reported-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The previous wording wasn't very explicit, leaving room for
believing that 'errno' may be 0 after returning EAI_SYSTEM.
Use a wording similar to other pages, for added consistency.
[mtk: edited commit message title; also, POSIX notes that
'errno' is set in this case.]
Reported-by: Cristian Morales Vega <christian.morales.vega@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In many places, this page uses the terminology "mount point", where
"mount" would be better. A "mount point" is the location at which
a mount is attached. A "mount" is an association between a
filesystem and a mount point.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The correct terminology is "less privileged mount namespace"
(not "less privileged user namespace").
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The "Restrictions on mount namespaces" subsection belongs lower in
the page, following the discussion of concepts (e.g., shared
subtrees and propagation) that are discussed elsewhere in the page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The previous commit injected a large block of text into a list,
separating one example in the previous list item from a
"continuation" in the following list item. Repair that.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
For a long time, this manual page has had a brief discussion of
"locked" mounts, without clearly saying what this concept is, or
why it exists. Expand the discussion with an explanation of what
locked mounts are, why mounts are locked, and some examples of the
effect of locking.
Thanks to Christian Brauner for a lot of help in understanding
these details.
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
These pages split out extra errors for some APIs into a separate
list. The pages are probably easier to read if all errors are
combined into a single list.
Note that there still remain a few pages where the errors are
listed separately for different APIs. For the moment, it seems
best to leave those pages as is, since the error lists are
largely distinct in those pages.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
These split out errors into separate lists (perhaps per API,
perhaps "may" vs "shall", perhaps "Linux-specific" vs
standard(??)), but there's no good reason to do this. It makes
the error list harder to read, and is inconsistent with other
pages. So, combine the errors into a single list.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Mainly in preparation for the following patch on project IDs maps.
Add some words that will make the parallels between the rules for
updating uid_map and projid_map clearer.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Many times, these pages use the terminology "mount point", where
"mount" would be better. A "mount point" is the location at which
a mount is attached. A "mount" is an association between a
filesystem and a mount point.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linux 5.8 adds STATX_MNT_ID and stx_mnt_id.
Add description to statx.2
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From email with Christian Brauner:
>>>>>> int fd_tree = open_tree(-EBADF, source,
>>>>>> OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC |
>>>>>> AT_EMPTY_PATH | (recursive ? AT_RECURSIVE : 0));
>>>>>
>>>>> ???
>>>>> What is the significance of -EBADF here? As far as I can tell, it
>>>>> is not meaningful to open_tree()?
>>>>
>>>> I always pass -EBADF for similar reasons to [2]. Feel free to just use -1.
>>>
>>> ????
>>> But here, both -EBADF and -1 seem to be wrong. This argument
>>> is a dirfd, and so should either be a file descriptor or the
>>> value AT_FDCWD, right?
>>
>> [1]: In this code "source" is expected to be absolute. If it's not
>> absolute we should fail. This can be achieved by passing -1/-EBADF,
>> afaict.
>
> D'oh! Okay. I hadn't considered that use case for an invalid dirfd.
> (And now I've done some adjustments to openat(2), which contains a
> rationale for the *at() functions.)
>
> So, now I understand your purpose, but still the code is obscure,
> since
>
> * You use a magic value (-EBADF) rather than (say) -1.
> * There's no explanation (comment about) of the fact that you want
> to prevent relative pathnames.
>
> So, I've changed the code to use -1, not -EBADF, and I've added some
> comments to explain that the intent is to prevent relative pathnames.
> Okay?
Sounds good.
>
> But, there is still the meta question: what's the problem with using
> a relative pathname?
Nothing per se. Ok, you asked so it's your fault:
When writing programs I like to never use relative paths with AT_FDCWD
because. Because making assumptions about the current working directory
of the calling process is just too easy to get wrong; especially when
pivot_root() or chroot() are in play.
My absolut preference (joke intended) is to open a well-known starting
point with an absolute path to get a dirfd and then scope all future
operations beneath that dirfd. This already works with old-style
openat() and _very_ cautious programming but openat2() and its
resolve-flag space have made this **chef's kiss**.
If I can't operate based on a well-known dirfd I use absolute paths with
a -EBADF dirfd passed to *at() functions.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From email:
>> Thanks. I made it "detached". Elsewhere, the page already explains
>> that a detached mount is one that:
>>
>> must have been created by calling open_tree(2) with the
>> OPEN_TREE_CLONE flag and it must not already have been
>> visible in the filesystem.
>>
>> Which seems a fine explanation.
>>
>> ????
>> But, just a thought... "visible in the filesystem" seems not quite accurate.
>> What you really mean I guess is that it must not already have been
>> /visible in the filesystem hierarchy/previously mounted/something else/,
>> right?
I suppose that I should have clarified that my main problem was
that you were using the word "filesystem" in a way that I find
unconventional/ambiguous. I mean, I normally take the term
"filesystem" to be "a storage system for holding files".
Here, you are using "filesystem" to mean something else, what
I might call like "the single directory hierarchy" or "the
filesystem hierarchy" or "the list of mount points".
> A detached mount is created via the OPEN_TREE_CLONE flag. It is a
> separate new mount so "previously mounted" is not applicable.
> A detached mount is _related_ to what the MS_BIND flag gives you with
> mount(2). However, they differ conceptually and technically. A MS_BIND
> mount(2) is always visible in the filesystem when mount(2) returns, i.e.
> it is discoverable by regular path-lookup starting within the
> filesystem.
>
> However, a detached mount can be seen as a split of MS_BIND into two
> distinct steps:
> 1. fd_tree = open_tree(OPEN_TREE_CLONE): create a new mount
> 2. move_mount(fd_tree, <somewhere>): attach the mount to the filesystem
>
> 1. and 2. together give you the equivalent of MS_BIND.
> In between 1. and 2. however the mount is detached. For the kernel
> "detached" means that an anonymous mount namespace is attached to it
> which doesn't appear in proc and has a 0 sequence number (Technically,
> there's a bit of semantical argument to be made that "attached" and
> "detached" are ambiguous as they could also be taken to mean "does or
> does not have a parent mount". This ambiguity e.g. appears in
> do_move_mount(). That's why the kernel itself calls it an "anonymous
> mount". However, an OPEN_TREE_CLONE-detached mount of course doesn't
> have a parent mount so it works.).
>
> For userspace it's better to think of detached and attached in terms of
> visibility in the filesystem or in a mount namespace. That's more
> straightforward, more relevant, and hits the target in 90% of the cases.
>
> However, the better and clearer picture is to say that a
> OPEN_TREE_CLONE-detached mount is a mount that has never been
> move_mount()ed. Which in turn can be defined as the detached mount has
> never been made visible in a mount namespace. Once that has happened the
> mount is irreversibly an attached mount.
>
> I keep thinking that maybe we should just say "anonymous mount"
> everywhere. So changing the wording to:
I'm not against the word "detached". To user space, I think it is a
little more meaningful than "anonymous". For the moment, I'll stay with
"detached", but if you insist on "anonymous", I'll probably change it.
> [...]
> EINVAL The mount that is to be ID mapped is not an anonymous mount;
> that is, the mount has already been visible in a mount namespace.
I like that text *a lot* better! Thanks very much for suggesting
wordings. It makes my life much easier.
I've made the text:
EINVAL The mount that is to be ID mapped is not a detached
mount; that is, the mount has already been
visible in a mount namespace.
> [...]
> The mount must be an anonymous mount; that is, it must have been
> created by calling open_tree(2) with the OPEN_TREE_CLONE flag and it
> must not already have been visible in a mount namespace, i.e. it must
> not have been attached to the filesystem hierarchy with syscalls such
> as move_mount() syscall.
And that too! I've made the text:
• The mount must be a detached mount; that is, it must have
been created by calling open_tree(2) with the
OPEN_TREE_CLONE flag and it must not already have been
visible in a mount namespace. (To put things another way:
the mount must not have been attached to the filesystem
hierarchy with a system call such as move_mount(2).)
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From email with Christian Brauner:
> [1]: In this code "source" is expected to be absolute. If it's not
> absolute we should fail. This can be achieved by passing -1/-EBADF,
> afaict.
D'oh! Okay. I hadn't considered that use case for an invalid dirfd.
(And now I've done some adjustments to openat(2), which contains a
rationale for the *at() functions.)
So, now I understand your purpose, but still the code is obscure,
since
* You use a magic value (-EBADF) rather than (say) -1.
* There's no explanation (comment about) of the fact that you want
to prevent relative pathnames.
So, I've changed the code to use -1, not -EBADF, and I've added some
comments to explain that the intent is to prevent relative pathnames.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Make the description of the EBADF error for invalid 'dirfd' more
uniform. In particular, note that the error only occurs when the
pathname is relative, and that it occurs when the 'dirfd' is
neither valid *nor* has the value AT_FDCWD.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In particular, specifying an invalid file descriptor number
in 'dirfd' can be used as a check that 'pathname' is absolute.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Kir Kolyshkin made a start, but I think much more needs to
be said...
Reviewed-by: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From an email conversation with Alexis:
Hello Alexis,
On 8/6/21 7:06 PM, Alexis Wilke wrote:
> Hi guys,
>
> The pthread_setname_np(3) manual page has an example where the second
> argument is used to get a size of the thread name.
>
> https://man7.org/linux/man-pages/man3/pthread_setname_np.3.html#EXAMPLES
>
> The current code:
>
> rc = pthread_getname_np(thread, thread_name,
> (argc > 2) ? atoi(argv[1]) : NAMELEN);
>
> The suggested code:
>
> rc = pthread_getname_np(thread, thread_name,
> (argc > 2) ? atoi(argv[2]) : NAMELEN);
I agree that there's a problem, but I think we could go even simpler:
rc = pthread_getname_np(thread, thread_name, NAMELEN);
> I'm thinking that maybe the author meant to compute the length like so:
>
> rc = pthread_getname_np(thread, thread_name,
> (argc > 2) ? strlen(argv[1]) + 1 :
> NAMELEN);
>
> But I think that the atoi() points to using argv[2] as a number
> representing the length.
>
> (Of course, it should be tested against NAMELEN as a maximum, but I
> understand that examples do not always show how to verify each possible
> error).
I imagine that the author's intention was to allow the user to do
experiments where argv[2] specified a number less than NAMELEN,
in order to see the resulting ERANGE error. But, that experiment
is of limited value, and complicates the code unnecessarily, IMO,
so that's why I made the change above.
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
Reported-by: Alexis Wilke <alexis@m2osw.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Phrases such as "In the new mount API" will date fast. Remove it.
Also:
* Make it clear that MOUNT_ATTR__ATIME expresses a bit field.
* Replace 'enum' with 'enumeration'.
* Clarify what is meant by "partially" set MOUNT_ATTR__ATIME.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
These types are already well described in mount_namespaces(7);
indeed, much of the text from that page seems to have just been
cut and pasted into this page! Simply referring the reader to
mount_namespaces(7) is sufficient.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Point out that this field can have the value zero, meaning
no change. And avoid discussions of 'enum', and simply say
that otherwise the field has one of the MS_* values.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Having this discussion under DESCRIPTION clutters that section,
and has the effect of burying the discussion of propagation. Move
the discussion to NOTES, to make the page more readable.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
- Change some instances of "-" to "\-"
- Use C99 style (declare variables nearer use in code)
- Add a bit of white space
- Remove one 'const...const' added by Alex that caused
compiler warnings
- Use "reverse Christmas tree" form for declarations in main()
- Other minor changes
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
We don't really need ext4(5) and xfs(5) here. They provide
no further info that is directly relevant to the reader of
mount_setattr(2).
clone3(2) isn't necessary because it is the same page as clone(2).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
- Fix SYNOPSIS to fit in 78 columns
Also, we don't show when an include is included for a specific type,
unless that header is included _only_ for the type,
or there might be confusion (e.g., termios).
Instead, that type should be documented in system_data_types(7),
with a link page mount_attr-struct(3).
- Fix references to mount_setattr(). See man-pages(7):
Any reference to the subject of the current manual page should be written
with the name in bold followed by a pair of parentheses in Roman
(normal) font. For example, in the fcntl(2) man page, references to
the subject of the page would be written as: fcntl(). The preferred
way to write this in the source file is:
.BR fcntl ()
- Fix line breaks according to semantic newline rules (and add some commas)
- Fix wrong usage of .IR when .RI should have been used
- Fix formatting of variable part in FOO<number>:
- Make italic the variable part (as groff_man(7) recommends)
- Remove <>
- Use syntax recommended by G. Branden Robinson (groff)
- Fix unnecessary uses of .BR or .IR when .B or .I would suffice
- Fix formatting of punctuation
In some cases, it was in italics or bold, and it should always be in roman.
- Use uppercase to begin text, even in bullet points, since those were
multi-sentence.
- Simplify usage of .RS/.RE in combination with .IP
- s/fat/FAT/ as fs(7) does
- Slightly reword some sentences for consistency
- Use Linux-specific for consistency with other pages (in VERSIONS)
- EXAMPLES: Place the return type in a line of its own (as in other pages)
- Fix alignment of code
- Replace unnecessary use of the GNU extension ({}) by do {} while (0)
In that case, there was no return value (moreover, it's a noreturn).
- Break complex declaration lines into a line for each variable
The variables were being initialized, some to non-zero values,
so for clarity, a line for each one seems more appropriate.
- Add const to pointers when possible
- s/\\/\e/
- Remove unmatched groff commands
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note the use of FUTEX_CLOCK_REALTIME for selecting the clock,
and eliminate repetition of details already covered in the
description of FUTEX_LOCK_PI.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
FUTEX_LOCK_PI2 is a new futex operation which was recently introduced into the
Linux kernel. It works exactly like FUTEX_LOCK_PI. However, it has support for
selectable clocks for timeouts. By default CLOCK_MONOTONIC is used. If
FUTEX_CLOCK_REALTIME is specified then the timeout is measured against
CLOCK_REALTIME.
This new operation addresses an inconsistency in the futex interface:
FUTEX_LOCK_PI only works with timeouts based on CLOCK_REALTIME in contrast to
all the other PI operations.
Document the FUTEX_LOCK_PI2 command.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
- Move example program to a new EXAMPLES section
- Invert logic in the handler to have the failure in the
conditional path, and the success out of any conditionals.
- Use NULL, EXIT_SUCCESS, and EXIT_FAILURE instead of magic numbers
- Separate declarations from code
- Put function return type on its own line
- Put function opening brace on its own line
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See https://bugzilla.kernel.org/show_bug.cgi?id=212385
some/path/dir/ is not always the same as some/path/dir/., as this shows:
$ mkdir u
$ rmdir u/.
rmdir: failed to remove 'u/.': Invalid argument
$ rmdir u
$
The text in POSIX.1-2018 Section 4.13 ("Pathname Resolution")
is helpful in pointing to a better wording.
Reported-by: Askar Safin <safinaskar@mail.ru>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Emanuele Torre via linux-man@:
[
I was reading the man page for ldd(1)[1]; and I read this in the first
paragraph of the DESCRIPTION section:
ldd prints the shared objects (shared libraries) required by each
program or shared object specified on the command line. An
example of its use and output (using sed(1) to trim leading white
space for readability in this page) is the following:
$ ldd /bin/ls | sed 's/^ */ /'
linux-vdso.so.1 (0x00007ffcc3563000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f87e5459000)
libcap.so.2 => /lib64/libcap.so.2 (0x00007f87e5254000)
libc.so.6 => /lib64/libc.so.6 (0x00007f87e4e92000)
libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f87e4c22000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f87e4a1e000)
/lib64/ld-linux-x86-64.so.2 (0x00005574bf12e000)
libattr.so.1 => /lib64/libattr.so.1 (0x00007f87e4817000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f87e45fa000)
This is a little confusing though since that sed(1) command does not
seem to work (and it is also potentially misleading for someone who is
trying to figure out how to parse ldd(1)'s output).
ldd(1) prepends a TAB character (0x09) to each line, not spaces:
$ ldd /bin/ls | xxd | head -1
00000000: 096c 696e 7578 2d76 6473 6f2e 736f 2e31 .linux-vdso.so.1
I read ldd(1)'s source code[2] (it is part of glibc) and it seems to be
a bash script that tries to use different rtld programs ( ld.so(8) )
from an RTLDLIST.
Those, on my system, are:
* /usr/lib/ld-linux.so.2
* /usr/lib64/ld-linux-x86-64.so.2
* /usr/libx32/ld-linux-x32.so.2
And they all seem to also be part of glibc.
I have tried to follow the git history of glibc to see when the switch
from spaces to the TAB character occurred, but, to me, it seems like
glibc.git/elf/rtld.c has always used '\t', at least since
6a76c115150318eae5d02eca76f2fc03be7bd029[3] (358th commit since glibc
started using the git repository, Nov 18th 1995): before
that commit there are not any results for `git grep '\\t'` in the elf
directory and I did not investigate further.
Still, at the time of that commit, glibc did not seem to have an ldd(1)
utility.
Perhaps the man page is old and its original author was using and
documenting an ldd(1) utility that was not part of glibc when he was
writing it.
Anyhow, since I think that sed(1) command will not work on any system
that uses, at least, the most recent version of glibc (because ldd(1)
and the ld.so(8) programs it depends on are all part of glibc), I think
that that example should be changed to avoid confusions.
The output format of ldd(1) does not seem to be clearly defined, so I
think this would be a good option:
$ ldd /bin/ls | sed 's/^[[:space:]]*/ /'
NB: ^\s* should also work on most GNU/Linux systems, but \s is
non-standard and not documented, so I do not suggest using it in the man
page.
Another option could be to remove "the pipe to sed(1)" part and the note
in parentheses that explains why it was used by the original author.
Cheers.
emanuele6
[1]: https://man7.org/linux/man-pages/man1/ldd.1.html
[2]: https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/ldd.bash.in;h=ba736464ac5e4a9390b1b6a39595035238250232;hb=5188a9d0265cc6f7235a8af1d31ab02e4a24853d
[3]: https://sourceware.org/git/?p=glibc.git;a=commit;h=6a76c115150318eae5d02eca76f2fc03be7bd029
///////
$ uname -a
Linux t420 5.10.54-1-lts #1 SMP Wed, 28 Jul 2021 15:05:20 +0000
x86_64 GNU/Linux
$ pacman -Qo ldd
/usr/bin/ldd is owned by glibc 2.33-5
$ pacman -Qo /usr/share/man/man1/ldd.1.gz
/usr/share/man/man1/ldd.1.gz is owned by man-pages 5.12-2
$ pacman -Qo /usr/lib/ld-linux.so.2
/usr/lib/ld-linux.so.2 is owned by lib32-glibc 2.33-5
$ pacman -Qo /usr/lib64/ld-linux-x86-64.so.2
/usr/lib/ld-linux-x86-64.so.2 is owned by glibc 2.33-5
$ pacman -F /usr/libx32/ld-linux-x32.so.2 || echo not available on arch linux.
not available on arch linux.
]
Reported-by: Emanuele Torre <torreemanuele6@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: James O. D. Hunt <jamesodhunt@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Explain that `optstring` cannot contain a semi-colon (`;`)
character.
[mtk: verified with a small test program; see also posix/getopt.c
in the glibc sources:
    if (temp == NULL || c == ':' || c == ';')
      {
        if (print_errors)
          fprintf (stderr, _("%s: invalid option -- '%c'\n"), argv[0], c);
        d->optopt = c;
        return '?';
      }
]
Also explain that `optstring` can include `+` as an option
character, possibly in addition to that character being used as
the first character in `optstring` to denote `POSIXLY_CORRECT`
behaviour.
[mtk: verified with a small test program.]
Test program below. Example runs:
$ ./a.out -+
opt = 43 (+); optind = 2
Got plus
$ ./a.out -';'
./a.out: invalid option -- ';'
opt = 63 (?); optind = 2; optopt = 59 (;)
Unrecognized option (-;)
Usage: ./a.out [-p arg] [-x]
Signed-off-by: James O. D. Hunt <jamesodhunt@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---
#include <ctype.h>
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define printable(ch) (isprint((unsigned char) (ch)) ? (ch) : '#')

static void             /* Print "usage" message and exit */
usageError(const char *progName, const char *msg, int opt)
{
    if (msg != NULL && opt != 0)
        fprintf(stderr, "%s (-%c)\n", msg, printable(opt));
    fprintf(stderr, "Usage: %s [-p arg] [-x]\n", progName);
    exit(EXIT_FAILURE);
}

int
main(int argc, char *argv[])
{
    int   opt, xfnd;
    char  *pstr;

    xfnd = 0;
    pstr = NULL;

    while ((opt = getopt(argc, argv, "p:x+;")) != -1) {
        printf("opt =%3d (%c); optind = %d", opt, printable(opt), optind);
        if (opt == '?' || opt == ':')
            printf("; optopt =%3d (%c)", optopt, printable(optopt));
        printf("\n");

        switch (opt) {
        case 'p': pstr = optarg;             break;
        case 'x': xfnd++;                    break;
        case ';': printf("Got semicolon\n"); break;
        case '+': printf("Got plus\n");      break;
        case ':': usageError(argv[0], "Missing argument", optopt);
        case '?': usageError(argv[0], "Unrecognized option", optopt);
        default:
            printf("Unexpected case in switch()\n");
            exit(EXIT_FAILURE);
        }
    }

    if (xfnd != 0)
        printf("-x was specified (count=%d)\n", xfnd);
    if (pstr != NULL)
        printf("-p was specified with the value \"%s\"\n", pstr);
    if (optind < argc)
        printf("First nonoption argument is \"%s\" at argv[%d]\n",
               argv[optind], optind);

    exit(EXIT_SUCCESS);
}
Do not include unused (and incompatible) header file termios.h and
include required header files for puts() and close() functions.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
SPARC is special: it does not have Bnnn constants for baud rates above
2000000. Instead, it defines four Bnnn constants with smaller baud rates.
This difference between SPARC and non-SPARC architectures is present in
both glibc API (termios.h) and also kernel ioctl API (asm/termbits.h).
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Format variable parts of words referring to a group of identifiers
in italics, following groff_man(7) recommendations.
Also srcfix surrounding uses of \f escape sequences to use macros
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
These baud-rate macro constants are defined in bits/termios.h and are
already supported.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Saw this while preparing the "switch to \~" change Alex invited.
Signed-off-by: G. Branden Robinson <g.branden.robinson@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After a patch proposal from наб triggered by concerns that, when
talking about PIPE_BUF, pipe(7) explicitly mentions write(2) but
not writev(2), I've concluded that the reference in writev(2) to
pipe(7) is not needed (mea culpa; I added that text), and I think
the text in pipe(7) could be written to be closer to the POSIX
spec, which doesn't talk about "write() calls", but simply about
"writes".
Reported-by: наб <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Mainly: I generally don't want us to be including URLs to mailing
list discussions in a manual page. Either, the issue in the
discussion is worth writing up in the manual page (so that
the reader doesn't have to look elsewhere), or the details
are less important, in which case it is sufficient to note the
existence of the bug. I think this is an example of the latter.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The correct kernel version seems to be 5.11, not 5.10:
$ git describe --contains d0e3fc69d00d
v5.11-rc1~76^2~251
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Christophe Leroy via Bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=213421
[
In ppc32 functions section, the Y2038 compliant function
__kernel_clock_gettime64() is missing.
It was added by commit d0e3fc69d00d
("powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32")
]
.../linux$ git describe d0e3fc69d00d
v5.10-rc2-76-gd0e3fc69d00d
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As pointed out by Nora, the example shown in the manual
page already demonstrates that the pathname is not absolute!
Reported-by: Nora Platiel <nplatiel@gmx.us>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove duplicated word.
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Fix a typo in the documentation of using fallocate to allocate shared
blocks. The flag FALLOC_FL_UNSHARE should instead be documented as
FALLOC_FL_UNSHARE_RANGE.
Fixes: 63a599c657 ("man2/fallocate.2: Document behavior with shared blocks")
Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Alan:
[
The on-line copy of the manual page "posixoptions(7)" dated
2018-04-30 has an entry for "getcwd()" in the section headed
"XSI - _XOPEN_LEGACY - _SC_XOPEN_LEGACY".
I believe that entry should be "getwd()" as that is the API call
which was present in X/Open-6 but withdrawn in X/Open-7.
]
mtk: confirmed by reviewing the table ("Removed Functions and
Symbols in Issue 7") at the end of Section B.1.1 on page
3564 of IEEE Std 1003.1, 2016 Edition.
Reported-by: Alan Peakall <Alan.Peakall@helpsystems.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See https://bugzilla.kernel.org/show_bug.cgi?id=213419
ppc/32 and ppc/64 sections both have the following note:
The CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE clocks are
not supported by the __kernel_clock_getres and
__kernel_clock_gettime interfaces; the kernel falls back to the
real system call
This note has been wrong for quite some time now, since commit
654abc69ef2e ("powerpc/vdso32: Add support for
CLOCK_{REALTIME/MONOTONIC}_COARSE") and commit
5c929885f1bb ("powerpc/vdso64: Add support for
CLOCK_{REALTIME/MONOTONIC}_COARSE")
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since commit ee81d7e418, the flags list has been (only) above, not
below, these references.
(The flags table was added even before that, in commit 0b497138b9
("namespaces.7: Add table of namespaces to top of page"))
Fixes: ee81d7e418 ("namespaces.7: Include manual page references in the summary table of namespace types")
Signed-off-by: Štěpán Němec <stepnem@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Correct function signature by adding missing parenthesis.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I regularly get mildly lost in this table (and, indeed, didn't realise
it had two columns the first few times I used it to look at something
from the left column) ‒ separating the two columns improves clarity,
and makes which soup of numbers belongs to which character
much more obvious.
Other encodings don't need this, as they don't use double-columnated
tables.
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
recv.2 misspelled `SO_EE_OFFENDER` as `SOCK_EE_OFFENDER`.
This patch fixes the typo.
Signed-off-by: kXuan <kxuanobj@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The file being referred to no longer exists, as it was moved to
*.rst first (commit 20a78ae9ed297f2) and then to under
admin-guide (commit bf6b7a742e3f82b). Both those commits
are from 2019 (Linux 5.3).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Using the mount flag `MS_NOSUID` also affects SELinux domain transitions,
but this has not been documented well.
Signed-off-by: Topi Miettinen <toiwoton@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
With a simple backslash, '\0' ended up as ' ' in the man output.
Reported-by: Štěpán Němec <stepnem@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
No implementation or spec requires *n to be 0 to allocate a new buffer:
* musl checks for !*lineptr
(and sets *n=0 for later allocations)
* glibc checks for !*lineptr || !*n
(but only because it allocates early)
* NetBSD checks for !*lineptr
(and sets *n=0 for later allocations)
(but specifies *n => mlen(*lineptr) >= *n as a precondition,
to which this appears to be an exception)
* FreeBSD checks for !*lineptr and sets *n=0
(and specifies !*lineptr as sufficient)
* Lastly, POSIX.1-2017 specifies:
> If *n is non-zero, the application shall ensure that *lineptr
> either points to an object of size at least *n bytes,
> or is a null pointer.
The new wording matches POSIX, even if it arrives at that point slightly
differently.
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Don't document includes that provide types; only those that
provide prototypes and constants.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The types that need <sys/types.h> are better documented in
system_data_types(7). Let's keep only the includes for the
prototypes and the constants.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
'struct iovec' is defined in <bits/types/struct_iovec.h>,
which is included by <sys/uio.h>, but it is also included by
<bits/fcntl-linux.h>, which is in the end included by <fcntl.h>.
Given that we already include <fcntl.h>, we don't need any more
includes.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
'struct utimbuf' is provided by <utime.h>.
There's no need for <sys/types.h>.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
<sys/types.h> makes no sense for a function that only uses 'int'.
The flags used by this function are provided by <fcntl.h>
(or others), but not by <linux/userfaultfd.h>.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
'mode_t', which is the only reason this might have been ever
needed, is provided by <sys/stat.h> since POSIX.1-2001.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
'off_t', which is the only reason this might have been ever
needed, is provided by <unistd.h> since POSIX.1-2001.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There seems to be no reason to include <unistd.h>.
<sys/swap.h> already provides both the function prototypes and the
SWAP_* constants.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
<unistd.h> doesn't seem to be needed:
AT_* constants come from <fcntl.h>
STATX_* constants come from <sys/stat.h>
'struct statx' comes from <sys/stat.h>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove <sys/types.h>; ffix too
<sys/types.h> is only needed for 'struct stat'.
That is better documented in system_data_types(7).
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A function declarator with empty parentheses, which is not a
prototype, is an obsolescent feature of C (See C17 6.11.6.1), and
doesn't mean 0 parameters, but instead that no information about
the parameters is provided (See C17 6.5.2.2).
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It's only needed for getting 'mode_t'.
But that type is better documented in system_data_types(7).
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Doing so decreases the degree to which text is indented, and
thus avoids short, poorly wrapped lines.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Quoting Jann Horn:
[[
As discussed at
<https://lore.kernel.org/r/CAG48ez0m4Y24ZBZCh+Tf4ORMm9_q4n7VOzpGjwGF7_Fe8EQH=Q@mail.gmail.com>,
we need to re-check checkNotificationIdIsValid() after reading remote
memory but before using the read value in any way. Otherwise, the
syscall could in the meantime get interrupted by a signal handler, the
signal handler could return, and then the function that performed the
syscall could free() allocations or return (thereby freeing buffers on
the stack).
In essence, this pread() is (unavoidably) a potential use-after-free
read; and to make that not have any security impact, we need to check
whether UAF read occurred before using the read value. This should
probably be called out elsewhere in the manpage, too...
Now, of course, **reading** is the easy case. The difficult case is if
we have to **write** to the remote process... because then we can't
play games like that. If we write data to a freed pointer, we're
screwed, that's it. (And for somewhat unrelated bonus fun, consider
that /proc/$pid/mem is originally intended for process debugging,
including installing breakpoints, and will therefore happily write
over "readonly" private mappings, such as typical mappings of
executable code.)
So, uuuuh... I guess if anyone wants to actually write memory back to
the target process, we'd better come up with some dedicated API for
that, using an ioctl on the seccomp fd that magically freezes the
target process inside the syscall while writing to its memory, or
something like that? And until then, the manpage should have a big fat
warning that writing to the target's memory is simply not possible
(safely).
]]
and
<https://lore.kernel.org/r/CAG48ez0m4Y24ZBZCh+Tf4ORMm9_q4n7VOzpGjwGF7_Fe8EQH=Q@mail.gmail.com>:
[[
The second bit of trouble is that if the supervisor is so oblivious
that it doesn't realize that syscalls can be interrupted, it'll run
into other problems. Let's say the target process does something like
this:
int func(void) {
    char pathbuf[4096];
    sprintf(pathbuf, "/tmp/blah.%d", some_number);
    mount("foo", pathbuf, ...);
}
and mount() is handled with a notification. If the supervisor just
reads the path string and immediately passes it into the real mount()
syscall, something like this can happen:
target: starts mount()
target: receives signal, aborts mount()
target: runs signal handler, returns from signal handler
target: returns out of func()
supervisor: receives notification
supervisor: reads path from remote buffer
supervisor: calls mount()
but because the stack allocation has already been freed by the time
the supervisor reads it, the supervisor just reads random garbage, and
beautiful fireworks ensue.
So the supervisor *fundamentally* has to be written to expect that at
*any* time, the target can abandon a syscall. And every read of remote
memory has to be separated from uses of that remote memory by a
notification ID recheck.
And at that point, I think it's reasonable to expect the supervisor to
also be able to handle that a syscall can be aborted before the
notification is delivered.
]]
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
- Rename the function that does the SECCOMP_IOCTL_NOTIF_ID_VALID
check.
- Make that function return a 'bool' rather than terminating the
process.
- Use that return value in the calling function.
- Rework/improve various related comments.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
pidfd_open(2) and pidfd_getfd(2) presumably have use cases
with the user-space notification feature.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
According to Tycho Andersen, he had no particular use case
in mind when building this detail into the API.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The seccomp user-space notification feature can cause changes in
the semantics of SA_RESTART with respect to system calls that
would never normally be restarted. Point the reader to the page
that provides further details.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
And, as noted by Jann Horn, note how the user-space notification
mechanism causes a small breakage in the user-space API with
respect to nonrestartable system calls.
====
From the email discussion with Jann Horn
> >> So, I partially demonstrated what you describe here, for two example
> >> system calls (epoll_wait() and pause()). But I could not exactly
> >> demonstrate things as I understand you to be describing them. (So,
> >> I'm not sure whether I have not understood you correctly, or
> >> if things are not exactly as you describe them.)
> >>
> >> Here's a scenario (A) that I tested:
> >>
> >> 1. Target installs seccomp filters for a blocking syscall
> >> (epoll_wait() or pause(), both of which should never restart,
> >> regardless of SA_RESTART)
> >> 2. Target installs SIGINT handler with SA_RESTART
> >> 3. Supervisor is sleeping (i.e., is not blocked in
> >> SECCOMP_IOCTL_NOTIF_RECV operation).
> >> 4. Target makes a blocking system call (epoll_wait() or pause()).
> >> 5. SIGINT gets delivered to target; handler gets called;
> >> ***and syscall gets restarted by the kernel***
> >>
> >> That last should never happen, of course, and is a result of the
> >> combination of both the user-notify filter and the SA_RESTART flag.
> >> If one or other is not present, then the system call is not
> >> restarted.
> >>
> >> So, as you note below, the UAPI gets broken a little.
> >>
> >> However, from your description above I had understood that
> >> something like the following scenario (B) could occur:
> >>
> >> 1. Target installs seccomp filters for a blocking syscall
> >> (epoll_wait() or pause(), both of which should never restart,
> >> regardless of SA_RESTART)
> >> 2. Target installs SIGINT handler with SA_RESTART
> >> 3. Supervisor performs SECCOMP_IOCTL_NOTIF_RECV operation (which
> >> blocks).
> >> 4. Target makes a blocking system call (epoll_wait() or pause()).
> >> 5. Supervisor gets seccomp user-space notification (i.e.,
> >> SECCOMP_IOCTL_NOTIF_RECV ioctl() returns
> >> 6. SIGINT gets delivered to target; handler gets called;
> >> and syscall gets restarted by the kernel
> >> 7. Supervisor performs another SECCOMP_IOCTL_NOTIF_RECV operation
> >> which gets another notification for the restarted system call.
> >>
> >> However, I don't observe such behavior. In step 6, the syscall
> >> does not get restarted by the kernel, but instead returns -1/EINTR.
> >> Perhaps I have misconstructed my experiment in the second case, or
> >> perhaps I've misunderstood what you meant, or is it possibly the
> >> case that things are not quite as you said?
>
> Thanks for the code, Jann (including the demo of the CLONE_FILES
> technique to pass the notification FD to the supervisor).
>
> But I think your code just demonstrates what I described in
> scenario A. So, it seems that I both understood what you
> meant (because my code demonstrates the same thing) and
> also misunderstood what you said (because I thought you
> were meaning something more like scenario B).
Ahh, sorry, I should've read your mail more carefully. Indeed, that
testcase only shows scenario A. But the following shows scenario B...
[Below, two pieces of code from Jann, with a lot of
cosmetic changes by mtk.]
====
[And from a follow-up in the same email thread:]
> If userspace relies on non-restarting behavior, it should be using
> something like epoll_pwait(). And that stuff only unblocks signals
> after we've already past the seccomp checks on entry.
Thanks for elaborating that detail, since as soon as you talked
about "enlarging a preexisting race" above, I immediately wondered
about sigsuspend(), pselect(), etc.
(Mind you, I still wonder about the effect on system calls that
are normally nonrestartable because they have timeouts. My
understanding is that the kernel doesn't restart those system
calls because it's impossible for the kernel to restart the call
with the right timeout value. I wonder what happens when those
system calls are restarted in the scenario we're discussing.)
Anyway, returning to your point... So, to be clear (and to
quickly remind myself in case I one day reread this thread),
there is not a problem with sigsuspend(), pselect(), ppoll(),
and epoll_pwait() since:
* Before the syscall, signals are blocked in the target.
* Inside the syscall, signals are still blocked at the time
the check is made for seccomp filters.
* If a seccomp user-space notification event kicks in, the target
is put to sleep with the signals still blocked.
* The signal will only get delivered after the supervisor either
triggers a spoofed success/failure return in the target or the
supervisor sends a CONTINUE response to the kernel telling it
to execute the target's system call. Either way, there won't be
any restarting of the target's system call (and the supervisor
thus won't see multiple notifications).
====
Scenario A
$ ./seccomp_unotify_restart_scen_A
C: installed seccomp: fd 3
C: woke 1 waiters
P: child installed seccomp fd 3
C: About to call pause(): Success
P: going to send SIGUSR1...
C: sigusr1_handler handler invoked
P: about to terminate
C: got pdeath signal on parent termination
C: about to terminate
/* Modified version of code from Jann Horn */

#define _GNU_SOURCE
#include <stdio.h>
#include <signal.h>
#include <err.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <sched.h>
#include <stddef.h>
#include <limits.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>
#include <linux/filter.h>
#include <linux/futex.h>

struct {
    int seccomp_fd;
} *shared;

static void
sigusr1_handler(int sig, siginfo_t *info, void *uctx)
{
    printf("C: sigusr1_handler handler invoked\n");
}

static void
sigusr2_handler(int sig, siginfo_t *info, void *uctx)
{
    printf("C: got pdeath signal on parent termination\n");
    printf("C: about to terminate\n");
    exit(0);
}

int
main(void)
{
    setbuf(stdout, NULL);

    /* Allocate memory that will be shared by parent and child */

    shared = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                  MAP_ANONYMOUS | MAP_SHARED, -1, 0);
    if (shared == MAP_FAILED)
        err(1, "mmap");
    shared->seccomp_fd = -1;

    /* glibc's clone() wrapper doesn't support fork()-style usage */
    /* Child process and parent share file descriptor table */

    pid_t child = syscall(__NR_clone, CLONE_FILES | SIGCHLD,
                          NULL, NULL, NULL, 0);
    if (child == -1)
        err(1, "clone");

    /* CHILD */

    if (child == 0) {
        /* don't outlive the parent */

        prctl(PR_SET_PDEATHSIG, SIGUSR2);
        if (getppid() == 1)
            exit(0);

        /* Install seccomp filter */

        prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
        struct sock_filter insns[] = {
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                     offsetof(struct seccomp_data, nr)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_pause, 0, 1),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_USER_NOTIF),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)
        };
        struct sock_fprog prog = {
            .len = sizeof(insns) / sizeof(insns[0]),
            .filter = insns
        };
        int seccomp_ret = syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER,
                                  SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog);
        if (seccomp_ret < 0)
            err(1, "install");
        printf("C: installed seccomp: fd %d\n", seccomp_ret);

        /* Place the notifier FD number into the shared memory */

        __atomic_store(&shared->seccomp_fd, &seccomp_ret,
                       __ATOMIC_RELEASE);

        /* Wake the parent */

        int futex_ret =
            syscall(__NR_futex, &shared->seccomp_fd, FUTEX_WAKE,
                    INT_MAX, NULL, NULL, 0);
        printf("C: woke %d waiters\n", futex_ret);

        /* Establish SA_RESTART handler for SIGUSR1 */

        struct sigaction act = {
            .sa_sigaction = sigusr1_handler,
            .sa_flags = SA_RESTART | SA_SIGINFO
        };
        if (sigaction(SIGUSR1, &act, NULL))
            err(1, "sigaction");

        struct sigaction act2 = {
            .sa_sigaction = sigusr2_handler,
            .sa_flags = 0
        };
        if (sigaction(SIGUSR2, &act2, NULL))
            err(1, "sigaction");

        /* Make a blocking system call */

        perror("C: About to call pause()");
        pause();
        perror("C: pause returned");
        exit(0);
    }

    /* PARENT */

    /* Wait for futex wake-up from child */

    int futex_ret = syscall(__NR_futex, &shared->seccomp_fd, FUTEX_WAIT,
                            -1, NULL, NULL, 0);
    if (futex_ret == -1 && errno != EAGAIN)
        err(1, "futex wait");

    /* Get notification FD from the child */

    int fd = __atomic_load_n(&shared->seccomp_fd, __ATOMIC_ACQUIRE);
    printf("\tP: child installed seccomp fd %d\n", fd);

    sleep(1);
    printf("\tP: going to send SIGUSR1...\n");
    kill(child, SIGUSR1);
    sleep(1);

    printf("\tP: about to terminate\n");
    exit(0);
}
====
Scenario B
$ ./seccomp_unotify_restart_scen_B
C: installed seccomp: fd 3
C: woke 1 waiters
C: About to call pause()
P: child installed seccomp fd 3
P: about to SECCOMP_IOCTL_NOTIF_RECV
P: got notif: id=17773741941218455591 pid=25052 nr=34
P: about to send SIGUSR1 to child...
P: about to SECCOMP_IOCTL_NOTIF_RECV
C: sigusr1_handler handler invoked
P: got notif: id=17773741941218455592 pid=25052 nr=34
P: about to send SIGUSR1 to child...
P: about to SECCOMP_IOCTL_NOTIF_RECV
C: sigusr1_handler handler invoked
P: got notif: id=17773741941218455593 pid=25052 nr=34
P: about to send SIGUSR1 to child...
P: about to SECCOMP_IOCTL_NOTIF_RECV
C: sigusr1_handler handler invoked
P: got notif: id=17773741941218455594 pid=25052 nr=34
P: about to send SIGUSR1 to child...
C: sigusr1_handler handler invoked
C: got pdeath signal on parent termination
C: about to terminate
/* Modified version of code from Jann Horn */

#define _GNU_SOURCE
#include <stdio.h>
#include <signal.h>
#include <err.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <sched.h>
#include <stddef.h>
#include <string.h>
#include <limits.h>
#include <inttypes.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>
#include <linux/filter.h>
#include <linux/futex.h>

struct {
    int seccomp_fd;
} *shared;

static void
sigusr1_handler(int sig, siginfo_t *info, void *uctx)
{
    printf("C: sigusr1_handler handler invoked\n");
}

static void
sigusr2_handler(int sig, siginfo_t *info, void *uctx)
{
    printf("C: got pdeath signal on parent termination\n");
    printf("C: about to terminate\n");
    exit(0);
}

static size_t
max_size(size_t a, size_t b)
{
    return (a > b) ? a : b;
}

int
main(void)
{
    setbuf(stdout, NULL);

    /* Allocate memory that will be shared by parent and child */

    shared = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                  MAP_ANONYMOUS | MAP_SHARED, -1, 0);
    if (shared == MAP_FAILED)
        err(1, "mmap");
    shared->seccomp_fd = -1;

    /* glibc's clone() wrapper doesn't support fork()-style usage */
    /* Child process and parent share file descriptor table */

    pid_t child = syscall(__NR_clone, CLONE_FILES | SIGCHLD,
                          NULL, NULL, NULL, 0);
    if (child == -1)
        err(1, "clone");

    /* CHILD */

    if (child == 0) {
        /* don't outlive the parent */

        prctl(PR_SET_PDEATHSIG, SIGUSR2);
        if (getppid() == 1)
            exit(0);

        /* Install seccomp filter */

        prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
        struct sock_filter insns[] = {
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                     offsetof(struct seccomp_data, nr)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_pause, 0, 1),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_USER_NOTIF),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)
        };
        struct sock_fprog prog = {
            .len = sizeof(insns) / sizeof(insns[0]),
            .filter = insns
        };
        int seccomp_ret = syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER,
                                  SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog);
        if (seccomp_ret < 0)
            err(1, "install");
        printf("C: installed seccomp: fd %d\n", seccomp_ret);

        /* Place the notifier FD number into the shared memory */

        __atomic_store(&shared->seccomp_fd, &seccomp_ret,
                       __ATOMIC_RELEASE);

        /* Wake the parent */

        int futex_ret =
            syscall(__NR_futex, &shared->seccomp_fd, FUTEX_WAKE,
                    INT_MAX, NULL, NULL, 0);
        printf("C: woke %d waiters\n", futex_ret);

        /* Establish SA_RESTART handler for SIGUSR1 */

        struct sigaction act = {
            .sa_sigaction = sigusr1_handler,
            .sa_flags = SA_RESTART | SA_SIGINFO
        };
        if (sigaction(SIGUSR1, &act, NULL))
            err(1, "sigaction");

        struct sigaction act2 = {
            .sa_sigaction = sigusr2_handler,
            .sa_flags = 0
        };
        if (sigaction(SIGUSR2, &act2, NULL))
            err(1, "sigaction");

        /* Make a blocking system call */

        printf("C: About to call pause()\n");
        pause();
        perror("C: pause returned");
        exit(0);
    }

    /* PARENT */

    /* Wait for futex wake-up from child */

    int futex_ret = syscall(__NR_futex, &shared->seccomp_fd, FUTEX_WAIT,
                            -1, NULL, NULL, 0);
    if (futex_ret == -1 && errno != EAGAIN)
        err(1, "futex wait");

    /* Get notification FD from the child */

    int fd = __atomic_load_n(&shared->seccomp_fd, __ATOMIC_ACQUIRE);
    printf("\tP: child installed seccomp fd %d\n", fd);

    /* Discover seccomp buffer sizes and allocate notification buffer */

    struct seccomp_notif_sizes sizes;
    if (syscall(__NR_seccomp, SECCOMP_GET_NOTIF_SIZES, 0, &sizes))
        err(1, "notif_sizes");
    struct seccomp_notif *notif =
        malloc(max_size(sizeof(struct seccomp_notif),
                        sizes.seccomp_notif));
    if (!notif)
        err(1, "malloc");

    for (int i = 0; i < 4; i++) {
        printf("\tP: about to SECCOMP_IOCTL_NOTIF_RECV\n");
        memset(notif, '\0', sizes.seccomp_notif);
        if (ioctl(fd, SECCOMP_IOCTL_NOTIF_RECV, notif))
            err(1, "notif_recv");
        printf("\tP: got notif: id=%llu pid=%u nr=%d\n",
               notif->id, notif->pid, notif->data.nr);
        sleep(1);
        printf("\tP: about to send SIGUSR1 to child...\n");
        kill(child, SIGUSR1);
    }
    sleep(1);

    exit(0);
}
====
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the usual case, read(fd, buf, PATH_MAX) will return PATH_MAX
bytes that include trailing garbage after the pathname. So the
right check is to scan from the start of the buffer to see if
there's a NUL, and error if there is not.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After some discussions with Jann Horn, perhaps a better way of
dealing with an invalid target pathname is to trigger an
error for the system call.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From a conversation with Jann Horn:
[[
>>>> struct seccomp_notif_resp *resp = malloc(sizes.seccomp_notif_resp);
>>>
>>> This should probably do something like max(sizes.seccomp_notif_resp,
>>> sizeof(struct seccomp_notif_resp)) in case the program was built
>>> against new UAPI headers that make struct seccomp_notif_resp big, but
>>> is running under an old kernel where that struct is still smaller?
>>
>> I'm confused. Why? I mean, if the running kernel says that it expects
>> a buffer of a certain size, and we allocate a buffer of that size,
>> what's the problem?
>
> Because in userspace, we cast the result of malloc() to a "struct
> seccomp_notif_resp *". If the kernel tells us that it expects a size
> smaller than sizeof(struct seccomp_notif_resp), then we end up with a
> pointer to a struct that consists partly of allocated memory, partly
> of out-of-bounds memory, which is generally a bad idea - I'm not sure
> whether the C standard permits that. And if userspace then e.g.
> decides to access some member of that struct that is beyond what the
> kernel thinks is the struct size, we get actual OOB memory accesses.
Got it. (But gosh, this seems like a fragile API mess.)
I added the following to the code:
/* When allocating the response buffer, we must allow for the fact
   that the user-space binary may have been built with user-space
   headers where 'struct seccomp_notif_resp' is bigger than the
   response buffer expected by the (older) kernel. Therefore, we
   allocate a buffer that is the maximum of the two sizes. This
   ensures that if the supervisor places bytes into the response
   structure that are past the response size that the kernel expects,
   then the supervisor is not touching an invalid memory location. */

size_t resp_size = sizes.seccomp_notif_resp;
if (sizeof(struct seccomp_notif_resp) > resp_size)
    resp_size = sizeof(struct seccomp_notif_resp);

struct seccomp_notif_resp *resp = malloc(resp_size);
]]
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From a conversation with Jann Horn:
>> We should probably make sure here that the value we read is actually
>> NUL-terminated?
>
> So, I was curious about that point also. But, (why) are we not
> guaranteed that it will be NUL-terminated?
Because it's random memory filled by another process, which we don't
necessarily trust. While seccomp notifiers aren't usable for applying
*extra* security restrictions, the supervisor will still often be more
privileged than the supervised process.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Change "read(2) will return 0" to "read(2) may return 0".
Quoting Jann Horn:
Maybe make that "may return 0" instead of "will return 0" -
reading from /proc/$pid/mem can only return 0 in the
following cases AFAICS:
1. task->mm was already gone at open() time
2. mm->mm_users has dropped to zero (the mm only has lazytlb
users; page tables and VMAs are being blown away or have
been blown away)
3. the syscall was called with length 0
When a process has gone away, normally mm->mm_users will
drop to zero, but someone else could theoretically still be
holding a reference to the mm (e.g. someone else in the
middle of accessing /proc/$pid/mem). (Such references
should normally not be very long-lived though.)
Additionally, in the unlikely case that the OOM killer just
chomped through the page tables of the target process, I
think the read will return -EIO (same error as if the
address was simply unmapped) if the address is within a
non-shared mapping. (Maybe that's something procfs could do
better...)
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add some strongly worded text warning the reader about the correct
uses of seccomp user-space notification.
Reported-by: Jann Horn <jannh@google.com>
Cowritten-by: Christian Brauner <christian@brauner.io>
Cowritten-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The APIs used by this mechanism comprise not only seccomp(2), but
also a number of ioctl(2) operations. And any useful example
demonstrating these APIs will necessarily be rather long.
Trying to cram all of this into the seccomp(2) page would make
that page unmanageably long. Therefore, let's document this
mechanism in a separate page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The existing text says the structures (plural!) contain a 'struct
seccomp_data'. But this is only true for the received notification
structure (seccomp_notif). So, reword the sentence to be more
general, noting simply that the structures may evolve over time.
Add some comments to the structure definition.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Rework the description a little, and note that the close-on-exec
flag is set for the returned file descriptor.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I can't see a reason to include it. <fcntl.h> provides O_*
constants for 'flags', S_* constants for 'mode', and mode_t.
Probably a long time ago, some of those weren't defined in
<fcntl.h>, and both headers needed to be included, or maybe it's
a historical error.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
My previous patch intended to drop the docs for the lockdown lift
SysRq, but it missed this other section that refers to lifting it
via a keyboard - an allusion to that same SysRq.
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The patch that implemented lockdown lifting via SysRq ended up
getting dropped[*] before the feature was merged upstream. Having
the feature documented but unsupported has caused some confusion
for our users.
[*] http://archive.lwn.net:8080/linux-kernel/CACdnJuuxAM06TcnczOA6NwxhnmQUeqqm3Ma8btukZpuCS+dOqg@mail.gmail.com/
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Pedro Principeza <pedro.principeza@canonical.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Kyle McMartin <kyle@redhat.com>
Cc: Matthew Garrett <mjg59@google.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Instead of having a monolithic 'make install', break it into
multiple targets such as 'make install-man3'. This simplifies
packaging, for example in Debian, where they break this project
into several packages: 'manpages' and 'manpages-dev', each
containing different mandirs.
The above allows for multithreaded installation ('make -j').
Also, don't overwrite files that don't need to be overwritten, by
having a target for files, which makes use of make's timestamp
comparison.
This allows for much faster installation times.
For comparison, on my laptop (i7-8850H; 6C/12T):
Old Makefile:
~/src/linux/man-pages$ time sudo make >/dev/null
real 0m7.509s
user 0m5.269s
sys 0m2.614s
The times with the old Makefile varied a lot, between
5 and 10 seconds. The times after applying this patch
are much more consistent. BTW, I compared these times to
the very old Makefile of man-pages-5-09, and those were
around 3.5 s, so the slow Makefile was partly my fault,
from when I changed it some weeks ago.
New Makefile (full clean install):
~/src/linux/man-pages$ time sudo make >/dev/null
real 0m5.160s
user 0m4.326s
sys 0m1.137s
~/src/linux/man-pages$ time sudo make -j2 >/dev/null
real 0m1.602s
user 0m2.529s
sys 0m0.289s
~/src/linux/man-pages$ time sudo make -j >/dev/null
real 0m1.398s
user 0m2.502s
sys 0m0.281s
Here we can see that 'make -j' reduces times drastically
compared to the old monolithic Makefile. Moreover, when
working on the man pages, usually only a few pages have
changed, so times will be even better.
Here are some times with a single page changed (touched):
New Makefile (one page touched):
~/src/linux/man-pages$ touch man2/membarrier.2
~/src/linux/man-pages$ time sudo make install
- INSTALL /usr/local/share/man/man2/membarrier.2
real 0m0.988s
user 0m0.966s
sys 0m0.025s
~/src/linux/man-pages$ touch man2/membarrier.2
~/src/linux/man-pages$ time sudo make install -j
- INSTALL /usr/local/share/man/man2/membarrier.2
real 0m0.989s
user 0m0.943s
sys 0m0.049s
Also, modify the output of the make install and uninstall commands
so that a line is output for each file or directory that is
installed, similarly to the kernel's Makefile. This doesn't apply
to html targets, which haven't been changed in this commit.
Also, make sure that for each invocation of $(INSTALL_DIR), no
parents are created, (i.e., avoid `mkdir -p` behavior). The GNU
make manual states that it can create race conditions. Instead,
declare as a prerequisite for each directory its parent directory,
and let make resolve the order of creation.
Also, use ':=' instead of '=' to improve performance, by
evaluating each assignment only once.
Ensure that the shell is not invoked when not needed, by removing
all ";" and quotes from the commands.
See also: <https://stackoverflow.com/q/67862417/6872717>
Specify conventions and rationales used in the Makefile in a comment.
Add copyright.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Also document why each header is required.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This function doesn't use any flags or special types, so there's
no reason to include <asm/unistd.h>; remove it. Add the includes
needed for syscall(2) only.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Also document why each header is needed.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This error can occur if the caller does not have CAP_IPC_LOCK
and is not a member of the sysctl_hugetlb_shm_group.
Reported-by: Yang Xu <xuyang2018.jy@fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As a deprecated feature, it appears that the RLIMIT_MEMLOCK
can also be used to permit huge page allocation, but let's
not document that for now.
In Linux 5.12, see fs/hugetlbfs/inode.c:
static int can_do_hugetlb_shm(void)
{
	kgid_t shm_group;
	shm_group = make_kgid(&init_user_ns, sysctl_hugetlb_shm_group);
	return capable(CAP_IPC_LOCK) || in_group_p(shm_group);
}
...
struct file *hugetlb_file_setup(const char *name, size_t size,
				vm_flags_t acctflag, struct user_struct **user,
				int creat_flags, int page_size_log)
{
	...
	if (creat_flags == HUGETLB_SHMFS_INODE && !can_do_hugetlb_shm()) {
		*user = current_user();
		if (user_shm_lock(size, *user)) {
			task_lock(current);
			pr_warn_once("%s (%d): Using mlock ulimits for SHM_HUGETLB is deprecated\n",
				current->comm, current->pid);
			task_unlock(current);
		} else {
			*user = NULL;
			return ERR_PTR(-EPERM);
		}
	}
	...
}
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The installation path was changed recently (See 'prefix' in the
Makefile). I forgot to update the README with those changes.
Fix it.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Explain also why headers are needed.
And some ffix.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It is only used for providing 'sigset_t'. We're only documenting
(with some exceptions) the includes needed for constants and the
prototype itself. And 'sigset_t' is better documented in
system_data_types(7). Remove that include.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
All of the constants used by mknod() are defined in <sys/stat.h>.
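For example, the following hypothetical helper creates a FIFO with mknod(). It takes the prototype and every constant from <sys/stat.h>. It assumes glibc, where mknod() needs _DEFAULT_SOURCE (or an XSI feature macro) to be declared. <unistd.h> appears only for unlink() during cleanup.

```c
#define _DEFAULT_SOURCE   /* Declare mknod() in glibc's <sys/stat.h> */
#include <sys/stat.h>     /* mknod(), stat(), S_IFIFO, S_IRUSR, S_IWUSR, S_ISFIFO */
#include <unistd.h>       /* unlink(): cleanup only */

/* Hypothetical helper: create a FIFO at 'path'.  The 'dev' argument is
   ignored for FIFOs, so 0 is passed. */
static int make_fifo(const char *path)
{
    return mknod(path, S_IFIFO | S_IRUSR | S_IWUSR, 0);
}
```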
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
AFAICS, there's no use for <unistd.h> here. The prototype is
declared in <sys/mman.h>, and there are no constants needed.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove the libkeyutils prototype from the synopsis, which isn't
documented in the rest of the page, and as NOTES says, it's
probably better to use the various library functions.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The constants needed for using this function are defined in
<linux/ipc.h>. Add the include, even though those constants are not
mentioned in this manual page.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This, of course, refers to the glibc wrapper, as do all other
pages that don't explicitly say otherwise.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In this case there's a wrapper provided by libaio,
but this page documents the raw syscall.
Also remove <linux/time.h> from the includes: 'struct timespec'
is already documented in system_data_types(7), where the
information is more up to date.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In this case there's a wrapper provided by libaio,
but this page documents the raw syscall.
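To contrast with the libaio wrapper, the following sketch invokes the raw system calls. The helper names are hypothetical, and the sketch assumes that <linux/aio_abi.h> provides aio_context_t. Per io_setup(2), the context must be zero-initialized before the call. The raw calls report errors in the usual way (-1 and errno), whereas io_setup(2) notes that the libaio wrapper returns a negated error number.

```c
#include <linux/aio_abi.h>   /* aio_context_t */
#include <sys/syscall.h>     /* SYS_io_setup, SYS_io_destroy */
#include <unistd.h>          /* syscall() */

/* Hypothetical helper: raw io_setup(2).  '*ctx' must be 0 on entry. */
static long raw_io_setup(unsigned nr_events, aio_context_t *ctx)
{
    return syscall(SYS_io_setup, nr_events, ctx);
}

/* Hypothetical helper: raw io_destroy(2). */
static long raw_io_destroy(aio_context_t ctx)
{
    return syscall(SYS_io_destroy, ctx);
}
```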
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
At the same time, document only headers that are required
for calling the function, or those that are specific to the
function:
<unistd.h> is required for the syscall() prototype.
<sys/syscall.h> is required for the syscall name SYS_xxx.
<linux/futex.h> is specific to this syscall.
However, uint32_t is generic enough that it shouldn't be
documented here. The system_data_types(7) page already documents
it, and is more precise about it. The same goes for timespec.
As a general rule a man[23] page should document the header that
includes the prototype, and all of the headers that define macros
that should be used with the call. However, the information about
types should be restricted to system_data_types(7) (and that page
should probably be improved by adding types), except for types
that are very specific to the call. Otherwise, we're duplicating
info and it's then harder to maintain, and probably outdated in
the future.
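Using those three headers, a minimal sketch of raw FUTEX_WAIT and FUTEX_WAKE wrappers might look like the following. <stdint.h> and <time.h> are added only so that the block compiles on its own, and <errno.h> is added for the EAGAIN check. The helper names are hypothetical, and a real lock built on futexes would need considerably more logic.

```c
#include <errno.h>          /* EAGAIN, checked by callers */
#include <linux/futex.h>    /* FUTEX_WAIT, FUTEX_WAKE */
#include <stdint.h>         /* uint32_t (generic type) */
#include <sys/syscall.h>    /* SYS_futex */
#include <time.h>           /* struct timespec (generic type) */
#include <unistd.h>         /* syscall() */

/* Hypothetical helper: block while *uaddr == expected (or until 'timeout'
   expires).  If *uaddr != expected, the call fails at once with EAGAIN. */
static long futex_wait(uint32_t *uaddr, uint32_t expected,
                       const struct timespec *timeout)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAIT, expected, timeout, NULL, 0);
}

/* Hypothetical helper: wake up to 'nwaiters' waiters.
   Returns the number of waiters actually woken. */
static long futex_wake(uint32_t *uaddr, uint32_t nwaiters)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE, nwaiters, NULL, NULL, 0);
}
```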
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This complements commit e3eba861bd.
Since we don't need syscall(2) anymore, we don't need SYS_* definitions.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
On 5/10/21 7:13 PM, Alejandro Colomar (man-pages) wrote:
> Hi Michael,
>
> On 5/10/21 1:39 AM, Michael Kerrisk (man-pages) wrote:
>>> - Specify shebang
>>
>> Why? It's not quite obvious to me, and the commit message
>> should really explain...
>
> Hmmm. I have some minor reasons to add it, but not a really good one.
>
> * Some editors don't recognize 'Makefile' as a special name, so the
> shebang helps detect which language the file is using (e.g., for
> coloring).
>
> * I tend to subdivide a big Makefile into a small Makefile and many
> submakefiles stored in <./libexec/>. Those obviously need different
> names, and given that the makefile extension is not very standard (I use
> .mk), having a shebang helps knowing what the file is. After that, I
> also have it on the main Makefile for consistency. But here we only
> have one Makefile, so it doesn't apply very much.
I think I'll remove it. It is kind of idiosyncratic, leaves the
reader asking "why?".
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Ignore everything new by default.
This avoids having to update the .gitignore when we need to ignore
something new. It also avoids accidents that may add an unwanted
temporary file.
Cc: Debian man-pages <manpages@packages.debian.org>
Cc: Dr. Tobias Quathamer <toddy@debian.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
fpurge(i_stream) does the same as fflush(i_stream), AFAIK.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
AT_EMPTY_PATH works with empty strings (""), but not with NULL
(or at least it's not obvious).
The relevant kernel code is the following:
linux$ sed -n 189,198p fs/namei.c
	result->refcnt = 1;
	/* The empty path is special. */
	if (unlikely(!len)) {
		if (empty)
			*empty = 1;
		if (!(flags & LOOKUP_EMPTY)) {
			putname(result);
			return ERR_PTR(-ENOENT);
		}
	}
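The following short sketch shows the working form. The helper name is hypothetical, and the sketch assumes glibc, where AT_EMPTY_PATH requires _GNU_SOURCE. Passing an empty string ("") together with AT_EMPTY_PATH makes fstatat() operate on the file descriptor itself, so the result should match fstat().

```c
#define _GNU_SOURCE       /* AT_EMPTY_PATH, O_DIRECTORY */
#include <fcntl.h>        /* open(), AT_* constants */
#include <sys/stat.h>     /* fstatat(), fstat(), struct stat */

/* Hypothetical helper: equivalent to fstat(fd, st).  The path is an
   empty string, not NULL. */
static int stat_fd_via_empty_path(int fd, struct stat *st)
{
    return fstatat(fd, "", st, AT_EMPTY_PATH);
}
```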
Reported-by: Walter Harms <wharms@bfs.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See <bits/byteswap.h> in glibc.
These macros call functions of the form __bswap_N(),
which use uintN_t.
Even though it's true that they are macros,
it's transparent to the user.
The user will see their results cast to unsigned types
after the conversion due to the underlying functions,
so it's better to document these as the underlying functions,
specifying the types.
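The following sketch makes the documented types explicit. The helper names are hypothetical, and it assumes a glibc whose __bswap_N() are inline functions returning uintN_t, as described above. Under that assumption, the expression bswap_N(x) itself has type uintN_t, and the sizeof checks in the usage test rely on this.

```c
#include <byteswap.h>   /* bswap_16(), bswap_32(), bswap_64(): glibc */
#include <stdint.h>     /* uint16_t, uint32_t, uint64_t */

/* Hypothetical wrappers: each returns its argument with the byte order
   reversed, as a value of the matching uintN_t type. */
static uint16_t swap16(uint16_t x) { return bswap_16(x); }
static uint32_t swap32(uint32_t x) { return bswap_32(x); }
static uint64_t swap64(uint64_t x) { return bswap_64(x); }
```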
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in
pthread_attr_getinheritsched().
Let's use it here too.
.../glibc$ grep_glibc_prototype pthread_attr_getinheritsched
sysdeps/htl/pthread.h:90:
extern int pthread_attr_getinheritsched (const pthread_attr_t *__restrict __attr,
int *__restrict __inheritsched)
__THROW __nonnull ((1, 2));
sysdeps/nptl/pthread.h:313:
extern int pthread_attr_getinheritsched (const pthread_attr_t *__restrict
__attr, int *__restrict __inherit)
__THROW __nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
'C library/kernel differences' was incorrectly added under BUGS.
Fix it.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Those pages didn't exist. Fix the section number.
I noticed the typo thanks to the HTML pages on man7.org.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Prerequisites can run in parallel, which makes no sense for a
target that uninstalls and then installs again.
For that, use consecutive commands, which run one after the other
even with multiple cores.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
IMPORTANT for distributions:
This changes prefix to be '/usr/local' as is expected by default,
instead of the old '/usr' value.
- Use standard variables:
- prefix should be '/usr/local'
- mandir (instead of MANDIR)
- htmldir (instead of HTDIR)
- ...
see <https://www.gnu.org/software/make/manual/html_node/Directory-Variables.html>
- Use standard targets:
- html (build html files; don't install them)
- install-html (instead of html)
- installdirs (instead of 'mkdir -p'/'install -d' inside other targets)
- ...
see <https://www.gnu.org/software/make/manual/html_node/Standard-Targets.html#Standard-Targets>
- Use .PHONY
- ?= is not needed. User input overrides any assignment. Use =
- Use standard command variables, instead of directly calling commands.
- $(INSTALL_DATA) (instead of install -m 644)
- $(INSTALL_DIR) (instead of install -d -m 755 or mkdir -p)
see <https://www.gnu.org/software/make/manual/html_node/Command-Variables.html#Command-Variables>
- Specify SHELL = /bin/bash
- Specify shebang
- Allow variable html extension (or no extension at all)
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
mkdir -p doesn't fail if the directory already exists.
Remove redundant checks.
Use .html as default HTDIR.
Remove checks for undefined HTDIR.
Show what the target does, as with other targets (remove '@').
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I clarified the code about two things:
- Checking how many arguments are being passed.
Here, some functions didn't reject extra arguments when they
weren't being used. Fix that.
I also changed the code to use $#, which is more explicit.
And use arithmetic expressions, which better indicate that
we're dealing with numbers.
- Remove unneeded options from sort.
Reported-by: Stefan Puiu <stefan.puiu@gmail.com>
After Stefan asked why I was using 'sort -V',
I noticed that I really don't need '-V', and it may confuse
people trying to understand the script, so even though I
slightly prefer the output of 'sort -V', in this case, it's
better to use the simpler 'sort' (yet I need 'sort', to
maintain consistency in the results (find is quite random)).
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Fix the error messages to clearly show that both dirs and manual
pages are accepted, and that more than one argument is accepted.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This patch makes man_lsfunc() search for the function prototypes,
instead of relying on the current manual page formatting,
which might change in the future, and break this function.
It also simplifies the code, by reusing man_section().
Create a new function sed_rm_ccomments(), which is needed by
man_lsfunc(), and may also be useful in other cases.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The output of 'git status' is not stable.
The more stable 'git status --porcelain' is more complex,
and scripting around it would be more complex.
However, 'git diff --staged --name-only' produces
the output that we were looking for.
Reported-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Neither POSIX nor glibc uses 'const' in
pthread_mutexattr_setrobust().
Remove it.
.../glibc$ grep_glibc_prototype pthread_mutexattr_setrobust
sysdeps/htl/pthread.h:355:
extern int pthread_mutexattr_setrobust (pthread_mutexattr_t *__attr,
int __robustness)
__THROW __nonnull ((1));
sysdeps/nptl/pthread.h:888:
extern int pthread_mutexattr_setrobust (pthread_mutexattr_t *__attr,
int __robustness)
__THROW __nonnull ((1));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The rest of the page writes the characters without naming them.
Follow that convention.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Improved the `getopt(3)` man page in the following ways:
1) Defined the existing term "legitimate option character".
2) Added an additional NOTE stressing that arguments are parsed in strict
order and the implications of this when numeric options are utilised.
Signed-off-by: James O. D. Hunt <jamesodhunt@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Despite my mention of this spawning a hilarious discussion
on IRC, this alignment restriction should be 128-bit, not
126-bit.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add a missing "to" in an "in order to" formulation.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
CIFS flock() locks behave differently from the standard. Give an overview
of those differences.
Here is the rendered text:
CIFS details
In Linux kernels up to 5.4, flock() is not propagated over SMB. A file
with such locks will not appear locked for remote clients.
Since Linux 5.5, flock() locks are emulated with SMB byte-range locks
on the entire file. Similarly to NFS, this means that fcntl(2) and
flock() locks interact with one another. Another important side-effect
is that the locks are not advisory anymore: any IO on a locked file
will always fail with EACCES when done from a separate file descriptor.
This difference originates from the design of locks in the SMB
protocol, which provides mandatory locking semantics.
Remote and mandatory locking semantics may vary with SMB protocol,
mount options and server type. See mount.cifs(8) for additional
information.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Discussion: linux-man <https://lore.kernel.org/linux-man/20210302154831.17000-1-aaptel@suse.com/>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In b0b19983d9 we removed
<sys/types.h>. For the same reasons there, remove now <sys/ipc.h>
from many pages.
If someone wonders why <sys/ipc.h> was needed, the reason was to
get all the definitions of IPC_* constants. However, that header
is now included by <sys/msg.h>, so it's not needed anymore to
explicitly include it. Quoting POSIX: "In addition, the
<sys/msg.h> header shall include the <sys/ipc.h> header."
There were some remaining cases where I forgot to remove
<sys/types.h>; remove them now too.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
AFAICS, all types and constants used by these functions are
defined in <fcntl.h>.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A few architectures have a different call signature for pipe().
Since those architectures are the minority, place the prototype
at the end of the SYNOPSIS, rather than the start.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
SEE ALSO: move pam_namespace(8) from namespaces(7) to
mount_namespaces(7) (since pam_namespace(8) makes use of
mount namespaces specifically).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Userfaultfd write-protect mode is supported starting from Linux 5.7.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
[alx: ffix + srcfix]
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
UFFD_FEATURE_THREAD_ID is supported since Linux 4.14.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Write-protect mode is supported starting from Linux 5.7.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
UFFD_FEATURE_THREAD_ID is supported since Linux 4.14.
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
[alx: srcfix]
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
My initial reading of this was that type modifiers were probably
not supported. But they are, and this is actually documented
further up, in the type modifiers documentation. But to make it
clearer, let's copy the language that printf(3) has in its %n
section.
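The following hypothetical helper shows a type modifier on %n. %hhn stores the count of characters consumed so far into a signed char. The %n conversion does not increase the count returned by sscanf().

```c
#include <stdio.h>   /* sscanf() */

/* Hypothetical helper: parse an int from 's' and report, via %hhn, how
   many characters were consumed.  Returns sscanf()'s result. */
static int parse_int(const char *s, int *value, signed char *consumed)
{
    return sscanf(s, "%d%hhn", value, consumed);
}
```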
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In this case there's a wrapper provided by libaio,
but this page documents the raw kernel syscall.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
<linux/unistd.h> is not needed. We need <unistd.h> for syscall(),
and <sys/syscall.h> for SYS_exit_group.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add <linux/fcntl.h>, which contains AT_* definitions used by
execveat().
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The CLONE_* constants seem to be available from either
<linux/sched.h> or <sched.h>, and since clone3() already
includes <linux/sched.h> for 'struct clone_args', <sched.h>
is not really needed, AFAICS; however, to avoid confusion,
I also included <sched.h> for clone3() for consistency:
clone() is getting CLONE_* from <sched.h>, and it would confuse
the reader if clone3() got the same CLONE_* constants from a
different header.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
AFAICS, there's no reason to include that.
All of the macros that this function uses
are already defined in the other headers.
Cc: glibc <libc-alpha@sourceware.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The page didn't specify includes, and the syscalls are extinct, so
instead of adding incomplete information about includes, just
leave it without any includes.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Only the include that provides the prototype doesn't need a comment.
Also sort the includes alphabetically.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Only the one that provides the prototype doesn't need a comment.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
<linux/fs.h> doesn't seem to be needed!
Only the include that provides the prototype doesn't need a comment.
Also sort the includes alphabetically.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Only the include that provides the prototype doesn't need a comment.
Also sort the includes alphabetically.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Only the one that provides the prototype doesn't need a comment.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
[mtk: Alex's change switches the comment to the more generally used
form "Definition of..."]
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
<sys/time.h> is not needed to get the function declaration nor any
constant used by the function. It was only needed (before
POSIX.1) to get 'struct timeval', but that information would be
more suited for system_data_types(7), and not for this page.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
<sys/time.h> is not required by any of the function declarations
or macro definitions used by these functions. It may be (or maybe
not) needed by some type inside the rlimit structure, but that
info belongs in system_data_types(7), not here.
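For example, the following hypothetical helper queries the file-descriptor limits. It needs only <sys/resource.h>, which provides getrlimit(), struct rlimit, rlim_t, and the RLIMIT_* and RLIM_INFINITY constants.

```c
#include <sys/resource.h>   /* getrlimit(), struct rlimit, rlim_t, RLIMIT_NOFILE */

/* Hypothetical helper: fetch the soft and hard limits on open file
   descriptors.  Returns 0 on success, -1 (with errno set) on error. */
static int nofile_limits(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}
```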
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
<sys/types.h> was only needed for size_t, AFAIK. That is already
(and more precisely) documented in system_data_types(7). Let's
remove it here, as it's not really needed for calling add_key().
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I couldn't find a reason for including <unistd.h>. All the macros
used by fcntl() are defined in <fcntl.h>. For comparison, FreeBSD
and OpenBSD don't specify <unistd.h> in their manual pages.
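In the same spirit, the following hypothetical helper reads and changes the close-on-exec flag using only fcntl() and constants from <fcntl.h>. The usage check opens "/" read-only purely as a convenient existing path.

```c
#include <fcntl.h>   /* fcntl(), open(), F_GETFD, F_SETFD, FD_CLOEXEC */

/* Hypothetical helper: set (on != 0) or clear (on == 0) the close-on-exec
   flag on 'fd'.  Returns 0 on success, -1 on error. */
static int set_cloexec(int fd, int on)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    flags = on ? (flags | FD_CLOEXEC) : (flags & ~FD_CLOEXEC);
    return fcntl(fd, F_SETFD, flags);
}
```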
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This function never returns to its caller.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
ENODATA is an XSI STREAMS extension (not base POSIX).
Linux reused the name for extended attributes.
The current manual pages don't use ENODATA with its POSIX
meaning, so use the xattr(7) specific text, and leave the POSIX
meaning for a secondary paragraph.
Reported-by: Mark Kettenis <kettenis@openbsd.org>
Reported-by: Florian Weimer <fw@deneb.enyo.de>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Checked via the latest glibc source. execvpe calls getenv("PATH") and
searches that; the PATH in envp does not affect the search.
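The following sketch makes this behavior observable. The helper name is hypothetical, and the sketch assumes glibc (where execvpe() is a GNU extension) and a "true" command reachable through the caller's PATH. The child's envp contains no PATH at all, yet the command is still found, because the search uses the caller's PATH.

```c
#define _GNU_SOURCE       /* execvpe() */
#include <sys/wait.h>     /* waitpid(), WIFEXITED(), WEXITSTATUS() */
#include <unistd.h>       /* fork(), execvpe(), _exit(), pid_t */

/* Hypothetical helper: run 'file' via execvpe() with an environment that
   lacks PATH.  Returns the child's exit status, or -1 on error. */
static int run_without_path_in_envp(const char *file)
{
    char *const argv[] = { (char *) file, NULL };
    char *const envp[] = { "HOME=/", NULL };      /* Note: no PATH here */
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        execvpe(file, argv, envp);
        _exit(127);                               /* exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```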
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In Linux kernel 5.12, a new mode flag, MPOL_F_NUMA_BALANCING, is
added to set_mempolicy() to optimize the page placement among the
NUMA nodes with the NUMA balancing mechanism even if the memory of
the applications is bound with MPOL_BIND. This patch updates the
man page for the new mode flag.
Related kernel commits:
bda420b985054a3badafef23807c4b4fa38a3dff
[mtk: Minor fixes to commit message]
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Michael Kerrisk" <mtk.manpages@gmail.com>
[ alx: srcfix ]
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The envp argument specifies the environment of the new process image,
not "the environment of the caller".
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The format string refers to the whole string passed in 'format'.
The syntax referred to is that of a conversion specification,
as called in the manual page.
Use specific language.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Can we add a small syntax structure for the format string to the printf(3)
manual page? I personally find it easier to remember and scan. This has
been taken from OpenBSD printf(3) manual.
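To illustrate that syntax, the following hypothetical helper uses three conversion specifications. The comment annotates the flags, width, and precision of each one. The template follows roughly the general form %[$][flags][width][.precision][length modifier]conversion.

```c
#include <stdio.h>   /* snprintf(), size_t */

/* Hypothetical helper: format 'x' and 'n' into 'buf'.
   "%08.3f": flag '0', width 8, precision 3, conversion 'f'
   "%-6d":   flag '-', width 6, conversion 'd'
   "%+.2e":  flag '+', precision 2, conversion 'e'
   Returns the length that the full output would have. */
static int format_demo(char *buf, size_t size, double x, int n)
{
    return snprintf(buf, size, "[%08.3f][%-6d][%+.2e]", x, n, x);
}
```

With x = 3.14159 and n = 42, this yields "[0003.142][42    ][+3.14e+00]".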
Signed-off-by: Utkarsh Singh <utkarsh190601@gmail.com>
[ alx: ffix ]
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add a sentence explaining what dup2() does in terms of file
descriptors and open file descriptions.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Sometimes people are confused, thinking a file descriptor is just a
number. To help avoid such confusions, add text highlighting that
a file descriptor is an index to an entry in the process's FD table.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As can be seen by any number of StackOverflow questions, people
persistently misunderstand what dup() does, and the existing manual
page text, which talks of "copying" a file descriptor, doesn't help.
Rewrite the text a little to try to prevent some of these
misunderstandings, in particular noting at the start that dup()
allocates a new file descriptor.
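The following small sketch shows the distinction. The helper and path are hypothetical, and the usage check assumes a writable /tmp. After dup(), the two descriptor numbers differ, but a write through one moves the file offset seen through the other, because both refer to the same open file description. dup2(oldfd, newfd) behaves the same way, except that the caller chooses the new number (newfd is silently closed first if it was open).

```c
#include <fcntl.h>    /* open(), O_* flags */
#include <unistd.h>   /* dup(), write(), lseek(), close(), off_t */

/* Hypothetical helper: write 3 bytes through the original descriptor and
   return the offset seen through the duplicate, or -1 on error. */
static off_t offset_seen_by_duplicate(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    int copy = dup(fd);                   /* New number, same description */
    off_t off = -1;
    if (copy != -1 && write(fd, "abc", 3) == 3)
        off = lseek(copy, 0, SEEK_CUR);   /* Offset moved for 'copy' too */

    if (copy != -1)
        close(copy);
    close(fd);
    return off;
}
```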
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
close_range() CLOSE_RANGE_UNSHARE triggers a call to dup_fd()
which in turn calls alloc_fdtable(), which checks that
sysctl_nr_open has not been exceeded.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The current example program can't really be used to demonstrate the
effect of close_range(). Replace it by a program that does show the
effect of this system call.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This documents close_range(2) based on information in
278a5fbaed89dacd04e9d052f4594ffd0e0585de,
60997c3d45d9a67daf01c56d805ae4fec37e0bd8, and
582f1fb6b721facf04848d2ca57f34468da1813e.
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Stephen Kitt <steve@sk2.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The manual pages are already inconsistent in which headers need
to be included. Right now, not all of the types used by a
function have their required header included in the SYNOPSIS.
If we were to add the headers required by all of the types used by
functions, the SYNOPSIS would grow too much. Not only would it
grow too much, but the information there would also be less precise.
Having system_data_types(7) document each type with all the
information about required includes is much more precise, and the
info is centralized so that it's much easier to maintain.
So let's document only the include required for the function
prototype, and also the ones required for the macros needed to
call the function.
<sys/types.h> only defines types, not functions or constants, so
it doesn't belong to man[23] (function) pages at all.
I don't know whether some old systems had headers that required you to
include <sys/types.h> *before* them (incomplete headers), but if
so, those implementations would be broken, and those headers
should probably provide some kind of warning. I hope this is not
the case.
[mtk: Already in 2001, POSIX.1 removed the requirement to
include <sys/types.h> for many APIs, so this patch seems
well past due.]
Acked-by: Zack Weinberg <zackw@panix.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX uses 'restrict' in *wprintf() (see [v]fwprintf(3p)).
Let's use it here too.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In SYNOPSIS, shift arguments right a little to make the function
names stand out a little more.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in wordexp().
Let's use it here too.
.../glibc$ grep_glibc_prototype wordexp
posix/wordexp.h:62:
extern int wordexp (const char *__restrict __words,
wordexp_t *__restrict __pwordexp, int __flags);
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
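The wordexp() prototype above can be exercised with a minimal sketch like the
following. count_words() is a hypothetical helper, not part of any library;
WRDE_NOCMD is used so that command substitution in the input is reported as
an error instead of running a shell command.

```c
#define _POSIX_C_SOURCE 200809L
#include <wordexp.h>

/* Hypothetical helper: return the number of fields that wordexp() produces
 * for 'words', or -1 on error.  WRDE_NOCMD makes command substitution an
 * error (WRDE_CMDSUB) instead of running a command. */
int count_words(const char *words)
{
    wordexp_t we;
    int rc = wordexp(words, &we, WRDE_NOCMD);
    int n;

    if (rc != 0) {
        if (rc == WRDE_NOSPACE)
            wordfree(&we);      /* the result may be partially allocated */
        return -1;
    }
    n = (int) we.we_wordc;
    wordfree(&we);
    return n;
}
```

Here 'words' and '&we' are necessarily distinct objects, which is what the
'restrict' qualifiers promise.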
Both POSIX and glibc use 'restrict' in wmemcpy().
Let's use it here too.
.../glibc$ grep_glibc_prototype wmemcpy
wcsmbs/wchar.h:262:
extern wchar_t *wmemcpy (wchar_t *__restrict __s1,
const wchar_t *__restrict __s2, size_t __n) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in wcstok().
Let's use it here too.
.../glibc$ grep_glibc_prototype wcstok
wcsmbs/wchar.h:217:
extern wchar_t *wcstok (wchar_t *__restrict __s,
const wchar_t *__restrict __delim,
wchar_t **__restrict __ptr) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
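As a sketch of why the third wcstok() parameter matters: the caller owns the
save pointer, so it is a separate object from the string and the delimiter
set. count_wtokens() is a hypothetical helper used only for illustration.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: count the tokens in 'ws' separated by any character
 * in 'delims'.  wcstok() modifies 'ws' in place and keeps its position in
 * 'save', which must not alias 'ws' or 'delims'. */
size_t count_wtokens(wchar_t *ws, const wchar_t *delims)
{
    wchar_t *save;
    size_t n = 0;

    for (wchar_t *tok = wcstok(ws, delims, &save); tok != NULL;
         tok = wcstok(NULL, delims, &save))
        n++;                    /* empty fields (",,") yield no token */
    return n;
}
```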
Both POSIX and glibc use 'restrict' in wcscpy().
Let's use it here too.
.../glibc$ grep_glibc_prototype wcscpy
wcsmbs/wchar.h:87:
extern wchar_t *wcscpy (wchar_t *__restrict __dest,
const wchar_t *__restrict __src)
__THROW __nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in wcscat().
Let's use it here too.
.../glibc$ grep_glibc_prototype wcscat
wcsmbs/wchar.h:97:
extern wchar_t *wcscat (wchar_t *__restrict __dest,
const wchar_t *__restrict __src)
__THROW __nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in wcrtomb().
Let's use it here too.
.../glibc$ grep_glibc_prototype wcrtomb
wcsmbs/wchar.h:301:
extern size_t wcrtomb (char *__restrict __s, wchar_t __wc,
mbstate_t *__restrict __ps) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in wcpncpy().
Let's use it here too.
.../glibc$ grep_glibc_prototype wcpncpy
wcsmbs/wchar.h:556:
extern wchar_t *wcpncpy (wchar_t *__restrict __dest,
const wchar_t *__restrict __src, size_t __n)
__THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in wcpcpy().
Let's use it here too.
.../glibc$ grep_glibc_prototype wcpcpy
wcsmbs/wchar.h:551:
extern wchar_t *wcpcpy (wchar_t *__restrict __dest,
const wchar_t *__restrict __src) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in tdelete().
Let's use it here too.
.../glibc$ grep_glibc_prototype tdelete
misc/search.h:138:
extern void *tdelete (const void *__restrict __key,
void **__restrict __rootp,
__compar_fn_t __compar);
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
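A minimal sketch of tdelete() with the prototype above, assuming a tree of
int keys owned by the caller. delete_then_find() and cmp_int() are
hypothetical names; the tree stores pointers to the caller's keys, so
tdelete() frees only the tree nodes.

```c
#define _XOPEN_SOURCE 700
#include <search.h>
#include <stddef.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/* Hypothetical helper: build a tree from keys[0..n-1], delete 'victim', and
 * report whether tfind() still sees it (1), does not (0), or whether an
 * insertion failed (-1). */
int delete_then_find(int *keys, size_t n, int victim)
{
    void *root = NULL;
    int found = -1;

    for (size_t i = 0; i < n; i++)
        if (tsearch(&keys[i], &root, cmp_int) == NULL)
            goto out;
    tdelete(&victim, &root, cmp_int);   /* key and root pointer are distinct */
    found = tfind(&victim, &root, cmp_int) != NULL;
out:
    /* Empty the tree; tdelete() of a key that is not present is harmless. */
    for (size_t i = 0; i < n; i++)
        tdelete(&keys[i], &root, cmp_int);
    return found;
}
```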
Both POSIX and glibc use 'restrict' in sigwait().
Let's use it here too.
.../glibc$ grep_glibc_prototype sigwait
signal/signal.h:255:
extern int sigwait (const sigset_t *__restrict __set, int *__restrict __sig)
__nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in strptime().
However, glibc doesn't specify 'restrict' for the last parameter.
Let's use the most restrictive form here
(although I believe both to be equivalent).
.../glibc$ grep_glibc_prototype strptime
time/time.h:95:
extern char *strptime (const char *__restrict __s,
const char *__restrict __fmt, struct tm *__tp)
__THROW;
.../glibc$
Cc: <libc-alpha@sourceware.org>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
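A short sketch of strptime() in use, assuming an ISO 8601 calendar date as
input. parse_iso_date() is a hypothetical helper; the input string, the
format, and the struct tm are three distinct objects.

```c
#define _XOPEN_SOURCE 700
#include <string.h>
#include <time.h>

/* Hypothetical helper: parse "YYYY-MM-DD" into *tm.  Returns 0 on success,
 * or -1 if the input does not match or has trailing characters. */
int parse_iso_date(const char *s, struct tm *tm)
{
    const char *end;

    /* strptime() is not guaranteed to initialize fields it does not parse. */
    memset(tm, 0, sizeof(*tm));
    end = strptime(s, "%Y-%m-%d", tm);
    return (end != NULL && *end == '\0') ? 0 : -1;
}
```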
Both POSIX and glibc use 'restrict' in stpcpy().
Let's use it here too.
.../glibc$ grep_glibc_prototype stpcpy
string/string.h:475:
extern char *stpcpy (char *__restrict __dest, const char *__restrict __src)
__THROW __nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in statvfs().
Let's use it here too.
.../glibc$ grep_glibc_prototype statvfs
io/sys/statvfs.h:51:
extern int statvfs (const char *__restrict __file,
struct statvfs *__restrict __buf)
__THROW __nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in sem_timedwait().
Let's use it here too.
.../glibc$ grep_glibc_prototype sem_timedwait
sysdeps/pthread/semaphore.h:62:
extern int sem_timedwait (sem_t *__restrict __sem,
const struct timespec *__restrict __abstime)
__nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in sem_getvalue().
Let's use it here too.
.../glibc$ grep_glibc_prototype sem_getvalue
sysdeps/pthread/semaphore.h:81:
extern int sem_getvalue (sem_t *__restrict __sem, int *__restrict __sval)
__THROW __nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in realpath().
Let's use it here too.
.../glibc$ grep_glibc_prototype realpath
stdlib/stdlib.h:800:
extern char *realpath (const char *__restrict __name,
char *__restrict __resolved) __THROW __wur;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in readdir_r().
Let's use it here too.
.../glibc$ grep_glibc_prototype readdir_r
dirent/dirent.h:183:
extern int readdir_r (DIR *__restrict __dirp,
struct dirent *__restrict __entry,
struct dirent **__restrict __result)
__nonnull ((1, 2, 3)) __attribute_deprecated__;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in fputs().
Let's use it here too.
.../glibc$ grep_glibc_prototype fputs
libio/stdio.h:631:
extern int fputs (const char *__restrict __s, FILE *__restrict __stream);
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
glibc uses 'restrict' in putgrent().
Let's use it here too.
.../glibc$ grep_glibc_prototype putgrent
grp/grp.h:93:
extern int putgrent (const struct group *__restrict __p,
FILE *__restrict __f);
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in pthread_getschedparam().
Let's use it here too.
.../glibc$ grep_glibc_prototype pthread_getschedparam
sysdeps/htl/pthread.h:882:
extern int pthread_getschedparam (pthread_t __thr, int *__restrict __policy,
struct sched_param *__restrict __param)
__THROW __nonnull ((2, 3));
sysdeps/nptl/pthread.h:426:
extern int pthread_getschedparam (pthread_t __target_thread,
int *__restrict __policy,
struct sched_param *__restrict __param)
__THROW __nonnull ((2, 3));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in
pthread_mutexattr_getpshared().
Let's use it here too.
.../glibc$ grep_glibc_prototype pthread_mutexattr_getpshared
sysdeps/htl/pthread.h:368:
extern int pthread_mutexattr_getpshared(const pthread_mutexattr_t *__restrict __attr,
int *__restrict __pshared)
__THROW __nonnull ((1, 2));
sysdeps/nptl/pthread.h:830:
extern int pthread_mutexattr_getpshared (const pthread_mutexattr_t *
__restrict __attr,
int *__restrict __pshared)
__THROW __nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in pthread_attr_getscope().
Let's use it here too.
.../glibc$ grep_glibc_prototype pthread_attr_getscope
sysdeps/htl/pthread.h:125:
extern int pthread_attr_getscope (const pthread_attr_t *__restrict __attr,
int *__restrict __contentionscope)
__THROW __nonnull ((1, 2));
sysdeps/nptl/pthread.h:324:
extern int pthread_attr_getscope (const pthread_attr_t *__restrict __attr,
int *__restrict __scope)
__THROW __nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in pthread_attr_getschedpolicy().
Let's use it here too.
.../glibc$ grep_glibc_prototype pthread_attr_getschedpolicy
sysdeps/htl/pthread.h:113:
extern int pthread_attr_getschedpolicy (const pthread_attr_t *__restrict __attr,
int *__restrict __policy)
__THROW __nonnull ((1, 2));
sysdeps/nptl/pthread.h:304:
extern int pthread_attr_getschedpolicy (const pthread_attr_t *__restrict
__attr, int *__restrict __policy)
__THROW __nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX uses 'restrict' in posix_spawnp().
However, glibc doesn't.
Let's document here the more restrictive of them, which is POSIX.
I reported a bug to glibc about this.
$ man 3p posix_spawnp |sed -n '/^SYNOPSIS/,/;/p'
SYNOPSIS
#include <spawn.h>
int posix_spawnp(pid_t *restrict pid, const char *restrict file,
const posix_spawn_file_actions_t *file_actions,
const posix_spawnattr_t *restrict attrp,
char *const argv[restrict], char *const envp[restrict]);
$
.../glibc$ grep_glibc_prototype posix_spawnp
posix/spawn.h:85:
extern int posix_spawnp (pid_t *__pid, const char *__file,
const posix_spawn_file_actions_t *__file_actions,
const posix_spawnattr_t *__attrp,
char *const __argv[], char *const __envp[])
__nonnull ((2, 5));
.../glibc$
I consciously made an exception with respect to the right margin
of the rendered page. Instead of having the right margin at 78
as usual (per Branden's recommendation), I let it use col 79
this time, to avoid breaking the prototype in an ugly way,
or shifting all of the parameters to the left, unaligned with
respect to the function parentheses.
Bug: glibc <https://sourceware.org/bugzilla/show_bug.cgi?id=27529>
Cc: G. Branden Robinson <g.branden.robinson@gmail.com>
Cc: glibc <libc-alpha@sourceware.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
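A minimal sketch of posix_spawnp() usage, assuming "sh" can be found via
PATH. run_shell() is a hypothetical helper; all the pointer arguments refer
to distinct objects, consistent with the POSIX 'restrict' prototype.

```c
#define _POSIX_C_SOURCE 200809L
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run "sh -c cmd" and return its exit status, or -1 if
 * it could not be spawned or did not exit normally. */
int run_shell(const char *cmd)
{
    char *argv[] = { "sh", "-c", (char *) cmd, NULL };
    pid_t pid;
    int status;

    /* NULL file actions and attributes: inherit the caller's defaults. */
    if (posix_spawnp(&pid, "sh", NULL, NULL, argv, environ) != 0)
        return -1;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```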
Both POSIX and glibc use 'restrict' in posix_spawn().
Let's use it here too.
.../glibc$ grep_glibc_prototype posix_spawn
posix/spawn.h:72:
extern int posix_spawn (pid_t *__restrict __pid,
const char *__restrict __path,
const posix_spawn_file_actions_t *__restrict
__file_actions,
const posix_spawnattr_t *__restrict __attrp,
char *const __argv[__restrict_arr],
char *const __envp[__restrict_arr])
__nonnull ((2, 5));
.../glibc$
I consciously made an exception with respect to the right margin
of the rendered page. Instead of having the right margin at 78
as usual (per Branden's recommendation), I let it use col 79
this time, to avoid breaking the prototype in an ugly way,
or shifting all of the parameters to the left, unaligned with
respect to the function parentheses.
Cc: G. Branden Robinson <g.branden.robinson@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in mq_setattr().
Let's use it here too.
.../glibc$ grep_glibc_prototype mq_setattr
rt/mqueue.h:51:
extern int mq_setattr (mqd_t __mqdes,
const struct mq_attr *__restrict __mqstat,
struct mq_attr *__restrict __omqstat)
__THROW __nonnull ((2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in mbtowc().
Let's use it here too.
.../glibc$ grep_glibc_prototype mbtowc
stdlib/stdlib.h:925:
extern int mbtowc (wchar_t *__restrict __pwc,
const char *__restrict __s, size_t __n) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in mbrlen().
Let's use it here too.
.../glibc$ grep_glibc_prototype mbrlen
wcsmbs/wchar.h:307:
extern size_t mbrlen (const char *__restrict __s, size_t __n,
mbstate_t *__restrict __ps) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX.1-2001 and glibc use 'restrict' in swapcontext().
Let's use it here too.
.../glibc$ grep_glibc_prototype swapcontext
stdlib/ucontext.h:41:
extern int swapcontext (ucontext_t *__restrict __oucp,
const ucontext_t *__restrict __ucp)
__THROWNL __INDIRECT_RETURN;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in lio_listio().
However, POSIX is a bit more restrictive than glibc
for the second parameter.
Let's document the more restrictive POSIX variant.
$ man 3p lio_listio |sed -n '/^SYNOPSIS/,/;/p'
SYNOPSIS
#include <aio.h>
int lio_listio(int mode, struct aiocb *restrict const list[restrict],
int nent, struct sigevent *restrict sig);
$
.../glibc$ grep_glibc_prototype lio_listio
rt/aio.h:148:
extern int lio_listio (int __mode,
struct aiocb *const __list[__restrict_arr],
int __nent, struct sigevent *__restrict __sig)
__THROW __nonnull ((2));
.../glibc$
Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
Cc: "Joseph S. Myers" <joseph@codesourcery.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: glibc <libc-alpha@sourceware.org>
Bug: glibc <https://sourceware.org/bugzilla/show_bug.cgi?id=16747>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in inet_pton().
Let's use it here too.
.../glibc$ grep_glibc_prototype inet_pton
inet/arpa/inet.h:58:
extern int inet_pton (int __af, const char *__restrict __cp,
void *__restrict __buf) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
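A minimal sketch of inet_pton() for IPv4, with the text source and the
binary destination in separate objects as 'restrict' requires.
parse_ipv4() is a hypothetical helper that returns the address in host byte
order.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* Hypothetical helper: convert dotted-quad text to a host-order address.
 * Returns 0 on success, or -1 if 'text' is not a valid IPv4 address. */
int parse_ipv4(const char *text, uint32_t *addr)
{
    struct in_addr in;

    if (inet_pton(AF_INET, text, &in) != 1)
        return -1;
    *addr = ntohl(in.s_addr);   /* inet_pton() stores network byte order */
    return 0;
}
```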
Both POSIX and glibc use 'restrict' in glob().
Let's use it here too.
.../glibc$ grep_glibc_prototype glob
posix/glob.h:146:
extern int glob (const char *__restrict __pattern, int __flags,
int (*__errfunc) (const char *, int),
glob_t *__restrict __pglob) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in getnameinfo().
Let's use it here too.
I consciously made an exception with respect to the right margin
of the rendered page. Instead of having the right margin at 78
as usual (per Branden's recommendation), I let it use col 79
this time, to avoid breaking the prototype in an ugly way.
.../glibc$ grep_glibc_prototype getnameinfo
resolv/netdb.h:675:
extern int getnameinfo (const struct sockaddr *__restrict __sa,
socklen_t __salen, char *__restrict __host,
socklen_t __hostlen, char *__restrict __serv,
socklen_t __servlen, int __flags);
.../glibc$
Cc: G. Branden Robinson <g.branden.robinson@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
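A minimal sketch of getnameinfo() with numeric-only flags, so that no DNS or
services lookup is involved. format_ipv4() is a hypothetical helper; 'host'
and 'serv' are separate buffers, as the 'restrict' qualifiers require.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: format an IPv4 address and port numerically.
 * Returns 0 on success or an EAI_* code. */
int format_ipv4(const char *ip, unsigned short port,
                char *host, socklen_t hostlen, char *serv, socklen_t servlen)
{
    struct sockaddr_in sa;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return EAI_NONAME;
    return getnameinfo((struct sockaddr *) &sa, sizeof(sa), host, hostlen,
                       serv, servlen, NI_NUMERICHOST | NI_NUMERICSERV);
}
```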
POSIX does NOT specify these functions to use 'restrict'.
However, glibc uses 'restrict' in getgrnam_r(), getgrgid_r().
Users might be surprised by this! Let's use it here too!
.../glibc$ grep_glibc_prototype getgrnam_r
grp/grp.h:148:
extern int getgrnam_r (const char *__restrict __name,
struct group *__restrict __resultbuf,
char *__restrict __buffer, size_t __buflen,
struct group **__restrict __result);
.../glibc$ grep_glibc_prototype getgrgid_r
grp/grp.h:140:
extern int getgrgid_r (__gid_t __gid, struct group *__restrict __resultbuf,
char *__restrict __buffer, size_t __buflen,
struct group **__restrict __result);
.../glibc$
Cc: glibc <libc-alpha@sourceware.org>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
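A minimal sketch of getgrgid_r() with the glibc 'restrict' prototype: the
result struct, the string buffer, and the result pointer are three distinct
objects. group_exists() is a hypothetical helper; it retries with a larger
buffer on ERANGE, and it never dereferences the result after freeing the
buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <grp.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if group 'gid' exists, 0 if not, or -1. */
int group_exists(gid_t gid)
{
    struct group grp;
    struct group *result;
    long size = sysconf(_SC_GETGR_R_SIZE_MAX);
    int rc;

    if (size <= 0)
        size = 1024;            /* no limit reported: start with a guess */
    for (;;) {
        char *buf = malloc((size_t) size);

        if (buf == NULL)
            return -1;
        rc = getgrgid_r(gid, &grp, buf, (size_t) size, &result);
        free(buf);              /* only result != NULL is inspected below */
        if (rc != ERANGE)
            break;
        size *= 2;              /* buffer too small: retry with a larger one */
    }
    return rc == 0 ? result != NULL : -1;
}
```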
Both POSIX and glibc use 'restrict' in fgetpos().
Let's use it here too.
glibc:
============================= fgetpos
libio/stdio.h:736:
int fgetpos (FILE *restrict stream, fpos_t *restrict pos);
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in these functions.
Let's use it here too.
glibc:
============================= fread
libio/stdio.h:651:
size_t fread (void *restrict ptr, size_t size,
size_t n, FILE *restrict stream) wur;
============================= fwrite
libio/stdio.h:657:
size_t fwrite (const void *restrict ptr, size_t size,
size_t n, FILE *restrict s);
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in fputws().
Let's use it here too.
glibc:
============================= fputws
wcsmbs/wchar.h:765:
int fputws (const wchar_t *restrict ws,
FILE *restrict stream);
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' for fgetws().
Let's use it here too.
glibc:
wcsmbs/wchar.h:758:
wchar_t *fgetws (wchar_t *restrict ws, int n,
FILE *restrict stream);
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both POSIX and glibc use 'restrict' in fgets().
Let's use it here too.
glibc:
libio/stdio.h:568:
char *fgets (char *restrict s, int n, FILE *restrict stream)
wur attr_access ((write_only__, 1, 2));
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX.1-2001 and glibc use 'restrict' for these functions.
Let's use it here too.
glibc:
============================= ecvt
stdlib/stdlib.h:872:
char *ecvt (double value, int ndigit, int *restrict decpt,
int *restrict sign) THROW nonnull ((3, 4)) wur;
============================= fcvt
stdlib/stdlib.h:878:
char *fcvt (double value, int ndigit, int *restrict decpt,
int *restrict sign) THROW nonnull ((3, 4)) wur;
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Glibc uses 'restrict' for some of the functions in this page:
============================= drand48_r
stdlib/stdlib.h:501:
int drand48_r (struct drand48_data *restrict buffer,
double *restrict result) THROW nonnull ((1, 2));
============================= erand48_r
stdlib/stdlib.h:503:
int erand48_r (unsigned short int xsubi[3],
struct drand48_data *restrict buffer,
double *restrict result) THROW nonnull ((1, 2));
============================= lrand48_r
stdlib/stdlib.h:508:
int lrand48_r (struct drand48_data *restrict buffer,
long int *restrict result)
THROW nonnull ((1, 2));
============================= nrand48_r
stdlib/stdlib.h:511:
int nrand48_r (unsigned short int xsubi[3],
struct drand48_data *restrict buffer,
long int *restrict result)
THROW nonnull ((1, 2));
============================= mrand48_r
stdlib/stdlib.h:517:
int mrand48_r (struct drand48_data *restrict buffer,
long int *restrict result)
THROW nonnull ((1, 2));
============================= jrand48_r
stdlib/stdlib.h:520:
int jrand48_r (unsigned short int xsubi[3],
struct drand48_data *restrict buffer,
long int *restrict result)
THROW nonnull ((1, 2));
============================= srand48_r
stdlib/stdlib.h:526:
int srand48_r (long int seedval, struct drand48_data *buffer)
THROW nonnull ((2));
============================= seed48_r
stdlib/stdlib.h:529:
int seed48_r (unsigned short int seed16v[3],
struct drand48_data *buffer) THROW nonnull ((1, 2));
============================= lcong48_r
stdlib/stdlib.h:532:
int lcong48_r (unsigned short int param[7],
struct drand48_data *buffer)
THROW nonnull ((1, 2));
Let's use it here too.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
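A minimal sketch of the reentrant drand48 interface: the generator state
lives in a caller-supplied struct drand48_data rather than in hidden
globals. fill_uniform() is a hypothetical helper; 'buffer' and 'result'
never alias, matching the glibc 'restrict' prototypes listed above.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: fill out[0..n-1] with values in [0.0, 1.0) from a
 * private generator seeded with 'seed'.  Returns 0, or -1 on error. */
int fill_uniform(long seed, double *out, size_t n)
{
    struct drand48_data data;

    if (srand48_r(seed, &data) != 0)
        return -1;
    for (size_t i = 0; i < n; i++)
        if (drand48_r(&data, &out[i]) != 0)
            return -1;
    return 0;
}
```

The same seed reproduces the same sequence, and two threads with separate
struct drand48_data objects do not interfere with each other.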
POSIX specifies that the parameters of memcpy()
shall be 'restrict'. Glibc uses 'restrict' too.
Let's use it here too.
It's especially important in memcpy(),
as it's been a historical source of bugs.
......
.../glibc$ grep_glibc_prototype memcpy
posix/regex_internal.h:746:
{
memcpy (dest, src, sizeof (bitset_t));
string/string.h:43:
extern void *memcpy (void *__restrict __dest, const void *__restrict __src,
size_t __n) __THROW __nonnull ((1, 2));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
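To illustrate the class of bugs mentioned above: 'restrict' on memcpy()
promises that source and destination do not overlap, so overlapping copies
belong to memmove(). Both helpers below are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: shift the first n-1 bytes of buf one position right.
 * Source and destination overlap, so memcpy() would have undefined
 * behavior here; memmove() is specified to handle overlap. */
void shift_right_one(char *buf, size_t n)
{
    if (n > 1)
        memmove(buf + 1, buf, n - 1);
}

/* Hypothetical helper: copying between two distinct arrays is what
 * memcpy() is for; the 'restrict' here documents the same promise. */
void copy_distinct(char *restrict dst, const char *restrict src, size_t n)
{
    memcpy(dst, src, n);
}
```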
Epoch is 1970-01-01 00:00:00 +0000, UTC (see time(7)).
Reported-by: Walter Franzini <walter.franzini@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both functions have the same header.
There's no reason to separate the prototypes by repeating the header.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
RESOLVE_CACHED allows an application to attempt a cache-only open
of a file. If this isn't possible, the request will fail with
-1/EAGAIN and the caller should retry without RESOLVE_CACHED set.
This will generally happen from a different context, where a slower
open operation can be performed.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
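A minimal sketch of the cache-first pattern described above, using the raw
openat2() system call. Assumptions, labeled as such: struct open_how_sketch
is a local copy of the kernel's struct open_how layout, RESOLVE_CACHED is
defined as 0x20 if the headers lack it, and open_cached_first() is a
hypothetical helper. For simplicity it retries in the same context, whereas
the commit message notes that the retry would generally happen from a
different context.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed copy of the kernel's struct open_how layout (see openat2(2)). */
struct open_how_sketch {
    uint64_t flags;
    uint64_t mode;
    uint64_t resolve;
};

#ifndef RESOLVE_CACHED
#define RESOLVE_CACHED 0x20     /* assumed value, from <linux/openat2.h> */
#endif

/* Hypothetical helper: try a cache-only open first; on EAGAIN, or if
 * openat2()/RESOLVE_CACHED is unavailable, fall back to a normal openat(). */
int open_cached_first(int dirfd, const char *path, int flags)
{
#ifdef SYS_openat2
    struct open_how_sketch how;
    long fd;

    memset(&how, 0, sizeof(how));
    how.flags = (uint64_t) flags;
    how.resolve = RESOLVE_CACHED;
    fd = syscall(SYS_openat2, dirfd, path, &how, sizeof(how));
    if (fd >= 0)
        return (int) fd;
    /* EAGAIN: not resolvable from the cache alone.  ENOSYS, EINVAL, EPERM:
     * old kernel, unknown flag, or a seccomp filter; take the slow path. */
    if (errno != EAGAIN && errno != ENOSYS && errno != EINVAL && errno != EPERM)
        return -1;
#endif
    return openat(dirfd, path, flags);
}
```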
Make it clear that netlink error responses (i.e., messages with
type NLMSG_ERROR (0x2)) can be longer than sizeof(struct
nlmsgerr). In certain circumstances, the payload can be longer.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
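In practice this means a parser should check that an NLMSG_ERROR message is
at least, not exactly, NLMSG_LENGTH(sizeof(struct nlmsgerr)) bytes long. A
minimal sketch, with find_nlmsg_error() as a hypothetical helper operating
on a buffer already received from a netlink socket:

```c
#include <stddef.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical helper: find the first NLMSG_ERROR message in 'buf'.
 * Returns 1 and stores its 'error' field in *error, 0 if there is none,
 * or -1 if the message is too short to hold struct nlmsgerr. */
int find_nlmsg_error(const void *buf, size_t len, int *error)
{
    int remaining = (int) len;

    for (const struct nlmsghdr *nh = buf; NLMSG_OK(nh, remaining);
         nh = NLMSG_NEXT(nh, remaining)) {
        if (nh->nlmsg_type != NLMSG_ERROR)
            continue;
        /* '<', not '!=': trailing data after struct nlmsgerr is allowed. */
        if (nh->nlmsg_len < NLMSG_LENGTH(sizeof(struct nlmsgerr)))
            return -1;
        *error = ((const struct nlmsgerr *) NLMSG_DATA(nh))->error;
        return 1;
    }
    return 0;
}
```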
That file should be sourced (.) from 'bashrc' (or 'bash_aliases').
It contains functions that are useful for the maintenance of this
project.
- grep_syscall()
- grep_syscall_def()
- man_section()
- man_lsfunc()
- pdfman()
- grep_glibc_prototype()
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX does NOT specify aio_suspend() to use 'restrict'.
However, glibc uses 'restrict'.
Users might be surprised by this! Let's use it here too!
......
.../glibc$ grep_glibc_prototype aio_suspend
rt/aio.h:167:
extern int aio_suspend (const struct aiocb *const __list[], int __nent,
const struct timespec *__restrict __timeout)
__nonnull ((1));
.../glibc$
Cc: libc-alpha@sourceware.org
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX specifies that [sig]longjmp() shall not return,
transferring control back to the caller of [sig]setjmp().
Glibc uses __attribute__((__noreturn__)) for [sig]longjmp().
Let's use standard C11 'noreturn' in the manual page.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX specifies that pthread_exit() shall not return.
Glibc uses __attribute__((__noreturn__)).
Let's use standard C11 'noreturn' in the manual page.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX specifies that exit() shall not return.
Glibc uses __attribute__((__noreturn__)).
Let's use standard C11 'noreturn' in the manual page.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Glibc uses __attribute__((__noreturn__)) for [v]err[x]().
These functions never return.
Let's use standard C11 'noreturn' in the manual page.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX specifies that _exit() and _Exit() shall not return.
Glibc uses __attribute__((__noreturn__)).
Let's use standard C11 'noreturn' in the manual page.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX specifies that abort() shall not return.
Glibc uses __attribute__((__noreturn__)).
Let's use standard C11 'noreturn' in the manual page.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
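A minimal sketch of the C11 spelling adopted in the commits above, applied to
a hypothetical function of our own. 'noreturn' comes from <stdnoreturn.h>
and expands to _Noreturn; bail_out() and came_back() are illustrative names
only.

```c
#include <setjmp.h>
#include <stdnoreturn.h>

/* Hypothetical helper: never returns; longjmp() transfers control back to
 * the matching setjmp(). */
static noreturn void bail_out(jmp_buf env, int code)
{
    longjmp(env, code);
}

/* Returns 1 once control has come back through bail_out().  Because
 * bail_out() is declared noreturn, the compiler knows control cannot fall
 * off the end of this function. */
int came_back(void)
{
    jmp_buf env;

    if (setjmp(env) != 0)
        return 1;
    bail_out(env, 7);
}
```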
Switching into the man? subdirectories when running man2html(1)
caused a bug where ".so dir/page.n" links were misinterpreted
(because the directory prefix was interpreted with respect to
the current directory), and consequently, the link files
were not correctly rendered. There's no need to switch into the
subdirectories.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This page uses some idiosyncratic mark-up involving the use of
a groff register. The mark-up actually makes no difference to
the formatted result, but does cause man2html(1) to emit error
messages, since it does not understand the mark-up. So, remove
that mark-up.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use the glibc prototypes instead of the kernel ones.
Exception: use 'int' instead of 'enum'.
......
.../glibc$ grep_glibc_prototype pciconfig_read
sysdeps/unix/sysv/linux/alpha/sys/io.h:72:
extern int pciconfig_read (unsigned long int __bus,
unsigned long int __dfn,
unsigned long int __off,
unsigned long int __len,
unsigned char *__buf) __THROW;
sysdeps/unix/sysv/linux/ia64/sys/io.h:57:
extern int pciconfig_read (unsigned long int __bus, unsigned long int __dfn,
unsigned long int __off, unsigned long int __len,
unsigned char *__buf);
.../glibc$ grep_glibc_prototype pciconfig_write
sysdeps/unix/sysv/linux/alpha/sys/io.h:78:
extern int pciconfig_write (unsigned long int __bus,
unsigned long int __dfn,
unsigned long int __off,
unsigned long int __len,
unsigned char *__buf) __THROW;
sysdeps/unix/sysv/linux/ia64/sys/io.h:61:
extern int pciconfig_write (unsigned long int __bus, unsigned long int __dfn,
unsigned long int __off, unsigned long int __len,
unsigned char *__buf);
.../glibc$ grep_glibc_prototype pciconfig_iobase
sysdeps/unix/sysv/linux/alpha/sys/io.h:66:
extern long pciconfig_iobase(enum __pciconfig_iobase_which __which,
unsigned long int __bus,
unsigned long int __dfn)
__THROW __attribute__ ((const));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
All parameters of t[g]kill() except the last one use 'pid_t',
both in the kernel and glibc. Fix them.
......
.../linux/linux$ grep_syscall tkill
kernel/signal.c:3870:
SYSCALL_DEFINE2(tkill, pid_t, pid, int, sig)
include/linux/syscalls.h:685:
asmlinkage long sys_tkill(pid_t pid, int sig);
.../linux/linux$
.../gnu/glibc$ grep_glibc_prototype tgkill
sysdeps/unix/sysv/linux/bits/signal_ext.h:29:
extern int tgkill (__pid_t __tgid, __pid_t __tid, int __signal);
.../gnu/glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
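A minimal sketch of tgkill() with pid_t arguments, sending signal 0 (which
performs the existence and permission checks without delivering a signal)
to the calling thread. can_signal_self() is a hypothetical helper; it uses
the raw system calls so that it also builds against glibc versions without
the gettid() and tgkill() wrappers.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the calling thread can be signaled. */
int can_signal_self(void)
{
    pid_t tgid = getpid();
    pid_t tid = (pid_t) syscall(SYS_gettid);

    return syscall(SYS_tgkill, tgid, tid, 0) == 0;
}
```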
The kernel syscall uses 'loff_t', but the glibc wrapper uses 'off64_t'.
Let's document the wrapper prototype, as in other pages.
......
.../glibc$ grep_glibc_prototype splice
sysdeps/unix/sysv/linux/bits/fcntl-linux.h:398:
extern __ssize_t splice (int __fdin, __off64_t *__offin, int __fdout,
__off64_t *__offout, size_t __len,
unsigned int __flags);
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
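A minimal sketch of splice() with the glibc wrapper's off64_t offset, moving
a byte range of a file into a pipe. splice_range() is a hypothetical helper
meant for small ranges (at most one pipe buffer); <sys/mman.h> is included
only so that memfd_create() is available for trying it out.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>   /* memfd_create(), handy for trying the helper out */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: splice up to 'len' bytes starting at byte 'start'
 * of 'fd' into a pipe, then read them into 'out'.  *end receives the
 * advanced input offset; the file offset of 'fd' itself is not changed.
 * Returns the number of bytes read, or -1. */
ssize_t splice_range(int fd, off64_t start, char *out, size_t len,
                     off64_t *end)
{
    int p[2];
    off64_t off = start;        /* glibc's wrapper takes off64_t *, not loff_t * */
    ssize_t moved, n = -1;

    if (pipe(p) == -1)
        return -1;
    moved = splice(fd, &off, p[1], NULL, len, 0);
    close(p[1]);                /* lets read() see EOF after the spliced data */
    if (moved >= 0)
        n = read(p[0], out, len);
    close(p[0]);
    *end = off;
    return n;
}
```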
The type of fsgid is gid_t, and not uid_t. Fix it.
......
.../glibc$ grep_glibc_prototype setfsgid
sysdeps/unix/sysv/linux/sys/fsuid.h:31:
extern int setfsgid (__gid_t __gid) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
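A minimal sketch using the corrected gid_t type. It assumes, and this should
be checked against setfsgid(2), that the kernel leaves the fsgid unchanged
when given the invalid value (gid_t) -1 and returns the current fsgid.
current_fsgid() and fsgid_matches_egid() are hypothetical helpers.

```c
#include <sys/fsuid.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: query the current filesystem GID.  setfsgid()
 * returns the previous fsgid; (gid_t) -1 is assumed to be rejected, so
 * nothing changes. */
gid_t current_fsgid(void)
{
    return (gid_t) setfsgid((gid_t) -1);
}

/* The fsgid normally tracks the effective GID. */
int fsgid_matches_egid(void)
{
    return current_fsgid() == getegid();
}
```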
"Mibibytes" is a misspelling of "mebibytes",
but let's use the more familiar "MiB" instead.
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I just happened upon this inconsistent text while reading `man 2
execve`. The code in question landed in 2.6.23 as b6a2fea39318
("mm: variable length argument support").
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX specifies that the parameters of timer_settime()
shall be 'restrict'. Glibc uses 'restrict' too.
Let's use it here too.
......
.../glibc$ grep_glibc_prototype timer_settime
time/time.h:242:
extern int timer_settime (timer_t __timerid, int __flags,
const struct itimerspec *__restrict __value,
struct itimerspec *__restrict __ovalue) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Glibc uses 'restrict' for the types of the parameters of statx().
Let's use it here too.
......
.../glibc$ grep_glibc_prototype statx
io/bits/statx-generic.h:60:
int statx (int __dirfd, const char *__restrict __path, int __flags,
unsigned int __mask, struct statx *__restrict __buf)
__THROW __nonnull ((2, 5));
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX specifies that the parameters of sigaltstack()
shall be 'restrict'. Glibc uses 'restrict' too.
Let's use it here too.
......
.../glibc$ grep_glibc_prototype sigaltstack
signal/signal.h:320:
extern int sigaltstack (const stack_t *__restrict __ss,
stack_t *__restrict __oss) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
POSIX specifies that the parameters of getsockopt()
shall be 'restrict'. Glibc uses 'restrict' too.
Let's use it here too.
......
.../glibc$ grep_glibc_prototype getsockopt
socket/sys/socket.h:208:
extern int getsockopt (int __fd, int __level, int __optname,
void *__restrict __optval,
socklen_t *__restrict __optlen) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
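A minimal sketch of getsockopt() in which 'optval' and 'optlen' point to
distinct objects, as the 'restrict' qualifiers require.
socket_type_roundtrip() is a hypothetical helper that reads back SO_TYPE
from a fresh UNIX domain socket.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a socket of 'type' and return the type
 * reported by getsockopt(SO_TYPE), or -1 on error. */
int socket_type_roundtrip(int type)
{
    int fd = socket(AF_UNIX, type, 0);
    int val = -1;
    socklen_t len = sizeof(val);

    if (fd == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &val, &len) == -1)
        val = -1;
    close(fd);
    return val;
}
```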
POSIX specifies that the parameters of getpeername()
shall be 'restrict'. Glibc uses 'restrict' too.
Let's use it here too.
......
.../glibc$ grep_glibc_prototype getpeername
socket/sys/socket.h:130:
extern int getpeername (int __fd, __SOCKADDR_ARG __addr,
socklen_t *__restrict __len) __THROW;
.../glibc$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The page used 'hint' and 'advice' synonymously. This leaves the
reader wondering if the terms mean the same thing, or different
things. They mean the same thing, so use just one term.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Rather than repeating the description of MADV_COLD and MADV_PAGEOUT
in two pages, centralize the discussion in madvise(2), and refer
from process_madvise(2) to madvise(2).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Skimming open(2), I was surprised not to see tmpfs mentioned as a
filesystem supported by O_TMPFILE.
If I'm understanding correctly (I'm very possibly not!), tmpfs is
a filesystem built on shmem, so I think it's more correct (and
probably much more widely understandable) to refer to tmpfs here.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There are many slightly different prototypes for this syscall,
but none of them is like the documented one.
Of all the different prototypes,
let's document the asm-generic one.
This manual page was actually using a prototype similar to
mmap(2), but there's no glibc wrapper function called mmap2(2),
as the wrapper for this syscall is mmap(2). Therefore, the
documented prototype should be the kernel one.
......
.../linux$ grep_syscall mmap2
arch/csky/kernel/syscall.c:17:
SYSCALL_DEFINE6(mmap2,
unsigned long, addr,
unsigned long, len,
unsigned long, prot,
unsigned long, flags,
unsigned long, fd,
off_t, offset)
arch/microblaze/kernel/sys_microblaze.c:46:
SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
unsigned long, prot, unsigned long, flags, unsigned long, fd,
unsigned long, pgoff)
arch/nds32/kernel/sys_nds32.c:12:
SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
unsigned long, prot, unsigned long, flags,
unsigned long, fd, unsigned long, pgoff)
arch/powerpc/kernel/syscalls.c:60:
SYSCALL_DEFINE6(mmap2, unsigned long, addr, size_t, len,
unsigned long, prot, unsigned long, flags,
unsigned long, fd, unsigned long, pgoff)
arch/riscv/kernel/sys_riscv.c:37:
SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
unsigned long, prot, unsigned long, flags,
unsigned long, fd, off_t, offset)
arch/s390/kernel/sys_s390.c:49:
SYSCALL_DEFINE1(mmap2, struct s390_mmap_arg_struct __user *, arg)
arch/sparc/kernel/sys_sparc_32.c:101:
SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
unsigned long, prot, unsigned long, flags, unsigned long, fd,
unsigned long, pgoff)
arch/ia64/include/asm/unistd.h:30:
asmlinkage unsigned long sys_mmap2(
unsigned long addr, unsigned long len,
int prot, int flags,
int fd, long pgoff);
arch/ia64/kernel/sys_ia64.c:139:
asmlinkage unsigned long
sys_mmap2 (unsigned long addr, unsigned long len, int prot, int flags, int fd, long pgoff)
arch/m68k/kernel/sys_m68k.c:40:
asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long pgoff)
arch/parisc/kernel/sys_parisc.c:275:
asmlinkage unsigned long sys_mmap2(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags, unsigned long fd,
unsigned long pgoff)
arch/powerpc/include/asm/syscalls.h:15:
asmlinkage long sys_mmap2(unsigned long addr, size_t len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long pgoff);
arch/sh/include/asm/syscalls.h:8:
asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long pgoff);
arch/sh/kernel/sys_sh.c:41:
asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long pgoff)
arch/sparc/kernel/systbls.h:23:
asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long pgoff);
include/asm-generic/syscalls.h:14:
asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long pgoff);
.../linux$
function grep_syscall()
{
if ! [ -v 1 ]; then
>&2 echo "Usage: ${FUNCNAME[0]} <syscall>";
return ${EX_USAGE};
fi
find * -type f \
|grep '\.c$' \
|sort -V \
|xargs pcregrep -Mn "(?s)^\w*SYSCALL_DEFINE.\(${1},.*?\)" \
|sed -E 's/^[^:]+:[0-9]+:/&\n/';
find * -type f \
|grep '\.[ch]$' \
|sort -V \
|xargs pcregrep -Mn "(?s)^asmlinkage\s+[\w\s]+\**sys_${1}\s*\(.*?\)" \
|sed -E 's/^[^:]+:[0-9]+:/&\n/';
}
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The documented prototype for mlock2() was a mix of the
glibc wrapper prototype and the kernel syscall prototype.
Let's document the glibc wrapper prototype, which is shown below.
......
.../glibc$ grep_glibc_prototype mlock2
sysdeps/unix/sysv/linux/bits/mman-shared.h:55:
int mlock2 (const void *__addr, size_t __length, unsigned int __flags) __THROW;
.../glibc$
function grep_glibc_prototype()
{
if ! [ -v 1 ]; then
>&2 echo "Usage: ${FUNCNAME[0]} <func>";
return ${EX_USAGE};
fi
find * -type f \
|grep '\.h$' \
|sort -V \
|xargs pcregrep -Mn \
"(?s)^[^\s#][\w\s]+\s+\**${1}\s*\([\w\s()[\]*,]*?(...)?\)[\w\s()]*;" \
|sed -E 's/^[^:]+:[0-9]+:/&\n/';
}
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There seems to be no reason <unistd.h> is shown here, so remove it.
Signed-off-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
These pages have the odd wording 'the external variable errno',
which does not occur in other pages. Make these pages conform with
the norm.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
fileno(3) differs from the other functions in various ways.
For example, it is governed by different standards,
and can set 'errno'. Conversely, the other functions
are about examining the status of a stream, while
fileno(3) simply obtains the underlying file descriptor.
Furthermore, splitting this function out allows
for some cleaner upcoming changes in ferror(3).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Calling ipc() directly would be a rather unusual thing to do,
so add some text to emphasize that point.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I filed a bug against glibc
requesting the wrapper for the new syscall.
Glibc bug: <https://sourceware.org/bugzilla/show_bug.cgi?id=27359>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The 'advice' subsection fell in the middle of other text in the
DESCRIPTION, which is a little confusing. Instead, move that
subsection to the end of the DESCRIPTION, and make some other
minor text reorganization so that related details are placed in
the same paragraphs.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
'malloc_trim' is not, and never was, called from the 'free' function
or the '__int_free' function (see the related bug in the glibc
tracker: https://sourceware.org/bugzilla/show_bug.cgi?id=2531).
Only the top part of the heap is trimmed after some calls to 'free',
which is different from 'malloc_trim', which also releases memory
in between chunks from all the arenas/heaps.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Move the text describing how to set environment variables before
the list(s) of variables in order to improve readability.
[mtk: rewrote commit message]
Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Clearly document that HOME, LOGNAME, SHELL and USER are set at
login time by a program such as login(1).
Document also that using su could result in a mixed environment,
and point to the su(1) manual page.
[mtk: edited commit message]
Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add more details of how PATH is used, and mention the legacy
use of an empty prefix.
Changed after a suggested patch by Bastien Roucariès.
Reported-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Unlike SIOCGIFADDR and SIOCSIFADDR, which are supported by many
protocol families, SIOCDIFADDR is supported by AF_INET6 and
AF_APPLETALK only.
Unlike other protocols, AF_INET6 uses struct in6_ifreq.
Cc: Dmitry V. Levin <ldv@altlinux.org>
Cc: <netdev@vger.kernel.org>
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Document the name=value format and that the NUL byte is forbidden.
Signed-off-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Say "execve(2)" instead of "exec(3)", and note that this step
starts a new program (not a new process!).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Using _GNU_SOURCE to obtain the declaration of 'environ' is
nonstandard. Therefore, move the mention of this detail to NOTES.
At the same time, add a few words proposed by Bastien.
Cowritten-by: Bastien Roucariès <rouca@debian.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In man-pages-5.11, a large number of pages were edited to achieve
greater consistency in the SYNOPSIS, RETURN VALUE and ATTRIBUTES
sections. To avoid future inconsistencies, try to capture some of
the preferred conventions in text in man-pages(7).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The msg_name field for the recvmsg() call points to a caller-allocated buffer
nladdr that is used to return the source address of the (netlink) socket.
As recvmsg() does not read this buffer and fills it for a caller, do not
initialize it and instead check its value in the example.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Drop the reference to the Hacker Writing Guide (and the broken URL)
and simply note that the logical quoting style is the norm in
European languages also.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A real minus can be cut and pasted...
There are a few exceptions that have been excluded from this
change, for example, where there is a string such as "<p1-name>",
where p1-name is some sort of pseudo-identifier.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It works both ways, but this one feels more right. We are reading
four elements sizeof(*buffer) bytes each.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Verified from reading the kernel source and looking at the source
of mount(8). Surprisingly, this has not been documented after so many
years.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
mtk: Enke later noted that this patch provides better documentation
of longstanding behavior (rather than documenting a change in behavior).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The environ(7) man page says:
SHELL The pathname of the user's login shell.
PAGER The user's preferred utility to display text files.
EDITOR/VISUAL
The user's preferred utility to edit text files.
but doesn't say whether the pathnames must be absolute or they can
be resolved using $PATH, or whether they can have options.
Note that at least for SHELL, this is not specified by POSIX.
This issue was raised in the Austin Group mailing-list, and the
answer is that "what constitutes a valid value for a platform
should be documented" [1].
Since OpenSSH assumes that $SHELL is an absolute pathname (when
set), it seems reasonable that the documentation should say:
SHELL The absolute pathname of the user's login shell.
For PAGER, POSIX says: "Any string acceptable as a command_string
operand to the sh -c command shall be valid."
For EDITOR, it does not need to be an absolute pathname since
POSIX gives the example:
EDITOR=vi fc
and since it is specified as "the name of a utility", it is assumed
that arguments (options) must not be provided. On page 3013, the
description of "more" says: "If the last pathname component in
EDITOR is either vi or ex, [...]", so again, it is assumed to be a
pathname.
For VISUAL, POSIX says: "Determine a pathname of a utility to
invoke when the visual command [...]", thus it is also a pathname.
It is not clear whether the pathname must be absolute, but for
consistency with EDITOR, it will be resolved using $PATH.
[1] https://www.mail-archive.com/austin-group-l@opengroup.org/msg01399.html
Reported-by: Vincent Lefevre <vincent@vinc17.net>
Signed-off-by: Bastien Roucaries <rouca@debian.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
When I inherited man-pages in 2004, it was a hodge-podge mix of
American vs British spelling. My native spelling is the latter,
but I value consistency and felt that things needed to be
standardized on one or the other, and in computing, American is the
norm so that is what I settled on.
Among the changes was the replacement of various instances of
"acknowledgement" with "acknowledgment". The latter spelling is
not one I care for, but I believed it to be the American norm.
Alex Colomar proposed a patch to change the spelling back
to "acknowledgement", and some discussion and investigation
ensued, whereby I learned the following:
* The situation is not clear cut.
* Historically, "acknowledgment" was the norm in British English,
but was eclipsed by "acknowledgement" some decades ago.
* Even in American English, "acknowledgment" is not universal,
and "acknowledgement" has become more common in recent decades
(although it still remains minority usage) [2].
* The BSD license uses "acknowledgement" even though it was
(presumably) written in California.
* The POSIX standard uses "acknowledgement".
* The Debian BTS uses "acknowledgement".
* Looking at a corpus of manual pages from various systems
that I have assembled over the years, "acknowledgement" seems
a little more common than "acknowledgment".
Summary: the situation is not clear cut, but let's follow BSD,
POSIX, and the personal preference of the man-pages maintainers.
[1] https://books.google.com/ngrams/graph?content=acknowledgment%2Cacknowledgement&year_start=1800&year_end=2019&corpus=29&smoothing=3#
[2] https://books.google.com/ngrams/graph?content=acknowledgment%2Cacknowledgement&year_start=1800&year_end=2000&corpus=5&smoothing=3&
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
For the alternate signal stack to be cleared, CLONE_VM should be
specified and CLONE_VFORK should not be.
[mtk: fixes my commit 52e5819c41]
Signed-off-by: Johannes Wellhöfer <johannes.wellhofer@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
man-pages bug: 211029
https://bugzilla.kernel.org/show_bug.cgi?id=211029
Complete workaround example
(it was too long for the page, but it may be useful here):
......
$ sudo ln -s -T /usr/bin/echo /usr/bin/-echo;
$ cc -o system_hyphen -x c - ;
#include <stdlib.h>
int
main(void)
{
system(" -echo Hello world!");
exit(EXIT_SUCCESS);
}
$ ./system_hyphen;
Hello world!
Reported-by: Ciprian Dorin Craciun <ciprian.craciun@gmail.com>
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The NETLINK_CAP_ACK option was introduced in commit 0a6a3a23ea6e,
which first appeared in Linux 4.3, not 4.2.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
rtnetlink is not only used for IPv4
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The parentheses here make it look like a function rather than a
command.
This was a typo introduced by a script-assisted global edit.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It's been a long time since kernel 3.19.
There's still no glibc wrapper.
......
$ grep -rn 'execveat *(' glibc/
$
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Glibc uses 'void *' instead of 'char *'.
And the prototype is declared in <sys/cachectl.h>.
......
$ syscall='cacheflush';
$ ret='int';
$ find glibc/ -type f -name '*.h' \
|xargs pcregrep -Mn "(?s)^[\w\s]*${ret}\s*${syscall}\s*\(.*?;";
glibc/sysdeps/unix/sysv/linux/nios2/sys/cachectl.h:27:
extern int cacheflush (void *__addr, const int __nbytes, const int __op) __THROW;
glibc/sysdeps/unix/sysv/linux/mips/sys/cachectl.h:35:
extern int cacheflush (void *__addr, const int __nbytes, const int __op) __THROW;
glibc/sysdeps/unix/sysv/linux/arc/sys/cachectl.h:30:
extern int cacheflush (void *__addr, int __nbytes, int __op) __THROW;
glibc/sysdeps/unix/sysv/linux/csky/sys/cachectl.h:30:
extern int cacheflush (void *__addr, const int __nbytes,
const int __op) __THROW;
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Expand the epoll_wait() page with epoll_pwait2(), an epoll_wait()
variant that takes a struct timespec to enable nanosecond
resolution timeout.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A file descriptor is an int, so it should be stored through an int
pointer, while parent_tid should have the same type as child_tid,
which is a pointer to pid_t.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A more detailed notice is in realloc(3p).
......
$ man 3p realloc \
|sed -n \
-e '/APPLICATION USAGE/,/^$/p' \
-e '/FUTURE DIRECTIONS/,/^$/p';
APPLICATION USAGE
The description of realloc() has been modified from previous
versions of this standard to align with the ISO/IEC 9899:1999
standard. Previous versions explicitly permitted a call to
realloc(p, 0) to free the space pointed to by p and return a null
pointer. While this behavior could be interpreted as permitted by
this version of the standard, the C language committee have
indicated that this interpretation is incorrect. Applications
should assume that if realloc() returns a null pointer, the space
pointed to by p has not been freed. Since this could lead to
double-frees, implementations should also set errno if a null
pointer actually indicates a failure, and applications should only
free the space if errno was changed.
FUTURE DIRECTIONS
This standard defers to the ISO C standard. While that standard
currently has language that might permit realloc(p, 0), where p is
not a null pointer, to free p while still returning a null pointer,
the committee responsible for that standard is considering
clarifying the language to explicitly prohibit that alternative.
Bug: 211039 <https://bugzilla.kernel.org/show_bug.cgi?id=211039>
Reported-by: Johannes Pfister <johannes.pfister@josttech.ch>
Cc: libc-alpha@sourceware.org
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
.PP are redundant just after .SH or .SS.
Remove them.
$ find man? -type f \
|xargs sed -i '/^\.S[HS]/{n;/\.PP/d}';
Plus a couple manual edits.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This is implied in every other manual page. There is no need to
state it explicitly in these pages.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Fix a glitch in commit ff91beca5b.
Reported-by: Alejandro Colomar (man-pages) <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
.PP and .IP are redundant just before .SH or .SS.
Remove them.
$ find man? -type f \
|xargs sed -i '/^\.[IP]P$/{N;s/.*\n\(\.S[HS]\)/\1/}';
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since kernel commit a280d6dc77eb
("ipc/sem: introduce semctl(SEM_STAT_ANY)"),
the read access check is skipped only when using the SEM_STAT_ANY
command. Also, the semid_ds structure should be used instead of the
seminfo structure.
Fix this.
Signed-off-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the RETURN VALUE sections, a number of different wordings
are used to describe the fact that 'errno' is set on error.
There's no reason for the difference in wordings, since the same
thing is being described in each case. Switch to a standard
wording that is the same as FreeBSD and similar to the wording
used in POSIX.1.
In this change, miscellaneous descriptions of the setting
of 'errno' are reworded to the norm of "is set to indicate
the error".
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the RETURN VALUE sections, a number of different wordings
are used to describe the fact that 'errno' is set on error.
There's no reason for the difference in wordings, since the same
thing is being described in each case. Switch to a standard
wording that is the same as FreeBSD and similar to the wording
used in POSIX.1.
In this change, reword various cases saying that 'errno' is set
"appropriately" to "is set to indicate the error".
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the RETURN VALUE sections, a number of different wordings
are used to describe the fact that 'errno' is set on error.
There's no reason for the difference in wordings, since the same
thing is being described in each case. Switch to a standard
wording that is the same as FreeBSD and similar to the wording
used in POSIX.1.
In this change, fix some instances stating that 'errno' is set
"appropriately" to instead say "to indicate the error".
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the RETURN VALUE sections, a number of different wordings
are used to describe the fact that 'errno' is set on error.
There's no reason for the difference in wordings, since the same
thing is being described in each case. Switch to a standard
wording that is the same as FreeBSD and similar to the wording
used in POSIX.1.
In this change, "to indicate the cause of the error"
is changed to "to indicate the error".
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The current mark-up renders poorly. To resolve this, move
the type information into a separate line.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since we are using .nf/.fi to bracket FTM info, escaping
space characters serves no purpose and clutters the source.
Reported-by: Alejandro Colomar (man-pages) <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
- Group macros by kinds.
- Align so that it's easiest to distinguish differences
between related macros.
(Align all continuations for consistency on PDF.)
- Fix minor typos.
- Remove redundant text:
'The macro xxx() ...':
The first paragraph already says that these are macros.
'circular|tail|... queue':
Don't need to repeat every time.
Generic text makes it easier to spot the differences.
- Fit lines into 78 columns.
- Reorder descriptions to match SYNOPSIS,
and add subsections to DESCRIPTION.
- srcfix: fix a few semantic newlines.
I noticed a bug which should be fixed next:
CIRCLEQ_LOOP_*() return a 'struct TYPE *'.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Change `RES_USE_EDNSO` to `RES_USE_EDNS0`, defined in
`resolv.h`. (This is written correctly in `man3/resolver.3` in this
same repo.) Helps with grepping and internet searches!
Signed-off-by: John Morris <john@zultron.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Various ATTRIBUTES table improvements following the previous
commit. In particular, make use of T{...T} to allow wrapping
in table cells that have a lot of text.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Make the formatting more consistent inside the tables in the
ATTRIBUTES sections. Make the source code more uniform; in
particular, eliminate the use of custom tweaks using
'lbwNN'/'lwNN' and .br macros. In addition, ensure that
hyphenation and text justification do not occur inside the tables.
This is a script-driven edit:
[[
PAGE_LIST=$(git grep -l 'SH ATTRIBUTES' man[23])
# Strip out any preexisting .sp/.br/.ad macros
sed -i '/SH ATTR/,/^\.SH/{/^\.sp/d; /^\.br/d; /\.ad/d}' $PAGE_LIST
# Eliminate any use of 'wNN' in tables; default first column
# to fill unused space
sed -i '/SH ATTR/,/^\.SH/s/lbw[0-9]*/lb/g' $PAGE_LIST
sed -i '/SH ATTR/,/^\.SH/s/lw[0-9]*/l/g' $PAGE_LIST
sed -i '/SH ATTR/,/^\.SH/s/^lb /lbx /' $PAGE_LIST
# Nest the tables inside ".ad l"+".nh" and ".hy"+".ad"+".sp 1"
# ".ad l" ==> no right justification of text in table cells
# ".nh" ==> No hyphenation in table cells
# ".sp 1" ==> ensure a blank line before the next section heading
sed -i '/SH ATTR/,/^\.SH/{/\.TS/i.ad l\n.nh
}' $PAGE_LIST
sed -i '/SH ATTR/,/^\.SH/{/\.TE/a.hy\n.ad\n.sp 1
}' $PAGE_LIST
# In a few of the tables, the third column has a lot of text, so
# make that column wide (rather than the first column)
sed -i '/^lbx/{s/lbx/lb/;s/lb$/lbx/}' \
man3/bindresvport.3 \
man3/fmtmsg.3 man3/gethostbyname.3 man3/getlogin.3 \
man3/getnetent.3 man3/getprotoent.3 man3/getpwent.3 \
man3/getservent.3 man3/getspnam.3 man3/getutent.3 man3/glob.3 \
man3/login.3 \
man3/setnetgrent.3 \
man3/wordexp.3
]]
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the SYNOPSIS, a long function prototype may need to be
continued over to the next line. The continuation line is
indented according to the following rules:
1. If there is a single such prototype that needs to be continued,
then align the continuation line so that when the page is
rendered on a fixed-width font device (e.g., on an xterm) the
continuation line starts just below the start of the argument
list in the line above. (Exception: the indentation may be
adjusted if necessary to prevent a very long continuation line
or a further continuation line where the function prototype is
very long.)
Thus:
int tcsetattr(int fd, int optional_actions,
const struct termios *termios_p);
2. But, where multiple functions in the SYNOPSIS require
continuation lines, and the function names have different
lengths, then align all continuation lines to start in the
same column. This provides a nicer rendering in PDF output
(because the SYNOPSIS uses a variable width font where
spaces render narrower than most characters).
Thus:
int getopt(int argc, char * const argv[],
const char *optstring);
int getopt_long(int argc, char * const argv[],
const char *optstring,
const struct option *longopts, int *longindex);
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since _BSD_SOURCE has been obsolete for quite some time now,
it should not be listed as the first FTM for lstat().
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Generally, place '||' at start of a line, rather than the end of
the previous line.
Rationale: this placement clearly indicates that each piece
is an alternative.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Different source styles are used in different pages to achieve the
same formatted output, and in some cases the source mark-up is a
rather convoluted combination of .RS/.RE/.TP/.PD macros. Simplify
this greatly, and unify all of the pages to use more or less the
same source code style. This makes the source code rather easier
to read, and may simplify future scripted global changes.
The feature test macro info is currently bracketed by .nf/.fi
pairs. This is not strictly necessary (i.e., it makes no
difference to the rendered output), but for the moment we keep
these "brackets" in case they may be replaced with something else.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This rather ancient FTM is not mentioned in other pages for
reasons discussed in feature_test_macros(7). Remove this FTM
from the three pages where it does appear.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The majority of pages use .nf/.fi in SYNOPSIS, but there are
still many that don't and use .br to achieve newlines. Fix many
of those. This brings greater consistency to the pages, which
eases editing and may ease future scripted edits to the pages.
Many of these changes were script-assisted, with some additional
manual edits.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Yet more clean-ups after commit
15d6565317.
Reported-by: Alejandro Colomar (man-pages) <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since glibc 2.29, there is a wrapper for getcpu(2).
The wrapper has only 2 arguments, omitting the unused
third system call argument. Rework the manual page
to reflect this.
Reported-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Most pages use 'unsigned int' (as does the kernel).
Make them all do so.
$ find man? -type f \
| xargs sed -i \
-e 's/unsigned \*/unsigned int */g' \
-e 's/unsigned "/unsigned int "/g';
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The use of vertical white space in the SYNOPSIS sections
is rather inconsistent. Make it more consistent, subject to the
following heuristics:
* Prefer no blank lines between function signatures by default.
* Where many functions are defined in the SYNOPSIS, add blank
lines where needed to improve readability, possibly by using
blank lines to separate logical groups of functions.
Reported-by: Alejandro Colomar (man-pages) <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The functions are all declared in <netdb.h>. <sys/socket.h> is only
needed for the AF_* constants.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Bring a bit more consistency to Feature Test Macro information
(mainly .PP between different FTM lists).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use .PP (which gives a bit of vertical white space) rather than
.br to separate functions in Feature Test Macro requirement lists.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The Linux kernel uses 'unsigned int' instead of 'int' for the
'flags' parameter. As glibc provides no wrapper, use the same
type the kernel uses.
......
$ syscall='delete_module';
$ find linux/ -type f -name '*.c' \
|xargs pcregrep -Mn "(?s)^[\w_]*SYSCALL_DEFINE.\(${syscall},.*?\)";
linux/kernel/module.c:977:
SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
unsigned int, flags)
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See previous commit.
This commit normalizes texts under sections other than SYNOPSIS
(most of them in NOTES).
Signed-off-by: Ganimedes Colomar <gacoan.linux@gmail.com>
Cowritten-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
To easily distinguish documentation about glibc wrappers from
documentation about kernel syscalls, let's have a normalized
'Note' in the SYNOPSIS, and a further explanation in the page body
(NOTES in most of them), as already happened in many (but not all)
of the manual pages for syscalls without a wrapper. Furthermore,
let's normalize the messages, following membarrier.2 (because it's
already quite extended), so that it's easy to use grep to find
those pages.
To find these pages, we used:
$ grep -rn wrapper man? | sort -V
and
$ grep -rni support.*glibc | sort -V
delete_module.2, init_module.2: glibc 2.23 is no longer
maintained, so we changed the notes about wrappers, to say that
there are no glibc wrappers for these system calls; see NOTES.
We didn't fix some obsolete pages such as create_module.2.
Signed-off-by: Ganimedes Colomar <gacoan.linux@gmail.com>
Cowritten-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Rationale:
$ man 7 man-pages 2>/dev/null | sed -n /Paragraphs/,/^$/p
Paragraphs should be separated by suitable markers (usually
either .PP or .IP). Do not separate paragraphs using blank
lines, as this results in poor rendering in some output
formats (such as PostScript and PDF).
Fix:
$ sed -i -e '1,/^\.EX/s/^$/.PP/' -e '/^\.EE/,/^\.EX/s/^$/.PP/' man?/*
And then some manual adjustments.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I noticed this while working on some silly "hello, world"
programs, see https://git.sr.ht/~phf/hello-again if you're
curious. Disassembling sh4 code showed trap #31 all over the
place but the syscall(2) man page talked about trap #0x17 and
friends. Checking the kernel sources, I got lucky in
arch/sh/kernel/entry-common.S, where, in commit 3623d138213ae, Rich
Felker clarifies the situation.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Cc: Martin Sebor <msebor@redhat.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Files moved from .txt to .rst.
Also, drop / prefix from kernel source tree references.
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The main point I was driving at with this patch was to fix
"Microsoft Window's FAT filesystems" (i.e., FAT filesystems which
belong to Microsoft Windows, which is decidedly wrong).
FAT32 first shipped with MS-DOS 7.1, as part of Windows 95 OSR2,
but it's a (relatively) simple logical extension of the previous
FATx filesystems (16 and 12 as we know and love them today, I
don't think the PC ever saw 8), hence the "VFAT" driver name ‒
calling FAT-anything a Windows filesystem would be a flat-out lie,
calling it a Microsoft filesystem would be, uh, facetious.
NTFS (as part of Windows NT), on the other hand, is wholly
different WRT the scope and feature-set (it does borrow some
layout from FAT, but reading NTFS as FAT doesn't get you very
far, or much).
The replacing bit is also questionable, especially in a.d. 2020:
while it is true that you cannot install NT on FAT (after a
certain point? my memory ain't what it used to be), and must
therefore replace your existing FAT partitions with NTFS during
upgrades; Windows NT 4.0, the last product to be NT-branded, came
out in 1996, i.e. you could not install Windows on FAT (and,
therefore, upgrade it to NTFS, replacing it) during my entire
lifetime.
Indeed, in $(date +%Y) we live in a post-NTFS world ‒ putting NTFS
in the same class as FAT beyond "is a filesystem" is a joke.
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Relevant Linux commits:
* moved to staging in 1bb8155080c652c4853e6228f8f0d262b3049699
(describe: v4.15-rc1-129-g1bb8155080c6) in Nov 2017,
described as "broken" and "obsolete"
* purged in bd32895c750bcd2b511bf93917bf7ae723e3d0b6
(describe: v4.17-rc3-1010-gbd32895c750b) in Jun 2018,
"since no one has complained or even noticed it was gone"
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since Linux kernel 3.12, tcp_syncookies can have the value 2,
which sends out cookies unconditionally.
Related kernel commits:
5ad37d5deee1ff7150a2d0602370101de158ad86
d8513df2598e5142f8a5c4724f28411936e1dfc7
Reported-by: Philip Rowlands <linux-kernel@dimebar.com>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Quoting Heinrich:
The strnlen.3 manpage has the following sentence:
"In doing this, strnlen() looks only at the first maxlen
characters in the string pointed to by s and never beyond
s+maxlen."
This sentence is self-contradictory:
The last visited character implied by "first maxlen
characters" is s[maxlen-1].
Given that "beyond a" does not include "a", the last visited
character implied by "never beyond s+maxlen" is s[maxlen].
A consistent sentence would be
"In doing this, strnlen() looks only at the first maxlen
characters in the string pointed to by s and never beyond
s+maxlen-1."
I would prefer
"In doing this, strnlen() looks only at the first maxlen
characters in the string pointed to by s and never beyond
s[maxlen-1]"
Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
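A minimal sketch of the behavior discussed above, assuming a POSIX.1-2008 system where strnlen() is declared in <string.h>. The helper unterminated_len() is hypothetical: it hands strnlen() a buffer with no terminating null byte, which is only safe because strnlen() never looks beyond s[maxlen-1].

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the length strnlen() reports for a
   buffer that is deliberately NOT null-terminated. */
size_t
unterminated_len(void)
{
    char buf[4] = { 'a', 'b', 'c', 'd' };   /* no terminating '\0' */

    /* maxlen == sizeof(buf), so strnlen() examines only buf[0]
       through buf[3] == buf[maxlen - 1] and returns maxlen. */
    return strnlen(buf, sizeof(buf));
}
```

With a properly terminated string shorter than maxlen, strnlen() behaves like strlen(); for example, strnlen("hi", 10) is 2.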
The Linux kernel uses 'int' instead of 'long' for the return type.
As glibc provides no wrapper, use the same type the kernel uses.
......
$ grep -n wrapper man-pages/man2/subpage_prot.2
40:There is no glibc wrapper for this system call; see NOTES.
99:Glibc does not provide a wrapper for this system call; call it using
$ grep -rn SYSCALL_DEFINE.*subpage_prot linux/;
linux/arch/powerpc/mm/book3s64/subpage_prot.c:190:
SYSCALL_DEFINE3(subpage_prot, unsigned long, addr,
$ sed -n /SYSCALL.*subpage_prot/,/^}/p \
linux/arch/powerpc/mm/book3s64/subpage_prot.c \
|grep return;
return -ENOENT;
return -EINVAL;
return -EINVAL;
return 0;
return -EFAULT;
return -EFAULT;
return err;
$ sed -n /SYSCALL.*subpage_prot/,/^}/p \
linux/arch/powerpc/mm/book3s64/subpage_prot.c \
|grep '\<err\>';
int err;
err = -ENOMEM;
err = -ENOMEM;
err = 0;
return err;
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This paragraph is a little bit hidden at the end of DESCRIPTION;
make it a little more prominent.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linux kernel commit aae8a97d3ec30788790d1720b71d76fd8eb44b73 (part
of kernel release v2.6.39) added a check to disallow creating a
hardlink to an unlinked file.
The manual page already describes the trick of using
AT_SYMLINK_FOLLOW as an alternative to AT_EMPTY_PATH, and for
AT_EMPTY_PATH the manual page already notes that it "will
generally not work if the file has a link count of zero". However,
the precise error (ENOENT) is not mentioned, and the error case
isn't mentioned in the ERRORS section at all.
This makes it easy to overlook the fact that the AT_SYMLINK_FOLLOW
trick on /proc/self/fd/NN won't work on deleted files, as
evidenced by the following message (which turns up when googling
"linkat deleted ENOENT"):
https://groups.google.com/g/linux.kernel/c/zZO4lqqwp64
Signed-off-by: Mathias Rav <m@git.strova.dk>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
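A minimal sketch of the failing case described above, assuming Linux 2.6.39 or later with /proc mounted and a writable directory. The function name relink_deleted_file() and the ".relinked" suffix are hypothetical. The file is unlinked before the AT_SYMLINK_FOLLOW trick is attempted, so its link count is zero and linkat() is expected to fail with ENOENT.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Try to give a name back to an already-unlinked file via
   /proc/self/fd/NN.  Returns 0 on success, or the errno value from
   the failing call (linkat() is expected to yield ENOENT). */
int
relink_deleted_file(const char *dir)
{
    char tmpl[4096], target[sizeof(tmpl) + 16], fdpath[64];
    int fd, err = 0;

    snprintf(tmpl, sizeof(tmpl), "%s/linkat-demo-XXXXXX", dir);
    fd = mkstemp(tmpl);
    if (fd == -1)
        return errno;               /* e.g., DIR is not writable */

    unlink(tmpl);                   /* link count drops to zero */

    snprintf(fdpath, sizeof(fdpath), "/proc/self/fd/%d", fd);
    snprintf(target, sizeof(target), "%s.relinked", tmpl);

    if (linkat(AT_FDCWD, fdpath, AT_FDCWD, target, AT_SYMLINK_FOLLOW) == -1)
        err = errno;
    else
        unlink(target);             /* unexpected success: clean up */

    close(fd);
    return err;
}
```

Calling relink_deleted_file("/tmp") on a current kernel should return ENOENT.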
The Linux kernel uses 'pid_t' instead of 'long' for the return type.
As glibc provides no wrapper, use the same types the kernel uses.
$ sed -n 34,36p man-pages/man2/set_tid_address.2
.PP
.IR Note :
There is no glibc wrapper for this system call; see NOTES.
$ grep -rn 'SYSCALL_DEFINE.*set_tid_address' linux/
linux/kernel/fork.c:1632:
SYSCALL_DEFINE1(set_tid_address, int __user *, tidptr)
$ sed -n 1632,1638p linux/kernel/fork.c
SYSCALL_DEFINE1(set_tid_address, int __user *, tidptr)
{
current->clear_child_tid = tidptr;
return task_pid_vnr(current);
}
$ grep -rn 'task_pid_vnr(struct' linux/
linux/include/linux/sched.h:1374:
static inline pid_t task_pid_vnr(struct task_struct *tsk)
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
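A sketch of calling the system call directly, since there is no glibc wrapper. The kernel returns task_pid_vnr(current), that is, the caller's thread ID as a pid_t. The names clear_on_exit and demo_set_tid_address() are hypothetical. Note that glibc has already registered its own clear_child_tid address for each thread; overriding it, as done here for a single-threaded demo, may interfere with the threading implementation in real multithreaded programs.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical word that the kernel will zero (and futex-wake)
   when this thread exits. */
static int clear_on_exit;

/* syscall(2) returns long; narrow it to the pid_t the kernel
   actually produces. */
pid_t
demo_set_tid_address(void)
{
    return (pid_t) syscall(SYS_set_tid_address, &clear_on_exit);
}
```

The returned value should equal what gettid(2) reports for the calling thread.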
Learned from an email conversation with Mike Crowe, and
verified by experiment.
Reported-by: Mike Crowe <mac@mcrowe.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As far as I can see, it is instead simply an alias for a
wrapper that calls _llseek(). Saying it's an alias for "llseek()"
seems confusing.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
We're talking of a mix of wrapper functions and system calls
in this page. lseek() is both a system call and a wrapper function,
and this page mostly describes the wrapper.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
{.IR var [x]} -> {.I var[x]}
There were around 15 entries of the former,
and around 360 of the latter.
Found using:
$ grep -rn '^\.I[ |R].* \[.*\]' |sort
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Found using:
$ grep -rn '\\f., [^ ]*\\f. and' man?
I also updated the markup in that paragraph: \f -> .
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the comment of the example program, the peer blocks on fwait()
rather than fpost().
Signed-off-by: Jing Peng <pj.hades@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Rather than mention these pages under the discussion of one
version of the standard, move that text to the end of the page,
where it is probably a little more obvious.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linking up the info presented on this page with the discussion
in getcontext(3) helps the reader.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The script can be used this way:
git commit -sm "$(./scripts/modified_pages.sh): Short commit msg"
And then maybe --amend and add a longer message.
This is especially useful for changes to many pages at once,
usually when running a script to apply some global changes.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
With this change, there remain almost no vestiges of information
about the long defunct Linux libc.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
[.B XX_*] is the most widespread form in the pages.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
faccessat2() was added in Linux 5.8 and enables a fix to
longstanding bugs in the faccessat() wrapper function.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The include path is linux/openat2.h, so fix the manual to reference
this correct path.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The Linux kernel uses a long as the return type for this syscall.
As glibc provides no wrapper, use the same types the kernel uses.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It is probably more sensible to place this section after
the subsection "Signal mask and pending signals".
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add a "big picture" of what happens when a signal handler
is invoked.
Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the kernel sources (kernel/fork.c::copy_process()), we have:
/*
* sigaltstack should be cleared when sharing the same VM
*/
if ((clone_flags & (CLONE_VM|CLONE_VFORK)) == CLONE_VM)
sas_ss_reset(p);
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See https://bugzilla.kernel.org/show_bug.cgi?id=12665.
The fix by Thomas Gleixner was in kernel commit
78c9c4dfbf8c04883941445a195276bb4bb92c76.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linux kernel commit 2a36ab717e8fe678d98f81c14a0b124712719840
(part of 5.10 release) changed sys_membarrier prototype/parameters
and added two new commands [MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
and MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ]. This
man-pages patch reflects these changes, by mostly copying comments
from the kernel patch into the man-page ([Peter Oskolkov] was also
the author of the kernel change).
[mtk: commit message tweaked]
Signed-off-by: Peter Oskolkov <posk@google.com>
Cowritten-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linux uses aio_context_t for these syscalls,
and it's the type provided by <linux/aio_abi.h>.
Use it in the SYNOPSIS.
libaio uses 'io_context_t', but that difference is already noted
in NOTES.
[mtk: patch slightly tweaked]
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
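A minimal sketch of using the kernel type from <linux/aio_abi.h> with raw system calls, assuming glibc's syscall(2) and that libaio is not linked in (its functions have the same names but use io_context_t). The local wrappers and aio_context_roundtrip() are hypothetical. io_setup(2) requires the context variable to be initialized to 0, otherwise it fails with EINVAL.

```c
#define _GNU_SOURCE
#include <linux/aio_abi.h>   /* aio_context_t */
#include <sys/syscall.h>
#include <unistd.h>

/* Local wrappers using the kernel's types; glibc provides none. */
static long
io_setup(unsigned int nr_events, aio_context_t *ctx_idp)
{
    return syscall(SYS_io_setup, nr_events, ctx_idp);
}

static long
io_destroy(aio_context_t ctx_id)
{
    return syscall(SYS_io_destroy, ctx_id);
}

/* Create and immediately destroy a small AIO context.
   Returns 0 on success, -1 on failure (errno set). */
int
aio_context_roundtrip(void)
{
    aio_context_t ctx = 0;          /* must be 0 before io_setup() */

    if (io_setup(8, &ctx) == -1)
        return -1;
    return io_destroy(ctx) == -1 ? -1 : 0;
}
```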
The Linux kernel uses long as the return type for this syscall.
As glibc provides no wrapper, use the same type the kernel uses.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add a page documenting the pthread_attr_setsigmask_np(3) and
pthread_attr_getsigmask_np() functions added in glibc 2.32.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
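A sketch of what pthread_attr_setsigmask_np() is for, assuming glibc 2.32 or later (and -pthread when building against glibc older than 2.34). The function names are hypothetical. The new thread starts with every signal blocked, so there is no window in which it could receive a signal before adjusting its own mask.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Thread body: report whether SIGUSR1 is blocked in this thread. */
static void *
report_sigusr1_blocked(void *arg)
{
    sigset_t cur;

    (void) arg;
    pthread_sigmask(SIG_SETMASK, NULL, &cur);   /* query only */
    return (void *) (long) sigismember(&cur, SIGUSR1);
}

/* Start a thread whose initial signal mask blocks all signals.
   Returns 1 if SIGUSR1 was blocked in it, 0 if not, -1 on error. */
int
start_thread_with_full_mask(void)
{
    pthread_attr_t attr;
    pthread_t thr;
    sigset_t all;
    void *res;

    sigfillset(&all);
    if (pthread_attr_init(&attr) != 0)
        return -1;
    if (pthread_attr_setsigmask_np(&attr, &all) != 0
        || pthread_create(&thr, &attr, report_sigusr1_blocked, NULL) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }
    pthread_attr_destroy(&attr);

    if (pthread_join(thr, &res) != 0)
        return -1;
    return (int) (long) res;
}
```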
That comment wrapped on an 80-column terminal.
Divide it into two lines.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The Linux kernel uses the following:
kernel/futex.c:3778:
SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
struct __kernel_timespec __user *, utime, u32 __user *, uaddr2,
u32, val3)
Since there is no glibc wrapper, use the same types the kernel uses.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
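A sketch of a local wrapper whose parameter types mirror the SYSCALL_DEFINE6 quoted above, assuming a 64-bit architecture where struct timespec matches the kernel's struct __kernel_timespec (on 32-bit architectures with 64-bit time_t, SYS_futex expects the old 32-bit layout). The names futex() and wake_all() are hypothetical. FUTEX_WAKE ignores the timeout, uaddr2, and val3 arguments and returns the number of waiters woken.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <linux/futex.h>      /* FUTEX_WAKE, FUTEX_PRIVATE_FLAG */
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Local wrapper using the kernel's types; glibc provides none. */
static long
futex(uint32_t *uaddr, int futex_op, uint32_t val,
      const struct timespec *timeout, uint32_t *uaddr2, uint32_t val3)
{
    return syscall(SYS_futex, uaddr, futex_op, val, timeout, uaddr2, val3);
}

/* Wake every waiter on a process-private futex word. */
long
wake_all(uint32_t *word)
{
    return futex(word, FUTEX_WAKE | FUTEX_PRIVATE_FLAG, INT_MAX,
                 NULL, NULL, 0);
}
```

With no thread waiting on the word, wake_all() should return 0.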
[mtk: Applied patch manually]
getdents():
This function has no glibc wrapper.
As such, we should use the same types the Linux kernel uses:
Use 'long' as the return type.
getdents64():
The glibc wrapper uses:
ssize_t getdents64(int, void *, size_t);
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
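A sketch of walking a directory with the raw getdents64 system call, assuming Linux with kernel headers providing SYS_getdents64. The record layout follows the linux_dirent64 structure described in getdents(2); count_dir_entries() is a hypothetical helper. On glibc 2.30 or later the getdents64() wrapper with the prototype quoted above could be called instead of syscall(2).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

struct linux_dirent64 {
    uint64_t       d_ino;     /* 64-bit inode number */
    int64_t        d_off;     /* Offset to next structure */
    unsigned short d_reclen;  /* Size of this record */
    unsigned char  d_type;    /* File type */
    char           d_name[];  /* Null-terminated filename */
};

/* Count the entries in PATH (including "." and "..");
   returns -1 on error. */
long
count_dir_entries(const char *path)
{
    _Alignas(8) char buf[4096];
    long count = 0;
    ssize_t nread;
    int fd = open(path, O_RDONLY | O_DIRECTORY);

    if (fd == -1)
        return -1;

    while ((nread = syscall(SYS_getdents64, fd, buf, sizeof(buf))) > 0) {
        for (ssize_t off = 0; off < nread; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *) (buf + off);

            count++;
            off += d->d_reclen;     /* records are variable length */
        }
    }

    close(fd);
    return nread == -1 ? -1 : count;
}
```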
The text in NOTES doesn't really relate specifically to
the #include, so remove the comment on the #include.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Extend this page with information about the CAP_PERFMON capability,
designed to secure performance monitoring and observability
operations in a system according to the principle of least
privilege [1] (POSIX IEEE 1003.1e, 2.2.2.39).
[1] https://sites.google.com/site/fullycapable/, posix_1003.1e-990310.pdf
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Format section names inside each type.
Follow the same pattern as in stat.2 (see line 158: ".IR Note :")
Before this ffix, it was visually harder to find sections inside a type.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
CAP_BPF, CAP_PERFMON, and CAP_CHECKPOINT_RESTORE have all been
added to split out the power of CAP_SYS_ADMIN into weaker pieces.
Group all of these capabilities together in the list under
CAP_SYS_ADMIN, to make it clear that there is a pattern to these
capabilities.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since Linux 5.9, CAP_CHECKPOINT_RESTORE also allows writing to
/proc/sys/kernel/ns_last_pid.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
makedev(3) provides much more detail on this type, so mention it
in the description rather than in 'See also'.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
queue.3 has been moved to queue.7.
Fix SEE ALSO accordingly.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
queue has been in Section 3 for many years,
and still is in most manuals.
For legacy reasons,
especially because hyperlinks to the online manual pages
would break otherwise,
a link queue.3 -> queue(7) is necessary.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After forking slist.3, list.3, tailq.3, stailq.3 & circleq.3
in the previous commits,
this page no longer belongs in Section 3 of the manual pages.
According to its contents, the most suitable section is Section 7.
For legacy reasons, a link queue.3 -> queue(7)
would be appropriate.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This helps differentiate 'TYPE' in some arguments from
'struct TYPE *var' in others, and is technically more correct.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
- ffix: Use man markup
- Remove specific notes about code size increase
and execution time increase,
as they were (at least) inaccurate.
Instead, a generic note has been added.
- Structure the text into subsections.
- Remove sections that were empty after the forks.
- Clearly relate macro names (SLIST, TAILQ, ...)
to a human readable name of which data structure
they implement.
Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the section for /proc/[pid]/smaps, the description of field
ProtectionKey occurs twice: both before and after the description of
VmFlags.
Changes made by this patch:
1) Only the first occurrence is kept because its order matches the
output of /proc/[pid]/smaps.
2) The kernel version in which CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
was introduced was mentioned only in the second occurrence. Now it's
moved to the first one.
Signed-off-by: Jing Peng <pj.hades@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add remaining details to complete the page.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A few fixes to note:
- Sorted some macros alphabetically
- ffix: remove alignment spaces in example (as in list.3)
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
list.3: NAME: Add description
list.3: DESCRIPTION: Add short description
list.3: SEE ALSO: Add insque(3) and queue(3)
list.3: BUGS: Note LIST_FOREACH() limitations
list.3: RETURN VALUE: Add details about the return value of those macros that "return" a value
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
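A sketch of the LIST_FOREACH() limitation noted in BUGS, assuming glibc's <sys/queue.h> (which has LIST_FIRST() and LIST_NEXT() but no LIST_FOREACH_SAFE()). LIST_FOREACH() reads the current element's next pointer after the loop body runs, so removing and freeing that element inside the loop would be a use-after-free. Saving the successor first avoids this. The struct and function names are hypothetical.

```c
#include <stdlib.h>
#include <sys/queue.h>

struct entry {
    int value;
    LIST_ENTRY(entry) entries;
};

LIST_HEAD(listhead, entry);

/* Remove and free every element with an even value. */
static void
remove_even(struct listhead *head)
{
    struct entry *n = LIST_FIRST(head);

    while (n != NULL) {
        struct entry *next = LIST_NEXT(n, entries);  /* save first */

        if (n->value % 2 == 0) {
            LIST_REMOVE(n, entries);
            free(n);
        }
        n = next;
    }
}

/* Build the list 1..count, drop the even values, and return the
   sum of what remains; all elements are freed before returning. */
int
sum_after_removing_even(int count)
{
    struct listhead head;
    struct entry *n;
    int sum = 0;

    LIST_INIT(&head);
    for (int i = count; i >= 1; i--) {   /* head insertion: ascending */
        n = malloc(sizeof(*n));
        if (n == NULL)
            abort();
        n->value = i;
        LIST_INSERT_HEAD(&head, n, entries);
    }

    remove_even(&head);

    while ((n = LIST_FIRST(&head)) != NULL) {
        sum += n->value;
        LIST_REMOVE(n, entries);
        free(n);
    }
    return sum;
}
```

For example, sum_after_removing_even(5) keeps 1, 3, and 5 and returns 9.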
list.3: NAME: ffix: Use man markup
list.3: SYNOPSIS: ffix: Use man markup
list.3: DESCRIPTION: ffix: Use man markup
list.3: CONFORMING TO: ffix: Use man markup
list.3: EXAMPLES: ffix: Use man markup
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
list.3: SYNOPSIS: Copy include from queue.3
list.3: DESCRIPTION: Copy description about naming of macros from queue.3
list.3: DESCRIPTION: Remove unrelated code to adapt to this page
list.3: DESCRIPTION: Remove lines pointing to the EXAMPLES
list.3: CONFORMING TO: Copy from queue.3
list.3: CONFORMING TO: Adapt to this page
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Time namespaces were added in kernel 5.6, but setns() support
for time namespaces was added only starting with kernel 5.8:
commit 76c12881a38aaa83e1eb4ce2fada36c3a732bad4
Author: Christian Brauner <christian.brauner@ubuntu.com>
Date: Mon Jul 6 17:49:11 2020 +0200
nsproxy: support CLONE_NEWTIME with setns()
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Paul Eggert commented on a patch that proposed to note the
POSIX.1-2001 details:
No actual POSIXish implementation ever made it a
real-floating type, though, and that point should be made
lest some conscientious programmer worry about a nonexistent
porting issue.
We opted to drop the patch, but in case someone else points out
this POSIX.1-2001 difference in the future, let's leave a comment
in the page source.
Reported-by: Paul Eggert <eggert@cs.ucla.edu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Describe the activation of the Kernel Lockdown feature via Kconfig
and the command line.
Cf. Documentation/admin-guide/kernel-parameters.rst.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Highlight to the reader that if another filter returns a
higher-precedence action value, then the ptracer will not
be notified.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The wording was incorrect:
It stated that 'eflags' may be the OR of one or two of those two flags,
but then a third flag was documented
(which according to the previous wording could not be used?!).
Moreover, the wording also disallowed using 0 (i.e., no flags at all),
which POSIX specifically allows;
I tested the function with no flags and it worked fine for me,
so I guess it was a problem with the documentation,
and not with the implementation itself.
POSIX ref: https://pubs.opengroup.org/onlinepubs/9699919799/
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I added the EXAMPLES section.
The examples in this page are incomplete
(you can't copy&paste&compile&run).
I fixed the one about TAILQ first,
and the rest should follow.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
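A self-contained TAILQ sketch of the kind the commit asks for (copy, compile, run), assuming glibc's <sys/queue.h>. The names are hypothetical. Values are appended with TAILQ_INSERT_TAIL(), traversed backwards with TAILQ_FOREACH_REVERSE() (which takes the head's struct name as its third argument), and freed afterwards.

```c
#include <stdlib.h>
#include <sys/queue.h>

struct item {
    int value;
    TAILQ_ENTRY(item) link;
};

TAILQ_HEAD(item_queue, item);

/* Append 1..n, then walk backwards and build the decimal number the
   values form (n == 3 gives 321).  All nodes are freed. */
long
tailq_reverse_digits(int n)
{
    struct item_queue head;
    struct item *it;
    long digits = 0;

    TAILQ_INIT(&head);
    for (int i = 1; i <= n; i++) {
        it = malloc(sizeof(*it));
        if (it == NULL)
            abort();
        it->value = i;
        TAILQ_INSERT_TAIL(&head, it, link);
    }

    TAILQ_FOREACH_REVERSE(it, &head, item_queue, link)
        digits = digits * 10 + it->value;

    while ((it = TAILQ_FIRST(&head)) != NULL) {
        TAILQ_REMOVE(&head, it, link);
        free(it);
    }
    return digits;
}
```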
POSIX requires that the <regex.h> header shall define
the structures and symbolic constants used by the
regcomp(), regexec(), regerror(), and regfree() functions.
Therefore, there should be no need to include <sys/types.h>
at all.
The POSIX docs don't use that include:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/regcomp.html
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This is implied by POSIX because it requires that these strings in
the locale definition file contain one symbol. Currently,
locale.5 does not document the concept of symbols; this change
glosses over that and just uses the term "single-character
string".
Signed-off-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Revert "uint_least8_t.3, uint_least16_t.3, uint_least32_t.3, uint_least64_t.3, uint_leastN_t.3: New links to system_data_types(7)"
This reverts commit a5d13a32b7.
Revert "system_data_types.7: Add uint_leastN_t family of types"
This reverts commit 3450a5621e.
Revert "int_least8_t.3, int_least16_t.3, int_least32_t.3, int_least64_t.3, int_leastN_t.3: New links to system_data_types(7)"
This reverts commit 876838354d.
Revert "system_data_types.7: Add int_leastN_t family of types"
This reverts commit f9b54d3a2e.
Revert "uint_fast8_t.3, uint_fast16_t.3, uint_fast32_t.3, uint_fast64_t.3, uint_fastN_t.3: New links to system_data_types(7)"
This reverts commit 496b1aad79.
Revert "system_data_types.7: Add uint_fastN_t family of types"
This reverts commit 3c9ae6e5a2.
Revert "int_fast8_t.3, int_fast16_t.3, int_fast32_t.3, int_fast64_t.3, int_fastN_t.3: New links to system_data_types(7)"
This reverts commit 9df81a23e5.
Revert "system_data_types.7: Add int_fastN_t family of types"
This reverts commit 8f12d3f683.
Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reported-by: Paul Eggert <eggert@cs.ucla.edu>
Reported-by: David Laight <David.Laight@ACULAB.COM>
Reported-by: Jonathan Wakely <jwakely.gcc@gmail.com>
Reported-by: Paul Eggert <eggert@cs.ucla.edu>
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Through some accident, 'sys_siglist' has been documented in
two different pages. Consolidate the information to one page
(strsignal(3)) and add 'sys_siglist' to the NAME line of that
page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
We used .br to force a simple line break (with no extra blank line)
after the tag.
Recently, we added .RS/.RE, and .RS comes just after the tag,
and I realized by accident that .RS already forces a simple line break.
Therefore, .br is completely redundant here, and can be removed.
This way we get rid of "raw" *roff requests in this page.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The previous format/wording for the includes wasn't very clear.
Improve it a bit following Branden's proposal.
It also helps reduce lines of code.
Add a subsection in NOTES explaining the conventions used.
Remove the comment for helping maintain the page,
as the NOTES section is now clear enough.
Reported-by: G. Branden Robinson <g.branden.robinson@gmail.com>
Reported-by: Dave Martin <Dave.Martin@arm.com>
Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Due to a userspace breakage, commit 1251201c0d34 ("sched/core: Fix
uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic
and code") changed the semantics of sched_getattr(2) when the
userspace struct is smaller than the kernel struct. Now, any
trailing non-zero data in the kernel structure is ignored when
copying to userspace. We also document the original error code
correctly (it was EFBIG not E2BIG) in the BUGS section.
Ref: 1251201c0d34 ("sched/core: Fix uclamp ABI bug, clean up and
robustify sched_read_attr() ABI logic and code")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Around the text:
"Feature Test Macro Requirements for glibc..."
replace ".in -4n/.in" with ".RS -4/.RE".
The latter form is more idiomatic use of man macros.
The nroff output is unchanged.
Reported-by: G. Branden Robinson <g.branden.robinson@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Returning its argument without further checks is almost always
wrong for la_version.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
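A sketch of an la_version() that checks its argument instead of echoing it back, assuming glibc's <link.h> (which defines LAV_CURRENT). Returning the argument unchanged claims support for whatever interface version the dynamic linker offers, including future ones the audit library was never written for. The policy below (accept only if the linker offers at least the version this library was built against, otherwise return 0 so the library is ignored) is one conservative choice, not necessarily the one documented in rtld-audit(7).

```c
#define _GNU_SOURCE
#include <link.h>    /* LAV_CURRENT */

unsigned int
la_version(unsigned int version)
{
    /* Dynamic linker too old for the interface we were built for. */
    if (version < LAV_CURRENT)
        return 0;

    /* Never claim a newer version than we were compiled against. */
    return LAV_CURRENT;
}
```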
The contents of each type are a logical block that is indented as
a block. They are not separate paragraphs that happen to be
indented separately, but a set of continuous paragraphs, all at
the same level, indented as a block from the list entry (the name
of the type). Therefore, it makes more sense to use block
indentation, represented by .RS/.RE, rather than indenting each
paragraph separately. That way it's also easier to further indent
a separate paragraph inside a block, which happens for example in
the case of float_t & double_t. It's simply much easier now to
use .IP specifically in those cases where you want to indent just
a single paragraph.
Also added an ending separator comment line just after the last
type.
[mtk: minor edits to commit message]
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From the email discussion:
> Hi Alex,
>
> On 9/25/20 9:31 AM, Alejandro Colomar wrote:
>> Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
>> ---
>> man2/seccomp.2 | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/man2/seccomp.2 b/man2/seccomp.2
>> index 58033da1c..d6b856c32 100644
>> --- a/man2/seccomp.2
>> +++ b/man2/seccomp.2
>> @@ -1101,7 +1101,7 @@ install_filter(int syscall_nr, int t_arch, int f_errno)
>> };
>>
>> struct sock_fprog prog = {
>> - .len = (unsigned short) (sizeof(filter) / sizeof(filter[0])),
>> + .len = sizeof(filter) / sizeof(filter[0]),
>> .filter = filter,
>> };
>
> I have a small doubt about this change. With the change,
> there are no compilation warnings.
>
> But, if we change the code to something slightly different:
>
> [[
> size_t x = (sizeof(filter) / sizeof(filter[0]));
> struct sock_fprog prog = {
> .len = x,
> .filter = filter,
> };
> ]]
>
> The "cc -Wconversion" gives us the following warning:
>
> warning: conversion from ‘size_t’ {aka ‘long unsigned int’}
> to ‘short unsigned int’ may change value
>
> Presumably we don't get a warning for an assignment of the form
>
> .len = (sizeof(filter) / sizeof(filter[0]))
>
> because the compiler is smart enough to work out that the
> value of the constant expression is within the range of
> "unsigned short".
>
> Your thoughts?
Hi Michael,
I'd say that the cast doesn't fix any problems at all. It silences a
valid warning, and I'd use a pragma for that (to be more explicit about
the intention of silencing a warning) if I do want -Wconversion enabled
(which usually I don't want, because it's too noisy) and I'm sure that
this won't overflow. I'd limit the use of casts to when I *really*
need them.
I guess that if you enable -O3, the warning will vanish again because
the compiler will optimize away 'x' (but I didn't test).
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
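Neither option from the thread above is the only one. A further alternative is to keep the assignment cast-free and let a C11 compile-time assertion document and check that the instruction count fits in sock_fprog.len (an unsigned short). This sketch assumes Linux UAPI headers; ARRAY_SIZE and filter_length() are hypothetical, and the trivial allow-everything filter is built but not installed.

```c
#include <limits.h>
#include <linux/filter.h>     /* struct sock_filter, sock_fprog, BPF_STMT */
#include <linux/seccomp.h>    /* SECCOMP_RET_ALLOW */
#include <stddef.h>

#define ARRAY_SIZE(a)  (sizeof(a) / sizeof((a)[0]))

unsigned short
filter_length(void)
{
    static struct sock_filter filter[] = {
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };

    /* Fails to compile, instead of silently truncating, if the
       program ever grows beyond what .len can represent. */
    _Static_assert(ARRAY_SIZE(filter) <= USHRT_MAX,
                   "BPF program too long for sock_fprog.len");

    struct sock_fprog prog = {
        .len    = ARRAY_SIZE(filter),
        .filter = filter,
    };

    return prog.len;
}
```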
Use \(aq to get an unslanted single quote inside monospace code
blocks. Using a simple ' results in a slanted quote inside PDFs.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
These variables are either of an unsigned integer type per POSIX,
or of an integer type per POSIX that Linux defines as an unsigned integer type.
Print them with 'uintmax_t' instead of 'intmax_t' to avoid
big positive numbers being printed as negative numbers.
Bug report:
From: Konstantin Bukin @ 2020-09-13 15:04 UTC
To: mtk.manpages; +Cc: Konstantin Bukin, linux-man
inode numbers are expected to be positive. Casting them to a signed type
may result in printing negative values. E.g. running example program on
the following file:
$ ls -li test.txt
9280843260537405888 -r--r--r-- 1 kbukin hardware 300 Jul 21 06:36 test.txt
results in the following output:
$ ./example test.txt
ID of containing device: [0,480]
File type: regular file
I-node number: -9165900813172145728
Mode: 100444 (octal)
Link count: 1
Ownership: UID=2743 GID=30
Preferred I/O block size: 32768 bytes
File size: 300 bytes
Blocks allocated: 8
Last status change: Tue Jul 21 06:36:50 2020
Last file access: Sat Sep 12 14:13:38 2020
Last file modification: Tue Jul 21 06:36:50 2020
Such erroneous reporting happens for inode values greater than maximum
value which can be stored in signed long. Casting does not seem to be
necessary here. Printing inode as unsigned long fixes the issue.
Reported-by: Konstantin Bukin <kbukin@gmail.com>
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
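A sketch contrasting the two approaches, using the inode number from the bug report. format_inode() and print_inode() are hypothetical helpers. The signed result relies on implementation-defined behavior (out-of-range conversion to a signed type), which with GCC on Linux wraps modulo 2^64 and yields the negative value seen in the report.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

/* Format the same 64-bit value through a signed and an unsigned
   maximum-width type. */
void
format_inode(char *as_signed, char *as_unsigned, size_t size)
{
    uint64_t ino = UINT64_C(9280843260537405888);

    snprintf(as_signed, size, "%jd", (intmax_t) ino);     /* old way */
    snprintf(as_unsigned, size, "%ju", (uintmax_t) ino);  /* fixed */
}

/* Real-world use: print st_ino of PATH; returns -1 on error. */
int
print_inode(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == -1)
        return -1;
    printf("I-node number: %ju\n", (uintmax_t) sb.st_ino);
    return 0;
}
```

With GCC on Linux, format_inode() produces "-9165900813172145728" for the signed form and "9280843260537405888" for the unsigned form.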
The existing text comes straight from POSIX. In copyright terms,
this is at least a gray area, and in any case, simply reproducing
the text of the standard has limited value, since people can
consult the standard directly. So, rewrite the text, to simply
quote the description from fenv(3).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The implementation shall support one or more programming
environments where these types are no wider than 'long'.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A limit can be defined by standards other than POSIX.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add note about length modifiers and conversions to [u]intmax_t,
and add a corresponding example.
Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note: There are a few members of this structure that are
not required by POSIX (XSI extensions, and such).
I simply chose to not document them at all.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Various ports that had their own indigenous system calls have
been discontinued. Move those system calls (none of which had
manual pages!) to a separate part of the page, to avoid
cluttering the main list of system calls.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Some of the links removed in commit 247c654385 should
have been kept, because in some cases there are real system
calls whose wrapper functions are documented in Section 3.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
SO_PEERSEC was introduced for AF_UNIX stream sockets connected via
connect(2) in Linux 2.6.2 [1] and later augmented to support
AF_UNIX stream and datagram sockets created via socketpair(2) in
Linux 4.18 [2]. Document SO_PEERSEC in the socket.7 and unix.7
man pages following the example of the existing SO_PEERCRED
descriptions. SO_PEERSEC is also supported on AF_INET sockets
when using labeled IPSEC or NetLabel, but we defer adding a
description of that support to a separate patch.
The module-independent description of the security context
returned by SO_PEERSEC is from Simon McVittie.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=da6e57a2e6bd7939f610d957afacaf6a131e75ed
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0b811db2cb2aabc910e53d34ebb95a15997c33e7
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Cowritten-by: Simon McVittie <smcv@collabora.com>
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The initial version documents sigval, ssize_t, suseconds_t,
time_t, timer_t, timespec, and timeval.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
[mtk: the coding style used in the example could lead people to
inject memory leaks in their code if they cut/paste/modify the
code to replace "exit" paths with "return" paths from a library
function.]
[Marko, from the mail thread discussing this patch:]
You are right about terminating the process. However, people copy
that example and put the code in a function changing "exit" to
"return". There are a bunch of examples like that here
https://beej.us/guide/bgnet/html/#poll, for instance. That error
bothered me when reading the network programming guide
https://beej.us/guide/bgnet/html/. Then I looked for information
elsewhere:
https://stackoverflow.com/questions/6712740/valgrind-reporting-that-getaddrinfo-is-leaking-memory
https://stackoverflow.com/questions/15690303/server-client-sockets-freeaddrinfo3-placement
And finally, I checked manual pages and saw where these errors
come from.
When you change that to a function and return without calling
freeaddrinfo(), you have a memory leak. I believe an example should
show good programming practices, and relying on process exit to
release the memory is not one of them. In my opinion, these
examples lead people to make mistakes in their programs.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Let's move to the 21st century. Instead of casting system data
types to long/long long/etc. in printf() calls, cast them to
intmax_t or uintmax_t, the largest available signed/unsigned
integer types.
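A minimal sketch of the idea, assuming a hosted C99 (or later) implementation: the helper name format_ids() and its output format are hypothetical. Casting to intmax_t and using the "j" length modifier works whatever the real widths of pid_t and off_t are; unsigned types would use (uintmax_t) with "%ju".

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: format a pid_t and an off_t without knowing
 * their exact widths, by casting each one to intmax_t (the widest
 * signed integer type) and printing it with the "j" length modifier.
 */
int format_ids(char *buf, size_t size, pid_t pid, off_t off)
{
    return snprintf(buf, size, "pid=%jd off=%jd",
                    (intmax_t) pid, (intmax_t) off);
}
```

For example, format_ids(buf, sizeof(buf), 42, 1000) stores "pid=42 off=1000" in buf.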
[mtk: rewrote commit message]
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use printf()'s '#' flag character to prepend the string "0x".
However, when the number is printed in uppercase, and the prefix
is in lowercase, the string "0x" needs to be manually written.
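A small sketch of both cases (the helper name format_hex() is hypothetical). One detail worth knowing: with the '#' flag, the "0x" prefix is added only for a nonzero value, so 0 prints as plain "0"; and "%#X" would produce an uppercase "0X" prefix.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format v twice, once with printf()'s '#' flag
 * (lowercase digits, "0x" added by printf() for nonzero values) and
 * once with uppercase digits, where "0x" must be written by hand
 * because "%#X" would print "0X".
 */
void format_hex(char *lower, char *upper, size_t size, unsigned int v)
{
    snprintf(lower, size, "%#x", v);    /* 255 -> "0xff"; 0 -> "0" */
    snprintf(upper, size, "0x%X", v);   /* 255 -> "0xFF" */
}
```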
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
For consistency.
The types are written both with and without the redundant 'int' keyword
all over the man-pages. However, the most used form, by far, is the one
without 'int'.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Member 'tv_nsec' of 'struct timespec' is of type 'long' (see time.h(0p)),
and therefore, the cast is completely redundant.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
'nread' is of type 'ssize_t'.
'tot' accumulates the values returned in 'nread', so it should also
be 'ssize_t', not 'int' (which could overflow).
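A minimal sketch of such a read loop, assuming a POSIX system; the function name read_all_count() is hypothetical. It also passes the array 'buf' directly to read(2), letting it decay to a pointer to its first element.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read from fd until end-of-file and return the
 * total number of bytes read, or -1 on error.  'tot' has the same type
 * as 'nread' so the running sum is not squeezed into an int.
 * (A real program would also retry when read() fails with EINTR.)
 */
ssize_t read_all_count(int fd)
{
    char buf[4];          /* deliberately small, to force several reads */
    ssize_t nread;
    ssize_t tot = 0;

    while ((nread = read(fd, buf, sizeof(buf))) > 0)
        tot += nread;

    return (nread == -1) ? -1 : tot;
}
```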
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Notes: I copied .nf and .fi from futex.2, but they made no visual difference.
What do they actually do?
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
For consistency.
Most man pages use 'long' instead of 'long int'.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It is the DT_RUNPATH/DT_RPATH of the calling object (not the
executable) that is relevant for the library search. Verified
by experiment.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
When msgrcv() is called with the MSG_COPY flag, it reports an
EINVAL error even if CONFIG_CHECKPOINT_RESTORE is disabled.
ENOSYS is reported only if we also specify the IPC_NOWAIT
flag.
[mtk: edited commit message]
Notes from mtk:
The relevant kernel code is this:
[[
#ifdef CONFIG_CHECKPOINT_RESTORE
...
#else
static inline struct msg_msg *prepare_copy(void __user *buf, size_t bufsz)
{
        return ERR_PTR(-ENOSYS);
}
...
static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
                      int msgflg,
                      long (*msg_handler)(void __user *, struct msg_msg *, size_t))
{
        ...
        if (msgflg & MSG_COPY) {
                if ((msgflg & MSG_EXCEPT) || !(msgflg & IPC_NOWAIT))
                        return -EINVAL;
                copy = prepare_copy(buf, min_t(size_t, bufsz, ns->msg_ctlmax));
                ...
        }
]]
We'll only hit the ENOSYS error if:
(1) MSG_COPY was specified;
(2) IPC_NOWAIT was specified (and MSG_EXCEPT was not); and
(3) CONFIG_CHECKPOINT_RESTORE was not enabled.
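To make the order of the checks in the quoted kernel code explicit, here is a user-space model of it. This is only a sketch: the function msg_copy_error() and the M_* flag bits are hypothetical stand-ins with made-up values, not the kernel's MSG_COPY, MSG_EXCEPT, and IPC_NOWAIT constants.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model only; NOT the real values. */
enum {
    M_COPY   = 1 << 0,    /* stands in for MSG_COPY */
    M_EXCEPT = 1 << 1,    /* stands in for MSG_EXCEPT */
    M_NOWAIT = 1 << 2,    /* stands in for IPC_NOWAIT */
};

/*
 * Return the negative error that the quoted do_msgrcv() logic yields
 * for the given flags, or 0 if it would carry on.  'checkpoint_restore'
 * models whether CONFIG_CHECKPOINT_RESTORE is enabled.
 */
int msg_copy_error(int flags, bool checkpoint_restore)
{
    if (!(flags & M_COPY))
        return 0;                   /* MSG_COPY path not taken */
    if ((flags & M_EXCEPT) || !(flags & M_NOWAIT))
        return -EINVAL;             /* checked before prepare_copy() */
    if (!checkpoint_restore)
        return -ENOSYS;             /* the prepare_copy() stub */
    return 0;
}
```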
Signed-off-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
fread(3), unlike read(2) which returns a ssize_t, returns a
size_t, so it doesn't distinguish between error and end-of-file.
Instead, ferror(3) or feof(3) needs to be checked if fread()
returned fewer items than requested (for example, 0).
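A sketch of that check in standard C; the helper read_block() and its return convention (0, 1, -1) are hypothetical choices for illustration.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read up to n bytes from fp into buf and store
 * the count in *got.  Because fread() returns a size_t item count,
 * a short count must be disambiguated with ferror()/feof().
 * Returns 0 if all n bytes were read, 1 on a short read at
 * end-of-file, and -1 on error (*got may still be nonzero then).
 */
int read_block(FILE *fp, char *buf, size_t n, size_t *got)
{
    *got = fread(buf, 1, n, fp);
    if (*got == n)
        return 0;
    if (ferror(fp))
        return -1;
    return 1;       /* short count without an error: end-of-file */
}
```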
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
`p1` (and `p2` too) is `const void *` and it comes from a
`const char **` (for legacy reasons, argv is not `const` but should be
treated as if it were). That means the ultimate `char` is `const`:
"a pointer to a pointer to a const char".
Let's see what is going on before the fix first, and then the fix.
Before the fix:
`(char *const *)` (I removed the space on purpose) casts `p1` to be
"a pointer to a const pointer to a non-const char". That's clearly
not what it originally was.
Then we dereference, ending with a `char *const`, which is
"a const pointer to a non-const char". But given that the pointer value
is passed to a function, `const` doesn't make sense there, because the
function will already take a copy of it, so it is impossible to modify
the pointer itself.
The fix:
`(const char **)` The only thing that is const is the ultimate `char`,
which is the only thing that matters, because it is the only thing
strcmp(3) has access to (everything else, i.e. the pointers, are
copies).
Then, after the dereference we end up with `const char *`, the type of
argv (more or less, as previously noted), which is also the type of the
arguments to strcmp(3).
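A minimal sketch of the fixed comparison function in use with qsort(3); the wrapper sort_strings() is hypothetical. qsort() hands the comparator pointers to array elements, and each element is a `const char *`, so each argument is really a `const char **`.

```c
#include <stdlib.h>
#include <string.h>

/* p1 and p2 point to elements of an array of `const char *`. */
int cmpstringp(const void *p1, const void *p2)
{
    return strcmp(*(const char **) p1, *(const char **) p2);
}

/* Hypothetical wrapper: sort n strings in place. */
void sort_strings(const char **v, size_t n)
{
    qsort(v, n, sizeof(*v), cmpstringp);
}
```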
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Casting `const void *` to `struct mi *` should result in a warning if
done implicitly. The explicit cast was probably silencing that warning.
`const` can and should be kept.
Now, casting `const void *` to `const struct mi *` is done implicitly.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Casting `void *` to `struct child_args *` is already done implicitly.
Explicitly casting can silence warnings when mistakes are made, so it's
better to remove those casts when possible.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The type `struct msgbuf *` is implicitly converted to `const void *`.
Not only that, but the explicit cast to `void *` was slightly
misleading.
Explicitly casting can silence warnings when mistakes are made, so it's
better to remove those casts when possible.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The type `sigset_t *` is implicitly converted to `void *`.
Explicitly casting can silence warnings when mistakes are made, so it's
better to remove those casts when possible.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The type `struct sockaddr_nl *` is implicitly converted to `void *`.
Explicitly casting can silence warnings when mistakes are made, so it's
better to remove those casts when possible.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Casting `void *` to `double (*cosine)(double)` is already done
implicitly.
I had doubts about this one, but `gcc -Wall -Wextra` didn't complain
about it.
Explicitly casting can silence warnings when mistakes are made, so it's
better to remove those casts when possible.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The type of `val` is `int **`, and it will work with tsearch()
anyway because of the implicit conversion from `void *`, so
declaring it as an `int **` simplifies the code.
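A minimal sketch of that pattern, assuming glibc's <search.h>; compare_ints() and insert_int() are hypothetical names. tsearch() returns a `void *` pointing at the tree node, whose first member is the stored key pointer, so the result converts implicitly to `int **` with no cast.

```c
#include <search.h>
#include <stdlib.h>

static int compare_ints(const void *pa, const void *pb)
{
    const int *a = pa;      /* implicit conversion, no cast needed */
    const int *b = pb;

    return (*a > *b) - (*a < *b);
}

/*
 * Hypothetical helper: insert *key into the tree at *rootp and return
 * the key pointer actually stored there (the existing one if an equal
 * key was already present), or NULL if tsearch() fails.
 */
int *insert_int(void **rootp, int *key)
{
    int **val = tsearch(key, rootp, compare_ints);  /* void * -> int ** */

    return (val == NULL) ? NULL : *val;
}
```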
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It doesn't make any sense to pass a pointer to the array to
read(2).
It might make sense to pass a pointer to the first element of the
array, but that is already implicitly done when passing the array,
which decays to that pointer, so it's simpler to pass the array.
And anyway, the cast was unneeded, as any object pointer is implicitly
converted to `void *`.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Casting `int *` to `const void *` is already done implicitly.
Not only that, but the explicit cast to `void *` was slightly
misleading.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Rather than:
sometype x;
for (x = ....; ...)
use
for (sometype x = ...; ...)
This brings the declaration and use closer together (thus aiding
readability) and also clearly indicates the scope of the loop
counter variable.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Rather than writing things such as:
struct sometype *x;
...
x = malloc(sizeof(*x));
let's use C99 style so that the type info is in the same line as
the allocation:
struct sometype *x = malloc(sizeof(*x));
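A short sketch combining this style with the neighbouring conventions (loop counter declared in the `for` statement, `sizeof` applied to the variable rather than its type). The struct point type and make_points() are hypothetical, for illustration only.

```c
#include <stdlib.h>

/* Hypothetical type, for illustration only. */
struct point {
    int x, y;
};

/* Allocate and initialize n points; returns NULL if malloc() fails. */
struct point *make_points(size_t n)
{
    /* Type, allocation, and sizeof of the variable on one line. */
    struct point *pts = malloc(sizeof(*pts) * n);

    if (pts == NULL)
        return NULL;

    /* The loop counter's scope is limited to the loop. */
    for (size_t i = 0; i < n; i++) {
        pts[i].x = (int) i;
        pts[i].y = 0;
    }
    return pts;
}
```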
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The same calculations are repeated in malloc() and printf() calls.
For better readability, do the calculations once.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Hi Michael,
Continuing with the series, this is the first of the last set of
patches: (2).1 as numbered in previous emails.
Regards,
Alex.
------------------------------------------------------------------------
From: Alejandro Colomar <colomar.6.4.3@gmail.com>
Date: Thu, 3 Sep 2020 12:11:18 +0200
Subject: [PATCH] memusage.1: Use sizeof consistently
Use ``sizeof`` consistently through all the examples in the following
way:
- Use the name of the variable instead of its type as argument for
``sizeof``.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#allocating-memory
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Document fanotify_init(2) flag FAN_REPORT_NAME and the format of
the event info type FAN_EVENT_INFO_TYPE_DFID_NAME.
The fanotify_fid.c example is extended to also report the name of
the created file or subdirectory.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Document fanotify_init(2) flag FAN_REPORT_DIR_FID and event info
type FAN_EVENT_INFO_TYPE_DFID.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
With fanotify_init(2) flag FAN_REPORT_FID, the group identifies
filesystem objects by file handles in a single event info record
of type FAN_EVENT_INFO_TYPE_FID.
We intend to add support for new fanotify_init(2) flags for which
the group identifies filesystem objects by file handles and add
more event info record types.
To that end, start by changing the language of the man page to
refer to a "group that identifies filesystem objects by file
handles" instead of referring to the FAN_REPORT_FID flag and
document the extended event format structure in a more generic
manner that allows more than a single event info record and not
only a record of type FAN_EVENT_INFO_TYPE_FID.
Clarify that the object identified by the file handle refers to
the directory in directory entry modification events.
Remove a note about directory entry modification events and
monitoring a mount point that I found to be too confusing and out
of context.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Jakub points out that my last resync may accidentally have been
against an old version of the kernel source, since the resync
resulted in many deleted lines. I suspect he may be right.
Let's resync against today's current kernel.
Reported-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the
following way:
- When the result of ``sizeof`` is multiplied (or otherwise
modified), write ``sizeof`` in the first place.
Rationale:
``(sizeof(x) * INT_MAX * 2)`` doesn't overflow.
``(INT_MAX * 2 * sizeof(x))`` overflows, giving incorrect
results.
As a side effect, the parentheses of ``sizeof`` are not next to
the parentheses of the whole expression, and it is visually
easier to read.
Detailed rationale:
In C, successive multiplications group from left to right (*),
so a * b * c means (a * b) * c. Therefore, here is what happens
(assuming x86_64):
``(sizeof(x) * INT_MAX * 2)``:
1) sizeof(x) * INT_MAX  (INT_MAX is converted to the wider type,
                         size_t (unsigned long; uint64_t), and the
                         product fits).
2) ANS * 2              (the type is again size_t, and the product
                         fits).
``(INT_MAX * 2 * sizeof(x))``:
1) INT_MAX * 2          (both operands are int (int32_t), so the
                         multiplication is done in int; the true
                         result, 2^32 - 2, doesn't fit in an int,
                         so this is signed overflow, which is
                         undefined behavior; in practice it often
                         wraps to -2).
2) ANS * sizeof(x)      (the type is size_t; however, ANS was
                         already incorrect, so the result will be
                         an incorrect size_t value).
(*): https://en.cppreference.com/w/c/language/operator_precedence
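A sketch of the safe ordering only; the reversed ordering is deliberately not shown because its first multiplication is undefined behavior. The struct item type and big_size() are hypothetical. With a 64-bit size_t the result is the exact product; with a 32-bit size_t it would wrap, but unsigned wraparound is at least well defined.

```c
#include <limits.h>
#include <stddef.h>

struct item {               /* hypothetical 16-byte element type */
    char data[16];
};

/*
 * sizeof comes first, so INT_MAX is converted to size_t before the
 * first multiplication, and every later multiplication is also done
 * in size_t rather than in int.
 */
size_t big_size(void)
{
    struct item x;

    return sizeof(x) * INT_MAX * 2;
}
```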
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use 'struct timespec', not 'struct timeval', and adjust
the variable name accordingly.
Reported-by: Tony May <tony.may@mediakind.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use ``sizeof`` consistently through all the examples in the
following way:
- Never use a space after ``sizeof``, and always use parentheses
around the argument.
Rationale:
https://www.kernel.org/doc/html/v5.8/process/coding-style.html#spaces
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I ran ``sudo make`` and then viewed the man page with
``man 3 queue``, and the contents looked good.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In C89 strtod returns zero on underflow, but since C99 it can return
non-zero. This means the strtod.3 page contradicts all recent C and
POSIX standards. Both C and POSIX say "smallest normalized positive
number", but for consistency with HUGE_VAL, HUGE_VALF and HUGE_VALL
this patch uses the constants for those numbers.
Also slightly improve the presentation of return values for overflow.
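A sketch of how a caller can cope with this, assuming a C99 (or later) strtod(); parse_double() is a hypothetical helper. It does not treat errno as the only underflow signal, since the C standard leaves ERANGE on underflow implementation-defined, and instead also reports whether the magnitude is at most DBL_MIN.

```c
#include <errno.h>
#include <float.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse s with strtod().  On underflow, C99 and
 * POSIX only promise a result whose magnitude is no greater than
 * DBL_MIN (it need not be zero), and whether errno becomes ERANGE is
 * implementation-defined, so both facts are reported separately.
 */
double parse_double(const char *s, bool *erange, bool *tiny)
{
    errno = 0;
    double r = strtod(s, NULL);

    *erange = (errno == ERANGE);
    *tiny = (r >= -DBL_MIN && r <= DBL_MIN);
    return r;
}
```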
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The description of hexadecimal floating-point output is missing a
character describing the exponent. The guarantee of at least one digit
in the exponent is present in both C99 and POSIX.
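A small check of that guarantee, assuming a C99 (or later) printf(); hex_exponent_digits() is a hypothetical helper that counts the digits after the 'p' and its mandatory sign in "%a" output.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format v with "%a" (form [-]0xh.hhhhp+d) and
 * return the number of decimal digits in the exponent, or -1 if no
 * 'p' is found (e.g. for infinities or NaNs).
 */
int hex_exponent_digits(double v)
{
    char buf[64];

    snprintf(buf, sizeof(buf), "%a", v);
    const char *p = strchr(buf, 'p');
    if (p == NULL)
        return -1;
    p++;                            /* skip the 'p' */
    if (*p == '+' || *p == '-')
        p++;                        /* skip the exponent's sign */
    return (int) strspn(p, "0123456789");
}
```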
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Killing a thread with SECCOMP_RET_KILL_THREAD is very likely
to leave the rest of the process in a broken state.
Wording pretty much taken from Rich Felker's suggestion.
Reported-by: Rich Felker <dalias@libc.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This header is used inconsistently: man pages are UTF-8 encoded
but do not set this marker. It's only respected by the man-db
package, and seems a bit anachronistic at this point, when UTF-8
is the standard default nowadays.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There was code containing ``CIRCLEQ_*`` in the examples for ``TAILQ_*``. It was introduced by accident in commit ``041abbe``.
From: Alejandro Colomar <colomar.6.4.3@gmail.com>
Date: Wed, 12 Aug 2020 09:11:27 +0200
Subject: [PATCH] queue.3: Remove wrong code from example
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A naked tilde ("~") renders poorly in PDF. Instead use "\(ti",
which renders better in a PDF, and produces the same glyph
when rendering on a terminal.
Reported-by: Geoff Clare <gwc@opengroup.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
===========
DESCRIPTION
===========
I'm documenting ``CIRCLEQ_*`` macros in queue.3. While writing
this, I noticed that the documentation for some types of
queues/lists talked about swapping contents of two lists, but only
for some of them. I then found that those macros (``*_SWAP``)
don't exist on my system (Debian), but exist in BSD, and I also
found that a previous commit (6559169cac) commented out a lot of
the documentation of the *_SWAP macros, but not all of it, and the
reason was that they were not present in glibc.
I checked that I didn't have any of the *_SWAP macros on my glibc,
so I think that commit probably simply forgot to comment out some
of them.
=======
TESTING
=======
I did ``sudo make`` and then visualized the man page with
``man 3 queue``, and the changes looked good.
I also noticed that the subsection ``Tail queue example`` contents
were wrong, as they contained calls to CIRCLEQ_* macros. I will
address that in a future patch, before I submit the patch
documenting CIRCLEQ_*.
Signed-off-by: Alejandro Colomar <colomar.6.4.3@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Explicitly mention CONFIG_LEGACY_PTYS, and note that it is disabled
by default since Linux 2.6.30.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The paragraph noting applications that use pseudoterminals is better
placed in NOTES than in the DESCRIPTION.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Historically, a comment of the following form at the top of a
manual page was used to indicate to man(1) that the use of tbl(1)
was required in order to process tables:
'\" t
However, at least as far back as 2001 (according to Branden),
man-db's man(1) automatically uses tbl(1) as needed, rendering
this comment unnecessary. And indeed many existing pages in
man-pages that have tables don't have this comment at the top of
the file. So, drop the comment from those files where it is
present.
[mtk: completely rewrote commit message]
Reported-by: G. Branden Robinson <g.branden.robinson@gmail.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add some paragraph breaks to the discussion of 'mode' to make
the details a bit easier to read.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The \" comment produces blank lines. Use the .\" that the vast
majority of the codebase uses instead.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The subsection named "Rpath token expansion" was renamed to
"Dynamic string tokens". Update the cross-references inside
the page accordingly.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This happens for more than just DT_RPATH/DT_RUNPATH.
Signed-off-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
"/bin/pwd" happens to work with the GNU coreutils implementation,
which has -P as the default, contrary to POSIX requirements.
Use "pwd -P" instead, which is shorter, easier to type, and should
work everywhere.
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
CAP_SYS_RESOURCE also allows overriding /proc/sys/fs/mqueue/msg_max
and /proc/sys/fs/mqueue/msgsize_max.
Signed-off-by: Saikiran Madugula <hummerbliss@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Clarify that it is recursive read locks on the read-write lock
that make it difficult to implement
PTHREAD_RWLOCK_PREFER_WRITER_NP.
Update the libc-alpha URL and provide the URL to the POSIX wording
that is quoted in the comment.
Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Close the PID file descriptor in the example program, to hint to
the reader that like every other kind of file descriptor, a PID FD
should be closed.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Change '--' to '\-\-' for options and '--' between words to '\(em'
(em-dash).
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
msg_iovlen is incorrectly typed (according to POSIX) in addition
to msg_controllen, but unlike msg_controllen, this wasn't
mentioned for msg_iovlen.
msg_iovlen being incorrectly typed hasn't been reported as a GCC
bug, but there's no point since it is caused by the same
underlying issue.
Sources: POSIX.1-2017 (<sys/socket.h>), Linux
(include/linux/socket.h)
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See kernel commit 8823c079ba7136dc1948d6f6dcb5f8022bde438e
and the following code in fs/namespace.c::do_loopback():
        err = -EINVAL;
        if (mnt_ns_loop(old_path.dentry))
                goto out;
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Change '-' to '\-' for the prefix of names to indicate an option.
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Change '-' to '\-' for the prefix of names to indicate an option.
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Change '-' to '\-' for the prefix of names to indicate an option.
Change '-' to '\(en' for a range.
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Change '-' to '\-' for the prefix of names to indicate an option.
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Change '-' to '\-' for the prefix of names to indicate an option.
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Change '-' to '\-' for the prefix of names to indicate an option.
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Change '-' to '\-' for the prefix of names to indicate an option.
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
cgroups-v1/v2 documentation got moved to the "admin-guide" subfolder
and converted from .txt files to .rst
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Update the description to state that permissions for port-mapped
I/O are set per-thread, not per-process. Mention that since Linux
5.5, iopl() can no longer disable interrupts, and that it is
deprecated in general and provided only for legacy X servers.
See https://bugzilla.kernel.org/show_bug.cgi?id=205317
Reported-by: victorm007@yahoo.com
Signed-off-by: Thomas Piekarski <t.piekarski@deloquencia.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add documentation for the PR_SET_TAGGED_ADDR_CTRL and
PR_GET_TAGGED_ADDR_CTRL prctls added in Linux 5.4 for arm64.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add documentation for the PR_SVE_SET_VL and PR_SVE_GET_VL
prctls added in Linux 4.15 for arm64.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A patch has been merged for v5.8 that changes how syncfs() reports
errors. Change the sync() manpage accordingly.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
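Here is a sketch of a caller that checks the syncfs() return value, which is now meaningful. It assumes glibc, where syncfs() is declared with _GNU_SOURCE. The helper name sync_filesystem_of() is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Flush the filesystem containing 'path'. Before Linux 5.8, syncfs()
   failed only for a bad file descriptor; since 5.8 it also reports an
   error if inodes failed to be written back since the last syncfs().
   Returns 0 on success or -1 with errno set. */
int
sync_filesystem_of(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    int ret = syncfs(fd);
    int saved_errno = errno;    /* keep close() from clobbering errno */
    close(fd);
    if (ret == -1)
        errno = saved_errno;
    return ret;
}
```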
The host ID might once have been intended to be globally unique,
but that turned out not to be feasible.
Reported-by: Rich Felker <dalias@libc.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
If this ever changes, the test case is very simple:
$ cd /tmp
$ touch libc.so.6
$ LD_LIBRARY_PATH=: sh
sh: error while loading shared libraries: libc.so.6: file too short
Signed-off-by: Arkadiusz Drabczyk <arkadiusz@drabczyk.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As reported by mail from Geoff Clare, there are some details that
need correcting:
Subject: standards(7) (was: man-pages-5.07 released)
Date: Wed, 10 Jun 2020 10:53:14 +0100
From: Geoff Clare <gwc@opengroup.org>
...
The first isn't really a problem, just an oddity. You list
POSIX.1b as "formerly known as POSIX.4", but you don't do the
equivalent for POSIX.1c ("formerly known as POSIX.4a").
There are several problems with the XPG3 entry:
"first significant release" - although I suppose XPG3 could
be considered more significant than XPG2 because it was the
first one to incorporate POSIX.1, I don't think it's fair to
imply that XPG2 was not significant. (E.g. XPG2 was
significant in that it was the first release to include
I18N, and the first that had a conformance test suite.)
"produced by the X/Open Company, a multivendor consortium" -
this conflates two different things called X/Open. X/Open
Company Limited is the UK company that did the editing work,
organised meetings, etc. X/Open Group is the consortium
whose members developed the technical content.
"This multivolume guide was based on the POSIX standards" -
at the time there was only one POSIX standard, namely
POSIX.1-1988. The first release to incorporate POSIX.2 was
XPG4 (which you may consider worth noting in the XPG4
entry).
To fix these problems I would suggest changing the entry to:
XPG3 Released in 1989, this was the first release of the X/Open
Portability Guide to be based on a POSIX standard
(POSIX.1-1988). This multivolume guide was developed by the
X/Open Group, a multivendor consortium.
Under SUSv2 I would suggest changing:
Sometimes also referred to as XPG5.
to:
Sometimes also referred to (incorrectly) as XPG5.
Under POSIX.1-2001, SUSv3: "XSI conformance constitutes the Single
UNIX Specification version 3 (SUSv3)" is problematic. I think I
touched on this in the previous discussion. I would suggest
deleting that sentence and instead inserting, before "Two
Technical Corrigenda ...", the following:
The Single UNIX Specification version 3 (SUSv3) comprises the
Base Specifications containing XBD, XSH, XCU and XRAT as
above, plus X/Open Curses Issue 4 version 2 as an extra volume
that is not in POSIX.1-2001.
Something similar is needed in the POSIX.1-2008, SUSv4 entry where
it talks about "the same four parts". The extra volume this time
is X/Open Curses Issue 7.
]]
Cowritten-by: Geoff Clare <gwc@opengroup.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The fact that a more negative nice value means higher
priority is a continuing source of confusion.
Reported-by: Dan Kenigsberg <danken@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
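Here is a short sketch of the direction of the nice scale. Adding one to the nice value *lowers* the caller's priority, which an unprivileged process is normally allowed to do. The function names are hypothetical.

```c
#include <errno.h>
#include <sys/resource.h>

/* Store the calling process's nice value in *nice_out. getpriority()
   can legitimately return -1, so errno is cleared and checked. */
int
get_nice(int *nice_out)
{
    errno = 0;
    int n = getpriority(PRIO_PROCESS, 0);
    if (n == -1 && errno != 0)
        return -1;
    *nice_out = n;
    return 0;
}

/* Lower the caller's priority by one step. A larger nice value means
   lower priority; decreasing the nice value (raising priority)
   normally needs CAP_SYS_NICE or a suitable RLIMIT_NICE. */
int
lower_my_priority(void)
{
    int n;
    if (get_nice(&n) == -1)
        return -1;
    if (n >= 19)                /* 19 is the lowest priority on Linux */
        return 0;
    return setpriority(PRIO_PROCESS, 0, n + 1);
}
```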
Remove superfluous paragraph macros.
Remove the request ".br" if it precedes a line that begins with a
space, as such lines automatically cause a break.
There is no change in the output from "nroff" and "groff".
###
Examples of warnings from "mandoc -Tlint":
mandoc: bindresvport.3:41:2: WARNING: skipping paragraph macro: PP after SH
mandoc: crypt.3:228:2: WARNING: skipping paragraph macro: PP empty
mandoc: dlinfo.3:151:2: WARNING: skipping paragraph macro: IP empty
mandoc: exec.3:86:2: WARNING: skipping paragraph macro: PP after SS
mandoc: getsubopt.3:45:2: WARNING: skipping paragraph macro: br before text line with leading blank
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove superfluous paragraph macros.
Remove ".br" if it is before a line that starts with a space
character, as such lines automatically cause a break.
###
The output is unchanged, except two empty lines are added at the
bottom (before the footer line) in the output of "nroff" for the files
"alloc_hugepages.2" and "userfaultfd.2".
###
Examples of warnings from "mandoc -Tlint":
mandoc: access.2:283:2: WARNING: skipping paragraph macro: PP after SH
mandoc: adjtimex.2:185:2: WARNING: skipping paragraph macro: PP empty
mandoc: futex.2:728:2: WARNING: skipping paragraph macro: IP empty
mandoc: getsid.2:48:2: WARNING: skipping paragraph macro: br before text line with leading blank
mandoc: init_module.2:290:2: WARNING: skipping paragraph macro: PP after SS
mandoc: ioctl_fideduperange.2:27:2: WARNING: skipping paragraph macro: br after SH
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Trim trailing space in "strings".
There is no change in the output from "nroff" and "groff".
###
Output is from: test-groff -b -mandoc -T utf8 -rF0 -t -w w -z
[ "test-groff" is a developmental version of "groff" ]
troff: <attributes.7>:510: warning: trailing space
troff: <attributes.7>:512: warning: trailing space
troff: <attributes.7>:513: warning: trailing space
troff: <attributes.7>:516: warning: trailing space
troff: <attributes.7>:649: warning: trailing space
troff: <attributes.7>:681: warning: trailing space
troff: <attributes.7>:720: warning: trailing space
####
troff: <environ.7>:181: warning: trailing space
troff: <environ.7>:182: warning: trailing space
####
troff: <ip.7>:820: warning: trailing space
####
troff: <signal.7>:316: warning: trailing space
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Output is from: test-groff -b -mandoc -T utf8 -rF0 -t -w w -z
[ "test-groff" is a developmental version of "groff" ]
Input file is ./proc.5
troff: <proc.5>:4410: warning: trailing space
troff: <proc.5>:5206: warning: trailing space
troff: <proc.5>:5488: warning: trailing space
###
There is no change in the output from "nroff" and "groff".
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Trim trailing space.
There is no change in the output from "nroff" and "groff".
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
These links were mostly created when pages were moved between
sections, in almost every case several years ago. The idea
was to allow people time to get used to the new section numbers
while still having commands of the form "man <sec> <page>"
work as before. Let's assume that people have now had time to
get used to the new section numbers, and remove these links.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
These are all links that were created several years ago, mainly
when pages were migrated to different sections, in order to
allow the 'man' commands using the old section numbers to work.
However, the plan was always to eventually remove them, after
allowing people who cared to get used to the new section numbers.
Now, after 5+ years in each case, it's time to remove
these links.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Traditionally, magic links have not been a well-understood topic
in Linux. This helps clarify some of the terminology used in
openat2.2.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The variable is used in the code example, but not declared,
leading to a compilation error.
Signed-off-by: Oleksandr Kravchuk <open.source@oleksandr-kravchuk.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I believe new users should be discouraged from using atoi() and
that its disadvantages should be explained.
I added the information that 0 is returned on error. Although the C
standard and POSIX say that "If the value of the result cannot be
represented, the behavior is undefined.", there are some
interpretations that 0 has to be returned:
https://stackoverflow.com/questions/38393162/what-can-i-assume-about-the-behaviour-of-atoi-on-error
and this is also what happens in practice with glibc, musl and
uClibc.
Signed-off-by: Arkadiusz Drabczyk <arkadiusz@drabczyk.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
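As an illustration of the alternative that the page points readers toward, here is a sketch of strtol() with full error checking. It rejects input that atoi() silently turns into 0, and out-of-range values, for which atoi()'s behavior is undefined. The helper name parse_int() is hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a decimal int from 's'. Returns 0 and stores the value in
   *out on success; returns -1 for empty input, trailing characters,
   or a value that does not fit in an int. */
int
parse_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return -1;                  /* no digits, or trailing garbage */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -1;                  /* out of range for int */
    *out = (int) v;
    return 0;
}
```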
pgrep, for example, searches for a process name in /proc/pid/status
and therefore cannot find processes whose names are longer than 15
characters unless -f is specified.
Signed-off-by: Arkadiusz Drabczyk <arkadiusz@drabczyk.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
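Here is a sketch that shows the 15-character limit directly. The kernel keeps the name in a 16-byte buffer (15 characters plus a terminating null byte) and silently truncates longer names. The function name is hypothetical.

```c
#include <string.h>
#include <sys/prctl.h>

/* Set the calling thread's name to 'name', read it back into 'buf'
   (which must hold 16 bytes), and return the stored length, or -1 on
   error. Names longer than 15 characters come back truncated. */
int
set_and_get_name(const char *name, char buf[16])
{
    if (prctl(PR_SET_NAME, (unsigned long) name, 0UL, 0UL, 0UL) == -1)
        return -1;
    if (prctl(PR_GET_NAME, (unsigned long) buf, 0UL, 0UL, 0UL) == -1)
        return -1;
    return (int) strlen(buf);
}
```

For example, "a-very-long-thread-name" is stored as "a-very-long-thr".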
Reorder full wordings to match the order of abbreviations.
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Starting with Linux 5.8, setns() can take a PID file descriptor as
an argument, and move the caller into one or more of the namespaces of
the thread referred to by that descriptor.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The page currently incorrectly says that 'fd' must refer to
a descendant PID namespace. However, 'fd' can also refer to
the caller's current PID namespace. Verified by experiment,
and also comments in kernel/pid_namespace.c (Linux 5.8-rc1).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Output is from: test-groff -b -e -mandoc -T utf8 -rF0 -t -w w -z
[ "test-groff" is a developmental version of "groff" ]
There is no change in the output of "nroff" and "groff".
####
troff: <fts.3>:50: warning: trailing space
####
troff: <getgrnam.3>:175: warning: trailing space
####
troff: <getpwnam.3>:181: warning: trailing space
####
troff: <rcmd.3>:52: warning: trailing space
troff: <rcmd.3>:57: warning: trailing space
troff: <rcmd.3>:60: warning: trailing space
troff: <rcmd.3>:63: warning: trailing space
troff: <rcmd.3>:69: warning: trailing space
troff: <rcmd.3>:73: warning: trailing space
####
troff: <rexec.3>:48: warning: trailing space
troff: <rexec.3>:51: warning: trailing space
####
troff: <sem_open.3>:36: warning: trailing space
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove superfluous space at the end of a processed input line.
There is no change in the output from "nroff" and "groff".
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The current version shows the square brackets, '[' and ']', in
boldface.
Use the '\c' escape sequence (function) to join the output of two
macros.
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add documentation for the PR_PAC_RESET_KEYS prctl added in Linux
5.0 for arm64.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Amit Daniel Kachhap <amit.kachhap@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The 'comm' value is typically the same as the (possibly
truncated) executable name, but may be something different.
Reported-by: Jonny Grant <jg@jguk.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Recently I had to troubleshoot a problem where a connect() call
was returning EACCES:
17648 socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 37
17648 connect(37, {sa_family=AF_INET, sin_port=htons(8081),
sin_addr=inet_addr("10.12.1.201")}, 16) = -1 EACCES (Permission
denied)
I've traced this to SELinux policy denying the connection. This is
on a Fedora 23 VM:
$ cat /etc/redhat-release
Fedora release 23 (Twenty Three)
$ uname -a
Linux mako-fedora-01 4.8.13-100.fc23.x86_64 #1 SMP Fri Dec 9 14:51:40
UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
The manpage says this can happen when connecting to a broadcast
address, or when a local firewall rule blocks the connection.
However, the address above is unicast, and using 'wget' from
another account to access the URL works fine.
The context is that we're building an OS image, and this involves
downloading RPMs through a proxy. The proxy (polipo) is labelled
by SELinux, and I guess there is some sort of policy that says
"proxy can only connect to HTTP ports". When trying to connect to
a server listening on a port that is not labeled as an HTTP server
port, I guess SELinux steps in. With 'setenforce 0', the build
works fine. In the kernel sources I see connect() calls
security_socket_connect() (see
https://elixir.bootlin.com/linux/latest/source/net/socket.c#L1855),
which calls whatever security hooks are registered. I see the
SELinux hook getting registered at
https://elixir.bootlin.com/linux/latest/source/security/selinux/hooks.c#L7047,
and setting a perf probe on the call proves that the
selinux_socket_connect function gets called (while
tcp_v4_connect() is not).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From an email conversation with Léo Stefanesco:
> In the man7.org version of the man page for user_namespaces(7), it reads:
>
> there are many privileged operations that affect
> resources that are not associated with any namespace type,
> for example, changing the system time
> (governed by CAP_SYS_TIME)
>
> which is not consistent with time_namespaces(7).
In fact, strictly speaking the text still is correct, even after
the arrival of time namespaces.
Time namespaces virtualize only the boot-time and monotonic
clocks, not the "real time" (i.e., calendar time), which is the
time referred to in the passage you quote.
That said, the text is perhaps now a little misleading, and
a little clarification would help. I changed the text to:
there are many privileged operations that affect
resources that are not associated with any namespace type,
for example, changing the system **(i.e., calendar)** time
(governed by CAP_SYS_TIME)
Reported-by: Léo Stefanesco <leo.lveb@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This page was first added more than 20 years ago. Since
that time it has seen hardly any update, and is by now
very much out of date, as reported by Heinrich Schuchardt
and confirmed by Eugene Syromyatnikov.
As Heinrich says:
Man-pages like netdevices.7 or ioctl_fat.2 are what is
needed to help a user who does not want to read through the
kernel code.
If ioctl_list.2 has not been reasonably maintained since
Linux 1.3.27 and hence is not a reliable source of
information, shouldn't it be dropped?
My answer is, yes (but let's move a little info into ioctl(2)).
Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reported-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In preparation for removing ioctl_list(2), let's preserve
some useful text that was added to ioctl_list(2)
by Andries Brouwer.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
RAND_MAX is for rand(3). POSIX fixes random()'s range at 2^31-1;
RAND_MAX may be smaller on some platforms (even though with glibc
or musl on Linux they are the same).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
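Here is a small sketch of scaling random() output by its POSIX-defined maximum (2^31 - 1) rather than by RAND_MAX, which belongs to rand(). The macro RANDOM_MAX and the function random_unit() are hypothetical names, not library definitions.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>

/* Hypothetical name for the upper bound of random(), 2^31 - 1, as
   fixed by POSIX; RAND_MAX may differ on some platforms. */
#define RANDOM_MAX 2147483647L

/* Return a value in [0, 1] derived from random(). */
double
random_unit(void)
{
    return (double) random() / (double) RANDOM_MAX;
}
```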
FAN_ONDIR was an input-only flag before the introduction of
FAN_REPORT_FID. Since the introduction of FAN_REPORT_FID, it can
also be in the output mask.
Move the text describing its role in the output mask to fanotify.7
where the other output mask bits are documented.
[mtk: commit message tidy-up]
Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It was inserted in the middle of the FAN_CLASS_ multi-flag bits
and broke the multi-flag documentation.
Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This reverts commit a93e5c9593.
FAN_DIR_MODIFY was disabled for v5.7 release by kernel commit
f17936993af0 ("fanotify: turn off support for FAN_DIR_MODIFY").
Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the few pages where this heading (which is "nonstandard" within
man-pages) is used, it always immediately follows CONFORMING TO
and generally contains information related to standards. Remove
the section heading, thus incorporating AVAILABILITY into
CONFORMING TO.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
EXAMPLES appears to be the more widely used form across various
projects' manual pages, and is also what is used in the POSIX
manual pages.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
man-pages doesn't have a REPORTING BUGS section in manual pages,
but many other projects do. Make some recommendations about
placement of that section.
man-pages doesn't use COPYRIGHT sections in manual pages, but
various projects do. Make some recommendations about placement
of the section.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Although man-pages doesn't use AUTHORS sections, many projects do
use an AUTHORS section in their manual pages, so mention it in
man-pages to suggest some guidance on the position at which
to place that section.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There is one case of a cross-reference to a kernel documentation
filename that uses unescaped hyphens.
To avoid misrendering, escape these as \- similarly to other
instances.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add the PR_SPEC_DISABLE_NOEXEC mode added in Linux 5.1
for the PR_SPEC_STORE_BYPASS "misfeature" of
PR_SET_SPECULATION_CTRL and PR_GET_SPECULATION_CTRL.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add the PR_SPEC_INDIRECT_BRANCH "misfeature" added in Linux 4.20
for PR_SET_SPECULATION_CTRL and PR_GET_SPECULATION_CTRL.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
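Here is a sketch of querying and decoding the state of a speculation misfeature. It assumes kernel headers from Linux 4.17 or later for the PR_SPEC_* constants. PR_SPEC_INDIRECT_BRANCH needs 4.20 headers, and PR_SPEC_DISABLE_NOEXEC needs 5.1 headers, so the latter is tested with #ifdef. The function names are hypothetical.

```c
#include <sys/prctl.h>

/* Return the status bits for misfeature 'which' (e.g.,
   PR_SPEC_STORE_BYPASS), or -1 with errno set (EINVAL if the kernel
   or architecture does not support it). */
int
spec_ctrl_status(unsigned long which)
{
    return prctl(PR_GET_SPECULATION_CTRL, which, 0UL, 0UL, 0UL);
}

/* Turn a non-negative status value into a short description. */
const char *
spec_ctrl_describe(int status)
{
    if (status == PR_SPEC_NOT_AFFECTED)
        return "not affected";
    if (!(status & PR_SPEC_PRCTL))
        return "mitigation not controllable via prctl";
    if (status & PR_SPEC_FORCE_DISABLE)
        return "speculation force-disabled";
#ifdef PR_SPEC_DISABLE_NOEXEC
    if (status & PR_SPEC_DISABLE_NOEXEC)
        return "speculation disabled until next execve";
#endif
    if (status & PR_SPEC_DISABLE)
        return "speculation disabled";
    if (status & PR_SPEC_ENABLE)
        return "speculation enabled";
    return "unknown";
}
```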
The fact that an FE_UNDERFLOW exception is not raised for
"Range error: result underflow" is intended behavior.
See https://www.sourceware.org/bugzilla/show_bug.cgi?id=6806.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The function make_message illustrates how to use vsnprintf to
determine the required amount of memory for a specific format and
its arguments.
If make_message is called with a format which will use exactly
INT_MAX characters (excluding '\0'), then the size++ calculation
will overflow the signed integer "size", which is undefined
behaviour in C.
Since malloc and vsnprintf rightfully take a size_t argument, I
decided to use a size_t variable for size calculation. Therefore,
this patched code uses variables of the same data types as
expected by function arguments.
Proof of concept (tested on Linux/glibc amd64):
int main() { make_message("%647s%2147483000s", "", ""); }
If the code is compiled with address sanitizer (gcc
-fsanitize=address) you can see the following line, assuming that
a signed integer overflow simply leads to INT_MIN:
==3094==WARNING: AddressSanitizer failed to allocate 0xffffffff80000000 bytes
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
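Here is a sketch of make_message() along the lines this patch describes. It keeps the buffer size in a size_t, so the "+ 1" for the terminating null byte cannot overflow a signed int. This is an illustrative reconstruction, not necessarily the exact code that was committed to the page.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format a message into a newly allocated buffer, sized by a first
   vsnprintf() call. The caller must free() the result.
   Returns NULL on error. */
char *
make_message(const char *fmt, ...)
{
    int n;
    size_t size;
    char *p;
    va_list ap;

    /* Determine the required size. */
    va_start(ap, fmt);
    n = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (n < 0)
        return NULL;

    size = (size_t) n + 1;      /* one extra byte for '\0' */
    p = malloc(size);
    if (p == NULL)
        return NULL;

    va_start(ap, fmt);
    n = vsnprintf(p, size, fmt, ap);
    va_end(ap);
    if (n < 0) {
        free(p);
        return NULL;
    }
    return p;
}
```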
Added in kernel commit b6fb293f2497a9841d94f6b57bd2bb2cd222da43
Text from comment in include/uapi/asm-generic/mman.h.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Added in kernel commit 16ba6f811dfe44bc14f7946a4b257b85476fc16e.
Text taken from comments in include/linux/mm.h.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This patch documents a flag added in the following kernel commit:
commit d2cd9ede6e193dd7d88b6d27399e96229a551b19
Author: Rik van Riel <riel@redhat.com>
Date: Wed Sep 6 16:25:15 2017 -0700
mm,fork: introduce MADV_WIPEONFORK
This was already documented in man2/madvise.2 in the commit:
commit c0c4f6c29c
Author: Rik van Riel <riel@redhat.com>
Date: Tue Sep 19 20:32:00 2017 +0200
madvise.2: Document MADV_WIPEONFORK and MADV_KEEPONFORK
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
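Here is a sketch of the MADV_WIPEONFORK behavior that these commits describe. After fork(), the child sees the marked private anonymous page as zero-filled, while the parent keeps its data. It assumes Linux 4.14 or later. The fallback value 18 is the one in the kernel's asm-generic mman header. The function name is hypothetical.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18      /* assumed value from asm-generic/mman-common.h */
#endif

/* Returns 1 if the page is wiped in the child and intact in the
   parent, 0 if not, or -1 on error (e.g., EINVAL on older kernels). */
int
wipeonfork_works(void)
{
    size_t pagesz = (size_t) sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    memset(p, 0xAA, pagesz);

    if (madvise(p, pagesz, MADV_WIPEONFORK) == -1) {
        munmap(p, pagesz);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        munmap(p, pagesz);
        return -1;
    }
    if (pid == 0)               /* child: exit 0 iff the page was zeroed */
        _exit(p[0] == 0 && p[pagesz - 1] == 0 ? 0 : 1);

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)) {
        munmap(p, pagesz);
        return -1;
    }
    int parent_intact = (p[0] == 0xAA);
    munmap(p, pagesz);
    return parent_intact && WEXITSTATUS(status) == 0;
}
```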
The gettid() wrapper was added in glibc 2.30, and is declared by
<unistd.h> if _GNU_SOURCE is defined.
Reported-by: Joseph C. Sible <josephcsible@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
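Here is a sketch of portable code that uses the glibc 2.30 wrapper when it is available and otherwise falls back to the raw system call. The name my_gettid() is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the caller's thread ID. */
pid_t
my_gettid(void)
{
#if defined(__GLIBC__) && \
    (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 30))
    return gettid();                    /* declared in <unistd.h> */
#else
    return (pid_t) syscall(SYS_gettid); /* older glibc: no wrapper */
#endif
}
```

In a single-threaded process, the thread ID equals the process ID.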
Remove the text "(rare)" after a note from Vincent Lefèvre:
Subject: [Bug math/13932] dbl-64 pow unexpectedly slow for some inputs
Date: Sat, 23 May 2020 21:31:52 +0000
From: vincent-srcware at vinc17 dot net <sourceware-bugzilla@sourceware.org>
To: mtk.manpages@gmail.com
https://sourceware.org/bugzilla/show_bug.cgi?id=13932
--- Comment #26 from Vincent Lefèvre <vincent-srcware at vinc17 dot net> ---
(In reply to Michael Kerrisk from comment #25)
> Fix documented for man-pages-5.07.
[...]
> -On 64-bits,
> +Before glibc 2.28,
> .\"
> .\" https://sourceware.org/bugzilla/show_bug.cgi?id=13932
> +on some architectures (e.g., x86-64)
> .BR pow ()
> may be more than 10,000 times slower for some (rare) inputs
> than for other nearby inputs.
[...]
The problematic values are uncommon, but not so rare, in the sense
that they are close to simple values, i.e. are likely to occur in
practice. An example given above: pow(0.999999999999999889, 1.5).
1 and 1.5 are very simple values, which are more likely to occur
in practice than some fixed random value. Then it suffices to have
a small rounding error on 1...
For instance, this is very different from hard-to-round cases of
exp, which are also very slow IMHO, but unless one writes a
specific program for them, no-one should notice the slowness
because such a case would typically occur only once among billions
(I don't remember the accuracy before the slowest path in this
library).
Reported-by: Vincent Lefèvre <vincent-srcware@vinc17.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This reflects glibc commit cad64f778aced84efdaa04ae64f8737b86f063ab
("ldconfig: Default to the new format for ld.so.cache").
Signed-off-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The terms POSIX.1-{2003,2004,2013,2016} were inventions of
my imagination, as confirmed by consulting Geoff Clare of
The Open Group. Remove these names.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Avoid implying that use of IFUNC is the only way to produce a
symbol with a NULL value. Give more scenarios in which a symbol may
have a NULL value, but explain that in those scenarios dlsym() will
fail with glibc's ld.so due to an implementation inconsistency.
Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
iproute2 allows you to specify the netns for either side of a veth
interface at creation time. Add an example of this to veth(4) so
it doesn't sound like you have to move the interfaces in a
separate step.
Verified with commands:
# ip netns add alpha
# ip netns add bravo
# ip link add foo netns alpha type veth peer bar netns bravo
# ip -n alpha link show
# ip -n bravo link show
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add descriptions of new directories (/run, /usr/libexec,
/usr/share/color, /usr/share/ppd, /var/lib/color), note that
/usr/X11R6 has been removed, and update the FHS URL and version.
See https://bugzilla.kernel.org/show_bug.cgi?id=206693
Reported-by: Gary Perkins <glperkins@lit.edu>
Signed-off-by: Thomas Piekarski <t.piekarski@deloquencia.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As noted in email by Christian Brauner:
I forgot to mention that spawning directly into a target
cgroup is also more efficient than moving it after creation.
The specific reason is mentioned in the commit message
[ef2c41cf38a], the write lock of the semaphore need not be
taken in contrast to when it is moved afterwards. That
implementation detail is not that interesting but it might
be interesting to know that it provides performance benefits
in general.
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This is a sequel to commit baf17bc4f2, addressing the
issues with missing commas in the middle of SEE ALSO lists that
emerged since.
The awk script from the original commit was not working and had to
be slightly modified (s/["]SEE ALSO["]/"?SEE ALSO/), otherwise it
works like a charm. Here's the fixed script and its output just
before this commit:
for f in man*/*; do
    awk '
    /^.SH "?SEE ALSO/ {
        sa=1; print "== " FILENAME " =="; print; next
    }
    /^\.(PP|SH)/ {
        sa=0; no=0; next
    }
    /^\.BR/ {
        if (sa==1) {
            print;
            if (no == 1)
                print "Missing comma in " FILENAME " +" FNR-1; no=0
        }
    }
    /^\.BR .*)$/ {
        if (sa==1)
            no=1;
        next
    }
    /\.\\"/ {next}
    /.*/ {
        if (sa==1) {
            print; next
        }
    }
    ' $f; done | grep Missing
Missing comma in man1/memusage.1 +272
Missing comma in man2/adjtimex.2 +597
Missing comma in man2/adjtimex.2 +598
Missing comma in man2/mkdir.2 +252
Missing comma in man2/sigaction.2 +1045
Missing comma in man2/sigaction.2 +1047
Missing comma in man3/mbsnrtowcs.3 +198
Missing comma in man3/ntp_gettime.3 +142
Missing comma in man3/strcmp.3 +219
Missing comma in man3/strtol.3 +302
Missing comma in man3/wcstombs.3 +120
Missing comma in man7/user_namespaces.7 +1378
Missing comma in man7/xattr.7 +198
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
prctls that are architecture-specific won't work on other
architectures, and arch-specific prctls that manipulate optional
hardware features likewise won't work if that hardware feature is
not present.
The established pattern seems to be to treat such prctls as if they
are unimplemented, when attempted on the wrong hardware.
Cover these cases with some generic weasel words in the closest
existing EINVAL clause.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
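Here is a sketch of how a caller might treat an arm64-only prctl as unimplemented elsewhere. It uses PR_SVE_GET_VL: on other architectures, or on arm64 without SVE, the call fails with EINVAL. The fallback constants are assumed to match <linux/prctl.h> (PR_SVE_GET_VL is 51, and the length occupies the low 16 bits). The function name is hypothetical.

```c
#include <errno.h>
#include <sys/prctl.h>

#ifndef PR_SVE_GET_VL
#define PR_SVE_GET_VL 51            /* assumed value, Linux 4.15 */
#endif
#ifndef PR_SVE_VL_LEN_MASK
#define PR_SVE_VL_LEN_MASK 0xffff   /* assumed value */
#endif

/* Return the SVE vector length in bytes, 0 if SVE is unavailable
   (EINVAL, as for an unimplemented prctl), or -1 on other errors. */
int
sve_vector_length(void)
{
    int ret = prctl(PR_SVE_GET_VL, 0UL, 0UL, 0UL, 0UL);
    if (ret == -1)
        return errno == EINVAL ? 0 : -1;
    return ret & PR_SVE_VL_LEN_MASK;
}
```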
Fix a few very minor bits of punctuation in
PR_SET_SPECULATION_CTRL and PR_GET_SPECULATION_CTRL.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The description of PR_SET_PDEATHSIG refers to "maxsig", which
is apparently intended to stand for the maximum defined signal
number.
maxsig seems not to be a thing, even in the kernel.
Reword to use the standard constant NSIG. (Discussion of SIGRTMIN
and SIGRTMAX seems out of scope here, and anyway is not relevant
to the kernel.)
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
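Here is a sketch of the reworded range check. PR_SET_PDEATHSIG accepts 0 (to clear the setting) or a signal number in the range [1, NSIG - 1]. NSIG comes from <signal.h> as a glibc extension. The function name is hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/prctl.h>

/* Request signal 'sig' when the parent thread terminates, then read
   the setting back with PR_GET_PDEATHSIG. Returns the stored signal,
   or -1 if 'sig' is out of range or prctl() fails. */
int
set_parent_death_signal(int sig)
{
    int cur;

    if (sig < 0 || sig >= NSIG)
        return -1;
    if (prctl(PR_SET_PDEATHSIG, (unsigned long) sig, 0UL, 0UL, 0UL) == -1)
        return -1;
    if (prctl(PR_GET_PDEATHSIG, (unsigned long) &cur, 0UL, 0UL, 0UL) == -1)
        return -1;
    return cur;
}
```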
The Intel MPX API was removed from Linux 5.4. See Linux
commit f240652b6032 ("x86/mpx: Remove MPX APIs")
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The prctl list has historically been sorted by prctl name (ignoring
any SET_ or GET_ prefix) to make individual prctls easier to find.
Some noise seems to have crept in since.
Sort the list back into order. Similarly, reorder the list of
prctls specified to return non-zero values on success.
Content movement only. No semantic change.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The prctl.2 source is unnecessarily hard to navigate, not least
because prctl option flags are traditionally named PR_* and so look
just like prctl names.
For each actual prctl, add a comment of the form
.\" prctl PR_FOO
to make it more obvious where each top-level prctl starts.
Of course, we could add some clever macros, but let's not confuse
dumb parsers.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In reality, almost every prctl interferes with assumptions that the
compiler and C library / runtime rely on. prctl() can therefore
make userspace explode in a variety of ways that are likely to be hard
to debug.
This is not obvious to the uninitiated, so add a warning.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Under PR_SET_NAME, the [tid] value seen in procfs as
/proc/self/task/[tid] is mistakenly described as the name of the
thread, whereas really the name is in /proc/self/task/[tid]/comm.
Fix it.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
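Here is a sketch of the distinction. The [tid] path component is the thread ID, and the name is read from the comm file inside that directory. The helper name read_thread_comm() is hypothetical.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the calling thread's name from /proc/self/task/[tid]/comm into
   'buf', stripping the trailing newline. Returns 0 on success. */
int
read_thread_comm(char *buf, size_t size)
{
    char path[64];
    pid_t tid = (pid_t) syscall(SYS_gettid);

    snprintf(path, sizeof(path), "/proc/self/task/%d/comm", (int) tid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int) size, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```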
The current synopsis for prctl(2) misleadingly claims that prctl
operates on a process. Rather, some (in fact, most) prctls operate
on a thread.
The wording probably dates back to the old days when Linux didn't
really have threads at all.
Reword as appropriate.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
No content changes. Just put things in a slightly more logical
order and add a few paragraph breaks for readability.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note that another reason to use the *at() APIs is to access
'flags' functionality that is not available in the corresponding
conventional APIs.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
By way of a hint that the file descriptor returned by dirfd()
could usefully be fed to the *at() APIs.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Both functions behave the same with respect to the return value, so
there is no need to describe them separately.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
arm64 is currently documented as receiving the syscall number in
x8.
While this is the correct register, the syscall number is a 32-bit
integer. Bits [63:32] are ignored by the kernel.
So it is more correct to say "w8".
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The arm OABI syscall interface is currently documented in terms of
register name aliases defined by the ARM Procedure Call Standard
(APCS). However, these don't make sense in the context of a
binary interface that doesn't comply (or need to comply) with
APCS.
Use the real architectural register names instead.
The names a1-a4, v1... are just aliases for r0-r3, r4... anyway,
so the interface is just the same regardless of which set of names
is used.
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since kernel commit 96c5865559cee0f9cbc5173f3c949f6ce3525581,
the 'lo_flags' field is modifiable via the LOOP_SET_STATUS and
LOOP_SET_STATUS64 ioctl() operations.
See https://bugzilla.kernel.org/show_bug.cgi?id=203417
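A minimal sketch of changing a flag on an already-configured loop device, assuming the caller has an open file descriptor for /dev/loopN and sufficient privilege. It reads the current status first so that the other fields are written back unchanged. Which lo_flags bits the kernel accepts via LOOP_SET_STATUS64 depends on the kernel version; LO_FLAGS_AUTOCLEAR is assumed here to be settable. The helper enable_autoclear() is hypothetical.

```c
#include <errno.h>
#include <linux/loop.h>
#include <sys/ioctl.h>

/* Hypothetical helper: turn on autoclear for the loop device open on
   LOOPFD. Returns 0 on success, or -1 with errno set. */
static int
enable_autoclear(int loopfd)
{
    struct loop_info64 info;

    /* Fetch the current status so that the other fields are
       written back unchanged */
    if (ioctl(loopfd, LOOP_GET_STATUS64, &info) == -1)
        return -1;

    info.lo_flags |= LO_FLAGS_AUTOCLEAR;

    return ioctl(loopfd, LOOP_SET_STATUS64, &info);
}
```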
Reported-by: Vlad <cvazir@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There are 2 typos:
file_in is used instead of fd_in in the ERRORS and NOTES sections.
file_out is used instead of fd_out in the ERRORS section.
Reported-by: Ricardo Castano <ricardo.castano.salinas@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This update addresses an issue described in
https://bugzilla.kernel.org/show_bug.cgi?id=207345
In answer to my question, Paul Eggert noted:
> Where do I find it?
https://www.iana.org/time-zones
Look under "Latest version", which is 2020a.
Reported-by: Paul Eggert <eggert@cs.ucla.edu>
Reported-by: Marco Curreli <marcocurreli@tiscali.it>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The Linux man page for ptsname_r, when describing the behaviour
in the error case, is
- not consistent with the future POSIX standard (POSIX Issue 8).
- not consistent with musl libc.
Find attached a patch to
- keep it consistent with what glibc does,
- make it consistent with musl libc,
- make it consistent with the future POSIX standard (POSIX
Issue 8).
Details:
glibc's implementation of ptsname_r, when it fails, returns the
error code as return value AND sets errno. See
https://sourceware.org/git/?p=glibc.git;a=blob;f=login/ptsname.c
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/mach/hurd/ptsname.c
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/ptsname.c
musl's implementation of ptsname_r, when it fails, returns the error code
but does NOT set errno. See
https://git.musl-libc.org/cgit/musl/tree/src/misc/pty.c
The proposal to add ptsname_r to POSIX, with text
"If successful, the ptsname_r( ) function shall return zero.
Otherwise, an error number shall be returned to indicate the
error."
has been accepted for inclusion in POSIX Issue 8.
http://austingroupbugs.net/view.php?id=508
Therefore a portable program should look at the return value from
ptsname_r, NOT the errno value. The current text in the man page
suggests to look at the errno value, which is wrong (because of
musl libc) and not future-proof (because of future POSIX).
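A minimal sketch of the portable pattern, assuming glibc or musl with _GNU_SOURCE. Note that posix_openpt(), grantpt() and unlockpt() still report errors via errno; only ptsname_r() is checked through its return value. The helper open_master() is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: open a pseudoterminal master and store the
   slave's pathname in BUF. Returns the master fd, or -1 with *ERR
   set to an error number. */
static int
open_master(char *buf, size_t buflen, int *err)
{
    int mfd = posix_openpt(O_RDWR | O_NOCTTY);
    if (mfd == -1) {
        *err = errno;
        return -1;
    }
    if (grantpt(mfd) == -1 || unlockpt(mfd) == -1) {
        *err = errno;
        close(mfd);
        return -1;
    }

    /* Inspect the return value, not errno: musl does not set errno,
       and POSIX Issue 8 specifies only the returned error number */
    int s = ptsname_r(mfd, buf, buflen);
    if (s != 0) {
        *err = s;
        close(mfd);
        return -1;
    }
    return mfd;
}
```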
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The attr(1) page is relevant to xattrs, so add it to the
SEE ALSO section.
The attr(1) command works for other filesystems as well.
Signed-off-by: Achilles Gaikwad <agaikwad@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Used Bird's source code, kernel source code, iproute2 source code
and iproute2 manpages to find meanings of these new attributes.
Signed-off-by: Jan Moskyto Matejka <mq@ucw.cz>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See https://bugzilla.kernel.org/show_bug.cgi?id=198569.
Reported-by: Alexander Morozov <alexandermv@gmail.com>
Reported-by: Amir Goldstein <amir73il@gmail.com>
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Redundant because this is a Section 3 page, and the
text also describes the implementation in glibc.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From a conversation with Paul Eggert:
From: Paul Eggert <eggert@cs.ucla.edu>
Subject: Re: Errors in man pages, here: tzfile(5): Typo?
On 4/20/20 12:27 AM, Michael Kerrisk (man-pages) wrote:
> I think "UT" here is intended to mean "Universal Time", and as such
> should not be "UTC". Perhaps Paul can comment.
Yes, that's right. The tzfile format covers timestamps that predate the
introduction of UTC in 1960, so the documentation uses the sloppier and
more-general term "UT" instead of the more-precise term "UTC".
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Document the details of the new FAN_DIR_MODIFY event, which
introduces entry name information to the fanotify event
reporting format.
Enhance the fanotify_fid.c example to also report this event.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
- The condition for printing "subdirectory created" was always
true.
- The arguments and error check of open_by_handle_at() were
incorrect.
- Fix example description inconsistencies.
- Nicer indentation of example output.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Some of the new event types that were added in v5.1 along with
init flag FAN_REPORT_FID are not eligible to be reported to a
directory that is watched with FAN_EVENT_ON_CHILD.
Document the events that cannot be generated on children of a
watching parent.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It is not true that FAN_MARK_MOUNT cannot be used with a group
that was initialized with flag FAN_REPORT_FID.
The correct assertion is that events that require a group with
flag FAN_REPORT_FID cannot be requested on a mark mount.
For example, a FAN_OPEN event can be requested on a mark mount and
will generate an event with file handle information if the group
was initialized with flag FAN_REPORT_FID.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I was experimenting with some possible changes to adjtimex(2) and
clock_adjtime(2) and tried to look up the man page to see what the
documented behavior is when I noticed that clock_adjtime() appears
to be the only system call that is currently undocumented.
Before I do any changes to it, this tries to document what I
understand it currently does.
[ RC: Add better explanations of the usage and error codes
and correct some typographical mistakes. ]
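A minimal sketch of a read-only query, assuming glibc with _GNU_SOURCE. Setting modes to 0 requests no adjustments, which should not require CAP_SYS_TIME. The helper query_clock() is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/timex.h>
#include <time.h>

/* Hypothetical helper: query (without modifying) the clock CLK.
   On success, returns the clock state (e.g., TIME_OK) and stores the
   frequency offset (ppm with a 16-bit fractional part) in *FREQ;
   on failure, returns -1 with errno set. */
static int
query_clock(clockid_t clk, long *freq)
{
    struct timex tx = { .modes = 0 };   /* 0: adjust nothing */

    int state = clock_adjtime(clk, &tx);
    if (state == -1)
        return -1;

    *freq = tx.freq;
    return state;
}
```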
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linux has allowed passing open file descriptors to clock_gettime()
and friends since v2.6.39. This patch documents these "dynamic"
clocks and adds a brief example of how to use them.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The structure in the kernel appears to be named 'dsp56k_upload'
not 'dsp56k_binary'. And this appears always to have been so.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Taken from Documentation/filesystems/proc.txt.
Reported-by: Helge Kreutzmann <debian@helgefjell.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since glibc 2.26, the posix_spawn() function accepts the
POSIX_SPAWN_SETSID flag. This flag has been accepted by POSIX and
should be added to the next major revision. The current support
can be enabled with _GNU_SOURCE.
Upstream commit in glibc.git:
daeb1fa2e1 [BZ 21340] add support for POSIX_SPAWN_SETSID
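A minimal sketch of using the flag, assuming glibc 2.26 or later and _GNU_SOURCE; the child runs as though it had called setsid(2) before executing the new program. The helper run_in_new_session() is hypothetical.

```c
#define _GNU_SOURCE
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run PATH with ARGV in a new session and wait
   for it. Returns the child's exit status, or -1 on failure. */
static int
run_in_new_session(const char *path, char *const argv[])
{
    posix_spawnattr_t attr;
    pid_t pid;
    int status;

    if (posix_spawnattr_init(&attr) != 0)
        return -1;

    /* These functions return an error number rather than
       setting errno */
    if (posix_spawnattr_setflags(&attr, POSIX_SPAWN_SETSID) != 0 ||
        posix_spawn(&pid, path, NULL, &attr, argv, environ) != 0) {
        posix_spawnattr_destroy(&attr);
        return -1;
    }
    posix_spawnattr_destroy(&attr);

    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```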
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Olivier Gayot <olivier.gayot@sigexec.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The implementation of the fork() step in posix_spawn(2) relies on
either fork(2), vfork(2) or clone(2) depending on the version of
the glibc and the arguments passed to posix_spawn(2).
It is sometimes ambiguous whether, when we are mentioning
"fork(2)", we are referring to the fork() step or the actual
fork(2) syscall.
This patch hopefully avoids the ambiguity by replacing confusing
occurrences with "the xxx() step" where appropriate.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Olivier Gayot <olivier.gayot@sigexec.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Added a few lines about POSIX_SPAWN_USEVFORK so that it appears
clearly that since glibc 2.24, the flag has no effect.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Olivier Gayot <olivier.gayot@sigexec.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since glibc 2.24, posix_spawn() makes an
unconditional call to clone(CLONE_VM | CLONE_VFORK ...) rather
than relying on fork(2) or vfork(2).
As a consequence, the statements regarding the use of the flag
POSIX_SPAWN_USEVFORK and how the function decides whether it
should use fork(2) or vfork(2) have been obsolete since glibc 2.24.
This patch makes a distinction in the manual page between glibc
2.24 and older versions.
Upstream commit in glibc.git:
9ff72da471 posix: New Linux posix_spawn{p} implementation
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Olivier Gayot <olivier.gayot@sigexec.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Taken from Linux v5.3-rc2. Add a reference to the header file to
save the future reader some time figuring out whether more entries
exist.
Signed-off-by: Peter Wu <peter@lekensteyn.nl>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In Linux 4.4, the set of BPF helper functions that could
be called was governed by a check in sk_filter_func_proto().
Nowadays (Linux 5.6), it is I think governed by the check in
sk_filter_func_proto().
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This notes that the kernel now allows calls to bpf() without CAP_SYS_ADMIN
under some circumstances.
Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The cgroup.sane_behavior file returns the hard-coded value "0" and
is kept for legacy purposes. Mention this in the man-page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"), the
semantics of move_pages() have changed: it now returns the number of
non-migrated pages if they were not migrated for non-fatal reasons
(usually a busy page). This was an unintentional change that
hasn't been noticed except by LTP tests, which checked for the
documented behavior.
There are two ways to handle this change. We could go back
to the original behavior and return -EAGAIN whenever migrate_pages
is not able to migrate pages due to non-fatal reasons. Another
option would be to simply continue with the changed semantics and
extend the move_pages documentation to clarify that -errno is returned
on an invalid input or when migration simply cannot succeed (e.g.,
-ENOMEM, -EBUSY), or the number of pages that could not be
migrated due to ephemeral reasons (e.g., the page is pinned or locked
for other reasons).
We decided to keep the second option in the kernel, because this
behavior has been in place for some time without anybody complaining,
and new users may depend on it. It also allows slightly
easier error handling, as the caller knows that it is
worth retrying when err > 0.
Update the man pages to reflect the new semantics.
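A sketch of that retry pattern, invoking the raw system call through syscall(2) so that libnuma is not needed, and assuming the kernel headers provide MPOL_MF_MOVE in <linux/mempolicy.h>. The helper move_pages_retry() and its max_tries parameter (which must be at least 1) are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/mempolicy.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: move COUNT pages of the calling process to
   the nodes in NODES, retrying while move_pages(2) reports a positive
   count of pages left unmigrated for transient reasons.
   Returns 0 if every page was processed, a positive count of pages
   still not migrated after MAX_TRIES attempts, or -1 with errno set. */
static long
move_pages_retry(unsigned long count, void **pages, const int *nodes,
                 int *status, int max_tries)
{
    long ret = -1;

    for (int attempt = 0; attempt < max_tries; attempt++) {
        ret = syscall(SYS_move_pages, 0, count, pages, nodes, status,
                      MPOL_MF_MOVE);
        if (ret <= 0)       /* 0: done; -1: fatal error, errno set */
            break;
        /* ret > 0: some pages were busy; worth retrying */
    }
    return ret;
}
```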
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Document that the signum argument is ignored in newer kernels, but
that user space should pass a valid real-time signal number for
backwards compatibility.
Cowritten-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Verified from inspection of kernel source code.
Reported-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The paragraph on Linux VM is rather generic, and does not belong
in DESCRIPTION. In fact, I'm not sure it even belongs in this
page. At the least, let's move it to NOTES.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
And while we are at it, remove a sentence that makes an obvious
point (that mremap() uses the Linux page table scheme).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Details of glibc 2.4, which is by now fairly old, would be
better at the end of NOTES than at the start.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From a mailing list conversation:
On 5/24/18 9:03 PM, Heinrich Schuchardt wrote:
> Hello Michael,
>
> in the mmap(2) man page MAP_ANON is described as deprecated.
>
> When I look at the NetBSD manpage
> http://netbsd.gw.com/cgi-bin/man-cgi?mmap+2+NetBSD-current
> I found that MAP_ANONYMOUS is not defined.
>
> https://www.dragonflybsd.org/cgi/web-man?command=mmap§ion=2
> indicates MAP_ANONYMOUS is an alias for MAP_ANON and is provided for
> compatibility.
>
> https://man.openbsd.org/mmap.2 also knows MAP_ANONYMOUS as a synonym.
>
> https://www.unix.com/man-page/osx/2/mmap/ does not know MAP_ANONYMOUS.
>
> So shouldn't the man page indicate that MAP_ANON is to be favored to
> write portable code? And correspondingly mark MAP_ANONYMOUS as synonym
> only kept for compatibility.
>
> The Open Group Base Specifications Issue 7, 2018 Edition does not
> reference either of both. So both values are not POSIX but it is not
> correct to describe them as Linux only.
The text saying that MAP_ANON is deprecated is ancient (at least
20 years old). I don't know why that text was added.
Things are not simple though: it looks like there's at least
one historical implementation (HP-UX) that defines MAP_ANONYMOUS
but not MAP_ANON.
I've applied the patch below.
Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As noted by Heinrich Schuchardt:
The list contains hex values of different constants. I just wonder
which architecture (alpha, i386, mips, or sparc at that time) they
apply to. No information is supplied.
Current values depend on the architecture, e.g.
On amd64
0x82307201 VFAT_IOCTL_READDIR_BOTH
0x82307202 VFAT_IOCTL_READDIR_SHORT
0x80047210 FAT_IOCTL_GET_ATTRIBUTES
0x40047211 FAT_IOCTL_SET_ATTRIBUTES
0x80047213 FAT_IOCTL_GET_VOLUME_ID
On mips
0x42187201 VFAT_IOCTL_READDIR_BOTH
0x42187202 VFAT_IOCTL_READDIR_SHORT
0x40047210 FAT_IOCTL_GET_ATTRIBUTES
0x80047211 FAT_IOCTL_SET_ATTRIBUTES
0x40047213 FAT_IOCTL_GET_VOLUME_ID
Hence hex values should be removed.
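A sketch that prints the request codes for whatever architecture it is compiled on, rather than hard-coding hex values, assuming the kernel's <linux/msdos_fs.h> UAPI header is installed. The values differ because both the _IOR()/_IOW() direction encoding and sizeof(struct __fat_dirent) (which contains a long) vary between architectures. The helper print_fat_ioctls() is hypothetical.

```c
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>
#include <linux/msdos_fs.h>

/* Hypothetical helper: print the FAT ioctl request codes as computed
   for the architecture this is compiled on */
static void
print_fat_ioctls(void)
{
    printf("%#lx VFAT_IOCTL_READDIR_BOTH\n",
           (unsigned long) VFAT_IOCTL_READDIR_BOTH);
    printf("%#lx VFAT_IOCTL_READDIR_SHORT\n",
           (unsigned long) VFAT_IOCTL_READDIR_SHORT);
    printf("%#lx FAT_IOCTL_GET_ATTRIBUTES\n",
           (unsigned long) FAT_IOCTL_GET_ATTRIBUTES);
    printf("%#lx FAT_IOCTL_SET_ATTRIBUTES\n",
           (unsigned long) FAT_IOCTL_SET_ATTRIBUTES);
#ifdef FAT_IOCTL_GET_VOLUME_ID      /* Absent from older headers */
    printf("%#lx FAT_IOCTL_GET_VOLUME_ID\n",
           (unsigned long) FAT_IOCTL_GET_VOLUME_ID);
#endif
}
```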
Reported-by:
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linux kernel commit 337684a1746f "fs: return EPERM on immutable
inode" changed (and unified) the return value of utimensat(2)
from -EACCES to -EPERM when the file has the immutable flag set.
Modify the man page to reflect the same.
The entire discussion of returning the correct return value is at:
http://lists.linux.it/pipermail/ltp/2017-January/003424.html
[mtk: The change was in Linux 4.8]
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A year can begin not only with week number 53 of the previous year,
but also with week number 52. Year 2011 is an example of this case, as
can be easily seen with GNU date:
$ date -d "jan 1 2011" "+%c %V %G"
Sat Jan 1 00:00:00 2011 52 2010
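The same result can be checked with strftime(3), which implements %V and %G. A minimal sketch; iso_week() is a hypothetical helper, and the second case shows the more familiar week-53 situation (1 January 2010 falls in week 53 of 2009).

```c
#include <string.h>
#include <time.h>

/* Hypothetical helper: format the ISO 8601 week number (%V) and
   week-based year (%G) of the given date into BUF. mktime() fills in
   tm_wday and tm_yday, which strftime() needs for %V and %G.
   Returns 0 on success, -1 on error. */
static int
iso_week(int year, int month, int day, char *buf, size_t len)
{
    struct tm tm;

    memset(&tm, 0, sizeof(tm));
    tm.tm_year = year - 1900;
    tm.tm_mon = month - 1;
    tm.tm_mday = day;
    tm.tm_hour = 12;        /* Midday: stay clear of DST transitions */
    tm.tm_isdst = -1;

    if (mktime(&tm) == (time_t) -1)
        return -1;
    return strftime(buf, len, "%V %G", &tm) == 0 ? -1 : 0;
}
```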
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The sysctls fs.protected_fifos and fs.protected_regular can cause
open(2) to fail with EACCES (see Documentation/sysctl/fs.rst for
details).
Signed-off-by: Joseph C. Sible <josephcsible@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As far as I can see, /proc/sys/fs/aio-max-nr is a
system-wide limit, not a per-user limit. This seems to be
confirmed by comments in the fs/aio.c sources (Linux 5.6).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The restriction to CAP_SYS_ADMIN was removed from map_files in
2015 [1]. There was a FIXME that indicated this might happen, but
the main text was never updated when this commit landed. While
we're at it, add a note about the ptrace access check that is
still required.
[1] bdb4d100af
Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The info here is mostly ancient, certainly incomplete,
and is not consistently maintained. Best to remove it, I think.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Current code ignores MPOL_MF_STRICT when handling hugetlb
mappings; patch [1] now handles MPOL_MF_STRICT with the same
semantics as for other mappings. So we can remove the note that
'MPOL_MF_STRICT is ignored on huge page mappings', with no changes
to other parts of the man page.
[1] https://lore.kernel.org/linux-mm/1581559627-6206-1-git-send-email-lixinhai.lxh@gmail.com/
Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The display of /proc/PID/ns renders very wide. Make it
narrower by eliminating some nonessential info via some
awk(1) filtering.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Andrei Vagin implemented a change I suggested:
clock IDs are now expressed in symbolic form (e.g.,
"monotonic") instead of numeric form (e.g., 1) when reading
/proc/PID/timerns_offsets, and can be expressed either
symbolically or numerically when writing to that file.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In particular, note the ERANGE restrictions reported by
Thomas Gleixner.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reported-by: Andrew Micallef <andrew.micallef@live.com.au>
Reported-by: Walter Harms <wharms@bfs.de>
Reviewed-by: Andrew Micallef <andrew.micallef@live.com.au>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The cmdline file is a window into memory that is controlled by the
target process, and that memory may be changed arbitrarily, as can
the window via prctl settings. Make sure people understand that
this file is all an illusion.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A new feature, one-shot polling through io_submit, was
introduced in commit bfe4037e722ec. Keep things up to date with
the changes in linux/aio_abi.h.
Signed-off-by: Julia Suvorova <jusual@mail.ru>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
signal.7: Which signal is delivered in response to a CPU exception
is under-documented and does not always make sense. See
<https://bugzilla.kernel.org/show_bug.cgi?id=205831> for an
example where it doesn’t make sense; per the discussion there,
this cannot be changed because of backward compatibility concerns,
so let’s instead document the problem.
sigaction.2: For related reasons, the kernel doesn’t always fill
in all of the fields of the siginfo_t when delivering signals from
CPU exceptions. Document this as well. I imagine this one
_could_ be fixed, but the problem would still be relevant to
anyone using an older kernel.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After a comment from Matthew Bobrowski:
Although, I would just have to point out that it doesn't
necessarily have to be a "script" file, but rather a file of
any type that can have its contents interpreted, which then
results in a form of program execution i.e.
$ /usr/lib64/ld-linux-x86-64.so.2 ./foo
In this case, foo is not a "script" file.
Reported-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Back in 2011, a mail from Andrea Arcangeli noted some details
that I never got round to incorporating into the manual page.
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This subcommand was added a few years ago to support cpuid emulation
on x86 targets, but no changes to the man page appear to have been
made at the time. This commit adds a description for it and the
corresponding getter.
Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The example is misleading. It is not a good idea to unlink an
existing socket because we might try to start the server multiple
times. In this case it is preferable to receive an error.
We could add code that removes the socket when the server process
is killed but that would stretch the example too far.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Update the details on AF_UNSPEC and circumstances in which
socket can be reconnected.
From a mail conversation with Eric Dumazet:
> connect() man page seems obsolete or confusing :
>
> Generally, connection-based protocol sockets may successfully
> connect() only once; connectionless protocol sockets may use
> connect() multiple times to change their association.
> Connectionless sockets may dissolve the association by connecting to
> an address with the sa_family member of sockaddr set to AF_UNSPEC
> (supported on Linux since kernel 2.2).
>
> 1) At least TCP has supported AF_UNSPEC thing forever.
> 2) By definition connectionless sockets do not have an association,
> why would they call connect(AF_UNSPEC) to remove a connection
> which does not exist ...
Calling connect() on a connectionless socket serves two purposes:
a) Assigns a default outgoing address for datagrams (sent using write(2)).
b) Causes datagrams sent from sources other than the peer address to be
discarded.
Both of these things are true in AF_UNIX and the Internet domains.
Using connect(AF_UNSPEC) allows the local datagram socket to clear
this association (without having to connect() to a *different*
peer), so that now it can send datagrams to any peer and receive
datagrams from any peer. (I've just retested all of this.)
>
> Maybe we should rewrite this paragraph to match reality, since
> this causes confusion.
>
>
> Some protocol sockets may successfully connect() only once.
> Some protocol sockets may use connect() multiple times to change
> their association.
> Some protocol sockets may dissolve the association by connecting to
> an address with the sa_family member of sockaddr set to AF_UNSPEC
> (supported on Linux since kernel 2.2).
When I first saw your note, I was afraid that I had written
the offending text. But, I see it has been there since the
manual page was first added in 1992 (other than the piece
"(supported on Linux since kernel 2.2)", which I added in
2007). Perhaps it was true in 1992.
Anyway, I confirm your statement about TCP sockets. The
connect(AF_UNSPEC) thing works; thereafter, the socket may be
connected to another socket.
Interestingly, connect(AF_UNSPEC) does not seem to work for
UNIX domain stream sockets. (My light testing gives an EINVAL
error on connect(AF_UNSPEC) of an already connected UNIX stream
socket. I could not easily spot where this error was being
generated in the kernel though.)
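A minimal sketch of dissolving the association on a UDP socket, assuming IPv4 and a usable loopback interface. Connecting a datagram socket sends no packets, so the peer (port 9 here, an arbitrary choice) need not exist. The helpers connect_udp() and dissolve_association() are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket associated with IP:PORT.
   Returns the socket or -1 on error. */
static int
connect_udp(const char *ip, uint16_t port)
{
    struct sockaddr_in addr;
    int sfd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sfd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        connect(sfd, (struct sockaddr *) &addr, sizeof(addr)) == -1) {
        close(sfd);
        return -1;
    }
    return sfd;
}

/* Hypothetical helper: dissolve the association by connecting to an
   address whose sa_family is AF_UNSPEC */
static int
dissolve_association(int sfd)
{
    struct sockaddr sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_family = AF_UNSPEC;
    return connect(sfd, &sa, sizeof(sa));
}
```

After dissolve_association() succeeds, getpeername(2) on the socket should fail with ENOTCONN, and the socket can again send to and receive from any peer.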
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note the kernel version that added SO_TIMESTAMPNS,
and (from the kernel commit) note that SO_TIMESTAMPNS and
SO_TIMESTAMP are mutually exclusive.
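A small sketch of the mutual exclusion, assuming Linux socket semantics: enabling SO_TIMESTAMP after SO_TIMESTAMPNS should cause getsockopt(2) to report SO_TIMESTAMPNS as off. The helper enable_then_query() is hypothetical.

```c
#include <sys/socket.h>

/* Hypothetical helper: enable option OPT (SO_TIMESTAMP or
   SO_TIMESTAMPNS) on socket SFD, then return the value getsockopt(2)
   reports for option QUERY, or -1 on error */
static int
enable_then_query(int sfd, int opt, int query)
{
    int one = 1, val;
    socklen_t len = sizeof(val);

    if (setsockopt(sfd, SOL_SOCKET, opt, &one, sizeof(one)) == -1)
        return -1;
    if (getsockopt(sfd, SOL_SOCKET, query, &val, &len) == -1)
        return -1;
    return val;
}
```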
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
===========
DESCRIPTION
===========
I added a paragraph for ``SO_TIMESTAMP``, and modified the
paragraph for ``SIOCGSTAMP`` in relation to ``SO_TIMESTAMPNS``.
I based the documentation on the existing ``SO_TIMESTAMP``
documentation, and on my experience using ``SO_TIMESTAMPNS``.
I asked a question on stackoverflow, which helped me understand
``SO_TIMESTAMPNS``:
https://stackoverflow.com/q/60971556/6872717
Testing of the feature being documented
=======================================
I wrote a simple server and client test.
On the client side, I connected a socket specifying
``SOCK_STREAM`` and ``"tcp"``.
Then I enabled nanosecond timestamps:
.. code-block:: c

    int enable = 1;

    if (setsockopt(sd, SOL_SOCKET, SO_TIMESTAMPNS, &enable,
                   sizeof(enable)))
        goto err;
Then I prepared the msg header:
.. code-block:: c

    char buf[BUFSIZ];
    char cbuf[BUFSIZ];
    struct msghdr msg;
    struct iovec iov;

    memset(buf, 0, ARRAY_BYTES(buf));
    iov.iov_len = ARRAY_BYTES(buf) - 1;
    iov.iov_base = buf;
    msg.msg_name = NULL;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cbuf;
    msg.msg_controllen = ARRAY_BYTES(cbuf);
And got some times before and after receiving the msg:
.. code-block:: c

    struct timespec tm_before, tm_recvmsg, tm_after, tm_msg;

    clock_gettime(CLOCK_REALTIME, &tm_before);
    usleep(500000);
    clock_gettime(CLOCK_REALTIME, &tm_recvmsg);
    n = recvmsg(sd, &msg, MSG_WAITALL);
    if (n < 0)
        goto err;
    usleep(1000000);
    clock_gettime(CLOCK_REALTIME, &tm_after);
After that I read the timestamp of the msg:
.. code-block:: c

    struct cmsghdr *cmsg;

    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SO_TIMESTAMPNS) {
            memcpy(&tm_msg, CMSG_DATA(cmsg), sizeof(tm_msg));
            break;
        }
    }
    if (!cmsg)
        goto err;
And finally printed the results:
.. code-block:: c

    double tdiff;

    printf("%s\n", buf);
    tdiff = timespec_diff_ms(&tm_before, &tm_recvmsg);
    printf("tm_r - tm_b = %lf ms\n", tdiff);
    tdiff = timespec_diff_ms(&tm_before, &tm_after);
    printf("tm_a - tm_b = %lf ms\n", tdiff);
    tdiff = timespec_diff_ms(&tm_before, &tm_msg);
    printf("tm_m - tm_b = %lf ms\n", tdiff);
Which printed:
::

    asdasdfasdfasdfadfgdfghfthgujty 6, 0;
    tm_r - tm_b = 500.000000 ms
    tm_a - tm_b = 1500.000000 ms
    tm_m - tm_b = 18.000000 ms
System:
::

    Linux debian 5.4.0-4-amd64 #1 SMP Debian 5.4.19-1 (2020-02-13) x86_64 GNU/Linux
    gcc (Debian 9.3.0-8) 9.3.0
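The snippets above rely on two helpers that the message does not show, ARRAY_BYTES() and timespec_diff_ms(). A minimal sketch of plausible definitions, assuming ARRAY_BYTES() yields the size of an array in bytes and timespec_diff_ms(a, b) returns b - a in milliseconds (consistent with the printed "tm_r - tm_b" labels); both are hypothetical reconstructions, not the author's code.

```c
#include <time.h>

/* Hypothetical: size in bytes of an array (not of a pointer) */
#define ARRAY_BYTES(arr)   (sizeof(arr))

/* Hypothetical: return *B - *A in milliseconds */
static double
timespec_diff_ms(const struct timespec *a, const struct timespec *b)
{
    return (double) (b->tv_sec - a->tv_sec) * 1e3 +
           (double) (b->tv_nsec - a->tv_nsec) / 1e6;
}
```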
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Quoting Matthew Wilcox:
The current text of the lseek manpage is ambiguous about
the behaviour of lseek(SEEK_DATA) for a file which is
entirely a hole (or the end of the file is a hole and the
pos lies within the hole). The draft POSIX language is
specific (ENXIO is returned when whence is SEEK_DATA and
offset lies within the final hole of the file). Could I
trouble you to wordsmith that in?
If you want to look at the draft POSIX text, it's here:
https://www.austingroupbugs.net/view.php?id=415
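A minimal sketch of the behavior, assuming Linux with _GNU_SOURCE for SEEK_DATA. On a filesystem with hole support (e.g., ext4 or tmpfs), SEEK_DATA from offset 0 in a file that is entirely a hole fails with ENXIO; on filesystems without such support, the whole file is treated as data, so offset 0 is returned instead. An offset at or past EOF fails with ENXIO in either case. The helpers next_data() and make_hole_file() are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: find the start of the next data region at or
   after OFF. Returns the offset, -1 if there is no more data (ENXIO),
   or -2 on any other error. */
static off_t
next_data(int fd, off_t off)
{
    off_t pos = lseek(fd, off, SEEK_DATA);
    if (pos != -1)
        return pos;
    return errno == ENXIO ? -1 : -2;
}

/* Hypothetical helper: create an unlinked temporary file of SIZE
   bytes that consists entirely of a hole */
static int
make_hole_file(off_t size)
{
    char tmpl[] = "/tmp/seekdataXXXXXX";
    int fd = mkstemp(tmpl);
    if (fd == -1)
        return -1;
    unlink(tmpl);
    if (ftruncate(fd, size) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```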
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See Linux source as of v5.4:
kernel/time/posix-clock.c
Signed-off-by: Eric Rannaud <e@nanocritical.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From email discussions with Thomas Gleixner:
======
Hello Thomas, et al,
Following on from our discussion of read() on a timerfd [1], I
happened to remember a Debian bug report [2] that points out that
timer_settime() can fail with the error ECANCELED, which is both
surprising and odd (because despite the error, the timer does get
updated).
The relevant kernel code (I think, from your commit [3]) seems to be
the following in timerfd_setup():
	if (texp != 0) {
		if (flags & TFD_TIMER_ABSTIME)
			texp = timens_ktime_to_host(clockid, texp);
		if (isalarm(ctx)) {
			if (flags & TFD_TIMER_ABSTIME)
				alarm_start(&ctx->t.alarm, texp);
			else
				alarm_start_relative(&ctx->t.alarm, texp);
		} else {
			hrtimer_start(&ctx->t.tmr, texp, htmode);
		}

		if (timerfd_canceled(ctx))
			return -ECANCELED;
	}
Using a small test program [4] shows the behavior. The program loops,
repeatedly calling timerfd_settime() (with a delay of a few seconds
before each call). In another terminal window, enter the following
command a few times:
$ sudo date -s "5 seconds" # Add 5 secs to wall-clock time
I see behavior as follows (the /sudo date -s "5 seconds"/ command was
executed before loop iterations 0, 2, and 4):
[[
$ ./timerfd_settime_ECANCELED
0
Current time is 1585729978 secs, 868510078 nsecs
Timer value is now 0 secs, 0 nsecs
timerfd_settime() succeeded
Timer value is now 9 secs, 999991977 nsecs
1
Current time is 1585729982 secs, 716339545 nsecs
Timer value is now 6 secs, 152167990 nsecs
timerfd_settime() succeeded
Timer value is now 9 secs, 999992940 nsecs
2
Current time is 1585729991 secs, 567377831 nsecs
Timer value is now 1 secs, 148959376 nsecs
timerfd_settime: Operation canceled
Timer value is now 9 secs, 999976294 nsecs
3
Current time is 1585729995 secs, 405385503 nsecs
Timer value is now 6 secs, 161989917 nsecs
timerfd_settime() succeeded
Timer value is now 9 secs, 999993317 nsecs
4
Current time is 1585730004 secs, 225036165 nsecs
Timer value is now 1 secs, 180346909 nsecs
timerfd_settime: Operation canceled
Timer value is now 9 secs, 999984345 nsecs
]]
I note the following from the above:
(1) If the wall-clock is changed before the first timerfd_settime()
call, the call succeeds. This is of course expected.
(2) If the wall-clock is changed after a timerfd_settime() call, then
the next timerfd_settime() call fails with ECANCELED.
(3) Even if the timerfd_settime() call fails, the timer is still updated(!).
Some questions:
(a) What is the rationale for timerfd_settime() failing with ECANCELED
in this case? (Currently, the manual page says nothing about this.)
(b) It seems at the least surprising, but more likely a bug, that
timerfd_settime() fails with ECANCELED while at the same time
successfully updating the timer value.
Your thoughts?
Thanks,
Michael
[1] https://lore.kernel.org/lkml/3cbd0919-c82a-cb21-c10f-0498433ba5d1@gmail.com/
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=947091
[3]
commit 99ee5315dac6211e972fa3f23bcc9a0343ff58c4
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Wed Apr 27 14:16:42 2011 +0200
timerfd: Allow timers to be cancelled when clock was set
[4]
/* timerfd_settime_ECANCELED.c */

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <inttypes.h>
#include <time.h>
#include <sys/timerfd.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); } while (0)

int
main(int argc, char *argv[])
{
    struct itimerspec ts, gts;
    struct timespec start;

    int tfd = timerfd_create(CLOCK_REALTIME, 0);
    if (tfd == -1)
        errExit("timerfd_create");

    ts.it_interval.tv_sec = 0;
    ts.it_interval.tv_nsec = 10;

    int flags = TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET;

    for (long j = 0; ; j++) {

        /* Inject a delay into each loop, by calling getppid()
           many times */

        for (int k = 0; k < 10000000; k++)
            getppid();

        if (j % 1 == 0)
            printf("%ld\n", j);

        /* Display the current wall-clock time */

        if (clock_gettime(CLOCK_REALTIME, &start) == -1)
            errExit("clock_gettime");
        printf("Current time is %ld secs, %ld nsecs\n",
                (long) start.tv_sec, (long) start.tv_nsec);

        /* Before resetting the timer, retrieve its current value
           so that after the timerfd_settime() call, we can see
           whether the value has changed */

        if (timerfd_gettime(tfd, &gts) == -1)
            perror("timerfd_gettime");
        printf("Timer value is now %ld secs, %ld nsecs\n",
                (long) gts.it_value.tv_sec, (long) gts.it_value.tv_nsec);

        /* Reset the timer to now + 10 secs */

        ts.it_value.tv_sec = start.tv_sec + 10;
        ts.it_value.tv_nsec = start.tv_nsec;
        if (timerfd_settime(tfd, flags, &ts, NULL) == -1)
            perror("timerfd_settime");
        else
            printf("timerfd_settime() succeeded\n");

        /* Display the timer value once again */

        if (timerfd_gettime(tfd, &gts) == -1)
            perror("timerfd_gettime");
        printf("Timer value is now %ld secs, %ld nsecs\n",
                (long) gts.it_value.tv_sec, (long) gts.it_value.tv_nsec);

        printf("\n");
    }
}
=======
Subject: Re: timer_settime() and ECANCELED
Date: Wed, 01 Apr 2020 19:42:42 +0200
From: Thomas Gleixner <tglx@linutronix.de>
Michael,
"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
> Following on from our discussion of read() on a timerfd [1], I
> happened to remember a Debian bug report [2] that points out that
> timer_settime() can fail with the error ECANCELED, which is both
> surprising and odd (because despite the error, the timer does get
> updated).
...
> (1) If the wall-clock is changed before the first timerfd_settime()
> call, the call succeeds. This is of course expected.
> (2) If the wall-clock is changed after a timerfd_settime() call, then
> the next timerfd_settime() call fails with ECANCELED.
> (3) Even if the timerfd_settime() call fails, the timer is still updated(!).
>
> Some questions:
> (a) What is the rationale for timerfd_settime() failing with ECANCELED
> in this case? (Currently, the manual page says nothing about this.)
> (b) It seems at the least surprising, but more likely a bug, that
> timerfd_settime() fails with ECANCELED while at the same time
> successfully updating the timer value.
Really good question and TBH I can't remember why this is implemented in
the way it is, but I have a faint memory that at least (a) is
intentional.
After staring at the code for a while I came up with the following
answers:
(a): If the clock was set event ("date -s ...") which triggered the
cancel was not yet consumed by user space via read(), then that
information would get lost because arming the timer to the new
value has to reset the state.
(b): Arming the timer in that case is indeed very questionable, but it
could be argued that because the clock was set event happened with
the old expiry value that the new expiry value is not affected.
I'd be happy to change that and not arm the timer in the case of a
pending cancel, but I fear that some user space already depends on
that behaviour.
Thanks,
tglx
======
Subject: Re: timer_settime() and ECANCELED
Date: Thu, 02 Apr 2020 10:49:18 +0200
From: Thomas Gleixner <tglx@linutronix.de>
To: Michael Kerrisk (man-pages) <mtk.manpages@gmail.com>
"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
> On 4/1/20 7:42 PM, Thomas Gleixner wrote:
>> (b): Arming the timer in that case is indeed very questionable, but it
>> could be argued that because the clock was set event happened with
>> the old expiry value that the new expiry value is not affected.
>>
>> I'd be happy to change that and not arm the timer in the case of a
>> pending cancel, but I fear that some user space already depends on
>> that behaviour.
>
> Yes, that's the risk, of course. So, shall we just document all
> this in the manual page?
I think so.
Thanks,
tglx
======
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This patch documents the PR_SET_IO_FLUSHER and PR_GET_IO_FLUSHER
prctl commands added to the linux kernel for 5.6 in commit:
commit 8d19f1c8e1937baf74e1962aae9f90fa3aeab463
Author: Mike Christie <mchristi@redhat.com>
Date: Mon Nov 11 18:19:00 2019 -0600
prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim
Reviewed-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Mostly verified by testing and reading the code.
There is unfortunately quite a bit of inconsistency across APIs:
                          clock_gettime  clock_settime  clock_nanosleep  timer_create  timerfd_create
CLOCK_BOOTTIME            y              n (EINVAL)     y                y             y
CLOCK_BOOTTIME_ALARM      y              n (EINVAL)     y [1]            y [1]         y [1]
CLOCK_MONOTONIC           y              n (EINVAL)     y                y             y
CLOCK_MONOTONIC_COARSE    y              n (EINVAL)     n (ENOTSUP)      n (ENOTSUP)   n (EINVAL)
CLOCK_MONOTONIC_RAW       y              n (EINVAL)     n (ENOTSUP)      n (ENOTSUP)   n (EINVAL)
CLOCK_REALTIME            y              y              y                y             y
CLOCK_REALTIME_ALARM      y              n (EINVAL)     y [1]            y [1]         y [1]
CLOCK_REALTIME_COARSE     y              n (EINVAL)     n (ENOTSUP)      n (ENOTSUP)   n (EINVAL)
CLOCK_TAI                 y              n (EINVAL)     y                y             n (EINVAL)
CLOCK_PROCESS_CPUTIME_ID  y              n (EINVAL)     y                y             n (EINVAL)
CLOCK_THREAD_CPUTIME_ID   y              n (EINVAL)     n (EINVAL [2])   y             n (EINVAL)
pthread_getcpuclockid()   y              n (EINVAL)     y                y             n (EINVAL)
[1] The caller must have CAP_WAKE_ALARM, or the error EPERM results.
[2] This error is generated in the glibc wrapper.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Presumably (and from a quick glance at the source code)
since Linux 3.10, when CLOCK_TAI was introduced.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Presumably (and from a quick glance at the source code)
since Linux 2.6.39, when CLOCK_BOOTTIME was introduced.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From email discussions with Aleksa Sarai:
> .\" FIXME I find the "previously-functional systems" in the previous
> .\" sentence a little odd (since openat2() is a new syscall), so I would
> .\" like to clarify a little...
> .\" Are you referring to the scenario where someone might take an
> .\" existing application that uses openat() and replaces the uses
> .\" of openat() with openat2()? In which case, is it correct to
> .\" understand that you mean that one should not just indiscriminately
> .\" add the RESOLVE_NO_XDEV flag to all of the openat2() calls?
> .\" If I'm not on the right track, could you point me in the right
> .\" direction please.
This is mostly meant as a warning, to hopefully avoid applications
breaking because the developer didn't realise that system paths may contain
symlinks or bind-mounts. For an application which has switched to
openat2() and then uses RESOLVE_NO_SYMLINKS for a non-security reason,
it's possible that on some distributions (or future versions of a
distribution) the application will stop working because a system
path suddenly contains a symlink or is a bind-mount.
This was a concern which was brought up on LWN some time ago. If you can
think of a phrasing that makes this more clear, I'd appreciate it.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Devi R K reported this issue, and went on to note:
> We have written a program using real time clock and it has been raised to
> the community.
>
> https://lore.kernel.org/lkml/alpine.DEB.2.21.1908191943280.1796@nanos.tec.linutronix.de/T/
[...]
Thanks for pointing me at that thread. In particular, the test
program at
https://lore.kernel.org/lkml/alpine.DEB.2.21.1908191943280.1796@nanos.tec.linutronix.de/T/#m489d81abdfbb2699743e18c37657311f8d52a4cd
[...]
I think this patch does not really capture the details
properly. The immediately preceding paragraph says:
If the associated clock is either CLOCK_REALTIME or
CLOCK_REALTIME_ALARM, the timer is absolute
(TFD_TIMER_ABSTIME), and the flag TFD_TIMER_CANCEL_ON_SET
was specified when calling timerfd_settime(), then read(2)
fails with the error ECANCELED if the real-time clock
undergoes a discontinuous change. (This allows the reading
application to discover such discontinuous changes to the
clock.)
Following on from that, I think we should have a paragraph that says
something like:
If the associated clock is either CLOCK_REALTIME or
CLOCK_REALTIME_ALARM, the timer is absolute
(TFD_TIMER_ABSTIME), and the flag TFD_TIMER_CANCEL_ON_SET
was not specified when calling timerfd_settime(), then a
discontinuous negative change to the clock
(e.g., clock_settime(2)) may cause read(2) to unblock, but
return a value of 0 (i.e., no bytes read), if the clock
change occurs after the time expired, but before the
read(2) on the timerfd file descriptor.
This seems consistent with Thomas's observations in
https://lore.kernel.org/lkml/alpine.DEB.2.21.1908191943280.1796@nanos.tec.linutronix.de/T/#m49b78122b573a2749a05b720dc9fa036546db490
==
Thomas Gleixner replied:
Yes, that's correct. Accurate as always!
This is pretty much in line with clock_nanosleep(CLOCK_REALTIME,
TIMER_ABSTIME) which has a similar problem vs. observability in user
space.
clock_nanosleep(2) mutters:
"POSIX.1 specifies that after changing the value of the CLOCK_REALTIME
clock via clock_settime(2), the new clock value shall be used to
determine the time at which a thread blocked on an absolute
clock_nanosleep() will wake up; if the new clock value falls past the
end of the sleep interval, then the clock_nanosleep() call will return
immediately."
which can be interpreted as a guarantee that clock_nanosleep() never
returns prematurely, i.e. the assert() in the below code would indicate
a kernel failure:
ret = clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &expiry, NULL);
if (!ret) {
    clock_gettime(CLOCK_REALTIME, &now);
    assert(now.tv_sec > expiry.tv_sec ||
           (now.tv_sec == expiry.tv_sec && now.tv_nsec >= expiry.tv_nsec));
}
But that assert can trigger when CLOCK_REALTIME was modified after the
timer fired and the kernel decided to wake up the task and let it return
to user space.
clock_nanosleep(..., &expiry)
  arm_timer(expires);
  schedule();
                      -> timer interrupt
                         now = ktime_get_real();
                         if (expires <= now)
  -------------------------------- After this point
                            wakeup();      clock_settime(2) or
                                           adjtimex(2) which
                                           makes CLOCK_REALTIME
                                           jump back far enough will
                                           cause the above assert
                                           to trigger.
  ...
  return from syscall (retval == 0)
There is no guarantee against clock_settime() coming after the
wakeup. Even if we put another check into the return to user path then
we won't catch a clock_settime() which comes right after that and before
user space invokes clock_gettime().
POSIX spec Issue 7 (2018 edition) says:
The suspension for the absolute clock_nanosleep() function (that is,
with the TIMER_ABSTIME flag set) shall be in effect at least until the
value of the corresponding clock reaches the absolute time specified by
rqtp.
And that's what the kernel implements for clock_nanosleep() and timerfd
behaves exactly the same way.
The wakeup of the waiter, i.e. task blocked in clock_nanosleep(2),
read(2), poll(2), is not happening _before_ the absolute time specified
is reached.
If clock_settime() happens right before the expiry check, then it does
the right thing, but any modification to the clock after the wakeup
cannot be mitigated. At least not in a way which would make the assert()
in the example code above a reliable indicator for a kernel fail.
That's the reason why I rejected the attempt to mitigate that particular
0 tick issue in timerfd as it would just scratch a particular itch but
still not provide any guarantee. So having the '0' return documented is
the right way to go.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: devi R.K <devi.feb27@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
For me, source lines such as:
.BR perf_setattr "(2), " perf_event_open "(2), and " clone3 (2).
is harder to read than:
.BR perf_setattr (2),
.BR perf_event_open (2),
and
.BR clone3 (2).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There are currently three of these forward references (two in
DESCRIPTION, one in ERRORS). This is a little redundant.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Rather than trying to merge the new syscall documentation into
open.2 (which would probably result in the man-page being
incomprehensible), instead the new syscall gets its own dedicated
page with links between open(2) and openat2(2) to avoid
duplicating information such as the list of O_* flags or common
errors.
In addition to describing all of the key flags, information about
the extensibility design is provided so that users can better
understand why they need to pass sizeof(struct open_how) and how
their programs will work across kernels. After some discussions
with David Laight, I also included explicit instructions to zero
the structure to avoid issues when recompiling with new headers.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The fact that CLOCK_PROCESS_CPUTIME_ID and
CLOCK_THREAD_CPUTIME_ID are not settable isn't a bug,
since POSIX does allow the possibility that these clocks
are not settable.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The sixth argument of futex is uaddr2, instead of uaddr.
Signed-off-by: André Almeida <andrealmeid@collabora.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linux 5.6 added the new well-known VMADDR_CID_LOCAL for
local communication.
This patch explains how to use it and removes the legacy
VMADDR_CID_RESERVED, which is no longer available.
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add example programs demonstrating usage of shmget(2), shmat(2),
semget(2), semctl(2), and semop(2).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Document the verity attribute for statx(), which was added in
Linux 5.5.
For more context, see the fs-verity documentation:
https://www.kernel.org/doc/html/latest/filesystems/fsverity.html
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Fix clone3() syscall description for CLONE_PARENT_SETTID: kernel uses
cl_args.parent_tid instead of the specified cl_args.child_tid.
Signed-off-by: Krzysztof Małysa <varqox@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add a '.RE' macro to terminate the last .RS block.
There is no change in the output.
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
* man3/strftime.3 (%C): Describe the meaning of %EC conversion
specification.
(%E): Mention the concept of "era" in description.
(%O): Mention that alternative format is related to numeric
representation.
(%y): Describe the meaning of %Ey conversion specification.
(%Y): Describe the meaning of %EY conversion specification.
(.SH DESCRIPTION): Mention that the behaviour of %E modifier is governed
by ERA locale element and provide ja_JP locale as an example.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It wasn't previously clear where this kind of information could be
obtained from.
* man3/strftime.3 (%a, %A, %b, %B, %c, %p, %r, %x, %X): Add information
about the locale elements that can be used to retrieve the relevant
information using nl_langinfo() library call.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Consecutive calls to clock_gettime(CLOCK_MONOTONIC) are guaranteed
to return MONOTONIC values, which means that they either return
the *SAME* time value like the last call, or a later (higher) time
value.
Due to high resolution counters, like TSC on x86, most people see
that the values returned increase, but on other less common
platforms it's less likely that consecutive calls return newer
values, and instead users may unexpectedly get back the SAME time
value.
I think it makes sense to document that people should not expect
to see "always-growing" time values. For example in Debian I've
seen in quite some source packages where return values of
consecutive calls are compared against each other and then the
package build fails if they are equal (e.g. ruby-hitimes, ...).
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
* man2/syscalls.2 (.SH DESCRIPTION) <\fBgetdtablesize\fP(2)>: Remove "since
Linux 2.0" part for the osf_getdtablesize note, as syscall is generally
available since Linux 2.0; add line break after the word "as".
(.SH DESCRIPTION) <\fBpwrite\fP(2)>: Add line breaks.
(.SH DESCRIPTION) <\fBvm86old\fP(2)>: Add a line break after "in".
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In many cases, these don't improve readability, and (when stacked)
they sometimes have the side effect of forcing text
to be justified within a narrow column range.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
And add self to copyright, since, by now, the majority of the
text in the page has been (re)written by me.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Improve structure and readability, at the same time incorporating
text and details that were formerly in select_tut(2). Also
move a few details in other parts of the page into DESCRIPTION.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The POSIX situation has been the norm for a long time now,
and including ancient details overcomplicates the page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The discussion about pre-POSIX types for 'timeval' and 'timespec'
is rather old, and these days serves mainly to complicate the
page. Remove it.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The current text layout is a little hard to parse, with details of
pselect() spread in the main description. Move some of that text
to a headed subsection, and add a one-sentence introduction
describing the purpose of pselect().
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
One might be tempted to think that realloc() always requests a new
allocation before moving the contents over (at least in the case
where the new size is bigger than the original). This is not the
case; for example, on my system the following program:
#include <malloc.h>     /* malloc_usable_size() */
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    void *x = malloc(15);
    void *y = malloc(32);

    printf("x = %p\n", x);
    printf("y = %p\n", y);
    printf("usable_size(x) = %zu\n", malloc_usable_size(x));

    void *z = realloc(x, 24);
    printf("z = %p\n", z);
    return 0;
}
prints:
x = 0x1b3a010
y = 0x1b3a030
usable_size(x) = 24
z = 0x1b3a010
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Admittedly, the POSIX specification for exit() also uses octal.
However, 0xFF immediately indicates the lowest 8 bits to me
whereas I had to think a bit about the octal mask.
Cowritten-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This was already shown in an earlier version of the page,
but Adam Borowski's patch replaced it with an alternative.
Probably, it is better to show both possibilities.
Reported-by: "Joseph C. Sible" <josephcsible@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the example snippet, we already have the fd, thus there's no
need to refer to the file by name. Also, /proc might not be
mounted, or might not be accessible.
Noticed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Supported since fadb4244085cd04fd9c8b3a4b3bc161f506431f3 (4.9),
100..107 are supposed to be bright but this does not yet work
(unmerged patches to do so exist).
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Supported since cec5b2a97a11ade56a701e83044d0a2a984c67b4 (3.16).
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since 65d9982d7e523a1a8e7c9af012da0d166f72fc56 (4.17), it follows
xterm rather than common sense and consistency, being the only
command 1..9 where N+20 doesn't undo what N did. As libvte
0.51.90 got changed the same way, this behaviour will probably
stay.
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
From an email by Rich Felker:
It came to my attention while reviewing possible breakage with
move to 64-bit time_t that some applications are dereferencing
data in socket control messages (particularly SCM_TIMESTAMP*)
in-place as the message type, rather than memcpy'ing it to
appropriate storage. This necessarily does not work and is not
supportable if the message contains data with greater alignment
requirement than the header. In particular, on 32-bit archs,
cmsghdr has size 12 and alignment 4, but struct timeval and
timespec may have alignment requirement 8.
I found at least ptpd, socat, and ssmping doing this via Debian
Code Search:
https://sources.debian.org/src/ptpd/2.3.1-debian1-4/src/dep/net.c/?hl=1578#L1578
https://sources.debian.org/src/socat/1.7.3.3-2/xio-socket.c/?hl=1839#L1839
https://sources.debian.org/src/ssmping/0.9.1-3/ssmpngcl.c/?hl=307#L307
and I suspect there are a good deal more out there. On most archs
they won't break, or will visibly break with SIGBUS, but in theory
it's possible that they silently read wrong data and this might
happen on some older and more tiny-embedded-oriented archs.
I think it's clear to someone who understands alignment and who's
thought about it that applications just can't do this, but it
doesn't seem to be documented, and an example in cmsg(3) even
shows access to int payload via *(int *)CMSG_DATA(cmsg) (of course
int is safe because its alignment is <= header alignment, but this
is not mentioned).
Could we add text, and perhaps change the example, to indicate
that in general memcpy needs to be used to copy the payload
to/from a suitable object?
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
PTRACE_EVENT_STOP does not always report SIGTRAP; it can report the
signal which stopped us.
While at it, fix an obvious copy/paste error in
PTRACE_GET_SYSCALL_INFO description.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Somewhat surprisingly, perf_event_open() can fail with EINTR when
trying to enable perf reporting for a uprobe that's already been
configured for use with ftrace. Mention this error in the man
page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This patch is a minor wording fix in getcwd.3 that changes "In the case getcwd()" to "In the case of getcwd()". This patch should apply cleanly to the master branch of the git repository.
Regards,
Mike Salvatore
From 3b68ad225dbaada2b1b55153dc57807b04531cd6 Mon Sep 17 00:00:00 2001
From: Mike Salvatore <mike.salvatore@canonical.com>
Date: Thu, 16 Jan 2020 16:08:08 -0500
Subject: [PATCH] getcwd.3: wfix
Signed-off-by: Mike Salvatore <mike.salvatore@canonical.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The man page contains a trivial bug that's discussed here:
https://stackoverflow.com/q/59628958
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Fixes: d8d701003 ("malloc.3: Since glibc 2.29, realloc() is exposed by
defining _DEFAULT_SOURCE")
Signed-off-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The glibc implementation of getpt has actually never been setting
O_NOCTTY when opening /dev/ptmx or BSD ptys.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linux kernel commit e78bbfa82624 ("mm: stop returning -ENOENT from
sys_move_pages() if nothing got migrated") had the effect of
*never* returning -ENOENT, in any situation. So we need to update
the man page to reflect that ENOENT is not a possible return
value.
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
PVS-Studio reports that in
char buf[8192];
/* ... */
nh = (struct nlmsghdr *) buf,
the pointer 'buf' is cast to a more strictly aligned pointer type.
This is undefined behaviour. One possible solution to make sure
that buf is correctly aligned is to declare buf as an array of
struct nlmsghdr. Other solutions include allocating the array on
the heap, using a union, or using stdalign features. With this patch,
the buffer still contains 8192 bytes.
This was raised on Stack Overflow:
https://stackoverflow.com/questions/57745580/netlink-receive-buffer-alignment
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Historically (before Linux 2.6.23), base_addr was unsigned long
for 32-bit code and unsigned int for 64-bit code. In other words,
it was always a 32-bit value. When the ldt.h header files were
unified, the type became unsigned int on all systems. Update
modify_ldt.2 and set_thread_area.2 accordingly.
Indeed, on x86, the GDT and LDT specify 32-bit bases for code and
data segments, and this has nothing to do with the kernel.
Reported-by: "Metzger, Markus T" <markus.t.metzger@intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The programmer should not need to care about the numeric values,
and including them only adds verbosity.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since kernel commit 3dd4d40b4208 ("xfs: Sanity check flags
of Q_XQUOTARM call"), the flags are checked: if they specify
anything other than user, group, or project quota types, the
call fails with EINVAL.
Signed-off-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The definition of the tpacket_auxdata struct in the manpage is not
the same as the definition found in
/include/uapi/linux/if_packet.h.
In particular, instead of a tp_padding field, there is a
tp_vlan_tpid field. An example of a project using this field is
libpcap[1].
[1]: https://github.com/the-tcpdump-group/libpcap/blob/master/pcap-linux.c#L349
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Added two missing parentheses
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This surely meant to say clone3() and not clone(3).
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The CLONE_PARENT flag cannot be used by init processes. Let's mention
this in the manpages to prevent surprises.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The introductory paragraphs note that "the calling process" is
normally synonymous with "the parent process", except in the
case of CLONE_PARENT. The same is also true of CLONE_THREAD.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The structure 'struct sockaddr_vm' has additional element
'unsigned char svm_zero[]' since version v3.9-rc1
(include/uapi/linux/vm_sockets.h). Linux kernel checks that this
element is zeroed (net/vmw_vsock/vsock_addr.c). Reflect this on
the vsock man page.
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=205583
Signed-off-by: Mikhail Golubev <Mikhail.Golubev@opensynergy.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Advertise to userspace that they should use proper pid_t types
for arguments returning a pid.
The kernel-internal struct kernel_clone_args currently uses int
as type and since POSIX mandates that pid_t is a signed integer
type and glibc and friends use int this is not an issue. After
the merge window for v5.5 closes we can switch struct
kernel_clone_args over to using pid_t as well without any danger
in regressing current userspace.
Also note that the new set tid feature which will be merged for
v5.5 uses pid_t types as well.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
If mmap() fails it will return MAP_FAILED which according to the manpage
is (void *)-1 not NULL.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Fix two spelling mistakes in manpage describing the clone{2,3}()
syscalls/syscall wrappers.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reword a little to allow for the fact that there are now
*two* reasons to consider using this flag.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Christian Brauner suggested mmap(MAP_STACK), rather than
malloc(), as the canonical way of allocating a stack for the
child of clone(), and Jann Horn noted some reasons why:
Not on Linux, but on OpenBSD, they do use MAP_STACK now
AFAIK; this was announced here:
<http://openbsd-archive.7691.n7.nabble.com/stack-register-checking-td338238.html>.
Basically they periodically check whether the userspace
stack pointer points into a MAP_STACK region, and if not,
they kill the process. So even if it's a no-op on Linux, it
might make sense to advise people to use the flag to improve
portability? I'm not sure if that's something that belongs
in Linux manpages.
Another reason against malloc() is that when setting up
thread stacks in proper, reliable software, you'll probably
want to place a guard page (in other words, a 4K PROT_NONE
VMA) at the bottom of the stack to reliably catch stack
overflows; and you probably don't want to do that with
malloc, in particular with non-page-aligned allocations.
And the OpenBSD 6.5 manual page says:
MAP_STACK
Indicate that the mapping is used as a stack. This
flag must be used in combination with MAP_ANON and
MAP_PRIVATE.
And I then noticed that MAP_STACK seems already to be on
FreeBSD for a long time:
MAP_STACK
Map the area as a stack. MAP_ANON is implied.
Offset should be 0, fd must be -1, and prot should
include at least PROT_READ and PROT_WRITE. This
option creates a memory region that grows to at
most len bytes in size, starting from the stack
top and growing down. The stack top is the
starting address returned by the call, plus len bytes.
The bottom of the stack at maximum growth is the
starting address returned by the call.
The entire area is reserved from the point of view
of other mmap() calls, even if not faulted in yet.
Reported-by: Jann Horn <jannh@google.com>
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The obsolete CLONE_DETACHED flag has never been properly
documented, but now the discussion of CLONE_PIDFD also requires
mention of CLONE_DETACHED. So, properly document CLONE_DETACHED,
and mention its interactions with CLONE_PIDFD.
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Sometimes the descriptions of these flags mentioned the
corresponding section 7 namespace manual page and then the
required capabilities, and sometimes the order was
the reverse. Make it consistent.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove details of UTS, IPC, and network namespaces that are
already covered in the corresponding namespaces pages in
section 7. This change is for consistency, since corresponding
details were not provided for other namespace types in clone(2)
and these details do not appear in unshare(2).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After feedback from Christian Brauner [1], I've adjusted a few pieces
of the clone3() text, and also adjusted some of the older text in
the page.
[1] https://lore.kernel.org/linux-man/20191107151941.dw4gtul5lrtax4se@wittgenstein/
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Change the text in the introductory paragraph (which was written
20 years ago) to reflect the fact that clone*() does more things
nowadays.
Cowritten-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Adjust references to namespaces(7) to be references to pages
describing specific namespace types.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
For Q_QUOTAON, on older kernels we could use quotacheck -ug to
generate quota files. But on current kernels, the quota files can
also be hidden in system inodes, which is indicated by the "quota"
or "project" filesystem feature.
For user or group quota, we can do as below (e.g., ext4):
mkfs.ext4 -F -O quota /dev/sda5
mount /dev/sda5 /mnt
quotactl(QCMD(Q_QUOTAON, USRQUOTA), "/dev/sda5", QFMT_VFS_V0, NULL);
For project quota, we can do as below (e.g., ext4):
mkfs.ext4 -F -O quota,project /dev/sda5
mount /dev/sda5 /mnt
quotactl(QCMD(Q_QUOTAON, PRJQUOTA), "/dev/sda5", QFMT_VFS_V0, NULL);
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the given example, the second recvmsg(2) call should receive four bytes,
as the third sendmsg(2) call only sends four.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use "flags mask" as a generic term to refer to the clone()
'flags' argument and the clone3() 'cl_args.flags' field.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Sometime soon, we'll have to add documentation of clone3() to this
page. As a preparatory step, make the names of the clone()
arguments the same as the fields in the clone3() 'args' struct:
ctid ==> child_tid
ptid ==> parent_tid
newtls ==> tls
child_stack ==> stack
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As noted in kernel commit 821cc7b0b205c0df64cce59aacc330af251fa8f7,
threads create an ambiguity: what if the calling process's PGID
is changed by another thread while waitpid(0, ...) is blocked?
So, clarify that waitpid(0, ...) means wait for children whose
PGID matches the caller's PGID at the time of the call to
waitpid().
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
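A minimal sketch of the semantics described above, assuming a single child and no other thread changing the caller's PGID during the call: the child inherits the caller's process group, so waitpid(0, ...) reaps it. The helper name fork_and_reap_same_pgrp() is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with 'code', then reap it
   with waitpid(0, ...), which waits for any child whose PGID matches the
   caller's PGID at the time of the call. Returns the child's exit status,
   or -1 on error. */
int
fork_and_reap_same_pgrp(int code)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);    /* The child inherits the caller's process group */

    if (waitpid(0, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```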
Since Linux 5.4, idtype == P_PGID && id == 0 can be used to wait
on children in the same process group as the caller.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since kernel commit a9a08845e9acbd224e4ee466f5c1275ed50054e8, the
equivalence between select() and poll()/epoll is defined in terms
of the EPOLL* constants, rather than the POLL* constants.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
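For reference, a sketch of the EPOLL*-based definitions as given in the kernel's fs/select.c and the select(2) manual page, together with hypothetical helpers that classify a ready-event mask into select()'s read, write, and exceptional sets.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Event masks that make an FD count as ready in each select() set */
#define POLLIN_SET  (EPOLLRDNORM | EPOLLRDBAND | EPOLLIN | EPOLLHUP | EPOLLERR)
#define POLLOUT_SET (EPOLLWRBAND | EPOLLWRNORM | EPOLLOUT | EPOLLERR)
#define POLLEX_SET  (EPOLLPRI)

/* Would an FD with ready events 'ev' be reported in readfds? */
static inline bool in_readfds(uint32_t ev)   { return (ev & POLLIN_SET) != 0; }

/* ... in writefds? */
static inline bool in_writefds(uint32_t ev)  { return (ev & POLLOUT_SET) != 0; }

/* ... in exceptfds? */
static inline bool in_exceptfds(uint32_t ev) { return (ev & POLLEX_SET) != 0; }
```

Note that EPOLLERR appears in both the read and write sets, while EPOLLHUP counts only as "readable".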
Christian noted that SA_NOCLDWAIT also matters in this scenario.
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After review comments from Christian and Daniel.
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Reported-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Thus, pidfd_open() is the preferred way of obtaining a PID
file descriptor.
Notes from a conversation with Christian Brauner:
[[
> A further question... We now have three ways of getting a
> process file descriptor [*]:
>
> open() of /proc/PID
> pidfd_open()
> clone()/clone3() with CLONE_PIDFD
>
> I thought the FD was supposed to be equivalent in all three cases.
> However, if I try (on kernel 5.3) poll() an FD returned by opening
> /proc/PID, poll() tells me POLLNVAL for the FD. Is that difference
> intentional? (I am guessing it is not.)
It's intentional.
The short answer is that /proc/<pid> is a convenience for sending
signals.
The longer answer is that this stems from a heavy debate about what a
process file descriptor was supposed to be and some people pushing for
at least being able to use /proc/<pid> dirfds while ignoring security
problems as soon as you're talking about returning those fds from
clone(); not to mention the additional problems discovered when trying
to implement this.
A "real" pidfd is one from CLONE_PIDFD or pidfd_open() and all features
such as exit notification, read, and other future extensions will only
be implemented on top of them.
As much as we'd have liked to get rid of two different file descriptor
types it doesn't hurt us much and is not that much different from what
we will e.g. see with fsinfo() in the new mount api which needs to work
on regular fds gotten via open()/openat() and mountfds gotten from
fsopen() and fspick(). The mountfds will also allow for advanced
operations that the other ones will not. There's even an argument to be
made that fds you will get from open()/openat() and openat2() are
different types since they have very different behavior; openat2()
returning fds that are not arbitrarily upgradable, etc.
]]
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
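A minimal sketch of using a "real" pidfd for exit notification, assuming Linux 5.3 or later for pidfd_open(). The wrapper name my_pidfd_open() is hypothetical (glibc did not provide a pidfd_open() wrapper at the time); if the headers lack SYS_pidfd_open or the call fails, the sketch simply falls back to a blocking waitpid().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical wrapper: invoke pidfd_open() via syscall(2). If the
   headers do not define SYS_pidfd_open, fail with ENOSYS. */
static int
my_pidfd_open(pid_t pid, unsigned int flags)
{
#ifdef SYS_pidfd_open
    return (int) syscall(SYS_pidfd_open, pid, flags);
#else
    (void) pid;
    (void) flags;
    errno = ENOSYS;
    return -1;
#endif
}

/* Fork a child that exits with 'code'. If a pidfd can be obtained, use
   poll() to wait until it becomes readable (the child has terminated),
   then reap the child with waitpid(). Returns the exit status, or -1. */
int
wait_child_via_pidfd(int code)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);

    int pidfd = my_pidfd_open(pid, 0);
    if (pidfd != -1) {
        struct pollfd pfd = { .fd = pidfd, .events = POLLIN };

        /* Blocks until the child terminates; if poll() fails, we just
           fall through to the blocking waitpid() below */
        (void) poll(&pfd, 1, -1);
        close(pidfd);
    }

    /* Reap the child (also the fallback without pidfd support) */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```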
Notes from a conversation on linux-man@ with Christian Brauner:
[[
> [*] By the way, going forward, can we call these things
> "process FDs", rather than "PID FDs"? The API names are what
> they are, and that's okay, but just as we have socket
> FDs that refer to sockets, directory FDs that refer to
> directories, and timer FDs that refer to timers, and so on,
> these are FDs that refer to *processes*, not "process IDs".
> It's a little thing, but I think the naming is better, and
> it's what I propose to use in the manual pages.
The naming was another debate and we ended with this compromise.
I would just clarify that a pidfd is a process file descriptor. I
wouldn't make too much of a deal of hiding the shortcut "pidfd".
People are already using it out there in the wild and it's never
proven a good idea to go against accepted practice.
]]
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the kernel source (kernel/fork.c::copy_process()), there is:
pidfile = anon_inode_getfile("[pidfd]", &pidfd_fops, pid,
O_RDWR | O_CLOEXEC);
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add an entry for CLONE_PIDFD. This flag is available starting
with kernel 5.2. If specified, a process file descriptor
("pidfd") referring to the child process will be returned in
the ptid argument.
Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After my rewriting, almost nothing of the original page remains,
so update the copyright. As the author, I'm relicensing to the
"verbatim" license most commonly used in man pages.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The text stating that "pivot_root() may or may not change the
current root and the current working directory of any processes
or threads which use the old root directory" was written 19 years
ago, before the system call itself was even finalized in the
kernel. The implementation has never changed, and it won't
change in the future, since that would cause user-space breakage.
The existence of that text in DESCRIPTION, followed by qualifying
text stating what the implementation actually does (and has always
done) makes for confusing reading. Therefore, relegate this text
to a historical note in NOTES (so that readers with long memories
can see why the manual page was changed) and rework the text in
DESCRIPTION accordingly.
Reported-by: Philipp Wendler <ml@philippwendler.de>
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Reported-by: Reid Priedhorsky <reidpr@lanl.gov>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Quoting Eric:
If we are going to be pedantic "filesystem" is really the
wrong concept here. The section about bind mount clarifies
it, but I wonder if there is a better term.
I think I would say: "new_root and put_old must not be on
the same mount as the current root."
I think using "mount" instead of "filesystem" keeps the
concepts less confusing.
As I am reading through this email and seeing text that is
trying to be precise and clear then hitting the term
"filesystem" is a bit jarring. pivot_root doesn't care a
thing for file systems. pivot_root only cares about mounts.
And by a "mount" I mean the thing that you get when you
create a bind mount or you call mount normally.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Philipp Wendler noted that the text on the restrictions for
'new_root' was slightly contradictory, and things could be
clarified and simplified by describing the restrictions on
'new_root' in one place.
Reported-by: Philipp Wendler <ml@philippwendler.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove the text that suggests that pivot_root() changes the root
directory and CWD of processes that have their root directory and
CWD on the old root *filesystem*. Change "filesystem" to "directory".
Reported-by: Philipp Wendler <ml@philippwendler.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Make the page more compact by removing the stub subsections that
list the manual pages for the namespace types. And while we're
here, add an explanation of the table columns.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Eric Biederman:
I hate to nitpick, but I am going to say that when I read
the text above the phrase "mount namespace of the process
that created the new mount namespace" feels wrong.
Either you use unshare(2) and the mount namespace of the
process that created the mount namespace changes.
Or you use clone(2) and you could argue it is the new child
that created the mount namespace.
Having a different mount namespace at the end of the
creation operation feels like it makes your phrase confusing
about what the starting mount namespace is. I hate to use
references that are ambiguous when things are changing.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Provide a more detailed explanation of the initialization of
the mount point list in a new mount namespace.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reid noted a confusion between 'old_root' (my attempt at a
shorthand for the old root point) and 'put_old'. Eliminate the
confusion by replacing the shorthand with "old root mount point".
Reported-by: Reid Priedhorsky <reidpr@lanl.gov>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The current text talks about "parent mount namespaces", but there
is no such concept. As confirmed by Eric Biederman, what is meant
here is "the mount namespace this mount namespace started as a
copy of". So, this change writes up Eric's description in a more
detailed way.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Eric Biederman notes that the change in commit f646ac88ef was
not strictly necessary for this example, since one of the already
documented requirements is that various mount points must not have
shared propagation, or else pivot_root() will fail. So, simplify
the example.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After creating a new mount namespace, it may be desirable to
disable mount propagation. Give the reader a more explicit
hint about this.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In a recent conversation with Mathieu Desnoyers I was reminded
that we haven't written up anything about how deferred
cancellation and asynchronous signal handlers interact. Mathieu
ran into some of this behaviour and I promised to improve the
documentation in this area to point out the potential pitfall.
Thoughts?
8< --- 8< --- 8<
In pthread_setcancelstate.3, pthreads.7, and signal-safety.7, we
describe how, if an asynchronous signal handler nests over a
deferred cancellation region, any cancellation point in the
signal handler may trigger a cancellation that behaves as if
it were an asynchronous cancellation. This asynchronous
cancellation may have unexpected effects on the consistency of
the application. Therefore, care should be taken with asynchronous
signals and deferred cancellation.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Update with all the missing errors the syscall can return, the
behaviour the syscall should have w.r.t. copies within single
files, etc.
[Amir] updates for final released version.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In RETURN VALUE, point reader at subsection noting that the return
value of the raw sched_setaffinity() system call differs from the
wrapper function in glibc.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
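The sched_setaffinity(2) page also covers sched_getaffinity(), and the kind of wrapper/raw difference it describes can be seen with a sketch like the following (helper name hypothetical): on success the glibc wrapper returns 0, while the raw sched_getaffinity() system call returns the number of bytes it copied into the mask buffer.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: query the caller's CPU affinity mask once through
   the glibc wrapper and once through the raw system call, storing both
   return values for comparison. */
void
compare_getaffinity(int *wrapper_ret, long *raw_ret)
{
    cpu_set_t set;

    *wrapper_ret = sched_getaffinity(0, sizeof(set), &set);
    *raw_ret = syscall(SYS_sched_getaffinity, 0, sizeof(set), &set);
}
```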
Add entries for the new cache geometry values of the auxiliary
vector that got included in the kernel.
Signed-off-by: Raphael Moreira Zinsly <rzinsly@linux.vnet.ibm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Using signalfd(2) with epoll(7) and fork(2) can lead to some head
scratching.
It seems that when a signalfd file descriptor is added to epoll
you will only get notifications for signals sent to the process
that added the file descriptor to epoll.
So if you have a signalfd FD registered with epoll and then call
fork(2) (perhaps by way of daemon(3), for example), you will
find that you no longer get notifications for signals sent to the
newly forked process.
User kentonv on ycombinator[0] explained it thus:
"One place where the inconsistency gets weird is when you
use signalfd with epoll. The epoll will flag events on the
signalfd based on the process where the signalfd was
registered with epoll, not the process where the epoll is
being used. One case where this can be surprising is if you
set up a signalfd and an epoll and then fork() for the
purpose of daemonizing -- now you will find that your epoll
mysteriously doesn't deliver any events for the signalfd
despite the signalfd otherwise appearing to function as
expected."
And another post from the same person[1].
And then there is this snippet from this kernel commit message[2]
"If you share epoll fd which contains our sigfd with another
process you should blame yourself. signalfd is "really
special"."
So add a note to the man page that points this out where people
will hopefully find it sooner rather than later!
[0]: https://news.ycombinator.com/item?id=9564975
[1]: https://stackoverflow.com/questions/26701159/sending-signalfd-to-another-process/29751604#29751604
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d80e731ecab420ddcb79ee9d0ac427acbc187b4b
Signed-off-by: Andrew Clayton <andrew@digital-domain.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Eric Biederman noted that my list of directories that could not
have shared propagation was incorrect. I had written that
new_root could not be shared; rather it should be: the parent of
the current root mount point.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Quoting Eric Biederman:
The concern from our conversation at the container
mini-summit was that there is a pathology if in your initial
mount namespace all of the mounts are marked MS_SHARED like
systemd does (and is almost necessary if you are going to
use mount propagation), that if new_root itself is MS_SHARED
then unmounting the old_root could propagate.
So I believe the desired sequence is:
>>> chdir(new_root);
+++ mount("", ".", NULL, MS_SLAVE | MS_REC, NULL);
>>> pivot_root(".", ".");
>>> umount2(".", MNT_DETACH);
The change to new_root could be either MS_SLAVE or
MS_PRIVATE. So long as it is not MS_SHARED the mount won't
propagate back to the parent mount namespace.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
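The sequence above can be sketched in C as follows, assuming the caller is already in a new mount namespace with CAP_SYS_ADMIN and that new_root is a mount point. glibc has no pivot_root() wrapper, so syscall(2) is used; the helper name switch_to_new_root() is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper following the quoted sequence. Returns 0 on
   success, or -1 (with errno set) at the first step that fails. */
int
switch_to_new_root(const char *new_root)
{
    if (chdir(new_root) == -1)
        return -1;

    /* Make new_root (recursively) a slave, so that unmounting the old
       root below does not propagate to other mount namespaces */
    if (mount(NULL, ".", NULL, MS_SLAVE | MS_REC, NULL) == -1)
        return -1;

    /* new_root becomes "/", with the old root stacked on top of it */
    if (syscall(SYS_pivot_root, ".", ".") == -1)
        return -1;

    /* With CWD at "/", this detaches the old root (top of the stack) */
    if (umount2(".", MNT_DETACH) == -1)
        return -1;

    return 0;
}
```

Running this for real replaces the caller's root, so try it only in a throwaway mount namespace.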
LXC uses this [1]. I tested, to double-check, and it works.
The fchdir() dance done by LXC is not needed though:
fchdir(old_root); umount2(".", MNT_DETACH); fchdir(new_root);
As far as I can see, just the umount2() is sufficient, since,
after pivot_root(), old_root is at the top of the stack
of mounts at "/" and thus (so long as CWD is at "/")
the umount will remove the mount at the top of the stack.
Eric Biederman confirmed my understanding by mail, and
Philipp Wendler verified my results by experiment.
[1] See the following commit in LXC:
commit 2d489f9e87fa0cccd8a1762680a43eeff2fe1b6e
Author: Serge Hallyn <serge.hallyn@ubuntu.com>
Date: Sat Sep 20 03:15:44 2014 +0000
pivot_root: switch to a new mechanism (v2)
Helped-by: Eric W. Biederman <ebiederm@xmission.com>
Helped-by: Philipp Wendler <ml@philippwendler.de>
Helped-by: Aleksa Sarai <asarai@suse.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After around 19 years, the behavior of pivot_root() has not been
changed, and will almost certainly not change in the future.
So, reword to remove the suggestion that the behavior may change.
Also, more clearly document the effect of pivot_root() on
the calling process's current working directory.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The reference in "Note that this also applies" was vague. So
combine this paragraph with an earlier one to make the linkage
clearer.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The idea that there might one day be a mechanism for kernel
threads to explicitly relinquish access to the filesystem never
came to pass (after 20 years), and the presence of text
describing this idea is, IMO, a distraction. So, remove it.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
One kernel printk() later, my suspicions seem confirmed: the text
describing the situation where the current root is not a mount
point (because of a chroot()) seems to be bogus. (Perhaps it was
true once upon a time.) In my testing, if the current root is not
a mount point, an EINVAL error results.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In this text:
If the current root is not a mount point (e.g., after an
earlier chroot(2) or pivot_root())...
mention of pivot_root() makes no sense, since (as noted in an
earlier commit message for this page) 'new_root' in a previous
pivot_root() must (since Linux 2.4.5) have been a mount point.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
One of these "bugs" is a philosophical point already covered
elsewhere in the page, while the other is a somewhat obscure joke.
Both pieces are a bit of a distraction, really.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The note that EBUSY is given if a filesystem is already mounted
on 'put_old' was never really true. That restriction was in
Linux 2.3.14, but removed in Linux 2.3.99-pre6 so it never made
it to mainline.
The relevant diff in pivot_root() was:
error = -EBUSY;
- if (d_new_root->d_sb == root->d_sb || d_put_old->d_sb == root->d_sb)
+ if (new_nd.mnt == root_mnt || old_nd.mnt == root_mnt)
goto out2; /* loop */
- if (d_put_old != d_put_old->d_covers)
- goto out2; /* mount point is busy */
error = -EINVAL;
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Some of the text was written long ago, and hinted that things
might change in the future. However, 20 years have passed
and these details have not changed, so rework the text to
hint at that fact.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As far as I can see from the source code, the statement that
"No other filesystem may be mounted on 'put_old'" is incorrect.
Even in the 2.4.0 source code, I can't see such
a restriction. In addition, some testing on a 5.0 kernel
(mounting 'put_old' in the new mount namespace just before
pivot_root()) did not result in an error for this case when
calling pivot_root().
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
pivot_root() only affects the current working directory and root
directory of other processes in the same mount namespace as the
caller.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The restriction on what values may be specified in 'si_code'
applies only when sending a signal to a process other than the
caller itself.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Threads are allowed to switch mount namespaces if the filesystem
details aren't being shared. That's the purpose of the check in
the kernel quoted by the comment:
if (fs->users != 1)
return -EINVAL;
It's been this way since the code was originally merged in v3.8.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since introduction of MAP_SHARED_VALIDATE, in case flags contain
both MAP_PRIVATE and MAP_SHARED, mmap() doesn't fail with EINVAL,
it succeeds.
The reason for that is that MAP_SHARED_VALIDATE is in fact equal
to MAP_PRIVATE | MAP_SHARED.
This is intended behavior, see:
https://lwn.net/Articles/758594/
https://lwn.net/Articles/758598/
Signed-off-by: Nikola Forró <nforro@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
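A minimal sketch of the behavior described above, assuming Linux 4.15 or later and a writable /tmp. The helper name and temporary-file template are hypothetical; the #ifndef fallback only matters for headers that predate MAP_SHARED_VALIDATE.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE (MAP_SHARED | MAP_PRIVATE)  /* Assumed fallback */
#endif

/* Hypothetical helper: create an unlinked temporary file of 'len' bytes
   and map it with MAP_SHARED | MAP_PRIVATE (== MAP_SHARED_VALIDATE).
   This succeeds and behaves like MAP_SHARED, except that the kernel
   rejects unknown flags with EOPNOTSUPP. Returns MAP_FAILED on error. */
void *
map_temp_shared_validate(size_t len)
{
    char path[] = "/tmp/msv-XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return MAP_FAILED;
    unlink(path);           /* The file persists while fd/mapping exist */

    void *p = MAP_FAILED;
    if (ftruncate(fd, (off_t) len) == 0)
        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_PRIVATE, fd, 0);
    close(fd);              /* The mapping keeps the file alive */
    return p;
}
```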
Things changed in Linux v5.3-rc3 commit 315c69261dd3 from
splitting after template expansion to splitting beforehand.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This requirement on the first digit with the %e format comes from
the ISO C standard. It ensures that all the digits in the output are
significant and forbids output with a precision less than requested.
Signed-off-by: Vincent Lefevre <vincent@vinc17.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
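A small sketch of the rule, using snprintf() with a hypothetical helper that returns the leading digit of the significand. Note how rounding 9.999 to two fractional digits yields "1.00e+01" rather than a leading "10"; only a zero argument produces a leading '0'.

```c
#include <stdio.h>

/* Hypothetical helper: format 'x' with "%.*e" into 'buf' and return the
   digit before the decimal point (skipping any minus sign). ISO C
   requires this digit to be nonzero when 'x' is nonzero. */
char
e_leading_digit(double x, int prec, char *buf, size_t size)
{
    snprintf(buf, size, "%.*e", prec, x);
    return (buf[0] == '-') ? buf[1] : buf[0];
}
```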
Even though the RWF_* flags were first introduced in Linux 4.6,
they could not be used with aio until 4.13 where the aio_rw_flags
field was added to struct iocb (9830f4be159b "fs: Use RWF_* flags
for AIO operations"). Correct the stated version for each flag.
Fixes: 2f72816f86 ("io_submit.2: Add kernel version numbers for various 'aio_rw_flags' flags")
Signed-off-by: Matti Möll <Matti.Moell@opensynergy.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
E2BIG was removed in 2.6.29; we should mark it as deprecated.
Signed-off-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add powerpc64 to the calling convention tables.
Signed-off-by: Shawn Anastasio <shawn@anastas.io>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
PTRACE_GET_SYSCALL_INFO request was introduced by Linux kernel
commit 201766a20e30f982ccfe36bebfad9602c3ff574a aka
v5.3-rc1~65^2~23.
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As reported by Florin:
In the first table, for the riscv Arch/ABI, the instruction
should be ecall instead of scall.
According to the official manual, the instruction has been
renamed.
https://content.riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf
"The SCALL and SBREAK instructions have been renamed to
ECALL and EBREAK, respectively. Their encoding and
functionality are unchanged."
Reported-by: Florin Blanaru <florin.blanaru96@gmail.com>
Reviewed-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reviewed-by: Matt Perricone <matt.perricone@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Signed-off-by: Murthy Bhat <Murthy.Bhat@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reviewed-by: Matt Perricone <matt.perricone@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reviewed-by: Matt Perricone <matt.perricone@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Gilbert Wu <gilbert.wu@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
- correct smarpqi to smartpqi
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Describe a few recently added options (present in glibc-2.29).
Sort the options a bit more logically and alphabetically.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As reported by Simone:
I was looking at version from 2017-09-15 but it's the same
on: http://man7.org/linux/man-pages/man2/statx.2.html
(2019-03-06)
There is reported (about the mask argument) after the list
of constants:
> Note that the kernel does not reject values in mask other
> than the above. Instead, it simply informs the caller which
> values are supported by this kernel and filesystem via the
> statx.stx_mask field.
But as reported in the error values, there can be EINVAL if
mask has a reserved value, and I found a check against
STATX__RESERVED in fs/stat.c for this. So if you use that
bit (0x80000000U), the kernel will reject the value.
It is probably better to say that the kernel does not enforce the
use of only the listed values, but that there are nevertheless
reserved values, so you cannot put whatever you want in mask
(that applies to more values than UINT_MAX).
Reported-by: Simone Piccardi <piccardi@truelite.it>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
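A minimal sketch of the behavior described in the report, assuming glibc 2.28 or later for the statx() wrapper. The helper name is hypothetical, and the STATX__RESERVED fallback uses the value quoted above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

#ifndef STATX__RESERVED
#define STATX__RESERVED 0x80000000U     /* Value quoted in the report */
#endif

/* Hypothetical helper: call statx() on 'path' with 'mask'. Returns 0 on
   success, or the errno value on failure. A mask containing the
   STATX__RESERVED bit is rejected with EINVAL. */
int
statx_errno(const char *path, unsigned int mask)
{
    struct statx stx;

    return (statx(AT_FDCWD, path, 0, mask, &stx) == -1) ? errno : 0;
}
```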
Hi,
Both the Ext2 filesystem handler and the Ext4 filesystem handler will
return the ERANGE error code. Ext2 will return it if the name or value is
too long to be able to be stored, Ext4 will return it if the name is too
long. For reference, the relevant files/lines (with excerpts) are:
fs/ext2/xattr.c: lines 394 to 396 in ext2_xattr_set
> 394 name_len = strlen(name);
> 395 if (name_len > 255 || value_len > sb->s_blocksize)
> 396 return -ERANGE;
fs/ext4/xattr.c: lines 2317 to 2318 in ext4_xattr_set_handle
> 2317 if (strlen(name) > 255)
> 2318 return -ERANGE;
Other filesystems also return this code:
xfs/libxfs/xfs_attr.h: lines 53 to 55
> * The maximum size (into the kernel or returned from the kernel) of an
> * attribute value or the buffer used for an attr_list() call. Larger
> * sizes will result in an ERANGE return code.
It's possible that more filesystem handlers do this, a cursory grep shows
that most of the filesystem xattr handler files mention ERANGE in some
form. A suggested patch is below (I'm not 100% sure on the wording, though).
Thanks
--
- Finn
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
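A sketch that exercises the name-length case, assuming (as in the kernels we have looked at) that the generic code in fs/xattr.c rejects attribute names longer than XATTR_NAME_MAX (255) bytes with ERANGE before any filesystem handler is called. The helper name is hypothetical, and the test assumes /tmp exists and is writable.

```c
#include <errno.h>
#include <string.h>
#include <sys/xattr.h>

/* Hypothetical helper: try to set an extended attribute whose name is
   "user." followed by 'x' characters, 'namelen' bytes in total, on
   'path'. Returns 0 on success or the errno value on failure. */
int
try_long_xattr_name(const char *path, size_t namelen)
{
    char name[512];

    if (namelen < 6 || namelen >= sizeof(name))
        return EINVAL;              /* Outside this sketch's buffer */

    memcpy(name, "user.", 5);
    memset(name + 5, 'x', namelen - 5);
    name[namelen] = '\0';

    return (setxattr(path, name, "v", 1, 0) == -1) ? errno : 0;
}
```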
In kernel/sys.c, arg2 is an unsigned long value, so it will never
be less than 0. Also, since kernel commit da8b44d5a9f8 (Linux
4.6), timer_slack_ns and default_timer_slack_ns have been
converted into u64, and the return value of PR_GET_TIMERSLACK has
been limited to ULONG_MAX.
The timer slack value also can be inherited by a child created via
fork(2).
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Yang Xu <xuyang2018.jy@cn.fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
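A minimal sketch of the inheritance point, assuming the chosen slack value fits in prctl()'s int return value. The helper name is hypothetical; note that it leaves the caller's timer slack changed.

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: set the caller's timer slack to 'ns' nanoseconds,
   then fork a child that checks (via PR_GET_TIMERSLACK) whether it
   inherited that value. Returns 1 if it did, 0 if not, -1 on error. */
int
timerslack_inherited(unsigned long ns)
{
    int status;

    if (prctl(PR_SET_TIMERSLACK, ns, 0, 0, 0) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* PR_GET_TIMERSLACK returns the value as the call's result */
        unsigned long got = (unsigned long) prctl(PR_GET_TIMERSLACK, 0, 0, 0, 0);
        _exit(got == ns ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```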
As reported by Alan Stern:
Here are two extracts from the man page for ppoll(2):
Specifying a negative value in timeout means an infinite
timeout.
Other than the difference in the precision of the timeout
argument, the following ppoll() call:
ready = ppoll(&fds, nfds, tmo_p, &sigmask);
is equivalent to atomically executing the following calls:
sigset_t origmask;
int timeout;
timeout = (tmo_p == NULL) ? -1 :
(tmo_p->tv_sec * 1000 + tmo_p->tv_nsec / 1000000);
pthread_sigmask(SIG_SETMASK, &sigmask, &origmask);
ready = poll(&fds, nfds, timeout);
pthread_sigmask(SIG_SETMASK, &origmask, NULL);
But if tmo_p->tv_sec is negative, the ppoll() call is not
equivalent to the corresponding poll() call. The kernel rejects
negative values of tv_sec with an EINVAL error; it does not
interpret the value as meaning an infinite timeout.
(Yes, the kernel interprets tmo_p == NULL as an infinite timeout,
but the man page is still wrong for the case tmo_p->tv_sec < 0.)
Suggested fix: Following the end of the second extract above, add:
except that negative time values in tmo_p are not
interpreted as an infinite timeout.
Also, in the ERRORS section, change the text for EINVAL to:
EINVAL The nfds value exceeds the RLIMIT_NOFILE value or
*tmo_p contains an invalid (negative) time value.
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
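A minimal sketch of the difference Alan describes, using a hypothetical helper: a 1-millisecond timeout expires normally, while a negative tv_sec fails with EINVAL instead of meaning "wait forever" (only a NULL tmo_p gives an infinite timeout).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <time.h>

/* Hypothetical helper: call ppoll() with no file descriptors and the
   given timeout. Returns 0 if the timeout expired, or the errno value
   if the call failed. */
int
ppoll_timeout_errno(time_t sec, long nsec)
{
    struct timespec ts = { .tv_sec = sec, .tv_nsec = nsec };

    return (ppoll(NULL, 0, &ts, NULL) == -1) ? errno : 0;
}
```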
It appears that 'new_root' may not have needed to be a mount
point on ancient kernels, but already in Linux 2.4.5, there
was the diff shown below. Verified also by testing.
@@ -1631,8 +1605,9 @@
* - we don't move root/cwd if they are not at the root (reason: if something
* cared enough to change them, it's probably wrong to force them elsewhere)
* - it's okay to pick a root that isn't the root of a file system, e.g.
- * /nfs/my_root where /nfs is the mount point. Better avoid creating
- * unreachable mount points this way, though.
+ * /nfs/my_root where /nfs is the mount point. It must be a mountpoint,
+ * though, so you may need to say mount --bind /nfs/my_root /nfs/my_root
+ * first.
*/
asmlinkage long sys_pivot_root(const char *new_root, const char *put_old)
@@ -1640,7 +1615,7 @@
struct dentry *root;
struct vfsmount *root_mnt;
struct vfsmount *tmp;
- struct nameidata new_nd, old_nd;
+ struct nameidata new_nd, old_nd, parent_nd, root_parent;
char *name;
int error;
@@ -1688,6 +1663,10 @@
if (new_nd.mnt == root_mnt || old_nd.mnt == root_mnt)
goto out2; /* loop */
error = -EINVAL;
+ if (root_mnt->mnt_root != root)
+ goto out2;
+ if (new_nd.mnt->mnt_root != new_nd.dentry)
+ goto out2; /* not a mountpoint */
tmp = old_nd.mnt; /* make sure we can reach put_old from new_root */
spin_lock(&dcache_lock);
if (tmp != new_nd.mnt) {
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
If a mount point is deleted or renamed or removed in one mount
namespace, this will cause an object that is mounted at that
location in another mount namespace to be unmounted (as verified
by experiment). This was implied by the existing text, but it is
better to make this detail explicit.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
* Establish the abbreviations DSO and PID in the lead paragraph
since they are used later.
* Parallelize descriptions of help, usage, and version options
with the "and exit" language used in getent(1), iconv(1),
locale(1), localedef(1), memusage(1), memusagestat(1),
mtrace(1), pldd(1), sprof(1), time(1), iconvconfig(8),
zdump(8), and zic(8).
Signed-off-by: G. Branden Robinson <g.branden.robinson@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
glibc 2.30 isn't released yet, but a fix has been committed, and
Debian has even cherry-picked it for Debian GNU/Linux 10
("buster"). pldd works nicely now.
Signed-off-by: G. Branden Robinson <g.branden.robinson@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The --export-dynamic linker option is not the only way that main's
global symbols may end up in the dynamic symbol table and thus be
used to satisfy symbol references in a shared object. A symbol
may also be placed into the dynamic symbol table if ld(1)
notices a dependency in another object during the static link.
Verified by experiment; see previous commit.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The existing text wrongly implied that symbol lookup first
occurred in the object and then in main, and did not mention
whether dependencies of main were used for symbol resolution.
Verified by experiment:
$ cat prog.c
#define _GNU_SOURCE
#include <link.h>
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
void /* A function defined in both main and lib_x1 */
prog_x1(void)
{
    printf("Called %s::%s\n", __FILE__, __func__);
}
/* The following function is forced into prog's dynamic symbol table
   because of the static link-time reference in lib_m1.so */
void /* A function defined in both main and lib_y1 */
prog_y1_exp(void)
{
    printf("Called %s::%s\n", __FILE__, __func__);
}
/* The following function is not forced into prog's dynamic symbol table */
void /* A function defined in both main and lib_y1 */
prog_y1_noexp(void)
{
    printf("Called %s::%s\n", __FILE__, __func__);
}
static int
callback(struct dl_phdr_info *info, size_t size, void *data)
{
    printf("\tName = %s\n", info->dlpi_name);
    return 0;
}
int
main(int argc, char *argv[])
{
    void *xHandle, *yHandle;
    void (*funcp)(void);
    char *err;
    xHandle = dlopen("./lib_x1.so", RTLD_NOW | RTLD_GLOBAL);
    if (xHandle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        exit(EXIT_FAILURE);
    }
    yHandle = dlopen("./lib_y1.so", RTLD_NOW | RTLD_GLOBAL);
    if (yHandle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        exit(EXIT_FAILURE);
    }
    /* Optionally display the link map() */
    if (argc > 1) {
        printf("Link map as shown from dl_iterate_phdr() callbacks:\n");
        dl_iterate_phdr(callback, NULL);
        printf("\n");
    }
    (void) dlerror();   /* Clear dlerror() */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpedantic"
    funcp = (void (*)(void)) dlsym(yHandle, "y1_enter");
#pragma GCC diagnostic pop
    err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym: %s\n", err);
        exit(EXIT_FAILURE);
    }
    (*funcp)();
    exit(EXIT_SUCCESS);
}
$ cat lib_m1.c
#include <stdio.h>

void    /* A function defined in both lib_m1 and lib_y1 */
m1_y1(void)
{
    printf("Called %s::%s\n", __FILE__, __func__);
}

#if 1
void
dummy(void)
{
    extern void prog_y1_exp(void);

    prog_y1_exp();      /* Forces prog_y1_exp into prog's dynamic symbol table,
                           so that it will be visible also to lib_y1.so */
}
#endif
$ cat lib_x1.c
#include <stdio.h>

void    /* A function defined in both main and lib_x1 */
prog_x1(void)
{
    printf("Called %s::%s\n", __FILE__, __func__);
}

void    /* A function defined in both lib_x1 and lib_y1 */
x1_y1(void)
{
    printf("Called %s::%s\n", __FILE__, __func__);
}
$ cat lib_y1.c
#include <stdio.h>

void    /* A function defined in both lib_x1 and lib_y1 */
x1_y1(void)
{
    printf("Called %s::%s\n", __FILE__, __func__);
}

void    /* A function defined in both main and lib_y1 */
prog_y1_exp(void)
{
    printf("Called %s::%s\n", __FILE__, __func__);
}

void    /* A function defined in both lib_m1 and lib_y1 */
m1_y1(void)
{
    printf("Called %s::%s\n", __FILE__, __func__);
}

void    /* A function defined in both main and lib_y1 */
prog_y1_noexp(void)
{
    printf("Called %s::%s\n", __FILE__, __func__);
}

void
y1_enter(void)
{
    extern void y2(void);

    printf("Called %s\n\n", __func__);

    prog_x1();
    prog_y1_exp();
    prog_y1_noexp();
    x1_y1();
    m1_y1();
    y2();
}
$ cat lib_y2.c
#include <stdio.h>

void
y2(void)
{
    printf("Called %s::%s\n", __FILE__, __func__);
}
$ cat Build.sh
#!/bin/sh
CFLAGS="-Wno-implicit-function-declaration -Wl,--no-as-needed"
cc $CFLAGS -g -fPIC -shared -o lib_x1.so lib_x1.c
cc $CFLAGS -g -fPIC -shared -o lib_y2.so lib_y2.c
cc $CFLAGS -g -fPIC -shared -o lib_y1.so lib_y1.c ./lib_y2.so
cc $CFLAGS -g -fPIC -shared -o lib_m1.so lib_m1.c
#ED="-Wl,--export-dynamic"
cc $CFLAGS $ED -Wl,--rpath,$PWD -o prog prog.c -ldl lib_m1.so
$ sh Build.sh
$ ./prog x
Link map as shown from dl_iterate_phdr() callbacks:
Name =
Name = linux-vdso.so.1
Name = /lib64/libdl.so.2
Name = /home/mtk/tlpi/code/shlibs/dlopen_sym_res_expt/lib_m1.so
Name = /lib64/libc.so.6
Name = /lib64/ld-linux-x86-64.so.2
Name = ./lib_x1.so
Name = ./lib_y1.so
Name = ./lib_y2.so
Called y1_enter
Called lib_x1.c::prog_x1
Called prog.c::prog_y1_exp
Called lib_y1.c::prog_y1_noexp
Called lib_x1.c::x1_y1
Called lib_m1.c::m1_y1
Called lib_y2.c::y2
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
My earlier commit was in error:
commit 4a1af09bd1
Author: Michael Kerrisk <mtk.manpages@gmail.com>
Date: Sat Mar 14 21:40:35 2015 +0100
dlopen.3: Amend error in description of dlclose() behavior
-If the reference count drops to zero and no other loaded libraries use
-symbols in it, then the dynamic library is unloaded.
+If the reference count drops to zero,
+then the dynamic library is unloaded.
I doubted the removed text, because it provided little clue about
the scenario. The POSIX dlclose(3) specification actually details
the scenario sufficiently:
Although a dlclose() operation is not required to remove
any functions or data objects from the address space,
neither is an implementation prohibited from doing so.
The only restriction on such a removal is that no function
nor data object shall be removed to which references
have been relocated, until or unless all such references
are removed. For instance, an executable object file that
had been loaded with a dlopen() operation specifying the
RTLD_GLOBAL flag might provide a target for dynamic relocations
performed in the processing of other relocatable
objects—in such environments, an application may assume
that no relocation, once made, shall be undone or remade
unless the executable object file containing the relocated
object has itself been removed.
Verified by experiment:
$ cat openlibs.c # Test program
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_LIBS 10     /* Assumed value; the original listing omits this definition */

int
main(int argc, char *argv[])
{
    void *libHandle[MAX_LIBS];
    int lcnt;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s lib-path...\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    lcnt = 0;

    for (int j = 1; j < argc; j++) {
        if (argv[j][0] != '-') {
            if (lcnt >= MAX_LIBS) {
                fprintf(stderr, "Too many libraries (limit: %d)\n", MAX_LIBS);
                exit(EXIT_FAILURE);
            }

            printf("[%d] Opening %s\n", lcnt, argv[j]);
            libHandle[lcnt] = dlopen(argv[j], RTLD_NOW | RTLD_GLOBAL);
            if (libHandle[lcnt] == NULL) {
                fprintf(stderr, "dlopen: %s\n", dlerror());
                exit(EXIT_FAILURE);
            }
            lcnt++;

        } else {        /* "-N" closes the Nth handle */
            int i = atoi(&argv[j][1]);

            printf("Closing handle %d\n", i);
            dlclose(libHandle[i]);
        }

        sleep(1);
        printf("\n");
    }

    printf("Program about to exit\n");
    exit(EXIT_SUCCESS);
}
$ cat lib_x1.c
#include <stdio.h>
void x1_func(void) { printf("Hello world\n"); }
__attribute__((constructor)) void x1_cstor(void)
{ printf("Called %s\n", __FUNCTION__); }
__attribute__((destructor)) void x1_dstor(void)
{ printf("Called %s\n", __FUNCTION__); }
$ cat lib_y1.c
#include <stdio.h>
void y1_func(void) { printf("Hello world\n"); }
__attribute__((constructor)) void y1_cstor(void)
{ printf("Called %s\n", __FUNCTION__); }
__attribute__((destructor)) void y1_dstor(void)
{ printf("Called %s\n", __FUNCTION__); }
static void testref(void) {
    /* The following reference, to a symbol in lib_x1.so, shows that
       RTLD_GLOBAL may pin a library when it might otherwise have been
       released with dlclose() */
    extern void x1_func(void);
    x1_func();
}
$ cc -shared -fPIC -o lib_x1.so lib_x1.c
$ cc -shared -fPIC -o lib_y1.so lib_y1.c
$ cc -o openlibs openlibs.c -ldl
$ LD_LIBRARY_PATH=. ./openlibs lib_x1.so lib_y1.so -0 -1
[0] Opening lib_x1.so
Called x1_cstor
[1] Opening lib_y1.so
Called y1_cstor
Closing handle 0
Closing handle 1
Called y1_dstor
Called x1_dstor
Program about to exit
<end program output>
Note that x1_dstor was called only when handle 1 (lib_y1.so) was closed.
But, if we edit lib_y1 to remove the reference to x1_func(), things are
different:
$ cat lib_y1.c # After editing
#include <stdio.h>
void y1_func(void) { printf("Hello world\n"); }
__attribute__((constructor)) void y1_cstor(void)
{ printf("Called %s\n", __FUNCTION__); }
__attribute__((destructor)) void y1_dstor(void)
{ printf("Called %s\n", __FUNCTION__); }
static void testref(void) {
    // extern void x1_func(void);
    // x1_func();
}
$ cc -shared -fPIC -o lib_y1.so lib_y1.c
$ LD_LIBRARY_PATH=. ./openlibs lib_x1.so lib_y1.so -0 -1
[0] Opening lib_x1.so
Called x1_cstor
[1] Opening lib_y1.so
Called y1_cstor
Closing handle 0
Called x1_dstor
Closing handle 1
Called y1_dstor
Program about to exit
<end program output>
This time, x1_dstor was called when handle 0 (lib_x1.so) was closed.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See fs/xattr.c::xattr_permission():
	/*
	 * In the user.* namespace, only regular files and directories can have
	 * extended attributes. For sticky directories, only the owner and
	 * privileged users can write attributes.
	 */
	if (!strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN)) {
		if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
			return (mask & MAY_WRITE) ? -EPERM : -ENODATA;
		if (S_ISDIR(inode->i_mode) && (inode->i_mode & S_ISVTX) &&
		    (mask & MAY_WRITE) && !inode_owner_or_capable(inode))
			return -EPERM;
	}
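The user.* branch above can be modelled as a pure function, which makes it easy to see which cases yield EPERM and which yield ENODATA. This is a minimal user-space sketch, not kernel code: user_xattr_check() is a hypothetical name, and the owner_or_capable argument stands in for what inode_owner_or_capable() would compute.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical user-space model of the user.* branch of the kernel's
   xattr_permission(). Returns 0 if access is allowed, otherwise a
   negative errno value, as the kernel code does. */
static int
user_xattr_check(mode_t mode, bool want_write, bool owner_or_capable)
{
    /* Only regular files and directories may carry user.* attributes */
    if (!S_ISREG(mode) && !S_ISDIR(mode))
        return want_write ? -EPERM : -ENODATA;

    /* On sticky directories, only the owner or a privileged
       process may write user.* attributes */
    if (S_ISDIR(mode) && (mode & S_ISVTX) && want_write && !owner_or_capable)
        return -EPERM;

    return 0;
}
```

For example, a write to a FIFO gives EPERM, a read from a symbolic link gives ENODATA, and a non-owner write on a sticky directory such as /tmp gives EPERM.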
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
If the file descriptors received in an SCM_RIGHTS message would
cause the process to exceed its RLIMIT_NOFILE limit, the excess
FDs are discarded.
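A minimal sketch of passing descriptors with SCM_RIGHTS over a UNIX domain socket. The helper names send_fds() and recv_fds() and the MAX_PASS_FDS limit are hypothetical. My reading of net/core/scm.c is that when descriptors are discarded the kernel also sets MSG_CTRUNC in msg_flags, so the receiver reports that flag; treat this as an assumption to verify on your kernel.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_PASS_FDS 8      /* Arbitrary limit chosen for this sketch */

/* Send 'nfds' descriptors from 'fds' over the UNIX domain socket 'sock',
   together with one byte of ordinary data. Returns 0 on success, -1 on error. */
static int
send_fds(int sock, const int *fds, int nfds)
{
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                 /* Control buffer with correct alignment */
        char buf[CMSG_SPACE(MAX_PASS_FDS * sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    if (nfds < 1 || nfds > MAX_PASS_FDS)
        return -1;

    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = CMSG_SPACE(nfds * sizeof(int));

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(nfds * sizeof(int));
    memcpy(CMSG_DATA(cmsg), fds, nfds * sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one message on 'sock', storing the descriptors in 'fds'
   (room for MAX_PASS_FDS entries). '*truncated' is set if MSG_CTRUNC
   was reported (assumed here to signal dropped descriptors).
   Returns the number of descriptors received, or -1 on error/EOF. */
static int
recv_fds(int sock, int *fds, int *truncated)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(MAX_PASS_FDS * sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = { 0 };
    int n = 0;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    *truncated = (msg.msg_flags & MSG_CTRUNC) != 0;

    for (struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
            cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
            continue;
        size_t cnt = (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int);
        for (size_t i = 0; i < cnt && n < MAX_PASS_FDS; i++, n++)
            memcpy(&fds[n], CMSG_DATA(cmsg) + i * sizeof(int), sizeof(int));
    }
    return n;
}
```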
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Long ago, the sysvipc.7 page was called ipc.5, which was both a
misnaming (too general a name) and an inconsistent section. The
page was renamed (to svipc.7) many years ago, and the link with
the old name has probably ceased to be needed. So, remove it.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Here's a program for doing experiments:
/* on_expt_scope_expt.c

   (C) Michael Kerrisk, 2019, Licensed GNU GPLv2+
*/
#include <stdio.h>
#include <stdlib.h>

#define INIT_VALUE 999  /* Assumed value; the original listing omits this definition */

char *tos;

static void
exitFunc(int status, void *p)
{
    int efloc;
    int *xp = (int *) p;

    printf("====== Entered exit handler\n");
    printf("&efloc = %p (0x%llx)\n",
            (void *) &efloc, (long long) (tos - (char *) &efloc));
    printf("xp = %p (value: %d)\n", (void *) xp, *xp);
    if (*xp != INIT_VALUE)
        printf("It looks like the variable passed to the exit handler "
                "has gone out of scope\n");

    /* Produce a core dump, which we can examine with GDB to look at the
       frames on the stack, if desired */

    printf("===\n");
    printf("About to abort\n");
    abort();
}

static void
recur(int lev, int *xp)
{
    int rloc;
    int big[65536-12];          /* 12*4 == 48 other bytes allocated on
                                   this stack frame */

    tos = (char *) &rloc;
    big[0] = lev;
    big[0]++;
    printf("&rloc = %p (%d) (%d)\n", (void *) &rloc, lev, *xp);

    if (lev > 1)
        recur(lev - 1, xp);
    else {
        printf("exit() from recur()\n");
        exit(EXIT_SUCCESS);
    }
}

int
main(int argc, char *argv[])
{
    int lev;
    int *xp;
    int xx;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s {s|h} [how]\n", argv[0]);
        fprintf(stderr, "\ts => exitFunc() arg is in main() stack\n");
        fprintf(stderr, "\th => exitFunc() arg is allocated on heap\n");
        fprintf(stderr, "\tIf 'how' is not present, then return from main()\n");
        fprintf(stderr, "\tIf 'how' is 0, then exit() from main()\n");
        fprintf(stderr, "\tIf 'how' is > 0, then make 'how' recursive "
                "function calls, and then exit()\n");
        exit(EXIT_FAILURE);
    }

    tos = (char *) &xp;

    if (argv[1][0] == 'h') {
        xp = malloc(sizeof(int));
        if (xp == NULL) {
            perror("malloc");
            exit(EXIT_FAILURE);
        }
        printf("Argument for exitFunc() is allocated on heap\n");
    } else {
        xp = &xx;
        printf("Argument for exitFunc() is allocated on stack in main()\n");
    }

    *xp = INIT_VALUE;
    printf("xp = %p (value: %d)\n", (void *) xp, *xp);
    printf("===\n");

    on_exit(exitFunc, xp);

    if (argc == 2) {
        printf("return from main\n");
        return 0;
    }

    lev = atoi(argv[2]);
    if (lev < 1) {
        printf("Calling exit() from main\n");
        exit(EXIT_SUCCESS);
    } else {
        recur(lev, xp);
    }
}
Reported-by: Sami Kerola <kerolasa@iki.fi>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
To get the pkey_alloc(), pkey_free(), and pkey_mprotect() functions,
_GNU_SOURCE needs to be defined before including <sys/mman.h>.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Mark Wielaard <mark@klomp.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The mprotect.2 NOTES say:
On systems that do not support protection keys in
hardware, pkey_mprotect() may still be used, but pkey must
be set to 0. When called this way, the operation of
pkey_mprotect() is equivalent to mprotect().
But this is not what the glibc manual says:
It is also possible to call pkey_mprotect with a key value
of -1, in which case it will behave in the same way as
mprotect.
The glibc manual is correct: both the glibc implementation and the
kernel check whether pkey is -1. 0 is not a valid pkey when
memory protection keys are not supported in hardware.
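A minimal sketch of the pkey == -1 case. It assumes glibc 2.27 or later, which provides the pkey_mprotect() wrapper, and defines _GNU_SOURCE first as described above. pkey_minus_one_demo() is a hypothetical name; on kernels without the system call the wrapper is expected to fail with ENOSYS.

```c
#define _GNU_SOURCE         /* Needed for pkey_mprotect() in glibc */
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one page read/write, store a value, then call pkey_mprotect()
   with pkey == -1 to make the page read-only, just as mprotect()
   would. Returns 0 on success, or an errno value on failure. */
static int
pkey_minus_one_demo(void)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return errno;

    p[0] = 'A';

    /* pkey == -1: no protection key is assigned (same as mprotect()) */
    if (pkey_mprotect(p, pagesz, PROT_READ, -1) == -1) {
        int saved = errno;
        munmap(p, pagesz);
        return saved;
    }

    int ok = (p[0] == 'A');     /* Reads are still permitted */
    munmap(p, pagesz);
    return ok ? 0 : EIO;
}
```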
Signed-off-by: Mark Wielaard <mark@klomp.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I've found the exec man page quite difficult to read when trying
to find the behavior for a specific function. Since the names of
the functions are inline and the order of the descriptions isn't
clear, it's hard to find which paragraphs apply to each function.
I thought it would be much easier to read if the grouping based on
letters is stated.
The Blackfin port was removed in Linux 4.17. Mention this in the
section concerning Blackfin vDSO functions.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Improved the readability of a sentence that describes the use of
FAN_REPORT_FID and how this particular flag influences what data
structures a listening application could expect to receive when
describing an event.
Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Document the symbols exported by the RISC-V vDSO, which is present
from kernel 4.15 onwards.
See kernel source files in arch/riscv/kernel/vdso.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Details relating to the new initialization flag FAN_REPORT_FID have been
added. As part of the FAN_REPORT_FID feature, a new set of event masks is
available and has been documented accordingly.
A simple example program has been added to also support the understanding
and use of FAN_REPORT_FID and directory modification events.
Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note some further details of the treatment of environment
variables in secure execution mode. In particular (as noted by
Matthias Hertel), note that ignored environment variables are also
stripped from the environment. Furthermore, there are some other
variables, not used by the dynamic linker itself, that are also
treated in this way (see the glibc source file
sysdeps/generic/unsecvars.h).
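Applications can apply the same policy to their own environment variables. A minimal sketch, assuming glibc: getauxval(AT_SECURE) reports whether the process is running in secure-execution mode, and secure_getenv() returns NULL in that mode. The variable name MYAPP_CONFIG_DIR and the default path are hypothetical.

```c
#define _GNU_SOURCE             /* For secure_getenv() */
#include <stdlib.h>
#include <string.h>
#include <sys/auxv.h>

/* Nonzero if the kernel flagged this process for secure-execution
   mode (AT_SECURE), e.g., a set-user-ID program run by another user */
static int
in_secure_mode(void)
{
    return getauxval(AT_SECURE) != 0;
}

/* Return the configuration directory. In secure-execution mode,
   secure_getenv() returns NULL, so the caller's environment cannot
   redirect a privileged program. Names here are hypothetical. */
static const char *
config_dir(void)
{
    const char *dir = secure_getenv("MYAPP_CONFIG_DIR");

    return (dir != NULL && strlen(dir) > 0) ? dir : "/etc/myapp";
}
```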
Reported-by: Matthias Hertel <Matthias.Hertel@rohde-schwarz.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Give the shell in the second cgroup namespace a different prompt,
so as to clearly distinguish the two namespaces.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The const specifier is not part of the prototype (it only applies to the
implementation), so showing it here confuses the reader.
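To illustrate: a top-level const on a parameter does not change the function's type, so a prototype without it and a definition with it are compatible. scale_value() is a hypothetical example function.

```c
/* Prototype: top-level const on a parameter is not part of the
   function type, so it is conventionally left out here */
int scale_value(int value, int factor);

/* Definition: 'const' only prevents the body from modifying its own
   copy of 'value'; the function type is still int (int, int) */
int
scale_value(const int value, int factor)
{
    return value * factor;
}
```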
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
stdarg.h is now 30 years old, and gcc long ago (2004) ceased to
implement <varargs.h>. There seems little value in keeping this
text.
See https://bugzilla.kernel.org/show_bug.cgi?id=202907
Reported-by: Vincent Lefevre <vincent@vinc17.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The BUGS section already explains why you need to be cautious
about using mallinfo, but given the number of bug reports we see
on Android, it seems not many people are reading that far. Call it
out up front.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since glibc 2.8, commit 68631c8eb92, the malloc_trim function has
iterated over all arenas and freed back to the OS all page runs
that were free. This allows an application to call malloc_trim to
consolidate fragmented chunks and free back any pages it can to
potentially reduce RSS usage.
The inaccuracy of the man page was recently brought to light by
an article [1] where Ruby developers discovered that malloc_trim
did not behave as the man page indicated.
This change makes it clear that the intent of malloc_trim is to
trim all space that is no longer needed, and any restrictions are
implementation details. In the notes we highlight the change in
behaviour for post glibc 2.8 and pre glibc 2.8.
[1] https://www.joyfulbikeshedding.com/blog/2019-03-14-what-causes-ruby-memory-bloat.html#a-magic-trick-trimming
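A minimal sketch of calling malloc_trim() after freeing memory, assuming glibc (<malloc.h> and malloc_trim() are glibc-specific). The block size and count are arbitrary choices for illustration; whether any memory is actually released (return value 1) depends on the heap layout, so the result should not be relied on.

```c
#include <malloc.h>             /* glibc-specific: malloc_trim() */
#include <stdlib.h>

#define NBLOCKS 64              /* Arbitrary values for this sketch */
#define BLKSIZE (32 * 1024)     /* Below glibc's default mmap threshold */

/* Allocate blocks, free every other one (leaving free page runs that
   are not at the top of the heap), then call malloc_trim(0).
   Returns malloc_trim()'s result: 1 if memory was released to the
   OS, 0 otherwise. */
static int
fragment_and_trim(void)
{
    void *blocks[NBLOCKS];

    for (int i = 0; i < NBLOCKS; i++)
        blocks[i] = malloc(BLKSIZE);

    for (int i = 0; i < NBLOCKS; i += 2)
        free(blocks[i]);

    int released = malloc_trim(0);

    for (int i = 1; i < NBLOCKS; i += 2)
        free(blocks[i]);

    return released;
}
```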
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Where is the initial read position for an "a+" stream?
POSIX leaves this unspecified. Most BSD man pages are silent, and MacOS
has the ambiguous "The stream is positioned at the end of the file", not
differentiating between reads and writes other than to say that fseek(3)
does not affect writes. glibc's documentation explicitly specifies that
the initial read position is the beginning of the file.
My new wording is based on the BSD implementations, so you may prefer
to replace the non-glibc section with "unspecified", or indeed remove
all claims about the initial read position.
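A small experiment that shows glibc's behavior. It assumes glibc; other implementations may use a different initial read position. first_char_in_append_mode() is a hypothetical helper, and the fseek() call is required by ISO C when switching from input to output on an update stream.

```c
#include <stdio.h>

/* Create 'path' containing "abc", reopen it with mode "a+", read one
   character, then write "d". Returns the character read, or EOF on
   error. Under glibc the initial read position is the start of the
   file, so 'a' is expected; writes still go to the end of the file. */
static int
first_char_in_append_mode(const char *path)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return EOF;
    fputs("abc", fp);
    fclose(fp);

    fp = fopen(path, "a+");
    if (fp == NULL)
        return EOF;

    int c = fgetc(fp);          /* Read at the initial read position */
    fseek(fp, 0, SEEK_CUR);     /* Required between input and output */
    fputs("d", fp);             /* Appended, because of "a" semantics */
    fclose(fp);
    return c;
}
```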
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Egmont suggested adding this, because the string "..." appears
at several other points in the page, but just to indicate that
some text is omitted from example code.
Reported-by: Egmont Koblinger <egmont@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The removed text long ago ceased to be accurate. Nowadays, the
dispatch table is autogenerated when building the kernel (via
the kernel makefile, arch/x86/entry/syscalls/Makefile).
Reported-by: Andreas Korb <andreas.d.korb@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Rewrite for improved clarity and defer to setfsuid(2) for the
rationale of the fsGID rather than repeating the same details
in this page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The current text reads somewhat clumsily. Rewrite it to introduce
the eUID and fsUID in parallel, and more clearly hint at the
historical rationale for the fsUID, which is detailed lower in
the page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The section "Example Programs ..." was renamed to "Example programs ..."
(with lowercase p) in c634028ab5, but the reference was not
updated.
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Unlike strerror_r(), strerror_l() doesn't take buffer length as an
argument.
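For comparison, a minimal sketch of strerror_l() usage: the caller passes a locale_t rather than a buffer and its length. c_locale_strerror() is a hypothetical helper; its lazy, unsynchronized initialization of the cached locale object is a simplification that is not thread-safe.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <locale.h>
#include <string.h>

/* Return the "C"-locale message for 'errnum', regardless of the
   thread's current locale, or NULL if the locale object cannot be
   created. The string belongs to the implementation and may be
   overwritten by a later strerror_l() call in the same thread. */
static const char *
c_locale_strerror(int errnum)
{
    static locale_t c_loc = (locale_t) 0;   /* Not thread-safe init */

    if (c_loc == (locale_t) 0)
        c_loc = newlocale(LC_ALL_MASK, "C", (locale_t) 0);
    if (c_loc == (locale_t) 0)
        return NULL;

    return strerror_l(errnum, c_loc);       /* No buffer, no length */
}
```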
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This patch documents two additional flags recently introduced
for the attr.sched_flags field of sched_setattr().
Signed-off-by: Claudio Scordino <claudio@evidence.eu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove a longstanding mystery in the text of the page, by
explaining a case where the value returned for a symbol may be
NULL. (However, there are presumably other cases, since the text
in the dlsym(3) manual page pre-dates the invention of IFUNCs.)
See also
https://stackoverflow.com/questions/13941944/why-can-the-value-of-the-symbol-returned-by-dlsym-be-null
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The `:' is aligned with the traditional format of the widely used
command-line utility `chown'.
Signed-off-by: Michael Witten <mfwitten@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
groff_mdoc(7) from the groff project provides a better
equivalent of mdoc.samples(7) and the 'mandoc' project
provides a better mdoc(7). And nowadays, there are virtually
no pages in "man-pages" that use mdoc markup.
So, drop these pages.
From a conversation on linux-man with Ingo Schwarze:
[[
Subject: Re: [groff] [PATCH] man7/mdoc_samples.7: srcfix: Avoid a warning about a wrong section
Date: Wed, 27 Feb 2019 15:28:19 +0100
> The two actual problems are both within the Linux man-pages project,
> not within groff:
>
> 1. While back in the early 1990s, Cynthia Livingston's
> mdoc.samples(7) manual page was an important document and the
> de-facto language definition of the mdoc(7) language, it has
> been outdated for a long time now. The current groff_mdoc(7)
> manual page is based on it but contains large numbers of important
> improvements by Werner Lemberg and others. As an alternative
> language definition that is slightly more concise without being
> less precise and complete, the mdoc(7) manual page is available
> from the mandoc(1) distribution (mandoc.bsd.lv). If there are
> any contradictions between groff_mdoc(7) and mdoc(7), those are
> unintended and i ought to fix them.
>
> So i really believe that the Linux man-pages project ought to
> stop distributing the woefully outdated mdoc.samples(7) manual
> page. If you want to include documentation for the mdoc language,
> i suggest that you either include a copy of the current version
> of the groff_mdoc(7) manual from the groff(1) distribution or
> of the mdoc(7) manual from the mandoc(1) distribution, whichever
> you think harmonizes better with the Linux man-pages project.
> Both are BSD-style licensed, so there should be no licensing
> issues.
>
> I'm not sure whether it is better for you to include or not
> include it. There is probably value in having mdoc(7) documentation
> out of the box with the Linux man-pages project. Then again,
> having groff_mdoc(7) in both the Linux man-pages package and
> in the groff package - or having mdoc(7) in both the Linux
> man-pages project and the mandoc(1) package - might cause
> packaging conflicts for some distributions. I don't rightly
> know how such conflicts are typically handled by Linux
> distributions. Not being able to install the Linux man-pages
> project, groff(1), and mandoc(1) all together on the same
> Linux machine would certainly be a bad situation...
>
> By the way, the mdoc(7) manual page distributed by the Linux
> man-pages project also makes very little sense. It is a partial
> repetition of information from groff_mdoc(7)/[mandoc-]mdoc(7),
> but so compressed that it is mostly unintelligible. Besides,
> it is incomplete: e.g. .Lk, .Mt, .Dx, .Ox, .Nx, .Ta, .%U, .Bk,
> .Ek, .Lb, .In, .Ft, .Ms, .Brq, .Bro, .Brc, .Ex are missing -
> it seems outdated by at least 25 years. Also, some claims are
> outright wrong - for example, you *cannot* use .UR/.UE in an
> mdoc(7) document, and i cannot remember ever having seen an
> implementation of a .UN macro anywhere. Some macro descriptions
> are also wrong, e.g. .Fd is *not* intended for "function
> declarations", and .Vt is *not* "Fortran only". And so on.
>
> 2. I don't recommend keeping the old mdoc.samples(7) and mdoc(7)
> manual pages, but if you think you must do that for some reason,
> then you must at least revert this bogus commit:
I am *not at all* attached to keeping these pages. Their
presence in the project has always felt a bit anomalous to me.
Back when I took over maintainership in 2004, there were a small
number of pages that used mdoc markup, and so it seemed wise
to keep these pages. Over time, most of those few pages were
converted to 'man' markup, and today the only other page in the
project that still uses mdoc markup is queue(3). So, there is
just about zero value in having 'mdoc' documentation come with
the "Linux man-pages" box.
Since I seldom use mdoc markup myself, I've had no reason to
monitor pages such as groff_mdoc(7) or the mdoc(7) page
provided by the 'mandoc' project and compare them with
the pages provided by "Linux man-pages". Now I've had a
closer look. It's sad.
I've removed mdoc(7) and mdoc.samples(7) from "Linux man-pages".
]]
Reported-by: Ingo Schwarze <schwarze@usta.de>
Quoting Branden:
*roff escape sequences may sometimes look like C escapes, but that
is misleading. *roff is in part a macro language and that means
recursive expansion to arbitrary depths.
You can get away with "\\" in a context where no macro expansion
is taking place, but try to spell a literal backslash this way in
the argument to a macro and you will likely be unhappy with
results.
Try viewing the attached file with "man -l".
"\e" is the preferred, portable way to get an "escape
literal", going back to CSTR #54, the original Bell Labs troff
paper.
groff(7) discusses the issue:
\\ reduces to a single backslash; useful to delay its
interpretation as escape character in copy mode. For a
printable backslash, use \e, or even better \[rs], to be
independent from the current escape character.
As of groff 1.22.4, groff_man(7) does as well:
\e Widely used in man pages to represent a backslash output
glyph. It works reliably as long as the .ec request is
not used, which should never happen in man pages, and it
is slightly more portable than the more exact ‘\(rs’
(“reverse solidus”) escape sequence.
People not concerned with portability to extremely old troffs should
probably just use \(rs (or \[rs]), as it means "the backslash
glyph", not "the glyph corresponding to whatever the current escape
character is".
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Quoting Branden:
*roff systems will interpret the period in the unpatched
page as sentence-ending punctuation and put inter-sentence
spacing after it. (This might not be visible on
nroff/terminal devices, but it is more likely to be on
typesetter/PostScript/PDF output).
groff_man(7) in groff 1.22.4 attempts to throw man page
writers a bone here:
\& Zero‐width space. Append to an input line to prevent
an end‐of‐sentence punctuation sequence from being
recognized as such, or insert at the beginning of an
input line to prevent a dot or apostrophe from being
interpreted as the beginning of a roff request.
Reported-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Reported-by: G. Branden Robinson <g.branden.robinson@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use a single-font-style macro (".B", ".I") for a single argument.
Remove unneeded quotation marks (").
The output from "nroff" and "groff" is unchanged, except for the change
1) '-1' to '\-1' in the file "timegm.3"
2) to separate ',' from a word in the file "uselocale.3".
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use a single-font-style macro (".B", ".I") for a single argument.
The output from "nroff" and "groff" is unchanged, except for the
1) change of '-.' to '\- .' in the file "locale.5"
2) change of some '-' to '\-' in the file "locale.5".
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
1) Use single-font macros for a single argument.
2) Use quotation marks for arguments containing a space.
3) Use roman font for punctuation marks.
The output has only changes of the font for a punctuation mark.
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
1) Use a single capital font macro for a genuine single argument.
The output is unchanged.
2) Remove quotation marks (") around a single argument.
The output is unchanged.
3) Change ".IR ab()" to ".IR ab ()"
A font is changed in the output.
mtk: I verified that the output is unchanged (other than fonts)
by comparing the output of:
for a in *.1; do man $a >> out.txt; done
before and after the patch.
Signed-off-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
fanotify_init.2: add new flag FAN_REPORT_TID
fanotify.7: update description of member pid in
struct fanotify_event_metadata
Signed-off-by: nixiaoming <nixiaoming@huawei.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Monitor fanotify events on the entire filesystem.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
New event masks have been added to the fanotify API. Documentation to
support the use and behaviour of these new masks has been added
accordingly.
Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note EEXIST error that occurs when requesting a watch on a path
which is already watched with IN_MASK_CREATE.
Note EINVAL error also occurs when requesting a watch specifying
both IN_MASK_CREATE and IN_MASK_ADD.
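A minimal sketch that exercises both error cases, assuming Linux 4.18 or later. mask_create_errors() is a hypothetical helper; the fallback definition of IN_MASK_CREATE (0x10000000, the value in <linux/inotify.h>) is only for C libraries whose headers predate the flag.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/inotify.h>
#include <unistd.h>

#ifndef IN_MASK_CREATE
#define IN_MASK_CREATE 0x10000000   /* Value from <linux/inotify.h> */
#endif

/* Add a watch on 'path' with IN_MASK_CREATE, then try again on the
   same path (expect EEXIST), then combine IN_MASK_CREATE with
   IN_MASK_ADD (expect EINVAL). The errno values are stored in
   '*e_exist' and '*e_inval' (0 if the call unexpectedly succeeded).
   Returns 0 on success, -1 if setup failed. */
static int
mask_create_errors(const char *path, int *e_exist, int *e_inval)
{
    int fd = inotify_init1(IN_CLOEXEC);
    if (fd == -1)
        return -1;

    /* First watch on 'path': should succeed */
    if (inotify_add_watch(fd, path, IN_CREATE | IN_MASK_CREATE) == -1) {
        close(fd);
        return -1;
    }

    /* Path is already watched, so IN_MASK_CREATE must fail */
    *e_exist = (inotify_add_watch(fd, path,
                        IN_DELETE | IN_MASK_CREATE) == -1) ? errno : 0;

    /* Contradictory flags */
    *e_inval = (inotify_add_watch(fd, path,
                        IN_DELETE | IN_MASK_CREATE | IN_MASK_ADD) == -1) ? errno : 0;

    close(fd);
    return 0;
}
```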
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add documentation for new flag IN_MASK_CREATE for inotify_add_watch()
which is used to only allow new watches to be created.
Information obtained from a patch I submitted to the linux kernel
https://marc.info/?l=linux-fsdevel&m=152775980422847&w=2
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove any doubt, in case the reader might wrongly think that
objects are added in reverse order (which would mean that the
last listed object would be added at the front of the link map).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The glibc wrapper was added in glibc 2.29, released on 1 Feb 2019.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
mtk: checked also against examples in samples/bpf
in kernel source to confirm.
Signed-off-by: Oded Elisha <oded123456@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The current manpage reads to me as if the kernel will always pick
a free space close to the requested address, but that's not the
case:
mmap(0x600000000000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0x600000000000
mmap(0x600000000000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0x7f5042859000
You can also see this in the various implementations of
->get_unmapped_area() - if the specified address isn't available,
the kernel basically ignores the hint (apart from the 5level
paging hack).
Clarify how this works a bit.
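The behavior can be demonstrated without hard-coding an address. In this minimal sketch (hint_ignored_when_busy() is a hypothetical helper), a page is mapped and its address is then passed as the hint for a second mapping without MAP_FIXED. Since the kernel will not replace an existing mapping in that case, the second mapping must land somewhere else.

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if the hint was not honored (second mapping placed
   elsewhere), 0 if it was, or -1 on error */
static int
hint_ignored_when_busy(void)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    int prot = PROT_READ | PROT_WRITE;
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;

    void *a = mmap(NULL, pagesz, prot, flags, -1, 0);
    if (a == MAP_FAILED)
        return -1;

    /* 'a' is only a hint here, because MAP_FIXED is not specified */
    void *b = mmap(a, pagesz, prot, flags, -1, 0);
    if (b == MAP_FAILED) {
        munmap(a, pagesz);
        return -1;
    }

    int ignored = (b != a);
    munmap(a, pagesz);
    munmap(b, pagesz);
    return ignored;
}
```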
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Historically, at least FIFOs and pipes yielded the error EINVAL.
Reported-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
All address families are now documented in address_families.7,
which is already present in SEE ALSO section. Also, the AF_ALG
note contains a dead link to kernel HTML documentation.
Signed-off-by: Nikola Forró <nforro@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
* man5/filesystems.5 (.SH DESCRIPTION): Add a note that
information about the available file systems can be obtained
via the sysfs() system call.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It has its own man page, so it probably makes sense to mention
it here.
* man2/socket.2 (.SH DESCRIPTION): Add mention of AF_VSOCK back.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
* man2/socket.2 (.SH DESCRIPTION): Mention that the list of
address families is Linux-specific.
* man7/address_families.7 (.SH DESCRIPTION): Likewise.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
* man2/sigaction.2 (.SS Undocumented): Provide information about the
relation between the second argument of sa_handler and the
uc_mcontext field of the ucontext structure.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Some architectures do provide an 'l_sysid' declaration in
struct flock; however, it is not used anyway.
* man2/fcntl.2 (.SH NOTES): Note that l_sysid field is not used on
Linux even if present on some architectures.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I need to get the TTL of UDP datagrams from userspace, so I set
the IP_RECVTTL socket option. And as promised by ip.7, I then get
IP_TTL messages from recvfrom. However, unlike what the manpage
promises, the TTL field gets passed as a 32-bit integer.
The following userspace code works:
uint32_t ttl32;
for (cmsg = CMSG_FIRSTHDR(msgh); cmsg != NULL; cmsg = CMSG_NXTHDR(msgh, cmsg)) {
    if ((cmsg->cmsg_level == IPPROTO_IP) && (cmsg->cmsg_type == IP_TTL) &&
        CMSG_LEN(sizeof(ttl32)) == cmsg->cmsg_len) {
        memcpy(&ttl32, CMSG_DATA(cmsg), sizeof(ttl32));
        *ttl = ttl32;
        return true;
    }
    else
        cerr << "Saw something else " << (cmsg->cmsg_type == IP_TTL) <<
            ", " << (int) cmsg->cmsg_level << ", " << cmsg->cmsg_len <<
            ", " << CMSG_LEN(1) << endl;
}
The 'else' branch was used to figure out that I got the length wrong.
Note from mtk:
Reading the source code also seems to confirm this, from
net/ipv4/ip_sockglue.c:
[[
static void ip_cmsg_recv_ttl(struct msghdr *msg, struct sk_buff *skb)
{
    int ttl = ip_hdr(skb)->ttl;
    put_cmsg(msg, SOL_IP, IP_TTL, sizeof(int), &ttl);
}
]]
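For comparison, here is a minimal C sketch of the same parsing loop that reads the payload as an int, matching the sizeof(int) in the kernel code above rather than a single byte. The helper name get_ttl() is hypothetical, and the sketch assumes the caller has already received a datagram with recvmsg() on a socket with IP_RECVTTL enabled.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: scan the ancillary data of a received message for
   an IP_TTL control message and store its value in *ttl.  The kernel
   passes the TTL as an int, so the expected length is CMSG_LEN(sizeof(int)). */
static bool
get_ttl(struct msghdr *msgh, int *ttl)
{
    for (struct cmsghdr *cmsg = CMSG_FIRSTHDR(msgh); cmsg != NULL;
         cmsg = CMSG_NXTHDR(msgh, cmsg)) {
        if (cmsg->cmsg_level == IPPROTO_IP &&
            cmsg->cmsg_type == IP_TTL &&
            cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
            memcpy(ttl, CMSG_DATA(cmsg), sizeof(int));
            return true;
        }
    }
    return false;               /* no IP_TTL message of the expected size */
}
```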
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This best belongs at the end of the page, after the subsections
that already make some mention of user namespaces.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The text stated that the execve() capability transitions are not
performed for the same reasons that setuid and setgid mode bits
may be ignored (as described in execve(2)). But, that's not quite
correct: rather, the file capability sets are treated as empty
for the purpose of the capability transition calculations.
Also merge the new 'no_file_caps' kernel option text into the
same paragraph.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove crufty sentence suggesting use of deprecated capsetp(3) and
capgetp(3); the manual page for those functions has long (at least
as far back as 2007) noted that they are deprecated.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Clarify the "Capabilities and execution of programs by root"
section, and correct a couple of details:
* If a process with rUID == 0 && eUID != 0 does an exec,
the process will nevertheless gain effective capabilities
if the file effective bit is set.
* Set-UID-root programs only confer a full set of capabilities
if the binary does not also have attached capabilities.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
fs/proc/uptime.c:uptime_proc_show() fetches time using
ktime_get_boottime which includes the time spent in suspend.
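One way to observe this is sketched below, assuming /proc/uptime is not virtualized (as it may be inside some containers): the first field of /proc/uptime should track clock_gettime(CLOCK_BOOTTIME), the Linux clock that, unlike CLOCK_MONOTONIC, also counts time spent in suspend. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: first field of /proc/uptime in seconds, or -1.0. */
static double
proc_uptime(void)
{
    double up = -1.0;
    FILE *fp = fopen("/proc/uptime", "r");

    if (fp == NULL)
        return -1.0;
    if (fscanf(fp, "%lf", &up) != 1)
        up = -1.0;
    fclose(fp);
    return up;
}

/* Hypothetical helper: CLOCK_BOOTTIME in seconds, or -1.0 on error. */
static double
boottime_seconds(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_BOOTTIME, &ts) == -1)
        return -1.0;
    return ts.tv_sec + ts.tv_nsec / 1e9;
}
```

On an unvirtualized system the two values are expected to differ by well under a second.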
Signed-off-by: Stephan Knauss <linux@stephans-server.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Balbir pointed out that v1 delegation was not an accidental
feature.
Reported-by: Balbir Singh <bsingharora@gmail.com>
Reported-by: Marcus Gelderie <redmnic@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Update the bug reporting email address to that shown by
/bin/time --help
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use \(aq for ASCII apostrophes and \(ga for backtick,
as recommended by groff_man(7).
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I was reading the locale-gen bash script, looking for why I'm
getting locale errors, when I noticed that localedef's -f and -c
options are named in what I think is a very confusing way.
-c is the same as --force, and
-f charmapfile is the same as --charmap=charmapfile.
Yes, it would have been better if their names had been reversed,
like this:
-f is the same as --force, and
-c charmapfile is the same as --charmap=charmapfile.
But given what they are, I thought it would be helpful to give a
heads-up to watch for their irregular naming. I hope I've worded
it appropriately.
I'm not cc'ing this to anyone else (i.e., developers, etc.), as
these features work as described in the man page. They're just
confusing.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
According to the latest glibc, since glibc 2.26 the bsd_signal()
function is declared only when POSIX.1-2008 (or newer) is not
requested; the relevant cutoff is POSIX.1-2008, not POSIX.1-2001.
Please see the following code from signal/signal.h:
-----------------------------------------------------------------
/* The X/Open definition of `signal' conflicts with the BSD version.
So they defined another function `bsd_signal'. */
extern __sighandler_t bsd_signal (int __sig, __sighandler_t __handler)
     __THROW;
-----------------------------------------------------------------
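As a sketch of what this means for callers, assuming glibc 2.26 or later: a program that requests X/Open SUSv3 (POSIX.1-2001) with _XOPEN_SOURCE 600, rather than POSIX.1-2008, still gets the bsd_signal() declaration. The handler and helper names are hypothetical.

```c
/* Must precede every #include: selects POSIX.1-2001 / SUSv3, not POSIX.1-2008. */
#define _XOPEN_SOURCE 600
#include <signal.h>

static volatile sig_atomic_t got_signal = 0;

/* Hypothetical handler: just record that the signal arrived. */
static void
handler(int sig)
{
    (void) sig;
    got_signal = 1;
}

/* Hypothetical helper: install handler for SIGUSR1 via bsd_signal(). */
static int
install_handler(void)
{
    return bsd_signal(SIGUSR1, handler) == SIG_ERR ? -1 : 0;
}
```

Defining _XOPEN_SOURCE as 700 instead (which implies POSIX.1-2008) is expected to hide the declaration.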
Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
O_RSYNC is defined in <asm/fcntl.h> on HP PA-RISC, but is not
used anyway.
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add a note regarding other implementations of whiteout inodes
and update filesystem support information.
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This information is already summarized in syscall(2), so there's
no need to repeat it in each page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Some architectures (ab)use a second return-value register to return
an additional value from some system calls. Let's describe this.
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Clarify that SO_PASSCRED results in SCM_CREDENTIALS data in each
subsequently received message.
See https://bugzilla.kernel.org/show_bug.cgi?id=201805
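The behavior can be checked with a minimal sketch like the following, assuming a Linux AF_UNIX datagram socket pair: the sender attaches no ancillary data, yet because the receiver enabled SO_PASSCRED, the received message carries SCM_CREDENTIALS whose pid is the sender's. The helper name received_sender_pid() is hypothetical.

```c
#define _GNU_SOURCE             /* for struct ucred and SCM_CREDENTIALS */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: return the sender PID found in SCM_CREDENTIALS on
   a message received with SO_PASSCRED enabled, or -1 if none arrived. */
static pid_t
received_sender_pid(void)
{
    int sv[2], on = 1;
    pid_t pid = -1;
    char data;
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union {                     /* guarantees cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;

    /* Only the receiver sets SO_PASSCRED; the sender sends plain data. */
    if (setsockopt(sv[1], SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) == 0 &&
        write(sv[0], "x", 1) == 1 &&
        recvmsg(sv[1], &msg, 0) == 1) {
        for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL;
             c = CMSG_NXTHDR(&msg, c)) {
            if (c->cmsg_level == SOL_SOCKET &&
                c->cmsg_type == SCM_CREDENTIALS) {
                struct ucred cred;
                memcpy(&cred, CMSG_DATA(c), sizeof(cred));
                pid = cred.pid;
            }
        }
    }

    close(sv[0]);
    close(sv[1]);
    return pid;
}
```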
Reported-by: Felipe Gasper <felipe@felipegasper.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As reported by Nadav Har'El in
https://bugzilla.kernel.org/show_bug.cgi?id=197961
The write(2) manual page has this paragraph:
"On success, the number of bytes written is returned
(zero indicates nothing was written). It is not an error
if this number is smaller than the number of bytes
requested; this may happen for example because the disk
device was filled. See also NOTES."
I find a few problems with this paragraph:
1. It's not clear what "See also NOTES." refers to (does it
refer to anything?). What in the NOTES is relevant here?
2. The paragraph seems to suggest that write(2) of a
non-empty buffer may sometimes return even 0 in case of an
error like the device being filled. I think this is wrong
- if there was an error after already writing some number
of bytes, this non-zero number is returned. But if there's
an error before writing any bytes, -1 will be returned
(and the error reason in errno) - 0 will not be returned
unless the given count is 0 (that case is explained in the
following paragraph).
3. The paragraph doesn't explain what a user should do
after a short write (i.e., write(2) returning less than
count). How would the user know why there was an error, or
if there even was one? I think users should be told what
to do next because this information is part of how to use
this API correctly. I think users should be told to retry
the rest of the write (i.e., write(fd, buf+ret, count-ret)
and this will either succeed in writing some more data if
the error reason was solved, or the second write will
return -1 and the error reason in errno.
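The retry strategy suggested in point 3 can be sketched as follows. This is one plausible helper, not text from the manual page; the name write_all() is hypothetical, and the sketch additionally assumes that EINTR should simply be retried.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all 'count' bytes, continuing after short
   writes.  Returns 0 on success, or -1 with errno set by the write() call
   that failed without making progress. */
static int
write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;

    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted before writing anything */
            return -1;          /* the error reason is in errno */
        }
        /* A short write is not an error: advance past what was written. */
        p += n;
        count -= (size_t) n;
    }
    return 0;
}
```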
Reported-by: Nadav Har'El <nyh@math.technion.ac.il>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
ENOATTR is not a standard error code, but rather one that is
defined in 'libattr' as a synonym for ENODATA. The manual pages
should use the error code actually returned by the kernel APIs.
See also https://bugzilla.kernel.org/show_bug.cgi?id=201995
Reported-by: Enrico Scholz <enrico.scholz@sigma-chemnitz.de>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Alpha uses v0 (i.e., $0) as the return-value register in both the
syscall ABI and the C ABI.
see also
https://github.com/torvalds/linux/blob/master/arch/alpha/kernel/entry.S#L479
The normal Alpha C ABI uses a0~a5 to pass arguments and v0 as
the return-value register. See here
https://www2.cs.arizona.edu/projects/alto/Doc/local/alpha.register.html
The syscall ABI uses v0 as the trap number, a0~a5 to pass arguments,
and a3 as a boolean indicator of whether an error occurred.
The libc syscall wrapper implementation can also be seen at
https://code.woboq.org/userspace/glibc/sysdeps/unix/sysv/linux/alpha/syscall.S.html
v0 is the register normally used for the return value, and the
return processing does nothing with a0, which is the (wrong)
register given in the current syscall(2) description.
P.S. I found this wrong description because I'm porting Go gc to
a new CPU architecture which is similar to Alpha, and I used the
wrong register at first; I then inspected the kernel code and
objdump output to confirm the correct syscall ABI.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Back in 2014 (37bee118ad) the text
describing when multiplexing happens was changed in a confusing way.
This is an attempt to clarify things a bit.
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since 93e06c7a6453 ("mm: enable MADV_FREE for swapless system") we
handle MADV_FREE on a swapless system the same way as with the
swap available. Clarify that fact in the man page.
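A minimal sketch of the call in question, assuming Linux 4.5 or later (where MADV_FREE was introduced): per the commit cited above, the outcome does not depend on whether swap is configured. The helper name dirty_then_free() is hypothetical.

```c
#define _DEFAULT_SOURCE         /* MAP_ANONYMOUS, MADV_FREE */
#include <errno.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: dirty an anonymous mapping, then mark it MADV_FREE.
   Returns madvise()'s result, preserving its errno across munmap(). */
static int
dirty_then_free(size_t len)
{
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xaa, len);               /* make the pages dirty */
    int rc = madvise(p, len, MADV_FREE);
    int saved_errno = errno;

    /* From here the kernel may reclaim the pages lazily: a later read sees
       either the old data or zero-filled pages, and a write to a page
       cancels its pending free. */
    munmap(p, len);
    errno = saved_errno;
    return rc;
}
```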
Reported-by: Niklas Hambüchen <mail@nh2.me>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove the old statement that PTRACE_O_TRACESYSGOOD may not work
on all architectures. As far as I can tell, all kernel code
properly tests the PT_TRACESYSGOOD flag and sets bit 7 (0x80) in the
exit code passed to ptrace_notify().
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
To test the behavior documented by this patch, the following
demos employ the program shown at the foot of this commit message.
First, show that the pdeath signal is sent when the parent
terminates:
$ ./pdeath_signal 0 10 4
Parent (18595) about to sleep for 4 seconds
Child about to set PR_SET_PDEATHSIG
Child about to sleep
Parent (18595) terminating
*********** Child (18596) got signal; si_pid = 18595; si_uid = 1000
Parent PID is now 1403
$ Child about to exit
But the signal is not sent if the parent terminates before the
child uses PR_SET_PDEATHSIG:
$ ./pdeath_signal 2 10 0
Parent (18707) about to sleep for 0 seconds
Parent (18707) terminating
Child about to sleep 2 seconds before setting PR_SET_PDEATHSIG
$ Child about to set PR_SET_PDEATHSIG
Child about to sleep
Child about to exit
Demonstrate that the pdeath signal is sent on termination of each
ancestor subreaper process:
$ ./pdeath_signal 2 10 3 7 6 5
18786 marked itself as a subreaper
18786 subreaper about to sleep 7 seconds
18787 marked itself as a subreaper
18787 subreaper about to sleep 6 seconds
18788 marked itself as a subreaper
18788 subreaper about to sleep 5 seconds
Parent (18789) about to sleep for 3 seconds
Child about to sleep 2 seconds before setting PR_SET_PDEATHSIG
Child about to set PR_SET_PDEATHSIG
Child about to sleep
Parent (18789) terminating
*********** Child (18790) got signal; si_pid = 18789; si_uid = 1000
Parent PID is now 18788
18788 subreaper about to terminate
*********** Child (18790) got signal; si_pid = 18788; si_uid = 1000
Parent PID is now 18787
18787 subreaper about to terminate
*********** Child (18790) got signal; si_pid = 18787; si_uid = 1000
Parent PID is now 18786
18786 subreaper about to terminate
*********** Child (18790) got signal; si_pid = 18786; si_uid = 1000
Parent PID is now 1403
$ Child about to exit
But in the case where some subreapers terminate before they
have a chance to adopt the child, the terminations of those
subreapers do not result in a signal for the child:
$ ./pdeath_signal 2 10 3 5 6 7
18836 marked itself as a subreaper
18836 subreaper about to sleep 5 seconds
18837 marked itself as a subreaper
18837 subreaper about to sleep 6 seconds
18838 marked itself as a subreaper
18838 subreaper about to sleep 7 seconds
Parent (18839) about to sleep for 3 seconds
Child about to sleep 2 seconds before setting PR_SET_PDEATHSIG
Child about to set PR_SET_PDEATHSIG
Child about to sleep
Parent (18839) terminating
*********** Child (18840) got signal; si_pid = 18839; si_uid = 1000
Parent PID is now 18838
18836 subreaper about to terminate
$ 18837 subreaper about to terminate
18838 subreaper about to terminate
*********** Child (18840) got signal; si_pid = 18838; si_uid = 1000
Parent PID is now 1403
Child about to exit
============================
/* pdeath_signal.c */

#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/prctl.h>
#include <unistd.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)

static void
handler(int sig, siginfo_t *si, void *ucontext)
{
    /* printf() is not async-signal-safe; tolerable only in a demo */
    printf("*********** Child (%ld) got signal; si_pid = %d; si_uid = %d\n",
            (long) getpid(), si->si_pid, si->si_uid);
    printf(" Parent PID is now %ld\n", (long) getppid());
}

int
main(int argc, char *argv[])
{
    struct sigaction sa;
    int childPreSleep;
    int childPostSleep = 0, parentSleep = 0;    /* defaults if omitted */

    if (argc < 2) {
        fprintf(stderr, "Usage: %s child-pre-sleep "
                "[child-post-sleep [parent-sleep [subreaper-sleep...]]]\n",
                argv[0]);
        exit(EXIT_FAILURE);
    }

    childPreSleep = atoi(argv[1]);
    if (argc > 2)
        childPostSleep = atoi(argv[2]);
    if (argc > 3)
        parentSleep = atoi(argv[3]);

    /* Optionally create a series of subreapers */

    if (argc > 4) {
        for (int sr = 4; sr < argc; sr++) {
            if (prctl(PR_SET_CHILD_SUBREAPER, 1) == -1)
                errExit("prctl");
            printf("%ld marked itself as a subreaper\n", (long) getpid());

            switch (fork()) {
            case -1:
                errExit("fork");
            case 0:
                break;
            default:
                printf("%ld subreaper about to sleep %s seconds\n",
                        (long) getpid(), argv[sr]);
                sleep(atoi(argv[sr]));
                printf("%ld subreaper about to terminate\n", (long) getpid());
                exit(EXIT_SUCCESS);
            }
        }
    }

    switch (fork()) {
    case -1:
        errExit("fork");

    case 0:
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sa.sa_sigaction = handler;
        if (sigaction(SIGUSR1, &sa, NULL) == -1)
            errExit("sigaction");

        if (childPreSleep > 0) {
            printf("Child about to sleep %d seconds before setting "
                    "PR_SET_PDEATHSIG\n", childPreSleep);
            sleep(childPreSleep);
        }

        printf("Child about to set PR_SET_PDEATHSIG\n");
        if (prctl(PR_SET_PDEATHSIG, SIGUSR1) == -1)
            errExit("prctl");

        printf("Child about to sleep\n");
        for (int j = 0; j < childPostSleep; j++)
            sleep(1);       /* 1-second steps: a signal cuts sleep() short */

        printf("Child about to exit\n");
        exit(EXIT_SUCCESS);

    default:
        printf("Parent (%ld) about to sleep for %d seconds\n",
                (long) getpid(), parentSleep);
        sleep(parentSleep);

        printf("Parent (%ld) terminating\n", (long) getpid());
        exit(EXIT_SUCCESS);
    }
}
Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The signal is process-directed and the siginfo_t->si_pid
field contains the PID of the terminating parent.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
ptrace() with requests PTRACE_PEEKTEXT, PTRACE_PEEKDATA and
PTRACE_PEEKUSER can set errno to zero. AFAICS this is for a good
reason (so that you can tell the difference between a successful
PEEK with a result of -1 and a failed PEEK, even if you forget to
clear errno yourself), but it technically violates the rules
described in the errno.3 manpage.
glibc snippet from sysdeps/unix/sysv/linux/ptrace.c:
res = INLINE_SYSCALL (ptrace, 4, request, pid, addr, data);
if (res >= 0 && request > 0 && request < 4)
  {
    __set_errno (0);
    return ret;
  }
reproducer:
$ cat ptrace_test.c
#include <err.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

char foobar_data[4] = "ABCD";

int main(void) {
    pid_t child = fork();
    if (child == -1) err(1, "fork");
    if (child == 0) {
        if (prctl(PR_SET_PDEATHSIG, SIGKILL)) err(1, "prctl");
        while (1) sleep(1);
    }
    int status;
    if (ptrace(PTRACE_ATTACH, child, NULL, NULL)) err(1, "attach");
    if (waitpid(child, &status, 0) != child) err(1, "wait");
    errno = EINVAL;
    unsigned int res = ptrace(PTRACE_PEEKDATA, child, foobar_data, NULL);
    printf("errno after PEEKDATA: %d\n", errno);
    printf("PEEKDATA result: 0x%x\n", res);
}
$ gcc -o ptrace_test ptrace_test.c -Wall
$ ./ptrace_test
errno after PEEKDATA: 0
PEEKDATA result: 0x44434241
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See copy_process() in kernel/fork.c:
if (clone_flags & CLONE_THREAD) {
    if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) ||
        (task_active_pid_ns(current) !=
         current->nsproxy->pid_ns_for_children))
        return ERR_PTR(-EINVAL);
}
current->nsproxy->pid_ns_for_children is where unshare(CLONE_NEWPID)
stashes the pending namespace.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use "FAN_OPEN_PERM" consistently rather than "FAN_PERM_OPEN".
Signed-off-by: Anthony Iliopoulos <ailiopoulos@suse.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See Linux commit a5ad88ce8c7fae7ddc72ee49a11a75aa837788e0,
"mm: get rid of 'vmalloc_info' from /proc/meminfo".
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Because of setns() semantics, the parent of a process may reside
in the outer PID namespace. If that parent terminates, then the
child is adopted by the "init" in the outer PID namespace (rather
than the "init" of the PID namespace of the child).
Thus, in a scenario such as the following, if process M
terminates, P is adopted by the init process in the initial
PID namespace, and if P terminates, Q is adopted by the init
process in the inner PID namespace.
+---------------------------------------------+
| Initial PID NS                              |
|                       +---------------+     |
|  +-+                  | inner PID NS  |     |
|  |1|                  |               |     |
|  +-+                  |  +-+          |     |
|                       |  |1|          |     |
|                       |  +-+          |     |
|                       |               |     |
|  +-+ setns(), fork()  |  +-+          |     |
|  |M|------------------+->|P|          |     |
|  +-+                  |  +-+          |     |
|                       |   | fork()    |     |
|                       |   v           |     |
|                       |  +-+          |     |
|                       |  |Q|          |     |
|                       |  +-+          |     |
|                       +---------------+     |
+---------------------------------------------+
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The extra detail noting that a 2.6.0-test kernel
added a particular feature has little value these days,
and is likely to confuse some readers who don't know
(and probably don't care) about the historical details.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Checking the FreeBSD source code, there's explicit support for
this to accommodate non-BSD systems (such as Linux).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Having the signals listed in three different tables reduces
readability, and would require more table splits if future
standards specify other signals.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The current tables of signal information are unwieldy,
as they try to cram in too much information.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The notes in pthread_rwlockattr_setkind_np.3 imply there is a bug
in glibc's implementation of PTHREAD_RWLOCK_PREFER_WRITER_NP (a
non-portable constant anyway), but this is not true. The
implementation of PTHREAD_RWLOCK_PREFER_WRITER_NP is made almost
impossible by the POSIX standard requirement that reader locks be
allowed to be recursive, and that requirement makes writer
preference deadlock without an impossibly complex requirement that
we track all reader locks. Therefore the only sensible solution
was to add PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP and
disallow recursive reader locks if you want writer preference.
This patch removes the bug description and documents the current
state and recommendations for glibc. I have also updated bug 7057
with this information, answering Steven Munroe's almost-10-year-old
question :-) I hope Steven is enjoying his well-earned
retirement.
Should we move the glibc discussion to some footnote? Some libc
may be able to implement the requirement to avoid deadlocks in the
future, but I doubt it (fundamental CS stuff).
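Assuming glibc, the recommendation above amounts to something like the following sketch: request writer preference through the nonrecursive kind, and never reacquire a read lock in a thread that already holds one. The helper name is hypothetical.

```c
#define _GNU_SOURCE             /* pthread_rwlockattr_setkind_np() */
#include <pthread.h>

/* Hypothetical helper: initialize 'lock' to prefer writers.  With this
   kind, a thread holding a read lock must not take it again, or it may
   deadlock behind a waiting writer. */
static int
init_writer_preferring_rwlock(pthread_rwlock_t *lock)
{
    pthread_rwlockattr_t attr;
    int rc = pthread_rwlockattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_rwlockattr_setkind_np(&attr,
            PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
    if (rc == 0)
        rc = pthread_rwlock_init(lock, &attr);
    pthread_rwlockattr_destroy(&attr);
    return rc;
}
```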
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The previous location does not seem to be getting updated.
(For example, at the time of this commit, libcap-2.26
had been out for two months, but was not present at
http://www.kernel.org/pub/linux/libs/security/linux-privs.)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Use "UFFDIO_ZEROPAGE" consistently rather than "UFFDIO_ZERO".
Signed-off-by: Anthony Iliopoulos <ailiopoulos@suse.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Based on text from Documentation/filesystems/ramfs-rootfs-initramfs.txt.
Signed-off-by: Elvira Khabirova <lineprinter@altlinux.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The strfry(3) function does not use rand(). The original version
from 1995 did, but it was changed to use a different PRNG in glibc
commit 4770745624b7f7f25623f1f10d46a4c4d6aec25c, 1996-12-04.
This C program demonstrates the behavior. Because it does not call
srand(), rand() returns the same values on every run, but
strfry() returns a different value each time the program is run.
If strfry() called srand(), it would alter the sequence of numbers
returned by rand().
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    printf("%d\n", rand());
    char alphabet[] = "abcdefghijklmnopqrstuvwxyz";
    puts(strfry(alphabet));
    printf("%d\n", rand());
}
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
x86 and ARM are the most common architectures, but currently
are in the second subfield in the signal number lists.
Instead, swap that info with subfield 1, so the most
common architectures are first in the list.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This patch adds the signal numbers for parisc to the signal(7) man page.
Those parisc-specific values for the various signals are valid since the
Linux kernel upstream commit ("parisc: Reduce SIGRTMIN from 37 to 32 to
behave like other Linux architectures") during development of kernel 3.18:
http://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f25df2eff5b25f52c139d3ff31bc883eee9a0ab
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Initially it was planned that the parisc linux port would natively
support 32-bit HP-UX binaries, but this compatibility was never
fully achieved and was finally dropped with Linux kernel 3.14.
With that background, drop parisc from the list of platforms
that support their vendor's proprietary operating system.
Additional notes from mtk:
The most relevant commit from the Linux 3.14 change log was:
[[
commit f5a408d53edef3af07ac7697b8bc54a755628450
Author: Guy Martin <gmsoft@tuxicoman.be>
Date: Thu Jan 16 17:17:53 2014 +0100
parisc: Make EWOULDBLOCK be equal to EAGAIN on parisc
On Linux, only parisc uses a different value for EWOULDBLOCK which
causes a lot of troubles for applications not checking for both values.
Since the hpux compat is long dead, make EWOULDBLOCK behave the same as
all other architectures.
]]
Additional notes from Helge:
The patch above is the initial and most important one with which
we stopped the HP-UX compatibility.
Then, with this commit in kernel 3.18 there is no way back:
"parisc: Reduce SIGRTMIN from 37 to 32 to behave like
other Linux architectures"
commit 1f25df2eff5b25f52c139d3ff31bc883eee9a0ab
And in kernel 4.0 we finally dropped the HP-UX compat layer
from Linux kernel source code with the commit series
"parisc: hpux - Drop support for HP-UX binaries":
commit 04c1614977168fb8f002e2d81f704eeabe0c5ebd
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
On parisc one needs to take care of the 32-bit calling conventions
with 64-bit syscall parameters on a 32-bit kernel. So on parisc we
suffer from the same issues as ARM, PowerPC and Xtensa.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Refer to uconv(1) in the iconv(1) manual page; it is helpful for
transliterating, e.g., Cyrillic to Latin:
echo <some-cyrillic-text> | uconv -f UTF-8 -t UTF-8 -x cyrillic-latin
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There was probably a little too much detail in
Lukas Werkmeister's patch. Simplify, by removing a few
file systems, and arrange the information as a bulleted
list for easier readability.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The RENAME_NOREPLACE flag was added with the initial release of the
renameat2 syscall in Linux 3.15, but support for most filesystems was
only added in later versions, and some may still not support it.
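Callers should therefore be prepared for RENAME_NOREPLACE to fail on some filesystems. The sketch below, with a hypothetical helper name, invokes the raw system call (so it does not depend on the renameat2() wrapper that glibc added in version 2.28) and lists the errors a caller is assumed to need to handle; the fallback RENAME_NOREPLACE definition mirrors the kernel's value.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>              /* AT_FDCWD, open() */
#include <sys/syscall.h>
#include <unistd.h>

#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE (1 << 0)       /* value from <linux/fs.h> */
#endif

/* Hypothetical helper: rename only if 'newpath' does not exist.  Returns
   0, or -1 with errno EEXIST (target exists), EINVAL (filesystem lacks
   RENAME_NOREPLACE support) or ENOSYS (kernel older than 3.15). */
static int
rename_noreplace(const char *oldpath, const char *newpath)
{
    return (int) syscall(SYS_renameat2, AT_FDCWD, oldpath,
                         AT_FDCWD, newpath, RENAME_NOREPLACE);
}
```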
Signed-off-by: Lucas Werkmeister <mail@lucaswerkmeister.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This doesn't actually matter on any C library I know of --- they
all just do a NULL check and forward to fclose(3). (The actual
mistake I saw was someone not realizing that they had to call
*anything*.)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since adding checking to Android's bionic for file descriptor
double-closes, we've found that the most common cause of these
bugs is incorrect use of fileno(3). There appears to be a common
misconception that it transfers ownership of the file descriptor
to the caller.
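When a caller needs a descriptor that outlives the stream, one correct pattern is to duplicate the descriptor rather than take over the one fileno() reports. A minimal sketch, with the hypothetical helper name detach_fd():

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: return a descriptor owned by the caller, then
   close the stream.  fileno() does not transfer ownership; fclose()
   still closes the stream's own descriptor, so we duplicate it first. */
static int
detach_fd(FILE *fp)
{
    int fd = dup(fileno(fp));   /* independent copy for the caller */

    if (fclose(fp) == EOF && fd != -1) {
        close(fd);              /* fclose() reported an error */
        return -1;
    }
    return fd;
}
```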
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The original implementation of PR_SET_MM_EXE_FILE only allowed it
to be used once in a process's lifetime. This restriction was
lifted in Linux commit 3fb4afd9a504c2386b8435028d43283216bf588e
("prctl: remove one-shot limitation for changing exe link").
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Ian Turner: The exact return values are at the discretion of the
underlying VFS, but I'm pretty sure that EINTR is a possibility.
Or, if it's not, then the flock() manpage should be amended
accordingly, since the two share the same underlying
implementation.
mtk: lockf(3) is implemented on top of fcntl() locking, so
EINTR is of course a possibility.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Mention that the named constants (SECBIT_KEEP_CAPS and others)
are available only if the linux/securebits.h user-space header
is included.
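For example, a sketch like the following needs <linux/securebits.h> for SECBIT_KEEP_CAPS. It relies on PR_SET_KEEPCAPS manipulating the same SECBIT_KEEP_CAPS bit that PR_GET_SECUREBITS reports, which matches the kernel's implementation; the helper name is hypothetical.

```c
#include <linux/securebits.h>   /* SECBIT_KEEP_CAPS and friends */
#include <sys/prctl.h>

/* Hypothetical helper: set "keep capabilities" with PR_SET_KEEPCAPS and
   report whether SECBIT_KEEP_CAPS is now set (1), not set (0), or -1 on
   error (e.g., EPERM if SECBIT_KEEP_CAPS_LOCKED is set). */
static int
keepcaps_via_securebits(void)
{
    if (prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0) == -1)
        return -1;

    int bits = prctl(PR_GET_SECUREBITS, 0, 0, 0, 0);
    if (bits == -1)
        return -1;
    return (bits & SECBIT_KEEP_CAPS) != 0;
}
```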
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Make zic.8 a copy of the upstream tzdb version, except that
the tzdb version's first line is replaced by man-pages
boilerplate, and omit features introduced after 2017b
(the most recent merge to glibc).
This has the following effect:
Document --version, --help.
Document new -v warnings.
Remove -y.
Document that input should be text files, and similar restrictions
on names.
Document negative DST.
Document what is meant by "white space".
Do some minor reformatting.
Use .B for as-is keywords, like commands.
New section "EXTENDED EXAMPLE".
Omit some changes that were made on the man-pages side, notably by
changing some "timezone"s back to the preferred-upstream "time
zone" when talking about traditional time zones as opposed to
POSIX timezone settings. Also, fix some formatting glitches.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Make zdump.8 a copy of the upstream tzdb version, except that
the tzdb version's first line is replaced by man-pages
boilerplate.
This has the following effect:
Document new options -i, -t, -V.
New section LIMITATIONS.
Do some minor reformatting.
Omit some changes that were made on the man-pages side, notably by
changing some "timezone"s back to the preferred-upstream "time
zone" when talking about traditional time zones as opposed to
POSIX timezone settings. Also, fix some formatting glitches.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Make tzfile.5 a copy of the upstream tzdb version, except that
the tzdb version's first line is replaced by man-pages
boilerplate.
This has the following effect:
Do some minor spec fixes, notably about time type 0
and empty TZ strings. Omit some changes that were made on the
man-pages side, notably by changing some "timezone"s back to the
preferred-upstream "time zone" when talking about traditional
time zones as opposed to POSIX timezone settings.
Also, fix some formatting glitches.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The eBPF subsystem on Linux can use "helper functions", functions
implemented in the kernel that can be called from within an eBPF program
injected by a user on Linux. The kernel already supports a long list of
such helpers (sixty-seven at this time, new ones are under review).
Therefore, it is proposed to create a new manual page, separate from
bpf(2), to document those helpers for people who want to develop new eBPF
programs.
Additionally, in an effort to keep this documentation in synchronisation
with what is implemented in the kernel, it is further proposed to keep
the documentation itself in the kernel sources, as comments in file
"include/uapi/linux/bpf.h", and to generate the man page from there.
This patch adds the new man page, generated from kernel sources, to the
man-pages repository. For each eBPF helper function, a description of
the helper, of its arguments and of the return value is provided. The
idea is that all future changes for this page should be redirected to
the kernel file "include/uapi/linux/bpf.h", and the modified page
generated from there.
Generating the page itself is a two-step process. First, the
documentation is extracted from include/uapi/linux/bpf.h, and converted
to an RST (reStructuredText-formatted) page, with the relevant script
from Linux sources:
$ ./scripts/bpf_helpers_doc.py > /tmp/bpf-helpers.rst
The second step consists of turning the RST document into the final man
page, with rst2man:
$ rst2man /tmp/bpf-helpers.rst > bpf-helpers.7
The bpf.h file was taken as of kernel 4.19.
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This just adds to the point made by Marcus Gelderie's patch. Note
also that SECBIT_KEEP_CAPS provides the same functionality as the
prctl() PR_SET_KEEPCAPS flag, and the prctl(2) manual page has the
correct description of the semantics (i.e., that the flag affects
the treatment of only the permitted capability set).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The description of SECBIT_KEEP_CAPS is misleading about the
effects on the effective capabilities of a process during a
switch to nonzero UIDs. The effective set is cleared based on
the effective UID switching to a nonzero value, even if
SECBIT_KEEP_CAPS is set. However, with this bit set, the
effective and permitted sets are not cleared if the real and
saved set-user-ID are set to nonzero values.
This was tested using the following C code and reading the kernel
source at security/commoncap.c: cap_emulate_setxuid.
#define _GNU_SOURCE             /* for getresuid() */
#include <linux/securebits.h>   /* SECBIT_KEEP_CAPS */
#include <stdio.h>
#include <string.h>
#include <sys/capability.h>     /* libcap; link with -lcap */
#include <sys/prctl.h>
#include <unistd.h>

void print_caps(void) {
    cap_t current = cap_get_proc();
    if (!current) {
        perror("Current caps");
        return;
    }
    char *text = cap_to_text(current, NULL);
    if (!text) {
        perror("Converting caps to text");
        goto free_caps;
    }
    printf("Capabilities: %s\n", text);
    cap_free(text);
free_caps:
    cap_free(current);
}

void print_creds(void) {
    uid_t ruid, suid, euid;
    if (getresuid(&ruid, &euid, &suid)) {
        perror("Error getting UIDs");
        return;
    }
    printf("real = %ld, effective = %ld, saved set-user-ID = %ld\n",
           (long) ruid, (long) euid, (long) suid);
}

void set_caps(int size, const cap_value_t *caps) {
    cap_t current = cap_init();     /* all sets start out empty */
    if (!current) {
        perror("Error allocating caps");
        return;
    }
    if (cap_clear(current)) {
        perror("Error clearing caps");
    }
    if (cap_set_flag(current, CAP_INHERITABLE, size, caps, CAP_SET)) {
        perror("setting caps");
        goto free_caps;
    }
    if (cap_set_flag(current, CAP_EFFECTIVE, size, caps, CAP_SET)) {
        perror("setting caps");
        goto free_caps;
    }
    if (cap_set_flag(current, CAP_PERMITTED, size, caps, CAP_SET)) {
        perror("setting caps");
        goto free_caps;
    }
    if (cap_set_proc(current)) {
        perror("Committing caps");
        goto free_caps;
    }
free_caps:
    cap_free(current);
}

const cap_value_t caps[] = {CAP_SETUID, CAP_SETPCAP};
const size_t num_caps = sizeof(caps) / sizeof(cap_value_t);

int main(int argc, char **argv) {
    puts("[+] Dropping most capabilities to reduce amount of console output...");
    set_caps(num_caps, caps);
    puts("[+] Dropped capabilities. Starting with these credentials and capabilities:");
    print_caps();
    print_creds();
    if (argc >= 2 && 0 == strncmp(argv[1], "keep", 4)) {
        puts("[+] Setting SECBIT_KEEP_CAPS bit");
        if (prctl(PR_SET_SECUREBITS, SECBIT_KEEP_CAPS, 0, 0, 0)) {
            perror("Setting secure bits");
            return 1;
        }
    }
    puts("[+] Setting effective UID to 1000");
    if (seteuid(1000)) {
        perror("Error setting effective UID");
        return 2;
    }
    print_caps();
    print_creds();
    puts("[+] Raising caps again");
    set_caps(num_caps, caps);
    print_caps();
    print_creds();
    puts("[+] Setting all remaining UIDs to nonzero values");
    if (setreuid(1000, 1000)) {
        perror("Error setting all UIDs to 1000");
        return 3;
    }
    print_caps();
    print_creds();
    return 0;
}
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note that LIRC_GET_FEATURES is the only ioctl() which is always
supported now that there are send-only devices.
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There are no drivers that support LIRC_MODE_LIRCCODE any more;
those drivers were in the kernel staging area, so they were
never part of the mainline kernel.
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Prefer the word "owns" rather than "associated with" when
describing the relationship between user namespaces and non-user
namespaces. The existing text used a mix of the two terms, with
"associated with" being predominant, but to my ear, describing the
relationship as "ownership" is more comprehensible.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Fix broken example code in the vcs.4 man page
- use of wrong variable (attrib, which is uninitialised, instead of s)
- variable ch too narrow
- printing a font char index with %c, as if it were ASCII (it's not)
- removing the high font bit while changing the background colour
- unwarranted assumption of little-endian byte order
Also be friendly and use SEEK_* instead of numbers.
Reported-by: Michael Witten <mfwitten@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The lirc header file included ioctls and feature bits which were
never implemented by any driver. They were removed in kernel
commit d55f09abe24b4dfadab246b6f217da547361cdb6
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Reported-by: Alec Leamas <leamas.alec@gmail.com>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
[I got two patches for this; the other from Florian Weimer]
According to the following kernel code, preadv2(2)/pwritev2(2) with
an unknown flag actually return EOPNOTSUPP rather than EINVAL:
----------------------------------------------------------------
static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
{
if (unlikely(flags & ~RWF_SUPPORTED)) {
return -EOPNOTSUPP;
}
...
}
static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
loff_t *ppos, int type, rwf_t flags)
{
...
if (flags & ~RWF_HIPRI)
return -EOPNOTSUPP;
...
}
Reported-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This is in effect a revert of
commit 1391278030
Reported-by: Alexander E. Patrakov <patrakov@gmail.com>
Reported-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This fixes three misspellings of EACCES (the correct errno name
has only one "S").
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add text to CONFORMING TO explaining that the "_np"
suffix is because these functions are non-portable.
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Introduced by Linux commit v4.12-rc1~64^3~304^2~1.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The list of address families in this page is still
overwhelmingly long. So let's shorten it.
The removed entries are all in address_families(7).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There is too much detail in socket(2). Move most of it into
a new page instead.
Cowritten-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add some information about some other address families present in
<linux/socket.h>.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As truncate(3) should dispatch between truncate/truncate64,
as noted later in the page.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note that the clone() definition on IA-64 is the same as on
SH/Tile/Alpha, align the __clone2 declarations with the previous
ones, and add a prototype for the clone2 system call.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Perhaps some people might misunderstand memory allocated by
alloca() to be like other memory allocated on the stack: that when
the allocation (or the pointer to the allocation) goes out of
scope, the memory is freed. Add some text to prevent that
misunderstanding.
Reported-by: Robin Kuzmin <kuzmin.robin@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
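A minimal sketch of the behavior the new text describes, assuming the usual glibc/GCC semantics of alloca(): storage obtained inside an inner block is not released when that block ends, only when the calling function returns. The function name `alloca_in_block` is hypothetical.

```c
#include <alloca.h>
#include <string.h>

/* Hypothetical demo: copy s into alloca() storage obtained inside an
   inner block, then use that storage after the block has ended. */
static size_t alloca_in_block(const char *s)
{
    char *p;

    {
        p = alloca(strlen(s) + 1);
        strcpy(p, s);
    }   /* Leaving the block does NOT free the alloca() storage */

    /* Still valid here; it is released only when this function returns.
       By the same rule, calling alloca() in a loop keeps growing the
       stack frame until the function returns. */
    return strlen(p);
}
```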
Clarify the example by making an implied detail more explicit.
Quoting Troy Engel on the problem with the original text:
The problem is "and a process in a sibling cgroup (sub2)"
(shown as PID 20124 here) - how did this get here? How do I
recreate this? Following this example, there's no mention of
how, it's out of place when following the instructions.
There is nothing in any of the cgroup files which contain
this (# grep freezer /proc/*/cgroup) while at this stage.
The intent is understood, however the man page seems to skip
a step to create this in the teaching example. We should add
whatever simple steps are needed to create the "process in a
sibling cgroup" as outlined so it makes sense - as written,
I have no clue where "sibling cgroup (sub2)" came from, it
just appeared out of the blue in that step. Thanks!
See https://bugzilla.kernel.org/show_bug.cgi?id=201047
Reported-by: Troy Engel <troyengel@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The intended text was hidden elsewhere in the source of the
page as a comment.
https://bugzilla.kernel.org/show_bug.cgi?id=201029
Reported-by: Mike Weilgart <mike.weilgart@verticalsysadmin.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In particular, it is possible to write "threaded" to a
cgroup.type file if the current type is "domain threaded".
Previously, the text had implied that this was not possible.
Verified by experiment on Linux 4.15 and 4.19-rc.
Reported-by: Leah Hanson <lhanson@pivotal.io>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After clone(CLONE_NEWPID), /proc/PID/ns/pid_for_children is empty
until the first child is created. Verified by experiment.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There is never an acceptable reason to use these functions in
new code. For example, just observe the implementation of the
KDF:
/*
* Turn password into DES key
*/
void
passwd2des_internal (char *pw, char *key)
{
int i;
memset (key, 0, 8);
for (i = 0; *pw && i < 8; ++i)
key[i] ^= *pw++ << 1;
des_setparity (key);
}
This kind of nonsense isn't okay in the year 2017. Therefore, we
enlighten our poor users.
[Note from mtk: I think Jason knows that of which he talks.]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The kernel doesn't allow unsharing a pid NS if it has previously been
unshared, per this check in copy_pid_ns:
if (task_active_pid_ns(current) != old_ns)
return ERR_PTR(-EINVAL);
so let's note that.
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The number of bytes that can be written to the pipe may be less
(sometimes substantially less) than the nominal capacity. This
was confirmed with some testing. For example, when writing
4097-byte blocks to a pipe with 65536 byte capacity, only 45066
bytes could be written (i.e., 20470 bytes less than 65536).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
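One way to repeat this kind of measurement is to make the write end of a pipe nonblocking and write fixed-size blocks until write() fails with EAGAIN. This is a sketch under assumptions: the helper name `fill_pipe` is hypothetical, and the totals depend on the kernel version, page size, and pipe capacity, so it does not attempt to reproduce the exact 45066-byte figure.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write blocks of blksize bytes to a new pipe whose
   write end is nonblocking, and return the total number of bytes accepted
   before write() fails with EAGAIN (or -1 on any other error). */
static long fill_pipe(size_t blksize)
{
    static char buf[8192];
    int fd[2];
    long total = 0;

    if (blksize == 0 || blksize > sizeof(buf) || pipe(fd) == -1)
        return -1;
    memset(buf, 'x', blksize);

    if (fcntl(fd[1], F_SETFL, O_NONBLOCK) == -1) {
        total = -1;
    } else {
        for (;;) {
            /* With O_NONBLOCK, a write larger than PIPE_BUF may be partial */
            ssize_t n = write(fd[1], buf, blksize);
            if (n == -1) {
                if (errno != EAGAIN)
                    total = -1;
                break;
            }
            total += n;
        }
    }

    close(fd[0]);
    close(fd[1]);
    return total;
}
```

Comparing `fill_pipe(1)` with `fill_pipe(4097)` illustrates the point: 1-byte writes can be merged into partially filled pages, so the 4097-byte writes are not expected to get more data into the pipe than the 1-byte writes do.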
The discussion is phrased in terms of signals sent using kill(2),
but applies equally to a signal sent by the kernel.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The memfd_create function and its corresponding constants have
required _GNU_SOURCE for as long as they've been in glibc.
Signed-off-by: Joseph C. Sible <josephcsible@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
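A minimal sketch, assuming glibc 2.27 or later (the first release with a memfd_create() wrapper): _GNU_SOURCE has to be defined before any header is included, and the declaration comes from <sys/mman.h>. The helper `make_memfd` is hypothetical.

```c
#define _GNU_SOURCE         /* Must precede every #include */
#include <sys/mman.h>       /* memfd_create(), MFD_CLOEXEC */
#include <unistd.h>         /* ftruncate(), close() */

/* Hypothetical helper: create an anonymous memory-backed file of the
   given size; returns the file descriptor, or -1 on error. */
static int make_memfd(const char *name, off_t size)
{
    int fd = memfd_create(name, MFD_CLOEXEC);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, size) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```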
There's a detailed explanation in glob(7), so reuse the same text
glob(3) uses to redirect the reader, rather than inlining a short
explanation.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
connect(2) on a nonblocking UNIX domain socket, when the receive
queue of the listening socket is full, results in EAGAIN [1]. This
is unlike other connection-based socket families, which return
EINPROGRESS, as already documented.
mtk: confirmed with some light testing. And in
net/unix/af_unix.c::unix_stream_connect(), we have:
if (unix_recvq_full(other)) {
err = -EAGAIN;
if (!timeo)
goto out_unlock;
Signed-off-by: Benjamin Peterson <benjamin@python.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
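The following sketch sets up the situation on Linux: a listening UNIX domain stream socket with a backlog of 1 that never calls accept(), and nonblocking clients that keep connecting until one fails, which is expected to fail with EAGAIN. It assumes the Linux-specific autobind feature described in unix(7) (bind() with an addrlen of sizeof(sa_family_t)) so that no name has to be chosen; the helper name `first_connect_error` is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define MAX_CLIENTS 64

/* Hypothetical helper: returns the errno value of the first nonblocking
   connect() that fails against a listener that never calls accept(),
   0 if all MAX_CLIENTS connects succeed, or -1 on setup failure. */
static int first_connect_error(void)
{
    struct sockaddr_un addr;
    socklen_t len = sizeof(addr);
    int clients[MAX_CLIENTS];
    int n = 0, err = 0;

    int lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd == -1)
        return -1;

    /* Autobind to a unique abstract address, then ask what it is */
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    if (bind(lfd, (struct sockaddr *) &addr, sizeof(sa_family_t)) == -1 ||
        getsockname(lfd, (struct sockaddr *) &addr, &len) == -1 ||
        listen(lfd, 1) == -1) {
        close(lfd);
        return -1;
    }

    while (n < MAX_CLIENTS) {
        int cfd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (cfd == -1) {
            err = errno;
            break;
        }
        if (fcntl(cfd, F_SETFL, O_NONBLOCK) == -1 ||
            connect(cfd, (struct sockaddr *) &addr, len) == -1) {
            err = errno;        /* Saved before close() can change errno */
            close(cfd);
            break;
        }
        clients[n++] = cfd;     /* Connected, but never accepted */
    }

    while (n > 0)
        close(clients[--n]);
    close(lfd);
    return err;
}
```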
I regularly see excessive fd usage bugs (or even leaks) caused by
people who think they need to keep the fd open as long as the
mapping exists.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
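A minimal sketch of the point being made: once mmap() has succeeded, the file descriptor can be closed and the mapping remains usable until munmap(). The helper `map_then_close` is hypothetical, and tmpfile() is used only to obtain a scratch file.

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: write text to a scratch file, map it, close the
   file (and hence its descriptor) at once, then copy the contents out
   of the mapping into out. Returns 0 on success, -1 on error. */
static int map_then_close(const char *text, char *out, size_t outlen)
{
    size_t len = strlen(text);
    FILE *f;
    char *p;

    if (len == 0 || outlen == 0 || (f = tmpfile()) == NULL)
        return -1;
    if (fwrite(text, 1, len, f) != len || fflush(f) != 0) {
        fclose(f);
        return -1;
    }

    p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fileno(f), 0);
    fclose(f);                  /* The mapping survives the close */
    if (p == MAP_FAILED)
        return -1;

    size_t n = len < outlen - 1 ? len : outlen - 1;
    memcpy(out, p, n);          /* Reading through the mapping still works */
    out[n] = '\0';
    munmap(p, len);
    return 0;
}
```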
Per the comment on the pivot_root syscall in fs/namespace.c:
Also, the current root cannot be on the 'rootfs'
(initial ramfs) filesystem. See
Documentation/filesystems/ramfs-rootfs-initramfs.txt
for alternatives in this situation.
Signed-off-by: Joseph C. Sible <josephcsible@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The double negative is a little confusing, but required. But try
to make the semantics a little clearer in the wording (which is
now closer to the wording in the C standard).
Reported-by: Eric Benton <erbenton@comcast.net>
Reported-by: G. Branden Robinson <g.branden.robinson@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
MINSTKSZ is not defined anywhere; MINSIGSTKSZ seems to be what was intended.
Signed-off-by: Hiroya Ito <hiroyan@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
MS_SILENT can be specified when changing propagation type,
but is ignored, as far as I can see from reading the code.
(The flags are passed to do_change_type(), which, as well
as the propagation flags, allows MS_REC and MS_SILENT
(in flags_to_propagation_type()), but does nothing with
MS_SILENT.) (Linux 4.17 source)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This sentence is left over from an earlier rewrite of the text,
and the relevant details are covered a few paragraphs later.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The glibc wrapper for renameat2() was added in glibc 2.28 and
requires _GNU_SOURCE.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Extended information for timerfd file descriptors in
/proc/[pid]/fdinfo was added in commit af9c4957cf21 ("timerfd:
Implement show_fdinfo method", 2014-07-16), to support
checkpoint/restore for such file descriptors (see also the
TFD_IOC_SET_TICKS ioctl which is documented in timerfd_create.2).
Signed-off-by: Lucas Werkmeister <mail@lucaswerkmeister.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The newly added IOCB_FLAG_IOPRIO aio_flag introduces new behaviors
and return values.
The details of this new feature are posted here:
https://lkml.org/lkml/2018/5/22/809
Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Pathname escaping is not done properly in /proc/<pid>/maps;
because of this, different pathnames may appear the same
(verified by experiment and reading the source code).
Further details from Elvira about the relevant location in
the kernel code:
show_map_vma() from fs/proc/task_mmu.c uses seq_file_path()
from fs/seq_file.c to print the dentry name, which in turn
calls seq_path() from the same file. seq_path() uses
d_path() from fs/d_path.c to get the path name; this is
where the " (deleted)" part comes from. This is followed by
mangling the string with mangle_path() (fs/seq_file.c); this
function only replaces those characters that were supplied
in the "esc" argument and does not bother with escaping
anything else ('\\', for example). The value of this
argument comes without modifications from the initial call
of seq_file_path() by show_map_vma(), and that is "\n".
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The PERF_EVENT_IOC_QUERY_BPF ioctl was introduced in Linux 4.16.
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The PERF_EVENT_IOC_MODIFY_ATTRIBUTES ioctl was introduced in
Linux 4.17. It currently only works on breakpoint events.
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The PERF_EVENT_IOC_PAUSE_OUTPUT ioctl was introduced in Linux 4.7.
I've had this patch for a long time; I apologize for the delay
in getting it submitted. I've made some minor changes to the
original patch proposed by Wang Nan.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Reviewed-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It turns out no one is really sure what the perf_event_open.2
exclude_idle field is supposed to do, and a recent thread on the
linux-kernel list:
[RFC] perf/core: what is exclude_idle supposed to do
did not really clarify things.
I think the following adjustment to the page clarifies things
at least a little.
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Some discussion on the linux-perf-users list has turned up that
the perf_event_open.2 description of how
PR_TASK_PERF_EVENTS_ENABLE / PR_TASK_PERF_EVENTS_DISABLE prctl()
works is misleading.
The descriptions were based on the tools/perf/design.txt document
which describes behavior that was removed in 082ff5a2767a06 (prior
to 2.6.31, the first release with perf_event_open support).
I have written some tests in my perf_event_tests testsuite that
verify the behavior of prctl() in this case.
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
JIT support for x86-32 was added during the Linux 4.18 release cycle.
Also correct the entry for MIPS (only MIPS64 is supported).
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The man page notes that vmsplice() can splice pages from memory
to a pipe, but it can work in the other direction as well.
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Indicate that strcmp() does not take the locale into account.
Provide a link to strcoll().
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
sys/memfd.h doesn't exist. memfd_create() is declared in
sys/mman.h and some flags are available only in linux/memfd.h.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This is in line with POSIX terminology. See also the earlier
commit a00b7454b8 (in 2012)
which fixed most of these cases.
Reported-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The left-most pid namespace in a given procfs' `NStgid` does not
change based on the pid namespace of the reading process. Rather,
each procfs has an associated outer-most namespace, which gets
set when the procfs is mounted:
```
static struct dentry *proc_mount(struct file_system_type *fs_type,
int flags, const char *dev_name, void *data)
{
struct pid_namespace *ns;
if (flags & MS_KERNMOUNT) {
ns = data;
data = NULL;
} else {
ns = task_active_pid_ns(current);
}
return mount_ns(fs_type, flags, data, ns, ns->user_ns, proc_fill_super);
}
```
i.e. either the root namespace for kernel mounts or the namespace
of the mounting process. This ns then gets saved in the fs' super
block and is the basis for most operations. It is this ns that the
left-most value of `NStgid` is relative to, not the reading process.
Reported-by: Robert O'Callahan <robert@ocallahan.org>
Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
LC_GLOBAL_HANDLE is not defined anywhere; the doc meant LC_GLOBAL_LOCALE
instead.
Reported-by: Solal Pirelli <solal.pirelli@gmail.com>
Signed-off-by: Antonio Ospite <ao2@ao2.it>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The EINVAL errors that can occur for clone(2) when it is called
with various CLONE_NEW* flags and the kernel was not configured
with support for the corresponding namespace can also occur for
unshare(2). (As far as I can see, these errors don't occur for
either clone(2) or unshare(2) when it comes to CLONE_NEWNS and
CLONE_NEWCGROUP.)
Reported-by: Shawn Landden <shawn@git.icu>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note that EINVAL can occur with CLONE_NEWUSER if the kernel was
not configured with CONFIG_USER_NS.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Some other socket options that are applicable for TCP and UDP sockets
are documented in socket(7), so help the reader by pointing them at
that page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
When it fails, inet_aton() does not set errno, so using
perror() is not appropriate.
Reported-by: Antonio Chirizzi <antonio.chirizzi@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
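A minimal sketch of checking inet_aton() without perror(): the return value alone signals failure, so the message is written with fprintf() instead. The helper name `parse_ipv4` is hypothetical.

```c
#include <arpa/inet.h>
#include <stdio.h>

/* Hypothetical helper: parse an IPv4 address string into *addr.
   Returns 1 on success, 0 if the string is not a valid address. */
static int parse_ipv4(const char *s, struct in_addr *addr)
{
    if (inet_aton(s, addr) == 0) {
        /* inet_aton() does not set errno, so perror() would print a
           stale or meaningless error message here. */
        fprintf(stderr, "Invalid address: %s\n", s);
        return 0;
    }
    return 1;
}
```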
Explain how to determine the top-most mount at a particular
location by inspecting /proc/PID/mountinfo.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Thanks to a tip from Keith Packard:
https://keithp.com/blogs/fd-passing/
(Also verified by experiment.)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
When sending ancillary data, at least one byte of real data should
also be sent. This is strictly necessary for stream sockets
(verified by experiment). It is not required for datagram sockets
on Linux (verified by experiment), but portable applications
should do so.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
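A sketch of passing a file descriptor with SCM_RIGHTS while also sending one byte of real data, as recommended above. It assumes a connected UNIX domain stream socket pair; `send_fd()` and `recv_fd()` are hypothetical helper names, not library functions. The union keeps the control buffer suitably aligned for struct cmsghdr.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: send fd over sock together with one data byte
   (required for stream sockets). Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));       /* Zeroed control buffer */
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one data byte plus one descriptor.
   Returns the new descriptor, or -1 on error. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg != NULL && cmsg->cmsg_level == SOL_SOCKET &&
        cmsg->cmsg_type == SCM_RIGHTS &&
        cmsg->cmsg_len == CMSG_LEN(sizeof(int)))
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```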
If the ancillary data buffer for receiving SCM_RIGHTS file
descriptors is too small, then the excess file descriptors are
automatically closed in the receiving process. Verified by
experiment.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Verified by experiment and reading the source code (although
the SCM_RIGHTS case is not so clear to me in the source code).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
When initializing a new buffer (e.g., that will be sent with
sendmsg(2)), that buffer must first be zero-initialized to
ensure the correct operation of CMSG_NXTHDR().
Verified by experiment, and also by inspection of the glibc
source code:
_EXTERN_INLINE struct cmsghdr *
__NTH (__cmsg_nxthdr (struct msghdr *__mhdr, struct cmsghdr *__cmsg))
{
if ((size_t) __cmsg->cmsg_len < sizeof (struct cmsghdr))
/* The kernel header does this so there may be a reason. */
return (struct cmsghdr *) 0;
[1] __cmsg = (struct cmsghdr *) ((unsigned char *) __cmsg
+ CMSG_ALIGN (__cmsg->cmsg_len));
if ((unsigned char *) (__cmsg + 1) > ((unsigned char *) __mhdr->msg_control
+ __mhdr->msg_controllen)
[2] || ((unsigned char *) __cmsg + CMSG_ALIGN (__cmsg->cmsg_len) // <---
> ((unsigned char *) __mhdr->msg_control + __mhdr->msg_controllen)))
/* No more entries. */
return (struct cmsghdr *) 0;
return __cmsg;
}
At point [1], __cmsg has been updated to point to the next
cmsghdr. The subsequent check at [2] relies on 'cmsg_len'
in the next cmsghdr having some "sensible" value (e.g., 0).
See also https://stackoverflow.com/questions/27601849/cmsg-nxthdr-returns-null-even-though-there-are-more-cmsghdr-objects
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
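A minimal sketch of preparing a control buffer that holds two control messages, assuming a CMSG_NXTHDR() like the glibc code quoted above: the buffer is zeroed before the first header is filled in, so the cmsg_len of the not-yet-initialized second header reads as 0 when CMSG_NXTHDR() inspects it. The helper `build_two_cmsgs` is hypothetical; in real code the descriptors would be open files, and buf must be aligned for struct cmsghdr.

```c
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: lay out two SCM_RIGHTS messages, each carrying one
   descriptor, in buf. Returns the number of headers built (2 on success). */
static int build_two_cmsgs(struct msghdr *msg, void *buf, size_t buflen,
                           int fd1, int fd2)
{
    int fds[2] = { fd1, fd2 };
    struct cmsghdr *cmsg;
    int n = 0;

    if (buflen < 2 * CMSG_SPACE(sizeof(int)))
        return 0;

    memset(buf, 0, buflen);     /* Needed for CMSG_NXTHDR() to work */
    msg->msg_control = buf;
    msg->msg_controllen = 2 * CMSG_SPACE(sizeof(int));

    for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL && n < 2;
         cmsg = CMSG_NXTHDR(msg, cmsg)) {
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fds[n], sizeof(int));
        n++;
    }
    return n;
}
```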
If the buffer supplied to recvmsg() to receive ancillary data is
too small, then the data is truncated and the MSG_CTRUNC flag is
set.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Starting in Linux 4.11, if the process dumpable attribute is
not 1 and the process resides in a noninitial user namespace that
has valid mappings for UID 0 and GID 0, then the ownership of
/proc/PID/* is made the same as the root IDs of the namespace.
Determined by inspection of fs/proc/base.c
See also the following kernel commit:
commit 68eb94f16227336a5773b83ecfa8290f1d6b78ce
Author: Eric W. Biederman <ebiederm@xmission.com>
Date: Tue Jan 3 10:23:11 2017 +1300
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The statement that resetting the dumpable attribute of a process
to 1 causes the ownership of files to revert to the process's real
IDs looked suspect. And indeed it is at odds with the code in
fs/proc/base.c::task_dump_owner() (Linux 4.16 sources).
Further verified with a quick test that resetting dumpable to 1
causes the ownership of /proc/PID/* files to revert to the
process's effective IDs. Mea culpa for the original mistake.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The file UID does not come into play when creating a v3
security.capability extended attribute.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Make it clearer that all of the library functions
described on this page will use the getcwd() system call
if it is present. (The text previously implied that only
the getcwd() library function made use of the system call,
but looking in the glibc source code shows that all of the
functions make use of a generic implementation (__getcwd())
that uses the system call if it is present.)
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The existing text on some of the oddities of the Linux getcwd()
implementation was placed somewhat obtrusively in the DESCRIPTION.
Shift the text to NOTES, and at the same time move the related
discussion of glibc nonconformance to POSIX into BUGS.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In particular, note that it may be difficult for an application
to know about the existence of duplicate file descriptors.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note a useful performance benefit of EPOLLET: ensuring that
only one of multiple waiters (in epoll_wait()) is woken
up when a file descriptor becomes ready.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The example code currently passes `buflen - 1` to `strncpy`,
however the length parameter to `strncpy` is `size_t`, which is
unsigned. This means that when `buflen` is zero, converting `-1`
to `size_t` will result in passing `SIZE_MAX` as the length.
Obviously, that would be incorrect and could cause `strncpy` to
write well beyond the buffer passed.
The easy solution is to wrap the whole code in the `buflen > 0`
check, rather than just the part of the code that applies the null
terminator.
Signed-off-by: Matthew Kilgore <mattkilgore12@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
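The corrected pattern can be sketched as follows; the function name `copy_bounded` is hypothetical. Because the whole copy sits inside the `buflen > 0` check, `buflen - 1` is never computed for a zero-length buffer.

```c
#include <string.h>

/* Hypothetical helper: copy src into buf (of size buflen), truncating if
   necessary and always null-terminating when buflen > 0. */
static void copy_bounded(char *buf, size_t buflen, const char *src)
{
    if (buflen > 0) {
        strncpy(buf, src, buflen - 1);   /* buflen - 1 cannot wrap here */
        buf[buflen - 1] = '\0';
    }
}
```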
The nospoof, spoofalert and spoof options as well as the
RESOLV_SPOOF_CHECK environment variable were all removed
from glibc in version 2.25 (with commit
7d68cdaa4f748e87ee921f587ee2d483db624b3d).
Signed-off-by: Nikola Forró <nforro@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since the BSD header has been imported to other C libraries (including
glibc), note that here so people know it isn't BSD-specific.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It's easier to run `./scripts/foo.sh ...` than
`bash ./scripts/foo.sh ...`. Mark them all +x to support that.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In the kernel, the type of the arguments to pkey_alloc() is
"unsigned long" and that's what the page documented until now.
Now that glibc support is added for pkey_alloc(), switch to the
glibc prototype, which uses "unsigned int".
Reported-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Direct writes can perform partial writes because large writes
can be broken into smaller chunks by the block layer. Part of
the I/O submitted can fail and the failure is returned to write
as an error in the return value. However, part of the write can
be successful which means that data at the offset is inconsistent.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The parisc gateway page currently only exports 3 functions:
The lws_entry for CAS operations (at 0xb0), the set_thread_pointer
function for usage in glibc (at 0xe0) and the Linux syscall entry
(at 0x100).
All other symbols in the manpage are internal labels and
shouldn't be used directly by userspace or glibc, so drop them
from the man page documentation.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The wording is a little confusing, suggesting that this is
the primary use of O_NONBLOCK. Fix that.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A process doesn't have a capability in a mount namespace, but
rather in the user namespace that owns the mount namespace.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Attempts (settimeofday(), clock_settime(CLOCK_REALTIME)) to set
the real time clock to a value less than the current value of the
CLOCK_MONOTONIC clock result in EINVAL.
In the kernel source file kernel/time/timekeeping.c::do_settimeofday64(),
there is this check:
if (timespec64_compare(&tk->wall_to_monotonic, &ts_delta) > 0) {
ret = -EINVAL;
goto out;
}
It appears that the check was added in Linux 4.3:
commit e1d7ba8735551ed79c7a0463a042353574b96da3
Author: Wang YanQing <udknight@gmail.com>
Date: Tue Jun 23 18:38:54 2015 +0800
time: Always make sure wall_to_monotonic isn't positive
Reported-by: Jens Thoms Toerring <jt@toerring.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Note ENOTDIR error that occurs when requesting a watch on a
nondirectory with IN_ONLYDIR.
Reported-by: Paul Millar <paul.millar@desy.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I noticed that it was undocumented how inotify_add_watch(2)
behaves if IN_ONLYDIR is specified and the target is not a
directory.
I've included a patch that adds ENOTDIR as an additional error in
the inotify_add_watch(2) man page.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
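A minimal sketch of the documented error: with IN_ONLYDIR, inotify_add_watch() fails with ENOTDIR when the pathname is not a directory. The helper name `watch_dir_only` is hypothetical, and /dev/null and /tmp are simply assumed to be a convenient nondirectory and directory.

```c
#include <errno.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical helper: try to watch path with IN_ONLYDIR. Returns 0 if
   the watch was added, otherwise the errno value from inotify_add_watch(),
   or -1 if no inotify instance could be created. */
static int watch_dir_only(const char *path)
{
    int err = 0;
    int ifd = inotify_init1(IN_CLOEXEC);
    if (ifd == -1)
        return -1;

    if (inotify_add_watch(ifd, path, IN_CREATE | IN_ONLYDIR) == -1)
        err = errno;

    close(ifd);          /* Also removes any watch that was added */
    return err;
}
```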
Remove a section that adds no benefit to the discussion of O_DIRECT.
Signed-off-by: Andrew Price <andy@andrewprice.me.uk>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
* man2/s390_sthyi.2
(.SH DESCRIPTION): Document the size of the resp_buffer when
function_code is 0.
(.SH NOTES): Document various aspects of the current
implementation (the lifted requirement for the response buffer
alignment, the presence of in-kernel cache), add description
for the documentation URL.
Coauthored-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
* man2/s390_runtime_instr.2 (.SH NOTES): Note the version of
the Linux kernel since which the asm/runtime_instr.h header
is available.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
There are system calls of the same name present on the m68k and
MIPS architectures, but they simply allow setting some arbitrary
value which can be interpreted as a thread pointer by a threading
library.
* man2/set_thread_area.2 (.SH NAME): Rephrase in order to not
mention GDT.
(.SH SYNOPSIS): Add declarations for MIPS and m68k.
(.SH DESCRIPTION, .SH RETURN VALUE): Add description for MIPS
and m68k.
(.SH NOTES): Mention a way to get thread pointer on MIPS.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
* man3/termios.3 (.B TABDLY): Reference to the BUGS section.
(.SH BUGS): New section. Describe an issue on alpha where the XTABS
macro was defined to a value outside TABDLY mask.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add background details on ambient and bounding set when
discussing capability transformations during execve(2).
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
capset(2) and capget(2) operate only on the permitted,
effective, and inheritable process capability sets.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>